Researchers Develop New Method for Untargeted Peptide Identification in DIA Data


NEW YORK (GenomeWeb) – A team led by researchers at the University of California, San Diego has developed a method for analyzing data-independent acquisition mass spec experiments in an untargeted manner.

Detailed in a paper published this week in Nature Methods, the method, named MSPLIT-DIA, uses spectral mapping to identify peptides in DIA datasets and, according to its developers, could improve both DIA-based peptide identification and quantification.

The method is also part of what appears to be an emerging trend of applying untargeted data analyses to DIA, which has typically been considered a targeted, though very highly multiplexed, mass spec approach.

In DIA, the mass spec selects broad m/z windows and fragments all precursors in each window, allowing the machine to collect MS/MS spectra on all ions in a sample. Rather than looking to match the spectra to peptide sequence databases, as is done in conventional shotgun mass spec experiments, traditional DIA tools query the resulting data by peptide, asking, for each peptide, whether evidence of it exists in the data.

This is in contrast to traditional shotgun proteomics experiments, which use data-dependent acquisition in which the instrument performs an initial scan of precursor ions entering the instrument and selects a sampling of those ions for fragmentation and generation of MS/MS spectra. Because instruments can't scan quickly enough to acquire all the precursors entering at a given moment, many ions — particularly low-abundance ions — are never selected for MS/MS fragmentation in such experiments, and so are not detected.
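The sampling difference described above can be illustrated with a toy sketch. The precursor list, the top-N setting, and the 25 m/z window width below are all invented for illustration and do not come from any particular instrument method.

```python
# Toy comparison of DDA top-N selection and DIA window isolation.
# All m/z values, intensities, and settings are hypothetical.

def dda_select(precursors, top_n):
    """Return the m/z values of the top_n most intense precursors."""
    ranked = sorted(precursors, key=lambda p: p[1], reverse=True)
    return {mz for mz, _ in ranked[:top_n]}


def dia_select(precursors, windows):
    """Return the m/z values of precursors inside any isolation window."""
    return {
        mz
        for mz, _ in precursors
        if any(lo <= mz < hi for lo, hi in windows)
    }


# (m/z, intensity) pairs seen in one hypothetical survey scan.
precursors = [
    (402.2, 9e6),
    (455.7, 4e4),
    (512.3, 2e6),
    (618.9, 7e3),
    (733.4, 5e5),
]

# Hypothetical 25 m/z wide windows tiling the 400-800 m/z range.
windows = [(lo, lo + 25) for lo in range(400, 800, 25)]

dda_hits = dda_select(precursors, top_n=3)
dia_hits = dia_select(precursors, windows)

# Precursors that DIA fragments but the top-3 DDA rule never selects.
missed_by_dda = sorted(dia_hits - dda_hits)
print(missed_by_dda)
```

In this sketch the two lowest-intensity precursors are the ones DDA skips, while every precursor falls into some DIA window and is fragmented together with its neighbors.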

Because DIA methods like Swath fragment all the precursors in a sample, they don't suffer from this sampling issue. This means that DIA experiments are better able to measure the same peptides consistently across multiple samples, making DIA an approach well-suited to large-scale protein quantitation experiments.

The limitation of such DIA approaches is that the peptides being queried are typically drawn from a previously created library (often generated by an initial DDA mass spec experiment), meaning that researchers are only measuring proteins they already know to be in their sample. This, along with lower sensitivity and dynamic range, means that DIA methods typically identify fewer peptides than DDA.

To an extent, a targeted, peptide-centric approach is necessary in DIA due to the complexity of the multiplex spectra generated by fragmenting across large precursor windows. Because these spectra are much more complicated than those generated by a conventional shotgun workflow — which typically isolates and fragments one precursor at a time — targeted searching by peptide was, at least initially, the obvious choice for making IDs within such spectra.

Recently, however, several groups have begun exploring how to apply untargeted searching methods to DIA data, which could help improve the technique's sensitivity and increase the number of peptides it is able to identify.

For instance, in January, a team led by University of Michigan researcher Alexey Nesvizhskii published a paper in Nature Methods detailing its DIA-Umpire method, which generates pseudo-tandem MS spectra from DIA data, allowing for conventional shotgun-style database searching and the generation of spectral libraries without the need for a separate data-dependent acquisition run.

And, this week, the UCSD team, led by researchers Nuno Bandeira and Jian Wang, published on their MSPLIT-DIA approach, which uses spectral matching somewhat similar to that used in a typical DDA experiment to do untargeted searches of DIA data.

"I was talking to both [Bandeira] and [Nesvizhskii] on different occasions and we started asking well, if targeted extraction is the only way you should look at DIA data, does that make it just a high-throughput MRM experiment?" said University of Toronto researcher Anne-Claude Gingras, a co-author on both studies. "Or can you do something a little bit different and use the data in a different way?"

Nesvizhskii and his colleagues took the aforementioned pseudo-spectra approach, using m/z and retention times to detect and match precursor and fragment ions in MS1- and MS2-level DIA data, and then using these groupings to generate pseudo-spectra that can be searched with conventional DDA database search tools.
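One way to picture the grouping step is to compare elution profiles: fragments that rise and fall with a precursor over retention time are assigned to it. The sketch below is only an illustration of that idea using Pearson correlation; it is not DIA-Umpire's algorithm, and the profiles, the function name group_fragments, and the 0.9 threshold are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def group_fragments(precursor_profile, fragment_profiles, min_corr=0.9):
    """Return fragment m/z values whose profiles co-elute with the precursor.

    min_corr is a hypothetical cutoff chosen for this example.
    """
    return sorted(
        mz
        for mz, profile in fragment_profiles.items()
        if pearson(precursor_profile, profile) >= min_corr
    )


# Hypothetical intensities sampled at the same seven retention times.
precursor_profile = [0, 2, 8, 10, 7, 2, 0]
fragment_profiles = {
    175.1: [0, 1, 4, 5, 3.5, 1, 0],   # same shape as the precursor
    304.2: [0, 3, 11, 15, 10, 3, 0],  # nearly the same shape
    488.3: [9, 6, 2, 0, 0, 1, 4],     # elutes at a different time
}

pseudo_spectrum = group_fragments(precursor_profile, fragment_profiles)
print(pseudo_spectrum)
```

The fragments that survive the filter form a pseudo-spectrum for that precursor, which could then be handed to a conventional database search tool.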

Bandeira and his team, on the other hand, applied a spectral matching approach in which they made identifications by matching library spectra of single peptides to portions of the multiplexed spectra generated experimentally through DIA runs.
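The idea of matching a single-peptide library spectrum to a portion of a multiplexed spectrum can be sketched with a projected cosine score, in which only the mixture peaks at the library's peak positions are compared. This is a simplified illustration under stated assumptions, not the MSPLIT-DIA scoring itself: the peak lists, the 0.05 m/z tolerance, the min_matched rule, and the 0.8 cutoff are all invented for the example.

```python
from math import sqrt


def projected_cosine(library, mixture, tol=0.05, min_matched=2):
    """Cosine between a library spectrum and the mixture's projection onto it.

    library, mixture: lists of (m/z, intensity) peaks.
    tol: hypothetical m/z matching tolerance.
    min_matched: hypothetical guard so that one chance peak cannot score well.
    """
    projected = []
    for lib_mz, _ in library:
        # Most intense mixture peak near this library peak, or 0 if none.
        nearby = [inten for mz, inten in mixture if abs(mz - lib_mz) <= tol]
        projected.append(max(nearby, default=0.0))

    if sum(1 for value in projected if value > 0) < min_matched:
        return 0.0

    lib_ints = [inten for _, inten in library]
    dot = sum(a * b for a, b in zip(lib_ints, projected))
    norm = sqrt(sum(a * a for a in lib_ints)) * sqrt(sum(b * b for b in projected))
    return dot / norm if norm else 0.0


# Hypothetical library spectra for three peptides.
library = {
    "A": [(175.12, 30.0), (304.16, 100.0), (417.25, 60.0)],
    "B": [(147.11, 50.0), (262.14, 100.0), (375.22, 40.0)],
    "C": [(201.10, 80.0), (330.20, 100.0), (459.30, 50.0)],
}

# Hypothetical multiplexed spectrum: peptides A and B co-fragmented,
# plus one stray peak that happens to sit near a peak of peptide C.
mixture = [
    (175.12, 60.0), (304.17, 200.0), (417.24, 120.0),
    (147.11, 25.0), (262.15, 50.0), (375.22, 20.0),
    (330.21, 5.0),
]

scores = {name: projected_cosine(peaks, mixture) for name, peaks in library.items()}
identified = sorted(name for name, score in scores.items() if score >= 0.8)
print(identified)
```

Because the score ignores mixture peaks that lie away from the library peaks, peptides A and B can each match their own portion of the same multiplexed spectrum, while peptide C, with only one coincidental peak, is rejected.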

While DIA's quantitative capabilities are one of its main draws, MSPLIT-DIA is focused primarily on identification, Bandeira noted.

"If we want to quantify something, there is first an even simpler question, which is, can you even confidently say that that peptide is in the run?" he said. "And that is essentially an identification problem."

In the Nature Methods paper, the researchers compared MSPLIT-DIA's peptide ID capabilities to a variety of other approaches, finding that it enabled considerably more identifications.

For instance, the method identified 26 percent to 31 percent more human peptides than an equivalent DDA run. It also identified 66 percent to 89 percent more than DIA-Umpire, 81 percent to 88 percent more than Sciex's PeakView quantitative proteomics software, and 86 percent to 107 percent more than the University of Washington's Skyline software.

Bandeira noted that MSPLIT-DIA's focus on identification actually makes it complementary to targeted extraction tools like PeakView and Skyline.

"We first do this identification step [using MSPLIT-DIA] and then we do quantification [using a tool like Skyline or PeakView] only on the things that were identified," he said. "And by decoupling the two steps we are actually able to obtain more significant quantification for those that were identified, and overall we end up with more detected peptides than if we did everything at once."

"If you have identified a list of what is in your sample, and now not only do you know that these peptides are in your samples but you know exactly when they elute in your samples, it really improves the quantitation," Gingras said. "Because first, we know the retention time alignment from what we have identified, and all those [targeted extraction] tools are heavily biased for proper retention time alignment as part of their scoring, so it's much better than just a realignment based on spiked-in standards or something like that. And then the second thing is we really reduce our search space."
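The two benefits described here, run-specific retention times and a smaller search space, can be sketched as a filtering step placed between identification and targeted extraction. The peptide names, retention times in minutes, the build_targets helper, and the 2-minute window are hypothetical and do not reflect how Skyline or PeakView are configured.

```python
def build_targets(library, identified, rt_window=2.0):
    """Restrict targeted extraction to peptides identified in this run.

    library: peptide -> library retention time (minutes).
    identified: peptide -> retention time observed in this run (minutes).
    Returns peptide -> (start, end) extraction window centered on the
    observed retention time rather than the library one.
    """
    targets = {}
    for peptide, observed_rt in identified.items():
        if peptide in library:
            targets[peptide] = (observed_rt - rt_window, observed_rt + rt_window)
    return targets


# Hypothetical spectral library with four peptides.
library = {"PEPTIDEA": 21.5, "PEPTIDEB": 34.0, "PEPTIDEC": 47.2, "PEPTIDED": 55.9}

# Hypothetical identification results for one DIA run.
identified = {"PEPTIDEA": 22.5, "PEPTIDEC": 46.5}

targets = build_targets(library, identified)
print(targets)
```

Only two of the four library peptides are carried forward, and each is extracted around the time it was actually seen to elute in this run.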

Gingras said that her lab, which uses DIA mass spec extensively, has incorporated DIA-Umpire and MSPLIT-DIA into its standard mass spec pipeline, but is still working out how to best combine them.

"They complement each other, but we haven't yet figured out quite the smartest way to run them together," she said.

Bandeira suggested that continued work on new methods of DIA data analysis could further improve what has already become, in a short time, a quite popular proteomics research tool.

"There is an attractiveness to the idea that we don't have to pick and choose what to acquire in every given mass spec run," he said. "However, one of the things we have realized when looking at many of these DIA types of data is that, unfortunately, dynamic range and sensitivity are still challenges. So I expect to see technological developments substantially improving the utility of this approach."

"The cool thing about DIA data is that it is a full MS2 map of your sample," Gingras said. "People have used that really well for quantification, but it is also a map that can be used for identification, and there is no reason why you shouldn't combine both approaches. I think it is all complementary, and people are starting to realize that there is more than one way to look at this data."