NEW YORK (GenomeWeb) – Interest in data-independent acquisition (DIA) mass spec has exploded in recent years as a range of proteomics researchers have taken up the technique and a variety of vendors have brought new DIA-focused tools and instruments to market.
Nonetheless, some misunderstandings remain in the field regarding the approach and its benefits vis-à-vis conventional shotgun, or data-dependent acquisition (DDA), mass spec analyses, two leading proteomics researchers told GenomeWeb this week.
Aiming to address these misconceptions, the researchers — the University of Washington's Michael MacCoss and the Swiss Federal Institute of Technology Zurich's Ruedi Aebersold — along with several colleagues authored a paper published last week in Molecular & Cellular Proteomics intended to better delineate the fundamental differences between DIA and DDA mass spec.
At its root, the distinction is between analyzing mass spec data via a "spectrum-centric" approach, as is generally done in DDA, and a "peptide-centric" approach, as is typical in DIA. In the former, researchers try to match mass spectra generated from their sample to peptide sequences. In the latter, MacCoss noted, researchers take the peptide as the main query unit and ask whether there is evidence for said peptide in a sample.
It might seem a subtle difference, but it has important implications for the strengths and weaknesses of the different methods, MacCoss noted.
Fundamentally, a "spectrum-centric" approach will be better suited to identification of proteins in a sample, whereas a "peptide-centric" approach will be better suited to reproducible detection of proteins already demonstrated to be present in a sample.
"There's confusion with [DIA], because people still talk about it in terms of how many things do you identify?" MacCoss said. "And we don't really identify anything. We detect things. So we have been pretty specific about trying to use the language of 'detect,' because these are all things we found previously in these sorts of samples."
To an extent, this is simply a matter of semantics, MacCoss allowed. However, he noted, those semantics are a reflection of researcher expectations.
"People tend to struggle with the idea of [DIA] when they try to make that comparison to [DDA] because they are very caught up on how many things they are measuring," he said. "People say, 'I can identify more things by data-dependent acquisition.'"
In DDA mass spec, the instrument performs an initial scan of precursor ions entering the instrument and selects a sampling of those ions for fragmentation and generation of MS/MS spectra. Typically, these spectra are then compared to a sequence database and matched to the peptide sequences that best explain them.
Because the spectra are being searched against all possible peptide sequences, this "spectrum-centric" approach allows for the identification of novel proteins, making it well-suited for use in discovery proteomics experiments.
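The precursor-selection step in DDA can be illustrated with a minimal sketch. It assumes one common selection rule, picking the N most intense precursors from each survey scan; the function name, the scan values, and the choice of N are hypothetical and are not taken from the MCP paper or any vendor's software.

```python
from typing import List, Tuple

Precursor = Tuple[float, float]  # (m/z, intensity); illustrative values only

def select_top_n(ms1_scan: List[Precursor], n: int) -> List[Precursor]:
    """Return the n most intense precursors from a single survey scan."""
    return sorted(ms1_scan, key=lambda p: p[1], reverse=True)[:n]

# Hypothetical survey scan with five precursors of very different abundance.
scan = [(402.2, 9.0e5), (515.8, 3.0e3), (633.3, 4.5e5),
        (710.4, 1.2e3), (820.9, 2.0e5)]

selected = select_top_n(scan, n=3)                # sent on for MS/MS
missed = [p for p in scan if p not in selected]   # never fragmented

print("fragmented:", [mz for mz, _ in selected])
print("not sampled:", [mz for mz, _ in missed])
```

In this toy scan the two lowest-intensity precursors never receive an MS/MS spectrum, so a database search cannot identify them regardless of whether the peptides are present.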
In DIA, on the other hand, the mass spec selects broad m/z windows and fragments all precursors in that window, allowing the machine to collect MS/MS spectra on all ions in a sample. Rather than looking to match the spectra to peptide sequence databases, however, popular DIA tools like Swath query the resulting data by peptide, asking for each peptide if evidence of it exists in the data.
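A rough sketch of the peptide-centric query might look like the following. The window range and width, the fragment m/z values, the tolerance, and all function names are assumptions chosen for illustration; this is not how Swath or any particular tool is implemented, and real tools score far more than a count of matched fragments.

```python
from typing import List, Optional, Tuple

Window = Tuple[float, float]  # (lower m/z, upper m/z) of one isolation window

def make_windows(start: float, stop: float, width: float) -> List[Window]:
    """Tile the precursor m/z range with contiguous fixed-width windows."""
    count = int((stop - start) / width)
    return [(start + i * width, start + (i + 1) * width) for i in range(count)]

def window_for(precursor_mz: float, windows: List[Window]) -> Optional[int]:
    """Index of the window whose range contains the precursor, or None."""
    for i, (lo, hi) in enumerate(windows):
        if lo <= precursor_mz < hi:
            return i
    return None

def count_matched(library_fragments: List[float],
                  spectrum_mz: List[float], tol: float) -> int:
    """Count library fragments with a spectrum peak within tol m/z."""
    return sum(any(abs(f - peak) <= tol for peak in spectrum_mz)
               for f in library_fragments)

# Assumed acquisition scheme: 400-1200 m/z covered by 25 m/z windows.
windows = make_windows(400.0, 1200.0, 25.0)

# Hypothetical library entry: one precursor and its expected fragments.
precursor_mz = 633.3
library_fragments = [262.15, 504.27, 617.36, 890.50]

# Hypothetical multiplexed MS/MS peaks from the window holding the precursor.
idx = window_for(precursor_mz, windows)
spectrum_mz = [175.12, 262.15, 401.30, 504.27, 617.35, 730.44]

matched = count_matched(library_fragments, spectrum_mz, tol=0.02)
print(f"window {idx} {windows[idx]}: "
      f"{matched}/{len(library_fragments)} fragments found")
```

The query starts from the peptide and looks for its evidence in the data, rather than starting from a spectrum and asking which sequence best explains it.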
"Many people think that in this DIA method we are just acquiring convoluted spectra which are then deconvoluted," Aebersold said. "This is true, but the fundamental shift is that for each peptide we ask a very specific question. We have the hypothesis that the peptide is not in the sample, and then this hypothesis can be tested and rejected."
"It's a much more straightforward approach for peptide identification," he said. One of the major differences, he added, is that because DIA analyses fragment all the precursors in a sample, researchers can confidently say that "if a peptide is not detected in the data set, then... above a certain threshold, the peptide is not in there."
This, Aebersold noted, differs from "spectrum-centric" DDA approaches in that, because instruments can't scan quickly enough in a DDA-style experiment to acquire all the precursors entering at a given moment, many ions – particularly low-abundance ions – are never selected for MS/MS fragmentation. This makes it more difficult to say if a given peptide was not identified because it was not present in the sample or, rather, because it was present but was not sampled by the mass spec.
"The peptide-centric approach has a fundamental strength in that you can use very solid statistical tools to say with a certain probability, the answer is yes or the answer is no" regarding the presence of a specific peptide, he said.
The limitation is that the peptides being queried are typically drawn from a previously created library (often generated by an initial DDA mass spec experiment), meaning that researchers are only measuring proteins they already know to be in their sample.
As such, "these DIA or peptide-centric methods don't discover a lot new that has not been seen before," Aebersold said. However, he added, "they discover context or information about these [previously observed] proteins."
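The per-peptide test Aebersold describes, in which the null hypothesis is that a peptide is not in the sample, can be sketched with a generic empirical p-value. This is an assumed, simplified scheme: the scores, the null distribution, the peptide names, and the threshold are invented for illustration and do not represent the statistics used in the MCP paper or in any specific DIA software.

```python
from typing import Dict, List

def empirical_p_value(score: float, null_scores: List[float]) -> float:
    """Fraction of null scores at least as high as the observed score.

    The +1 terms keep the estimate from ever being exactly zero.
    """
    exceed = sum(1 for s in null_scores if s >= score)
    return (exceed + 1) / (len(null_scores) + 1)

# Hypothetical scores for queries known to be absent (the null distribution).
null_scores = [0.8, 1.1, 1.5, 0.3, 2.0, 0.9, 1.2, 0.6, 1.7]

# Hypothetical library: every peptide in it is queried, one at a time.
library_scores: Dict[str, float] = {"PEPTIDEA": 5.4, "PEPTIDEB": 1.0}

alpha = 0.2  # illustrative threshold only
detected = {
    peptide: empirical_p_value(score, null_scores) <= alpha
    for peptide, score in library_scores.items()
}

for peptide, is_detected in detected.items():
    p = empirical_p_value(library_scores[peptide], null_scores)
    print(f"{peptide}: p = {p:.2f}, detected = {is_detected}")
```

Because every library peptide is tested directly, each one receives its own confidence estimate, including those that are called not detected. Only peptides already in the library are ever queried, which is the limitation noted above.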
Indeed, the ability to reproducibly measure large numbers of proteins across multiple samples has emerged as the key advantage of DIA, even if, as Aebersold and MacCoss suggest, it is still underappreciated.
And, Aebersold said, he believes this ability will grow in importance as proteomics continues to move from a field focused on cataloging proteins present in various samples to one focused on quantitative exploration of various biological questions.
"Maybe I am biased, but what I think is happening is there is a very significant shift away from the perpetual rediscovery of proteins," he said, noting that the majority of proteins that can be discovered using mass spec have been discovered.
"Right now, if you want to claim that you have discovered some new human proteins from a particular sample it is extremely hard to do, because the coverage is already quite high," he said. "So to credibly show that you have identified 50 or 100 or 200 new proteins is very difficult."
And, Aebersold added, even if you do, "many people will say, 'so what?' We already have 15,000 or so proteins identified. So if someone finds some spectral evidence for [new] proteins, relatively few will get excited about it."
"But what I think people are excited about is that people can do quantitative re-measurements of proteins across various conditions," he said. "Because that shows how the biochemical workings of the cell adjust through specific changes, and I think that is a very important question that can be addressed only through proteomics."
Peptide-centric analysis could in theory be applied to DDA data, as well, MacCoss said, noting that his group has done some "preliminary experiments along those lines."
Though the benefits are less obvious given DDA's stochastic sampling, the approach "still has value [from a DDA perspective] because you calculate a direct statistical measure for each target," he said. This, the MCP authors noted, allows researchers to directly assign each peptide a specific "confidence estimate of being detected/not detected because each peptide is directly investigated."
"In contrast, spectrum-centric analysis implicitly assigns all 'missing' peptides equal, very low confidence estimates," they wrote.
Conversely, not all DIA methods take a peptide-centric approach. For instance, Waters' MSE method, which was the first commercially available DIA technique upon its launch in 2006, uses a "spectrum-centric" method, MacCoss noted.
And earlier this year researchers from the University of Michigan presented an informatics package named DIA-Umpire that allows users to generate pseudo-tandem MS spectra from DIA data, enabling conventional "spectrum-centric" mass spec analysis of DIA datasets.
In an interview with GenomeWeb following release of the software, Michigan researcher Alexey Nesvizhskii observed that the multiplex fragmentation and complex spectra generated by DIA experiments made targeted, "peptide-centric" searching a superior choice for analysis of such data.
However, he added that he thought the untargeted, "spectrum-centric" analysis enabled by DIA-Umpire was a "natural second step, to see what, with improved signal processing and improved algorithms, we can actually get out of those datasets."