NEW YORK (GenomeWeb) – Researchers at Sweden's Karolinska Institute and Royal Institute of Technology have developed a new data analysis workflow for shotgun mass spec that could help improve the technique's quantitative reproducibility.
Detailed in a paper published this month in Molecular & Cellular Proteomics, the approach uses a new quality scoring system that allows for more reliable recovery of missing data points across multiple mass spec runs.
According to Roman Zubarev, Karolinska researcher and senior author on the paper, the approach allows data-dependent (DDA) mass spec workflows to match or even surpass the quantitative reproducibility of data-independent (DIA) mass spec methods while retaining DDA's traditional advantage in depth of coverage.
In DDA mass spec, the instrument performs an initial scan of precursor ions entering the instrument and selects a sampling of those ions for fragmentation and generation of MS/MS spectra. However, because instruments can't scan quickly enough to acquire all the precursors entering at a given moment, many ions — particularly low-abundance ions — are never selected for MS/MS fragmentation and so are not detected.
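The selection step described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the peptide names and intensities are made up, the instrument is assumed to fragment just the three most intense precursors per survey scan, and real instruments apply further rules (such as dynamic exclusion) that are ignored here.

```python
def top_n(precursors, n):
    """Return the names of the n most intense precursors in one survey scan."""
    ranked = sorted(precursors, key=precursors.get, reverse=True)
    return set(ranked[:n])

# Hypothetical MS1 intensities for the same five co-eluting peptides in two runs.
run_a = {"pep1": 9e6, "pep2": 3e6, "pep3": 8e5, "pep4": 7e5, "pep5": 1e4}
run_b = {"pep1": 8e6, "pep2": 3.5e6, "pep3": 6e5, "pep4": 9e5, "pep5": 2e4}

# Assume the instrument only has time to fragment three precursors per scan.
identified_a = top_n(run_a, 3)
identified_b = top_n(run_b, 3)

# Peptides fragmented in one run but not the other become missing values.
inconsistent = identified_a ^ identified_b
never_selected = set(run_a) - (identified_a | identified_b)

print(sorted(inconsistent))    # peptides identified in only one of the two runs
print(sorted(never_selected))  # low-abundance peptides never picked for MS/MS
```

Small intensity shifts between runs are enough to change which precursors make the cut, which is the source of the run-to-run inconsistency.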
This has given rise to what is commonly known in the field as the "missing value problem" wherein, because different precursors are selected for MS/MS in each run, it becomes difficult to do reproducible quantitation across different samples. This presents a problem for biomarker discovery or validation work where researchers are looking for changes in protein abundance across a number of samples.
DDA's missing value problem has led many researchers to explore the potential of DIA mass spec, where the mass spec selects broad m/z windows and fragments all precursors in that window, allowing the machine to collect MS/MS spectra on all ions in a sample.
The use of broad m/z windows, however, presents a challenge for DIA analysis: the windows produce very complicated, noisy spectra because the precursors captured in them interfere with one another. This has meant that, though DIA offers much more reproducible quantification, it typically measures less of the proteome than a DDA experiment.
As Zubarev and his colleagues noted, researchers have devised approaches to mitigate DDA's missing value problem. Although DDA runs select only a sample of all available precursor ions for MS/MS-based identification, the information (such as monoisotopic m/z and retention time) required for identification and quantification is still present at the MS1 level. Using this information, researchers can fill in missing values across DDA sample sets, taking sets with good MS/MS data for specific peptides and identifying and quantifying the same peptides in runs where the MS/MS data is missing or of poor quality.
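In rough terms, filling in a missing value means looking in a run's MS1 features for one that matches a peptide identified elsewhere. The sketch below shows that idea only; the function name, the feature values, and the tolerances (10 ppm in m/z, 0.5 minutes in retention time) are assumptions chosen for illustration, not parameters from the paper or from any particular tool.

```python
def find_feature(target_mz, target_rt, features, ppm_tol=10.0, rt_tol=0.5):
    """Return the MS1 feature closest in retention time to the target,
    among those within the m/z (ppm) and retention-time tolerances, or None."""
    best = None
    for feature in features:
        ppm_err = (feature["mz"] - target_mz) / target_mz * 1e6
        rt_err = feature["rt"] - target_rt
        if abs(ppm_err) <= ppm_tol and abs(rt_err) <= rt_tol:
            if best is None or abs(rt_err) < abs(best["rt"] - target_rt):
                best = feature
    return best

# Hypothetical peptide identified by MS/MS in run A (m/z 652.3401, 35.20 min).
# Hypothetical MS1 features from run B, where the peptide was not fragmented.
run_b_features = [
    {"mz": 652.3405, "rt": 35.35, "intensity": 1.2e6},  # close in m/z and time
    {"mz": 652.3900, "rt": 35.22, "intensity": 4.0e5},  # m/z too far off
    {"mz": 652.3402, "rt": 41.00, "intensity": 9.0e5},  # elutes too late
]

match = find_feature(652.3401, 35.20, run_b_features)
print(match)
```

A match of this kind supplies an abundance for the run that lacked MS/MS data, but nothing in the matching step itself says how likely the assignment is to be wrong.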
A number of strategies for doing this are currently in use, but the MCP authors suggested that these strategies have not taken advantage of all the information present in the MS1 spectra. Specifically, the authors proposed that additional gains can be made by including peptide abundance information. Based on this notion, they developed a shotgun mass spec workflow that uses peptide abundance data, along with more commonly used measures like retention time and mass error, to score the error involved in filling in missing values across DDA datasets.
"If you think about it, MS/MS and the precursor are two different things," Zubarev said. "We tie the MS/MS to the chromatographic peak, but MS/MS is one spectra and chromatographic peaks are collections of MS1 spectra, so they are two different entities. So we make a logical leap and connect them, but this procedure has to have some error associated with it."
And, given that linking MS/MS data to MS1-level data is key to various approaches used for filling in the missing values in DDA experiments, error in making this association can result in false assignments and less-reproducible quantification.
The approach developed by Zubarev and his colleagues, which they named DeMix-Q, uses peptide abundance data along with other parameters to score the reliability of peptides inferred at the MS1 level from those observed at the MS/MS level.
"We show that this [inference] is a statistical procedure that has error associated with it that can be calculated and estimated through the [false discovery rate] distribution," Zubarev said. "The key innovation here is the use of the scoring with peptide abundance in the same way people use scoring for MS/MS [identification] data. It does the same thing FDR scoring does for peptide identification. First, you get more confidence in your data, and, second, the quality of your data improves because your thresholds are optimal."
He added that, while the scoring function is not ideal, it represents proof of concept and he hopes other labs will take up the effort and develop improved functions going forward.
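The FDR-style thresholding Zubarev describes can be illustrated with a generic target-decoy calculation of the kind commonly used for MS/MS identifications. This is not DeMix-Q's actual scoring function: the scores and decoy labels below are invented, and the sketch simply assumes each propagated match already has a quality score and that some matches are known decoys.

```python
def score_cutoff_at_fdr(scored_matches, max_fdr):
    """Given (score, is_decoy) pairs, return the lowest score cutoff at which
    the estimated FDR (decoys / targets above the cutoff) stays <= max_fdr.
    Returns None if no cutoff satisfies the limit."""
    ranked = sorted(scored_matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff

# Hypothetical scored matches: (score, is_decoy).
matches = [
    (9.0, False), (8.5, False), (8.0, False), (7.5, False), (7.0, False),
    (6.5, True), (6.0, False), (5.5, True), (5.0, False),
]

cutoff = score_cutoff_at_fdr(matches, max_fdr=0.2)
accepted = [m for m in matches if m[0] >= cutoff and not m[1]]
print(cutoff, len(accepted))
```

Choosing the cutoff from the score distribution, rather than fixing tolerances by hand, is what lets the threshold adapt to the data.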
In the MCP paper, Zubarev and his colleagues applied their method to the dataset used in the Association of Biomolecular Resource Facilities Proteome Informatics Research Group's 2015 study, comparing its performance to that of several other common approaches: MS/MS-based spectral counting without peptide identity propagation (PIP) to fill in missing values, MS1-based label-free quantitation using feature-based PIP via the OpenMS and MaxQuant tools, and ion-based PIP via Skyline.
Spectral counting without PIP left more than 40 percent of peptide abundance values missing, with only 26 percent of all peptides in the dataset detected across 12 mass spec runs. The OpenMS and MaxQuant approaches resulted in 15 percent and 13 percent of values missing, respectively, with more than 60 percent of peptides detected across all the runs. Skyline left only 1.5 percent of peptide abundances missing and quantified over 90 percent of peptides in all 12 runs.
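The two figures reported for each tool, the share of abundance values missing and the share of peptides quantified in every run, measure different things. A minimal sketch of how both could be computed from a peptide-by-run table follows; the tiny table is invented for illustration and uses None for a missing value.

```python
def missing_value_summary(table):
    """table maps peptide -> list of abundances, one per run (None = missing).
    Returns (fraction of values missing, fraction of peptides with no gaps)."""
    total_values = sum(len(runs) for runs in table.values())
    missing = sum(value is None for runs in table.values() for value in runs)
    complete = sum(all(value is not None for value in runs)
                   for runs in table.values())
    return missing / total_values, complete / len(table)

# Hypothetical abundances for four peptides across three runs.
table = {
    "pepA": [1.0e6, 1.1e6, 0.9e6],
    "pepB": [5.0e5, None, 4.8e5],
    "pepC": [None, None, 2.0e4],
    "pepD": [3.0e5, 3.1e5, 2.9e5],
}

fraction_missing, fraction_complete = missing_value_summary(table)
print(f"{fraction_missing:.0%} of values missing")
print(f"{fraction_complete:.0%} of peptides quantified in all runs")
```

Because a single gap disqualifies a peptide from the "all runs" count, a modest share of missing values can remove a much larger share of peptides from complete-case analysis.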
The DeMix-Q approach, in fact, was slightly less sensitive than Skyline, leaving 2.8 percent of values missing and quantifying 86 percent of peptides across all runs. However, the authors noted, DeMix-Q's scoring function provided higher quality data, as reflected in its significantly lower average coefficients of variation.
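The coefficient of variation (CV) used as the quality measure here is the standard deviation of a peptide's abundance across replicate runs divided by its mean. A short sketch with made-up abundances shows how two peptides with the same mean can differ sharply in CV; the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical abundances of one peptide across three replicate runs.
consistent = [100.0, 110.0, 90.0]   # tightly reproduced measurements
noisy = [100.0, 150.0, 50.0]        # same mean, much wider spread

cv_consistent = coefficient_of_variation(consistent)
cv_noisy = coefficient_of_variation(noisy)
print(f"{cv_consistent:.2f} vs {cv_noisy:.2f}")
```

A lower average CV across peptides indicates that the quantified values, including the filled-in ones, are more reproducible from run to run.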
The method also compared favorably to recent DIA analyses, the authors said, citing an MCP paper from last year in which researchers used Swath DIA mass spec to quantify 80 percent of 18,600 yeast peptides from 2,333 proteins across four mass spec runs.
"Compared to 86 percent of 26,753 peptides we quantified here in 12 runs, the DIA study did not demonstrate any advantage," they wrote. "In our case, DDA required less experimental time and produced both deeper proteomics analysis as well as fewer missing values."
Zubarev said that he and his colleagues have incorporated DeMix-Q into their lab's standard DDA workflows.