Michigan Researchers Extend DIA-Umpire Algorithm to Orbitrap Data


NEW YORK (GenomeWeb) – University of Michigan researcher Alexey Nesvizhskii and his colleagues have developed a new version of their DIA-Umpire algorithm that uses data generated by Thermo Fisher Scientific Orbitrap instruments.

The software, which generates pseudo-tandem MS spectra from data-independent acquisition mass spec data to allow for conventional database searching, previously worked only with DIA data from AB Sciex TripleTOF instruments.

Detailed in a paper published last month in Proteomics, the new version of the software also includes improved signal processing and better statistical modeling, "but the key part is that we demonstrated that our strategy is a really effective way to analyze data generated on the Orbitrap family of instruments," Nesvizhskii told GenomeWeb.

Data-dependent acquisition mass spec uses conventional database searching in which spectra are matched in an untargeted fashion to peptide databases consisting of spectra predicted by the underlying genomic content of the sample or organism being queried. DIA analyses, on the other hand, typically use a targeted approach for identifying and quantifying peptides much like that used in multiple-reaction monitoring mass spec assays.

In this approach, researchers first build a spectral library for their sample using a conventional DDA run. They are then able to do targeted searches of data from subsequent DIA runs against this spectral library.

Such a targeted approach is necessary because the large m/z fragmentation windows used in DIA methods like Swath lead to highly complex spectra with considerable interference between the multiple precursors contained in each window.

With DIA-Umpire, the original version of which they published in 2015, Nesvizhskii and his colleagues devised a method that uses m/z and retention times to detect precursor ions in MS1-level DIA data and fragment ions in MS2-level DIA data and to match them to one another. It then uses these groupings to generate pseudo-MS/MS spectra that can be searched using conventional database search engines, as is commonly done in DDA experiments.
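The grouping idea described above can be illustrated with a minimal Python sketch. This is not the DIA-Umpire implementation: the `Feature` class, the `build_pseudo_spectra` function, the retention-time tolerance, and the example values are all hypothetical, and the matching rule here (a fragment joins a precursor if the precursor's m/z falls inside the fragment's isolation window and their apex retention times agree within a tolerance) is a simplifying assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A detected ion signal (hypothetical structure for this sketch)."""
    mz: float         # mass-to-charge ratio
    apex_rt: float    # retention time (minutes) at the peak apex
    intensity: float  # apex intensity


def build_pseudo_spectra(precursors, fragments_by_window, rt_tol=0.1):
    """Group MS2 fragments with MS1 precursors into pseudo-MS/MS spectra.

    fragments_by_window maps an isolation window (low_mz, high_mz) to the
    fragment features detected in that window. The rt_tol default is an
    arbitrary illustrative value, not a recommended setting.
    """
    spectra = []
    for prec in precursors:
        peaks = []
        for (low, high), frags in fragments_by_window.items():
            # Only fragments from the window that isolated this precursor.
            if not (low <= prec.mz < high):
                continue
            for frag in frags:
                # Co-eluting fragments are assumed to share the precursor's apex.
                if abs(frag.apex_rt - prec.apex_rt) <= rt_tol:
                    peaks.append((frag.mz, frag.intensity))
        if peaks:
            spectra.append(
                {"precursor_mz": prec.mz, "rt": prec.apex_rt, "peaks": sorted(peaks)}
            )
    return spectra


# Made-up example data: three precursors, one 25 m/z isolation window.
precursors = [
    Feature(500.3, 20.0, 1e6),
    Feature(510.2, 25.0, 5e5),
    Feature(700.0, 20.0, 1e5),  # no window covers this precursor
]
fragments_by_window = {
    (500.0, 525.0): [
        Feature(300.1, 20.02, 1e4),
        Feature(450.2, 19.95, 2e4),
        Feature(600.3, 25.05, 3e4),
        Feature(350.0, 22.0, 1e3),  # elutes with neither precursor
    ]
}
spectra = build_pseudo_spectra(precursors, fragments_by_window)
```

Each resulting entry pairs one precursor with the fragments assigned to it, which is the general shape a conventional database search engine expects from a DDA-style MS/MS spectrum.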

The ability to do untargeted searching could allow researchers to identify peptides not identified in targeted searches. Additionally, once generated, the pseudo-spectra can be used as a spectral library for targeted searching using traditional DIA informatics programs like Skyline, Nesvizhskii said. It also allows researchers to skip the initial DDA run traditionally required for setting up a spectral library.

Sciex introduced the first commercially available Swath-style DIA product in 2012 for use with its TripleTOF 5600 instrument, but since then Thermo Fisher has released similar products for instruments including its Q Exactive and Orbitrap Fusion.

"Our question was: Most people are using Orbitrap instruments, so would we be able to get good results if we approached that data from the untargeted, spectral library-free perspective?" Nesvizhskii said.

"When we started this study we had a pretty good feel for how you would perform database searching with DIA data from Thermo and from AB Sciex data," he said. "But this was sort of the first work where for ourselves we were discovering what the differences [are] in the spectra processing and database searching [techniques required]."

The researchers applied the new version of the DIA-Umpire algorithm to two publicly available HEK-293 cell line and human liver microtissue datasets generated using DIA on a Q Exactive, as well as to a series of DIA experiments in HeLa cell lysates run on an Orbitrap Fusion.

Nesvizhskii said the analysis demonstrated that the DIA-Umpire method could identify numbers of peptides and proteins comparable to those identified in conventional DDA experiments, a significant finding given that most analyses still find that DDA experiments are able to go deeper into the proteome than DIA analyses.

"It is an interesting comparison," he said. "What we know now is that if we have a sufficient number of DIA runs, not just a single run but what would be a typical experiment — 5, 10, 20 runs — we can identify pretty much as many peptides and proteins as one would from a DDA data acquired from the same sample."

That also suggests that the spectral libraries generated by the technique are roughly as complete as those generated by DDA runs, indicating that researchers can use the DIA-Umpire pseudo-spectra for targeted analysis without fear of missing proteins.

Nesvizhskii acknowledged that for maximum coverage, researchers could use extensive fractionation when generating their spectral libraries via DDA, and that DIA-Umpire could not compete with that approach in terms of the comprehensiveness of its spectral libraries. But, he said, even in such cases it could be useful to supplement the standard DIA workflow with analysis using DIA-Umpire, "because we know we are seeing peptides and proteins that you don't see in DDA-based spectral libraries. So there is always some advantage."

"The best way forward is combining DIA-Umpire-derived [spectral] libraries with DDA-derived libraries," he suggested.

Nesvizhskii said that the analysis found that the new version of the algorithm got slightly higher numbers of peptide and protein identifications using the Q Exactive HF and Orbitrap Fusion instruments compared to the TripleTOF 5600, but, he said, that is largely beside the point.

"I don't want to get into a comparison of different instruments and which is better, because what you would like to say is that for each instrument we can get the best quality results," he said. "We are trying to get the best results possible on whatever instrument."

In addition to Swath-style DIA, Thermo Fisher has put out DIA methods like its pSMART approach, which collects quantitative data at the MS1 level and then uses DIA-style analysis at the MS2 level to confirm the peptide sequence. This allows the use of narrower isolation windows, which leads to less complicated spectra and, in theory, improved peptide identification rates.

Nesvizhskii said the work presented in the Proteomics study did not address this approach, though.

The pSMART method is "available, but I do not see a lot of reports yet describing its application in practical settings," he said. "So I think there is potential, but it is probably something that has to be explored a little bit more to see what the advantages are and where and when."