Deep-Learning Tools Are Democratizing DIA Mass Spec


NEW YORK – Since Sciex introduced its Swath data independent acquisition (DIA) mass spectrometry workflow almost a decade ago, DIA mass spec has seen steady growth in popularity, with a number of labs taking up the technique and vendors developing methods for their instruments.

Much of this uptake, however, has been concentrated in leading research labs, which are often first to adopt new technologies. Facilities like core labs, on the other hand, have been slower to try out the technique, put off by questions around the ease of method development and implementation as well as customer benefit.

Recent developments on the software side are changing those attitudes, though, streamlining DIA implementation and leading some once-hesitant proteomics core directors to offer the approach in their labs.

Prior to the development of DIA methods, proteomics experiments used data-dependent acquisition (DDA), wherein the mass spectrometer performs an initial scan of precursor ions entering the instrument and selects a sampling of those ions for fragmentation and generation of MS/MS spectra. Because instruments can't scan quickly enough to acquire all the precursors entering at a given moment, many ions – particularly low-abundance ions – are never selected for MS/MS fragmentation and so are not detected.

In DIA, on the other hand, the mass spec selects broad m/z windows and fragments all precursors in that window, allowing the machine to collect MS/MS spectra on all ions in a sample. This means that, unlike in DDA where values can be present for a protein in one sample and missing in another, DIA datasets are highly reproducible across many samples, which improves quantitation and the ability to, for instance, evaluate the levels of different biomarkers across a number of conditions or samples.
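The contrast between the two acquisition modes can be sketched in a few lines of code. This is a toy illustration only, not vendor instrument software: hypothetical precursor ions with made-up m/z and intensity values, a top-N picker standing in for DDA, and fixed-width isolation windows standing in for DIA.

```python
# Illustrative sketch (not instrument software): contrast DDA's top-N
# precursor selection with DIA's fixed m/z windows on toy precursor ions.

def dda_select(precursors, top_n=3):
    """DDA: rank precursors by intensity and fragment only the top N."""
    ranked = sorted(precursors, key=lambda p: p["intensity"], reverse=True)
    return [p["mz"] for p in ranked[:top_n]]

def dia_select(precursors, mz_start=400.0, mz_end=1000.0, width=25.0):
    """DIA: step fixed-width isolation windows across the m/z range,
    fragmenting every precursor that falls inside each window."""
    windows = []
    lo = mz_start
    while lo < mz_end:
        hi = lo + width
        hits = [p["mz"] for p in precursors if lo <= p["mz"] < hi]
        windows.append(((lo, hi), hits))
        lo = hi
    return windows

# Hypothetical precursor list; the low-abundance ions are the ones
# DDA tends to skip.
precursors = [
    {"mz": 452.3, "intensity": 9e5},
    {"mz": 458.1, "intensity": 2e3},   # low abundance
    {"mz": 611.7, "intensity": 5e5},
    {"mz": 733.4, "intensity": 1e3},   # low abundance
    {"mz": 905.2, "intensity": 7e5},
]

print("DDA fragments:", dda_select(precursors))
covered = sorted(mz for _, hits in dia_select(precursors) for mz in hits)
print("DIA fragments:", covered)
```

Running the sketch, DDA fragments only the three most intense ions and never touches the two low-abundance precursors, while the DIA windows sweep up all five — which is why DIA datasets avoid the sample-to-sample missing values described above.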

But while the promise of DIA was alluring, the technique was too unwieldy for many core labs to bother implementing.

One of the main stumbling blocks was the fact that to create the spectral libraries used for identifying peptides in DIA runs, researchers first had to do a series of DDA runs, adding instrument and sample processing time.

"We've been listening to the DIA field and hearing about how it can outperform DDA, but that it requires searching the data against a sample-specific spectral library, which is typically obtained using DDA," said Aaron Storey, an assistant professor at the University of Arkansas for Medical Sciences proteomics core who has been leading its DIA efforts. "So there was a hidden DDA requirement in order to get DIA working well."

Storey said that getting DIA approaches working well also required a level of expertise and familiarity with the latest research in the area. It wasn't a particularly straightforward method to implement, at least if you wanted to get good results out of it.

His investigations of the approach had convinced him that "the amount of time it would take to refine the methods and extract information out of the raw DIA files would be too much of a time and energy investment for it to work in a core facility," he said.

Susan Weintraub, professor of biochemistry and director of the mass spec core at the University of Texas Health Science Center at San Antonio, said that in her experience the extensive effort required to make a high-quality DDA library doesn’t fit well into core lab workflows. While it is worthwhile to spend several weeks fractionating and running samples to build a “deep” library for a long-term project, the wide variety of species, cell types and tissues being studied by core lab users makes that kind of time investment unrealistic, she said.

"DIA never really worked very well for me," said Brett Phinney, manager of the proteomics core at the University of California, Davis, noting that he has explored the method over the years but found it challenging to implement effectively in his core facility.

Recent software developments have changed this landscape, though. Specifically, several groups have put out deep-learning tools that can be used to generate predicted spectral libraries, meaning researchers can run DIA experiments without having to do rounds of DDA mass spec runs first. Additionally, software firm Proteome Software developed its Scaffold DIA tool, which a number of scientists said has significantly streamlined DIA workflows.

The deep-learning tools arrived a little over a year ago, with the publication in Nature Methods of a pair of studies, one by a team led by researchers at the Max Planck Institute of Biochemistry and Verily and the other by a team led by researchers at the Technical University of Munich (TUM).

The software packages, called DeepMass:Prism by the Max Planck team and Prosit by the TUM team, both used deep-learning tools for predicting patterns of ion fragmentation in mass spec-based proteomics, allowing for more confident assignment of spectra to peptides, which lets researchers identify more peptides and proteins from a given dataset. The predictive capabilities also allow researchers to generate predicted spectral libraries for DIA experiments.

"I think it is really feasible to replace [experimentally generated] libraries," with deep-learning spectral prediction tools, Jürgen Cox, group leader in computation systems biochemistry at Max Planck and senior author on one of the papers, said at the time. "In the long run, I don't think that we will be generating libraries anymore, which is actually the part that makes DIA a little bit work intensive, especially for smaller labs."

The "long run" has arrived perhaps more quickly than Cox anticipated, as such tools have begun making their way into proteomics cores and driving uptake of DIA.

"That's where everyone is heading," Phinney said. "There's no question."

He noted that his lab had begun using Prosit to generate predicted libraries for DIA work, as well as other deep-learning tools like the DIA-NN software developed by researchers at the Francis Crick Institute, which uses a combination of signal-correction strategies to reduce interference and neural networks to assign confidence to peak identifications in DIA workflows.

After struggling for years to get high-quality DIA data, Phinney said he now finds it often outperforms DDA approaches.

"I get far better data with my DIA approaches than with DDA, now," he said.

"I don't think the data was bad before," said Ben Neely, a research chemist at the National Institute of Standards and Technology whose work includes efforts to optimize and standardize DIA approaches. "I think it was on par with a good DDA run, but I think with this addition of Prosit … now you're definitely better than DDA."

Storey said that Proteome Software's release of its Scaffold DIA software was also key to his lab deciding to move into DIA. The lab was already using the company's Scaffold software for its proteomics work, and Storey saw that the firm was advertising Scaffold DIA as a tool for turnkey DIA analysis.

"I thought we would give that a try," he said. Proteome Software recommended they use the software with the Prosit workflow. "And when we tried that we were just stunned with the results and how well it worked. The nice thing was, one, that we were seeing data that outperformed DDA, but then also how easy it was."

Weintraub also highlighted the Scaffold DIA software, and in particular a module within it called EncyclopeDIA that uses empirical data to refine predicted spectral libraries, which she said further improves the data generated by DIA workflows. In March, a team led by Brian Searle, co-founder of Proteome Software and a translational research fellow at the Institute for Systems Biology, and including Bernhard Küster, one of the TUM professors behind Prosit, published a paper in Nature Communications detailing the approach.

"Using a predicted library and then searching against it after it has been empirically corrected gives you astounding results," she said, noting that while this requires some additional mass spec time to generate the data used to refine the predicted libraries, "it isn't anything like acquiring a huge DDA library for every project."

"We were pretty skeptical at first, but it really seems that being able to generate these predicted spectral libraries really solves a problem in the field and makes it a lot easier for us to run these experiments in a core," Storey said.

Neely said that with the new informatics tools in place, DIA could be a particularly attractive approach for small cores or other facilities without the resources of some larger centers.

Isobaric labeling approaches like TMT can be expensive, he noted, and many smaller labs don't have the top-of-the-line instrumentation needed to run the latest and most effective TMT workflows.

Storey said his lab decides which route to take largely depending on the number of samples a researcher wants to run.

"If you have 16 or fewer samples, we recommend a TMT workflow because we can perform offline fractionation and apply enough instrument time for that sample set to achieve up to 10,000 protein IDs," he said. "We can't currently match the depth of a single batch TMT run using DIA."

On the other hand, "If you have 30 or more samples where you are going to get into multiple batches of TMT labeling, DIA is our recommended workflow," he said, noting that his lab's adoption of DIA has let it take on larger projects than it typically saw in the past. "Since we started doing DIA more frequently, we've seen projects with 60, 70, 90, in one case 168 samples. For those large, high-throughput experiments, DIA just outperforms the other techniques."
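Storey's triage rule reduces to a simple threshold check on sample count. The sketch below is a hypothetical encoding of the numbers quoted above (16 samples for a single TMT batch, 30 or more for DIA); the function name and the handling of the in-between range are illustrative assumptions, not the lab's actual policy.

```python
# Hypothetical sketch of the sample-count triage described in the article:
# small sample sets fit a single TMT batch; large ones go to DIA.

def recommend_workflow(n_samples: int) -> str:
    """Recommend a quantitative proteomics workflow by sample count,
    using the thresholds quoted in the article (16 and 30)."""
    if n_samples <= 16:
        return "TMT"       # one batch; offline fractionation gives depth
    if n_samples >= 30:
        return "DIA"       # avoids stitching multiple TMT batches together
    return "case-by-case"  # gray zone between the quoted thresholds

print(recommend_workflow(12))    # small study -> TMT
print(recommend_workflow(168))   # large cohort -> DIA
```

The gray zone between 17 and 29 samples is left to discussion because the article gives thresholds only for the two clear cases.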