ETH Zurich Team Develops Algorithm for DIA-Based PTM Analysis

This article has been updated to note the release last year of a DIA post-translational modification tool by researchers at the Institute for Systems Biology.

NEW YORK (GenomeWeb) – A team led by researchers at the Swiss Federal Institute of Technology (ETH) Zurich has developed an algorithm for identification and quantification of protein post-translational modifications in Swath-style data-independent acquisition data sets.

The algorithm, which was detailed in a paper published last week in Nature Biotechnology, offers an automated method for PTM analysis in Swath data, said ETH professor Ruedi Aebersold, senior author on the study, and will enable DIA-based studies of protein PTMs similar to what has been possible using conventional data-dependent acquisition mass spec methods.

The algorithm follows the release last year by researchers at the Institute for Systems Biology of their SwathProphetPTM algorithm, which similarly brought PTM analysis tools to DIA data.

Aebersold noted that the method, which he and his colleagues have termed inference of peptidoforms (IPF), could improve upon standard PTM analysis methods by allowing for broader searches across different classes of modifications as well as detection and quantification of peptides with multiple modifications.

Protein PTMs like phosphorylation, glycosylation, and many others play key roles in protein function and significantly expand the diversity of the proteome beyond what is provided for at the genetic level. Measuring these modifications presents a number of challenges, however. Foremost among them is that taking into account the wide range of possible modifications vastly expands the search space of a mass spec experiment.

As a result, researchers have typically focused on analyzing one type of PTM at a time. Searching only for the mass shifts specific to, for instance, phosphorylation limits an experiment's search space to something more manageable.
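The growth in search space can be illustrated with a rough count of candidate peptide forms. The sketch below is only an illustration: the residue specificities are simplified assumptions, the example peptide is made up, and the counting rule (each residue is either unmodified or carries one allowed modification) is not taken from the paper.

```python
from math import prod

# Simplified, illustrative residue specificities (an assumption, not from the article).
MOD_TARGETS = {
    "phospho": set("STY"),
    "oxidation": set("M"),
    "acetyl": set("K"),
    "methyl": set("KR"),
}


def count_peptide_forms(sequence, mods):
    """Count candidate forms of a peptide when each residue is either
    unmodified or carries exactly one of the modifications in `mods`."""
    return prod(
        1 + sum(1 for mod in mods if residue in MOD_TARGETS[mod])
        for residue in sequence
    )


if __name__ == "__main__":
    peptide = "TSKMYR"  # made-up sequence for illustration
    print(count_peptide_forms(peptide, ["phospho"]))
    print(count_peptide_forms(peptide, list(MOD_TARGETS)))
```

Restricting the search to a single modification class keeps the count small, while allowing all four classes at once multiplies the candidates for even this short peptide.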

DIA mass spec data analysis differs from conventional DDA methods in that it uses targeted matching of fragment ion spectra to previously generated spectral libraries. DIA methods select broad m/z windows and fragment all precursors in that window, which allows the instrument to collect MS/MS spectra on all the ions in a sample. However, these broad windows also create complicated spectra with considerable interference from the various precursors captured in a given window.
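A minimal sketch of why broad windows lead to mixed spectra: precursors are binned into fixed-width isolation windows, and every precursor in a bin is fragmented together. The window layout used here (32 windows of 25 m/z starting at 400 m/z) and the precursor values are assumptions chosen for illustration, not parameters reported in the article.

```python
from collections import defaultdict


def assign_windows(precursor_mzs, start=400.0, width=25.0, n_windows=32):
    """Group precursor m/z values by the index of the isolation window
    that contains them; values outside the covered range are dropped."""
    groups = defaultdict(list)
    for mz in precursor_mzs:
        idx = int((mz - start) // width)
        if 0 <= idx < n_windows:
            groups[idx].append(mz)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical precursor m/z values.
    precursors = [401.2, 410.5, 424.9, 425.0, 1300.0]
    for idx, members in sorted(assign_windows(precursors).items()):
        # Precursors listed together are co-fragmented in one MS/MS spectrum.
        print(idx, members)
```

Any window holding more than one precursor yields a single fragment spectrum containing ions from all of them, which is the interference described above.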

This presents a challenge for DIA data analysis generally and for DIA analysis of PTMs specifically. To tackle this issue, Aebersold and his colleagues developed an approach in which the analysis starts with groups of peptide fragments that are expected to be present in the case of most protein modifications. It then adds fragment ion transition signals that would be present if a particular modification or set of modifications were present, testing each of these hypotheses individually.
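The idea of testing each peptide form hypothesis against shared and form-specific transitions can be sketched in a few lines. This is a toy ranking only: it scores a hypothesis by the fraction of its form-specific transitions that were detected, which stands in for, and is much simpler than, the error models the researchers describe. The function name, transition labels, and form names are all hypothetical.

```python
def score_hypotheses(backbone, specific_by_form, detected):
    """Toy scoring of peptide form hypotheses.

    backbone: set of transitions expected for every form of the peptide.
    specific_by_form: dict mapping a form name to its form-specific transitions.
    detected: set of transitions observed in the data.

    Returns {form: fraction of its specific transitions detected}, or an
    empty dict when the shared backbone evidence is incomplete.
    """
    if not backbone <= detected:
        return {}
    return {
        form: len(specific & detected) / len(specific)
        for form, specific in specific_by_form.items()
        if specific  # skip forms with no specific transitions
    }


if __name__ == "__main__":
    backbone = {"bb1", "bb2"}
    specific_by_form = {
        "form_A": {"a1", "a2"},
        "form_B": {"b1", "b2"},
    }
    detected = {"bb1", "bb2", "a1", "a2", "b1"}
    print(score_hypotheses(backbone, specific_by_form, detected))
```

Because only the listed hypotheses are evaluated, the work grows with the number of candidate forms supplied rather than with every possible combination of modifications.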

"So, we have identification, basically, of backbone [peptide] transitions, and then we have a large number of PTM-specific transitions," Aebersold said, "and the [algorithm] tests with certain error models that tells us that this particular peptide form or peptide forms are present."

Because the analysis is targeted, the method does not suffer from the massive combinatorial expansion of the search space that occurs with DDA methods, he noted.

In the Nature Biotechnology paper, the researchers used the approach to reanalyze DIA data collected in a longitudinal twin study looking at 116 subjects (58 pairs of twins) whose plasma was collected at two time points within a period of two to seven years. With the IPF algorithm, they investigated 10 PTMs (oxidation, deamidation, carbamylation, formylation, acetylation, methylation, carboxylation, ubiquitination, nitrosylation, and phosphorylation) to determine to what extent these modifications on different proteins were genetically controlled.

The researchers looked at modified peptides they detected in 20 or more of the twin samples, which accounted for 4,532 peptide forms. Looking at the biological variability of these modified peptides, they found that the major contributor to this variability was the longitudinal component, or variance between the two visits, which accounted for 15.3 percent of the variability of the peptide forms. Second was heritability, which accounted for 12 percent, with individual and common environmental effects contributing, respectively, 8.2 percent and 7.8 percent. The remaining variability, 56.7 percent, was due to unexplained effects.

The authors noted that this was roughly consistent with their original findings looking at just the protein-level data, which determined that the longitudinal component accounted for 13.5 percent of variability, with heritability, individual environment, common environment, and unexplained effects accounting for 13.6 percent, 11.6 percent, 10.8 percent, and 50.5 percent, respectively.
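As a quick arithmetic check on the two variance breakdowns reported above, the sketch below confirms that each set of components sums to 100 percent and lists the per-component difference between the peptide form level and the protein level. The percentages are those given in the article; the dictionary and function names are introduced here for illustration.

```python
import math

# Percent of variability by component, as reported in the article.
PEPTIDE_FORM_LEVEL = {
    "longitudinal": 15.3,
    "heritability": 12.0,
    "individual environment": 8.2,
    "common environment": 7.8,
    "unexplained": 56.7,
}
PROTEIN_LEVEL = {
    "longitudinal": 13.5,
    "heritability": 13.6,
    "individual environment": 11.6,
    "common environment": 10.8,
    "unexplained": 50.5,
}


def total(components):
    """Sum the percentage contributions of all components."""
    return math.fsum(components.values())


def differences(a, b):
    """Per-component difference a - b in percentage points, rounded to 0.1."""
    return {name: round(a[name] - b[name], 1) for name in a}


if __name__ == "__main__":
    print(round(total(PEPTIDE_FORM_LEVEL), 1), round(total(PROTEIN_LEVEL), 1))
    for name, delta in differences(PEPTIDE_FORM_LEVEL, PROTEIN_LEVEL).items():
        print(f"{name}: {delta:+.1f}")
```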

"We could go back to this old dataset, reanalyze it, and find new information in there," Aebersold said, adding that while they limited this analysis to 10 modifications, there was no reason this number couldn't be expanded.

The additional analysis required by the IPF approach "takes a bit of time and effort," Aebersold said, "but it's not a huge effort compared to the overall generation of the data." He said that his team now planned to go back and apply PTM analyses to a number of the datasets it had previously generated.

"Over the last few years we have generated quite a number of data sets, and they have been or are being analyzed at the level of protein abundance patterns," he said. "Now we want to systematically run this IPF algorithm over them."

"From the acquisition point of view, it is basically a freebie," he added. "We have to add on some labor and computer time to research these data with this new algorithm, but whatever we find can be found without doing any more lab work, and that is certainly a nice situation."

While the method was developed mainly on Sciex instruments, it is applicable to any DIA data, Aebersold said. He added that he expects to see a number of new methods for addressing PTM analysis in DIA data arise in the near future.

"There are a lot of discussions around how to generate libraries for modified peptides directly from DIA data and so on," he said. "There are a lot of additional ideas out there that are being discussed and will certainly be implemented to further advance PTM analysis."