Using Mass-Tolerant Searching, Gygi Lab Finds Most Unassigned Peptides Due to Unexpected PTMs


NEW YORK (GenomeWeb) – Using mass-tolerant database searching, researchers at Harvard Medical School have managed to identify a large number of peptides left unassigned by conventional shotgun mass spec methods.

The effort, which was detailed in a study published this week in Nature Biotechnology, found that modified peptides make up a significant proportion of these unassigned peptides.

As the authors noted, current shotgun proteomic approaches can identify hundreds of thousands of peptides and more than 10,000 proteins per experiment. Yet most MS/MS spectra generated in such experiments are never confidently matched to a peptide, even though most of them are of sufficient quality to, in theory, yield a match.

One common explanation for the phenomenon has been that many of these unmatched spectra correspond to peptides with unexpected post-translational modifications. Conventional database searching methods compare each spectrum only against peptides whose masses match that spectrum within a very narrow window (for instance, 0.005 Daltons), which limits the search space and improves sensitivity.

Conventional searching methods can account for the mass shifts created by common and expected post-translational modifications like phosphorylation, acetylation, and ubiquitination, allowing databases to take peptides with such modifications into account when matching spectra. 

However, in addition to commonly studied PTMs, there is a wide variety of less common PTMs that traditional database searching does not account for, so peptides carrying these modifications would likely go unassigned.

To test this explanation, the Harvard researchers used mass-tolerant searching. Instead of searching spectra only against peptides with closely matching masses, they searched spectra against broad mass windows, which would, in theory, allow them to make matches even in the case of unexpected modifications.

This approach, noted Harvard's Steven Gygi, senior author on the paper, relies on the ability of newer mass spec instruments to collect very high resolution and very high mass accuracy MS/MS spectra. This, he noted, let the researchers search the precursor ions against a large mass window while searching the fragment ions generated by MS/MS against a conventionally narrow window, allowing for simultaneously wide-ranging searching and accurate matching.
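The difference between the two precursor windows can be illustrated with a minimal Python sketch. It is not Sequest's implementation: the function names `peptide_mass` and `candidates` and the three-peptide database are hypothetical, and the window is assumed to be applied as plus or minus the tolerance, which the article does not specify. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Daltons (standard values; subset for this example).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "D": 115.02694, "E": 129.04259, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056    # added once per peptide for the terminal H and OH
PHOSPHO = 79.96633  # mass added by one phosphorylation (HPO3)


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidates(precursor_mass, database, tolerance):
    """Return (peptide, delta) pairs whose mass is within +/- tolerance.

    Assumption: the window is symmetric around the precursor mass.
    """
    hits = []
    for seq in database:
        delta = precursor_mass - peptide_mass(seq)
        if abs(delta) <= tolerance:
            hits.append((seq, round(delta, 4)))
    return hits


# Hypothetical database and a precursor from singly phosphorylated PEPTIDE.
database = ["PEPTIDE", "GASP", "LIVERS"]
observed = peptide_mass("PEPTIDE") + PHOSPHO

closed_hits = candidates(observed, database, tolerance=0.005)  # "closed" search
open_hits = candidates(observed, database, tolerance=500.0)    # "open" search

print("closed:", closed_hits)
print("open:  ", open_hits)
```

In this toy case the closed search returns no candidates, because the unmodified peptide is roughly 80 Daltons lighter than the observed precursor. The open search returns both PEPTIDE and LIVERS, which is why the fragment ions still need to be matched within a narrow window to choose between the candidates.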

Mass-tolerant, or error-tolerant, searching is commonly used in top-down proteomics, where it is necessary for making identifications given the complexity of modifications and polymorphisms that can be present in intact proteins.

Gygi told GenomeWeb that his inspiration for applying the approach to bottom-up shotgun proteomics was curiosity as to why the Sequest search algorithm allowed large mass tolerances as an option.

"The idea just started as a whim to see why on earth [Sequest] would allow such massive tolerance, and I wanted to try to break it," he said.

Depending on the chosen search parameters, Sequest allows users to search with mass tolerance of up to around 2,000 Daltons, Gygi said. He and his team chose to do their experiment using mass windows of 500 Daltons after determining that almost all known PTMs fell into that range.

Using Thermo Fisher Scientific Q Exactive and Orbitrap Elite instruments, the researchers separated HEK293 cell digests into 24 fractions and analyzed them using three-hour gradients. They searched the generated spectra first using a conventional "closed" search, in which they considered peptides with masses within 0.005 Daltons of the spectra being matched. In this analysis they identified a total of 396,736 peptides from 9,513 proteins.

They followed this with a mass-tolerant "open" search, in which they searched precursor ion spectra against 500 Dalton windows. This search identified 510,139 peptides from 9,178 proteins, including 325,157 peptides identified by the conventional search and 184,982 modified peptides not identified by the conventional search.
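The relationship between these counts can be checked with a few lines of Python. This is a sketch that assumes, as the wording implies, that the 325,157 shared peptides are all among the 396,736 found by the closed search; the variable names are illustrative only.

```python
# Peptide counts as reported in the article.
closed_peptides = 396_736  # conventional "closed" search
open_peptides = 510_139    # mass-tolerant "open" search
shared = 325_157           # found by both searches
open_only = 184_982        # modified peptides found only by the open search

# The two open-search subsets add up to the open-search total.
assert shared + open_only == open_peptides

# Assumption: every shared peptide belongs to the closed-search set.
closed_only = closed_peptides - shared
recovered = shared / closed_peptides
new_fraction = open_only / open_peptides

print(f"closed-search peptides missed by the open search: {closed_only:,}")
print(f"share of closed-search peptides recovered: {recovered:.1%}")
print(f"share of open-search peptides that were new: {new_fraction:.1%}")
```

Under that assumption, the open search recovered roughly four-fifths of the closed-search peptides, and a little over a third of its identifications were peptides the closed search had not found.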

The open search's increased breadth did come at the cost of sensitivity, Gygi noted, observing that compared to closed searches designed to take into account common modifications, the open search identified on average 50 percent fewer modified peptides.

"This means we only identified about half of what was in there," he said. "Most likely any targeted approach would work better at identifying the modified peptides than this unbiased one would."

The open search, however, allowed the researchers to detect an array of modifications that went unidentified in the conventional searches, including carbamylation, deamidation, formylation, pyroglutamate, and aminoethylbenzenesulfonylation, as well as metal ion adducts including iron, sodium, and calcium. They also identified peptides with multiple modifications, including 86 peptides with multiple phosphorylations, as well as mono-, di-, and tri-methylated forms of histone H3.
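An open search reports each match together with the mass difference between the observed precursor and the unmodified peptide, and that difference can then be compared against known modification masses. The sketch below shows one way this lookup might work. The table is a small illustrative subset of standard monoisotopic shifts, and the `annotate` function and its 0.005 Dalton tolerance are assumptions, not the authors' procedure.

```python
# Monoisotopic mass shifts in Daltons for a few modifications (illustrative subset).
KNOWN_SHIFTS = {
    "deamidation": 0.98402,
    "methylation": 14.01565,
    "sodium adduct": 21.98194,
    "formylation": 27.99491,
    "dimethylation": 28.03130,
    "trimethylation": 42.04695,
    "carbamylation": 43.00581,
    "phosphorylation": 79.96633,
}


def annotate(delta, tolerance=0.005):
    """Return names of known shifts within +/- tolerance of a mass difference.

    The default tolerance is a hypothetical choice for this example.
    """
    return sorted(
        name for name, shift in KNOWN_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    )


# Hypothetical mass differences from an open search.
for delta in (43.0055, 27.9950, 28.0312, 123.4560):
    print(f"{delta:9.4f} -> {annotate(delta) or 'unassigned'}")
```

Formylation and dimethylation differ by less than 0.04 Daltons, so telling them apart depends on the kind of high mass accuracy the article describes. A difference that matches nothing in the table, like the last value, would remain unassigned in this sketch.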

The majority of the detected modifications were not biologically relevant, Gygi said, adding, however, that they are pervasive. "For example, iodine and iron modifications might not make a big difference [functionally], but they are always there."

The search did detect certain rare and biologically important modifications such as glutamylation of nucleophosmin (NPM1), which, the authors noted, is the first identification of glutamylated NPM1. They also identified previously unreported mutations due to both amino acid substitutions and insertions.

Gygi characterized the study's major finding as establishing that the majority of unassigned spectra do indeed stem from peptides with unexpected modifications. This finding, he noted, has implications for the limitations of approaches like fractionation in improving proteomic coverage.

"Given that as much as half of all acquired spectra may be derived from modified peptides, more fractionation or longer gradients will not be very productive in shotgun proteomics," he said. "These data suggest that every time you go deeper, another layer of confounding modified peptides could be revealed."

This, Gygi said, suggested that approaches combining shotgun and targeted assays might be the best way to move deep into the proteome.

"Combined approaches might use shotgun proteomics to identify as many peptides as possible and then focused experiments to quantify the ones that matter," he said.