Researchers in Pavel Pevzner’s lab at the University of California, San Diego, have developed a new approach for mass spectrometry-based protein identification that overcomes certain limitations of commonly used software packages like Mascot, Sequest, and X!Tandem.
The method, called spectral networks analysis, eliminates a key element of these algorithms, which match spectra against protein databases in order to identify peptides in a sample.
The UCSD approach does away with the database-searching step altogether by relying on the concept of spectral pairs: sets of spectra that occur naturally in most mass spec experiments as the result of overlapping peptides or from modified and unmodified versions of the same peptide.
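The two kinds of pairs described above leave different fingerprints in the precursor (whole-peptide) masses. An overlapping peptide that is one residue longer differs by an amino acid residue mass, and a modified peptide differs from its unmodified form by a modification mass. The toy Python sketch below labels a candidate pair from that mass gap alone. The function name, the 0.02 Da tolerance and the short modification list are illustrative assumptions. Real spectral-networks software also compares fragment peaks rather than relying on the precursor difference, and this sketch only checks single-residue or single-modification gaps.

```python
# Hypothetical sketch: explain the precursor mass gap between two spectra.
# Only single-residue or single-modification gaps are checked here.

# Standard monoisotopic residue masses (Da); leucine and isoleucine are isobaric.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# A few common post-translational modifications (monoisotopic shifts, Da).
MODIFICATION_MASSES = {
    "oxidation": 15.99491,
    "methylation": 14.01565,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def classify_pair(mass_a: float, mass_b: float, tol: float = 0.02) -> list[str]:
    """Return every single-step explanation for the gap between two precursors."""
    delta = abs(mass_b - mass_a)
    labels = []
    for residue, mass in RESIDUE_MASSES.items():
        if abs(delta - mass) <= tol:
            labels.append(f"extra residue {residue} (prefix/suffix pair)")
    for mod, mass in MODIFICATION_MASSES.items():
        if abs(delta - mass) <= tol:
            labels.append(f"{mod} (modified/unmodified pair)")
    return labels


print(classify_pair(1000.0, 1079.96633))  # gap of one phosphate group
print(classify_pair(1000.0, 1057.02146))  # gap of one glycine residue
```

A gap that matches nothing in either table returns an empty list. This is one reason a real tool needs the fragment peaks as well as the precursor masses.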
“Traditional approaches like Sequest and Mascot take a spectrum and compare it against a theoretical spectrum that you get from a database of known protein sequences,” said Nuno Bandeira, a PhD student at UCSD's department of computer science and engineering and lead author on a paper describing the method.
The paper, which appears in the online version of the Proceedings of the National Academy of Sciences this week, shows “that if you have a spectrum from the peptide that you want to compare it against, that you can actually match the spectra and find the identification just by matching spectra as opposed to needing to know the protein sequence,” Bandeira said.
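To make "matching spectra against spectra" concrete, here is a toy Python sketch. It assumes each spectrum has been reduced to a list of fragment m/z values and that the second peptide differs from the first by a single mass shift `delta`, typically the difference of the two precursor masses. Fragments that do not contain the shifted residue line up directly, and fragments that do contain it line up after adding `delta`. The function name, the example peaks and the 0.5 Da tolerance are hypothetical, and this simple count is not the scoring used in the PNAS paper.

```python
def shared_peaks(peaks_a: list[float], peaks_b: list[float],
                 delta: float, tol: float = 0.5) -> int:
    """Count peaks in A that match a peak in B either unshifted or shifted by delta.

    Toy model: fragments that do not contain the modified residue keep their
    m/z, while fragments that do contain it move by delta.
    """
    count = 0
    for mz in peaks_a:
        if any(abs(mz - p) <= tol or abs(mz + delta - p) <= tol for p in peaks_b):
            count += 1
    return count


# Hypothetical b-ion ladders: B carries an extra 80 Da on its third residue,
# so its third and fourth b-ions are shifted by 80 Da.
unmodified = [100.0, 200.0, 300.0, 400.0]
modified = [100.0, 200.0, 380.0, 480.0]

print(shared_peaks(unmodified, modified, delta=0.0))   # direct matching only
print(shared_peaks(unmodified, modified, delta=80.0))  # allow one mass shift
```

With direct matching only, 2 of the 4 peaks are explained. Allowing the 80 Da shift explains all 4. This illustrates why a modified/unmodified pair can be recognized even though the two spectra look only half alike.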
Pevzner told BioInform that the method addresses an important limitation for current protein identification approaches: “Not everything is in a database,” he said, citing antibodies and snake venom proteins among numerous sets of proteins of interest to drug development that lack a comprehensive protein sequence database.
“If there is no database, then this is the only technique available,” he said.
The network-based approach should also be of use in identifying post-translational modifications, Pevzner said, because “you’re not exploring a huge search space.”
Post-translational modifications pose a particular challenge for traditional protein-identification algorithms because of the combinatorial explosion that results when all potential modifications are taken into account. The UCSD approach, which uses the correlations between spectral pairs in a spectral network to quickly identify modifications, can save computational time while increasing the confidence in predicted modifications, according to Pevzner.
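A back-of-the-envelope count shows where the combinatorial explosion comes from. In this hypothetical model, each of n modifiable sites in a candidate peptide is either unmodified or carries one of m modification types. A search that tries every combination must therefore score (m + 1)^n variants per peptide. Capping the number of simultaneous modifications at k reduces this to the sum over i = 0..k of C(n, i) * m^i, which still grows quickly. The site and modification counts below are illustrative assumptions, not figures from the paper.

```python
from math import comb


def count_variants(n_sites: int, n_mod_types: int) -> int:
    """Every site is either unmodified or carries one of n_mod_types modifications."""
    return (n_mod_types + 1) ** n_sites


def count_variants_capped(n_sites: int, n_mod_types: int, max_mods: int) -> int:
    """Same model, but at most max_mods sites may be modified at the same time."""
    return sum(comb(n_sites, i) * n_mod_types ** i for i in range(max_mods + 1))


print(count_variants(10, 3))            # all combinations
print(count_variants_capped(10, 3, 2))  # at most two modifications per peptide
```

With 10 sites and 3 modification types, the unrestricted count is 4^10 = 1,048,576 variants of a single peptide. Allowing at most two modifications leaves 436, and this cost is multiplied over every peptide in the database.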
In the PNAS paper, the UCSD researchers compared their method against InsPecT, an algorithm that had previously been shown to be two orders of magnitude faster than Sequest because it filters the protein database in order to reduce the search time.
The researchers reported that the spectral networks analysis method took nine minutes to process 11,760 spectra on a Pentium 4-based PC, compared to InsPecT’s run time of 55 minutes using a “moderately sized” database of 13,749 human proteins.
InsPecT identified 515 unmodified peptides in the sample, of which 413 had a prefix, suffix, or modified variant and would therefore be “amenable to pairing.” The UCSD method identified 386 of those 413 peptides, according to the PNAS paper.
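The reported figures can be turned into ratios with a few lines of Python. They work out to roughly a sixfold speedup (55 versus 9 minutes). The UCSD method recovered about 93.5% of the 413 pairable peptides, or about 75% of all 515 unmodified identifications made by InsPecT. These recovery rates are measured against InsPecT's identifications as reported here, not against an independent ground truth.

```python
# Figures as reported for the PNAS comparison.
n_spectra = 11_760
network_minutes = 9
inspect_minutes = 55
inspect_unmodified = 515
amenable_to_pairing = 413
network_identified = 386

speedup = inspect_minutes / network_minutes
throughput_network = n_spectra / (network_minutes * 60)   # spectra per second
throughput_inspect = n_spectra / (inspect_minutes * 60)   # spectra per second
recall_pairable = network_identified / amenable_to_pairing
recall_overall = network_identified / inspect_unmodified

print(f"speedup: {speedup:.1f}x")
print(f"spectra/s: {throughput_network:.1f} vs {throughput_inspect:.1f}")
print(f"pairable peptides recovered: {recall_pairable:.1%}")
print(f"all InsPecT peptides recovered: {recall_overall:.1%}")
```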
The difference in identifications highlights one drawback of the approach, which is that not all peptides have spectral pairs. “If we don’t have a pair, this doesn’t apply,” Bandeira acknowledged, noting that the method should serve as an effective complement to database-searching methods, but not a replacement for them.
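The limitation Bandeira describes can be pictured as a graph problem. The sketch below assumes pairs have already been detected by some upstream test. Each spectrum is a node and each spectral pair is an edge, and the connected components play the role of spectral networks. Here the components are found with a union-find structure. Any spectrum left in a component of size one has no partner and cannot be interpreted this way. The function names and the toy input are hypothetical, and this is not the UCSD software.

```python
def spectral_networks(n_spectra: int, pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Group spectra (numbered 0..n_spectra-1) into connected components of the pair graph."""
    parent = list(range(n_spectra))

    def find(x: int) -> int:
        # Walk up to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups: dict[int, list[int]] = {}
    for s in range(n_spectra):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


def unpaired(networks: list[list[int]]) -> list[int]:
    """Spectra with no partner: the method has nothing to compare them against."""
    return [g[0] for g in networks if len(g) == 1]


nets = spectral_networks(6, [(0, 1), (1, 2), (3, 4)])
print(nets)
print(unpaired(nets))
```

In this toy input, spectrum 5 ends up alone, which corresponds to the "if we don't have a pair, this doesn't apply" case. In practice such spectra would be handed to a database search.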
Pevzner said that his group is further developing the method for de novo protein sequencing and that the UCSD team has submitted another paper for publication that discusses the use of spectral networks analysis for this application.
Currently, he said, his lab is working with Genentech on using the approach to sequence antibodies. “Alternative techniques would take one year to sequence a single antibody,” he said. The researchers are also building genome-wide spectral networks for several bacterial genomes, Pevzner said.
The UCSD team is still working on a more user-friendly version of the spectral networks analysis software that they plan to release as an open source package in the next few weeks, Bandeira said. The software will be available through the UCSD bioinformatics lab’s website.
Longer-term goals include providing a web-based tool that will allow users to submit their spectra for analysis, “but that would be subject to the availability of computers and other resources,” Bandeira said.