This story originally ran on Nov. 11.
Despite advances in instrumentation and software, identifying post-translational modifications continues to be a challenge in high-throughput, mass spec-based proteomics work, with incorrect site assignments and even incorrect sequence identifications making their way into journals and databases.
The problem, however, may be less a matter of technical capabilities and more one of sloppiness and inexperience on the part of some proteomics researchers interpreting mass spectral data.
That was the implication of a talk given by Katalin Medzihradszky at the American Society for Biochemistry and Molecular Biology's special symposium on post-translational modifications held last month in Tahoe City, Calif. During her talk, Medzihradszky presented several examples of incorrectly identified nitrotyrosine-containing peptides, sulfopeptides, and phosphopeptides that were published in respected peer-reviewed journals.
Although her critique wasn't meant to be a comprehensive assessment, the problem of misidentified PTMs in databases and scientific literature is probably "more widespread than we'd like to believe," Medzihradszky, an adjunct professor of pharmaceutical chemistry at the University of California, San Francisco, told ProteoMonitor. The issue, she suggested, stems from the nature of high-throughput, mass spec-based proteomics work, with its reliance on automated database searches for protein and PTM identifications.
"People acquire tons of data and then use a search engine or two and probability-based scoring and decoy databases [to make identifications and site assignments]," she said. "But the problem is that there are always things which happen that you can't predict biologically or chemically – an isoform or a new modification or something that you couldn't tell the computer to consider – so these things may lead to misidentifications."
This ambiguity requires that researchers bring their own expertise and judgment to interpreting results provided by the identification software – something that, Medzihradszky said, too many fail to do.
"A lot of people buy mass spectrometers, and then they just accept what the software spits out," she said. "They have to use their common sense. They have to figure out for themselves what are the limitations of the software, when you can trust the results. And a lot of people just don't take this step."
"It's really a misunderstanding of the complexity," David Chiang, CEO of proteomics bioinformatics firm Sage-N Research, told ProteoMonitor. "Most people understand that when you're doing an experiment, you're dealing with [raw] data. So they slice it and dice it and stare at it to see if they can trust it. They throw out the outliers, get the linear regression, that sort of thing."
"For some reason, the proteomics field expects this to be different," he said. "Especially for PTM analysis, where [researchers] expect to just push one big button and have the software give them the answer."
According to Broad Institute researcher Karl Clauser, the developer of the Protein Prospector and Spectrum Mill proteomics analysis packages, current software is generally capable of correctly identifying PTMs. The main difficulty now lies in developing software and standards for making site assignments.
"With most of the software that's currently used in high-throughput proteomics, you can be confident that you got the [modified] peptide right," he told ProteoMonitor. "But most of the widely used software doesn't specifically address whether you can correctly and confidently tell whether [the PTM is localized to] serine 1 or serine 2," for example.
One of the challenges facing the field in this regard, he said, is to develop standards for setting how aggressive or conservative software packages are in making PTM site assignments. Because peptides often don't fragment completely during mass spec analysis, spectral data may not provide enough evidence to make a site assignment with total confidence.
"The evidence of the spectrum is what it is. Then it comes down to how risk tolerant you are in making your decision. If you showed the same spectrum to 10 people, they wouldn't all necessarily reach agreement" as to whether or not you could make a site assignment and then as to where it would go if you could, he said.
Medzihradszky, though, said that despite current software capabilities, PTM identification – and not just site localization – remains an issue in proteomics research. Even scientists manually interpreting a small number of spectra can run into trouble if they lack mass spec experience or are too eager to believe exciting but questionable findings, she noted. The sheer volume of data produced by high-throughput studies compounds this problem.
She cited a recent paper she reviewed for the journal Molecular & Cellular Proteomics that came with 1,500 pages of supplementary material containing the raw spectral data behind the researchers' PTM identifications, as required by the journal's submission guidelines.
"Obviously one cannot check 1,500 pages really carefully, so what I do is I randomly go through the data and if something randomly catches my eye as obviously wrong, then I have a more careful look," she said. "And in this way with this dataset, I identified just randomly at least 20 spectra where I had to write them and say, 'Guys, I don't care what the software says; this doesn't look good.'"
At journals where researchers aren't required to submit raw data supporting PTM identifications, it's even "easier to get away with murder," Medzihradszky added. Researchers will sometimes select one spectrum that looks very reliable for inclusion in the main text of the article and then claim to have identified thousands more based on less solid data.
"If you look at how many times even the data that are included are questionable, it really makes you wonder how the rest looked," she said, "because people usually include in the main text the best spectrum."
As with most methods, researchers using mass spectrometry for PTM work can develop "tunnel vision when they want to find something," Medzihradszky noted. "[Scientists] want to find exciting things so much that they ignore the warning signs and don't use checks and balances. Unfortunately, a lot of times not the most exciting but the simplest answer is true, and that's not sexy."
Further contributing to the problem, she said, is the sense among some scientists that mass spec work can be done without significant expertise – a notion that she suggested has been driven in part by mass spec vendors.
"The instrument companies try to make it sound like you don't have to be an expert to run these instruments," she said, although she added that she wasn't "blaming the instrument companies," for researchers' lack of mass spec knowledge.
"[Mass spec] just isn't perfect, and a lot of people don't know it, and they don't have the expertise when they use these tools," she said.
Advances in instrumentation – like better mass accuracy and technologies to allow for more complete peptide fragmentation – will likely improve proteomics researchers' PTM identifications and site assignments, Medzihradszky said, and software packages for PTM analysis should also continue to improve. However, she observed, as her ASBMB presentation demonstrated, "people can still ignore everything."
"In that paper, the guy had pretty good instruments. He had very good data, but he ignored all the warning signs because it didn't fit his goal. And then, all the reviewers ignored the warning signs too," she said. "So, I don't know what you can do about that. There's no cure for stupidity."
Have topics you'd like to see covered in ProteoMonitor? Contact the editor at abonislawski [at] genomeweb [.] com.