During a National Institutes of Health workshop last week covering standards in proteomics, Steven Carr, a researcher at the Broad Institute of Harvard and MIT, outlined a series of guidelines for researchers and journals to follow when publishing papers about proteomics experiments.
Carr, an associate editor of the journal Molecular and Cellular Proteomics (MCP), developed the guidelines in collaboration with a group called the “Working Group on Publication Guidelines for Peptide and Protein Identification Data.” The guidelines were originally published in the April 8, 2004, issue of MCP.
The guidelines’ authors include Carr; Ruedi Aebersold of the Federal Technical University in Switzerland; Michael Baldwin and Al Burlingame of the University of California, San Francisco; Karl Clauser of Millennium Pharmaceuticals; and Alexey Nesvizhskii of the Institute for Systems Biology.
Carr said the MCP working group next intends to hold a meeting of journal editors, database providers, tool developers, and users from the international community to hammer out further publication criteria.
Implementing a proper set of guidelines should ease fears that journals are publishing incorrect interpretations of mass spectrometry data, said Carr.
“MCP took the lead in putting a working group together to tackle this issue, as standards were clearly needed,” Carr wrote in an email to ProteoMonitor. “There were no guidelines out there until we published ours, and no groups specifically focused on the issue of what constitutes acceptance criteria for proteomic data in the literature.”
The first guideline put forth by Carr and his group asks authors to cite the name and version of the sequence database searched, and the total number of entries it contained at the time of the search. It also asks authors to report the parameters used to create the peak list, as well as the scores and threshold values used to interpret the MS/MS data.
Authors should include “values specific to judging certainty of identification, whether any statistical analysis was applied to validate the results, and a description of how applied,” the paper states.
In their second guideline, the MCP group asks authors to report sequence coverage information, such as the total number of peptides assigned to each protein. Authors are encouraged to provide tables that list, for each protein, the sequences of all identified peptides.
“For example, if the same peptide is identified in both 2+ and 3+ charge forms, the number of interpreted spectra equals 2, but the count of identified peptides that count toward protein sequence measure is only 1,” the MCP group explained in their paper.
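To make the distinction between interpreted spectra and distinct peptides concrete, here is a minimal Python sketch. It assumes each interpreted spectrum is stored as a peptide-spectrum match carrying a spectrum ID, peptide sequence, charge state and protein accession; the `PSM` class, the `count_per_protein` function and the example identifiers are hypothetical illustrations, not part of the MCP guidelines or of any particular search tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One interpreted MS/MS spectrum (hypothetical record format)."""
    spectrum_id: str
    peptide: str   # identified peptide sequence
    charge: int    # precursor charge state
    protein: str   # accession of the protein the peptide is assigned to


def count_per_protein(psms):
    """Return {protein: (interpreted spectra, distinct peptide sequences)}."""
    spectra = {}
    peptides = {}
    for psm in psms:
        spectra.setdefault(psm.protein, set()).add(psm.spectrum_id)
        # Charge is ignored here, so the 2+ and 3+ forms of one peptide
        # collapse into a single distinct sequence.
        peptides.setdefault(psm.protein, set()).add(psm.peptide)
    return {p: (len(spectra[p]), len(peptides[p])) for p in spectra}


# Placeholder data: the same peptide seen once as 2+ and once as 3+.
psms = [
    PSM("scan_101", "PEPTIDEK", 2, "PROT_A"),
    PSM("scan_187", "PEPTIDEK", 3, "PROT_A"),
]
print(count_per_protein(psms))  # {'PROT_A': (2, 1)}
```

Counting distinct sequences rather than spectra is what keeps one peptide observed in two charge states from being credited twice toward a protein’s sequence coverage.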
In their third guideline, the MCP group states that protein assignments based on single peptides must meet a higher standard of presentation. Specifically, authors must show the sequence of the peptide used to make each such assignment, the precursor mass and charge, and the scores for this peptide.
“When we have a single peptide being used for protein identification, the score must be excellent, or we don’t accept it,” said Carr.
Guideline four states that in cases where biological conclusions are based on a single peptide matching to a protein, the identification must be supported by inclusion of the corresponding MS/MS spectrum, appropriately labeled.
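Guidelines three and four both hinge on knowing which proteins rest on a single peptide. A minimal Python sketch of that check follows, assuming identifications are available as (protein, peptide) pairs; the `single_peptide_proteins` function, the input format and the example names are hypothetical and not taken from the MCP paper.

```python
def single_peptide_proteins(assignments):
    """Return proteins supported by exactly one distinct peptide sequence.

    assignments: iterable of (protein, peptide) pairs (hypothetical format).
    Proteins returned here are the ones for which the guidelines call for
    the peptide sequence, precursor mass and charge, and scores, plus a
    labeled MS/MS spectrum if biological conclusions depend on them.
    """
    peptides = {}
    for protein, peptide in assignments:
        peptides.setdefault(protein, set()).add(peptide)
    return sorted(p for p, seqs in peptides.items() if len(seqs) == 1)


assignments = [
    ("PROT_A", "PEPTIDEK"),
    ("PROT_A", "SAMPLER"),
    ("PROT_B", "LONELYK"),
    ("PROT_B", "LONELYK"),  # same peptide seen twice, e.g. in two charge states
]
print(single_peptide_proteins(assignments))  # ['PROT_B']
```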
Guideline five states that for peptide mass fingerprint data, “in addition to listing the number of masses matched to each identified protein, authors should also state the number of masses not matched in the spectrum and the sequence coverage observed.”
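The three numbers guideline five asks for can be sketched in Python as follows. This assumes the protein’s in-silico digest is already available as (mass, start, end) tuples with 1-based inclusive residue positions, and it uses a fixed absolute mass tolerance; the `pmf_summary` function, the input format, the 0.5 Da tolerance and the example masses are all hypothetical choices for illustration, not values from the MCP paper.

```python
def pmf_summary(observed_masses, theoretical_peptides, protein_length, tol=0.5):
    """Summarize a peptide mass fingerprint match against one protein.

    observed_masses: peak masses (Da) from the MS spectrum.
    theoretical_peptides: (mass, start, end) tuples for the protein's digest,
        with 1-based inclusive residue positions (hypothetical format).
    protein_length: number of residues in the protein.
    tol: absolute mass tolerance in Da (an assumed value).
    Returns (masses matched, masses not matched, fractional sequence coverage).
    """
    matched, unmatched = 0, 0
    covered = set()  # residue positions explained by at least one matched peak
    for mass in observed_masses:
        hits = [(s, e) for m, s, e in theoretical_peptides if abs(m - mass) <= tol]
        if hits:
            matched += 1
            for s, e in hits:
                covered.update(range(s, e + 1))
        else:
            unmatched += 1
    return matched, unmatched, len(covered) / protein_length


# Made-up example: a 20-residue protein digested into three peptides.
digest = [(1000.5, 1, 8), (1500.7, 9, 15), (800.4, 16, 20)]
observed = [1000.6, 1500.5, 1234.0]
print(pmf_summary(observed, digest, 20))  # (2, 1, 0.75)
```

Reporting the unmatched count alongside the matched count shows readers how much of the spectrum the claimed protein actually explains.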
Guideline six states that when the same protein appears under different names and accession numbers, it is the authors’ responsibility to show that they are aware of this problem and have taken measures to eliminate the redundancy.
The final guideline encourages authors to provide access to raw data through a website.
“Creation of public repositories [for raw data] is an essential next step,” said Carr.
This point was debated during the NIH meeting, with some scientists saying that raw data takes up too much storage space and is rarely used in practice for re-analysis.
The journal has already adopted the guidelines proposed in the MCP paper on a de facto basis, said Carr.
“We are still working on the somewhat more difficult task of making sure that all MCP reviewers are aware of, understand and use the guidelines during the review process,” Carr told ProteoMonitor.
When asked whether MCP’s publication guidelines are similar to the “Minimum Information About a Proteomics Experiment” guidelines put forth by the Human Proteome Organization’s Proteomics Standards Initiative, Carr said that his group hopes to work with the PSI and other journals to develop guidelines.
Carr said that the MCP guidelines are still in draft form. One area that still needs work, he pointed out, is guidelines for quantitative data.