The Human Proteome Organization took a step forward in its goal to create proteomic reporting standards with the publication last week of the first implementation module of its Proteomic Standards Initiative as well as a “parent” document outlining the effort.
The papers, “The minimum information required for reporting a molecular interaction experiment (MIMix)” and “The minimum information about a proteomics experiment (MIAPE),” appear in the August edition of Nature Biotechnology.
The MIAPE paper is a broad overview of how PSI is proceeding in creating its guidelines and the type of information researchers should deposit in a repository containing information about proteomics.
The MIMix paper, the first PSI module, describes specific information that scientists should provide when publishing molecular interaction data. Further modules dealing with specific aspects of proteomic experiments will be released in the future.
The publication of the papers is part of a large-scale effort by HUPO to improve proteomics experiments by creating reporting standards in a number of areas dealing with experimental design and methodology. The work is being carried out by HUPO’s PSI working group.
The first set of guidelines was sent to Nature Biotechnology for review and public comments last year.
The PSI is addressing one of the most pressing areas in proteomics research. While there is a growing consensus around the importance of standards, setting those standards and getting the research community to agree to them have so far proved vexing.
Indeed, public comments submitted to Nature Biotechnology suggested there remains deep skepticism about PSI’s work: Some writers said they saw no need for standards; others wanted to know what PSI’s efforts would allow them to do that they couldn’t already; and others said that creating standards would only create more work for laboratories [See PM 09/28/06].
The PSI sought to assuage such concerns in the MIAPE document published last week. MIAPE is an attempt to define the minimum set of information to be deposited in a repository containing information about proteomics experiments.
While some journals and funding agencies already have reporting guidelines, the authors of the report said MIAPE differs from some other standards in that it does not provide guidance on “appropriate experimental processes,” but rather “requires the provision of sufficient information to allow quality to be independently assessed.”
“It has always been a matter of policy that the PSI should neither attempt to produce standard operating procedures specifying how particular techniques should be performed nor attempt to establish quality assessment benchmarks.”
What’s in it for me?
According to the authors, both producers and consumers of data stand to benefit from the push to create reporting standards. For consumers of data, accepting and following MIAPE guidelines will result in greater ease of identification and retrieval of datasets generated by specific techniques; the ability to use the datasets for purposes other than originally intended; and easier retrieval of protocols associated with high-quality data.
On the other hand, PSI was mindful that researchers creating the data “are often under severe time, budget, and productivity constraints [and] must be assured that they, too, will reap direct benefits, not just the kudos of enhancing the publicly available corpus of biological data.”
For those researchers in the public sector, the benefits include the contribution of new protocols and best practice to others; elimination of the need to reconstruct sets of “appropriate contextualizing” information; and support for the assessment of results generated months or even years ago.
For producers of data in the private sector, MIAPE’s benefit is primarily one of efficiency — the ability to capture a “reduced set of metadata in a rigorous way [facilitating] more efficient retrieval, reanalysis, and integration of data.”
In creating its modules and determining what data and metadata to require, the authors said the PSI was guided by two principles — sufficiency and practicability.
On the first, PSI said that there should be enough information about a dataset and its experimental context that a reader can understand and evaluate the researcher’s interpretation and conclusions. On the second, the authors said, “Achieving compliance with MIAPE should not be so burdensome as to prohibit its widespread use.”
They acknowledged that wide-scale adoption of MIAPE guidelines can happen only when the proper tools are available.
“…much of the required data should be readily available in electronic form and therefore amenable to export, especially as vendors of instruments, analysis software, and LIMS implement standards-compliant export facilities,” they said.
While commercial vendors have developed such tools, the authors conceded that not every researcher or lab has the funds to acquire them. However, they said they believe free tools that make complete compliance with MIAPE possible will in time become available to the research community.
“Substantial tool development can be achieved by the public sector, as shown by projects such as [the Computational Proteomics Analysis System], and public funders look ever more favorably on projects that aim to develop appropriate tools to support data sharing,” the authors said.
In the meantime, PSI will provide specially designed Microsoft Excel spreadsheets, similar to the one being used for Proteome Harvest for the PRIDE database, to help researchers capture MIAPE-specified data and metadata.
Name and Identification of Your Molecule, Please
Along with the MIAPE document, PSI’s MIMix module was also published as a guide to scientists on “information to be supplied when describing experimental molecular interaction data in a journal article, displaying data on a website, or depositing data directly in a public database.”
MIMix, the authors said, “is not intended to allow an interaction experiment to be reproduced from a database record but to enable database users to quickly assess and focus on data relevant to them and then link to the source publications for the full experimental context.”
Among the kinds of experimental data PSI recommends for submission are the host system, the interaction detection method, and the participant identification method.
The authors also call for greater standards in molecule identifiers. “The single greatest source of data loss in transferring interaction data into a database is the use of ambiguous molecule identifiers,” according to the authors. “According to anecdotal estimates … as much as 70 percent of overall curation time is spent mapping molecule identifiers unambiguously to well characterized database entries.”
The PSI recommends that each molecule be identified by its accession number from a public database. The authors also suggest that a molecule’s role in the experiment be described in two ways: by its biological role, for example enzyme or enzyme target, and by its experimental role, for example bait or prey.
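To make the checklist concrete, the items named above could be captured in a simple record and checked for completeness before submission. The following Python sketch is purely illustrative: the class names, field names, and the missing_items helper are hypothetical and are not part of MIMix or any PSI software, and the sample values are placeholders rather than real experimental data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    # Field names are hypothetical; they mirror the items the article lists.
    accession: str          # accession number from a public database
    biological_role: str    # e.g. "enzyme" or "enzyme target"
    experimental_role: str  # e.g. "bait" or "prey"


@dataclass
class InteractionRecord:
    host_system: str
    interaction_detection_method: str
    participant_identification_method: str
    participants: List[Participant] = field(default_factory=list)


def missing_items(record: InteractionRecord) -> List[str]:
    """Return the names of checklist items that are blank or absent."""
    missing = []
    for name in ("host_system",
                 "interaction_detection_method",
                 "participant_identification_method"):
        if not getattr(record, name).strip():
            missing.append(name)
    if not record.participants:
        missing.append("participants")
    for i, p in enumerate(record.participants):
        for name in ("accession", "biological_role", "experimental_role"):
            if not getattr(p, name).strip():
                missing.append(f"participants[{i}].{name}")
    return missing


# Placeholder values for illustration only.
record = InteractionRecord(
    host_system="yeast",
    interaction_detection_method="two hybrid",
    participant_identification_method="sequencing",
    participants=[
        Participant("P12345", "enzyme", "bait"),
        Participant("Q67890", "enzyme target", ""),  # experimental role omitted
    ],
)
print(missing_items(record))  # ['participants[1].experimental_role']
```

A check of this kind only confirms that each item has been filled in. It says nothing about whether an accession number actually resolves to a well-characterized database entry, which is the ambiguity problem the authors single out.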
Finally, the authors recommend that all reported molecular interaction data be deposited in a public molecular interaction database before publication. The step has several advantages, they said. First, the databases will work more efficiently and have more direct access to the data producer if unclear issues need to be addressed.
Fellow researchers will also have more precise information in the databases, and journals and data producers will have access to consistently formatted database records, “which can be included in the supplementary material of a publication.” Additionally, they will have greater exposure for the publication “through cross references from the database records.”