Leaders of the HUPO Proteomics Standards Initiative are working to engage the wider proteomics world in their effort to create bioinformatics standards for the submission and storage of mass spec data, protein interaction data, and general proteomics experimental information.
“This is not a process that we can dare to go any further with on our own — we need the participation of the community,” Sandra Orchard, a scientist at the European Bioinformatics Institute at EMBL and one of PSI’s coordinators, said following a lecture during a session on the initiative at the HUPO Congress Oct. 9.
Currently, there is no common language among protein interaction databases like BIND and Hybrigenics, no widely accepted public repositories for mass spectrometry-based protein identifications, and no common way of describing proteomics experiments. This makes data retrieval and comparison for proteomics experiments difficult. Of course, as several attendees at the HUPO conference pointed out, there is also no set standard for performing the experiments themselves — but the PSI has chosen not to involve itself in this problem. “That was wiped off the table quite quickly as Mission Impossible,” PSI chair Rolf Apweiler, also of the EBI, said. “Biologists are individualists.”
Apweiler kicked off the HUPO session by announcing that the first version of the HUPO PSI Molecular Interaction format, an XML-based format for describing protein-protein interactions known as PSI-MI XML, had been submitted to a major journal and that the group was working through the reviewers’ comments. Henning Hermjakob, sequence database group coordinator at the EBI and leader of the protein interactions workgroup within the initiative, later told ProteoMonitor that he hoped the paper would be published in early 2004.
Publication in a journal may be all the more important given the emphasis that the PSI organizers are placing on the participation of journals in creating a standardized database. “Today data is more important than paper,” Apweiler said in his talk. Several journals now require scientists to submit microarray data to a public database in a standardized format before publication. Proteomics has no such requirement, which makes the data impossible to collect, Hermjakob later explained. “The current motto for proteomics data still seems to be ‘publish and vanish,’” he told ProteoMonitor. Hermjakob acknowledged that the effort to create such a requirement was meeting some resistance, but said that “overall the scientific community is well aware of the long-term benefits of such requirements.”
This awareness of long-term benefits in the midst of short-term sacrifices is what Weimin Zhu, leader of the mass spec PSI workgroup, is counting on as he continues to try to entice vendors into participating in the effort. Zhu has already enlisted vendors to help negotiate the conversion of proprietary mass spec input formats into a standard format, and he is currently working with Bruker Daltonics, Waters, and Ciphergen to create a draft XML-based schema and a standard peaklist output format. He is also working with Waters and Matrix Science to implement the standards and formats in their search engines, and he said Protagen, Thermo Finnigan, and Shimadzu have contributed to the project as well.
In addition to standardizing the input and output of data, PSI is also focusing on creating a standardized vocabulary and method for recording the wide range of information required to describe the different parts of a proteomics experiment. The need for standardized descriptions is clear: According to Orchard, there are, for example, over 20 different ways that the term “yeast-two-hybrid” has been entered in various databases.
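To illustrate the kind of problem Orchard describes, the following is a minimal Python sketch of how free-text method names might be mapped onto a single controlled term. It is not taken from PSI's work: the canonical term, the list of variants, and the function names `normalize` and `to_controlled_term` are all assumptions made up for this example.

```python
import re

# Hypothetical controlled vocabulary: canonical term -> free-text variants.
# These entries are illustrative only and are not from any PSI vocabulary.
CONTROLLED_TERMS = {
    "yeast two-hybrid": [
        "yeast two hybrid",
        "yeast 2 hybrid",
        "two hybrid",
        "2 hybrid",
        "y2h",
    ],
}


def normalize(text):
    """Lowercase the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Reverse lookup table: normalized variant -> canonical term.
LOOKUP = {
    normalize(variant): term
    for term, variants in CONTROLLED_TERMS.items()
    for variant in variants
}


def to_controlled_term(free_text):
    """Return the canonical term for a free-text entry, or None if it is unknown."""
    return LOOKUP.get(normalize(free_text))


if __name__ == "__main__":
    for entry in ["Yeast-Two-Hybrid", "yeast_2_hybrid", "Y2H", "mass spec"]:
        print(entry, "->", to_controlled_term(entry))
```

A lookup table like this only catches variants someone has already listed. Entries it does not recognize come back as `None` and would still need review by a curator.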
PSI is approaching this issue in several ways. Chris Taylor, another PSI leader based at EBI, said that organizers hope to eventually establish an official PSI-sponsored ontology — a controlled vocabulary used to describe the relationships among the different pieces of data in a database. This will only work, however, if the scientists doing the experiments are willing to take the time to enter their data in the ontological format. “We intend … to investigate the issues around getting experimentalists, who care little about the added value obtained from the use of such ontologies, to use them — this involves [making] an interface [that] is simple and quick enough that users won’t ignore it and just type things in as free text,” Taylor told ProteoMonitor in an e-mail.
Taylor has already taken several steps toward the development of a related standardization effort: MIAPE, or Minimum Information About a Proteomics Experiment. He co-authored a paper in March with Ruedi Aebersold, John Yates, and two dozen other scientists across the US and Europe describing a pilot project called the Proteomics Experiment Data Repository, or PEDRo. The project begins the work of developing a proteomics equivalent to the already implemented MIAME, or Minimum Information About a Microarray Experiment, which gene array experimenters are required to provide to many journals when describing their experiments.
The purpose is to help researchers reproduce one another’s experiments and to standardize the language used in the descriptions. Taylor said that PSI wants to collaborate with the microarray standards efforts to “afford us a common handle for the various ‘omics’ experiments” so that analyses of the transcriptome, proteome, and metabolome can eventually be coordinated. Currently, Taylor said in his talk at HUPO, the project is still in the proof-of-concept stage, but “we intend over the next year to get a standard to recommend to the community.” Until then, Taylor is seeking as much input from proteomics and microarray experts as he can get in order to optimize the model.
The next PSI meeting is scheduled for January 2004.