To push proteomics standards further along, HUPO’s Proteomics Standards Initiative (PSI) held its second meeting last month at the European Bioinformatics Institute in Hinxton near Cambridge, UK, attracting 58 participants.
Progress was made in both protein interaction and mass spectrometry standards, PSI’s initial focus, according to Henning Hermjakob, sequence database group coordinator at the EBI and one of the meeting organizers.
Moreover, “as a response to community requests,” PSI has now expanded its scope to standards for all types of proteomics data, he said, including technologies such as liquid chromatography or ICAT. “We want to develop an overall proteomics format with shared components and specific modules for the specific experimental technologies,” Hermjakob said, acknowledging that this will be a complex and long-term project.
The aim is to develop standards that would emerge along with the fledgling HUPO initiatives, which will soon start to churn out data. “We hope to start with a simplified standard…relatively soon,” Hermjakob said. “We want to be ready when the data is ready.” Rather than starting from scratch, he said, PSI works together with existing projects, for example PEDRo.
The XML model for the exchange of protein interaction data that came out of the first conference in October was revised during the recent meeting, partly to make it compatible with BioPAX, an emerging standard for protein pathway data. According to Hermjakob, PSI is planning to publish the XML schema and controlled vocabularies by April in a scientific journal and propose them as a standard.
Most importantly, a number of protein interaction databases are already on board: Databases and companies that were represented at the meeting said they intend to adopt the standard by the end of this year. These include BIND, DIP, MINT, IntAct, and Hybrigenics. Missing from the list are a number of commercial providers as well as the MIPS database and Cellzome, said Hermjakob, but he believes that at least some of them might join later.
Once the standard is published, PSI also intends to approach journals and ask them to recommend submission of protein interaction datasets in the new format.
One focus of PSI’s mass spectrometry working group has been to find a way to exchange peak lists between mass spectrometers from different vendors easily, according to Weimin Zhu, head of database applications at the EBI and co-organizer of the PSI meeting. During a meeting directly preceding the PSI conference, three mass spec vendors — Waters, Ciphergen Biosystems, and Bruker Daltonics — said they would be willing to establish and support a standard representation for proteomics-related mass spec data, he said.
PSI is now collaborating with ASTM International, a standards organization that publishes the current netCDF-based mass spectrometry standard. Randall Julian, a senior research scientist at Eli Lilly and chair of an ASTM group developing analytical data exchange protocols, is aiming to establish a consensus among mass spec vendors on an XML protocol for proteomics data interchange.
According to Zhu, Julian is working with vendors on a draft for an XML-based standard representation of annotated peak lists and plans to present it at the June meeting of the American Society for Mass Spectrometry. “This generic protocol will be instrument-independent,” said Zhu. “People in the lab can take this output from the instrument and work with the files right away. Since it is in XML format, it will also be easy to write a program to parse the data, to extract it, and to put it into a database.”
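To illustrate Zhu's point about how little code an XML peak list would need, here is a minimal Python sketch that parses a peak list and loads it into a database. The draft standard had not been presented at the time of the meeting, so the element and attribute names (`peakList`, `spectrum`, `peak`, `mz`, `intensity`), the sample values, and the table layout are all hypothetical and are not taken from any PSI or ASTM document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical peak-list document; the real schema was still being drafted.
SAMPLE_XML = """<peakList instrument="hypothetical-ms">
  <spectrum id="s1">
    <peak mz="445.12" intensity="1200.5"/>
    <peak mz="446.12" intensity="310.0"/>
  </spectrum>
  <spectrum id="s2">
    <peak mz="512.30" intensity="98.7"/>
  </spectrum>
</peakList>"""


def parse_peaks(xml_text):
    """Return a list of (spectrum_id, mz, intensity) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for spectrum in root.findall("spectrum"):
        spectrum_id = spectrum.get("id")
        for peak in spectrum.findall("peak"):
            rows.append(
                (spectrum_id, float(peak.get("mz")), float(peak.get("intensity")))
            )
    return rows


def load_peaks(conn, rows):
    """Store parsed peaks in a simple (assumed) single-table layout."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS peaks "
        "(spectrum_id TEXT, mz REAL, intensity REAL)"
    )
    conn.executemany("INSERT INTO peaks VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    load_peaks(conn, parse_peaks(SAMPLE_XML))
    query = (
        "SELECT spectrum_id, COUNT(*) FROM peaks "
        "GROUP BY spectrum_id ORDER BY spectrum_id"
    )
    for spectrum_id, n_peaks in conn.execute(query):
        print(spectrum_id, n_peaks)
```

Because the parser depends only on the document structure and not on which instrument produced it, the same few lines would in principle work for output from any vendor that adopts a shared format.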
In addition to a standard for the output from mass spec instruments, PSI also aims to develop a generic input file format and a standardized output file format for various database search engines. At the moment, Zhu said, most search engines accept only certain data formats and produce results in HTML, which cannot be easily parsed, extracted, and put into a data repository. Two software vendors that attended the meeting, Waters and Matrix Science, have already agreed to be actively involved in the effort, Zhu said. However, more vendors need to come on board, he added.
Finally, the mass spec group is also working on guidelines — similar to the MIAME guidelines for microarray experiments — for submitting mass spec data to a future data repository. “The data should meet minimum requirements for publication, for data comparison and exchange, and for repeating the experiment,” Zhu said. Two sub-groups are currently working on data models — one focusing on mass spec data, the other one on sample preparation or other data that precedes mass spectrometry — which should be ready by the next HUPO congress in Montreal in October.