The Human Proteome Organization's Proteomic Standards Initiative has released a new version of its Molecular Interaction format for the representation of protein interaction data.
PSI-MI 2.5, released following a meeting last week of the PSI in Geneva, includes a curation manual that tells users how to put data into the format. MI 2.5 also allows more information to be stored in the format. Instead of storing only protein-protein interaction data, MI 2.5 can now also store protein-DNA interactions.
"The biggest improvement for 2.5 is that there's now a curation manual," said Chris Hogue, the principal investigator at Blueprint Initiative, which maintains the Biomolecular Interaction Network Database, or BIND. "You can define a schema, but if you don't write a corresponding book for how to put in data, it's a recipe for disaster. When 1.0 was implemented, there was no rulebook to go with it, so the 1.0 data had to be dealt with on a case-by-case basis and repaired."
Aside from implementing a new MI format, the PSI is also developing a controlled vocabulary for experimental methods.
"A lot of problems originate from the fact that people are using different versions of names, for example yeast-two-hybrid, or Y2H," said Lukasz Salwinski, an assistant researcher at the University of California Los Angeles-Department of Energy Institute for Genomics and Proteomics. "The controlled vocabulary deals with the different versions of names for the same technology."
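As a rough illustration of the problem Salwinski describes, the sketch below maps several spellings of one method name onto a single term. The synonym table, the canonical label "two hybrid," and the function name are hypothetical and are not drawn from the PSI's actual controlled vocabulary.

```python
# Hypothetical synonym table: the spellings and the canonical label are
# illustrative only, not entries from the actual PSI controlled vocabulary.
SYNONYMS = {
    "yeast-two-hybrid": "two hybrid",
    "yeast two hybrid": "two hybrid",
    "y2h": "two hybrid",
}


def normalize_method(name: str) -> str:
    """Return the canonical method name, or the cleaned-up input if unknown."""
    # Lowercase and collapse runs of whitespace before looking up the term.
    key = " ".join(name.strip().lower().split())
    return SYNONYMS.get(key, key)


# Three spellings, as they might appear in records from different databases.
records = ["Y2H", "yeast-two-hybrid", "Yeast  Two Hybrid"]
canonical = {normalize_method(r) for r in records}
```

With a shared vocabulary of this kind, the three spellings collapse to one term, so records from different databases can be compared by method.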
The MI format was one of the first standard formats implemented by the PSI to allow for easier data sharing. Following the release of MI 1.0 in 2003, various software programs were developed that could read and import data from all interaction databases that supported the standard format.
"Instead of writing five different programs, each of which is going to work with only one database, you can write just one program," said Salwinski. "For example, with Hybrigenics' PIMWalker, you can import data not only from Hybrigenics, but also from MIPS, or BIND, or IntAct."
According to Alain Meil, the PIMWalker and PIMRider platform manager at Hybrigenics, PIMWalker has been downloaded 552 times since the protein interaction visualization tool was released in October 2003.
PIMWalker is freely accessible via the Hybrigenics website, while PIMRider is a more sophisticated tool that is usually not free unless the user has paid for Hybrigenics services, such as screening or assay development, Meil said.
Other protein interaction visualization tools that have been developed using the MI format include ProViz and MINT Viewer.
Though some software tool developers view standardized data formats as an asset, Hogue said that because the formats are not lightweight, they may hinder applications, such as protein interaction visualization software, by bogging them down.
"When you're saving data into an application, you want it to be lightweight. The simple rule is if your race car is lighter, it'll go faster," said Hogue. "With applications, you'll find that the most efficient applications don't use the PSI format. Your interaction network illustration will respond faster if the data is more lightweight."
Hogue pointed out that the interaction viewer that BIND uses, Cytoscape, has its own format, which is more lightweight than the PSI format.
"The PSI format is not a panacea," said Hogue. "It has its job to do, which is to reconcile all the databases so that they have the same data. But I don't agree that the PSI format is the right way to deliver data to any of those interaction applications."
Having said that, Hogue added that the updated MI format will be very useful now that the executive teams of five major interaction databases have signed an agreement to share curation efforts and exchange completed records through a mechanism known as the International Molecular Exchange, or IMEx, consortium. Members of the IMEx consortium have agreed to curate papers only once in order to save resources and reduce redundancy.
"Before, papers were read by five different people and data was entered in five different ways. Now it's going to be entered only once, and distributed between all the participating members," Salwinski said. He added that the IMEx consortium would not be possible without having first developed a standard molecular interaction format.
Though Hogue said he was pleased with the development of the IMEx consortium, he pointed out that Blueprint's funding problems could render all the developments moot, as far as BIND is concerned (see
"We don't have 70 people anymore. We have 12 people in Singapore working their hearts out, and they might not be able to continue that," he said. "I think if you look at all the IMEx partners and how much data they've entered into their database within the last year, you'll be shocked at some of the answers."
MI 2.5 holds great promise, Hogue said, but there is no guarantee that members of the IMEx consortium will continue to have the resources to deal with it.
"If there's no funding from granting agencies, there's not going to be any databases," said Hogue. "The new schema and curation manual, it all holds great promise, but without funding it's moot."
Tien-Shun Lee