This week, the Human Proteome Organization's Proteomics Standards Initiative released dataXML 0.9 — the first publicly available version of the new mass-spectrometry data format.
PSI has been working on the new standard since last June, when it first released a roadmap for merging two existing MS formats, mzData from HUPO-PSI and mzXML from the Institute for Systems Biology (ISB), in an effort to alleviate confusion in the proteomics community over which standard to adopt [BioInform 06-09-06].
Pierre-Alain Binz, a researcher at the Swiss Institute of Bioinformatics proteome informatics group who worked on merging the two standards, told BioInform via e-mail that dataXML 0.9 “is the first version made available outside of the developers who actually did the merge of mzData and mzXML.”
PSI is seeking feedback on dataXML 0.9, available here, in order to finalize version 1.0 of the standard around the time of its spring workshop in late April.
Eric Deutsch, a senior database designer at ISB and co-chair of the PSI MS group, noted in an e-mail that dataXML 0.9 “is based on what we thought were the best ideas from both mzData and mzXML,” and added that the resulting format “is not a large departure from the previous two formats.”
Deutsch said that the PSI developers would like the proteomics community “to point out any issues they may have with the way we have structured the data model or if we have forgotten any use cases that should be considered before it is released.”
The biggest challenge in merging the existing formats, Deutsch said, was “a design philosophy difference between the two that needed to be resolved.” He explained that mzData relied heavily on ontology terms, which provided flexibility for encoding new pieces of information without changing the schema, “but on the other hand created a risk of slightly different dialects of mzData.”
The mzXML format, however, “employed a more strict schema, which allowed little ambiguity in how information should be encoded, but required a new version of the schema when new types of information needed to be accommodated.”
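The trade-off Deutsch describes can be illustrated with a small sketch. The XML fragments, element names, attribute names, and accession codes below are hypothetical and are not taken from the mzData, mzXML, or dataXML schemas; they only mimic the two design styles. In the ontology-style fragment, a new piece of information arrives as one more generic parameter, so the reader needs no change. In the strict-style fragment, a field the schema does not list is rejected until the schema is revised.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments for illustration only; these names are assumptions,
# not copied from the mzData, mzXML, or dataXML schemas.
ONTOLOGY_STYLE = """
<spectrum id="1">
  <param accession="EX:0001" name="polarity" value="positive"/>
  <param accession="EX:0002" name="collision energy" value="35"/>
</spectrum>
""".strip()

STRICT_STYLE = '<scan num="1" polarity="positive" collisionEnergy="35"/>'

# Attributes the hypothetical strict schema allows on <scan>.
STRICT_ALLOWED = {"num", "polarity", "collisionEnergy"}


def read_ontology_style(xml_text):
    """Collect every generic <param> as a name/value pair.

    Any term is accepted, which is flexible but lets two writers
    describe the same thing with different terms ("dialects").
    """
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.findall("param")}


def read_strict_style(xml_text):
    """Read fixed attributes and reject anything the schema does not list."""
    root = ET.fromstring(xml_text)
    unknown = set(root.attrib) - STRICT_ALLOWED
    if unknown:
        raise ValueError(f"attributes not in schema: {sorted(unknown)}")
    return dict(root.attrib)


if __name__ == "__main__":
    print(read_ontology_style(ONTOLOGY_STYLE))
    print(read_strict_style(STRICT_STYLE))
```

Under these assumptions, adding an extra `<param>` to the first fragment is read without any code or schema change, while adding an unlisted attribute to the second raises an error until `STRICT_ALLOWED` (standing in for the schema) is updated.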
Binz noted that the developers have a laundry list of tasks to accomplish prior to launching dataXML 1.0, including the creation of the specification documentation and use case documentation and the development of “at least one” dataXML writer and reader program.
He added that PSI expects solid support for dataXML from industry. “The vast majority” of proteomics software and instrumentation vendors have been present at the past two PSI meetings, he said. “They have all committed that they will support the unified standard released by the PSI.”
Information on the PSI spring meeting, to be held in Lyon, France, April 23-25, is available here.