SEATTLE, June 1 (GenomeWeb News) - The Human Proteome Organization's Proteomics Standards Initiative announced this week that it will combine the current HUPO-PSI format, mzData, with the mzXML format developed by the Institute for Systems Biology.
The new, combined format will be called "dataXML." PSI officials said they expect the dataXML project to be mostly completed by the end of the year.
"This is a major undertaking for the proteomics informatics community and represents widespread agreement on the need to improve data interchange," said PSI officials, who met here this week at the American Society for Mass Spectrometry conference.
The new format will incorporate features from both mzData and mzXML, including an interchange schema whose split data vectors are compatible with other analytical interchange formats. It will also support both random-access indexes and digital signatures via a wrapper schema.
The new format will also include tools to support developers and users, including a canonicalization program to format legal XML documents before binary indexes or signatures are computed; a validation program to ensure that the use of controlled-vocabulary terms matches MIAPE requirements; an application programming interface (API) with language bindings for popular programming languages; and abstract data models and other documentation to help software developers who want to implement systems based on the interchange format.
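To illustrate why canonicalization would come before signatures are computed, here is a minimal sketch in Python. It is not the PSI tool: the element and attribute names are hypothetical, and the choice of W3C Canonical XML 2.0 (via the standard library) and SHA-256 is an assumption made only for the example. It shows that two documents with the same content but different attribute order and quoting hash differently as raw text, yet identically once canonicalized.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def raw_digest(xml_text: str) -> str:
    """SHA-256 of the document bytes exactly as written."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def canonical_digest(xml_text: str) -> str:
    """SHA-256 of the document after C14N 2.0 canonicalization.

    Canonicalization normalizes attribute order and quoting, so
    equivalent documents serialize to the same byte sequence.
    """
    canonical = canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical spectrum records: same content, different serialization.
doc_a = '<spectrum id="1" msLevel="2"><peaks>AAAA</peaks></spectrum>'
doc_b = "<spectrum  msLevel='2'  id='1'><peaks>AAAA</peaks></spectrum>"

if __name__ == "__main__":
    print("raw digests match:      ", raw_digest(doc_a) == raw_digest(doc_b))
    print("canonical digests match:", canonical_digest(doc_a) == canonical_digest(doc_b))
```

A signature or binary index computed over the raw text would break whenever a tool rewrote the file in an equivalent form; computing it over the canonical form avoids that.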
PSI officials said they expect to complete a data model and ontology models in August, while documentation, a draft schema specification, and language bindings are due in September. In December, they expect to complete binary indexing and signature programs, a validation program, and reference implementations of converters.