Following efforts by the Human Proteome Organization to standardize mass-spectrometry data output formats, a slew of mass-spectrometer and data-analysis software vendors are planning to adopt mzData, an open data format designed to let users share data more easily by standardizing peak lists in a single format.
“It’s a good step forward,” said John Chakel, a product manager for proteomics and metabolomics software at Agilent, which, along with Bruker Daltonics, Matrix Science, Kratos, ABI, and Waters, is planning to adopt the new format. “It’s a means that allows individuals to share data more easily, though to some degree it still puts the burden on the instruments that have generated the data in terms of how well they perform at peak picking and generating peak lists.”
According to Chris Taylor, a software engineer at the European Bioinformatics Institute who helped organize the collaborative effort between companies and academic institutions to produce mzData, the new data format was first “put to keyboard” at last year’s American Society for Mass Spectrometry meeting in Montreal.
Then, in April of this year, about 10 mass-spec vendors met again in Nice, France, to hammer out more details of the format. Following some final editing, many of the companies now plan to release products that use the new format by spring 2005.
Chakel said that Agilent will incorporate mzData into its next release of Spectrum Mill in the late spring or early summer of next year, while David Creasy, the technical director of Matrix Science, said that mzData would be incorporated into Mascot 2.1 beginning next year. Other vendors are expected to follow suit.
Taylor said that while there had been resistance in the past to adopting an open data format, mzData has won adoption by many vendors because it was developed collaboratively, with company representatives contributing time and resources to the effort.
“It wasn’t that it required brilliance to do it on anyone’s part,” said Taylor. “It’s not exactly rocket science. It was a political job of making sure that everyone’s seen it. All the mass-spec people were informed, and took part in various degrees.”
Other open data formats have been developed to handle mass-spec data, but vendors have not adopted them, probably because they were not consulted during development, Taylor said.
One such format is mzXML, which was designed by the Institute for Systems Biology in Seattle. Unlike mzData, mzXML allows users to share raw mass-spectrometry data as well as peak lists.
“There are minor differences between mzData and mzXML. MzData is the more flexible of the two — there’s almost no specific content in there,” said Taylor. “For some reason, mzXML just hasn’t quite caught on in the same way as mzData, and I think it’s just the sociological aspect of it — that mzXML wasn’t paraded around in the same way with vendors as mzData.”
Taylor said that there has not been a great push by scientists to share the large raw data files that emerge from mass spectrometry analysis, so adopting mzData over mzXML was not a hard decision.
“We think that on a broader range, the mzData will be the more general path,” said Herbert Thiele, the director of bioinformatics at Bruker Daltonics. “If everyone can make use of the information at the peak list level, that greatly accelerates the knowledge. It will help to make use of the instrumentation, to make use of the mass spectrometry techniques and to make use of the software tools in proteomics experiments.”
Thiele noted that it is in the vendors’ interest to reduce incompatibilities among mass-spectrometry data formats, because supporting many different interfaces means devoting time to fixing them so that they can handle different file formats.
“Fixing interfaces is very time-consuming, and, actually, for a lot of mass spectrometry suppliers, it takes up development time and keeps resources busy that you could invest much better into other activities,” said Thiele. “We are therefore very appreciative of the activities initiated by EBI and HUPO’s Proteomics Standards Initiative to avoid incompatibilities.”
Following the development of mzData, EBI is now developing mzIdent, another XML format, which is intended to unify results from different search engines, such as Mascot and Sequest, and so facilitate cross-comparisons.
Taylor said he expects the development of mzIdent to move faster now that mzData has been accepted and adopted.
“It’s going to provide a point of focus for people doing mainstream analysis of protein search engines,” said Taylor of mzIdent. “It will provide a central nexus point through which everything passes. Data in large repositories will be able to synchronize with one another.”