Next month, Thermo Electron will become one of the first mass-spectrometer vendors to launch a software product that incorporates an open data format called mzData, which may enable researchers using instruments from different mass spectrometry manufacturers to share peak data.
In addition to incorporating the latest version of mzData, mzData 1.05, Thermo’s software, called Bioworks 3.2, uses a new SRS file system and features an additional, probability-based method of sequence scoring. The software is currently being beta-tested and is expected to be officially released at the PITTCON conference in Orlando, Fla., at the end of February.
“We’re really excited about the mzData,” said Shane Burgess, an assistant professor at the College of Veterinary Medicine at Mississippi State University, who has been beta-testing the software since spring of last year (see Proteomics Pioneer, p. 6). “We work with a lot of non-traditional genomes, and we collaborate with a number of chemistry departments that have a number of mass specs that are not Thermo. Because of the fact that we work in these non-traditional species, we like to be able to get inside what’s going on with the searching.”
Thermo Electron is one of a slew of mass spectrometry vendors that have been working with the Human Proteome Organization to adopt mzData into their mass spectrometry software (see ProteoMonitor 11/19/2004). The adoption of the open-source data format by these vendors is aimed at enabling researchers to share data more easily by standardizing peak lists in one data format.
Other vendors that are adopting the mzData format include Agilent, Bruker Daltonics, Matrix Science, Kratos, Applied Biosystems, and Waters.
“We have been working the entire time with Chris Taylor and Weimin Zhu at the European Bioinformatics Institute to incorporate the data format into our software,” said Robert Barkovich, product marketing specialist for bioapplications software at Thermo Electron.
While adopting the mzData format is generally straightforward, one of the challenges is making sure that versions of mzData are backward compatible, said Barkovich.
“We’re not sure how often [the versions of mzData] are going to change, and we don’t want to have to adjust the software all the time,” said Barkovich. “Whenever they implement a new version of mzData, we want to make sure that previous versions can be maintained.”
Without the implementation of mzData, Thermo Electron data can be run through Mascot, but users can’t take Bruker’s or Waters’ data and run them through Sequest, said Burgess.
“All of the common mass specs — Brukers, Waters — we want to use the data off their machines,” said Burgess, the beta tester. “That’s why we’re really excited about mzData.”
The version of Bioworks 3.2 that Burgess is testing does not yet incorporate mzData because Thermo Electron was waiting for the newest version of the open-source data format to be released. Incorporating mzData should not significantly affect the rest of the software, said Barkovich.
“It’s a good way for people to get around standards,” said Barkovich. “If you have a raw file, you can transfer into an mzData file. And once we’ve come out with the mzData file, you should be able to transfer to one of our new SRS files as well. There’ll be a way of going back and forth between data formats.”
Burgess, who has used previous versions of Thermo Electron’s Bioworks software, including Bioworks 3.0 and 3.1, said that version 3.2 is much faster and more efficient because it incorporates the new SRS file system.
“It’s like night and day,” said Burgess. “The new file storage system has enabled us to massively increase our output. Our turnaround time is much faster now. It appears to be a smaller file, and the program can handle it easier.”
The software’s new probability algorithm, written by Fernando Maroto of Thermo Electron, was designed to give people more confidence in interpreting mass spectra.
“With Sequest, it’ll give you a lot of hits and you have to be somewhat judicious of [them],” said Barkovich. “You’re not going to miss a lot of data, but you may have additional data you need to look at to make sure that the hits are real. The probability algorithm is another way of filtering this data and looking at things that may be more probable hits. It’s a statistical expectation in response to the [false hits] problem.”
Burgess said the probability feature is not particularly useful when analyzing data from non-traditional species because researchers generally want to check such data manually. However, the feature is useful in calculating the area under curves when doing protein quantitation, Burgess said.
While XML formats such as mzData are known for being bulky, Barkovich said he does not envision the open data format slowing down processing because users can convert mzData files to the SRS format.
“MzData will be used for going back and forth,” said Barkovich. “Most of the work in everyday operation will be using SRS.”