A team of researchers in Germany hopes soon to release new open-source mass spectrometry analysis software designed to help developers of proteomics software manage, evaluate, visualize, and analyze liquid chromatography and mass spectrometry data, ProteoMonitor has learned.
The software, called OpenMS, is an open-source C++ framework library intended to make it easier to share algorithms and data.
“OpenMS is not intended for end users, but for people developing software for computation and proteomics,” said Oliver Kohlbacher, a professor of simulation of biological systems at the University of Tubingen, who is one of the lead developers of OpenMS. “The library should allow people to import and export a number of file formats and to build their own applications without too much effort.”
Kohlbacher began developing OpenMS two years ago in collaboration with Knut Reinert, a professor at the Institute of Computer Science at the Free University Berlin.
“We basically noticed there’s little software out there to handle proteomic analysis,” said Kohlbacher. “In particular in proteomics, we need more open-source software. Right now it’s a major hurdle towards developing more sophisticated algorithms.”
Kohlbacher said he began developing OpenMS for his own lab, which was working on software for differential analysis of blood and for shotgun proteomics.
“Basically, wherever you can use HPLC/MS-based software, you can use this,” Kohlbacher said.
The researchers plan to release OpenMS in the spring. It is currently available, free of charge, in unfinished form, at http://open-ms.sourceforge.net/main.html.
According to the OpenMS website, the library covers a wide range of functionality needed for software development, including database support, signal processing, data reduction, and data viewing.
Database support is necessary because high-throughput proteomics technologies generate huge amounts of data that need to be annotated, analyzed, and stored. OpenMS provides database support through the QtSQL module, which lets developers use a variety of SQL databases.
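To make the idea of database-backed spectra concrete, here is a minimal sketch that stores peak lists in SQL tables and retrieves them by m/z and retention-time window. It assumes Python's built-in sqlite3 module as a stand-in for a SQL backend; it does not use OpenMS or QtSQL, and the table layout and the helper names store_spectrum and peaks_in_window are hypothetical.

```python
import sqlite3

# Hypothetical schema for illustration only; not the OpenMS database layout.
SCHEMA = """
CREATE TABLE spectrum (
    id INTEGER PRIMARY KEY,
    ms_level INTEGER NOT NULL,
    retention_time REAL NOT NULL      -- seconds
);
CREATE TABLE peak (
    spectrum_id INTEGER NOT NULL REFERENCES spectrum(id),
    mz REAL NOT NULL,
    intensity REAL NOT NULL
);
"""


def store_spectrum(conn, ms_level, retention_time, peaks):
    """Insert one spectrum and its (mz, intensity) peaks; return the new spectrum id."""
    cur = conn.execute(
        "INSERT INTO spectrum (ms_level, retention_time) VALUES (?, ?)",
        (ms_level, retention_time),
    )
    spectrum_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO peak (spectrum_id, mz, intensity) VALUES (?, ?, ?)",
        [(spectrum_id, mz, inten) for mz, inten in peaks],
    )
    return spectrum_id


def peaks_in_window(conn, mz_low, mz_high, rt_low, rt_high):
    """Return (retention_time, mz, intensity) rows inside an m/z and retention-time window."""
    return conn.execute(
        """SELECT s.retention_time, p.mz, p.intensity
           FROM peak p JOIN spectrum s ON s.id = p.spectrum_id
           WHERE p.mz BETWEEN ? AND ? AND s.retention_time BETWEEN ? AND ?
           ORDER BY s.retention_time, p.mz""",
        (mz_low, mz_high, rt_low, rt_high),
    ).fetchall()


# Usage: an in-memory database with two small, made-up MS1 spectra.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_spectrum(conn, 1, 120.0, [(500.0, 100.0), (501.0, 60.0), (650.0, 80.0)])
store_spectrum(conn, 1, 121.5, [(500.0, 90.0), (650.0, 85.0)])
conn.commit()
rows = peaks_in_window(conn, 499.5, 501.5, 119.0, 122.0)
print(rows)
```

Keeping spectra and peaks in separate tables lets a window query touch only the rows it needs, which is the kind of annotation-and-retrieval workload the paragraph describes.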
Signal processing extracts information about mass spectrometry peaks, such as peak height, area, and centroid position. OpenMS processes mass spectrometry data with a wavelet-based scheme that, according to the developers, copes effectively with the difficulties posed by proteomics applications.
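The developers do not describe their wavelet scheme in detail here, so the following is only a simplified sketch of the general idea, not the OpenMS algorithm. It assumes a profile spectrum is convolved with a Mexican hat (Ricker) wavelet at a single, hand-picked scale; maxima of the wavelet response mark candidate peaks, and each peak's height, area, and centroid position are then measured on the raw data. The function names, the single scale, and the 10 percent response threshold are illustrative assumptions.

```python
import math


def mexican_hat(t, scale):
    """Ricker ("Mexican hat") wavelet at offset t (in samples) for a given scale."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def wavelet_transform(signal, scale):
    """Convolve the signal with a Mexican hat wavelet at one scale (zero padding at the edges)."""
    half = int(4 * scale)  # truncate the kernel at +/- 4 scales
    kernel = [mexican_hat(k, scale) for k in range(-half, half + 1)]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(range(-half, half + 1), kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def pick_peaks(mz, intensity, scale=3.0, rel_threshold=0.1):
    """Find peaks via maxima of the wavelet response, then measure them on the raw data."""
    response = wavelet_transform(intensity, scale)
    threshold = rel_threshold * max(response)
    n = len(intensity)
    peaks, seen = [], set()
    for i in range(1, n - 1):
        r = response[i]
        # Keep only strong local maxima of the wavelet response.
        if r <= threshold or r < response[i - 1] or r <= response[i + 1]:
            continue
        # Climb to the raw-data apex nearest the wavelet maximum.
        apex = i
        while apex > 0 and intensity[apex - 1] > intensity[apex]:
            apex -= 1
        while apex < n - 1 and intensity[apex + 1] > intensity[apex]:
            apex += 1
        if apex in seen:
            continue
        seen.add(apex)
        # Extend the peak region while the raw intensity keeps falling.
        left = apex
        while left > 0 and intensity[left - 1] < intensity[left]:
            left -= 1
        right = apex
        while right < n - 1 and intensity[right + 1] < intensity[right]:
            right += 1
        xs, ys = mz[left:right + 1], intensity[left:right + 1]
        # Trapezoidal area and intensity-weighted centroid over the peak region.
        area = sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2 for k in range(len(xs) - 1))
        centroid = sum(x * y for x, y in zip(xs, ys)) / sum(ys)
        peaks.append({"centroid": centroid, "height": intensity[apex], "area": area})
    return peaks


# Usage: a synthetic profile with two Gaussian peaks (sigma = 0.03 m/z), sampled every 0.01 m/z.
mz = [499.0 + 0.01 * k for k in range(301)]
intensity = [
    100 * math.exp(-(m - 500.0) ** 2 / (2 * 0.03 ** 2))
    + 50 * math.exp(-(m - 501.0) ** 2 / (2 * 0.03 ** 2))
    for m in mz
]
peaks = pick_peaks(mz, intensity)
for p in peaks:
    print(f"m/z {p['centroid']:.3f}  height {p['height']:.1f}  area {p['area']:.3f}")
```

The wavelet response is used only to locate peaks; the reported measurements come from the unsmoothed data, so the transform does not distort heights or areas.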
After signal processing, OpenMS reduces the amount of data further by determining which peaks belong to the same molecule. According to the developers, OpenMS algorithms can reduce LC/MS peak data to 0.5 percent of its original volume.
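One common way to decide which peaks belong to the same molecule is to look for isotope patterns: peaks from one molecule at charge z are spaced roughly 1.00336/z m/z units apart, reflecting the mass difference between carbon-13 and carbon-12. The sketch below uses that heuristic to collapse centroided peaks into one feature per molecule. It is an assumed, greedy simplification for illustration, not the method OpenMS uses, and group_isotope_clusters, the 0.01 tolerance, and the maximum charge of 3 are hypothetical choices.

```python
C13_SPACING = 1.003355  # approximate 13C - 12C mass difference, in Da


def group_isotope_clusters(peaks, max_charge=3, tol=0.01):
    """Greedily group centroided (mz, intensity) peaks into isotope clusters.

    A cluster is a run of peaks spaced C13_SPACING / z apart (within tol) for
    some charge z; the charge explaining the most peaks wins (lowest on ties).
    Peaks that fit no pattern become single-peak features with charge None.
    """
    peaks = sorted(peaks)
    used = [False] * len(peaks)
    features = []
    for i, (mz0, _) in enumerate(peaks):
        if used[i]:
            continue
        best, best_charge = [i], None
        for z in range(1, max_charge + 1):
            members, expected = [i], mz0 + C13_SPACING / z
            for j in range(i + 1, len(peaks)):
                mz_j = peaks[j][0]
                if mz_j > expected + tol:
                    break  # peaks are sorted, so nothing further can match
                if not used[j] and abs(mz_j - expected) <= tol:
                    members.append(j)
                    expected = mz_j + C13_SPACING / z
            if len(members) > len(best):
                best, best_charge = members, z
        for j in best:
            used[j] = True
        features.append({
            "mz": mz0,  # lowest-m/z member, taken as the monoisotopic peak
            "charge": best_charge,
            "intensity": sum(peaks[j][1] for j in best),
            "n_peaks": len(best),
        })
    return features


# Usage: made-up peaks from a 1+ molecule at m/z 500, a 2+ molecule at m/z 650, and one noise peak.
peaks = [(500.0 + k * C13_SPACING, inten) for k, inten in enumerate([100.0, 60.0, 20.0])]
peaks += [(650.0 + k * C13_SPACING / 2, inten) for k, inten in enumerate([80.0, 70.0, 30.0])]
peaks.append((700.3, 5.0))
features = group_isotope_clusters(peaks)
print(f"{len(peaks)} peaks -> {len(features)} features")
```

Each retained feature replaces several peak records, which is where the volume reduction comes from. At the ratio the developers cite, 0.5 percent, one million peak records would shrink to about 5,000. A production algorithm would also check that intensities follow a plausible isotope distribution and would track retention time, both of which this sketch ignores.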
The data viewer that comes with OpenMS is called “SpecView.” It can display 1D and 2D MS spectra from both files and databases. The display of the spectra is configurable, and images can be saved or printed directly from the viewer.
“There are lots of people who would be interested in doing things in computational proteomics, but they are deterred by the data-exchange situation,” said Kohlbacher. “It’s hard to get data in the format that you want it. We figured if we have a library of open-source data, it will enable the writing of more sophisticated algorithms and the sharing of algorithms for proteomics analysis.”
A slew of mass spectrometry vendors are now adopting an open data format called mzData, which is designed to make data sharing easier by standardizing peak lists in a single format. OpenMS differs from mzData in that it is a software library for building software, rather than a standard for data exchange.
“There is a lot of software implementing mzData, but it doesn’t allow you to do any processing,” Kohlbacher explained. “OpenMS allows you to actually do something with the data.”
OpenMS, like mzData, will not allow for the exchange of raw mass-spectrometry data because raw data comes in proprietary formats.
“We have access to some raw data formats, but we can’t distribute that as open source software [because it is proprietary],” said Kohlbacher. “There’s very much reluctance to exchange when it comes to raw data formats. I think there’s a lot of hesitation from instrument makers to disclose that because they see that as their intellectual property and they are afraid to expose the internals of their machine.”
OpenMS is still a work in progress, said Kohlbacher. Unfinished parts of the project include data visualization and the integration of additional file formats.
The project is expected to be finished in spring 2005, Kohlbacher said. Once it is complete, the developers plan to publicize it on the web and at meetings and to seek feedback from the community on how well the software works and which additional features should be included.
“It’s an open-source initiative, so anyone who feels they should be a part of it is welcome to contribute,” said Kohlbacher.
Aside from Kohlbacher and Reinert, other major players in developing OpenMS include Eva Lange, Marc Sturm and Sandra Lovenich, all PhD students at the University of Tubingen; Clemens Gropl, a postdoc at the Free University Berlin; and Andreas Hildebrandt, a PhD student at Saarland University in Saarbrucken, Germany.