The developers of the ProteoWizard proteomics software platform announced this week that the software now allows sharing and integration of mass spectrometry data across all vendor formats.
The platform is the first to fully integrate data from all major vendors – AB Sciex, Agilent, Bruker, Thermo Fisher Scientific, and Waters – and aims to make proteomics data sharing and software development easier, said Parag Mallick, the platform's lead developer and the director of clinical proteomics at the Center for Applied Molecular Medicine at the University of Southern California.
The software's developers introduced the new capability in a brief correspondence published this week in Nature Biotechnology.
Although open formats for mass spectrometry data have existed for years, no single software platform has allowed researchers to easily share and compare data generated on instruments from different vendors.
"Many labs would have a particular instrument – a Thermo machine, for instance – so they could read Thermo files and manipulate them," Mallick told ProteoMonitor. "But if they then wanted to share them with a collaborator, and particularly with a bioinformaticist collaborator who didn't have an instrument, they were really stuck. So the ability to share your data with other people who didn't have your instrument was a major bottleneck [for the field]."
Open formats like mzXML "helped with that somewhat," he said. "But even beyond the open formats there is a lot of data in the primary instrument file, and you want to be able to access it."
And while many researchers and vendors developed their own converters for moving between file formats, building and maintaining them proved tedious and time-consuming.
"It was a tremendous hassle," Mallick said. "I came from [the Institute for Systems Biology] where we were working on building sets of converters, but, again, it ended up being that every mass spec data type had its own converter… and depending on which vendor you were interacting with, you had to do a different process."
Given these issues, he noted, a system was needed to integrate data converters for all the vendors into a single coherent platform that would be "completely agnostic to where your data came from."
"We had to write an interface that would be transparent for developers, so that regardless of where the data came from it came into an interface that was the same," Mallick said. That way, developers could "write a single algorithm and have it apply to data from any vendor."
"Because, again, you had this problem of saying, 'Oh, I have this Thermo instrument in my lab, let me write some software.' And it would work great on the Thermo instrument, but it could never be applied to any other instrument from anywhere else," he said.
To put together a platform integrating the various formats, the developers needed cooperation from the vendors – particularly with regard to obtaining the licenses necessary to share and exchange the data in these proprietary formats. And, Mallick said, at first the vendors weren't particularly interested in being part of the process.
"Even when the early converters were being built, the vendors weren't particularly supportive," he said. "They sort of liked the fact that in order to read their data files, you had to have their software."
Beyond that, Mallick suggested, working on data format converters simply wasn't a high priority for the vendors. "Creating [such] modules takes work, and they were busy building better instruments."
Recently, though, vendor attitudes toward such efforts have "absolutely changed," he said.
"I think through things like [the University of Washington's] Skyline [software] and some other academically created software, vendors are now seeing value in having users exchange their data and share their data and having bioinformatic developers write tools to analyze their data," Mallick said.
Indeed, in an interview with ProteoMonitor at this year's American Society for Mass Spectrometry annual meeting, Brad Gibson, director of the chemistry and mass spectrometry core facility at the Buck Institute, cited Skyline – whose developers are also part of the ProteoWizard project – as an area where academic and open source software development was perhaps outpacing the vendors' in-house work (PM 6/8/2012).
"They're pushing the boundaries of trying to get the kind of software tools in an open source platform for doing these kinds of analyses," he said. "Every vendor is scrambling to do their own in-house versions, because they pretty much have to, but right now you see the open source community right in line with what the vendors are developing."
"And this is something very new," Gibson added. "I've never seen it quite like this before where there's almost no lag time, where, in fact, you might even say that the open source community is even ahead of what the vendors have. The vendors are producing the instruments that allow you to do this, but already the open source and academic community is moving ahead in terms of how to do the data analysis and processing."
"I think [vendors] are beginning to appreciate that there is a huge community of willing and excited computational biologists out there and that if they give them access to their data they might come up with something pretty awesome beyond what they would have done themselves," Mallick said.
He added that the field's growing emphasis on data sharing and multi-lab, consortium-based projects was a further incentive for vendors to cooperate with the project.
"This all goes hand in hand with the [National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium] efforts and the large consortium efforts and just standardization in the field overall," Mallick said. "As those [projects] start to become priorities, data sharing, data integration, and data openness all become very significant issues."
To secure the necessary vendor licenses, the developers formed a foundation – the ProteoWizard Software Foundation – which enabled the vendors to sign agreements with a single entity rather than having to sign licenses with every individual lab or industry player wanting to access their proprietary formats through the platform.
The foundation is currently supported by researchers working under their own individual grants and does not have a dedicated source of funding, Mallick said, adding that this is "the one risky" factor for the project.
He suggested, though, that vendor support might be forthcoming in the future, noting that the vendors have agreed to support their own modules and that "it's an open source project, so vendors can hire their own programmers to contribute code to [it]."
"We're in sort of a great situation now, because now that all the vendors have subscribed to it, if they stop supporting it then they are now the vendor that doesn't support it," Mallick said. "As opposed to before when they would have been the only vendor who did it."