There is an abundance of open source bioinformatics software available for sequence and gene-expression analysis, but proteomics software has traditionally been a proprietary effort, with commercial packages like Mascot from Matrix Science and Sequest from Thermo Electron leading the field.
That dynamic is shifting. Several groups have recently released open source software packages to identify peptide sequences using tandem mass spectrometry data. One of these, X! Tandem, developed by Ronald Beavis of Beavis Informatics, makes up part of a larger open source package called the Global Proteome Machine that was released earlier this year (http://www.thegpm.org/). Another, called OMSSA (Open Mass Spectrometry Search Algorithm), from the National Center for Biotechnology Information, is already available for download (http://www.ncbi.nlm.nih.gov/Structure/OMSSA/), and a web-based service similar to NCBI’s Blast is planned for release before the end of the year.
Proteomics software “lagged genomics a little bit in terms of discovering the basic techniques that could be used,” said Beavis, and “became proprietary almost immediately.” One reason for this, he surmised, was that this late start placed the bulk of proteomics informatics development in the mid to late 1990s, “and if you remember back then, everyone was going to be a millionaire on the Internet, and I think that kind of affected the scientists associated with developing commercial informatics.”
Beavis’s own software development efforts at that time resulted in a suite of products marketed by ProteoMetrics, a bioinformatics company that was acquired by Genomic Solutions in early 2002. In launching his consulting business in late 2003, Beavis said he opted to take a different approach.
“My current strategy is informed by my experience at ProteoMetrics,” he said. “Because of the low volume of software you can actually sell into a fairly niche sort of thing like proteomics, you really have to price the product at the level of the other proteomics search engines like Mascot or Sequest. You’ve got to be in the $6,000 to $10,000 range almost to break even. But with ProteoMetrics, we made almost all of our money from consulting. …
“So I decided, when formulating the new company, just to make the software open source and to go for the side of the business where I know the commercial value is, which is the consulting,” Beavis added.
Beavis Informatics and a number of academic collaborators — the Manitoba Center for Proteomics, Rockefeller University, the University of Michigan, and Michigan State University — released X! Tandem as the first tool in the GPM project last September. The project was initiated to meet a growing demand in the proteomics research market for open alternatives to commercial packages.
“Proteomics informatics had become so proprietary that it was difficult for groups doing large-scale proteomics to really effectively use a lot of the software,” Beavis said. “Since you can’t get in under the hood and change output formats or compensate for different input formats, you really don’t have very much control over how your data is analyzed.”
Lewis Geer, an NCBI developer who leads the OMSSA project, agreed that the demand for open source alternatives to commercial packages was driven by a desire for transparency — not cost. “People wanted software that wasn’t a black box,” he said. With commercial packages, “they don’t really understand what the algorithms are doing and it’s affecting their results.”
An Open Playing Field?
X! Tandem and OMSSA both claim to be much faster than methods like Mascot and Sequest, and they both rely on methods similar to those that make Blast so much faster than Smith-Waterman for sequence alignment. Beavis said that X! Tandem “does a quick pass first through the proteome, selects the proteins that are most likely to fit, and then looks at them very intensely.” This allows the algorithm to run a thousand times faster than “conventional” algorithms, he said.
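The two-stage strategy Beavis describes can be illustrated with a toy sketch. The Python below is not X! Tandem's code or its scoring scheme. The function names, the simple mass-matching score, the toy protein sequences, and the simplified trypsin rule (cleave after K or R, ignoring the usual proline exception) are all assumptions made for illustration. A cheap first pass scores every protein using only fully cleaved peptides, and a more expensive second pass, here allowing missed cleavages, is run only on the shortlist.

```python
import re

# Monoisotopic residue masses (Da) for the few amino acids used in the toy data.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
                "L": 113.08406, "K": 128.09496, "E": 129.04259, "R": 156.10111}
WATER = 18.01056

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def tryptic_peptides(sequence, missed_cleavages=0):
    """Simplified digest: cut after every K or R (proline rule ignored)."""
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + missed_cleavages + 1, len(pieces))
        for j in range(i + 1, last + 1):
            peptides.add("".join(pieces[i:j]))
    return peptides

def count_matches(sequence, observed, missed_cleavages, tol):
    """Number of observed masses explained by some peptide of this protein."""
    masses = [peptide_mass(p) for p in tryptic_peptides(sequence, missed_cleavages)]
    return sum(any(abs(m - o) <= tol for m in masses) for o in observed)

def two_pass_search(proteins, observed, keep=1, tol=0.01):
    # Pass 1: quick score over the whole proteome, no missed cleavages.
    quick = {name: count_matches(seq, observed, 0, tol)
             for name, seq in proteins.items()}
    ranked = sorted(quick, key=quick.get, reverse=True)
    shortlist = [name for name in ranked if quick[name] > 0][:keep]
    # Pass 2: costlier search (up to 2 missed cleavages) on the shortlist only.
    return {name: count_matches(proteins[name], observed, 2, tol)
            for name in shortlist}

# Hypothetical toy proteome and observed peptide masses.
PROTEINS = {"P1": "GASKVLERGGK", "P2": "GASKAAAK", "P3": "GGGGKLLLLR"}
OBSERVED = [peptide_mass("GASK"), peptide_mass("VLER"), peptide_mass("VLERGGK")]

print(two_pass_search(PROTEINS, OBSERVED))
```

In this toy data, the third observed mass corresponds to a peptide with a missed cleavage, so it is only explained in the second pass, which never has to consider the proteins that were filtered out in the first.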
Geer said that OMSSA is “several times” faster than Mascot and other algorithms.
The two methods are entering the public domain almost exactly two years after a researcher at Washington University in St. Louis pulled a similar academic software package, STLmass, from his website after allegedly receiving a cease-and-desist letter from Thermo Electron, which holds an exclusive license for the University of Washington’s patent covering Sequest (see BioInform’s sister publication, ProteoMonitor 10-28-02 for more information on this incident).
Some say that Sequest’s patent claims are excessively broad, and could stifle development of alternative software packages for converting raw peptide tandem mass spectra into protein identifications. At least one such program is not publicly available for exactly this reason. The Institute for Systems Biology, which has released most of its software under open source licenses, including an MS/MS peptide identification tool called ProbID (http://projects.systemsbiology.net/probid/), has decided not to release a similar program called Comet.
A notice on the ISB proteomics software page claims that “when there are no longer IP issues surrounding uninterpreted tandem mass spectral database search routines, as ProbID and X! Tandem are paving the way for, COMET will [then] be distributed as open source.”
Geer and Beavis both claim that the underlying methods behind their software packages are novel, which may insulate them from a similar situation. Beavis said he has never encountered any IP-related obstacles in his career in the field. Thermo “certainly never mentioned anything like that to me at ProteoMetrics,” he said. “I really don’t know what their plans are for enforcing that particular patent, but I’d be surprised if they’d spend a lot of money on it.”
Beavis and Geer have both taken a low-key approach to releasing their software. “I’ve been putting a lot of effort into making sure everything is working properly before I actually try and promote it too heavily,” said Beavis. “My experience with ProteoMetrics is that it takes three to five years for this community to really adopt things in a general fashion, so I think we’re still very early in the product cycle for this.”
Geer echoed this sentiment. “We’ve been relatively quiet about it,” he said. “We wanted to get a few beta testers for it first.”
But while NCBI has no plans to benefit commercially from OMSSA, Beavis is building a business around the open source software he developed. While Beavis Informatics has “no interest whatsoever in commercializing this software,” Beavis said that he has already signed consulting contracts with several companies and academic groups that are using X! Tandem and other GPM tools. So far, he said, Eli Lilly, Merck, and the USDA are among the larger groups using the package, while total downloads are in the 200-300 range.
Beavis said he doesn’t see his software or his company as direct competition for firms selling shrink-wrapped proteomics software. “X! Tandem was really an attempt to move the field forward on the intellectual and academic side rather than produce a product that would directly compete with those companies,” he said.
At least one of those firms agrees. John Cottrell, CEO of Matrix Science, pointed out that there are certain challenges that open source software development projects face that commercial efforts don’t. “It will be difficult for open source proteomics software to offer a stable alternative to commercial software over the longer term,” he explained in an e-mail to BioInform. “The general trend is that programs adopted by commercial companies survive for an extended period, while those that are not commercialized fade away through lack of on-going development and support.”
He added that general-purpose open source software packages like Linux or Apache have large user bases that can support commercial firms that make their money through consulting or technical support, but “this doesn’t work well for very specialized proteomics applications.”
Open source software offers many benefits for research groups who don’t require formal support, Cottrell said, but “if researchers simply want to use the applications, and have confidence that technical assistance will be readily available, and the product will continue to be developed and improved, the traditional model will continue to apply.”
But according to Beavis, the smaller scale of the proteomics community may work to his advantage. “As a consultant, being the person who wrote the software that a company is using is a good position to be in,” he said.