In an effort to support the daunting bioinformatics requirements of its SOLiD next-generation sequencer, Applied Biosystems has enlisted the help of Geospiza and GenomeQuest as its first commercial partners to develop an integrated suite of third-party tools for analyzing data from the instrument.
ABI said this week that it had signed formal agreements with the two firms under its Software Development Community program. The company created the program in 2006 to encourage third-party software development around its instrumentation platforms, then expanded it last year to focus on the challenges of analyzing and managing data generated by next-generation sequencers [BioInform 09-07-07].
ABI officials told BioInform that the company plans to include other informatics providers in the program, and is currently in discussions with several potential partners but has not yet signed any formal agreements.
Michael Hadjisavas, director of commercial development at ABI, said that the company is structuring these agreements to protect the confidentiality of commercial partners in the Software Development Community, which has to date involved primarily academic developers.
“As we invite these members into the software community, there’s the need to have agreements in place that protect the business interests of both parties,” he said. “A large part of this has to do with confidentiality, protecting those interests, so they feel unencumbered and free to work with us so that they can develop their products, services, and capabilities around SOLiD.”
Under the terms of the Geospiza and GenomeQuest agreements, ABI has shared file formats, sample data sets, and analysis pipeline information with both firms so that they can optimize their products for handling data from the SOLiD.
Officials from both Geospiza and GenomeQuest stressed that they plan to make their software compatible with all next-generation sequencing platforms, but noted that the ABI program has made it easier for them to ensure compatibility with the SOLiD.
For example, Rob Arnold, president of Geospiza, said that the company’s flagship Finch data-management platform “has now been adapted to handle all the different workflows that exist not only with SOLiD, but all the other next-generation instruments.”
Nevertheless, he noted that the ABI program “enables us to provide a more seamless integration between our software and the instrument, so that will give our customers a substantial improvement in performance.”
Likewise, Ron Ranauro, president and CEO of GenomeQuest, said that the firm is currently “in discussions at differing levels with all the other” next-gen sequencing vendors, but noted that ABI is different in that it offers “a structured program that allows us to approach their customers in a way that signals that we’re working together and there’s a level of competency that they’ve validated.”
Exchanging information with ABI is “going to help us render results from SOLiD in a way that best shows the advantage of SOLiD, but nothing that we do will preclude us from doing the same thing for Illumina or 454,” Ranauro said. “Our solution won’t be tied to a vendor, but we will be looking to take the best advantage of the vendor platform.”
A Burgeoning Market
Bioinformatics firms like Geospiza and GenomeQuest stand to benefit from the rise of next-gen sequencing instruments, a trend that has driven a surge in demand for software to capture, analyze, and manage the enormous quantities of data these instruments yield.
Instruments like the SOLiD, 454 Life Sciences’ GS 20 and GS FLX, and Illumina’s Genome Analyzer have opened up a new market for sequencing technology, expanding the reach of such systems beyond the major genome centers and into smaller core labs and research centers. Many of these customers, however, are finding that they lack the bioinformatics resources required to properly support these systems.
Arnold noted that a single next-generation instrument “is effectively a genome center” in terms of data throughput. “If you’re a genome center and you have all the staff and resources, you can pretty well manage these types of instruments,” he said. “But if you’re a core laboratory that doesn’t necessarily have the full IT staff and so forth, you have to be thinking pretty pragmatically about how you’re going to deal with all the data that comes off these instruments.”
Arnold said Geospiza customers who have purchased next-gen sequencers “usually buy about 10 terabytes of storage for their first year of operation, and by the end of the first year they’re buying 100 terabytes of storage. So it’s that kind of scaling that they’re really faced with.”
Ranauro said that GenomeQuest determined that processing a single Fasta file from a next-gen sequencer could involve five to seven full-time employees, including a database administrator, a bioinformaticist, a web developer, a system administrator, a cluster computing technician, a security specialist, and other potential positions.
“It’s a significant investment, and it’s probably beyond the scope of most but the largest core labs or the largest sequencing centers,” he said.
This challenge was the driving force behind ABI’s decision to reach out to third-party software developers. “There is a broad recognition by the next-generation sequencing community that all of these instruments generate an enormous amount of data, and there are numerous things that need to be put in place in order to manage that enormous amount of data,” Hadjisavas said. “So where I think we find ourselves today is that … the community is very open and inviting of the next-generation sequencing companies to engage in these sorts of partnerships.”
He added that these partnerships should help ensure that SOLiD customers get the most out of their investment in the instrument. “For companies like AB, it becomes incumbent upon us to forge relationships with companies that provide those types of solutions so that when a customer has discussions with those entities, the work in the background has been done, whereby there is some level of compatibility in how their solutions interface with ours.”
Furthermore, he noted that ABI doesn’t expect to develop all the bioinformatics tools required for next-gen sequencing data on its own — especially for emerging applications in areas like epigenomics, transcriptomics, genotyping, and structural variation analysis.
“In order to really accelerate the development of tools to analyze data for all of those different types of readouts, we think it’s important for us to be able to work with a number of parties that have capabilities in that area,” he said. “Otherwise, any company like ours could be saturated in terms of its capacity to develop all of the analysis tools for those applications.”
Roger Canales, senior manager of the SOLiD Software Development Community, said that the company has developed its own analysis tools for the SOLiD, which are available with the system, including the SOLiD Analysis Tools (SAT), SOLiD Experimental Tracking Software (SETS), and SOLiD Alignment Browser (SAB).
While these tools overlap with some software that companies like GenomeQuest may provide, “the difference is that [GenomeQuest’s tools are] integrated into a suite of other software applications that address the downstream analysis once you’ve gotten past that point,” he said.
ABI is the first next-generation sequencing vendor to launch a formal partnership program with commercial bioinformatics vendors, but isn’t the only such firm to acknowledge the data-management challenges of the field. Earlier this week, Illumina’s CEO Jay Flatley said that the company plans to improve the way its Genome Analyzer manages and analyzes data.
“I think the biggest challenges in the small labs are on the informatics side, having both the compute infrastructure … and also the ability to run the software and do the data analysis,” he said during a conference call to discuss the company’s quarterly earnings results.
BioInform’s sister publication In Sequence reported this week that Illumina is developing a program that will reduce the amount of data that comes off the system. “That new capability will come to the market fairly quickly and provide great relief, not only to the single-user customers but also for the genome centers,” Flatley said.
The company is also defining computer configurations for customers and is “working very hard” on “making the analysis software package more and more shrink-wrapped,” he said, adding that “it’s not fully available yet, but it will be.”
An Illumina spokesperson could not be reached for comment on whether the firm plans to support third-party software development for the Genome Analyzer.
Geospiza: IT and Data Management
Geospiza’s partnership with ABI is an extension of a longstanding relationship between the firms. Under the terms of the agreement announced this week, Geospiza will extend its Finch Suite to manage data from the SOLiD alongside that of ABI’s 3130 and 3730 sequencers.
The company is developing a version of Finch called FinchLab Next Gen Edition that will manage data from next-gen sequencers and “interface with a high-performance storage system” that Geospiza will resell under an agreement with an undisclosed hardware provider.
Arnold said that Geospiza has been adapting Finch for next-gen sequencing data “in stealth mode” for around 18 months. “As the next-gen instruments were starting to hit the market, it was still pretty new for folks, and they were still trying to figure out the applications and how they’re really going to be using them,” he said. “But we started work on our third generation of the Finch system with the expectation that we’d be supporting next-generation technology.”
Geospiza plans to adapt Finch so that it can process both Sanger and next-gen sequencing data using the same processing pipeline, which will allow researchers to integrate and visualize data from multiple platforms in the same environment. FinchLab Next Gen Edition will include data management systems for defining experiments, the ability to track data through production processes, and genetic analysis tools.
GenomeQuest: Sequence Search and Analysis
GenomeQuest’s software is expected to complement the Geospiza Finch platform. The company specializes in rapid sequence search and alignment tools, and maintains a comprehensive database of reference sequences that it updates continuously.
Ranauro said that the company released a module for its software last November called High Throughput Extension, or HTx, which should be particularly applicable to next-gen data. HTx is a desktop system that allows GenomeQuest users to easily script sequence analysis workflows. One feature of HTx is a component called HS3, or high speed sequence search, which is a suite of alignment tools that are three to four orders of magnitude faster than Blast, Ranauro said.
HS3 is “really geared toward finding highly similar sequences of varying lengths with varying levels of stringency, and that allows different mapping approaches to be used to increase the yield on next generation sequencing read results,” he said.
He said the platform should be particularly useful for resequencing, variant detection, and metagenomics applications.