As Applied Biosystems prepares to launch its SOLiD sequencer this fall, the company is encouraging outside researchers and software vendors to develop and commercialize bioinformatics applications for the new platform.
As part of the initiative, ABI last week made freely available to researchers DVDs containing a software tool suite and a SOLiD data set, allowing them to align SOLiD reads to a reference sequence.
And in mid-July, ABI invited more than 40 researchers from 30 academic institutions and companies to its Foster City, Calif., headquarters for a two-day workshop to discuss software development for the instrument.
The company wants to tap outside expertise because it cannot develop software in-house for every application of the instrument. “It would be a substantial investment for anybody to develop the software tools needed to analyze the readouts for the constellation of applications,” Michael Hadjisavas, director of commercial development at ABI, told In Sequence last week. “We felt it would be much more productive to work hand-in-hand with some of our early-access customers and some of our key collaborators to … develop some of the tools.”
Attendees for the mid-July meeting included researchers from Agencourt Bioscience, Baylor College of Medicine, BC Cancer Agency, the Broad Institute, Children’s Hospital of Pennsylvania, Cogenics, Cold Spring Harbor Laboratory, Columbia University, GATC Biotech, GenomeQuest, Genome Institute of Singapore, Geospiza, Interdisciplinary Center for Biotechnology, the J. Craig Venter Institute, the Joint Genome Institute, Kiel University, Macrogen, Plant Research International, Sistemas Genomics, University of California San Diego, University of Delaware, University of Lausanne, Stony Brook University, University of Queensland, Washington University School of Medicine, and the Wellcome Trust Sanger Institute.
ABI hopes that bioinformaticians will develop analysis tools for SOLiD’s three main application areas: genotyping and sequencing; epigenomics; and transcriptomics and gene expression.
For example, “some of the people we have been providing data to before have been looking at using short reads in de novo assembly algorithms,” said Michael Rhodes, applications manager at ABI.
Initially, the company is providing interested researchers with a collection of software modules on DVD. These align SOLiD reads to a reference sequence, convert data from color-space — the instrument’s readout, where each color encodes two nucleotides — to base space, and vice versa, and evaluate the quality of reference sequences. The modules also support two-base encoding and mate-pair reads, according to ABI’s website.
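To illustrate what converting between color-space and base space involves, here is a minimal Python sketch of the widely described two-base encoding, in which each color (0 to 3) is determined by a pair of adjacent nucleotides. It is not ABI's software: the function names are hypothetical, and the leading letter of a read is treated simply as the first base of the sequence, which glosses over how the instrument's own files anchor a read.

```python
# Illustrative sketch only; function names are hypothetical, not ABI's tools.
# Assumption: bases are indexed A=0, C=1, G=2, T=3, and the color for a pair
# of adjacent bases is the XOR of their indices (AA=0, AC=1, AG=2, AT=3, ...).

BASES = "ACGT"


def encode_color_space(seq):
    """Convert a base-space sequence to its first base plus one color per adjacent pair."""
    idx = [BASES.index(b) for b in seq]
    colors = "".join(str(a ^ b) for a, b in zip(idx, idx[1:]))
    return seq[0] + colors


def decode_color_space(read):
    """Convert a read (first base followed by colors) back to base space."""
    prev = BASES.index(read[0])
    out = [read[0]]
    for color in read[1:]:
        prev ^= int(color)  # each color plus the previous base fixes the next base
        out.append(BASES[prev])
    return "".join(out)


if __name__ == "__main__":
    encoded = encode_color_space("ATGGCA")
    print(encoded)
    print(decode_color_space(encoded))
```

Because each decoded base depends on the one before it, a single miscalled color changes every base that follows when a read is translated naively, which is one reason tools written for base sequences cannot simply be pointed at color-space data.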
Researchers can also obtain, on DVD, SOLiD data for the 2-megabase genome of Streptococcus suis, a control organism that ABI researchers have used, along with the S. suis reference sequence and data analyses.
A number of collaborators have received other data sets as well, according to Rhodes.
One reason ABI is actively encouraging outside software development before it launches the instrument is that customers need those tools to make sense of the data. “It’s a bit of a chicken and an egg” scenario, said Rhodes. “If you don’t have the software, you can’t analyze the data.”
Thinking about software early could also prevent new customers from being overwhelmed by the amount of data the instrument generates. It is easy to talk to customers about several gigabases of data, said Rhodes, but “until they actually see it, they really don’t realize how much data that really is, and some of their existing systems have to be remodeled. And it’s definitely a big challenge for many of the customers we spoke to.”
The University of Kiel in Germany, for example, expects to receive its SOLiD sequencer this fall, but its informatics experts are already thinking about developing ways to automate software modules for the platform, according to Michael Wittig, a bioinformatician in the Institute for Clinical Molecular Biology, who attended ABI’s workshop.
The Kiel researchers are especially interested in resequencing genomic regions that come out of large-scale genotyping studies.
ABI is not alone: all next-generation sequencing technology vendors are interested in third-party software development, according to Stephen Kingsmore, president of the National Center for Genome Resources.
“The issue for the instrument vendors is that they really want to sell instruments and sequencing kits,” said Kingsmore, who did not participate in the July workshop. “They don’t see software sales as a business, that’s not their core business. And yet, they created a product that must have pretty advanced compute power and software to use the data.”
So far, sequencing vendors have not been particularly proactive in developing advanced software for their instruments, Kingsmore believes. As a result, “people are not going to be buying 50 kits for sequencing, because they can’t do anything with the sequence they already generated,” he said.
NCGR has recently started offering a sequence pipelining service, using its Alpheus software. The service aligns sequence reads from Illumina’s Genome Analyzer, 454’s platform, ABI’s SOLiD, or Sanger sequencers to reference genomes or transcriptomes, and identifies base variants, splice isoforms, and genomic rearrangements. The center offers the service either on its own or in combination with sequencing services on its recently acquired Illumina sequencer (see In Sequence 6/26/2007).
According to Kingsmore, ABI is “in a unique predicament” because its SOLiD platform produces data in color-space, where each color represents two bases, meaning customers cannot use standard applications that were written for base sequences.
“The readouts [of the SOLiD system] are unusual,” compared to other sequencing platforms, agreed Gabor Marth, an assistant professor in the department of biology at Boston College who has developed software tools for several next-gen sequencers. “And the tools, although we think that they can be modified fairly easily to color-space reads, … have to be [rewritten].”
ABI is “pretty keen on getting the community involved, especially because of their color-space analysis,” said Marth, who also did not attend ABI’s July workshop.
In terms of informatics, instrument vendors typically focus on image processing and other areas “close to the machine” that have to do with proprietary technology, according to Marth.
“When you actually compose a sequence trace from the images, at that point, there is a lot of expertise out there that is not specific, necessarily, to their machine,” he said. “There is a good group of informaticians out there that are ready to jump on this.”
Marth’s group has been focusing on SNP calling and resequencing analysis tools. “What you want from the company is early access to their newest data,” he said, as well as “the heads up on which ways things are going, and how things are changing.”
ABI’s new initiative expands on an earlier software-development program launched a year ago. Under that program, the company for the first time released file formats and software for its capillary sequencers and PCR instruments.
ABI has not decided yet whether it will distribute or support any of the software that might come out of the initiative. “We shall certainly facilitate disseminating the information” if developers make a software package publicly available, ABI’s Hadjisavas said.
So far, the initiative is “very much an embryonic program,” he said, although the July meeting has spawned “a couple of key relationships” and “ongoing discussions” with a number of researchers.
Helicos, 454 Weigh In
Other next-gen vendors are also taking software development for sequencing applications seriously.
Helicos BioSciences, for example, told In Sequence by e-mail that it will start two software-related marketing programs when it launches its Helicos Genetic Analysis System, currently planned for next year. The first program will provide researchers with information, standard file formats, published application programming interfaces, source code, documentation, and sample data so that they can develop analysis methods and tools. The second program will be focused on data storage, high performance computing, bioinformatics, and systems integration providers.
Helicos also said it is collaborating with “thought leaders in the bioinformatics community” to help develop analytical techniques that do not yet exist for certain application areas.
Tim Harkins, Roche’s marketing manager for genome sequencing, told In Sequence by e-mail that 454 has “several active relationships where researchers have either worked independently of us or as collaborators to provide additional [software] enhancements for the entire research community.”
For example, the company worked with scientists earlier this year to generate 454 read traces that can be viewed in the Consed assembly and editing program.
The company, which has “always provided open source” for its software to researchers, is also working with a “leading academic organization” to develop a per-read quality score software application that provides Phred-based quality scoring, according to Harkins.
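For readers unfamiliar with Phred-based scoring, the standard definition relates a quality score Q to the estimated probability P that a base call is wrong: Q = -10 × log10(P). The short Python sketch below shows only that conversion. The function names are hypothetical, and it says nothing about how 454 or its academic partner estimate the error probability for each read.

```python
# Illustrative sketch of the standard Phred relationship Q = -10 * log10(P).
# Function names are hypothetical; this is not 454's scoring application.
import math


def phred_from_error(p_error):
    """Return the Phred quality score for an estimated base-call error probability."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


def error_from_phred(q):
    """Return the error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(p, round(phred_from_error(p)))
```

Under this definition, a score of 20 corresponds to a 1-in-100 chance that the call is wrong, and a score of 30 to a 1-in-1,000 chance.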
Since 454’s platform has been on the market for a few years now, “many aspects of our software have been well validated,” he said. Existing analysis tools can be modified as 454’s sequence reads increase, he added.