Illumina officials said this week that they are working on a streamlined informatics workflow for the company’s Genome Analyzer sequencing platform that they expect will reduce the analysis time for a typical run by a half day or more.
The company is also extending its Illumina Connect partnership program for third-party bioinformatics providers to support software development for the next-generation sequencer.
Jordan Stockton, market manager for computational biology at Illumina, told BioInform that the company is working on a new informatics configuration that will reduce the data-transfer times associated with the Genome Analyzer. The system currently generates more than half a terabyte of data per run, which takes around six hours to transfer onto external hard drives for downstream processing.
“One thing we’re doing to help our customers in that regard is taking some of the primary analysis that’s currently done off the instrument and moving it onto the instrument,” Stockton said. “So they can make the decision about what data they need to physically move … at the time of the run.”
Stockton noted that for certain experiments, “you may need to keep only one-tenth or one-thousandth of the number of bytes that you collect,” but said that with the current configuration of the system, most users aren’t able to analyze the data that is of interest to them until it has all been transferred to external drives.
“By making the decision point about what you decide to keep or not to keep at the time of the run, you simplify the data-management challenges,” he said.
Stockton declined to provide a timeline for when the company expects to make this new capability available, but said it will be “soon.”
The benefits for Genome Analyzer users should be “significant,” according to Scott Kahn, chief information officer for Illumina. “It’s the difference between tens of gigabytes and multiple terabytes” that they need to transfer off the machine, he said.
“For the average customer,” Stockton said, “we’re looking at between a half day to full day time savings per run.”
The improvement could be welcome news to users who have reported a number of challenges associated with the volume of data that the Genome Analyzer produces, including multiple data-transfer steps and enormous storage and computational requirements [BioInform 02-15-08].
Steven Jones, head of bioinformatics at the British Columbia Cancer Research Center’s Genome Sciences Center, which currently has five Genome Analyzers, said that providing some computational capacity “adjacent to the instrument” should be of particular interest to departmental labs and core facilities that have limited bioinformatics teams and IT resources.
“If they can be helped somewhat by having a machine right on the device that is going to process the data and not have to worry about moving images around, then I guess they would see some benefit from that,” he said.
He noted, however, that for bigger research groups that plan to use a large number of Genome Analyzers, “some re-engineering” might be required. “Our concerns would be just in the extra power requirements and cooling requirements to have basically the equivalent of a compute farm in the same room as your DNA sequencing machines,” he said.
The change is also a competitive move for Illumina because it would bring the configuration of the Genome Analyzer closer to that of Applied Biosystems' SOLiD next-generation sequencer, which comes with three compute nodes and more than 10 terabytes of storage in addition to its instrument-control computer.
And like ABI, which recently signed its first commercial partners under its Software Development Community program [BioInform 02-08-08], Illumina plans to extend its own Illumina Connect partnership program, which it launched last year to help vendors create plug-ins for its BeadStudio software, to include new applications for the Genome Analyzer.
An Illumina spokeswoman said that several partners in the program are developing applications for the Genome Analyzer, including DNAStar, Partek, InforSense, GenomeQuest, and the BioTeam.
Tom Schwei, vice president and general manager of DNAStar, told BioInform that while the firm hasn’t yet “officially” joined Illumina’s program, it has had access to data from the instrument since last summer and recently gained access to paired-end read data that it will use to develop an updated version of its SeqMan Genome Assembler software.
Stockton said that a number of partners in the Illumina Connect program that have already developed applications for the company’s BeadArray platform are working on updates to their software that could be applicable to the Genome Analyzer.
“We’ve seen a lot of vendors who have done applications for [chromatin immunoprecipitation]-on-chip now building tools around ChIP-sequencing, and vendors who have developed tools for both microarray gene expression and [serial analysis of gene expression] building tools for digital gene expression,” he said. “So I think we’re actually benefiting from the legacy of the microarray world.”
Kahn said that one of Illumina’s goals is to ensure that customers can view experimental data from different platforms in the same environment. “One thing that we’ve tried to do particularly well [is support] data sets that span multiple types of experiments and are representative of the kinds of problems that people are trying to address,” he said.
“It could be as simple as, ‘I’m doing a next-gen gene-expression application, and I’d like to compare that with pre-existing gene-expression data so I can make comparisons and know where I’ve learned more and know where there’s concordance,’” Kahn said. “So you want to be able to transition from the previous to the current to the new, and to do this, you need this environment that basically lets you speak all languages.”
To support this goal, the company has provided some of its Illumina Connect partners with data from both its microarray platform and the Genome Analyzer. Specifically, Stockton said, the company sequenced a sample from the MicroArray Quality Control project that it had previously run on its gene-expression microarrays.
“An obvious starting place for a lot of people is to look at the performance characteristics of the two platforms, but also how concordant the data is,” he said.
Stockton said that the company also makes all the source code for the Genome Analyzer pipeline available to academic customers for free and to commercial customers “under a limited license.”
This arrangement has “inspired a lot of developers around the world, especially in the academic community, to write their own tools around the Genome Analyzer platform,” he said. He cited MAQ (Mapping and Assembly with Quality), an alignment algorithm from the Wellcome Trust Sanger Institute, as a particularly high-profile example.
“I’m not ashamed to say that at Illumina we certainly don’t have resources that match the aggregate resources of these development projects around the world,” Stockton said. “What we’re seeing is that these projects are largely need-driven, and it’s kind of the ultimate customer-driven development paradigm.”
Schwei said that DNAStar welcomes partnership programs like Illumina Connect, but noted that support for third-party bioinformatics tools currently “varies by instrument company.” In the case of 454, for example, “they don’t have a program like that, and then it becomes more of making sure we’re talking to the right people who can make sure we’re connected to file formats and that sort of thing.”
He noted, however, that instrument vendors may ultimately benefit by working more closely with third-party vendors. “What we’re hearing from the market is that there is a need beyond the capabilities offered by each vendor with their own software,” he said.