By Julia Karow
This article was originally published Feb. 17.
Illumina has shed further light on a consumables and software upgrade it is planning for the HiSeq 2000 sequencer that will almost triple its output — to 600 gigabases per run — and improve coverage of GC-rich regions, while reducing reagent costs for sequencing a human genome at least two-fold.
Users of the HiSeq 1000 platform will see equal performance gains, but will generate half as much data per run as with the HiSeq 2000.
Illumina initially disclosed its plans for the upgrade, which will be available in the spring, at the JP Morgan Healthcare Conference in January (IS 1/18/2011), but provided additional detail during a user meeting at the Advances in Genome Biology and Technology conference this month. At the conference, an Illumina official explained how the company has achieved the boost in performance, and several early-access users spoke about their initial experience with the upgrade.
Part of the increase in data output will come from wider flow cell channels that increase the total imaging area by about 50 percent, from 3 mm² to 4.2 mm². Because the overall size of the new flow cells remains constant, they will still be compatible with the existing hardware.
Furthermore, a new TruSeq cBot reagent kit for cluster generation will lead to a "substantial reduction" in GC bias, according to Vincent Smith, director of consumables development at Illumina's UK facility.
Previously, he said, clusters with high GC content would grow more slowly than those low in GC, and at high cluster densities, the GC-rich clusters would become hard for the imaging software to detect.
But Illumina researchers have been able to increase the growth rate of the GC-rich clusters, he said, thus cutting by tenfold, to 0.8 percent, the fraction of GC-rich sequences that drop out. "We now get excellent coverage of high GC regions in the human genome," he said.
With the new cluster generation kit, users will grow clusters at a density of more than 800,000 per mm² instead of about 500,000 per mm². Smith explained that when clusters growing side by side hit each other, they stop growing rather than merging, and so retain their sharp boundaries. This enables Illumina to use the entire space on the flow cell and to increase cluster density more than it previously thought possible.
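As a rough illustration of how the wider channels and the denser clusters compound, the quoted figures can be multiplied together. This is a back-of-the-envelope sketch, not Illumina's own calculation: it assumes the number of clusters scales with imaging area times cluster density, treats "more than 800,000" as exactly 800,000, and ignores read length, run time, and the fraction of clusters passing filter. The variable names are illustrative only.

```python
# Figures quoted in the article; everything else is an assumption.
old_area_mm2 = 3.0        # imaging area before the upgrade
new_area_mm2 = 4.2        # imaging area with wider channels
old_density = 500_000     # clusters per square millimeter, old kit
new_density = 800_000     # lower bound quoted for the new kit

# Relative gain from each change taken separately.
area_gain = new_area_mm2 / old_area_mm2
density_gain = new_density / old_density

# Assumed model: cluster count is proportional to area * density.
combined_gain = area_gain * density_gain

print(f"area gain:     {area_gain:.2f}x")
print(f"density gain:  {density_gain:.2f}x")
print(f"combined gain: {combined_gain:.2f}x")
```

Under these assumptions the two changes together would a little more than double the number of clusters per flow cell, before the chemistry and software improvements are counted.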
Along with the new cluster kit comes a new TruSeq SBS sequencing reagent kit that contains a new polymerase, called EDP, that incorporates nucleotides more efficiently at high cluster densities. The kit also includes a new scan reagent, called SRE, that helps reduce signal decay. Based on preliminary data, Smith said, 84 percent of bases from a 2x100-base run have a quality value greater than Q30, but the company expects "to be better at launch." With the old reagents and cluster densities, about 87 percent of bases pass Q30.
New versions of Illumina's Sequencing Control Software and Real Time Analysis module will have improved image analysis algorithms that generate more data at higher cluster densities, and increase the fraction of clusters passing filter, Smith said.
Illumina will also provide new versions of its Eland alignment software and Consensus Assessment of Sequence and Variation, or CASAVA, software, which Smith said further improve coverage of the human genome. The new Eland, for example, will allow orphan reads to be aligned.
In total, the improvements increase the yield of a 2x100-base run from about 250 gigabases in 8 days, or 31 gigabases per day, to 650 gigabases in 10 days, or 65 gigabases per day. Overall, coverage of the human genome has improved from 85 percent to 91 percent.
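The daily throughput figures follow from dividing each run's yield by its duration. The short sketch below reproduces that arithmetic from the numbers quoted above; the function name is hypothetical and the calculation assumes output accrues evenly over the run.

```python
def gigabases_per_day(total_gigabases: float, run_days: float) -> float:
    """Average daily output of a run, assuming output accrues evenly."""
    return total_gigabases / run_days


# Yields and run times for a 2x100-base run, as quoted in the article.
old_rate = gigabases_per_day(250, 8)
new_rate = gigabases_per_day(650, 10)

# Ratio of new to old daily throughput.
speedup = new_rate / old_rate

print(f"old: {old_rate:.2f} gigabases/day")
print(f"new: {new_rate:.2f} gigabases/day")
print(f"daily throughput ratio: {speedup:.2f}x")
```

Because the upgraded run takes two days longer, the gain in daily throughput (about 2.1-fold) is smaller than the gain in total yield per run (2.6-fold).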
"We typically get 500 gigabases of perfect reads for a well-defined genome," Smith said.
The company is also continuing to reduce the size of the data footprint, he said. A new sequencing workflow manager, due to be released near the end of the year, will simplify and automate sequencing data generation and analysis, and will enable integration of analysis tools and a LIMS, he added.
Internally, Illumina has already surpassed the terabase-per-run mark with the HiSeq 2000, by further increasing the cluster density to almost 1,000,000 per mm² and by generating 2x150-base reads, Smith said. One particular run yielded 1.3 terabases of data in 14 days, or 80.7 gigabases per day, with 78 percent of reads passing Q30, and 80 percent of reads passing filter.
Early-Access Users Comment
Several Illumina customers have had early access to the HiSeq upgrade, and at least three presented data during the AGBT meeting.
The Washington University Genome Center has completed three flow cells with the upgrade, each yielding between about 250 and 300 gigabases of data, or twice as much as before, according to Elaine Mardis, the center's co-director. This would translate to runs of between 500 gigabases and 600 gigabases with two flow cells. About 87 percent of clusters passed filter, and the error rate for the first read was reduced from 1.3 percent to 0.9 percent. The run time increased from 8 to 10.5 days.
Representation of GC-rich areas has indeed improved with the new cluster generation kits, Mardis said, so certain genes are now covered "much better" than before.
Because generating sequence data is now cheaper, the HiSeq upgrade might "change the paradigm of 30x coverage" for human genome sequencing, she said, and researchers may want to sequence at a higher fold-coverage. Also, the improved representation of the genome and the increased per-read accuracy "will enhance the impact of higher coverage," leading to better variant calling and better ability to detect subpopulations in tumors.
For human genome sequencing, this "may move us into a realm to be able to do clinical diagnostic sequencing without a validation step," she said, which is important because "we are always up against time" in returning results to physicians.
Chad Nusbaum, co-director of the Broad Institute's genome sequencing and analysis program, said that his institute has generated a 534-gigabase run with the HiSeq upgrade, where 81 percent of the reads passed Q30 and the average error was about 1.5 percent.
He said that the new cluster chemistry, combined with improvements to the library prep process developed at the Broad (IS 2/15/2011), "very much improves" the coverage of regions with high GC content.
Like WashU, the Broad is applying its sequencing capacity to characterizing cancer genomes, and "with the greater output, we can sequence thousands of tumors," he said.
Harold Swerdlow, head of sequencing technology at the Wellcome Trust Sanger Institute, called the increased imaging area the "single biggest improvement" of the HiSeq upgrade.
He said that the institute has so far achieved a 520-gigabase run and a 495-gigabase run with the upgrade. Each run took about 8 days, and the error rate was about 1 percent. He said an output of 600 gigabases should be possible.