| | Roche/454 [1] | Applied Biosystems [1] | Illumina [2] |
|---|---|---|---|
| Name of platform | Genome Sequencer FLX System | SOLiD System | Genome Analyzer System |
| Sequencing chemistry | Pyrosequencing (polymerase-based real-time sequencing-by-synthesis) | Sequencing by ligation | Polymerase-based sequencing by synthesis; reversible terminators |
| System list price (US) | $500,000 | $591,000 | $430,000 (as of May '07) |
| Ancillary equipment/computer system included in list price | Basic server (can support data assembly and store up to 50 runs) | Emuls-O-Matic device: vortexer. Hydroshear from Genomic Solutions: shears DNA. Covaris S2 system: shears DNA. Computer system: head node (2 Dual Core processors; 8 GB RAM; dual 750 GB SATA hard drives), 3 compute nodes (each 2 Dual Core processors; 8 GB RAM; 80 GB SATA hard drives), storage (15 SATA hard drives; 11.25 TB total) | Cluster station: amplifies DNA. Paired-end module: for modified sample prep. Computer system: Illumina GA (3.6 GHz Xeon Dual processor; 4 GB RAM; 4x300 GB SCSI, 10K rpm), cluster station (2.8 GHz processor; 512 MB RAM; 80 GB hard drive) |
| Recommended additional computational infrastructure (not included) | $3,000 computer to store data and perform analysis off-line (64-bit dual processor, 8 GB RAM, 500 GB hard drive, running Red Hat Enterprise Linux 4 workstation OS; Java 1.5 support also required). 2 - 3 DVDs/run for data storage | None | Analysis pipeline server: 8 kernel, 32 GB RAM, 9 TB RAID storage; pre-configured server; automated data transfer; real-time data quality control |
| Recommended additional equipment (not included) | Hydroshear apparatus; Agilent BioAnalyzer; TissueLyser; particle counter (such as Beckman Coulter counter); vented hood to break emulsions | None | N/A |
| Data analysis software included | Alignment/mapping software (up to 3 gigabase genomes); assembly software (Newbler) (up to 120 megabase genomes); software for paired-end sequencing; GUI-based software for amplicon variant detection and identification; software to support multiplexing of samples | SOLiD Analysis Tools (SAT); SOLiD Experimental Tracking Software (SETS); SOLiD Alignment Browser (SAB) | Genome Analyzer Pipeline Software (image calibration and analysis, base calling, alignment) |
| Real-time data analysis during run? | Yes (image processing) | Yes | N/A |
| Consumables cost per run (list price; give range) | N/A | $3,400 (1 slide); $6,800 (2 slides) | $3,000 (as of May '07) |
| Cost per raw gigabase | N/A (depends on application and whether including filtered reads only) | $2,300 | $3,000 (as of May '07) |
| High-quality, filtered bases/run | >100 megabases | 1.5 - 3 gigabases (single reads)*; 2 - 4 gigabases (paired reads)* | 1.3 gigabases (single reads); >2.6 gigabases (paired reads) |
| Average read length, single reads | 250 - 310 bases | 35 bases | 32 bases |
| Average read length, paired reads | 2x110 bases | 2x25 bases | 2x35 bases |
| Fragment/insert size, paired read libraries | 3 kilobases | 0.6 - 10 kilobases | N/A |
| Reads/run | >400,000 filtered reads | 88 - 132 million (44 - 66 million per slide) | N/A |
| Recommended amount of input DNA | For genomic studies, 1 to 5 micrograms | 100 nanograms to 20 micrograms, depending on application | 100 nanograms to 1 microgram |
| Sample amplification | Emulsion PCR | Emulsion PCR | Cluster amplification |
| Sample prep time (prior to sequence run start) | 2 days | 7 days | 11 hours |
| Instrument run time (maximum read length) | 7.5 hours | Up to 8 days (single reads); up to 10 days (paired reads) | 3 days (single reads); 6 days (paired reads) |
| Time required for base calling, data transfer after run | Approx. 8 hours for base calling | Real-time data analysis in color space | 8 hours (base calling and data transfer to automated analysis pipeline) |
| Read accuracy | >99.5% | N/A | >98.5% |
| Other data quality metrics | Quality scoring for individual base calls planned for Feb. 2008 (will improve single read accuracy) | 99.999% at 15x coverage | 99.99% at 3x coverage |
| Raw data per run | 12 - 15 gigabytes | 2 - 5 terabytes image data | N/A |
| Subdivisions/run | 4 gaskets to subdivide picotiter plate into 2, 4, 8, or 16 regions; two different plate sizes available | 1 or 2 slides per run; each slide can be divided to run up to 8 samples | 8 channels/slide |
| Bar-coding tags available? | 12 unique identifiers for complex samples, 96 identifiers for amplicon resequencing; supporting software for project management available | No | N/A |
| Applications currently supported with kits/protocols | Whole-genome sequencing (microbial genomes and more complex genomes); metagenomics; viral metagenomics (mutation detection, pathogen discovery); targeted resequencing using both PCR and sequence capture arrays; gene expression analysis; microRNA discovery and screening; ChIP-sequencing; ancient DNA studies of complex genomes | Whole-genome sequencing; targeted resequencing; gene expression analysis; microRNA discovery | DNA sequencing; tag profiling (gene expression); small RNA discovery and analysis; ChIP-Seq |
| Peer-reviewed publications | >130 | 0 | At least 6 |
| Anticipated system improvements/additions in 2008 | Q3 of 2008: 400+ base reads, minimum of 500 megabases per instrument run; run time less than 10 hours; data storage requirements will double to less than 40 gigabytes per run | 45-base reads, fragment library; 9 gigabases per run; DNA input reductions; bar-coding tags | N/A |
| Long-term improvements (beyond 2008) | End of 2008/early 2009: paired-end sequencing with spacing of ~20kb; continued improvements of read length and throughput | N/A | N/A |

Additional notes:
ABI SOLiD accuracy: overall accuracy 99.94% after error correction from 2-base encoding*.
Illumina accuracy: >2 gigabases of error-free reads in 2.3 gigabase paired-end run.
Raw data per run: primary/secondary data files 100 megabytes.

SOURCE: Companies.
[1] Data provided by vendors (performance as of Jan. 1, 2008)
[2] Data based on Illumina GA specification sheet (performance as of October 2007, unless otherwise noted)
* ABI SOLiD: Metrics based on high-quality data as defined by ability to be mapped back to a reference genome with fewer than 3 mismatches.
Next-Gen Sequencers Improve in ’07; Vendors Promise More Gains in 2008
When it comes to tracking price and performance, next-generation sequencing systems are a moving target.
Since the spring of 2007, when In Sequence last compared the current batch of next-gen sequencers (see In Sequence 5/29/2007), vendors have improved their systems' performance in several ways and have promised to make the instruments better and more affordable this year.
Roche/454 GS FLX
Since May 2007, Roche has added several new features to its 454 GS FLX platform. Notably, the company kept its promise to introduce mate pairs with increased read lengths of 110 bases, up from the 20-base length of earlier paired reads.
Roche also came through with its promise of bar-coding tags, offering 12 unique identifiers for complex samples, and 96 for amplicon resequencing.
This year, the company plans to improve the performance of its system and cut reagent prices. A Roche spokesperson said the price cut “will be significant,” but did not elaborate.
Pricing has been a sore point for users, several of whom have complained that sequencing reagents are too expensive compared to other next-gen sequencing platforms (see related story, this issue).
Roche did not disclose current pricing information for reagents, but users have said that reagents for a single run on the GS FLX, which yields about 100 megabases of data, cost around $10,000. By comparison, reagents for ABI’s and Illumina’s systems, which each yield more than 1 gigabase of data, cost less than $3,000.
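As a rough illustration of the gap users describe, the per-gigabase reagent cost implied by these figures can be worked out directly. The Python sketch below is only a back-of-the-envelope calculation: the function name `reagent_cost_per_gigabase` is hypothetical, the GS FLX inputs are the user-reported estimates quoted above rather than list prices, and the figure for the other platforms is an upper bound, since those runs cost less than $3,000 and yield more than 1 gigabase.

```python
def reagent_cost_per_gigabase(reagent_cost_usd: float, yield_megabases: float) -> float:
    """Return reagent cost in US dollars per gigabase (1 gigabase = 1,000 megabases)."""
    if yield_megabases <= 0:
        raise ValueError("yield_megabases must be positive")
    return reagent_cost_usd * 1000.0 / yield_megabases


# User-reported estimate for one GS FLX run: about $10,000 for about 100 megabases.
gs_flx = reagent_cost_per_gigabase(10_000, 100)

# Assumed upper bound for the ABI and Illumina systems: $3,000 for 1 gigabase.
others_upper_bound = reagent_cost_per_gigabase(3_000, 1_000)

# Because the second figure is an upper bound, this ratio is a lower bound.
ratio = gs_flx / others_upper_bound

print(f"GS FLX: ${gs_flx:,.0f} per gigabase")
print(f"ABI/Illumina (upper bound): ${others_upper_bound:,.0f} per gigabase")
print(f"GS FLX reagents cost at least {ratio:.0f}x more per gigabase")
```

Under these assumptions, GS FLX reagents work out to about $100,000 per gigabase, at least 33 times the per-gigabase reagent cost of the other two platforms.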
In addition, sometime in the third quarter, Roche plans to increase single-read lengths, currently between 250 and 300 bases, to between 400 and 500 bases, and plans to increase the system’s output per run five-fold, from 100 megabases to 500 megabases “with the goal of above 1 billion bases,” or one gigabase per run, according to the spokesperson.
The increased yield will cause run time to increase by only one-third, to 10 hours, according to Roche.
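These run-time and yield figures can be cross-checked with a little arithmetic. The following Python sketch assumes the current 7.5-hour GS FLX run time reported by the vendor and the 100-megabase and 500-megabase yields cited above; the variable names are illustrative only.

```python
# Current GS FLX figures (assumed: 7.5-hour run, about 100 megabases per run).
current_run_hours = 7.5
current_yield_mb = 100

# Planned figures: five-fold yield, run time longer by one-third.
planned_yield_mb = current_yield_mb * 5
planned_run_hours = current_run_hours * 4 / 3

# Throughput in megabases per hour of instrument time.
current_rate = current_yield_mb / current_run_hours
planned_rate = planned_yield_mb / planned_run_hours

print(f"Planned run time: {planned_run_hours:.1f} hours")
print(f"Current throughput: {current_rate:.1f} megabases/hour")
print(f"Planned throughput: {planned_rate:.1f} megabases/hour")
```

On these numbers, a one-third increase over 7.5 hours gives the 10-hour run Roche cites, and throughput per instrument hour would rise from roughly 13 to 50 megabases.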
Roche has already delivered multiple datasets consisting of these “extra-long reads,” or XLRs, to users. These groups include the US Department of Energy’s Joint Genome Institute and Joe Ecker’s lab at the Salk Institute, which plans to present sequencing data for Arabidopsis at the upcoming Plant and Animal Genome Conference in San Diego, according to the spokesman.
He said Roche is looking for additional early-access partners to test the XLR technology in-house. Early-access partners currently also include Baylor College of Medicine’s Human Genome Sequencing Center, which said last fall that it would increase its fleet of GS FLX instruments to 10 by the end of 2007 (see In Sequence 10/23/2007), he said.
On the sample-prep side, Roche business unit NimbleGen Systems plans to introduce a sequence-capture array service by the end of the first quarter or beginning of the second quarter, according to the spokesman.
Last fall, NimbleGen and collaborators at Baylor published in Nature Methods a description of the technology, which allows users to select and enrich parts of the genome for sequencing (see In Sequence 10/16/2007). The service requires users to send in their samples and specify their genome regions of interest; NimbleGen will send back enriched samples for sequencing.
NimbleGen plans to enable users to order the custom-designed capture arrays for use in their own labs towards the end of the year, the spokesman said.
In late 2008 or early 2009, Roche also plans to launch an improved paired-end sequencing method that will allow users to sequence mate pairs separated by between 12 and 20 kilobases, enabling them to “get through the many large repeat structures in the genome,” according to the spokesman.
The company will continue to improve both read length and throughput of the FLX system beyond 2008, Roche stated.
Illumina’s GA
Illumina’s Genome Analyzer has also improved in performance since last May. Though the company was unable to provide current performance data before deadline, a specification sheet on Illumina’s website from October 2007, the latest data available to In Sequence, said the system can generate more than 2.3 gigabases per paired-end run, more than twice as much as the 1 gigabase it put out in a single-read run five months earlier.
According to the spec sheet, the system now also enables paired-end sequencing, which was still in early-access beta testing in late October (see In Sequence 10/30/2007). At the time, the company said it planned to commercialize this feature by the end of the year. The spec sheet does not provide information about the spacing between paired reads, but Illumina said in mid-October that it was using insert sizes of 200 to 400 bases and was working on 2-kilobase inserts (see In Sequence 10/16/2007).
Illumina has also said in the past that it plans to increase read length to 50 bases and to introduce bar-coding tags, but the timeline for these improvements remains unclear.
ABI’s SOLiD
ABI’s SOLiD system made its early-access debut in the summer of 2007 and formally launched in October. As of Jan. 1, the instrument, which can run one or two slides, can provide up to 3 gigabases of single-read data compared with 1 gigabase in May, according to ABI.
In addition, paired-end reads, which were in early access testing in the spring and generated 1 gigabase of data, today yield up to 4 gigabases of data, with insert sizes ranging from 600 bases to 10 kilobases, the company said.
ABI has also reduced the amount of recommended starting material, from between 10 and 30 micrograms in May to between 100 nanograms and 20 micrograms today, depending on the application. Additional cuts are planned for this year, the company said.
ABI has doubled to eight the number of samples that researchers can load onto each of the SOLiD’s two slides. This year, the company plans to introduce bar-coding tags, which would further increase the number of samples per run.
Lastly, ABI this year plans to increase single-read length to 45 bases from 35 bases, and more than double the maximum output per run to 9 gigabases.
Current Performance of Next-Generation Sequencing Systems
Columns: Roche/454 [1]; Applied Biosystems SOLiD System [1]; Illumina [2]