Next-Gen Sequencers Improve in ’07; Vendors Promise More Gains in 2008

When it comes to tracking price and performance, next-generation sequencing systems are a moving target.
 
Since the spring of 2007, when In Sequence last compared the current batch of next-gen sequencers (see In Sequence 5/29/2007), vendors have improved their systems’ performance in several ways and have promised to make the instruments better and more affordable this year.
 
Roche/454 GS FLX
 
Since May 2007, Roche has added several new features to its 454 GS FLX platform. Notably, the company kept its promise to introduce mate pairs with increased read lengths of 110 bases, an improvement over the 20-base length of earlier paired reads.
 
Roche also came through with its promise of bar-coding tags, offering 12 unique identifiers for complex samples, and 96 for amplicon resequencing.
 
This year, the company plans to improve the performance of its system and cut reagent prices. A Roche spokesperson said the price cut “will be significant,” but did not elaborate.
 
Pricing has been a sore point for users, several of whom have complained that sequencing reagents are too expensive compared to those for other next-gen sequencing platforms (see related story, this issue).
 
Roche did not disclose current pricing information for reagents, but users have said that reagents for a single run on the GS FLX, which yields about 100 megabases of data, cost around $10,000. By comparison, reagents for ABI’s and Illumina’s systems, which each yield more than 1 gigabase of data, cost less than $3,000.
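To put the reagent complaint in per-base terms, the figures users cited can be normalized to cost per megabase. The sketch below is illustrative only: the helper name `cost_per_megabase` is hypothetical, the inputs are the approximate numbers quoted above, and it assumes 1 gigabase equals 1,000 megabases. Because the ABI and Illumina figures are bounds (more than 1 gigabase for less than $3,000), the result for those systems is an upper limit.

```python
# Reagent cost per megabase, using the approximate figures users reported.
# These are rough inputs, not vendor list prices.

def cost_per_megabase(reagent_cost_usd: float, yield_megabases: float) -> float:
    """Return reagent cost in US dollars per megabase of sequence."""
    return reagent_cost_usd / yield_megabases

# GS FLX: around $10,000 for about 100 megabases per run.
gs_flx = cost_per_megabase(10_000, 100)

# ABI and Illumina: less than $3,000 for more than 1 gigabase (1,000 megabases),
# so this value is an upper bound on their cost per megabase.
short_read_bound = cost_per_megabase(3_000, 1_000)

# Because the short-read figure is an upper bound, the ratio is a lower bound.
ratio = gs_flx / short_read_bound

print(f"GS FLX: about ${gs_flx:.0f} per megabase")
print(f"ABI/Illumina: under ${short_read_bound:.0f} per megabase")
print(f"GS FLX reagents: at least about {ratio:.0f} times the cost per megabase")
```

On these numbers, GS FLX reagents work out to roughly $100 per megabase against less than $3 per megabase for the short-read systems, though the comparison ignores the difference in read length.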
 
In addition, sometime in the third quarter, Roche plans to increase single-read lengths, currently between 250 and 300 bases, to between 400 and 500 bases, and plans to increase the system’s output per run five-fold, from 100 megabases to 500 megabases “with the goal of above 1 billion bases,” or one gigabase per run, according to the spokesperson.
 
The increased yield will cause run time to increase by only one-third, to 10 hours, according to Roche.
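The effect of the planned upgrade on output per hour of instrument time can be worked out from the numbers above. This is a rough sketch: the current run time of 7.5 hours is inferred from "increase by only one-third, to 10 hours" rather than quoted in the text, and the names used are illustrative.

```python
# Sequence output per hour of instrument time, before and after the planned upgrade.

PLANNED_RUN_HOURS = 10.0
# Assumption: a one-third increase that ends at 10 hours implies current = 10 * 3/4.
CURRENT_RUN_HOURS = PLANNED_RUN_HOURS * 3 / 4  # 7.5 hours

CURRENT_YIELD_MB = 100.0   # megabases per run today
PLANNED_YIELD_MB = 500.0   # megabases per run after the upgrade

def megabases_per_hour(yield_mb: float, run_hours: float) -> float:
    """Return instrument output in megabases per hour of run time."""
    return yield_mb / run_hours

current_rate = megabases_per_hour(CURRENT_YIELD_MB, CURRENT_RUN_HOURS)
planned_rate = megabases_per_hour(PLANNED_YIELD_MB, PLANNED_RUN_HOURS)
rate_gain = planned_rate / current_rate

print(f"Current: {current_rate:.1f} megabases per hour")
print(f"Planned: {planned_rate:.1f} megabases per hour")
print(f"Gain in hourly output: {rate_gain:.2f}-fold")
```

Under these assumptions, a five-fold increase in yield with a one-third longer run translates to a gain of about 3.75-fold in output per hour.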
 
Roche has already delivered multiple datasets consisting of these “extra-long reads,” or XLRs, to users. These groups include the US Department of Energy’s Joint Genome Institute and Joe Ecker’s lab at the Salk Institute, which plans to present sequencing data for Arabidopsis at the upcoming Plant and Animal Genome Conference in San Diego, according to the spokesman.
 
He said Roche is looking for additional early-access partners to test the XLR technology in-house. Early-access partners currently also include Baylor College of Medicine’s Human Genome Sequencing Center, which said last fall that it would increase its fleet of GS FLX instruments to 10 by the end of 2007 (see In Sequence 10/23/2007).
 
On the sample-prep side, Roche business unit NimbleGen Systems plans to introduce a sequence-capture array service by the end of the first quarter or beginning of the second quarter, according to the spokesman.
 
Last fall, NimbleGen and collaborators at Baylor published in Nature Methods a description of the technology, which allows users to select and enrich parts of the genome for sequencing (see In Sequence 10/16/2007). The service requires users to send in their samples and specify their genome regions of interest; NimbleGen will send back enriched samples for sequencing.
 
NimbleGen plans to enable users to order the custom-designed capture arrays for use in their own labs towards the end of the year, the spokesman said.
 
In late 2008 or early 2009, Roche also plans to launch an improved paired-end sequencing method that will allow users to sequence mate pairs separated by between 12 and 20 kilobases, enabling them to “get through the many large repeat structures in the genome,” according to the spokesman.
 
The company will continue to improve both read length and throughput of the FLX system beyond 2008, Roche stated.
 
Illumina’s GA
 
Illumina’s Genome Analyzer has also improved in performance since last May. Though the company was unable to provide current performance data before deadline, a specification sheet on Illumina’s website from October 2007, the latest data available to In Sequence, said the system can generate more than 2.3 gigabases per paired-end run, more than twice as much as the 1 gigabase it put out in a single-read run five months earlier.
 
According to the spec sheet, the system now also enables paired-end sequencing, which was still in early-access beta testing in late October (see In Sequence 10/30/2007). At the time, the company said it planned to commercialize this feature by the end of the year. The spec sheet does not provide information about the spacing between paired reads, but Illumina said in mid-October that it is using insert sizes of 200 to 400 bases, and is working on 2-kilobase inserts (see In Sequence 10/16/2007).
 
Illumina has also said in the past that it plans to increase read length to 50 bases and to introduce bar-coding tags, but the timeline for these improvements remains unclear.
 
ABI’s SOLiD
 
ABI’s SOLiD system made its early-access debut in the summer of 2007 and formally launched in October. As of Jan. 1, the instrument, which can run one or two slides, can provide up to 3 gigabases of single-read data compared with 1 gigabase in May, according to ABI.
 
In addition, paired-end reads, which were in early access testing in the spring and generated 1 gigabase of data, today yield up to 4 gigabases of data, with insert sizes ranging from 600 bases to 10 kilobases, the company said.
 
ABI has also reduced the amount of recommended starting material, from between 10 and 30 micrograms in May to between 100 nanograms and 20 micrograms today, depending on the application. Additional cuts are planned for this year, the company said.
 
ABI has doubled to eight the number of samples that researchers can load onto each of the SOLiD’s two slides. This year, the company plans to introduce bar-coding tags, which would further increase the number of samples per run.
 
Lastly, ABI this year plans to increase single-read length to 45 bases from 35 bases, and more than double the maximum output per run to 9 gigabases.
 
 
Current Performance of Next-Generation Sequencing Systems

Columns: Roche/454 (note 1), Applied Biosystems (note 1), Illumina (note 2)

Name of platform
Roche/454: Genome Sequencer FLX System
Applied Biosystems: SOLiD System
Illumina: Genome Analyzer System

Sequencing chemistry
Roche/454: Pyrosequencing (polymerase-based real-time sequencing-by-synthesis)
Applied Biosystems: Sequencing by ligation
Illumina: Polymerase-based sequencing by synthesis; reversible terminators

System list price (US)
Roche/454: $500,000
Applied Biosystems: $591,000
Illumina: $430,000 (as of May '07)

Ancillary equipment/computer system included in list price
Roche/454: Basic server (can support data assembly and store up to 50 runs)
Applied Biosystems: Emuls-O-Matic device (vortexer); Hydroshear from Genomic Solutions (shears DNA); Covaris S2 system (shears DNA); computer system with head node (2 Dual Core processors, 8 GB RAM, dual 750 GB SATA hard drives), 3 compute nodes (each 2 Dual Core processors, 8 GB RAM, 80 GB SATA hard drives), and storage (15 SATA hard drives, 11.25 TB total)
Illumina: Cluster station (amplifies DNA); paired-end module (for modified sample prep); computer system for Illumina GA (3.6 GHz Xeon Dual processor, 4 GB RAM, 4x300 GB SCSI, 10K rpm) and for cluster station (2.8 GHz processor, 512 MB RAM, 80 GB hard drive)

Recommended additional computational infrastructure (not included)
Roche/454: $3,000 computer to store data and perform analysis off-line (64-bit dual processor, 8 GB RAM, 500 GB hard drive, running Red Hat Enterprise Linux 4 workstation OS; Java 1.5 support also required); 2 - 3 DVDs/run for data storage
Applied Biosystems: None
Illumina: Analysis pipeline server (8 kernel, 32 GB RAM, 9 TB RAID storage; pre-configured server; automated data transfer; real-time data quality control)

Recommended additional equipment (not included)
Roche/454: Hydroshear apparatus; Agilent BioAnalyzer; TissueLyser; particle counter (such as Beckman Coulter counter); vented hood to break emulsions
Applied Biosystems: None
Illumina: N/A

Data analysis software included
Roche/454: Alignment/mapping software (up to 3 gigabase genomes); assembly software (Newbler) (up to 120 megabase genomes); software for paired-end sequencing; GUI-based software for amplicon variant detection and identification; software to support multiplexing of samples
Applied Biosystems: SOLiD Analysis Tools (SAT); SOLiD Experimental Tracking Software (SETS); SOLiD Alignment Browser (SAB)
Illumina: Genome Analyzer Pipeline Software (image calibration and analysis, base calling, alignment)

Real-time data analysis during run?
Roche/454: Yes (image processing)
Applied Biosystems: Yes
Illumina: N/A

Consumables cost per run (list price; give range)
Roche/454: N/A
Applied Biosystems: $3,400 (1 slide); $6,800 (2 slides)
Illumina: $3,000 (as of May '07)

Cost per raw gigabase
Roche/454: N/A (depends on application and whether including filtered reads only)
Applied Biosystems: $2,300
Illumina: $3,000 (as of May '07)

High-quality, filtered bases/run
Roche/454: >100 megabases
Applied Biosystems: 1.5 - 3 gigabases (single reads)*; 2 - 4 gigabases (paired reads)*
Illumina: 1.3 gigabases (single reads); >2.6 gigabases (paired reads)

Average read length, single reads
Roche/454: 250 - 310 bases
Applied Biosystems: 35 bases
Illumina: 32 bases

Average read length, paired reads
Roche/454: 2x110 bases
Applied Biosystems: 2x25 bases
Illumina: 2x35 bases

Fragment/insert size, paired read libraries
Roche/454: 3 kilobases
Applied Biosystems: 0.6 - 10 kilobases
Illumina: N/A

Reads/run
Roche/454: >400,000 filtered reads
Applied Biosystems: 88 - 132 million (44 - 66 million per slide)
Illumina: N/A

Recommended amount of input DNA
Roche/454: For genomic studies, 1 to 5 micrograms
Applied Biosystems: 100 nanograms to 20 micrograms, depending on application
Illumina: 100 nanograms to 1 microgram

Sample amplification
Roche/454: Emulsion PCR
Applied Biosystems: Emulsion PCR
Illumina: Cluster amplification

Sample prep time (prior to sequence run start)
Roche/454: 2 days
Applied Biosystems: 7 days
Illumina: 11 hours

Instrument run time (maximum read length)
Roche/454: 7.5 hours
Applied Biosystems: Up to 8 days (single reads); up to 10 days (paired reads)
Illumina: 3 days (single reads); 6 days (paired reads)

Time required for base calling, data transfer after run
Roche/454: Approx. 8 hours for base calling
Applied Biosystems: Real-time data analysis in color space
Illumina: 8 hours (base calling and data transfer to automated analysis pipeline)

Read accuracy
Roche/454: >99.5%
Applied Biosystems: N/A (overall accuracy 99.94% after error correction from 2-base encoding*)
Illumina: >98.5%

Other data quality metrics
Roche/454: Quality scoring for individual base calls planned for Feb. 2008 (will improve single read accuracy)
Applied Biosystems: 99.999% at 15x coverage
Illumina: 99.99% at 3x coverage (>2 gigabases of error-free reads in 2.3 gigabase paired-end run)

Raw data per run
Roche/454: 12 - 15 gigabytes
Applied Biosystems: 2 - 5 terabytes image data (primary/secondary data files 100 megabytes)
Illumina: N/A

Subdivisions/run
Roche/454: 4 gaskets to subdivide picotiter plate into 2, 4, 8, or 16 regions; two different plate sizes available
Applied Biosystems: 1 or 2 slides per run; each slide can be divided to run up to 8 samples
Illumina: 8 channels/slide

Bar-coding tags available?
Roche/454: 12 unique identifiers for complex samples, 96 identifiers for amplicon resequencing; supporting software for project management available
Applied Biosystems: No
Illumina: N/A

Applications currently supported with kits/protocols
Roche/454: Whole-genome sequencing (microbial genomes and more complex genomes); metagenomics; viral metagenomics (mutation detection, pathogen discovery); targeted resequencing using both PCR and sequence capture arrays; gene expression analysis; microRNA discovery and screening; ChIP-sequencing; ancient DNA studies of complex genomes
Applied Biosystems: Whole-genome sequencing; targeted resequencing; gene expression analysis; microRNA discovery
Illumina: DNA sequencing; tag profiling (gene expression); small RNA discovery and analysis; ChIP-Seq

Peer-reviewed publications
Roche/454: >130
Applied Biosystems: 0
Illumina: At least 6

Anticipated system improvements/additions in 2008
Roche/454: Q3 of 2008: 400+ base reads, minimum of 500 megabases per instrument run; run time less than 10 hours; data storage requirements will double to less than 40 gigabytes per run
Applied Biosystems: 45-base reads, fragment library; 9 gigabases per run; DNA input reductions; bar-coding tags
Illumina: N/A

Long-term improvements (beyond 2008)
Roche/454: End of 2008/early 2009: paired-end sequencing with spacing of ~20kb; continued improvements of read length and throughput
Applied Biosystems: N/A
Illumina: N/A

SOURCE: Companies.
1 Data provided by vendors (performance as of Jan. 1, 2008)
2 Data based on Illumina GA specification sheet (performance as of October 2007, unless otherwise noted)
* ABI SOLiD: Metrics based on high-quality data as defined by ability to be mapped back to a reference genome with fewer than 3 mismatches.
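
The table's cost-per-raw-gigabase entries can be roughly reproduced from its consumables prices and the yields cited in the article. The sketch below is only a plausibility check, not the vendors' own calculation: it assumes ABI's figure reflects a two-slide run at the maximum single-read yield of 3 gigabases, and that Illumina's May '07 figure pairs the $3,000 consumables price with the 1-gigabase single-read run reported at that time. The function name is illustrative.

```python
# Plausibility check of the "cost per raw gigabase" row.
# Which yield each vendor actually used is an assumption, flagged below.

def cost_per_gigabase(consumables_usd: float, yield_gigabases: float) -> float:
    """Return consumables cost in US dollars per gigabase of sequence."""
    return consumables_usd / yield_gigabases

# Assumption: ABI figure = 2 slides ($6,800) at up to 3 gigabases of single reads.
solid_two_slides = cost_per_gigabase(6_800, 3.0)

# Assumption: Illumina figure = $3,000 per run at 1 gigabase (May '07 single-read run).
illumina_may_07 = cost_per_gigabase(3_000, 1.0)

# Round to the nearest $100 to compare with the listed values.
print(f"SOLiD: about ${round(solid_two_slides, -2):,.0f} per gigabase (listed: $2,300)")
print(f"Illumina: about ${round(illumina_may_07, -2):,.0f} per gigabase (listed: $3,000)")
```

Both results round to the listed values, which suggests, but does not confirm, that the entries were derived this way.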
