There was no shortage of conversations in James Carrington’s Center for Genome Research and Biocomputing when it came to next-generation sequencers. Director of the center at Oregon State University, Carrington says the process of considering whether to wait, keep outsourcing to a service facility, or buy a next-gen sequencer for the center’s core lab was a painstaking one.
It’s a dilemma everyone is having, and it is characteristic of such a young field that technology improvements come fast and furious. Carrington knew that any instrument brought into the lab “may be state of the art today, but it may not be state of the art in six months,” he says. “We had to go through this internal deliberation. If we wait a little bit longer, the technology will improve, or a new technology will be on the market. But you counter that with, OK, we need the data today.”
If that debate sounds familiar, you are likely one of the prospects for companies with commercial instruments — namely, Roche through its pending acquisition of 454 Life Sciences and Illumina through its purchase of Solexa — or companies with anticipated platforms. As the field of potential users explodes thanks to new applications for these technologies and as more instruments come to market, competition is only going to get more fierce.
So what’s the right answer for you? What follows is feedback from users of the current platforms, as well as updates on the latest applications enabled by next-gen sequencers.
With its instrument launched in 2005, 454 Life Sciences secured first-mover advantage in the next-gen sequencing field. A marketing partnership with, and later acquisition by, Roche Applied Science helped move the technology into labs at a rapid pace. Today, some of the instruments (the first-generation GS 20 and its latest iteration, the GS FLX) have become trusted enough to make it into the production lines at major sequencing centers.
One of those is at the Joint Genome Institute, where Feng Chen says the GS 20’s status as a production sequencer means “you know it works well.” Chen says JGI also has the newer FLX instrument, which the team has been running for a little over a month. His group consistently sees read lengths of 250 bases, and gets about 90 megabases per run from the FLX.
At the Baylor Human Genome Sequencing Center, Donna Muzny says two FLX machines and one GS 20 are installed. Her team regularly sees 240-base reads and gets “over 100 megabases” from each run on the FLX, she says.
Those longer read lengths open the door for a number of applications. For “de novo sequencing,” Chen says, “454 is probably the only choice.” At JGI and at Baylor, much of the work on this platform goes into sequencing microbial genomes. Like many scientists in the community, Chen and Muzny both say they are working to find the sweet spot of hybrid assemblies that will optimize a mix of Sanger and 454 reads for these genomes.
Other applications aren’t far behind genome sequencing. Chen says his group is looking into cDNA and EST sequencing, while Muzny says amplicon sequencing and SNP detection are being put to the test on Baylor’s 454 machines. At Stanford, Mostafa Ronaghi says he has used the GS 20 instrument for “targeted disease gene sequencing and metagenomics [as well as] diagnostic sequencing, … microRNA sequencing, and also expression profiling.”
Solexa began shipping instruments last year, and users of that system — now officially the Illumina 1G — have had a range of experiences getting the machines up and running satisfactorily.
At JGI, Chen says after five months with the platform his team is still having “sporadic problems” and is working closely with Illumina to try to parse out what’s going on. Problems have cropped up in signal intensity, for example. The machine is churning out “300 million to 600 million bases per run when it’s in a good mood,” Chen says. But that’s not to say the machine is always cranky: “When it works, it works great,” he notes.
Others have had smoother introductions to their 1Gs. Marco Marra at the BC Cancer Research Centre says a “very positive experience” with the early-access 1G his team installed in November was what led him to purchase two more machines.
At Carrington’s center, the 1G is what his team decided to purchase after all the debating. “We needed a high-throughput sequencing platform that would serve a lot of different needs,” he says. “The best available technology that came out of our assessment was the Illumina 1G. That was based on the amount of data that can be generated, the cost per data point, the ease of operation, and the modest setup requirement. It’s right now the lowest cost per nucleotide for commercially available instruments.” Purchasing the instrument meant building a consortium of researchers around Oregon State, Carrington says; by the time a grant came through a funding agency, he notes, the technology would have been out of date. Faculty members contributed money from grants, equipment reserves, and parts of startup packages to cobble together the funding needed to buy the instrument.
Because the reads on the 1G are so much shorter than those on the 454, scientists tend to use the Illumina instrument for different types of applications. Alongside the usual ones (resequencing, SNP detection, genome comparisons, and gene expression), new uses have emerged, including ChIP-based transcription factor studies and small RNA analysis. Marra says that in his group, ChIP is the heaviest use of the machine so far (that includes mapping transcription factor binding and histone sites). In Carrington’s group, profiling small RNAs is a major use and is particularly well suited to the system’s short reads.
The tidal wave of data that next-gen sequencers generate is especially pronounced for the 1G, which churns out about 700 GB of raw data per run, according to Elaine Mardis at the Washington University Genome Sequencing Center. That compares to about 30 GB per run on the 454 platform, she says.
There’s little to report yet about the Applied Biosystems instrument, which the company gained through its acquisition of Agencourt Personal Genomics last year. The first early-access machines are expected to ship this month, and instruments should be widely available as of October, according to ABI’s Gina Costa.
The system, known as SOLiD, is expected to produce 2 gigabases to 4 gigabases of sequence per run — generating about 6 terabytes of raw data — and a complete run will take about a week and a half.
So far, some scientists in the community have been able to work with sequence data generated for them on the SOLiD platform by ABI. JGI’s Chen says that “the data looks pretty good” and that he expects to use the instrument for SNP detection, for one.
Sequencers At a Glance
At this year’s annual meeting of the Association of Biomolecular Resource Facilities, Washington University’s Elaine Mardis presented data comparing the sequencing platforms used at her genome center. The table below includes data from her presentation. NB: All costs are fully loaded. Platform costs are comprehensive and include additional equipment automation required for the 454 instrument. Also, cost per megabase does not take into account the amount of sequencing needed to complete a genome on any of the platforms.
454 GS FLX
- Platform cost: $500,000
- Read length: 250 bp
- Cost per run: $16,000
- Megabases per day: 200
- Cost per megabase: $160

Illumina 1G
- Platform cost: $395,000
- Read length: 40-50 bp
- Cost per run: $5,000
- Megabases per day: 333
- Cost per megabase: $5

Sanger capillary sequencing
- Platform cost: $350,000
- Read length: 650+ bp
- Cost per run: $55
- Megabases per day: 1.4
- Cost per megabase: $880

Source: Elaine Mardis
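
The per-run and per-megabase figures in the table can be related with simple arithmetic. The following Python sketch assumes, which the source does not state, that cost per megabase is just cost per run divided by the megabases a run yields; under that assumption it backs out the yield per run that each row implies. The function name and the row labels (keyed by read length) are illustrative only.

```python
# Figures copied from the table: (cost per run in USD, cost per megabase in USD).
# ASSUMPTION (not stated in the source): cost per megabase = cost per run / Mb per run.
TABLE = {
    "250 bp reads": (16_000, 160),
    "40-50 bp reads": (5_000, 5),
    "650+ bp reads": (55, 880),
}


def implied_mb_per_run(cost_per_run, cost_per_mb):
    """Back out the megabases per run implied by the two cost figures."""
    return cost_per_run / cost_per_mb


for label, (cost_per_run, cost_per_mb) in TABLE.items():
    mb = implied_mb_per_run(cost_per_run, cost_per_mb)
    print(f"{label}: about {mb:g} Mb per run")
```

Run as written, this suggests about 100 Mb, 1,000 Mb, and 0.0625 Mb per run for the three rows. The first figure agrees with the “over 100 megabases” per FLX run reported at Baylor. Because the costs are fully loaded, treat these as rough consistency checks rather than measured yields.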
454: Pros
• Read length. The newer GS FLX system routinely gets 250-base reads.
• High accuracy. Stanford’s Ronaghi says that 454 “has a lower error rate than I expected”; his team is getting about “99.5 percent accuracy.”
• Instrument time. Run times of “just a few hours” mean getting to the data-crunching phase faster than with the other platforms, says Oregon State’s Carrington.
454: Cons
• High-maintenance system and contamination risks. Mitchell Sogin at the Marine Biological Laboratory at Woods Hole says he remembers that early data he received from 454’s service center included sequences not from his sample. “Cross-contamination of PCR products is a potential Achilles heel for the system,” he says. The 454 platform requires an environment up to the standards of a clean room; Wash U’s Mardis even factors things like disposable lab coats into the cost of running the instrument. The sequencing process has to take place in several separate rooms.
• Outsourcing costs. Using the 454 service center instead of buying an instrument meant significant price jumps over time, says Carrington. Lately the average run has cost his lab about $11,000.
• Additional equipment required. Users of the 454 platform must already have or buy other instruments to work with the machine, and those can run up to $100,000.
Illumina 1G: Pros
• Data quality. “The data quality is, in our experience, definitely very good,” says Marra.
• Template flexibility. Thanks to the robust nature of the platform’s molecular biology, Marra says, “we can feed all manner of sequencing templates onto the thing and it just eats them up.”
• Software. Open-source software may be a better fit in this community than the proprietary program that some 454 users have complained about.
• Cost. On a per-nucleotide basis, users say, you can’t beat the 1G’s cost.
Illumina 1G: Cons
• Read length. In the neighborhood of 40 bases, the Illumina platform is well suited to some applications but can’t handle those that require reads of at least 100 or 200 bases.
• Instrument time. The actual run time on the 1G is a solid three days, says Carrington. Compare that to a few hours on the 454 platform, which, he points out, gives 454 users an advantage in getting to the data analysis.
• Consistency. JGI’s Chen says his team has “sporadic problems” and that “the system itself is not mature and not consistent.”
• Data flood. The system generates some 700 GB of raw data per run, according to results from Mardis at Wash U.