Two post-Sanger sequencing technologies have reached the market so far, a third will follow shortly, and a growing number of potential users are considering acquiring one of the new instruments.
These tools promise to deliver hundreds of millions of DNA bases per run for a fraction of the cost of capillary electrophoresis sequencing. But what should users who are shopping for one be aware of?
As part of a pre-meeting workshop entitled “Utilization of New DNA Sequencing Technologies” at last week’s Advances in Genome Biology & Technology conference, held in Marco Island, Fla., Chad Nusbaum and David Jaffe, two researchers at the Broad Institute who have had access to at least two next-generation technologies — 454’s and Illumina’s (formerly Solexa’s) — provided some answers.
The main question on the minds of users is when to buy a next-gen sequencer, and which one to buy, according to Nusbaum, who is a co-director of the genome sequencing and analysis program at the Broad Institute. “It depends on the application,” he said at the workshop. “You have to make sure it’s doing what you want, and at a price that you can afford.”
According to Nusbaum’s estimate, sequencing with 454’s technology costs between four and 10 times less than standard capillary electrophoresis sequencing. But reading DNA on Illumina’s, ABI’s, and Intelligent Bio-Systems’ new platforms is likely to be approximately 100 times cheaper. And sequencing on Helicos’ technology could be as much as 1,000 times less expensive than Sanger sequencing.
While 454’s and Illumina’s technologies are already on the market, ABI has just signed up early-access customers for its next-generation platform, and both IBS and Helicos said they will start beta-testing their instruments by the end of the year.
The platforms’ read lengths, which determine what applications they are most useful for, differ significantly. Also, the shorter reads make the data “fundamentally different,” according to Nusbaum, requiring new analytical tools and techniques. Read lengths range from approximately 25 to 35 base pairs for Illumina’s and ABI’s platforms to approximately 250 base pairs for 454’s. All three platforms now offer paired reads.
Around 80 percent of the human genome can be mapped with 25-mers alone, Nusbaum noted, adding that a 25-mer has “enormously more power than a 15-mer,” and a 30-mer significantly more power than a 25-mer.
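The jump in power from a 15-mer to a 25-mer can be seen with a back-of-the-envelope calculation: a random k-mer is expected to match a genome-sized random sequence roughly 2G/4^k times, so a 15-mer collides by chance alone while a 25-mer is essentially unique. The sketch below treats the genome as random sequence, which real repeats violate, so it is an illustration of the scaling rather than a mapping statistic:

```python
# Back-of-the-envelope k-mer mapping power. Treats the genome as random
# sequence; real repeats make ambiguity worse than these figures suggest.
GENOME_SIZE = 3.1e9  # approximate haploid human genome, in base pairs

def expected_random_hits(k, genome_size=GENOME_SIZE):
    """Expected chance matches of a random k-mer against both strands."""
    return 2 * genome_size / 4 ** k

for k in (15, 25, 30):
    print(f"{k}-mer: ~{expected_random_hits(k):.1e} chance matches")
```

For a 15-mer the expected number of chance matches is above one, while for a 25-mer it is on the order of one in a hundred thousand, which is the asymmetry behind Nusbaum’s remark.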
Other than the shorter read lengths, probably the greatest challenge for newcomers to next-generation sequencing is the amount of data the new instruments churn out per run. While an ABI 3730 capillary electrophoresis sequencer produces megabytes per run, 454’s instrument gets into gigabytes, and Illumina’s and ABI’s new platforms reach terabytes. “That’s a scary thought,” Nusbaum said.
“The raw data is a challenge,” he said, and researchers will not be able to store it indefinitely. “This is a bit of a shock,” he admitted, because genome researchers are “pack rats” who try to store every piece of data.
He said researchers will have to get used to the fact that instead of the raw data, they will need to store information about the individual base calls and their quality, as well as the quality of a run.
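One common way to reduce a read to calls plus confidences is a Phred-style quality score, Q = −10·log10(p), where p is the estimated probability that a base call is wrong. A minimal sketch of that encoding (the example read is made up):

```python
import math

def phred_quality(p_error):
    """Phred-scale quality: Q = -10 * log10(p_error)."""
    return -10 * math.log10(p_error)

def error_prob(q):
    """Inverse mapping: probability that a call at quality Q is wrong."""
    return 10 ** (-q / 10)

# A read reduced to base calls plus per-base qualities, rather than
# raw images or traces (illustrative values):
read = {"bases": "ACGTTGCA", "quals": [30, 30, 28, 25, 20, 20, 15, 10]}
# Q30 -> 1-in-1,000 error chance; Q10 -> 1-in-10
```

Storing a few bytes per base this way, instead of the underlying images or signal traces, is what makes the data volumes manageable.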
In addition, users will need to find ways to filter out the good reads from the bad ones, added David Jaffe, manager of the whole genome assembly group at the Broad.
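A toy illustration of such a filter, with thresholds chosen for illustration rather than taken from any actual pipeline:

```python
def passes_filter(quals, min_mean_q=20, min_q=10):
    """Keep a read only if its average quality is decent and no single
    base is terrible. Thresholds are hypothetical."""
    mean_q = sum(quals) / len(quals)
    return mean_q >= min_mean_q and min(quals) >= min_q

reads = [
    [30, 30, 28, 25],   # good read
    [30, 5, 28, 25],    # one very poor base
    [12, 14, 11, 13],   # uniformly mediocre
]
kept = [q for q in reads if passes_filter(q)]  # only the first survives
```

Real pipelines filter on more signals than per-base quality alone, but the principle is the same: discard reads that would add more noise than information.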
Jaffe also pointed out that sample preparation becomes more important as reads become shorter, and users need to be aware of errors introduced by, say, PCR.
They also need to consider the full range of costs associated with introducing a next-generation sequencer, which encompass not only the instrument itself but also reagents, personnel, the infrastructure required to house the instrument, and data analysis and storage. “With so much data, the analysis is not for free,” Jaffe said.
Basic accounting decisions will also need to be made: Nusbaum pointed out that the new technologies probably have a shorter life cycle than capillary electrophoresis instruments, which used to be amortized over at least three years. As a result, next-gen users will need to amortize the new machines “a lot quicker than that.”
Potential buyers should also estimate how much data they require for the projects they have in mind, Nusbaum said. Assembling a genome, for example, needs more coverage — and thus more data — than profiling or discovering polymorphisms.
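A first-order estimate of data requirements follows from average fold coverage, C = N·L/G, where N is the number of reads, L the read length, and G the genome size. A sketch, using a hypothetical 5 Mb bacterial genome and target coverages chosen for illustration:

```python
def coverage(num_reads, read_length, genome_size):
    """Average fold coverage: C = N * L / G."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Reads required to reach a target average coverage."""
    return target_coverage * genome_size / read_length

G = 5e6  # hypothetical 5 Mb bacterial genome
print(reads_needed(8, 250, G))  # deep coverage with 250 bp reads, e.g. for assembly
print(reads_needed(1, 25, G))   # shallow 1x screen with 25 bp reads
```

Average coverage says nothing about how evenly the reads land, so assembly projects in practice budget for more than the naive figure, but the formula is enough for a first pass at sizing a project.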
Short reads as such should not be a deterrent, according to Jaffe. “In our experience, you can do a lot more than you think” with short reads, he said. However, they take more effort to analyze and may even require researchers to develop new computational techniques.
Jaffe recommended that users first generate simulated data with shorter read lengths to estimate whether the statistical power is sufficient for the experiments they have in mind. They should also try to understand the different error properties of data coming off the different platforms.
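Such a simulation can start very simply: sample reads uniformly from a reference and inject substitution errors at an assumed rate. The toy simulator below is a sketch; its uniform-error model is an assumption for illustration, not a description of any platform’s actual error profile:

```python
import random

def simulate_reads(reference, read_length, num_reads, error_rate=0.01, seed=0):
    """Draw reads uniformly from a reference and inject substitution
    errors at a flat, assumed per-base rate (a deliberate simplification)."""
    rng = random.Random(seed)
    bases = "ACGT"
    reads = []
    for _ in range(num_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        read = list(reference[start:start + read_length])
        for i in range(read_length):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in bases if b != read[i]])
        reads.append("".join(read))
    return reads

# Simulate 25 bp reads from a random 1 kb toy reference:
ref = "".join(random.Random(42).choice("ACGT") for _ in range(1000))
reads = simulate_reads(ref, 25, 100)
```

Feeding such simulated reads into the intended analysis, with error rates and lengths matched to the candidate platform, gives a cheap feasibility check before any instrument is bought.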
If possible, users should generate some actual data on an instrument before making a choice, and get advice from other institutions that already have the tool in-house.
But the scientists would not endorse a specific platform. “We don’t know which one is going to be the best,” Nusbaum said. That, he said, depends on the specific application.
In a panel discussion at the end of the workshop, though, he acknowledged that 454’s instrument was “mature enough” and had reached a “threshold of hardiness” in terms of support and protocols that would allow it to enter core facilities.