By Meredith W. Salisbury
Scientists who remember the first sequencing boom say that current excitement in the field is higher than it’s ever been — and that doesn’t seem to be an exaggeration. At a sequencing technology session held at the annual Association for Biomolecular Resource Facilities meeting earlier this year, Baylor’s Donna Muzny was surprised to find herself addressing a standing-room-only crowd. Later, she realized what packed them in: word had gotten out that she was going to talk about results her team had gotten from their 454 machine.
“I haven’t seen excitement like this, even in the Human Genome Project,” says Kevin McKernan, CSO of Agencourt, part of NHGRI’s large-scale sequencing network. “The pace of development and change is nerve-racking.”
Sure, people have been talking about next-generation sequencing technology for years now — to the point where “$1,000 genome” is all but a household term. But the real shift came last year, when first-mover 454 planted its flag and started selling its high-throughput sequencing instrument. Thanks in part to a licensing deal with Roche Diagnostics, which sells the instrument as the Genome Sequencer 20 System, by the end of 2005 there were 20 such instruments placed in labs.
The move from vaporware to a real, commercial machine has gotten the scientific community moving with blazing speed. Researchers are not only working out the best techniques to use on the machine, but are also coming up with new applications for it and creative ways to address the instrument’s shorter reads. (For more on what 454 customers are saying about the instrument, see “454 Feedback: The Users Speak,” below.) With Solexa aiming to release its machine this year and at least two more competitors, Agencourt Personal Genomics and Helicos BioSciences, looking to hit the market by 2007, scientists soon will have no shortage of technologies to test-drive.
In this overview of the latest trends in the sequencing field, you’ll get a sense of the applications these tools can tackle (gene expression, genotyping, and EST projects, to name just a few) — as well as some of the questions facing the community right now, such as whether scientists will find use for reads as short as 25 bases. While much of this remains up in the air for the time being, one thing seems clear: Sanger sequencing has nothing to fear. For now.
‘Short Little Reads’
One of the primary differences with new sequencing technology is that it produces reads far shorter than what scientists have become used to with their capillary instruments. The 1,000-base read is a twinkle in the eye of these startup companies; with 100-base runs, 454’s instrument is head and shoulders above what potential competitors are able to get so far. David Bentley, CSO of Solexa, says his company’s technology is “knocking on the door” of routinely getting 50-base reads. Helicos and Agencourt Personal Genomics, which is designing an instrument based on the polony technology from George Church’s lab, both currently run in the neighborhood of 25 bases.
In a field used to getting longer and longer reads, this has sparked an accuracy-versus-read-length debate. Companies producing the shorter-read technology argue that with their higher accuracy per base, read length isn’t necessarily a factor. One reason is that many of these vendors are focused on the resequencing market rather than on de novo sequencing. Bentley says even with 25-base reads, the Solexa technology covered 80 percent of the human genome; McKernan at Agencourt predicts that 50-mers will be sufficient for human genome sequencing.
454 users have found the 100-base reads enough for de novo sequencing of small genomes, like microbes. Though Helicos, Agencourt, and Solexa have announced completed genome or BAC sequences, many researchers remain suspicious of anything with reads in the 25- or 50-base neighborhood. “I don’t know how these other machines are going to fit in with their short little reads,” says Bruce Roe, director of the Advanced Center for Genome Technology at the University of Oklahoma.
Vendors contend that advances like paired-end reads, the melt-and-resequence approach used by Helicos (sequencing, melting off the read, and sequencing the same stretch again), improved biochemistries, and high-quality base-calling will give their reads a level of accuracy that will eliminate concerns about read length. Still, people like Helicos President and CEO Stan Lapidus ask users to remember that these technologies are in their earliest stages. “In the early days of Sanger sequencing, it read 25 bases,” he says. “The sequencing-by-synthesis technologies as a group are still very young. … As the performance gets better, this whole question of read length will kind of go away.”
Jeff Schloss, a program director at NHGRI who oversees the $1,000 genome and $100,000 genome grantees, takes the pragmatic view that shorter reads are better than no reads. “You need as long a read length as you can get and as high a data quality as you can get,” he says. “But on the way there, several of these technologies should still be useful.”
Many 454 users have adjusted to the shorter reads by supplementing them with their old standby technologies. Groups like Susanne Goldberg’s at the J. Craig Venter Institute and Feng Chen’s team at the Joint Genome Institute have performed early studies blending Sanger and 454 sequence. Sanger is used at low coverage to provide a long-read scaffold from which to hang 454 reads.
Applied Biosystems CSO Dennis Gilbert says this is a way to get the best of both kinds of technology. Capillary sequencing provides long reads, and next-gen platforms supply “lots of data to populate that,” he says.
Roe at Oklahoma says his team has found 3x to 5x Sanger combined with 15- to 17-fold coverage on a 454 “optimal” for sequencing microbial genomes in the 2 megabase range.
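To put those coverage figures in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses the standard average-coverage relation (reads = coverage × genome size ÷ read length), the 2-megabase genome and 100-base 454 reads mentioned in this article, and an assumed Sanger read length of 700 bases, which is an illustrative figure and not one quoted by Roe. The function name is hypothetical, and the numbers are estimates, not a description of any lab's actual protocol.

```python
import math


def reads_needed(genome_size_bp, fold_coverage, read_length_bp):
    """Reads required for a target average coverage: N = c * G / L."""
    return math.ceil(fold_coverage * genome_size_bp / read_length_bp)


GENOME_BP = 2_000_000    # microbial genome "in the 2 megabase range"
READ_454_BP = 100        # typical 454 read length cited in the article
READ_SANGER_BP = 700     # ASSUMPTION: illustrative capillary read length

# 454 portion of the hybrid: 15- to 17-fold coverage
for fold in (15, 17):
    n = reads_needed(GENOME_BP, fold, READ_454_BP)
    print(f"454 at {fold}x: about {n:,} reads")

# Sanger portion of the hybrid: 3x to 5x coverage
for fold in (3, 5):
    n = reads_needed(GENOME_BP, fold, READ_SANGER_BP)
    print(f"Sanger at {fold}x: about {n:,} reads")
```

The sketch treats coverage as a simple average and ignores repeats, read-quality trimming, and uneven sampling, all of which push real projects toward more reads than the formula suggests.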
“When you put 454 reads with a Sanger assembly, you do see reduced gaps,” says Chen at JGI, adding that his team hopes to save money in the finishing stage with this approach.
Many other groups are working to find their own sweet spot for the Sanger/454 combo approach, which has eased a number of researchers into the idea of using such short reads. That will likely prove a much-needed step for competitors with even shorter reads to get into the market. “This seems the best way to incorporate very large numbers of potentially very cheap reads into a product like one we’re used to seeing,” says Schloss. “This idea of merging data from different technologies that may be parallel or orthogonal I would hope is something we’ll see more and more of.”
Enabling New Apps
As the instruments have started to make their way into labs, scientists have used them for all sorts of applications that haven’t in the past relied on sequencing. “The short reads are really good for gene expression,” says Roe at Oklahoma. His team is getting ready to do some EST projects with the 454, too. Bill Spencer, director of worldwide system sales at 454, says in addition to SAGE and ESTs, customers have also been working with ditags.
At Penn State, Stephan Schuster has been using his 454 for cDNA and sequence-based transcriptomics projects in addition to the metagenomics and ancient genomics — he led the sequencing of the woolly mammoth — that have brought his lab into the spotlight.
Stan Lapidus points to gene expression, genotyping, and RNA expression as applications likely to get a boost from low-cost, high-throughput sequencing technologies. Chip technology isn’t sensitive enough to detect “very small differences in gene expression,” he points out. His team is working in particular on digital RNA expression for the Helicos instrument.
Spencer at 454 says customers have expressed interest in amplicon sequencing “because you can do high depth of coverage, which enables detection of low-frequency mutations.”
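Why depth matters for rare mutations can be illustrated with a simple binomial model. The Python sketch below is an idealized illustration, not 454's method: it assumes every read is an independent draw from the sample, ignores sequencing error entirely, and uses hypothetical names and example numbers (a 1 percent variant, a five-read threshold) chosen only to show the trend.

```python
from math import comb


def prob_at_least(depth, variant_fraction, min_reads):
    """P(X >= min_reads) for X ~ Binomial(depth, variant_fraction).

    depth            -- number of reads covering the position
    variant_fraction -- fraction of molecules carrying the mutation
    min_reads        -- variant reads required before calling it (ASSUMED threshold)
    """
    p_fewer = sum(
        comb(depth, k)
        * variant_fraction ** k
        * (1 - variant_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Hypothetical example: a mutation present in 1 percent of molecules,
# requiring at least 5 supporting reads to make a call.
for depth in (100, 500, 1000, 5000):
    p = prob_at_least(depth, 0.01, 5)
    print(f"depth {depth:>5}: chance of seeing >= 5 variant reads = {p:.3f}")
```

Under these assumptions the chance of catching the variant climbs steeply with depth. A real analysis would also have to separate true variants from base-calling errors, which this sketch leaves out.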
Bentley says the Solexa crew is also working to make the instrument compatible with these other applications. “In our case, if you start with 1/100th the cost of existing technology, then you start to really change the way people can acquire data,” he says.
Lapidus predicts that the scientific community will latch onto this new crop of instruments and put them to use in ways that people aren’t even imagining today. “Bit by bit, it’s going to change and people are going to say, ‘How did we ever do molecular biology without these tools?’” he says. “The pace of next-generation sequencing will in retrospect appear to have been remarkable.”
454 Feedback: The Users Speak
As the first instrument on the market by a solid year, 454 has a significant head start in the next-gen sequencing field. With 20 instruments placed in labs by the end of 2005, it looks like 454 and partner Roche Diagnostics have hit their stride.
But what do customers think of the final product? By and large, the users who spoke with GT were quite positive about the 454 instrument. Bruce Roe of the University of Oklahoma, who has had a machine for four months and has done a good amount of microbial sequencing with it, says with 15- to 20-fold oversampling, “the accuracy’s fantastic.”
Most customers GT interviewed agreed that they were routinely getting reads on the order of 100 bases and generating between 35 million and 50 million reads per run. The cost, according to Donna Muzny, who oversees the 454 group at Baylor’s genome center, is “just under that of Sanger sequencing.” (Onlookers expect that price to fall as competitors enter the market.) In general, a microbial genome takes at least 15-fold coverage and an average of three days to sequence in full.
As for ease of use, “the technology is much more demanding than any of us would have thought,” says Stephan Schuster at Penn State, who got his 454 last June. Muzny says her team encountered a few bumps at first, but chalked that up partly to the DNA sample they chose to start with. Now most projects are going fairly smoothly. Still, she says, using the instrument is “a long procedure, so there are some training issues.” Roe points out that the upfront chemistry is not automated, so it may take some effort to get that step working properly.
These comments raise the question: would it be better to use 454’s sequencing service center than to buy your own instrument? That’s a sore spot for Schuster, who calls it an “area of grievance” for people who spent the time and money to bring a machine into their lab. “Either you’re a vendor or a service provider, but you cannot be both,” he says. But Bill Spencer at 454 says having the service center “has been key in helping us further develop applications” for customers.
Because of the instrument’s short reads, it is well known to have trouble dealing with longer repeat regions. In a paper to be published shortly in PNAS, Schuster and collaborator Greg Velicer sequenced a bacterium, evolved it for 1,000 generations, and then resequenced it. When comparing the first sequence with a reference sequence, Schuster says, “There were as many as 1,000 cases where the 454 would wrongly call a homopolymer.” He adds, though, that the 454 sequence was still accurate enough that his team found seven mistakes in the TIGR reference sequence that were later verified and corrected. Also, he says that in his experience, sequencing quality diminishes toward the end of a contig.
Much like the early days of capillary instruments, the 454 has been widely criticized for the software packaged with it. (A new version of the assembler is expected out shortly.) Many groups, like Schuster’s at Penn State, have written their own base-calling or data analysis software. Muzny says her team has “opted to see how far 454’s going to take their base-calling software” before doing any in-house development. She also hopes to see 454 be more open in the future about access to the raw data. “It’s pretty canned now,” she says. “You do a run and you can say these two runs I want to do an assembly on, but you can’t really parse it out to do more delicate or sorting exercises with the data.”
Roe says the “scary thing” about 454 experiments is the high cost of each run on the machine — which also serves to prevent many customers from being able to do the kind of technology tweaking they’re used to with machines like ABI capillary instruments. Running an ABI gel might cost $50 or $100, Roe says, but running the 454 costs about $5,000 “just in chemicals.” Still, he calls the machine “the greatest thing since sliced bread.”
There’s also enough buzz around 454 for Cold Spring Harbor Laboratory to have launched a workshop around it. The workshop, to be held for the first time this June and led by Dick McCombie, Elaine Mardis, John McPherson, and Greg Hannon, will host about 15 people using loaner machines from 454. “It’s a combination of teaching people how to do it and also getting people together to think about how best to use the instrument,” McCombie says.
But despite the excitement surrounding 454 technology, customers don’t seem wedded to the brand yet. Those who spoke with GT said they were eager to evaluate the other platforms as they’re released. “We’re going to take a look at Solexa and any other instrument that comes out,” says Muzny.
“It’s extremely important that competitors come in,” says Schuster. “Customers will wind up with the lowest cost and the best technology.”