"We like to keep the bottleneck at the DNA instrumentation level," says Mike Metzker, head of sequencing production at Baylor College of Medicine's Human Genome Sequencing Center in Houston, Texas. After all, the center owns some $25 million worth of ABI 3700 sequencing instruments, so "as long as the bottleneck is sequencing, then we're doing our job, because that's the most expensive step," Metzker explains.
As it turns out, costly, high-throughput DNA analysis machines — the very instruments that launched the revolution in genome sequencing when they were introduced three-and-a-half years ago — are now spurring advances in lab robotics and protocols. To keep each of their quarter-of-a-million-dollar sequencing instruments from standing idle, major genome facilities are figuring out how to keep them fed by optimizing each upstream step in the sequencing process.
Steven Hamilton, a lab robotics consultant who once directed automation and information services for Amgen, says that the engineers developing robotics for genomics have long tended "to be on the cutting edge for lab automation." That's because "as you energize one part [of a process] you create a bottleneck in another part," Hamilton says. "The challenge is to keep the process balanced."
To be sure, one of the better-publicized robotics developments in genomics has been the Whitehead Institute Center for Genome Research's use of magnetic beads to capture and wash DNA molecules, and a plate-track robot to pick them for sequencing. The Whitehead innovations (reported in GT's Model.org column in April 2001) not only helped the center to pump more than 1 billion bases through its fleet of sequencing instruments, but enabled it to make a 16-fold reduction in reagent consumption and save tens of millions of dollars a year.
Lately, other genome sequencing facilities have been reporting results of their own ramp-ups: Washington University adopted the Whitehead techniques, for instance. And DOE's Joint Genome Institute reengineered its entire lab last summer: it converted to a 384-well format, installed 21 of the new MegaBACE 4000 instruments, and began utilizing rolling circle amplification for sample prep. Martin Pollard, JGI's instrumentation group leader, says the lab is now sequencing about 40 million base pairs per day and getting read lengths of between 630 and 650 bases and pass rates in the range of 92 to 95 percent. Just a year ago, he says, the lab was generating about 25 million base pairs a day, when "read lengths of 500 and pass rates of 80 percent were acceptable."
Someday, sequencing a genome will cost as little as tens of thousands of dollars instead of a hundred million dollars, predicts Richard Gibbs, director of the Baylor sequencing center. "To get there," Gibbs says, "we need a lot of sequencing reads, but we also need much more clever ways to do that work and put it together."
Gibbs' lab is among those now achieving longer reads and higher pass rates than ever: entire-process pass rates, from picking to loading to sequencing, are at 85 percent, and read lengths are consistently approaching 600 bases at Baylor these days. Here's how they're doing it:
RATS AND DOUBLE RATS
To sequence the whole rat genome, Baylor is using what Gibbs calls a "mixed strategy" of whole-genome and BAC-by-BAC sequencing. The rat project — a collaborative effort among Baylor, Celera, Genome Therapeutics, TIGR, and the University of British Columbia funded by the National Heart, Lung, and Blood Institute and the National Human Genome Research Institute — is the first to employ the approach, and Gibbs says he believes it is a new model for genome sequencing.
The combined method is saving money "because when we're able to use the individual BACs in combination with the whole genome concept, we don't need hugely deep coverage to get good assembly," Gibbs says. Indeed, as opposed to the 10X coverage that Celera generated on the human genome with its whole-genome shotgun method, Gibbs says that 1X of BAC coverage and 6X of whole genome "should give us plenty for as good an assembly as we might expect with just whole-genome coverage at much greater depth."
Baylor has set out to produce 25,000 BAC libraries in a 16-month period, and the lab is now achieving a "staggering" 500 BAC shotguns per week, Gibbs says. For comparison's sake, he points out that Baylor generated 2,000 BACs over the several-year course of the Human Genome Project and that five genome centers together produced between 20,000 and 30,000 BACs in that time. Already, the two-year rat project is ahead of schedule: having produced more than the planned 4X coverage in just over one year, Gibbs expects to achieve 7X coverage before the funding runs out.
Asked why no one had taken the mixed approach before now, Gibbs explains: "In the beginning we were completely wedded to the clone-by-clone approach. It was really Celera that said a whole-genome approach with no underlying BAC sequences would be enough to drive assembling a large genome. Then the mouse and the rat projects launched with the underlying idea that there would be some amalgam of the two methods. But in the case of the mouse, they aggressively embarked on a whole genome strategy. The rat distinguished itself by being a real-time use of the combined methodology."
To simplify the process of making BAC libraries, Baylor is trying out "DNA pooling," a process that Gibbs says is antithetical to the training every molecular biologist gets, which says that to understand DNA, you must fractionate it. "Instead of doing each BAC one by one, you do work with individual BACs but you put them in a set of pools."
Pooling works like this: one BAC goes into each well of a gridded plate. All the BACs in each column are mixed together, and the same is done separately with all the BACs in each row. The old-fashioned approach would make 100 separate BAC shotgun libraries, one from each well of a 10-by-10 array. In the pooling approach, only 20 BAC shotguns are made from the 20 pools — one for each row and one for each column — with each BAC represented twice. With random sequence reads from each pool, researchers can computationally assemble each row-and-column combination in turn and identify the contigs that represent the BAC shared between that row and that column.
Says Gibbs, "You've reduced the upfront work where you treat each BAC separately; you've reduced it from an Nth problem to a root N problem. Instead of 100 shotgun libraries, now you make 20, and for every 10,000 now you make 200."
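The pooling arithmetic Gibbs describes can be sketched in a few lines; this is an illustrative toy, not Baylor's code, and it assumes the BACs sit on a perfect square array:

```python
# Illustrative sketch (not Baylor's code) of the row-and-column pooling
# arithmetic described above, for N BACs arrayed on a square grid.
import math

def pooled_libraries(n_bacs):
    """Libraries needed with pooling: one per row plus one per column,
    i.e. 2 * sqrt(N), versus N libraries the old-fashioned way."""
    side = math.isqrt(n_bacs)
    assert side * side == n_bacs, "sketch assumes a square array"
    return 2 * side

def identify_bac(row_pool, col_pool, side):
    """Each BAC sits in exactly one row pool and one column pool, so the
    (row, column) pair whose assemblies share contigs pins down the well."""
    return row_pool * side + col_pool  # well index on the grid

print(pooled_libraries(100))     # 20 pools instead of 100 libraries
print(pooled_libraries(10_000))  # 200 pools instead of 10,000
```

This is the "N problem to a root-N problem" reduction: the library-construction work grows with the square root of the number of BACs rather than with the number itself.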
What's more, because the number of reads needed for each pool is relatively small, BAC pooling allows assembly to be done on a desktop computer instead of a supercomputer. In silico, a program called BAC Fisher, developed by Baylor faculty member Paul Havlak, whom Gibbs describes as a "card-carrying, parallel-processing, hardware/software guy," takes the BAC-by-BAC data into account in whole-genome assembly. Unlike other whole-genome assembly algorithms, which consider only end-read information, Gibbs says, BAC Fisher lets you reach into the whole-genome reads, pull out just those associated with your local BAC region, and then assemble them using Phrap and other conventional tools.
"You take a component of whole genome, you take your BAC population, you make pools of the BACs, you get coverage of each, deconvolute the BAC arrays, take those elements, combine them with the whole genome element, and now you're really getting a whole-genome assembly," Gibbs says. "Now I think we've got a strategy that really does move the field ahead."
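BAC Fisher itself isn't spelled out in the article, but the recruit-then-assemble idea Gibbs describes can be sketched as follows. Everything here is a hypothetical illustration: the k-mer size and function names are assumptions, and a real pipeline would hand the recruited reads to an assembler such as Phrap rather than stop here.

```python
# Hypothetical sketch of the recruit-and-assemble idea Gibbs describes:
# pull the whole-genome reads that overlap a local BAC region (judged
# here by shared k-mers with the BAC's pooled-assembly contigs), so
# only those reads, not the full read set, go to the assembler.
K = 21  # k-mer size; an assumption, not from the article

def kmers(seq, k=K):
    """All k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def recruit_reads(bac_contigs, wgs_reads, k=K):
    """Return whole-genome reads that share at least one k-mer with
    the contigs assembled for this BAC region."""
    index = set()
    for contig in bac_contigs:
        index |= kmers(contig, k)
    return [read for read in wgs_reads if kmers(read, k) & index]
```

A real recruiter would also handle reverse complements and mate-pair links; the stripped-down version just shows why each local assembly stays small enough for a desktop machine.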
A new prep method also has something to do with the stellar results Baylor is getting. "For most of the world, what DNA template prep you use is about as interesting as whether you like wheat or white toast," Gibbs says. "It turns out, though, that it's incredibly important. The quality of the template has to be high to satisfy the machines. It's gotta be cheap because it will blow out your budget if you pay the high cost that most vendors want for templates, and you've got to make hundreds of thousands of them."
Gibbs credits Mike Metzker, head of rat sequencing production, with developing a filter prep capture method that runs on a Packard instrument and is the "cheapest, fastest out there." Metzker, who published his method in the journal Nucleic Acids Research earlier this year (vol. 30, no. 7) and planned to describe it to colleagues at a Cold Spring Harbor Laboratory meeting in May, is now working with several vendors to move from 96-well to 192-well plates.
The method — a modified alkaline lysis procedure — uses standard reagents that Metzker describes in the NAR paper and glass beads in a 96-well format to bind DNA that has been precipitated with alcohol. "It's an old trick," says Metzker, who describes himself as a genomics person with a background in synthetic organic chemistry. "Even in high school you'd extract genomic DNA and precipitate it with alcohol in a tube."
The "trick" has enabled Baylor to scale its production staff down from 50 to 30 and to double its throughput by getting two reads from each template — twice as many as during the human project, when Baylor was working with M13 single-stranded DNA that could generate only one read per template. Baylor now has three Packard plate-track robots that each process 17 96-well boxes per hour, producing about 35,000 preparations per eight-hour day. Metzker says that in March the lab ran 1.7 million sequencing reactions, two per template, for five cents apiece — almost a tenfold reduction from standard market prices. Plus, Metzker says, the new method "is producing better data than we see with any current methodology."
Metzker says that the Packard instruments allow him to fully automate the prep from cell pellet stage to where the DNA is ready for elution. "It actually adds solution and moves it onto the shaker and moves it onto the manifolds and vacuum filtrations."
During the human genome project, Metzker says, DNA prep ran long hours and many weekends to ensure that the sequencing instruments were continually utilized. The ultimate effect of the new prep robots, he says, is that he's back to a "blue-collar eight-hour day" and the bottleneck is back on the ABI machines. "Which is a good thing for us."