By Meredith W. Salisbury
There’s no denying it: sequencing is hot again. While genome sequencing instrument sales slumped in the post-human-genome days — ABI, for instance, reported that after years of double-digit growth driven largely by sequencer revenue, sales of those instruments were lower in fiscal year 2002 than in previous years — the demand for that data continues to soar. “We have an essentially unlimited demand for DNA sequence,” says David Burke at the University of Michigan.
Everyone may want to use the technology, but the price is still stifling. Things like SNP studies and other genome-skimming techniques, says NHGRI program director for technology development Jeff Schloss, “are surrogates for sequencing, and we use them simply because sequencing is too expensive.”
That’s why more groups than ever (companies, institutes, and even individual labs) are hard at work on cost-effective, novel sequencing technologies. Current advances have succeeded in reducing reagent volume and increasing efficiency, Schloss says, “but we ultimately need to be able to get to single-molecule measurement methods.” Of the major problems people are hoping to solve, read length, cost, reagent use, and sample prep are the most pressing.
GT canvassed the field in search of the most interesting sequencing innovations in the works — some are already up and running, while others could be 10 years from making their mark. We focused on upstart technologies that haven’t received much attention yet instead of established, better-known methods. The techniques and technologies that appear on the following pages are nowhere near a comprehensive list of everything that’s out there, but they should give you a good sense of the efforts going on.
1 Quake’s Take
Amid the sequence dreamers, Steve Quake of Caltech stands out. “In April we published the first-ever successful single-molecule DNA sequencing [method],” he says of a paper that came out in PNAS this year. “We’re breaking new ground as we go.”
Using a sequencing-by-synthesis technique, Quake’s group, he says, is the first to show significant results from the much-touted single-molecule concept. In his routine, a few hundred primed, single-stranded DNA templates are attached to a quartz slide and then washed repeatedly with fluorescently labeled nucleotides and DNA polymerase.
The key: nucleotides are added one at a time — all A’s first, for instance — and then the DNA strands are monitored with a laser to detect the signal given off when a base is incorporated. (The sequential addition of bases allows Quake to use the same label for all the nucleotides, rather than four unique labels.) After each base is detected, the nucleotide is photobleached. “We use photobleaching as a helpful tool to extinguish the signal,” says Quake, before the next round of bases is added. The few hundred templates present on the slide allow for parallelization, and Quake hopes to get that up to millions of DNA strands sequenced at once.
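The one-base-at-a-time logic can be sketched in a few lines of Python. This is a toy model, not Quake's actual protocol: the function name `sequence_by_flows`, the A-C-G-T wash order, and the handling of repeated bases (several identical bases incorporated in a single wash) are all assumptions made for illustration, and photobleaching is modeled only implicitly, as each wash starting from a dark slide.

```python
# Toy model of one-base-at-a-time sequencing by synthesis.
# Assumption: the template is written in the order the polymerase reads it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_flows(template, flow_order="ACGT", max_flows=100):
    """Wash one nucleotide type at a time over the template.

    Returns the synthesized (complementary) strand and a log of
    (washed base, number of bases incorporated) for every wash.
    """
    position = 0          # next unread template base
    synthesized = []
    log = []
    for flow in range(max_flows):
        if position >= len(template):
            break
        base = flow_order[flow % len(flow_order)]
        count = 0
        # The washed base is incorporated only while it pairs with the template.
        while position < len(template) and COMPLEMENT[template[position]] == base:
            count += 1
            position += 1
        if count:
            synthesized.append(base * count)
        log.append((base, count))  # count > 0 means a signal was seen
    return "".join(synthesized), log

strand, log = sequence_by_flows("TGCAA")
print(strand)  # ACGTT
print(log)
```

Because only one kind of base is present in each wash, a signal identifies the base by timing alone, which is why a single label can serve for all four nucleotides.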
At this point, reads are coming out quite short: no more than five base pairs, to be exact. “This was a proof of principle experiment,” Quake says. Increasing read length is one of his team’s priorities to get this technology into a more practical format. “We’ve got a lot of work ahead of us.”
But it’s critical work, he says. If you look at predicted cost improvements for conventional sequencing instruments based on historical trends, it’s “maybe down to $100,000 per [human] genome, way out there,” he says. “That’s still too much. … It’s clear that you need to continue innovating for something radical and new.”
2 Webb’s Waveguides
For Cornell University’s Watt Webb and his collaborators, current methods of sequencing weren’t looking at the problem in the best way. “[Our] basic idea was to convert the problem of identifying a DNA sequence from a spatial resolution … to a temporal resolution problem,” says Webb. “Then we could simply watch the order in which [bases] are added to the complementary strand.”
“Simply” may be overstating it, but Webb and his crew have indeed managed to change the game with what they call zero-mode waveguides — an array of 2.25 million tiny holes, smaller than the wavelength of light, spaced about five micrometers apart on aluminum film. Each well holds just a single molecule of DNA, and sequencing occurs by passing novel bases, prepared with fluorescent labels, across the chip to synthesize the complementary strand.
The beauty of DNA polymerase, says Webb, “is that its chemical kinetics actually involves about five sequential steps. So the time delay between successive processes is never zero.” That means the time it takes for a non-complementary base to just pass over a DNA strand is negligible, but Webb’s optics can detect the measurable time lag when a base is actually added to the strand. “If the polymerase recognizes the correct complementary base, it grabs it and holds it, which takes about a millisecond of parking time … before the dye is released from the labeled base,” Webb says. That time lag is the basis for the sequence detection system.
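As a rough illustration of that time-lag idea, the sketch below keeps only the fluorescent pulses that linger long enough to count as an incorporation. The event list, the function name `call_bases`, and the 0.5-millisecond cutoff are hypothetical values chosen for the example; they are not taken from Webb's detection system.

```python
def call_bases(events, min_dwell_ms=0.5):
    """Keep pulses whose dwell time suggests the polymerase held the base.

    events: (base_label, dwell_ms) pulses from one waveguide, in time order.
    min_dwell_ms: assumed cutoff separating a passing base from a parked one.
    """
    return "".join(base for base, dwell_ms in events if dwell_ms >= min_dwell_ms)

# Hypothetical pulses: brief flickers are bases drifting past,
# millisecond-scale pulses are bases being added to the strand.
events = [("A", 0.01), ("G", 1.2), ("T", 0.02), ("C", 0.9), ("C", 1.1), ("A", 0.03)]
print(call_bases(events))  # GCC
```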
Jonas Korlach, a graduate student in Webb’s lab who has been working on the sequencing project since late 1997, says current experiments involve replacing two bases at a time (while they’ve shown they can replace all four bases, they’re still ramping up to do four in one experiment). The enzyme he’s using can read 100 regular bases or about 25 analog bases per second, and it holds onto the DNA strand for 100,000 bases. “We hope to have a unique sequence pattern that we can read in the next couple of months,” Korlach says.
In theory, zero-mode waveguides will be faster, more efficient, and cheaper than existing technology. “But the engineering is a long way from commercialization,” Webb cautions. Optics and detection for all four dyes still have to be ironed out, for one thing.
But the team’s work appears promising. “Even though the technology is fairly complex,” says Korlach, “there’s no fundamental limitation why you couldn’t build this into a machine.”
3 The Polony Protocol
When Rob Mitra and George Church tackled what would become their polony breakthrough, they had one underlying requirement: “We were really trying to do things that any lab could use and that would be really inexpensive,” says Mitra, now at Washington University. That meant no wells on chips — lithography was too pricey — and no instruments that people didn’t already have or might have trouble finding.
Though Church had come up with the initial ideas in his Harvard lab, the project really got going in 1998, when Mitra joined the lab and took it on. They started with the idea of replica-plating using microarrays but veered away from that and toward sequencing by synthesis with polyacrylamide.
The process begins with single molecules deposited in acrylamide. As PCR is performed, “the copy cannot diffuse very far from the parent,” Mitra says. That goes on until the gel is dotted with these PCR colonies, or “polonies.” Then one strand is immobilized and the other strand detaches and floats away. A sequencing primer is annealed to the remaining strand and then fluorescently labeled bases are added one at a time. Once a base attaches to the strand, an enzyme cleaves the linker between the fluorescent tag and the base, and the strand is ready for another cycle.
For sequencing purposes, polonies are still in the development phase. “We’ve achieved short read lengths up to eight base pairs,” says Mitra, who points out that it stops there not because of the biochemistry but because the gel detaches, a problem he believes “won’t be hard at all to fix.” But he says there hasn’t been hype about the concept yet, and there shouldn’t be. “Nobody should be jumping up and down yet until we sequence a bacterial genome.”
Meantime, the technology is already up and running for other applications, including exon typing, haplotyping, alternative splicing, and sequence tagging, says Church. The remaining hurdle for sequencing is one that plagues many similar approaches, that of discerning between one A, for instance, being added and three A’s. Church sees a number of possible solutions to that, including quantifying the signal, where a signal three times brighter would mean three A’s.
While so many of these technologies are being ushered straight into commercialization, it was important to Church and Mitra to avoid that phase (an “inevitable five-year delay,” says Church) to get this technology to researchers as quickly as possible. Dedicated to keeping this open-source, Church and Mitra have posted their protocols and made all the software written for it available on their website. The only instrumentation needed, says Church, is a slide cycler and a microarray scanner. Optional equipment includes a slide dipper and a desktop isolation hood to set up the PCR slides. “Other labs are already using our technology,” Church says.
4 454’s Foray
Jim Golden was around for some of the earliest days of genome analysis instruments, and now he’s back in the sequencing arena. “I thought I had left this thing forever. … [But] sequencing is becoming hot again,” he says.
Now the business development manager at CuraGen subsidiary 454, Golden is hip-deep in the technology once more. The 75-person company, spun off from its parent in 2000, uses a proprietary picotiter plate with a goal of whole-genome sequencing. The instrument, at a cost of one-third to one-fourth of what current sequencers command, is “a combination of some off-the-shelf and some engineered stuff,” Golden says.
Essentially, the picotiter plate consists of 800,000 addressable, 75-picoliter wells (testing is underway for plates with more than 1 million wells). Genomic DNA is tagged and attached to beads, pumped through the microscope-slide-sized plate in a slurry, and then the entire thing is shaken until each well holds just one bead. 454’s chemistries generate photons detectable by CCDs, and sequencing can be done in several ways on the plate. Right now, each instrument is producing about 2.5 megabases per day, and scientists are working to get that up to 25 megabases. The current output is very short reads of 50 to 100 bases, which, Golden says, is fine for resequencing but can cause assembly problems for de novo sequencing. They’re working to lengthen reads as well as improve their in-house bioinformatics to get past that obstacle.
454 is in talks with some potential early adopters for the technology, and Golden says the plan is to open a sequencing center at the company’s Connecticut headquarters — where there are currently more than 10 machines operating — to which clients would come, learn to operate the instrument, and eventually go home with their own device. “There are some people that will want to buy a machine and some people who will want to buy a service,” Golden says, noting that no price has been floated at this point.
5 Mapping Mania
David Schwartz, a die-hard academic based at the University of Wisconsin, is no salesman. But after working on four generations of his optical mapping technology for 16 years, it’s no wonder he’s a little opinionated about the hype around other people’s sequencing technologies. “The other systems out there are promising results three years, five years, 10 years down the line,” he says. “This works now. And it works fantastically better than anything else.”
Optical mapping can’t be used for sequencing yet, though Schwartz says that’s not far off. Right now, the main application for the technology is providing maps that make sequence assembly significantly easier. It can also be used for comparative genomics and population studies — “this is a general platform that can be used for any sort of single-molecule interrogation,” Schwartz says.
Cells are lysed in agarose to protect against shearing, and the DNA molecules within are loaded onto a microfluidics device where each channel holds 30 megabases of genomic DNA; an entire human genome fits on one plate, according to Colin Dykes, CEO of optical mapping licensee OpGen. The DNA is laid out in long strands and immobilized on a glass surface, stained, and treated with a six-base-cutter restriction enzyme that cuts roughly once every 4,000 bases. Wherever the enzyme cuts, the DNA strand pulls apart slightly to reveal a gap. Schwartz’s software then identifies each fragment and measures its size, revealing the pattern of restriction sites on all the DNA molecules present and overlapping the data wherever it detects the same patterns.
These maps tell finishers where to put sequence reads that don’t align with other contigs, and identify repeats as well as sequencing errors. “As you start getting your sequence contigs, you scan them for the presence of the restriction sites that we made using the maps,” Dykes says. “Maybe you find three sites a certain distance apart, and then you compare on the map to find the location in the genome.”
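The lookup Dykes describes can be sketched as follows. This is a minimal illustration, not OpGen's software: the enzyme site (GAATTC, the EcoRI recognition sequence), the function names, the 10 percent size tolerance, and the toy fragment sizes are all assumptions. The "once every 4,000 bases" figure matches simple counting: a specific six-base site is expected about once every 4^6 = 4,096 bases in random sequence with equal base frequencies.

```python
def site_positions(sequence, site="GAATTC"):
    """Return every start position of the recognition site in the sequence."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

def site_spacings(sequence, site="GAATTC"):
    """Distances between consecutive restriction sites in a sequence contig."""
    pos = site_positions(sequence, site)
    return [b - a for a, b in zip(pos, pos[1:])]

def locate_on_map(spacings, map_fragments, tolerance=0.1):
    """Find where a run of spacings fits an ordered list of map fragment sizes.

    A fragment matches when the sizes agree within the relative tolerance,
    since optically measured sizes are approximate.
    """
    if not spacings:
        return []
    n = len(spacings)
    hits = []
    for i in range(len(map_fragments) - n + 1):
        window = map_fragments[i:i + n]
        if all(abs(w - s) <= tolerance * w for w, s in zip(window, spacings)):
            hits.append(i)
    return hits

# Toy contig with three sites, 100 and 250 bases apart.
contig = "GAATTC" + "A" * 94 + "GAATTC" + "C" * 244 + "GAATTC"
spacings = site_spacings(contig)
print(spacings)                                       # [100, 250]
print(locate_on_map(spacings, [410, 95, 260, 120]))   # [1]
print(4 ** 6)                                         # 4096
```

In this toy case the contig's two spacings line up with the second and third fragments of the map, which is the "three sites a certain distance apart" comparison in miniature.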
It’s a fast system. “The human genome was mapped in under 24 hours by one person,” Schwartz says. And he expects that to get better, saying mapping the human genome in five minutes is not unreasonable. “With resources — five really good people and a bit of money and a lot of coffee — I could deliver on that in probably six months.”
OpGen charges approximately one-tenth the cost of a genome project to make the map (less for really sizable genomes), and usually makes two maps using different restriction enzymes to give a couple of differently staggered patterns. The technology isn’t ready to ship yet, so at this point users send OpGen samples and wait for their maps to come back.
It’s been especially successful in dealing with repeats, says Dykes, and will likely continue to be so as new sequencing technologies give shorter and shorter reads. “If your reads are 50 bases, you’ve got no chance of assembling across a repeat,” he says.
6 Shimadzu’s Secret Weapon
It was in the sequencing boom days that Dan Ehrlich applied to NHGRI’s Jeff Schloss for a grant to fund the MEMS-based, high-throughput DNA sequencing instrument he was working on with his team at the Whitehead Institute. In 1999, Ehrlich won a three-year, $7 million grant to build the machine — and this summer, he expects to see the first commercial instrument come out.
“We haven’t really made a large noise about it,” says Ehrlich, director of Whitehead’s bioMEMS lab, but there’s no hiding his thrill at seeing the instrument come to market. The electrophoretic device, consisting of 768 lanes on two 50 cm x 25 cm microfabricated plates, gives greater than 800-base reads and will produce eight to 10 times as much data as the ABI 3730, Ehrlich says. “It’s a more or less conventional approach,” he adds; the big advances come through automation (“It’s designed for unattended, overnight operation,” he says) and size. The device also slashes reagent use: “Its consumption of a sample is [about] one percent of the typical current material loaded onto a machine.”
Unlike many of the public-sector forays into sequencing, this instrument has big guns behind it. Licenses for the technology were issued to Whitehead spinoff Network Biosystems (formerly GenoMEMS) for production of the disposable glass slides and to Shimadzu Biotech to build the instrument. Even as Ehrlich’s group continues to refine the technology — which was functional about a year and a half ago — by improving base calling, scoring of the data, and lane-to-lane uniformity, Shimadzu has rushed the instrument to market with plans for alpha sites (including at least one major genome center) by this summer.
“Shimadzu licensed the four-color sequencing patent and all the necessary IP to compete directly” in this arena, Ehrlich says. Whether people will balk at the price is yet to be seen: while the final price hasn’t been announced, Shimadzu’s chairman Tetsuo Ichikawa said last year it would cost in the neighborhood of $530,000.
7 Baylor’s Bright Idea
Baylor’s genome sequencing center has gotten attention for its sequencing advances by doing the first genome planned as a hybrid BAC/shotgun approach, but it’s Baylor’s array pooling techniques that have really piqued the curiosity of sequencers. Aleksandar Milosavljevic, who runs a bioinformatics research lab affiliated with the center, explains that the clone-array pooled shotgun sequencing method came about as a way to “avoid library preparations for individual BACs, which was a major bottleneck in the genome project.” What Baylor’s crew does instead is to arrange the BACs in a two-dimensional array, pool by column and separately by row, and then prepare libraries from the pool. The reads are coassembled and identified by row and column to determine which BAC they came from.
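The row-and-column bookkeeping can be sketched as below. It is an illustration under stated assumptions rather than Baylor's pipeline: the BAC names, the pool labels ("R1" for row 1, "C2" for column 2), and the function names are invented for the example.

```python
def libraries_needed(array):
    """Pooling needs one library per row plus one per column."""
    return len(array) + len(array[0])

def assign_contigs(array, contig_pools):
    """Map each coassembled contig to the BAC at its row/column intersection.

    contig_pools: contig name -> set of pool labels its reads came from.
    A contig seen in exactly one row pool and one column pool is assigned;
    anything else is left as None (ambiguous in this simple model).
    """
    assignments = {}
    for contig, pools in contig_pools.items():
        rows = [int(p[1:]) for p in pools if p.startswith("R")]
        cols = [int(p[1:]) for p in pools if p.startswith("C")]
        if len(rows) == 1 and len(cols) == 1:
            assignments[contig] = array[rows[0]][cols[0]]
        else:
            assignments[contig] = None
    return assignments

# Hypothetical 3 x 3 array of BAC clones.
array = [
    ["BAC_A", "BAC_B", "BAC_C"],
    ["BAC_D", "BAC_E", "BAC_F"],
    ["BAC_G", "BAC_H", "BAC_I"],
]
contig_pools = {"contig1": {"R1", "C2"}, "contig2": {"R0", "R2", "C1"}}
print(libraries_needed(array))              # 6, versus 9 individual libraries
print(assign_contigs(array, contig_pools))  # contig1 -> BAC_F, contig2 -> None
```

The saving grows with the array: under the same counting, a 10-by-10 array would need 20 pooled libraries in place of 100 individual ones.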
The pooled arrays also mean no more need for tiling maps of the BACs, says Milosavljevic. “It turns out if you design the arrays combinatorially in certain ways, you can do the mapping itself.” That’s accomplished by using two sets of arrays for each set of BACs. “The BACs are shuffled so they don’t occur in the same row or column twice,” he says, so mapping can be avoided in later stages.
A related advance comes in the form of pooled genomic indexing, which goes further by taking away the need to assemble the reads. “You can do a sequence similarity search against a related genome,” Milosavljevic says, “and if two reads hit close to each other, you can deconvolute the BAC and map it onto the region between the two hits.”
Milosavljevic uses the popular library analogy to explain the potential for pooled genomic indexing, which he notes “wouldn’t be meaningful unless we already had human and other genomes”: If each organism’s genome is like a book, “so far we’ve picked the books with the most interesting titles and we’ve read them cover to cover and then we skim some of them,” he says. With pooled genomic indexing, “we can index them all and then just read the interesting chapters in as many books as [we can].”
“The nice thing about the technology is it doesn’t require any radical change in pipeline in the existing centers,” says Milosavljevic — all the pooling is done with standard rearraying robots. “The main innovation here is not technological, but conceptual.”
8 Playing With Polymerase
Susan Hardin is no stranger to the competitive landscape of technology development. When she introduced her company, VisiGen Biotechnologies, as a candidate for the $1,000 human genome at the Genome Sequencing and Analysis Conference last year, she announced that her firm was going for the “$995 genome.”
Founder and CEO of VisiGen, a May 2000 spinoff from the University of Houston, Hardin heads up the 20-person crew in their efforts to make polymerase-based sequencing a success. Starting with Taq polymerase, Hardin’s team is engineering the polymerase by adding fluorophores and cysteines at key locations. “The trick with our technology is there’s an acceptor fluorophore located on the gamma phosphate … giving the signal that basically tells base identity,” Hardin says.
As the polymerase incorporates a base, a particular wavelength of light is emitted to identify the nucleotide added. “So what we’re working toward is a technology that will essentially produce DNA sequence information as fast as the polymerase can go — with a target rate of about a million bases per second,” Hardin says. Parallel processing will be key to achieving such high-throughput rates.
She’s certainly not there yet. Hardin estimates that it will take two to four years before the technology will work properly. In the meantime, VisiGen can keep plugging along thanks to funding from DARPA and NIH.
“VisiGen was founded at a really good time to take advantage of single-molecule detection,” Hardin says. In the long run, she envisions her business model along the lines of that other sequencing giant, Applied Biosystems: selling a complete system with an instrument, software, and reagents.
9 Silicon Valley Sequencing
Unlike many of his peers in the field, David Burke’s vision of cheaper sequencing hasn’t led him to design a new instrument system. Instead, he’s working with basic microfluidics in an attempt to improve the technology to the point where it’s advancing at the same rate as the computer industry’s microprocessor chip: cheaper and faster all the time. When he started this project nine years ago, he wondered, “Can we take advantage of the same ideas that have been so successful and figure out how to move drops of DNA around … the way packets of electrons” get moved around in integrated circuitry?
And after nearly a decade on this concept, Burke says, “We’ve solved many of the problems that we identified nine years ago.” Among the many incremental improvements he’s made to develop microchips that can be used for almost any kind of biological experiment, the biggest was figuring out how to keep salt-containing water separate from the electronics on a microchip. “Anyone who’s taken a toaster into the bathtub with them understands that idea,” he jokes. His team, which works in collaboration with Mark Burns, also at the University of Michigan, has accomplished that task. Now, the solution can even be “in immediate contact with silicon microfabricated components,” Burke says, and the enzymes still retain their activity.
Another challenge, and one that Burke expects will always be an issue, is controlling the minuscule, discrete drops in the device. Being able to work with size, location, surface chemistry, and temperature is critical to realizing a device that could, say, sequence a genome. Burke’s group decided early on to stick with well-established chemistries — “Otherwise you’re changing everything at once,” he says — so they could focus on changing the scale of the reactions, building hardware, and designing control software.
“We are close to where it’s very easy to imagine a technology that is a microfluidics device that can do many of the tasks that we do in conventional biology,” Burke says.
“We’re at a plateau for what you can get for your money” in the sequencing realm, Burke says. “It still inhibits a lot of really good science from happening.” His technology will target not the established centers but rather “people who are not in the sequencing business yet … who don’t have a lot of money who want access to genetic information.”
Microfluidics, he contends, will be cheaper as time wears on. “Making the first device is extremely expensive,” he acknowledges, but once the designs are set and the masks prepared, “making the 10 millionth device is nearly cost-free.” And he’s hoping to see microfluidics get on track with the computer industry’s expectations. “We need 10-fold improvements every couple of years. We need to get on that trajectory,” he says.
10 Nearer to Nanopores
Daniel Branton started working on his streaming nanopore idea about five years ago. “People were just really at the beginning of talking about sequencing the whole human genome and to me it looked impossible using gels,” he says.
Ever since, he’s been hard at work with a number of collaborators — including Harvard’s George Church and UCSC’s David Deamer — on single-molecule sequencing. And amid the frenzy of this field, Branton’s realistic attitude is refreshing: “It’s still quite far in the future,” he says of his technology. “There are several single-molecule methods that are likely to be available sooner.”
But don’t count his 15-person Harvard-based crew out of the running. Branton’s idea is to stream single strands of DNA sequentially through a nanopore while an immobile detector reads out the bases, something like the way a toll booth camera snaps pictures of license plates, and it is coming along. “One of the first steps was developing a method for making the very small pores in solid-state materials. That’s been achieved,” he says. “Now we’re at the stage of attaching a probe to that nanopore.” At this point, his team is using a flow of ions through the nanopore as the detector; that helps determine things like the charge of molecules and their presence in the nanopore, but doesn’t give high enough resolution for base-by-base sequencing.
A key difference of this technology is the reliance on solid-state materials, which Branton sees as a more robust alternative to the enzyme-reliant technologies du jour. And theoretically, it will also take care of another current problem: “What’s becoming limiting today is the computer work that needs to be done after one gets information off a gel where the reads are relatively short and have to be patched together,” he says. “We’ve already passed DNA which is 20,000 to 30,000 bases long [through the nanopore].”
11 Agencourt’s Add-on
At Agencourt Bioscience, currently the largest US contract sequencing shop thanks to its acquisition of the Genome Vision unit of Genome Therapeutics, sequencing advances are all about what works in the production line — now.
One of the major improvements is actually a fairly simple silicon-mold lid that fits into a regular 384-well plate, shrinking each well from its regular 50 microliters to about four. Developed through a technology grant from NHGRI issued to Doug Smith, formerly at GTC, and affectionately known as the “Doug plug,” the lid can reach 200-fold dilution, says Agencourt CSO Kevin McKernan.
That level of dilution is still in the R&D phases, McKernan says. But the lids used in the sequencing shop are actively realizing 48-fold dilution for classic cDNA sequencing or 96-fold dilution for PCR-based sequencing. “We’re not building a nanoliter system to do this,” he says. “We’ve had [this] in production for about a year now.”
With some 50 or 60 sequencers in its facility, the lids mean major reagent savings for the company. And McKernan says they’re in beta tests right now and could be on the market as a product by this fall. “The response has been overwhelming,” he says. “To our surprise, [the beta testers] got more dilutions than we got.” So far, no word on price for the lids.
12 Billion-Molecule Marvel
With scientific advisors including Jane Rogers, Bob Waterston, David Bentley, and Ewan Birney, the people at Solexa must be onto something. But CEO Nick McCooke says, “When we started talking about this a few years ago, people thought we were on the lunatic fringe.”
In fact, they’re just on the outskirts of Cambridge, UK, located a scant two miles from the Sanger Institute. Based on a single-molecule detection idea from Shankar Balasubramanian and David Klenerman of the University of Cambridge, Solexa surfaced as an academic research by-product in 1998 and became a full-fledged company in late 2000.
Solexa’s technology, aimed at resequencing rather than de novo sequencing, is designed to put a billion DNA molecules on an array and use those as a basis for sequencing. The molecules are dumped on the array “in a random, unaddressed form,” says CTO Tony Smith, allowing for more molecules on each array than the standard grid or well format. The technology works directly with genomic DNA to skip sample prep. With the billion fragments of DNA attached to the chip, Solexa adds one base at a time to all the fragments, measuring fluorescence to see which base attached to which strand of DNA. “We’ve had to create some very novel nucleotide structures,” Smith says, to work with four dyes that are detectable at the single-molecule level.
This is repeated for 25 cycles until each of the billion molecules has generated a 25-mer read, and those reads are aligned against a known genome for sequencing as well as for SNP discovery and scoring. “We basically go in and capture all the variation at once,” says Smith. The cycle is run 25 times because mathematically, “that’s what you need to get a unique alignment to the reference genome.” Though that number seems small, it adds up, according to Smith: “You can get 10X coverage of the human genome from a single array.”
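The arithmetic behind those figures can be checked with a back-of-the-envelope sketch. The genome size of 3 billion bases is an assumption (a common round number for the human genome), the function names are invented, and the uniqueness check is an idealized counting argument that ignores repeats, which is one reason a real design would want a read length well above the bare minimum.

```python
def fold_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is covered by reads."""
    return n_reads * read_length / genome_size

def min_unique_length(genome_size, alphabet=4):
    """Shortest read length with more possible sequences than genome positions."""
    k = 1
    while alphabet ** k <= genome_size:
        k += 1
    return k

GENOME_SIZE = 3_000_000_000  # assumed round figure for the human genome

print(round(fold_coverage(1_000_000_000, 25, GENOME_SIZE), 1))  # 8.3
print(min_unique_length(GENOME_SIZE))                           # 16
print(4 ** 25)                                                  # 1125899906842624
```

Under these assumptions a billion 25-base reads give a little over eightfold coverage, the same order as the figure Smith quotes, and 25 bases leaves a wide margin over the 16 needed for the number of possible sequences to exceed the number of positions in the genome.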
McCooke expects to have a prototype complete by the end of this year, and says Solexa will be in collaborations with people using the system by mid-2004. At the start, this platform might cost in the low tens of thousands of dollars and take several days to operate the billion-molecule arrays start-to-finish, but within five years he says that could be down to $1,000 and a much shorter time period.
“All the individual pieces work,” Smith says. “What we need to do is make everything work together and work efficiently.”
13, 14 Abstract Thoughts
Li-Cor, based in Lincoln, Neb., is at work on a real-time, single-molecule sequencing technology, though the company declines to talk about it. In a grant abstract available from NIH, PI Patrick Humphrey indicates that the technology involves an engineered DNA polymerase used with “charge-switched” nucleotides. Li-Cor is working on a total system to isolate and maneuver single molecules, a microfluidics technique to sort molecules based on charge, and software algorithms to tie everything together. Detection would happen through an optics system using four channels, and automated base-calling is planned from CCD images. “Read lengths,” writes Humphrey, “will be tens of kilobases to simplify shotgun sequence assembly and preserve haplotype information.”
Patent applications also provide clues to technology that might still be under the radar. A patent issued this February to Stefan Seeger of Molecular Machines & Industries in Heidelberg covers yet another method of sequencing. According to the patent abstract, the technology can sequence DNA or RNA by immobilizing single strands on a surface and interrogating each one with a laser beam. A solution of polymerase and nucleotides with luminescent tags is added to the mix, after which a single-molecule detector, possibly an amplified CCD camera, notes the luminescent signal emitted by an added base. That signal will be deleted — Molecular Machines lists several ways this could be done, including cleavage, photobleaching, or laser pulse — before the next base is added.