For all the talk about personalized medicine, there’s one ingredient we can all agree is absolutely essential before that can happen: lots and lots of sequence data. But at the current cost of sequencing genomes — just producing a high-quality draft of a mammalian genome costs $25 million, according to the National Human Genome Research Institute — it’s clear it’ll take a long time and a lot of money to accumulate the information required for doctors to prescribe treatments based on an individual’s unique genetic profile.
For all the hoopla over the past few years, most people already know that the current Holy Grail is to be able to sequence genomes for $1,000 apiece, and many groups are hard at work on this. At that cost, researchers and grant administrators agree, not only would it be possible to sequence the genomes of many — practically all — organisms useful for comparative studies, but one could also entertain the idea of sequencing the genome of every average Joe Patient, in theory a very useful step toward disease prevention and treatment.
To reduce the cost of sequencing a genome by more than four orders of magnitude, even if it takes 10 years, it’s no secret that researchers will have to develop entirely new technologies. There are many ways to increase the efficiency of current approaches, which rely primarily on capillary electrophoresis, but most researchers agree that these alone will bring us nowhere near a $1,000 genome. “As long as you need reagents [to determine the sequence], you’re still at best in the $10,000 to $100,000 range,” says Jeff Schloss, a program director in technology development for NHGRI.
A few startup companies are trying their hardest to develop new ways to sidestep that obstacle, and in the past year, several have made strides toward bringing their technology to bear on large-scale sequencing. This wave of new approaches to sequencing falls into three general categories: single-molecule, amplified-molecule, and, possibly, nanopore systems (see sidebar).
The Fine Print
First it helps to know exactly what people mean by sequencing a genome for $1,000. Does that mean the genome of an organism never sequenced before — what’s known as de novo sequencing — or sequencing the genome of another member of a species for which we already have a reference sequence, a task called resequencing? The difference matters for the final cost, given that assembling the individual reads into a coherent sequence is one of the most labor-intensive (and therefore expensive) components of sequencing using current techniques. Suffice it to say that even the relatively easier task of resequencing a genome for $1,000 would be a significant accomplishment.
There’s also the issue of defining a common standard of sequence quality or level of completion to use as a basis for comparing costs. NHGRI, which is administering a research program to develop new sequencing technology for eventually reducing the cost of sequencing a genome to $1,000, stipulates that such a sequence would have to be at least as comprehensive as that of the mouse genome published in Nature in 2002. In that case, the mouse genome was sequenced to 7.7-fold coverage, at a total cost of about $50 million, according to the NHGRI program announcement.
To be fair, no one’s arguing that capillary-based techniques, like those used as the basis for the Applied Biosystems 3700 series and the Amersham Biosciences MegaBace systems, will disappear any time soon. “Today, capillary electrophoresis is the most accurate, reliable, and cost-effective sequencing technology, and will be for several years,” says Philippe Nore, senior director of strategic business planning for the sequencing group at Applied Biosystems. “The $1,000 genome is not for tomorrow.”
Day After Tomorrow?
But it is coming. One of the most advanced efforts to commercialize a radical new form of sequencing technology is underway at 454 Life Sciences, a Branford, Conn.-based subsidiary of CuraGen that was spun off in 2000. The company’s approach, which falls into the amplified-molecule category, employs beads that attach to individual strands of DNA, and a “PicoTiter plate” with hundreds of thousands of wells, with one bead assigned to each well. To initially amplify genomic sequence, individual PCR reactions are performed in each well of the plate. 454 researchers then determine the sequence of each DNA strand via a cyclical sequencing-by-synthesis technique that relies on a modified form of pyrosequencing — essentially using the polymerase-driven release of pyrophosphate to determine whether a particular base has been incorporated into the DNA strand in a given well.
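For readers who want to see the logic of cyclical sequencing by synthesis, here is a minimal toy sketch in Python. It is not 454's method or software. It assumes a hypothetical flow order (T, A, C, G) and an ideal, noise-free signal that rises in proportion to the number of bases incorporated during each flow; the function names are invented for illustration.

```python
# Toy model of sequencing by synthesis with cyclic nucleotide flows.
# Assumptions (not from the article): the flow order below, and an ideal
# signal equal to the number of bases incorporated in each flow.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FLOW_ORDER = "TACG"  # hypothetical flow order


def flowgram(template, max_flows=400):
    """Return (nucleotide, signal) pairs, one per flow, for one template.

    `template` is the strand being copied, written in the order in which
    the polymerase reads it.
    """
    pos = 0
    flows = []
    for i in range(max_flows):
        if pos >= len(template):
            break  # the whole template has been copied
        nuc = FLOW_ORDER[i % len(FLOW_ORDER)]
        signal = 0
        # One flow extends through an entire run of identical bases.
        while pos < len(template) and COMPLEMENT[template[pos]] == nuc:
            signal += 1
            pos += 1
        flows.append((nuc, signal))
    return flows


def call_bases(flows):
    """Rebuild the newly synthesized strand from the flow signals."""
    return "".join(nuc * signal for nuc, signal in flows)


if __name__ == "__main__":
    flows = flowgram("AACGT")
    print(flows)
    print(call_bases(flows))
```

A flow with zero signal means the offered base did not match the next template position. A signal of two or more marks a run of identical bases, which is where real instruments have the hardest time telling lengths apart.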
According to Dick Begley, 454’s president and CEO, the company last year resequenced the genome of an adenovirus, and can now routinely sequence bacterial genomes as large as 8 megabases in days. Since January, 454 has operated an in-house sequencing center that now functions as the alpha site for the company’s instrumentation. Begley won’t say how much 454 expects to charge for the sequencing system, but the company has planned a commercial launch date for sometime in the first quarter of 2005, he says.
454 scientists have submitted a proposal to NHGRI for grant money to pursue reducing the cost of sequencing a mouse-scale genome to $100,000, a task Begley thinks the company can achieve within five years. A $1,000 genome, he says, might be possible with 454’s technology in five to 10 years. While 454’s technology is less suited for de novo sequencing, given its relatively short read lengths, Begley says company scientists have extended read lengths to 100 bases in the production system, and achieved 200-base read lengths in the laboratory. 454 is also working with Gene Myers at the University of California, Berkeley, on developing a next-generation assembler, Begley says.
Solexa is another startup making significant strides toward slashing the cost of sequencing. The Essex, UK-based operation is founded on the work of Shankar Balasubramanian and David Klenerman at the University of Cambridge. As a single-molecule-based technology, Solexa’s approach to sequencing requires no amplification step, instead relying on novel nucleotide structures compatible with dyes detectable at the single molecule level.
Solexa’s technology — aimed at resequencing, rather than de novo sequencing — chops up genomic DNA and deposits the molecules on an array in a random, unaddressed form, says CTO Tony Smith, allowing for more molecules on each array than the standard grid or well format. Solexa adds one base at a time to the billion fragments of DNA attached to the chip, measuring fluorescence to see which base attached to which strand of DNA.
This cycle is repeated 25 times until each of the billion-odd molecules has generated a 25-mer read. Those reads are then aligned against a known genome for sequencing as well as for SNP discovery and scoring. “We basically go in and capture all the variation at once,” says Smith. The cycle is run 25 times because mathematically, “that’s what you need to get a unique alignment to the reference genome,” he says. At the moment, Solexa is optimizing its first product, a sequencing system based on clustered arrays that will be sequencing genomes next year. Smith declined to provide an estimate of the cost.
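A back-of-the-envelope sketch in Python suggests why a read length in this range can pin down a unique location. This is not Solexa's calculation. It assumes a genome of 3 billion bases and treats the sequence as random, which real, repeat-rich genomes are not; the toy aligner uses exact matching only, and all function names are hypothetical.

```python
from collections import defaultdict


def min_unique_length(genome_size):
    """Smallest k for which the number of possible k-mers (4**k)
    exceeds the genome size."""
    k = 1
    while 4 ** k <= genome_size:
        k += 1
    return k


def expected_chance_hits(genome_size, k):
    """Expected number of chance matches for one k-mer in a random genome."""
    return genome_size / 4 ** k


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_unique(read, index):
    """Return the read's position if it matches exactly once, else None."""
    hits = index.get(read, [])
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    genome_size = 3_000_000_000  # assumed genome size, not from the article
    print(min_unique_length(genome_size))
    print(expected_chance_hits(genome_size, 25))

    reference = "ACGTACGGTCA"  # made-up toy reference
    print(align_unique("ACG", build_index(reference, 3)))   # ambiguous read
    print(align_unique("GGTC", build_index(reference, 4)))  # unique read
```

Under the random-sequence assumption, uniqueness becomes possible at a length well short of 25 bases. The extra length leaves a margin for repeats, sequencing errors, and the SNPs the reads are meant to reveal.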
Because Solexa’s technology avoids the reagent-laden (and costly) DNA amplification steps, it may have a strong chance at chopping the cost of sequencing. Furthermore, says Smith, cramming a billion DNA fragments onto a few-square-centimeter-size chip significantly reduces the volume of reagents required to perform the sequence detection. Ultimately, Smith projects that the company’s single-molecule array system should be capable of achieving 1x coverage in a few days, and “within this decade” resequencing a genome for $1,000.
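Coverage figures follow from simple arithmetic: total bases read divided by genome size. The Python sketch below uses that standard definition. The 3-billion-base genome size is an assumption, and the raw figure treats every read as usable and uniquely aligned, so it is an upper bound and not a statement about what any instrument delivers.

```python
import math


def average_coverage(num_reads, read_length, genome_size):
    """Average fold coverage: total bases read divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # assumed genome size, not from the article
    # Reads of 25 bases needed for 1x average coverage.
    print(reads_needed(1, 25, genome_size))
    # Raw coverage if a billion 25-base reads were all usable.
    print(average_coverage(1_000_000_000, 25, genome_size))
```

Average coverage says nothing about evenness. Because fragments land at random, some stretches of the genome are read many times while others are missed, which is one reason projects aim for several-fold coverage.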
At Nanofluidics, a Menlo Park, Calif.-based spinoff from Cornell University, scientists are busy attempting to commercialize single-molecule sequencing technology initially developed in the lab of Watt Webb, a professor of applied and engineering physics. Company executives declined to be interviewed for this article, citing their desire to “keep a low profile.”
The central element of Webb’s approach is the use of zero-mode waveguides — an array of 2.25 million tiny holes, smaller than the wavelength of light, spaced about five micrometers apart on aluminum film. Each well holds just a single molecule of DNA, and sequencing is accomplished by passing novel bases prepared with fluorescent labels across the chip to synthesize the complementary strand. Because the zero-mode waveguide creates an effective observation volume of 10 zeptoliters, according to a paper in Nature Reviews Genetics, Webb’s optics can detect the measurable time lag when a base is actually added to the strand.
Another company founded on the principle of single-molecule sequencing is Helicos Biosciences, a year-and-a-half-old startup based in Cambridge, Mass. The company licensed technology originally developed in Steven Quake’s laboratory at Caltech, and has “taken and run with it,” says Quake, who lays claim to being the first to publish a paper in PNAS demonstrating the viability of the single-molecule sequencing concept.
Quake’s approach relies on cyclic sequencing by synthesis, using a few hundred primed, single-stranded DNA templates attached to a quartz slide to create a parallel process. The slide is then washed repeatedly with fluorescently labeled nucleotides and DNA polymerase, and the DNA strands are monitored with a laser to detect the signal given off when a base is incorporated. Nucleotides are added one at a time — all A’s first, for instance — allowing Quake to use the same label for all the nucleotides, rather than four unique labels. After each base is detected, the nucleotide is photobleached to extinguish the signal before the next round of bases is added.
In Quake’s lab, researchers were able to sequence only short strands of bases — five to ten at the most, according to Tim Harris, the director of sequencing technology at Helicos. To commercialize the technology for resequencing applications, Helicos is working to increase that read length to at least 25 bases reliably, Harris says, and the challenge is to label the bases in such a way as to avoid interfering with the bulky enzymes required for the sequencing reaction. At this point, he says, it’s still a stretch to call the approach “genome sequencing” — at least technically. “Our first corporate objective,” says Harris, “is to write the first paper entitled, ‘single molecule DNA sequencing,’ where the editor of whatever prestigious journal we submit it to wouldn’t object to the truth of the title.”
Quake, for his part, is confident that a $1,000 genome will be possible within five years. “This is a very hot area with a number of groups involved, so there’s stiff competition,” he says. “It’s bursting with energy now and a $1,000 genome is not that far out.”
Others, such as Schloss at NHGRI, aren’t so sure. Single-molecule approaches have the potential to radically change the sequence game because they avoid the use of costly reagents for sample prep and DNA amplification, he says, but the ability to operate at scale is equally important. From Schloss’ perspective, sequencing a genome at a cost of $100,000 could be feasible within five years, but the $1,000 genome may take an additional five — even with the best efforts of the private sector. It seems the dream of personalized medicine will just have to wait.
Newbies under wraps
Given the avant garde nature of research into revolutionary sequencing technology, it’s not altogether surprising that some commercial efforts are hoping to stay under the radar while they test the mettle of their technologies. Here are three:
Nanofluidics, a Cornell University spinoff founded in 2000, has licensed zero-mode waveguide technology developed in the labs of Watt Webb and Harold Craighead, both professors of applied and engineering physics. Steve Turner, the company’s CTO, is a former postdoc at Cornell.
AQI Sciences, based in Bisbee, Ariz., was founded in April of 2003 to develop single-molecule sequencing technology, according to an e-mail from a company official. The company has not fully secured IP rights to the concept, but according to the company’s website the technology is based on fluorescent tagging and fluorescence resonance energy transfer technologies.
Seirad, founded by Joseph and Teresa Gatewood, aims to develop a sequencing instrument capable of fast genome sequencing at significantly reduced cost, according to the company’s website. While working as a researcher at Los Alamos National Laboratory, Joseph Gatewood studied gene expression in abnormal gestations using cDNA sequencing, performed neutron and X-ray scattering experiments on reconstituted chromatin complexes, and investigated alternative DNA sequencing methodologies. Seirad was founded in 1999 and is located in Santa Fe, NM.
The technology behind nanopore sequencing is at an earlier stage than comparable single-molecule approaches, but that isn’t to say it won’t one day catch up. Scientists at Agilent Laboratories, in collaboration with Harvard University researcher Daniel Branton and David Deamer at the University of California, Santa Cruz, are working to stream single strands of DNA sequentially through a nanopore while an immobile detector reads out the bases.
In theory, the different bases vary in how they interact with the pore, which can be constructed of either organic or inorganic materials, causing detectable fluctuations in the electrical conductance of the pore. Currently, the detector can distinguish between individual molecules of DNA, but has yet to achieve single-nucleotide resolution, says Jim Hollenhorst, the director of the molecular technology laboratory at Agilent Labs.
“This is not something we’re expecting to pay off next year or in a couple years — this is still quite far out. But if it works, we’ll be able to read very long DNA sequences” on the order of thousands of bases, Hollenhorst says. Ultimately, the technology could be capable of reading 100,000 bases per second through a single pore, he adds, and with parallelization, that number could increase to gigabases in minutes.
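The throughput figures can be checked with a few lines of Python. The per-pore rate of 100,000 bases per second is the one Hollenhorst cites; the 3-billion-base genome, the single-pass read, and the function names are assumptions made for illustration.

```python
import math

BASES_PER_SECOND_PER_PORE = 100_000  # rate cited by Hollenhorst


def pores_needed(total_bases, minutes, rate=BASES_PER_SECOND_PER_PORE):
    """Pores required to read `total_bases` within `minutes`,
    assuming every pore runs continuously at `rate` bases per second."""
    return math.ceil(total_bases / (rate * minutes * 60))


def hours_for_one_pore(total_bases, rate=BASES_PER_SECOND_PER_PORE):
    """Hours a single pore would need to read `total_bases` once."""
    return total_bases / rate / 3600


if __name__ == "__main__":
    gigabase = 1_000_000_000
    print(pores_needed(gigabase, minutes=1))
    print(pores_needed(gigabase, minutes=10))
    # Single pass over an assumed 3-billion-base genome.
    print(hours_for_one_pore(3 * gigabase))
```

At that rate, a modest array of pores would be enough to reach gigabases in minutes on paper. The calculation ignores error rates and repeat passes, which is part of why the approach remains far from practical sequencing.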
The problem, as Hollenhorst admits, is that currently Agilent and its collaborators are not yet doing DNA sequencing. “This is still a very high-risk endeavor, but we’re excited about it because there is long-term potential for several orders of magnitude higher throughput than the two orders of magnitude that people may achieve with other techniques in the nearer term,” he says.