Name: Kevin Ulmer
Position: Consulting scientist, Complete Genomics, since late 2008
Experience and Education:
• Founder, president, and CSO, Genome Corp, since 2007
• Founder, chairman, and CEO, Really Tiny Stuff, 2002-2007
• Consulting Scientist, Helicos BioSciences, 2004-2005
• Founder, chairman, and CEO, Pavonis, 1994-2001
• Consultant, Exact Sciences, 1995-1996 and 2001
• Founder, chairman, and president; later executive VP, CSO and director, Seq (later Praelux), 1987-1997
• Head, Laboratory for Bioelectronic Materials, RIKEN Institute for Physical and Chemical Research, Japan, 1986-1991
• Director, Center for Advanced Research in Biotechnology, 1985-1987
• Vice president of advanced technology, and other research positions, Genex, 1979-1985
• Postdoc, Department of Nutrition and Food Science, MIT, 1978-1979
• PhD in biological oceanography, Massachusetts Institute of Technology and Woods Hole Oceanographic Institution, 1978
• BA in biology and physics, Williams College, 1972
Kevin Ulmer has been a player in the DNA technology-development field for several decades. More than 20 years ago, he founded Seq, the first company to pursue single-molecule DNA sequencing.
Ulmer has also been involved with Helicos BioSciences, the first company to commercialize a single-molecule sequencer, and has recently joined Complete Genomics as a consulting scientist.
In the last few years, Ulmer focused on updating Sanger sequencing technology at Genome Corp, which recently closed its doors (see In Sequence 1/20/2009).
A few weeks ago, In Sequence spoke with Ulmer to get his view on the state of DNA-sequencing technology, and where the field is headed.
How has the ongoing renaissance in DNA-sequencing technology come about?
In many ways, it was what I call the bittersweet completion of the Human Genome Project that started this, and the realization that even with a few reference genomes, millions of SNPs, and a haplotype map, we were still unable as a community to understand complex common human diseases. That was really the justification for starting the Human Genome Project in the first place, and it has been the Holy Grail of the project from its inception.
And that led to the recognition that we were going to have to sequence human genomes by the thousands — or tens of thousands — just enormous numbers of them, to crack this problem. And thus the quest for the $1,000 genome, because for the production costs of the reference genome, that would just never be feasible.
So it finally brought the cost of sequencing center-stage in this field, which had never been the case before. During the entire Human Genome Project, cost was really never the primary driver in technology development. There was, obviously, interest in reducing cost, but it wasn't the prime motivation. Especially towards the production phase at the end of the genome project, once Celera [Genomics] appeared on the scene, it was simply a horserace to the finish line, a 'damn the cost, full speed ahead' kind of thing. Because cost was never central to that project, people were not trying to develop technologies to address that dimension of the sequence-production problem.
So now we have this new imperative: We have to sequence genomes in huge numbers at extremely low cost if we are going to understand the genetic components of common diseases. The other driver that's come up in the same timeframe is this hope that through knowledge of an individual's genome, you can do a much better job of medical care for that individual, as well as reducing aggregate healthcare costs for society. That's this new Holy Grail of personalized medicine and the emergence of personal genome projects that have accompanied that.
The general perception was that Sanger [sequencing] would never get there fast enough, that that technology was mature, people had been working on it for a long time, and it just did not look like it would ever get us to the $1,000 genome in a reasonable timeframe. That forced people to start looking at alternative methods, which were now made feasible because of the existence and the availability of the reference genomes, and to consider using technologies where you are going to map to the scaffold, and therefore start with much shorter reads and much-less accurate data. It opened the door to a lot of technology options, which simply were not tractable prior to having reference genomes.
Another piece was that Applied Bio had effectively become the monopoly vendor in the field, and they were really vulnerable to innovation at that point in time. They put everything, more or less, into Sanger [sequencing]; they were king of the heap, and really were not well positioned to offer up alternative solutions, which also provided the opportunity for others to enter that market space.
So what we have seen in the last five years is the commercialization of a handful of viable new alternatives to Sanger. And what we are now witnessing first-hand is the consolidation of that market, again, behind a few major players, including Applied Bio [now part of Life Technologies after merging late last year with Invitrogen], which is caught up in the back of the race with everyone else. And they are now in a frenzy of buying up and licensing companies and technologies to continue pushing the performance envelope.
Where is DNA sequencing headed in the next several years?
It's tough to have a crystal ball and make predictions, but nonetheless the starting point to that is that I have stopped worrying about the cost of sequencing itself anymore. In a keynote lecture that I gave at a next-generation sequencing conference in Providence in 2007, one of the slides I showed was the historical decrease in the cost of sequencing, going all the way back to 1970, when the first 12 bases of what's called the 'sticky ends' of bacteriophage lambda DNA were worked out in Ray Wu's lab at Cornell by methods that predated Sanger even. And since that time, it's been a nearly perfect exponential decrease in cost, approximately cutting the cost in half every two years or a little longer.
But what I posited at that meeting was that we were now going to be sitting at an inflection point with the arrival of these next-generation technologies and would soon witness a significant change in the exponent. Essentially, we are suddenly finding ourselves on a new, almost vertical slope, with cost changing by orders of magnitude in the same timeframe when they used to change by factors of two.
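The difference between the old and new slopes can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses hypothetical dollar figures, not actual sequencing costs; it only illustrates how a change in the exponent (halving every two years versus a tenfold drop every two years) quickly compounds into orders of magnitude.

```python
# Illustrative arithmetic only: the numbers are hypothetical, not real sequencing costs.
def cost(initial, years, fold_per_2yr):
    """Cost after `years`, given a constant fold-reduction every two years."""
    return initial / fold_per_2yr ** (years / 2)

start = 10_000_000.0  # hypothetical starting cost, in dollars

# Old regime: cost halves every two years. New regime: tenfold drop every two years.
for years in (2, 4, 6):
    old = cost(start, years, 2)
    new = cost(start, years, 10)
    print(f"after {years} yr: halving -> ${old:,.0f}; 10x/2yr -> ${new:,.0f}")
```

After six years the halving curve has reduced the hypothetical cost 8-fold, while the steeper curve has reduced it 1,000-fold, which is the "almost vertical slope" described above.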
So we are rapidly approaching the point where the production of the raw sequence data needed to map or assemble a genome will effectively cost nothing. It will become a commodity with very little opportunity for differentiation across vendors. When it costs nothing, it's hard to claim that yours is better. The competitive advantage will briefly shift to the front-end and back-end parts of the process — sample prep and genome assembly and mapping. But those are going to follow the same trend and rapidly become commoditized as well, to the point where all genomes, for all practical purposes, will be free.
And the challenge now moves to a much more difficult and intellectually challenging problem, of how to use those genomes to understand, predict, and prevent disease. That will become the new high ground, and that's where I'm focusing my energies.
The other comment [I want to make], as we watch this free-fall and these rapid advances, is [that we need] apples-to-apples, full-cost accounting, which has been lacking in the field.
You started the first single-molecule DNA-sequencing company, Seq, in 1987. What do you think are the biggest challenges facing technologies pursuing the single-molecule approach? And can you mention specific technologies?
To start with, what I would assert is that Sanger reads are still a gold standard in terms of read length and accuracy, but obviously not in terms of cost. That was why we at Genome Corp. said, 'Why don't we take the gold standard method that produces the best quality data, and figure out how to make it cheap enough to compete with these other methods?', whereas almost everyone else essentially looked for new technologies that were inherently cheaper to begin with, but lacked the read length and accuracy, and then tried to figure out how to use those. And we still think that the Sanger gold standard is viable, and are now looking to hand that off to organizations that have the resources necessary to validate that focus.
What I had bet in 1987 was that we were going to need something better than Sanger to sequence the genome. Obviously, I lost that bet; we did the genome with Sanger. But it's interesting to now see these single-molecule methods appearing. Over time, we have come to understand that each single-molecule measurement is inherently a stochastic problem, and it will be a struggle to achieve raw accuracies equivalent to ensemble measurements with single-molecule measurements. So it will continue to be a question of, 'How many times do I need to repeat the direct single-molecule measurement to achieve the same equivalent result as the ensemble measurement?' And in the end, have you gained anything, is there a net advantage to having done it at the single molecule level? You will have to repeat it so many times that you functionally create an artificial ensemble from your multiple single-molecule measurements.
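The "how many repeats" question can be put in rough quantitative terms. As a simplified sketch (treating each repeated read of a base as an independent binary right/wrong call decided by majority vote, which ignores the four-letter alphabet and any correlated, molecule-specific error modes), the consensus error rate falls off with the number of repeats:

```python
from math import comb

def majority_error(p, n):
    """Probability that a simple majority vote over n independent repeated
    reads calls the base wrong, given per-read error probability p.
    Ties (possible for even n) are counted as errors. Simplified model:
    binary right/wrong calls, independent errors."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n + 1) // 2, n + 1))

# e.g. a hypothetical 10% raw single-molecule error rate
for n in (1, 3, 5, 15):
    print(f"{n:2d} repeats -> consensus error {majority_error(0.10, n):.2e}")
```

Under these assumptions, a 10 percent raw error rate needs on the order of 15 repeated reads to approach ensemble-level accuracy, which is exactly the "artificial ensemble" trade-off described above.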
With Helicos [BioSciences], initially it was one read off of a single molecule. The error rates that were published were extremely high, and so they came up with the notion of resequencing the same molecule a second time, which allows you to further reduce that error by a substantial margin, but with the penalty that it doubles the time and the cost of your sequencing. And then there is the question of the overall yield of the process if you have to go through it twice — not everything resequences perfectly the second time around, so there are inherent losses associated with that.
With Pacific Biosciences, their promise is that they have little circles, and they just go round and round the circle enough times to make the measurement [and] to get these accuracies. So it's going at it in a different way, but it is effectively facing the same fundamental challenge of how many single-molecule reads, with all the quirky idiosyncratic behavior that single molecules display, do you need to get equivalent quality data? And what's the cost penalty for doing that?
Oxford Nanopore [Technologies], interestingly, is going back to the same exonuclease-based approach that we were developing at Seq, and that Jim Jett and Dick Keller and colleagues were working on at Los Alamos [National Laboratory] starting back in the '80s, replacing optical detection and discrimination of single nucleotides with an electrical detection scheme.
In looking at what Oxford is proposing to do, you wonder what the timeframe is [for going from] single-molecule, single-nucleotide proof-of-concept-type academic demonstrations to having a commercially viable product. But the appeal of both Oxford and PacBio is that both are trying to increase the read length substantially.
In the case of Oxford, that means going back to what we were originally going after [at Seq], which was reads on the order of the size of lambda DNA molecules, 50-kb-type fragments, which would obviously tremendously simplify resolving structural rearrangements and heterozygous positions in haplotypes within human genomes. So it's a laudable quest, one that we certainly understand. The question will be, as these other technologies rapidly approach zero cost for delivering genomes, what their relative advantages will be, and what the timeframe for developing those will be.
All the other technologies are going to ride on the same set of engineering solutions and commercially available components and technologies to make their systems faster. Everybody is going to be dependent on cameras, how many pixels and how fast can you read those pixels, [et cetera]. But it's not as if anybody is going to have a unique position there; everybody is going to be able to exploit improvements in camera performance in different ways, and then it depends on how efficient you are at using those pixels in your particular sequencing methods.
The real-time methods, like 454 and what PacBio is developing, essentially have to dedicate pixels to watching things happen in real time, which has inherent inefficiencies associated with it. Those who are looking at sparse fields of data, where you have clusters or single molecules randomly distributed on a surface, such as the Illumina platform or the Helicos platform, are looking at the sky at night. They are interested in the stars, but most of what they are looking at is the black space in between. They are not inherently very efficient, either, whereas the approaches that use either close-packed arrays of beads or, here at Complete, arrays of DNA nanoballs, essentially pack the samples into as tight a configuration as possible, and allow the cameras to be used most efficiently in terms of the number of pixels you need to read a base.
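The pixel-budget argument can be sketched with a toy calculation. All numbers below are hypothetical, not vendor specifications; the point is only that, for the same camera and the same per-feature footprint, the fraction of the field actually occupied by features sets the throughput:

```python
# Toy pixel-budget comparison; all numbers are hypothetical, not vendor specs.
CAMERA_PIXELS = 4_000_000  # pixels read per imaging frame

def features_per_frame(pixels_per_feature, occupancy):
    """How many sequencing features one frame can image, given each
    feature's pixel footprint and the fraction of the field occupied
    by features rather than empty background."""
    return int(CAMERA_PIXELS * occupancy / pixels_per_feature)

sparse = features_per_frame(pixels_per_feature=4, occupancy=0.05)  # random single molecules
packed = features_per_frame(pixels_per_feature=4, occupancy=0.90)  # close-packed array
print(f"sparse field: {sparse:,} features/frame")
print(f"packed array: {packed:,} features/frame ({packed / sparse:.0f}x)")
```

With these made-up numbers, the close-packed geometry reads roughly 18 times as many features per frame from the same camera, which is the "stars versus black space" contrast in concrete terms.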
What are the inherent challenges of exonuclease-based sequencing, the approach Seq was pursuing?
[At Seq, later renamed Praelux,] we demonstrated the single-molecule digestion of lambda-sized DNA molecules with lambda exonuclease; [this was] work published by Johannes Dapprich.
Those enzymes, as they exist today, are perfectly capable of chewing up 50,000 bases of sequence in a totally processive fashion, and releasing those single nucleotides from a duplex strand of DNA. They will, in fact, do better than that. So the prospects for doing extremely long reads are very real; the challenge, obviously, is then detecting and identifying these cleaved single nucleotides [and] retaining the proper sequential order in which they were released. Just like all enzymes, they are not perfect clocks; they don't chew off a single base at precisely the same rate, and they are context-dependent and secondary-structure-dependent. But for all practical purposes, they do the job.
The optical methods that we were pursuing at Seq, with technologies that are much older than what we have currently, would be another viable approach. The challenge in what Oxford is proposing is that they have to create this supramolecular structure that is the combination of an exonuclease [and] a pore [with] these cage molecules, the cyclodextrins, where everything is put together the right way. And then we still don't know exactly what kind of single-molecule signals we are going to get from such a system, or what its intrinsic error properties will look like.
You trade off other things: you are not going to have the photobleaching problems that are inherent with optical methods, but you will presumably encounter a whole hornet's nest of other kinds of issues and problems associated with the electrical detection that are difficult to really anticipate until you get something working. On the one hand it may appear to be simpler; you say, 'Oh, we just make an electrical measurement,' [the way] patch clamp [recording has] been done for many years. It's not that simple. And then the additional questions will be, even if you can do that at the single-pore level, how does this technology scale? How robust can you make it?
It's a long path to commercialization, even once you have demonstrated the proof of concept. And again, you are going to be playing catch-up in a world where we are rapidly approaching commodity costs for sequencing. What resources will be there to develop yet another technology for producing a commodity product?
Five years from now, do you think several of these new technologies will co-exist in different niches for different applications, or do you think there will be one dominant player?
That brings me to the difference between the service model and the instrument-and-reagents model. I obviously have been preaching the notion that [sequencing] is a service business; it's not an instrument and reagent business, for 20-plus years now.
By analogy, what you are seeing at the moment is [that] the vendors building these next-generation systems are trying to build Formula One race cars for street use by average drivers. You can see the difference between the performance Illumina has achieved in-house, with their own people and in their own hands, and that of the genome centers, which struggle to achieve comparable performance. And the poor labs that only have one of these machines are even further behind. So it's obviously hard to make these things run at peak levels of performance in the hands of average users in average labs.
So the notion that you can develop a system that's user-friendly, consistent and reliable, and bullet-proof enough to consistently give you that peak performance in the hands of an average user costs you a lot in terms of what you have to build into that instrument. The service model eliminates that. You go back to the fine print you see in the car ads, where the cars are zooming around all the hairpin turns on the mountain and it says, 'Professional driver on a closed course.' It essentially allows you to have your in-house pit crew keep the race cars running at peak performance. Again, by analogy, there are not a lot of NASCAR drivers, but there are millions of NASCAR fans who watch those races. The service model really transforms that: you don't have to worry about being a professional race driver. You just have to watch the race and enjoy the benefit of the output.
Let's talk about your new role at Complete Genomics.
It starts with this shared vision: an extremely low-cost, complete, and accurate human-centric genome service that is scalable and sustainable, which have been the missing factors in that business model for a long time. What I can tell you, having been here now for a month and a half or so, is that they have a technology in hand that [serves as a] road map [to] deliver that service in a timeframe and on a scale that I don't think anyone else can match, and you will see the proof of that in Cliff [Reid's] talk at [the Advances in Genome Biology and Technology meeting this week]. So I have effectively voted with my feet for what I perceive to be the winner in this space, and I am helping them with the scale-up of this first genome center, based on that model and that technology.
It's a service: if you have samples, a good hypothesis, and a grant to pay for them, you don't have to invest in the capital and the infrastructure. You send your sample off to someone else who can deal with that. And in particular, [we] view it end-to-end, everything from the sample prep through that genome mapping-assembly-annotation-comparison piece, which really appears to be becoming the Achilles' heel of these next-generation technologies. They produce prodigious amounts of data, but everyone seems to be struggling with the back end, especially those who don't have access to the kind of infrastructure that exists at the Sanger, or the Broad, or WashU, or Baylor.
Will the sequencing technology used by service-model companies like Complete Genomics change over time?
The technology is going to continue to improve even once it has become a commodity, and people are going to be worrying about how to shave a penny of profit out. So the cost focus becomes dominant and, yes, people will continue to upgrade and replace. And we will probably see generational switches in those technologies as well.
But the focus will no longer really be around that as it is today. Once the cost of producing genomes, in terms of the raw data, effectively goes to zero, the bottleneck is no longer the cost of acquiring the data [but] the cost of annotating and analyzing that data, performing the disease studies or the cancer genome studies, and ultimately figuring out how to save the healthcare reimbursement organizations money.
Unless we can demonstrate how to save money in the long term, and do a better job at improving quality of life, and reducing mortality and morbidity, through use of genomic information, the whole thing doesn't hang together. So there is a lot to be done between just being able to sequence genomes by the thousands and delivering on that promise.