Associate Professor, Department of Bioengineering
Name: Annelise Barron
Position: Associate Professor, Department of Bioengineering, Stanford University, since 2007
Experience and Education:
— Assistant, associate, and full professor, Department of Chemical and Biological Engineering, Northwestern University, 1997-2007
— PhD in chemical engineering, University of California, Berkeley, 1995
— BS in chemical engineering, University of Washington, 1990
In a recent review article in the journal Electrophoresis, Annelise Barron, an associate professor of bioengineering at Stanford University, discusses the advantages and limitations of the new sequencing technologies, which unlike traditional Sanger sequencing do not use electrophoresis. In Sequence spoke with Barron last month about the future of electrophoresis-based sequencing.
Is there still room for electrophoresis-based sequencing? What is going to be its niche in the future?
At the current time, 454 is delivering 400-base reads. Electrophoresis can deliver, say, 900-base reads. And the hypothesis that I have is that for the correct assembly of a complex mammalian genome — whether it’s a normal human genome or a cancer genome — we can’t really know in advance that the current scaffold is going to be close enough, until we actually obtain a correct assembly. The data from Craig Venter’s genome indicated more large-scale rearrangements than expected [compared to] the reference human genome. And my friend David Schwartz, who is a professor at the University of Wisconsin-Madison, has been using optical mapping to look at cancer genomes. He has seen really astonishing rearrangements in some of the cancer genomes he has looked at.
Let’s start from the assumption that assembly could be demanding and could require longer reads, because of the high fraction of degenerate repeats in the human genome, which makes it hard to place things in the correct position with very short reads. The hypothesis is that, if we wish to start de novo and not assume anything about the final assembly, then there must be an optimum mix of long reads and short reads according to three deliverables: you want the lowest possible cost, you want the highest possible accuracy for coding and regulatory regions, and you want the fewest and smallest gaps.
One of my colleagues at Stanford University in computer science, Serafim Batzoglou, writes assembly algorithms. He wrote, for example, Arachne, which is one of the best algorithms for shotgun sequence assembly. I asked him, ‘Can you calculate the optimum fraction of long reads and short reads?’ and he said, ‘I could, and I agree with you that there is one, but it would take many years, because we would have to, essentially, write code from scratch to determine that.’ But if you think about it, that makes perfect sense. There will be an optimal approach to de novo genome sequencing, which will use some fraction of long reads.
But I think we all agree that current Sanger sequencing by 96-capillary array instruments is just ridiculously expensive per base. [Although] I don’t really understand what the true cost of anything is yet, because I don’t think that the true cost has yet been reported, including instrument depreciation, overhead, [et cetera]. Not just reagents — that’s the only cost we are hearing about. Instrument depreciation, for example, was the largest expense to the genome centers when they were using capillary instruments. It was 70 percent of the cost of the sequencing process, according to Elaine Mardis at Wash U.
I think that if Sanger can be engineered to have, for example, 50,000 channels in parallel, but with a pooled sample prep — very similar to what 454 uses — you reach an economic goal that is very similar to the next-generation technologies.
I think you could reach a $1,000 genome or cheaper if you had 50,000 channels and a 10-minute turnaround time on your reads. Remember, I have shown that you can get 600 bases in 6.5 minutes. That’s in a matrix, and I’m thinking about free-solution bioconjugate sequencing as the preferred way to do this, so you don’t have to use the gel.
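The arithmetic behind that throughput claim can be sketched. The channel count, read length, and turnaround time below come from the interview; the genome size, fold coverage, and the resulting timing are editorial assumptions for illustration only:

```python
# Back-of-envelope throughput for the hypothetical 50,000-channel
# Sanger instrument described above. Genome size and coverage are
# assumed values, not figures from the interview.

CHANNELS = 50_000        # parallel electrophoresis channels (from interview)
READ_LENGTH = 700        # bases per read (from interview)
CYCLE_MINUTES = 10       # turnaround time per run (from interview)

bases_per_run = CHANNELS * READ_LENGTH        # bases generated each cycle
runs_per_day = 24 * 60 // CYCLE_MINUTES       # cycles in 24 hours
bases_per_day = bases_per_run * runs_per_day  # daily raw throughput

GENOME_SIZE = 3_000_000_000  # assumed human genome size, bases
COVERAGE = 8                 # assumed fold coverage for assembly

days_per_genome = GENOME_SIZE * COVERAGE / bases_per_day

print(f"{bases_per_run:,} bases per run")
print(f"{bases_per_day / 1e9:.2f} Gb per day")
print(f"{days_per_genome:.1f} days per {COVERAGE}x genome")
```

At these assumed parameters the instrument would produce about 5 Gb of raw sequence per day, so the per-genome economics would hinge mostly on reagent and depreciation costs rather than run time.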
We are up to 250 bases in my lab now with no gel. We have not published that yet, but we will soon.
Is anybody working on 50,000 channels? When do you think this will be possible?
Some people I know are doing feasibility testing for the necessary detector. I don’t think this would take very long. The beauty of that approach is, you are basically using Sanger, just without a gel and everything else. If the detector works, there is no mystery there.
Where is the cost-cutting going to come from?
The companion technology to that is a single-tube prep for all the 50,000 channels. So it would be, basically, a pooled prep like 454 uses. If you think about it, what’s really expensive for Sanger is reagents, because you have to use a single tube, or well, for every single sample, and that means some reagent in every single well, and you are just throwing away and not utilizing efficiently huge fractions of your reagents.
Do you imagine this technology would have many users?
I think it would have many users. Let’s imagine an instrument that costs $250,000 and basically gives you 700-base reads in 10 minutes, in 50,000 channels in parallel, and you can do a sample prep that’s very low in cost and simpler, perhaps, than 454. So there is no downside there; you get the same result in the end, but you have longer reads, so in fact, you get a better result. I’m hypothesizing because such a thing doesn’t yet exist, but it could.
In your review, you also talk about several groups that are working on integrating the different steps of electrophoresis-based DNA sequencing into a single microfluidic platform. What problems do these groups still have to overcome, and how long do you think this will take?
Those people are working on integrated preparation of samples for genomic analysis on microfluidic devices. At this time, the groups closest to reaching high throughput would be the Mathies group [at the University of California, Berkeley], Microchip Biotechnologies, and Network Biosystems. But because those still require a sample-by-sample prep — so every channel that has an injection needs its own sample prep — they are not really going to approach the throughput and cost reduction of the next-gens.
But that could be extremely useful for medical sequencing. You don’t necessarily always want to sequence an entire genome. You sort of have to spend, at this time, $7,000 if you are working with 454, and you get the whole genome. What if you want 10 exons, and you want to spend 4 cents each? That’s the kind of thing a doctor might want. I think that the advantage of the electrophoresis technologies is [that] they are scalable in that way; you can do it on a per-channel basis. And that is much more suited to looking at limited gene regions for individual patients.
Are you associated with any companies developing new electrophoresis-based sequencing technologies?
I was a collaborator with Microchip Biotechnologies on a [National Human Genome Research Institute] $100,000 genome grant, which ended a year ago. I basically provided wall coatings and polymer matrix for their chip system in that project. The PI was Stevan Jovanovich.
[Also,] Kevin Ulmer and I started Genome Corp. Genome Corp envisioned this factory for sequencing that would run 24/7 like a printing press, never stopping, the world’s cheapest Sanger, which is a beautiful idea and different, in fact, from what I have described to you as another possible future for Sanger technology.
When do you think the $1,000 genome will be here?
Kevin [Ulmer] has this graph — basically, his graph is a log-scale dollar-per-finished-base-pair vs. date. If you follow that line, and you extrapolate it out, the graph says 2025 for [the] $100,000 [genome] and 2040 for [the] $1,000 [genome].
[But] I will here defer to Socrates who said ‘I know that I don’t know.’ I know that if a genome costs only $5,000 — if Complete can really deliver that, even if they are just delivering it at a big loss — I’ll probably buy my genome.
I’m actually really excited about the new technologies, and enjoying them, and watching this so much, and I can’t wait to see what’s true and what’s not true. Time will tell, because you can only hype for so long, you can only promise for so long. Obviously, for the sake of all the funds that have been invested by venture capitalists and the [National Institutes of Health], I hope everything works. I think there is going to be room for many, many different types of sequencing technologies. I just think that we will be sequencing in a myriad of different ways in the future, more than we can anticipate now. Perhaps [the movie] Gattaca was more accurate than one would have thought.