Why genetics has shot its wad, sequencing will drop to 30 cents per lane, and genomics will be used to predict weather
It’s been four months since Craig Venter announced plans to open a new sequencing center in Rockville, Md. Most of the 79 staff members have been hired, and the new facility at 1901 Research Boulevard, squeezed between The Institute for Genomic Research and Celera Genomics, is operational.
At a time when most academic centers are rolling back their sequencing efforts, Venter, the guy the genomics community loves to hate, has upped the ante — again. While his peers are trying to tweak traditional capillary-based sequencing, Venter has set his sights on the $1,000 genome and is hoping to nurture novel sequencing technologies. He’s already taken one young company, US Genomics, under his wing, and made contact with at least three other up-and-comers whose technologies he hopes will help achieve the cheap genome.
Venter spent two hours in the GenomeWeb suite during TIGR’s annual Genome Sequencing and Analysis Conference in October, talking off the cuff to a roomful of GenomeWeb and GT reporters and editors about his latest visions. Venter, who turned 56 in October, chatted about the center and his new research institutes; NIH funding missteps; de novo versus resequenced human genomes; and how genomics will affect meteorology. And, as you’ll see, he managed to squeeze in another reference to his own famous genome.
So you think there’s a need for a new sequencing center?
We clearly need better techniques for interpreting the genetic codes that we have. … Everyone bought into all the hype in the press and everywhere else about the Human Genome Project and the NIH centers are talking about post-genomics. Well, there never is a post genomics. We’re in the genome era, and we’ll be in the genome era for the rest of human history. We have this fallacy that we sequence a genome and put it on a pedestal and admire what we’ve done, but it’s a question of how to use it to better medicine and humanity and our understanding of science. We need a lot more genomes to do that.
Genetics has shot its wad in terms of the ability to use linkage tools to find the genes associated with human traits. The best way to do that going forward is … to have 10,000 human genomes right now with clear phenotypic correlations, with clinical records, [with] characteristics of these individuals, and doing genotypic correlations. One [genome] is almost useless other than as a reference.
We don’t know how to look at 30,000, let alone 300,000, different proteins simultaneously working in a group of cells. We don’t understand how to interpret our own genetic code. We don’t understand how to look at human variation across even the tiniest sampling of even people in this room, let alone whole countries and populations.
There’s been a lot of press about you asking wealthy people to pay to have their genomes sequenced. Is that going to be the main source of capital for your new center?
It would be very unrealistic of me to think that [NHGRI] would fund us to sequence 1,000 genomes, even though to me that’s the next logical step in this field. I don’t have enough money to fund that either. We’re using our foundation money to jumpstart it by paying for the facility and the equipment to get it going. But we’re asking the philanthropic community, ‘How about funding 100 genomes for patients with diseases that you care about or ethnogeographic groups … or in some cases you yourself or your family as part of a legacy?’
Everybody would have their data be part of a database that would be used for genome analysis in comparing clinical records, genotype/phenotype correlations — obviously in an anonymous fashion. They’d be part of the great scientific experiment and the great tradition of individuals helping to fund science instead of just relying on taxpayers and corporate initiatives.
You said your goal is to sequence for 30 cents per lane. How are you going to do that?
By using a variety of incremental new technologies and low overhead and better efficiencies. What is lacking is the competitive environment to really drive the cost down. If someone sent you a $20 million check each year to sequence things no matter what level you are doing it at, you’re not really incented to try and do it for half the price.
That’s why continuing sequencing in multiple centers — large, small, independent — creates a competitive environment. Just think if we could do sequencing for 30 cents a lane and some of the new technologies put out 1,000 base pairs; that’s a tenfold change. That means we should be able to do 10 genomes for the cost of the public effort doing the mouse genome. Without massive changes in technology that’s an order of magnitude difference in what can be accomplished.
What gets you more excited: this idea of being able to resequence human genomes, or being able to do de novo sequencing?
The term “resequencing” is actually a misnomer. We’re not resequencing. We’re going to do de novo sequencing from 1,000 people. Your genome has not been sequenced as far as we know, so it’s not really resequencing. Each one is a major discovery process as we start to understand the true nature of human genetic variation. Some people just want to remeasure one single nucleotide chain thinking that will explain biology or disease, when getting complete sets of haplotypes has not even been considered possible.
It’s a totally different approach than the genome center at NIH is taking in just trying to get a small set of composite haplotypes, as though that would explain your genetic code or mine. It’s all de novo sequencing. Unless I resequenced my genome, then that would be truly resequencing.
Do you mean that what you’re doing is a really different technique from what other people are doing?
I don’t know of any other human genome sequencing going on. We’re going to be doing comparisons where you have 1,000 sets of genes. Obviously you line them up, but our lines will be on the order of a kb each, and they’re going to be from specific PCR primers, so we know where we’re starting to begin with. We don’t need the backbone of the rest of the data as an interpreter. But the ability to do that in the first place rests on having done the genome once.
And what will you do with the data that you generate? Will that become a public database?
It’s certainly not going to be a secret database. It’s not clear that it makes any sense to put individual genome sequences in GenBank. It will certainly be a very powerful tool for us and our collaborators to use for making medical advances and interpretations. It will probably be available in some form or another. It’s such a massive amount of data. It’s not clear that anybody is equipped to do anything with it. We haven’t thought that far along other than that it’s not being tied up for any commercial purposes. Nobody has ever tried to publish 1,000 human genomes before.
You’d rather go the route of getting philanthropists to fund your work than getting, say, a pharmaceutical coalition to fund it?
I don’t think there’s anything in it for the pharmaceutical companies. They’d probably be one of the last groups that would know how to intelligently use this data. They want to discover new drugs and understand how to get better predictions of how to go through clinical trials. I think it’s a false notion that this would really help that process.
Genetic variation is associated with toxicity. If that could come out of this that would be one of the most valuable things ever to the pharmaceutical industry. We have a difficulty with interpreting this data. A lot of people are looking for yes/no answers. “On-off.” Certainly with diseases either you have them or you don’t. You don’t have 30 percent of the disease.
But in terms of the risk, you can have a 30 percent increased risk for disease or for a trait or for an outcome. That’s not very useful in predicting which are the right drugs to give you. But if you know you have a 30 percent chance of having a severe toxic reaction from a drug or dying from it, there’s no clinical rationale, unless it’s the only drug that could save your life, to give you that drug.
I don’t think pharmaceutical companies would necessarily be making those association studies. That would come once we had the first 10,000 genomes and could start to look for patterns.
That’s where more data will drive this whole field. It will reach a certain point and it will take off. And science and medicine will never be the same again. You can’t do those statistical correlations on five or 10 genomes — 1,000 is nowhere near enough, but it’s a start. If I knew how to immediately sequence 10,000 genomes and knew where to get the funds to do it, we’d do it.
How did you choose US Genomics’ technology over those being developed by other emerging sequencing technology companies?
Well, I’m talking to all of those groups. My goal is to get the next best technology that’s going to allow us to go to the next phase. For most aspects I don’t care which of those groups is the winner. I hope there are several winners to choose from, actually, so there is intelligent competition.
I was exposed to US Genomics’ technology and I was quite impressed by it. … It’s pretty hard not to get excited about seeing balls of DNA in the globular form go through those little posts and get stretched out in the linear form and see that happen to single molecules — and that they can map molecules in seconds or fractions of seconds! That gives me great hope that processing data in a different way than we’ve been doing is feasible.
They need another one or two orders of magnitude to get down to base-pair resolution, but that gives me and other people great excitement that there’s totally different ways to do things. Whoever thought of nanotechnology to stretch out chromosomes?
Part of our new center is a new technology test center. We would like to test everybody’s technology. We want to help drive the winners forward.
In light of this vision, how are you advising TIGR on technology purchases and planning?
The new center we announced is a joint sequencing center for TIGR, TCAG (The Center for the Advancement of Genomics), and IBEA (the Institute for Biological Energy Alternatives). Instead of just increasing the TIGR center [to meet] the demand [for] microbial sequencing, we decided it would be most cost-effective to build one supercenter. This new center will be jointly controlled by the three not-for-profits. For all practical purposes it is the TIGR center: all high-throughput sequencing at TIGR will move over into the new center, and the TIGR facility will become a specialized, very efficient closure facility.
Have you decided whether it will be ABI’s or Amersham’s sequencing technology that gets the blessing there?
We’re getting down to the wire with both of them. It’s an expensive wire. I’m getting choosy in my old age.
How about on the computer side?
We’re talking to the top computer manufacturers. It’s sort of like déjà vu. We want to do things differently this time in the sense that the 1.5 teraflop computer we built at Celera was built to assemble the human genome. It was purposely built: the Alpha Chip was three times faster doing the kind of calculations we needed than anything else.
It’s not an economical way to build the kind of compute facility we need going forward. Right now we want to build something that is replicable so any major medical center in the US or around the world can have a chance to do the same level of computing. It means changing the infrastructure so it doesn’t require the massive air conditioning. The room at Celera cost $6 million before you put the computer in. If, for any hospital to interpret the genetic code of their patients, they need a $100 million computer, this is not a revolution that will go very far.
We’re looking at these new green machines being considered at the DOE that have lower energy requirements, therefore produce less heat, therefore require less air conditioning. We’re looking at ambient temperatures, massively parallel processors. We’re trying to come up with almost the opposite of what we did at Celera: simple, cheap, replicable supercomputing.
You had mentioned one goal for the future being the possibility of using whole genome shotgun for sequencing entire environments. Are there any other new ways you see using this technology as cheaper, faster sequencing becomes available?
Part of what we’re going to do at the energy institute is a shotgun sequencing of the Sargasso Sea to see [if we can] get the genome sequences of all these unculturable organisms. All of a sudden genomics goes from looking at that 0.1 percent of what we’ve cultured and measured to being the avenue for understanding the rest of the biosphere that’s out there.
We’re also talking to some scientific collaborators about doing an atmosphere shotgun sequencing project. Imagine how [new sequencing technologies] will change the sciences of ecology and monitoring environments in terms of toxicity, emerging infections, biological warfare, anything in our environment. Biology could be the number one method for predicting weather in the future if we could really measure these changes in their dynamic state and understand the biological cycles of the whole planet.
That requires massive computing, high-throughput sequencing. DNA is the one thing that unites all of us as a species and if we can understand what’s changing dynamically we just might learn something worthwhile.
This interview was conducted by Adrienne Burke, Mo Krochmal, Kirell Lakhman, Kathleen McGowan, Meredith Salisbury, Aaron Sender, and Bernadette Toner.