Ten years ago, the institute sparked the sequencing revolution. Today TIGR is ready to expand into microarray analysis, proteomics, and genotyping.
By Adrienne J. Burke
When the Institute for Genomic Research got its start in rental space in Gaithersburg, Md., in 1992, not a genome had been sequenced, there was no such thing as a DNA microarray industry, and the word proteomics hadn’t yet been coined. Claire Fraser was 35 years old at the time, 10 years into her marriage to her former grad school instructor, Craig Venter. Fed up with careers in government medical research, and with NIH’s skepticism of his expressed-sequence-tag technique for gene finding, the pair set out on their own. In July 1992 they started TIGR with a 10-year, $70 million grant.
Nearly a decade later, the institute, now a 100,000-square-foot facility on 18 acres in Rockville, is famous for having sequenced the first whole genome of a free-living organism, Haemophilus influenzae, along with 20 other genomes, and for having identified nearly half the genes in the human genome.
Another 50 microbial sequencing projects are under way at the institute now, but straight sequencing is less and less what TIGR is all about. Claire Fraser, who has presided since 1998 when her husband vacated what he still considers to be the best job in the world, sat down with GT recently to talk about new directions the institute will take as it enters its second decade.
GT: As the president of one of the hubs of genomic sequencing, where do you see sequencing technology going within the next year or five years?
Fraser: We’ve become absolutely dependent on what the Amershams and ABIs are doing. Right now we have a sequencing facility that’s operating exclusively with [about 40] 3700s that have made a very big difference in what we do and how we can do it.
In terms of technology, we’re pretty much waiting to see what comes along, but for the time being we’re satisfied with what we have and what we can do. We and others have been able to continue to make improvements to existing technologies to drive costs down. That seems to be one of the most important issues, at least in terms of what we do with government funding because they’re very much looking at the bottom line.
How do you measure the cost of sequencing?
Fraser: It’s a per-reaction cost. We don’t amortize things per base. The cost to do a sequencing reaction is the cost regardless of whether you get 200 base pairs out of that or 700. That’s our currency. On average we run about 15,000 reactions per megabase.
And what does a reaction cost?
Fraser: You’ll get different answers depending on whom you talk to. You can look at bare minimum costs just for sequencing reagents and personnel, or you can look at the fully loaded costs with the associated informatics support overhead, etc. Our minimum cost-per-sequencing-reaction is now on the order of about $2. But that’s just to do the sequencing.
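As a rough illustration of the arithmetic behind these figures, the sketch below multiplies the two numbers Fraser quotes: about 15,000 reactions per megabase and a minimum of about $2 per reaction. The function names and the 2-megabase example genome are assumptions made for illustration, not TIGR's accounting, and the result covers only the bare-minimum sequencing cost, not the fully loaded cost with informatics and overhead.

```python
# Approximate figures quoted in the interview.
REACTIONS_PER_MEGABASE = 15_000      # "about 15,000 reactions per megabase"
MIN_COST_PER_REACTION_USD = 2.0      # "on the order of about $2"


def min_cost_per_megabase(reactions_per_mb=REACTIONS_PER_MEGABASE,
                          cost_per_reaction=MIN_COST_PER_REACTION_USD):
    """Bare-minimum sequencing cost (USD) for one megabase.

    The cost is per reaction, regardless of read length, so the
    per-megabase figure is simply reactions multiplied by unit cost.
    """
    return reactions_per_mb * cost_per_reaction


def min_sequencing_cost(genome_size_mb,
                        reactions_per_mb=REACTIONS_PER_MEGABASE,
                        cost_per_reaction=MIN_COST_PER_REACTION_USD):
    """Bare-minimum sequencing cost (USD) for a genome of the given size."""
    return genome_size_mb * min_cost_per_megabase(reactions_per_mb,
                                                  cost_per_reaction)


if __name__ == "__main__":
    # Hypothetical 2-megabase microbial genome, chosen only as an example.
    print(f"Per megabase: ${min_cost_per_megabase():,.0f}")
    print(f"2 Mb genome:  ${min_sequencing_cost(2.0):,.0f}")
```

On these numbers the floor works out to roughly $30,000 per megabase before any finishing or annotation costs are added.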
One of the things TIGR has done, perhaps more so than other places doing large-scale sequencing, is take the project all the way to completion: a finished, fully annotated genome. We think we need to be including all of those costs in there. It has hurt us a few times going up against other groups for sequencing grants where we know that what has been requested hasn’t been fully loaded costs. We’re as competitive as anyone else.
You mean that the competition isn’t including the bioinformatics costs? Is it that they’re not being honest with themselves about their costs?
Fraser: Yes, either that or they have alternative sources of funding to do other parts of these projects, or in fact they don’t have as large an infrastructure as we have for doing all the downstream annotation and finishing. There’s no other organization that has done as many finished genomes as TIGR has. That has cost something: to set that up, keep it running, and continue to get more sophisticated.
What have been the steps you’ve taken to get more sophisticated?
Fraser: The steps have been numerous. I look back to what we were doing initially for annotation with the earliest microbial genome projects. We were basically running BLAST searches and doing pairwise comparisons between predicted genes and whatever other sequences were out there in the databases.
Getting more sophisticated has been an ongoing process. It’s been doing things like building gene families. So you’re now not relying solely on pairwise sequence alignments, but you’re looking at a new sequence in the context of a much larger gene family, where critical amino acid residues essential to the function of the proteins in that specific family can immediately be identified. And that gives you much better information about whether something is a true hit or not.
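The difference between a pairwise hit and a gene-family check can be sketched in a few lines. The example below is a toy illustration, not TIGR's pipeline: the sequences, the alignment columns, and the "critical residue" table are all invented, and the sequences are assumed to be already aligned to the family. It shows how a candidate can score highly on pairwise identity yet still lack a residue the family requires.

```python
from typing import Mapping

# Hypothetical family data: alignment column (0-based) -> residues allowed there.
CRITICAL_RESIDUES = {2: "S", 5: "H", 8: "D"}

# Invented, pre-aligned toy sequences of equal length.
REFERENCE = "MKSAAHLLDG"
CANDIDATE_GOOD = "MRSAVHLIDG"   # diverged, but keeps S, H and D
CANDIDATE_BAD = "MKAAAHLLDG"    # nearly identical, but S at column 2 is lost


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned, non-gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def has_critical_residues(aligned_seq: str,
                          critical: Mapping[int, str]) -> bool:
    """True if every critical column holds one of its allowed residues."""
    for column, allowed in critical.items():
        if column >= len(aligned_seq) or aligned_seq[column] not in allowed:
            return False
    return True


if __name__ == "__main__":
    for name, seq in [("good", CANDIDATE_GOOD), ("bad", CANDIDATE_BAD)]:
        identity = pairwise_identity(REFERENCE, seq)
        keeps = has_critical_residues(seq, CRITICAL_RESIDUES)
        print(f"{name}: identity={identity:.0%}, critical residues kept={keeps}")
```

In this toy case the less similar candidate passes the family check while the 90-percent-identical one fails it, which is the kind of distinction a pairwise comparison alone would miss.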
We started to do phylogenetic analysis on top of that, asking, “If we think this sequence is an ortholog of enzyme X and you run a phylogenetic analysis, does it actually group in a phylogenetic tree with all the other known enzymes from other species?” All of these things — setting up the phylogenetic databases, the gene family databases, going to hidden Markov model analysis — have taken a lot of human intervention to build. They didn’t exist a few years ago and it’s an ongoing process because as new sequences become available, regardless of whether they’re microbial, plant, or mammalian, those can be fed into all these existing database resources.
We consider that part of the necessary overhead to stay at the forefront of what we’re doing. And that comes at a cost. But I think that the product you get out of the other end as a result is worth the investment because it means that today you can rely with a greater degree of confidence on what you get from TIGR annotation than you could four or five years ago.
Your microarray effort is a big area. Where is that going?
Fraser: This is one of those examples where, in fact, we dipped into our endowment to bring the technology in house. We did that fairly early, before the technology was as robust as it is today. We felt it was an important investment in TIGR’s future. Nobody came to TIGR to do sequencing long term. Everybody viewed it as a means to the end, or to the beginning. The goal was to get enough sequence data as quickly as possible to allow everybody to move on to the next stage, and it was clear that the microarray technology was going to be very powerful in doing that.
We invested early on and went through a steep learning curve, figuring out that bugs needed to be worked out of nearly every part of the process: the instrumentation, the slides, the DNA binding to the slides, the labeling of the RNA. There wasn’t a single part of the process that wasn’t problematic. But that learning curve was enormously valuable because by the time everybody else was saying, “There’s good equipment out there that we can go purchase,” we had worked through all the bugs, we had figured out all the pitfalls, and we’d actually begun doing the real experiments. That put us in a more competitive position than other people in the field, so it was an extremely wise investment.
[Now] we’ve probably earned back that initial investment 10 times over, if not more, in terms of grants that have come our way because we were really ready and had data to show that we were ready.
What was the initial investment?
Fraser: We were one of the early customers with Amersham Molecular Dynamics. I think it was a $1.6 million investment for a reader and a printer and that was a huge investment. These were still somewhat prototype devices. We learned that very quickly.
You recently announced that you would be getting into more functional genomics work.
Fraser: We got a contract from the National Institute of Allergy and Infectious Diseases at NIH to set up a Pathogen Functional Genomics Resource Center. This is a $5-million-a-year, five-year contract. One of the first goals will be to create microarray resources on perhaps as many as 10 different pathogens for the various pathogen research communities. That’s going to require us to set up a separate microarray facility because our existing facility is running at capacity.
The opportunity to comment on a draft RFA came in January 2000. It was something we felt we’d be in a very strong position to compete for and so we’ve been holding lab space in reserve all this time. We had a site visit as one of the final steps in the review for this contract and people from NIH came out and took a walk-through. We showed them these empty rooms and I said, “We’ve been doing a little wishful thinking and holding this space in reserve.” They said, “Well, a little overconfident perhaps?”
For those 18 months or so I had people coming into my office all the time saying, “I could really use that empty space” for this or for that. And I said, “No, if we really think we’re in the running for this contract we can’t give that space out. Because if we do we’ll never take it back from anybody and then we won’t have space to house it.”
Now we’re getting ready to break ground on a new building — a functional genomics building on our campus — that will be ready in 2003. The resource center will be able to expand, as will all of our other ongoing functional genomics programs. And it will allow us to consolidate all those efforts in one building.
What would you say puts TIGR at an advantage over any other organization building a functional genomics facility today?
Fraser: Probably the most important capability we have that would be hard to replicate somewhere else is the effort we’ve put into developing the databases and bioinformatics tools to integrate all the data starting with the genome sequence, gene sequences, protein sequences, and what we’re going to be getting out of functional genomics.
The instrumentation — anybody could go and buy from various vendors, but to really put this together at the right level you need to be doing it within the appropriate bioinformatics infrastructure. And in the microbial area I don’t think there are too many other places, certainly not within the US, that even come close.
Our proposal was to build upon what we have already done and modify it as necessary for the specific needs of the resource center. This is not only sequence-related bioinformatics but also a great deal of effort has gone into developing software and data-mining tools for DNA arrays.
Will functional genomics include proteomics?
Fraser: It will at some point. Probably it will also include genotyping. When the RFA was originally written the only thing that was clearly on everybody’s radar screen was the microarray technology. But it was written anticipating that there would probably be a demand for proteomics, for genotyping. It was also written in a way that made clear there might be other technologies that come on line that nobody today is necessarily anticipating. Because this is a five-year contract and five years in the genomics field is a long time.
Part of the efforts for the resource center will be to do ongoing R&D and be out looking for new technologies, bringing those in house, evaluating them on a small scale, and [deciding] whether it makes sense in the overall context of what we want to accomplish to bring [each one] on board.
Will TIGR abandon sequencing in favor of functional genomics?
Fraser: I don’t think it’s either/or. The size of the sequencing facility we have is really great because it allows us to be part of the sequencing effort at a large enough scale to feel like we are making a reasonable contribution without being a big burden. And we’ve made some specific decisions not to grow the sequencing facility beyond what we can accommodate in the initial space that we set aside.
Our capacity has gone up because we started out with 373s and replaced them with 377s and now it’s all 3700s. But we’re not going to find more space, at least not in the near future, to bring more sequencers on board. That’s a conscious decision that we’ve made because I made promises to the faculty that we will start supporting their efforts in the follow-up biology, in the functional genomics. So that’s where we’re growing things.
When this new building comes on line it may be that we can take some additional space that we’re now using for various functional genomics activities in our sequencing building and convert that to support space for sequencing. We may at that point decide that we want to grow sequencing if it looks like — whether it be microbial or eukaryotic organism sequencing — the money is just continuing to flow.
Do you have specific plans regarding technology purchases for the new functional genomics facility?
Fraser: No. I don’t think there will be any 2D gels. That’s the only thing I’m going to say, no 2D gels. That’s yesterday’s technology.
We don’t have to make a decision until we have the new building, which will be in 2003, because we won’t have space for anything before then. And this contract just began at the end of September. The first six months or so we’ll be doing all the new hires, getting things set up to be ready to get started. And then probably the next year or year and a half, we’ll be setting priorities … on what organisms will be slated for microarray analysis, how quickly do we bring them on line, etc. So our time will be occupied in getting the microarray resources made and out to the community.
Of all the organisms TIGR has sequenced so far, what’s the most interesting one? Your favorite?
Fraser: That’s not like asking me who my favorite child is, but it is like asking me, “Who’s your favorite dog?” And I can’t answer that because I have different favorites for different reasons. There have been a couple of times when PIs working on various microbial projects have come to me in the manuscript-writing phase, very discouraged, saying, “I can’t believe this bad luck. I got the one uninteresting genome. There’s nothing interesting in here.” And my answer has always been, “You haven’t looked hard enough, go back and look some more.”
There’ve been some interesting organisms because of their biology and Deinococcus radiodurans, the most radiation-resistant organism, is one of them. It’s the one that attracted a lot of attention; the public seemed to really get into this. Its chromosome falls apart and can put itself back together.
But what was disappointing in a sense was that after we finished the genome sequence it wasn’t immediately obvious why Deinococcus has this extraordinary DNA repair capability. And that’s really what it has — the ability to put its DNA back together again. The secret has to be encoded in the genome, but I think it’s in the 40 percent of the genome that we don’t know anything about in terms of function.
It’s not in all the DNA repair enzymes that we can identify with a high degree of confidence and that you find in every other organism. If that was the case, then E. coli and everything else would be radiation resistant and they’re not. So it’s probably something very unique to Deinococcus and we need to be looking beyond the genes that we can assign biological function to and start delving into all the unknowns.
It should have been obvious to us going in. It wasn’t. I guess for some reason we were thinking we were going to see this and it would all become clear. We definitely have the tools for figuring this out.
You can be very excited about an organism because of its biology, and then you get in there and can’t make sense of it immediately, which just shows how much we don’t know about biology. But that’s a good thing because it means we’ll be in business for a while.