BioInform caught up with NHGRI director Francis Collins after his talk at Bio-IT World last week to find out a bit more about where bioinformatics fits into the NHGRI’s future. The full text of the interview, edited for length, follows.
How has the role of bioinformatics changed under the NHGRI during the course of the Human Genome Project, and how do you see this role changing in the future?
I arrived at NHGRI just about ten years ago with a fairly naive view of the role bioinformatics could play. I had a sense that it was going to be really important, based on the little experience I’d had running a genome center in Michigan before joining the NIH. But I don’t think anybody at that point could really envision how important this was going to be: what kinds of tools and resources, software and hardware, were going to be critical to the success of the genome project.
That’s gradually been evolving. In our portfolio, it’s the area that I think in many ways has been growing the fastest, and actually where I have some anxieties about how we’re going to keep up with the resource needs. You can’t just assume that because the data’s been produced, it’s going to be well-curated and maintained, and put out there in a fashion that offers easy access and that will also recover all of this fabulous information that’s been generated throughout the world and put it into an accessible form that working scientists can use.
My sense of the importance of this grows by the day.
At the Airlie meeting in November, there was a concern about how bioinformatics was initially represented within the larger context of the NHGRI’s future plan. Have those concerns been addressed by the current version?
At the Airlie House meeting, I think there was some misunderstanding, and it was probably in part due to the metaphor that we were using at that point. The metaphor was not a house, it was three pillars, and the pillars seemed to be standing alone and seemed not to have much connecting them laterally. And I think that image troubled people, not just because it appeared to dismiss bioinformatics, but also because it seemed to dismiss a lot of other things, too, like training and ELSI issues.
So once we quickly figured out that the metaphor wasn’t working, and developed this alternative of the house, by the end of the Airlie meeting, computational biology had already appeared as a prominent, vertical cross-cutting element of the house that genome built, or will build. And since that time I have not heard any sense of anxiety from the computational biology community that their importance was being under-appreciated. In fact, they may be a little worried that we’re putting too much on their shoulders!
Will this new structure impact how NIH and NHGRI fund bioinformatics projects?
Bioinformatics projects come in all different sorts. There are certainly projects that aim to set up and maintain large databases for particular organisms, like SGD [Saccharomyces Genome Database] or Flybase. A lot of the database support and maintenance comes from the NCBI, but I think it’s been a good thing that there are lots of other ideas and sources out there for people to go to, and we’ll continue to encourage that kind of diversity.
We recently supported the coalescence of the Swiss-Prot and the PIR databases, and we’re calling it UniProt, which I think a lot of people have high hopes for as a really good one-stop-shop for proteomics data, which can be confusing and hard to get at sometimes.
So that’s one type of computational biology application we might get. Those are usually large and expensive, but very important. But then we also get lots of grant applications from investigators who have an idea about deriving knowledge from a huge mass of data, and those are often individual investigator grants, perhaps three or four people developing a new algorithm and trying it out. We support a lot of those. Some of them work and some of them don’t. That’s the way it’s going to be.
Then we have this Encode project that I talked about, and there’s lots of opportunity there for new ideas and computational approaches to genomics, and I’d like to see some really good applications there as well.
I would be open also to having centers of excellence in computational biology. We’ve been open to that for a while. There is an NIH-wide interest in that, the so-called BISTI [Biomedical Information Science and Technology Initiative] program, which is specifically focused on trying to encourage broader and more extensive computational biology approaches. There is continuing momentum and enthusiasm, particularly now coming from [NIH director] Elias Zerhouni, for seeing that as a significant focus of NIH for the future.
In the case of BISTI, it seems it would be a difficult undertaking to centralize all the various bioinformatics activities going on at NIH.
BISTI has several components. It’s run by NIGMS, and they’ve done some good things, but I don’t think even they would say that this has matured into what it could be. I think we are sort of learning as we go along. It’s likely to be a continuing vehicle for encouraging the field.
Does NHGRI plan to have its own bioinformatics component?
I think we have all along. Certainly, this new plan, with bioinformatics as one of the cross-cutting elements, emphasizes how critical we think that is. When the new plan gets published, it’s worth looking carefully at the particular description of what we think falls under that category. There are quite a number of things that we’d like to see studied in more detail than what’s happened so far.
There was an open meeting held for the Encode project in early March. How did that go?
We had a great response. It was a really good meeting. About 75 people turned out, and it included quite a distinguished group of creative leaders in the field. Everyone’s pretty invigorated about this idea of all getting together and working on this problem.
I have yet to describe this project to anybody and not have this sort of sense of, ‘Oh wow, that sounds like a lot of fun!’ Because we’ll have all these different technologies and perspectives brought to bear in a very focused way on this 30 megabases, and we’re going to learn a prodigious amount about how these various approaches are complementary, or at times complementary. That is one of the big questions in trying to make sense out of the genome, and this is a central part of our effort.
Have you seen much interest from the commercial community in this project?
Oh yeah, a lot of people in the room were from industry. Some of them had already indicated that they wanted to be part of this. Some of them are not even that interested in asking for money because it’s stuff that they’re doing anyway. They want to have access to the project so they can find out what other people are doing, too. It really will open the curtain, in a certain way, to some of the technologies that are very exciting from private industry that have not really been that well seen.
I heard the data from this project would be placed in the University of California Santa Cruz genome database. Is that true?
UCSC has volunteered to serve as the database manager for this effort. At least for most of it. They’ve already developed the browser, which is ideally constructed in many people’s views to be the database for a project of this sort because it allows you to add tracks for all kinds of other accessory data, so people can see what’s there and turn [the tracks] on or off. They have already done a lot of the design work, and because they’re willing to do this, we all said, ‘Yeah, great!’
There will be a component of this that is focused on expression analysis, which of course involves array data, which is a slightly different animal. Andy Baxevanis, who is in the intramural program of NHGRI, is leading that enterprise because he’s done a lot of work on databases for microarray expression.
Will that use any existing microarray or gene expression databases?
This will be a new one. It may take advantage of some of the things that are already out there, but it probably needs to be constructed specifically for this project.
What is the status of the NHGRI’s draft statement on the extension of the Bermuda rules?
We are seeking input from the broader scientific community before [the National Advisory Council for Human Genome Research] finalizes the statement. At the Florida meeting [where the new statement was drafted in February], people were not only enthusiastic about reaffirming this for DNA sequence, but they really wanted to extend it to array data and to protein crystal structure data, yet there weren’t a lot of people representing those fields at the meeting.
Have you had much feedback yet?
Not much. I’m actually presenting this very discussion to all the NIH institute directors tomorrow, because I’d like to hear what the other institutes, which deal with different kinds of data sets and have different components than DNA sequence, think about this. And I can understand that there are certain types of data for which this very early pre-publication release wouldn’t be appropriate.
The 50th anniversary of the discovery of the structure of DNA coincides with the completion of the Human Genome Project. You said in your talk that this is a good time to “take stock of where we are and organize the expedition.” What advances do you foresee in this field in the next 50 years?
Well, I’m a physician, so I really migrate immediately to the medical benefits, which was my dream of what this project would produce. I think well before 50 years have passed, we should have uncovered the causes — at the hereditary and the environmental level — of common diseases and come up with ways to use that information to prevent illness by lifestyle and diet or medical supervision. And we should have built upon that basic information to design therapies that are vastly more effective and less toxic than what we now have to offer.
And maybe, if you’re talking 50 years out, you could extend the human lifespan from a maximum of 90-100 years to something a bit longer. Of course, you’d have to think carefully about the consequences of that. So that’s my main dream, but along the way, we have a lot of social issues that we have to negotiate successfully.
It sounds like that third floor of the genome house will be the most difficult to build.
Yeah, there will be a lot of people working on that floor, and some of them will need to be well connected to the people who can pass laws.
Have you seen any progress toward legislating genomics-related policy?
I think a lot of it is coming from the advocacy groups, who have gotten increasingly interested in these issues and are putting some of their own time into it. They do have the ear of the decision-makers, and they are not impeded in the way that I am from making an argument about something that needs to be done.
I would bet in the last two weeks with this genetic discrimination bill pending, and many hoping that this is going to be the time that it will pass, there’s a flood of letters coming in to members of the Senate subcommittee urging them to do the right thing and to figure out a way to compromise over the remaining disagreements and get a bill through committee and out onto the floor. The president has indicated he would sign it.