As the field of bioinformatics grows, the vast number of software tools and databases available to researchers is becoming more and more difficult to track. In the December issue of PLoS Computational Biology, a team of researchers proposed a possible solution to this problem in the form of a "bioinformatics resourceome," which would be built upon an ontology "on which a fully distributed system of registration and annotation of biology-related computational resources could be constructed." [PLoS Comput Biol 1(7): e76]
The editorial, "Time to Organize the Bioinformatics Resourceome," was co-authored by Nicola Cannata and Emanuela Merelli of the University of Camerino in Italy and Russ Altman, a professor of Genetics, Bioengineering, and Medicine at Stanford University Medical Center.
BioInform spoke to Altman this week about the concept of the resourceome, and what it will take to make it a reality.
Can you provide some background on the motivation for this paper? Why are search engines like Google not sufficient for finding the appropriate bioinformatics tools?
First of all, if you think about sensitivity and specificity of searches, where sensitivity is your ability to find everything of interest, and specificity is only finding things of interest, Google is pretty good at sensitivity -- so somewhere on that list, if you've picked the right words, you're pretty sure that you've gotten something -- but the specificity is not great, which means you have to sometimes look pretty hard.
So that's the first issue, and I think we've all had that experience. The other thing is [that] it presupposes that you know what words to type in, and if we really want to move our tools to biological usage, we have to figure out ways to bring these tools to the attention of biologists who have no idea what the right words are to type into Google.
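The sensitivity/specificity distinction above can be made concrete with a small calculation. The sketch below is illustrative only: the tool names and hit counts are invented. Note that "only finding things of interest," as described here, corresponds to what information-retrieval texts usually call precision, so that is the name used in the code.

```python
def sensitivity(retrieved, relevant):
    """Fraction of the relevant items that the search returned (also called recall)."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def precision(retrieved, relevant):
    """Fraction of the returned items that are actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


# Hypothetical query: three useful tools exist, and the search returns
# 50 hits that include all three plus 47 irrelevant pages.
relevant = {"tool_a", "tool_b", "tool_c"}
retrieved = relevant | {f"noise_{i}" for i in range(47)}

print(sensitivity(retrieved, relevant))  # everything of interest is on the list
print(precision(retrieved, relevant))    # but most of the list is noise
```

In this made-up case the search finds every relevant tool, yet only 3 of the 50 hits are useful, which is the "you have to look pretty hard" situation.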
So this whole resourceome thing -- and I have to say from the start that this was instigated by my Italian collaborators -- Doctors Merelli and Cannata hosted me at their university, and we talked about a lot of this stuff. They have an interest in agents, which are these little autonomous pieces of software that go roving around doing things for you, hopefully intelligently, and they came to me and said, 'We think that this is important,' and we had a lot of discussions about it and the three of us got excited about it, and we said, 'Well, let's start out with a very short position paper about why [one] would do something like this.'
Now, we don't have funding or support to do this, but I think we're kind of looking around and trying to figure out how to jump-start it.
How do you envision this getting started?
There are a number of things going on. For me, the most important things that are going on are the [National Centers for Biomedical Computing] program from the NIH. There are now seven of them. There were four announced in 2004, and three more announced last fall. We have two of them at Stanford. I'm doing one on physics-based simulation [the National Center for Physics-Based Simulation of Biological Structures], and my colleague Mark Musen is doing one in ontologies [See BioInform 10-17-05 for an interview with Musen about the National Center for Biomedical Ontology]. Mark is the one that actually gives me more optimism about this, because you could say that the resourceome is fundamentally ontology building, and then using the ontology to build this killer app that I imagine, which would give you access to all these resources. I think Mark and I are agreed that this is something that could come out of some of the ontology work that they're going to be doing at his national center.
Now, at my center, we're more customers of the ontology technology. We have a mandate, as a national center for simulation, to create at least a resourceome for simulation. So I think you can count on an effort coming from my lab on a sub-resourceome having to do with physics-based simulation. What are the tools out there, what are the models? We're looking at software, we're looking at models, key papers that everyone who does simulation should look at. We're just getting underway a simulation resourceome subproject in my lab, and of course we hope to continue to collaborate with our Italian colleagues, who, with their agent technology, may be able to help us with the issue of maintenance.
That's the big worry. You can build Resourceome version 1, and it will hopefully be good. And then you publish it, and everybody's excited, but then the question is how do you maintain it? And that's where I think web technologies like the semantic web [come in], where you build your resource, and then you do a real simple thing like register it by describing it using semantic web technology -- this is a database, it has these fields in it, here's its URL, maybe a comment field and a PubMed ID. And I think if you keep it real small, you might even be able to get editors to say, 'Before we publish a new resource, we want you to make this little tiny entry in the resourceome.'
I think that would be very doable because it's a small enough community that we all know who the editors are, and they're generally friendly, visionary people.
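As a rough illustration of how small such a registration entry could be, here is a minimal sketch covering the fields mentioned above (resource type, data fields, URL, comment, PubMed ID). Everything in it is an assumption made for illustration: the class name, the field names, the validation rules, and the example values are hypothetical, and plain JSON stands in for a real semantic web serialization such as RDF.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class ResourceEntry:
    """Hypothetical minimal registration record for one resource."""
    name: str
    resource_type: str                      # e.g. "database" or "software"
    url: str
    data_fields: List[str] = field(default_factory=list)
    comment: str = ""
    pubmed_id: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the entry looks usable."""
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if not self.url.startswith(("http://", "https://")):
            problems.append("url must start with http:// or https://")
        if self.pubmed_id is not None and not self.pubmed_id.isdigit():
            problems.append("pubmed_id must contain digits only")
        return problems


# Placeholder values, not a real resource.
entry = ResourceEntry(
    name="ExampleDB",
    resource_type="database",
    url="https://example.org/exampledb",
    data_fields=["gene_id", "sequence"],
    comment="Placeholder entry for illustration",
    pubmed_id="12345678",
)

record = json.dumps(asdict(entry), indent=2)
print(entry.validate())
print(record)
```

Keeping the record this small is the point of the design: an entry an author can fill in within minutes is one an editor could plausibly require at submission time.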
So the short answer to the question is, a bunch of people get together and build a high-level, simple ontology for tools and resources, they prototype it by filling it up with a hundred or a couple of hundred resources that everybody would expect, like GenBank and PubMed, they build a killer app for search on top of that ontology, and then they convince editors and funding agencies that you need to register your resource in order to do good scholarship.
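The first three steps of that plan (a simple ontology, a seed set of well-known resources, and search built on the ontology) can be sketched in a few lines. This is a toy under stated assumptions, not the resourceome design: the category names, the child-to-parent table, and the "ExampleSim" entry are hypothetical. Only GenBank and PubMed come from the interview.

```python
# Hypothetical ontology: each category maps to its parent (None marks the root).
ONTOLOGY = {
    "resource": None,
    "database": "resource",
    "software": "resource",
    "sequence database": "database",
    "literature database": "database",
    "simulation software": "software",
}

# Seed registry: (resource name, most specific ontology category).
REGISTRY = [
    ("GenBank", "sequence database"),
    ("PubMed", "literature database"),
    ("ExampleSim", "simulation software"),  # placeholder, not a real tool
]


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or descends from it in the ontology."""
    while term is not None:
        if term == ancestor:
            return True
        term = ONTOLOGY[term]
    return False


def search(category):
    """Return every registered resource filed under `category` or a subcategory."""
    if category not in ONTOLOGY:
        raise KeyError(f"unknown category: {category}")
    return [name for name, term in REGISTRY if is_a(term, category)]


print(search("database"))
print(search("software"))
```

The benefit over keyword search is that a user who only knows to ask for a "database" still reaches resources registered under narrower categories, without having to guess the right words.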
So it would be maintained by the community. It wouldn't require a centralized authority.
The only central activity is kind of the bootstrapping of it -- to get that initial thing going, and you would then make it a very distributed system where people could, basically, like the Internet, do almost anything they wanted as long as those core pieces of information were there. And as we've seen many times, bioinformatics guys, when faced with such data, even though it's a little dirty, they come up with cool algorithms that clean it up, that help bring to your attention what's important.
I don't think just a static Yellow Pages will do it. I think it has to be a little bit more of an active kind of search capability, and that's where Google might help us. Once we format all this, I would be shocked if Google is not working on semantic web type things, and by then, hopefully the Google tools will be really a perfect match for this. But I think it's our job to put the content in.
Do you have a timeline yet for the proof-of-principle resourceome your lab is working on for simulation?
For the sub-resourceome on simulation, we would like to get that going in six months. We have a body on the project, we're having a meeting on Friday, so that all looks pretty good.
What kind of feedback have you gotten on this paper?
I just did a search in my mail, and I haven't responded to all of them, but I got already about eight or 10 e-mails from all over the place. Some people are rightly pointing to other efforts. So I have an e-mail here pointing to NodalPoint.org, which is highly related to this. And of course, this was an editorial, not a review article, so we weren't as comprehensive as we would have had to be [if we wrote a review article]. Somebody pointed out the Moby project, and there's an effort at Pittsburgh at the Health Sciences Library. They're trying to organize tools for people. So it's really hitting a nerve with a lot of people, some of whom are saying, 'Thank you for writing this, we really need it,' and others are saying, 'Thank you for writing this, and you should be aware of our work in the area.'
So that's a pretty good sign. As long as they're talking about us.
Do you foresee there being a way to curate this information so that users have a way of gauging the quality of these resources?
This is where we have to be careful, because of course you can get people to do it the first time around, but in the long term, the only way to do curation is going to be this community-based curation. Some of these blogs have led the way, because if they're respected and if people read them, they kind of get more stature, so it's kind of a statistical calculation done by the number of users. I think you have to set up in the initial ontology some way for people to hang off their opinions. But it has to be distributed, because if you have it centralized, first of all, who the heck wants to be the one giving grades to people? That's a great way to have all your colleagues hate you. So I think much more sensible would be trying to get a distributed system kind of like Amazon, and that's where I would have to look around and see what kinds of technologies are available off the shelf to do that kind of thing.
But we certainly have this model for our sub-resourceome on simulation. As a national center, it's even more important that we try to be a little bit neutral. So we would much rather have the users give their opinions about these tools, success stories, links to papers. But that has to be done automatically -- you can't have people doing it.
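One way to picture automatic, community-based curation is a simple aggregation of user ratings per resource. The sketch below is an assumption-laden illustration rather than anything proposed in the interview: the 1-to-5 star scale, the function name, and the smoothing toward a prior (so that a single enthusiastic vote does not outrank a well-reviewed tool) are all choices made here for the example.

```python
from collections import defaultdict


def summarize(ratings, prior_mean=3.0, prior_weight=5):
    """Aggregate (resource, stars) pairs into vote counts and averages.

    `smoothed` pulls the mean toward `prior_mean` as if `prior_weight`
    neutral votes had already been cast (an assumption of this sketch).
    """
    totals = defaultdict(lambda: [0, 0])  # resource -> [sum of stars, vote count]
    for resource, stars in ratings:
        if not 1 <= stars <= 5:
            raise ValueError(f"rating out of range: {stars}")
        totals[resource][0] += stars
        totals[resource][1] += 1
    return {
        resource: {
            "votes": count,
            "mean": total / count,
            "smoothed": (prior_mean * prior_weight + total) / (prior_weight + count),
        }
        for resource, (total, count) in totals.items()
    }


# Hypothetical ratings for two placeholder tools.
ratings = [("tool_a", 5), ("tool_a", 4), ("tool_a", 5), ("tool_b", 5)]
summary = summarize(ratings)
for name, stats in summary.items():
    print(name, stats)
```

Because the scores are computed from what users submit, no individual has to act as the grader, which is the neutrality concern raised above.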
Going forward, after you get your sub-resourceome up and running, how do you see that growing into other domain areas?
Well, hopefully what we can do is make it into a federation. So, for instance, if every national center made their sub-resourceome, we would cover a pretty good chunk of bioinformatics, because [Andrea] Califano [at the National Center for Multi-Scale Study of Cellular Networks based at Columbia University] is doing networks and systems biology, Zack Kohane [at the Center for Informatics for Integrating Biology and the Bedside based at Brigham and Women's Hospital] is doing clinical genomics, Ron Kikinis [at the National Alliance for Medical Image Computing based at Brigham and Women's Hospital] is doing imaging, and Mark is doing ontologies, and there are a few others. The point is they are going to give us pretty good coverage. All of us have a dissemination and leadership mission, so we're one natural place to do it. And by no means, I have to stress, are we the only ones. But we certainly have it on our plate, and there are many other people interested and/or mandated to do this. So the nice thing is we're just going to do the part that we feel comfortable with. We won't try to build the resourceome for somebody else's area because we don't know it as well, but if we get the infrastructure going, hopefully people can just copy our effort.
Is there anything else that you think is worth mentioning about this proposal?
I want to stress that this really started out with my colleagues in Italy … and this is an international effort. So they're busy talking to people in Europe to figure out who the equivalent leaders of such an effort are and which sub-resourceomes could be coming out of Europe, and we'd also like to work closely with Asia.