Columbia University's National Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet) — one of seven centers recently funded under the NIH National Center for Biomedical Computing initiative — is spearheading an effort to help the computational systems biology community evaluate the performance and accuracy of its algorithms.
The project, called DREAM (Database for Reverse Engineering Analysis and Methods), is the brainchild of Andrea Califano, MAGNet's director, and Gustavo Stolovitzky, manager of the functional genomics and systems biology group at IBM Research's Computational Biology Center.
"This field is developing in basically an exponential way, where you have literally hundreds of groups coming up with new ideas and exciting ideas, and it is also very fragmented in terms of what kind of data you can use," Califano told BioInform this week. "So Gustavo and I really thought it was about time to start putting a little bit of a quantitative framework or a comparative framework around what these algorithms can do in terms of reverse engineering."
Califano said that the DREAM initiative was "an integral part" of Columbia's proposal for the five-year, $18.5-million NCBC grant, which NIH awarded in late September [BioInform 10-03-05]. IBM Research is also providing some financial support for the DREAM initiative, as is Rutgers University's DIMACS (Center for Discrete Mathematics and Theoretical Computer Science) program.
The New York Academy of Sciences will also pitch in by coordinating the first DREAM workshop under the auspices of the NYAS systems biology discussion group. The workshop, scheduled for March 8-9, will serve as a forum for interested members of the community to discuss the considerable complexities and challenges of the DREAM proposal.
One goal of the workshop, Stolovitzky said, is to "decide what is the best way to make this as valuable as possible, because we know from the start that we don't have all the answers to these questions, and there are many aspects of the project that can be done in different ways."
The long-term goals of DREAM are clear. The project will host a repository for algorithms, biological models, data, literature, and other resources related to computational systems biology. It will also coordinate a series of conferences modeled after the successful CASP (Critical Assessment of Structure Prediction) meetings in the protein structure prediction community.
But Califano and Stolovitzky both acknowledge that the steps toward making DREAM a reality are still unclear. One big difference between CASP and DREAM, for example, is that CASP uses the crystallographic structure of a protein that has been experimentally derived prior to the conference but withheld from the scientific community to serve as the gold standard by which the computational predictions are judged. "That's the advantage that CASP has that we will not have," Stolovitzky said. "That's why we need to think a little harder in this case about what we mean when we claim success."
Stolovitzky said that John Moult of the University of Maryland's Center for Advanced Research in Biotechnology, one of the CASP organizers, is scheduled to attend the March workshop, and should be able to "tell us a lot about how to measure the success of the result of an algorithm."
The DREAM organizers also hope to solicit the community's ideas about the best way to manage the project. The March workshop and the first formal conference, to be held in September, will both be open to the computational systems biology community. "The first conference is probably not going to be, 'We give you the data and you give us back the pathways,'" Stolovitzky said. "It's more going to be, 'What kind of data can we give away, what kind of pathways should we attack, what kind of cellular systems should we attack?' Those kinds of questions."
Participants in Planning the DREAM Project
|Andrea Califano (Columbia University), Co-Chair|
|Gustavo Stolovitzky (IBM Research), Co-Chair|
|Gary Bader (Memorial Sloan Kettering Cancer Center)|
|Joel Bader (Johns Hopkins University)|
|Hamid Bolouri (Institute for Systems Biology)|
|Harmen Bussemaker (Columbia)|
|Jim Collins (Boston University)|
|Diego Di Bernardo (Telethon Institute of Genetics and Medicine)|
|Tim Gardner (Boston University)|
|Mark Gerstein (Yale University)|
|Trey Ideker (University of California, San Diego)|
|André Levchenko (Johns Hopkins)|
|Pedro Mendes (Virginia Bioinformatics Institute)|
|John Moult (University of Maryland)|
|Ron Shamir (Tel Aviv University)|
|Benno Schwikowski (Institut Pasteur)|
|Eran Segal (Weizmann Institute)|
|Mike Snyder (Yale University)|
|Andrey Rzhetsky (Columbia)|
|Marc Vidal (Harvard)|
|Mike Yaffe (Massachusetts Institute of Technology)|
Califano said that the DREAM organizers are considering a number of methods for evaluating algorithms for reverse engineering biological networks. At one extreme, the organizers could simulate biological networks "using completely a priori abstractions about what the network may look like," so that the simulated network serves as ground truth for judging algorithms that predict the network topology from the simulation's output. However, Califano said, "that type of approach is not very useful to biologists who would like to know how these methods work on real biological data."
The other extreme, Stolovitzky noted, would be to enlist the aid of the experimental community, as CASP has done. As an example, he said, a research team could perform a whole-genome yeast two-hybrid study to derive the protein-protein interaction network for a small genome like that of a virus and keep it under wraps until the conference, where participants would be given the genome sequence for the virus and tasked with predicting the network computationally.
"That's asking a lot from a person to do the experiments and not publish until something happens," Stolovitzky said, "but that's the kind of thing we will have to decide — whether there is the will from the community to engage in these kinds of things."
The effort also hopes to gather feedback from those who may be skeptical of the project's ambitious goals. "We want to be useful to the community, and in order for that to be the case, we would like to hear from the people who have some doubts about how to do this," Stolovitzky said.
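Both evaluation schemes the organizers describe, simulated networks and experimentally derived networks withheld until the conference, come down to the same step: comparing a predicted network against a reference network. The DREAM organizers have not yet settled on a metric, so the Python sketch below is only an illustration under stated assumptions: networks are treated as sets of undirected edges, and precision and recall stand in for whatever scoring the workshop eventually defines. The function names and the toy node labels are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def to_edge_set(pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Normalize (a, b) pairs into undirected edges, dropping self-loops."""
    return {frozenset((a, b)) for a, b in pairs if a != b}


def score_prediction(
    predicted: Iterable[Tuple[str, str]],
    reference: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Compare predicted edges with a withheld reference network.

    Assumption: precision and recall over undirected edges are used here
    purely for illustration; they are not a metric chosen by DREAM.
    """
    pred = to_edge_set(predicted)
    ref = to_edge_set(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return {
        "true_positives": true_positives,
        "precision": precision,
        "recall": recall,
    }


# Toy reference network (the "gold standard") and one submitted prediction.
reference = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
predicted = [("B", "A"), ("C", "B"), ("A", "C")]

result = score_prediction(predicted, reference)
print(result)
```

In this toy case two of the three predicted interactions appear in the reference network, and two of the four reference interactions are recovered. A real assessment would also have to decide how to treat edge direction, confidence scores, and interactions missing from an incomplete experimental reference.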
Califano and Stolovitzky plan to spend about a year planning and discussing the DREAM project before finalizing its organizational structure. "We really want to dedicate this first year to simply decide which data types and which underlying models we can use for this type of analysis and really start building a database around that, and then in subsequent years we'll start thinking about how we can use the metrics that will be defined by this initial workshop to actually start scoring algorithms in particular categories," Califano said.
"In fact, determining those categories based on the type of data, based on the type of underlying network you're trying to determine, based on the type of single-cell, multi-cell organism and so forth — they're all criteria that are going to be discussed at the meeting."
Stolovitzky said that a number of researchers have already accepted invitations to participate in the planning of DREAM (see box), but he stressed that the initiative is an "open team" that will "evolve dynamically, and as people learn about the project we hope to get more brains into the soup."
With the exception of IBM, most of the initial participants are from academia, and Stolovitzky said that the initiative is still deciding how to encourage industry involvement. Companies working in the field of computational systems biology "might have a vested interest in this, and the problem is how to make sure that nobody feels threatened. If a company has an algorithm and it doesn't turn out to be the best — how do we sort out the fact that there could be financial interests there?"
Eventually, he said, DREAM hopes to have buy-in from industry "to make the base of this project as big as possible," but for now, he said, "we have to start somewhere."
Details about the March NYAS workshop are at http://www.nyas.org/events/eventDetail.asp?eventID=5873&date=3/9/2006%205:00:00%20PM, while some preliminary information about the September DIMACS meeting is at http://dimacs.rutgers.edu/Workshops/ReverseEng/.
— Bernadette Toner