Manager of Functional Genomics and Systems Biology
IBM Computational Biology Center,
This week, the third Dialogue for Reverse Engineering Assessments and Methods, or DREAM, conference was held at the Broad Institute of Harvard and MIT. The meeting, which evaluates the performance of different algorithms for inferring biological networks from experimental data, aims to “catalyze the interaction between experiment and theory in the area of cellular network inference,” according to the organizers’ website.
Forty teams participated in this year’s event, providing a total of 413 predictions for four different challenges. Each challenge includes one or more datasets, and participants were asked to infer the unknown data or underlying network associated with the challenge. The actual measurements and networks along with the results of the predictions were disclosed at the conference and are available for viewing here.
The idea is to give the scientific community a venue in which to assess and test their algorithms against a dataset to see how well, for example, they describe the networks at the heart of biological systems such as signaling cascades, expression, or gene networks.
IBM’s Gustavo Stolovitzky and Columbia University’s Andrea Califano dreamt up the event in 2006 and have organized it ever since [BioInform 02-17-06]. This year, Robert Prill, a post-doctoral fellow at IBM, joined to help organize the challenge.
BioInform spoke to Stolovitzky and Prill this week about the conference, its results, and the state of the art of network inference. The following is an edited transcript of the conversation.
You are careful not to call DREAM a competition; it is serious science, but is it also a little fun?
GS: It has a fun aspect to it. It’s a challenge. Competition is, for us, a forbidden word because it might look like a game between people competing for a best performance. It’s a collegial dialog. Even though some participants want to be the best performers, we all know there is so much to learn and we learn as much from what we do wrong as from our successes. I was very happy when I saw 40 teams participated. We are growing with the field itself.
We are in the ballpark of where CASP [the Critical Assessment of Techniques for Protein Structure Prediction] was at its beginning. That community-wide experiment attempts to take the pulse of the field by looking at how good the methods are and how much work still needs to be done on algorithms for protein structure prediction.
It’s a lot of work to participate; it isn’t for a paper that will be published, but it will be evaluated. So what do researchers gain by taking part?
RP: Just to clarify: the teams are anonymous; they just get a number. The algorithms are also secret, unless you are the winner, in which case you typically disclose what you did to win.
You get to see how you do in a public forum but the identities of the performers are secret.
GS: I guess that makes the participation a little less stressful. We are exploring the possibility of a more open discussion of the methods the different teams used. I asked people why they participate and they answer invariably, ‘Because I am learning while doing this.’
They are not participating to win. And I don’t want people to suffer because they didn’t do very well in a particular challenge. For example, if team xyz did not do well and tries to publish a version of that algorithm or get a grant, I don’t want us to limit their chances. But I would like to know what method they used, so we know where to improve from.
Could DREAM be a kind of incubator for new methods?
GS: It has the potential to become an incubator for methods that eventually will grow and be developed by researchers. We have only had two DREAM challenges so far, so at most what we could have seen is one method that systematically had the best performance in both events. But that didn’t really happen. The methods that did well in DREAM2 didn’t score so well in DREAM3. The methods that did well in DREAM3 didn’t score so well, retrospectively, in DREAM2. We know people are all trying their best.
We haven’t yet found a jewel that is unique. But this is what DREAM is about. We are providing a kind of playground where people can see where algorithms satisfy their own sense of good performance. In a way the final proof is going to someone who is impartial and that is what we are trying to produce — an impartial, more or less objective way of testing algorithms.
One of the challenges — to infer a signaling cascade from incomplete flow cytometry data — did not have a best performer. Why is that?
GS: These data are a new modality. When gene expression data first came up, we were all very excited to learn how to use it. While flow cytometry is not the equivalent of gene expression, in a way it is a new modality that we all have to learn how to best use.
The challenge is trying to get people to think about how to recreate cellular pathways from flow cytometry data, which is a little different from gene expression data because these are single cell measurements. … There are many routes to choose; one way would be to model mathematically the actual circuit that we believe was at the root of the measurements and see whether you could optimize some parameters in your model so that it was consistent with the data.
I thought that maybe that was a good exercise to understand the extent to which we can do good modeling from this new modality. … Overall, none of the teams did well, but Bobby [Prill] did some analysis of the results showing that, as a community, the number of times a few of the molecules were correctly assigned had p-values that were very good. There wasn’t sufficient accuracy in the results of each of the teams.
If the challenge had been done by monkeys throwing darts, it would have been very unlikely that five out of seven monkeys would have obtained the same correct answer. So the data contained information that the community was able to obtain, and the coincidence of many teams obtaining the same results shows us … that we are not monkeys throwing darts. The seven participants had a number of coincidences in their results, indicating that information was being obtained from the data. This also says that there is good opportunity here.
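The "monkeys throwing darts" argument can be made concrete with a binomial tail probability. The sketch below is only an illustration, not the analysis Prill performed: the function name `prob_at_least` and the 1-in-10 chance of a random guess being correct are assumptions chosen for the example, since the interview does not give the actual chance level.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n independent guessers are correct,
    when each one is correct by chance with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical chance level: a random guess is correct 1 time in 10.
# This value is an assumption for illustration, not a figure from DREAM3.
p_chance = 0.1

# Chance that 5 or more of 7 random guessers land on the correct answer.
p_value = prob_at_least(5, 7, p_chance)
print(f"{p_value:.7f}")  # about 0.0001765 under the assumed chance level
```

Under that assumed chance level, agreement among five of seven teams would be expected from pure guessing less than once in 5,000 trials, which is the sense in which the community result carries information even when no single team scores well.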
Where does the data for the challenges come from?
GS: We don’t create the datasets; they are kindly provided by collaborators. People can be very possessive in science and of their data, but these people are very willing to share. They include several scientists at Memorial Sloan Kettering Cancer Center, Peter Sorger’s lab at Harvard Medical School, and Neil Clark’s lab at the Genome Institute of Singapore, and there is an in silico dataset produced by Daniel Marbach at the Ecole Polytechnique Fédérale de Lausanne in Switzerland.
We curated the datasets so that they are easily interpretable, created a description, and imagined what would be the right score for the predictions.
How about the fourth challenge that involves reverse engineering of gene networks? Many scientists are probably interested in those methods. Was that true in DREAM3?
RP: It’s a little surprising to me that people really want to do this as a fun exercise. Without any real recognition, they want to produce these networks. This is the part of the DREAM conference that currently gets the most participation. It’s very popular.
We were surprised by how well the best performer did [a team from Yale University researcher Mark Gerstein’s lab]. It seemed almost too good. It was mysterious to me before I saw their solution.
GS: In terms of network analysis, what we are asking is, ‘Tell us what is the edge between two nodes that you trust the most as a good prediction, then the second and third until you have told us all the possible predictions.’ The list of numbers we received doesn’t contain anything that lets us know what the method is. We evaluate completely blind with respect to the method.
One thing that we are learning as a community is that the methods that seem to be doing best are the ones that are the least dogmatic, the most eclectic ones that take a very data-centric approach rather than a principled approach with respect to a method. Teams can use and build on pre-existing algorithms.
One team did extremely well in all categories of the fourth challenge. That means they are making inferences that are close to the actual network. When they showed us what they did, we saw they didn’t take a dogmatic approach such as an information theory approach, or correlation, or one based on Bayesian networks.
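Because submissions are just edges ranked from most to least trusted, they can be scored without any knowledge of the method that produced them. The sketch below shows one common way to score such a ranked list against a known network, using precision at k and average precision. It is an assumption for illustration and not necessarily the scoring used in DREAM3; the function names, the toy gene labels, and the toy network are all hypothetical.

```python
def precision_at_k(ranked_edges, true_edges, k):
    """Fraction of the top-k predicted edges that are in the true network."""
    top = ranked_edges[:k]
    hits = sum(1 for edge in top if edge in true_edges)
    return hits / k


def average_precision(ranked_edges, true_edges):
    """Mean of the precision values taken at the rank of each true edge.
    True edges that never appear in the list contribute zero."""
    if not true_edges:
        return 0.0
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank
    return total / len(true_edges)


# Toy gold-standard network and a toy submission (both hypothetical).
true_edges = {("A", "B"), ("B", "C")}
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]

print(precision_at_k(ranked, true_edges, 2))  # 0.5
print(round(average_precision(ranked, true_edges), 4))  # 0.8333
```

The scorer sees only the ordering of the edges, so two very different algorithms that produce the same ranking receive the same score.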
What does that trend mean for the field?
GS: One way of interpreting what we are seeing is that the notion of one-size-fits-all is probably not going to work very well with data that is very complex and comes from a variety of sources, perturbations, and other ways of probing the network. It’s the aggregation of several methods.
This could sound dogmatic in and of itself in the sense that I am married to no method and all methods are valid. But I have observed that if you fall in love with one particular method, you will try to make a square fit in a circle. If you are versatile and flexible, you may try to make something that is more circle-like. In that sense, the data being so rich, we should also have a rich toolset from which to draw to analyze it.
Is it a challenge in and of itself that current networks may or may not be synonymous with what is actually going on in the cell?
GS: The basic tenet of positivism is that you can only talk about what you can measure. … The network is not something you can really measure. It’s not like a circuit in a computer chip where you can see the filaments over which the electrons travel. In biology the network is more like a model that is in our mind. So yes, that is a point that we should only predict what we can measure. Since the network is not something we can measure, we can use it as an abstraction. I agree with that.
At the same time, based on discussions at DREAM2 and … what the community is thinking, we created challenges to predict what is observed. They involve predictions that might not be mechanistic. You might be predicting something by regression or statistical models but the feeling is I haven’t learned as much as if I had a model.
Models could be wrong, because as statistician George Box said, ‘All models are wrong, some models are useful.’ Networks allow us to think mechanistically. … The network is an abstraction, but when I make it become a representation for something that is real, then it is a proxy to that reality. … These conversations are part of DREAM, so the challenges we created this time in DREAM3, two that are network inferences and two that predict what has been measured, are a result of those discussions.
Is DREAM making it more likely to have a tool that will help researchers put together a bunch of their genomic puzzle pieces, for example, figure out which 250 of a set of 600 genes fit in a particular network?
GS: We have to differentiate between causal relations, for example a kinase binding to a substrate and phosphorylating a protein, [and] statistical influences that genes have on other genes, which I used to, tongue-in-cheek, call ‘influenceomics.’ That is the complement of influences of genes on genes.
That is a bit more vague than the causal relationships about a gene that codes for a protein that is a kinase [that] might phosphorylate another protein that is a transcription factor that goes and transcribes gene B. Then gene A influences gene B but through intermediaries transmitting the information and physically producing the effect of the cause.
When we create, from a gene expression panel, links between genes, we are talking about influences. We have to separate what those influences are telling us. There is a statistical correlation between genes but we always know that correlation is not causality and if we keep that in mind, then we might have something that is a good aid for further thinking.
What we are trying to do is be a little more rigorous in the assessment of what our network prediction algorithms actually predict. We want to make sure that if we say there is an algorithm that predicts interactions, it predicts what is at the heart of the data. We are trying to learn how to do these better and better. I think we have some more learning to do, but we are learning little by little what the right questions are to ask in order to be able to produce a methodology that allows people to know whether their algorithms are predictive or not.
So this conversation bundles efforts rather than having teams publish their algorithms one by one or two by two in scientific journals?
GS: Rather than competing with each other, we are learning with each other. That is why we don’t want to call these challenges competitions; it is not so important who wins.
RP: In the world of people who write algorithms, there is a formula to tell the world. First, you write your paper with your algorithms and you have to code up everybody else’s algorithm from the past three or four years and then you have to show that your algorithm is better. There is something very hollow about those publications, because everybody can’t always be better than everybody else. But that is the requirement to publish it. So it is a little contrived; your data with your algorithm in your hands performs better.
There is something more satisfying in having a third party evaluate [your work]. You submit your algorithm’s predictions, and you see, ‘On this particular dataset I do no better than random.’ It’s very fast and useful information to get back as a practitioner in the field.
GS: I think when people observe that their algorithm didn’t perform so well on a given dataset, it will be an eye-opener for them and they will try to improve on it.
There is a saying, it is an unfortunate metaphor, ‘If you torture the data enough, it will confess.’ If I have an algorithm and a dataset, I will make the algorithm work on that dataset, because that is what I am here for. Eventually, I tailor my algorithm so much to the data set that I don’t know how it will perform on an independent dataset that I haven’t tortured as much. This is what we provide, a way for people to test their algorithms on datasets they had not seen before.
In the future, maybe we will have to choose a concrete biological problem that goes from signaling down to transcription and some feedback, and try to throw data in about a particular pathway. So eventually, besides learning something about our algorithms, we will be learning something about the particular biological problem. … We are thinking along those lines [for future DREAM events].