The computational systems biology community recently held a conference called DREAM, or the Dialogue for Reverse Engineering Assessments and Methods, which was an attempt to evaluate the performance of algorithms for computationally inferring biological networks based on experimental data.
The inaugural conference, modeled after the protein structure prediction community’s longstanding Critical Assessment of Structure Prediction meetings, was held at the New York Academy of Sciences in December and evaluated the work of 36 research groups in five different categories [BioInform 12-07-07].
This week, BioInform spoke to Reinhard Laubenbacher, a professor at the Virginia Bioinformatics Institute and a participant in the DREAM conference. Laubenbacher co-authored two papers in the recently published conference proceedings regarding some of the challenges that the field must still overcome.
In one paper, “Comparison of Reverse-Engineering Methods Using an in Silico Network,” Laubenbacher and his colleagues discuss the use of a 10-gene “artificial network” that they used to compare four different reverse-engineering methods: Regulatory Strengths Analysis, Reverse Engineering by Multiple Regression, Partial Correlations, and Dynamic Bayesian Networks.
Historically, the performance of computational biology methods “is demonstrated using available experimental or simulated data,” the authors write in the paper. But in the field of reverse-engineering biological networks, “no systematic comparison of all available methods has been done, in part because such a comparison faces several challenges.”
While it would be “desirable to use an in vivo or at least in vitro network to generate the data to be used,” there are two primary obstacles to this, Laubenbacher and colleagues write: “the difficulty of performing all the needed experiments on a realistic-size network to fulfill the differing requirements for the various methods, and the lack of detailed knowledge of the network to be reconstructed.”
They note that even if a simulated network is used for this purpose, “then it is important to incorporate several realistic features, such as size and presence of noise, different molecular species, or different time scales.”
Laubenbacher discussed with BioInform his group’s experience with using the artificial network as a means of comparing different methods. The following is an edited version of the conversation.
Your paper mentions that there have been no comparisons of reverse engineering methods to date because there are so many challenges associated with finding suitable benchmarks. Can you walk me through some of those challenges?
The field itself is very exciting. It’s the central problem in systems biology. It’s the kind of problem that, in other fields, has been studied for a long time in a much more controlled environment. So in engineering, for example, system identification is a well-developed field, and there people try to infer models of engineered networks. Oftentimes the model structure is known, and you just want to estimate parameters. Other times, you want to infer the model structure as well.
But in biology, that hasn’t been done very well because the whole field of systems biology is much too young for that. And what you have now is people from all kinds of different fields that come to this problem of network inference or reverse engineering, and they bring methods from different fields and viewpoints from different fields.
I’m a mathematician and I come to this with the point of view of a mathematician, but you have people from engineering, from statistics, from computer science, and they all have different points of view and they generate different methods. And I think that makes for an extremely fruitful field.
The reason it’s such a big challenge is that the data sets that are available are … relatively small, the underlying biological networks are not very well known, and the data have not been collected with a view toward reverse engineering. They’re typically collected the way experimental biologists would do experiments. And I think that what needs to happen for the field to move forward is a close collaboration between experimentalists and modelers to decide what are good ways to generate data sets that have the right kind of information in them that makes reverse engineering successful. And also, how those data should be collected and in what quantities they could be collected.
So the biggest limiting factor that I see at the moment is that there aren’t many data sets like that available, so that’s one reason why in silico networks are very important, because you can generate all kinds of data from them; you know, at least to some extent, what the network is that you want to reconstruct, so you have some sense of how well these methods might perform.
But simulated data in many respects are different from real data. In some respects they are easier to deal with than real data and in other respects they are more difficult to deal with than real data.
Can you elaborate on that? In what ways are they easier and in what ways are they more difficult?
One of the problems with real data is of course that they tend to be much noisier than simulated data. So in a simulated network, I can dial in different levels of noise, so I can see how a method degrades as additional noise is introduced. In our experience also, methods perform better on real data than on simulated data, to the extent that can be judged, and in that sense real data are easier. But it’s just very difficult with real data to draw inferences that one can be confident in.
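The noise experiment described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the network or any of the four methods from the paper: it assumes a hypothetical two-gene system in which a target is expressed at twice the level of its regulator, adds Gaussian measurement noise of a chosen standard deviation, and uses plain Pearson correlation as a stand-in for a method's edge score. All names (`simulate`, `pearson`, `noise_sd`) are invented for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(noise_sd, n_samples=200, seed=0):
    """Hypothetical two-gene system: target = 2 * regulator.

    Both measurements receive independent Gaussian noise with
    standard deviation noise_sd (an assumed noise model).
    """
    rng = random.Random(seed)
    regulator, target = [], []
    for _ in range(n_samples):
        level = rng.uniform(0.0, 1.0)  # true regulator level
        regulator.append(level + rng.gauss(0.0, noise_sd))
        target.append(2.0 * level + rng.gauss(0.0, noise_sd))
    return regulator, target


# "Dial in" increasing noise and watch the edge score degrade.
for noise_sd in (0.0, 0.1, 0.3, 1.0):
    score = pearson(*simulate(noise_sd))
    print(f"noise_sd={noise_sd:.1f}  edge score={score:.3f}")
```

With no noise the score is essentially 1, and it falls as the noise level grows. In this toy setting that is the sense in which one can watch a method degrade.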
The thing to be kept in mind is, for example, in the paper we did, we compared four different methods. But of course, what we compared was not really methods. What we compared was software packages. And depending on how a particular parameter is chosen or how a particular software package is implemented, it might perform much worse than it would for, say, somebody who is familiar with the package and knows its quirks. That person might get much better results.
So none of the reverse-engineering software packages that are available are really at the level of robustness and ease of use of, for example, a number of genomics packages, like if you go and do a Blast search. That’s a pretty standard thing.
Reverse engineering software is much more challenging to use correctly. So any of these comparisons one needs to take with a grain of salt. They’re useful to the extent that they give you some sense of where a method might perform better than another one, but I think it’s probably very premature to use it to really discriminate between different methods.
It seemed that in addition to being a comparison of the methods themselves, the study was an attempt to see how well the artificial network worked as a benchmark.
That was also a very interesting exercise. We had first started as part of a project that has been going on for several years, which was related to yeast systems biology. What we wanted to do there was make a synthetic network that resembled to some extent the actual network that we were studying in yeast. So my colleague Pedro Mendes [of the VBI] made a network that was quite a bit more complicated than the one that we used in this paper. It had genes, proteins, metabolites, and a number of realistic features.
And what we discovered with the [reverse-engineering] methods that we have developed, as well as with other methods that we applied, is that they all failed pretty badly.
This network was maybe on the order of five times as large as the network we used in this paper, but it was still way too complicated, so that’s why we made the smaller network that we used in the paper. We’ve studied this [network] quite a bit with the methods that we have developed, [but] we didn’t include our methods in the comparison because we didn’t think it was fair in the sense that we knew our method really well and we knew how to use it, so we would have an unfair advantage.
But we explored the network quite a bit with our software, and it was interesting. We would generate some data from it, and then we would apply our methods to it, and we would miss key features. [For example,] there is one node in there that’s a key node in the network, and we would completely miss it. So we were wondering, ‘Well, why is that?’ It turned out, looking carefully at the network and at the differential equations that underlie the network, that we missed the fact that there was a difference in time scales in there. So we had collected data in a way that caused us to completely miss the time scale in which this very important gene was working.
So just this question of how do you generate a data set for a given network that really captures key features is a difficult one, even for a small network that you make yourself.
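The time-scale problem described above can be illustrated with a small sketch. It is not the network from the paper: it assumes two hypothetical species that each decay exponentially from 1 toward 0, one with a fast time constant and one with a slow one, and counts how many samples in a given schedule catch each species while it is still measurably changing. The constants `TAU_FAST` and `TAU_SLOW`, the 5 percent threshold, and the function names are all assumptions chosen for illustration.

```python
import math

TAU_FAST = 0.1   # assumed time constant of the fast gene
TAU_SLOW = 10.0  # assumed time constant of the slow gene


def fast_gene(t):
    """Fast species: decays from 1 toward 0 with time constant TAU_FAST."""
    return math.exp(-t / TAU_FAST)


def slow_gene(t):
    """Slow species: decays from 1 toward 0 with time constant TAU_SLOW."""
    return math.exp(-t / TAU_SLOW)


def sample_times(dt, t_end):
    """Evenly spaced sampling times 0, dt, 2*dt, ..., t_end."""
    n = int(round(t_end / dt))
    return [i * dt for i in range(n + 1)]


def informative_samples(signal, times, threshold=0.05):
    """Count samples taken while the signal is still above the threshold,
    i.e. before it has settled to within 5 percent of its resting value."""
    return sum(1 for t in times if signal(t) > threshold)


coarse = sample_times(1.0, 20.0)   # schedule suited to the slow gene
fine = sample_times(0.02, 20.0)    # schedule that resolves the fast gene

for name, times in (("coarse", coarse), ("fine", fine)):
    print(name,
          "fast:", informative_samples(fast_gene, times),
          "slow:", informative_samples(slow_gene, times))
```

Under the coarse schedule only the initial sample sees the fast species away from rest, so its dynamics are effectively invisible, while the slow species is well covered by both schedules.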
I would say that this network is a very good network for testing reverse engineering methods, and it has some realistic features and it has of course many unrealistic features, but I think it’s a good first step.
Were there any surprises in the comparison? Did the methods overall perform any better or worse than expected, or were there any standouts either way?
The one conclusion that we did draw from this is that different methods require different types of data. Two of the methods required data that are obtained by perturbing each of the genes. So you sort of jiggle the whole network in some places to get some sense of how it might be put together. Those methods tended to perform better than the other two that required different types of data. So I think perturbation time courses were an important feature.
Based on what you did see, would you have any recommendations or advice for people developing these methods?
As I said, I think one needs to be careful in drawing conclusions from comparing different software packages. I think what the whole DREAM effort shows — and I think it’s a very, very useful effort and the organizers of it are really doing the community a great service — is that it provides a set of benchmarks that people can use for their methods, and this competition I think adds to the impetus that the effort provides.
But I think one needs to be careful in making any kind of evaluation.
In terms of lessons for developers of such methods, I think [the lesson is] that even very small networks pose big challenges.
Probably the most important lesson for me is that it’s important to have a whole bunch of different methods from different fields that complement each other. Some methods have weaknesses in one place and strengths in another place, and I think there will never be a day when there is one method that works the best and [no] others are needed. I think that day will never come. Different systems and different types of data will require different methods. So it needs to be very much a cooperative effort.
How about conclusions in terms of the network that you used as a benchmark? Are there specific improvements that you’d like to see in that?
I think probably the most important thing that’s needed in this whole enterprise is some measure that allows you to decide how much information in some sense your data set has that you generated from a given network. Only then can you say how well a given method works, because if you have a data set that contains only 70 percent of the information that you need to reconstruct the network, then you can have a method that is 100 percent efficient, and it will squeeze that 70 percent of information out of your data, but the output is a network that is only 70 percent correct.
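The 70 percent argument can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not a measure proposed in the interview: it treats "information" simply as the set of true edges a data set can reveal, uses a hypothetical 10-edge chain network, and models a perfectly efficient method as one that returns exactly the edges visible in the data. The names `recall`, `perfect_method`, and `visible_edges` are invented.

```python
def recall(true_edges, inferred_edges):
    """Fraction of the true edges that appear in the inferred network."""
    return len(true_edges & inferred_edges) / len(true_edges)


def perfect_method(visible_edges):
    """Idealized method: recovers every edge the data can reveal,
    and nothing else."""
    return set(visible_edges)


# Hypothetical network with 10 directed edges g0->g1, g1->g2, ...
true_edges = {(f"g{i}", f"g{i + 1}") for i in range(10)}

# Assume the data set carries information about only 7 of the 10 edges.
visible_edges = set(sorted(true_edges)[:7])

inferred = perfect_method(visible_edges)
print(f"recall of a perfect method: {recall(true_edges, inferred):.0%}")
```

Even the idealized method cannot exceed the ceiling set by the data. Without an estimate of that ceiling, a score of 70 percent cannot be attributed to the method rather than to the data set.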
At this point, it’s a problem that has not been studied very much. How do we measure how well a given data set reflects the features of the network? And that goes back to what I said earlier. Even for our little network that we used in this paper, it’s not so straightforward [knowing] how to collect the right type of data and the right amount of data.
We have other experiments where for a given network, if I’m very clever in choosing my data, I get along with a very small amount of data, and if I’m not very clever, then I can generate large amounts of data and I still miss key features in the network.
So that I see as one important direction of research: to get some estimate of data quality with respect to reverse engineering.
What are you working on now?
The [DREAM] competition that took place a little while ago was the first one, and it was a very interesting one, I think, and people learned a lot from it, but I would expect the competition to continue. So we are actively continuing to develop the methods that we’re working on.
We look at this from the point of view of a dynamical system. What we’d like to reconstruct from data is really a description of the dynamical system rather than a description of just the wiring diagram, the graph theoretic depiction of a network, because if I have a network like what you see in the paper, this graph, that tells you some things about the network, but it doesn’t tell you a lot of other things. It doesn’t tell you what sort of dynamics this network supports. And I think for many biological questions that’s an important piece of information.
So the methods that we’ve been developing are geared toward actually returning a mathematical description of a dynamical system, and one part of that description then allows you to make this wiring diagram, but the description provides much more information.