International Challenge Shows Feasibility of Inferring Causal Networks From Molecular Data


NEW YORK (GenomeWeb) – A key area of proteomics research, and of systems biology more generally, is building a better understanding of molecular signaling networks, such as those based on protein phosphorylation.

One aspect of this work involves developing informatics tools for modeling and predicting the structure of such networks. Within this area of research, one significant open question is whether it is in fact possible to infer causal, as opposed to correlational, molecular connections in actual biological systems.

Recently, an international consortium of researchers took on this question under the umbrella of the HPN-DREAM project, a 2013 effort sponsored by California-based healthcare provider the Heritage Provider Network, with additional support from the National Cancer Institute.

In a paper published this week in Nature Methods, the researchers presented the results of that effort, which suggest that it does appear possible to infer causal relationships in molecular networks. The finding could prove useful for a variety of systems biology applications, said Sach Mukherjee, a researcher at the German Centre for Neurodegenerative Diseases and senior author on the paper.

The fundamental difference between causal and correlational networks is "that the former shed light on what happens under intervention," Mukherjee told GenomeWeb. In the case of the work presented in the Nature Methods study, the researchers looked at tyrosine kinase signaling datasets that included interventions with kinase inhibitor treatment.

In the study, participating researchers submitted models of protein signaling networks developed using data collected from reverse phase protein array phosphoprotein experiments done on four breast cancer cell lines under eight different stimulus conditions. The researchers were tasked with three sub-challenges: to infer causal signaling networks based on protein time-course data; to predict phosphoprotein time-course data under perturbation; and to develop tools for visualizing these datasets.

There was also a second in silico portion of the challenge in which participants were tasked with inferring a network from data generated not experimentally but from a nonlinear differential equation model of signaling.

The study leaders received a total of 178 submissions, several of which were able to model network behavior with statistically significant levels of success.

Mukherjee noted that combining the submissions also proved effective, "echoing a phenomenon — the so-called 'wisdom of crowds' — that has been seen in many settings," he said.

In fact, a combination of the inferred networks from all the submitting teams did slightly better than the best-performing individual model and was among the top five performers on the in silico task. Additionally, the researchers found that models composed of randomly chosen combinations of as few as 25 percent of the total submissions showed strong performance with both the experimental and in silico data.
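One simple way to combine inferred networks is to average the confidence each submission assigns to every directed edge. The Python sketch below illustrates that idea only; it is not necessarily the aggregation scheme the challenge organizers used, and the function name, edge labels, and scores are all hypothetical.

```python
from collections import defaultdict

def aggregate_edge_scores(submissions):
    """Average per-edge confidence scores across submissions.

    Each submission maps a directed edge (source, target) to a score in [0, 1].
    An edge missing from a submission counts as a score of 0 for that team.
    """
    totals = defaultdict(float)
    for scores in submissions:
        for edge, score in scores.items():
            totals[edge] += score
    n = len(submissions)
    return {edge: total / n for edge, total in totals.items()}

# Hypothetical scores from three teams (illustrative values, not challenge data).
submissions = [
    {("C", "A"): 0.9, ("C", "B"): 0.8, ("A", "B"): 0.6},
    {("C", "A"): 0.7, ("C", "B"): 0.9, ("A", "B"): 0.1},
    {("C", "A"): 0.8, ("C", "B"): 0.4, ("A", "B"): 0.2},
]

consensus = aggregate_edge_scores(submissions)

# Rank edges from most to least supported by the crowd.
ranked = sorted(consensus, key=consensus.get, reverse=True)
```

In this toy input, one team scores the edge from A to B fairly highly, but the average pulls it down because the other two teams disagree.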

The findings, the authors said, indicate that it might indeed be possible to make inferences about causal connections in complex biological systems, which they noted has been something of an open question. "Causal network inference is profoundly challenging, and many methods for inferring regulatory networks connect correlated, or mutually dependent, nodes that might not have any causal relationship," they wrote.

Take, for example, the case of three proteins, A, B, and C, where A and B are both regulated by C. In such a case, the behavior of A and B would be correlated because both are controlled by C, but there would be no causal relationship such that, for instance, inhibition of A would affect the activity of B.
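The three-protein example can be made concrete with a small simulation. The Python sketch below is a toy model with assumed linear relationships and Gaussian noise, not the challenge data or any submitted method; the function names and parameter values are hypothetical. It shows A and B strongly correlated in observational data, while forcing A to zero (a stand-in for inhibition) leaves B unchanged.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n, inhibit_a=False, seed=0):
    """Simulate activities of A and B, both driven by a shared regulator C.

    Assumed toy model: A = C + noise, B = C + noise. There is no A -> B edge.
    With inhibit_a=True, A is clamped to 0 after it is generated, so the
    random draws (and therefore C and B) are identical across both runs.
    """
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        a = c + rng.gauss(0.0, 0.3)
        b = c + rng.gauss(0.0, 0.3)
        if inhibit_a:
            a = 0.0  # intervention: A is inhibited regardless of C
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals

# Observational data: A and B move together because both follow C.
a_obs, b_obs = simulate(2000)
r_obs = pearson(a_obs, b_obs)

# Interventional data: inhibiting A does not change B at all.
a_inh, b_inh = simulate(2000, inhibit_a=True)
b_unchanged = b_obs == b_inh
```

A method that relied on correlation alone would be tempted to draw an edge between A and B here, while the intervention shows that no such causal link exists.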

Models capable of capturing causal, as opposed to merely correlational, relationships can, in theory, show how systems will behave under different conditions and in different disease contexts. That makes such models potentially useful for understanding the biology underlying diseases and for guiding therapy decisions.

"The longer-term vision is that if we can start to build networks that capture causal insights and that can be efficiently learned and empirically tested, then we will be in a position to learn networks spanning many biological contexts, such as cell types or disease states," Mukherjee said.

"Importantly, this kind of approach could really scale," he added, "because it would be based on high-throughput data and computational tools and could lead to a step-change in our ability to look at regulatory variation in health and disease."