Computational Approach Extracts Complex Disease Insights from Incomplete Protein Interaction Network


NEW YORK (GenomeWeb) – A team based in the US and Hungary has developed a computational strategy for drawing out relationships between different disease conditions based on patterns within a partial protein-protein interactome.

"What we tried to do in this study is provide a more general framework," first author Jörg Menche, a researcher affiliated with Northeastern University, the Dana-Farber Cancer Institute, and Central European University, told GenomeWeb. "We hope that people will now use it to zoom into specific neighborhoods and look at [and interrogate] the molecular interactions."

As they reported in Science this week, Menche and his colleagues combined physical interaction data for a multitude of human proteins with disease-gene profiles drawn from genome-wide association studies and other analyses.

After defining the minimum number of risk genes needed to reliably see disease-related protein interaction modules, the researchers used the distance between these modules to learn more about relationships between various human diseases and the pathological and biological processes behind them.

Based on the similarities found for disease modules that neighbored one another in the resulting network, the team argued that even an incomplete interactome may prove useful for parsing GWAS data, unraveling the roots of some difficult-to-diagnose conditions, and finding new treatment avenues for existing drugs.

Menche said the disease modules might better define where a drug for one disease acts in the network, for example, potentially pointing to other nearby conditions that share the same target.

Although genetic variants identified through GWAS and other analyses can provide clues to a given human trait or disease, Menche and his co-authors argued that such variants are often tricky to interpret outside of the molecular interaction network that encompasses them.

Even so, they noted, attempts to view disease-relevant variants within the interactome may be stymied by missing interaction information, coupled with shortcomings in the computational approaches used to group potential disease players into informative sub-networks.

"An important issue that came up when we started looking at the interactome was that we realized how incredibly incomplete our current maps are," Menche said, adding that "the whole venture of using the interactome as [a] map lacked a certain framework or mathematical formulation."

With that in mind, the team set out to find ways of evaluating disease modules at the protein level, reasoning that a well-defined protein interaction framework might pick up features missed by focusing on alterations at the gene level.

They considered genetic associations for almost 300 human diseases in conjunction with existing interaction data for 13,460 human proteins — ranging from protein-protein interactions and pathway profiles to interactions between regulatory proteins or enzymes and their targets.

Because there are still large gaps in both the protein interaction and disease association sides of the puzzle, they explained, network clusters present in the interactome often contained just a subset of the proteins encoded by genes that had been implicated in any given condition.

By applying a mathematical framework to the data, though, the team was able to define a cutoff for authentic protein interaction modules, Menche explained. "The first aim in this paper was to derive a mathematical theory in order to quantify if we can actually use — and to what resolution we can use — this interactome network to identify disease models."

The method, called percolation theory, has been well established and widely applied in statistical physics, he said. "We needed to adapt it a little bit and tweak it here and there, but the basic tool was really already there."

Based on 141,296 documented physical protein interactions and more than 2,400 disease-associated genes, the team found that disease modules were discernible for diseases that had been linked to alterations in at least 25 different genes.
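To make the idea of a disease module concrete, here is a minimal sketch in Python. It assumes that a module can be approximated as the largest connected group of disease-associated proteins in the interactome; the function name `largest_module` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import deque

def largest_module(edges, disease_genes):
    """Return the largest connected set of disease genes in the subgraph they induce."""
    genes = set(disease_genes)
    # Keep only interactions where both partners are disease genes.
    adj = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes:
            adj[a].add(b)
            adj[b].add(a)

    seen, best = set(), set()
    for start in sorted(genes):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Toy interactome (illustrative only): X and F are not disease genes.
toy_edges = [("A", "B"), ("B", "C"), ("C", "X"), ("X", "D"), ("E", "F")]
toy_disease = ["A", "B", "C", "D", "E"]
module = largest_module(toy_edges, toy_disease)
```

In this toy example, D and E are cut off from the rest because the interactions that would link them run through proteins outside the disease gene list, which mirrors how gaps in the data can fragment a module.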

As such, the current interactome approach is limited to complex human diseases, Menche said, noting that "monogenic diseases would need a different kind of framework."

When the team looked at interaction modules for diseases with known biological similarities — say, rheumatoid arthritis and multiple sclerosis — it found that the modules were closer than usual in the wider interactome.

The modules for various diseases fell closer to one another when the conditions shared gene ontology assignments (indicative of their biological function), the researchers reported. Diseases with modules near one another were also more likely to produce comparable symptoms, co-morbidities, and/or gene expression consequences in various tissues.

"Indeed, disease models that overlap in a network sense represent diseases that are more similar in many aspects, starting from similarities of the genes that are involved that are part of similar functions or pathways, all the way up to clinical features of the disease such as similar symptoms or comorbidities," Menche said.

By extrapolating from that, the group demonstrated that it's possible to tap into network distances between disease module pairs to detect shared molecular underpinnings for different conditions.
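One way to picture a distance between two modules is the following Python sketch. The article does not give a formula, so the measure used here is an assumption: the mean shortest distance between the two gene sets minus the average of the mean shortest distances within each set, so that negative values suggest overlap and positive values suggest separation. All function names and the toy network are hypothetical.

```python
from collections import deque

def build_adj(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bfs_distances(adj, source):
    """Shortest path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def nearest_distances(adj, sources, targets, skip_self=False):
    """For each source, the distance to its closest target (unreachable sources are skipped)."""
    out = []
    for s in sources:
        dist = bfs_distances(adj, s)
        reachable = [dist[t] for t in targets
                     if t in dist and not (skip_self and t == s)]
        if reachable:
            out.append(min(reachable))
    return out

def mean(values):
    return sum(values) / len(values)

def separation(adj, set_a, set_b):
    """Assumed measure: between-set distance minus the average within-set distance."""
    d_aa = mean(nearest_distances(adj, set_a, set_a, skip_self=True))
    d_bb = mean(nearest_distances(adj, set_b, set_b, skip_self=True))
    d_ab = mean(nearest_distances(adj, set_a, set_b)
                + nearest_distances(adj, set_b, set_a))
    return d_ab - (d_aa + d_bb) / 2

# Toy network: a simple chain 1-2-3-4-5-6 (illustrative only).
chain = build_adj([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)])
far_apart = separation(chain, {1, 2}, {5, 6})
overlapping = separation(chain, {1, 2, 3}, {2, 3, 4})
```

On the toy chain, the two gene sets at opposite ends yield a positive value, while the two sets that share nodes yield a negative one.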

The investigators identified more than 700 such instances of disease module overlap — sometimes involving conditions that would not be classified together based on their clinical features and/or genetic risk profiles alone.

For example, they saw network ties between the liver condition hepatic cirrhosis and asthma, as well as overlap between asthma and celiac disease modules. The gout module turned up near the module for glioma, which in turn had modular overlap with myocardial infarction, and so on.

The disease-related insights that may be gleaned from the interactome are expected to increase dramatically with the addition of new genetic associations and protein-protein interaction profiles.

The team made its dataset of module distances between disease pairs available to other researchers as part of the paper's supplementary material in the hopes that it will serve as a resource for future medical and/or systems biology studies.

The supplementary material also describes a strategy for using the network as a source of information to sift through nominal associations in genes that may be relevant to particular diseases based on their position in the network.

"In a sense, we hope to give input to interested medical communities so they can go through these long lists and find the particular pairs that are most surprising and interesting to them," Menche said. 

Ideally, he and his colleagues would like to present the interactome data in an interactive and easy-to-view manner, though that remains challenging. At the moment, they are looking at the feasibility of folding as many data types as possible — perhaps including DNA methylation patterns and/or other epigenetic data — into networks to study individual diseases.

The group is also interested in applying its disease module strategy to tissue-specific interactome maps, Menche said.