NEW YORK (GenomeWeb) – Researchers from Brown University and other institutions have published a study in which they used an algorithm called HotNet2 to analyze mutated gene networks in multiple cancers and identify rare somatic mutations in subnetworks of pathways and protein complexes that could be involved in the development of the disease.
According to the paper, published in December in Nature Genetics, the scientists used HotNet2 to analyze data from more than 3,000 samples from 12 cancer types from a TCGA pan-cancer dataset of single nucleotide variants, small insertions and deletions, and copy number aberrations. In total, the researchers wrote, they identified 16 "significantly mutated" subnetworks of genes that encompass "classic cancer signaling pathways; pathways and complexes with more recently characterized roles in cancer; and protein complexes and groups of interacting proteins with less characterized roles in cancer, such as the cohesin and condensin complexes."
In a statement, Ben Raphael, associate professor of computer science and director of the Center for Computational Molecular Biology at Brown, and the paper's senior author, noted that laboratory experiments will ultimately be needed to confirm these findings. "But the hope is that the computational analysis will help prioritize the experiments toward those genes and mutations that are likely to be involved in cancer," Raphael said.
This particular study aimed to better understand the frequency of genetic mutations in cancer as well as how these mutations interact in the diseases. Existing research has shown that "most cancers exhibit extensive mutational heterogeneity, with few significantly mutated genes and many genes mutated in a small number of samples," the paper explains. "This 'long-tail' phenomenon complicates efforts to identify cancer-related genes by statistical tests of mutational recurrence, as rarely mutated cancer genes may be indistinguishable from genes containing only passenger mutations."
According to the paper, HotNet2 uses a "directed heat diffusion model to simultaneously assess the significance of mutations in individual genes and the local topology of interactions among the encoded proteins." It is an updated version of the HotNet algorithm, which was used to analyze cancer networks as part of the TCGA project and in other studies. HotNet2 retains some of its predecessor's capabilities while overcoming its limitations, Raphael told GenomeWeb.
A simple way of looking at how the underlying algorithm works is to think of genes represented as nodes in a network and mutations in these genes as sources of heat, Raphael explained. Mutated genes are "heated up" based on the number of samples in which they are mutated — that represents the initial mutation frequency — and then the heat is allowed to diffuse along the edges of the network to neighbor nodes and over time "hot subnetworks" of relevant genes emerge in the graph.
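The heat analogy can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not the HotNet or HotNet2 implementation: the gene names, the network, the sample counts, and the `retain` and `steps` parameters are all hypothetical, and each gene simply keeps a fraction of its heat per step and splits the rest evenly among its neighbors.

```python
def diffuse_heat(neighbors, initial_heat, retain=0.5, steps=10):
    """Spread heat along the edges of an undirected gene network.

    neighbors: dict mapping each gene to a list of interacting genes.
    initial_heat: dict mapping genes to their starting heat, for example
        the number of samples in which the gene is mutated.
    retain: fraction of heat a gene keeps at each step (assumed value).
    """
    heat = {gene: float(initial_heat.get(gene, 0.0)) for gene in neighbors}
    for _ in range(steps):
        next_heat = {gene: retain * h for gene, h in heat.items()}
        for gene, h in heat.items():
            nbrs = neighbors[gene]
            if not nbrs:
                # A gene with no interactions keeps all of its heat.
                next_heat[gene] += (1 - retain) * h
                continue
            share = (1 - retain) * h / len(nbrs)
            for nbr in nbrs:
                next_heat[nbr] += share
        heat = next_heat
    return heat


# Hypothetical network: A - B - C form a path, D - E form a separate pair.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}
# Hypothetical mutation counts: B and D are never mutated themselves.
mutated_samples = {"A": 30, "B": 0, "C": 25, "D": 0, "E": 1}

heat = diffuse_heat(neighbors, mutated_samples)
for gene in sorted(heat, key=heat.get, reverse=True):
    print(gene, round(heat[gene], 2))
```

In this toy run, gene B starts cold but warms up because it sits between two frequently mutated genes, while D stays cool next to the rarely mutated E. That is the intuition behind "hot subnetworks": a gene's position in the network matters as well as its own mutation count.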
The original HotNet approach worked well with smaller sample sizes, Raphael said, but ran into challenges with the much larger datasets used for the Nature Genetics study and their much broader range of mutational frequencies. Larger datasets make it possible to identify and explore the effects of rarer cancer mutations along with more frequently mutated genes such as TP53. But when the researchers tried to analyze the data using HotNet, the more frequently mutated genes generated much hotter subnetworks than the rarer mutations did and ended up dominating the signal. HotNet2's solution is to take into account the directionality of the heat flow in identifying subnetworks.
"What it's allowed us to do is even things out more so that we still get a nice subnetwork with TP53 ... but then we have these subnetworks that are all these rarely mutated genes," Raphael said. According to the paper, this approach "reduces the incidence of the artifact of star subnetworks [larger and more dominant networks] by more than 80 [percent]."
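One way to picture why directionality helps is the sketch below. It is an illustration under assumptions, not the published HotNet2 code: the restart parameter `beta`, the threshold `delta`, the gene names, and the heat values are hypothetical. Heat is tracked separately for each direction, a directed edge is kept only when one gene sends more than `delta` heat to another, and a subnetwork is reported only when its genes can reach one another in both directions.

```python
def diffusion_from(source, neighbors, beta=0.4, iters=200):
    """Steady-state spread of one unit of heat placed on `source`.

    At each step a gene passes (1 - beta) of its heat to its neighbors,
    and beta units are re-injected at the source (assumed model).
    """
    f = {gene: 0.0 for gene in neighbors}
    for _ in range(iters):
        nxt = {gene: (beta if gene == source else 0.0) for gene in neighbors}
        for gene, amount in f.items():
            nbrs = neighbors[gene]
            for nbr in nbrs:
                nxt[nbr] += (1 - beta) * amount / len(nbrs)
        f = nxt
    return f


def hot_subnetworks(neighbors, heat, delta, beta=0.4):
    """Return gene sets whose members exchange more than delta heat both ways."""
    out_edges = {}
    for sender in neighbors:
        spread = diffusion_from(sender, neighbors, beta)
        # Heat delivered by `sender` scales with its own starting heat.
        out_edges[sender] = {
            receiver for receiver in neighbors
            if receiver != sender
            and spread[receiver] * heat.get(sender, 0.0) > delta
        }

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in out_edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {gene: reachable(gene) for gene in neighbors}
    components, assigned = [], set()
    for gene in neighbors:
        if gene in assigned:
            continue
        # Mutual reachability defines a strongly connected component.
        component = {other for other in reach[gene] if gene in reach[other]}
        assigned |= component
        if len(component) > 1:
            components.append(component)
    return components


# Hypothetical star: one very hot hub with three nearly cold leaves,
# plus a separate triangle of modestly mutated genes.
neighbors = {
    "HUB": ["L1", "L2", "L3"], "L1": ["HUB"], "L2": ["HUB"], "L3": ["HUB"],
    "R1": ["R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2"],
}
heat = {"HUB": 100, "L1": 1, "L2": 1, "L3": 1, "R1": 5, "R2": 5, "R3": 5}

subnetworks = hot_subnetworks(neighbors, heat, delta=1.0)
print(subnetworks)
```

In this toy example the hub sends plenty of heat to its leaves, but the leaves send almost none back, so no star forms around the hub. The three modestly mutated genes exchange heat in both directions and are reported together as a subnetwork.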
Using the updated algorithm, the researchers were able to identify several known cancer-associated pathways, including TP53, PI3K, NOTCH and RTK signaling, and also highlight "extensive cross-talk between these pathways, overlaps that are often overlooked in analyses that treat pathways as distinct gene lists," the paper states. According to their analyses, 81.9 percent of samples used in the study contained at least one mutation in the TP53, PIK3CA, and NOTCH subnetworks. They also identified new genes within these networks that have documented interactions in the literature but a lower mutational frequency, which kept single-gene tests from flagging them as significant.
They also identified mutations in the SWI/SNF chromatin-remodeling complex and the BAP1 complex, both of which have recently been shown to play a role in tumor development. In both cases, the researchers' analyses highlighted new mutations that may not be considered important on their own but could be significant because of their interactions with other known cancer genes. For example, they found evidence suggesting that mutations in ADNP, which interacts with the SWI/SNF complex, could in rare cases contribute to tumor development.
Other noteworthy findings included several mutated subnetworks with suggestive roles in cancer that have yet to be properly characterized. Two that are highlighted in the paper are the cohesin and condensin complexes, both of which are known to play important roles in mitosis and both of which contain mutations that occur in nearly all cancer types analyzed in the study. "Our HotNet2 pan-cancer analysis suggests that multiple cancer types harbor rare mutations in the cohesin and condensin complexes, supporting a proposed tumor-suppressor role for these complexes," the researchers wrote.
As part of the study, the researchers also compared HotNet2 to its predecessor and to two pathway enrichment tests including Gene Set Enrichment Analysis. They reported that their solution had higher sensitivity and specificity in identifying genes than the other methods.
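For readers unfamiliar with the two measures, the sketch below shows how sensitivity and specificity are generally computed for a gene-finding method. The gene labels and sets are hypothetical, and the article does not describe how the researchers built their benchmark, so this should not be read as their evaluation procedure.

```python
def sensitivity_specificity(predicted, known_positive, universe):
    """Compute sensitivity and specificity for a set of predicted genes.

    predicted: genes the method reports.
    known_positive: genes treated as true cancer genes in the benchmark.
    universe: every gene that was tested.
    """
    true_pos = len(predicted & known_positive)
    false_neg = len(known_positive - predicted)
    false_pos = len(predicted - known_positive)
    true_neg = len(universe - predicted - known_positive)
    sensitivity = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    specificity = true_neg / (true_neg + false_pos) if true_neg + false_pos else 0.0
    return sensitivity, specificity


# Hypothetical benchmark of ten genes, four of them known cancer genes.
universe = {f"G{i}" for i in range(1, 11)}
known_positive = {"G1", "G2", "G3", "G4"}
predicted = {"G1", "G2", "G3", "G7"}

sens, spec = sensitivity_specificity(predicted, known_positive, universe)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Here the hypothetical method recovers three of the four known genes (sensitivity 0.75) and wrongly flags one of the six other genes (specificity of about 0.83).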
Raphael hopes that research like this could point the way toward new laboratory investigations of these genes to confirm and better understand the role they may play in cancer. His group, for example, is looking to work with collaborators to further explore the role of the condensin complex in cancer.
"The next step is translating all of this information from cancer sequencing into clinically actionable decisions," he said in a statement. "For example, there are now drugs that are used to treat patients who have mutations in particular genes. However, perhaps patients who don't have a mutation in the targeted gene, but have a mutation in the same pathway, might respond to the same drug. This is the kind of analysis we would like to perform next."
Meanwhile, HotNet2's developers are exploring other applications for their algorithm and have begun using it to analyze gene expression data, Raphael said. It could also be used to analyze germline mutations such as common variants from genome-wide association studies, as well as rare and de novo variants, he said. Those are potential biological applications, but the algorithm itself "is very general in that what it takes in is a network ... it takes in scores on the nodes in that network ... and then it looks for subnetworks that are connected and have high scores," he said.