Pan-Cancer Analysis Identifies Mutational Hotspots at the Protein Level


NEW YORK (GenomeWeb) – Researchers from the Swiss Federal Institute of Technology Zurich and the German Cancer Research Center have identified 180 amino acid residues within 160 human proteins that appear to be hotspots for cancer-linked mutations.

Detailed in a paper published last month in Molecular Systems Biology, the findings could help researchers better distinguish important driver mutations from less significant genetic alterations and demonstrate the potential of a protein-centered approach to analyzing mutation data, said Ruedi Aebersold, a professor at ETH Zurich and an author on the study.

As technologies like next-generation sequencing increase the amount of genetic data researchers are able to generate, distinguishing between mutations of greater and lesser biological significance has become a major challenge. Proteogenomics, which combines the analysis of protein-level and gene-level information, is an emerging area of research aimed, in part, at addressing this question. The basic idea underpinning the approach is that significant genomic changes will likely result in detectable changes at the protein level, and that by looking for evidence of change at the protein level, researchers can assess which genetic mutations are likely worth investigating further.

In many cases, researchers have looked for alterations in protein expression stemming from genetic mutations. In the recent MSB paper, Aebersold and his colleagues took a different approach. They used a software tool developed by Marija Buljan, first author on the paper and a professor at ETH Zurich, that uses cancer sequencing data to identify protein residues that are mutational hotspots, meaning residues that acquire point mutations at significantly higher rates than the rest of the sequence surrounding them.

They used the tool to analyze sequencing data generated by the National Cancer Institute's Cancer Genome Atlas project and the International Cancer Genome Consortium. The dataset comprised 1.3 million mutations identified across 10,000 tumor samples in 40 different cancer types from 22 different tissues. The analysis identified 180 hotspot amino acid residues in 160 proteins.
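The article does not describe the statistics behind Buljan's tool, so the following is only an illustrative sketch of the general idea of a residue-level hotspot: a position whose mutation count is improbably high compared with its neighbors. It assumes a simple one-sided binomial test against a uniform background within a sliding window. The function names, the window size, and the significance cutoff are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def find_hotspots(counts, window=10, alpha=1e-3):
    """Flag residues mutated far more often than their local neighborhood.

    counts: list of point-mutation counts, one entry per residue.
    window: residues considered on each side (hypothetical choice).
    alpha:  p-value cutoff (hypothetical choice, uncorrected).
    Returns a list of (position, count, p_value) tuples.
    """
    hotspots = []
    for pos, k in enumerate(counts):
        lo = max(0, pos - window)
        hi = min(len(counts), pos + window + 1)
        n = sum(counts[lo:hi])  # all mutations observed in the window
        if k == 0 or n == 0:
            continue
        # Null model: each mutation in the window is equally likely
        # to land on any of the window's residues.
        p = 1.0 / (hi - lo)
        p_value = binomial_tail(k, n, p)
        if p_value < alpha:
            hotspots.append((pos, k, p_value))
    return hotspots


# Toy protein: a flat background of one mutation per residue,
# plus one residue that has been hit 25 times.
toy_counts = [1] * 30
toy_counts[15] = 25
flagged_positions = [pos for pos, _, _ in find_hotspots(toy_counts)]
```

On this toy input only residue 15 is flagged. A real analysis would also need to account for sequence context, differing background mutation rates between tumor types, and correction for the number of residues tested.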

Notably, Aebersold said, 66 percent of the proteins identified (106 of 160) are not coded by known cancer driver genes, which he suggested indicates the usefulness of looking at the question from the protein level.

For instance, Aebersold noted, a gene might have one particular residue that is highly relevant to cancer but a low level of mutations overall. "And there you can imagine… that the overall hits for that gene would be low and it might not show up [as a hotspot]."

"Whereas, if you normalize the number of amino acids, a particular [amino acid] residue might be disproportionately hit [by mutations], and there are cases like that in the [MSB data]," he said. "I think we take the view that actually the effector of the phenotype is the protein. So, in our kind of way of looking at things it matters a lot – not just is the protein mutated, but how is the protein mutated, and does this mutation have a likely determinable, definitive biochemical function? And I think within the genomics community this is not as widely used a point of view."
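Aebersold's contrast between gene-level and residue-level views can be illustrated with a small made-up example. The two proteins and their mutation counts below are invented for illustration and do not come from the MSB data. The helper names are hypothetical as well.

```python
def gene_level_rate(counts):
    """Mutations per residue, averaged over the whole protein."""
    return sum(counts) / len(counts)


def top_residue_share(counts):
    """Fraction of the protein's mutations that fall on its most-hit residue."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


# Protein A (invented): few mutations overall, 10 of 12 on one residue.
protein_a = [0] * 500
protein_a[100] = 10
protein_a[3] = 1
protein_a[400] = 1

# Protein B (invented): many mutations, spread thinly across the sequence.
protein_b = [0] * 500
for pos in range(0, 500, 10):
    protein_b[pos] = 1

rate_a, rate_b = gene_level_rate(protein_a), gene_level_rate(protein_b)
share_a, share_b = top_residue_share(protein_a), top_residue_share(protein_b)
```

Ranked by overall mutation rate, protein B looks more heavily hit than protein A. Ranked by how concentrated the mutations are on a single residue, protein A stands out, which is the kind of case Aebersold describes.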

To that end, Aebersold and his colleagues explored the types of proteins overrepresented among the 160 with identified hotspots as well as the locations of these hotspots. As a class, enzymes were overrepresented among hotspot-containing proteins, as were proteins containing bromodomains, which are often involved in epigenetic regulation, and KH domains, which are involved in RNA binding. The set was also enriched for proteins involved in inactivating tumor suppressors.

These findings were not especially surprising in and of themselves, Aebersold noted, but, he said, "more important for us is that they provide us with guidance for [future] experiments."

"We do a lot of experiments trying to see whether protein complexes are disrupted [under different conditions], and I think this [indicates] that we can maybe make predictions about what really happens at the protein-level based on these hotspots," he said.

For instance, Aebersold said, identification of hotspots in the portion of a protein known to be an interface for interactions with other proteins could suggest that mutations at these positions disrupt a protein complex in a specific way.

"We could look to see if a particular complex is perturbed as predicted [by the hotspot data], and if we could show experimentally that it is, then that could be a nice cancer biomarker," he said. "And then, of course, if [the hotspot] is at the interaction interface, it will involve other proteins, its binding partners, and so that is a very interesting angle to come at from the experimental side."

Such an approach could also be useful in analyzing data from experiments like large-scale genome-wide association studies, Aebersold said.

"You have these large GWAS studies where hundreds of thousands of people have been genotyped and then the genome variation is related to some complex disease phenotype, diabetes, for instance," he said. But this has not been very high yield in the sense that the more people you add to these studies the more little peaks you find.

These peaks "clearly have a statistical role to play, but each one is minor and the more people you add the higher the number of these little peaks that come out," he added. "But you don't know from that alone how they are interacting, whether they work cumulatively, whether they somehow enhance each other. It could be that several of these minor peaks congregate at an [interaction] interface, or they might congregate in a particular metabolic pathway."

"I think that looking at proteins and mutational hotspots in proteins that have functional significances that can be biochemically explained is a promising way to go, and pretty much the only way to go if one wants to make sense of this massive amount of genomic data, because it is otherwise very hard to explain all these minor risk factors," he said.

Aebersold said his lab is now focused on using the software developed by Buljan to guide its protein complex work.

"We have been working very intensely at getting reliable methods to see between samples, say tumor and healthy tissue from the same individual, whether certain protein modules are reorganized," he said. "We now have such a method. We are just writing it up [for publication]. And we now want to use this [hotspot] tool to guide our focus to particular complexes."