NEW YORK (GenomeWeb) – Using a new statistical analysis approach, researchers have uncovered thousands of rare somatic variants associated with cancer.
Rather than using a gene-centric approach to tease out driver mutations from within cancer sequencing data, researchers from the University of Maryland used a model that examines protein domain families. In that way, they identified protein domains — which they called "oncodomains" — that were shared across different proteins that are frequently mutated within cancer samples.
"Maybe only two patients have a mutation in a particular protein," senior study author Maricel Kann from Maryland said in a statement. "But when you realize it is in exactly the same position within the domain as mutations in other proteins in cancer patients, you realize it's important to investigate those two mutations."
As they reported in PLOS Computational Biology today, the researchers used this approach to identify rare somatic variants in more than 5,000 genes, many of which represented novel associations with cancer.
Kann and her colleagues argued that focusing on conserved protein domains (the structural and functional units of proteins) allows variants found within them to be compared across different proteins and can suggest possible functional links to cancer.
With their statistical approach, the researchers identified protein domain families that they deemed oncodomains based on the number of somatic variants in one or more genes that contain the same domain. Oncodomain hotspots, then, are positions within oncodomain sites where somatic variants linked to cancer occur more often than would be expected by chance.
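The idea of a position being mutated "more often than would be expected by chance" can be illustrated with a small sketch. The article does not describe the researchers' actual statistical model, so everything here is an assumption made for illustration: the function names are hypothetical, the null model is a uniform spread of variants over the domain positions, significance is a one-sided binomial tail, and the multiple-testing correction is Bonferroni.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def find_hotspots(mutations, domain_length, alpha=0.05):
    """Flag domain positions with more variants than a uniform null predicts.

    mutations: list of (gene, domain_position) pairs, where the position is
    the coordinate in the shared domain model, not in the individual protein.
    Assumed null (not from the study): each variant lands on any of the
    domain_length positions with equal probability.
    """
    # Pool variants from every gene that contains the domain.
    counts = Counter(pos for _gene, pos in mutations)
    total = len(mutations)
    p_uniform = 1.0 / domain_length
    threshold = alpha / domain_length  # Bonferroni correction over positions

    hotspots = {}
    for pos, k in counts.items():
        p_value = binomial_tail(k, total, p_uniform)
        if p_value < threshold:
            hotspots[pos] = p_value
    return hotspots


# Hypothetical data: three genes share a 100-position domain and each carries
# only a few variants at domain position 12; a fourth gene has scattered ones.
mutations = (
    [("GENE_A", 12)] * 3
    + [("GENE_B", 12)] * 3
    + [("GENE_C", 12)] * 2
    + [("GENE_D", pos) for pos in range(30, 42)]
)
print(sorted(find_hotspots(mutations, domain_length=100)))  # prints [12]
```

In this toy data no single gene has more than three variants at the position, but pooling them in domain coordinates makes position 12 stand out, which is the intuition behind Kann's remark about rare mutations.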
Drawing on both the Conserved Domain Database (CDD) and the Pfam database, the researchers identified more than 850 protein domain families as oncodomains. They noted that the number of oncodomains and oncodomain hotspots varied by cancer type: for example, they found one hotspot in kidney chromophobe and seven in head and neck squamous cell carcinoma, but more than 1,700 in skin cutaneous melanoma.
The researchers also noted that some hotspot patterns were shared across cancer types, while others were specific to certain cancers.
In comparing their approach to other methods, Kann and her colleagues found that it could uncover more protein domains, genes, and somatic variants — including rare variants — than the others. In particular, the researchers reported that their method could recapitulate more than half of Pfam domain models as well as identify nearly 600 novel Pfam models.
Meanwhile, at the gene level, the oncodomain hotspot approach could identify 56 percent of genes with variants significant in CHASM and about a third of genes found in region-based methods. More than 4,500 genes were unique to oncodomain hotspots, about a third of which showed evidence of being involved in cancer.
For instance, oncodomain hotspot genes are enriched for cancer-related Gene Ontology (GO) terms such as signal transduction, cell adhesion, and metabolism.
As protein domains are shared across different proteins, the researchers said that comparing genes within the same domain family could help gauge whether rare variants might be functionally relevant.
Additionally, the researchers noted that oncodomains could inform drug development efforts. "Because the domains are the same across so many proteins," Kann said, "it is possible that a single treatment could tackle cancers caused by a broad spectrum of mutated proteins."