NEW YORK (GenomeWeb Daily News) – By examining mutations contained within a set of nearly 5,000 tumor samples encompassing 21 tumor types, Broad Institute researchers were able to find nearly all known cancer genes as well as 33 genes not previously linked to cancer, they reported in Nature yesterday.
Gad Getz, an associate member at the Broad, and his colleagues sifted through the exome sequences of 4,742 tumor samples and their matched normal samples and, after filtering the data, found more than 3 million SNVs, among other mutations. Those mutations implicated just about all known cancer genes and pointed to 33 additional genes associated with cancer.
Getz and his colleagues also calculated that larger sample sizes would catch additional unknown genes and that between 600 samples and 5,000 samples may be needed, depending on the tumor type, to complete the catalog of cancer genes.
"Precision medicine for cancer will ultimately require a comprehensive catalog of cancer genes to enable physicians to select the best combination therapy for each patient based on the cellular pathways disrupted in their tumor and the specific nature of the disruptions," Getz and his colleagues wrote. "Such a catalog will also guide therapeutic development by identifying druggable targets."
Getz and his colleagues gathered nearly 5,000 tumor and matched normal samples, which spanned 21 tumor types including acute myeloid leukemia, breast cancer, and lung adenocarcinoma, from The Cancer Genome Atlas and elsewhere. After running the data through a filtering pipeline, the researchers uncovered 3 million SNVs, 77,270 small insertions and deletions, and 29,837 somatic dinucleotide, trinucleotide, or oligonucleotide variations, coming out to an average of 672 per tumor-normal pair.
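As a quick check on the arithmetic, the per-pair average can be recomputed from the counts quoted above. This is only a sketch: the SNV count is the article's rounded "3 million," so the result lands slightly below the reported 672, and the variable names are illustrative rather than taken from the study.

```python
# Counts as quoted in the article. The SNV figure is rounded ("3 million"),
# so the recomputed average is approximate.
snvs = 3_000_000
indels = 77_270
multi_nucleotide_variants = 29_837
tumor_normal_pairs = 4_742

total_mutations = snvs + indels + multi_nucleotide_variants
average_per_pair = total_mutations / tumor_normal_pairs
print(f"Average with rounded SNV count: {average_per_pair:.1f} per pair")

# Working backward from the reported average of 672 gives the SNV count
# that the rounded "3 million" would have to stand for.
implied_snvs = 672 * tumor_normal_pairs - indels - multi_nucleotide_variants
print(f"SNV count implied by an average of 672: {implied_snvs:,}")
```

The gap between the two figures is consistent with the description elsewhere in the article of "more than 3 million" SNVs.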
Using the MutSig tool, which weighs mutational burden as compared to the background mutation rate, mutational clustering, and enrichment of mutations in conserved regions, the researchers searched for candidate cancer genes. They combined the significance levels generated from each test for each gene to produce an overall significance score. They analyzed each of the tumor types separately and as part of a combined cohort.
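The article does not say how the per-test significance levels are merged into one score. One standard way to combine independent p-values is Fisher's method, shown here purely as an illustration; the function name and the example p-values are hypothetical and are not drawn from MutSig or the study.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Illustrative stand-in only: the article does not state which
    combination rule the researchers used.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p); we work with X / 2.
    half_stat = -sum(math.log(p) for p in p_values)
    # Under the null, X follows a chi-square distribution with 2k degrees
    # of freedom, whose upper tail has the closed form
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical per-gene p-values for burden, clustering, and conservation.
example = [0.01, 0.02, 0.03]
print(f"combined p-value: {fisher_combine(example):.2e}")
```

Fisher's method assumes the tests are independent; if the three signals are correlated for a given gene, the combined value will be too optimistic.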
Some 334 gene-tumor type pairs reached significance, involving 224 different genes, though the genes involved varied by tumor type. Of those, 22 genes, including well-known cancer genes such as TP53, PTEN, and KRAS, were significant in four or more tumor types, and 10 more were significant in three tumor types.
By combining the 22 MutSig lists, one for each of the 21 tumor types plus the combined cohort, the researchers developed what they've dubbed the Cancer5000 set of 254 genes, and they also made a list, this one called Cancer5000-S, of 219 genes that were identified using more stringent parameters. Of the 403 significant gene-tumor type pairs on that list, about 40 are expected to be false positives.
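An expectation of about 40 false positives among 403 significant pairs is what a false discovery rate cutoff of 0.1 would produce. The article does not state the cutoff, so the 0.1 below is an assumption chosen because it matches the reported figure, and the Benjamini-Hochberg function is a generic sketch of how such q-values are commonly computed, not the researchers' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over its own rank and every larger rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Assumed cutoff (not stated in the article).
assumed_fdr_cutoff = 0.1
significant_pairs = 403
expected_false_positives = significant_pairs * assumed_fdr_cutoff
print(f"expected false positives: about {expected_false_positives:.0f}")

# Hypothetical p-values, only to show the q-value calculation.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```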
To determine whether the genes on these lists represent all cancer genes, the researchers turned to the Cancer Gene Census as a reference set. That set contains 82 genes in which somatic point mutations are associated with one or more of the tumor types Getz and his colleagues studied. Of those 82 genes, 60 were among those in the Cancer5000 set, and eight additional ones fell just below significance.
Some 81 genes in the Cancer5000-S set were neither part of the CGC nor discussed in the literature, Getz and his colleagues reported. At least 21 of those 81 genes, they added, have plausible biological connections to cancer. Among the Cancer5000 set, an additional 12 genes have strong connections to cancer-related biological processes, the researchers said.
For instance, ARHGAP35 encodes a Rho-GTPase-activating protein and is found in a region that is deleted in a number of tumors, while other genes encode pro-apoptotic factors or are associated with chromatin regulation or cell proliferation, among others.
The researchers noted that their set of 81 genes likely contains additional cancer genes that they couldn't currently link to cancer due to gaps in knowledge. And there may be many more cancer-linked genes waiting to be uncovered, Getz and his colleagues said.
Through a down-sampling technique, the researchers examined how the discovery of cancer genes increases with sample size by repeating their analyses on random subsets of their samples. From this, they determined that the total number of genes identified increased just about linearly with sample size, and that it also rose as more tumor types were studied.
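The down-sampling idea can be sketched on synthetic data: draw random subsets of increasing size, rerun the discovery step on each, and average the number of genes found. Everything below is hypothetical. The gene frequencies are made up, and the "mutated in at least min_count samples" rule is a crude stand-in for the significance testing described in the article.

```python
import random

def genes_discovered(samples, min_count):
    """Genes mutated in at least min_count samples (a crude stand-in
    for a real significance test)."""
    counts = {}
    for mutated_genes in samples:
        for gene in mutated_genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, count in counts.items() if count >= min_count}

def down_sample_curve(samples, sizes, repeats, min_count, rng):
    """Mean number of genes discovered in random subsets of each size."""
    curve = {}
    for size in sizes:
        found = [
            len(genes_discovered(rng.sample(samples, size), min_count))
            for _ in range(repeats)
        ]
        curve[size] = sum(found) / repeats
    return curve

# Synthetic cohort: 50 hypothetical genes with decreasing mutation
# frequencies, and 400 samples, each a set of mutated genes.
rng = random.Random(0)
gene_freqs = {f"gene{i}": 0.2 / (i + 1) for i in range(50)}
samples = [
    {gene for gene, freq in gene_freqs.items() if rng.random() < freq}
    for _ in range(400)
]

curve = down_sample_curve(
    samples, sizes=[50, 100, 200, 400], repeats=20, min_count=5, rng=rng
)
for size, mean_found in curve.items():
    print(f"{size:>4} samples: {mean_found:.1f} genes on average")
```

Because rarely mutated genes only cross the threshold once enough samples are pooled, the curve keeps rising with subset size, which is the pattern the researchers describe for their real data.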
Additionally, through a restricted hypothesis testing approach, Getz and his colleagues determined that a median of six additional genes in the Cancer5000 set are likely involved in each tumor type.
"However, the data also clearly show that many new candidate cancer genes remain to be discovered beyond those in the current Cancer5000 set," the researchers added.
To approach saturation, the researchers calculated that between 600 samples and 5,000 samples, depending on the frequency of mutations in the tumor type, would be necessary to develop a fairly comprehensive catalog of cancer genes.
"[W]e are far from having a complete catalog of cancer genes, with many genes at clinically important frequencies within individual tumor types and across cancer as a whole still awaiting identification," the researchers said. "The number of such genes is still increasing steeply with the number of samples and the number of tumor types studied."
Such a catalog, they added, would help guide cancer treatment, spur the development of new therapies, and enable a better understanding of the mechanisms at play in cancer.
"Given the devastating toll of cancer, with nearly 8 million deaths annually worldwide, completing the genomic analysis of this disease should be a biomedical imperative," Getz and his colleagues said.