NEW YORK (GenomeWeb News) – Large-scale studies have unearthed a number of cancer-associated genes, but a portion of those genes have tenuous biological links to the disease and appear unlikely to have true correlations with cancer. As a team of researchers led by the Broad Institute's Gad Getz reported in Nature yesterday, mutational heterogeneity across the genome appears to account for the inclusion of such artifacts on large gene lists.
They further developed a tool, called MutSigCV, to account for such mutational heterogeneity and weed out suspect genes from the list.
"With the ability to eliminate many obviously suspicious genes, it is now feasible to start analyzing large cancer collections, including combined data sets across many cancer types," the researchers wrote.
Lists of putative cancer genes, Getz and his colleagues said, tend to include a number of false positive genes. For example, they examined whole-exome data from 178 tumor-normal pairs from people with squamous cell lung cancer, finding 450 genes that were mutated at a significant rate. In addition to including genes previously linked to cancer, their list contained genes encoding olfactory receptors as well as ones encoding long proteins or containing a number of introns.
Commonly used analytical approaches to generate those lists, the researchers noted, are based on the average overall mutation rate and the frequencies of certain mutation types, such as indels, transversions, or transitions, in the specific cancer type.
"We proposed that the problem might be due to heterogeneity in the mutational processes in cancer," Getz and his colleagues wrote. "Whereas it is obvious that assuming an average mutation frequency that is too low will lead to spuriously significant findings, it is less well appreciated that using the correct average rate but failing to account for heterogeneity in the mutational process can also lead to incorrect results."
They compared two scenarios in which the average mutation rate was the same, though one had a constant per-gene mutation frequency and the other a variable one. In the second scenario, they found that assuming a constant mutation rate across the genome would erroneously link a number of genes to disease. The problem actually grew as the sample size increased.
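The effect can be illustrated with a small simulation. The sketch below is not the researchers' calculation: the gene count, gene length, mean rate of 10 mutations per megabase, gamma-distributed per-gene rates, and Bonferroni cutoff are all assumptions chosen for illustration. No simulated gene is a true driver, so every gene flagged by a test against the average rate is a false positive.

```python
import math
import random


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the tail directly."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Stop once past the mode and the remaining terms are negligible.
        if i > lam and term <= total * 1e-15:
            break
    return min(total, 1.0)


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def count_false_positives(n_patients, heterogeneous, n_genes=2000,
                          gene_len=1500, mean_rate=1e-5, seed=0):
    """Count genes called significant when tested against the average rate.

    All parameters are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    expected = mean_rate * gene_len * n_patients  # what the naive test assumes
    threshold = 0.05 / n_genes                    # Bonferroni cutoff (assumed)
    hits = 0
    for _ in range(n_genes):
        if heterogeneous:
            # Per-gene rate varies, but its mean still equals mean_rate.
            rate = rng.gammavariate(2.0, mean_rate / 2.0)
        else:
            rate = mean_rate
        observed = sample_poisson(rate * gene_len * n_patients, rng)
        if poisson_upper_tail(observed, expected) < threshold:
            hits += 1
    return hits


for n in (100, 1000):
    print(n,
          "constant:", count_false_positives(n, heterogeneous=False),
          "variable:", count_false_positives(n, heterogeneous=True))
```

Under these assumptions, the constant-rate scenario should flag almost no genes at either sample size, while the variable-rate scenario flags more genes as the number of patients grows, because larger samples make each gene's deviation from the average easier to detect.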
In a set of about 3,000 tumor-normal pairs spanning 27 tumor types, the researchers also examined mutation rate heterogeneity and found that rates varied wildly. For example, they reported that mutation rates in melanoma and lung cancer ranged from 0.1 per megabase to 100 per megabase and, in acute myeloid leukemia, the patient-specific mutation rate also ranged from 0.1 per megabase to 100 per megabase.
Further analysis indicated that lung cancers were prone to cytosine-to-adenine mutations, which the researchers noted was consistent with exposure to polycyclic aromatic hydrocarbons found in tobacco smoke. In addition, a cluster of cancer types, including cervical, bladder, and a portion of head and neck cancers, exhibited a number of mutations at cytosines preceded by a thymine on the 5' side, a pattern often caused by certain deaminases that are linked to viruses.
Regional heterogeneity, though, affected the mutational process the most, the researchers said. Two factors, gene expression level and replication timing, appeared to explain much of that heterogeneity. For instance, olfactory genes, which have been included on a number of cancer-related gene lists, are expressed at low levels, replicate late, and have a high regional mutation rate.
Based on these findings, Getz and his colleagues developed a method, which they dubbed MutSigCV, to correct for variation by incorporating patient-specific mutation frequency and spectrum as well as gene-specific background mutation rates that include expression level and replication timing data. When they applied this method to the original set of lung cancer samples, their list of significantly mutated genes dropped from 450 to 11. Most of those had previously been reported to be mutated in lung cancer, and a handful had been implicated in other cancers. One, HLA-A, was novel and indicates a possible role for immune genes in helping cancer cells evade the immune system, the researchers said.
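One way to picture a gene-specific background rate is to pool background mutation counts from genes with similar expression and replication timing. The sketch below is only an illustration of that idea and should not be read as the MutSigCV implementation. The function name, the nearest-neighbor pooling, the covariate scaling to a 0-1 range, and all gene names and numbers are made up for the example.

```python
import math

# Hypothetical toy data: covariates are assumed to be pre-scaled to 0-1,
# with rep_time near 1 meaning late replication. Counts are invented.
GENES = [
    {"name": "OR_like_1", "expression": 0.10, "rep_time": 0.90, "bg_mutations": 12, "bases": 100_000},
    {"name": "OR_like_2", "expression": 0.15, "rep_time": 0.85, "bg_mutations": 10, "bases": 100_000},
    {"name": "late_low_3", "expression": 0.20, "rep_time": 0.80, "bg_mutations": 11, "bases": 100_000},
    {"name": "housekeeping_1", "expression": 0.90, "rep_time": 0.10, "bg_mutations": 2, "bases": 100_000},
    {"name": "housekeeping_2", "expression": 0.85, "rep_time": 0.20, "bg_mutations": 3, "bases": 100_000},
    {"name": "housekeeping_3", "expression": 0.80, "rep_time": 0.15, "bg_mutations": 1, "bases": 100_000},
]


def pooled_background_rate(target_name, genes, k=2):
    """Estimate a gene's background rate from itself plus its k nearest
    neighbors in (expression, replication timing) space."""
    target = next(g for g in genes if g["name"] == target_name)

    def distance(g):
        return math.hypot(g["expression"] - target["expression"],
                          g["rep_time"] - target["rep_time"])

    others = [g for g in genes if g["name"] != target_name]
    pool = [target] + sorted(others, key=distance)[:k]
    return sum(g["bg_mutations"] for g in pool) / sum(g["bases"] for g in pool)


late_rate = pooled_background_rate("OR_like_1", GENES)
early_rate = pooled_background_rate("housekeeping_1", GENES)
print(f"low-expression, late-replicating gene: {late_rate:.2e} per base")
print(f"high-expression, early-replicating gene: {early_rate:.2e} per base")
```

In this toy data, the low-expression, late-replicating gene receives a higher expected background rate, so it would need more observed mutations before standing out as significant, which is the kind of correction that removes olfactory receptor genes from a list.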
"By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artifactual findings and enable the identification of genes truly associated with cancer," they wrote.
Getz and his colleagues noted that while their tool solves the immediate problem, "the ultimate solution will probably involve using empirically observed local mutation rates obtained from massive amounts of whole-genome sequencing."