NEW YORK (GenomeWeb News) – A pair of University of Washington researchers has delved into the nature and frequency of passenger mutations in previously sequenced cancer genomes.
The researchers focused on nucleotide substitution patterns in four types of cancer — brain, pancreatic, breast, and colorectal cancer — using published sequence data. Their findings, which appeared online last night in the Proceedings of the National Academy of Sciences, are offering a new peek into the similarities and differences between mutation patterns previously detected in germline cells and those found in some cancers.
Cancer sequencing studies have identified numerous cancer-related mutations. But, the researchers argued, in the search for driver mutations in cancer, potentially informative passenger mutation patterns have gone unexplored in some cases.
"[P]assenger mutations are an annoying 'haystack' complicating the search for causal mutations," they wrote. "However, they are also a potentially rich source of information about the specific mechanisms at work in somatic cells and cancer."
Large sets of data provided by cancer sequencing projects offer the opportunity to explore such mutations more fully, they explained.
"Oftentimes the groups generating large datasets are interested in one question, or a subset of questions, that are of interest to them," lead author Alan Rubin, a graduate student in the lab of Philip Green, a researcher in the University of Washington's Department of Genome Sciences, told GenomeWeb Daily News. Reanalyzing the data can reveal information beyond these initial research goals, Rubin added.
For the current study, the researchers looked at the number and frequency of substitutions in published sequence data for pancreatic, breast, and colorectal cancer as well as glioblastoma multiforme. The data represent sequence from most protein-coding exons, Rubin explained, and all of the sequence data analyzed had been generated using Sanger sequencing.
Based on synonymous and non-synonymous substitution rate data and information about amino acid changes, the team found evidence suggesting most mutations they assessed were not under positive or negative selection. That, in turn, made them more confident that the mutation patterns they were looking at could provide information about underlying mutation processes.
But the biggest surprise came when Rubin and Green looked at the patterns of CpG mutations in the cancers. Past studies of germline mutations have shown that the CpG dinucleotides in so-called CpG islands have a lower mutation rate than CpGs outside of these islands.
Because certain cancers have been shown to have a high frequency of CpG mutations, Rubin explained, researchers suspected these mutations were a consequence of aberrant methylation in the CpG islands (most of which are normally unmethylated or have low levels of methylation).
In contrast, though, the new study indicates most CpG mutations fell outside of CpG islands in the cancers tested. While it's still unclear what role methylation plays in this process, Rubin explained, the results suggest that the elevated CpG mutation rate seen in past studies of cancer isn't caused by CpG island methylation.
In a pattern more similar to that of germline mutations, though, the cancer sequences tested showed asymmetrical nucleotide substitutions, with more A to G substitutions than T to C substitutions, the researchers found. Although this asymmetry was only statistically significant in the breast cancer sequence, Rubin said, the researchers also saw a similar trend in the other three cancer types.
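For readers curious how a substitution asymmetry of this kind might be checked for statistical significance, one simple approach is an exact two-sided binomial test on the two counts. The Python sketch below is an illustration only, not the method used in the study: the function names are hypothetical, and it makes the simplifying assumption that A to G and T to C changes would be equally likely if no asymmetry existed (it does not correct for how many A and T sites were actually sequenced).

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials when p = 0.5."""
    if n == 0:
        return 1.0  # no observations, so no evidence of asymmetry
    # With p = 0.5 the distribution is symmetric around n / 2, so count every
    # outcome at least as far from n / 2 as the observed one.
    observed_distance = abs(k - n / 2)
    extreme = sum(
        comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= observed_distance
    )
    return extreme / 2 ** n


def asymmetry_p_value(a_to_g: int, t_to_c: int) -> float:
    """P-value for the null hypothesis that both substitution types are equally common.

    Assumption (not from the study): equal opportunity for both substitution types.
    """
    return two_sided_binomial_p(a_to_g, a_to_g + t_to_c)
```

Under these assumptions, a small p-value from `asymmetry_p_value` would suggest that one substitution type is genuinely more common than its complement; a large one would be consistent with a trend that falls short of significance, as reported for three of the four cancer types.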
Finally, the team explored "dinucleotide hotspot" patterns in the cancer sequence data, showing that these hotspots can not only provide mutational insights but may also uncover artifacts in large datasets.
"[W]e show that the fraction of mutations occurring at dinucleotide hotspots can be a useful metric for identifying technical artifacts in cancer sequencing studies," the researchers wrote, "by detecting an inconsistency between the discovery and validation screens in one study that is likely due to error-prone sample amplification."
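As a rough illustration of the metric the researchers describe, the Python sketch below computes the fraction of mutations that fall at dinucleotide hotspots in two screens so the values can be compared. It is a sketch under stated assumptions rather than the study's procedure: the hotspot set, the function name, and the toy mutation lists are all hypothetical, and the article does not say which dinucleotides the authors treated as hotspots.

```python
def hotspot_fraction(contexts, hotspots):
    """Return the fraction of mutations whose dinucleotide context is a hotspot.

    `contexts` holds one two-letter string per mutation, e.g. "CG".
    """
    contexts = list(contexts)
    if not contexts:
        return 0.0  # avoid dividing by zero when a screen has no mutations
    hits = sum(1 for context in contexts if context.upper() in hotspots)
    return hits / len(contexts)


# Hypothetical hotspot definition, chosen only for illustration.
HOTSPOTS = frozenset({"CG", "TC"})

# Made-up toy data, not taken from any published screen.
discovery = ["CG", "CG", "TC", "AT", "GA", "CG", "TC", "AA"]
validation = ["AT", "GA", "CG", "AA", "GG", "AC", "TT", "CA"]

discovery_fraction = hotspot_fraction(discovery, HOTSPOTS)
validation_fraction = hotspot_fraction(validation, HOTSPOTS)
print(f"discovery: {discovery_fraction:.3f}, validation: {validation_fraction:.3f}")
```

If both screens sampled the same underlying mutation process, the two fractions would be expected to be similar; a large gap like the one in this toy data is the kind of inconsistency that could prompt a closer look for technical artifacts.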