By Julia Karow
This article was originally published July 1.
Cancer whole-genome sequencing may require the use of several sequencing platforms and a higher coverage than previously thought in order to catch all somatic mutations with high accuracy, according to a researcher at Baylor College of Medicine.
"Thirty-fold coverage may not be enough to see all mutations with present technologies," said David Wheeler, a professor at Baylor and a member of its Human Genome Sequencing Center, during a talk at the Beyond Sequencing meeting in San Francisco two weeks ago. Instead, he said that he and his colleagues propose to sequence tumor-normal pairs with 60-fold coverage or more on two different sequencing platforms "to get a real handle on false positive and false negative rates that occur in cancer sequencing."
Wheeler and his team came to this conclusion based on a recent Cancer Genome Atlas project for which they sequenced the genome of a glioblastoma tumor and its matched control to 30-fold coverage using the SOLiD system, while scientists at the Washington University Genome Center sequenced the same tumor/normal pair to equal coverage using the Illumina platform.
The researchers found that in each genome, tumor and normal, the Illumina platform identified about 6 million single-nucleotide variants, using low-stringency criteria, and the SOLiD platform about 4 million SNVs.
About 2.9 million of those variants overlapped between the two datasets for the tumor genome, and about 3 million for the normal genome. These can be regarded as validated sets of SNVs because they were found independently by two different sequencing methods, Wheeler said. However, the sets appear to miss about 10 percent of the roughly 3.3 million SNVs expected to be found in any genome, based on existing whole-genome sequencing studies.
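The cross-platform validation Wheeler describes amounts to intersecting two independent call sets. A minimal Python sketch of that idea follows; the `Variant` type, the function names, and the toy calls are hypothetical illustrations and not the pipeline used at Baylor or Washington University. The last two lines apply the article's own figures (in millions of SNVs) to check the "about 10 percent" shortfall.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal key for a single-nucleotide variant call."""
    chrom: str
    pos: int   # 1-based position
    alt: str   # alternate allele


def concordant(calls_a: set[Variant], calls_b: set[Variant]) -> set[Variant]:
    """Return only the calls made independently by both platforms."""
    return calls_a & calls_b


def shortfall(validated: float, expected: float) -> float:
    """Fraction of the expected variants that the validated set misses."""
    return (expected - validated) / expected


# Invented toy calls, for illustration only.
platform_a = {Variant("chr1", 100, "A"), Variant("chr1", 250, "T"), Variant("chr2", 75, "G")}
platform_b = {Variant("chr1", 100, "A"), Variant("chr2", 75, "G"), Variant("chr3", 10, "C")}

shared = concordant(platform_a, platform_b)
print(sorted(shared))

# Figures quoted in the article, in millions of SNVs.
print(f"tumor shortfall:  {shortfall(2.9, 3.3):.0%}")
print(f"normal shortfall: {shortfall(3.0, 3.3):.0%}")
```

With the article's numbers the shortfall works out to roughly 12 percent for the tumor and 9 percent for the normal genome, in line with the "about 10 percent" Wheeler cited.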
Comparing the validated variants in the tumor and normal genome, the researchers identified about 273,000 somatic mutations that were only present in the tumor, about 100 times more than expected based on previously observed mutation rates in tumors.
By requiring that a site or allele be identified with high confidence on either platform, they were able to winnow that number down to 43,000, and after subtracting events that had less than 10-fold coverage in the normal genome, they reduced the number of mutations to about 21,000.
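The successive filters described above can be pictured as a short cascade. The Python sketch below is one possible reading, not the researchers' actual code: the `Candidate` fields, the function names, and the five toy records are all hypothetical, and "high confidence on either platform" is interpreted here as a high-confidence call on at least one of the two platforms, which is an assumption.

```python
from dataclasses import dataclass

MIN_NORMAL_COVERAGE = 10  # the 10-fold threshold mentioned in the article


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one cross-platform validated variant."""
    site: str             # illustrative identifier such as "chr1:100"
    in_tumor: bool        # present in the validated tumor set
    in_normal: bool       # present in the validated normal set
    high_conf_a: bool     # high-confidence call on platform A
    high_conf_b: bool     # high-confidence call on platform B
    normal_coverage: int  # read depth at this site in the normal genome


def somatic_only(cands):
    """Keep variants seen in the tumor but not in the matched normal."""
    return [c for c in cands if c.in_tumor and not c.in_normal]


def high_confidence(cands):
    """Keep variants called with high confidence on at least one platform (assumed reading)."""
    return [c for c in cands if c.high_conf_a or c.high_conf_b]


def well_covered_in_normal(cands):
    """Drop events with less than 10-fold coverage in the normal genome."""
    return [c for c in cands if c.normal_coverage >= MIN_NORMAL_COVERAGE]


# Invented toy records, for illustration only.
candidates = [
    Candidate("chr1:100", True, False, True, True, 30),   # survives every filter
    Candidate("chr2:200", True, True, True, True, 40),    # also in normal: not somatic
    Candidate("chr3:300", True, False, False, False, 25), # low confidence on both platforms
    Candidate("chr4:400", True, False, True, False, 6),   # normal coverage too low
    Candidate("chr5:500", True, False, False, True, 10),  # survives every filter
]

step1 = somatic_only(candidates)
step2 = high_confidence(step1)
step3 = well_covered_in_normal(step2)
print(len(step1), len(step2), len(step3))
print([c.site for c in step3])
```

The coverage filter matters because a site that is poorly covered in the normal genome cannot be confidently called absent there, so a germline variant could otherwise pass as somatic.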
That number is still severalfold higher than the 3,000 to 6,000 somatic mutations expected to be found in a cancer genome, but it might be accurate, reflecting the tumor's heterogeneity, Wheeler said. The estimated mutation rate of 1 to 3 per million base pairs in tumors "assumes a genetically homogeneous tumor," he explained, and "the fact that they are not homogeneous leads to this high rate." In fact, many of the mutations appeared to be present in only a fraction of the tumor cells.
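The "severalfold" comparison is simple arithmetic on the figures quoted above. The Python sketch below makes it explicit; the genome size of about 3 billion base pairs is an assumption not stated in the article, and the variable names are illustrative.

```python
# Assumption: approximate size of the human genome in base pairs (not given in the article).
GENOME_SIZE_BP = 3.0e9

observed_mutations = 21_000                 # after all filters, per the article
expected_low, expected_high = 3_000, 6_000  # expected range quoted in the article

# How many times higher the observed count is than the expected range.
fold_vs_high_estimate = observed_mutations / expected_high
fold_vs_low_estimate = observed_mutations / expected_low

# Implied somatic mutation rate per million base pairs, under the assumed genome size.
implied_rate_per_mb = observed_mutations / (GENOME_SIZE_BP / 1e6)

print(f"excess over expectation: {fold_vs_high_estimate:.1f}x to {fold_vs_low_estimate:.1f}x")
print(f"implied rate: {implied_rate_per_mb:.1f} mutations per million base pairs")
```

Under that assumed genome size, 21,000 mutations correspond to about 7 per million base pairs, compared with the 1 to 3 per million that Wheeler said assumes a genetically homogeneous tumor.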
Of the 21,000 validated somatic mutations, 90 are in coding regions and not in dbSNP, including 60 missense mutations, 28 silent mutations, and two nonsense mutations. That number, Wheeler said, ties in with previous studies that sequenced only the coding regions of breast and colorectal cancers and found about 80 mutations per tumor.
Several of these mutations are located in genes that likely play a role in cancer because they have previously been found to be mutated in other cancer types.
The researchers are now planning to study the glioblastoma genome for copy number variants and insertions and deletions, and, Wheeler said, they expect to find mutations in cell cycle and apoptotic pathways, which are known to be mutated at high frequency in glioblastoma.
Overall, he said, the project showed that characterizing cancer genomes comprehensively "is a hard problem," mainly because of the heterogeneity of tumors, which makes it difficult to find mutations that are only present at a low frequency.