Consortium Reports Results of Challenge to Establish Benchmark for Tumor Mutation Calling


NEW YORK (GenomeWeb) – Next-generation sequencing technologies and bioinformatics pipelines keep improving, yet calling somatic mutations from tumor samples remains a challenge, as evidenced by recent data from The Cancer Genome Atlas showing that four major genome centers agreed on only 31 percent of SNV calls in lung cancer samples.

To help advance the development of gold standards for somatic mutation calling, an international group including Sage Bionetworks, leading academic institutions from the International Cancer Genome Consortium and TCGA, Annai Systems, and IBM's DREAM project in 2013 launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of algorithms for somatic mutation calling from whole-genome sequencing data. This week, the group reported in a Nature Methods article the initial results of the challenge from 248 analyses of three in silico tumors.

The group found that a combination of algorithms performs better than any single algorithm. In addition, they found that many algorithms, even the best-performing ones, had a characteristic false positive pattern. The trinucleotide pattern was similar to one that is actually found in human tumors, and the group theorized that it was the result of deamination artifacts during library prep.

The data "gives us a benchmark that any new method can be compared to," Paul Boutros, a senior corresponding author of the study and principal investigator at the Ontario Institute for Cancer Research, told GenomeWeb.

A key component of the study, completed before the challenge launched, was the creation of a tool to generate synthetically mutated genomes in which the true sequence is known. To do this, the researchers developed BAMSurgeon. The software simulates cancer genomes by directly adding synthetic mutations to existing reads stored in BAM format. It creates realistic tumors by taking a high-coverage BAM file, partitioning it into "normal" and "tumor," and spiking in the desired mutations to create a synthetic cancer genome. A VCF file records the true positives for comparison.
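
The spike-in idea can be illustrated with a toy sketch. This is not BAMSurgeon's implementation or API: reads are modeled as plain (start, sequence) tuples rather than BAM records, and the function names `split_reads` and `spike_in_snv` are hypothetical, chosen only for this illustration.

```python
import random

def split_reads(reads, seed=0):
    """Randomly partition reads into two halves ("normal", "tumor")."""
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def spike_in_snv(reads, pos, alt_base, vaf, seed=0):
    """Return a copy of reads with alt_base written at pos in a fraction
    (roughly vaf) of the reads that cover pos.

    reads: list of (start, sequence) tuples with 0-based start coordinates.
    """
    rng = random.Random(seed)
    # Indices of reads whose span includes the target position.
    covering = [i for i, (start, seq) in enumerate(reads)
                if start <= pos < start + len(seq)]
    n_mutate = round(vaf * len(covering))
    chosen = set(rng.sample(covering, n_mutate))
    mutated = []
    for i, (start, seq) in enumerate(reads):
        if i in chosen:
            offset = pos - start
            seq = seq[:offset] + alt_base + seq[offset + 1:]
        mutated.append((start, seq))
    return mutated
```

In this sketch, the position and alternate base passed to `spike_in_snv` would also be written to a truth list, playing the role of the VCF of true positives.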

BAMSurgeon "seems to create sufficiently challenging datasets that are interesting, in that they are realistic," Adam Ewing, a corresponding author and a computational genomics fellow at the University of Queensland's Mater Research Institute, told GenomeWeb.

Boutros added that the BAMSurgeon tool should be useful beyond this challenge. "It's a really good way of simulating tumors," he said, which should allow for future development of bioinformatics tools.

The group used BAMSurgeon to create three different types of synthetic tumors from a sequenced cell line. The first tumor was relatively simple with 3,537 SNVs, 100 percent cellularity, and no subclones. The second tumor was a bit more complex, containing 4,332 SNVs, 80 percent cellularity, and no subclones. The third tumor had 7,903 SNVs, 100 percent cellularity, but three subclones at 50 percent, 33 percent, and 20 percent variant allele frequency.

For all three, both tumor and normal samples were sequenced on an Illumina HiSeq 2000 to about 30x coverage using 2x101 bp reads.

Over 157 days, 21 teams submitted 248 entries. The entries were evaluated for the fraction of spiked-in mutations they were able to detect (recall) and the fraction of the SNVs they called that were true (precision), and were given an F-score representing the harmonic mean of their recall and precision. Overall, there was a trade-off between precision and recall.
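
As a minimal sketch of this scoring, assuming the F-score is the standard F1 (the harmonic mean of precision and recall) and that each call is compared to the truth set as an exact (chromosome, position, ref, alt) match; the function name `score_calls` is hypothetical and not part of the challenge's software:

```python
def score_calls(called, truth):
    """Return (precision, recall, f_score) for a set of SNV calls.

    called, truth: iterables of hashable variant keys,
    e.g. (chrom, pos, ref, alt) tuples.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Precision: fraction of called SNVs that are real.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of spiked-in SNVs that were detected.
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

Because the harmonic mean is pulled toward the lower of the two values, an entry cannot earn a high F-score by maximizing recall alone or precision alone.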

Performance varied substantially. Even for the simplest tumor, recall varied from .559 to .994, precision from .101 to .997, and F-score from .046 to .975. However, the group found that when they looked at consensus SNV predictions, performance improved and was more in line with the top performing teams. The consensus F-score ranged between .955 and .984 while recall and precision ranged from .939 to .971 and .968 to .999, respectively.
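
The article does not describe how the consensus predictions were built. One simple way to form a consensus, shown here purely as an assumption, is a majority vote across submissions; `consensus_calls` and its `min_fraction` parameter are hypothetical names for this sketch.

```python
from collections import Counter

def consensus_calls(call_sets, min_fraction=0.5):
    """Keep variants reported by more than min_fraction of the call sets.

    call_sets: list of iterables of hashable variant keys,
    e.g. (chrom, pos, ref, alt) tuples.
    """
    # Count each variant at most once per submission.
    counts = Counter(v for calls in call_sets for v in set(calls))
    threshold = min_fraction * len(call_sets)
    return {v for v, n in counts.items() if n > threshold}
```

A vote like this tends to drop false positives that are specific to one caller while keeping true variants that most callers find, which is one plausible reason an ensemble could outperform individual entries.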

Interestingly, the results showed how making minor adjustments to the mutation callers can alter performance. "Mutation callers are complicated things with buttons and knobs that you can tune," Ewing said. The study showed how "twiddling the parameters" of those algorithms can drastically change their performance, he added.

Some teams submitted multiple results using the same set of bioinformatics tools, but with slightly different settings, Ewing said, and they performed differently. For instance, about 25 percent of the variance seen in the simplest tumor occurred within teams. Typically, a team's initial submission was biased toward higher recall at the cost of lower precision. Tuning the algorithm's parameters boosted the precision.

One of the most surprising findings, Boutros and Ewing agreed, was that false positives were not random, but were associated with a specific pattern.

"That was the biggest surprise," Boutros said. "Not just that they weren't random, but how complex the non-randomness was." Variables associated with errors included sequence context, genomic location, and even coverage. For instance, Ewing said, some false positives fell within a specific sequence context: for certain algorithms, the bases immediately preceding or following the false positive call were not random. NCG-to-NTG errors were the most common, which the authors said could reflect "spontaneous deamination of 5-methylcytosine at NCG trinucleotides."

The key variables associated with false positive rates included allele counts and base and mapping qualities. However, for some algorithms, these variables were associated with an increased error rate while for others they were associated with a reduced error rate. For false negatives, mapping quality and normal coverage played the biggest role.
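
The sequence-context analysis of false positives described above can be sketched as a tally of trinucleotide changes. This is an illustrative assumption rather than the authors' code: the reference is a plain string, positions are 0-based, and the function names are hypothetical.

```python
from collections import Counter

def trinucleotide_context(reference, pos):
    """Return the base at pos with its two neighbors, or None at sequence ends."""
    if pos < 1 or pos + 1 >= len(reference):
        return None
    return reference[pos - 1:pos + 2].upper()

def context_spectrum(reference, false_positives):
    """Count false positive calls by trinucleotide change, e.g. 'ACG>ATG'.

    false_positives: iterable of (pos, alt_base) with 0-based positions.
    """
    spectrum = Counter()
    for pos, alt in false_positives:
        context = trinucleotide_context(reference, pos)
        if context is None:
            continue  # no flanking base on one side
        mutated = context[0] + alt.upper() + context[2]
        spectrum[f"{context}>{mutated}"] += 1
    return spectrum
```

If false positives were random with respect to sequence, the counts would roughly follow how often each trinucleotide occurs in the reference; an excess of NCG-to-NTG entries would show up as a skew in this tally.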

One key takeaway from the challenge in terms of what makes a good bioinformatics pipeline, Boutros said, is that mutation callers should have plenty of knobs that can be tuned for specific tumors. "Groups that put in a lot of time and effort to optimize their algorithm did a lot better," he said. "Defaults don't work for everything."

Boutros added that the next step is to expand the study from simulated tumors to real tumor samples, as well as to replicate it for calling structural variants. The group will also continue to improve the BAMSurgeon tumor genome simulator, Ewing said.

Further down the road, Boutros said, the consortium plans to launch new challenges to evaluate tumor heterogeneity and subclonality, and it is also designing an RNA-seq component.