NEW YORK (GenomeWeb) – A new report this week has added to a growing recognition of the challenges faced in calling somatic mutations in cancer genome sequences, despite continued advances in next-gen sequencing technologies and informatics.
In a study appearing yesterday in Nature Communications, a team from the International Cancer Genome Consortium published results both identifying weaknesses and laying the groundwork for more accurate characterization of somatic mutations in the ICGC's ongoing cancer genome sequencing efforts.
The challenges highlighted are not unknown to the field. For example, in another study published in Nature Methods earlier this year, authors reported initial results using three in silico tumor sequences as part of an international benchmarking project called the ICGC-TCGA DREAM Somatic Mutation Calling Challenge.
The new ICGC study, spearheaded by researchers from two ICGC member groups — the German Cancer Research Center and Spain's Centro Nacional de Analisis Genómico — set out to investigate the same issues, but using real tumors to challenge the consortium's variant calling.
First, the team used a common set of WGS reads of average quality from a single case of chronic lymphocytic leukemia. Then, in a second benchmark effort, they more deeply evaluated sequencing methods and somatic mutation calling pipelines using matched samples from a case of medulloblastoma from the ICGC PedBrain Tumor project.
For each case, the investigators made the unaligned sequence reads of a tumor and its corresponding normal genome available to members of the ICGC consortium, who then returned somatic mutation calls.
In contrast to the approach taken in the ICGC-TCGA DREAM Nature Methods study, which used three simulated tumor genomes, the authors of this week's ICGC study argued that using the sequence from a real tumor-normal pair was a more powerful and useful strategy with respect to real genome-wide mutational signatures.
According to the authors, when they compared the calls returned by individual ICGC members, there were dramatic discrepancies in the number and type of mutations detected by different groups, even though all were working from the same cancer genome sequences. That, in turn, led to significant differences in the number and types of gene mutations reported.
Out of more than 1,000 confirmed somatic SNPs, fewer than half were unanimously identified by all participating teams, the authors wrote. For insertions and deletions, the concordance was even poorer: only one somatic insertion/deletion out of a total of 337 was identified in common by all the parties.
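The unanimity figure described above is, in essence, a set intersection: a confirmed mutation counts only if every team reported it. A minimal sketch of that calculation follows. It is an illustration, not the consortium's actual method; the function name `unanimous_fraction`, the `(chrom, pos, ref, alt)` variant key, the team names, and the toy call sets are all hypothetical and do not come from the study.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def unanimous_fraction(call_sets: Dict[str, Set[Variant]],
                       confirmed: Set[Variant]) -> float:
    """Return the fraction of confirmed variants reported by every team."""
    if not confirmed:
        raise ValueError("confirmed set must not be empty")
    # Start from the confirmed variants and keep only those each team called.
    found_by_all = set(confirmed)
    for calls in call_sets.values():
        found_by_all &= calls
    return len(found_by_all) / len(confirmed)


# Toy data, invented purely for illustration.
v1 = ("chr1", 100, "A", "T")
v2 = ("chr1", 250, "G", "C")
v3 = ("chr2", 75, "C", "T")
v4 = ("chr3", 900, "T", "G")
extra = ("chr4", 10, "A", "G")  # called by one team but not confirmed

confirmed_toy = {v1, v2, v3, v4}
teams_toy = {
    "team_a": {v1, v2, v3, v4, extra},
    "team_b": {v1, v2, v3},
    "team_c": {v1, v2, v4},
}

# Only v1 and v2 were called by all three teams, so the fraction is 2 of 4.
fraction = unanimous_fraction(teams_toy, confirmed_toy)
print(fraction)
```

Requiring agreement from every team is a strict criterion: a single pipeline that misses a variant removes it from the unanimous set, so the fraction can only fall as more teams are added.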
"Contrary to common perception, identifying somatic mutations … from WGS data is still a major challenge," the authors wrote.
In the study, the team also attempted to evaluate multiple steps of the analysis process to see how each contributed to the observed lack of concordance in mutation calling.
Unsurprisingly, calling mutations using different pipelines on differently prepared sequence read sets resulted in the lowest level of consensus. And while using a standard pipeline was an improvement, it still suffered from inadequate controls for library preparation and sequencing artifacts, the authors wrote.
In their benchmarking effort, the investigators also merged the sequencing data generated by each participating center to create a combined tumor coverage of 314x. When participants in the study used this dataset, it yielded higher concordance, but "still resulted in substantial discrepancies in somatic mutation call rates and the calls themselves in the hands of different analysis groups," the authors wrote.
"This paper helps us track progress on this important problem by both identifying the strengths of our current approaches and where further work is needed," Jared Simpson, principal investigator in the Ontario Institute for Cancer Research's informatics and bio-computing program, said in a statement.
In addition to identifying outstanding issues in somatic mutation analysis from WGS data, the study was also a first step toward formulating a set of best practices, the authors wrote.
Among the takeaways was a conclusion that PCR-free library prep should be the method of choice moving forward, and that the consortium should aim for a sequencing depth of close to 100x for both tumor and normal sequencing, "particularly in situations where subclonal mutations or noncoding alterations are suspected to be playing a role."
The authors also found that certain informatics pipelines for analysis showed much higher compatibility than others. Running more than one mutation-calling tool and using the consensus among them to solidify findings also appeared to be helpful, the researchers wrote.
As part of the study, the group used the high-coverage data created in the sequencing benchmark effort to create a "gold set" of verified somatic mutations, which they have released through the ICGC DACO and the EGA so that members of the research community can benchmark and calibrate their own pipelines.
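One common way a group might score its own pipeline against a verified set like this is with precision and recall. The sketch below shows that arithmetic under stated assumptions. The article does not describe which metrics the authors used, so this is only one plausible approach; the function name `score_against_gold`, the variant key format, and the toy data are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def score_against_gold(calls: Set[Variant],
                       gold: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of a call set against a verified gold set."""
    true_positives = len(calls & gold)
    # Precision: share of the pipeline's calls that are in the gold set.
    precision = true_positives / len(calls) if calls else 0.0
    # Recall: share of the gold set that the pipeline recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
gold_toy = {
    ("chr1", 100, "A", "T"),
    ("chr1", 250, "G", "C"),
    ("chr2", 75, "C", "T"),
    ("chr3", 900, "T", "G"),
}
pipeline_calls_toy = {
    ("chr1", 100, "A", "T"),
    ("chr1", 250, "G", "C"),
    ("chr2", 75, "C", "T"),
    ("chr4", 10, "A", "G"),  # not in the gold set, counts against precision
}

precision, recall = score_against_gold(pipeline_calls_toy, gold_toy)
print(precision, recall)
```

Reporting both numbers matters because a pipeline can trade one for the other: calling very few mutations tends to raise precision while lowering recall, and calling many does the opposite.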
"We are making our findings available to the scientific and diagnostic community so that they can improve their systems and generate more standardized and consistent results," Ivo Gut, senior author of the publication and director of the Centro Nacional de Analisis Genómico in Barcelona, said in a statement.