AMP-Led Project Compares Bioinformatics Pipelines for Analyzing Cancer Panel Data

ROTTERDAM, Netherlands (GenomeWeb) – Many clinical laboratories now routinely run cancer sequencing panels to identify potentially actionable mutations in patient tumor samples, but there is no gold standard yet for next-generation sequencing data analysis and interpretation. As a result, reports of clinically actionable variants may differ, even if the underlying sequence data is identical.

That is the main result of a project organized for the Association for Molecular Pathology's AMP Europe 2018 meeting, and presented here yesterday, that compared data analysis pipelines from three commercial vendors.

Winand Dinjens, head of molecular diagnostics in the Department of Pathology at Erasmus University Medical Center (Erasmus MC) Rotterdam and an organizer of the session, explained that the aim of the exercise, dubbed "Battle of the Bioinformatics Pipelines," was to provide commercial vendors of NGS analysis and interpretation pipelines with sequencing data from real patient samples, generated by a routine molecular diagnostics laboratory, and to see how similar or different their results would be.

Originally, he said, seven or eight companies had been interested in participating; however, all but three dropped out, partly because they were unable to analyze the sequence data, which came from Thermo Fisher Scientific's Ion S5 XL system. The remaining contenders were Agilent Technologies, which employed its Alissa Interpret variant assessment support software; Qiagen, which used its Biomedical Genomics Workbench data analysis platform and Qiagen Clinical Insight (QCI) Interpret software; and Thermo Fisher Scientific, which relied on its Ion Torrent Suite, Ion Reporter, and Oncomine Knowledgebase Reporter software.

Dinjens' laboratory at Erasmus MC's pathology department selected five patient samples (one non-small cell lung cancer FFPE tissue sample, one pancreatic cancer FFPE tissue sample, and three non-small cell lung cancer pleural fluid cytology samples) that ranged in tumor content from 50 percent to 80 percent. They sequenced these samples on their in-house Ion S5 XL, using a custom panel. The companies obtained the sequence data both as FASTQ files, in which the reads are not aligned, and as BAM files, which contain aligned reads.

Their goal was to identify clonal and subclonal mutations in the tumors, down to a level of 5 percent. Vendors were asked to name and annotate the variants, state their allele frequency, and interpret them according to a five-tier classification system that ranged from "benign" to "clinically significant." 
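As a rough illustration of what such a detection level means in practice, the sketch below computes an allele frequency from read counts and checks it against a 5 percent cutoff. It assumes the 5 percent level refers to variant allele frequency; the function names and read counts are hypothetical and are not taken from any vendor's software.

```python
def allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a position that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def meets_detection_level(alt_reads, total_reads, min_af=0.05):
    """True if the variant reaches the assumed 5 percent allele-frequency level."""
    return allele_frequency(alt_reads, total_reads) >= min_af


# Hypothetical read counts, for illustration only.
print(meets_detection_level(60, 1000))  # 6 percent of reads support the variant
print(meets_detection_level(40, 1000))  # 4 percent of reads support the variant
```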

In parallel, the Erasmus MC lab performed its own data analysis of the sequence data but did not share those results with the three vendors. In terms of relevant variants, it found a MET deletion mutation and a TP53 mutation in the NSCLC tissue sample, a CDKN2A and a KRAS mutation in the pancreatic cancer tissue sample, an EGFR mutation and a TP53 mutation in a homopolymer region in one of the NSCLC cytology samples, a TERT promoter mutation in another NSCLC cytology sample, and three EGFR mutations as well as a TP53 mutation in the last NSCLC cytology sample.

Dinjens acknowledged that his lab cannot be sure its own results are the correct ones to serve as the gold standard. "We also don't know what's real," he said.

Interestingly, none of the three vendors completely replicated the Erasmus MC laboratory's report of variants, although many of the results overlapped. 

One source of variation was likely the choice of variant callers, since the vendors only obtained the raw sequence data. Elias Hage from Agilent, who presented his company's results, said, for example, that his team used three different variant callers and took two approaches: one that combined all variants called by the three tools, and another that looked at variants called by several tools.
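The difference between those two approaches can be sketched in a few lines of Python. This is only an illustration, not Agilent's implementation: the variant keys are toy placeholders, the function name is hypothetical, and "several tools" is assumed here to mean at least two callers.

```python
from collections import Counter


def combine_calls(call_sets, min_callers=1):
    """Return variants reported by at least `min_callers` of the callers.

    min_callers=1 gives the union of all calls; a higher value keeps
    only variants that several callers agree on.
    """
    counts = Counter(v for calls in call_sets for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}


# Toy variant keys (chromosome, position, ref, alt); not real data.
caller_a = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
caller_b = {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")}
caller_c = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}

union = combine_calls([caller_a, caller_b, caller_c], min_callers=1)
consensus = combine_calls([caller_a, caller_b, caller_c], min_callers=2)
print(len(union), len(consensus))
```

The union maximizes sensitivity at the cost of more calls to review, while the consensus set drops variants that only a single caller reports.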

Also, for one sample, the Alissa platform found a potentially clinically significant variant, but that variant relied on a low-quality base call, and it would be up to the lab and its standard operating procedure to define what quality is acceptable for reporting a variant.

While the Alissa platform can help with the interpretation of variants, Hage noted, it does not provide a list of potentially useful drugs or clinical trials. For that, Agilent partners with N-of-One, which also offers manual curation of variants that cannot be readily interpreted.

Qiagen's Tim Bonnert, who presented his company's results, also stressed that the results generated by the software tools depend on what standards a lab defines ahead of time, for example what sequence coverage depth is acceptable, or which genomic regions it will report on. Like Hage, he said that his team did not interpret variants in some regions with low sequence coverage, because doing so would have yielded untrustworthy results. He also pointed out that variants called from the FASTQ files and from the BAM files can differ.
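A lab-defined coverage standard of this kind amounts to a simple filter applied before interpretation. The sketch below is a hypothetical illustration rather than Qiagen's software: the `Call` record, the labels, and the depth threshold of 500 reads are all assumptions chosen for the example.

```python
from typing import NamedTuple


class Call(NamedTuple):
    label: str  # placeholder name for a variant call
    depth: int  # number of reads covering the position


def partition_by_depth(calls, min_depth):
    """Split calls into those meeting the lab's depth threshold and those withheld."""
    reportable = [c for c in calls if c.depth >= min_depth]
    withheld = [c for c in calls if c.depth < min_depth]
    return reportable, withheld


# Toy calls with made-up depths; the threshold is lab-specific.
calls = [Call("variant_1", 1200), Call("variant_2", 80), Call("variant_3", 500)]
reportable, withheld = partition_by_depth(calls, min_depth=500)
print([c.label for c in reportable])
print([c.label for c in withheld])
```

Two labs running the same calls through different thresholds would report different variant lists, which is the point the presenters made about predefined standards.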

Greg Tyrelle from Thermo Fisher Scientific noted that his team might have had an unfair advantage because it was probably more familiar with the Ion Torrent data than the others were. For example, he and his colleagues were able to use certain "flow space" information for each base, related to the Ion Torrent chemistry, during the variant calling process. In addition, he said, they could apply certain default parameters to analyze the data.

For several samples, he reported, the Oncomine Knowledgebase Reporter software did not come up with a hit for any variant that is linked to either a treatment or a clinical trial.

Overall, the participants and organizers agreed that despite the availability of several data interpretation software packages and support tools, it is still up to the laboratories to define the reporting criteria.

Also, molecular pathologists are still needed to interpret the information in the context of a patient's disease, and to decide what goes into the final clinical report. "Even though we have phenomenal tools, at the end of the day, you're the clinical professional and need to make decisions," said Andrea Ferreira-Gonzalez, chair of the division of molecular diagnostics in the Department of Pathology at Virginia Commonwealth University and a co-organizer of the meeting.
