NEW YORK (GenomeWeb) – Researchers from Washington University in St. Louis have developed a method to uncover complex insertions and deletions that are often overlooked in next-generation sequencing data.
WUSTL's Li Ding and her colleagues analyzed some 8,000 samples from The Cancer Genome Atlas and teased out nearly 290 complex indels in cancer-linked genes, many of which had been missed or misidentified in previous analyses. A number of these mutations — some affecting the EGFR, MET, and KIT oncogenes — further appeared to be clinically relevant and could help guide treatment choices, Ding and her colleagues reported today in Nature Medicine.
"By identifying such druggable complex indels, we will be able to help more patients," Ding told GenomeWeb. "Without this tool, those druggable complex indels are currently being missed by conventional approaches."
The search for complex indels fell by the wayside with the introduction of next-generation sequencing tools, the researchers noted. Previous work with Sanger sequencing-based analyses had uncovered thousands of complex indels in germline DNA as well as in people with cancer, but the short reads produced by next-generation sequencers make these complex changes harder to detect.
Ding and her colleagues developed a novel module within the Pindel algorithm, which they dubbed Pindel-C, to search for such co-occurring insertion and deletion events.
Based on how the reads align, the tool first distinguishes simple indels from complex ones, placing them into different bins. The simple indels follow a standard identification and annotation strategy, while the complex indels go through a few more steps, Ding said.
The putative complex indels are compared to the reference genome, and the reference genome is also compared to the complex indels. If the reference isn't completely covered by the reads, that gives a clue that something is missing, she said. At the same time, if the reads of the complex indel are not completely covered by the reference, that suggests that something has been added.
"By combining these two pieces of information, we know that something is missing in the reference, something is also missing in the reads. So that says we must have a deletion and an insertion happening at the same time," Ding said.
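The two-way comparison Ding describes can be illustrated with a minimal sketch. This is not Pindel-C's implementation; it is a generic allele-trimming approach, and the function name `classify_indel` and the example sequences are hypothetical. It strips the bases that the reference and the read share at both ends, then checks what is left over on each side: leftover reference bases point to a deletion, leftover read bases point to an insertion, and leftovers of different lengths on both sides point to a complex indel.

```python
from typing import Tuple


def classify_indel(ref: str, alt: str) -> Tuple[str, str, str]:
    """Classify the difference between a reference and a read-derived sequence.

    Illustrative only (not Pindel-C's code). Returns a tuple of
    (label, bases_missing_from_read, bases_added_in_read).
    """
    # Trim the prefix shared by both sequences.
    start = 0
    while start < len(ref) and start < len(alt) and ref[start] == alt[start]:
        start += 1

    # Trim the shared suffix, never overlapping the trimmed prefix.
    end = 0
    while (
        end < len(ref) - start
        and end < len(alt) - start
        and ref[len(ref) - 1 - end] == alt[len(alt) - 1 - end]
    ):
        end += 1

    deleted = ref[start:len(ref) - end]   # reference bases not covered by the read
    inserted = alt[start:len(alt) - end]  # read bases not covered by the reference

    if not deleted and not inserted:
        label = "no change"
    elif deleted and not inserted:
        label = "simple deletion"
    elif inserted and not deleted:
        label = "simple insertion"
    elif len(deleted) == len(inserted):
        label = "substitution"
    else:
        label = "complex indel"
    return label, deleted, inserted


if __name__ == "__main__":
    # Hypothetical sequences chosen only to exercise each branch.
    print(classify_indel("ACGTTGCA", "ACGCA"))
    print(classify_indel("ACGCA", "ACGTTGCA"))
    print(classify_indel("ACGTTGCA", "ACGAACA"))
```

In the third call, bases are left over on both the reference side and the read side, which is the situation Ding describes as a deletion and an insertion happening at the same time. A real caller works from read alignments and applies further filtering rather than comparing two clean strings.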
The approach also includes quality-control steps aimed at reducing false positives.
The tool, she noted, is still a work in progress and the sensitivity is not yet optimal. In the paper, the team reported using simulations to gauge its accuracy. In one case, they estimated it to have nearly 88 percent sensitivity for a read length of 100 base pairs and a sensitivity of 70 percent for a read length of 250 base pairs. However, in another, they observed 48 percent sensitivity and a false discovery rate of about 14 percent.
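For readers unfamiliar with the two accuracy measures cited here, the sketch below shows how they are conventionally defined. The function names and the counts in the usage lines are made up for illustration and are not taken from the paper. Sensitivity is the share of true events that a caller finds, and the false discovery rate is the share of reported calls that are wrong.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real events that were detected: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """Fraction of reported calls that are incorrect: FP / (TP + FP)."""
    return false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts: 8 simulated indels found, 2 missed, 2 spurious calls.
    print(sensitivity(8, 2))
    print(false_discovery_rate(8, 2))
```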
To examine the landscape of complex indels in cancer, Ding and her team turned to exome sequences of 8,060 tumor and matched-normal sample pairs spanning 22 cancer types from TCGA. They focused in particular on genes known to be involved in cancer, as indels there would have the greatest effect, Ding said.
From this, they found 285 complex indels in cancer-associated genes, most of which were missed or mis-annotated in previous studies. Some 21 genes — including EGFR, PIK3R1, PTEN, and TP53 — harbored complex indels in at least three cancer samples, according to the researchers. A number of these complex indels may be druggable, Ding noted.
For instance, she and her colleagues identified four indels affecting EGFR. All of these variants could be traced to its flexible loop, which is part of EGFR's ATP-binding pocket.
The researchers suspected that these newly discovered somatic complex indels in exon 19 of EGFR — which remove bits of the loop — could lead to increased and sustained phosphorylation of EGFR and other ERBB-family proteins, activating the AKT and STAT pathways to promote cell survival.
Other studies, they added, have indicated that patients with exon 19 deletions respond well to erlotinib and gefitinib treatment.
Similarly, Ding and her colleagues uncovered two in-frame complex indels in the KIT oncogene that could indicate susceptibility to the inhibitor PLX647.
"Missing important mutations in those druggable cancer genes is just devastating," Ding said.
She and her colleagues are now working to improve the approach, for instance by increasing its sensitivity. They also plan to expand it to detect other difficult-to-find genetic variations that contribute to cancer and other conditions.
"We really want to be able to capture all human variation because today we still cannot explain some of the cancers," Ding added. "We strongly believe this is due to our lack of ability [to detect] some of the novel types of human variation."