NEW YORK (GenomeWeb) – Researchers from centers in Spain and Germany have developed a comparative algorithm for finding and characterizing cancer mutations from high-throughput sequence data by directly comparing tumor reads to reads from matched normal samples, rather than aligning sequences to the human reference genome.
"Since we don't use [the] reference genome, we only look for real differences between [the] two genomes from the same patient," David Torrents, a computational biologist at the Barcelona Supercomputing Center, told In Sequence. "This means that we can be more accurate and have fewer false positives."
Torrents and his team introduced the search software, known as "somatic mutation finder," or SMUFIN, in a Nature Biotechnology study appearing online this weekend. Based on findings so far, they argue that this approach offers a faster, simpler means of finding most types of somatic mutations than reference genome-based methods for analyzing cancer genomes.
Using simulated tumor-normal sequences, for example, the algorithm identified single nucleotide changes with 92 percent sensitivity and 95 percent specificity. In the same modeled data, somatic mutations involving structural variants were found with 74 percent sensitivity and 91 percent specificity.
Likewise, in real sequence data from mantle cell lymphoma and medulloblastoma samples, the researchers found that SMUFIN uncovered not only small mutations and structural variations, but also larger structural changes — including breakpoints pointing to extensive chromothripsis or chromoplexy rearrangements.
Until now, Torrents explained, most cancer genome analysis methods involved mapping reads from both tumor and matched normal samples to the human reference genome before searching sequentially for different types of mutations.
This triple comparison approach gives good results when searching for single nucleotide changes in cancer genomes, according to Torrents, and has yielded a great deal of information so far. But when the target is structural variation rather than point mutations, the analysis can become much more complicated.
"The problem comes when you want to identify bigger changes — deletions of one part of the genome, which span a few megabases, exchanges of DNA from one chromosome to another, or translocations — that we call structural variants," he argued. "These were very difficult to identify and, still, many of them are almost impossible to identify using the triple [comparison] or reference genome-based methods."
In an effort to circumvent such alignment and accuracy problems, the researchers came up with an analytical method that puts tumor reads directly up against reads from matched normal samples — a project that stemmed from an ongoing collaboration between the Barcelona Supercomputing Center and a Spanish chronic lymphocytic leukemia consortium participating in the International Cancer Genome Consortium.
"The [Barcelona Supercomputing Center] was involved in the primary analysis of the data coming from chronic lymphocytic leukemia patients in this context," Torrents explained. "We, as a group, took advantage of this and while we were doing primary analysis of the genomes on one side, we started to develop new ways of analyzing genomic data in general."
What they came up with is a reference-free program involving two computational steps: the identification and isolation of tumor-specific reads — those found in an individual's tumor sample but missing in a matched normal control sample from the same person — followed by the identification of candidate single nucleotide variants, structural changes or rearrangements, and tumor breakpoint blocks.
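The first of those two steps can be illustrated with a toy example. The article does not describe SMUFIN's internal data structures, so the sketch below swaps in a simple k-mer set comparison and is not the authors' implementation. The function names, the value of k, and the example reads are all hypothetical.

```python
def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def tumor_specific_reads(tumor_reads, normal_reads, k=5):
    """Return tumor reads holding at least one k-mer never seen in the normal reads.

    Hypothetical helper for illustration only; not SMUFIN's actual method.
    """
    normal_kmers = set()
    for read in normal_reads:
        normal_kmers.update(kmers(read, k))
    return [
        read for read in tumor_reads
        if any(km not in normal_kmers for km in kmers(read, k))
    ]


# Made-up reads: the second tumor read carries a single A-to-T change.
normal = ["ACGTACGTAC", "CGTACGTACG"]
tumor = ["ACGTACGTAC", "ACGTTCGTAC"]
specific = tumor_specific_reads(tumor, normal, k=5)
print(specific)
```

A practical tool would also need to handle reverse-complement reads and sequencing errors, both of which this sketch ignores. The second step, classifying the variants carried by the isolated reads, is not shown.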
In tests involving simulated cancer genomes, the resulting software compared favorably against several commonly used somatic variant callers.
For instance, SMUFIN's sensitivity and specificity for calling single nucleotide changes were 92 percent and 95 percent, respectively, in the simulated samples. The SNV software Mutect had 97 percent sensitivity and 93 percent specificity in the same samples.
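For readers unfamiliar with these metrics, the sketch below computes them from counts of correct and incorrect calls using the textbook definitions. The article does not say exactly how the study defined specificity, so the formulas are an assumption, and the counts are invented purely for illustration rather than taken from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real mutations that the caller reported."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-mutated sites that the caller correctly left uncalled."""
    return true_neg / (true_neg + false_pos)


# Invented counts for a hypothetical caller on a simulated genome.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=980, false_pos=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Simulated genomes make this kind of benchmark possible because the full set of inserted mutations is known in advance.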
The improvements appeared to be more pronounced when looking at structural variations, the researchers reported, with SMUFIN apparently picking up on a wider range of structural variants than software such as Pindel and Delly, which are designed to detect structural changes that fall into specific size ranges.
In authentic sequence data from an aggressive mantle cell lymphoma sample, the researchers used SMUFIN to track down more than 4,400 single nucleotide changes and almost 1,100 small structural variants.
The study's authors noted that the analytical tools that are used most widely in cancer genomics involve sequential analyses of reads from tumor and normal samples, meaning different analyses are needed to track down and characterize point mutations, small deletions, large deletions, and so on.
"If you wanted to do complete screening of one genome, you need five or six different programs plus the previous alignment," Torrents said. "That means that you would need a few days … or up to a week for each tumor to be analyzed."
In contrast, results from the new study suggest SMUFIN can simultaneously assess a range of somatic mutation types in a relatively short time — just five to 10 hours, depending on the computing horsepower on hand.
In their subsequent analyses of authentic tumor-normal pairs, for instance, authors of the study considered mantle cell lymphoma and medulloblastoma tumors known to show signs of chromoplexy or chromothripsis — complicated and widespread chromosomal rearrangements — based on prior experimental and computational analyses by co-authors on the study.
Whereas the initial analyses of those tumors took weeks or months, the latest look at the tumors with SMUFIN reportedly identified the same alterations, as well as new translocations and breakpoints, in just a fraction of the time.
Moreover, Torrents argued that by finding different somatic mutation types simultaneously rather than sequentially, SMUFIN may provide an analytical avenue for groups that want to analyze cancer genomes but lack the computing power to install and run multiple mutation-finding programs or complicated pipelines.
Still, Torrents conceded that there are some types of somatic mutation that remain challenging for SMUFIN in the software's current form.
In particular, he said, the method cannot see chromosome ends that have been lost and may also have difficulty detecting mutations flanked by palindromic sequences — features of the algorithm that the team is working to improve.
Similarly, while SMUFIN can detect copy number changes, Torrents noted that additional tweaks to the software will be needed to quantify that type of alteration.
So far, the researchers have used this somatic mutation finding method to analyze genome sequence data generated with Illumina instruments, though they are starting to explore whether the approach can be applied to tumor-normal genomes sequenced with other technologies.
Theoretically, SMUFIN should work regardless of the sequencing instrument selected, Torrents explained, but it may be necessary to calibrate the method slightly depending on the error rate and types of errors associated with the sequencing technology used.
Although there is no absolute minimum coverage depth required, the team has found that the performance and power of the SMUFIN software tends to break down when genome coverage in the tumor or normal samples is lower than 20-fold.
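As a rough guide to that 20-fold figure, mean coverage can be estimated as the number of reads times the read length, divided by the genome size. The sketch below applies that standard estimate to a tumor-normal pair. The read counts, the 100-base read length, and the approximate 3.1 billion base genome size are assumptions chosen for illustration and do not come from the study.

```python
MIN_COVERAGE = 20          # threshold mentioned by the team
HUMAN_GENOME_BP = 3.1e9    # approximate genome size (assumption)


def mean_coverage(read_count, read_length, genome_size=HUMAN_GENOME_BP):
    """Estimate average depth as total sequenced bases over genome size."""
    return read_count * read_length / genome_size


# Hypothetical sequencing runs for one patient.
tumor_cov = mean_coverage(read_count=600_000_000, read_length=100)
normal_cov = mean_coverage(read_count=950_000_000, read_length=100)

# Both samples matter, so the lower of the two is compared to the threshold.
pair_is_adequate = min(tumor_cov, normal_cov) >= MIN_COVERAGE
print(f"tumor {tumor_cov:.1f}x, normal {normal_cov:.1f}x, adequate: {pair_is_adequate}")
```

In this made-up case the tumor sample falls just short of 20-fold, so the pair would sit in the range where the team reported reduced performance.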
Along with their efforts to expand the range of somatic mutations that SMUFIN can find in tumor genomes, the researchers are also exploring the possibility of using the software to study germline alterations that contribute not only to disease risk, but also to normal genetic variation between individuals and populations.