NEW YORK (GenomeWeb) – Researchers from St. Jude Children's Research Hospital have developed a new algorithm for detecting somatic copy number alterations (CNAs) in high-coverage whole genome sequence that they claim improves on existing methods.
According to the researchers, the algorithm combines structural variation information with read depth changes to identify novel cancer-related CNAs, complex rearrangements, and sub-clonal CNAs in cancer samples that methods relying on read depth alone may miss. It also provided dramatically better accuracy and sensitivity than previous techniques designed to identify CNAs in whole genome sequence, they said.
The researchers described the freely available tool, called Copy Number Segmentation by Regression Tree in Next Generation Sequencing (CONSERTING), in a recent Nature Methods paper that included the results of comparison tests between CONSERTING and four other published algorithms: CNV-seq, SegSeq, FREEC, and BIC-seq. For the comparison, they analyzed normal and tumor samples collected from 43 children and adults with glioblastoma, leukemia, melanoma, glioma, and retinoblastoma.
Jinghui Zhang, a member of St. Jude's Computational Biology department and senior author on the Nature Methods paper, said in a statement that, compared to other methods, CONSERTING identified CNAs with 100 times greater precision in genomic data from the children in the study and with 10 times greater precision in data from the adults. Meanwhile, Xiang Chen, a senior research scientist at St. Jude and first author on the paper, added that the software successfully found previously undetected chromosomal rearrangements and CNAs that are present in only a small percentage of tumor cells. These may help explain why tumors sometimes return post-treatment, the researchers said.
CONSERTING was developed for the St. Jude-Washington University Pediatric Cancer Genome Project, an ongoing collaborative effort between the two institutions to identify genetic alterations that cause a number of childhood cancers. The project includes the normal and cancer genomes of 700 pediatric cancer patients with about 21 different cancer subtypes.
Initially, researchers involved in the project tried to use existing methods to analyze data from the samples and soon noticed that while these tools were generally able to identify large CNAs, "when you are going down to the focal copy number changes ... you start to have trouble seeing what is signal versus analytical noise," Zhang told GenomeWeb. With other methods, focal copy number changes are often missed outright or are indistinguishable from thousands of false positives caused by coverage bias, mapping ambiguity in repetitive regions, or sequence library artifacts.
Efforts to optimize those other tools for the project's purposes did not work, Zhang said, and so the group decided to develop its own method. Furthermore, Zhang and colleagues had previously developed a second tool called Clipping Reveals Structure (CREST), an algorithm for detecting structural variations from next-generation sequencing data, and believed that they could use the information provided by CREST to improve copy number analysis.
According to the Nature Methods paper, CONSERTING uses a recursive partitioning statistical technique, regression tree analysis, to find the transition points for read-depth changes. It does so using an iterative process of segmentation by read depth, segment merging, and localized structural variant detection, which makes it possible to detect CNAs that produce only subtle read depth changes.
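To make the idea of regression tree segmentation concrete, here is a minimal Python sketch of recursive partitioning on a one-dimensional read-depth signal. It is an illustration of the general technique only, not CONSERTING's actual code: the function names, the `min_size` and `min_gain` thresholds, and the toy log2 tumor/normal ratios are all assumptions made up for this example.

```python
from itertools import accumulate


def best_split(values, min_size):
    """Return (index, gain) for the split that most reduces squared error."""
    n = len(values)
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    def sse(lo, hi):
        # Sum of squared deviations from the mean over values[lo:hi].
        total = prefix[hi] - prefix[lo]
        return (prefix_sq[hi] - prefix_sq[lo]) - total * total / (hi - lo)

    parent = sse(0, n)
    best_index, best_gain = None, 0.0
    # Each side of a split must keep at least min_size bins.
    for i in range(min_size, n - min_size + 1):
        gain = parent - sse(0, i) - sse(i, n)
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain


def segment(values, start=0, min_size=3, min_gain=0.5):
    """Recursively partition values; return a list of (start, end, mean)."""
    index, gain = best_split(values, min_size)
    if index is None or gain < min_gain:
        # No worthwhile split: this stretch becomes one segment (a tree leaf).
        return [(start, start + len(values), sum(values) / len(values))]
    left = segment(values[:index], start, min_size, min_gain)
    right = segment(values[index:], start + index, min_size, min_gain)
    return left + right


# Hypothetical log2 tumor/normal read-depth ratios for 24 genomic bins:
# ten roughly neutral bins, a six-bin single-copy loss, eight neutral bins.
ratios = (
    [0.05, -0.04, 0.02, 0.0, -0.03, 0.04, 0.01, -0.02, 0.03, -0.01]
    + [-1.02, -0.97, -1.01, -0.99, -1.03, -0.98]
    + [0.02, -0.03, 0.01, 0.04, -0.02, 0.0, 0.03, -0.01]
)

segments = segment(ratios)
for seg_start, seg_end, seg_mean in segments:
    print(f"bins {seg_start}-{seg_end - 1}: mean log2 ratio {seg_mean:+.2f}")
```

On this toy input the sketch recovers three segments, with transition points at bins 10 and 16. Each split is a single pass over prefix sums, which hints at why tree-based segmentation is cheap enough to repeat many times.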
As Zhang explained it, CONSERTING takes in structural variant breakpoint information provided by CREST and uses it to perform localized read depth analysis on segments of the genome. Specifically, it uses the breakpoint information to search for new boundaries of read depth changes, she said. Breakpoints mark changes in copy number that are usually caused by chromosomal rearrangements. In this context, using the breakpoint data makes it possible to distinguish between read depth changes caused by random fluctuations that can occur during sequencing and those that are the result of copy number changes, she said. As new boundaries are identified, CREST is again applied to these newly found regions, and the process repeats until no new genomic segments are found.
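The loop Zhang describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published method. Candidate breakpoints are supplied as a fixed list of bin indices standing in for CREST output (the real workflow re-runs SV detection on new regions). The `flank` and `min_shift` parameters and the toy depth values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def refine_boundaries(depth, boundaries, breakpoints, flank=5, min_shift=0.3):
    """Promote SV breakpoints to segment boundaries when local depth shifts.

    Returns (sorted boundaries, number of passes). Passes repeat until one
    adds no new boundary.
    """
    bounds = sorted(set(boundaries) | {0, len(depth)})
    # Keep only candidates that fall strictly inside the signal.
    candidates = sorted({bp for bp in breakpoints if 0 < bp < len(depth)})
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for bp in candidates:
            if bp in bounds:
                continue
            # Local windows never cross an already accepted boundary.
            left_bound = max(b for b in bounds if b < bp)
            right_bound = min(b for b in bounds if b > bp)
            lo = max(left_bound, bp - flank)
            hi = min(right_bound, bp + flank)
            shift = abs(mean(depth[lo:bp]) - mean(depth[bp:hi]))
            if shift >= min_shift:
                bounds.append(bp)
                bounds.sort()
                changed = True
    return bounds, passes


# Hypothetical log2 ratios: a focal two-bin loss next to a low-level gain.
depth = [0.0] * 10 + [-0.6] * 2 + [0.2] * 18
# Hypothetical breakpoints (bin indices) standing in for CREST calls.
sv_breakpoints = [10, 12]

bounds, passes = refine_boundaries(depth, [0, len(depth)], sv_breakpoints)
print(bounds, passes)
```

In this toy case the first pass rejects the breakpoint at bin 10, because the two-bin loss is diluted by neighboring bins in the five-bin window, but accepts the one at bin 12. On the second pass the window to the right of bin 10 stops at the new boundary, the loss stands out, and bin 10 is accepted. A third pass adds nothing, so the loop ends. This shows one reason an iterative scheme can pick up focal changes that a single pass misses.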
Because CONSERTING uses regression trees, it runs faster than would be possible with statistical methods such as standard hidden Markov models, making it possible to perform multiple iterations of read depth segmentation and SV breakpoint detection, Zhang said. It also uses loss-of-heterozygosity measures to determine which chromosome to use as a baseline reference for identifying copy number changes in tumor samples, she said.
Although running the two tools together is more accurate, it does require more computation time than other methods, Zhang said. According to the paper, it requires about 50 minutes per iteration of read depth analysis of a paired tumor-normal whole-genome sequence. However, researchers can choose to run CONSERTING without CREST, which returns results faster and with at least comparable accuracy to existing methods, she said.
The longer computation time for the combined solution is one of the reasons why the group has made a cloud-based version of CONSERTING, along with an implementation of CREST, available on Amazon Web Services. This way, researchers can take advantage of the large quantities of compute power available on the cloud to complete their analysis in a reasonable amount of time, she said. Instructions for running the tools on the cloud are available on the St. Jude website. Alternatively, both pieces of software can be downloaded from the website for free and installed locally.
The researchers have implemented CONSERTING in St. Jude's clinical sequencing laboratory and are currently exploring ways of using the information that the software supplies, Zhang said. One application they are working on is predicting a digital karyotype directly from whole-genome sequence, along with predicting the clonality of copy number alterations.