Researchers at the Broad Institute have developed a new computational method for detecting copy number alterations in tumor samples and mapping their breakpoints using short sequence reads, according to a recent paper.
Comparing their method to DNA microarrays, the scientists showed in a study published late last month in Nature Methods that their segmentation algorithm, called SegSeq, identifies copy number variants (CNVs) in tumor DNA with greater sensitivity and over a greater dynamic range than arrays. The new method can also estimate the breakpoints of copy number alterations more precisely than arrays, they said.
Up until now, CNVs in cancer have mostly been characterized by array-based methods, according to Derek Chiang, a postdoctoral research associate in Matthew Meyerson’s group, who developed the new method with his colleague Gad Getz.
However, that might change as cancer genomics projects — such as the National Institutes of Health’s Cancer Genome Atlas — start switching over to second-generation sequencing.
“I expect my method and methods like it will become increasingly important as the field in general shifts towards next-generation sequencing,” he said.
Analyzing CNVs in cancer can help researchers detect new oncogenes and tumor-suppressor genes, he explained, because such genes often reside in regions that show recurrent copy number alterations across different tumor samples.
According to the researchers, SegSeq estimates CNVs from read counts and read-density information in short-read sequence data. The analysis comes “more or less for free, given that whole-genome sequencing has already been done,” and does not require any additional data to be generated, Chiang said.
The main innovation of their approach, he said, is that “instead of using fixed genomic windows, we actually took advantage of the high density of aligned reads and used that to develop a statistical confidence for breakpoints that occur in each read.”
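The idea of letting read density, rather than fixed genomic windows, determine where breakpoints can fall can be illustrated with a toy scan. This is not the published SegSeq algorithm. It is a simplified sketch that assumes sorted, unique alignment start positions for a tumor and its matched normal. Each candidate boundary is a normal read position, the windows on either side each contain the same number of normal reads (the hypothetical `window` parameter), and a normal-approximation binomial z-score measures whether the tumor reads split unevenly across the boundary. The function names are illustrative only.

```python
import math
from bisect import bisect_left


def count_in(sorted_pos, lo, hi):
    """Number of positions p in sorted_pos with lo <= p < hi."""
    return bisect_left(sorted_pos, hi) - bisect_left(sorted_pos, lo)


def breakpoint_scores(tumor_pos, normal_pos, window=100):
    """Score each normal read position as a candidate copy-number breakpoint.

    Hypothetical sketch, not SegSeq: the left and right windows each span
    `window` consecutive normal reads (assumes unique normal positions), so
    with no copy-number change the tumor reads should split about 50/50
    between them. The score is a binomial z-score under that null:
    positive means more tumor reads on the right (a gain), negative a loss.
    """
    tumor = sorted(tumor_pos)
    normal = sorted(normal_pos)
    scores = []
    for i in range(window, len(normal) - window):
        lo, mid, hi = normal[i - window], normal[i], normal[i + window]
        t_left = count_in(tumor, lo, mid)   # tumor reads in left window
        t_right = count_in(tumor, mid, hi)  # tumor reads in right window
        n = t_left + t_right
        z = (t_right - t_left) / math.sqrt(n) if n else 0.0
        scores.append((mid, z))
    return scores


# Synthetic example: uniform normal coverage; tumor density doubles at 5,000.
normal = list(range(0, 10000, 10))
tumor = list(range(0, 5000, 10)) + list(range(5000, 10000, 5))
best = max(breakpoint_scores(tumor, normal, window=100), key=lambda s: s[1])
print(best[0])  # 5000
```

Because the windows are defined by counts of normal reads rather than by base pairs, they shrink automatically where coverage is dense, which is the intuition behind the quoted approach. A real implementation would also need to handle duplicate positions, convert scores to p-values, and merge adjacent candidates into segments.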
The researchers tested their method on three tumor cell lines and their normal controls, comparing sequence-based results on CNVs to results obtained from Affymetrix 6.0 SNP arrays. Using an Illumina Genome Analyzer, they generated between 10 million and 19 million uniquely aligned reads for each sample.
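A rough back-of-the-envelope calculation shows why 10 million to 19 million reads give fine positional resolution. The sketch below assumes an approximate haploid human genome length of 3.0 billion bases and, unrealistically, reads spread evenly across it; in practice only uniquely alignable sequence contributes and coverage is uneven, so the figures are only an order-of-magnitude guide. The function name is hypothetical.

```python
GENOME_SIZE_BP = 3.0e9  # assumption: approximate haploid human genome length


def mean_read_spacing(n_reads, genome_size=GENOME_SIZE_BP):
    """Average distance in bp between read starts if reads were evenly spread."""
    return genome_size / n_reads


for n in (10e6, 19e6):
    print(f"{n / 1e6:.0f} M reads -> one read every {mean_read_spacing(n):.0f} bp on average")
# 10 M reads -> one read every 300 bp on average
# 19 M reads -> one read every 158 bp on average
```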
The team found that although the ability of the two methods to identify copy number alterations was comparable, the sequence-based method provided a higher dynamic range for CNVs. For example, while the array found a 16-fold increase in copy number at a specific locus, sequencing estimated a nearly 60-fold increase, and qPCR confirmed the larger number.
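Sequencing-based copy-number estimates rest on a simple ratio of read counts, which is one commonly cited reason they do not compress at high copy number the way hybridization signal on an array can. The sketch below assumes the usual library-size normalization, so that samples with different total read counts (here, hypothetical totals of 15 million and 12 million) can be compared; the function name and all numbers are illustrative, not taken from the study.

```python
def copy_ratio(tumor_count, normal_count, tumor_total, normal_total):
    """Library-size-normalized tumor/normal read ratio for one region."""
    if normal_count == 0 or tumor_total == 0 or normal_total == 0:
        raise ValueError("counts and library totals must be nonzero")
    return (tumor_count / tumor_total) / (normal_count / normal_total)


# Hypothetical region: 3,000 tumor reads vs. 60 normal reads.
ratio = copy_ratio(3000, 60, 15_000_000, 12_000_000)
print(ratio)  # approximately 40
```

Because the estimate grows linearly with read counts, a heavily amplified locus simply yields proportionally more reads rather than a saturated signal; its precision is limited mainly by how many reads fall in the normal sample's region.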
The researchers also managed to estimate the breakpoints of three homozygous deletions in a single cell line and found that the sequence data mapped these breakpoints twofold more precisely than the array data.
One of the main advantages of the new method is that it will allow researchers to “detect very small events that occur in cancer genomes” that affect perhaps only a single exon, Chiang said.
Regarding the cost of the analysis, “it seems like arrays are cheaper for now, but I don’t know by what fold and for how long,” he said.
He and his colleagues noticed that the number of reads from a given region of the genome depends not only on its copy number but also on its GC content. But because they compared each tumor sample with its matched normal control, the researchers were able to eliminate that bias.
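Why a matched normal removes GC bias can be seen with a toy model. The sketch assumes the GC effect acts as the same multiplicative efficiency factor on both libraries; the Gaussian-shaped `gc_efficiency` curve and the reads-per-copy constant are hypothetical. Raw counts then swing with GC content, but the factor cancels in the tumor/normal ratio. If the two libraries had different GC profiles (for example, if sequenced with different chemistries), the cancellation would be only partial.

```python
import math


def gc_efficiency(gc_fraction):
    """Hypothetical GC-dependent efficiency: a Gaussian peaking at 50% GC."""
    return math.exp(-((gc_fraction - 0.5) ** 2) / 0.02)


def expected_reads(copy_number, gc_fraction, reads_per_copy=100.0):
    """Expected reads from a region: copy number times a shared GC factor."""
    return copy_number * reads_per_copy * gc_efficiency(gc_fraction)


for gc in (0.3, 0.5, 0.7):
    tumor = expected_reads(4, gc)   # hypothetical amplified region (4 copies)
    normal = expected_reads(2, gc)  # diploid matched normal
    print(f"GC={gc:.1f}  tumor={tumor:6.1f}  normal={normal:6.1f}  ratio={tumor / normal:.2f}")
```

In this model the raw tumor counts differ roughly sevenfold between 30% and 50% GC, while the ratio stays at 2.00 for every region.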
According to Chiang, the GC bias was “aggravated” by the chemistry of the old Genome Analyzer I and is less prominent with the GA II chemistry.
The method could also work on other sequencing platforms, like the Applied Biosystems SOLiD, and Chiang said he and his colleagues are currently testing it on different platforms. “We are very interested in having a platform-independent algorithm,” he said.
The next step will be to include paired-end information in the analysis, Chiang added. “Once we incorporate that, we can start to look for structural rearrangements, in addition to copy number variants,” he said.