By Monica Heger
Results from a recent study suggest that current sequencing efforts are missing important information on cancer rearrangements that could eventually help subtype cancers for diagnostics and identify driver mutations.
The study, published last month in Genome Research, set out to validate a sequencing method for identifying structural variation in clinical samples. It found that rearrangements in epithelial cancers are distinct from those in leukemias and normal genomes. The method also allowed the team to determine the chronological order in which rearrangements occurred throughout cancer progression, information that could help uncover driver mutations.
The method, dubbed paired-end tag sequencing, was developed by researchers at the Genome Institute of Singapore. In the study, they used the approach to characterize 15 cancer genomes and two normal genomes and found that tandem duplications, unpaired inversions, interchromosomal translocations, and complex rearrangements are over-represented among somatic rearrangements in cancer genomes, while inversions, deletions, and insertions tend to be germline structural variations.
Additionally, the method allowed the team to time-stamp the various rearrangements and determine that large duplications are frequently the first structural variations to occur in tumor cells and may subsequently "trigger genome instability for extensive amplification in epithelial cancer," the authors wrote.
The key to the method is the creation of a long-insert sequencing library with an insert fragment size of 10 kilobases. The method was originally developed in 2009 by Yijun Ruan's team at GIS in collaboration with Life Technologies for the company's SOLiD system. While the team has used it before, the Genome Research study marks the first time they have demonstrated its use with cancer genomes.
"Most of the recent cancer genomes that people have sequenced have primarily used shorter inserts, like 700 base pairs…and if you only use short insert libraries, you don't get the full spectrum of structural variation," said Vikas Bansal at Scripps Genomic Medicine, who was not involved with the study, but who has also developed protocols for detecting structural variation (IS 7/6/2010).
As described in the Genome Research study, the team tested fragment sizes of 1, 10, and 20 kilobase pairs, determining that 10 kilobase pairs is the optimal size for structural variation analysis. They used the technique on 15 cancer samples, including five primary breast cancer tumors, three breast cancer cell lines, four primary gastric cancer tumors, one gastric cancer cell line, a colon cancer cell line, and a chronic myelogenous leukemia cell line. The cancer genomes were then compared to two normal genomes from cell lines.
The PET sequencing method relies on the use of paired-end tags and the creation of circularized insert fragments. With an insert size of 10 kilobases, the two tags in each pair are expected to map roughly 10 kilobases apart, which is particularly important for identifying large rearrangements and for mapping across repetitive regions of the genome.
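To make the paired-end logic concrete, here is a minimal Python sketch of how tag pairs might be sorted into concordant and discordant classes. It is an illustration only, not the GIS team's pipeline: the `TagPair` type, the category names, the 2-kilobase tolerance, and the expected strand orientation are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPair:
    """Mapped positions of the two tags from one insert (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, expected_insert=10_000, tolerance=2_000,
                  expected_strands=("+", "-")):
    """Label a tag pair by how it deviates from the expected insert.

    expected_strands is the (lower-coordinate tag, higher-coordinate tag)
    orientation for a normal pair; the real value depends on the library
    protocol and is an assumption here.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"

    # Order the two tags by genomic coordinate before comparing strands.
    (lo_pos, lo_strand), (hi_pos, hi_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (lo_strand, hi_strand) != expected_strands:
        return "orientation_discordant"  # e.g. a candidate inversion

    span = hi_pos - lo_pos
    if span > expected_insert + tolerance:
        return "span_too_large"  # candidate deletion in the sample
    if span < expected_insert - tolerance:
        return "span_too_small"  # candidate insertion in the sample
    return "concordant"


if __name__ == "__main__":
    normal = TagPair("chr1", 1_000, "+", "chr1", 11_000, "-")
    stretched = TagPair("chr1", 1_000, "+", "chr1", 51_000, "-")
    print(classify_pair(normal), classify_pair(stretched))
```

In practice a caller would require several independent pairs supporting the same discordant signature before reporting a rearrangement; that clustering step is omitted here.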
The method is very similar to Life Technologies' mate pair sequencing protocol, but differs primarily in its much longer insert size, said Axel Hillmer, the lead author of the current study and a research scientist in Ruan's lab.
The long insert sizes make the fragments more likely to uniquely map to the genome and enable breakpoint mapping, said Hillmer. However, he said that despite the long insert size, there are still some genomic regions where the method does not provide unique reads, such as in the highly repetitive Alu regions.
The team generated a total of 25.9 gigabases of sequence data, achieving an average of 81-fold physical fragment coverage for each genome, with some libraries achieving more than 100-fold physical coverage.
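Physical coverage counts the whole span between the two tags, not just the sequenced bases, which is why long inserts can reach high physical coverage from relatively little sequence. The Python sketch below works through that arithmetic. The 3.1-gigabase genome size and the 50-base tag length are assumptions for illustration; neither figure comes from the study.

```python
def physical_coverage(n_pairs, insert_size, genome_size):
    """Average number of inserts spanning each base of the genome."""
    return n_pairs * insert_size / genome_size


def sequence_coverage(n_pairs, tag_length, genome_size):
    """Average number of sequenced bases per genome base (two tags per pair)."""
    return n_pairs * 2 * tag_length / genome_size


if __name__ == "__main__":
    GENOME_SIZE = 3_100_000_000  # assumed human genome size, in bases
    INSERT_SIZE = 10_000         # 10-kilobase inserts, as in the article
    TAG_LENGTH = 50              # hypothetical tag length, in bases
    N_PAIRS = 25_110_000         # hypothetical number of mapped tag pairs

    print(physical_coverage(N_PAIRS, INSERT_SIZE, GENOME_SIZE))
    print(sequence_coverage(N_PAIRS, TAG_LENGTH, GENOME_SIZE))
```

Under these assumed numbers, about 25 million tag pairs give 81-fold physical coverage while covering each base with sequence less than once on average.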
The team found 62 structural variations in one normal genome and 96 in the other and identified an average of 115 structural variations in the five primary breast cancer tumors, an average of 344 in each gastric tumor, 428 in the breast cancer cell lines, and 584 in the single gastric cancer cell line.
By comparison, when researchers from the Wellcome Trust Sanger Institute sequenced 24 breast cancer genomes from primary tumors and cell lines, specifically looking for rearrangements, they detected an average of 67.8 rearrangements per primary tumor genome and an average of 128 per cancer cell line, significantly fewer than the Singapore team detected in primary breast cancer tumors and cell lines.
Additionally, structural variants such as tandem duplications, unpaired inversions, interchromosomal translocations, and complex rearrangements were over-represented among somatic rearrangements in cancer genomes, while inversions, deletions, and insertions tended to be germline structural variations.
Furthermore, because of the high physical coverage they achieved with the sequencing method, they were able to piece together the order in which the rearrangements occurred throughout cancer progression.
The high physical coverage allowed the team to estimate the copy number of each rearrangement point. Those with the highest copy number "we think are likely to be the earliest rearrangements that have then been amplified," said Hillmer.
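The ordering idea Hillmer describes can be sketched in a few lines of Python: estimate a copy number for each rearrangement point from the number of tag pairs that support it, then rank the points from highest to lowest. This is a simplified illustration, not the authors' published procedure. The function names, the diploid baseline, and the example counts are all hypothetical.

```python
def estimate_copy_number(supporting_pairs, mean_physical_coverage,
                         baseline_ploidy=2):
    """Rough copy number of a rearrangement point.

    Assumes the genome-wide mean physical coverage corresponds to
    baseline_ploidy copies, so one copy contributes
    mean_physical_coverage / baseline_ploidy spanning pairs.
    """
    pairs_per_copy = mean_physical_coverage / baseline_ploidy
    return supporting_pairs / pairs_per_copy


def rank_by_copy_number(rearrangements, mean_physical_coverage):
    """Return (name, copy number) tuples, highest copy number first.

    Under the hypothesis quoted in the article, points near the top of
    the list are candidates for the earliest events.
    """
    scored = [
        (name, estimate_copy_number(count, mean_physical_coverage))
        for name, count in rearrangements.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical supporting-pair counts at three rearrangement points.
    counts = {"deletion_C": 40, "tandem_dup_A": 405, "inversion_B": 81}
    for name, copies in rank_by_copy_number(counts, mean_physical_coverage=81):
        print(f"{name}: ~{copies:.1f} copies")
```

A real analysis would also need to account for tumor purity and local mappability, both of which are ignored here.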
Being able to determine the chronological order of rearrangements can help in identifying driver mutations, he said, by working backwards to determine which genes were affected in the original rearrangement.
Compared to other methods for identifying structural variation, there are advantages and disadvantages to the PET approach, said Paul Medvedev at the University of California, San Diego, who has tested several different sequencing-based methods for identifying structural variation.
In one approach, called the depth-of-coverage approach, "for any given region, you are looking to see how many reads map," he said. "If it's above what's expected, there may be a duplication, and if it's below what's expected, there's a deletion." Depth of coverage is better than paired-end mapping at spanning highly repetitive regions, he said, since researchers do not have to rely on having a uniquely mapped read. On the other hand, he added, it is not as good as PET at localizing breakpoints or detecting smaller events.
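The depth-of-coverage test Medvedev describes can be written as a short Python sketch: count the reads in each window, compare the count with what is expected, and flag windows that are well above or well below. The 1.5 and 0.5 ratio cutoffs and the example read counts are assumptions for illustration, not values from any published caller.

```python
def call_windows(read_counts, expected, gain_ratio=1.5, loss_ratio=0.5):
    """Label each window by comparing its read count with the expected count.

    gain_ratio and loss_ratio are illustrative cutoffs on observed/expected.
    """
    if expected <= 0:
        raise ValueError("expected read count must be positive")

    calls = []
    for count in read_counts:
        ratio = count / expected
        if ratio >= gain_ratio:
            calls.append("duplication")
        elif ratio <= loss_ratio:
            calls.append("deletion")
        else:
            calls.append("normal")
    return calls


if __name__ == "__main__":
    # Hypothetical read counts in four consecutive windows.
    print(call_windows([100, 210, 45, 95], expected=100))
```

Because the call is made per window, the approach can report that a region is gained or lost without pinpointing where the breakpoint falls inside it, which matches the trade-off described above.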
Medvedev thinks that the best way to fully characterize structural variation is to use a variety of approaches. "Each one is good at finding one thing but not as good at finding another," he said. As a result, "there's probably still a large chunk of structural variation that we're not detecting."
The paired-end tag mapping method has not been employed in cancer sequencing projects to date because it is challenging to construct sequencing libraries with large circularized insert fragments. Not only is it difficult to circularize DNA, but some fragments will circularize more easily than others, creating a bias. As sequencing technology continues to improve, though, so do these methods. Additionally, as sequencing read lengths grow, the need for such methods will decrease.
Nevertheless, Hillmer said that his team is continuing to use the approach to study cancer rearrangements.
The team is now applying the method to other cancers, such as chronic myelogenous leukemia and gastric cancer. They are currently testing between five and 15 samples of each to look for recurrent patterns and recurrently hit genes or genomic regions.
While the current paper was intended as a proof of principle, using the approach in many more samples could eventually help in developing diagnostics based on structural variants. "We expect this approach to be sufficiently robust and cost effective to be applied in clinical settings for genetic diagnostics of cancer patients," the authors wrote.