By Monica Heger
This story was originally published December 18
Researchers from the Wellcome Trust Sanger Institute in the UK reported last week in Nature that they have sequenced melanoma and lung cancer genomes using Illumina and SOLiD sequencing technology, respectively. In both cases, the techniques identified known cancer-causing mutations as well as novel ones. The researchers also found evidence of ultraviolet damage in the melanoma genome and damage from tobacco carcinogens in the lung cancer genome.
The two papers indicate that whole-genome short-read sequencing is becoming increasingly important for identifying cancer-causing genes that could eventually lead to better diagnosis and treatment, according to the authors.
The studies are part of a growing number of cancer genome sequencing projects, such as the Sanger Institute's Cancer Genome Project and the National Institutes of Health's Cancer Genome Atlas, many of them under the umbrella of the International Cancer Genome Consortium.
The papers "demarcate the new era from the old era," Peter Campbell, a member of the Wellcome Trust Sanger group that conducted the research, told In Sequence. "They demonstrate quite convincingly that we can find mutations in all classes at a cost that's rapidly decreasing and a time frame that's rapidly decreasing."
The researchers noted that at the time they did the experiments, the sequencing cost around $100,000 on each platform, including both the tumor and normal genomes. They estimated that if they were to do it again today, the price would be around $50,000 on each platform.
The researchers said that they used both the Illumina and SOLiD technologies because they thought it was important to assess the two systems. They noted, however, that the platforms produced comparable results and that any differences between the technologies would not be relevant today because both platforms have been improved significantly.
"Both were able to deliver high-quality cancer genome sequences in which we could get comprehensive catalogs of somatic mutations," said Michael Stratton, who heads the Sanger Institute's Cancer Genome Project and who led the research.
Campbell agreed and said that improving the algorithms would yield more benefits than using one sequencing technology over the other. "You need good informatics to take that data set and pull out what is [a] genuine mutation, what is sequencing error, and what is artifact," he said.
In both studies, the researchers sequenced genomes from a tumor cell line and a normal cell line from the same patient and compared the two.
To sequence the lung cancer genome, the scientists used the SOLiD platform to generate 25-base pair mate-pair shotgun sequences and achieved about 39-fold coverage of the tumor genome and 31-fold coverage of the normal genome.
In total, they detected 22,910 somatic substitutions, and confirmed an additional 65 indels, 58 genomic rearrangements, and 334 copy number segments.
Of the 29 known base substitutions, they found 22. They also tested 79 new coding substitutions and 354 randomly chosen genome-wide variants, and confirmed 97 percent and 94 percent, respectively, using capillary sequencing. They also confirmed 25 percent of indels using capillary sequencing.
In addition, they detected mutational patterns previously associated with carcinogens in tobacco smoke. "The complicated mutational processes, all of which can be traced back to carcinogens, indicate that there is a cocktail of carcinogens that work together to produce the mutations that cause cancer," said Stratton.
To sequence the genome from the malignant melanoma cell line, the scientists used the Illumina Genome Analyzer II and a paired-end sequencing strategy (see In Sequence 9/22/2009). They constructed short-insert libraries of 200 and 400 base pairs and mate-pair libraries of 2, 3, and 4 kilobases, generated read lengths of 75 base pairs, and achieved 40-fold coverage of the tumor genome and 32-fold coverage of the normal genome.
The technique detected somatic mutations, including substitutions, insertions and deletions, and rearrangements. The researchers identified 33,345 somatic base substitutions, including 42 of 48 known somatic substitutions. They tested 470 of the new substitutions with conventional sequencing methods, and 454 were confirmed, indicating a false-positive rate of about 3 percent. The false-positive rate for indels was much higher, though: only 36 percent of those evaluated were confirmed by conventional sequencing. Fifty-one rearrangements were detected, 75 percent of which were confirmed.
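The melanoma validation figures reduce to simple ratios. As a rough sketch only (the function names are illustrative and do not come from the papers, and the calculation assumes that every call not confirmed by conventional sequencing counts as a false positive), the arithmetic can be checked in Python:

```python
def validation_rates(tested, confirmed):
    """Return (confirmation_rate, false_positive_rate) as fractions.

    Assumption: any tested call that was not confirmed is treated
    as a false positive.
    """
    if tested <= 0 or not 0 <= confirmed <= tested:
        raise ValueError("need 0 <= confirmed <= tested and tested > 0")
    confirmation_rate = confirmed / tested
    return confirmation_rate, 1.0 - confirmation_rate


def recovery_rate(found, known):
    """Fraction of previously known mutations that were re-detected."""
    if known <= 0 or not 0 <= found <= known:
        raise ValueError("need 0 <= found <= known and known > 0")
    return found / known


# Figures reported for the melanoma genome in the article.
confirmed_frac, false_positive_frac = validation_rates(tested=470, confirmed=454)
known_recovered = recovery_rate(found=42, known=48)

print(f"Substitutions confirmed: {confirmed_frac:.1%}")
print(f"False-positive rate:     {false_positive_frac:.1%}")
print(f"Known substitutions recovered: {known_recovered:.1%}")
```

With the reported counts, 16 of the 470 tested substitutions were not confirmed, which is about 3.4 percent and rounds to the 3 percent cited, while 42 of 48 corresponds to recovering 87.5 percent of the known substitutions.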
Stratton said that the mutations that were found were consistent with what is known about the effects of ultraviolet light on DNA. "We clearly found a mutational signature of UV light," he said.
The biggest challenge the group encountered was identifying indels. "On both platforms, we still have problems calling indels," said Campbell. "It's the biggest black box we have," he added. Improved bioinformatics, longer read lengths, and a better understanding of the artifacts that are causing the miscalls will lead to more accurate indel calling, he said.
Raju Kucherlapati, a professor of genetics at Harvard who was not involved in the studies, said that as some of the first papers detailing cancer genome sequences, they will help guide future research and sequencing protocols. In particular, they offer insight into the depth of coverage that is needed, and into the false-positive and false-negative rates of the techniques. He said that these studies indicate that around 30-fold coverage can provide good results in the 90-percent or higher accuracy range.
The 3-percent false-positive rate was pretty good, Kucherlapati said, adding that improving the rate would first require a cost-benefit analysis.
In these papers, the researchers used Sanger sequencing to confirm the mutations that were identified. In the future, Kucherlapati thinks that more and more cancer genomes will be sequenced with next-gen technology, so identified mutations can be evaluated across several different cell lines, which would eliminate the need to confirm them with Sanger sequencing.
The Cancer Genome Atlas, launched by the National Cancer Institute and the National Human Genome Research Institute, for instance, plans to sequence more than 20 different tumor types, including breast, brain, skin, and gastrointestinal cancers. The project has already begun collecting tissue samples from around the world.
Matthew Meyerson at the Dana-Farber Cancer Institute agreed that in the future cancer genomes sequenced with next-gen technology will be compared with each other. "Where we're moving in the future is not a one-off sequencing of one sample, but the sequencing of tens or hundreds of samples. We need that for statistical power — of knowing how important a gene is for causing cancer," he said.
Meyerson also said that it would be important to sequence cancer genomes directly from the primary tumor, because the cancer cell lines could continue to evolve, so the cancer cells sequenced might have different mutations than those found in the primary tumor.
Campbell agreed that this could be an issue, but noted that the fact that the Sanger team found mutations that were consistent with what UV light and tobacco carcinogens are known to do to DNA suggested that the cell lines reflected what the patient experienced.
Closer study of the mutation signatures of UV and tobacco could also be important for figuring out how they cause cancer. "[The studies] provide a mechanistic understanding of the role that each of the mutagens play in developing cancer," Kucherlapati said.
Both of the techniques also identified chromosomal rearrangements, which could previously only be identified using cytogenetic methods, Kucherlapati said. Meyerson agreed that this is a huge advantage of next-gen technology and could lead to new insights on cancer mechanisms. The chromosomal rearrangements that were found in the lung cancer genome were "particularly striking," Meyerson said. "It is a potential new pathway for small-cell lung cancer."
And, as the sequencing technologies continue to improve, Meyerson thinks researchers will continue to discover new mechanisms and learn more about the genes that cause cancer. "This is where cancer gene discovery is going, and in the long run, where cancer diagnosis is going," he said.