New sequencing platforms have started to alter the way cancer genomics studies are conducted, driven by the technologies’ falling costs and increased throughput and performance, according to the co-director of a large-scale US genome center.
Up until now, researchers have looked for genetic variations underlying cancer by a number of methods, including array-based whole-genome genotyping and gene-expression profiling, as well as Sanger-based sequencing of candidate genes. Correlating these data with clinical information has helped them to learn about cancer biology, as well as to develop diagnostic and prognostic assays.
But with the emergence of second-generation sequencing technologies, “the cancer genomics paradigm is changing,” said Elaine Mardis, co-director of the Genome Center at Washington University in St. Louis, who was speaking at the Personal Genomes meeting at Cold Spring Harbor Laboratory last week.
Under the new paradigm, scientists are now sequencing entire cancer genomes, she said, looking not only for single-nucleotide mutations and small insertions and deletions, but also for structural variations. In addition, they are beginning to use the new technologies to analyze transcriptomes, gene regulatory regions, and DNA methylation profiles of cancer genomes.
Using the example of an acute myelogenous leukemia genome that her center recently sequenced using the Illumina Genome Analyzer, Mardis showed how this approach can uncover mutations in genes that are part of known cancer pathways but were not included in previous lists of cancer candidate genes. She said Wash U and other genome centers are now planning to use whole-genome sequencing in other cancer-genomics studies.
Before they embarked on the AML genome-sequencing project, the Wash U researchers tested the Illumina GA, which produces short reads, on the 100-megabase C. elegans genome and found it suitable for detecting polymorphisms. They published their results earlier this year (see In Sequence 1/22/2008).
For their first human cancer genome, which has been accepted for publication in a peer-reviewed journal, the Wash U researchers chose a sample from an AML patient in her late fifties, a now-deceased Caucasian woman who had a family history of cancer but exhibited a normal cytogenetic profile. Funding for this project came from a private donor.
The scientists sequenced DNA from both her primary tumor and her skin, which served as a normal control. In 98 runs on the Illumina GA, they generated 32-fold coverage for the tumor sample with unpaired sequence reads. For the skin sample, they obtained about 14-fold coverage.
They then mapped the reads to the human reference genome using the Maq algorithm, which was developed by scientists at the Wellcome Trust Sanger Institute, and looked for sequence variants using the Maq variant-discovery algorithm.
After that, they compared the approximately 3 million high-quality single-nucleotide variants they had discovered with normal genome variants found in dbSNP as well as in the genomes of Jim Watson and Craig Venter. After narrowing down and validating the remaining variants, they determined that genes in the AML tumor carried a small number of somatic mutations.
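The filtering logic described here can be illustrated with a minimal Python sketch. It is not the Wash U pipeline: the variant representation, the function name `candidate_somatic_variants`, and the toy data are all hypothetical, and the sketch assumes that variants also seen in the matched skin sample are discarded along with those found in public catalogs.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def candidate_somatic_variants(
    tumor: Iterable[Variant],
    known_catalogs: Iterable[Set[Variant]],
    matched_normal: Set[Variant],
) -> Set[Variant]:
    """Keep tumor variants absent from every known catalog and from the matched normal."""
    candidates = set(tumor)
    for catalog in known_catalogs:
        # Variants already catalogued (e.g. in dbSNP or a published personal genome)
        # are treated as inherited polymorphisms and dropped.
        candidates -= catalog
    # Variants also present in the patient's normal tissue are germline, not somatic.
    candidates -= matched_normal
    return candidates


# Toy data, invented for illustration only.
tumor_calls = [
    ("chr1", 100, "A", "G"),
    ("chr2", 200, "C", "T"),
    ("chr3", 300, "G", "A"),
]
dbsnp_like = {("chr1", 100, "A", "G")}
skin_calls = {("chr2", 200, "C", "T")}

remaining = candidate_somatic_variants(tumor_calls, [dbsnp_like], skin_calls)
print(sorted(remaining))
```

In practice the surviving candidates would still need the validation step the researchers describe, since sequencing and mapping errors can pass a purely set-based filter.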
A number of these somatic mutations occurred in genes that are part of known cancer pathways, though they were not part of any lists of candidate genes that researchers had previously established, Mardis said.
The data for this first study were generated between August 2007 and early this year, Mardis noted, at a cost of $700,000, which includes labor, informatics processing, data storage, and instrument amortization. At the time, paired-end sequencing was not yet available for the Illumina GA.
Since then, the Wash U researchers, fueled by funding from the National Cancer Institute, have embarked on a second AML sequencing project, using only paired reads at a distance of either 200 base pairs or three kilobase pairs. This will allow them to detect structural variants like insertions, deletions, and inversions, in addition to point mutations and small indels. They are also hoping to use short-read assemblers to resolve complex rearrangements that are indicated by anomalous read pairs.
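To show why paired reads help, here is a simplified Python sketch of how a mapped read pair might be flagged as anomalous. It is an illustration under stated assumptions, not the method used at Wash U: the `ReadPair` type, the `classify_pair` function, the tolerance value, and the expectation that concordant pairs map to opposite strands are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Mapped positions of the two reads in a pair (hypothetical representation)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, expected_distance: int, tolerance: int) -> str:
    """Label a pair by comparing its mapping with the expected library geometry."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.strand1 == pair.strand2:
        # Assumption: a concordant pair maps to opposite strands.
        return "possible inversion"
    span = abs(pair.pos2 - pair.pos1)
    if span > expected_distance + tolerance:
        # Reads map farther apart than the fragment: sequence missing in the sample.
        return "possible deletion"
    if span < expected_distance - tolerance:
        # Reads map closer together than the fragment: extra sequence in the sample.
        return "possible insertion"
    return "concordant"


# A 200-base-pair library with a hypothetical tolerance of 50 base pairs.
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1200, "-"), 200, 50))
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 5000, "-"), 200, 50))
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1200, "+"), 200, 50))
```

A single anomalous pair proves little on its own; real callers look for clusters of pairs that support the same event.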
Their second case is a 38-year-old Caucasian male with AML, with no family history of cancer, who is currently in remission. Due to the improved throughput of the Illumina platform, the researchers have been able to produce more data in fewer sequence runs for this sample, and at a lower cost: As of last week, they had generated almost 7-fold coverage of the genome in four paired-end sequencing runs, using both 2x35 base pair and 2x50 base pair reads. The total predicted full cost for sequencing this second sample is $200,000.
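The throughput gain can be put in rough numbers with a short Python calculation. The figures come from Mardis' talk as reported here, but the comparison rests on two assumptions: that all 98 runs in the first project went to the 32-fold tumor coverage, and that "almost 7-fold" can be rounded to 7. The function name `coverage_per_run` is hypothetical.

```python
def coverage_per_run(fold_coverage: float, runs: int) -> float:
    """Average fold coverage of the genome obtained per sequencing run."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return fold_coverage / runs


# First AML genome: 32-fold tumor coverage in 98 unpaired runs (assumed tumor only).
first_project = coverage_per_run(32, 98)
# Second AML genome: almost 7-fold coverage in four paired-end runs (rounded to 7).
second_project = coverage_per_run(7, 4)

improvement = second_project / first_project
print(f"first project:  {first_project:.2f}-fold per run")
print(f"second project: {second_project:.2f}-fold per run")
print(f"improvement:    {improvement:.1f}x")
```

Under those assumptions, the per-run yield rose from about a third of a fold to about 1.75-fold, a gain of roughly five times.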
Next year, Wash U plans to sequence five additional AML genomes, according to Mardis. As part of the NIH’s Cancer Genome Atlas project, the center is also working on sequencing a glioblastoma genome, in collaboration with other US genome centers, in order to advance the use of next-generation sequencing technology in TCGA.