NEW YORK (GenomeWeb News) - Researchers at Johns Hopkins University who last week published the first cancer sequencing study to validate the concept behind the NIH’s Cancer Genome Atlas project see promise in next-generation sequencing but caution that these methods still have limitations to overcome.
The Hopkins team, which used traditional Sanger sequencing for its proof-of-concept analysis of breast and colorectal cancers, published online in Science last week, is working with Applied Biosystems to validate the company’s newly acquired Agencourt Personal Genomics DNA sequencing platform, GenomeWeb News has learned.
The study, led by Bert Vogelstein, Ken Kinzler, and Victor Velculescu, revealed a large number of novel cancer candidate genes, showing that the unbiased large-scale sequencing strategy the NIH’s Cancer Genome Atlas pilot project plans to employ “is likely to produce important discoveries and impact cancer care,” NIH Director Elias Zerhouni said in a statement.
The authors note in the Science paper that recently developed DNA sequencing methods could reduce the cost of similar cancer genome sequencing projects in the future. But the high error rates of these techniques compared to Sanger sequencing will make it difficult to distinguish real mutations from false positives, they caution.
“Hopefully, such technologies will be able to make the whole enterprise more efficient and less costly,” Vogelstein, co-director of the Ludwig Center at Johns Hopkins, told GenomeWeb News in an e-mail message. “But it’s too early to tell what technologies will be useful and how best to combine them, e.g. Sanger for certain questions, new technologies for others.”
Although his lab has been collaborating with ABI’s APG group, Vogelstein would not say yet which up-and-coming sequencing platform -- ABI’s, Solexa’s, 454’s, or Helicos’ -- has the greatest promise for cancer genome sequencing. “They are all clever and have promise; I don’t know enough about them to comment further,” he wrote.
“I think there are a variety of questions that need to be addressed before any new technologies can be easily incorporated into TCGA,” he added.
In their study, the Hopkins researchers sequenced approximately 13,000 protein-coding genes in 11 breast cancers and 11 colorectal cancers. To do that, they designed primers to amplify almost 121,000 exons and adjacent intronic sequences, resulting in about 3 million PCR products. These were sequenced at Agencourt Bioscience, using ABI’s 3730 instruments, producing 465 Mb of DNA sequence.
Next, the scientists analyzed the sequence data and found almost 820,000 potential nucleotide changes. After eliminating those that would not change the amino acid sequence, they were left with almost 560,000 changes, each of which could represent a polymorphism, a PCR or sequencing error, or a real somatic mutation.
The scientists then removed changes that were also present in two normal samples, as well as known polymorphisms contained in SNP databases. They also visually inspected sequence traces to remove false positive calls in the automated analysis.
Subsequently, they re-sequenced the remaining 30,000 alterations to eliminate those due to PCR errors, shaving off another 10,000 of them.
They then checked the remaining 20,000 changes against normal DNA samples from the respective patients, and found that 18,414 were present in the normal samples, indicating these are polymorphisms not yet contained in any SNP databases.
A bioinformatics analysis excluded almost 300 of the remaining changes, leaving the researchers with 1,307 true somatic mutations in 1,149 genes. A validation screen, in which the scientists sequenced these genes again in 24 additional breast or colorectal tumors, generated another 77 Mb of DNA sequence data and found another 365 somatic mutations in these genes.
Excluding genes that were not mutated in the validation screen, and applying statistics to eliminate mutations resulting from chance events, the scientists came up with 191 candidate cancer genes, or CAN genes, 122 in breast cancer and 69 in colorectal cancer.
“The vast majority of these genes were not known to be genetically altered in tumors,” the authors note.
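The successive filtering steps described above can be pictured as a funnel in which each stage discards one class of false positive. The following Python sketch is only an illustration under stated assumptions, not the Hopkins team’s actual pipeline: the `Change` record, the `run_funnel` function, the gene names, and the toy data are all hypothetical, and the visual inspection of traces and the final bioinformatics analysis are not modeled.

```python
from typing import NamedTuple


class Change(NamedTuple):
    """One candidate nucleotide change (hypothetical record layout)."""
    gene: str
    position: int
    alters_protein: bool             # False for changes that leave the amino acid sequence intact
    confirmed_on_resequencing: bool  # False for likely PCR or sequencing errors


def run_funnel(changes, known_polymorphisms, matched_normal):
    """Apply the filters in order and report how many calls survive each one."""
    def site(c):
        return (c.gene, c.position)

    filters = [
        ("alters amino acid sequence", lambda c: c.alters_protein),
        ("not a known polymorphism", lambda c: site(c) not in known_polymorphisms),
        ("confirmed on re-sequencing", lambda c: c.confirmed_on_resequencing),
        ("absent from patient's normal DNA", lambda c: site(c) not in matched_normal),
    ]
    remaining = list(changes)
    counts = [("raw calls", len(remaining))]
    for label, keep in filters:
        remaining = [c for c in remaining if keep(c)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Toy data: five made-up calls, four of which fail exactly one filter each.
calls = [
    Change("GENE_A", 101, False, True),  # does not change the protein
    Change("GENE_B", 202, True, True),   # listed as a known polymorphism
    Change("GENE_C", 303, True, False),  # not reproduced on re-sequencing
    Change("GENE_D", 404, True, True),   # also present in the patient's normal DNA
    Change("GENE_E", 505, True, True),   # survives every filter
]
known_polymorphisms = {("GENE_B", 202)}
matched_normal = {("GENE_D", 404)}

survivors, counts = run_funnel(calls, known_polymorphisms, matched_normal)
for label, n in counts:
    print(f"{label}: {n}")
```

Ordering the inexpensive database lookups ahead of the checks that require fresh sequencing mirrors the sequence of steps the article reports, in which only about 30,000 of the original calls were re-sequenced.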
“This is a great piece of work, it’s a great step forward,” Matthew Meyerson, director of the center for cancer genome discovery at the Dana-Farber Cancer Institute, told GenomeWeb News. “It really demonstrates a lot of the power of systematic genome sequencing,” he said. Meyerson’s group published a much smaller-scale cancer sequencing study in Nature Medicine this summer in collaboration with 454 Life Sciences and the Broad Institute. That study found that 454 sequencing was able to detect mutations that Sanger sequencing missed.
But the Hopkins scientists “were limited in what they could do, in how many samples they could look at, and how much they could discover, really for cost reasons,” he said, and new technologies could help with that in the future.
The Hopkins authors write, for example, that they could have reduced false positives by sequencing matched normal samples and sequencing both strands of each PCR product, but this “would increase the cost of sequencing by four-fold.”
The study incurred $5 million in total sequencing costs, Vogelstein told GenomeWeb News. “How much new technologies could reduce it depends on the costs of the new technologies together with the costs of confirming any mutations that the new technologies find, with either Sanger or other methods,” he wrote.
But the new technologies have yet to prove that their data matches the quality of Sanger sequencing. “I don’t think there is any simple way other than to improve sequencing accuracy,” according to Vogelstein.
Kevin McKernan, senior director of scientific operations at ABI’s Beverly, Mass., facility, where the APG R&D team is headquartered, agreed. Next-generation technologies are “going to take probably a year for everyone to validate and believe that the extra SNPs, the differences they find, are in fact meaningful and that they haven’t missed anything,” he said, adding that scientists need to find out how well results from these technologies correlate with those from Sanger sequencing. “All the genome centers need to get comfortable with it, and there needs to be more peer review and more cost [studies] on which systems are the right price per base.”
He said his group has been collaborating with the Vogelstein lab to test its next-generation platform. The APG team has “taken a subset of those [genomic] regions [identified by the Hopkins researchers], probably the more interesting ones,” and re-sequenced them using their next-generation instrument.
He and his team have found a number of additional mutations, similar to Meyerson’s results with the 454 technology, but “the trick with any next-gen sequencer [is] that when you start finding other mutations, you have to make sure they are not false positives,” McKernan told GenomeWeb News last week. “That’s the stage that we are at now, looking at other genotyping assays that have the sensitivity that can actually pull these out and confirm and corroborate the results that we are getting,” he said.
McKernan and his collaborators at Hopkins plan to publish their results separately, he said.
ABI has already started “speaking with” genome centers, he said. In a conference call in July, ABI President Cathy Burzik said the company was “committed to a rapid commercialization timeline of the Agencourt technology and look[s] towards mid calendar 2007 for the introduction for the instrument to early access customers.”
ABI and 454 are not alone in seeing an opportunity in cancer genome sequencing. “Our raw read accuracy, coupled with both the fold coverage and sequencing of both the forward and reverse strands of the DNA, should position us quite well for analysis of heterogenous samples, including cancer samples,” wrote Omead Ostadan, Solexa’s vice president of marketing, in an e-mail message.
According to John Boyce, Helicos BioSciences’ senior director of marketing, his company’s technology would be especially suited for follow-up studies of cancer-related genes in tumors. “Further large-scale analysis, with true single-molecule sequencing, of the CAN-genes identified may reveal further mechanistic insight about disease progression by uncovering low-frequency intra-tumor mutations that govern the process,” he wrote in an e-mail.
In the meantime, large-scale cancer sequencing should go forward with the Cancer Genome Atlas using established technology, McKernan suggested. “This paper has given the whole community a vote of confidence that we shouldn’t hold our breath, we should just get going now with existing tools and then implement the next-gen stuff as it’s proving itself out,” McKernan said.
Julia Karow covers the next-generation genome-sequencing market for GenomeWeb News.