New Sequencing Tools Could Help Cancer Genome Atlas Project Meet Ambitious Goals

The Cancer Genome Atlas project could get a much-needed leg up from rapid developments in sequencing technology, according to several researchers who were recently awarded grants to adapt next-generation sequencing tools for the effort.
The TCGA initiative, a joint effort between the National Cancer Institute and the National Human Genome Research Institute, is about midway through a three-year pilot phase to assess the feasibility of a full-scale project to map the entire range of genomic changes involved in human cancer.
When the TCGA project kicked off in late 2005, Sanger sequencing and microarrays were the only available genome-wide characterization platforms, so the pilot project was structured around those technologies. Now, with next-generation sequencing technologies on the market from 454, Illumina, and Applied Biosystems, and a host of competing systems waiting in the wings, it appears that these tools will play an important role in any eventual full-scale effort.
Last week, the NCI and NHGRI awarded eight two-year grants totaling $3.4 million to support the development of new technologies for the project [In Sequence 07-03-07]. Three of those grants, worth a total of $1.2 million, went to projects based on next-generation sequencing technology.
In interviews with In Sequence, investigators involved with the three projects noted that the rapid development of new sequencing technologies has enabled a range of applications for the TCGA project that were not even conceivable when the initiative kicked off a year and a half ago. Some even suggested that next-generation sequencing might make a full-scale Cancer Genome Atlas, which by some estimates carries a tentative price tag of around $1.5 billion, financially feasible.
“Two years ago, nobody dreamt they’d be able to sequence 10 million ditag clones in a single run. It was beyond the realm of possibility,” said Timothy Bestor, a professor of genetics and development at Columbia University who was awarded $362,000 to apply ultrahigh-throughput sequencing to study genomic methylation patterns.
“Two years ago, to sequence 10,000 clones would be extremely expensive and time consuming, and now we can do 10 million in much, much less time, and much lower cost. I suspect that 20, 30, even 50 million clones can be read within a couple years,” he said.
Aleksandar Milosavljevic, an associate professor in the department of molecular and human genetics at Baylor College of Medicine who was awarded a $413,000 grant to develop sequencing-based methods to analyze chromosomal rearrangements, compared the state of sequencing technology today to that of the early days of personal computing, likening systems from 454 and Solexa to the first desktop computers.
“If there are any lessons to learn from that, it’s that we need killer apps … and we are seeking those applications now to employ these technologies in the fight against cancer,” he said. “To get wide adoption of these applications, you really need to reach certain cost thresholds, which is almost guaranteed to occur. The question is how fast we’ll get there.”
Hanlee Ji, acting assistant professor in the clinical cancer genomics group at the Stanford Genome Technology Center, is the co-principal investigator with Ronald Davis on a $429,000 grant to develop new methods for isolating genomic regions for DNA sequencing. He told In Sequence that the method could help make a full-scale TCGA project “more cost-feasible, because currently, the cost estimates as they stand right now don’t seem to be very practical.”
Scaling Up Selector
Ji said that the TCGA grant will enable further development of a method his group published in May in the Proceedings of the National Academy of Sciences that combined a new multiplexed, target-specific amplification method called selector technology with next-generation sequencing [In Sequence 05-22-07].
That paper “outlined much of what we initially proposed” for the TCGA grant, Ji said, noting that his team has already hit most of the milestones for its original proposal. “We’re actually planning on being much more ambitious now that we’ve set up a lot of the initial infrastructure,” he said.
The key to the method is the amplification step, which uses oligonucleotide constructs called selectors to guide the circularization of DNA target regions. This enables researchers to simultaneously perform PCR on hundreds of constructs with a single universal primer pair.
“The idea is that if you can create tens of thousands of these circles, ultimately, which is our target, and they all exist in one tube and you’ve managed to avoid the infrastructure requirements that simplex PCR would require,” Ji said, adding that the method would also reduce the amount of DNA necessary for a large-scale resequencing project.
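To make the multiplexing idea concrete, here is a deliberately simplified Python sketch: each target region is closed into a circle with one shared linker, so the same pair of universal primer-binding sites appears in every circle, whereas simplex PCR needs a target-specific primer pair for each region. The linker sequence, the 8-base site length, the target sequences, and the function names are hypothetical illustrations, not the published selector design, and the model ignores hybridization, ligation chemistry, and strand orientation entirely.

```python
# Toy model of selector-style multiplexing (hypothetical sequences;
# not the published selector probe design). Each target is closed into
# a circle with the same universal linker, so every circle carries the
# same two primer-binding sites.

UNIVERSAL_LINKER = "ACGTTGCAAGGCTTACCGATGCA"  # hypothetical linker sequence
SITE_LEN = 8  # hypothetical primer-binding site length
FORWARD_SITE = UNIVERSAL_LINKER[:SITE_LEN]
REVERSE_SITE = UNIVERSAL_LINKER[-SITE_LEN:]


def circularize(target: str, linker: str = UNIVERSAL_LINKER) -> str:
    """Represent a circle as a linear string: the target joined to the shared linker."""
    return target + linker


def has_universal_sites(circle: str) -> bool:
    """True if both universal sites occur in the circle.

    A circle has no ends, so the sequence is searched twice over to
    catch a site that spans the junction.
    """
    doubled = circle + circle
    return FORWARD_SITE in doubled and REVERSE_SITE in doubled


def primer_pairs_needed(targets: list[str], multiplexed: bool = True) -> int:
    """Count primer pairs: one shared pair for circles, one per target for simplex PCR."""
    if not multiplexed:
        return len(targets)
    circles = [circularize(t) for t in targets]
    if not all(has_universal_sites(c) for c in circles):
        raise ValueError("a circle is missing a universal primer-binding site")
    return 1


targets = ["GATTACAGGCT", "TTGACCGTAGC", "CATCGATGGAA"]  # hypothetical regions
print(primer_pairs_needed(targets))                      # 1 (one shared pair)
print(primer_pairs_needed(targets, multiplexed=False))   # 3 (one pair per region)
```

Growing `targets` to thousands of regions leaves the multiplexed count at one pair, which mirrors the reduction in per-reaction setup that Ji describes for simplex PCR.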
Currently, Ji said, “We believe we can get close to 10,000 circularized genomic regions based upon our previous experience of doing circularized amplification strategies.”
For the purposes of the TCGA, “this has applications for identifying somatic mutation events that aren’t gross deletions, or things that would not be able to be picked up using traditional array [comparative genomic hybridization],” he said. “Obviously for point mutations and that kind of thing, sequencing would be the only way to identify those events.”
Longer term, he said, the Stanford team envisions the method being used routinely in prospective clinical studies. “We’d like to be able to have the ability to rapidly analyze these genes for somatic mutations, and doing it rapidly enough [so that] the mutation profiles could be provided as part of the prospective follow up for patients in clinical trials in oncology.”
Ji said that the group plans to test the method on all available next-gen sequencing platforms. The genome center already has a 454 instrument, and Ji said that his group is planning on buying an Illumina Genome Analyzer.
As for cost improvements compared to current methods, Ji said that the approach should be “dramatically” less expensive than simplex PCR and Sanger sequencing, but noted that estimates are “a bit of a moving target right now because the costs for the parallel sequencers are really changing.”
Ji estimated that it currently costs hundreds to thousands of dollars per sample for resequencing using Sanger technology. “I’m hoping that we can drive it down by at least an order of magnitude, if not lower,” he said.
Technology ‘Time Warp’
Like Ji, Baylor’s Milosavljevic said that rapid advances in sequencing technology enabled his team to meet many of the goals of its TCGA grant proposal before it was even awarded. “Since the time we proposed [the project], the landscape has changed considerably,” he said. “We are in a time warp, so predicting the future is getting very tricky because the situation is changing by the month.”
Milosavljevic said that his group is taking a multi-pronged approach to adapt next-generation sequencing technology to study chromosomal rearrangements at the “intermediate scale” — between the 1,000 base pair level and the cytogenetic level.
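The article does not spell out how rearrangements are called from sequence data. One widely used idea, shown here as a hypothetical Python sketch rather than the Baylor team's method, is to sequence both ends of a cloned fragment of roughly known size and flag pairs whose ends anchor too far apart, too close together, or on different chromosomes. The 40,000-base expected insert (roughly fosmid-sized), the tolerance, and the class and function names are all assumptions; real callers also check read orientation, which this sketch ignores.

```python
from dataclasses import dataclass

# Hypothetical parameters: a roughly fosmid-sized clone insert.
EXPECTED_INSERT = 40_000
TOLERANCE = 10_000


@dataclass
class AnchoredPair:
    """Reference coordinates where the two ends of one clone were anchored."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair: AnchoredPair,
                  expected: int = EXPECTED_INSERT,
                  tolerance: int = TOLERANCE) -> str:
    """Label a clone's end pair by comparing its anchored span with the expected insert size.

    Orientation is ignored here; a real caller would also use it, e.g. to detect inversions.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    span = abs(pair.pos2 - pair.pos1)
    if span > expected + tolerance:
        return "deletion-like"   # ends lie farther apart in the reference than in the clone
    if span < expected - tolerance:
        return "insertion-like"  # ends lie closer together in the reference than in the clone
    return "concordant"


print(classify_pair(AnchoredPair("chr1", 1_000, "chr1", 41_000)))   # concordant
print(classify_pair(AnchoredPair("chr1", 1_000, "chr1", 500_000)))  # deletion-like
print(classify_pair(AnchoredPair("chr1", 1_000, "chr8", 2_000)))    # interchromosomal
```

Because each clone spans tens of kilobases, a single discordant pair points to an event somewhere in that window, which is coarser than base-level sequencing but far finer than the cytogenetic level.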
Milosavljevic hopes to improve on the current technology of choice for this application — array-CGH — by streamlining the analysis pipeline for sequencing-based analysis of structural variation. To do this, his team is tackling both the input to the sequencing instrument and the output from it. On the input side, the researchers are looking at ways to improve the handling of small, heterogeneous samples and to design improved vector constructs for mapping.
On the output side, Milosavljevic’s team is developing bioinformatics tools for analyzing rearrangement data. One program, called Pash (short for Positional Hashing), is being developed to anchor tens of millions of short reads onto the reference human genome and convert them into rearrangement information. Another program, Genboree, is being developed to integrate that information with other genomic data.
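Positional hashing is Pash's own algorithm, and the article does not describe it. As a rough illustration of the general task Pash addresses, anchoring short reads onto a reference by means of hashed subsequences, here is a generic seed-and-vote sketch in Python: every k-mer of the reference is indexed in a hash table, each k-mer of a read votes for the reference offset it implies, and the read is anchored only if one offset wins outright. The function names, the value of k, the vote threshold, and the toy sequences are assumptions, and this is not Pash's implementation.

```python
from collections import Counter, defaultdict

# Generic hash-based read anchoring (a seed-and-vote sketch, not Pash itself).


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def anchor_read(read: str, index: dict[str, list[int]], k: int, min_votes: int = 2):
    """Return the reference offset of the read, or None if it cannot be placed uniquely."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for ref_pos in index.get(read[j:j + k], ()):
            votes[ref_pos - j] += 1  # hits sharing an offset support the same placement
    if not votes:
        return None
    (best, n), *rest = votes.most_common(2)
    if n < min_votes or (rest and rest[0][1] == n):
        return None  # too little support, or a tie between placements
    return best


reference = "GATTACAGCTTGACCGTAGGCATCA"  # hypothetical toy reference
index = build_index(reference, k=4)
print(anchor_read(reference[5:17], index, k=4))  # 5

repeat_ref = "ACGTACGTACGT"  # tandem repeat: the read fits at several offsets
print(anchor_read("ACGTACGT", build_index(repeat_ref, 4), k=4))  # None
```

The tie check is what keeps reads from repeats out of downstream analysis: a read that fits equally well at two offsets is left unanchored rather than guessed.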

Milosavljevic stressed that both the input and output technologies are being developed to run on any next-generation sequencing technology “as long as the reads are of sufficient length to be anchored onto the human genome reasonably uniquely” — between 20 and 50 base pairs, he said.
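The read-length requirement follows from a simple constraint. A read can be anchored uniquely only if its sequence occurs once in the reference, and a read that falls entirely within one copy of a repeat matches every copy equally well. The toy Python sketch below uses a made-up reference containing two copies of a hypothetical 10-base repeat and measures what fraction of all possible reads of a given length occur exactly once. It is far simpler than the human genome, where the 20-to-50-base range Milosavljevic cites reflects genome size and repeat content that this toy cannot estimate.

```python
from collections import Counter

REPEAT = "ACGTTGCAAC"  # hypothetical 10-base repeat
REFERENCE = "GGATC" + REPEAT + "TTAGC" + REPEAT + "CATGA"  # toy reference with two repeat copies


def unique_fraction(reference: str, read_len: int) -> float:
    """Fraction of all length-read_len substrings that occur exactly once in the reference."""
    reads = [reference[i:i + read_len] for i in range(len(reference) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


for read_len in (4, 8, 11, 12):
    print(read_len, round(unique_fraction(REFERENCE, read_len), 2))
```

In this toy the fraction rises as reads become long enough to reach past the repeat into flanking sequence that differs between the copies. An 11-base read starting one base before either copy still matches both, because the two copies happen to be preceded by the same base.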

He said that his team has access to the instruments at the Baylor genome center, which currently runs systems from 454 and Illumina and is also involved in a collaboration with Applied Biosystems to develop applications for ABI’s SOLiD technology.
“We have still not committed to a particular technology, and we’re looking at cost and reliability,” Milosavljevic said.
The team has conducted some proof-of-concept studies that it plans to submit for publication in a few months, he said.
Bringing Methylation to Light
For Columbia’s Bestor, next-generation sequencing offers not just an improvement over current methods, but the ability to uncover information that was unobtainable with previous experimental platforms like microarrays.
“There haven’t been satisfactory methods for methylation profiling except for those that were based on prior selection of sequences, and that prior selection introduces a big bias,” he told In Sequence. “You were looking for your car keys under the streetlight.”
Bestor said that the “major advantage” of sequencing over microarrays is its ability to handle repeated sequences, which are of particular interest in methylation analysis.
“It’s beyond a doubt true, but it’s widely ignored, that most of the 5-methylcytosine in mammalian genomes is in repeated sequences, especially transposons, which make up more than 45 percent of the genome. So if you restrict yourself to methods that involve hybridization, then of course you can’t really look at repeated sequences, but by ultra-high-throughput sequencing, you can look at the whole genome,” he said.
Bestor’s team is developing a method based on ditag sequencing to profile the methylation status of cancer genomes. While the group doesn’t have a next-gen sequencer in house yet, it has collaborated with vendors and partners to test its approach on instruments made by 454, Solexa, and ABI.
“We’re expecting the ABI data this week, we got the Solexa data two weeks ago, and the 454 data should be coming very soon, so we will have an opportunity to compare them, but we haven’t yet,” he said.
From a technical standpoint, he said, the most important feature for methylation analysis is the ability of the instruments to perform ditag sequencing. “For a ditag mapping project like ours the throughput is much more important than the read length,” he said. “What we need are about 10 million ditag reads per genome to get to where we can identify methylation abnormalities.”
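Back-of-the-envelope arithmetic shows why throughput would matter more than read length here: how well each site is sampled depends on how many tags are read, not on how long they are. The Python sketch below assumes, purely for illustration, that each ditag read yields two usable tags, that tags land uniformly at random across a hypothetical 1 million informative sites, and that the count at each site follows a Poisson distribution. None of these figures come from Bestor's group.

```python
import math


def mean_tags_per_site(ditag_reads: int, sites: int, tags_per_ditag: int = 2) -> float:
    """Average number of tags landing on each site under uniform random sampling."""
    return ditag_reads * tags_per_ditag / sites


def fraction_unsampled(mean_tags: float) -> float:
    """Poisson probability that a site receives zero tags."""
    return math.exp(-mean_tags)


HYPOTHETICAL_SITES = 1_000_000  # assumption, not a figure from the article

for reads in (10_000, 1_000_000, 10_000_000):
    lam = mean_tags_per_site(reads, HYPOTHETICAL_SITES)
    print(f"{reads:>10,} ditags: {lam:g} tags/site, {fraction_unsampled(lam):.2e} of sites unseen")
```

Under these assumptions, going from 10,000 reads to 10 million raises the expected sampling from a small fraction of a tag per site to about 20 tags per site. Longer reads would not change these counts.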
So far, he said, ABI appears to be at an advantage because it already has ditag libraries available, though he noted that Illumina is supposed to release a ditag library construction method in the fall. In addition, he said, the throughput of the 454 technology is constantly improving, making the platform competitive with the other two.
“Which technology will win out in the end at this point is hard to say, but it’s really going to benefit research,” he noted. “The competition must be hell for the companies, but it’s great for the scientists.”
