New sequencing technologies will likely play an important role in the second phase of the Encyclopedia of DNA Elements Project, and chromatin immunoprecipitation coupled with high-throughput sequencing might start replacing ChIP-chip experiments, according to researchers who are familiar with these methods.
But the scientists caution that both techniques are still improving in performance and price, that the number of datasets using high-throughput ChIP-sequencing is still small, and that ChIP-chip might retain the upper hand for some applications that do not involve the entire human genome.
“If one had to scale [the ENCODE project] to the whole mammalian genome right now, then probably ChIP sequencing [would be the way to go],” said Michael Snyder, a professor of molecular, cellular, and developmental biology at Yale University.
Last week, the ENCODE Project Consortium published results from the pilot phase of the project, in which 35 research groups analyzed functional elements in approximately 30 megabases, or 1 percent, of the human genome.
This year, the National Human Genome Research Institute expects to spend $23 million in grant funding to continue the project and scale it up to the entire genome, and researchers are waiting to hear back about their applications this fall. According to the NHGRI’s request for applications, the scale-up will involve “methods that have been clearly demonstrated to identify sequence-based functional elements efficiently, comprehensively, cost-effectively, and in a reasonable time frame.”
In addition, the institute wants to fund “continued pilot efforts that analyze the ENCODE target regions in new and interesting ways” as well as “new pilot projects using novel technologies, such as those developed by the first set of technology-development projects, to study the ENCODE target regions.”
Several of the ENCODE pilot studies published last week in Genome Research and Nature characterized transcription factor binding sites by ChIP-chip, or chromatin immunoprecipitation followed by analysis on genomic tiling arrays, such as those from Affymetrix or NimbleGen Systems.
But recently, two research groups published genome-wide analyses of transcription factor targets using ChIP followed by high-throughput sequencing, or ChIP-seq (see In Sequence 06/12/2007). They used Illumina’s Genome Analyzer.
Another group published a transcription factor target analysis last week in which they used 454’s Genome Sequencer to sequence ditags. These studies are raising the question of which of the two approaches — ChIP-chip or ChIP-seq — offers the best bang for the buck.
Both methods yield high-quality data, and their results overlap extensively, according to Snyder, who was involved in one of the Illumina-based ChIP-seq studies, led by researchers at the British Columbia Cancer Agency Genome Sciences Center, that was published in Nature Methods last week.
In that study, the researchers mapped all binding sites of the STAT1 transcription factor in a human cell line and compared their results with ChIP-chip analyses of four chromosomes using 50-mer high-density tiled oligo arrays. They found “a striking correspondence between peak sets generated by the two platforms,” according to the paper.
Snyder is also an author on another study, published last week in Genome Research by scientists at the University of Texas at Austin. In that study, the scientists mapped the chromosomal targets of STAT1 by sequencing ditags on 454’s Genome Sequencer, a method they call Sequence Tag Analysis of Genomic Enrichment, or STAGE.
High-throughput sequencing offers a number of advantages over microarrays for ChIP analysis, according to researchers.
“One of the biggest problems with microarrays is cross-hybridization,” especially when it comes to complex mammalian genomes, said Snyder.
In addition, ChIP-seq can address regions of the genome that are inaccessible to microarrays because of lack of hybridization, according to Rick Myers, director of the Stanford Human Genome Center. Myers’ lab, in collaboration with Barbara Wold’s group at Caltech, published an analysis of the NRSF transcription factor by ChIP-seq last month, using Illumina’s Genome Analyzer.
Another advantage of ChIP-seq is that “all you need to know is the sequence of the genome of the organism you are studying,” he said, and researchers do not need to build new tiling arrays for each new organism they want to study.
On the other hand, ChIP-seq has problems with perfect repeats in the genome, where sequence reads are hard to place, according to both Snyder and Myers, who referred to their studies that used Illumina’s platform.
Snyder believes that at present, ChIP sequencing using that platform has an edge over ChIP-chip for mammalian genomes. “I believe the sensitivity is better, the resolution is definitely better, and, finally, the cost is better,” he said.
“You get more data, the background is lower, it’s more comprehensive,” Myers said.
The large number of sequence reads that Illumina’s platform generates in one experiment plays an important role in that. Though the longer reads that 454’s system provides would improve the mapping of the sequence tags to the genome, “having 10 times as many counts is definitely much better,” Snyder said.
“We can get 22 million reads with [Illumina’s sequencer] for less than the cost of 500,000 reads from 454,” said Steven Jones, associate director of the BC Cancer Agency GSC, in an e-mail message. “500,000 reads just does not cover the genomic reads at the depth that you would like.”
Longer reads on Illumina’s system would be an advantage, though, he noted, because as they increase “we will be able to map reads more and more accurately, especially when the ability of paired-end approaches becomes available.”
Cost differences between ChIP-chip and ChIP-seq, using Illumina’s platform, are several-fold at the moment, Snyder said, but that could change as tiling arrays for mammalian genomes improve.
“If the next-generation arrays have more probes and they are longer, then sure, maybe it will become more cost-effective for ChIP-chip,” he said. “Having the competition, quite frankly, is very, very good,” he added.
Snyder also cautioned that the number of published datasets using high-throughput ChIP sequencing is still small, and the initial results may not hold up in the future.
Myers said that ChIP-seq may have some biases of its own, although his team has not seen any yet.
Finally, for the time being, ChIP-chip still has the upper hand over sequencing when it comes to organisms with simpler genomes, like yeast, where cross-hybridization is not a big issue, according to Snyder.
“For less complex genomes like yeast, it’s clear that ChIP-chip is the most effective [technology], and with these high-density oligo arrays, the data is absolutely beautiful by ChIP-chip,” he said.
Both Snyder and Myers say that if they end up participating in the second phase of the ENCODE project, they plan to use Illumina’s sequencer for ChIP-seq analyses. “If we get our grant, we will do a lot of ChIP sequencing,” Myers said.
Others might do the same. “There seems to be a lot of interest from the people in the field who have typically used arrays,” according to Jones.
So is it likely that sequencing will eventually replace microarrays for ChIP applications? “There is a good prospect that it could,” Myers said.