Not so very long ago, a scientist looking to figure out where a particular transcription factor bound on a certain gene, chromosome, or, heaven forbid, full genome was in for a slow, laborious slog. For years, researchers performed these one-at-a-time studies, gradually adding bits here and there to the community's understanding of DNA binding.
Naturally, scientists reared in the high-throughput world of massively parallel genomic technologies would find this practice unacceptable — so they have in recent years taken their most promising new platforms, microarrays and next-gen sequencers, and aimed them at better ways to study chromatin immunoprecipitation.
The resulting methods, ChIP-chip and ChIP-seq, have completely reshaped the way researchers can ask questions, and are beginning to change the very things scientists thought they knew about how the genome functions. According to Duncan Odom from Cancer Research UK, who has used the tools to perform comparisons of matched tissues in multiple organisms, "The data is heart-stopping."
That said, it's still early days for these technologies, and scientists continue to question the best ways to apply them, what the weaknesses are, and how best to analyze the data. For starters, early studies have indicated that one broad assumption, that ChIP-seq would easily replace ChIP-chip, may not hold up.
A new paradigm
Karen Adelman, a PI at the National Institute of Environmental Health Sciences, is one scientist whose research was turned on its head with the advent of genome-scale ChIP studies. Her lab's focus is a phenomenon called promoter-proximal stalling, a regulatory strategy in which polymerase is recruited to the promoter-proximal region and then, like a car with the key in the ignition, waits for a signal before starting transcription. The pattern was "shown to be important for maybe a dozen genes," Adelman says. "People thought of it as a very rare strategy."
In an effort to study this mechanism in as unbiased a way as possible, Adelman and her team applied ChIP-chip and ChIP-seq to find where polymerase is bound in the genome and compared that to where promoters were actively transcribed. The result: "a huge number of promoters occupied by RNA polymerase that were not producing detectable transcripts," she says. That lends credence to the idea that "regulation of gene expression occurs after transcription initiation," Adelman says, a concept "that goes against what the reigning dogma was in the field even just a couple of years ago."
The technology was the crucial factor here. "This genome-wide capability has been essential … in forcing us to re-evaluate regulatory strategies," she says. "The advent of this genome ChIP-chip or ChIP-seq technology has been what's allowed us to show that."
Adelman's situation is certainly not unusual. Axel Visel, a scientist at the Lawrence Berkeley National Laboratory, studies distant-acting enhancers in the human genome, a task for which he used to rely exclusively on comparative genomics followed up with transgenic mice. The comparative genomics approach was useful in finding highly conserved noncoding sequences, and the mouse assays would provide good information on where those sequences were serving as enhancers, Visel says. The limitation, however, was that the method could find sequences likely to be enhancers, but it couldn't give insight into where those enhancers were probably acting.
Intrigued by some ChIP-chip data he'd seen suggesting "some of the proteins that are typically associated with these distant acting enhancers," Visel says, he embarked on a project to see what ChIP-seq could offer. His team took tissue samples from forebrain, midbrain, and limbs of embryonic mice, "then isolated the chromatin and performed ChIP-seq … using an antibody that was directed against the p300 protein," Visel says. The protein was chosen because it appears to be a general cofactor that has a role in chromatin remodeling, and the team emerged with genome-wide maps of the protein's binding patterns in each of the tissue types. What became clear immediately was that "the genome-wide binding is completely different in the three tissues," Visel says.
His lab went back to mice to study the finding further, creating hundreds of transgenic mice to test the different binding patterns and see whether the in vivo results matched up with the ChIP-seq project. As it turned out, he says, the in vivo results meshed nicely, giving Visel more confidence in the ChIP data and encouragement to start a batch of new studies with the technology. For instance, he says, it would be interesting to run this sort of interrogation for various disease states across a number of tissue types.
When his team was just using conserved sequence findings from comparative genomics queries, he says, "we could only do a genome-wide survey [for candidates]. Now we can really focus these efforts on specific biological processes."
Of course, scientists aren't the type to get breathless over a new technology — so while they're delighted to have access to ChIP-chip and ChIP-seq, they're already looking at ways to improve the tools and data analysis. First up: which is better, arrays or sequencers?
Greg Buck, director of the Center for the Study of Biological Complexity at Virginia Commonwealth University, has enlisted ChIP technology to advance his lab's investigations into eukaryotic pathogens and the evolution of pathogenicity. His team has used the Illumina sequencing platform because "we're much more interested in the quasi-digital, quantitative data that we get out of ChIP-seq rather than the more analog data from the microarrays," he says. "We have done this with both technologies. Now we're trying to see if the data match."
Based on his own observations, Steve Qin at the University of Michigan would probably predict that the data won't match. His lab's studies of array-based data compared to sequencer-based data indicate that, under fairly stringent criteria, "only about 60 percent of the peaks overlap," he says. He suspects that's due to the inherent but different biases of each platform. With arrays, the bias comes from having a fixed set of probes and only interrogating known regions of the genome; with sequencers, bias varies by instrument type but a typical weakness is in GC-rich regions, he notes. The good news, as he sees it, is that these biases are very specific to each platform, so there's little chance that an error with one tool would be duplicated by another. He's working now "to combine the data together and get better detection rates of transcription factor binding sites," he says. The idea is that "by bringing these data together we can recover more than by just using one technology alone." Results so far have been quite promising: he's been "able to achieve higher sensitivity and specificity" using merged data sets, he says.
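To make the idea of peak overlap concrete, here is a minimal Python sketch of how one might count the share of peaks from one platform that are also found by the other. It is not Qin's method or his overlap criteria: the function name `overlap_fraction`, the `(chromosome, start, end)` peak format, and the coordinates are all hypothetical, made up purely for illustration.

```python
def overlap_fraction(peaks_a, peaks_b):
    """Return the fraction of peaks in peaks_a that overlap any peak in peaks_b.

    Each peak is a (chromosome, start, end) tuple with a half-open interval.
    This is an illustrative definition of overlap, not a published criterion.
    """
    if not peaks_a:
        return 0.0

    # Group the second peak set by chromosome so we only compare like with like.
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a)


# Made-up example peaks (hypothetical coordinates, not real data).
array_peaks = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
    ("chr2", 900, 1000),
    ("chr3", 10, 60),
]
seq_peaks = [
    ("chr1", 150, 250),
    ("chr1", 580, 640),
    ("chr2", 140, 160),
    ("chr2", 2000, 2100),
]

print(overlap_fraction(array_peaks, seq_peaks))
print(overlap_fraction(seq_peaks, array_peaks))
```

Note that the fraction depends on which platform is treated as the reference, which is one reason a single overlap figure needs a stated set of criteria behind it.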
Data analysis is also proving a challenge to the ChIP crowd. Qin helped develop a hidden Markov model (HMM)-based method called HPeak that's available online, but he says that applying an HMM to ChIP-seq data is "a little bit more tricky" than using it on ChIP-chip data. Meanwhile, Michael Seifert at the Leibniz Institute of Plant Genetics and Crop Plant Research has worked on a new take on the HMM, using scaled transition matrices, that was recently tested and predicted more target genes of a particular transcription factor in Arabidopsis than a standard HMM did. The new tool was designed for arrays, Seifert says, and hasn't been tried out with ChIP-seq data.
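For readers unfamiliar with how an HMM can be used to call enriched regions, the following Python sketch shows the general idea with a generic two-state model (background versus enriched) decoded by the Viterbi algorithm over binned read counts. It is not HPeak and not Seifert's scaled-transition-matrix method: the Poisson emissions, the rates, the `stay` probability, and the function names are all assumptions chosen for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, bg_rate=1.0, enriched_rate=8.0, stay=0.9):
    """Label each bin 0 (background) or 1 (enriched) with a two-state HMM.

    All parameters are illustrative assumptions, not fitted values.
    """
    if not counts:
        return []

    rates = (bg_rate, enriched_rate)
    log_stay = math.log(stay)
    log_switch = math.log(1.0 - stay)

    # Initialise with an equal prior on both states.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []  # back[t][s] = best previous state leading into state s

    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cand = [
                score[p] + (log_stay if p == s else log_switch)
                for p in (0, 1)
            ]
            best = 0 if cand[0] >= cand[1] else 1
            pointers.append(best)
            new_score.append(cand[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up read counts per bin (hypothetical, not real ChIP-seq data).
bin_counts = [0, 1, 0, 9, 11, 8, 1, 0]
print(viterbi_two_state(bin_counts))
```

The `stay` probability discourages the model from flipping states on a single noisy bin, which is the main thing an HMM adds over a simple per-bin threshold.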