When David Johnson began his post-doctoral fellowship at Stanford University in 2006, ChIP-chip — chromatin immunoprecipitation combined with microarray analysis — dominated epigenetic research.
And, as the Stanford Human Genome Center's project director for the National Human Genome Research Institute's Encyclopedia of DNA Elements, or ENCODE, project, Johnson found himself neck-deep in the technique.
"When I started my postdoc, Rick Myers [then director of the Stanford Human Genome Center] said to me, 'You have to do all these ChIP-chip experiments. That's your job for ENCODE,'" Johnson recalls. He is now the founder and CEO of clinical genetics firm GigaGen.
There was just one problem. The method — which uses antibodies to pull down DNA-binding proteins of interest, then identifies the bound DNA by microarray hybridization — didn't actually work all that well.
"There were some pretty big problems with ChIP-chip," Johnson says. In particular, he notes, the technique suffered from the microarrays' limited resolution and the bias introduced by the need for significant amplification of the target DNA.
And then, of course, there was the matter of cost.
"The microarray industry was just salivating at the thought of ENCODE using ChIP-chip," Johnson says. "You were looking at several hundred dollars per ChIP, and people were talking about running multiple biological and technical replicates for many, many transcription factors for ENCODE."
Given these drawbacks, Johnson began casting about for alternative techniques, a search that led him to work on next-generation sequencing being done by his former PhD advisor, Stanford researcher Arend Sidow, in collaboration with sequencing firm Solexa.
"They hadn't really nailed the [next-generation sequencing] technology yet, and [researchers] weren't [yet] sequencing full genomes," he says.
For ChIP-based epigenetics work, though, researchers didn't need to be able to sequence an entire genome — they only needed to be able to sequence the portions of the genome bound to their proteins of interest.
"I started chatting with [Sidow], and I realized that this was the perfect application for this new technology," Johnson says.
Johnson obtained next-generation sequencing kits from Solexa and reconfigured them to fit with the ChIP workflow. He sent Solexa the libraries from an initial ChIP experiment and, a few months later, the company returned the sequence data.
"We started digging into the analysis, and amazingly, the first experiment looked amazing," Johnson says. In the June 2007 edition of Science, Johnson, along with his co-authors and Stanford colleagues Myers, Ali Mortazavi, and Barbara Wold, published the paper introducing a new alternative to ChIP-chip — ChIP-seq.
Five years later, ChIP-seq is the preferred technique for studying protein-DNA binding. While microarray vendors like Affymetrix continue to offer products for ChIP-chip experiments, ChIP-seq has come to dominate the field.
"I would say that it's almost 100 percent ChIP-seq and zero percent ChIP-chip these days," says Michael Snyder, director of Stanford's Center for Genomics and Personalized Medicine.
"The reasons for that are three-fold," he adds. "One: It's the only way to do a big genome effectively. If you're doing humans or mouse, ChIP-chip is just too expensive. Two: [Next-generation sequencing] has gotten cheaper, so even for an organism with a small genome, you can mix enough samples together that it's cost effective [compared to ChIP-chip]. And three: The quality of the data is better."
This last factor is due to ChIP-seq's lower background and higher resolution, Snyder says. "You get much better signal to noise with ChIP-seq and you can zoom right in on peaks much, much better than you can with ChIP-chip," he adds.
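The enrichment Snyder describes comes down to finding genomic windows where aligned reads pile up well beyond what a uniform background would produce. A minimal sketch of that idea, using a simple genome-wide Poisson background (function and parameter names are hypothetical; real peak callers such as MACS additionally model local background and strand shift):

```python
import math

def call_peaks(read_starts, genome_len, window=200, pvalue_cutoff=1e-5):
    """Flag fixed-size windows whose read counts exceed a Poisson background."""
    # Count reads falling in each window.
    counts = [0] * (genome_len // window + 1)
    for pos in read_starts:
        counts[pos // window] += 1
    # Background rate: reads spread uniformly across the genome.
    lam = len(read_starts) * window / genome_len
    peaks = []
    for i, c in enumerate(counts):
        # Upper-tail Poisson p-value: P(X >= c) under the background rate.
        p = 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k)
                      for k in range(c))
        if p < pvalue_cutoff:
            peaks.append((i * window, (i + 1) * window, c))
    return peaks
```

A window with a genuine binding event accumulates many times the background read count, which is why the signal-to-noise advantage Snyder mentions translates directly into sharper, more confident peaks.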
And, as sequencing technology moves forward, ChIP-seq advances along with it. For instance, says Peter Park, a computational genetics researcher at Harvard Medical School, improvements in next-generation sequencing multiplexing have significantly decreased the cost of ChIP-seq experiments while also upping their throughput.
"Right now when we do ChIP-seq experiments, we will run 12 samples in a single [sequencer flow cell] lane," Park says. "That turns out to cost less than $100 per sample."
"Four years ago, we used to pay maybe $2,000 for the same experiment and get fewer reads," he says. "So that means that just in the past [few] years, the price per sample has dropped at least 20-fold."
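The arithmetic behind Park's estimate is straightforward: multiplexing divides a fixed lane cost across every sample barcoded into that lane. The per-lane price below is an assumption chosen to be consistent with his "less than $100 per sample" figure; only the $2,000 single-sample cost and the 12-plex lane come from the article:

```python
def per_sample_cost(lane_cost, samples_per_lane):
    """Sequencing cost per sample when multiplexing samples in one lane."""
    return lane_cost / samples_per_lane

# One sample per lane, circa 2008 (figure from the article).
old = per_sample_cost(2000, 1)
# 12 barcoded samples per lane, at an assumed $1,200 per lane.
new = per_sample_cost(1200, 12)
fold_drop = old / new  # the "at least 20-fold" drop Park describes
```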
That, Park notes, "means that people are now able to profile more [epigenetic] marks in more samples. So it's now feasible, for example, to look at time series [of marks] in some systems. Or, rather than profiling one or two marks, you can profile six or more marks, for instance. And this just opens up many new opportunities to study epigenetic mechanisms."
In fact, Park says, the increase in data enabled by ChIP-seq technology has led to something of a bottleneck in the backend analysis and informatics portions of these experiments.
As sequencing advances make it possible to track ever more modifications simultaneously, the combinatorial analyses involved grow correspondingly complex.
"For instance," Park says, "we and other groups have published on the idea of chromatin states, where we would like to look not at individual [chromatin] marks, but at a set of marks as a whole."
"So, [in data from recent research] we had 18 different [chromatin] profiles, and then we defined different [chromatin] states [based on this data]. For instance, we can say, 'This region of the genome is an enhancer state, and enhancer means it has this certain combination of marks.'"
"This is a new approach in reducing complex chromatin data to a more interpretable form and is a way of annotating genomic regions with their potential function in a cell type-specific manner," he adds.
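The chromatin-state idea Park describes, reducing each region's combination of marks to a single functional label, can be caricatured as a simple rule table. The mark-to-state rules below are illustrative only; real state models such as ChromHMM learn states probabilistically from genome-wide data rather than from hand-written rules:

```python
# Toy rule set mapping a region's histone marks to a chromatin state.
# These combinations are illustrative, not definitive annotations.
STATE_RULES = [
    ({"H3K4me1", "H3K27ac"}, "active enhancer"),
    ({"H3K4me3", "H3K27ac"}, "active promoter"),
    ({"H3K27me3"}, "Polycomb-repressed"),
]

def chromatin_state(marks):
    """Return the first state whose required marks are all present."""
    for required, state in STATE_RULES:
        if required <= marks:  # set containment: all required marks observed
            return state
    return "unannotated"
```

For example, a region carrying both H3K4me1 and H3K27ac would be labeled an enhancer, which is the kind of cell-type-specific functional annotation Park describes.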
New tech, old problem
Informatics aside, the main issue facing ChIP-seq has remained unchanged since Johnson and his co-authors published their paper introducing the technique half a decade ago — a lack of quality antibodies for pulling down target proteins and their bound DNA in the initial immunoprecipitation step.
The Stanford researchers "had two excellent antibodies" to their target, the protein NRSF, Johnson recalls. However, he notes, after the success of their initial ChIP-seq experiment, he followed up by testing antibodies their lab had been using for ChIP-chip studies of a number of other transcription factors.
"And some of them just didn't generate good data," Johnson says. "They showed a huge amount of background and not much specificity."
Antibody quality is an issue throughout life sciences research, but it can be particularly challenging in ChIP-seq, where researchers are often trying to distinguish between subtly different versions of their target proteins.
For instance, says Jason Lieb, director of the Carolina Center for Genome Sciences at the University of North Carolina at Chapel Hill, "antibody specificity against different histone marks is a big issue, especially because they often differ by such a small moiety — a tri-methyl group versus a di-methyl group, for example."
Additionally, Stanford's Snyder says, while antibody vendors often validate their reagents for use in applications like western blotting, they rarely validate them for ChIP.
"For the most part, even if a vendor claims they have suitable antibodies [for ChIP], they may not be," he says. "So, definitely, as a word of caution, every investigator should validate them in their own hands."
Lieb adds that, beyond questions of informatics, antibodies, and sequencing technology, there are a few less frequently raised issues that ChIP-seq users would be wise to keep in mind.
For instance, depending on the buffer conditions used in the immunoprecipitation step, certain sections of chromatin may prove insoluble.
"Depending on what fraction of [chromatin] is insoluble, you won't be able to see certain parts of the genome," Lieb says. "So that's a critical factor for some proteins that often isn't really paid attention to."
Another challenge researchers have grown more aware of is the large size of the chromatin fragments typically analyzed in a ChIP experiment.
"In order to do ChIP, you have to break up the chromatin, and usually this is done by sonication," Lieb says. The large chunks of chromatin produced by sonication, however, limit ChIP-seq's resolution. Pennsylvania State University professor Frank Pugh has sought to remedy this with a technique termed ChIP-exo, which sharpens resolution by using exonucleases to trim the ChIP DNA after immunoprecipitation.
"It's a little bit more specialized [than standard ChIP-seq]," Lieb says. "But there's no reason not to do it, and it adds an amazing amount of resolution — like single base level resolution. So it's not just incrementally better, it's substantially better."
As a still-developing field, ChIP-seq would also benefit from better-established standards to ensure the quality of such research, Snyder says.
In this regard, he notes, progress is being driven by the same force that years ago fostered the initial development of the technique — the ENCODE project.
"The ENCODE project set up a nice set of standards and set up a pretty reasonable bar for what should make a good quality ChIP experiment," he says. "I would say that before ENCODE, a lot of the [ChIP-seq] datasets that were out there really weren't of high quality. It was a mixed bag — some were good and some were not so good."
Snyder cites in particular a paper in the September issue of Genome Research written by a number of ENCODE researchers including himself, Park, and Lieb that aims to evaluate sources of variation in ChIP-seq research and establish guidelines for the technique.
"I would say that is definitely one of the best things to come out of the ENCODE project," he says. "It defined standards for these kinds of experiments."