DNA is a crowded place. The old 'beads-on-a-string' electron micrographs of chromatin give little hint of the heady complex of transcription factors, enhancers, and higher-order histone structures involved. But together, those factors all work to regulate gene expression in cells.
Newer technologies, from chromatin immunoprecipitation about 20 years ago right up to next-gen sequencing, have kept the chromatin biology and DNA-binding fields spinning. Higher-throughput approaches, especially next-gen sequencing, are allowing researchers to take a genome-wide approach to studying regulation, whether at the chromatin level or at the DNA-binding level. "The technology completely revolutionized the whole chromatin biology field in general," says Jorge Ferrer of the Hospital Clínic de Barcelona.
Next-gen sequencing also provides an unbiased approach to genome-wide binding studies, says Sreeram Ramagopalan, a research fellow at Barts and the London School of Medicine and Dentistry. "It completely revolutionized binding studies," he adds.
Learning more about a cell's chromatin state can help researchers understand the basic science behind gene regulation and what binds DNA, as well as how those states influence disease and how they have evolved over time. To do that, researchers are using ChIP-seq, among other methods, to develop chromatin maps, which they can then use in comparative studies.
"[Chromatin mapping] provides a functional annotation of the genome. ... It gives you pretty strong hints about what the functions of the DNA sequences are. It identifies the different regulatory elements," says Harvard University's Bradley Bernstein. "It tells you a lot about the genome, a lot about the different functional elements of the genome, and it tells you, in [a] given cell population, what the regulatory state of those genomic elements is. In a greater context looking now at a particular … population, it can tell you what regulatory pathways are active."
Chromatin mapping and DNA-protein binding studies are being used to ask basic biological questions, including those about the behavior of nucleosomes — those beads on a string — and how proteins interact with DNA to regulate gene expression.
The University of Massachusetts Medical School's Oliver Rando is working to understand the interplay of chromatin organization and replication — what happens to the histones during replication? There are a few options, he says. The histone could sit back down on one of the daughter genomes right where it was, or it too could split, or not split. "The analogy I like to use in talks is the fact that nucleosomes come off the DNA during replication is somewhat analogous to thinking about DNA replication," he says. "If you pull the two strands of DNA apart, what would happen if the bases started moving back and forth and switching positions?" Understanding this behavior of histones has implications for understanding epigenetic inheritance, he adds.
Over at Martha Bulyk's lab at Harvard, there is a focus on DNA-protein interactions. She wants to understand how regulation is encoded in the genome, and that led her to transcription factors. To study transcription factors on a large scale, she developed a protein-binding microarray. The arrays are loaded with double-stranded DNA; when researchers apply a protein to an array, they can determine that protein's binding specificities. "It's an in vitro approach that we think of as being complementary to various in vivo approaches that people are using, such as chromatin immunoprecipitation," Bulyk says.
Indeed, her group has integrated data from the arrays and other experimental methods to study gene regulation. Recently, they combined data from their protein-binding array approach with data from 1,700 publicly available gene expression microarray experiments and about 200 ChIP-chip experiments to determine the DNA-binding specificities of 89 yeast transcription factors and to study their roles in gene regulation. "We really did integrate lots of data sets out there to try to bring in an in vivo component so that we can try to predict the functions of transcription factors," she says. Some of the transcription factors they uncovered were known, but others were new. For a few of those, Bulyk's team performed its own ChIP-PCR experiments and confirmed two novel regulators.
Bulyk is also interested in how transcription factors can act indirectly to influence gene expression and, for that, her team has looked at the DNA occupancy of directly and indirectly acting factors. With their protein-binding data and a re-analysis of the same 200 ChIP-chip experiments, they could identify which DNA occupancy events were likely to be due to direct versus indirect binding. In addition, Bulyk says that for the indirect binding events, they could get an idea of what the likely direct DNA binders were.
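As a rough illustration of the direct-versus-indirect idea, the sketch below labels a ChIP-occupied region as "direct" when it contains a sequence the factor is known to bind in vitro, and as "possibly indirect" otherwise. This is a deliberately simplified stand-in, not the Bulyk lab's actual analysis: the function names, the exact-match rule, and the example k-mer are all hypothetical, and a real analysis would work from quantitative binding scores rather than a yes/no match.

```python
# Hypothetical sketch: flag ChIP-occupied regions that lack any sequence
# the factor binds in vitro. Exact k-mer matching is an assumption made
# for illustration only.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def classify_region(region_seq: str, bound_kmers: set) -> str:
    """Label a region 'direct' if any in vitro bound k-mer occurs on
    either strand, otherwise 'possibly indirect'."""
    seq = region_seq.upper()
    for kmer in bound_kmers:
        kmer = kmer.upper()
        if kmer in seq or reverse_complement(kmer) in seq:
            return "direct"
    return "possibly indirect"


if __name__ == "__main__":
    # Made-up example data: one k-mer and two occupied regions.
    kmers = {"CACGTG"}
    for name, seq in [("region_1", "TTTCACGTGAAA"), ("region_2", "TTTTTTTTTTTT")]:
        print(name, classify_region(seq, kmers))
```

Regions labeled "possibly indirect" would then be the ones to scan for the binding preferences of other factors, in the spirit of inferring likely interactors.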
"What's interesting is that it looks like when people do chromatin-IP experiments, by which they aim to identify genomic sites that are occupied by a particular transcription factor — not all of those sites are actually due to direct binding by that transcription factor. A number of those sites are due to the transcription factor interacting with some other DNA-binding protein or proteins," Bulyk says. "By looking at the DNA-binding specificity data, we can make inferences about what those interactors are."
The same approaches are being used to pursue clinical questions. Harvard's Bernstein, Miguel Rivera, and their colleagues recently applied chromatin mapping to study Wilms tumors, a common pediatric kidney cancer. They compared the chromatin state of those tumor cells to that of embryonic stem cells and normal kidney cells. They report in Cell Stem Cell that the Wilms cells are similar to normal renal stem cells in terms of their transcriptional and epigenetic landscape and, from that, they suspect that the renal stem cells may be the cell of origin for Wilms. "In short, looking at chromatin state for these cells illuminated stem cell pathways, stem-like epigenetic state, and also could pinpoint a few aberrant regulatory pathways that are relevant to the disease pathology," Bernstein says.
In addition, chromatin maps can be used to link variants to disease. Thousands of SNPs have been associated with disease, says Barcelona's Ferrer. Most of them are not in regulatory regions, but some are, he says. Focusing on pancreatic islet cells, Ferrer and his team used FAIRE-seq to make a map of the open chromatin in those cells. Formaldehyde-Assisted Isolation of Regulatory Elements, or FAIRE, was developed by Jason Lieb's group at the University of North Carolina, Chapel Hill, and it captures nucleosome-depleted DNA but not nucleosome-occupied DNA. Ferrer's group then mapped sequence variants to those open chromatin sites to see which variants might be functionally regulating the genome.
In their proof-of-concept study, the researchers found a variant linked to type 2 diabetes located in the open chromatin of islet cells. In heterozygotes, Ferrer says, you'd expect to see a 50-50 allelic balance; any deviation from that ratio could be said to be from a cis-regulatory variant affecting chromatin state. Now, Ferrer adds, they can apply this approach genome-wide. "We can detect allele balance genome-wide, so we can assess what we call chromatin gaps where there are sequence variants that act in cis to regulate the open chromatin," he says.
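The 50-50 expectation lends itself to a simple statistical check. The sketch below is one plausible way to test it, not the method Ferrer's group used: it takes the number of sequencing reads carrying each allele at a heterozygous site and computes an exact two-sided binomial p-value against an even split. The function names and the read counts in the example are invented for illustration.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def allelic_imbalance_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value against a 50-50 allelic balance.

    With p = 0.5 the binomial distribution is symmetric, so the two-sided
    p-value is twice the upper tail at the larger count, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = max(ref_reads, alt_reads)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    # Invented read counts at two heterozygous sites.
    for ref, alt in [(26, 24), (40, 10)]:
        print(ref, alt, round(allelic_ratio(ref, alt), 2), allelic_imbalance_p(ref, alt))
```

A small p-value only says the reads are unlikely under an even split; in practice one would also need to rule out technical causes such as mapping bias toward the reference allele before attributing the skew to a cis-regulatory variant.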
Ferrer's group has also found potential functions for disease-associated variants, which could help the search for causal variants. "It's very exciting because we can find a function for disease-associated variants and ... other variants that have not been linked to disease yet that are in disease-associated loci," he says. "By virtue of their functional importance that we can assign through our studies [we may find] the ones that are really doing the harm that are underlying these disease associations."
Where a protein binds, or does not bind, DNA may also influence disease. Recently, Barts' Ramagopalan and his team used a ChIP-seq approach to get a genome-wide look at where the vitamin D receptor protein binds. There is, Ramagopalan says, a strong link between vitamin D deficiency, multiple sclerosis, and other diseases, especially autoimmune diseases. "We were wondering how vitamin D deficiency could increase the risk of MS," Ramagopalan says. "We thought it could be through genome-wide interactions and there wasn't any knowledge that we knew about where vitamin D acts on the genome."
From this, his team found about 2,700 binding sites, along with 229 genes whose expression levels changed in response to vitamin D. The sites were also enriched near genes already associated with autoimmune diseases and cancer. Ramagopalan and his group are following up on some of the variants to see how they mediate vitamin D receptor binding and influence gene expression.
Where proteins bind to DNA or how the chromatin is arranged can also offer a glimpse into evolutionary history. In studying where vitamin D receptors bind DNA, Ramagopalan's team also noted that the binding regions were associated with genomic regions undergoing positive selection in Asian and European populations. Those regions, the researchers speculate, were selected for as some human populations migrated out of Africa and lost skin pigmentation. "We saw an association of vitamin binding with selection," Ramagopalan says.
At the Cambridge Research Institute, Duncan Odom and his colleagues took a comparative ChIP-seq approach to study the evolution of transcription regulation in mammals. There is an assumption that regulatory information is encoded similarly in different species, he says. However, a few years ago, Odom's group showed that this might not be the case. Using spotted oligonucleotide arrays for about 4,000 orthologous human and mouse genes — chosen because of their one-to-one orthology, he says — they found that transcription factor binding is divergent between human and mouse. "In other words, transcription factor binding, it seems, of a set of genes in human rarely overlaps well with mouse," he says.
More recently, using ChIP-seq, Odom and his colleagues were able to study a variety of different species genome-wide — they looked at humans, mice, dogs, rats, and chickens — to determine whether transcription factor binding to target genes is conserved. "Long story short ... it's not nearly as dramatic as you might expect," Odom says. "There's some hint that there is enhanced conservation in the target genes that are dependent on transcription factors to be present but it's not that much higher than in the background genome."
At UMass, Rando's lab is also performing comparative analyses of chromatin maps, with an eye to chromatin behavior. If you only look at one species, he says, you cannot really tell what your observations mean to that cell. For example, in Saccharomyces cerevisiae, Rando says there is a stereotypical arrangement of the nucleosome that is found on almost all genes: the plus-one nucleosome is positioned so that the transcriptional start site is usually located about 12 bases inside the upstream border of the nucleosome. "When you see something that happens in all genes in a single organism, it's difficult to ask what that means to the cell," Rando says. And that is where comparative studies come in. From those, researchers can determine whether the nucleosome behaviors they see were conserved over millions of years of evolution or "are quirks of your species," he says.
"Of course, there are lots of interesting evolutionary questions as well, which we haven't figured out how to formulate — how does one define positive selection on a chromatin feature?" he adds. "It's harder to think about when you don't have codons."