Years after the high-quality draft of the human reference genome was delivered, it's now well known that it's not just genetic variation that causes differences in gene expression. Epigenomic changes, whether DNA methylation or histone modifications, are increasingly being studied for their role in both normal and disease-associated phenotype changes. The NIH-supported Roadmap Epigenomics Program, an international effort to create reference epigenomes for a variety of cell types, aims to do for epigenetics what the Human Genome Project did for genomics and the HapMap Project did for genetic variation. Ultimately, using these maps to study the relationship between epigenetic changes and disease will allow scientists to home in on how cancers and common diseases develop.
Hot on the heels of the discovery push has been technology development. Thanks to microarrays and next-generation sequencing, new methods let scientists profile methylation on the genome-wide level, and not just at a handful of CpG sites. "We're really at that stage of defining what [the] reference methylation state is, and then from there we can start to investigate how much variation there may be," says Martin Hirst of the British Columbia Cancer Agency. "To do that, you obviously need to do it genome-wide and at the highest resolution that you possibly can."
MD Anderson's Jean-Pierre Issa, who studies methylation in aging and cancer, adds that the idea behind reference epigenomes is that "we need to know what is normal, and then we can figure out what is abnormal."
Arrays, while still used as a readout technique, are becoming a thing of the past with the advent of accessible next-gen sequencing tools. For some time, people have been using microarrays to profile methylation on CpG islands, and several vendors offer genome-wide arrays. "The current belief is that only DNA methylation in the context of CpG [sites] is biologically meaningful," says University College London's Stephan Beck, formerly co-leader of the Human Epigenome Project and now advisor to the much larger NIH epigenomics roadmap. "This might change, but this is what our current understanding is."
In cancer, for instance, it's well known that CpG islands in the promoter regions of tumor suppressor genes, which are typically unmethylated, become methylated, thereby silencing those genes. However, an array focused solely on CpG islands doesn't really give a genome-wide look. "It represents a collection of known CpG islands, but it doesn't include any intergenic regions or other regions that may be methylated," BCCA's Hirst says. "And some of those methylations undoubtedly have biological relevance."
The move toward genome-wide profiling includes techniques such as using methylation-sensitive restriction enzymes, bisulfite conversion, and affinity capture. Generally speaking, Issa says, "they usually rely on ligation of some adaptor to a site that is either methylated or unmethylated, PCR, and then microarrays or more recently, sequencing."
In restriction enzyme-based approaches, a methylation-sensitive restriction enzyme is used to cut unmethylated DNA but not methylated DNA, and the cut fraction is then shotgun sequenced. According to Issa, "The advantage of that is that all you need is the DNA and the restriction enzyme. The disadvantage is that you are looking only at the restriction enzyme site," which is typically one or two CpGs in a CpG island that consists of 20 or 30 or more, "so you are limited in your resolution." Because the CpG sites within an island tend to behave in the same way when it comes to methylation status, it's a good way to get a snapshot with a fairly high degree of accuracy even though the actual coverage of CpG sites is relatively low. Hirst says the disadvantage of restriction libraries is that regions that lack the enzyme's recognition sites won't be represented.
In affinity capture methods, such as MeDIP, an antibody to methylated cytosine is used to immunoprecipitate the methylated portion of the genome, which is then sequenced. "It's pretty good for genome coverage, [but] it's lower resolution than some other methods," says Joe Costello at the University of California, San Francisco, adding that resolution is around 100 to 300 base pairs, rather than a single CpG site. Costello says that using a methylation-sensitive restriction enzyme and MeDIP on the same sample works well. "One of the advantages is that it's pretty comprehensive and it doesn't require as much sequencing, [in other words] lower cost," he notes.
Jin Billy Li, a postdoc in George Church's lab, says, "This method is often biased toward CpG islands, or the regions with more than one or two methylated cytosines." Hirst adds that the disadvantage of MeDIP is that repeat sequences tend to be overrepresented.
Bisulfite conversion is also widely used, and many think that this approach will ultimately be the gold standard. So far, whole-genome bisulfite sequencing has only been applied to plants (Joe Ecker at the Salk Institute performed a single base-pair resolution analysis of DNA methylation in Arabidopsis using bisulfite conversion followed by whole genome shotgun sequencing in 2008), because it's far too costly to do for a large mammalian genome. Treatment of DNA with bisulfite converts unmethylated cytosines to uracils, but leaves methylated cytosines alone. Subsequent PCR or sequencing can tell the difference between the bases. "The major advantage and the reason why bisulfite methods are the gold standard, whether you're looking at a single gene or genome-wide, is that every time you come across a CpG site, you get a yes or a no" as to whether it's methylated, says Costello.
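To make the yes-or-no readout concrete, here is a minimal Python sketch of the logic, not any lab's actual pipeline. It assumes a single strand, a read already aligned base-for-base to the reference, and error-free sequencing; the function names and the toy sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C is converted to U and reads as T after amplification;
    methylated C is protected and still reads as C.
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_cpg_methylation(reference, converted_read):
    """Give a yes/no call for the cytosine of every CpG in the reference.

    Assumes converted_read is aligned to reference position by position.
    True = methylated (still C), False = unmethylated (now T),
    None = any other base (uninformative in this toy model).
    """
    reference = reference.upper()
    converted_read = converted_read.upper()
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            base = converted_read[i]
            if base == "C":
                calls[i] = True
            elif base == "T":
                calls[i] = False
            else:
                calls[i] = None
    return calls


# Toy example: CpGs at positions 1, 5 and 9; only 1 and 9 are methylated.
reference = "ACGTTCGACCGA"
read = bisulfite_convert(reference, {1, 9})
print(read)                                   # ACGTTTGATCGA
print(call_cpg_methylation(reference, read))  # {1: True, 5: False, 9: True}
```

Real data adds complications this sketch ignores, including incomplete conversion, the reverse strand, and the reduced sequence complexity that makes converted reads harder to align.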
One disadvantage to bisulfite conversion followed by shotgun sequencing is the cost. Also, Issa says, "The disadvantage of bisulfite is primarily that the chemical treatment really degrades DNA down to a pretty low level, often down to 200 or 300 bases. This therefore limits what one can do." As an alternative, the Broad Institute's Alex Meissner led development of a method called reduced representation bisulfite sequencing, or RRBS, where one uses a restriction enzyme to reduce the size of the DNA sample to a small, but targeted, portion of the genome. That's then treated with bisulfite and sequenced. Using the MspI restriction enzyme and a chosen fragment size of 300 base pairs, "it will give [you] a lot of CpG islands which tend to be near promoters, but also a significant subset of fragments that are well outside of CpG islands," Costello says, "so it is biased to a certain part of the genome, but it certainly represents a lot more than just that part."
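The "reduced representation" step can be pictured as an in-silico digest followed by size selection. The Python sketch below is illustrative only: it assumes MspI's CCGG recognition site with the cut after the first C, treats the sequence as a single linear strand, and uses a made-up toy sequence and size window rather than the fragment size quoted above. The function names are hypothetical.

```python
import re


def mspi_fragments(seq):
    """Cut a linear sequence at every MspI site (C^CGG).

    The lookahead pattern also finds overlapping sites. For simplicity the
    two terminal pieces are kept, although each has only one MspI-cut end.
    """
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def cpg_count(fragment):
    """Number of CpG dinucleotides a fragment would report on."""
    return fragment.count("CG")


# Toy example with an arbitrary 5-10 base window.
toy = "AACCGGTTTTCCGGAACGTTCCGGA"
frags = mspi_fragments(toy)
print(frags)                      # ['AAC', 'CGGTTTTC', 'CGGAACGTTC', 'CGGA']
kept = size_select(frags, 5, 10)
print([(f, cpg_count(f)) for f in kept])
# [('CGGTTTTC', 1), ('CGGAACGTTC', 2)]
```

Because every internal fragment starts with the CG left by the cut, each sequenced piece covers at least one CpG, which is one way to see why the selected fraction is enriched for CpG-dense regions.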
As one of the four labs awarded Reference Epigenome Mapping Center grants as part of the NIH roadmap, Hirst's group at BCCA is still investigating what works best. "I think the jury is still out of which is the best and it may be that there's some combination of those methods that's going to be required to actually comprehensively profile the methylated genome," Hirst says. "It's probably likely that each will have [its] own bias, to some degree." Issa, for his part, thinks that bisulfite methods and, eventually, bisulfite sequencing will become the gold standard, possibly even in the next six months to a year.
The goal of the mapping centers is to catalog the normal methylation and histone mark profiles so that they can serve as references; for now, scientists are limited to studying histone marks with chromatin immunoprecipitation. While common diseases in general will eventually benefit, cancer is front and center. There is a well-known link between genome methylation and cancer: not only do CpG islands in the promoters of tumor-suppressor genes become hypermethylated, but there is also genome-wide hypomethylation as the tumor progresses. Hirst says, "Understanding the consequences and causes of global hypomethylation in tumor progression is of great interest" to his lab, and he's studying methylation patterns in stem cells as a model system for tumorigenesis.
While the NIH roadmap project is finding normal patterns of methylation and histone modifications, Stephan Beck's lab at UCL is one of many busily profiling the cancer methylome as part of the International Cancer Genome Consortium, which aims to obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types. "What the majority of people now believe is that there are more epigenetic changes in a cancer genome than genetic changes," Beck says. "The difficulty is to tease out the driver mutations from the passenger mutations, basically the mutations that cause cancer rather than those that are a consequence of the cancer."
To accomplish this, he performs global methylation analysis on cancer tissue to find out exactly where in the genome methylation changes occur. "Is it random?" he asks. "Is there anything we can see which helps us understand how the mechanism and how the timing of these changes is coming about?"
While there's less data available, many people think methylation might be relevant for a whole host of things — for example, common diseases, stem-ness and differentiation, brain function, and more. "Really, name the disease and people are interested in whether there could be an epigenetic component to it, and whether it could be detected by methylation," says Issa at MD Anderson.
For common diseases, Beck says GWAS are not enough to explain what causes a certain phenotype. To that end, he's begun incorporating methylation analysis into GWAS, looking for methylation changes that he then compares across cases and controls. While GWAS for genetic changes have a good four- to five-year head start on epigenetics, integrating the two lets him look for what he calls "hepitypes" (haplotype-epitype) in common diseases. "These, we believe, have higher chance of being causal than consequential," Beck adds.
Right now, Issa says, most clinical application of all this epigenetic typing has been at the level of single genes or small panels of genes, where people are looking at methylation as an indicator of the presence of cancer in blood, or for its relationship to disease prognosis or response to therapy. "But whether whole-genome analysis in every single case of cancer, for example, is going to help as opposed to just studying a few genes remains to be seen," he says.
On the horizon
While bisulfite sequencing may be the gold standard, truly affordable, next-gen whole-genome shotgun approaches aren't here yet. RRBS is one method to capture a targeted portion of the methylome, but there are others that are equally promising. Two complementary papers published recently in Nature Biotechnology used padlock probes to capture a subset of the genome. In the first, a team led by Kun Zhang and Virginia Commonwealth University's Yuan Gao designed about 30,000 probes that allowed them to look at methylation across CpG sites on three chromosomes. Gao's goal was to "specifically target a certain region of the genome in a single tube without doing many, many PCRs," and in this paper, they showed that the approach could be used with bisulfite sequencing. George Church's lab did similar work, using both padlock probes and a technique called methyl-sensitive cut counting, which cuts the DNA into fragments and ligates them to adaptors to create a library of fragments of relatively uniform size. Co-author Jin Billy Li adds, "One of the main features is the high specificity" to the portion of the genome that was actually targeted, and he sees capturing methods becoming even more targeted in the future.
Nanopore sequencing is another "next-next gen" method for doing global profiling. The change in the current through the nanopore as a single DNA molecule passes through permits a direct reading of the DNA, and this technique would be able to tell methylated from unmethylated cytosine residues. "The beauty of this system is that it will be able to analyze methylated DNA as you isolate it from the cell," Beck says. "That means without bisulfite conversion, without enrichment, without labeling" for unbiased profiling at every single CpG site.
"If that ever works, that's the future of this analysis," Issa says.