UK researchers have developed a sequencing-based approach for quantifying levels of two related epigenetic modifications: 5-methylcytosine and 5-hydroxymethylcytosine, a demethylation intermediate suspected of having its own epigenetic functions.
"This dynamic between methylation and hydroxymethylation, I think, is actually key to function," University of Cambridge chemistry researcher Shankar Balasubramanian told In Sequence. "So knowing the levels of both and how the ratio between the two changes at various positions in different cell states is quite important for understanding function."
Balasubramanian was co-senior author on a study published online last week in Science that introduced the new approach, called oxidative bisulfite sequencing, or oxBS-seq.
In contrast to conventional bisulfite sequencing, which involves converting unmodified cytosine bases to uracil while leaving methylated or hydroxymethylated forms of the base unchanged, oxBS-seq involves a two-step conversion that leaves methylated cytosine alone but turns hydroxymethylated and unmodified versions of cytosine into uracil.
By looking directly at 5mC patterns with oxBS-seq and using standard bisulfite sequencing to see both 5mC and 5hmC modifications in the same sample, Balasubramanian and co-authors explained, it's possible to infer 5hmC levels.
"While [bisulfite sequencing] leads to both 5mC and 5hmC being detected as [cytosines], this 'oxidative bisulfite' sequencing approach would yield [cytosines] only at 5mC sites and therefore allow us to determine the amount of 5hmC at a particular nucleotide position by subtraction of this readout from a [bisulfite sequencing] one," they wrote.
In mouse embryonic stem cells, for example, researchers used the approach to determine 5mC and 5hmC levels at thousands of cytosine and guanine nucleotide-rich sites known as CpG islands, showing that the 5hmC modification was enhanced at CpG sites near transcriptional regulators and LINE1 elements.
The presence of both methylation and hydroxymethylation marks near the transcriptional regulators may reflect their role in cellular differentiation, co-senior author Wolf Reik, an epigenetics researcher affiliated with the Babraham Institute and the University of Cambridge's Centre for Trophoblast Research, told IS.
"Later on in development when [the cells] stop being pluripotent and set off in a certain direction, you want to be able to switch these transcriptional regulators quickly — in one lineage, you want them on and in another lineage you don't want them on."
The identification of 5hmC near LINE1 elements is a bit trickier to interpret, Reik added, though there is evidence that these retrotransposons are reprogrammed in germ cells and early embryos, suggesting hydroxymethylation might contribute to a range of reprogramming or differentiation-related events.
When hydroxymethylated cytosine was first identified in mammalian cells a few years back, it appeared to be an intermediate formed as methylated cytosine — an epigenetic mark most often associated with gene silencing — gets converted to an unmodified form of the base. Nevertheless, 5hmC enrichment in certain cell types and developmental stages hinted that it might serve as an epigenetic mark in its own right.
With a better appreciation of the new base came the realization that previous bisulfite sequencing studies had likely been identifying not only methylated cytosines, but also hydroxymethylated forms of the base, since both would be protected from bisulfite conversion.
"Bisulfite sequencing of methylation actually hides the fact that any hydroxymethylation looks the same as methylation," Balasubramanian explained.
"All the data generated previously is likely a mixture of the two," agreed University of Chicago researcher Chuan He. "In most cases it's probably OK," He added, since 5mC is generally the dominant modification.
Even so, the presence of both methylated and hydroxymethylated cytosines in mammalian genomes raised questions about the sorts of sequencing-based approaches that could be used to distinguish between the 5mC and 5hmC marks and to get a clearer understanding of the functional roles of each.
He, who was not involved in the Science study, called the UK team's approach to solving this problem "very elegant" and a "nice technology breakthrough."
He was part of a University of Chicago and Pacific Biosciences team that reported on its own single-molecule method for sequencing 5hmC in Nature Methods late last year (IS 12/6/2011).
That group used an enzyme called beta glucosyltransferase to specifically label 5hmC bases with glucose before sequencing DNA on the PacBio RS platform. By further tagging glucose-labeled 5hmC with a biotin label, they could stretch out the pause in the PacBio system when the polymerase enzyme encounters a modified base, making it possible to distinguish 5hmC from bases with other modifications.
Active Motif has licensed that glucose labeling method, which it uses in its Hydroxymethyl Collector kit.
Prior to that, several research groups — including Reik and his collaborators from the Babraham Institute, University of Cambridge, and elsewhere — used antibodies targeting 5mC and 5hmC to pull out fragments of DNA enriched for the marks, which could then be sequenced with high-throughput approaches and mapped back to the genome.
Such enrichment studies "helped create a map of regions of the genome that had hydroxymethyl[cytosine]," Balasubramanian said. "It was a very good start to looking at the problem."
Even so, he added, enrichment methods do not allow for single-base resolution or quantification of 5mC and 5hmC levels.
Similarly, while Balasubramanian called the University of Chicago and PacBio method for doing single-molecule sequencing of 5hmC a "very nice technique," he noted that it is difficult to get any quantitative information about 5hmC with that approach.
Chemistry-Based Approach
To take their own crack at a more quantitative methylation sequencing strategy, Balasubramanian, Reik, and their colleagues started by looking for new chemistry-based approaches to distinguish hydroxymethylcytosine from methylated cytosine as well as the four canonical DNA bases.
The team selected potassium perruthenate from a suite of oxidizing agents previously used in synthetic organic chemistry, based on its ability to oxidize 5hmC to 5-formylcytosine (5fC) without altering the other nucleotides or DNA structure.
Because 5fC can be further converted to uracil by bisulfite treatment, researchers explained, a combination of oxidation and bisulfite treatment converts all but the methylated form of cytosine to uracil.
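The conversion rules described here can be written down as a small lookup table. The Python sketch below is illustrative only: the table and function names are hypothetical, the site composition is invented, and it relies on the standard fact that uracil is read as T after amplification while protected cytosines are read as C.

```python
# How each cytosine state is read after treatment, per the rules in the text.
# "BS" = conventional bisulfite; "oxBS" = oxidation followed by bisulfite.
READOUT = {
    "C":    {"BS": "T", "oxBS": "T"},  # unmodified C is converted to uracil in both
    "5mC":  {"BS": "C", "oxBS": "C"},  # 5mC is protected in both
    "5hmC": {"BS": "C", "oxBS": "T"},  # oxidized to 5fC, which bisulfite converts
}


def expected_c_fraction(composition, treatment):
    """Expected fraction of reads showing C for a mix of states at one site."""
    return sum(
        fraction
        for state, fraction in composition.items()
        if READOUT[state][treatment] == "C"
    )


# Invented example: a site that is 50% unmodified, 45% 5mC, and 5% 5hmC.
site = {"C": 0.50, "5mC": 0.45, "5hmC": 0.05}
bs = expected_c_fraction(site, "BS")      # counts 5mC and 5hmC together
oxbs = expected_c_fraction(site, "oxBS")  # counts 5mC alone
print(bs, oxbs, bs - oxbs)
```

The only row where the two treatments disagree is 5hmC, which is why comparing the two readouts isolates that mark.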
The team first ran a series of experiments on potassium perruthenate-treated synthetic single-stranded and double-stranded DNA, assessed by Sanger sequencing, to find the optimal conditions for specifically oxidizing 5hmC. They then went on to demonstrate the feasibility of quantifying 5mC and 5hmC by pairing conventional bisulfite sequencing with oxBS-seq on a high-throughput sequencing platform.
For those experiments, researchers split each pool of synthetic or genomic DNA in half and did parallel library preps with or without the oxidation step. Using the Illumina GAII, they then sequenced each library and compared the patterns detected in the bisulfite-treated samples with those in the oxidation and bisulfite-treated samples.
The investigators then turned their attention to mouse embryonic stem cells, where they used reduced representation bisulfite sequencing (IS 7/23/2011) to specifically apply their oxBS-seq method to CpG islands.
To verify the quantitative accuracy of some of their findings, the investigators also used the Sequenom MassArray platform to quantify methylation and hydroxymethylation levels in samples prepared using the oxidative bisulfite approach.
From the roughly 120 reads generated by standard RRBS and oxRRBS for each CpG island, the researchers detected around 3,300 methylation marks per CpG island, on average.
Within the 12,660 CpG islands left after their quality-control steps, the researchers saw hydroxymethylation at 800 CpGs. At these sites, the level of 5hmC at a given cytosine base ranged from 0.2 percent to 18.5 percent, and averaged around 3.3 percent.
Another 4,577 CpG islands harbored 5mC. There, cytosines showed just over 8 percent methylation, on average.
Many of the 5hmC marks fell within low-density CpG islands that had intermediate 5mC levels and relatively high levels of the TET1 enzyme that converts 5mC to 5hmC, Reik noted, suggesting these regions may be poised for activity in some differentiated cell lineages and bound for silencing in others.
"Those are the regions that are particularly interesting because they clearly get targeted by methyl and then turned over to hydroxymethyl," he said, "so they're kind of meta-stable."
"They're being constantly reprogrammed, we think, in the pluripotent cell," Reik added. "They can go into any direction — they can go into methylation or they can go into demethylation. But this is only possible to see properly with a high-resolution, quantitative method."
When they looked more closely at the CpG islands affected by each epigenetic mark, the investigators found an over-representation of 5hmC at sites near known transcriptional regulatory genes and LINE1 retrotransposons.
The team is now moving on to use the oxBS-seq method to look at methylation and hydroxymethylation levels across the entire mouse embryonic stem cell genome.
Reik said that, down the road, there is also interest in assessing and comparing methylation and hydroxymethylation patterns during differentiation and development, though those experiments will depend on whether it's possible to adapt the method for looking at the small amounts of DNA found in samples containing only a few hundred or a few thousand cells.
Balasubramanian noted that the team is exploring such questions and trying to determine how little DNA they can get away with using while still characterizing 5mC and 5hmC levels accurately.
"We're pushing against these boundaries to try to understand where the limitations may be and where we need to make improvements," he noted. "But right now I don't see any fundamental limitation to these sorts of challenges."
Another open question is the depth of sequence needed to interrogate all of the functionally relevant methylation and hydroxymethylation marks across the genome.
Reik noted that it will likely be necessary to go beyond the 15- to 20-fold genome coverage used in some standard bisulfite sequencing studies to get a good look at hydroxymethylcytosine, which is typically found at lower levels than methylcytosine.
Even so, Balasubramanian explained that it is somewhat difficult to come up with hard and fast sequencing depth guidelines for oxBS-seq studies, since it's still not clear whether both high and low levels of hydroxymethylation will be functionally important.
In general, he added, researchers should be able to find increasingly low levels of these marks at greater sequencing depths in much the same way that deeper sequencing uncovers rarer and rarer sequence variants.
"There's a trade-off with depth: the deeper you sequence, the easier it is to confidently call relatively low levels of hmC," Balazubramanian said, noting that he and his colleagues included information on the relationship between sequencing depth and 5hmC detection limits in the supplementary material accompanying their Science study.
Although the researchers used the Illumina platform for their mouse CpG island experiments, they emphasized that the oxBS-seq method is compatible with other platforms as well.
"I think one important thing about doing this at the level of chemistry is that it's actually the chemistry that discriminates these modified bases," said Balasubramanian, who is an advisor for Illumina and co-founded the Solexa technology on which Illumina's sequencers are based.
"In our paper, we've used Sanger sequencing, Sequenom [MassArray], and the Solexa/Illumina sequencing," Balasubramanian said. "So, in principle, it should be platform agnostic."
Likewise, the University of Chicago's He said that his team has now developed a method for directly interrogating hydroxymethylation sites across the genome that is also compatible with Illumina and other high-throughput platforms as well as traditional Sanger sequencing.
That method, which has not yet been published, reportedly combines bisulfite sequencing with a glucose tagging method that He and his University of Chicago team described in Nature Biotechnology prior to their collaboration with PacBio.
"The new method uses glucose tagging and also gives accurate base resolution sequencing of hydroxylmethyl[cytosine] and also the relative abundance," He said.
A publication outlining the use of this method for interrogating hydroxymethylation marks across the mouse embryonic stem cell genome is slated to come out later this spring, according to He, who said the University of Chicago is currently in talks with a company that has expressed interest in licensing its newest hydroxymethylation sequencing-related method.
While the UK team does not have immediate plans to license its own oxBS-seq method, Balasubramanian noted that his group has had "a good track record with seeing things we discover get put into a broader context."
"We're keen to help others in the community use the tools that we've created," he added.