A team of researchers led by the University of California, Los Angeles, has generated the first genome-wide map of DNA methylation with single-base resolution.
The study, which they conducted in Arabidopsis thaliana, paves the way for large-scale epigenomics studies in humans and other mammals.
To assess methylation in the organism, the scientists combined bisulfite treatment of genomic DNA with sequencing on Illumina’s Genome Analyzer, and analyzed the data with internally developed computational methods. The approach, called BS-Seq, could be scaled up to study methylation in mammalian genomes, they say.
Bisulfite sequencing has been the gold standard for determining cytosine methylation in DNA, according to Matteo Pellegrini, an assistant professor in the department of molecular, cell, and developmental biology at UCLA, and a main author of the study, which was published last week in Nature.
Bisulfite treatment converts unmethylated cytosines to uracil, leaving methylated C’s intact. Unlike chromatin immunoprecipitation-based approaches, bisulfite sequencing provides single-nucleotide resolution. However, because of the low throughput and high cost of Sanger sequencing, researchers have so far used it only to characterize small regions of the genome.
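The conversion chemistry can be illustrated with a toy example. The sketch below is not the UCLA group's software; the function names, the example sequence, and the choice of methylated positions are all hypothetical. It assumes a single forward-strand read that is already aligned to the reference without gaps, and it treats uracil as T, which is how it appears after amplification and sequencing.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment of one strand.

    Unmethylated C is read out as T; methylated C stays C.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, read):
    """Compare an aligned, same-length read with the reference.

    Returns {position: True/False} for each reference C that was
    read as C (methylated) or as T (unmethylated).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
        # any other base is treated as uninformative and skipped
    return calls


# Hypothetical 10-base reference with methylated cytosines at 1 and 5.
reference = "ACGTCCGATC"
converted = bisulfite_convert(reference, methylated_positions={1, 5})
calls = call_methylation(reference, converted)
print(converted)
print(calls)
```

In real data, many reads cover each cytosine, so the methylation level at a site would be estimated from the fraction of reads that retain a C rather than from a single read.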
The throughput of the Genome Analyzer “is sufficient that you can now start applying [it] to a compact genome like Arabidopsis, and there is nothing, really, to stop you from scaling it up to a human genome,” according to Pellegrini.
In their Arabidopsis study, the scientists generated approximately 3.8 gigabases of high-quality sequence data from bisulfite-converted DNA libraries that were prepared by Steve Jacobsen’s group at UCLA, which developed the necessary library preparation methods. In their analysis, they used approximately 2.6 gigabases of the data, which mapped to unique locations in the genome with high confidence.
According to Pellegrini, the data covered about 90 percent of all cytosines in the 120-megabase Arabidopsis genome at about 20-fold coverage for the diploid genome.
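As a back-of-the-envelope check, the figures quoted above are roughly consistent with one another. The short calculation below uses only the numbers reported in this article and assumes that mean coverage is simply mapped bases divided by genome size, ignoring uneven coverage and strand effects.

```python
total_bases = 3.8e9    # high-quality sequence generated
mapped_bases = 2.6e9   # sequence mapped uniquely with high confidence
genome_size = 120e6    # Arabidopsis genome size in bases

fraction_used = mapped_bases / total_bases
mean_coverage = mapped_bases / genome_size

print(f"fraction of data used: {fraction_used:.0%}")   # 68%
print(f"mean coverage: {mean_coverage:.1f}-fold")      # 21.7-fold
```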
Pellegrini and his colleagues developed new data analysis methods for base calling and mapping of the sequence reads, which they have made available online. The base-calling software improves on Illumina’s own base caller, which “doesn’t do a very good job of estimating C’s” in bisulfite-converted DNA, where the majority of C’s are transformed to T’s, according to Pellegrini.
For mapping reads back to the genome, his group developed a method that takes into account the sequencing errors of the Illumina platform, which increase towards the end of the read. By assigning probabilities to each of the four nucleotides at each position, they can more accurately assign reads to their correct position in the genome, he explained.
“We think it’s important to account for the uncertainty in the base calls when mapping back onto the genome,” at least in bisulfite sequencing, he said.
While the base-calling software is specific to Illumina’s platform, the mapping approach can be used for data from any sequencer, as long as each base has a probability for each of the four nucleotides attached to it, according to Pellegrini.
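The general idea of mapping with per-base probabilities can be sketched in a few lines. This is an illustration only, not the group's published algorithm: the function names, the toy genome, and the scoring rule are assumptions. The sketch scores every candidate position by the log-likelihood of the read, lets a reference C match either a C or a T in the read to allow for bisulfite conversion, and considers the forward strand only.

```python
import math


def match_probability(ref_base, probs):
    """Probability that a read position is consistent with ref_base.

    A reference C may appear as C (methylated) or T (converted),
    so both probabilities count as a match.
    """
    if ref_base == "C":
        return probs["C"] + probs["T"]
    return probs[ref_base]


def log_likelihood(ref_window, read_probs, floor=1e-6):
    """Sum of log match probabilities across the read."""
    return sum(
        math.log(max(match_probability(base, probs), floor))
        for base, probs in zip(ref_window, read_probs)
    )


def best_position(genome, read_probs):
    """Return the start position with the highest log-likelihood."""
    n = len(read_probs)
    starts = range(len(genome) - n + 1)
    return max(starts, key=lambda i: log_likelihood(genome[i:i + n], read_probs))


def confident(base, p=0.97):
    """Hypothetical helper: a high-confidence call for one base."""
    rest = (1.0 - p) / 3
    return {b: (p if b == base else rest) for b in "ACGT"}


genome = "GATTACAGCCGTA"
# Read derived from genome[7:11] = "GCCG" after conversion ("GCTG"),
# with an uncertain final base, as is typical near the end of a read.
read_probs = [
    confident("G"),
    confident("C"),
    confident("T"),
    {"A": 0.3, "C": 0.05, "G": 0.6, "T": 0.05},
]
print(best_position(genome, read_probs))  # 7
```

Because the uncertain last base contributes a weighted score rather than a hard match or mismatch, a low-quality call near the end of a read does not by itself rule out the correct location.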
The study not only represents a new standard for genomic methylation in Arabidopsis, he said, but “there should not be any major technical hurdles to scale this up to the human, [or to another] mammalian genome.”
In their article, the researchers also used the BS-Seq method to analyze 60 megabases of sequence data from wild-type and mutant mouse embryonic stem cells, and map 46 megabases of sequence data from mouse germ-cell tissue to the reference genome.
About two-thirds of the reads mapped uniquely, “suggesting that it is practical to apply BS-Seq to entire mammalian genomes,” they report in the article. Pellegrini told In Sequence that his group is currently not applying bisulfite shotgun sequencing to larger genomes.
According to Stephan Beck, a professor of medical genomics at the UCL Cancer Institute at University College London, the study “will certainly stimulate efforts” to attempt the bisulfite sequencing approach on mammalian genomes, which are about 20 times larger and more complex than Arabidopsis.
“In addition to cost, the main challenge [of using BS-Seq on mammalian genomes] will be to overcome the ‘reduced complexity’ problem associated with reads, particularly short reads, derived from bisulfite-converted DNA,” Beck said by e-mail.
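The reduced-complexity problem Beck describes can be seen in a small simulation. The sketch below is an illustrative assumption, not an analysis from the study: it builds a random sequence, converts every C to T as if the strand were completely unmethylated, and counts how many distinct 8-base words remain. Fewer distinct words means short reads are more likely to match several places in the genome.

```python
import random


def fully_convert(seq):
    """Worst case for mapping: every C is unmethylated and reads as T."""
    return seq.replace("C", "T")


def distinct_kmers(seq, k):
    """Number of distinct length-k substrings in seq."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


random.seed(0)  # fixed seed so the run is repeatable
genome = "".join(random.choice("ACGT") for _ in range(100_000))

k = 8
before = distinct_kmers(genome, k)
after = distinct_kmers(fully_convert(genome), k)
print(f"distinct {k}-mers before conversion: {before}")
print(f"distinct {k}-mers after conversion:  {after}")

# Two different original words can become identical after conversion.
print(fully_convert("ACGT") == fully_convert("ATGT"))  # True
```

With a three-letter alphabet, at most 3^8 = 6,561 distinct 8-mers are possible, compared with 4^8 = 65,536 for the unconverted sequence, which is why longer reads help restore unique mapping.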
He and his colleagues have recently sequenced immunoprecipitated methylated DNA from male human germline cells using Illumina’s sequencer, according to a recent review article by Beck.
If researchers were able to scale up the new bisulfite sequencing method to the human genome, it could be used in large-scale studies such as the NIH Roadmap Epigenomics project or the Encyclopedia of DNA Elements project.
Alex Meissner, an assistant professor in the department of stem cell and regenerative biology at Harvard University, and an associate member of the Broad Institute, is involved in an ENCODE-funded project to study DNA methylation (see In Sequence 11/27/2007).
His group has been working on a method for analyzing bisulfite-treated DNA at reduced representation with Illumina’s Genome Analyzer, which it has used to study mouse embryonic stem cells (see In Sequence 11/20/2007). The study “will be published soon,” he told In Sequence by e-mail last week.
Last year, he told In Sequence that bisulfite-sequencing an entire mouse genome at full representation was still too complex and expensive, but Pellegrini’s method “points in the right direction,” he said.
“Both [methods] lead the way towards, ultimately, sequencing the whole methylome” of mammalian genomes, he told In Sequence by e-mail this week. “The technology is there and it will be possible very soon.”