NEW YORK (GenomeWeb) — An Austrian research team has developed an approach to uncover and analyze DNA methylation in organisms lacking a reference genome.
High-throughput approaches for analyzing DNA methylation — a key part of animal development whose defects have been implicated in cancer — largely rely on a reference genome, something many wild populations lack.
To get around that requirement, the researchers combined reduced representation bisulfite sequencing with a software program they developed called RefFreeDMA. The program can deduce genomes from RRBS reads as well as uncover regions that are differentially methylated between samples or groups of individuals, as the team reported this week in Cell Reports.
"With RefFreeDMA our experiments are no longer limited to model organisms," first author Johanna Klughammer from CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences said in a statement. "We can now study DNA methylation 'in the wild', rather than trying to fit our research questions to the model organisms that have a high-quality reference genome."
The workflow Klughammer and her colleagues developed relies on the non-random distribution of DNA methylation in vertebrate genomes, where most methylation is found at CpG sites. In their approach, DNA is digested with the restriction enzymes MspI and/or TaqI, which cut at C^CGG and T^CGA, respectively, and are insensitive to methylation at the central CpG site. The digested fragments are then used to create an RRBS sequencing library.
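To make the digestion step concrete, the following is a minimal in silico sketch of cutting a sequence at the two recognition sites named above. The function name `digest` and the `CUT_SITES` table are illustrative assumptions, not part of RefFreeDMA or the published protocol, and the sketch ignores size selection and the reverse strand.

```python
import re

# Recognition sites as given in the article; the cut falls after the
# first base of each site (C^CGG for MspI, T^CGA for TaqI).
CUT_SITES = {"MspI": ("CCGG", 1), "TaqI": ("TCGA", 1)}


def digest(sequence, enzymes=("MspI", "TaqI")):
    """Return the fragments left after cutting `sequence` at every site."""
    sequence = sequence.upper()
    cuts = set()
    for name in enzymes:
        site, offset = CUT_SITES[name]
        # A zero-width lookahead finds overlapping occurrences as well.
        for match in re.finditer(f"(?={site})", sequence):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


fragments = digest("AACCGGTTTCGAAA")
```

Every internal fragment in this toy example begins with CG, which illustrates why RRBS reads have the defined start positions the pipeline relies on.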
The researchers noted that they validated and optimized the 96-well RRBS approach for genome coverage and sample throughput in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish), increasing the number of covered CpG sites from about 2.5 million to 4 million.
RefFreeDMA, a Linux-based software pipeline, exploits characteristics of RRBS reads, such as their defined start and end positions. By clustering RRBS reads from all samples of a given species by similarity, the program infers a consensus sequence for each read cluster. Where both cytosines and thymines appear at the same position among the clustered reads, the researchers noted, the cytosines are kept because they are likely to reflect methylated cytosines that are protected from bisulfite conversion in some, though not all, samples.
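The consensus rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not RefFreeDMA's actual code: the reads in a cluster are assumed to be the same length and already position-aligned, and the function name `deduce_consensus` is hypothetical.

```python
from collections import Counter


def deduce_consensus(reads):
    """Build a consensus from equal-length, position-aligned bisulfite reads.

    The most common base wins at each position, except that a C is kept
    whenever the majority base is T but at least one read shows C there.
    """
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    consensus = []
    for column in zip(*reads):
        counts = Counter(column)
        base = counts.most_common(1)[0][0]
        if base == "T" and "C" in counts:
            # A C among Ts suggests a cytosine that is methylated in
            # some samples, so the underlying genomic base is C.
            base = "C"
        consensus.append(base)
    return "".join(consensus)


cluster = ["CGGTTA", "TGGTTA", "TGGTTA"]
deduced = deduce_consensus(cluster)
```

In this toy cluster the first position is T in two reads and C in one, so the sketch keeps C there and leaves the other positions at their majority base.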
Once the deduced genome is generated, the researchers use BSMAP/RRBSMAP to align the reads to it and a custom DNA methylation calling script to gauge the proportion of methylated reads at each CpG site in the deduced genome. The differentially methylated reads are then ranked.
The top-ranked reads are then exported as FASTA/FASTQ files for biological interpretation by cross-mapping to related, already annotated genomes and by reference-free motif enrichment analysis.
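As a rough illustration of what methylation calling computes, the sketch below counts, for each CpG in a reference fragment, how many covering reads show an unconverted C versus a converted T. It is not the team's custom script: the `(start, sequence)` read representation and the function name `call_methylation` are assumptions, and only same-strand, gap-free alignments are handled.

```python
def call_methylation(reference, aligned_reads):
    """Map each CpG position to (methylated reads, informative reads, fraction).

    `aligned_reads` is a list of (start, sequence) pairs giving where each
    read begins on `reference`. A C at a CpG cytosine counts as methylated,
    a T as unmethylated; any other base is ignored.
    """
    calls = {}
    for pos in range(len(reference) - 1):
        if reference[pos:pos + 2] != "CG":
            continue
        methylated = total = 0
        for start, read in aligned_reads:
            index = pos - start
            if not 0 <= index < len(read):
                continue  # this read does not cover the CpG
            if read[index] == "C":
                methylated += 1
                total += 1
            elif read[index] == "T":
                total += 1
        if total:
            calls[pos] = (methylated, total, methylated / total)
    return calls


reads = [(0, "CGGTATGA"), (0, "TGGTACGA"), (0, "CGGTACGA"), (5, "CGA")]
levels = call_methylation("CGGTACGA", reads)
```

Here the CpG at position 0 is covered by three reads, two of them methylated, while the CpG at position 5 is covered by four reads, three of them methylated.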
To validate their pipeline, Klughammer and her colleagues applied it to 44 samples from three species — human, cow, and carp — and compared the reference-free analysis to a reference-based analysis.
For both approaches, the average DNA methylation levels at CpG sites were largely similar. The researchers noted that their approach exhibited lower C-to-T conversion rates at non-CpG sites, likely because unmethylated Cs are counted as Ts. This, they added, could be avoided by adding methylated and unmethylated spike-in controls to the RRBS protocol to monitor bisulfite conversion rates.
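One way such a spike-in check could work is sketched below: for an unmethylated control of known sequence, the conversion rate is the share of its cytosines that are read as T. The function name `conversion_rate` and the input format are assumptions for illustration, and the reads are assumed to span the full control sequence on the same strand.

```python
def conversion_rate(spike_in, reads):
    """Fraction of spike-in cytosines read as T across full-length reads.

    Returns None when no read carries an informative base (C or T) at any
    cytosine position of the control.
    """
    converted = unconverted = 0
    for read in reads:
        for ref_base, read_base in zip(spike_in, read):
            if ref_base != "C":
                continue
            if read_base == "T":
                converted += 1
            elif read_base == "C":
                unconverted += 1
    total = converted + unconverted
    return converted / total if total else None


rate = conversion_rate("ACCA", ["ATTA", "ATCA"])
```

For an unmethylated control, a rate close to 1 would indicate near-complete conversion. The toy input above yields 0.75 because one of the four cytosine observations remained a C.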
The deduced genomes, meanwhile, mainly reflected the reference genomes. For instance, 1,254,324 out of 1,522,786 deduced human genome fragments (about 82 percent) could be aligned to the human genome, the researchers reported. In addition, more than 75 percent of the reads and CpGs in non-repetitive regions were concordantly mapped by the two approaches.
"[W]e observed excellent agreement between the two approaches when plotting alignment positions across a representative chromosome, and the DNA methylation values obtained with the two approaches were highly correlated in all samples and all species," Klughammer and her colleagues wrote in their paper.
The researchers also reported that the approach could uncover known biological similarities and differences among blood cell types in human, cow, and carp blood samples. The most differentially methylated fragments in human and cow samples were mostly hypermethylated in lymphocytes, they said, though that difference wasn't present in carp.
The researchers noted that the approach could be applied not only to wild populations but also to agricultural and aquacultural samples.
"We see a lot of excitement for studying epigenetics in the context of animal breeding," senior author and CeMM researcher Christoph Bock said in a statement. He added that applying metaepigenome analysis to ecosystems like coral reefs and rain forests could also prove interesting.