NEW YORK (GenomeWeb) – Researchers from Aarhus University Hospital and their collaborators at other institutions have developed a pipeline for processing and analyzing data from reduced representation and whole-genome bisulphite sequencing procedures.
According to a paper describing the pipeline, published in GigaScience last week, SMAP provides tools for analyzing single- and paired-end bisulphite sequencing data with lower false-positive rates in calling differentially methylated regions than existing methods, as well as for detecting allele-specific methylation (ASM) events and single nucleotide polymorphisms. The pipeline also supports multiple user-defined restriction enzymes, and, once properly configured, it runs all methylation analyses in a single step, according to the paper.
The pipeline covers reference and read preparation, sequence alignment, methylation rate calculations, detection of differentially methylated regions (DMRs), as well as SNP and allele-specific methylation calling and summarization. Input sequences to the pipeline are filtered and cleaned and then mapped to a reference genome using Bowtie2, BSMAP, or Bismark — users have their pick — and the resulting BAM file serves as the input to the next step in the pipeline. In this step, SNPs and ASMs are called using Bis-SNP or Bcftools, while DMRs and differentially methylated cytosine sites (DMCs) are detected by applying various statistical tests. A final report produced at the end of the pipeline provides mapping and coverage information for the analysis.
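The article does not say which statistical tests SMAP applies to find DMCs. Purely as an illustration, the Python sketch below computes a per-site methylation rate and compares two samples at a single cytosine with a two-sided Fisher's exact test, one common choice for this kind of comparison. The function names are hypothetical and are not part of SMAP.

```python
from math import comb


def methylation_rate(methylated: int, unmethylated: int) -> float:
    """Fraction of reads calling a cytosine methylated at one site."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no read coverage")
    return methylated / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are samples; columns are methylated / unmethylated read counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that sample 1 has x methylated reads,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical site: sample 1 has 18 of 20 reads methylated, sample 2 has 4 of 20.
print(methylation_rate(18, 2), methylation_rate(4, 16))
print(fisher_exact_two_sided(18, 2, 4, 16))
```

A real DMC caller would also correct for testing many sites at once, and DMR calling would aggregate neighboring sites; both steps are omitted here.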
According to the paper, SMAP improves on existing methods for analyzing reduced representation bisulphite sequencing (RRBS) data because it accounts for overlap between the two reads of a pair. RRBS libraries are built from short restriction fragments, so paired-end reads often overlap, and counting the same cytosine twice from a single DNA molecule can bias DMR detection and produce false positives.
"To correct such bias, we count sites in overlapping regions for PE sequencing only once," the researchers wrote. "This treatment fully recovers correct methylation rates and hence greatly reduces errors in subsequent calculation of DMRs."
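To see why double counting matters, consider a cytosine covered by two fragments: one whose mates both read it as methylated, and one whose single mate reads it as unmethylated. The true rate is 1 of 2 fragments, but a naive tally reports 2 of 3. The Python sketch below is a simplified illustration, not SMAP's code: it represents each mate as a hypothetical list of (position, is_methylated) calls instead of parsing BAM records, and count_site_calls is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One methylation call: (reference position of the cytosine, True if methylated).
Call = Tuple[int, bool]


def count_site_calls(
    pairs: Iterable[Tuple[List[Call], List[Call]]],
    dedupe_overlap: bool = True,
) -> Dict[int, List[int]]:
    """Tally {position: [methylated, unmethylated]} over read pairs."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for mate1, mate2 in pairs:
        if dedupe_overlap:
            # Count each site once per fragment. Where the mates overlap,
            # mate 1's call is kept (an assumption; the article only says
            # overlapping sites are counted once).
            merged = dict(mate1)
            for pos, meth in mate2:
                merged.setdefault(pos, meth)
            calls = list(merged.items())
        else:
            # Naive tally: a site covered by both mates is counted twice.
            calls = list(mate1) + list(mate2)
        for pos, meth in calls:
            counts[pos][0 if meth else 1] += 1
    return dict(counts)


pairs = [
    ([(100, True), (120, True)], [(100, True)]),  # fragment A: both mates cover 100
    ([(100, False)], [(180, False)]),             # fragment B: one mate covers 100
]
print(count_site_calls(pairs)[100])                        # [1, 1] -> rate 0.5
print(count_site_calls(pairs, dedupe_overlap=False)[100])  # [2, 1] -> rate ~0.67
```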
In one experiment to test SMAP's DMR detection, the researchers used the tool to identify 12 randomly selected DMRs that had previously been detected by bisulphite sequencing PCR.
SMAP also fills a need for a pipeline that can detect allele-specific methylation in complex cancer cases, according to the paper. In comparison tests on simulated datasets, the BSMAP and Bismark pipelines performed similarly at detecting ASMs, while the Bowtie2 pipeline had a high false-negative rate on datasets with 50-base-pair reads. For datasets with 90-base-pair reads, the Bismark and Bowtie2 pipelines made more accurate ASM calls, while the BSMAP pipeline was more sensitive, the researchers wrote.
As part of the study, the researchers also compared the alignment tools used to map sequence data to a reference genome. Specifically, they used simulated paired-end reads to compare BSMAP with Bismark. Bismark had a lower mapping rate and a higher false-negative rate than BSMAP. It was more accurate than BSMAP, especially for reads 50 and 60 base pairs long, but less accurate for longer reads of 80 to 90 base pairs.
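Because simulated reads come with known origins, an aligner comparison like this one can be scored directly. The sketch below assumes common definitions (mapping rate as mapped over total reads, accuracy as correctly placed over mapped reads, false-negative rate as the share of all reads not placed at their true origin); the paper's exact definitions may differ, and SimRead and alignment_metrics are hypothetical names. It shows how one aligner can map fewer reads yet be more accurate on the reads it does map.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class SimRead:
    true_pos: int              # position the simulator drew the read from
    mapped_pos: Optional[int]  # aligner's reported position; None if unmapped


def alignment_metrics(reads: Sequence[SimRead], tolerance: int = 0) -> Dict[str, float]:
    """Score an aligner on simulated reads with known origins.

    Simplified: ignores chromosome and strand, comparing positions only.
    """
    if not reads:
        raise ValueError("no reads to score")
    total = len(reads)
    mapped = [r for r in reads if r.mapped_pos is not None]
    correct = [r for r in mapped if abs(r.mapped_pos - r.true_pos) <= tolerance]
    return {
        "mapping_rate": len(mapped) / total,
        "accuracy": len(correct) / len(mapped) if mapped else 0.0,
        # Reads that were never placed at their true origin.
        "false_negative_rate": 1 - len(correct) / total,
    }


reads = [SimRead(100, 100), SimRead(200, 205), SimRead(300, None), SimRead(400, 400)]
print(alignment_metrics(reads))
```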
The researchers also assessed the performance of different SNP detection tools within the SMAP pipeline using exome sequencing and RRBS data from four tissue samples. Running the pipeline with BSMAP or Bowtie2 for alignment before SNP calling yielded a higher SNP call rate than running it with Bismark. However, the Bismark pipeline had a much lower false-positive rate in all the tissues tested.
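One way to score SNP calls from bisulphite data is to treat SNPs found by exome sequencing of the same tissue as a reference set. The sketch below assumes that setup and uses simple set arithmetic; the metric names, the snp_call_metrics function, and the toy data are hypothetical, and the paper may define its call and false-positive rates differently. It also assumes both sets have already been restricted to sites covered by both assays, since exome and RRBS data target different parts of the genome.

```python
from typing import Dict, Set, Tuple

# A SNP call: (chromosome, position, alternate allele).
Snp = Tuple[str, int, str]


def snp_call_metrics(called: Set[Snp], exome_truth: Set[Snp]) -> Dict[str, float]:
    """Score SNP calls against an exome-derived reference set.

    Assumes both sets are restricted to sites covered by both assays.
    """
    confirmed = called & exome_truth
    return {
        "calls": len(called),
        # Share of exome SNPs that the pipeline recovered.
        "recall": len(confirmed) / len(exome_truth) if exome_truth else 0.0,
        # Share of the pipeline's calls the exome data does not support.
        "false_positive_fraction": 1 - len(confirmed) / len(called) if called else 0.0,
    }


exome = {("chr1", 100, "T"), ("chr1", 200, "A"), ("chr2", 50, "G"), ("chr3", 10, "C")}
calls = {("chr1", 100, "T"), ("chr1", 200, "A"), ("chr4", 75, "G")}
print(snp_call_metrics(calls, exome))
```

A pipeline that calls more SNPs can raise recall while also raising the false-positive fraction, which is the trade-off the researchers report between the BSMAP or Bowtie2 pipelines and the Bismark pipeline.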
All three pipelines identified similar SNPs, although some SNPs called by the Bowtie2 pipeline were shared only with the BSMAP pipeline, the researchers wrote.
They also examined SNP calling performance on the simulated paired-end datasets described above. In this experiment, the BSMAP and Bismark pipelines showed "considerable overlap" in the SNPs they called, while the Bowtie2 pipeline shared fewer SNPs with the other two.
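Overlap comparisons like these reduce to set intersections and differences over SNP calls. The Python sketch below uses made-up calls, not data from the study, and overlap_summary is a hypothetical helper. In the toy data, chr2:50 is a Bowtie2 call shared only with BSMAP, the kind of pattern the researchers describe for the Bowtie2 pipeline.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# A SNP call: (chromosome, position, alternate allele).
Snp = Tuple[str, int, str]


def overlap_summary(calls: Dict[str, Set[Snp]]) -> Dict[str, int]:
    """Count SNPs shared between pipelines and unique to each one."""
    names = list(calls)
    summary: Dict[str, int] = {}
    for a, b in combinations(names, 2):
        summary[f"{a} & {b}"] = len(calls[a] & calls[b])
    summary["all"] = len(set.intersection(*calls.values()))
    for name in names:
        others = set().union(*(calls[o] for o in names if o != name))
        summary[f"{name} only"] = len(calls[name] - others)
    return summary


calls = {
    "BSMAP": {("chr1", 100, "T"), ("chr1", 200, "A"), ("chr2", 50, "G")},
    "Bismark": {("chr1", 100, "T"), ("chr1", 200, "A")},
    "Bowtie2": {("chr1", 100, "T"), ("chr2", 50, "G"), ("chr3", 10, "C")},
}
for key, n in overlap_summary(calls).items():
    print(f"{key}: {n}")
```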