New Study Reveals Repeat-Calling Errors in Nanopore-Based Telomere Sequencing
Aug 29, 2022 | staff reporter
Widespread basecalling-induced errors can occur at telomeric regions across nanopore datasets, sequencing platforms, basecallers, and basecalling models, according to a new study appearing in Genome Biology this week. The findings highlight the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions of the genome. Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. In the new study, scientists from the Dana-Farber Cancer Institute analyzed telomeric regions with nanopore long-read sequencing in the recently sequenced and assembled CHM13 reference human genome, observing that telomeric regions were frequently miscalled as other types of repeats in a strand-specific manner. To see if these repeat calling errors extended to the telomeres of other organisms, the researchers examined nanopore genome sequencing data for eight model organisms including Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, and Danio rerio. "As expected, we also observed repeat calling errors on telomeres … akin to what we observed in humans," they write. To address the errors, the investigators developed a strategy to re-basecall telomeric reads using a tuned nanopore basecaller, which resulted in improved recovery and analysis of telomeric regions with minimal negative impact on other genomic regions.
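For readers who want to spot-check their own basecalls, one rough way to look for the strand-specific problem described in this article is to measure how much of a read is covered by the expected telomeric repeat on each strand. The sketch below is illustrative only and is not the method used in the study. It assumes the canonical vertebrate telomeric motif TTAGGG (other organisms use different motifs, so the motif is a parameter), and the function names are hypothetical.

```python
# Illustrative sketch only: not the study's pipeline. Function names are
# hypothetical, and TTAGGG is assumed as the default (vertebrate) motif.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def motif_fraction(read: str, motif: str) -> float:
    """Fraction of the read covered by non-overlapping exact copies of the motif."""
    read = read.upper()
    if not read:
        return 0.0
    return read.count(motif.upper()) * len(motif) / len(read)


def strand_profile(read: str, motif: str = "TTAGGG") -> dict:
    """Motif coverage for the motif as given and for its reverse complement."""
    return {
        "forward": motif_fraction(read, motif),
        "reverse": motif_fraction(read, reverse_complement(motif)),
    }
```

A read from a telomeric region that shows low coverage for both orientations of the motif may be worth inspecting or re-basecalling. Exact matching is a deliberately crude choice here, since ordinary sequencing errors will also lower the fraction.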