Nature Papers on Algorithm to Compress Sequencing Data, Method to Quickly Identify Disease-Causing Variants, More

A new algorithm designed to compress third-generation sequencing data is reported in Nature Methods this week. Advances in sequencing technology, including the development of platforms that produce very long reads, have generated enormous volumes of data that are challenging to store and manage. Methods exist to compress such data, but the best-performing ones are designed for short, high-quality Illumina reads. They are unsuitable for reads from third-generation instruments, which are orders of magnitude longer and have a different error profile. To address this, researchers from the Silesian University of Technology developed CoLoRd, a compression algorithm for Oxford Nanopore Technologies and PacBio sequencing data. In their report, they show that the algorithm can reduce the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.

A new method to rapidly identify disease-causing gene variants from whole-genome sequencing (WGS) data is presented in Nature Biotechnology this week. Despite the potential of WGS for clinical diagnosis, pipelines for sequencing and downstream analysis remain slow. In the study, a team led by scientists from Stanford University describes a streamlined nanopore sequencing approach that combines improved library preparation, a cloud-based module for near real-time base calling and alignment, accelerated variant calling, and focused variant filtering. They demonstrate the pipeline by applying it to the diagnosis of a critically ill 57-year-old man and a 14-month-old infant. In both cases, it surfaced a candidate variant less than eight hours after the blood draw, which the authors say is up to a 50 percent improvement on the fastest time reported to date.

Aiming to overcome the computational challenges of assembling transcripts captured by RNA sequencing, researchers from the Pennsylvania State University have developed a reference-based assembler optimized for multi-end RNA-seq data. As described in this week's Nature Computational Science, the tool, called Scallop2, works in three stages. First, it uses an algorithm to bridge multi-end reads into single-end phasing paths in the context of a splice graph. Second, it refines erroneous splice graphs using the multi-end reads that fail to bridge. Finally, it pipes the refined splice graph and the bridged phasing paths into an algorithm that integrates multiple phase-preserving decompositions. The researchers show that Scallop2 substantially improves assembly accuracy over two widely used assemblers.
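To give a rough sense of what "bridging" paired reads through a splice graph means, here is a minimal toy sketch. It assumes a splice graph whose nodes are exons numbered in genomic order and whose edges carry read-coverage weights. For each mate pair, it picks the connecting path whose weakest edge is strongest (a bottleneck criterion). The scoring rule, the graph representation and the function name `bridge_mates` are illustrative assumptions, not Scallop2's published algorithm or code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy splice graph: nodes are exon indices in genomic order, and
# each edge (i, j) with i < j carries a coverage weight (supporting reads).
SpliceGraph = Dict[int, Dict[int, float]]


def bridge_mates(graph: SpliceGraph, end_of_mate1: int,
                 start_of_mate2: int) -> Optional[List[int]]:
    """Return the exon path linking the two mates whose weakest edge is as
    strong as possible (an assumed bottleneck score), or None if the mates
    cannot be bridged."""
    # best[n] = (bottleneck weight of the best path reaching n, predecessor)
    best: Dict[int, Tuple[float, Optional[int]]] = {
        end_of_mate1: (float("inf"), None)
    }
    # Edges only point to higher-numbered exons, so ascending order is a
    # topological order and every predecessor is finalized before its successors.
    for node in sorted(graph):
        if node not in best or node > start_of_mate2:
            continue
        width, _ = best[node]
        for succ, weight in graph[node].items():
            candidate = min(width, weight)
            if succ not in best or candidate > best[succ][0]:
                best[succ] = (candidate, node)
    if start_of_mate2 not in best:
        return None
    # Follow predecessors back from mate 2 to mate 1 to recover the phasing path.
    path = [start_of_mate2]
    while path[-1] != end_of_mate1:
        path.append(best[path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    toy_graph: SpliceGraph = {0: {1: 5, 2: 2}, 1: {3: 4}, 2: {3: 9}, 3: {4: 7}}
    print(bridge_mates(toy_graph, 0, 4))
```

In this toy graph, the route through exon 1 has a weakest edge of 4, while the route through exon 2 has a weakest edge of 2, so the sketch bridges the mates through exon 1. Read pairs that return `None` correspond loosely to the "reads that fail to bridge" mentioned above.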
