Speedy Long-Read Genome Assembly Enabled by Minimizer-Space Analysis

NEW YORK — A new algorithmic approach can quickly assemble accurate long reads into entire genomes using only the memory included on a laptop computer, according to a new study.

Long-read sequencing technologies like those from Pacific Biosciences and Oxford Nanopore Technologies can generate terabytes of sequence data. De novo assembly of these reads can be resource-intensive, needing both time and computing memory.

Researchers from the Massachusetts Institute of Technology and the Institut Pasteur have developed a new approach that uses minimizer-space de Bruijn graphs, or mdBGs, to assemble long-read genomes. With this method, they put together a human genome in less than 10 minutes using eight cores and 10 gigabytes of random access memory, as they reported in Cell Systems on Tuesday. They similarly could quickly construct an index of a large collection of bacterial genomes that they then searched for signs of antimicrobial resistance genes, illustrating how being able to process sequencing data quickly could enable personalized medicine.

"Until this work, a single human genome assembly took days and hundreds of gigabytes of memory, which is a significant obstacle towards personalized medicine," co-corresponding author Bonnie Berger from MIT said in an email. "Our method mdBG reduces the computational resources to minutes on a personal computer — two orders of magnitude faster than existing methods."

MdBG relies on minimizers, each of which stands in for a short stretch of nucleotide sequence, rather than on single nucleotides. That way, mdBGs store only a small portion of the total number of nucleotides, without affecting the genome sequence.
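To make the idea concrete, here is a minimal Python sketch of a graph built over minimizers instead of nucleotides. It is an illustration under stated assumptions, not the rust-mdBG implementation: the function names, the use of a CRC32 hash, the density-threshold selection rule, and the parameter values (`l`, `k`, `density`) are all hypothetical choices made for this example.

```python
import zlib
from collections import defaultdict


def select_minimizers(seq, l=4, density=0.5):
    """Return, in order, the l-mers of seq whose hash falls below a threshold.

    Assumption: an l-mer counts as a minimizer when its CRC32 hash is below
    density * 2**32, so roughly a `density` fraction of l-mers is kept.
    """
    threshold = int(density * 2**32)
    picked = []
    for i in range(len(seq) - l + 1):
        lmer = seq[i:i + l]
        if zlib.crc32(lmer.encode()) < threshold:
            picked.append(lmer)
    return picked


def k_min_mers(minimizers, k=3):
    """Slide a window of k consecutive minimizers over the ordered list."""
    return [tuple(minimizers[i:i + k]) for i in range(len(minimizers) - k + 1)]


def build_mdbg(reads, l=4, k=3, density=0.5):
    """Build a toy minimizer-space de Bruijn graph as an adjacency map.

    Nodes are tuples of k minimizers; an edge joins two nodes that are
    adjacent in a read, so they overlap by k - 1 minimizers.
    """
    edges = defaultdict(set)
    for read in reads:
        nodes = k_min_mers(select_minimizers(read, l, density), k)
        for a, b in zip(nodes, nodes[1:]):
            edges[a].add(b)
    return edges


if __name__ == "__main__":
    genome = "ACGTTGCATGGACCTAGGATCCGATTACAGGCTTAACGT"
    reads = [genome[:28], genome[10:]]  # two overlapping error-free reads
    graph = build_mdbg(reads)
    print(len(graph), "nodes with outgoing edges")
```

Because the selection rule depends only on the l-mer itself, two reads covering the same stretch of genome yield the same minimizers there, which is what lets their nodes merge in the graph. A lower `density` keeps fewer l-mers and so shrinks the graph further; sequencing errors that alter a minimizer would break this matching, which is not handled in this sketch.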

The researchers applied their approach to assemble PacBio long reads from Drosophila and humans and compared the performance of mdBG to that of other assemblers, including HiCanu, Hifiasm, and Peregrine.

For Drosophila, rust-mdBG — the approach is written in the Rust language — assembled the genome in one minute and nine seconds and used 1.5 GB of memory. Peregrine, by contrast, took 40 minutes and 11 seconds and used 12 GB of memory.

Meanwhile, for a human assembly, rust-mdBG took 10 minutes and 23 seconds and needed 10 GB of memory, compared to 14 hours and eight minutes and 188 GB for Peregrine.
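The ratios implied by these figures can be worked out directly. The short Python sketch below uses only the run times and memory figures quoted above for rust-mdBG and Peregrine; the helper names are hypothetical, and the comparison covers Peregrine alone, since figures for the other assemblers are not given here.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a wall-clock duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (run time in seconds, memory in GB), as reported in the article
runs = {
    "Drosophila": {
        "rust-mdBG": (to_seconds(minutes=1, seconds=9), 1.5),
        "Peregrine": (to_seconds(minutes=40, seconds=11), 12),
    },
    "human": {
        "rust-mdBG": (to_seconds(minutes=10, seconds=23), 10),
        "Peregrine": (to_seconds(hours=14, minutes=8), 188),
    },
}


def ratios(dataset):
    """Return (time ratio, memory ratio) of Peregrine relative to rust-mdBG."""
    fast_time, fast_mem = runs[dataset]["rust-mdBG"]
    slow_time, slow_mem = runs[dataset]["Peregrine"]
    return slow_time / fast_time, slow_mem / fast_mem


if __name__ == "__main__":
    for name in runs:
        speedup, mem_saving = ratios(name)
        print(f"{name}: {speedup:.1f}x faster, {mem_saving:.1f}x less memory")
```

Against Peregrine, this works out to roughly 35 times faster with 8 times less memory for Drosophila, and roughly 82 times faster with about 19 times less memory for the human assembly.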

"Beyond genome assembly, our mdBGs can also be used to search for antimicrobial resistance genes very efficiently across huge collections of bacterial genomes, which is key for personalized antibiotic therapy," Institut Pasteur's Rayan Chikhi, the other corresponding author, added.

For instance, the researchers applied mdBG to construct an index for a collection of 661,405 bacterial genomes, a process that took three hours and 50 minutes and needed 58 GB. They further queried the pangenome graph for the presence of antimicrobial resistance genes, which took about 12 minutes, rather than seven hours with other approaches, and used less than 1 GB of memory.

Currently, the approach works best with PacBio reads, the authors noted, because those reads have very low error rates. They expect it to be able to handle Oxford Nanopore reads soon.

Berger and Chikhi added that they plan to further develop their approach, for example to resolve entire chromosomes without gaps. "Thinking more broadly, we envision reaching out to field scientists and to help them develop fast genomic testing sites, going beyond PCR and marker arrays which might miss important differences between genomes," they said.
