NEW YORK – A new study highlights the ability of long-read sequencing and chromosome conformation assays to boost the recovery of closely related microbial genomes from a complex metagenomic sample.
Led by researchers from the US Department of Agriculture and the University of California, San Diego, the team used Pacific Biosciences' HiFi sequencing data and Hi-C linkage data from Phase Genomics to generate complete metagenome-assembled genomes, or MAGs, from sheep fecal samples. They were able to identify 428 MAGs with more than 90 percent completeness, including 44 in single circular contigs. In addition, they could resolve closely related microbes, improve identification of biosynthetic gene clusters for generating antibiotic molecules, and more precisely assign mobile genetic elements to host genomes.
"We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, as well as 424 potential host–viral [including 298 host–plasmid] associations using Hi-C data," the authors wrote in a paper published Monday in Nature Biotechnology.
"This study sets the bar in terms of how much info can be recovered from a microbiome sample using new technology, including the largest number of genomes ever extracted from a sample, reconstructing hundreds of new strains and viral genomes, and tracking mobile elements from within a single sample," Phase Genomics CEO Ivan Liachko, one of the study authors, said in an email.
The work builds on recent advancements in genome assembly, including those that helped the Telomere-to-Telomere Consortium create the first gapless human chromosome assembly in 2020 and a complete human genome assembly in June 2021, according to Pavel Pevzner, a senior author of the study.
"Past metagenomics hardly ever resulted in an assembly of a single, complete genome," he said. "Using HiFi reads allows us to generate a nearly complete picture of the metagenome, not just a fragmented assembly."
As with human genome assembly, researchers assembling bacterial genomes had been stymied by long, highly repetitive regions that were intractable with short reads. For humans, those were centromeres; for bacteria, they were biosynthetic gene clusters.
The researchers analyzed the genomes using HiFi reads alone, as well as by binning contigs using Hi-C data. To put together the genomes, they used metaFlye, a graph-based assembly algorithm developed by Pevzner's lab. They also used MAGPhase, a program from PacBio, to distinguish between very similar bacterial strains from a single sample.
"This is no small thing: Some E. coli strains are harmless, others are deadly," Pevzner said.
The potential applications of these improved assemblies are widespread. The ability to sequence biosynthetic gene clusters, for example, could have immediate application in the pharmaceutical development of antibiotics. "There is great diversity of biosynthetic gene clusters, thus a great diversity of antibiotics," Pevzner said. "How they function, we do not know."
Characterizing sheep and other livestock microbiomes could also help develop approaches to reduce disease and greenhouse gas emissions, while improving productivity, Timothy Smith, a research chemist at the USDA's Meat Animal Research Center and a senior author on the paper, said in a statement. "Strain-level genome resolution will help track genes related to antimicrobial resistance and determine the extent animal husbandry might be contributing to the rise of antibiotic resistance in human and animal diseases."
Pevzner predicted that it won't be long before researchers start using these methods on human microbiome samples. "Like complete genomics, which is already being applied to rare disease diagnostics, complete metagenomics may soon make its way into medicine and many other disciplines," he said.