NEW YORK – Researchers at the US Department of Energy's Joint Genome Institute, Lawrence Berkeley National Laboratory, and Argonne National Laboratory have constructed a new catalog of microbial genomes reconstructed from microbiomes collected in diverse habitats across the planet, expanding the known phylogenetic diversity of bacteria and archaea by 44 percent.
In a study published on Monday in Nature Biotechnology, the researchers noted that the reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. In this paper, they described how they applied this approach to more than 10,000 metagenomes collected from habitats covering all of Earth's continents and oceans — including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils — to capture extant microbial, metabolic and functional potential.
They developed a catalog that includes 52,515 metagenome-assembled genomes (MAGs) representing 12,556 novel candidate species-level operational taxonomic units (OTUs) spanning 135 phyla. The Genomes from Earth's Microbiomes (GEM) catalog is broadly available for comparative or interactive analyses, metabolic modeling, and bulk download, the investigators added.
"We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses," the authors wrote. "This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes."
Among their analyses, the researchers used the MAGs from the GEM catalog to address a known limitation of reference-based profiling: taxonomically defined reference genomes are commonly used to infer the abundance of microorganisms from metagenomes, but outside the human microbiome they fail to recruit the majority of sequencing reads. They aligned high-quality reads from 3,170 metagenomes with available read data to the 52,515 GEMs and to all isolate genomes from NCBI RefSeq, and found that an average of 30.5 percent of metagenomic reads per sample were assigned to one or more GEMs, compared with 14.6 percent assigned to isolate genomes. Across all samples, GEMs resulted in a median 3.6-fold increase in the number of mapped reads, an effect that was particularly pronounced for certain environments such as bioreactors and invertebrate hosts.
Despite this improvement, nearly 70 percent of reads remained unmapped to any MAG or isolate genome. This was particularly noticeable for soil communities, which are highly complex and challenging to assemble.
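The mapping figures can be cross-checked with a little arithmetic. The sketch below uses only the two reported per-sample averages. Its variable names are illustrative, and it assumes that the "nearly 70 percent" figure is roughly the complement of the average GEM-mapped fraction, which the study as described here does not state explicitly. It also shows why the ratio of the two averages need not match the reported median 3.6-fold increase, which is a per-sample statistic.

```python
# Reported per-sample averages (percent of reads mapped).
mean_mapped_gems = 30.5
mean_mapped_isolates = 14.6

# Assumption: the unmapped share is approximated by the complement
# of the average fraction of reads mapped to GEMs.
unmapped_after_gems = 100.0 - mean_mapped_gems

# Ratio of the two averages. This is a different statistic from the
# median of per-sample fold changes (reported as 3.6), so the two
# numbers are not expected to agree.
ratio_of_means = mean_mapped_gems / mean_mapped_isolates

print(f"approx. unmapped after GEMs: {unmapped_after_gems:.1f}%")
print(f"ratio of mean mapped fractions: {ratio_of_means:.2f}")
```

A ratio of averages near 2 alongside a median fold change of 3.6 would be expected if many samples have very few reads mapping to isolate genomes, since those samples produce large per-sample fold changes while contributing little to the averages.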
The researchers also set out to use their data to uncover new species-level diversity. They found that the GEMs cover 137 known phyla, 305 known classes, and 787 known orders. The vast majority of non-singleton OTUs contained GEMs from only a single environment or multiple closely related environments (for example, bioreactors and wastewater), suggesting that few species have a broad habitat range. On the other hand, they noted, nearly 40 percent of OTUs were found in multiple sampling locations. The low percentage of mapped reads also indicated that additional species remain to be discovered across biomes, they said.
Overall, the researchers noted, their various analyses showed that the GEM catalog resulted in a 44 percent gain in phylogenetic diversity across the entire tree of bacteria and archaea and currently represents 31 percent of all known diversity based on cumulative branch length. Gains in phylogenetic diversity were relatively consistent across taxonomic groups, but were especially high for certain large clades that included Planctomycetota (79 percent gain), Verrucomicrobiota (68 percent gain), and Patescibacteria (60 percent gain).
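The two headline percentages here are consistent with each other if the 31 percent figure is read as the share of total branch length contributed by the catalog's new branches. That reading is an assumption rather than something stated in the text, and the names below are illustrative. A minimal sketch of the arithmetic:

```python
# Assumption: the "44 percent gain" is measured relative to the cumulative
# branch length of previously available genomes, normalized here to 1.0.
prior_branch_length = 1.0
gain_fraction = 0.44

added_branch_length = prior_branch_length * gain_fraction
total_branch_length = prior_branch_length + added_branch_length

# Share of the combined tree contributed by the newly added branches.
gem_share = added_branch_length / total_branch_length

print(f"share of total branch length from new branches: {gem_share:.1%}")
```

Under that assumption the share works out to 0.44 / 1.44, or about 31 percent, matching the reported figure.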
Notably, these analyses also revealed that 75 percent of the cataloged phylogenetic diversity of microbes is represented exclusively by uncultured genomes.