This article has been updated to clarify that the number of bacterial type strains was doubled as a result of this project, not the number of bacterial reference genomes as previously stated.
NEW YORK (GenomeWeb) – An international research team led by researchers at the US Department of Energy's Joint Genome Institute has published 1,003 reference genomes of bacterial and archaeal isolates as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) Initiative. These new genomes, 974 bacterial and 29 archaeal, double the number of type strains available to researchers and expand their overall phylogenetic diversity by 25 percent.
The study, published today in Nature Biotechnology, is part of the largest single release of reference genomes to date. The GEBA initiative aims to "expand the reference genome catalog of broad phylogenetic and physiological diversity, to determine how this catalog facilitates the discovery of protein families and expands the diversity of known functions, and to ascertain whether these type-strain genomes improve the recruitment and phylogenetic assignment of existing metagenomic sequences," the researchers wrote.
The team began by analyzing existing species identified through the All-Species Living Tree Project and then targeted phylogenetic gaps in the isolate genomic space. Once the researchers had obtained specimens, they sequenced them using Illumina instruments to generate whole-genome sequences.
They compared 3.4 million GEBA proteins to 2.66 billion non-redundant protein sequences derived from 4,948 metagenomes in the IMG database. Additionally, the team performed phylogenetic analyses of the whole-genome sequences using the Genome BLAST Distance Phylogeny approach.
Through their analysis, the researchers discovered links to 25 million previously unassigned metagenomic proteins. They also predicted a total of 23,839 biosynthetic gene clusters within the sequences and experimentally validated a divergent phenazine cluster with a potentially new chemical structure and antimicrobial activity.
"This resource data set is the single largest effort (to our knowledge) to increase the phylogenetic coverage of cultivated bacterial and archaeal isolates," the researchers wrote. "We observed that genomes with increased phylogenetic distance encoded the highest number of novel protein families, supporting the rationale for continued phylogeny-driven sequencing efforts aimed at expanding the representation of cultivated microbes."
The team also noted that it hopes GEBA will provide a foundation for an array of experiments to develop microbial model systems and analyze biotechnologically relevant pathways for years to come.