More than 250 microbial genomes have been sequenced to date and, with 700 more projects in the works, making sense of that flood of data has never looked more daunting. This is no surprise to scientists at the Joint Genome Institute (JGI), which has spearheaded nearly a quarter of the world’s bacterial genome projects. To help researchers cope with the deluge, the institute recently launched IMG/M, an experimental data management and analysis system for metagenomes.
“IMG/M arose from our interest in making it easier for users to access and analyze their data,” says JGI Director Eddy Rubin. Advances in sequencing technology now make it possible to sequence a microbial genome in a day, he says, which raises the risk that some genomes will be neglected amid the sheer volume of data.
Hence the creation of IMG/M, which builds on JGI’s Integrated Microbial Genomes (IMG) system and extends its comparative tools to metagenome data. The IMG system, built in collaboration with Lawrence Berkeley National Laboratory, is updated quarterly and contains both draft and complete JGI genomes, in addition to other publicly available microbial genomes. Researchers interested in analysis, rather than simply browsing the database, can navigate the samples by phenotype, ecotype, disease, and relevance.
According to Victor Markowitz, head of Lawrence Berkeley National Laboratory’s Biological Data Management and Technology Center and the system’s chief architect, the idea was always to broaden IMG’s remit. “Once we had IMG, we asked what it would take to extend [the system] to metagenomes,” Markowitz says. It took a lot, especially in terms of organizing the raw data conceptually. Whereas IMG charts isolate genomes for which assembly and gene prediction are already done, IMG/M must contend with data from entire microbial communities, in which the assembled scaffolds come from many different organisms.
Given those complexities, the LBNL team set out to build a repository capable of evolving with its diverse data sets. Working on the system mostly on weekends, the group built IMG/M over five months, and a preliminary version was distributed for expert testing at the end of last year. One of the early users, JGI’s Phil Hugenholtz, test-drove the system to analyze metagenomes from enhanced biological phosphorus removal (EBPR) sewage sludge, work that yielded results slated to appear in an upcoming paper.
Hugenholtz also helped train the system on other microbial communities recently sequenced by JGI, including the microbes that colonize the termite hindgut. “Termites are world-class biomass converters,” says Rubin, and understanding the metabolic pathways of their gut microbes may help meet “one of our greatest needs to convert cellulose into starch for alternative fuel development.”
In addition to the isolate genomes in IMG 1.3, the current version of IMG/M contains metagenomic sequences generated from several environmental samples. At press time, these included data from two EBPR sludge samples, three deep-sea “whale fall” carcasses courtesy of Rubin’s team, an agricultural soil sample, and an acid mine drainage biofilm. Together the samples span a range of species diversity, dominant-organism abundance, and sequencing depth.
The next version of IMG/M is slated for July 1, when IMG will also be loaded with more isolate genome data. After that, both repositories will be updated quarterly. “We’re committed to progressively adding and annotating everything that [JGI] sequences or that is sequenced elsewhere,” Rubin says, adding that the continual updates, coupled with IMG/M’s easily navigable interface, make it “a really killer application.”
— Jen Crebs