NEW YORK (GenomeWeb) – Using crowdsourced samples, the Earth Microbiome Project Consortium has conducted a meta-analysis to show that its resource can be used to examine microbial diversity.
The Earth Microbiome Project launched in 2010 with the aim of eventually characterizing 200,000 microbial samples from across the world to create a catalog. As the consortium reported today in Nature, it has collected more than 27,000 samples and metadata from the global science community. Within these samples, the consortium has identified approximately 300,000 unique microbial 16S rRNA sequences, the vast majority of which couldn't be found in existing databases.
Study author Rob Knight, professor and director of the Center for Microbiome Innovation at the University of California, San Diego, said in an email that when the project kicked off, he and his colleagues had hoped that it would be as big as it has since become, "but it was a completely new model for trying to do a large-scale project and many of our senior colleagues were very pessimistic."
He and his colleagues also used the Earth Microbiome Project database to investigate microbial species richness and nestedness among the samples to illustrate how their open-access resource could be used.
"The project provides a resource that will keep microbial ecologists and evolutionary biologists busy for years," wrote Jeroen Raes from the KU Leuven–Rega Institute in a related commentary.
Other large-scale microbiome projects, such as the Human Microbiome Project and the Extreme Microbiome Project, seek to characterize human-associated microbes and microbes that can withstand harsh environments, respectively.
For this study, the researchers focused on the bacterial and archaeal content of their first 27,751 samples and their metadata.
The associated metadata had to conform to the Genomic Standards Consortium's MIxS and Environment Ontology standards. On top of those, the researchers built a lightweight ontology application to capture whether the samples were free-living or associated with a host and, if host-associated, whether that host was a plant or an animal. It also captured whether the sample was from a saline or non-saline environment.
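As a rough illustration of the kind of labeling described above, the following minimal Python sketch turns a sample's metadata into a short list of environment labels. The class name, field names, and label strings are hypothetical and chosen only for this example; they are not taken from the consortium's ontology application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampleMetadata:
    # Hypothetical fields for illustration; not the consortium's schema.
    host_associated: bool
    host_kind: Optional[str] = None   # "plant" or "animal" when host-associated
    saline: Optional[bool] = None     # None when salinity is not recorded


def environment_labels(sample: SampleMetadata) -> List[str]:
    """Return coarse environment labels for one sample."""
    labels = ["host-associated" if sample.host_associated else "free-living"]
    if sample.host_associated:
        # A host-associated sample must say which kind of host it came from.
        if sample.host_kind not in ("plant", "animal"):
            raise ValueError("host-associated samples need host_kind 'plant' or 'animal'")
        labels.append(sample.host_kind)
    if sample.saline is not None:
        labels.append("saline" if sample.saline else "non-saline")
    return labels


gut_sample = SampleMetadata(host_associated=True, host_kind="animal")
ocean_sample = SampleMetadata(host_associated=False, saline=True)
print(environment_labels(gut_sample))
print(environment_labels(ocean_sample))
```

Keeping the labels as a small, fixed vocabulary is what lets samples from many different labs be grouped and compared on the same terms.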
The researchers amplified and sequenced the samples' 16S rRNA genes on an Illumina platform to yield 2.2 billion sequences. Knight said that the consortium benchmarked and tested a number of protocols between 2008 and 2012.
In his commentary, Raes noted that the researchers chose generalizability over sensitivity in their protocols. While a single protocol helps control for variation, he pointed out that not all protocols work well across a range of sample types.
Rather than assigning the sequences they generated to operational taxonomic units (OTUs), as many other metagenomic studies do, Knight and his colleagues used a recently developed reference-free method called Deblur to obtain exact sequences. After quality control, the researchers generated 307,572 unique sequences. Only about 10 percent of these sequences had matches in existing databases.
This approach enabled the researchers to follow rRNA gene sequences across the samples and examine diversity in relation to their environments.
Knight and his colleagues reported that the microbial profiles clustered by environment type, no matter which research group collected the samples.
They also confirmed the previous findings that being associated with a host is linked to decreased richness and that microbial communities from saline and non-saline environments have distinct makeups. Further, based solely on community composition, a supervised machine learning approach could classify samples as plant- or animal-associated and as coming from a saline or non-saline environment with 91 percent accuracy. This, the researchers noted, could have implications for forensics and other applications.
Additionally, the researchers reported that microbial community richness was, as expected, highest around neutral pH and at a relatively cool temperature. They also found that nestedness appeared to dominate in these communities, meaning that low-diversity communities were likely to be subsets of higher-diversity ones instead of containing a wholly different set of microbes.
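To make the idea of nestedness concrete, here is a minimal Python sketch. It uses a deliberately simplified score, the average share of a poorer community's sequences that also turn up in a richer one, and is not the nestedness metric used in the study. The function name, sample names, and sequence identifiers are all made up for illustration.

```python
from itertools import combinations


def nested_fraction(communities):
    """Mean share of each poorer community's sequences also found in the richer one.

    `communities` maps a sample name to the set of sequences observed in it.
    A score near 1 suggests nesting; a score near 0 suggests turnover.
    """
    scores = []
    for a, b in combinations(communities.values(), 2):
        poorer, richer = sorted((a, b), key=len)
        if not poorer:
            continue  # an empty community carries no information
        scores.append(len(poorer & richer) / len(poorer))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: low-diversity samples are subsets of richer ones.
nested = {
    "soil": {"seq1", "seq2", "seq3", "seq4"},
    "gut": {"seq1", "seq2"},
    "skin": {"seq1"},
}
# Hypothetical toy data: every sample holds a wholly different set of sequences.
turnover = {
    "soil": {"seq1", "seq2"},
    "gut": {"seq3", "seq4"},
    "skin": {"seq5"},
}

print(nested_fraction(nested))
print(nested_fraction(turnover))
```

In the first toy dataset every low-diversity sample is fully contained in the richer ones, which is the pattern the researchers reported as dominant; in the second, no sequences are shared at all.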
There are myriad ways other researchers could use the EMP resource, according to Knight, including for "finding new large-scale ecological patterns, finding out more about each of the individual systems in the EMP, looking up their favorite microbe in different environments, [and] using the dataset as a basis for source tracking."