By Monica Heger
This article was originally published March 28.
The Earth Microbiome Project, an ambitious effort to characterize more than 200,000 microbial samples from around the world, has kicked off its pilot phase, which will sequence 10,000 samples from soil, air, ocean, freshwater, and underground ecosystems.
"We've explored probably less than a millionth percent of the microbial diversity that exists on earth," Jack Gilbert, an environmental microbiologist at the Argonne National Laboratory and co-leader of the EMP, told In Sequence. The project aims to "target specific environments that can help us expand our understanding of what exists out there, what it does, and why that's important."
Sequencing for the pilot phase will be done primarily on the Illumina HiSeq platform and Pacific Biosciences' PacBio RS. The consortium is also in discussions with Roche about using its 454 GS FLX instrument.
The team is employing two sequencing strategies: a metagenomics approach and amplicon sequencing of the 16S and 18S ribosomal RNA genes. In total, 30 to 40 trillion base pairs of sequence data will be generated from the 10,000 samples.
For the metagenomics portion, the team will first perform low-pass sequencing of all the samples, generating about 10 gigabase pairs of data per sample, said Folker Meyer, a computational biologist at Argonne National Laboratory and a co-leader of the project.
After the initial low-pass sequencing, the team will decide whether to sequence a sample more deeply. That decision will depend on a number of factors, including whether deeper sequencing appears likely to allow the team to assemble the complete genomes of organisms from the data.
The team will also use PCR and amplicon sequencing to sequence the 16S ribosomal RNA gene, which is present in bacteria, and the 18S ribosomal RNA gene, which is the eukaryote equivalent. These RNA genes, which are not translated into proteins, are highly conserved and have been studied extensively. Sequencing them will allow researchers to better understand the range of diversity among species as well as their evolutionary history.
"Understanding what the microbial populations are can be critical for informing health-related questions and understanding biogeochemical cycles," said Meyer. However, there is currently only about 1 terabase of microbial sequence data. "The EMP will at least double what is available within the first month of sequencing," he added.
Argonne National Laboratory, the J. Craig Venter Institute, and BGI will do all of the sequencing for the pilot phase, which is being funded to the tune of about $14 million. Gilbert said the consortium hopes to bring in more sequencing partners as the project progresses.
BGI is providing around 80 percent of the funding, with about 10 percent from the US Department of Energy. The project will receive in-kind support from the University of Colorado, as well as industry support. Mo Bio is providing DNA extraction kits and technical expertise, Eppendorf is providing consumables for DNA extraction and equipment for automation, and Illumina and PacBio are providing sequencing services. The larger project has not yet been funded, said Gilbert, but will likely run upwards of $200 million.
Sequencing for the pilot is expected to be complete by either the summer or fall of 2012.
Sequencing is estimated to account for only about 10 percent of the project's total cost, with analysis and computation making up the bulk of the expense, said Meyer.
Additionally, all of the data, including metadata such as the precise location where each sample was collected and how it was collected, will be made publicly available. The team will work with the Genomic Standards Consortium to develop standards for the type of information required from each sample, and to determine how best to encode that information to provide a framework for scientists to compare their metadata, Meyer said. "Not just the individual PI will be able to analyze the sample, but many others."
The team has collected 40,000 samples so far, and is now in the process of choosing those it will sequence in the pilot study, said Gilbert. The samples are from China, Australia, Brazil, Argentina, Chile, Mexico, the US, Canada, two countries in Africa, Russia, Thailand, Europe, and the UK, as well as the oceans, and "pretty much anywhere you can imagine," said Gilbert. "If we could sample the moon and Mars we would be there as well."
The project is led by Gilbert and Meyer, as well as Rick Stevens from the Argonne National Laboratory, with a steering committee whose members include Jonathan Eisen of the University of California, Davis; Jed Fuhrman from the University of Southern California; Janet Jansson from Lawrence Berkeley National Laboratory; and Rob Knight from the University of Colorado, Boulder.