It's one thing to know that they are everywhere: teeming masses of microbes covering computer keys, eking their way into plant roots, coating the human body.
It's not knowing who they all are, what they're up to, and why that can be galling for researchers bent on understanding the biological and ecological processes at play in the human body, throughout the natural world, and within human-made environments.
Scientists armed with next-generation sequencing machines have been banding together to do the kinds of large-scale experiments needed to answer such questions. And it turns out now is a pretty good time to tackle those studies.
The advent of high-capacity, cheaper sequencers — platforms capable of quickly auditing the organisms in a given community and the genes they contain — has made it possible to dramatically increase the number of samples a project can accommodate, spawning studies that would have been unimaginable a decade ago.
"Until very recently, the standard in the field would be maybe a couple dozen samples," says Rob Knight, a University of Colorado, Boulder, bioinformatics researcher.
"It's amazing the resolution that you get when you move up to the scale of hundreds of thousands of samples for a project," adds Knight, who was involved in software development and data analysis for the Human Microbiome Project and is now helping to lead the fledgling Earth Microbiome Project.
The first shift in 16S ribosomal gene sequencing — the method used to identify the bacterial and archaeal members of a microbial community — occurred not long after the Human Microbiome Project launched in late 2007.
A changeover from full-length 16S rRNA gene Sanger sequencing to high-throughput sequencing of 16S amplicons on the Roche 454 Titanium instrument came just in time to help that five-year, $173 million project meet its ambitious goal of characterizing the microbial communities in and on around 300 healthy volunteers.
"The reason there was an HMP at all, and the reason there has been this explosion of metagenomic research, is totally because of next-generation sequencing," says George Weinstock, associate director of the Genome Institute at Washington University in St. Louis and leader of that center's HMP efforts.
For the more recently conceived Earth Microbiome Project, or EMP, researchers are tweaking microbiome testing methods again, sequencing even more 16S amplicons simultaneously on Illumina's short-read, high-capacity sequencing platforms.
The shotgun metagenomic sequencing methods used to catalog the complete genetic repertoire of a given microbial community have also been overhauled as microbiome studies have progressed.
While still pricey compared to amplicon sequencing, high-throughput metagenomic sequencing has made it possible for the HMP, EMP, and other projects to get a glimpse of the metabolic capabilities and functional potential of microbes in hundreds of environmental and human body-associated samples.
For the EMP pilot effort, for instance, researchers have already managed to do shotgun metagenomics on a few hundred samples, despite working with a limited budget. Most of that metagenomic sequencing has been done on Illumina's HiSeq 2000, though some samples have been subjected to metagenomic and/or meta-transcriptomic sequencing on the company's more compact, lower-capacity "personal sequencer," the MiSeq.
The evolution of microbiome methods is continuing as other groups transition to technologies introduced to the market relatively recently.
The Broad Institute's Dirk Gevers, who was involved in HMP sequencing efforts at the Broad and helped oversee the project's data analysis working group, says he's seen a jump in the number of teams that are turning to the MiSeq for 16S sequencing.
And still others have started to use Life Technologies' Ion Torrent PGM instrument for 16S-based analyses.
In studies appearing in PLOS One and the Journal of Microbiological Methods this summer, independent teams led by investigators in Germany and Australia, respectively, reported that they had used the PGM to sequence a hyper-variable region of the bacterial 16S rRNA gene known as V6. The Australian-led group also did shotgun metagenomic sequencing on some of its wastewater treatment system samples with the PGM.
For those who have gotten to the stage of pinpointing microbes of particular interest within a community or communities, whole-genome sequencing on individual microbial cells has started to look appealing as well. Such single-cell sequencing is something that HMP members are attempting for a select set of hard-to-culture-but-biologically-interesting bugs.
As some studies have recently shown, though, having access to sequence data on a microbial community itself can sometimes provide enough information to assemble genomes of some microbes within that community.
In a study published in Science in February, researchers from the University of Washington reported that they had successfully assembled more than a dozen candidate microbial genomes de novo using mate-pair SOLiD v3.0 metagenomic reads generated from seawater samples before going on to characterize an uncultured archaeal species.
EMP co-leader Janet Jansson, a researcher affiliated with the Lawrence Berkeley National Laboratory and the US Department of Energy's Joint Genome Institute, has been among those pursuing approaches that put together individual genome sequences from metagenomic data as well. In a 2011 Nature study, she and her colleagues assembled a draft genome sequence for a soil methanogen using metagenomic reads from an Alaskan permafrost soil sample, combined with information on sequence read depth and nucleotide frequency patterns.
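To give a rough sense of how read depth and nucleotide frequency patterns can be combined when sorting metagenomic contigs into candidate genomes, here is a minimal Python sketch. It is an illustration only, not the method used in the Nature study: the function names, the choice of four-base "words," and the feature layout are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def tetranucleotide_profile(sequence, k=4):
    """Return normalized k-mer frequencies for one contig.

    k-mers containing ambiguous bases (anything other than A, C, G, T)
    are skipped. The output always has 4**k entries in a fixed order,
    so profiles from different contigs can be compared directly.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


def contig_features(sequence, mean_read_depth):
    """Hypothetical feature vector: composition profile plus coverage."""
    return tetranucleotide_profile(sequence) + [float(mean_read_depth)]
```

Contigs that come from the same genome tend to share similar composition and similar coverage, so vectors like these could be handed to a clustering step of the reader's choice; that step is deliberately left out here.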
Jack Gilbert, an Argonne National Laboratory researcher and EMP co-leader, says genome assembly from metagenomic data is something that the project may try to exploit down the road as well: The group has been experimenting with a so-called "fixation-free fluorescence in situ hybridization" method developed by researchers at JGI and the University of Queensland that nabs organisms of interest based on their 16S sequence prior to sequencing.
Microbiome researchers on a budget, even a pretty hefty budget, didn't always have so many options available to them.
As recently as five years ago, studies of bacterial species present on the ocean floor or on the backs of worms in deep sea vents relied heavily on Sanger sequencing of full-length 16S ribosomal RNA genes. When attempted at all, shotgun metagenomic sequencing revolved primarily around Sanger sequencing, too, generating accurate, but expensive data that favored bugs most frequently found in a given environment.
The methods used for identifying microbe community members began to change as investigators with access to Roche 454 GS FLX Titanium sequencing instruments that produced 400 base-pair to 500 base-pair reads realized that it might not be necessary to amplify and sequence the entire 1,500 or so nucleotides of the 16S bacterial gene.
By sequencing shorter, standardized stretches of the 16S gene with 454 instruments, it became much cheaper and faster to do a 16S-based bacterial roll call — and to look at many more samples within a study's budget.
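Fitting many samples into a single run typically relies on multiplexing: each sample's amplicons carry a short barcode sequence, and reads are sorted back to their sample of origin after sequencing. The Python sketch below shows the idea under simplifying assumptions (barcodes of equal length at the start of each read, exact matches only). The function name, barcodes, and sample labels are hypothetical and do not represent any project's actual pipeline.

```python
def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact-match barcode prefix.

    Assumes every barcode has the same length. Returns a dict mapping
    each sample to its reads (barcode removed) and a list of reads
    whose prefix matched no known barcode.
    """
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return by_sample, unassigned


# Hypothetical barcodes and sample labels, for illustration only.
barcodes = {"ACGT": "gut_01", "TTAG": "skin_01"}
reads = ["ACGTGGCCTA", "TTAGCCGGAT", "GGGGAAAACC", "ACGTTTTTTT"]
assigned, leftover = demultiplex(reads, barcodes)
```

Real pipelines usually also tolerate sequencing errors in the barcode and filter on read quality, both of which are omitted here.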
"The switch to 454 was really transformative," Knight says. "We were easily able to multiplex a few tens to a few hundreds of samples per run and that completely changed the experimental design: Instead of trying to look at a handful of samples, we were in a situation where we could start to look at more detailed spatial maps, more detailed temporal maps, and so on."
Over a few months in 2010, for example, the HMP's sequencing centers — Washington University, the Broad Institute, Baylor College of Medicine, and the J. Craig Venter Institute — used 454 sequencing to crank out 16S sequence data for almost all 12,000 samples collected for the main phase of that study, representing 15 to 18 body sites per individual tested.
With Illumina's GAIIx platform, meanwhile, the HMP team has already generated 8 terabases of shotgun metagenomic sequence data. The group plans to do metagenomic sequencing on a few hundred more samples before the main phase of the project wraps up next year.
Some of that metagenomic sequence data is proving useful for looking at ways in which host genetics impact the body's microbial neighborhoods.
At the Biology of Genomes meeting at Cold Spring Harbor Laboratory in New York earlier this year, Cornell University's Ran Blekhman presented preliminary results from a study of host-microbe interactions that took advantage of genetic patterns gleaned from human "contaminant" reads in HMP metagenomic sequence data.
Though the HMP is still a few months from completion, data on the first 242 healthy participants tested for the study has already served as the basis for stacks of new publications. In addition to defining the typical microbial community structures at the body sites tested, these studies offered a peek at the variability and stability of microbial communities found within and between healthy individuals.
Beyond what's being learned about the baseline microbiome states in healthy individuals, a series of demonstration projects commissioned under the auspices of the HMP are taking the next step: looking at how microbiome profiles shift with disease or other traits of interest.
"We've been deeply involved in the demonstration projects," Weinstock says. He notes that five of the 15 original HMP demonstration projects were based at WashU — his own lab is also one of half a dozen funded through the National Heart, Lung, and Blood Institute's Lung HIV Microbiome Project.
Similarly at the Broad Institute, Gevers says, several disease-related studies are underway using HMP and other data. "Given that many of these diseases have already been looked at from the host genetics side of things," he says, "we're adding on microbiome [data] and bringing these two things together."
Some of the disease-related studies are already starting to bear fruit. For instance, initial results from an HMP demonstration project focused on the vaginal microbiome appeared in Science Translational Medicine this spring. That work not only highlighted a temporal shift and population-related clustering in vaginal microbiome features, but also set the stage for more extensive studies on the microbial changes coinciding with vaginal infection or disease.
Researchers within and outside of the HMP have been especially keen to define gut microbial profiles, too, particularly in relation to conditions such as obesity, inflammatory bowel disease, and immune function.
On that front, a large European study known as Metagenomics of the Human Intestinal Tract, or MetaHIT, has invested tens of millions of euros in gut bacterial gene sequencing since 2008. In addition, a University College Cork-led study has been tracking gut microbiome shifts in the elderly and in those experiencing age-related disease or decline.
On the environmental side, the relatively new EMP team is following up on earlier work by earth, soil, and sea-sampling groups such as the JCVI-led Global Ocean Sampling team and the International Census of Marine Microbes.
But with access to cheaper and faster sequencing methods, EMP researchers are just as concerned with getting well-characterized and annotated sample sets collected over time, or from a series of related spaces, as they are with generating the sequence data itself.
"If you're processing very large numbers of samples to answer ecological or evolutionary questions, all of that information about the samples and about the sites becomes your primary data," Knight says. "The sequence information is just one more kind of data that's no longer privileged among all the other data you're collecting in the project."
From the get-go, the EMP team set its sights on around 200,000 environmental samples targeting a broad range of environments that represent what Gilbert calls "gradients for physical, chemical, and biological parameters." In other words, Gilbert is interested in how microbial community structures vary over time and in response to a wide range of factors, from the sites where samples are collected to the environmental and biological exposures that bugs face at these locales.
So far, EMP members have produced 16S amplicon data and done initial analyses on around 7,500 samples collected in environments ranging from freshwater lakes to animal-associated communities. And if all goes as planned, the project is on track to complete 16S analyses on as many as 15,000 samples by the end of this year.
Shotgun metagenomic sequencing on the pilot samples, however, has lagged behind a bit owing to funding limitations, though the team has managed to look at the complete gene contents of about 300 marine, soil, and host-associated samples.
On top of that, EMP researchers have worked out protocols for performing wholesale 18S amplicon sequencing on the samples, which will provide information on the eukaryotic microbes present in each community, and have accumulated roughly 60,000 samples that they can tap into for the main phase of EMP. The EMP is now on the hunt for additional funding to support testing on around 100,000 samples over two to three years.
EMP consortium members have their hands in other, ongoing microbiome studies, too. For Argonne's Gilbert, that includes efforts to understand microbiome patterns associated with wine-producing grapes and the soil they're grown in as well as studies on so-called "built environments" like homes and hospitals.
Because data for those projects are being generated in much the same way as they are for the EMP, researchers say it should be possible to start comparing patterns across some human-built and natural locales.
And while the focus of environmental studies may differ to some extent from that of projects directly sampling human body sites, experts say an improved understanding of microbial communities in every context — be it soil, the human gut, or the space under the sofa — should help in interpreting disease-related microbial patterns.
"Even in studies that seem like basic science I think there's a lot of potential for uncovering rules that will have a direct impact on human health also," Knight says.