When you're talking about scientific advances, 10 years is the blink of an eye, yet in that span metagenomics has moved rapidly. While large-scale initiatives to detail the diversity of bacteria and viruses in a community have been steadily progressing (the Human Microbiome Project and the Global Ocean Sampling Expedition are two of the biggest), many researchers have taken advantage of deep sequencing to move the field toward both comparative and functional analysis.
As with the Human Genome Project, while it's good to have the genome sequence, it can be more useful to know what genes are being expressed, what proteins are being made, and how the metabolome is being affected by environmental perturbations. "It's becoming an experimental tool as opposed to a description tool," says Forest Rohwer at San Diego State University. "You can go in, you can manipulate a system, you can get 20 metagenomes and then you can actually say, how do they change from one treatment to another? You're going to see a lot of that work."
Metagenomics has come so far, in fact, that the discussion is less about how to perform the science than about how to use it to answer biologically significant questions. According to Jonathan Eisen at the University of California, Davis, "Metagenomics is a tool for answering scientific questions; it is not a field. You have to start with a scientific question."
Mya Breitbart at the University of South Florida says, "What metagenomics gives you that I don't think other tools have is [it] tells you what might be important in the system. Metagenomics gives us direction — it allows us to create the hypotheses."
Starting with viruses
Viruses are among the most abundant biological entities on the planet, and a lot of early metagenomic analyses focused on them. Rohwer has been a leader in looking at viral communities, especially in coral reefs. Changes in temperature or pH can stress corals, and in a recent paper published in PNAS, Rohwer combined viral metagenomics and real-time PCR to show that in Porites compressa corals, there is a full range of eukaryotic viruses, many related to the Herpesviridae. The virus community, he found, shifted in response to stressors like reduced pH, elevated nutrients, and increased temperature. In particular, when exposed to acidity, too many nutrients, and thermal stress, the abundance of herpes-like viral sequences rapidly increased in two separate experiments. Herpes-like viral sequences were rarely detected in apparently healthy corals, but were abundant in a majority of stressed samples.
"There are four big stressors that we think are important for the future of coral reefs," Rohwer says: the influences of nutrient loading, energy loading, increasing temperatures, and decreasing pH. "What we've found is that essentially all of the stressors will cause fairly dramatic changes in the microbial and viral communities. And it really does look like they're getting more pathogenic. For example, we see a lot more herpes viruses in the corals that have been stressed, and on the microbial side, we see a lot more things that you would call pathogens," he says.
Rohwer says that metagenomic approaches have allowed him to see what's happening in the system as a whole. "It would have been very hard to do this work without metagenomic approaches," he says. "The advantage of the metagenome is that you just look at what is happening in the system. Previously you would have had to start with whatever your best guess is, and now it's the opposite." He says his team tends to look at a system first and then build experimental tests from what it finds.
Mya Breitbart also looks at viral communities and is starting to use metagenomic data to find, sequence, and study individual viruses for what their function might be within the entire community. "Because [metagenomics] is expensive and time consuming, it's been a lot of snapshots of communities," she says. "But we don't necessarily understand enough about the dynamics."
Breitbart's work takes her from sewage to sea water to animal viral communities, and one thing she's most excited about is applying the technique to discovery. "We're still doing stuff to look at whole communities, but we're going a lot more into animal tissue samples, purifying the viruses, and then sequencing them to discover whole new viruses," she says. Two years ago, she says, she and her colleagues were mostly surveying the community with broad-based, deep sequencing. "What we've been able to do now is to go into different organisms and actually recover whole genomes that we didn't know about before," she adds.
In one study, Breitbart is performing viral metagenomics directly on tumor samples from sea lions and identifying viruses from these by reassembling a complete genome. Like Rohwer, though, she thinks one of the biggest challenges facing the community is bridging to functional genomics — that is, elucidating what these genes are doing both alone and as part of the viral community. The viruses may be known, but "we actually don't know what they infect," Breitbart says. "Once you have a whole viral genome, obviously you want to know what its host is so that you can understand what its role is in any given environment."
It all comes down to sequencing, where new techniques have enabled her to find organisms beyond the typical double-stranded DNA viruses. A new focus for her lab is single-stranded DNA viruses because scientists can now separate them out of a community and sequence them. While it was once thought that single-stranded viruses didn't exist in the oceans, "now that we can see them, we realize that we've just been missing them the whole time," Breitbart says. "This seems true also of RNA viruses. Metagenomics really has moved into all the different nucleic acid types now." She adds, "If you have the right focus or the right question in any system, you can use metagenomics as a tool to get at it."
Enter the microbes
Viruses aren't the only prize in metagenomics. Bacterial communities have all but stolen the spotlight recently. According to Folker Meyer, who is involved in the international soil sequencing project called TerraGenome and who runs the MG-RAST server at Argonne, most of the 2,000 metagenomes that have been deposited in the institute's databases are microbial in nature. "In the beginning it was a lot of viral stuff … but right now what we're seeing is mostly microbial," Meyer says.
Meyer works on both sides of the metagenomics coin, running studies and helping to analyze the data. He's helped conduct one of the largest metagenomics experiments that presently exists, the TerraGenome Project, managed by the International Soil Metagenome Sequencing Consortium. "We're trying to track how soil plays a role in carbon sequestration and global climate change processes," he says. He and his team have partnered with the National Science Foundation and have sequenced soil metagenomes from several sites across the US — now they're analyzing what lies within. "[By] using metagenomics, we can now also study the metabolism," Meyer says. "We can study which genes are present, so we get an idea which metabolic processes are active in that soil."
Jonathan Eisen at UC Davis splits his time between computational and wet lab work. He's been using metagenomics to study symbioses — between a microbe and its host or community symbioses, "which are a community of microbes and how they work together," he says. While others are generating massive amounts of data, he's asking different questions. "I'm interested in a slightly different thing, which is how they work together in order to invent new functions," he adds.
In recent work, Eisen has been looking at two separate symbiotic systems. In one, he's found that deep-sea invertebrates function a lot like plants. "They live by having bacteria inside their gut or their liver that chemosynthesize for them, and they don't actually eat," he says. "What I'm interested in is the rules by which symbiont-host interaction evolves. So I'm trying to use new high-throughput sequencing methods to characterize different types of symbionts, different types of biology involved, and different types of hosts."
In other work, he's taken to studying the glassy-winged sharpshooter, an insect that carries the bacterium that causes Pierce's disease, which can ravage vineyards. Eisen has spent three years poring over metagenomic data to figure out why the insect, which has a very poor diet, can actually survive. It turns out that it carries not just one but two bacterial symbionts that each produce amino acids and vitamins for the sharpshooter to live on. "We didn't know [of the dual symbionts] when we started the project, but we figured it out by doing metagenomics," Eisen says.
His philosophy is to work out methods on simple metagenomic systems, and then port those to more complex ones. One of the earliest communities to be studied was the acid mine drainage system, and it remains a major subject of ongoing study, especially for metaproteomics. "We can learn a lot about how to design methods to study metagenomic data by first seeing if they work on really simple ecosystems," Eisen says.
Data deluge
One thing everyone can agree on is that metagenomics is moving into uncharted territory when it comes to data analysis. In part, some of Eisen's work is advancing that analysis. "Without a doubt, there are two big challenges with metagenomics, with analyzing the data," he says. "One is that you get short fragments of genomes, and the other is that you can have a complex community that you're trying to sort out."
To that end, he's developed a method of sorting reads into "organism-specific bins by building phylogenetic trees of all the reads" before doing further analysis. It's a critical step, he says. While some people are still treating metagenomic reads like a bag of genes, many are moving toward binning. "You wouldn't go to an island and grind up all of the plants and animals and then try and analyze the island ecosystem by summing across all of the fragments," Eisen notes. "It's just ridiculous. If you sort those fragments into organisms, it's much easier to then interpret the data. This is called binning."
Binning is the process of taking all the sequence reads from metagenomic data and comparing them to the genomes of closely related organisms to sort them into rough categories, which makes the genomic analysis that follows easier. Not all reference genomes are available, so Eisen has focused on a phylogenetic approach. He heads up the Genomic Encyclopedia of Bacteria and Archaea project. "We've been sequencing genomes from across the tree of life, to serve as these anchors for sorting through metagenomic data. They're not reference genomes, but they serve as our reference tree, in essence," he says.
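To make the idea of binning concrete, here is a minimal Python sketch. It is not Eisen's phylogenetic method: as a simpler stand-in, it assigns each read to whichever reference sequence shares the most short subsequences (k-mers) with it, and leaves a read unbinned when nothing matches. The function names, the reference sequences, and the reads are all hypothetical and invented for illustration.

```python
from collections import defaultdict


def kmers(seq, k=4):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads, references, k=4, min_shared=1):
    """Assign each read to the reference sharing the most k-mers with it.

    Reads that share fewer than min_shared k-mers with every reference
    go into an "unbinned" bin. Ties go to the first reference seen.
    """
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    bins = defaultdict(list)
    for read in reads:
        read_kmers = kmers(read, k)
        best_name, best_score = "unbinned", 0
        for name, ref_set in ref_kmers.items():
            score = len(read_kmers & ref_set)
            if score > best_score:
                best_name, best_score = name, score
        if best_score < min_shared:
            best_name = "unbinned"
        bins[best_name].append(read)
    return dict(bins)


# Toy data (made up): two short "reference genomes" and three reads.
references = {
    "organism_A": "ATGCGTACGTTAGC",
    "organism_B": "GGGCCCTTTAAAGGGCCC",
}
reads = ["CGTACGTT", "CCCTTTAA", "TTTTTTTT"]

for organism, assigned in bin_reads(reads, references).items():
    print(organism, assigned)
```

Real reads are far longer and references far larger, so production tools rely on indexed alignment or, as in Eisen's case, placement on a reference tree. The sketch only shows the sorting step that turns a "bag of genes" into per-organism groups.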
At Argonne, Meyer helps with both annotation and warehousing. With about 1,700 private metagenome datasets and almost 300 public ones being housed at the national lab, and its MG-RAST server playing a large role in data analysis, Argonne is definitely a hotspot. "We've sort of become the hub for a lot of what's going on with metagenomics, I believe, because we have this MG-RAST server, which is by far the biggest repository and analysis resource that exists," Meyer says. While RAST is an algorithm that can very rapidly sort through genomes — Meyer says the Argonne team can annotate 60 to 80 microbial genomes per day — MG-RAST is for metagenomes. "MG-RAST has a much bigger role in the metagenomics community than RAST has in the genome community, because there is, I think, right now no alternative to MG-RAST," Meyer says. "It's one of a kind."
In the future, Meyer sees a need for better standards for metadata and increased bioinformatics support for short-read sequencing, as well as a focus on comparative genomics. Breitbart is looking forward to more follow-up studies that include comparative spatial and temporal data. "If you're talking about seawater, we can compare many different areas of ocean and start to understand how much overlap there is between communities at different locations, or we can compare several depths throughout the water column and see what are the different communities, and what sort of genes are being enriched for under particular environmental conditions or in a certain location," she says. "We're getting past the point of one sample, and moving to pooling lots of samples or analyzing lots of separate samples to compare similarities and differences."
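One common way to put a number on "how much overlap there is between communities" is to compare per-sample taxon counts. The Python sketch below computes two standard ecological measures, Jaccard similarity (shared taxa as a fraction of all taxa seen) and Bray-Curtis dissimilarity (which also weighs abundance). It is an illustration only, not a method attributed to Breitbart, and the sample names and counts are made up.

```python
def jaccard_similarity(sample_a, sample_b):
    """Fraction of all observed taxa that appear in both samples (0 to 1)."""
    taxa_a, taxa_b = set(sample_a), set(sample_b)
    union = taxa_a | taxa_b
    if not union:
        return 0.0
    return len(taxa_a & taxa_b) / len(union)


def bray_curtis(sample_a, sample_b):
    """Abundance-weighted dissimilarity: 0 = identical, 1 = nothing shared."""
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        return 0.0
    return 1 - 2 * shared / total


# Hypothetical read counts per taxon at two depths (invented numbers).
surface = {"taxon_1": 60, "taxon_2": 30, "taxon_3": 10}
deep = {"taxon_1": 40, "taxon_4": 50, "taxon_3": 10}

print("Jaccard similarity:", jaccard_similarity(surface, deep))
print("Bray-Curtis dissimilarity:", bray_curtis(surface, deep))
```

Running the same comparison across many locations or depths gives a pairwise matrix, which is the kind of input the statistical approaches mentioned later in the article would start from.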
All that takes a lot of computational power. "The good news is that people learned some lessons from genome sequencing," Eisen says of the tendency to underfund the computational side of things. "I think many funding agencies have learned that we can't do that with metagenomics. Metagenomics is much more complicated than genome sequence data."
Forest Rohwer thinks one of the biggest challenges is the speed of the process. While it still takes months to get from sample to analyzed data set, "I really do expect to see that coming down to about a day," he says. He also thinks that the analysis is getting much more sophisticated. "It's not just that it's getting faster, it's also becoming very … statistical. The data sets are so large that the only way to look at them is to use these high-powered statistical approaches," Rohwer says. Still, he doesn't see the computational hang-up going away tomorrow. "New tools need to be built," he adds.
Mya Breitbart concurs. "The tools are behind," she says. "A lot of new tools are being developed in that direction, and that's going to be the area where we're going to see the most growth in the next five to 10 years."
They're All Going Meta
Metagenomics isn't an only child. On its heels are metatranscriptomics, metaproteomics, and metametabolomics. All emerging fields, they hold much promise, says Jonathan Eisen at the University of California, Davis. "They [all] seem like they're going to be very useful [and] they're going to be even more complicated in some cases to analyze than metagenomic data," he says. "Clearly, systems biology of microbial communities is going to have to happen. It's not just going to be genome sequence data."
Oak Ridge National Laboratory's Nathan Verberkmoes is one of the few who have published work in the emerging field of metaproteomics. "It's just an extension of what's been going on in isolates for a long time," he says, only with metaproteomics, "it's usually not a one-to-one comparison. The metagenomics just tells you the blueprint of what's there and what could possibly be functioning, but not every gene is expressed. Very often certain portions [of microbes' genomes] will be expressed under certain conditions to respond to the environment."
Most of Verberkmoes' work centers on his chosen model system, acid mine drainage, mainly because of its low diversity, he says. He's also looking at sludge bioreactors, the human gut, ocean samples, and ground water and soil sediment from contaminated field sites. "Basically the idea there is to look at the microbial community in the groundwater after bioremediation," he says. "The idea of bioremediation has been around a long time, but using metagenomics and metaproteomics to understand it is a new concept."
Verberkmoes sees advances in next-gen sequencing technologies as having much to offer both metatranscriptomics and metaproteomics. He also believes that higher-res proteomics instrumentation will be key to better data. Because of the high variability in the communities, it's becoming more and more important to have "high mass accuracy data," he says. "Really what we need the most is better genomic sequencing, and then better fractionation. Getting deeper and wider [than E. coli and yeast] is the big struggle right now."