When it comes to phylogenetics, researchers are finding out that what they do know is far less than what they don't know. As new species of animals and plants are continually being discovered, it is perhaps not so surprising that there is also still a lot to learn about the world's many kinds of bacteria and archaea and how they have evolved. "Basically, so far, of the known phylogenetic diversity of bacteria and archaea, we have genomic data for about one one-hundredth of a percent of that diversity," says the University of California, Davis' Jonathan Eisen. "We've just started to scratch the surface of phylogenetic diversity of bacteria and archaea in terms of genome data."
Researchers can almost have their pick of what to study first, and their approaches have been as varied as the bacteria and archaea they aim to study. Some take the metagenomic route, studying the genomic material found in large samples from bodies of water and other places around the world in order to classify as many organisms as possible. Other researchers have chosen a narrower route as they try to suss out the specific mechanisms by which bacteria or archaea evolve new traits.
Growing the tree of life
Eisen's self-described "obsession" for the past seven years culminated in a study published in PLoS One in March; he and his team "stalked the fourth domain," using metagenomic data to search for a possibly novel branch of the phylogenetic tree. "Metagenomic data — where you scoop up random DNA and just sequence it — is ideal for looking for new lineages of life because you can scan through that data and look for completely novel lineages of organisms by building phylogenetic trees of genes to study the tree of life," Eisen says. "That is what we did."
Much of the discovery of the diversity of organisms on Earth has focused first on cultured organisms because they could be grown in the lab. Twenty years ago, Eisen says, PCR opened a window into the diversity of organisms, but that approach was heavily biased toward finding variants of organisms that had already been discovered, as PCR primers are designed to amplify known DNA stretches from bacteria, archaea, or eukaryotes, rather than to look for a possible fourth branch on the tree of life.
The principal finding from Eisen's seven-year metagenomic search for the fourth domain was a set of novel lineages in the three gene families commonly used in phylogenetic studies, specifically the existence of multiple novel branches in the recA and rpoB gene families. "If you take phylogenetic marker genes that are used to study the tree of life and you build phylogenetic trees of those from cultured organisms as well as random sequences from environmental samples, you find evolutionary lineages and environmental data that are distantly related," Eisen says. "Now what that means, we have no idea." It could be that there are versions of these gene families that are evolutionarily distinct from those that are known in bacteria, archaea, and eukaryotes. But in the end, Eisen's team does not know whether these novel genes are from a fourth branch on the tree of life or not, because viruses also sometimes encode homologs of these genes.
The disadvantage of a metagenomic study is that it can be tough to figure out which genes came from which cell or organism. "Right now, our hypothesis is that they're either from a fourth branch in the tree or they're from phylogenetically unusual viruses," Eisen says. He hopes that other researchers will take his work and build on it — perhaps using methods that weren't available seven years ago, such as high-resolution single-copy gene fluorescence in situ hybridization — to pinpoint where these novel genes come from. "Walking your way around the genome, additional sequencing, and microscopy experiments — any of those would help resolve this. Doing viral enrichments, my bet is that's where you're going to see these things, but we don't know."
When Eisen and his team first made their discoveries in 2004, they were all very surprised by what they found; they were not prepared to see such diversity in the environment. Indeed, part of the seven-year delay in publishing came from wanting to make sure that the novel sequences were not artificial or created by the alignment methods the researchers were using. They did so by building trees from alignments generated by multiple alignment methods and restricting their analyses to lineages that included more than two or three sequences to exclude single "weird" sequences, Eisen says.
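The kind of filtering Eisen describes can be pictured with a short sketch. It is only an illustration, not the team's actual pipeline: the function name, the method labels, and the sequence IDs are hypothetical, and two rules are assumptions made here. The first is that a lineage counts only if it appears under every alignment method. The second is that "more than two" sequences is the cutoff, where the article says "more than two or three."

```python
def robust_lineages(clades_by_method, min_size=3):
    """Keep clades found under every alignment method and having at least
    min_size member sequences. Both rules are assumptions for illustration."""
    # One set of clades per alignment method; frozenset makes clades hashable.
    clade_sets = [
        {frozenset(clade) for clade in clades}
        for clades in clades_by_method.values()
    ]
    if not clade_sets:
        return set()
    # A clade survives only if every method's tree contains it.
    shared = set.intersection(*clade_sets)
    # Drop singletons and pairs, which could be one-off "weird" sequences.
    return {clade for clade in shared if len(clade) >= min_size}


# Hypothetical clades read off trees built from two different alignments.
clades_by_method = {
    "method_a": [{"s1", "s2", "s3"}, {"s4", "s5"}, {"s6"}],
    "method_b": [{"s1", "s2", "s3"}, {"s4", "s5"}, {"s7", "s8", "s9"}],
}
kept = robust_lineages(clades_by_method)
```

In this toy input, only the three-sequence group that both methods agree on is kept. The pair is too small, and the other groups appear under just one method.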
But now, seven years later, Eisen says he's no longer surprised by what his team found. As part of Craig Venter's Global Ocean Sampling survey, Eisen saw that even one small boat trip to 40 collection sites around the world increased the phylogenetic diversity of gene families, sometimes by an order of magnitude. And that, he adds, was just the ocean samples. Researchers have yet to make detailed searches of hot springs, volcanoes, salt flats, the human gut, or even the very air we breathe.
"The diversity of the functionally interesting genes is enormous in environmental samples," Eisen says. "Why? One possibility is that there is a lot of diversity of phylogenetic organisms out there that we don't know anything about. Phylogenetically novel organisms encode diverse genes. The sequence diversity in viruses is [huge], and in recent studies people have found that viruses seem to encode ecologically important genes as well — photosynthesis genes and nitrogen metabolism genes. And what we know about the genomic diversity of viruses is nothing, basically. So again, scanning through metagenomic data — 'Oh my gosh, we've found weird stuff' — in retrospect, I'm not surprised."
The sequence diversity that's present may or may not correspond to functional diversity or phylogenetic diversity, he adds, but studying it at least gives researchers a shot at understanding it. "I would bet that what we found are weird viruses, but I think it's entirely possible that there is a fourth branch of organisms out there. There could be even more branches," Eisen says. "Why not?"
Back to basics
At the Université de Lyon, in France, Manolo Gouy and his grad student, Matthieu Groussin, decided to take a different approach to studying the evolution of archaea. In a paper published in Molecular Biology and Evolution in April, the pair says that adaptation to environmental temperature is a major determinant of rates of molecular evolution in archaea. Temperature has been shown to have an effect on archaeal genomes — thermophilic organisms, those that thrive at high temperatures, generally display lower evolutionary rates than mesophilic organisms, those that thrive in moderate temperatures, Groussin and Gouy write in their paper. In mesophilic lineages, an increase in the rates of gene substitutions could be interpreted as an ongoing adaptation to colder temperatures, they add. The paper, Groussin says, has received a lot of attention, likely because it provides an explanation for a feature that had long been identified — that the rate of molecular evolution varies a lot — but for which the reasons have so far seemed elusive.
Rather than use paleontological and geological studies to infer the ancestral conditions of life, Gouy and Groussin used a non-homogeneous model of the evolutionary process of molecules. "We inferred ancestral frequencies along the phylogenetic tree of archaea, and based on this estimation of ancestral frequencies, we inferred the corresponding ancestral temperature," Groussin says. "There is a linear correlation and linear relation between temperature and composition, classically called the 'molecular thermometer' in articles, and so that was the main point at first."
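The idea of a linear "molecular thermometer" can be sketched in a few lines. This is a minimal illustration, not the model Groussin and Gouy used: it assumes that a single composition score per organism has already been computed, and it fits an ordinary least-squares line to calibration values that are made up for the example. The function names and every number below are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_temperature(composition, slope, intercept):
    """Read a temperature off the fitted line for a given composition score."""
    return intercept + slope * composition


# Made-up calibration set: composition score vs. growth temperature (deg C).
composition = [0.30, 0.35, 0.40, 0.45, 0.50]
growth_temp_c = [30.0, 45.0, 60.0, 75.0, 90.0]

slope, intercept = fit_line(composition, growth_temp_c)

# Hypothetical composition inferred for an ancestral node of the tree.
ancestral_temp = predict_temperature(0.42, slope, intercept)
```

The line is calibrated on organisms whose growth temperatures are known, then applied to a composition inferred for an ancestor. Real data would scatter around the line, so any such estimate would carry uncertainty that this sketch ignores.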
Once that was done, he adds, the researchers realized that the variation of the composition from the ancestral archaea to the species was tightly related to the evolutionary rates of the molecules, leading to the conclusion that the major determinant of these evolutionary rates is the environmental temperature. "This is very basic science. This is a question of the history of life, the general rules of the evolution of life, and these are events that happened over several billion years of evolution," Gouy says.
Between basic phylogenetic science and the discovery of new bacterial genes lies the work of the University of Maryland's Todd Treangen. Then at the Institut Pasteur in France, Treangen and his colleague, Eduardo Rocha, published a paper in PLoS Genetics in April that challenged what researchers had traditionally thought about the mechanisms of bacterial evolution. In their study, Treangen and Rocha found that horizontal gene transfer, and not gene duplication, drives the expansion of protein families in prokaryotes. Duplicated genes tend to be more transient and evolve more slowly than transferred genes, showing that each pathway serves a different purpose for the bacterium, the researchers say, adding that prevailing theories about bacterial evolution should be changed to accommodate this new finding.
"The reason we started looking at this problem is [that gene duplication is] known to play an important role in the evolution of prokaryotes," Treangen says. "And at the same time, it's known that horizontal gene transfer is widespread. But there was a question in the community if the contribution of gene duplication outweighs horizontal gene transfer in prokaryote protein families." There has been an influx of bacterial genomes on which to conduct these studies, he adds, allowing researchers to take a wide variety of bacteria into account when studying the evolution of that domain.
"Given eight clades of closely related prokaryote genomes, 110 genomes in total, we basically started by identifying and removing potential sources of ambiguity that plagued previous studies, and we focused mainly on recent expansions in protein families," Treangen says. "We took the sequence of known genomes and the corresponding set of proteins and we did some similarity searches. Then once we had this set of expansions, we were able to pinpoint if these events were horizontal gene transfer or gene duplication." What they found was that the "vast majority" of protein family expansions were, in fact, due to horizontal gene transfer, he adds. The xenologs (transferred genes) persist longer, possibly due to a higher adaptive role, while the paralogs (duplicated genes) are expressed more, evolve more slowly, and show more protein-protein interactions.
"We're just trying to figure out the difference in what purpose they serve in prokaryotes," Treangen says. "Is it that gene duplications are more transient and only important in specific environmental stress and then go away, or are horizontal gene transfers more adaptive to provide longer term benefit? Gene duplication is a much longer path to acquiring something that could be useful to bacteria that could be acquired through horizontal gene transfer, which could confer some instant benefit and have a longer adaptive role." Although they had a hunch that their hypothesis was correct, Treangen adds, he and his colleagues were surprised by how much more prevalent horizontal gene transfer appears to be in bacterial evolution.
Not done yet
As diverse as studies of phylogeny and evolution are, there is one similarity that ties most of them together: researchers have only just scratched the surface, and each question answered creates others.
Lyon's Groussin and Gouy are planning to follow several avenues of research on the ancestral reasons for bacterial and archaeal evolution. "We are still interested in developing new methods to reconstruct the deep phylogenies of major domains of life, and to do so we have to improve our model to take into account different biological characteristics of evolution," Groussin says. "We then want to apply these new methods to infer ancestral conditions. We also have ideas of inferring the solar conditions of life, concerning the pressure at which the species live and form, so we are still interested in these deep evolutionary events."
The study of more recent evolutionary events, like the work of Maryland's Treangen, could help researchers determine the mechanisms by which bacteria evolve to resist antibiotics. While he hesitates to go too far beyond the basic science, Treangen says his and Rocha's discovery of the prevalence of horizontal gene transfer in microbes could have an impact on human health.
Still, UC Davis' Eisen says researchers are only starting to ask relevant questions — whether it's to determine if there really is a fourth domain, or just to catalog and sequence as many organisms as possible. "You could estimate that the number of niches for microbes on the planet is a quadrillion, and we've sampled 30 of those," he says. "We know nothing."