Thanks to genomics technology, evolutionary biologists have been able to ask and answer questions the likes of which Charles Darwin would probably never have imagined. But there is still much to be done, both with all the new data coming in on long-studied models and with often-overlooked species that were previously strangers to in-depth genetic analysis. David Kingsley, a professor of developmental biology at Stanford University, is one evolutionary biologist spreading the wealth when it comes to genomics technology. Kingsley, who has been researching the stickleback fish since 1998, says that advancements in sequencing and genotyping technology over the last 10 years — not to mention the accompanying reduction in cost — have allowed biologists to direct the power of these technologies beyond the conventional fare of model organisms and concentrate on other systems out in the natural world. "In my own lab, we had done 10 years of mouse genetics, tracking down morphological traits in particular genes and chromosomes," he says. "But now we're going out in the wild and taking animals that have evolved under a full range of fitness constraints in completely natural environments and looking at the molecular mechanisms that have allowed them to colonize and adapt to new places."
When Kingsley first became interested in sticklebacks, there were almost no GenBank entries for the small distant cousin of the seahorse. "That's not a very promising basis for trying to track traits all the way down to genes and mutations," Kingsley says. "There were no genetic maps, no DNA libraries, no expressed sequence tag collections. And despite that, in a relatively short amount of time, it's been possible to take that wild organism and build libraries, EST collections, transgenic methods, genetic maps, expression arrays, and a complete genome sequence for three-spine sticklebacks." In fact, he is now in the process of sequencing dozens of three-spine sticklebacks to allow his team to compare the sequence changes in a whole range of examples where the animals have evolved in similar ways at different locations.
Many in the evolutionary biology community believe that they have only seen a glimpse of the type of data that will come down the pike once whole-genome sequencing is readily available. "The exciting thing that is happening is that we can actually plot the molecular trajectory of these evolutionary changes now," says David Haussler, a professor of biomolecular engineering and director of the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz. "We can determine that particular changes happened in the genomes of ancient species, and there was a particular evolutionary result from that, and explain it in molecular terms, or from the point of view of systems and cell biology. ... This is the tip of the iceberg of what will be a massive set of evolutionary studies that will be enabled when we have affordable whole genome sequencing."
Depending on what evolutionary question one is asking, cutting-edge sequencing platforms are not the problem in getting an answer. As long as your evolutionary questions are not focused on filling all the gaps in the genome, and having all the pieces put in the right order, current methods suffice. For some investigators, such as Jonathan Eisen, a professor at the University of California, Davis, Genome Center, the rate-limiting step is handling the informatics quagmire that many evolutionary biology projects create. "Unquestionably, informatics issues are also incredibly challenging right now because you can generate a massive amount of data with an Illumina machine — but other than a few genome cognoscenti people, most normal biologists are not used to dealing with terabytes of sequence data," Eisen says. "It's complicated because the reads are really short — you can't analyze individual reads the way you might have analyzed individual gene sequences in the past — so you have to do something where you pile all the reads together to build up some type of scaffold or other information to then make some sense out of it."
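Eisen's point about piling reads together can be illustrated with a toy coverage calculation. This is only a minimal sketch, assuming the short reads have already been aligned to a reference and that each alignment is given as a hypothetical (start, length) pair. The function names and the example reads are made up for illustration; real projects rely on dedicated aligners and assemblers, and nothing here reflects the tools Eisen's group uses.

```python
def coverage_depth(reference_length, alignments):
    """Return per-position read depth for a reference of the given length.

    alignments: iterable of (start, length) pairs with 0-based starts
    (a hypothetical input format chosen for this sketch).
    """
    depth = [0] * reference_length
    for start, length in alignments:
        # Clip each read to the bounds of the reference.
        for pos in range(max(start, 0), min(start + length, reference_length)):
            depth[pos] += 1
    return depth


def covered_regions(depth, min_depth=2):
    """Return half-open (start, end) intervals where depth >= min_depth."""
    regions = []
    region_start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and region_start is None:
            region_start = pos
        elif d < min_depth and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(depth)))
    return regions


# Four made-up short reads on a 12-base reference.
reads = [(0, 4), (2, 4), (3, 4), (8, 3)]
depth = coverage_depth(12, reads)
print(depth)                   # [1, 1, 2, 3, 2, 2, 1, 0, 1, 1, 1, 0]
print(covered_regions(depth))  # [(2, 6)]
```

No single four-base read says much on its own, but the stacked reads show which stretch of the reference is supported by more than one read, which is the kind of aggregate information Eisen describes.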
Eisen says that when it comes to developing new methods, evolutionary biologists might want to take a cue from population geneticists and, for example, look at SNPs for data on polymorphisms, which are the grist for the mill of evolution. But whatever the case, they are certainly going to need access to new computational resources. "People have in the last few years become much more fans of likelihood methods and the statistical approaches of building phylogenetic trees, which is computationally very costly compared to some other methods. If you have terabytes of data to analyze, it's not clear what to do," Eisen says. "There is no doubt that evolutionary biology is being revolutionized in part by this type of data, but there is still a bit of a lag in getting everybody up to speed on trying to figure out what methods we're going to use for these massive data sets."
Beyond sequencing and informatics, Stanford's Kingsley says that a surefire boon to his research would be the ability to make targeted genetic changes by homologous recombination. "In mice, we can reach into the genome and make almost any sequence change we want, and we would love to be able to do that for other evolutionary models," he says. "Because if you're trying to track evolutionary traits to base pairs, a method for making any base pair change you want and seeing what it does would be really, really useful. We don't have that; we have to work around it."
Unfortunately, this is a problem with no easy fix, as the biology of germ cell precursors is finicky and differs from organism to organism. One cannot take what works for a mouse and get it to work for another organism. "Unlike genomics or genetics, right now the behavior and cultural conditions for stem cell progenitors is idiosyncratic enough that the methods that have been successful in one place don't instantly translate into methods you can use for other organisms," Kingsley says.
Recently, Eisen and his colleagues unveiled a pilot project that they hope will help the community make the most of existing microbial genome data with a phylogeny-driven resource called the Genomic Encyclopedia of Bacteria and Archaea at the Joint Genome Institute. Currently available genomic resources for microbes are from a very narrow phylogenetic distribution of organisms, so the GEBA team has endeavored to create a project that goes through the evolutionary tree of bacteria and archaea to identify lineages for which there are no genomes available and that could be easily cultured in the lab. "We're going to do about 200 genomes in this pilot — but in the long run do 1,000 genomes — to basically fill in the 'dark matter' of the biological universe," Eisen says. "There is a benefit that comes from selecting phylogenetically novel organisms."
The Genome 10K Project
Those in the community focused on vertebrate evolution are also working to create their own centralized resource. Haussler and a group of roughly 57 researchers from around the world who specialize in a wide variety of vertebrate species recently proposed their vision for the Genome 10K Project, a veritable genomic zoo that they hope will eventually contain 10,000 vertebrate species. The idea for the project is based on the assumption that the cost of sequencing will become cheap enough to allow the project to be completed in the near future. "Ten thousand is a number that is substantially larger than the number of genomes we have sequenced to date, which is about 50. ... Now 10,000 sounds pretty crazy at this point, but if you actually plot the cost of genome sequencing, it has dropped by four orders of magnitude. That's an enormous drop," Haussler says. "Now it's sitting at about $30,000 per genome, so all we need is a factor of 10 improvement, and we're down to a $3,000 genome. Then you do 10,000 for $30 million plus $20 million to run the project — for $50 million, you have a totally unprecedented resource for understanding the diversity, evolutionary biology, and molecular-level essentials of vertebrates."
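Haussler's back-of-the-envelope budget can be checked with a few lines of arithmetic. The sketch below uses only the figures he quotes ($30,000 per genome today, a hoped-for tenfold improvement, 10,000 genomes, and $20 million to run the project). The function and variable names are illustrative assumptions, not part of any Genome 10K tooling.

```python
def project_cost(genomes, cost_per_genome, overhead):
    """Total cost: sequencing every genome plus a fixed project overhead."""
    return genomes * cost_per_genome + overhead


CURRENT_COST = 30_000    # dollars per genome, as quoted by Haussler
IMPROVEMENT = 10         # the "factor of 10 improvement" he hopes for
GENOMES = 10_000         # target number of vertebrate species
OVERHEAD = 20_000_000    # dollars to run the project, as quoted

TARGET_COST = CURRENT_COST // IMPROVEMENT

total = project_cost(GENOMES, TARGET_COST, OVERHEAD)
total_today = project_cost(GENOMES, CURRENT_COST, OVERHEAD)

print(f"Per-genome target: ${TARGET_COST:,}")       # Per-genome target: $3,000
print(f"Project at target price: ${total:,}")       # Project at target price: $50,000,000
print(f"Project at today's price: ${total_today:,}")  # Project at today's price: $320,000,000
```

At the quoted current price the same project would cost $320 million, which is why the plan depends on the expected drop in sequencing cost.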
Ultimately, the Genome 10K Project is intended to become a resource in which any evolutionary biologist can log in to a website, take a tour of the trajectory of DNA changes in a given region, see the protein-coding regions change along the different branches where amino acid substitutions are made, and observe the evolution of different gene duplications and what happens afterward. "In addition to proteins, you'll be able to see the evolution of the regulatory regions associated with the gene as they come and go and are sculpted by selection, so it's going to provide this raw material for the exploration of the spectacular evolutionary innovations of the vertebrate lineage," Haussler says. "We draw the analogy to the Human Genome Project because it would make such a difference for science and medicine, but nobody claimed that the human genome sequence itself would immediately solve the problems of medicine, but it is a foundation. Similarly, we're not going to solve all the great questions of evolutionary biology with the Genome 10K Project, but we will provide an essential roadmap that is absolutely necessary for their solution."
With the continual development of genomic technology, many debates and theories are put to the test, and even more questions pop up. One such debate is related to the implications in the microbial world of lateral gene transfers, in which DNA moves from one evolutionary lineage to another. The default way most people look at the evolution of species has been to build an evolutionary tree, but, for many years, it's been clear that not all species follow a tree-like pattern. A point of contention in the bacterial and archaeal world is how often lateral transfer happens and whether it requires the scrapping of the entire notion of a tree to represent speciation. "There are some people who would say that, yes, gene transfer occurs and it's so frequent that the concept of a tree is meaningless to the evolution of bacteria and archaea," Eisen says. "And there are others, like myself, who think that it is true that lateral gene transfer happens, and it is important, but it appears to not be so frequent that it wipes out all the signal of a tree, therefore a tree notion has some use."
But whatever side of the debate one is on, it has been unquestionably shown that there is a part of the genome content of bacteria and archaea that does not follow a tree-like pattern, and this was much less appreciated before genomics. Eisen and his colleagues have made an attempt at addressing this disagreement with a Nature paper published last year that shows what they believe to be conclusive evidence that there is signal in the tree, and that lateral gene transfer does not wipe out all tree-like evolution.
Another long-running debate that genomics technology is helping to address centers on how many genetic changes actually underlie the interesting evolutionary differences that are observable in wild organisms. It could be the case that evolutionary change requires thousands of adjustments of infinitesimally small effect, which is what Darwin thought. Evolution, then, would work through a countless number of genetic changes and anything big would likely be deleterious. That issue alone has about 150 years of debate behind it. "With the ability to build genetic maps of wild organisms, it's now possible to cross various forms that have evolved differences in natural environments and find out the collection of genetic alterations that produce traits," Kingsley says. "When we've done that in sticklebacks we have found something that I think would surprise Darwin, and that is that some very big evolutionary differences may be controlled by relatively simple genetic systems." Kingsley and his colleagues published a paper online late last year in Science that demonstrates that the presence or absence of an entire hind fin is mostly controlled by a chromosomal region with a regulatory alteration in a key developmental control gene.
There is also the question of how much of evolution is caused by regulatory changes versus coding changes. The ability to sequence and do genetics is now providing a whole series of examples that make it possible to approach that question for the first time. In addition, the question of selection versus drift also persists. How many of the traits that we see in animals are actually the products of adaptation? "We know at the DNA sequence level that there are a lot of base pair changes that you see in a genome that may actually be neutral changes — they do not have much phenotypic effect and they drift. Populations change in sequence not always because the sequence changes have been selected, but sometimes because the sequence changes don't matter at all — and if they don't matter, then they're free to drift," says Kingsley. "With all the sequencing being done right now, it's possible to look for signatures of molecular selection in the genomes of different organisms, such as humans who have adapted, and observe what regions of the human genome have been subject to strong recent selection, such as the evolution of disease resistance."
In the hopefully not-so-distant future, Eisen and his colleagues say they would like to have large amounts of data on the genomic variation within different branches of the bacterial and archaeal tree. "Do some have no lateral transfer and others have a lot? Do some have weird mutation processes and others do not? The thing that's exciting from an evolutionary point of view is going to be comparing and contrasting evolutionary processes in different branches, as opposed to right now where we just have a good sampling of just a couple of major branches," Eisen says. Beyond that, he is also hoping that the bacterial and archaeal community will start to resolve some of the deep branches in microbial evolution and build a better understanding of how major features have originated in bacterial and archaeal evolution. "The Gram-positive cell walls, photosynthesis, hydrogen production — we just don't have a good idea about the early evolution of those processes," Eisen says. "Plant and animal evolutionary biologists have been able to [study these features] without genomes because they have fossil and morphology and behavior, but the best way to get at these early events is to look at genomes."