In an attempt to solve Charles Darwin's "abominable mystery" of the origin of flowering plants, the New York Plant Genomics Consortium has created what it calls the most comprehensive evolutionary map of seed plants to date. The project (a long-standing collaboration among the New York Botanical Garden, the American Museum of Natural History, Cold Spring Harbor Laboratory, and New York University) has resulted in a paper, published in PLoS Genetics in December, that maps the evolution and gene structure of 150 species of plants. "We've put together the most robust and complete evolutionary tree of plant [life] yet," says CSHL's Rob Martienssen. "There weren't really any huge surprises in the tree, and I think it agreed with previous versions of the evolutionary tree, which is a good thing."
Flowering plants arose at a discrete time in the fossil record, Martienssen says, and appear to have blossomed into thousands of different species right away, hence Darwin's mystery. To tackle it, the team used an approach it calls functional phylogenomics. Generally, researchers use standard sets of genes to create phylogenetic trees. With its new approach, the team instead built a phylogeny based on all the genes in the plants' genomes, and then used that tree to statistically identify which genes and proteins were most important in the evolutionary divergence of each species. "One of the most important splits is between gymnosperms, which are primitive seed plants from the Triassic, and flowering plants or angiosperms," Martienssen says. "So we looked at 150 different taxa and something on the order of 10,000 different genes across all those taxa and asked the question, 'Which of those genes contributed most to the difference between all angiosperms and all gymnosperms?'"
The answer to this question is where Martienssen found the biggest surprise. Of the 300 genes that the researchers determined contributed most to the split, first on the list was POL4, a gene required for RNA interference. "We already knew that the small RNA constitutions of gymnosperms and angiosperms are very different," Martienssen says. "Gymnosperms lack a class of small RNAs 24 nucleotides in size that come from transposons and repeated DNA, whereas angiosperms have them in abundance. And these small RNAs depend on POL4 and POL5 for their biosynthesis." The result made sense, he adds, and showed the researchers how powerful their methodology could be in identifying the genomic basis of well-known differences between plant groups. "After looking at this mountain of data, literally millions of data points, we end up with one gene at the top of the list that really gives us insight into the difference between angiosperms and gymnosperms and Darwin's problem," Martienssen says. "I was really blown away by that."
The team's work is not over. Most of the 300 genes that the researchers say contributed to the split between angiosperms and gymnosperms have no known biological function. And the tree is set up so that each time a new plant genome is sequenced, its data can be added and analyzed in the context of the existing phylogeny. "It's a scalable analysis and that's really exciting because there are so many genomes coming out, almost every week now it seems," Martienssen says. "We can assign a whole bunch of genes to a branch and say 'These are the genes that matter.' And that's potentially really, really powerful."