Director of the Center for Evolutionary Functional Genomics,
Arizona State University.
Sudhir Kumar heads the Center for Evolutionary Functional Genomics at Arizona State University’s Biodesign Institute in Tempe, Ariz.
Bioinform recently spoke to Kumar about a $142,000 grant from the Science Foundation of Arizona to enhance TimeTree, a hierarchical knowledgebase. With information on species divergence times culled from scientific literature, it allows researchers to search for divergence times of two or more taxa or all time estimates published by a single author [Bioinform 05-14-07].
Kumar discussed everything from the history of tetrapods to the rollout of what he said will be his simple, revamped TimeTree, designed with an adaptable, Google-like user interface that even a child can use. He means that literally: the site will be targeted at elementary and high school students as well as researchers and scientists outside of the bioinformatics space.
The database is freely available online.
Following is an edited version of an interview conducted this week with Kumar via phone and e-mail:
What goes on [at the Biodesign Center for Evolutionary Functional Genomics], and what’s a typical day like?
In EFG, as the name suggests, we do evolutionary functional genomics, which is primarily looking at genomes and trying to understand the functions of different genomic parts. As you know, genomes are really large, especially the human genome, and we only understand certain parts. We take a comparative approach where we compare the genome of humans with those of other species, which is an evolutionary approach, and then [we are] able to tell which parts of the genomes are highly conserved or unchanged over time.
Basically [what we do here] is about looking at parts that have not changed versus parts that have changed, and this is the field of evolutionary genomics. And today if you look at the Human Genome Project [or] at major labs studying human genomes, they are taking this type of approach. They are comparing the human genome to other [species’ genomes] and IDing similar parts across species versus very different parts.
On April 10, Biodesign announced the award of a $142,120 grant to be used to expand the relational database you designed, TimeTree. What is TimeTree and how will the new, expanded version differ from the old?
We develop a lot of methods for identifying portions of the genome that are changing quickly or not at all, and trying to understand how that relates to function. Will that have functional consequences? My group, for example, develops a software tool [called] Molecular Evolutionary Genetics Analysis, [which] is for taking [sequence] data from different strains of influenza, HIV, from different species and building phylogenetic trees, relationary trees.
You can take data from any of these sources and analyze it and look for relationships between strains. … A lot of people use the tool.
TimeTree is a knowledgebase of information about species divergence. As the name suggests, it is when did species A and B have a common ancestor? So one can ask, ‘When did cats and dogs have a common ancestor in the past?’ If they want to know the time estimate, they usually have to go to the literature … and try to make sense of species name and family name [for example].
We have programmed the literature in a database, [and included, say,] any paper that has information that bears on cat and dog time of divergence, as estimated using the molecular clock principles. People have estimated times of species divergence using five genes, twenty genes, 100 genes in their papers, and we take all these papers and program them in a database and provide the user with a simple interface so the user can pose their query in [simple terms such as] … cat and dog.
The system goes in and finds all the time points that bear on that problem, spits them out to the user in a format so they can see all the evidence, and provides an average time estimate from all studies.
In the first release, we [included information from] 70 papers, [for what was the] pilot project, from various sources, from recent to old, a random sample. They are for four-legged animals, tetrapods, because they are what people see around them commonly, and [we] programmed them in as a proof of concept. Using the [new] funding we are adding 100 more studies to the database. So basically it is the same set of species, but more [information will be] added. In the future we are interested in adding to the tree of life completely. … There are studies outside of [the tetrapod] species.
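The lookup Kumar describes, in which a query for a pair of species returns every published time estimate along with an average across studies, might be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the function names, the taxon labels, and the numbers are all hypothetical placeholders, not TimeTree's actual schema, code, or data. It also ignores the hierarchical matching of taxa that the real knowledgebase would need.

```python
from statistics import mean

# Hypothetical records: (taxon_a, taxon_b, estimate in millions of years, source).
# Every value here is a placeholder for illustration, not real TimeTree data.
RECORDS = [
    ("taxon_a", "taxon_b", 50.0, "study_1"),
    ("taxon_b", "taxon_a", 60.0, "study_2"),
    ("taxon_a", "taxon_c", 90.0, "study_3"),
]


def divergence_times(records, first, second):
    """Return every (estimate, source) for the pair, regardless of name order."""
    pair = frozenset((first, second))
    return [
        (mya, source)
        for a, b, mya, source in records
        if frozenset((a, b)) == pair
    ]


def summarize(records, first, second):
    """Collect all estimates for a pair and average them; None if no study matches."""
    estimates = divergence_times(records, first, second)
    if not estimates:
        return None
    return {
        "estimates": estimates,
        "average_mya": mean(mya for mya, _ in estimates),
    }


if __name__ == "__main__":
    print(summarize(RECORDS, "taxon_a", "taxon_b"))
```

Treating the pair as an unordered set means a query for "taxon_a and taxon_b" finds studies that listed the two names in either order, which matches the idea of a user simply typing two species names.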
When did TimeTree, the Web site, first go live?
Good question. It was June of last year.
Why do you think the Science Foundation funded this project?
I think the reason is, [to] my way of thinking, [that] the Science Foundation has an interest in catalyzing scientific information at a fundamental level as well as at [the] applied research [level]. They are in that range. So what we are doing in TimeTree is basically making this information accessible to almost anyone who can pose the question; they don’t have to be an expert in evolutionary biology … It is the accessibility aspect, which brings knowledge to the general audience, including scientists who aren’t evolutionary biologists, students and kids.
What are the commercial implications of this work?
It will be available free of charge to everybody for every purpose. It’s a web site, but the back end is very sophisticated … the same thing is true for our MEGA software package, which has more than 50,000 downloads. That is another one that is also free of charge because it is developed using [National Institutes of Health] and [National Science Foundation] funding. Commercialization of software is a complex thing [in general] … [but there will be] nothing additional [to do or to be charged] — just like Google.
What’s next? Where do you go after TimeTree?
Once we finish the TimeTree based on genome data, we will start to add information about times of species divergence, when they are available, from fossils and other sources.
What is your timeline for completion?
The Science Foundation funding is for one year to complete the whole expansion, so that would be launched in less than one year. We are now entering the data. … Our next step is to request NSF funding to build automated tools and systems to actually fetch all the data from all the species that exist from all the literature … and expand the interface, the scope and so forth, and do many other things.
One of the goals of the project is to make TimeTree accessible for school-age children. Is this part of the revamping process, and if so, how exactly are you going to make it quite so user-friendly?
The idea is the same database [will be used for young students and adults alike], but on the front end there is a separate section for kids so they can see their favorite petting zoo animals and sort of explore their relationship to each other and [see, for example,] when did they have a common ancestor? They might see a goat and a bunny, for example, and [the information is] provided in an accessible way. You can see when did a rabbit and a sheep or a goat have a common ancestor? They can relate [it] to their everyday life.
The display has to be simple and not all the papers in every publication [are included in the simpler version], but rather a summary. … [It takes] an average of 20 genes to tell that rabbits and goats diverged 90 million years ago. So that means genetically they became distinct species. They might not be very different morphologically; their anatomy may be very similar. But they become different species … they might have been identical in almost every way; so the question is what is the reason for speciation? There is a lot of discussion on this. Why would they diverge into two different species?
[I wrote papers] 10 years ago [detailing how] … 90 million years ago, a lot of continents were splitting up, and as a continent was divided into two different continents … members of a group were stranded on one land mass and others on another so they could not reproduce. They were similar looking, but … they could no longer cross with each other. … [It was] speciation by continental breakup.
[I wrote about this] in Nature. The first paper came out in 1996, then another in 1998.
The main finding we had at that time is that major groups of mammals had a common ancestor, and divergence [actually occurred] almost 100 million years ago rather than 65 million years ago as people thought.
[Our] common ancestor[s] included … progenitors of all primates, rodents, all cow groups, camels, et cetera. The important thing was we indicated five major lineages of mammals that had already diverged by 90 million years ago … All the primates we see today, all the African mammals we see today, all the rodents … We did not have rabbits obviously.
Why is it obvious that we didn’t have rabbits?
There isn’t any fossil record that shows anything like a rabbit. 90 million years ago, dinosaurs were roaming the earth … [and] mammals were really small because they were hiding. … It is shown in the fossil record that after the dinosaurs’ extinction, mammals started to grow in size. The mammals we see today are here because of the steady growth.
Is this the only such database [in the field]? If not, what sets TimeTree apart?
It’s the only kind of [species divergence database] that synthesizes literature and brings primary research to the fingertips of researchers who are not evolutionary scientists [as well as to a] general audience.
[So] no, [there are] not really [others like it] because most [databases] have information about sequences. Some have [information] about the tree of life, but none of the databases are actually programmed in a way that can be accessed for a specific purpose.
In a knowledgebase like TimeTree, the focus is on allowing people to identify a pair of species and bring [together] all the results people can [gather. The] options are very straightforward and specific, and the idea is to give information to a general audience. … Researchers also will have access to data in a more complex [manner].
The primary interface is for general users, undergrads, molecular biologists, kids. Think about Google. You have a single line where you can enter your query and get a bunch of results. We will keep it as simple as possible.
So how many hits a day are you getting on the site right now, before the enhancement is through?
At present, we are getting 150 hits per day.
What else can you tell me about the technology involved in this? Any particulars about the database, such as its size, capacity, et cetera?
The database currently contains results from 70 studies, with an additional 100 studies to be added. There are more than 1,000 divergence times in the database today, which will expand to 3,000 with the next edition.
How else are you funded besides by the Science Foundation? Is this the only grant supporting TimeTree?
At present, the Science Foundation of Arizona is our primary sponsor. However, the development of the initial TimeTree website was sponsored by the National Science Foundation.