This article has been updated to clarify how EvoCor works and the developers' next steps.
NEW YORK (GenomeWeb) – Researchers at Virginia Polytechnic Institute and State University have published a paper in Nucleic Acids Research describing a freely available web server they developed called EvoCor (a portmanteau of the words 'evolution' and 'correlation') that identifies functional relationships between genes based on phylogenetic profiles and gene expression patterns.
The web-based server was developed by a team led by Gregorio Valdez, an assistant professor at the Virginia Tech Carilion Research Institute. The aim was to give his team and other researchers a less expensive and less time-consuming way of identifying functional relationships between genes, one that makes use of publicly available data and does not depend on tedious biochemical and molecular assays, he told BioInform this week. Given a gene of interest, they wanted to be able to identify genes with similar functions that occur in the same pathway alongside, upstream, or downstream from the input gene and have an effect on its activity.
Their particular interest was in researching genes that are involved in repairing and forming synapses. Valdez's lab studies how synapse structure changes with aging, what molecules precipitate that change, and what role synaptic changes play in neurological conditions such as amyotrophic lateral sclerosis. ALS has a few known associated genes but these only account for about 10 percent of the total incidence of the disease, he said by way of example. That means that there likely are "a lot of genes out there that we haven't got a handle on and we wanted to see if there was an unbiased way to come across additional genes and [that's] what led us to this tool."
To use EvoCor, scientists simply enter the gene name into the search box and the system searches through the information it has on the evolutionary history of mapped genes and compares the expression patterns of the input gene with those of other genes in the database. Queries to the system return lists of candidate genes that function together with the query gene to drive the particular cellular process being studied.
EvoCor takes advantage "of nearly 200 organisms with fully sequenced genomes to map out and compare the evolutionary history of all human genes," James Dittmar, a fourth-year VT School of Medicine student and the first author on the NAR paper, explained in a statement. The data that the server searches comes from the public repository maintained by the National Center for Biotechnology Information and includes data from both human and mouse cell lines.
At its core, EvoCor works in two phases, Valdez explained. First, it identifies genes whose evolutionary history is similar to that of the query gene. "The reasoning there is [if genes] show a similar pattern of evolution … then maybe cells have forced those genes to evolve in a similar pattern so that they can continue to serve the function, to effect the same biological output," he said.
Specifically, in this step, the system looks at how a protein coded by the query gene has changed in about 182 organisms from the NCBI database. EvoCor has a precomputed matrix of evolutionary patterns for all the proteins encoded by genes in the NCBI database. So, when users type in a gene of interest, the system compares the evolutionary pattern of the query gene to the patterns in its matrix and looks for similarities.
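As a rough illustration of this first phase, the comparison against a precomputed matrix can be sketched as follows. This is a minimal sketch under stated assumptions, not EvoCor's actual code: the article does not say which similarity measure the server uses, so Pearson correlation is assumed here, and the gene names, the five-organism profiles, and the function names `pearson` and `rank_by_profile` are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (assumed measure)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / math.sqrt(var_x * var_y)

def rank_by_profile(query, profiles, top_n=2):
    """Rank all other genes by how closely their profile tracks the query's."""
    query_profile = profiles[query]
    scores = [
        (gene, pearson(query_profile, profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Toy stand-in for the precomputed matrix: one row per (hypothetical) gene,
# one column per organism, each value a made-up protein similarity score.
profiles = {
    "GENE_A": [1.0, 0.8, 0.6, 0.4, 0.2],
    "GENE_B": [0.9, 0.7, 0.5, 0.3, 0.1],
    "GENE_C": [0.2, 0.4, 0.6, 0.8, 1.0],
    "GENE_D": [1.0, 0.9, 0.5, 0.45, 0.2],
}

ranked = rank_by_profile("GENE_A", profiles)
for gene, score in ranked:
    print(f"{gene}: {score:.3f}")
```

In this toy data, GENE_B and GENE_D rise and fall across organisms much as GENE_A does, so they rank highest, while GENE_C follows the opposite pattern and is left out of the top two.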
EvoCor then applies gene expression information to refine the candidate list of genes generated in the first step. Essentially, in this step, the system asks "among all these genes that share a similar evolutionary pattern, which one of those actually show a similar expression pattern as the query gene. It then puts out those genes that evolved and are expressed together in the final output," Valdez explained.
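The refinement step can be pictured as a filter over the candidate list. The sketch below is illustrative only: the correlation measure, the 0.8 cutoff, the gene names, the expression values, and the function name `refine_by_expression` are assumptions made for the example, since the article does not describe how EvoCor scores expression similarity.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (assumed measure)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def refine_by_expression(query, candidates, expression, min_r=0.8):
    """Keep only candidates whose expression correlates with the query's.

    min_r is a hypothetical cutoff chosen for this example.
    """
    query_expr = expression[query]
    kept = []
    for gene in candidates:
        r = pearson(query_expr, expression[gene])
        if r >= min_r:
            kept.append((gene, r))
    return kept

# Made-up expression levels across four samples for hypothetical genes.
expression = {
    "GENE_A": [5.0, 7.0, 9.0, 4.0],
    "GENE_B": [5.5, 7.2, 9.1, 4.3],
    "GENE_D": [9.0, 4.0, 5.0, 8.0],
}

# Candidates that passed the evolutionary-history comparison.
candidates = ["GENE_B", "GENE_D"]
kept = refine_by_expression("GENE_A", candidates, expression)
for gene, r in kept:
    print(f"{gene}: {r:.3f}")
```

Here GENE_D shares an evolutionary pattern with the query but is expressed in an opposing pattern, so only GENE_B survives into the final output.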
For their next steps, the researchers intend to add tools to the platform that will allow users to incorporate their own expression datasets if they choose. Furthermore, Valdez said, they'll add in new datasets, such as transcriptional data from single cells, to improve EvoCor's ability to predict the network of genes that make each cell unique.
What the server provides now is a general list of genes that, together with the query gene, drive a particular cellular process, gathered from the genomes and gene expression datasets in the NCBI database, Valdez explained. For example, if a user studying a gene in neurons enters that gene into EvoCor, the search results include similar expression patterns from genes that were gathered from kidney tissue, for instance, which may not be relevant to the study in question. "We want to give users more [search] options … so that they can get rid of, for example, all of the non-neuronal tissue," he said.
Meanwhile, Valdez and his team hope to use EvoCor in their efforts to identify molecules that can slow or stop cognitive and motor impairment caused by synaptic breakdown in diseases and aging. "We know of many genes that, when mutated, lead to disastrous outcomes," he said in a statement. "But these genes don't function alone … and those partners could turn out to be better targets for therapeutics."