Organizers of the Computational Bridge to Experiments, or COMBREX, project are seeking a stable source of funding so that they can keep the multifaceted genetic resource alive.
The project, which is managed and run by Boston University, was set up about two years ago to build and support a community of experimental and computational biologists who would work on improving current understanding and annotation of microbial genes.
The COMBREX project was set up to provide small grants to fund experiments aimed at validating computationally predicted gene functions; to serve as a host for high-quality functional predictions; and to provide traceable gene annotations.
The project kicked off in late 2009 with $4 million in Recovery Act funds from the National Institute of General Medical Sciences. But that source has now dried up and COMBREX only has enough funds in its coffers to sustain "basal" operations for three to four months unless new funding agencies step in, Richard Roberts, chief scientific officer of New England Biolabs and one of the project's founders, told BioInform in a recent interview.
Meanwhile, the group has already exhausted the award money that it allocated to researchers to validate gene function predictions experimentally, although it still has about a dozen pending proposals from interested groups, Martin Steffen, an assistant professor of biomedical engineering and pathology and laboratory medicine at Boston University and a founder of COMBREX, told BioInform this week.
Roberts said that the development team has considered adopting some kind of commercial model, as groups like the Kyoto Encyclopedia of Genes and Genomes and the Arabidopsis Information Resource did when they lost their funding (BI 5/27/2011 and 3/19/2010). However, "companies are not feeling that generous at the moment and maintaining company support over the long haul is quite difficult," he said.
Roberts said he is working to secure support from agencies like the Department of Energy as well as from the National Science Foundation.
'Not Another R01'
NEB's Roberts first proposed the idea of creating a community that predicted and validated gene functions in a commentary published in PLoS Biology in 2004.
More recently, the development team described the resource in a paper published in Nucleic Acids Research.
The PLoS Biology paper, Roberts told BioInform, highlighted the fact that although a lot of sequence data was being generated, there was "very little attempt being made to work out the functions of the genes that were being found." It also proposed a mechanism for providing small grants to laboratories to test predicted gene functions in microbial genomes, initially as a proof of concept.
The idea was to build and make publicly available a database of predictions and "then to find biochemists" who already have the necessary reagents and expertise in house, "who will test the functions of the genes that they are already specialists in," he explained.
COMBREX, which grew out of this initial concept, set out to provide grants of about $5,000 to $10,000 to pay for the "incremental costs" associated with testing computationally predicted gene functions. Roberts explained that this was a way of "spending relatively little money, getting experts interested and involved in the project, and in a way getting some supplements to their granting funds."
It also provided a mechanism for large funding agencies to give away small grants without having to deal with the high associated administrative costs, he said.
Researchers interested in validating specific genes would submit applications in which they would provide the rationale for the experiment; a summary of previous literature about the gene; proposed experimental procedures; references to support the lab's experience with the proposed assay; and an estimated budget.
The two-year, $4 million NIGMS grant that got the project off the ground came from "Grand Opportunity" funds provided by the 2009 American Recovery and Reinvestment Act as part of the Obama administration's stimulus plan.
That money was used to set up the publicly available database, which currently contains both experimentally determined and computationally predicted functions for more than three million microbial genes from more than 1,200 sequenced microbial genomes. The data are drawn from resources such as the National Center for Biotechnology Information, EcoCyc, and UniProtKB, as well as from submissions by individual laboratories.
So far, COMBREX has received gene prediction submissions from researchers at institutions like the University of Maryland, Columbia University, and the University of California, Berkeley.
To date, COMBREX has provided several grants to researchers to experimentally validate computationally predicted functions of specific gene products. The group has awarded grants to researchers at the Georgia Institute of Technology, Yale University, the University of Miami, and the Sanford-Burnham Medical Research Institute, among others.
Users can search for information based on gene names, descriptions, predictions, and identifiers. The database also includes color-coding systems that differentiate between genes whose functions have been determined experimentally and those that were predicted computationally.
The database also includes a "traceable annotation system" for gene function prediction in which "every stated functional annotation is either experimentally determined, or is a prediction explicitly linked through a chain of evidence to an ultimate source of information," according to the website.
Put another way, "we are providing the justification for this annotation," Steffen said. "That’s a cultural change we are trying to instill on the computational end. NCBI has now started to include these traceable statements ... and this allows the individual reviewer to judge for themselves whether they think the functional assignment is valid or not because the information is right there."
Historically, functional annotations have been based on similarities between sequences, he explained. For example, a researcher might say, 'Attribute function Y to gene X.' A few months later another researcher might identify a second gene that looks similar to X and as a result assume that they have the same function.
Furthermore, a third researcher might associate function Y with a different gene that looks similar to the second gene but not the first gene, he continued. "As you get further and further away ... you lose the original relationship [because] nobody kept track of the order or the sequence ... That has led to a propagation of error in the databases."
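The chain-of-evidence idea Steffen describes can be illustrated with a minimal sketch. This is not COMBREX's actual data model; the `Annotation` class, its field names, and the helper functions are hypothetical and chosen only for illustration. The sketch assumes each predicted annotation records the annotation it was derived from, so that any functional call can be followed back to its ultimate source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """One functional call for a gene (hypothetical structure)."""
    gene: str
    function: str
    evidence: str  # "experimental" or "predicted"
    source: Optional["Annotation"] = None  # annotation this call was derived from


def trace(annotation: Annotation) -> List[Annotation]:
    """Follow the chain of evidence from an annotation back to its origin."""
    chain = []
    seen = set()
    node = annotation
    while node is not None:
        if id(node) in seen:
            raise ValueError("circular chain of evidence")
        seen.add(id(node))
        chain.append(node)
        node = node.source
    return chain


def is_traceable(annotation: Annotation) -> bool:
    """True only if the chain ends at an experimentally determined function."""
    return trace(annotation)[-1].evidence == "experimental"


# The scenario from the article: function Y is assigned to gene X,
# then copied to a second and a third gene by sequence similarity.
gene_x = Annotation("X", "function Y", "experimental")
gene_2 = Annotation("gene2", "function Y", "predicted", source=gene_x)
gene_3 = Annotation("gene3", "function Y", "predicted", source=gene_2)
# A prediction whose origin was never recorded.
orphan = Annotation("gene4", "function Y", "predicted")

print([a.gene for a in trace(gene_3)])
print(is_traceable(gene_3), is_traceable(orphan))
```

In this sketch, the call for the third gene can still be followed back through the second gene to the experiment on gene X, whereas the orphan prediction cannot. Untraceable calls of that kind are what allow errors to propagate unnoticed.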
To address this problem, COMBREX's developers have begun working on a "gold standard" gene database that contains a set of manually curated genes whose products have been experimentally verified, NEB's Roberts told BioInform.
These proteins would serve as a benchmark for proteins whose functions aren't known and could help correct erroneous functional annotations showing up in resources like GenBank, he said.
The database, which is being developed in conjunction with researchers at UniProt and NCBI, is expected to help researchers distinguish genes whose functions have been experimentally determined from those with assumed functions or functions based on structural similarities.
Both NCBI and UniProt will soon have a downloadable version of the gold standard data set on their sites, but for now, this information can only be obtained from the COMBREX site.
Additionally, the group is working on developing algorithms that can identify novel or interesting genes and predict functions that can then be tested in experiments.
This "gene recommendation system" uses a set of algorithms to select candidates for validation by ranking them based on their scientific, clinical, medical, or industrial impact in an effort to help researchers spend their limited research dollars validating the most "interesting" targets, Simon Kasif, a professor of bioengineering, bioinformatics, and computer science at BU and a co-founder of COMBREX, told BioInform.
These algorithms "formalize criteria that biologists normally use ... to guide experimentalists on selecting which genes they want to validate and test for function," he explained. "Nothing like that was ever done before."
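The article does not describe how the recommendation algorithms work internally, so the following is only a rough sketch of what ranking candidates by impact could look like. It assumes each candidate gene carries a score between 0 and 1 for each of the four impact dimensions Kasif names, combined as a weighted sum. The `Candidate` class, the scores, and the weights are all hypothetical.

```python
from dataclasses import dataclass

# Impact dimensions named in the article; equal weights are an assumption.
DEFAULT_WEIGHTS = {
    "scientific": 1.0,
    "clinical": 1.0,
    "medical": 1.0,
    "industrial": 1.0,
}


@dataclass
class Candidate:
    """A gene proposed for experimental validation (hypothetical structure)."""
    gene: str
    impact: dict  # hypothetical per-dimension scores in [0, 1]


def score(candidate, weights=DEFAULT_WEIGHTS):
    """Weighted sum of impact scores; missing dimensions count as zero."""
    return sum(w * candidate.impact.get(dim, 0.0) for dim, w in weights.items())


def rank(candidates, weights=DEFAULT_WEIGHTS):
    """Return candidates ordered from highest to lowest combined impact."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


candidates = [
    Candidate("geneA", {"scientific": 0.2}),
    Candidate("geneB", {"scientific": 0.9, "clinical": 0.8, "medical": 0.7}),
    Candidate("geneC", {"scientific": 0.5, "industrial": 0.6}),
]

ranked = rank(candidates)
print([c.gene for c in ranked])
```

Changing the weights changes the ordering, which is one simple way a system of this kind could reflect different priorities for different funders or research communities.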
Gene rank also plays a role in how grants are allocated because "the importance of a gene family is [one] criteria in funding," Kasif said.
"We prefer experiments on large protein families" because "when they are done, they will propagate protein function to the largest number of other proteins that have been sequenced," he said. COMBREX also favors projects that look at genes found in multiple pathogenic organisms in order to explain their pathogenicity or antibiotic resistance.
This approach could also help researchers keep pace with genome sequencing activities, Steffen noted.
"You could argue that we are never going to catch up to genome sequencing. It's so fast, so cheap, so high-throughput that you can never get to the point where you are testing each gene for its precise function and so one of the goals is to use carefully selected experiments so that you can make better predictions in the hopes of keeping up better," he said.
For now, COMBREX still focuses on gene functions in prokaryotes but its developers are hoping to take on gene function prediction and validation in eukaryotes, which are "a little more complicated," Roberts said.
But that will only happen if the developers are able to arrange a stable source of funding, he said.
COMBREX is "not a regular R01 project" and it doesn’t fit the mold of the kinds of projects that traditional study sections take on, he said.
"I feel that unless there is some special funding mechanism that comes about to continue this project, it's probably going to die."