Results of First Protein Function Prediction Challenge Reveal Field's Progress, Areas for Improvement


The results of the first Critical Assessment of Protein Function Annotation, or CAFA, experiment, which was designed to assess the performance of computational methods used to predict the functions of proteins, have been published in this month's issue of Nature Methods.

In addition, 15 companion papers — some of which detail methods that were used in the CAFA challenge — have been published in a special issue of BMC Bioinformatics.

CAFA was organized by researchers from Miami University, Ohio; Indiana University; and the Buck Institute for Research on Aging. The effort is essentially a spinout of the Automated Function Prediction group, one of the special interest groups of the annual Intelligent Systems for Molecular Biology conference. It grew out of discussions within the group about the current state of algorithms used to predict protein functions that focused on "how well we are doing" and "how we can do better," according to Iddo Friedberg, one of the organizers and an assistant professor in the microbiology and computer science departments at Miami University.

The CAFA challenge began on Sept. 15, 2010, when the organizers provided participating predictors with more than 48,000 protein sequences from the Swiss-Prot database that lacked experimentally validated functional annotations. Participants then had four months — until Jan. 18, 2011 — to make predictions about the molecular function and biological processes of the sequences. Both of these are classification categories used by the Gene Ontology consortium.

A total of 30 research groups comprising 102 scientists and students participated in CAFA. Participants submitted a total of 54 function annotation methods for evaluation, according to the Nature Methods paper.

After the deadline had passed, the assessors allowed the experimental annotations for the challenge sequences to "accumulate over a period of 11 months prior to evaluating them." Friedberg, who is also a co-author on the Nature Methods paper, explained that they did this so they could have a benchmark set of protein sequences with which they could evaluate the computational methods.

By June 2011, about 600 proteins had sufficient experimental annotations for the assessors to run an initial analysis of the submitted methods, Friedberg said. They then did a second and more comprehensive evaluation later that year using 866 proteins from the initial set that had been experimentally annotated. The benchmark sequences are from 11 model organisms and microbes including human, Saccharomyces cerevisiae, Arabidopsis thaliana, and Escherichia coli.

The results of the evaluation based on the larger number of proteins are published in the Nature Methods paper along with the top ten methods that best predicted the molecular function and the top ten that best predicted the biological process of the proteins. Three methods scored highest for both categories: Jones-UCL, developed by David Jones at University College London; Argot2, developed by Stefano Toppo at the University of Padova; and Pannzer, developed by Liisa Holm at the University of Helsinki.

These methods, according to the paper, did better in both GO categories than two baseline tools that were used for comparison: Blast, in which the score for a particular term was calculated as the maximum sequence identity between the target protein and any protein experimentally annotated with that term; and a naïve prediction method, called Naïve, which used the prior probability of each term in the database of experimentally annotated proteins as the prediction score for that term.
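The two baselines can be illustrated with a minimal Python sketch. It is not the assessors' code: the function names, the data layout, the placeholder protein IDs and GO terms, and the choice to estimate the prior as the fraction of annotated proteins carrying a term are all assumptions made for illustration, and the percent identities are taken as already computed by an alignment tool.

```python
from collections import Counter


def naive_scores(annotations):
    """Score each term by its relative frequency among annotated proteins.

    annotations: dict mapping protein ID -> set of GO terms (hypothetical data).
    The same scores are assigned to every target, regardless of its sequence.
    """
    counts = Counter(term for terms in annotations.values() for term in terms)
    total = len(annotations)
    return {term: count / total for term, count in counts.items()}


def blast_scores(identities, annotations):
    """Score each term by the best identity to any protein annotated with it.

    identities: dict mapping annotated protein ID -> percent identity (0-100)
    between that protein and the target, assumed to be precomputed.
    """
    scores = {}
    for protein_id, identity in identities.items():
        for term in annotations.get(protein_id, ()):
            # Keep the maximum identity seen so far for this term.
            scores[term] = max(scores.get(term, 0.0), identity / 100.0)
    return scores


# Made-up example: four annotated proteins and one target aligned to three of them.
annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A"},
    "P3": {"GO:C"},
    "P4": {"GO:A", "GO:C"},
}
identities = {"P1": 40.0, "P2": 85.0, "P3": 30.0}

naive = naive_scores(annotations)
blast = blast_scores(identities, annotations)
```

In this toy data, Naïve ranks GO:A highest because three of the four annotated proteins carry it, while Blast ranks GO:A highest because the target's closest match, P2, is annotated with it.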

According to the paper, the researchers found several similarities in the functional prediction algorithms. For example, they found that "most methods used sequence alignments with an underlying hypothesis that sequence similarity is correlated with functional similarity" with more than half using data "such as types of evolutionary relationships, protein structure, protein-protein interactions, or gene expression data" to strengthen correlations between the protein pairs.

They also found that most methods applied machine learning-based principles, meaning that "they typically found combinations of sequence-based or other features that correlated with a specific function in a training set of experimentally annotated proteins," the researchers wrote.

The results also showed that "the community is doing better in predicting molecular function rather than biological process that a protein is involved in," Friedberg said. Indeed, the paper's results show that all the top methods and the baseline tools made more confident predictions about the molecular functions of the proteins across all 11 organisms than they did for the biological process.

"I think the reason is that most methods work by homology inference, [which] is pretty good for predicting the molecular function because you are inferring by orthologs," Friedberg explained. "The whole idea of homology is looking at sequence similarity, [meaning] that you would know that one sequence has the same biochemistry as the other protein sequence … if it's similar enough."

However, sequence similarity doesn’t "necessarily [indicate] in which biological process your homolog participates in," he said. "An ortholog in another organism or in the same organism might be involved in something completely different," so, for example, "you can have two paralogs [that] have completely different functions because they are in two different tissues and doing two different things in terms of participation in different biological processes but they still perform the same enzymatic function."

The organizers have begun preparing for a second challenge that should kick off this summer or in early fall, Friedberg said. They didn't do a challenge last year, he explained, to give the assessors time to evaluate the results, provide feedback to the community, and give researchers time to improve their software.

CAFA 2 will have a similar format to its predecessor, Friedberg said, but it will include an additional GO category, cellular component, for which participants will be expected to predict the part of the cell in which a protein acts. "We'll also be looking at human disease and proteins," most likely using data from the Online Mendelian Inheritance in Man database, he said.

Assessors for the first CAFA challenge included Amos Bairoch, a professor of bioinformatics in the University of Geneva's human protein sciences department and the founding developer of Swiss-Prot. Also involved were Predrag Radivojac, an assistant professor of computer science and informatics at Indiana University, and Sean Mooney, an associate professor at the Buck Institute, both of whom worked on software that was used to assess the methods.

In preparation for CAFA 2, Friedberg said his lab is further developing the assessment software with the intent of making it available to participants so that they can use it to check the performance of their methods.

The organizers have also tapped Anna Tramontano, a biochemistry professor at the Sapienza University of Rome, to help assess the methods submitted for CAFA 2, Friedberg said.

He said the organizers plan to provide more detailed information about CAFA 2 at several scientific meetings, including this year's ISMB meeting, which will be held in Berlin in July.

CAFA joins a growing list of community-based experiments that aim to compare the effectiveness of computational tools. These include the Critical Assessment of Genome Interpretation, or CAGI (BI 3/8/2013 and 11/12/2010); the Dialogue for Reverse Engineering Assessment and Methods, or DREAM (BI 6/15/2012); and Boston Children's Hospital's Children's Leadership Award for the Reliable Interpretation and appropriate Transmission of Your genomic information, or CLARITY (BI 11/16/2012). Other efforts include the Assemblathon (BI 12/10/2010) and Genome Assembly Gold-Standard Evaluations, or GAGE (BI 1/4/2011).
