Researchers at Germany's National Research Center for Environment and Health (GSF) have released a web-based system called BIOREL that appears to be one of the first tools available for assessing the biological relevance of predicted gene and protein networks.
A number of algorithms have been published over the last several years for inferring biological networks from high-throughput data sets, "but the major drawback of these papers is that there is no common benchmark," Alexey Antonov, a BIOREL developer, told BioInform.
Antonov is the first author of two papers describing the resource that were recently published in Nucleic Acids Research [Nucleic Acids Res. 2006 Jan 10;34(1):e6] and FEBS Letters [FEBS Lett. 2006 Feb 6;580(3):844-8]. An online version of BIOREL is available at http://mips.gsf.de/proj/biorel/, and the software is also available for download from that site.
BIOREL assigns a "quantitative value" to the biological relevance of a given network by classifying the associations of each gene in the network as biologically relevant or not based on data from the MIPS database and several other sources of functional annotation. The proportion of genes in the network that are classified as relevant is used as the relevance score for the entire network.
Antonov and his co-authors note that while several previously published methods rely on functional annotation from public databases to assess the relevance of gene sets from microarray experiments or other high-throughput techniques, BIOREL is the first method to apply this approach to a biological network structure.
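The scoring idea described above, in which the share of genes classified as relevant becomes the score for the whole network, can be illustrated with a minimal Python sketch. This is not BIOREL's code: the function names, the toy data, and in particular the per-gene rule (a gene counts as relevant if at least half of its neighbors share one of its annotation categories) are assumptions made for illustration. BIOREL itself relies on a statistical procedure that the article does not spell out.

```python
from typing import Dict, Set

Network = Dict[str, Set[str]]       # gene -> genes it is linked to
Annotations = Dict[str, Set[str]]   # gene -> functional categories


def gene_is_relevant(gene: str, network: Network, annotations: Annotations,
                     min_shared_fraction: float = 0.5) -> bool:
    """Hypothetical stand-in rule, NOT BIOREL's statistical test.

    A gene is called relevant when at least `min_shared_fraction` of its
    neighbors share one or more annotation categories with it.
    """
    own = annotations.get(gene, set())
    neighbors = network.get(gene, set())
    if not own or not neighbors:
        return False  # nothing to compare against
    sharing = sum(1 for n in neighbors if own & annotations.get(n, set()))
    return sharing / len(neighbors) >= min_shared_fraction


def network_relevance_score(network: Network, annotations: Annotations,
                            min_shared_fraction: float = 0.5) -> float:
    """Proportion of genes in the network classified as relevant."""
    genes = list(network)
    if not genes:
        return 0.0
    relevant = sum(
        gene_is_relevant(g, network, annotations, min_shared_fraction)
        for g in genes
    )
    return relevant / len(genes)


# Toy data (invented for this example).
toy_network = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"C"},
}
toy_annotations = {
    "A": {"ribosome"},
    "B": {"ribosome"},
    "C": {"transport"},
    "D": {"ribosome"},
}

score = network_relevance_score(toy_network, toy_annotations)
print(score)  # 0.5: genes A and B pass the rule, C and D do not
```

In a real setting the per-gene decision would come from a significance test against annotation sources such as MIPS rather than from a fixed threshold; only the final step, taking the proportion of relevant genes, follows the description in the article.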
In addition to identifying network bias related to biological characteristics, BIOREL can also pinpoint network bias from experimental errors or technical limitations, Antonov said. "High-throughput data can contain a lot of artifacts. For example, gene expression data can have a high degree of unspecific hybridizations," and BIOREL can be used to find "technical bias in the network that relates not to the biology but to a technical limitation of the procedure you used to infer the network," he said.
In the NAR paper, Antonov and his co-authors describe the application of BIOREL to a number of different data sets, including a benchmarking test on synthetic data, the analysis of protein-protein-interaction networks from yeast two-hybrid experiments, and a functional network derived from microarray gene-expression data.
BIOREL has arrived at an opportune moment. As BioInform reported last week, efforts are underway in the computational systems biology community to apply more rigorous assessment methods to network inference algorithms [BioInform 02-17-06].
"It's not surprising that different people are thinking about this," said Antonov. "There are a lot of algorithms in the field that infer networks, and there are no benchmarks to evaluate them."
However, he warned, BIOREL "definitely has some faults" and should be used with caution.
For one thing, Antonov noted, the method is limited by the amount of knowledge that is available in public databases. In addition, he said, "even if you have reliable information," the statistical procedure used to infer bias "relies on many factors like the size of the network and the connectivity of the network."
These factors mean that it's easier to get a higher bias -- and therefore a higher relevance score -- for larger networks. On the other hand, he said, "to infer reliable relationships between genes on a larger scale for a big network is more complex from a computational point of view."
A better statistical method to handle this challenge tops Antonov's list of planned improvements, he said, along with a better way of taking the "complicated nature of structural annotation" into account.
-- Bernadette Toner