Researchers at the Department of Energy's Pacific Northwest National Laboratory are developing a framework that they hope will improve computational methods for reconstructing regulatory networks. A prototype of the resource, called the Network Inference Testbed, is expected to be available sometime in late fall, according to Ron Taylor, a researcher in PNNL's Biomolecular Systems Initiative who is leading the effort.
The project addresses a primary goal of the DOE's Genomes to Life initiative, which is to "uncover the structure of regulatory networks that microorganisms use to respond to their environments," Taylor said. A better understanding of these networks is a step toward modifying such organisms to aid in environmental cleanup or in other applications of interest to the DOE, he added.
Algorithms for generating biological networks from high-throughput gene expression and proteomic data are proliferating, but Taylor said that there is currently no reliable way for developers of these methods — or the biologists who use them — to evaluate their performance.
"It's very tough to compare the algorithms that are out there and make a determination of which one is actually working better, because they're not being tested on common data sets," Taylor said. This absence of a "gold standard" for biological network data, he said, stems from the fact that experimentally derived regulatory networks, even those for relatively well-studied organisms, are incomplete, "and you're not absolutely certain that you've uncovered all the nodes."
Algorithm developers currently rely on manual literature searches to evaluate the results of their network-reconstruction methods, a time-consuming process that may still fail to uncover the complete biological network. Experimental verification is costly as well as incomplete, Taylor said.
PNNL's response is to create a series of "artificial topologies" — synthetic networks complete with "noise" to account for real-world experimental error — that can be used to train network-inference algorithms and also serve as a baseline to determine how well they perform. The results of this kind of comparison should help guide developers in improving their methods by pinpointing exactly where they failed and where they succeeded, according to Taylor.
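To illustrate the general idea (the article does not describe PNNL's actual implementation), here is a minimal Python sketch of a synthetic benchmark. It assumes a toy model in which every regulatory edge is activating, knocking out a gene affects only its direct targets, and experimental error is additive Gaussian noise. The function names and parameters (random_topology, simulate_knockouts, infer_edges, noise_sd, threshold) are hypothetical. Because the true topology is known, even a naive inference rule can be scored exactly by precision and recall.

```python
import random

def random_topology(n_genes, n_edges, seed=0):
    """Draw a random directed network as a set of (regulator, target) pairs."""
    if n_edges > n_genes * (n_genes - 1):
        raise ValueError("too many edges for a network without self-loops")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        regulator, target = rng.randrange(n_genes), rng.randrange(n_genes)
        if regulator != target:  # no self-regulation in this toy model
            edges.add((regulator, target))
    return edges

def simulate_knockouts(edges, n_genes, noise_sd, seed=1, effect=0.5):
    """Simulate a wild-type profile plus one knockout profile per gene.

    Assumptions (hypothetical): all edges activate, only direct targets
    respond, and measurement error is Gaussian with std noise_sd.
    """
    rng = random.Random(seed)
    wild_type = [1.0 + rng.gauss(0.0, noise_sd) for _ in range(n_genes)]
    knockouts = []
    for k in range(n_genes):
        profile = []
        for j in range(n_genes):
            if j == k:
                level = 0.0  # the knocked-out gene itself
            elif (k, j) in edges:
                level = 1.0 - effect  # direct target loses its activator
            else:
                level = 1.0
            profile.append(level + rng.gauss(0.0, noise_sd))
        knockouts.append(profile)
    return wild_type, knockouts

def infer_edges(wild_type, knockouts, threshold):
    """Naive rule: call k -> j if knocking out k shifts j by more than threshold."""
    predicted = set()
    for k, profile in enumerate(knockouts):
        for j, level in enumerate(profile):
            if j != k and abs(level - wild_type[j]) > threshold:
                predicted.add((k, j))
    return predicted

def score(predicted, true_edges):
    """Precision and recall of predicted edges against the known topology."""
    tp = len(predicted & true_edges)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    return precision, recall

if __name__ == "__main__":
    truth = random_topology(n_genes=20, n_edges=30)
    for noise in (0.0, 0.1, 0.3):
        wt, ko = simulate_knockouts(truth, 20, noise_sd=noise)
        p, r = score(infer_edges(wt, ko, threshold=0.25), truth)
        print(f"noise={noise}: precision={p:.2f} recall={r:.2f}")
```

With no noise the naive rule recovers the topology perfectly; raising noise_sd shows where it starts to fail, which is the kind of diagnosis a known baseline makes possible.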
Taylor said that the testbed will eventually offer a collection of artificial topologies and algorithms, so that users can plug in their own data, select an algorithm, and receive as output a network with a "confidence factor" associated with each edge. Software developers will also be able to use the resource to design their own artificial topologies with specific features of interest in order to test their algorithms.
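One way a per-edge "confidence factor" could be used when checking an inferred network against a known artificial topology is to sweep a confidence cutoff and report precision and recall at each level. The sketch below assumes confidences between 0 and 1; the gene names, scores, and function names are hypothetical and are not taken from the testbed.

```python
def edges_at_confidence(scored_edges, threshold):
    """Keep edges whose confidence factor is at least threshold."""
    return {edge for edge, conf in scored_edges.items() if conf >= threshold}

def precision_recall(predicted, true_edges):
    """Fraction of predicted edges that are real, and of real edges that were found."""
    tp = len(predicted & true_edges)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    return precision, recall

def sweep(scored_edges, true_edges, thresholds):
    """Precision/recall of a confidence-scored network at each cutoff."""
    return {
        t: precision_recall(edges_at_confidence(scored_edges, t), true_edges)
        for t in thresholds
    }

if __name__ == "__main__":
    # Hypothetical inferred network with per-edge confidence factors.
    scored = {("A", "B"): 0.9, ("B", "C"): 0.7, ("A", "C"): 0.4, ("C", "D"): 0.2}
    # Hypothetical known artificial topology.
    truth = {("A", "B"), ("B", "C"), ("C", "D")}
    for t, (p, r) in sweep(scored, truth, [0.1, 0.3, 0.5]).items():
        print(f"cutoff={t}: precision={p:.2f} recall={r:.2f}")
```

Lowering the cutoff admits more edges, trading precision for recall; reporting the whole sweep avoids judging an algorithm by a single arbitrary threshold.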
Taylor said that his group is working with several developers of network-inference algorithms at MIT, Princeton, and Boston University to gather ideas about features that the framework should include.
Eventually, he said, the resource could serve as part of a larger, high-throughput informatics pipeline at PNNL that would analyze genomic, proteomic, and metabolomic data within the context of biological networks. The number of sequenced microorganisms is expected to jump from the thousands to the millions in the next several years, Taylor noted, and as research centers like PNNL adopt more high-throughput instrumentation, "we will need an automated [data analysis] pipeline that goes beyond clustering." The PNNL team is also exploring the possibility of parallelizing several network-inference algorithms to run on the lab's 2,000-node cluster.
Taylor said that, as a longer-term goal, the structures of the inferred static networks would serve as input for dynamic modeling methods that simulate subcellular processes.
For now, however, the testbed is still in a "very early stage," Taylor said. The prototype slated for release in the fall will include a database, "one or two" network-inference algorithms, the ability to create a "basic" artificial topology and perturbation data set, and visualization via the Cytoscape software package.
So far, the testbed project has been internally funded by PNNL, but "we hope to put together a grant proposal for external funding" to accelerate the effort, Taylor said.
The PNNL team is currently preparing an applications note on the testbed and is considering a paper based on a "rigorous comparison" that would demonstrate the usefulness of the testbed as a tool for algorithm development, Taylor said. After that, the PNNL researchers will begin using the testbed to analyze microarray data generated at the lab.
"We're working towards the future when a lot of data is going to be generated, and the biologists will want to know what the regulatory networks are," he said.
"Here's a platform where you can input your array data, maybe some background information, and easily invoke algorithms that will give you an inferred network as a starting point," he said. "It might not be a completely correct network, but it will be a starting point."
Bernadette Toner