SEATTLE - Network analysis is emerging from niche status in computational biology and could soon jump to the forefront of the field, according to speakers at the Institute for Systems Biology's fourth international symposium, held here last week.
This year's meeting, titled "Computational Challenges in Systems Biology," focused on what many see as the key enabling tool in merging systems analysis with biological research: algorithmic methods for mapping the "digital information" encoded by the genome onto biological networks.
Lee Hood, president of ISB, kicked off the conference with an overview of the considerable computational hurdles that systems biology faces (see box, below), but singled out network analysis as the heart of the emerging field. To demonstrate the importance of a network-based approach to analyzing biological data, he used a slide comparing a road map of the United States with a map of national flight patterns. Just as the flight map is a better way to visualize which cities are the most important, network representations will help identify those biomolecules that act as subcellular "hubs," he said.
Bernhard Palsson, professor of engineering at the University of California, San Diego, described computational biology as moving from a "one-dimensional" focus on "component enumeration" (the gathering of genes, proteins, and other molecules) to the "two-dimensional" approach of network reconstruction. Naturally, this view of the field leads to three- and even four-dimensional approaches, which Palsson described as reconstructing the complete cellular architecture and accounting for the dynamic nature of cellular processes, respectively. However, he warned, these latter goals are much further down the road than network reconstruction, which is only now coming into its own.
Palsson said that his lab, which has modeled the metabolic networks and gene regulatory systems for Escherichia coli, Saccharomyces cerevisiae, and other model organisms, is currently working on a map of human metabolism based on 1,500 open reading frames. The first build of the metabolic map is expected to be released in June, he said.
Mark Gerstein, associate professor of biomedical informatics at Yale, agreed with Palsson that the ultimate goal of computational biology is to model three-dimensional systems in time and space, but noted that "networks occupy the sweet spot in our understanding now," and act as an important intermediate step in putting the "parts list" from the genome into functional context. In addition, he said, the recent upsurge in network analysis in the fields of computer science, math, and even sociology, is certain to contribute to the increasingly interdisciplinary nature of systems biology.
Computational biologists are already maximizing the capabilities of the network reconstructions that are available to them. Trey Ideker of UCSD and Richard Karp of the University of California, Berkeley, each spoke about a project they worked on together to compare the protein interaction networks of Caenorhabditis elegans, Drosophila melanogaster, and S. cerevisiae. Ideker and his colleagues developed a network alignment tool called PathBlast to do the comparisons [BioInform 02-28-05], which revealed around 170 "complexes" of protein interactions that are strongly conserved across all three species.
Ideker described how the same methods that were used to compare the protein interactions in that study could also be used to compare a network to itself, in order to identify paralogous complexes within an organism, or even to compare different types of biological networks to each other. As an example, he discussed a study, currently in press, in which his team compared a protein interaction network for yeast with a genetic network (composed of so-called "synthetic lethal interactions," in which the deletion of a gene pair, but not of either gene alone, causes the organism to die).
The comparison acted as a "filter" to narrow down the huge number of physical and genetic interactions to those that were the most biologically relevant, Ideker said, and enabled his team to predict novel genetic interactions based on shared protein interactions.
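The definition of a synthetic lethal interaction given above can be written as a simple predicate. The following is a minimal sketch for illustration only, not the method used in Ideker's study; the function name, the input layout, and the gene names are all hypothetical.

```python
def synthetic_lethal_pairs(single_viable, double_viable):
    """Return gene pairs whose joint deletion is lethal while each
    single deletion is viable.

    single_viable: dict mapping gene -> True if the single-deletion strain lives
    double_viable: dict mapping frozenset({gene_a, gene_b}) -> True if the
                   double-deletion strain lives
    (Both input layouts are assumptions made for this sketch.)
    """
    pairs = set()
    for pair, viable in double_viable.items():
        gene_a, gene_b = pair  # a frozenset of exactly two genes
        if not viable and single_viable[gene_a] and single_viable[gene_b]:
            pairs.add(pair)
    return pairs


# Hypothetical screen results: geneZ is essential on its own, so the
# lethal geneX/geneZ double deletion does not count as synthetic lethal.
single_viable = {"geneX": True, "geneY": True, "geneZ": False}
double_viable = {
    frozenset({"geneX", "geneY"}): False,
    frozenset({"geneX", "geneZ"}): False,
}
lethal = synthetic_lethal_pairs(single_viable, double_viable)
```

Only the geneX/geneY pair qualifies here, because the lethality of any pair containing geneZ is already explained by the single deletion.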
Mike Tyers of the University of Toronto also described a project comparing different types of biological networks for yeast: in this case, a set of experimental interactions (protein-protein interactions and synthetic lethal interactions) with a curated set of interactions derived from more than 50,000 journal abstracts.
The comparison yielded some interesting findings. For example, Tyers said, the network derived from the experimental data suggested a "modular" structure, with a few highly connected proteins that don't interact with each other, while the network derived from the literature indicated a more uniform distribution of interactions across all the proteins. One explanation for this is that "we still have a long way to go in generating high-quality [high-throughput] data sets," Tyers said. "We haven't changed the way we work [experimentally] in the last few years."
Lee Hood's Top Computational Challenges in Systems Biology
Systems biology is stretching the boundaries of computational science and, apparently, the standard top-10 list. Lee Hood had to crank his up to 11 in an effort to describe the primary computational hurdles facing the field:
1. How to fully decipher the digital information content of the genome.
2. How to do all-vs.-all comparisons of thousands of genomes.
3. How to extract protein and gene regulatory networks from Nos. 1 and 2.
4. How to integrate multiple high-throughput data types dependably.
5. How to visualize and explore large-scale, multi-dimensional data.
6. How to convert static network maps into dynamic mathematical models.
7. How to predict protein function ab initio.
8. How to identify signatures for cellular states (e.g., healthy vs. diseased).
9. How to build hierarchical models across multiple scales of time and space.
10. How to reduce complex multi-dimensional models to underlying principles.
11. Text searching to bring the literature and experimental data together.
While most of the speakers at the symposium agreed that dynamic modeling is a long-term goal, Herbert Sauro of the Keck Graduate Institute discussed a project that is giving it a shot. The challenges for dynamic modeling are considerable, he said, largely due to the lack of available data. Researchers can build models that fit their experimental data perfectly, but there's no way of accounting for regulatory interactions that are missing. In a network-based system, he said, "the presence or absence of just one link can have a huge impact."
Sauro's approach is to "falsify" available models rather than to validate them, through an iterative process of experiment and computation that uses Monte Carlo simulations to determine the range of parameters that satisfy a given set of experimental data.
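As a rough illustration of that idea (not Sauro's actual method or software), the sketch below draws random parameter values for a toy model and keeps only those whose predictions stay within a tolerance of the measurements. The first-order decay model, the tolerance, the sampling range, and the data points are all assumptions made for this example. If no sampled parameter fits, the model is treated as falsified for that data set.

```python
import math
import random


def decay_model(k, t, x0=1.0):
    """Toy model (an assumption): first-order decay x(t) = x0 * exp(-k * t)."""
    return x0 * math.exp(-k * t)


def feasible_parameters(data, tolerance, n_samples=10000,
                        k_range=(0.0, 5.0), seed=0):
    """Monte Carlo sampling: keep every rate constant k whose predictions
    lie within `tolerance` of all (time, value) measurements."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        k = rng.uniform(*k_range)
        if all(abs(decay_model(k, t) - x) <= tolerance for t, x in data):
            accepted.append(k)
    return accepted


def parameter_range(accepted):
    """Return (min, max) of the accepted values, or None if the model
    could not reproduce the data for any sampled parameter."""
    return (min(accepted), max(accepted)) if accepted else None


# Hypothetical measurements, roughly consistent with k = 1.0.
data = [(0.5, 0.61), (1.0, 0.37), (2.0, 0.14)]
fit_range = parameter_range(feasible_parameters(data, tolerance=0.05))

# Hypothetical measurement that a decay model can never reach.
falsified = parameter_range(feasible_parameters([(1.0, 2.0)], tolerance=0.05))
```

The second call returns `None`: no decay rate can produce a value above the starting concentration, so the toy model is rejected rather than tuned to fit.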
Gerstein's group at Yale has also tried to account for cellular dynamics in network analysis. He described a project that examined the transcription network in yeast using gene expression data for five cellular conditions. It turns out, Gerstein said, that some genes are regulatory "hubs" for all cellular states, while others are "transient hubs" that play an important role in only some processes. The "hubbiness" of certain genes was shown to change in phase throughout the cell cycle, for example, indicating that the network is constantly being "rewired" to account for different conditions.
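The distinction between permanent and transient hubs can be sketched in a few lines. This is a simplified illustration, not the Yale group's analysis: it assumes each condition is given as a list of (regulator, target) edges, and it calls a regulator a hub when it has at least `min_targets` targets, an arbitrary cutoff chosen for the example. The gene names are hypothetical.

```python
from collections import Counter


def hubs(edges, min_targets=3):
    """Regulators with at least `min_targets` outgoing edges."""
    out_degree = Counter(regulator for regulator, _ in edges)
    return {gene for gene, degree in out_degree.items() if degree >= min_targets}


def classify_hubs(networks, min_targets=3):
    """Split hubs into those present in every condition (permanent)
    and those present in only some conditions (transient)."""
    per_condition = [hubs(edges, min_targets) for edges in networks.values()]
    if not per_condition:
        return set(), set()
    permanent = set.intersection(*per_condition)
    transient = set.union(*per_condition) - permanent
    return permanent, transient


# Hypothetical condition-specific regulatory networks.
networks = {
    "cell_cycle": [("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"),
                   ("TF_B", "g4"), ("TF_B", "g5"), ("TF_B", "g6"),
                   ("TF_C", "g1")],
    "stress": [("TF_A", "g1"), ("TF_A", "g7"), ("TF_A", "g8"),
               ("TF_B", "g4"),
               ("TF_C", "g7"), ("TF_C", "g8"), ("TF_C", "g9")],
}
permanent, transient = classify_hubs(networks)
```

In this toy data TF_A is a hub in both conditions, while TF_B and TF_C each reach the cutoff in only one of them.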
Studies like this serve as further proof that reconstructed biological networks are only an intermediate step in computational systems biology, since they don't account for all the subtle changes that take place over the course of time in a cell.
Richard Karp conceded that the use of interaction networks to study systems behavior is limited, because they capture only a "snapshot" of static and temporary connections. However, he noted, computational systems biology is still constrained by the type of data that is available to analyze. In order to model and analyze dynamic systems, he said, "We'll have to await the generation of data sets with more dynamic content."