Is the GenBank Era Over? CSBi Director Foresees Computational Models Overtaking Databases


There’s little doubt that advances in computational biology are partially responsible for the recent upsurge in systems biology, but according to Peter Sorger, director of MIT’s Computational and Systems Biology Initiative (CSBi), systems biology may return the favor, as wider adoption of this paradigm may eventually alter the way biological data is represented, analyzed, and exchanged. In the future, Sorger said during a presentation at CSBi’s second annual Symposium on Systems Biology, “biological information will be presented in computational models, not in databases.”

Addressing an audience of 500 people who braved Arctic temperatures on Jan. 8-9 to attend the CSBi symposium on the MIT campus in Cambridge, Mass., Sorger noted that the “string-based” foundation of sequence-centric bioinformatics won’t hold up as more and more biologists begin studying pathways and networks. “It won’t be easy to use sequence structure for protein biochemistry or for cell and tissue dynamics data,” he said, describing a scenario in which computational models of biological systems are exchanged as freely as database entries are today. The issue, he said, is that “open access to models will be much more critical than [open access to] databases.” But the question is whether researchers will be as willing to share these models, which arguably would contain more valuable information about biological function than any of the individual datasets used to create them.

Other speakers at the meeting may not have been so bold as to predict the end of bioinformatics as we know it, but all agreed that computational models are at the core of systems biology research. The second day of the conference featured presentations from nine different systems biology research centers, and although their descriptions of systems biology varied widely, their PowerPoint graphics consistently depicted the discipline in one of two ways: an iterative circle of experimentation, measurement, and computational modeling; or the “four Ms” of measure, mine, model, and manipulate.

According to Lee Hood, founder of the Institute for Systems Biology, experimentation is a vital aspect of any systems biology program, but “ultimately you want to convert that into a mathematical system,” he said, in order to help explain the origins of biological properties, predict perturbation effects, and, eventually, design entirely new systems in silico. The big challenge in building such models, Hood said, is integrating information from the different levels of a system, such as DNA sequence data with mRNA data, protein sequence and structure, pathways, and networks. “Integration is where the rubber meets the road” in systems biology, Hood said. ISB has addressed this challenge with a number of informatics tools, including the Cytoscape software package, which is able to “express biological elements in terms of their network context.” (See “Tackling Systems Biology...,” p. 4, for additional tools discussed at the meeting.)

Broader adoption of systems biology isn’t likely to make GenBank and other biological databases obsolete, but it will change the way those resources are used, according to Hood. Soon, researchers will no longer just be accumulating large amounts of data and sifting through them haphazardly for correlations. “The data space is essentially infinite,” Hood noted. “You have to interrogate only those dimensions of the data space relevant to the biological system you’re studying.” In other words, although much more data will be required to fully flesh out current biological models, hypothesis-driven methods will become ever more important in order to direct the experiments necessary to selectively acquire more data.

Sorger agreed that biology is data-poor in “systematically acquired sets of data.” The barrier, he said, “is going to be crossed by creativity, not more CPUs … The goal is to usher in a systems biology approach without losing the small science” that drives discovery.

One sign that systems biology is indeed driving a transformation in the way biologists use computational tools is the fuzzy boundary that currently separates the disciplines of bioinformatics and computational biology. Among the research centers that presented at the CSBi conference, some drew a distinct line between the two fields, keeping all aspects of network modeling and simulation under the auspices of computational biology, and database management under bioinformatics. Others, however, drew no distinction between the two at all. Sorger explained that “computational biology is usually defined as everything after Blast,” but predicted that such hair-splitting will eventually become meaningless as bioinformatics forgoes its database management roots and gives way to formal modeling.

New Funding Models

Informatics won’t be the only aspect of biology altered by the advent of systems biology. Jim Cassatt, director of the National Institute of General Medical Sciences’ Division of Cell Biology and Biophysics, said that the rise of multidisciplinary, integrated research teams has forced NIH and other government agencies to rethink the way they fund research. “Old science isn’t dead,” he stressed, but added that new approaches “have affected the way we view grants.” Most systems biology research, he admitted, will require “something other than R01s.”

Cassatt said that NIH has traditionally “struggled with how to support collaborative science,” but the NIH roadmap announced in the fall of 2003 represents the agency’s attempt to “break down the barriers between the institutes” at NIH and encourage more interdisciplinary research. In the past, he acknowledged, the NIH peer-review process has not been conducive to interdisciplinary fields like bioinformatics. In response, NIH is in the process of reorganizing its integrated review groups, with new study sections being formed in areas like modeling and analysis of biological systems and biodata management and analysis. In addition, Cassatt said, NIH has instituted a two-step funding mechanism, labeled R21/R33, in order to encourage higher-risk, exploratory research. No preliminary results are required to apply for the first phase of the grant, which is funded at a much lower level and for a shorter term than the second phase.

Interdisciplinary research institutes are also tweaking their funding models to encourage innovative science. The Bio-X program at Stanford, for example, has launched an internal “seed grants” initiative called the Interdisciplinary Initiatives Program to fund research “not traditionally funded by the NIH and NSF,” according to Bio-X chair Matthew Scott. So far, he said, the program has awarded more than $6 million to 40 grants, and has brought in “much more than that” in follow-on federal funding.

Training Tomorrow’s Biologists

According to David Botstein, director of Princeton University’s Lewis-Sigler Institute for Integrative Genomics, systems biology calls for nothing less than an overhaul of undergraduate biology education. At Lewis-Sigler, he explained, a new curriculum is being introduced that will “stand beside” the traditional biology curriculum, but will “integrate a quantitative point of view” into the entire process. Freshman biology students will gain a working knowledge of computer programming and algorithm design, as well as hands-on experience with the tools of the modern biology lab. The recent parallel rise in accessibility of both “the genome and computers makes this a good time to rethink undergraduate biology education,” Botstein noted.

Hood said that ISB is also engaged in efforts to introduce systems biology-style thinking into the educational system, via a program that trains K-12 teachers in the Seattle region on methods of “inquiry-based science.” He added that ISB plans to introduce a systems biology graduate program next year. MIT’s CSBi program and Harvard Medical School’s department of systems biology are also planning PhD programs, with CSBi’s expected to start next year.

Sorger speculated that there would be two key outcomes as systems biology matures as a discipline, and as computational models begin displacing massive databases as the core of biological data representation. First of all, he noted, “biologists in general will become much more interested, because the tendency of databases in high-throughput biology has been to present a lot of information, but none of it is actionable: You can’t plan an experiment based on that information.” Additionally, he noted, as the field’s computational challenges migrate from ho-hum database schema issues toward complex network modeling, computer scientists and other quantitative experts will be drawn to the field, willing to find new, creative approaches to the technical issues surrounding the exchange of computational models. “We simply don’t know how to do that right now,” Sorger said.

— BT
