The push to introduce biologists to bioinformatics has been years in the making. But for scientists who haven’t doubled up on programming skills, the modeling software useful for simulating networks from genomic, proteomic, and metabolomic data can be downright daunting.
Despite the intimidation factor, though, the tools are absolutely necessary to make sense of the wealth of data. “Even a relatively well-defined cell biological process is composed of so many elements and molecules and interactions of molecules that your brain can’t keep it all straight,” says Leslie Loew, the director of Virtual Cell and a cell biology professor at the University of Connecticut Health Center. In an attempt to smooth the transition from experimental biologist to confident computational biologist, developers of simulation software have recently started touting “user-friendly” packages designed with the non-programmer in mind.
There are dozens of software packages a systems biologist can use to analyze, model, or simulate data. Some model biochemical pathways, others simulate networks of interactions, and others perform complex statistical analyses. Certain packages require that the user know a bit of programming, while others are based on the kinetic equations that are familiar to biochemists, and still others use a visual approach, drawing or dropping in pictures. The tools also vary in degree of technicality, but the technical level tends to be directly related to the software’s functionality. The fancier the functions, the more difficult the program’s underlying mathematics — and the more it overflows into the user’s lap.
“It’s tricky to make a scientific tool user-friendly and not sacrifice any of the powerful mathematics or simulation engine in the back end of it. There are a lot of tools that are user-friendly but can’t do strong simulation and strong mathematical analysis,” says Kristen Zannella, biotech and pharmaceutical industry manager for the MathWorks, which produces MatLab.
Once a user masters the program, runs the simulation, and writes the paper, the model or simulation has to be accessible so that others in the community can see and play with it. Though there are many databases and a variety of markup languages, including the systems biology-specific SBML, sharing is not always easy. “It is a work in progress,” says Srinivas Iyengar, a systems biologist who studies cellular signaling networks at Mount Sinai School of Medicine.
A Glimpse of the Field
Some tools excel at data analysis and statistical number-crunching. Others are good at creating models of that mound of data, models that can then be tested against future experiments. Still others can simulate complex interactions and networks, helping researchers visualize their work. Which of these a researcher chooses often depends on the research aims and the individual researcher’s experience.
“For us,” says Jason Haugh, an associate professor of chemical and biomolecular engineering at North Carolina State University, “the user-friendliness is not the overriding criterion. It’s really the functionality of the software. But with that said, obviously if it is more user-friendly then that’s better.”
Tools with a lot of functionality have crossed over into the life sciences from engineering and the physical sciences. Originally targeted at controls engineers, the MathWorks’ MatLab has its own technical computing language that can be used to crunch statistics, develop algorithms, do 2-D or 3-D plotting, and perform matrix analysis. In recent years, systems biologists have increasingly used MatLab as the company has introduced add-on features more applicable to the field. The SimBiology and bioinformatics toolboxes include analysis and visualization for microarray, sequence, and mass spectrometry data.
Wolfram Research, a competitor of the MathWorks, started out making tools for mathematicians and physical scientists and is also venturing into the life sciences. In May, Wolfram released a new version of Mathematica that contains features that systems biologists may like, such as load-on-demand curated data on chemical elements and common compounds that can be called up from the company’s server to the user’s desktop.
Some tools are more field-specific than MatLab or Mathematica. For biochemists, there is Pedro Mendes’ GEPASI, which he created in the early 1990s to simulate biochemical networks. Over the years, GEPASI evolved and was replaced by a new freely available program called COPASI. COPASI, which came about from Mendes’ collaboration with Ursula Kummer’s group at EML Research in Heidelberg, now includes visualization capabilities along with its steady-state analysis, time-course simulation, and metabolic control analysis abilities. In the future, Mendes, professor of computational systems biology at the University of Manchester, plans to add more non-linear analysis to the program.
Also in the 1990s, a team of researchers at the University of Connecticut Health Center created Virtual Cell. Housed completely on UCHC’s servers, Virtual Cell can be used online by researchers to graphically create metabolic simulations and signal transduction models and to look at electrophysiology in a model cell. “It’s really very broad,” says Leslie Loew, explaining that this freely available program can solve problems that deal with both time and space, such as diffusion problems and, in an upcoming version, stochastic problems as well.
Function vs. Ease of Use
But the more functional the program, the more technical it becomes to use. Many systems biologists do not have the programming or mathematics background for the most technical computing, modeling, and simulation programs. Even the more user-friendly programs may expect some basic knowledge from the user.
“This is research software, so you don’t expect it to be the ultimate in ease of use and completeness of documentation,” says Andrew McCulloch, a professor of bioengineering at the University of California, San Diego.
Both MatLab and Mathematica rely on the user having some familiarity with programming. Each of these programs uses its own programming language. In MatLab, the user does not have to be looking at a command prompt, but can choose to do so. “We do keep everything pretty open so it’s not a black box-feeling tool,” says the MathWorks’ Zannella. In the company’s SimBiology, the user points and clicks to drag and drop different features into the model or simulations, or types in the chemical reactions to build the networks or pathways. “Still, a lot of the mathematical things are going on in the background,” Zannella says.
GEPASI and COPASI, as well as Virtual Cell, take similar biochemical approaches to creating models, and their user interfaces hide the command prompt. To create models in these programs, the user might not need to know how to solve a differential equation, but does need to know biochemistry. To create a model and run a simulation, the user has to input rate expressions to describe how all the parts in the model interact. “[The program] tries to present the model in terms of biochemistry, in terms of reactions, in terms of time course, in terms of language that the biological scientist would be familiar with,” says Mendes.
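To make the idea of rate expressions concrete, here is a minimal sketch of what such a tool does behind its interface: the user supplies a rate law for each reaction, and the software turns those laws into differential equations and integrates them to produce a time course. The pathway, the species names S, I, and P, and every rate constant are hypothetical illustration values, not taken from any of the packages named above, and the fixed-step Runge-Kutta integrator is a stand-in for whatever solver a real package uses.

```python
# Hypothetical two-step pathway: S -> I (mass action), then I -> P (Michaelis-Menten).
# Every rate constant below is a made-up illustration value.

def reaction_rates(conc, k_forward=0.5, vmax=1.0, km=0.3):
    """Rate expressions the user would type in, one per reaction."""
    s, i, _p = conc
    v1 = k_forward * s            # mass-action rate law for S -> I
    v2 = vmax * i / (km + i)      # Michaelis-Menten rate law for I -> P
    return v1, v2

def derivatives(conc):
    """Combine the rate expressions into d[S]/dt, d[I]/dt, d[P]/dt."""
    v1, v2 = reaction_rates(conc)
    return (-v1, v1 - v2, v2)

def rk4_step(conc, dt):
    """Advance all concentrations by one fourth-order Runge-Kutta step."""
    def nudge(slope, h):
        return tuple(c + h * d for c, d in zip(conc, slope))
    d1 = derivatives(conc)
    d2 = derivatives(nudge(d1, dt / 2))
    d3 = derivatives(nudge(d2, dt / 2))
    d4 = derivatives(nudge(d3, dt))
    return tuple(
        c + dt / 6 * (a + 2 * b + 2 * e + f)
        for c, a, b, e, f in zip(conc, d1, d2, d3, d4)
    )

def time_course(initial, t_end=20.0, dt=0.01):
    """Simulate from t=0 to t_end; return a list of (time, concentrations)."""
    steps = round(t_end / dt)
    conc = tuple(initial)
    course = [(0.0, conc)]
    for n in range(1, steps + 1):
        conc = rk4_step(conc, dt)
        course.append((n * dt, conc))
    return course

if __name__ == "__main__":
    # Start with all material in S and print a few points along the way.
    for t, (s, i, p) in time_course((1.0, 0.0, 0.0))[::500]:
        print(f"t={t:5.1f}  S={s:.4f}  I={i:.4f}  P={p:.4f}")
```

The biochemistry lives entirely in `reaction_rates`; the rest is numerical plumbing of the kind that a graphical package keeps out of sight.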
In Virtual Cell, the user can import images to associate with the model. “We’ve tried to make it as straightforward as possible. The actual mathematical model is generated automatically and the simulations take that mathematical model to generate the corresponding code to actually solve a problem numerically, automatically,” says Loew at UCHC.
Even when the creators of these programs try to keep things simple, creating models and simulations is still technical work. Each of the programs offers help, from Web instructions to courses. MatLab posts demonstrations on its website; COPASI has online tutorials and gives demonstrations, tutorials, and workshops at conferences; and Virtual Cell offers a training course.
“We don’t want people to waste time on a ramp-up,” Zannella explains. “We really want people to be able to get to the science and the interesting part of this.”
Learning to Share
One of the challenges with these computational tools is in sharing the resulting models and simulations. “You don’t want to rebuild a model that somebody’s already built,” says Mount Sinai’s Iyengar. But with all the different types of programs available, compatibility and sharing quickly become hurdles.
Take your average researcher who e-mails a model to a colleague. That colleague attempts to view the model, but gets caught in the classic my-program-can’t-handle-that-format problem. To deal with this, a few markup languages have emerged to unify the field.
In April 2000, the Software Platforms for Systems Biology forum decided the field needed an XML-based language to describe models. With the development of the Systems Biology Markup Language, or SBML, the hope was that all tools would become interoperable. This language is now supported by various packages, including GEPASI, COPASI, and Virtual Cell. It is also compatible with MatLab through a toolbox extension. “We think, absolutely, this type of software has to be interoperable, as much as possible,” says Mendes, who was a part of the effort to create SBML. “We very much like to work with the community and have a lot to gain from them and also want to contribute.”
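Because SBML is XML, any program with an XML parser can at least read the parts list of a model written by another tool. The sketch below assumes a simplified, hand-written fragment in the style of SBML Level 2; it omits kinetic laws and has not been validated against the specification. The model, the species and reaction identifiers, and the `read_model` helper are all hypothetical, and a real project would more likely use a dedicated SBML library than the general-purpose parser used here.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment in the style of SBML Level 2 (illustrative only).
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_pathway">
    <listOfCompartments>
      <compartment id="cytosol" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S" compartment="cytosol" initialConcentration="1.0"/>
      <species id="I" compartment="cytosol" initialConcentration="0.0"/>
      <species id="P" compartment="cytosol" initialConcentration="0.0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="uptake" reversible="false">
        <listOfReactants><speciesReference species="S"/></listOfReactants>
        <listOfProducts><speciesReference species="I"/></listOfProducts>
      </reaction>
      <reaction id="conversion" reversible="false">
        <listOfReactants><speciesReference species="I"/></listOfReactants>
        <listOfProducts><speciesReference species="P"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
""".strip()

# Namespace used by the fragment above; other SBML levels use different URIs.
NS = {"sbml": "http://www.sbml.org/sbml/level2"}

def read_model(text):
    """Return (initial concentrations by species id, list of reactions)."""
    model = ET.fromstring(text).find("sbml:model", NS)
    species = {
        sp.get("id"): float(sp.get("initialConcentration"))
        for sp in model.iterfind("sbml:listOfSpecies/sbml:species", NS)
    }
    reactions = []
    for rx in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        reactants = [r.get("species") for r in
                     rx.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [r.get("species") for r in
                    rx.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions.append((rx.get("id"), reactants, products))
    return species, reactions

if __name__ == "__main__":
    species, reactions = read_model(SBML_TEXT)
    print("species:", species)
    for name, reactants, products in reactions:
        print(f"{name}: {' + '.join(reactants)} -> {' + '.join(products)}")
```

Every tool that reads the same element names recovers the same species and reactions, and that shared vocabulary is what a common format offers.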
Though SBML is becoming widespread, Iyengar points out that “there is no standard language.” In addition to SBML, there are other import and export formats that researchers may want to use, including CellML. With so many formats, most programs are compatible with a variety of them, ranging from Excel to MatLab to Adobe files.
The SBML project used to house its own model database, but it has since been incorporated into the BioModels Database at EMBL-EBI. This database is curated and stores published, peer-reviewed biological models created with a variety of programs, including COPASI. Virtual Cell also keeps a repository of models built using that program on its website. Investigators may choose whether their model is made public, private, or only available to certain people.
Still Don’t Like Math?
Not all systems biologists are jumping on the modeling and simulation bandwagon. Some biologists might not yet be convinced of the need or value of computational work. Others may simply be held back by their training.
Models and simulations have predictive power, especially in kinetics, NCSU’s Haugh says. “If you can tell me what the mechanism is, I can write a model for it,” he says. The inverse is also true. “If the mechanism is uncertain, I can write a model for all the mechanisms that I can dream up, then try to test each of those models.”
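The strategy Haugh describes, writing one model per candidate mechanism and testing each against data, can be sketched in a few lines. Everything here is an assumption made for illustration: the two candidate rate laws, the synthetic "measurements" (generated from the saturable law with no noise), the parameter grids, and the use of a plain sum of squared errors as the score. With real, noisy data a modeler would typically use a proper fitting routine and penalize models that have more free parameters.

```python
# Synthetic "measurements" of reaction rate vs. substrate concentration,
# generated from a saturable law (vmax=2.0, km=0.5) purely for illustration.
SUBSTRATE = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
OBSERVED = [2.0 * s / (0.5 + s) for s in SUBSTRATE]

def linear_law(s, k):
    """Candidate mechanism 1: rate proportional to substrate."""
    return k * s

def saturable_law(s, vmax, km):
    """Candidate mechanism 2: rate saturates at high substrate."""
    return vmax * s / (km + s)

def sse(predict):
    """Sum of squared errors between a model's predictions and the data."""
    return sum((predict(s) - v) ** 2 for s, v in zip(SUBSTRATE, OBSERVED))

def grid(start, stop, n):
    """n evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

def compare():
    """Fit each candidate by brute-force grid search; return {name: (sse, params)}."""
    linear = min(
        (sse(lambda s: linear_law(s, k)), (k,))
        for k in grid(0.05, 5.0, 100)
    )
    saturable = min(
        (sse(lambda s: saturable_law(s, vmax, km)), (vmax, km))
        for vmax in grid(0.1, 4.0, 40)
        for km in grid(0.1, 2.0, 20)
    )
    return {"linear": linear, "saturable": saturable}

if __name__ == "__main__":
    results = compare()
    for name, (error, params) in results.items():
        print(f"{name:10s} best SSE={error:.4g} params={params}")
    print("best-supported mechanism:", min(results, key=lambda n: results[n][0]))
```

The candidate whose best fit leaves the smallest error is the one the data support, which in this contrived case is the saturable law that generated them.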
But there are researchers who realize the value of models and simulations and still do not use them. “There are a lot of biologists who haven’t been trained to think quantitatively,” says Loew.
If even more of the math is left to the program to do, perhaps more biologists will then take advantage of simulations. But there is a problem in the programs becoming too user-friendly, warns McCulloch at UCSD. “Turning these things into black boxes could be dangerous because you still have to understand the assumptions and some of the theory behind it,” he says.
As it stands now, it still looks as though researchers need to learn some of the mathematics behind the modeling and simulations to get the most out of modeling tools. Or, Haugh says, biologists can always make friends with a physical scientist or engineer to get that simulation just right.