A recent survey of 125 computational systems-biology researchers indicates that despite the presence of well-established standard formats like the Systems Biology Markup Language, most researchers feel that there is much work to be done in developing useful standards for the field.
The survey was conducted by researchers at Germany’s Max Planck Institute for Molecular Genetics and Ruhr University and published as a letter in the April issue of Nature Biotechnology, with a full report available online as a supplement.
Of the 125 researchers surveyed, 75 percent were modelers, 4 percent were experimentalists, and 21 percent considered themselves to be both.
It found that SBML is “widely recognized as a standard format for systems biology models.” Around 60 percent of respondents use SBML, compared to only around 4 percent who use CellML, and around 15 percent who said they use “other exchange formats.” Around 15 percent of respondents, however, noted that SBML is “missing functionality,” and 10 percent responded that “the learning curve of SBML is too steep.”
Respondents also showed some familiarity with the MIRIAM (Minimum Information Requested in the Annotation of Models) standard, an offshoot of SBML that governs the annotation of models submitted as part of a publication. Around 35 percent of respondents indicated that they were “aware” of MIRIAM, though the survey results included no information on how widely the standard is actually used.
In addition, the survey found a strong demand for new standardization efforts in areas where formats are lacking, such as the graphical representation of biochemical networks.
Overall, 80 percent of the survey participants agreed with the statement that “standards are necessary” for systems biology research, largely because standards would simplify the exchange of models across research groups and because it is still difficult to reproduce simulation results from published computational models.
The call for standardization was far from unanimous, however. The 20 percent of respondents who were not in favor of standards cited two main reasons, according to the report: “biology is considered to be too complex to be standardized,” and “obeying standards can be too time-consuming and restrictive.”
Edda Klipp, head of the computational systems biology group at the Max Planck Institute for Molecular Genetics and lead author on the survey paper, said that the project started within her own research group as an exercise in determining which standards were available and which tools were in use. Klipp designed a questionnaire that was first distributed among members of the European Yeast Systems Biology Network (YSBN), and then more broadly throughout the systems biology research community via listservs.
One of the goals of the YSBN is to develop standards, Klipp said, “but we realized that before we start to standardize something, we should know what other people are doing.”
Another goal of the survey, she said, was to get a better idea of specific modeling tools that researchers are using. “If I have three colleagues working with Petri nets, then I could have the impression that Petri nets is the most common approach, but now from the survey we see what people are really doing, and most people are doing very classic deterministic modeling using [ordinary differential equations], and that was something that I could not predict,” she said.
Most researchers said they are studying either metabolism or signaling networks. ODEs are mostly used for modeling metabolism, cell signaling, and the cell cycle; stoichiometric models are used primarily for metabolism; partial differential equations are commonly used for structural analysis; and those studying genetic networks tend to use graphical models or discrete modeling methods such as Bayesian networks, Boolean networks, and Petri nets.
The survey also addressed systems biology software tools. In the general-purpose category, MathWorks’ Matlab package was the clear leader: nearly half of the survey participants said they use Matlab “regularly” in their work. Among specialized computational systems biology packages, libSBML, CellDesigner, BioModels, XPPAUT, and Gepasi saw the most usage.
“For many respondents, the price and free availability of a tool is of utmost importance, followed by the requirement that it should be flexible and applicable to many different problem types,” Klipp and colleagues write. “The widespread use of Matlab indicates that flexibility is seemingly more important than free availability,” they add, although they note that “many respondents regard Matlab as free software, probably because many universities have campus licenses.”
An important — if not altogether surprising — finding of the survey is that while SBML is “a very important standard for exchanging models, it has its limitations,” Klipp said.
“SBML was clearly perceived as an upcoming standard,” the report states, though it also outlines several drawbacks, namely, the fact that SBML “is becoming more and more complex and therefore difficult to be implemented completely,” and the fact that interoperability between programs that are said to be SBML-compatible “is not satisfying and must be improved.”
This is not the first critique of SBML. A Nature Biotechnology paper published last year by Rui Alves of Spain’s University of Lleida and colleagues compared 12 kinetic modeling packages and found SBML compatibility to be “problematic in many packages.”
Michael Hucka, a senior research fellow at the California Institute of Technology and a lead SBML developer, told BioInform via e-mail this week that the SBML developers are “keenly aware” of the shortcomings mentioned in the survey report and noted that he and his colleagues are “working hard on this front to address specific areas.”
First, he said, the SBML Level 2 Version 3 specification will be released next month, which is expected to simplify “certain areas that were too complex” in the Version 2 specification. “We feel this will help software developers produce more compatible tools more easily,” he said.
At the same time, the SBML team will also release version 3.0 of libSBML, a software library that developers can use to implement support for SBML in their software “without having to start from a low-level XML parser,” Hucka said.
Also in the works is an SBML Test Suite “that will provide an objective way for software developers and users to assess how well a software system is interpreting SBML models,” Hucka said, noting that the suite should help improve the interoperability of SBML-based tools.
Hucka said that some of the report’s findings were less expected. “I think I'm most surprised by the prominence … of the notion of standardizing graphical network diagrams,” he said. He added that while he is involved in the Systems Biology Graphical Notation project, which aims to develop such a standard format, he didn’t realize that the lack of a standard in this area was on the radar of the broader systems biology community.
“I thought it was still early days,” he said. “Except for a relatively small number of those who are directly participating in this, I actually didn't think people in general really paid much attention to the issue. Apparently, enough other people desire this that it made an impact on the survey results.”
Hucka said that another “interesting result” is that no single approach dominates the modeling of signaling networks. “Maybe it's because the modeling of signaling systems is so difficult that people are exploring all kinds of different approaches, and are unable to stick to a single paradigm such as ODE-based models,” he said.
“It suggests that we in the SBML community need to keep this result in mind,” he said. “All too often, people end up thinking that SBML models will be based on either a continuous ODE framework or a discrete stochastic simulation framework, but these results show a broader distribution. We need to make sure to make it easy.”
Nicolas Le Novère, a researcher at the European Bioinformatics Institute who is involved with SBML, MIRIAM, and the BioModels database, said that he was “pleased” by the results of the survey, which indicated that SBML has had a positive impact on the field.
Le Novère noted that the top three tools cited (libSBML, CellDesigner, and BioModels) were developed with SBML grants. In addition, he said, “the ranking of the simulators almost perfectly follows the quality and completeness of SBML support, which demonstrates the importance of standard formats.”
Indeed, one of the report’s recommendations is that modelers make their models available in a format that enables exchange and testing by others. “Considering the frequent use of SBML, this would be preferably SBML format,” the authors write.
Klipp stressed, however, that it is important for the field to remain flexible and not settle into a single standard too quickly. “In one sense I would completely agree that one cannot standardize everything,” she said. “You always need new ideas and you need the development of new concepts, and one should never restrict this in any way by demanding some kind of standard.”
Many models “don’t fit into SBML,” she said. “If you have a different type of model that has a completely different formalism, then of course you cannot apply SBML. But then they must make clear in the publication what the formalism is, how it has been calculated, how I can get the results that they show, and so on.”
The point of the survey and report was not to endorse any particular standard, Klipp noted, “but if we want to exchange models and if we want to use what other people have done, then there must be some kind of format,” she said. “These formats are always developing, but we can try to find the best solution for the time being.”