This article has been updated from a previous version to clarify Gunawardena's role at Harvard and to correct Mallavarapu's current affiliation.
More and more researchers are turning to mathematical modeling to describe complex biological systems, but a research team at Harvard Medical School says this explosion of activity is a futile exercise if researchers can't share the information contained within these quantitative models.
In response, the Virtual Cell program at Harvard's department of systems biology, led by Jeremy Gunawardena, is developing a new programming language called b (pronounced "little bee") that will enable biologists to build, exchange, and assemble cellular subsystems in a modular manner. The Harvard team is finalizing tutorials and other documentation for the language now, and plans to formally release it in September.
Gunawardena said that the language grew out of his team's Virtual Cell initiative, a project that aims to understand how cellular phenotypes emerge from molecular interactions. After studying a number of cellular models published in "very good journals," Gunawardena said that he was struck by how inaccessible these systems are for other biologists.
"The models themselves are basically not things that I think the referees of the papers have actually even looked at properly," Gunawardena said. "They're monolithic objects that the people who build the models have obviously used, but very few people outside that group ever get real access to them — not because they can't, but because the effort and work required to take someone else's model and to put it into a form that is usable by somebody else is really too high."
The upshot "is that people build their own models, and what we have is a community of people who build models, but they're not really a community," Gunawardena said. "I think if we persist in that way, building models is really going to remain a cottage industry carried out by a small group of people and not really something that is used by ordinary biologists in their everyday work."
Aneil Mallavarapu, a research scientist in the Virtual Cell program, tackled this challenge by developing b, a system based on the Lisp programming language that, in his words, is able to "translate between the world of objects and the world of mathematics."
The language is targeted at three primary user communities: biologists, who can use it to describe biological objects; theorists, who can use it to formalize hypotheses for how these objects interact within a cell; and modelers, who can use it to construct a cellular system or subsystem given a set of theories and biological objects. According to Mallavarapu, b enables all three of these communities to represent and exchange biological knowledge in a format with which they are comfortable.
"In little b, you can formalize [the] notion of mass action kinetics, and then a user can say, 'Okay, I've described a certain reaction, now let me say that the kinetics for this reaction are mass action kinetics.' And without knowing any of the mathematics or having to update any of the different reactions or species that are actually present in the model, the language just uses the description that the theorist has already formally encoded to reason over the objects as they exist and to produce mathematics for you," Mallavarapu said.
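The idea Mallavarapu describes can be illustrated with a minimal sketch. It is written in Python rather than b, and it is not b's syntax or implementation: the function name `mass_action_odes`, the reaction format, and the species and rate-constant names are all assumptions made for illustration. The user only lists reactions; a single, separately encoded mass-action rule then generates the rate equations.

```python
from collections import defaultdict

def mass_action_odes(reactions):
    """Return {species: right-hand side of d[species]/dt} as strings.

    Each reaction is (reactants, products, rate_constant_name).
    Hypothetical format for illustration; this is not little b's API.
    """
    terms = defaultdict(list)
    for reactants, products, k in reactions:
        # Mass action: rate = rate constant * product of reactant concentrations
        rate = "*".join([k] + [f"[{s}]" for s in reactants])
        for s in reactants:
            terms[s].append(f"- {rate}")  # reactants are consumed
        for s in products:
            terms[s].append(f"+ {rate}")  # products are formed
    return {s: " ".join(t) for s, t in terms.items()}

# The "biologist" describes a reversible binding of ligand L to receptor R.
reactions = [
    (["L", "R"], ["LR"], "k_on"),
    (["LR"], ["L", "R"], "k_off"),
]
odes = mass_action_odes(reactions)
for species, rhs in odes.items():
    print(f"d[{species}]/dt = {rhs}")
```

Adding a third reaction to the list would update every affected equation without the user editing any mathematics by hand, which is the kind of inference the quote refers to.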
Mallavarapu said that the language is able to infer mathematical relationships that must be entered manually in modeling software packages like Matlab and Mathematica.
Gunawardena said that b is an "outgrowth" of SBML, the Systems Biology Markup Language, but noted that the two efforts have different goals. "One of the fundamental assumptions behind the SBML approach has been to treat models as data," he said. "I think that's very different from what we found was necessary if you want to take modularity seriously, which is that models have to be regarded really as programs."
Mallavarapu noted that b's roots in Lisp set it apart from SBML and CellML, which are XML-based data-exchange formats, "whereas b is a language in the sense that there is a computational interpretation of what you write."
Lisp is well known in the artificial intelligence community but has yet to find a strong user base in bioinformatics. That may change as computational systems biology continues on its current growth trajectory, according to Mallavarapu. "My feeling is that Lisp is going to become more and more important as we tackle problems in biology," he said.
Gunawardena agreed. "Languages like Perl and Python have been very successful at coping with the demands of the formalization needed for studying genomic sequences," he said, "but when you come to the problem of formalizing phenotype, or formalizing more complex notions of biological knowledge, the demands it places on the computational infrastructure are much more complex, and the kinds of languages that I think are going to be necessary for dealing with this are much more languages like Lisp."
B is for Building Block
Researchers can use b to create computable definitions of specific entities or relationships, and then construct larger systems using these definitions in a modular fashion. Gunawardena said that modularity and incremental development "are the key things that have really been behind our thinking of what we need to do this kind of systems biology."
Using the modular system, Gunawardena explained, someone interested in building a model of a particular subsystem, such as the EGF pathway, could use existing models of its components that were created by experts, such as the EGF receptor and its interactions, the scaffolding proteins downstream of the receptor, or transcription factors in the MAPK pathway.
"What you might do is to take parts of these models that somebody else has built for perhaps quite different reasons, and extract them and drop them into a virtual cell that you've created, and these different models would wire themselves up without you having to write complicated mathematics," he said.
One challenge in this approach, he admitted, is the degree of "relativity" that goes into describing biological systems. "The level of description that you use depends, to a great extent, on the questions you want answered," he said, so subsystems from different research teams may not always fit together seamlessly. "The best we can hope for is that in attempting to marry two models together, if it turns out that there is an inconsistency, then the system will gracefully reflect that inconsistency back to the user and the user then has to decide which of the two representations of this protein is the right one, or perhaps add something to the system that will enable it to transform one representation into the other," he said.
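A rough sketch of what "reflecting an inconsistency back to the user" could look like follows. It is an assumption-laden illustration in Python, not a description of how b works: the function `merge_modules`, the dictionary layout, and the module contents (including the site names) are hypothetical.

```python
def merge_modules(a, b):
    """Merge two {species_name: description} modules.

    Returns (merged, conflicts). A species described differently in the
    two modules is not silently overwritten; both descriptions are
    reported so the user can choose one or supply a translation.
    Hypothetical helper for illustration only.
    """
    merged = dict(a)
    conflicts = {}
    for name, desc in b.items():
        if name in merged and merged[name] != desc:
            conflicts[name] = (merged[name], desc)
        else:
            merged[name] = desc
    return merged, conflicts

# Two groups model the same receptor at different levels of detail.
receptor_module = {
    "EGFR": {"sites": ["site_a", "site_b"]},
    "EGF": {"sites": []},
}
scaffold_module = {
    "EGFR": {"sites": ["site_a"]},
    "Scaffold": {"sites": ["site_c"]},
}

merged, conflicts = merge_modules(receptor_module, scaffold_module)
for name, (first, second) in conflicts.items():
    print(f"Conflicting descriptions of {name}: {first} vs. {second}")
```

In this sketch the first module's description is kept provisionally while the conflict is reported; a real system might instead refuse to assemble the model until the user resolves it.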
In addition, the language is currently limited to nonlinear mass action ordinary differential equations, which Gunawardena described as representing "biochemistry in a test tube" rather than a living cell. "This is clearly not a very good approximation, but in terms of the data that we have at the moment, and in terms of the kind of experimental capabilities we have at the moment, this is pretty much where the field is," he said.
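For readers unfamiliar with the term, the standard textbook form of a mass-action ordinary differential equation system is shown below. The notation is an assumption introduced here and does not come from the Harvard team: $x_i$ is the concentration of species $i$, $k_j$ is the rate constant of reaction $j$, and $\alpha_{ij}$ and $\beta_{ij}$ are the numbers of molecules of species $i$ consumed and produced by reaction $j$.

```latex
\frac{dx_i}{dt} \;=\; \sum_{j} \left(\beta_{ij} - \alpha_{ij}\right) k_j \prod_{l} x_l^{\alpha_{lj}}
```

The equations are nonlinear whenever a reaction has more than one reactant molecule, because its rate then contains a product of concentrations. They also assume a well-mixed solution, which is consistent with Gunawardena's "test tube" characterization.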
"What we haven't done — and I think this is what we are really looking forward to when we release the language — is to understand exactly how people will put this to use in everyday working life," Gunawardena added. "Our view is that we have to start by getting something out there that is usable, and then the language itself I think is going to evolve quite a lot over time as we understand the kinds of things that people want to do, and the kinds of facilities that need to be incorporated in the language."
— Bernadette Toner