To better enable synthetic biologists to maximize the potential of their wet lab experiments, bioinformatics developers have been hard at work adding to the synthetic biology toolbox.
One of the most widely used tools in the field is Virtual Cell, or VCell, a software modeling environment for computational cell biology that encompasses both synthetic and systems biology research. With VCell, researchers can design complex, multi-layered models through a Java-based graphical user interface that runs over the Web. VCell was developed, and is maintained, by a team at the National Resource for Cell Analysis and Modeling at the University of Connecticut Health Center. At last count, VCell had about 3,500 active users and 20,000 registered users worldwide. And the program continues to evolve as its developers add features and functions for more accurate cell design and simulation.
"Last year, we added new capabilities for searching pathway databases. We've also added some brand-new simulation capabilities, including the ability to simulate Brownian dynamics, stochastic processes, and arbitrary cellular geometry data, which we get from microscope images," says UConn's Leslie Loew. "In the future, we're planning to increase the efficiency of our partial differential equation solvers and incorporate moving boundaries, so the cell no longer has to be static during a simulation but can actually change shape."
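To give a sense of the kind of Brownian dynamics Loew describes, here is a minimal free-diffusion sketch in Python. It is a toy random walk with arbitrary parameter values, not VCell's actual solver:

```python
import numpy as np

def brownian_trajectory(n_steps, dt, diffusion_coeff, rng=None):
    """Simulate a single particle's 2D Brownian trajectory.

    Each step is a Gaussian displacement with variance 2*D*dt
    per coordinate (the Einstein relation for free diffusion).
    """
    rng = rng or np.random.default_rng(0)
    steps = rng.normal(0.0, np.sqrt(2 * diffusion_coeff * dt),
                       size=(n_steps, 2))
    return np.cumsum(steps, axis=0)  # positions relative to the origin

# Example: a molecule with D = 1.0 um^2/s tracked for 1 s at 1 ms resolution
traj = brownian_trajectory(n_steps=1000, dt=1e-3, diffusion_coeff=1.0)
print(traj.shape)  # (1000, 2)
```

A production tool like VCell must additionally confine such walkers inside the irregular cellular geometries reconstructed from microscope images, which is where the computational cost comes from.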
Many of the calculations that VCell performs can be done on a standard desktop computer. More intensive simulations, however, such as those involving the complete geometry of the cell, where the molecules are actually moving through space, will require high-performance computing, Loew adds.
Developers of synthetic biology software are also bridging the gap between computational modeling and biological data with computer-aided design programs similar to the ones architects use to design buildings. One such tool that has attracted a lot of interest in the synthetic biology community is TinkerCell. This open-source application is essentially a visual modeling tool that supports a hierarchy of biological components, or parts. TinkerCell can also host third-party C and Python programs that users can employ to analyze their models.
"The idea with TinkerCell is that it is like a drawing program that integrates a lot of ideas in synthetic biology and that is able to integrate the modeling side and the experimental side in biology," says Deepak Chandran, a graduate student at the University of Washington in Seattle. "When a researcher creates a mathematical model, they usually have a conceptual idea of the biology in their head. TinkerCell captures that conceptual diagram from which it can automatically generate mathematical models or link up the conceptual diagram to an experimental result."
Chandran, who developed TinkerCell as part of his graduate work in Herbert Sauro's lab at UW, created the application to be extensible so that users can develop plugins and change the interface to suit their needs. "I made it that way because synthetic biology is a rapidly changing field, so I wanted the software to be very flexible," he says. "There's a feature in TinkerCell that allows users to add their code. That way people can write different algorithms that perform different types of analysis and then share it with each other. That's why the basic model itself is neither a mathematical nor an experimental model; it's a combination of both, so that people can add functions that are either for modeling or relate to wet lab or database access."
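The plugin idea Chandran describes can be sketched as a simple registry that maps user-written analyses onto a shared model. The snippet below is a hypothetical Python illustration of that pattern, not TinkerCell's actual C/Python API, and all names in it are made up:

```python
# Toy sketch of a plugin registry in the spirit of TinkerCell's
# user-extensible analyses (hypothetical names, not the real API).

plugins = {}

def register(name):
    """Decorator that adds a user-written analysis to the registry."""
    def wrap(func):
        plugins[name] = func
        return func
    return wrap

# A minimal "model": parts with a few annotated attributes.
model = {
    "promoter_1": {"type": "promoter", "strength": 0.8},
    "rbs_1":      {"type": "rbs",      "strength": 0.5},
    "gfp":        {"type": "cds",      "strength": 1.0},
}

@register("count_parts")
def count_parts(model):
    """Count parts by type -- the kind of analysis a user might share."""
    counts = {}
    for part in model.values():
        counts[part["type"]] = counts.get(part["type"], 0) + 1
    return counts

print(plugins["count_parts"](model))  # {'promoter': 1, 'rbs': 1, 'cds': 1}
```

Because the model is just annotated data, a shared plugin can treat it as a mathematical object, an experimental record, or both, which is the flexibility Chandran is after.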
Bioinformatics is also playing a role in addressing the biosecurity concerns raised over the years by groups opposing synthetic biology. Jean Peccoud, associate professor at the Virginia Bioinformatics Institute, has released a software program called GenoGuard that aims to be to synthetic biology what antivirus software is to a PC. GenoGuard is a monitoring tool designed to help prevent acts of bioterrorism involving synthetic DNA: it scans sequence databases for sequences of concern as defined in the US government's guidance on synthetic genomics.
"People operating registries of synthetic biology sequences, such as gene synthesis companies or the Registry of Standard Biological Parts, can make GenoGuard point to their database," Peccoud says. "They would get notified if people upload sequences of concern, and it would allow them to talk to the person who submitted the sequence. The whole idea is to limit the risk that the national synthetic biology infrastructure can be used by rogue individuals to develop illegitimate applications."
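In spirit, the screening step amounts to checking each uploaded sequence against a watch list. The sketch below is a deliberately simplified stand-in: real screening under the US synthetic-genomics guidance relies on protein-level best-match searches (for example, with BLAST) rather than exact substring matching, and every name and sequence here is hypothetical:

```python
# Simplified illustration of screening uploads against a watch list.
# Real screening uses protein-level best-match searches (e.g., BLAST);
# exact substring matching is only a stand-in. All names and sequences
# below are hypothetical.

WATCH_LIST = {
    "toxin_fragment_A": "ATGGCGTTTAAACCC",
    "toxin_fragment_B": "GGGTTTCCCAAATTT",
}

def flag_sequence(seq):
    """Return the watch-list entries found within an uploaded sequence."""
    seq = seq.upper()
    return [name for name, motif in WATCH_LIST.items() if motif in seq]

upload = "ccctt" + "ATGGCGTTTAAACCC" + "gatc"
print(flag_sequence(upload))  # ['toxin_fragment_A']
```

A registry operator would run a check like this on every submission and, as Peccoud describes, follow up with the submitter whenever something is flagged.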
Currently, some synthetic biologists see the possibility of their work leading to a biosecurity incident as a remote one. But as the field of synthetic biology grows, it might become necessary to take security more seriously. "Just like in the early days of personal computing, very few people cared about computer security, but today any device is password protected, communications are encrypted," Peccoud says. "While GenoGuard is probably still a little bit ahead of its time, I think that in a few years, the community will become more aware of biosecurity issues and will need biosecurity solutions. The pressure will come from the need to mitigate the risk of public acceptance challenges, changes in the regulatory framework, or corporate policies aiming at minimizing liability exposure."
According to Peccoud, the biggest computational challenge for synthetic biology is defining the parameters of models. In the community, there is a general assumption that it will be possible to estimate the functional parameters associated with genetic parts through thorough parts characterization efforts like the Biofab Project. This may not be the best approach, however. "Many in the community have embraced the idea of data sheets for genetic parts, which assumes that parts' function is fairly context-independent," Peccoud says. "Unfortunately, there are very few hard results to show that this assumption is correct."
He points to the work of MIT synthetic biologist Chris Voigt, who developed a biophysical model that integrates sequence from both the ribosome binding site and the 5' region of the coding sequence. Voigt's work demonstrated that context dependencies might be the rule rather than the exception.
"This leads to a wealth of computational complication, so we need powerful experimental designs to systematically explore possible context dependencies," Peccoud says. "We also need to figure out what parameters can be estimated based on the power of our characterization protocols, and what parameters cannot be determined."
All that noise
Another factor that greatly complicates computational work in synthetic biology is the noise affecting the dynamics of molecular networks. Stochastic models are inherently more computationally demanding than deterministic models, and they are also far more complex at the theoretical level. Even something as simple as detecting stochastic oscillations is not trivial and, as yet, there is no theory for bifurcation analysis of stochastic models.
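To see where the cost comes from, consider the simplest exact stochastic simulation algorithm, Gillespie's direct method, sketched here for a birth-death process. This is a generic textbook illustration, not tied to any particular tool, and the rate constants are arbitrary:

```python
import random

def gillespie_birth_death(k_make=10.0, k_decay=1.0, t_end=5.0, seed=1):
    """Gillespie direct method for a birth-death process:
       0 -> X at rate k_make;  X -> 0 at rate k_decay * X.

    Every individual reaction event is simulated, which is why
    stochastic models cost far more than integrating one ODE.
    Returns the copy number of X at time t_end.
    """
    rng = random.Random(seed)
    t, x = 0.0, 0
    while True:
        a_make, a_decay = k_make, k_decay * x
        a_total = a_make + a_decay
        t += rng.expovariate(a_total)        # waiting time to next event
        if t > t_end:
            return x
        if rng.random() * a_total < a_make:
            x += 1                           # birth event
        else:
            x -= 1                           # decay event

# The steady-state mean is k_make / k_decay = 10; any single run
# fluctuates around it, and that fluctuation is the "noise" at issue.
print(gillespie_birth_death())
```

A deterministic model would replace all of this with one ordinary differential equation, dx/dt = k_make - k_decay * x, solvable in a single pass; the stochastic version must instead step through every molecular event, and must be run many times to characterize the noise.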
While most synthetic biology simulation and analysis programs can run on a desktop computer or single workstation, there will likely come a time in the near future when high-performance computing becomes necessary. Some tools already make use of distributed computing, such as the Ribosome Binding Site Calculator, which analyzes synthetic ribosome binding site sequences and returns a list of start codons in an mRNA transcript along with their predicted translation initiation rates. The RBS Calculator was designed by Penn State University's Howard Salis and uses the TeraGrid for its computational power.
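The cheap part of that output, enumerating candidate start codons, can be sketched in a few lines. The expensive part, which the sketch below does not attempt, is the RBS Calculator's thermodynamic free-energy model that turns each site into a predicted initiation rate; that is what demands TeraGrid-scale computing. The example transcript here is made up:

```python
# Enumerate candidate start codons in an mRNA transcript.
# This reproduces only the trivial first step of what the RBS
# Calculator reports; the real tool evaluates a thermodynamic model
# per site to predict translation initiation rates.

START_CODONS = ("ATG", "GTG", "TTG")  # common bacterial start codons

def find_start_codons(mrna):
    """Return (position, codon) for every candidate start codon."""
    mrna = mrna.upper().replace("U", "T")  # accept RNA or DNA alphabets
    return [(i, mrna[i:i + 3])
            for i in range(len(mrna) - 2)
            if mrna[i:i + 3] in START_CODONS]

transcript = "GGAGGAAACATGAAATTGCCC"  # hypothetical sequence
print(find_start_codons(transcript))  # [(9, 'ATG'), (15, 'TTG')]
```

Scoring thousands of such sites across candidate designs is an embarrassingly parallel workload, which is why a distributed resource like the TeraGrid suits it.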
"We are exploring the idea of HPC hardware dedicated to hosting synbio applications and customized to the computing workflow of the application they host, which is an approach consistent with other bioinformatics trends," Peccoud says. "As we better understand the structure of the computing workflow, we can customize the computing hardware by combining different computing processors, such as GPUs or FPGAs, to maximize the horsepower without blowing up the cost of the computing infrastructure."