Canada’s CyberCell project is only one of a handful of initiatives around the world that involve modeling the cellular processes of Escherichia coli in silico, but it will soon have a 360-teraflop advantage over its peers, in the form of access to the Blue Gene/L supercomputer under development at IBM.
In late May, IBM granted the virtual cell project an eServer pSeries 690 with more than a terabyte of storage capacity under its Shared University Research program. But that is just the first step in a budding relationship between the project and Big Blue, according to Michael Ellison, executive director of the Institute for Biomolecular Design at the University of Alberta, which hosts the project. In addition to the equipment award, valued at more than $1 million, IBM has agreed to cover 60 percent of the costs of “taking the project to the next level,” Ellison said — extending the simulation from a single server to “multiple servers with the appropriate kinds of configurational links that mimic the shared memory format of Blue Gene.” This phase of the project, which will expand the platform to three computing nodes, each equipped with 16 processors running at 1.7 GHz, should run about $2.7 million, Ellison said.
Ellison said the CyberCell team has been working with IBM Research for around six months to determine the requirements for moving the virtual cell onto the 65,536-node Blue Gene/L — a process that will likely begin within the next two to four months. The 360-teraflop supercomputer, planned as a first step toward a one-petaflop machine, is slated for release sometime in the next year.
The new eServer is a boon for the project, but access to Blue Gene/L will really kick CyberCell into hyperdrive. Prior to IBM’s involvement, Ellison said the CyberCell team was using three SGI machines at the University of Alberta’s computational facility, “and we calculated that to carry out 100 minutes of simulating the life of E. coli, which is about two division times, it would take us 10 million years.” With some tuning of the code, the new eServer would run the same simulation in 1,500 years — a roughly 6,700-fold improvement, but still not practical. Transporting the simulation to Blue Gene will reduce the process to about a year, he said, adding optimistically: “By the time you invoke Moore’s law, and without invoking any kind of radically new computational technology, you can make a calculation that shows that by about 2012 you’ll be able to do a simulation like that in about three days.”
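The figures Ellison cites can be roughly cross-checked with a little arithmetic. The 18-month doubling period below is an assumption (the classic Moore's-law rule of thumb), as is the 2003 starting point; neither comes from the project itself.

```python
# Back-of-the-envelope check of the runtimes quoted in the article.
# Quoted figures: 10 million years (SGI), 1,500 years (eServer), ~1 year (Blue Gene/L).

sgi_years = 10_000_000        # quoted: 100 simulated minutes on the SGI machines
eserver_years = 1_500         # quoted: same run on the new eServer
speedup = sgi_years / eserver_years
print(f"eServer speedup: ~{speedup:,.0f}x")          # ~6,667x

# Moore's-law extrapolation from ~1 year on Blue Gene/L.
# Assumptions: 2003 baseline, performance doubling every 18 months.
doublings = (2012 - 2003) / 1.5                      # 6 doublings
projected_days = 365 / 2 ** doublings
print(f"Projected runtime in 2012: ~{projected_days:.0f} days")
```

Under these assumptions the 2012 projection comes out to roughly six days — the same ballpark as Ellison's "about three days," with the gap easily covered by a slightly faster doubling rate or modest code improvements.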
Ellison explained that CyberCell requires more computational power than similar initiatives, such as Japan’s E-Cell and the EMC2 project at Purdue University, because it is taking a very different approach to modeling E. coli. Other simulations tend to be “mathematical in nature,” Ellison said, “so that processes in cells are modeled by equations that assume there is a continuous behavior…things undergo smooth transitions and you don’t pay a lot of attention to what is happening in space because you make the assumption that they are distributed uniformly throughout the space of the cell.”
CyberCell, by contrast, is adding the dimension of space to the simulation process. “We track every biomolecule in space and time — its reactivity, its position, its size,” Ellison said. Even for a simple cell like E. coli, about 200 million biomolecules have to be modeled in four dimensions, he said — an enormous number-crunching task.
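The distinction Ellison draws — continuous, well-mixed rate equations versus tracking individual molecules in space — can be sketched in miniature. The reaction, rates, lattice, and particle counts below are invented for illustration and have no connection to CyberCell's actual code.

```python
import random

# Toy contrast between the two modeling styles described above.
# Style 1: continuous, well-mixed model -- one ODE, no notion of position.
def ode_step(conc_a, rate=0.1, dt=0.01):
    """Euler step for dA/dt = -rate * A (assumes molecules are uniformly mixed)."""
    return conc_a - rate * conc_a * dt

# Style 2: particle model -- every molecule has a position and moves individually.
def particle_step(particles, box=100):
    """Random-walk each molecule one lattice site, clamped to the box."""
    return [(max(0, min(box, x + random.choice((-1, 1)))),
             max(0, min(box, y + random.choice((-1, 1)))))
            for (x, y) in particles]

conc = 1.0
for _ in range(100):
    conc = ode_step(conc)                # one number updated per step

molecules = [(50, 50)] * 1000            # toy population; a real cell has ~200 million
molecules = particle_step(molecules)     # every position updated every step
print(f"well-mixed concentration after 100 steps: {conc:.3f}")
```

Scaling the second style from 1,000 toy particles to the roughly 200 million biomolecules of a real E. coli cell, stepped over two full division cycles, is what drives the enormous computational demand Ellison describes.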
Although CyberCell has secured the computational power it needs, it still faces an additional hurdle before things can really get rolling, Ellison said. “The major challenge that remains is collecting the kinds of data that you need with enough rigor and enough reliability that they can be used to drive and validate these simulations,” he said. “No single group is going to be able to collect this kind of data,” he added — a realization that sparked the creation of the International E. Coli Alliance last summer [BioInform 08-12-02].
The consortium members have had meetings in England and Germany so far, and will soon meet in Japan, where, Ellison said, they plan to grapple with several issues, including “who’s going to do what, how are we going to put into place aspects of quality control and consistency so that an experiment that’s done in Japan can be reproduced in Canada, and basically how do we set the foundation so that we can collect the kinds of data we need to be able to drive these simulations?”
The consortium is taking a collaborative approach to data sharing, but when it comes to the simulation itself, “we’ve more or less reached the conclusion to agree to disagree,” Ellison said. Calling the arrangement “gentlemanly competitive,” Ellison said that a healthy dose of rivalry will be a good thing for the field. “I don’t think one can say right now what kind of simulation is going to prevail at the end of the day, or ten years down the road,” he said. “Through meetings where we actually get to test different kinds of simulations on exactly the same kind of data, we get to find out which ones work and which ones don’t work.”