IBM has published its first protein simulation study using BlueGene/L, the first proof that its new supercomputer can handle computationally demanding life science research problems.
“This is a big achievement for us,” said Ajay Royyuru, who heads the Computational Biology Center at IBM Research. “In a sense, this is the first science output that we have in using BlueGene in an area that we had targeted all along for exactly this purpose.”
The study, published in the current issue of the Journal of the American Chemical Society, was a 118-nanosecond molecular dynamics simulation of a single rhodopsin molecule embedded in a bilayer containing cholesterol and two different lipid species. According to IBM, the study is unique both in terms of the timescale of the simulation and the complexity of the system, which included more than 43,000 atoms.
The uniqueness of the work is a matter of debate, however. Rainer Böckmann of the Theoretical and Computational Membrane Biology Center for Bioinformatics at Germany’s Universität des Saarlandes noted that other recent projects have attained comparable levels of simulation time and complexity. For example, last year Böckmann and Helmut Grubmüller published a simulation study of the interaction of calcium ions with a phospholipid bilayer — a system of more than 21,000 atoms, for 200 nanoseconds — “on a single dual-Athlon workstation.”
In a more recent study, still in press, Böckmann and colleagues simulated a system of more than 85,000 atoms for 80 nanoseconds using a Beowulf cluster. In an e-mail to BioInform, Böckmann called IBM’s work “nice but nothing spectacular, even disappointing, taking into account the size or price of BlueGene.”
But even if the IBM study doesn’t break any new ground in the field of molecular dynamics, it does serve as the initial proof of concept for BlueGene’s applicability to biological simulation, and also marks the official debut of IBM’s Blue Matter molecular dynamics software, which was written specifically for the BlueGene architecture.
Frank Suits of IBM’s T.J. Watson Research Center, a co-author on the paper, said that the team had previously published several molecular dynamics studies using Blue Matter on other IBM hardware, but this is the first time the software has run on the machine it was designed for — in this case a half-rack (512-node) version of BlueGene/L, capable of around 1.4 teraflops.
The benefits of the new hardware are obvious. The first simulation IBM published “only had 5,000 atoms in it, and the next system had about 12,000 atoms in it,” Suits said. “Only when we got the first BlueGene hardware could we really start working on a large complex system that really starts to represent something biologically interesting.”
As for the time scale, Suits admitted that “118 nanoseconds is very short in human terms.” However, he added, “in biological terms, for these systems it begins to be interesting at that point. You begin to have meaningful measurements for a simulation.”
Typical simulations currently run on the order of 20 nanoseconds, Suits said. “For our work, we ran 118 nanoseconds and the first 20 nanoseconds we basically just threw away.”
Suits said that the simulation provided valuable insight into the way cholesterol and lipids arrange themselves around rhodopsin. “It’s been known that cholesterol somehow stabilizes rhodopsin in the membrane, but it wasn’t known exactly how,” he said. “In our simulation, we see direct evidence that in fact cholesterol is sort of gathering some distance away from the rhodopsin, but what we can see is that it’s sort of filling in the space — rhodopsin has this kind of hourglass shape, and we can see cholesterol kind of packing in some distance away in a way that looks like it might be relaxing the membrane.
“Since we can just look at these molecules that we’re simulating, we can get these geometric views of what’s happening that are just unavailable experimentally,” he said.
Even with the power of BlueGene, the 118-ns simulation took several months to run, Suits said. The Watson team is already working on scaling up its simulation studies on a two-rack system, and is looking toward a four-rack study. There are still some questions, however, about the best way to put the larger machine to use.
With BlueGene, “you have the ability to run massively parallel jobs on thousands of nodes, or you can partition them somewhat into individual, still very powerful jobs on 512 nodes,” Suits said. Therefore, rather than run larger, longer, and more complex biological simulations, he said, it may be more effective to run multiple smaller simulations at once.
“It’s one thing to run just one [simulation] for a long time, but a lot of scientific insight can come from this huge power of being able to run a number of these systems very quickly all together,” he said. “It will give you much better statistics and even better confidence that what you’re seeing is real.”
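The statistical payoff Suits describes follows from a basic property of independent samples: the standard error of a measured observable shrinks roughly as one over the square root of the number of replicas. A minimal sketch, using hypothetical numbers (the per-replica spread and replica counts below are illustrative, not figures from the study):

```python
import math

# Assumed per-replica spread of some simulated observable (arbitrary units).
sigma = 2.0

# Standard error of the ensemble mean falls as 1/sqrt(n_replicas),
# which is why many short parallel runs can beat one long run for statistics.
for n in (1, 4, 16, 64):
    print(f"{n:>2} replicas -> standard error ~ {sigma / math.sqrt(n):.2f}")
```

Quadrupling the number of replicas halves the uncertainty, which is the “better statistics and even better confidence” Suits refers to.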
“There’s some scalability that we see and we are exercising, but of course there are limits to scalability,” Royyuru said. “Beyond a certain point, you don’t have enough atoms to go around. You have more nodes than atoms, or a comparable number of nodes as atoms, and there isn’t enough work to be done on each node.”
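Royyuru’s scaling limit is easy to see with rough arithmetic. Dividing the study’s roughly 43,000 atoms across progressively larger BlueGene/L partitions (the partition sizes below are nominal assumptions, not configurations named in the article), the per-node workload quickly drops toward nothing:

```python
# Illustrative atoms-per-node arithmetic for the scaling limit described.
ATOMS = 43_000  # approximate system size from the rhodopsin study

# Nominal partition sizes, from the half-rack used in the study upward.
for nodes in (512, 1024, 4096, 16_384, 65_536):
    print(f"{nodes:>6} nodes -> {ATOMS / nodes:7.1f} atoms per node")
```

At the largest partition sizes there are fewer atoms than nodes, so some nodes would have no atoms to work on at all — exactly the regime Royyuru warns about.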
Suits added that BlueGene’s hardware and the Blue Matter software were developed to enable “strong scalability,” which he described as “having a medium-sized problem and running it very well on a very parallel machine.” While admitting that “it’s very hard to get efficiency with only a few atoms on each node,” he said, “we hope we will be doing just that soon.”