A year and three months into the Blue Gene protein folding project, IBM researchers said they have made some noteworthy progress in addressing several fundamental issues posed by the monumental effort.
The daunting scale of the supercomputer, which will link an unprecedented one million processors into a one-petaflop system, together with the algorithmic challenges presented by protein-folding simulation, has led the IBM team to explore new territory on both the hardware and the software sides of the equation.
The massively parallel architecture planned for the system, for example, can’t be thought of in terms of the cluster systems commonly used in bioinformatics, said Bill Pullyblank, who heads up systems architecture development on the project. “If I’m used to running on a 50-node cluster and then I go up to 100 nodes, in some sense if I scale it cleverly everything still works on the bigger problem,” he said. “When we go to a million, basically nothing works the same way.”
The problem of mapping an application onto a million communicating nodes is one of the first areas Pullyblank and his team have addressed.
Using the example of a protein structure of 32,000 atoms, Pullyblank said that even though that would involve a million pair-wise interactions, giving each processor one interaction to calculate would not be the best approach due to the excessive communication generated between individual processors.
“That one we know how to do,” Pullyblank said. His team breaks up the calculation so that each processor gets a block of interactions, rather than a single one, and then passes on a more reasonable, cumulative, amount of information.
“You split the work up so that everybody gets the same amount of work to do,” Pullyblank said, “and when they’re done, they can pass on a very brief aggregate of all the stuff they did to their neighbor, which is the stuff they can make use of.”
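The block approach Pullyblank describes can be illustrated with a toy sketch. The Python below is only an illustration under stated assumptions, not IBM's code: the function names, the 1-D "atom" positions, and the inverse-distance pair term are all hypothetical, and the "processors" are simulated in a single process. It splits the pair-wise interactions into equal-sized blocks, has each simulated processor compute its block, and passes on one aggregate number per processor.

```python
from itertools import combinations


def pair_energy(a, b):
    """Toy pairwise term (hypothetical): inverse distance between two 1-D positions."""
    return 1.0 / abs(a - b)


def split_into_blocks(items, n_workers):
    """Split items into n_workers contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


def simulate(positions, n_workers):
    """Assign each simulated processor a block of pairs; return blocks and per-block sums."""
    pairs = list(combinations(range(len(positions)), 2))
    blocks = split_into_blocks(pairs, n_workers)
    # Each "processor" reduces its whole block to a single aggregate value.
    partials = [
        sum(pair_energy(positions[i], positions[j]) for i, j in block)
        for block in blocks
    ]
    return blocks, partials


# Assumed toy input: 8 distinct positions give 28 pairs, shared by 4 simulated processors.
positions = [0.0, 1.0, 2.5, 4.0, 6.0, 7.5, 9.0, 12.0]
blocks, partials = simulate(positions, n_workers=4)
total = sum(partials)  # combining the aggregates needs only one value per processor
```

In this toy setup each simulated processor handles seven pairs and reports a single number, so four values are exchanged rather than one per interaction. How the real machine would lay out blocks and route aggregates between neighboring cells is not described in the article.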
Pullyblank noted that the cellular architecture of the system, which includes logic, memory, and communication in each single-chip cell, differs from “embarrassingly parallel” distributed approaches to protein folding simulation, such as Folding@home, which don’t require communication between each node.
IBM is not dismissing the work of Folding@home and other protein folding projects, however. Joe Jasinski, senior manager of the computational biology center in IBM’s research division, said that a key part of Blue Gene’s mission is to work with the external scientific community. The project is already cooperating with Vijay Pande, who runs the Folding@home project, as well as researchers from Columbia University, the University of Pennsylvania, Stanford University, and the University of Indiana.
“There are lots of different ways to approach these problems computationally,” said Jasinski. “There’s no one way that’s absolutely right to do any of this.”
The project intends to incorporate ideas from the wider scientific community to determine the best way to use its computational resources. With this goal in mind, IBM and the San Diego Supercomputer Center are jointly hosting a two-day protein folding workshop on the campus of the University of California at San Diego March 30-31.
There will be 12 invited talks at the workshop organized into three sessions that address protein folding through experimental approaches, computer simulation applications and techniques, and developments in biophysical theory.
“We’ve presented an open invitation to come to this workshop and discuss what the important problems to solve with this research really are,” said Jasinski.
Jasinski stressed that Blue Gene is aimed toward advancing the art of biomolecular simulation. “It’s certainly possible that when we announced Blue Gene in December of 1999 that people got the wrong idea, that we were just doing what from a scientific perspective might be viewed as a silly, show-off kind of thing. That’s not what this project is really about.”
Jasinski said that Blue Gene has a “different flavor” than the well-publicized Deep Blue chess-playing supercomputer to which it is often compared. While with Deep Blue, “it was pretty clear how you could define success or failure — you play the world chess champion and you win or you lose,” Jasinski said that in Blue Gene, “there’s no win or lose other than to do something that is scientifically valuable.”
Jasinski said that involving leading researchers from the protein folding and biological simulation community would be a key part of answering the scientific and technical questions surrounding the computational challenges of protein folding simulation.
Though no hardware has been developed so far, Jasinski said that over the course of the year, the software team would make certain pieces public, such as simulators for how the machine will work, so that teams interested in collaborating can provide feedback on how well specific research problems map onto the architecture.
IBM has no product plan for Blue Gene at this time, although Jasinski said the company is “confident that the things that we learn in building this will turn up in IBM products in the future.”
Both Jasinski and Pullyblank said the project is still on track to begin protein-folding computations in 2004.
“I’m sure looking to that day in 2004 when we throw the switch and it begins to fold proteins,” Pullyblank said. “I’ll take that next day off, I think.”