Last week, IBM signed a one-year cooperative research and development agreement with Oak Ridge National Laboratory to collaborate on the Blue Gene research project, the petaflop-scale supercomputer IBM is building to tackle the formidable computational challenges of protein folding simulation.
But ORNL is only the latest of Big Blue's partners in this effort. While the company is primarily relying on its own systems know-how to develop the next-generation cellular architecture behind Blue Gene, it has enlisted the aid of scientists from across the country for what many see as the more challenging side of the project: the development of the scientific applications that will actually do the work.
"One of the things we realized early on was that trying to use large-scale computation to improve our understanding of the mechanisms behind protein folding is such a challenging problem and has so many aspects that IBM is not capable of handling all of that ourselves," said Bob Germain, a manager in the computational biology center at IBM Research who heads up the science and application portion of the Blue Gene project. "We need to make contact with people in the external community both to understand what people are doing in the field and to get ideas about how to most intelligently use this very large computational resource we're developing."
In addition to ORNL, the project is collaborating closely with scientists from Columbia University, Stanford, the University of Pennsylvania, and the Swiss Federal Institute of Technology in Zurich. In March, the Blue Gene team hosted a workshop in a joint effort with the San Diego Supercomputer Center that brought together leading researchers in the field to discuss experimental approaches, computer simulation techniques, and developments in biophysical theory that relate to protein folding.
The project has also begun a seminar series at its Thomas J. Watson Research Center in Yorktown Heights, NY. More than 20 speakers have visited the center so far, Germain said; each gives a talk in the morning and then spends the rest of the day with the IBM researchers. "We learn about important problems that we may address as part of the Blue Gene science program and we also get a chance to present what we think are the important issues that we hope to address and what's going on in our program," said Germain.
Lively debate is rampant in the field, whose practitioners are still sorting out the best approaches to determining and subsequently modeling the protein folding process. For example, Germain said, "One big question we have to address is what level of detail do we need to use in order to model some of these processes adequately? It may be that, depending on what we want to study, different levels of detail may be appropriate."
But one of the primary challenges resides in mapping the modeling applications to the novel cellular architecture behind Blue Gene, which will use far more processors than any parallel system currently running.
"Without application software that can take advantage of this unique architecture, it's not going to produce the science that is needed," said Thomas Zacharia, director of ORNL's computer science and mathematics division. "So what we're trying to do is to kick-start a parallel development in scalable systems software and applications software that can take advantage of this unique architecture."
Bruce Berne, a Columbia University chemist who is collaborating on the project, noted that no one has written software for such a massively parallel machine. "So there's a big effort to parallelize it in the smartest way," he said.
Germain said that the Blue Gene team has relied on feedback from its collaborators and seminar speakers to determine which algorithms to implement and how to prioritize their implementation.
At this stage in the project, however, with Blue Gene prototypes two to three years down the road, the researchers are looking for the best way to scale up their existing applications effectively. The team is using its existing SP2 and Linux clusters to perform experiments similar to what others in the simulation field are doing. This approach helps the team "get our feet wet in this field," according to Germain, and also provides an opportunity to exercise the various models that already exist.
Berne said one angle the researchers are taking is optimizing the algorithms they use to run more efficiently on single processors. "You can improve speeds by factors of between 10 and 100 just by being smart," Berne said. "Then you want to use those economies on the parallel code using many processors. If you can save a factor of 100 on each processor, then instead of doing a calculation that would take a year you can do it in three to four days."
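Berne's arithmetic is easy to check. The short Python sketch below is purely illustrative and not drawn from the Blue Gene project: the function name and the 365-day year are assumptions made here. It divides a year-long calculation by the speedup factors he cites.

```python
# Illustrative check of the speedup arithmetic quoted above.
# Assumption: a "year-long" calculation is taken as 365 days of wall-clock time.
DAYS_PER_YEAR = 365.0


def reduced_runtime_days(baseline_days: float, speedup: float) -> float:
    """Return the runtime in days after applying a uniform speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_days / speedup


if __name__ == "__main__":
    # The two ends of the range Berne mentions: factors of 10 and 100.
    for factor in (10, 100):
        days = reduced_runtime_days(DAYS_PER_YEAR, factor)
        print(f"speedup x{factor}: {days:.2f} days")
```

With a factor of 100 the year shrinks to roughly 3.65 days, in line with the "three to four days" Berne describes; a factor of 10 would still leave a run of about five weeks.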
One of the obvious benefits that collaborators will see by contributing to the project is early access to Blue Gene once it does come on line. Berne said the prospect of using the system in a few years is heartening. "Although we can use large supercomputers around the country, you can never get a lot of processors on any one of them for long periods of time," he said.
Zacharia said that ORNL is also eyeing the system for possible purchase once the project is completed. One aspect of the lab's involvement will be determining how well the cellular architecture that powers the protein folding work may be applied to the lab's other research areas, such as nanoscale and materials science and climate dynamics.
"I'm sure that while Blue Gene has been defined as a one-off project, there are a class of applications that could benefit from the cellular architecture," said Zacharia.