Brookhaven National Laboratory isn’t considered a hot spot for computational biology, but a few of the lab’s researchers are looking to change that. Recently, some Brookhaven scientists took a close look at a supercomputer that was originally developed for high-energy physics applications to determine whether it would be applicable to protein structure simulations. According to their analysis (in press in Intl. J. Supercomputing Applications), the massively parallel computer called QCDOC (quantum chromodynamics on a chip) is actually well suited to the molecular dynamics equations required for protein structure prediction, and could reduce the time required for some simulations from 20 years to a week.
QCDOC was designed for quantum chromodynamics — a theory to describe the strong interactions between elementary particles — by a team of computer scientists and physicists at BNL, Columbia University, IBM, and Japan’s Institute of Physical and Chemical Research (RIKEN). The 10,000 processors in the system each contain 4 MB of memory and 24 parallel communication channels — an architecture that happens to be very similar to that of the Blue Gene supercomputer that IBM is building for protein folding and other scientific applications. Last week, BioInform caught up with James Davenport, a physicist at BNL and associate director of the lab’s Center for Data-Intensive Computing, to find out how QCDOC compares to its more famous protein-folding relative, and what this project means for computational biology at BNL.
What made you decide to use this supercomputer designed for high-energy physics for protein folding?
First of all, we have not folded any proteins yet. QCDOC was designed by a group at Columbia University working with people from IBM, and it turns out that the architecture is quite similar to Blue Gene.
It was originally thought to be a very special-purpose machine — really only suitable for high-energy physics, for quantum chromodynamics. It uses IBM’s ASICs [application-specific integrated circuits], where they burn on the same chip the processor, 4 MB of memory, and communication controllers. Now, it turns out that this shouldn’t have been thought of as being only good for quantum chromodynamics, because in fact, it is a similar architecture to Blue Gene, and there had been analyses at IBM on how Blue Gene would perform on protein-folding type problems. And we actually redid those analyses — we did them independently, but we did a similar paper study of how fast this machine should work on a typical protein, and we concluded that you could do 100,000 atoms for something like 10 microseconds of simulated time in a week or two of computing.
[QCDOC] is a 10,000-processor machine, whereas Blue Gene is a 131,000-processor machine, so it is smaller, but it can still do a lot of work.
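As a rough illustration of what the estimate above implies, the sketch below converts "10 microseconds of simulated time in a week" into a number of integration steps and a wall-clock budget per step. The 2 femtosecond timestep is an assumption (a common choice in molecular dynamics, not a figure from the interview), and the function names are made up for this example.

```python
# Back-of-the-envelope arithmetic for the estimate quoted above.
# ASSUMPTION: a 2 femtosecond integration timestep; the interview does not
# state the timestep used in the Brookhaven analysis.
TIMESTEP_FS = 2.0
SIMULATED_TIME_US = 10.0   # from the interview
WALL_CLOCK_DAYS = 7.0      # low end of "a week or two"

FS_PER_US = 1e9            # 1 microsecond = 1e9 femtoseconds


def steps_needed(simulated_us, timestep_fs):
    """Number of integration steps needed to cover the simulated time."""
    return simulated_us * FS_PER_US / timestep_fs


def seconds_per_step(wall_days, n_steps):
    """Wall-clock seconds available for each integration step."""
    return wall_days * 24 * 3600 / n_steps


n_steps = steps_needed(SIMULATED_TIME_US, TIMESTEP_FS)
budget = seconds_per_step(WALL_CLOCK_DAYS, n_steps)
print(f"timesteps needed: {n_steps:.1e}")
print(f"wall-clock budget per step: {budget * 1e6:.0f} microseconds")
```

Under that assumed timestep, the run needs about 5 billion steps, so the whole machine would have to finish each step for all 100,000 atoms in roughly 120 microseconds.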
When did you begin to realize that the QCDOC architecture might have applications outside of physics?
The machine itself, like Blue Gene, uses an IBM PowerPC core burned onto this chip, which is a general-purpose machine — it’s what’s in a Mac. So we started thinking, ‘Well, gee, you can program it just like any regular machine, so it ought to be good for other things.’
The thing is that it uses what’s called a mesh architecture — or a torus, if you want — it’s a periodic mesh. So in other words, the processors have a fast communication link to their neighbors in the mesh, but there’s no switch. So if you want to send a message from processor A to processor B, if they’re not neighbors to each other, it has to do a multi-hop. And it was that feature that led people to overlook the fact that this could be good for protein problems, because with proteins you generally have long-range interactions — because of the long-range Coulomb interactions — and you really need to communicate the positions of all the particles to all the processors.
So people just hadn’t thought about it. It occurred to us that it should be good because of the IBM PowerPC core, and then we’re left with, ‘Well, is it really true that it can’t communicate rapidly with distant processors?’ And it turned out, when you analyzed that in detail, that it could. And the reason is that it has an extremely fast network. The communication links to its neighbors are very, very fast, 500 Mbps, and furthermore, it’s wired in a six-dimensional mesh. It turns out that in six dimensions you have 12 neighbors, so it can communicate with 12 neighbors with two-way simultaneous communication; that’s the equivalent of having 24 wires going in and out of each chip at 500 Mbps. So that’s just enough communication bandwidth that the computation takes longer than the communication, and it’s a general feature of parallel computers that if the computation time is more than the communication time, then the communication time is irrelevant [because the two can overlap]. And we did some analyses that showed that for this problem, for molecular dynamics, that is true.
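The mesh arithmetic in this answer can be checked with a short sketch. It counts the nearest neighbors of a node on a periodic six-dimensional mesh, adds up the link bandwidth at the quoted 500 Mbps, and computes how many hops a message needs between non-neighboring nodes. The mesh shape (four nodes along each of six axes) is an assumption chosen for illustration; the interview does not give the real QCDOC partition shape, and the function names are hypothetical.

```python
from itertools import product

# ASSUMPTION: an illustrative 6-D torus with 4 nodes per axis (4,096 nodes).
# The real QCDOC partition shape is not given in the interview.
DIMS = (4, 4, 4, 4, 4, 4)
LINK_MBPS = 500  # per-link rate quoted in the interview


def neighbors(node, dims):
    """Nearest neighbors of `node` on a periodic mesh: +1 and -1 on each axis."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            other = list(node)
            other[axis] = (node[axis] + step) % size  # wrap around the edge
            result.add(tuple(other))
    result.discard(tuple(node))
    return result


def torus_hops(a, b, dims):
    """Minimum number of neighbor-to-neighbor hops between nodes a and b."""
    hops = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y) % size
        hops += min(d, size - d)  # going the other way around may be shorter
    return hops


origin = (0,) * len(DIMS)
n_neighbors = len(neighbors(origin, DIMS))
# Two-way simultaneous traffic: one inbound and one outbound link per neighbor.
aggregate_mbps = 2 * n_neighbors * LINK_MBPS
worst_case = max(torus_hops(origin, node, DIMS)
                 for node in product(*(range(s) for s in DIMS)))

print("neighbors per node:", n_neighbors)
print("aggregate link bandwidth (Mbps):", aggregate_mbps)
print("worst-case hops on this torus:", worst_case)
```

On this assumed shape, each node has 12 neighbors and 24 links, for 12,000 Mbps in aggregate, and no node is more than 12 hops away. For comparison, `torus_hops` gives a worst case of 24 hops on a hypothetical 16 x 16 x 16 three-dimensional torus with the same 4,096 nodes.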
Is QCDOC already in place?
It is being built right now. They have a version with 64 nodes at Columbia that is just being assembled, and they have done tests on it that convinced them that on the 64-node level everything is working fine. They expect to receive enough chips to get up to the 10,000-node level, and that will be assembled over the next several months and shipped here to Brookhaven. And we’re hoping to buy an additional 10,000-node version that can be used for other things.
The first machine is actually going to be in what’s called the RIKEN BNL Research Center, which is funded by RIKEN. They are funding serial number one of this, and we are hoping that the lab, through the Department of Energy, will be able to buy serial number two, and then there’s another one being built that will go to Edinburgh [UK]. There’s a lattice quantum chromodynamics group in Edinburgh that’s buying one.
Will the machine at BNL be dedicated to protein folding?
No. The first one, serial number one, will be dedicated to high-energy physics, and they expect to make 10 percent of the time available to other things, of which we think proteins will be a big part.
How much computational biology research is currently underway at the lab?
Not a lot. There’s a strong structural biology group related to the synchrotron light source — it was recently said that approximately half of the structures in the PDB are from our light source — but we have relatively little computational biology, so we hope to get this started, and if the machine comes, we would hope to expand in that area.
Are you planning to run actual protein-folding simulations on the computer, or will it be mainly protein structure prediction?
Some of each. People generally believe that the folding problem is still too hard. [QCDOC] would be around 100 to 1,000 times the computing power of typical computers, but that’s still not quite enough to do interesting folding problems. So we expect that we will do some structure determinations, some docking, and we hope to do some structural rearrangements — like as a result of temperature changes, or the addition of drugs attached to a protein, those kinds of things. To actually watch a structure fold from scratch is still difficult, but we expect we might be able to do some of them.
You mentioned some of the similarities between QCDOC and Blue Gene, but what are some differences?
Blue Gene also uses an IBM PowerPC core. On each chip, they have two processors — not just one. They run at a faster clock speed — they’re predicting 700 MHz, whereas QCDOC is 500 MHz. That’s an interesting point because you can buy an Intel Pentium at 3 GHz, but the point is that you don’t get the full advantage of all those cycles, so we think you get a higher percentage of the cycles on the slower machines, and also, the slower machines use much less power. You couldn’t have 10,000 Intel processors in a room — the thing would melt. Also, they use a three-dimensional interconnect system rather than six-dimensional, so they have a little bit less fast network connectivity, but they have built a completely separate network to do global data exchanges.
They’re both very similar, and it’s almost a research issue to find out which would be better, or how could they do a next generation that would be better. We basically think that they will be comparable. It’s also true that the first Blue Gene that was sold to Lawrence Livermore National Lab is a 360-teraflop machine, whereas our QCDOC is 10 teraflops. And they get that because they have 131,000 processors instead of our 10,000 processors. So it’s a much bigger operation, but other than that, it’s very similar in concept.
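The two headline figures in this answer can be put on a per-processor basis with a little arithmetic. The sketch below uses only the round numbers quoted in the interview (360 teraflops on 131,000 processors, 10 teraflops on 10,000 processors); the function name is made up for this example, and the results are only as precise as those quoted figures.

```python
def per_processor_gflops(total_teraflops, n_processors):
    """Average peak rate per processor, in gigaflops."""
    return total_teraflops * 1000.0 / n_processors


# Figures as quoted in the interview (round numbers, not measurements).
blue_gene = per_processor_gflops(360, 131_000)
qcdoc = per_processor_gflops(10, 10_000)

print(f"Blue Gene: {blue_gene:.2f} gigaflops per processor")
print(f"QCDOC:     {qcdoc:.2f} gigaflops per processor")
print(f"processor-count ratio: {131_000 / 10_000:.1f}x")
print(f"total-teraflops ratio: {360 / 10:.0f}x")
```

Taken at face value, the processor count accounts for about a 13-fold difference, and the remaining factor of roughly 2.7 would have to come from per-processor throughput.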
How well do systems like this scale as you add processors?
If you take any fixed-size problem, and you increase the number of processors, there will come a time when it just stops scaling, and it’s not known for sure; no one has really assembled 131,000 processors before, so it’s not clear where it’s going to stop scaling. It will probably stop scaling in the 10,000 to 20,000 processor range, in which case the plan would be to run a number of different problems simultaneously. On the other hand, that’s just one problem. When researchers start thinking about it, I’m sure they’ll come up with algorithms that will scale much better, and they’ll apply them to different systems. For example, in materials science, where you can frequently model systems without long-range forces, it probably would scale larger, but we really don’t know.
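A toy model shows why a fixed-size problem eventually stops scaling. It is not a measurement or a model of QCDOC or Blue Gene: it assumes the compute work per step divides evenly across processors while a communication cost grows in proportion to the processor count. The constants are arbitrary and were picked so that the peak lands near 10,000 processors.

```python
# Toy strong-scaling model, NOT a measurement of any real machine.
# ASSUMPTIONS: compute work per step divides evenly across p processors, and
# each step pays a communication cost proportional to p. Both constants are
# arbitrary, chosen so the best processor count comes out near 10,000.
def step_time(p, compute_s=1.0, comm_per_proc_s=1e-8):
    """Modeled wall-clock time for one step on p processors."""
    return compute_s / p + comm_per_proc_s * p


def speedup(p, **kw):
    """Speedup over a single processor under the same model."""
    return step_time(1, **kw) / step_time(p, **kw)


for p in (1_000, 10_000, 20_000, 131_000):
    print(f"{p:>7} processors: speedup {speedup(p):8.0f}")
```

In this model the speedup rises until the two terms are equal and then falls, which is the situation in which running several independent problems side by side makes better use of a very large machine.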
How will you validate your predictions that QCDOC will actually work for protein-related computation?
You start doing some things on known structures and see if you get the right answer. The other thing is that taking a structure completely from scratch, the only cases where that’s been really successful are really small systems. They do a lot of homology modeling and try to guess the structure and see if it makes sense, so I think we’re just going to push the envelope back to the point where we would be able to do larger systems, or would be able to do smaller systems without resorting to homology.
Have you come across any particular computational challenges in biology that seem unique compared to physics or other scientific disciplines?
I guess the complexity, the visualization. And informatics plays a much bigger role. You need to bring to bear a lot more information. Many groups do protein structure determination with x-ray crystallography, but that’s a frozen picture — it’s not easy to see how a protein changes either in time or in response to pH or something, and the computer offers that possibility. The other thing we always cite is membrane proteins, because many of them cannot be crystallized, so we think the computer has a huge potential benefit here.
The other thing, from the computational side, is that it has traditionally been a problem that computer manufacturers are more focused on the commercial market, and they don’t necessarily design machines that are good for computational science. Many scientists are concerned that there will be a divergence in the future between the commercial machines and the scientific machines. The IBMs will focus on the commercial, for example, and there won’t be anybody focusing on the scientific side. Cray nearly went out of business.
In the life sciences, at least, it seems like everybody is using Linux clusters.
Linux clusters are great. We have a 125-node, 250-processor Linux cluster, and for most research groups they are wonderful. At relatively low cost, they can give you very easily 10 or 20 times the power of one machine. You have to buy 40 machines to get the power of 20 because of the loss due to communication, but the fact is that 20 times a single processor is a very big leg up.
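The "40 machines to get the power of 20" remark corresponds to a parallel efficiency of one half. A minimal sketch of that arithmetic follows; the function name is made up for this example.

```python
def parallel_efficiency(effective_speedup, n_machines):
    """Fraction of the ideal speedup that the cluster actually delivers."""
    return effective_speedup / n_machines


# Figures from the interview: 40 machines deliver the power of 20.
efficiency = parallel_efficiency(effective_speedup=20, n_machines=40)
print(f"parallel efficiency: {efficiency:.0%}")
```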