By Meredith W. Salisbury
It’s not every day that a bioinformatics center gets a brand new, $2.3 million, 1,600-plus-processor supercomputing cluster. So you can imagine the excitement that must have stirred at the Center of Excellence in Bioinformatics and Life Sciences at the State University of New York at Buffalo when it received that very machine — a new Dell cluster that was installed a few months ago.
And you can probably also imagine the sentiment when the cluster was plugged in and, instead of spewing out highly insightful biological data, immediately caused power outages at the university’s North Campus.
Last month, the new Dell cluster was running at 65 percent capacity because the compute center simply doesn’t have enough power to bring the rest of the cluster online, says Bruce Holm, director of the center of excellence. “We need to do another small electrical upgrade” to get the machine fully up and running, he says, expecting that the necessary hardware could be ordered and installed within “about a 30-day period.”
So what happened? This is the center’s third major Dell cluster, and Holm says a “good deal of front-end work” went into preparing for the arrival of this machine, which was treated as a planned, scheduled upgrade to the center’s resources. “We knew, for example, over a year or maybe two years ago that as things expanded we would have to plan for power supplies,” Holm says. The need for electricity — which costs the center more than $700,000 annually — was familiar territory for Holm’s high-performance computing team, and he says that the Dell technical specialists were helpful in planning ahead for the new power load.
The problem, says Holm, is that certain expected drops in power needs didn’t happen as planned. “Some of our other clusters that were thought to be coming offline … still have a substantial amount of usable life,” he says. The additional resources, and the fact that the new cluster wound up having “slightly greater computational capacity than we had originally planned,” left the computing folks at Buffalo in a squeeze.
A hefty part of the electric bill — close to $200,000 per year — goes toward air conditioning to cool the cluster. That’s why a new approach to cooling sits high on Holm’s short list of ways to lower the power draw. “The technology to date is to cool the entire room,” says Holm, explaining that actual temperature patterns in the server room have led experts to believe that cooling just certain parts of the room would be sufficient to prevent heat damage to the cluster. “We spend a good deal of resources cooling space that doesn’t really need to be cooled in order to get adequate cooling to the spots that actually need to be cooled,” he adds.
Holm says he and his crew have discussed some options for alternative cooling technologies with companies like Delphi and Axion. “Frankly, there is a good deal to be gained by creating more innovative and efficient systems than the simple cool-the-room way it’s been done,” he says.
Shri Joshi, a chief engineer in Delphi’s thermal division, says the company has one product on the market that circulates air, rather than cooling it, optimizing airflow and temperature patterns to draw heat away from the hottest spots: the backs of the server rows. The product itself consumes about as much power as a light bulb but can significantly reduce the power needed to keep the cluster cool, he says. Another tool that’s still in development draws on Delphi’s experience with the automotive industry, essentially taking vehicle engine-cooling technology and applying it to computers. The result will be a component built into the servers that will cool more effectively than the usual fans.
Holm says he’s aiming to get the new cluster fully operational as soon as possible — which means that he’ll probably have to stick with the standard cooling technologies du jour. Going forward, though, he is very interested in decreasing the cluster’s power draw and says his center will be among the first to sign up as a beta tester when the more innovative cooling approaches come out.