By Meredith W. Salisbury
It’s not every day that someone decides a 4,000-processor cluster just isn’t getting the job done. But it happened to Jeffrey Skolnick, bioinformatics director at the University at Buffalo, whose computing infrastructure was, just two years ago, the crown jewel of his Center of Excellence in Bioinformatics. The cluster has been completely saturated since day one, he says. In the time since, Skolnick has not only learned quite a bit about what to look for in a vendor, but also installed a brand-new 1.32-teraflop IBM blade system.
While the original Dell installation was getting good science done, Skolnick says his team wasn’t getting the support it needed from the computer vendor. The giant cluster was “really straining the state of the art. All the things that can go wrong, the odds are we’re going to see them. … It’s not what happens when there are issues with the operating system, with integration — it’s what happens next,” he says. “Unfortunately, I really think [Dell] lost interest.” (A Dell spokesperson was unavailable for comment.)
So when it came time to think about an upgrade, Skolnick held a full competition for bids. “Each vendor had to provide a little mini-rack so we could test it,” he says. Though “it wasn’t really a question of Dell’s price not being as competitive,” Skolnick says he was won over by IBM because it got its system to work in his environment, committed people to optimizing his code, and showed tremendous interest in collaborating with the center. The blades brought further advantages: a smaller footprint — “You can squeeze 50 percent more computers in the same square footage,” he says — and lower power consumption.
During the 20 months Skolnick’s been at the center, his cluster has gotten most of its workout in predicting protein structure and function. Those predictions, fed to the machine by any of the 18 members of the center’s research team, account for 60 percent to 70 percent of the infrastructure’s workload, Skolnick says. Once structures are divined, he adds, they are used to derive further predictions of biochemical interactions based on protein shapes.
For now, the Dell cluster will remain up and running. “We have a three-year warranty on it. We’re certainly going to use it until the hardware warranty runs out, and probably beyond,” he says.
Try This at Home
Skolnick’s experience with various computer vendors in the past couple of years offers some valuable lessons for anyone thinking about buying or replacing a cluster. For starters, he says, “Try if you can to get a demo unit. Run your application on it, and run it for different configurations.” He adds, “Just because the thing is running nominally 20 percent faster in clock speed doesn’t mean your code’s going to run faster. You may pay a premium price for a one percent improvement in performance.”
Next, he says, optimize your configuration — different compilers could give a 20 percent difference in performance on the same code, he notes. “The cheapest thing to do is to check out different compilers. Try it on different machines and try to find the sweet spot.”
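The testing regimen Skolnick describes — run your own application on each demo unit and under each configuration, rather than trusting clock speed — amounts to a simple benchmarking harness. The sketch below is a minimal, hypothetical illustration of that idea in Python (the article names no tools or code); `kernel` stands in for whatever real application a lab would actually time, and best-of-N timing damps operating-system noise.

```python
import time

def benchmark(workload, configs, repeats=3):
    """Time one workload under several labeled configurations.

    workload: a callable accepting the keyword arguments in each config
    configs:  mapping of label -> dict of keyword arguments
    Returns a mapping of label -> best wall-clock time in seconds
    over `repeats` runs (best-of-N reduces scheduling noise).
    """
    results = {}
    for label, cfg in configs.items():
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload(**cfg)
            times.append(time.perf_counter() - start)
        results[label] = min(times)
    return results

# Hypothetical stand-in for a real application kernel.
def kernel(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

timings = benchmark(kernel, {"small": {"n": 10_000},
                             "large": {"n": 100_000}})
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{label}: {seconds * 1e3:.3f} ms")
```

The same harness applies to Skolnick’s compiler advice: build the identical source with each candidate compiler or flag set, register each binary as a configuration, and compare the measured times rather than the spec sheet.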
Of course, don’t forget the basics, such as getting references from a vendor to check up on things like responsiveness to problems. Be sure to account for air conditioning and power supply needs, and check that your floor can bear the weight of a cluster.
Most importantly, he advises, “Don’t buy anything you don’t absolutely need today — because next month it’s going to be cheaper.”