The combination of impressive processing capability with reduced power consumption and cost has been the main driver behind the rapid increase in the use of graphics processing units (GPUs) for scientific computing over the last few years. Hardware vendors have fine-tuned their marketing pitches to emphasize the high-performance computing power that one or two graphics cards can bring to the desktop. Recently, though, a discernible shift has taken place: GPUs are coming to HPC. Musings on the potential to couple many GPUs together to form clusters or supercomputers are steadily gaining traction. For decades, the world of supercomputers and clusters has been the domain of CPUs and specialized processing hardware, but GPUs are beginning to show up in systems, not only as accelerators or coprocessors but as fundamental elements of the system architecture itself.
"I think that there is a lot of interest in the use of GPUs for supercomputing due to their high peak FLOPS and memory bandwidths, and the potential for decreased cost for HPC workloads, both in terms of node-hours of runtime, and also in terms of space, power, and cooling requirements," says John Stone, a senior research programmer at the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Stone has developed specific algorithms and optimizations for a new breed of GPUs that are specially designed for high-throughput processing, and he is currently running two widely-used molecular dynamics packages, NAMD — Not (just) Another Molecular Dynamics program — and VMD — Visual Molecular Dynamics — on a large cluster of GPUs to conduct massively scaled molecular simulation experiments. He and his colleagues are also increasingly turning their attention to figuring out how GPUs can provide acceleration on a petascale for the simulation of molecules over extended timescales.
"Many problems of biomedical relevance involve very large molecular complexes, which by itself poses a computational challenge, but beyond that, large structures must be simulated on longer timescales, increasing computational demands even further," Stone says. "GPU supercomputers will also enable researchers to use physical or mathematical models that are more precise and ... run a much larger number of simulations to improve statistical sampling, thereby increasing confidence in the results of simulations."
GPUs and supercomputers
Recently, supercomputer maker Cray began developing specialized blades for its XE6 supercomputer that will include Nvidia's Tesla 20-series graphics cards. IBM has announced that three of its most popular blade center chassis will include Nvidia's new Fermi chip, a graphics chip designed expressly for HPC. Both Dell and Hewlett-Packard have also released blades and/or rack systems that incorporate GPU chips. In addition, HP won the contract to build the Tokyo Institute of Technology's Tsubame 2.0 supercomputer, which reportedly contains 4,224 GPUs and will provide more than three petaflops of processing power to researchers throughout Japan's academic community. And in May, China wowed the world with Nebulae, a GPU-based supercomputer housed at the National Supercomputing Center in Shenzhen. Nebulae is currently the second most powerful supercomputer in the world, right behind Oak Ridge National Laboratory's Jaguar system, and is the second GPU-based supercomputer to enter the top 10 of the biannual Top500 list. The first to garner that honor was the Tianhe-1 system at the National Supercomputer Center in Tianjin, China, a hybrid design that uses Intel Xeon processors and AMD GPUs. Tianhe-1 came online last year and currently ranks seventh.
Last October, HPC researchers at the Georgia Institute of Technology and Oak Ridge National Laboratory, along with industry partners, were awarded a five-year, $12 million National Science Foundation grant to develop an experimental GPU-based high-performance computing system. The new system, dubbed "Keeneland," is based on Nvidia and HP hardware and is slated to become a TeraGrid resource in early 2012. In addition, the National Center for Supercomputing Applications at the University of Illinois last year put two GPU clusters, Lincoln and AC, into production to explore the challenges of GPUs in an HPC production environment. A team of researchers from Temple University recently announced that they were using Lincoln, which comprises 192 GPU nodes, to run a parallelized version of the open-source molecular dynamics program HOOMD-blue.
Although the multicore CPUs and specialty hardware found in current supercomputers offer more capacity, they don't necessarily accelerate certain jobs, especially those that run molecular dynamics code, according to Ross Walker, a research professor at the San Diego Supercomputer Center. Because of the mathematics involved in simulating molecules, the limiting factor is not the machine's aggregate speed but the time it takes to complete a single simulation. The longer a simulation runs, the more energy the hardware consumes, and that, in turn, increases the energy needed to remove the resulting heat from the data center. Power efficiency is perhaps the biggest benefit of using GPUs in a cluster or supercomputer.
"Each new supercomputer that is coming online now runs our code slower than the previous machine did. The difference is that there are a lot more cores, so you can run more copies at once," Walker says. "But what we've found is that we can scale to a number of GPUs and obtain speed-ups of our code that actually takes us beyond what we could achieve on the latest supercomputers."
Walker says his team is trying to reach "the sweet spot of about 64 GPUs," though he notes that there are some jobs that they could run on a larger number of GPUs.
"With molecular dynamics, the length of the simulation you can run is much more important than the number of them you can run at a given time," he adds.
Although GPUs are recent arrivals to the realm of large-scale computing, setting up a GPU-based cluster is not much more difficult than running a typical Beowulf cluster of CPUs, and it may even be simpler, depending on one's technical prowess and patience. "It should be pretty straightforward, depending on the performance benefit you get from the GPUs. You assume that if the GPUs are worth messing with, you're getting a pretty substantial performance benefit," says Jim Phillips, a senior research programmer at the Beckman Institute for Advanced Science and Technology, who was part of the team that demonstrated one of the earliest examples of running molecular dynamics code across a cluster of GPUs in 2007. "You can get the equivalent performance on a cluster with significantly fewer nodes [than a CPU-based cluster], and fewer nodes makes administration easier and mean time between failure larger," he says.
GPUs in the cloud
GPUs are also beginning to show up among the offerings of cloud computing providers, a development that could allow researchers to dip their toes in the GPU waters without committing to a major hardware purchase. In July, Web-services provider PEER 1 Hosting launched what it billed as the world's largest public cloud for general-purpose computing on graphics processing units (GPGPU), and Sabalcore Computing, an on-demand high-performance computing provider, is also offering servers with Tesla GPUs and support for CUDA, Nvidia's parallel programming platform. There is also Hoope Cloud, an Israeli open-source project designed to build cloud-based GPU computing systems, and researchers at North Carolina State University have developed MatCloud, a cloud infrastructure expressly geared toward scientific computing. Although still under active development, MatCloud is a functional service infrastructure that users can access through a simple Web terminal interface to run MATLAB-like commands.
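To give a flavor of the programming model these CUDA-capable services expose, the sketch below is a minimal, generic CUDA program, not drawn from any of the projects named above, that performs the classic SAXPY operation (y = a*x + y) on the GPU. The kernel and variable names are purely illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread computes one element: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    float *x, *y;

    // Unified (managed) memory keeps the host-side bookkeeping short;
    // the runtime migrates data between CPU and GPU as needed.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 3*1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The appeal for molecular dynamics codes like those discussed above is visible even in this toy: the data-parallel loop body maps directly onto thousands of lightweight GPU threads, which is the pattern cloud GPU offerings are designed to serve.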
"I think that there's a great opportunity to use GPUs in cloud computing in general," Stone says. "I am very curious to try some experiments with molecular dynamics simulation and analysis calculations in that environment."