Sometimes the most ubiquitous technologies are the most overlooked. Such is the case with GPUs, which have gone largely unnoticed even though they have lived inside our laptops and desktops for years. Most of us probably gave them little thought, even as we appreciated the rendering power they delivered.
While there is never any shortage of vendors touting their hardware acceleration tools as the silver bullet for your computing bottlenecks, it's important to keep in mind that the algorithm and the data set, not the hardware, should inform your choice, along with cost and ease of use. That said, GPU chipmaker NVIDIA has made porting code to GPUs easier for the average bench biologist with its CUDA software technology, which strengthens the case for considering this breed of acceleration technology.
And there are other reasons to make the most of your GPU chip. "The advantage, first and foremost, is just the ubiquity of these things: there have already been 100 million CUDA-capable GPUs deployed," says John Melonakos, founder and CEO of AccelerEyes. "They come standard in desktops and laptops and they're cheap. … They perform with the best of the accelerators, plus you get about 10 to 100 times speedups."
Melonakos is understandably biased; his company recently released a GPU acceleration software engine for Matlab. Jacket 1.0, released after a seven-month beta testing period, allows users and developers to program GPUs in their desktops, laptops, or workstations using Matlab's M language. Part of the impetus for Jacket is to provide an alternative to users who do not want to get their hands dirty trying to port their Matlab code to C or some FPGA software compiler. The tool utilizes NVIDIA's CUDA and manages all CPU-GPU memory transfers, kernel tweaking, and execution. "The whole concept behind Jacket is we want scientists and engineers to focus on science and engineering, and not on computer science," Melonakos says. "That's the point of Jacket: it basically handles the implementation details of running a computation on the GPU in an automated fashion."
Right now Jacket is built on CUDA, and is therefore tied to NVIDIA's hardware, but Melonakos says the company is keeping an eye on Intel, which is expected to release a GPU within a year.
For programs less widely used than Matlab, it is harder for those working on GPU acceleration to decide which of the many bioinformatics algorithms should be the focus of their next project.
"There are some algorithms that would be very well suited for this and we are exploring a few, but there are so many algorithms, so what exactly do people want and need accelerated?" says Joe Landman, president of Scalable Informatics. "Last time I checked, there were, I don't know how many, 200-plus phylogenetic software codes. … It's very hard to select the one or two codes that will be most impactful to the research community that we can spend effort on or that the community would be able to derive value from."
Landman and some academic colleagues have started by taking safe bets on more popular algorithms that would work well with the GPU's architecture. In collaboration with Vipin Chaudhary, an associate professor of computer science at the State University of New York at Buffalo, Landman recently released a GPU version of HMMER, a software suite for creating and using hidden Markov models of sequence data. "Within HMMER itself, there is a particular routine called P7 Viterbi, which does a Viterbi calculation, that is extremely well suited to this type of architecture," Landman says. "Things that look like data parallel calculations will go extremely nicely onto a GPU."
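Landman's point about data-parallel calculations shows up in the shape of the Viterbi recurrence itself. The sketch below is a generic Viterbi decoder for a small, hypothetical two-state hidden Markov model; the state names, probabilities, and function names are illustrative assumptions, and this is not HMMER's P7 Viterbi routine, which works on profile HMMs with match, insert, and delete states. What it does illustrate is that each cell in a column of the dynamic-programming table depends only on the previous column, so all the cells in a column could in principle be computed at once, which is the kind of work a GPU handles well.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """Log-probability, mapping zero to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state path and its log-probability."""
    n = len(states)
    # Column 0: start probability times emission of the first symbol.
    prev = [safe_log(start_p[s]) + safe_log(emit_p[s][obs[0]]) for s in states]
    backptrs: List[List[int]] = []

    for symbol in obs[1:]:
        cur: List[float] = []
        ptr: List[int] = []
        # Each state j reads only the previous column `prev`, never `cur`,
        # so the iterations of this loop are independent of one another.
        # A GPU version could assign one thread per state; here they run serially.
        for j in range(n):
            to_state = states[j]
            best_i = max(
                range(n),
                key=lambda i: prev[i] + safe_log(trans_p[states[i]][to_state]),
            )
            score = prev[best_i] + safe_log(trans_p[states[best_i]][to_state])
            cur.append(score + safe_log(emit_p[to_state][symbol]))
            ptr.append(best_i)
        prev = cur
        backptrs.append(ptr)

    # Trace back from the best-scoring final state.
    last = max(range(n), key=lambda i: prev[i])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return [states[i] for i in path], prev[last]


# Hypothetical toy model (a common textbook example, not biological data).
states = ("Healthy", "Fever")
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, logp = viterbi(["normal", "cold", "dizzy"], states, start, trans, emit)
# path -> ['Healthy', 'Healthy', 'Fever'], probability about 0.01512
```

Working in log space keeps long sequences from underflowing, which matters for real sequence data even though this toy example is only three symbols long.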
In general, there is no one-size-fits-all compiler that will change your code to take advantage of the GPU technology. "The overall idea is that tools like CUDA are not a magic bullet, but they do help you because you don't have to think about the code at a low level of abstraction," Landman says. So when it comes right down to it, if you want to take advantage of the power of the hardware in front of you, you have to adapt your code to the underlying architecture.
"There's simply no way around that. There are compilers that will try and help you get there ... but most scientists I know don't want to deal with the vagaries of doing thread joins and forks; what they'd rather do is write their code and then give their compilers some hints: 'this is how I want to run.' That's OpenMP, and it's been a fairly successful paradigm," Landman says. "Then there's the distributed view of the world, [which forced] you to rethink how your application was laid out. You couldn't magically get it going 20 or 30 times faster on your cluster; it still required a lot of thought on how to get it there."
Also coming down the GPU development pipeline are improvements for MUMmerGPU, a GPU adaptation of the genome alignment program that was originally released in 1999. Early on, MUMmerGPU demonstrated 10x speedups in comparison with standard MUMmer on CPUs, but its developers still felt that significant performance improvements could be made.
"The original implementation of MUMmerGPU was straightforward, in terms of writing a program that worked correctly, but getting the program as fast as possible required a significant investment," says Michael Schatz, a graduate student at the University of Maryland. "In the end, MUMmerGPU is much faster than before, but we spent several months trying to squeeze as much performance as possible, and literally trying hundreds of options."
In the first MUMmerGPU, the algorithm was split between the GPU and the CPU, which partly kept its performance from climbing higher. In the upcoming MUMmerGPU 2.0, the entire algorithm will run on the GPU, which yields far greater performance. "The first and most important lesson we learned was [that] to get good performance from a memory-intensive application, you have to follow good software engineering practices and carefully measure how your program works on the GPU hardware," Schatz says. "However, if you are willing to spend the time learning the ins and outs of the hardware and the model, the payoff can be very impressive."
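One way to see why leaving part of the work on the CPU holds back the overall gain is Amdahl's law, a standard model not mentioned by the MUMmerGPU developers: if a fraction p of the runtime moves to the GPU and that part runs s times faster, the overall speedup is 1 / ((1 - p) + p / s). This minimal sketch assumes the CPU and GPU portions do not overlap and ignores CPU-GPU transfer costs; the 80 percent and 50x figures are hypothetical, not measurements from MUMmerGPU.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of a program is accelerated (Amdahl's law).

    accelerated_fraction: share of the original runtime moved to the GPU (0..1).
    kernel_speedup: how many times faster that share runs on the GPU.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining_cpu_time = 1.0 - accelerated_fraction  # still runs at CPU speed
    gpu_time = accelerated_fraction / kernel_speedup  # shrunk by the GPU
    return 1.0 / (remaining_cpu_time + gpu_time)


# Hypothetical figures: the GPU portion runs 50x faster in both cases.
split = amdahl_speedup(0.8, 50.0)  # 80% on the GPU: about 4.6x overall
whole = amdahl_speedup(1.0, 50.0)  # everything on the GPU: 50x overall
```

Under this simplified model, even an infinitely fast GPU kernel caps the 80 percent case at 1 / 0.2 = 5x, which suggests why moving the entire algorithm onto the GPU can matter more than tuning any single piece of it.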
Open door policy
In addition to genome analysis algorithms, GPUs have been a logical candidate for molecular dynamics simulations. In February, Vijay Pande, an associate professor of chemistry at Stanford University, and his colleagues announced the release of Open Molecular Mechanics (OpenMM), an open-source software package designed to use GPUs to accelerate small-molecule simulations. Because OpenMM is a library, it can be plugged into the code of many different molecular dynamics programs, such as GROMACS, to achieve speedups of up to 700 times over the same programs running on a single CPU, according to its developers. Another key characteristic is that the tool is hardware agnostic, meaning OpenMM is not bound to any particular GPU vendor.
Along those same lines, Apple has backed OpenCL, a hardware-independent programming framework for GPUs and multicore processors. OpenCL's development is now overseen by the Khronos Group, a consortium of computing hardware and tool developers. "Apple puts GPUs in their computers and they don't want to write a single line of CUDA because if NVIDIA starts hiking up their prices, they always want to have the option of getting out," says Melonakos. "OpenCL will be the standard of how scientific computing is done on these GPUs in the future, but currently, it's still in its infancy as an API."
But as open-source, hardware-agnostic GPU languages, compilers, and software libraries mature, chip vendors will be forced to fight even harder for their place in scientific computing. And with more competition bringing lower prices and better performance, the case for GPUs as a feasible way to accelerate the informatics workflow grows a little stronger each year.