Many labs may already own hardware capable of accelerating Smith-Waterman and other dynamic programming algorithms, according to preliminary results from researchers at Lawrence Livermore National Laboratory and the DOE's Joint Genome Institute.
Wayne Huang, an employee of both institutions, presented early results from the team's work in a poster at the Genome Informatics conference at Cold Spring Harbor Laboratory in early November. Huang and his colleagues implemented the double-affine Smith-Waterman algorithm on an Nvidia GeForce 6800 graphics card — which costs around $400 and already ships with many PCs — to achieve a six-fold speedup over a software implementation running on a 2.0 GHz AMD Opteron.
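For readers unfamiliar with the underlying algorithm, the sketch below shows a standard affine-gap Smith-Waterman local alignment in Python. The double-affine variant the team implemented adds a second, cheaper penalty regime for long gaps; the scoring parameters here are illustrative assumptions, not the team's actual settings.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local-alignment score under affine gap penalties."""
    n, m = len(a), len(b)
    NEG = float("-inf")
    # H: best score of a local alignment ending at (i, j)
    # E: best score ending in a gap in sequence a; F: gap in sequence b
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap, or open a new one from the match state
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are clamped at zero
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

The quadratic table of cell updates is what makes the algorithm slow in software, and also what makes it a good fit for the data-parallel fragment pipelines of graphics hardware, since anti-diagonals of the matrix can be computed independently.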
While the performance gain is nowhere near that of FPGA-based systems like TimeLogic's DeCypher, Huang said that the ready availability and low cost of graphics processing units, or GPUs, may present a cheap alternative for many small labs that can't afford high-priced FPGA systems, which run into the tens of thousands of dollars.
"Many labs with clusters and workstations already have the hardware needed to accelerate Smith-Waterman alignments — they're just not making use of it," Huang said.
Yang Liu, a colleague of Huang's at LLNL, was already looking into alternative uses for graphics hardware, a field often called GP-GPU, for general-purpose programming on GPUs. Huang proposed implementing Smith-Waterman on GPUs to help streamline JGI's annotation pipeline. "One of our largest bottlenecks is the analysis of our predicted gene models against known proteomes, where we use alignments like Smith-Waterman to do that analysis," he said.
With initial results promising a six-fold speedup, Huang said that the team is now working on a production version of the software for regular use at JGI. "It's hard to go from test cases to benchmark your software to actual production runs of real data, so I think it will be interesting to see the actual performance that we get compared to software and hardware versions that we're using at the lab," he said.
Huang said that the team is also building an API "that will essentially provide developers with a library that will automatically detect if they have a supported graphics card in the system that they're running the application on, and it will automatically exploit the graphics card to accelerate a number of variants of the Smith-Waterman algorithm if that graphics card exists on the system — otherwise it will just perform the normal in-software variant that they request."
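The dispatch pattern Huang describes could be sketched roughly as follows. All of the names here (`detect_supported_gpu`, `gpu_smith_waterman`, `cpu_smith_waterman`) are hypothetical stand-ins, not the team's actual API.

```python
def detect_supported_gpu():
    # A real library would query the graphics driver (e.g. via OpenGL
    # extension strings) for a supported card; this illustrative stub
    # simply reports that none was found.
    return None

def cpu_smith_waterman(a, b):
    # Placeholder for the ordinary in-software alignment path.
    return f"CPU alignment of {len(a)}x{len(b)}"

def gpu_smith_waterman(a, b, gpu):
    # Placeholder for the GPU-accelerated alignment path.
    return f"GPU alignment of {len(a)}x{len(b)} on {gpu}"

def align(a, b):
    """Use the graphics card when one is detected, else fall back to software."""
    gpu = detect_supported_gpu()
    if gpu is not None:
        return gpu_smith_waterman(a, b, gpu)
    return cpu_smith_waterman(a, b)
```

The point of such a design is that callers invoke one function and the library transparently chooses the fastest available backend, so code written against it runs unchanged on machines with or without a supported card.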
The team is also looking at implementing other bioinformatics algorithms on the graphics hardware. Huang said that two attendees of the Genome Informatics conference asked him to look into accelerating their algorithms. One analyzes conservation across protein interaction networks, and the other finds conserved regions in RNA structures.
"We're not in a phase where we're actually implementing them, but we're investigating it," he said.
Huang noted that there are limitations of the GPU architecture that make graphics cards very difficult to program. "You really have to be able to formulate your problem in a way so that it can be accelerated on the graphics card, and that in itself is probably the most difficult part of the solution," he said. "Once you've got that, it's essentially relatively simple to implement."
The LLNL/JGI team has submitted a paper on the method that should be published next year, and Huang said that developers familiar with the OpenGL graphics API should be able to implement the double-affine Smith-Waterman algorithm on their own after reading the paper. Most biologists, however, "probably wouldn't be able to develop this on their own," he said. "They would just take advantage of the pre-compiled binaries that we'll provide."
Despite its promise, GP-GPU is not yet common in bioinformatics. Huang speculated that, given the relatively modest performance gains, the GPU version of Smith-Waterman "may be only slightly faster than an alignment heuristic like Blast." Nevertheless, he said, "there are a lot of alignment heuristics that are really fast, but most of them at their core use some variant of Smith-Waterman, so if we can accelerate Smith-Waterman, there's a chance that we can also accelerate these alignment heuristics as well" — including Blast.
The GPU implementation also falls far short of FPGA-based accelerated bioinformatics systems. In benchmarks, Huang said that TimeLogic's DeCypher Engine G4 was about 15 to 20 times faster than a GPU implementation.
But labs may be able to compensate for the moderate performance improvement with additional cards. According to the poster abstract from the Genome Informatics conference, the team estimates that it would take around four 6800 graphics cards — at a total cost of around $1,600 — to match the performance of a single DeCypher board, which costs more than $30,000.
Huang stressed, however, that the team's results are still preliminary and that the effort has "barely progressed beyond just sort of a side research project."
— Bernadette Toner ([email protected])