By John S. MacNeil
In bioinformatics hardware as in race cars, there are two types of people: Those who insist on purchasing the Corvette off the shelf, and those who can take an old Chevy Caprice, drop in a turbo-charged V8, and create a high-performance vehicle at half the cost of a brand-new car. In the world of bioinformatics hardware, Yoshiki Yamaguchi and his colleagues at RIKEN’s Genomic Sciences Center in Japan are those who build the hot rods.
Faced with the need to design a high-performance computing system on a limited budget for applications in comparative genomics, Yamaguchi and his team came up with the idea of souping up a regular Pentium-based computer with a device called a field-programmable gate array (FPGA), and customizing the hardware’s performance with specific algorithms for running sequence homology searches. If FPGAs could boost the performance of PCs running a simulation of how morals emerged in society, as Yamaguchi and his colleagues had shown previously, then why couldn’t they enable high-octane homology searches?
As it turns out, FPGAs are useful accessories for ramping up PCs dedicated to sequence homology searches, especially if the researcher wants to run the compute-intensive Smith-Waterman algorithm. Unlike Blast, a heuristic algorithm whose built-in assumptions minimize the brute-force effort needed to compare a query sequence with those in the database, the Smith-Waterman approach uses dynamic programming to score every possible local alignment between a query and a database sequence. As a result, the Smith-Waterman algorithm provides more sensitive results and is guaranteed to find the best-scoring local alignment under a given scoring scheme, but at a significant cost in computational time and effort.
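To make that cost concrete, here is a minimal Python sketch of Smith-Waterman scoring. It is an illustration only, not the RIKEN group's code: the function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for readability. The point is that every cell of a table as large as the query length times the database length has to be filled in.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two strings.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] holds the best score of any local alignment that ends
    # at query[i-1] and target[j-1]; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh alignment here
                H[i - 1][j - 1] + step,  # align the two characters
                H[i - 1][j] + gap,    # gap in the target
                H[i][j - 1] + gap,    # gap in the query
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The two nested loops are where the time goes: the work grows with the product of the two sequence lengths, which is why heuristics such as Blast skip most of the table.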
As Yamaguchi and his colleagues Tsutomu Maruyama and Akihiko Konagaya demonstrate, slapping an FPGA onto a Pentium-based PC can drastically reduce the time required to perform these types of queries, and at a reasonable cost. With an unlimited budget, of course, researchers could spring for tailor-made supercomputers or a cluster of PCs — or even a so-called dedicated hardware system, a commercial version of a PC/FPGA configuration offered by the likes of TimeLogic or Paracel. But Yamaguchi’s demonstration shows that with the help of his programs for configuring the system, which are available by request, average bioinformaticists can create souped-up PCs of their own.
“Many research departments, especially in universities, do not have adequate research funds,” Yamaguchi wrote in an e-mail, “whereas they need incredible [computational power]. We would like to assist and help the situation, [so] we propose the FPGA system.” The advantages of his home-built system extend beyond cost, Yamaguchi adds. Supercomputers or clusters require large spaces, climate-controlled environments, and a permanent power supply. “A personal computer with off-the-shelf FPGA boards is also space-saving,” he says.
Essentially, Yamaguchi’s system works like this: Select an off-the-shelf FPGA board with a PCI bus interface and hook it up to a Pentium-based PC running Windows or Linux with the help of interface programs that Yamaguchi’s group has developed. To run sequence homology searches using the Smith-Waterman approach, Yamaguchi and his colleagues have devised their own version of the dynamic programming algorithm, which allows the processor to compare elements of the query and database sequences in parallel.
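One common way to expose this kind of parallelism in dynamic programming is to walk the scoring table along its anti-diagonals, since every cell on one anti-diagonal depends only on the two previous ones. The Python sketch below shows that ordering. It is an assumption-laden illustration: the article does not say which scheme Yamaguchi's group uses, and the function name and scoring values are hypothetical. The inner loop runs sequentially here, but its iterations are independent, which is what parallel hardware could exploit.

```python
def sw_score_wavefront(query, target, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman score computed anti-diagonal by anti-diagonal.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    m, n = len(query), len(target)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    # Cells with the same i + j = d sit on one anti-diagonal.
    for d in range(2, m + n + 1):
        i_lo = max(1, d - n)   # keeps j = d - i at or below n
        i_hi = min(m, d - 1)   # keeps j = d - i at or above 1
        # Every cell visited below reads only diagonals d-1 and d-2,
        # so these iterations could all run at the same time.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            step = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + step,
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(sw_score_wavefront("TTACGTTT", "GGACGGG"))
```

The result is the same as a row-by-row fill; only the order of evaluation changes.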
Because the internal memory on an FPGA is limited, Yamaguchi and his team were forced to further customize their search algorithm for the PC/FPGA configuration by splitting the search process into two parts: in the first phase, database sequences are broken into sub-sequences and compared with the query sequence to produce a score corresponding to the best possible match. In the second phase, the FPGA is told to go back and figure out which sequence in the database that best score belongs to. At the end of all this, the user receives a readout of the best possible matching sequence and its corresponding score.
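The two-phase idea can be sketched in a few lines of Python. This is a simplified software analogy, not the group's FPGA design: the function names and scoring values are hypothetical, each database entry is treated as one unit rather than being split into sub-sequences, and limited memory is imitated by keeping only one row of the scoring table at a time.

```python
def best_local_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman score using a single rolling row of memory.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always zero
        for j, t in enumerate(target, start=1):
            step = match if q == t else mismatch
            curr.append(max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr  # the older row is discarded
    return best


def two_phase_search(query, database):
    """Return (index of best-matching entry, best score)."""
    # Phase 1: scan everything, remembering only the best score.
    best = max((best_local_score(query, seq) for seq in database), default=0)
    # Phase 2: scan again to find which entry produced that score.
    for index, seq in enumerate(database):
        if best_local_score(query, seq) == best:
            return index, best
    return None, best  # reached only when the database is empty


if __name__ == "__main__":
    print(two_phase_search("ACGT", ["TTTT", "GGACGTGG", "ACG"]))
```

The second pass repeats work, but neither pass ever has to hold more than the best score seen so far. That is the trade the article describes: extra computation in exchange for a small memory footprint.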
As a practical demonstration, Yamaguchi and his colleagues compared the performance of their PC/FPGA configuration to that of a PC outfitted with a Pentium III 1 GHz processor running Linux. In one example, the researchers configured a desktop PC with a Celoxica FPGA board containing a Xilinx FPGA (Xilinx XCV2000E), one of the largest FPGAs available at the time the paper was published in 2002. With a query sequence of 2,048 elements and a database of 64 million elements, the PC/FPGA was able to complete the first phase of the search process (that is, generate the best score) 327 times faster than the Pentium III. The PC/FPGA completed the second phase of the search (that is, fish out the database sequence corresponding to the best score) about 102 times faster than the standard machine.
Yamaguchi points out that his group’s approach to modifying PCs is inherently flexible, given the wide variety of FPGA boards currently available from such companies as Xilinx, Nallatech, Celoxica, and Altera. “The cost of our largest FPGA board is several times the cost of a Pentium-based desktop computer system, while the cost of a PC card with a smaller FPGA is less than half the cost of a laptop computer,” Yamaguchi and his co-authors explain in their paper, published in the online proceedings of the 2002 Pacific Symposium on Biocomputing.
Having shown they can apply their hot-rod PC to sequence homology searches, Yamaguchi and his fellow researchers are now moving on to analyzing signal transduction pathways. The researchers plan to publish their results next month.