Supercomputer maker Cray has launched a new FPGA-based implementation of the Smith-Waterman algorithm for its XD1 platform in a bid to gain a foothold in the bioinformatics market.
While Cray is well-established in other scientific computing sectors, the company best known for its pricey vector-based computing systems hasn't had much success penetrating the bioinformatics market. According to Amar Shan, XD1 product manager, there is a simple reason for that: "The customers in bioinformatics are very, very cost conscious, so they're looking for the absolute cheapest solutions, and they're just looking for lots of processors."
In order for Cray "to really crack the bioinformatics sector," Shan said, the company needed "to show an application that performs far better than what they can see on one of these other platforms, and I'm hoping that's what Smith-Waterman is going to do."
Cray's XD1 — a modular system based on AMD Opteron processors — is priced in the range of hundreds of thousands of dollars, as opposed to the multi-million-dollar range typical for the company's high-end systems. Coupled with FPGAs (field programmable gate arrays), which are gaining popularity within the bioinformatics community to speed computationally hungry algorithms like Smith-Waterman, the system offers the combination of general-purpose supercomputing along with "accelerated" bioinformatics applications previously available only via dedicated appliances from Active Motif subsidiary TimeLogic, Paracel, or Sage-N Research.
Shan acknowledged that Cray already has some entrenched competition in the field, but said, "When I set out to build the Smith-Waterman implementation, my intent was not to go head to head with TimeLogic. My intent was more to be able to say to people who are buying clusters to do a broad range of computing, 'There's a better way — here's a high-performance computing solution, and if you equip it with FPGAs, you get all this additional performance.'"
As it turns out, he said, since Paracel closed shop last year [BioInform 10-04-04], "We have a number of situations where people are contemplating the XD1 as Paracel replacements, so we are very much being forced into that mode of benchmarking one against the other, making sure that the sensitivity is the same — there's a lot of work that's going on there."
The only benchmark figure that Shan provided, however, was a 28-fold speedup for a Smith-Waterman search on a single FPGA compared with a single Opteron processor, measured at a beta customer site.
The Smith-Waterman implementation is the first in a suite of FPGA-enabled applications that the company intends to release for the XD1 — in bioinformatics as well as other scientific disciplines. Smith-Waterman was the company's first choice for several reasons, Shan said.
"First of all, from a technology standpoint, it lends itself really well. Second, we know that in the bioinformatics community, they really value both the openness and they value price-performance. So we figured, take Smith-Waterman, which is a gold standard for bioinformatics sequence matching, accelerate it, provide it back to the community, and see how quick an adoption we can get in the bioinformatics world."
Shan said that Cray intends to release all of its FPGA-based applications as open source software, which is expected to give the company a competitive advantage over dedicated accelerated bioinformatics hardware vendors. "When I looked at the market that some of the other FPGA accelerated solutions have enjoyed, traditionally it hasn't been that great, and part of it is that the bioinformatics people tend to have a very strong belief that solutions should be open. So there have not been a lot of people who are willing to lock themselves into particular versions of algorithms or into particular hardware," he said.
Since Cray's Smith-Waterman implementation is still in the beta release stage, it is currently available via CD for XD1 customers, with updates available from a website accessible to Cray customers. Eventually, Shan said, the company may release its code through the Open FPGA Consortium (www.openfpga.org), of which Cray is a founding member, or through other mechanisms.
Partners and Rivals in the FPGA Market
Cray has identified FPGA technology as an important part of its future high-performance computing strategy. In September, the company signed two collaboration agreements to solidify its position in the FPGA market: one with Mitrionics, under which XD1 customers will be able to use Mitrionics' Virtual Processor technology to program FPGAs using a standard software-development kit, and another with DSPlogic, which will enable XD1 customers to program FPGAs with DSPlogic's Rapid Reconfigurable Computing Development Kit.
The company is not the only big IT player to recognize the value of reconfigurable hardware. In September, SGI introduced RASC (reconfigurable application-specific computing), an internally developed alternative to FPGAs that the company is targeting toward bioinformatics and other scientific markets. An SGI spokeswoman told BioInform that SGI has not yet developed any bioinformatics applications for the RASC technology, however.
Christopher Hoover, marketing manager for TimeLogic, acknowledged that the FPGA space is getting a bit more crowded than it used to be, but noted that "it's exciting to see people recognize the value of FPGAs. It kind of validates our business model."
Hoover said the Cray system shouldn't present too much competition for TimeLogic in the bioinformatics market, since it only includes the single Smith-Waterman algorithm, "and that's one of 40 or 50 different search types that we carry on the DeCypher system."
Hoover said that TimeLogic is expanding the number of algorithms available on DeCypher even further, and plans to launch a new algorithm for short oligo searches, called TeraProbe, in about a month. The company is also developing an application for protein identification from mass spec data that should be available in "early 2006."
Nudging FPGAs into the Mainstream
Shan stressed that FPGAs are a long-term strategy for Cray, which "is investing in this technology because we believe it has a future, but at this point in time, it's not adding hugely to the bottom line."
FPGA programming, in particular, "is very much in its infancy," he said. One reason that the company is behind efforts like Open FPGA, therefore, "is to promote more people to actually take advantage of this technology and to share the results that they get. … We're hoping that will really drive the adoption of FPGA technology in the mainstream."
In addition to the difficulty of programming FPGAs, Shan said that another hindrance to adoption in the past has been sub-standard communication between the FPGA and the primary processor. "Normally people plug FPGAs into cards on PCs, and all the performance gains you get from the FPGA, you lose with the slow speed of communication," he said. In the XD1, however, "We have a conventional Opteron processor and the FPGA tied together through a very fast communications processor. So they're able to transfer data on the order of 30-times faster than you can through a PCI-based card."
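Shan's point about communication overhead can be illustrated with some simple arithmetic. The sketch below is not based on Cray measurements: the run time, the transfer times, and the function name effective_speedup are hypothetical, and the 28-fold figure is borrowed only as a stand-in for raw FPGA compute speedup. It assumes a simple model in which data transfer is not overlapped with computation.

```python
def effective_speedup(host_seconds, kernel_speedup, transfer_seconds):
    """Overall speedup once data-transfer time is added to the accelerated run.

    Simple model (an assumption): transfer and compute do not overlap.
    """
    accelerated_seconds = host_seconds / kernel_speedup + transfer_seconds
    return host_seconds / accelerated_seconds


# Hypothetical figures chosen only for illustration.
HOST_SECONDS = 28.0        # run time on the conventional processor alone
KERNEL_SPEEDUP = 28.0      # raw compute speedup on the FPGA
SLOW_LINK_SECONDS = 3.0    # time spent moving data over a slow card interface
FAST_LINK_SECONDS = SLOW_LINK_SECONDS / 30.0  # a link 30 times faster

slow_link = effective_speedup(HOST_SECONDS, KERNEL_SPEEDUP, SLOW_LINK_SECONDS)
fast_link = effective_speedup(HOST_SECONDS, KERNEL_SPEEDUP, FAST_LINK_SECONDS)

print(f"slow link: {slow_link:.1f}x, fast link: {fast_link:.1f}x")
```

Under these assumed numbers, the slow link cuts a 28-fold compute speedup to 7-fold overall, while the faster link preserves roughly 25-fold.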
That set-up enables Cray to "not put the entire program on the FPGA." In the case of Smith-Waterman, he said, "We just took a very small piece of the code — the piece that does the scoring of the quality of the match — and we put that into the FPGA." This capability also reduces the amount of time required for programming, Shan said, because "we can just identify small compute kernels so we can actually produce FPGA-accelerated solutions much faster than somebody who has to put the whole thing into hardware."
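For readers unfamiliar with the scoring step Shan describes, the following is a minimal software sketch of Smith-Waterman local alignment scoring. It is not Cray's code. The function name, the linear gap penalty, and the match, mismatch, and gap values are assumptions chosen for illustration, and the article does not say which scoring scheme the XD1 implementation uses.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix, since only the score (not the alignment) is needed.
    """
    best = 0
    previous = [0] * (len(target) + 1)  # row for the previous query residue
    for q in query:
        current = [0]  # column 0 is always zero
        for j, t in enumerate(target, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            up = previous[j] + gap          # gap in the target
            left = current[j - 1] + gap     # gap in the query
            cell = max(0, diagonal, up, left)  # zero floor makes it local
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


print(smith_waterman_score("GATTACA", "ATTAC"))
```

Each cell depends only on its upper, left, and upper-left neighbors, so cells along the same anti-diagonal can be computed independently. That property is commonly cited as the reason this inner loop maps well to parallel hardware.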
While Shan was reluctant to label the system a "low-end" version of the company's supercomputers, he said that the XD1 "scales down to a size that Cray has traditionally not gone after." Because the XD1 is modular, pricing can range anywhere from $100,000 to several million dollars, but Shan estimated that a typical "mid-sized bioinformatics shop" would be looking at a system in the range of $200,000 to $300,000.
But is that price low enough for most bioinformatics customers? With most Linux clusters and dedicated accelerators like DeCypher priced in the tens of thousands of dollars, it remains to be seen whether labs are willing to pay an order of magnitude more to have both capabilities in a single system.
Shan said that Cray is seeing interest in the Smith-Waterman implementation, however, and that there are four customer sites currently using the application. Several XD1 customer sites, such as the British Columbia Genome Center and the National Institute of Environmental Health Sciences, are also developing their own FPGA-enabled bioinformatics algorithms, he said, while other beta customers have requested a number of additions to the Smith-Waterman implementation, such as global alignment capability and improved statistical analysis features, which Cray is in the midst of rolling out.
As for future bioinformatics algorithms that Cray plans to add to its suite, "It's going to be very customer-driven," Shan said.
— Bernadette Toner