The National Cancer Institute's recent discovery that it could achieve a 100-fold speedup in its computations using a Cray SV1 supercomputer happened by accident, according to Steve Conway, head of bioinformatics at Cray.
While the NCI has been using three Cray machines along with computers from IBM, SGI, and Hewlett-Packard to support its bioinformatics work for years, Conway said it wasn't until a Cray employee saw a recent news broadcast on the computational requirements of bioinformatics that a lightbulb went off: "It turns out that the hardware capabilities that have been on Cray supercomputers for decades for classified government work have to do with pattern matching, which are of the very same nature as bioinformatics problems," Conway said.
Cray convinced the NCI to test the ability of the vector supercomputers, and the results of the initial demonstration project have been very impressive, according to NCI systems software specialist Dennis Foley. In the project, scientists at the NCI's Advanced Biomedical Computing Center in Frederick, Md., produced a comprehensive map of short tandem repeat sequences (STRs) for the entire human genome.
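For readers unfamiliar with STRs, the following is a minimal Python sketch of what mapping them involves: scanning a sequence for a short unit repeated back to back. It is an illustration only, not the NCI or Cray software. The function name `find_tandem_repeats` and the parameters `min_unit`, `max_unit`, and `min_copies` are assumptions made for this example.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for runs of exactly repeated short units.

    Toy illustration: the thresholds are arbitrary assumptions, and only
    perfect (mismatch-free) repeats are reported.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this size
            copies = 1
            # Count how many times the unit repeats immediately after itself.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported run
        else:
            i += 1
    return hits


print(find_tandem_repeats("GGCACACACATT"))  # [(2, 'CA', 4)]
```

Even this naive scan checks every position against every unit length, which hints at why an exhaustive pass over an entire genome is computationally demanding.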
The Cray system completed computations that would normally take eight to ten hours in two minutes. This speed-up will allow NCI biologists to ask new questions, Foley said, since exhaustive analyses that used to be impossible are now feasible for the first time.
Conway said that bioinformaticists have gotten used to relying on statistical algorithms to narrow down their searches because an exhaustive analysis using identity algorithms has been far too slow and expensive to carry out. Because statistical methods involve guesswork and shortcuts, Conway believes the results are not as accurate as they could be. Now, he said, the Cray machines are fast enough to permit identity searching and 100 percent validation.
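The idea of an exhaustive identity search can be sketched in a few lines of Python: every position in the sequence is a candidate, and only exact matches are reported, so nothing is skipped or estimated. This is a hedged illustration, not the algorithm used on the Cray systems. The function name `find_all_exact` is an assumption made for this example.

```python
def find_all_exact(genome, pattern):
    """Return the start index of every exact occurrence of pattern.

    Overlapping occurrences are included, so the result is complete
    rather than a sample.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = genome.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one position later so overlapping matches are not missed.
        start = genome.find(pattern, start + 1)
    return positions


print(find_all_exact("ACACACGT", "ACAC"))  # [0, 2]
```

Because the scan covers the whole sequence, its cost grows with genome size, which is the expense that statistical shortcuts are meant to avoid.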
However, while news of the potential performance gains due to the Cray system should pique the interest of many genomics and bioinformatics researchers, the company has a modest view of its potential role in the marketplace. Conway said Cray intends to make a push into the market, but "we're not going to give IBM or Compaq 24 hours to get out of town."
Conway acknowledged that most genomics researchers don't need the brute-force computational power provided by the Cray machines. Because relatively few commercial organizations are doing primary research in genomics or proteomics, he said the potential market for Cray in this sector is actually fairly small. IBM and Compaq have gone after the life sciences market in a very broad fashion, he said, a strategy Cray does not intend to follow.
Instead, Cray will forge ahead in its collaboration with NCI to develop additional software tools for non-tandem repeats, EST cluster assembly, CpG island detection, genome assembly from BAC clones, SNP analysis, and the extension to protein sequences for proteomic applications. Conway said the software is being built to take advantage of the speed of the Cray machines but will be portable to other systems. He expects the full suite to be available in one to two years.
Cray and the NCI expect the software and data resulting from the collaboration to be publicly available.