Bioinformaticists looking to implement Blast or Smith-Waterman on a multiprocessor cluster have plenty of parallel packages to choose from, but when it comes to parallelizing custom applications, often the only option is to rewrite the program from scratch.
Interactive Supercomputing is looking to change that through a software product called Star-P, which simplifies the process of porting desktop applications written in high-level languages like Matlab to compute clusters.
This week, the company announced that the National Cancer Institute’s Pediatric Oncology Branch had successfully ported an internally developed software application called CORR4DB from a desktop environment to an eight-processor SGI Altix system to gain a 200-fold speedup.
CORR4DB, written in Matlab, calculates correlations between genes in microarray gene-expression experiments. The program was written for a single-processor Pentium desktop, which ran out of memory at the 10,000-sample mark, according to a case study provided by ISC. A typical experiment took up to a week to analyze in serial using desktop PCs, the company said.
However, porting the application to a multiprocessor system would have required it to be rewritten, in which case all the benefits of the original Matlab environment would have been lost, said Mark Potts, president of HPC Applications, a consulting firm that NCI contracted to migrate CORR4DB to the Altix cluster.
Without Star-P, “you would have started over and adopted some C or C++ and MPI in order to parallelize this,” Potts told BioInform. “In other words, you would have thrown away the familiar framework, and that was its real strength.”
Potts said that as a result of using the Star-P platform, the NCI researchers saw a 200-fold speedup in their analysis, without losing the familiar interactivity of their desktop application. “The only compromise they had to make was they had to move their data to the server,” he said.
Interactive Supercomputing was launched in 2004 to commercialize technology developed at Massachusetts Institute of Technology. Ilya Mirman, vice president of marketing at ISC, said that the company is targeting a broad range of scientific computing markets, but noted that almost half of the technology’s early adopters have come from the life science market — particularly in medical image analysis.
“Life sciences is the early market for us,” he said. One driver for that is that “the life sciences sector has large and growing data sets, but they don’t have a lot of parallel programming expertise, and with the availability now of very cost-effective parallel systems — a supercomputer used to cost millions of dollars and now you can get a deskside machine with eight or 16 or 32 processors for $15,000 to $25,000 — it’s very affordable, and suddenly you have orders of magnitude more power than the scientist or engineer can program.”
Cluster computing has become quite popular in bioinformatics due to “embarrassingly parallel” algorithms like Blast that “are relatively easy to break up into independent chunks,” Mirman said. However, he noted, “there is all sorts of science that’s done on the desktop that’s beyond or different than the few algorithms that have been parallelized.”
Scientists running custom programs must often remain on the desktop because they don’t have the expertise to parallelize the code, he said. “What Star-P does is let them take their own custom algorithm and just extend it with much larger data sets and much faster execution into a parallel environment without having to write their own.”
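As a rough illustration of what breaking a job "into independent chunks" can look like for a gene-correlation calculation, the following Python sketch splits the genes into blocks and correlates each block against the full data set without any block depending on another. It is not Star-P or CORR4DB code; the function names, the toy expression values, and the choice of Pearson correlation are all assumptions made for the example. It uses a thread pool only to show the structure, and on a real cluster the same chunks would be handed to separate processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlate_rows(expr, rows):
    """Correlate each gene index in `rows` against every gene.

    A chunk reads the shared input but never needs another chunk's output,
    which is what makes the work easy to distribute.
    """
    return [(i, [pearson(expr[i], other) for other in expr]) for i in rows]


def correlation_matrix(expr, chunk_size=2, workers=4):
    """Split the genes into chunks and correlate the chunks concurrently."""
    chunks = [range(start, min(start + chunk_size, len(expr)))
              for start in range(0, len(expr), chunk_size)]
    result = [None] * len(expr)
    # Threads illustrate the pattern only; a process pool or cluster
    # scheduler would be substituted to use multiple processors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for block in pool.map(lambda rows: correlate_rows(expr, rows), chunks):
            for i, row in block:
                result[i] = row
    return result


# Hypothetical toy data: 3 genes measured across 4 samples.
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
matrix = correlation_matrix(expression)
```

Because each chunk produces its own rows of the result, the memory needed for the output can also be spread across machines rather than held on one desktop.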
Star-P currently supports Matlab, but the company plans to add support this year for other high-level languages such as Python, Mathematica, and R.
Mirman said the company also plans to develop “more automated parallelization tools” to help speed the porting process using Star-P.
ISC also has an open API that enables external code to be plugged into the environment, and plans to add “interfaces to more and more of the popular algorithms,” such as Blast for the bioinformatics market, he said.