In Nat Goodman’s review of the available approaches to high-performance sequence analysis (“Catch a Rising Star,” Oct. ’02), he notes that we at Aneda “march to a different drummer” and, having tried our product MPSRCH, wonders how we achieve our “incredible speed” while delivering the full Smith-Waterman algorithm. I couldn’t let Nat’s question go unanswered, so let me outline our approach, share a few of our secrets, and show just how fast the full algorithm can run.
We agree that there is a need for the raw speed delivered by Blast and other fast heuristic methods, but we’re convinced it’s also important for users to be able to run the full Smith-Waterman algorithm at high speed without resorting to very expensive hardware accelerators. This is where MPSRCH comes in. MPSRCH implements the full Smith-Waterman algorithm without cutting any corners, which gives it a marked sensitivity advantage over the heuristic methods most people use. The advantage is especially noticeable in the region of distant homologues, where Blast, for example, frequently misses one in six significant hits.
MPSRCH owes its speed to the way it is implemented. The code was originally written to run on the massively parallel MasPar systems of the mid-1990s, and at the time it was one of the most efficient software implementations of the algorithm available. Since then, general-purpose processors such as the Alpha and the Pentium have gained SIMD (single-instruction, multiple-data) instructions aimed at digital video, and these work in much the same data-parallel way as the old MasPar. As a result, we were able to port the MasPar-specific parts of the code to these processors. MPSRCH 4 is about 20 times faster than a conventional Smith-Waterman implementation. I currently run it on a 1 GHz Pentium III laptop, and it performs as well as a 16,384-processor MasPar!
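To make the data-parallel idea concrete, here is a minimal Python sketch (not MPSRCH code) that scores several database sequences against one query in lockstep, one sequence per "lane". Every step applies the same update to all lanes, which is the pattern SIMD instructions or a MasPar processor array execute in hardware; here a plain loop stands in for each vector operation. The one-sequence-per-lane layout, the function name `sw_scores_lockstep`, and the toy scores (match +2, mismatch -1, linear gap -2) are assumptions for illustration only.

```python
from typing import List

# Hypothetical toy scoring scheme, not MPSRCH's defaults.
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_scores_lockstep(query: str, subjects: List[str]) -> List[int]:
    """Best Smith-Waterman score for each subject, one subject per lane."""
    lanes = len(subjects)
    width = len(query) + 1
    prev = [[0] * width for _ in range(lanes)]  # previous DP row, per lane
    best = [0] * lanes
    longest = max((len(s) for s in subjects), default=0)
    for i in range(longest):  # step i consumes residue i of every subject
        curr = [[0] * width for _ in range(lanes)]
        for j in range(1, width):
            q_char = query[j - 1]
            # On SIMD hardware this inner loop would be a single vector operation.
            for lane, subject in enumerate(subjects):
                if i >= len(subject):
                    continue  # lane has finished: masked out, best score frozen
                diag = prev[lane][j - 1] + (MATCH if subject[i] == q_char else MISMATCH)
                cell = max(0, diag, prev[lane][j] + GAP, curr[lane][j - 1] + GAP)
                curr[lane][j] = cell
                best[lane] = max(best[lane], cell)
        prev = curr
    return best
```

For example, `sw_scores_lockstep("ACGT", ["ACGT", "TTTT"])` returns one best local score per subject, `[8, 2]`. Masking finished lanes, rather than branching per sequence, is what lets sequences of different lengths share the same instruction stream.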
Also of note is the low memory requirement. While programs today need more and more RAM, MPSRCH bucks that trend, requiring less than 1MB of RAM per job. Importantly, unlike many other programs, its memory requirement does not grow with database size.
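A score-only Smith-Waterman pass needs just one row of the dynamic-programming matrix at a time, which is one general way for a search’s memory to stay independent of database size. The sketch below assumes hypothetical names (`sw_score`, `search`), the same toy scoring scheme as above, and a database supplied as a stream of (name, sequence) records; it illustrates the technique and does not describe how MPSRCH actually manages memory.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical toy scoring scheme, not MPSRCH's defaults.
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score(query: str, subject: str) -> int:
    """Best local alignment score, keeping only one DP row in memory."""
    prev = [0] * (len(query) + 1)  # row for the previous subject residue
    best = 0
    for s_char in subject:
        curr = [0] * (len(query) + 1)
        for j, q_char in enumerate(query, start=1):
            diag = prev[j - 1] + (MATCH if q_char == s_char else MISMATCH)
            up = prev[j] + GAP  # subject residue aligned to a gap in the query
            left = curr[j - 1] + GAP  # query residue aligned to a gap in the subject
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Score each (name, sequence) record as it streams past."""
    for name, sequence in database:
        yield name, sw_score(query, sequence)
```

Because `search` is a generator, only a query-length row and the sequence currently being scored are held in memory, however many records the database contains.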
Various writers have questioned whether it is worth running a Smith-Waterman search now that Blast 2 can perform full gapped alignments. Our customers have certainly found it worthwhile. MPSRCH computes the optimal Smith-Waterman score for every sequence in the database, builds a histogram of those scores, fits a Poisson distribution to the data, and uses the fit to calculate the statistical significance of each result.
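One plausible reading of a Poisson fit to the score histogram is sketched below: estimate the Poisson rate from the observed scores (for a Poisson distribution the maximum-likelihood estimate is simply the mean), then turn the upper-tail probability of a score into an expected number of chance hits across the database. This is an assumption about the general technique, not Aneda’s actual statistics, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def score_histogram(scores: List[int]) -> Dict[int, int]:
    """Number of database sequences reaching each score, in score order."""
    return dict(sorted(Counter(scores).items()))


def poisson_fit(scores: List[int]) -> float:
    """Maximum-likelihood Poisson rate: the mean observed score."""
    return sum(scores) / len(scores)


def poisson_tail(score: int, lam: float) -> float:
    """P(X >= score) for X ~ Poisson(lam), summed directly over the tail."""
    if score <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    total, k = 0.0, score
    while True:
        # Log-space term avoids overflow in lam**k and k!.
        term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += term
        if k > lam and term <= total * 1e-16:
            break  # past the mode, the remaining terms are negligible
        k += 1
    return min(1.0, total)


def expectation(score: int, lam: float, n_sequences: int) -> float:
    """Expected number of sequences scoring >= score by chance alone."""
    return n_sequences * poisson_tail(score, lam)
```

In use, one would pass every database score to `poisson_fit` and then call `expectation(hit_score, lam, len(scores))` for each hit. Summing the tail directly, instead of computing one minus the cumulative probability, keeps very small probabilities from rounding to zero.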
In addition, because the alignment step is also a full Smith-Waterman, you are guaranteed the optimal alignment of your query against each database sequence. This alignment can often differ from the one Blast reports, with MPSRCH returning a higher-scoring alignment for the same pair. As a result, Blast may report an aligned pair as insignificant while the exhaustive algorithm in MPSRCH finds its score to be highly significant. For example, we have seen Blast return an expectation of 10⁻⁵ for a pair to which MPSRCH, using the same scoring table and gap penalties, assigns an expectation of 10⁻¹⁰⁰.
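Recovering the optimal alignment itself, rather than just its score, requires a traceback through the scoring matrix. The following sketch keeps the full matrix for a single query/subject pair for clarity, so it does not reflect MPSRCH’s low-memory design, and it uses the same hypothetical toy scores. With a real substitution table and affine gap penalties the recurrence gains extra terms, but the traceback idea is the same.

```python
from typing import List, Tuple

# Hypothetical toy scoring scheme, not MPSRCH's defaults.
MATCH, MISMATCH, GAP = 2, -1, -2


def _pair_score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def sw_align(query: str, subject: str) -> Tuple[int, str, str]:
    """Optimal local alignment score plus one optimal alignment (gaps as '-')."""
    rows, cols = len(query) + 1, len(subject) + 1
    h: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + _pair_score(query[i - 1], subject[j - 1]),
                h[i - 1][j] + GAP,  # query residue against a gap
                h[i][j - 1] + GAP,  # subject residue against a gap
            )
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j
    # Walk back from the best cell until a zero cell marks the alignment start.
    top: List[str] = []
    bottom: List[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + _pair_score(query[i - 1], subject[j - 1]):
            top.append(query[i - 1])
            bottom.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + GAP:
            top.append(query[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `sw_align("ACGTACGT", "ACGACGT")` returns `(12, "ACGTACGT", "ACG-ACGT")`: seven matches and one gap, the highest score any local alignment of this pair can reach under these toy scores.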
Ultimately, users should not be restricted to a single search engine. MPSRCH fits comfortably into any environment where Blast is already used for homology searching. Its hardware requirements are similar, so a full Smith-Waterman can now be added without expensive hardware accelerators. MPSRCH runs happily on a Linux machine with a recent Intel or AMD processor; we recommend 128MB of RAM so that the system has room to cache file-system reads.
This is in complete contrast to the old days, when the original MPSRCH code required a climate-controlled environment and a $1 million supercomputer.
Dr. Shane Sturrock
Chief Scientific Officer