Post-genomic-age problems demand more robust apps. Gary Montry is rounding ’em up and branding ’em.
By Ken Garber
Gary Montry doesn’t think much of today’s bioinformatics software. “Pathetic,” he says. “Code that’s out there in the public domain, none of it is very well optimized and none of it is supported, except BLAST.” Although most of these programs are less than five years old, in Montry’s view they’re already “legacy” — that is, obsolete. “They’re not quality assured,” he points out, “and they haven’t been adapted to the latest parallel processing architectures.”
That final point is a particular problem, Montry contends, because as genome databases expand, single-processor computers will be infuriatingly slow. “Three or four years ago, you could run [genomic] problems on workstations,” he says. “But software that was written in the mid-90s doesn’t have the necessary throughput capability” now, and it was not built to take advantage of multiprocessor machines.
Southwest Parallel Software, Montry’s one-man operation in Albuquerque, NM, is dedicated to changing that. “How does somebody like myself make a living in this business, where all the software is free and freely available?” he asks rhetorically. Fairly readily, actually, as long as the wild pace of biologic discovery keeps up. The explosion of data and a need for better, faster, more efficient software has put Montry’s brand of genius in demand.
Here’s why: University-based bioinformaticists don’t have the resources to make the improvements necessary to accommodate ever bigger problems in genome assembly or analysis. At the same time, makers of high-end computer hardware, such as Sun, Compaq, and SGI, need software. So, all three computer makers have contracted Montry to take existing, freely available code and make it faster, bug-free, and compatible with their hardware.
“I know two people who are really good” at parallelizing bioinformatics software, says Montry’s former business partner Dan Joy. “One is Gary.” (The other: a free agent in Dallas named Don North.) “Gary understands the flow of data through an algorithm,” Joy says. “He knows how to get every bit of performance out of the code.”
Joy, who was recruited away from Southwest Parallel in 1999 by Sun and is now an independent consultant, calls Montry a “character.” Describe him? “Gary? Honestly?” Joy answers. “I wouldn’t want you to print anything on that. He’s a lot of fun.” Joy also calls Montry “awfully opinionated,” but adds, “he’s usually right.”
“I like to have a good time,” agrees Montry, a young-looking 54-year-old with mischievous eyes. “I try to not be too uptight about work. Scientists can be pretty tight.” Montry admits his candor has gotten him into trouble in the past. “I’ve probably dissed a few people whose work I thought was pretty shoddy,” he says.
To be sure, parallel processing is still considered a utopian backwater by many computer scientists. With the performance of conventional single processors doubling every 18 months, why go parallel? “The overhead of going parallel just swamps the savings” for most jobs, says University of Michigan computer science professor Marios Papaefthymiou. “At a certain point, if you don’t have more work to do, then adding more and more computing power to the problem doesn’t work at all.”
But for certain kinds of jumbo jobs, Papaefthymiou adds, parallel processing can make all the difference. “If the problem is really large with respect to the size of the system, then you get linear speedup.”
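Papaefthymiou’s point about overhead can be illustrated with a toy cost model. The sketch below is an assumption, not anything described in the article: it splits the work evenly across processors, charges a hypothetical fixed coordination cost for every processor added beyond the first, and compares a small job with a large one. The names `run_time` and `speedup` are illustrative only.

```python
def run_time(work, procs, serial_frac=0.0, overhead_per_proc=1.0):
    """Toy run-time model (hypothetical): a serial part, a parallel part
    split evenly across `procs`, plus a coordination cost that grows
    linearly with each processor beyond the first."""
    serial = work * serial_frac
    parallel = work * (1.0 - serial_frac) / procs
    overhead = overhead_per_proc * (procs - 1)
    return serial + parallel + overhead


def speedup(work, procs, **model):
    """Speedup relative to one processor under the same toy model."""
    return run_time(work, 1, **model) / run_time(work, procs, **model)


# Compare a small job with a "jumbo" job on 1, 8, and 64 processors.
for work in (100, 1_000_000):
    for procs in (1, 8, 64):
        print(f"work={work:>9} procs={procs:>2} speedup={speedup(work, procs):6.2f}")
```

In this model the small job gains only about 1.5x from 64 processors, less than it gets from 8, because coordination costs dominate; the million-unit job comes in near 64x, the roughly linear speedup Papaefthymiou describes for problems that are large relative to the machine.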
Still, Larry Hunter, director of the Center for Computational Pharmacology at the University of Colorado Health Sciences Center, says those problems are few. “Parallel processing has turned out to be a very specialized niche,” he says. Parallelism is necessary for sequence assembly, he adds, but “there’s less of a need for it, in my mind, for most bioinformatics applications.” Hunter points out that protein structure prediction and molecular dynamics are hard to parallelize, because the complex interactions prevent breaking up the problem and spreading it out evenly among many processors. The so-called “embarrassingly parallel” problems like sequence assembly are “very unusual,” says Hunter. “I really have my doubts about whether you can make a business by parallelizing public domain software.”
The defiant Montry may not have made a big business, but he certainly appears to be making a living. Although Southwest Parallel remains a tiny venture, heavyweight customers depend on it. “Genome data is released at very high volumes,” says Joe Cerro, senior scientist in the bioinformatics department at Bayer, which uses Montry’s parallelized version of PHRAP, a well-known program for assembling genome sequences from shotgunned DNA fragments. “Especially for a company trying to establish a proprietary position, intellectual property, the ability to analyze and prosecute the data as quickly as possible is essential to what we do.”
While Montry’s own work gets raves from his computational biology peers, selling it would appear to violate the “open source” ethic that has dominated bioinformatics and guarantees—at least for academic researchers—free access to programs.
Does this make Montry an open-source outlaw? Not really, says Sean Eddy, the Washington University biology professor who wrote HMMER, an open-source statistical application for protein sequence analysis that is used widely by drug and biotech companies. Academics are limited, says Eddy: “We can never quite bring a program to industrial strength. We can’t support it, make it fully robust, document it.” What Southwest Parallel does is legitimate and necessary, he adds. “This is exactly the way things should work, as far as I’m concerned. I’m surprised they’re one of the few companies out there with this business model.”
Ironically, Montry can’t profit from the work he did to make HMMER 10 to 12 times faster for multiple queries. Eddy distributed HMMER under the GNU General Public License, which means that anybody who distributes a modified version of the software must make its source code available. “I have to give it out,” notes Montry. “I can’t make a living doing that.”
But Eddy’s software is the exception. Montry can take most freely available academic software and, after licensing it and agreeing to pay the inventor royalties, charge for improved versions without handing over the source code. Montry’s dozen or so PHRAP customers, who pay $19,000 for a first-year license and $7,500 per renewal year, include AstraZeneca, Bayer, Novartis, and Schering-Plough.
“He’s definitely done a really good job” parallelizing PHRAP, says University of Washington’s Phil Green, who wrote the program in the early 90s. Trained as a physicist, Montry has spent his career in computers, and has dealt with applications spanning nuclear fusion, aircraft design, weapons simulation, and weather prediction. Biology is just the next big thing.
And it’s a challenge Montry has been preparing for since he helped set a speedup record for parallel processing in the mid-80s. Montry and two co-workers at Sandia National Laboratories, John Gustafson and Robert Benner, achieved a 1,000-fold scaled speedup on a 1,024-processor nCUBE. “Some people hated us,” recalls Montry. “Some people said we cheated.” But the achievement earned them the first Gordon Bell Prize in 1988. The prestigious award “made my career,” says Montry.
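“Scaled speedup” measures a run in which the problem grows along with the number of processors; the formula is usually credited to Gustafson, as a counterpoint to Amdahl’s law, which assumes a fixed-size problem. The sketch below uses the textbook forms of both laws (the function names are illustrative, not from the article) to work out what serial fraction a 1,000-fold scaled speedup on 1,024 processors implies, and what Amdahl’s fixed-size formula would predict for a job whose single-processor run had that same serial fraction.

```python
def gustafson_speedup(n, s):
    """Scaled speedup (Gustafson's law): the problem grows with n, and s is
    the serial fraction of time measured on the n-processor run."""
    return n - s * (n - 1)


def amdahl_speedup(n, s):
    """Fixed-size speedup (Amdahl's law): s is the serial fraction of time
    measured on a single processor."""
    return 1.0 / (s + (1.0 - s) / n)


def implied_serial_fraction(n, scaled):
    """Invert Gustafson's law: s = (n - S) / (n - 1)."""
    return (n - scaled) / (n - 1)


n = 1024  # processors in the nCUBE run described in the article
s = implied_serial_fraction(n, 1000)  # 1,000-fold scaled speedup
print(f"implied serial fraction:     {s:.4f}")
print(f"scaled speedup (Gustafson):  {gustafson_speedup(n, s):.0f}")
print(f"fixed-size speedup (Amdahl): {amdahl_speedup(n, s):.1f}")
```

The two serial fractions are measured against different baselines (the 1,024-processor run versus a single processor), so the comparison is illustrative only: the same small-looking fraction of about 2.3 percent yields roughly 1,000 under the scaled measure but only about 41 under the fixed-size one, which may help explain why some critics viewed the scaled result skeptically.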
Montry is now trying to extend parallel processing to other bioinformatics applications. He’s confident he’ll find customers. Parallel processing, as Montry demonstrated at Sandia, draws its power from certain kinds of gargantuan jobs, and biology is now providing them.
“The old style biology is dead,” he proclaims. “It’s changed forever. That’s what Craig Venter showed with Celera: the computational expertise. You just can’t take your time developing things like you could in the old days under the old academic idea of, ‘Just move slowly and go to several conferences a year and talk about it.’ Now that the carrot of large amounts of money is there, this business is completely transformed.”