Conversation with Ward Wheeler bounces haphazardly from hard facts to — well, whatever pops into his head. So when he declares that the American Museum of Natural History, where he’s worked for more than a decade, “was an insane asylum for 135 years until they decided to put stuffed animals into it,” it’s clear he’s teasing. And when he goes on to say that what might be one of the world’s fastest supercomputers is in its basement (right next to the carpentry shop), the joke seems obvious.
But this time, Wheeler isn’t kidding. In fact, though he never formally studied computer science, he’s the driving force behind a 520-processor Beowulf cluster and the bioinformatics algorithm running on it. Wheeler, 37, earned his PhD in evolutionary biology from Harvard and did postdoc work at UCLA before joining the New York museum, where he’s now co-director of molecular laboratories and curator of the division of invertebrate zoology.
Wheeler employs comparative genomics to reconstruct evolutionary trees. “The tree is not only a branching diagram,” explains Dan Janies, a NASA-funded research colleague. “It also represents a hierarchical map of all the changes that have occurred between ancestor and descendant.” A typical study might involve 500 creatures; researchers would use the museum’s three ABI 3700s to sample the genomes at, say, 100 loci, and then align and compare those sequences.
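Aligning sequences before comparing them is itself a classic dynamic-programming exercise. As a rough illustration of the general technique (this is not Wheeler’s actual code, and the scoring values are arbitrary), a minimal Needleman-Wunsch global alignment scorer looks like this:

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch) by dynamic programming."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] against b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against nothing
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                   # align the two characters
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
    return score[n][m]

print(align_score("GATTACA", "GCATGCU"))   # 0 with these scores
```

Filling the table costs time proportional to the product of the sequence lengths, which is why alignment alone eats cycles when a study spans 500 organisms and 100 loci.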
Building these trees amounts to a notoriously difficult NP-complete problem. “It’s the traveling salesman problem,” Wheeler says: minimize the path connecting a set of points, a formulation whose applications include laying fiber optic cable, routing networks, and establishing evolutionary histories.
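To see where the combinatorial explosion comes from, it helps to look at how a single candidate tree gets scored. A toy sketch of Fitch parsimony (an illustration of the general idea, not POY’s implementation; the taxa and states here are invented) counts the minimum number of character changes a given tree implies. The hard part is then searching the super-exponentially many possible trees for one that minimizes that count:

```python
def fitch_changes(tree, states):
    """Minimum state changes for one character on a rooted binary tree.
    tree: nested 2-tuples of leaf names; states: dict leaf -> character state."""
    changes = 0

    def walk(node):
        nonlocal changes
        if isinstance(node, str):               # leaf: its state is fixed
            return {states[node]}
        left, right = walk(node[0]), walk(node[1])
        common = left & right
        if common:
            return common                       # no change forced at this node
        changes += 1                            # disjoint sets cost one change
        return left | right

    walk(tree)
    return changes

# One nucleotide position across four (made-up) taxa:
obs = {"human": "A", "chimp": "A", "mouse": "G", "fly": "G"}
print(fitch_changes((("human", "chimp"), ("mouse", "fly")), obs))  # 1 change
print(fitch_changes((("human", "mouse"), ("chimp", "fly")), obs))  # 2 changes
```

Parsimony prefers the first tree, which explains the observed states with fewer changes; a real analysis repeats this over thousands of characters and an astronomical space of trees.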
Providing a peek into evolution
Inevitably, Wheeler’s work sucked him into the practical problems of algorithms and computing. His solutions were pure do-it-yourself.
At first, he wanted a program that would combine the best of existing phylogenetic applications with a sequence analysis tool. “I needed to write [the program] and then write a whole parsimony program to test it out,” he says of an effort that began in 1992. Out came POY, short for Parallel Objective YAP (YAP itself standing for Yet Another Parsimony). He initially wrote it in C++ but threw that version out and started over in ML, though the numerically intensive parts are still in C.
Freely available with easy-to-modify open-source code, POY has found a worldwide following and is downloaded from Wheeler’s site about 300 times per month. Gonzalo Giribet, an assistant professor in Harvard’s department of evolutionary biology, uses POY because “it’s one of the very few phylogenetic analysis programs that can perform operations in parallel.” A recent analysis took two or three months using Wheeler’s algorithm, Giribet says; without running in parallel, the job would’ve taken more than 40 years.
POY’s development presaged that of the museum’s cluster. The first incarnation in 1993 consisted of 10 machines and was used mostly for sequence alignment. As heterogeneous elements were added (“some Linux boxes, some HPs, some SGIs, some Suns,” Wheeler recalls), the system became unruly.
About two years ago, Wheeler and his colleagues proposed a large cluster, and NASA helped fund it. Robert MacElroy, a deputy program manager at NASA’s computing center who estimates his department’s annual grants to the museum at $400,000, explains, “We want to know how [organisms] are related to one another and how their interactions might alter in the future.” He calls Wheeler’s study “a kind of window into this evolution.”
Problems of parameters
Funding in hand, the immediate obstacle was location. The museum is bounded on all sides by streets, can’t expand up because of landmark status, and can’t go down because it sits on top of a lake. So the high-powered cluster took over a renovated 14’ x 31’ basement storage facility. With each growth spurt, more planning goes into squeezing things in without overloading the power supply or exceeding the capacity of the industrial-strength air conditioners.
The challenges didn’t stop there. Although comparative genomics is the prime application, the cluster is also used for other problems, such as studies of star formation. While the evolution researchers could save money and hassle with diskless computers, the astrophysicists needed disks. And where the physics team wanted greater network speed, the bioinformaticists preferred faster CPUs.
Wheeler calls the cluster a “duct tape and Dixie cup model.” The first round, totally homemade, comprised 130 nodes holding 260 Pentium III CPUs at 500 MHz, linked by a 100-megabit network.
His spacious office is littered with computer organs — fans, boxes, hard drives. Janies assembles computers in his spare time, snagging postdocs and grad students to help out. This is how all of the computers were made for the original cluster, as well as for this spring’s 130-node addition (with dual gigahertz processors). The DIY environment leads to things like overclocking — running a chip faster than its rated clock speed, which Wheeler does to wring 12 percent higher performance out of older computers. Newer chips, he laments, are made differently to prevent the practice.
Beneath the fossils, fast computers
POY evolved as the cluster grew. Linear scaling was a hurdle: the cluster scaled well to about 16 machines, and then performance dropped off as more machines were added. A new level of the algorithm periodically checks for traffic and reroutes problems to idle parts of the network, allowing for near-ideal scaling for the 520 processors. As Wheeler writes new code, he tests it out on his four-processor computer and hooks it up to his Linux-enabled laptop to emulate parallelism. (The idea, he says, is to “debug the code before ruining people’s lives.”)
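The traffic-checking layer Wheeler describes is, in essence, dynamic load balancing: idle processors pull the next piece of work from a shared pool rather than being handed a fixed share up front, so fast nodes never sit waiting on slow ones. A toy single-machine sketch of that idea, with threads standing in for cluster nodes (none of these names come from POY):

```python
import queue
import threading

def run_on_cluster(tasks, n_workers=4):
    """Toy dynamic scheduler: idle workers pull tasks from a shared queue."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                task = work.get_nowait()   # grab work only when idle
            except queue.Empty:
                return                     # queue drained; this node is done
            r = task()                     # e.g., score one candidate tree
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Each task stands in for scoring one candidate tree rearrangement:
scores = run_on_cluster([lambda i=i: i * i for i in range(10)])
print(sorted(scores))   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

On a real cluster the “queue” lives across a network and the tasks are far chunkier, but the scaling logic is the same: no processor idles while work remains, which is what keeps speedup close to linear as machines are added.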
Throughout May there was a huge push to get the second group of 130 nodes built and installed. Wheeler stays in the office past midnight during crunch times, driving the algorithm and machines to perform well enough to make the cut for the world’s top-500 supercomputers list; he believes that would be a first for a genomics-based cluster. There’s also a separate list of fastest clusters, and the museum’s 432 gigaflop entry for that could outdo some clusters currently in the top 10, many of which have close to 1,000 processors.
Since the benchmark isn’t designed for genomics-based machines, it’s unclear whether bioinformaticists would take note of the achievement. “The benchmark may not matter to them,” Wheeler acknowledges. “What may matter to them is how large an evolutionary tree we can construct reliably in what length of time and [with] how much data.”
As the last boxes are added to the cluster and another version of POY is being released, everything seems to be right on track. But Wheeler’s already planning the next phase. “In the next couple of years I’ll have to start from scratch again,” he says about both the cluster and the algorithm. He’s confident that with even smaller computer boxes, he’ll eventually be able to pack several thousand CPUs into his cramped basement room. “It’s a Rube Goldberg kind of thing,” he says.