By Meredith W. Salisbury
Rob Edwards, an assistant professor at San Diego State University who uses genomics to study organism communities such as marine microbes, didn’t want to spend all of his time maintaining a computer system. “We’re biologists, not computer scientists,” he says of his team.
But the work they do has become increasingly compute intensive. Edwards’ crew chooses an environment — the ocean, for example — and does the kind of metagenomics study that is becoming more and more common after efforts like those at the Joint Genome Institute and Craig Venter’s Sorcerer II expedition. “We go and extract all of the DNA and we sequence it,” Edwards says. “You end up with lots and lots of pieces of very short DNA [and you have to] try to put all those sequences together and try to identify functions for them.”
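The assembly step Edwards describes boils down to finding reads whose ends overlap and stitching them together. A toy sketch of that idea in Python (the sequences and overlap threshold are invented for illustration; real assemblers are far more sophisticated):

```python
# Toy illustration of the "put all those sequences together" step:
# merge two short reads when the suffix of one matches the prefix
# of the next. Sequences here are made up.

def merge(a, b, min_overlap=3):
    """Merge read b onto read a if a's suffix overlaps b's prefix."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]   # join on the k-base overlap
    return None                # no sufficient overlap found

print(merge("ACGTTGCA", "TGCAGGTT"))  # ACGTTGCAGGTT (4-base overlap)
```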
If you think that sounds like a job for Blast, you’re right — but with studies that could generate about 300,000 fragments of 100 base pairs each, Edwards is looking at some serious time devoted to sequence homology searching. “Right now there’s about a couple of million proteins that are well known and well characterized,” he says. “We use that as the basis for our comparison.” Then, as if there weren’t enough comparisons going on, Edwards says “the really interesting thing” is when his team starts comparing whole environments to each other.
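The numbers quoted above make the scale concrete: every fragment has to be scored against every known protein. A back-of-the-envelope sketch using the article's figures — the per-CPU comparison rate is a purely hypothetical placeholder, not a benchmark:

```python
# Rough estimate of the homology-search workload described above.
# Fragment and database counts come from the article; the rate is
# a hypothetical illustration only.

fragments = 300_000          # ~300,000 reads of ~100 bp each
known_proteins = 2_000_000   # "a couple of million" characterized proteins

comparisons = fragments * known_proteins
print(f"{comparisons:.1e} fragment-vs-protein comparisons")

# Suppose (hypothetically) one CPU scores 10 million comparisons
# per second with a fast heuristic such as BLAST:
rate_per_cpu = 10_000_000
cpu_hours = comparisons / rate_per_cpu / 3600
print(f"~{cpu_hours:.0f} CPU-hours on a single core")
print(f"~{cpu_hours / 12:.1f} hours spread across a 12-node cluster")
```

Whatever rate you plug in, the all-against-all structure of the problem is why a dozen nodes under the desk starts to look attractive.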
Personal compute clusters, the newest products from IT companies, are aimed squarely at scientists like Edwards: researchers whose work demands increasingly high-powered computing but who come from relatively small labs or lack access to major compute clusters. These workstations pack serious power, from a dozen nodes to 128 or more, and are geared toward smaller labs that may be running their problems on desktops or queuing them at busy central clusters shared by many groups.
Edwards says his team had been doing its computations on ordinary Linux desktops. Around the beginning of this year, he opted to try the new generation of mini-clusters and brought in a 12-node desktop cluster from Orion Multisystems. As with its counterparts, part of the lure of the Orion machine was its low power draw: “That was a big plus for us,” says Edwards, who notes that the machine has a single power supply and didn’t require any special electrical considerations. “We could just plug it into the wall.”
Much of the desktop cluster is designed to behave as if it were a regular desktop machine. “It’s all contained in a single box with essentially a single OS,” Edwards says. “When we do stupid things to break the cluster — which we do do occasionally — we only have one machine to unplug and plug back in to reboot” instead of having to shut the nodes down in a certain order.
The desktop cluster is part of a broader line Orion is pushing. The company recently released a related “deskside” product that packs up to eight of the 12-node cards, for as many as 96 processors, into a single machine about the size of a two-drawer filing cabinet, says Stu Jackson, a former bioinformatics director at Incyte who is now an applications engineer at Orion. Deskside units are still power-friendly, and pricing starts “in the $100,000 range,” according to the company.
The machines arrive ready to plug in and get going, Jackson says, complete with pre-loaded software such as mpiBlast. These clusters are ideal for labs of up to 10 scientists that don’t want to worry about air conditioning, power supply, floor support, and the various other practical concerns of installing a regular compute cluster.
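Tools like mpiBlast get their speedup by segmenting the search across the cluster's nodes — mpiBlast itself splits the sequence database into fragments, one per node. The load-balancing principle can be sketched with a simple round-robin split (the query names and node count below are invented for illustration):

```python
# Sketch of the work-partitioning idea behind cluster BLAST tools:
# divide the work into near-equal chunks, one per node, so no node
# sits idle. Queries and node count are hypothetical.

def partition(queries, nodes):
    """Split a list of query IDs into `nodes` near-equal chunks."""
    chunks = [[] for _ in range(nodes)]
    for i, q in enumerate(queries):
        chunks[i % nodes].append(q)  # round-robin assignment
    return chunks

queries = [f"read_{i:05d}" for i in range(300)]  # stand-in for 300k reads
work = partition(queries, 12)                    # a 12-node desktop cluster

print([len(chunk) for chunk in work])  # every node gets 25 queries
```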
Vendors clearly see this as a promising market. Earlier this year Fujitsu launched a 128-processor deskside grid computer called the Bioserver. (The machine was released initially in Japan; the company expected to launch it in the US within a few months.) Mike McManus, vice president of Fujitsu’s BioSciences group, says the machine relies on low-power chips that can be packed densely without drawing more power than a few light bulbs.
No doubt other vendors have similar plans. From Edwards’ perspective, it’s a trend worth encouraging. His cluster runs at “100 percent capacity pretty much all the time,” and that’s all right with him. “People are starting to ask bigger questions,” he says, as they have access to more and more computational resources.