Lawrence Berkeley National Laboratory has selected a 40-processor Linux Networx cluster computer system to power its work on the Berkeley Drosophila Genome Project. The lab has been working to fill the holes in the fruit fly genome, which was sequenced and published in Science last March.
Erwin Frise, systems manager and biomedical scientist with Berkeley’s Drosophila project, said that as the pace of the project accelerated, he had to find an inexpensive way to boost the lab’s computing power in order to analyze larger and larger amounts of data.
“The whole thing changed when Celera started sequencing data,” Frise said. “Suddenly we had to sequence 130 million bases.”
Celera Genomics collaborated with the Berkeley project and Baylor College of Medicine to sequence the fruit fly genome.
The Berkeley project is currently “filling in the gaps” of the Drosophila genome by re-sequencing large pieces and re-aligning them with the existing clone. The eight-processor system the lab had been using processed clones at a snail’s pace of one every few days. Frise said that the cluster now enables them to run 50–100 alignments at once.
The cluster has also reduced the daily number-crunching time to two to three hours. Frise estimated that the system could process 50–60 jobs a day. He said, “Our human curators used to sit around waiting for something to do. Now they can’t keep up with the data.”
Frise had considered cluster technology as an alternative to a supercomputer for some time. He chose the Linux Networx system because it offered an integrated package that would mesh easily with the lab’s existing computer environment. In addition, the clusters are highly scalable, so he’ll be able to add compute modules to the system as demands increase.
Another benefit of the Linux Networx system, according to Frise, is its cluster management software, ClusterWorx, which allows him to manage the entire system from a single computer station. The software enables users to remotely monitor the power and temperature of each node, power the system up or down, and reset the entire system or an individual node.
Brad Rutledge, public relations director of Linux Networx, called Linux cluster systems “the technology of choice for scientists and researchers.” He estimated that the price can be as low as one-quarter to one-tenth that of a supercomputer while offering similar performance.
The company considers bioinformatics to be one of its biggest future markets. Rutledge said that the number-crunching ability of the Linux cluster makes it an ideal back-end system for bioinformatics software.
“If you’re not going with Linux clusters in bioinformatics, you’ll fall behind,” said Rutledge. “Huge computational algorithms work much better on Linux clusters.”
Rosetta Inpharmatics and Baylor College of Medicine are among other users of Linux Networx cluster systems for genomic and biological research.
The system is the first Linux cluster to be used in the Drosophila project, though Frise noted that other genome projects have expressed interest in the technology. “Drosophila is a very important model organism and we are a centerpiece of this whole thing,” he said. “Other smaller genome projects are interested in using Berkeley’s tools because of a lack of funding. We are acting as a model for other genome research.”