Following on its announcement in October that it plans to build the first life science supercomputer cluster based on IBM’s Cell processor, Linux developer Terra Soft has formed a consortium with the goal of providing bioinformatics researchers with free access to the cluster.
The so-called HPC Consortium, initially proposed among a small group of Cell enthusiasts at the November Supercomputing 2006 conference in Tampa, Fla., has grown quickly, according to Glen Otero, who is heading up the effort for Terra Soft.
Terra Soft created a mailing list for the initiative in December, and around 100 people have joined the list so far, Otero said. The official website for the consortium went live this week.
In addition, Terra Soft plans to host a hackathon at its headquarters in Loveland, Colo., Jan. 19-28, with the goal of optimizing several bioinformatics applications for the Cell processor. Otero said that between 25 and 40 people are expected to participate.
Sony, which co-developed the Cell with IBM and Toshiba for use in its PlayStation 3, has committed hardware to the hackathon. IBM and Mercury Computer Systems, both of which sell servers built with the Cell processor, plan to provide training on how to port applications to the processor, Otero said. Researchers from several national labs and academic research groups, including Argonne National Lab, Oak Ridge National Lab, and Colorado State University, also plan to participate.
Hardware and Software
Terra Soft announced in October that it was working with Sony to build the first supercomputer based on Cell processors. The company said at the time that it had completed construction of a 3,000-square-foot data center to house the cluster [BioInform 10-13-06].
Otero said that Sony has not yet shipped the hardware for the cluster, but noted that Terra Soft has several PS3s and Cell-based blade servers on hand for the hackathon. “We’ll have individual units that we may or may not cluster for development purposes and software testing,” he said.
He added that the 400-processor cluster will likely be a mixture of PS3s and Cell-based blades.
Terra Soft began working with Sony in 2005 to adapt its Yellow Dog Linux distribution — originally developed for Apple’s Macintosh computers — for the Cell architecture. At the same time, the company began working with Argonne National Lab under a service contract to build a suite of open-source bioinformatics tools for easy deployment on Linux clusters. Terra Soft released the software, called Y-Bio, at the end of 2005 [BioInform 11-25-05].
The two projects are now converging in the HPC Consortium. The supercomputer agreement with Sony “started out as just being a cluster, but what we wanted to do is build a community around that, so we weren’t just giving individual researchers access just to run their jobs,” Otero said. “We wanted to build a community around it that would help foster and support each other in both software development and in just doing basic bioinformatics research on the cluster.”
The primary goal of the consortium is “to drive the adoption of Cell-based clusters in general,” Otero said.
The Cell processor, which combines a PowerPC processor with eight GPU-like coprocessors, promises huge performance gains for computationally intensive algorithms, but porting existing applications to the architecture is a bit tricky.
“The early adopters and the folks that have really tinkered with the Cell processor see the great potential that’s there, but right now, to tease that potential out of the processor, you have to be very experienced in manipulating code line by line and assembly compiler intrinsics,” Otero said. “So right now, to the average — or even above average — bioinformatics user, that potential is really still hidden away.”
Terra Soft envisions the HPC Consortium as a way of speeding the porting process for bioinformatics applications “so that the greater community can see the potential speedup that’s there and the price/performance ratios and start comparing that to other platforms,” Otero said.
A cluster built with PS3s could potentially work as a low-cost HPC system, but Otero stressed that the real promise for the system would be its potential as a dual-use platform for developers.
“The Playstation now can be a game console or it can also be your Linux development platform for bioinformatics applications that you’d, for example, run on the more powerful Cell-based cluster — with Cell-based blades from IBM or Mercury,” he said. “We really wanted to make zero barrier of migration from the desktop Linux computer to your cluster computer, which people have tried in the past, but now with a really low entry $700 Playstation, you get two appliances there and it allows you to have seamless migration.”
Terra Soft’s Yellow Dog Linux is the official Linux distribution for both platforms, so the company stands to gain from its broad adoption among both developers and end-users.
Longer term, Terra Soft plans to expand the HPC Consortium to drive bioinformatics software development for other processor technologies, like x86 chips and graphics processing units. “But right now, we’re really Cell and Power focused because that’s where the buzz is and that’s where the newness of the market is,” Otero said.
Selling the Cell for Bioinformatics
Otero said that he’s aware of several bioinformatics algorithms that are in various stages of porting to the Cell, including Blast, Smith-Waterman, Fasta, ClustalW, HMMer, and NAMD. For the hackathon, “We’d like to see four or five of the applications get to, say, a pre-beta stage and have someone in the consortium volunteer to be project leaders,” he said.
When complete, those packages will be freely available to academic and government researchers for use on Terra Soft’s Cell-based cluster. The packages will also be included in Y-Bio, which can currently run on the PowerPC at the core of the Cell, but has not yet been optimized for the chip’s additional processors.
Otero acknowledged that computationally demanding applications like molecular dynamics and computational drug design will likely see much greater performance gains from the Cell architecture than standard bioinformatics algorithms like Blast and Smith-Waterman, but noted that the company’s philosophy is “walk before you run.”
In November, Mercury announced that it was partnering with researchers at Boston University to port an application called fragment-based drug design, or FBDD, to the Cell processor. At the time, Mirza Cifric, director of Mercury’s biotech group, told BioInform that typical bioinformatics programs are already “well-suited for general-purpose processors” and wouldn’t see the “tremendous advantage” that FBDD demonstrated when it was ported to the Cell — a 10-fold speedup over a Blue Gene processor, according to the company [BioInform 11-27-06].
Otero agreed that Mercury will likely see better performance gains with this approach, but noted that the open source nature of many bioinformatics algorithms, coupled with the consortium’s goal of providing free access to the resulting software, made bioinformatics tools a better starting point for the effort. “We’ll see a benefit for all of the applications,” he said, “but the larger benefit will definitely be for the more computationally intense ones.”
Nevertheless, some bioinformatics developers are pleased with what they’ve seen with the Cell so far. Chris Mueller, a research assistant in Indiana University’s Open Systems Lab, has been working with IBM since last summer to develop a prototype of BlastP for the Cell. Mueller said that it took around two months to port the application, which has demonstrated up to a four-fold speedup over an Intel Itanium processor in preliminary benchmarks.
Mueller said that the IU team found that the best way to port Blast was to “start from scratch and rethink how the algorithm is going to be mapped to the hardware.” In particular, he said, “We had to really rethink the data flow — how you move the database through the Blast algorithm and how you move the queries.”
Mueller said that this approach runs counter to the “conventional wisdom” in programming, “which is that you write portable code and let the compiler do all the work for you.” However, he noted, for multi-core processors like the Cell, “the compilers don’t do the best job anymore of understanding what you’re doing, so at that point it actually helps if the developers step back and rethink how their algorithm can actually map to eight processors using the SIMD instructions.”
Despite these challenges, and the fact that lower-level programming can often be “painful,” Mueller noted that the porting process “was a lot more fun than we expected it to be and easier than we expected it to be.”
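The restructuring Mueller describes, deciding up front how the database and the queries move through eight processors, can be illustrated with a short sketch. This is not the IU team's BlastP code and not Blast itself: the shared-word count below is a toy stand-in for Blast's seeding step, and the names `words`, `partition`, `score_slice`, and `search`, along with the word size of 3, are assumptions made for illustration. Python threads are used only to show the shape of the data flow; they do not model the Cell's coprocessors or its SIMD instructions.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # assumption: one worker per Cell coprocessor
WORD_SIZE = 3    # assumption: toy word length for seed matching


def words(seq, k=WORD_SIZE):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition(database, n=NUM_WORKERS):
    """Split the database into n interleaved slices, one per worker."""
    return [database[i::n] for i in range(n)]


def score_slice(query_words, db_slice):
    """Count words shared by the query and each sequence in one slice."""
    return [(name, len(query_words & words(seq))) for name, seq in db_slice]


def search(query, database, n=NUM_WORKERS):
    """Score one query against the whole database, one slice per worker."""
    query_words = words(query)      # the query is prepared once and shared
    slices = partition(database, n)  # the database is what gets divided up
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = pool.map(score_slice, [query_words] * n, slices)
    hits = [hit for part in results for hit in part]
    # Best score first; ties broken by sequence name for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    db = [("a", "MKVLAAGIV"), ("b", "GGGGGG"), ("c", "MKVLA")]
    for name, score in search("MKVLAAG", db):
        print(name, score)
```

The design choice the sketch highlights is that the partitioning is explicit in the program rather than left to a compiler: the query is broadcast to every worker, while the database is sliced so that each worker touches only its own share.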