In the world of high-performance computing, the usually ubiquitous Microsoft, which dominates almost every other area of computing, has been ironically absent from life science clusters. Instead, the open-source Linux operating system, with its across-the-board application support, throngs of developers offering open-source applications, and overall popularity with IT administrators, has ruled the roost, occupying roughly 75 percent of the high-performance computing market in this field, according to a recent survey by market research firm IDC. And considering that bioscience is one of the fastest growing high-performance computing markets, it's not surprising that Microsoft has finally geared up to make its presence felt.
Enter Microsoft's Windows Compute Cluster Server 2003 software solution. In an effort to capture the hearts and minds of bench biologists and IT administrators, Microsoft launched the software package just under a year ago as an alternative to standard Linux-based cluster computing operating systems. The Cluster Server tool is a two-CD software package for high-performance compute clusters that utilizes Microsoft's Active Directory user authentication network software with an industrial-strength, 64-bit version of Microsoft Windows Server 2003 Compute Cluster Edition. The software runs on AMD's Opteron and Athlon 64 processors, as well as on Intel's Xeon and Pentium processors. But what the Compute Cluster Server claims to offer is not so much enhanced performance over Linux or Unix-based clusters as a familiar face on high-performance computing for current Windows users, one that touts outstanding usability, seamless installation, and readily available technical support.
Name of the Game: Support
Matt Wortman, director of the Computational Biology Core at the University of Cincinnati, says that the Compute Cluster Server software was an attractive solution for his cluster because he and his staff already knew Windows well. "The big issue is that I can have my Windows system techs handle the machines rather than having a Linux high-performance tech do everything, because we had the in-house staff already doing Windows," Wortman says. With the previous Linux system, his group would often have to send out for specialized (read: pricey) Linux HPC support to tweak things from time to time.
And while all server software solutions have their fair share of problems, when it's time to pick up the red phone, some users feel that Windows provides much more technical support. "The number of problems that arise [with both Linux and Windows] is comparable, but the difference is that with Windows, it is much easier to find support," says Jaroslaw Pillardy, senior research associate at the Computational Biology Service Unit at Cornell University's Theory Center. "For Linux, very often we have problems and it is hard to find support." That is because the university's IT department is already geared toward supporting the thousands of Windows desktops that make up its computing infrastructure.
According to Kyril Faenov, director of high-performance computing at Microsoft, one of the goals in developing the Compute Cluster Server was to create a relatively painless solution that lets users avoid having to seek tech support at all. "What we have done is created an out-of-the-box configuration that allows one person to be able to set up a main node, provision computation nodes completely automatically, and then start managing this cluster and offering the resources through an innovative job scheduler in under an hour," says Faenov. "You can take a cluster and make it available very rapidly with somebody who might have general IT skills, but not necessarily HPC skills."
Getting Down to Business
Users also contend that the software's easy implementation will open the door for high-performance computing to the non-IT expert. "It doesn't require much IT knowledge to set up a Windows cluster," says Yibin Wong, a doctoral candidate focusing on bioinformatics at Virginia Tech's Advanced Research Institute. "We have a small IT group and an existing Windows infrastructure, so we can just add these compute clusters into the current setup and it takes full advantage of the existing environment." The team's cluster also includes some Linux machines, but Wong says that this is more a matter of personal preference for those researchers who are already Linux savvy.
But the fact remains that, up until recently, the majority of bioinformatics applications were developed for Linux or Unix machines. So Microsoft is going to have to play a significant amount of catch-up to make its cluster package a real hit. But thanks to Windows developers, such as Cornell's Pillardy, there is a growing number of popular tools like Blast and HMMER that have already been ported to Windows. Pillardy's service unit currently hosts a website that provides a slew of Windows-ported algorithms including versions of Blast, ClustalW, FASTA, and MrBayes, to name a few. "Five years ago I would say that there are not many [applications for Windows]," says Pillardy. "But now if you look at our interface, [for] hot topics like population genetics, almost every new program is either running on Windows in another version or can be easily ported."
Despite the handful of Windows-ported applications offered by the website, there are still many major bioinformatics tools that have not been compiled to run on Windows as well as they run on Linux machines. Staple bioinformatics tools, such as molecular simulation suites NAMD, Amber, and Charmm, which are offered in parallel versions for Linux, do not exist in parallel Windows flavors, says Wortman. Users are thus limited to a single command-line version, run either under a Unix-like environment for Windows such as Cygwin or from the Windows command line, which is hardly ideal. For this reason, Wortman's cluster still includes 85 dual Opteron Linux machines to run a handful of life science applications.
And while Microsoft's new cluster software might be able to offer an easier, more familiar way of accessing high-performance computing so researchers don't have to worry about the IT side of things, at roughly $469 per node, it will still be a challenge to make the financial case against the open-source Linux. "If you have to buy 100 licenses to install [Windows] on your cluster, how can that be lower acquisition costs than something that costs you zero?" asks HPC expert Joe Landman, founder of Scalable Informatics, a high-performance computing solutions vendor.
Windows is also notorious for requiring antivirus and anti-spam software, as it receives the bulk of computer viruses and worms. The net effect of all this extra software on each node is that I/O operation is severely hindered, Landman says. Linux and Mac operating systems, both of which are Unix-based, tend to be more resistant to viral invasions. "I have had my entire [Linux] computer lock up maybe two times in eight years, have never experienced a virus, and don't even have antivirus software," says Kai Staats, CEO of Terra Soft, a Linux software vendor. "I simply can't imagine beating off viruses on a single Windows box, let alone an entire cluster which is already a complex beast in and of itself."
Ultimately, getting lost in the Linux vs. Windows debate is missing the point. If a group is already using Linux, they have the know-how, so why change? And conversely, if a group of researchers who aren't IT experts are looking to build a cluster, a Windows solution can offer high-performance computing on known territory. "I don't see anybody switching from Linux to Windows," says Wortman. "[The Compute Cluster Server] is going to capture a new segment of users who are looking for more high-powered bioinformatics and drug discovery that don't have the sort of technical ability to implement Linux."
As always, the bottom line when it comes to high-performance life science computing is allowing users to think about science and not about IT. "Users largely don't care what OS runs their cluster," says Landman. "What they care about is, do the applications run painlessly, seamlessly, how quickly can I get them up, how much time do I have to sit there and think about it?" But for those researchers who are longtime Windows users, having a cluster that's as easy to run as their desktops is bound to turn some heads.