At the Cloud Computing Expo East conference held in New York City in April, self-effacement was a noticeable theme. Although this was a gathering of cloud computing vendors and zealots, keynote speakers such as Duke Skarda, vice president of information technology and software development at The Planet, a dedicated server hosting company, acknowledged that the marketing hyperbole surrounding cloud computing has only given ammunition to critics who label it an impractical solution that can never really deliver on all the hype. In the technology's defense, Skarda pointed out that the criticism is unwarranted because cloud computing is essentially nothing new; the basic concept has already been proven with the software-as-a-service models that came to prominence in the late 1990s. Despite cloud computing's technological pedigree, the issue of ease of use remains. This is especially true in the research community, where scientists would rather spend time doing actual research than configuring machine images (software implementations of a computer that run programs like a physical machine) on Amazon Web Services' Elastic Compute Cloud.
Proponents of cloud computing for biotech research include people like Chris Dagdigian, director of technology at BioTeam, an IT solutions consultant for life sciences. Dagdigian says that while he was initially skeptical about the practicality of the cloud, he now believes it could have a real place in the IT toolkit of biologists looking for quick access to lots of compute power. "Prior to 2007, a couple of us [at BioTeam] were individually experimenting with Amazon, but the sea change for us was that by late 2007, every single consultant had independently selected, evaluated, and used Amazon Web Services to solve a customer's problem," Dagdigian says. "I'm fairly cynical when it comes to hype and marketing — I was not honestly expecting the cloud to be as useful as it actually was. The reason I drank the Kool-Aid was that I kicked the tires, I did some work, and it solved multiple real-world problems for me."
It's not just small pockets of IT folks or academic researchers who are drinking the cloud Kool-Aid. The National Human Genome Research Institute recently held a workshop to examine the potential of analyzing next-generation sequencing data using cloud computing. The challenges outlined at the workshop included the usual suspects: problems associated with transferring data to and from the cloud; using customized applications; and security and privacy concerns associated with putting patient data onto the cloud. But interest in and dedication to exploring the technology remain. NHGRI plans to draw on the information from the workshop as it prepares its vision for the next stage of genomics research, which is slated to appear in a major journal by the end of the year.
And in early May, Cold Spring Harbor Laboratory held the first developer conference for Galaxy, a Web portal created by Anton Nekrutenko's lab at the Center for Comparative Genomics and Bioinformatics at Penn State and James Taylor's lab at Emory University that combines information from existing genome annotation databases.
While the majority of the life science research community is still evaluating this technology, a new breed of cloud computing service middlemen has emerged, promising bench biologists easy access to the cloud. These vendors essentially act as a service layer between the user and the companies that host the cloud, such as Amazon, Microsoft, Google, or IBM. One company, DNAnexus, announced in late April the launch of a service that gives users an interface to the cloud for analyzing and managing next-generation DNA sequencing data. Users upload data sets generated by the Illumina Genome Analyzer and HiSeq systems or Life Technologies' SOLiD systems. "One of the challenges with the cloud is that it's just a bunch of hardware and infrastructure. To really take advantage of it, you have to build a layer on top of it, and that's how we see ourselves," says DNAnexus co-founder and CEO Andreas Sundquist.
The team's solution handles many of the tasks involved in utilizing the cloud, such as setting up virtual compute nodes and parsing out which analysis jobs can be run in parallel.
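For readers curious what "parsing out which analysis jobs can be run in parallel" might involve, here is a minimal sketch in Python. It is not DNAnexus's implementation, which the company has not described here; it only illustrates one common approach, which is to record what each job depends on and then group the jobs into batches whose members do not wait on one another. The job names and the `parallel_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_batches(dependencies):
    """Group jobs into ordered batches.

    `dependencies` maps each job name to the set of jobs that must finish
    before it can start. Jobs within one returned batch do not depend on
    each other, so they could be sent to separate compute nodes at once.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # marking these finished unlocks the next batch
    return batches


# Hypothetical two-sample pipeline: the names are illustrative only.
PIPELINE = {
    "align_sample_a": set(),
    "align_sample_b": set(),
    "call_peaks_a": {"align_sample_a"},
    "call_peaks_b": {"align_sample_b"},
    "compare_samples": {"call_peaks_a", "call_peaks_b"},
}

if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(PIPELINE), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

In this toy pipeline, the two alignment jobs land in the first batch and could run side by side, while the comparison step has to wait for both peak-calling jobs.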
Daniela Kenzelmann Broz, a postdoctoral researcher at Stanford University, says that DNAnexus allows her to do her own ChIP-seq and RNA-seq data analyses and have control over the analysis parameters without having any computer programming knowledge. "Without DNAnexus, I would have depended on collaborations for the analysis of my data," she says. "Also, we don't have the infrastructure in the lab for storage of huge amounts of data, so it is extremely convenient for us that the data for my samples is stored online."
In early March, Cycle Computing rolled out a family of specialized compute clusters that utilize the Amazon EC2 cloud and come equipped with a slew of pre-installed application sets for bioinformatics, proteomics, or computational chemistry. "We help to make it so that folks don't need to do any programming to run applications on the cloud," says Cycle Computing CEO Jason Stowe. "We spin up a cluster and make it so that the cluster environments grow and shrink depending on the demand or load on the cluster. That's where we've improved on what Amazon offers." The company's new service offers analysis pipelines that incorporate software favorites such as Gromacs, Bowtie, HMMER, and Blast.
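The grow-and-shrink behavior Stowe describes can be pictured with a small sketch. This is an assumption-laden illustration, not Cycle Computing's software: the `target_node_count` function, its parameters, and the default limits are all hypothetical, and a real service would also weigh factors such as node start-up time and billing increments.

```python
import math


def target_node_count(queued_jobs, jobs_per_node=4, min_nodes=1, max_nodes=20):
    """Return how many nodes a cluster should have for the current queue.

    All parameters are illustrative assumptions:
      queued_jobs   - jobs waiting or running right now
      jobs_per_node - how many jobs one node can handle at a time
      min_nodes     - floor, so the cluster never shrinks to nothing
      max_nodes     - ceiling, so spending stays bounded
    """
    if queued_jobs < 0:
        raise ValueError("queued_jobs must not be negative")
    if jobs_per_node < 1:
        raise ValueError("jobs_per_node must be at least 1")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")

    # Enough nodes to cover the queue, rounded up to a whole node...
    needed = math.ceil(queued_jobs / jobs_per_node)
    # ...then clamped to the allowed range.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for load in (0, 10, 1000):
        print(f"{load} queued jobs: {target_node_count(load)} nodes")
```

A scheduler would call a function like this periodically and add or release nodes until the cluster matches the returned size.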
Peter Tonellato, a senior research scientist at the Center for Biomedical Informatics at Harvard Medical School, is using a cloud management platform developed by RightScale to ramp up translational and personalized medicine research. RightScale's platform allows his lab to enhance the overall usability of Amazon's cloud by linking multiple EC2 accounts, monitoring jobs, and controlling access for various lab members. "Their cloud platform has been a tremendous asset [to us]. We use it to reduce administrative overhead and accelerate our time to implementation," Tonellato says. "Over the past two years, we have provided substantive feedback to RightScale based on our approach to managing cloud resources to conduct a wide array of biomedical science and translational medicine computational resource projects with our collaborators from around the world. The cloud paradigm and their console have lowered barriers to collaboration and reduced overall costs significantly."
From Dagdigian's vantage point as an IT consultant, the need for these cloud managers is real, and while the number of vendors catering to bioinformatics might not exactly grow, the market will. "There's a whole bunch of boring plumbing that is related to job management, monitoring, and scaling — there are good business opportunities for middlemen to get in there," he says. "It's not a competitive advantage to my science to be writing grid monitoring software, so if I can have a Cycle Computing or RightScale do it for me, I can go back to doing research."