Nature Methods Considers the Cloud

By Matthew Dublin

Nature Methods has granted cloud computing up-and-coming player status with an editorial piece and technology feature. "Next-generation sequencing: adjusting to data overload" by Monya Baker includes a veritable Who's Who list of informatics thought leaders such as David Dooling of the Genome Center at Washington University, Vivien Bonazzi, program director for Informatics at the NHGRI, Michael Schatz of the Center for Bioinformatics and Computational Biology, and Neil Miller, deputy director of software engineering at NCGR, just to name a few.

Baker starts off with the classic song and dance about the data deluge caused by the continual drop in the cost of sequencing thanks to next-gen platforms, which is moving faster than the drop in the cost of storage, and the problems associated with many researchers holding onto data and not deleting files. The article points out some of the issues associated with the cloud, such as privacy and latency (a problem compounded when more data is created during analysis on the cloud). Some experts, such as Dooling, seem to think that the future of cloud computing will be a mixture of public and private clouds, although it's worth mentioning that a private cloud is really just a trip back to square one. If you're hosting your own cloud, it's not that much different from having an internal cluster or compute resource, which is what folks are trying to get away from with the cloud to begin with.

The editorial "Bioclouds" says the future is looking cloudy (journalists have got to come up with a new catch phrase) with efforts like the Open Cloud Consortium, which is supporting the NIH's Bionimbus framework, and various exploratory efforts headed up by the NHGRI that are looking at cloud computing with commercial and academic groups. Not so clear, however, is the emphasis on the cost of the cloud, as the author sees the need for big funding bodies to include support for cloud computing in the form of large-scale purchases of compute hours or storage. The issue with the cloud is not really the cost; that, along with scalability, is one of its primary selling points. The real issue is the cost in terms of job time due to latency, as well as the myriad quirks that still need to be worked out when using some bread-and-butter informatics tools. It seems that the future still looks a bit cloudy (sorry, I had to) for everyday bioinformatics and cloud computing.