Stem Cell Project Wins Cloud Computing Competition

By Matthew Dublin

Cycle Computing has named Victor Ruotti, a computational biologist at the Morgridge Institute for Research, the winner of the 2011 CycleCloud BigScience Challenge.

Finalists' proposals were selected based on their benefit to humanity, originality, creativity, and suitability. Entrants submitted projects focused on Parkinson’s disease, diabetes, organic photovoltaics, and genomic diversity mapping. The finalists were judged by Jason Stowe, CEO of Cycle Computing; Matt Wood, technology evangelist for Amazon Web Services; and Peter Shenkin, vice president at Schrödinger.

Ruotti will be awarded $10,000 of compute time on Cycle Computing's cloud, the equivalent of eight hours on a 30,000-core cluster. In his submission, Ruotti proposed a knowledgebase indexing system for human embryonic stem cells and their derivatives, a task that usually requires hours of computation time.
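As a rough sanity check on the size of the award, the sketch below converts the stated figures into core-hours and an implied price per core-hour. It assumes the $10,000 maps linearly onto the full eight hours on 30,000 cores; the article does not quote a per-core-hour rate, so the result is only an implied average.

```python
# Figures quoted in the article.
AWARD_USD = 10_000
CLUSTER_CORES = 30_000
HOURS = 8

core_hours = CLUSTER_CORES * HOURS                   # total compute in the award
implied_usd_per_core_hour = AWARD_USD / core_hours  # assumes linear pricing

print(f"{core_hours:,} core-hours")                       # 240,000 core-hours
print(f"${implied_usd_per_core_hour:.4f} per core-hour")  # $0.0417 per core-hour
```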

“The high throughput computing power of CycleCloud will enable the classification of currently uncharacterized cell types, including hES cells and iPS cells from our laboratory,” says Ruotti. “The transcript profiles from each cell type will be analyzed and compared by aligning billions of sequencing reads in combinatorial pairwise steps. By doing so, we will create the first read-level index to yield classified cellular derivatives along with methods to produce these cell types in a laboratory setting, which could become potential therapies of the future.”

Amazon Web Services Slashes Cloud Prices, Again

By Matthew Dublin

Amazon Web Services is turning into the Crazy Eddie of cloud computing: today it announced more price cuts on a range of services, including its Elastic Compute Cloud (EC2).

Price reductions vary by instance type and by Region, with Reserved Instance prices dropping by as much as 37 percent and On-Demand Instance prices dropping by up to 10 percent. The chart below highlights the price decreases for Linux instances in the US-East Region, although AWS says it is lowering prices in nearly every Region for both Linux and Windows instances.

These price reductions also apply to Amazon Elastic MapReduce and Amazon Relational Database Service (RDS). Prices for new RDS Reserved Instances will decrease by up to 42 percent, with On-Demand Instances for RDS and ElastiCache decreasing by up to 10 percent.

Earlier this month, AWS lowered prices on its Simple Storage Service (S3), saving customers who store 50 TB of data 12 percent on their monthly bill. It's not clear whether AWS is lowering prices because sales are falling short of projections or simply to attract new customers.


Keeping Computers Cool

By Matthew Dublin

There are several ways to keep your cluster or supercomputer cool these days, some a little harder to wrap one's head around than others. A good example is liquid cooling, which involves applying some type of liquid directly to hardware.

In some scenarios, this can mean submersion cooling, pioneered by a startup called Green Revolution Cooling. This technology, not for the faint of heart, entails submerging server hardware in a colorless, odorless, heat-transferring liquid contained in specially designed racks.

The liquid is a blend of mineral oils that, according to the company, is even safe for human consumption, has 1,200 times the heat capacity of air, and can keep hardware components 50 to 90°F cooler than they would be in a standard air-cooled data center. The company also claims that its "Dielectric fluid submersion" technology is less expensive than water cooling or other cooling technologies.

Another method, which involves liquid cooling without fully submerging the hardware, is called "phase-change" cooling. This technique uses a perfluorocarbon fluid, an organic compound that quickly transitions from liquid to gas, so heat generated by hardware components is carried away through evaporation. Pacific Northwest National Laboratory (PNNL) has been using this evaporative cooling method for its supercomputers since 2007.

A newer cluster housed in PNNL's Energy Smart Data Center, the "NW-ICE IBM computing cluster," uses a phase-change cooling system developed by Spraycool, a company that offers customized solutions that spray liquid onto the processors on a server board and remove the heat through evaporation.

Something even less intuitive is the use of hot water to remove heat. In fact, water captures heat about 4,000 times more efficiently than air. In 2010, IBM and the Swiss Federal Institute of Technology (ETH Zurich) developed Aquasar, a supercomputer made up of special water-cooled IBM servers. The warm water keeps the microchips at an optimal 140°F (60°C), well under the processor overheating threshold of 185°F (85°C). This summer, SuperMUC, Aquasar's big brother, will come online at the Leibniz Supercomputing Center in Garching, Germany.

In 2009, overclocking enthusiasts (folks interested in pushing the boundaries of processor speeds) built a custom liquid-helium cooling unit into a desktop PC at the gaming conference QuakeCon. The processor reached a record-breaking speed of 7.08 GHz.

Cornell University's New Storage Solution for Sequencing Data

By Matthew Dublin

Cornell University has opted for a Red Hat software storage solution to handle its genomics data. Cornell's Center for Advanced Computing (CAC) is tasked with handling roughly 15 to 20 terabytes of sequencing data per month from the university's Institute for Biotechnology and Life Science Technologies.

The previous storage system allowed access only through standard file systems, and its capacity was capped at 16 terabytes per node. To eliminate these storage headaches, the team at CAC deployed Red Hat's software on 158 terabytes of storage built on Dell blades with InfiniBand interconnects, accessible from a combination of native operating systems and Windows.

According to James VanEe, IT director of Cornell’s Institute for Biotechnology and Life Science Technologies, the primary factor in choosing Red Hat's software (formerly Gluster) was cost. “The idea of a scale-out storage solution was something we’d always been interested in, but never could implement due to cost,” says VanEe. “We considered other solutions, such as Isilon, but the large, up-front capital investment motivated us to turn to CAC for suggestions on other possible storage solutions that did not require a big upfront investment.”

Software Helping Software

By Matthew Dublin

Researchers at the Karlsruhe Institute of Technology have developed a way to cut down on the trial-and-error process of developing scientific software.

Developers at KIT are now using the PALLADIO simulation software to analyze their code and discover problems early on, instead of wasting time, money, and energy testing flawed code on a local cluster or in the cloud.

Named after the Renaissance-era architect Andrea Palladio, the PALLADIO simulation software analyzes a software architecture to predict non-functional properties such as performance, reliability, maintainability, and costs. The analysis also covers the workflows within components and subcomponents, scalability, resource use, and distribution aspects of the software, so the complete layout of the software is checked before "building" starts.
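PALLADIO's own models and simulator are far richer than this, but the basic idea of predicting performance from an architecture before building it can be illustrated with a textbook open-queueing estimate. The sketch below is not PALLADIO's method or API: it treats each component in a hypothetical request path as an independent M/M/1 queue, so a component with mean service demand S at arrival rate λ has utilization U = λS and mean residence time S / (1 − U), and the predicted end-to-end response time is the sum over components. The component names and service demands are invented for illustration.

```python
# Illustrative only: a textbook M/M/1 tandem-queue estimate, NOT PALLADIO's simulator.
# Hypothetical request path: component name -> mean service demand per request (seconds).
COMPONENTS = {
    "web_frontend": 0.010,
    "solver_service": 0.040,
    "result_storage": 0.020,
}


def predict_response_time(arrival_rate, components):
    """Sum the M/M/1 mean residence times S / (1 - U) along the request path.

    arrival_rate is in requests per second. Raises ValueError if any
    component would be saturated (utilization U = arrival_rate * S >= 1).
    """
    total = 0.0
    for name, service_time in components.items():
        utilization = arrival_rate * service_time
        if utilization >= 1.0:
            raise ValueError(f"{name} saturates at {arrival_rate} req/s (U = {utilization:.2f})")
        total += service_time / (1.0 - utilization)
    return total


for rate in (5, 15, 24):
    ms = predict_response_time(rate, COMPONENTS) * 1000
    print(f"{rate:>3} req/s -> {ms:.1f} ms predicted mean response time")
```

The value of even a crude model like this is that a bottleneck shows up on paper: the hypothetical solver component reaches full utilization at 25 requests per second (U = 25 × 0.040 = 1), before any code is deployed to a cluster.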

"In the beginning was our observation that software developers apply a trial-and-error process. This is a rather inefficient method to produce error-free software," says Ralf Reussner, chair of Software Design and Quality at the Karlsruhe Institute of Technology (KIT), Germany. "If you want to build a bridge, you do not simply place a stone on top of a stone, let a truck drive across, and hope that the bridge will survive the load."

At the moment, Reussner and his collaborators are preparing PALLADIO for simulating the integration of existing software into the cloud.

PALLADIO began in 2003 as a research project at the University of Oldenburg and is now available through a tech-transfer effort as a tool-supported software architecture simulation approach that has been applied successfully in industrial and scientific scenarios. It is actively developed by the Karlsruhe Institute of Technology (KIT), the FZI Research Center for Information Technology, and the University of Paderborn.

Google's GDrive Release Imminent

By Matthew Dublin

Thanks to a quick-thinking social media consultant with a camera phone, the cat was officially out of the bag last September regarding Google's mythical "GDrive" (Google Drive), now reportedly called simply "Drive."

Johannes Wigand took a picture of what was assumed to be Drive (codenamed "Platypus") at a Google-sponsored event and posted it on his blog. The picture turned out to be of Drive, and Google employees have been using the virtual disk drive storage service internally for a few years now. In fact, people have been murmuring about Drive's existence since 2006, so it appears that Google wanted to make sure it got the kinks out before competing with the likes of Dropbox in this space.

The rumor is that Google is getting ready to roll out Drive after a prolonged period in beta mode behind Google's wall — apparently Drive was very buggy until recently. Right now, folks are speculating that this new storage solution will be offered in the same way that Gmail account holders are offered a certain ever-increasing amount of storage space for their email and docs, but with the option to buy more as needed.

The Future of Supercomputing Software Libraries

By Matthew Dublin

In this video, D.K. Panda of Ohio State University presents his talk on the future of supercomputing software libraries.

The video was recorded on February 7 at the recent HPC Advisory Council Israel Supercomputing Conference.

In the video, Panda discusses the emergence of exaflop-scale computing, trends in commodity computing clusters, and the challenges associated with scaling software to billions of processors to meet the demands of exascale computing.

DOE Study Says Clouds Can't Replace Supercomputers

By Matthew Dublin

The verdict is in: cloud computing should not replace supercomputers for scientific research. That was the conclusion of a two-year study conducted by the US Department of Energy on the feasibility of cloud computing for meeting the computational demands of big-data research projects.

The 169-page report says that while commercial clouds might be fine for enterprise applications, big-data research requires more "care and feeding"; in other words, the marketing pitch of the cloud as a plug-and-play compute solution does not really hold water.

The DOE team, made up of researchers from Argonne National Laboratory in Illinois and Lawrence Berkeley National Laboratory in California, ran a range of scientific computing projects on Magellan, a cloud computing testbed with server farms at the National Energy Research Scientific Computing Center and the Argonne Leadership Computing Facility, as well as on commercial clouds such as Amazon's EC2. The researchers then compared the clouds' performance, costs, and manageability with those of a Cray XT4 supercomputer and a Dell cluster system.

“Our analysis shows that DOE centers are often three to four times less expensive than typical commercial offerings,” the authors write in the report. “These cost factors include only the basic, standard services provided by commercial cloud computing, and do not take into consideration the additional services such as user support and training that are provided at supercomputing centers today and are essential for scientific users who deal with complex software stacks and require help with optimizing their codes.”

The study reached the following conclusions:

Scientific applications have special requirements that require cloud solutions that are tailored to these needs.

The scientific applications currently best suited for clouds are those with minimal communication and I/O (input/output).

Clouds can require significant programming and system administration support.

Significant gaps and challenges exist in current open-source virtualized cloud software stacks for production science use.

Clouds expose a different risk model, requiring different security practices and policies.

The MapReduce programming model shows promise in addressing scientific needs, but current implementations have gaps and challenges.

Public clouds can be more expensive than in-house large systems. Many of the cost benefits from clouds result from the increased consolidation and higher average utilization.

DOE supercomputing centers already achieve energy efficiency levels comparable to commercial cloud centers.

Cloud is a business model and can be applied at DOE supercomputing centers.


Amazon Cuts Cloud Storage Prices

By Matthew Dublin

Amazon Web Services has announced its latest price reduction, this time for storage. Amazon S3 standard storage customers will benefit most from these cuts. For example, customers storing 50 terabytes of data can expect a 12 percent reduction in cost, while those storing 500 terabytes will now see a 13.5 percent savings.
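The two percentages differ because S3 standard storage is billed in volume tiers, so a customer's overall savings are a blend of the per-tier cuts. The sketch below illustrates that blending. The tier sizes and per-GB prices are assumed values, intended to approximate the US Standard list prices before and after the February 2012 cut; they are not taken from this article and are worth verifying against AWS's own pricing history. One TB is taken as 1,024 GB. With these assumed rates, the blended savings come out near 12 percent at 50 TB and a little over 13 percent at 500 TB.

```python
# Assumed S3 standard storage tiers (NOT from the article):
# (tier size in TB, USD per GB-month), filled in order.
GB_PER_TB = 1024
OLD_TIERS = [(1, 0.140), (49, 0.125), (450, 0.110)]  # assumed pre-cut prices
NEW_TIERS = [(1, 0.125), (49, 0.110), (450, 0.095)]  # assumed post-cut prices


def monthly_cost(tb, tiers):
    """Bill `tb` terabytes by filling each tier in order; later tiers are cheaper per GB."""
    remaining = tb
    cost = 0.0
    for size_tb, usd_per_gb in tiers:
        used = min(remaining, size_tb)
        cost += used * GB_PER_TB * usd_per_gb
        remaining -= used
    if remaining > 0:
        raise ValueError("volume exceeds the tiers modeled here")
    return cost


def savings_pct(tb):
    """Percentage drop in the monthly bill for `tb` terabytes."""
    old = monthly_cost(tb, OLD_TIERS)
    new = monthly_cost(tb, NEW_TIERS)
    return 100.0 * (old - new) / old


for tb in (50, 500):
    print(f"{tb:>3} TB stored: bill drops {savings_pct(tb):.1f}%")
```

Under these assumed prices every tier drops by the same $0.015 per GB, which is a larger fraction of the cheaper, higher-volume tiers; that is why the percentage saving grows as more data lands in those tiers.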

The price reductions took effect on February 1, 2012.

These savings are a direct result of the continued growth of S3, which by the end of 2011 hosted roughly 762 billion objects. At peak times, S3 processes 500,000 requests per second. Since 2006, the total number of objects stored on S3 has grown by 192 percent, with last year experiencing the most significant growth.

Cleaning up Messy Data with Google Refine

By Matthew Dublin

Rod Page over at iPhylo has a post describing how useful Google Refine is for cleaning up taxonomic databases. Google Refine, formerly known as Freebase Gridworks, is a freely available web-based "power tool" that supports TSV, CSV, Excel, and XML file formats. Among other features, Google Refine allows users to pull together disparate data sets and work with the data in a collated, polished fashion.

Page, a professor of evolutionary biology at the University of Glasgow, is a big fan of Google Refine's "Reconciliation Services," which he uses for matching names to external identifiers.

So far, Page has used Google Refine with EOL, NCBI taxonomy, uBio, WORMS, and GBIF.
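Refine's reconciliation services are external services that receive names from Refine and return candidate matches, so the matching logic belongs to each service. As a rough, hypothetical illustration of the underlying idea (mapping messy name strings to canonical names with identifiers), the sketch below normalizes whitespace and capitalization and then uses Python's standard-library `difflib` for fuzzy matching. The reference table and `ref:` identifiers are placeholders, not records from EOL, NCBI, or any other database, and this is not how Google Refine itself is implemented.

```python
import difflib

# Hypothetical reference table: canonical name -> placeholder identifier.
REFERENCE = {
    "Homo sapiens": "ref:0001",
    "Panthera leo": "ref:0002",
    "Canis lupus": "ref:0003",
}


def normalize(raw_name):
    """Collapse whitespace and apply binomial capitalization ("Genus species")."""
    return " ".join(raw_name.split()).capitalize()


def reconcile(raw_name, reference=REFERENCE, cutoff=0.8):
    """Return (canonical_name, identifier) for the closest match, or None."""
    matches = difflib.get_close_matches(normalize(raw_name), list(reference), n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return best, reference[best]


for name in ["homo  sapiens", "Panthera leoo", "Felis catus"]:
    print(f"{name!r} -> {reconcile(name)}")
```

The `cutoff` parameter trades recall for precision; a real reconciliation workflow would typically show low-scoring candidates to a person rather than accept them automatically.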


Fighting Disease with iPhones and Big Data

By Matthew Dublin

Skin Scan, a startup iPhone app developer based in Bucharest, Romania, has big plans to fight and track skin cancer. Its app (also called Skin Scan) lets users snap pictures of questionable moles or lesions, which are then sent to Skin Scan's servers, where a proprietary algorithm analyzes them. While the app does not yet provide an accurate diagnosis, the algorithm identifies abnormalities, rates each one from low risk to high risk, and refers users to local dermatologists.

Skin Scan is building an analytic database from users' photographs and results, including location data, to create a time-space map of lesions based on their severity and frequency.
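Skin Scan's actual pipeline is proprietary and not described here, but a time-space map of severity and frequency can be sketched generically by binning reports into grid cells and calendar months and recording a count and mean risk score per bin. The record schema, the 1-to-5 risk scale, the one-degree grid, and the sample reports are all made up for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class LesionReport:
    """Hypothetical report record; not Skin Scan's actual schema."""
    lat: float
    lon: float
    observed: date
    risk_score: int  # assumed scale: 1 (low risk) to 5 (high risk)


def space_time_bins(reports, cell_deg=1.0):
    """Map (lat cell, lon cell, year, month) -> (report count, mean risk score)."""
    scores = defaultdict(list)
    for r in reports:
        key = (
            math.floor(r.lat / cell_deg),
            math.floor(r.lon / cell_deg),
            r.observed.year,
            r.observed.month,
        )
        scores[key].append(r.risk_score)
    return {key: (len(s), sum(s) / len(s)) for key, s in scores.items()}


# Made-up sample reports for illustration only.
reports = [
    LesionReport(44.43, 26.10, date(2012, 3, 1), 2),
    LesionReport(44.90, 26.50, date(2012, 3, 20), 4),
    LesionReport(44.43, 26.10, date(2012, 4, 2), 5),
]
for key, (count, mean_risk) in sorted(space_time_bins(reports).items()):
    print(key, count, mean_risk)
```

Publishing only per-cell aggregates rather than raw coordinates also reduces how much individual location data is exposed, though coarse binning alone does not guarantee de-identification.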

As skin cancer is best analyzed over time, this data may be useful to not only physicians, but government and academic researchers tracking cancer as well, assuming it can be sufficiently de-identified.

The app developer also has designs on connecting doctors and users to eliminate in-person office visits.

In discussions of personalized medicine, the idea that patients might soon walk around with their genomes in their pockets or on mobile devices is often batted around, but its viability and execution are rarely explored. Technologies such as Skin Scan could prove to be a good test case for connecting patients and physicians through personalized medical data, combining instantaneous communication and real-time data analysis on consumer electronic devices.

Cray Now Offering $200,000 Supercomputer

By Matthew Dublin

In an effort to reach researchers who have limited funding but want their own supercomputer, Cray is now offering a line of commodity supercomputers with a starting price tag of $200,000.

Cray's entry-level offering combines the software support previously reserved only for Cray CX1 and Cray CX1000 systems with the petascale capabilities of the Cray XE6m and Cray XK6m line. The $200,000 system also comes equipped with Cray's Gemini interconnect, the latest version of the Cray Linux Environment, powerful AMD Opteron 6200 Series processors, and GPUs.

"Cray's new entry-level configurations leverage its deep HPC technology portfolio to create purpose-built systems for the departmental technical computing market segment," said Earl Joseph, IDC program vice president for HPC. "This segment was worth around $3 billion in 2011 and IDC projects that it will grow at a healthy 7 percent to 8 percent CAGR through 2015."

The new "affordable" supercomputer is not really a full-fledged supercomputer per se, but rather a blade server configuration: essentially a baby XE6m with six blades and 48 sockets using Opteron 6200s. The system is rated at 6.5 teraflops, which works out to about $30,769 per teraflop at the $200,000 starting price.
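The per-teraflop figure is simply the starting price divided by the quoted performance; the short check below reproduces it. The article does not say whether the 6.5 teraflops is peak or sustained, so treat the ratio as a list-price comparison only.

```python
# Figures quoted in the article.
START_PRICE_USD = 200_000
QUOTED_TFLOPS = 6.5

usd_per_tflop = START_PRICE_USD / QUOTED_TFLOPS
print(f"${usd_per_tflop:,.0f} per teraflop")  # $30,769 per teraflop
```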

These new entry-level supercomputers might be the perfect solution for researchers interested in developing code for larger-scale systems, such as Blue Waters at the National Center for Supercomputing Applications at the University of Illinois or the Titan supercomputer at Oak Ridge National Laboratory.

What it Takes to Get to Exascale

By Matthew Dublin

Science has an article discussing what it will take to make exascale computing a reality. These new systems — which at present remain only theoretically possible — would be capable of performing 10 to the 18th power floating point operations per second, or an exaflop.

Exascale supercomputers would be roughly 100 times more powerful than today's fastest supercomputer, the K Computer at Japan's Riken institute, which has a peak performance of roughly 11.3 petaflops. All the major supercomputing powers, including the US, China, Japan, Russia, India, and the EU, are racing to build a viable exascale system.
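The "100 times" figure is a round number; dividing one exaflop by the K Computer's quoted 11.3 petaflops gives the exact ratio.

```python
EXAFLOP = 1e18        # floating-point operations per second
K_COMPUTER = 11.3e15  # 11.3 petaflops, as quoted in the article

ratio = EXAFLOP / K_COMPUTER
print(f"An exaflop is about {ratio:.1f} times the K Computer's quoted performance")
# prints: An exaflop is about 88.5 times the K Computer's quoted performance
```

A factor of about 88 is on the order of two orders of magnitude, consistent with the article's round figure.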

However, the challenges of energy efficiency and sustained performance are formidable, not to mention developing brand new programming models for these huge systems.

Even though computer hardware performance has increased steadily over the last few decades, those incremental advances will not be enough to reach exascale. Exascale won't simply be a matter of building a really, really large supercomputer center crammed to the ceiling with the latest server blades; it will require entirely new processor and interconnect architectures.

Intel has unveiled its 50-core Knights Corner chip and its Xeon E5 server chips as it works toward exascale by 2018. These chips are designed for massive processor core counts as well as low energy consumption.

Sometimes the need for completely new hardware to accommodate the perpetual growth in research data gets lost: folks still think the cloud can save them when, for example, genomics datasets reach the exascale mark. Unfortunately, an exascale cloud can't exist until there is exascale hardware to make it float.