Supercomputer vs. Cloud

By Matthew Dublin

It's a no-brainer that supercomputers run jobs faster than cloud computing providers do, but what if you don't care? Such is the position taken in a recent post by Argonne National Laboratory's Ian Foster, who argues that clouds are in fact good for science, even if they are not architected to be speed demons. Foster's inspiration came from an article written by Ed Walker, a researcher at the Texas Advanced Computing Center at the University of Texas at Austin. Walker compared the National Center for Supercomputing Applications' Abe supercomputer with Amazon's EC2 using the NAS Parallel Benchmarks, a small set of programs designed to evaluate the performance of parallel supercomputers. Walker's conclusion was no surprise: he found that while the potential of cloud computing providers like Amazon is promising, "a performance gap exists between performing HPC computations on a traditional scientific cluster and on an EC2 provisioned scientific cluster. This performance gap is seen not only in the MPI performance of distributed-memory parallel programs but also in the single compute node OpenMP performance for shared-memory parallel programs."

But Foster says we should think again before concluding that cloud computing is, at least for now, a non-viable solution for scientific research. His priority is not high-octane computing but access to large amounts of compute power right now. Utility computing was developed to answer the prayers of those forlorn researchers forced to queue up for time on a big machine, and likewise, the big win for cloud computing is this immediacy. To wit, the metric we should all be thinking about when considering or criticizing cloud computing is not execution time but "elapsed time from submission to the completion of execution."

So while Walker's numbers state that a job on 32 processors takes roughly 25 seconds on Abe and 100 seconds on EC2, adding in queue and startup time blurs the distinction. Using Walker's benchmark, the time it takes to start up 32 nodes on EC2 is about 400 seconds, whereas the odds of getting 32 nodes within that same amount of time on the supercomputer are only 34 percent -- not so super.
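
For readers who want to check the arithmetic, here is a minimal Python sketch of the "submission to completion" metric using the figures quoted above. The function names and the sample queue waits are hypothetical and chosen only for illustration; they do not come from Walker's or Foster's write-ups.

```python
# Figures quoted in the article, in seconds.
ABE_RUN_S = 25.0       # 32-processor job on Abe
EC2_RUN_S = 100.0      # the same job on EC2
EC2_STARTUP_S = 400.0  # approximate time to start 32 EC2 nodes


def turnaround(wait_s: float, run_s: float) -> float:
    """Elapsed time from submission to completion of execution."""
    return wait_s + run_s


# EC2's wait is its node startup time.
ec2_total = turnaround(EC2_STARTUP_S, EC2_RUN_S)

# Longest Abe queue wait at which the supercomputer still ties EC2.
break_even_wait = ec2_total - ABE_RUN_S


def faster_platform(abe_queue_wait_s: float) -> str:
    """Name the platform that finishes first for a given Abe queue wait."""
    abe_total = turnaround(abe_queue_wait_s, ABE_RUN_S)
    if abe_total < ec2_total:
        return "Abe"
    if abe_total > ec2_total:
        return "EC2"
    return "tie"


if __name__ == "__main__":
    print(f"EC2 turnaround: {ec2_total:.0f} s")
    print(f"Abe break-even queue wait: {break_even_wait:.0f} s")
    # Hypothetical queue waits, not measured values.
    for wait in (60.0, 475.0, 3600.0):
        print(f"Abe queue wait {wait:.0f} s -> {faster_platform(wait)}")
```

On these numbers EC2 finishes in about 500 seconds, so Abe comes out ahead only if its queue wait stays under about 475 seconds. The article gives the odds of getting nodes only at the 400-second mark, so the sketch does not attempt to estimate a probability at the break-even point.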

A point John West makes helps to wrap the idea up nicely. He says that comparisons like Walker's are only really meaningful if one has a choice -- in other words, "in the absence of access to a supercomputer, EC2 at least lets you get the job done, even if it takes longer."