Yale Enters HPC Ivy League with Used Bulldog

By Matthew Dublin

It's an HPC throwdown on the quad as Yale University announced yesterday the acquisition of a used 52-teraflop supercomputer called "Bulldog O." The new 702-node system was purchased for slightly over $2M. Yale bought the supercomputer from Hewlett-Packard, which was working with a government agency that was testing the computer before buying an even larger one.

Bulldog O has one petabyte of storage, more than Yale's other eight Bulldog clusters combined (the different Bulldog clusters are differentiated by a letter of the alphabet). The new cluster now puts Yale ahead of other Ivy League universities in terms of compute power available to researchers. Unlike Harvard and Stanford, which have had systems included in the TOP500 list consistently over the last few years, Yale has never had a computer powerful enough to be ranked on the list, until now.

Bulldog O has earned a 146th ranking overall and a 74th place finish among academic institutions.

According to Yale's Chuck Powell, associate chief information officer at Information Technology Services, the number of users of its centrally shared computing systems has grown by 170 percent over the past 20 months, and in response the University has purchased the supercomputers known as Bulldogs L, M and N in the last 15 months.

While still more power efficient than the other Bulldogs on campus, the new system uses roughly 300 percent more power than the University’s other supercomputers.

Here's a video on high-performance computing projects at Yale.

ORNL's Cray-GPU System Under Construction

By Matthew Dublin

Oak Ridge National Laboratory (ORNL) is partnering with Cray to build a 20-petaflop system that will be dubbed “Titan.” According to a presentation by project leader Buddy Bland at the Oak Ridge Leadership Computing Facility in Tennessee, assembly of the system will start this year, and the machine is expected to be running at its full 20-petaflop capacity by next year.

The estimated cost for the giant machine is somewhere in the neighborhood of $100M. It will be built around Cray’s new “Gemini” XE interconnect and a new memory system the project team is calling “globally addressable memory”; more information on this to follow. Titan will also sport 16-core AMD processors and GPUs as co-processors, an increasingly common feature of new HPC systems over the last two years, and will run an updated version of Cray’s Linux as an operating system.

Amazon's 10 Minute Cloud

By Matthew Dublin


The Amazon Web Services (AWS) blog has posted a tutorial that it claims can help you set up a cluster of high performance compute nodes on its cloud in just 10 minutes.

(If the image of the dedicated father on Christmas morning swearing up a storm while struggling to put together an Erector Set comes to mind, we don't blame you.)

While the tutorial is obviously promotional, AWS is offering a $20 cloud coupon/free service credit, so potential cloud users can kick the tires without actually having to pay to tear their own hair out.

The video guides users through the process of provisioning a virtual cluster on the cloud to run molecular dynamics simulations.

Last week, AWS published a case study on Bioproximity, a contract research organization that provides proteomic analytical services. Bioproximity is utilizing Amazon Elastic Compute Cloud, Amazon Elastic Block Store, and Amazon Simple Storage Service as the foundation for its Proteome Cluster, a web-based solution for searching tandem mass spectrometry data.

Dropbox for HPC

By Matthew Dublin

Researchers at the University of Manchester now have an easier, desktop drag-and-drop method by which they can take advantage of the university’s Condor pool. Called DropAndCompute, this application lets users submit and manage large data analysis jobs on a grid or cloud without needing any expertise with the system’s operating system or Unix or Linux command line tools.

DropAndCompute was initially inspired by and built on top of Dropbox, the freely available graphical file management application.

The University of Manchester's Ian Cottam, one of the lead developers on the project, has a blog post describing how the solution evolved from a homegrown job management application built on top of Dropbox into a standalone application.

Here is a video of the first version of DropAndCompute that was used on a Condor pool within the university’s interdisciplinary biocentre:

The latest version of DropAndCompute no longer relies on Dropbox and is a standalone “local version.” Users only need to mount a folder from the submit node on their local computer and then use the same drag-and-drop approach to start jobs on their institute’s HPC resources.

HPC Wants Its Apple TV

By Matthew Dublin

Move over, Sony PlayStation 3: the newest thing in using consumer entertainment technology for HPC is energy-efficient parallel computing with Apple TV. Researchers at Ludwig-Maximilians-University Munich have built the first-ever HPC cluster from Apple TV units. The current proof-of-concept Apple TV cluster comprises six nodes (that’s six Apple TV devices), although there are plans for expansion. The 2nd generation Apple TV shares its hardware internals with the iPad: an Apple A4 processor running at 1 GHz, a PowerVR GPU, and 256 MB of RAM. It draws roughly six watts.

The experiment was inspired by two observations: the HPC industry doesn’t really push CPU development on its own, having instead relied primarily on workstation and server processing technology, and the processing components of consumer products have never been more powerful.

According to the project's website:

"With our AppleTV Cluster we try to provide a data point on the current state of things with regards to energy efficient parallel and distributed computing on ARM powered consumer electronic devices. We are currently evaluating the cluster with respect to its performance characteristics and limiting factors. We will also analyze the industry trajectory that we believe might make consumer electronic based technology viable building blocks for future HPC system designs."

So far the team has implemented MPI and Perl on their cluster; they described the process in their website's software section.

Last year, a YouTube video was posted by the InsideHPC blog that showed Steve Finn from BAE Systems demonstrating how he used his iPad to work remotely with his 20,000-processor SGI Altix ICE system.

New Cloud and "Cloud-Like" Offerings

By Matthew Dublin

Genomatix announced this week the release of mygenomatix, a “cloud-like” computing service that incorporates its in-house next generation sequencing data analysis platform. The company claims the new offering will provide users with analyzed data in a “matter of one to two weeks” and says its security is much better than “standard cloud models.”

The offering is slated for launch this April, and from CEO Marty Seifert’s description in the initial announcement, it sounds like what they have basically created is a service model where the workaround for security holes is…snail mail.

“We have been exploring a cloud like model for quite some time, and our service model addresses the issue of security by getting the data to our computers via a hard disk shipment program,” stated Seifert in the press release. "With mygenomatix, anyone doing NGS data analysis now has access to an easy entry path to our technology and databases."

Who would have thought that all cloud computing was really lacking was a postage machine? At some point in the future maybe a consensus will be reached on exactly how to define the "cloud," because more often than not it seems to be inserted in marketing copy in place of less sexy concepts such as plain old software as a service or the ancient utility computing model.

Other vendors are not willing to give up on the idea that the cloud can be secure enough in its own right. SecludIT has released a new tool that capitalizes on its “Elastic Security technology,” which allows users to dynamically adapt security settings as their cloud computing infrastructures change. SecludIT’s Amazon Detector for Amazon EC2 is an automated security event detection tool that is designed to help users isolate holes in security groups and applications. The company claims to employ a new paradigm called “auto checks” that lets users set the correct security checks and alerts automatically.

In other cloud news this week, Complete Genomics announced on Monday that it will now offer its customers the option of storing and visualizing their sequencing data with DNAnexus’ cloud computing platform. Complete Genomics’ customers will have access to informatics tools through the DNAnexus cloud, built on Amazon’s EC2, where they will be able to visualize and query multiple complete human genomes. DNAnexus has also announced that in the near future it will integrate functionality from Complete Genomics Analysis Tools, an open source software project for downstream analysis of data produced by Complete Genomics.

Researchers Use Supercomputer to Study Enzyme Evolution

By Matthew Dublin

Investigators at Ohio State University are using a large-scale cluster to analyze the evolutionary history of the poly (ADP-ribose) polymerase (PARP) "superfamily." PARPs are proteins found in eukaryotes that have been extensively studied in mammals. This family of enzymes has been implicated in a range of human diseases, and its members are considered possible targets for anti-cancer therapies.

Rebecca Lamb, an assistant professor at OSU, used the Glenn cluster at the Ohio Supercomputer Center to identify 236 PARP proteins from 77 species across animals, plants, molds, fungi, algae and protozoa.

The Glenn cluster is a 75-teraflop IBM Cluster 1350 that includes AMD Opteron multi-core and IBM Cell processors as well as 36 GPU-capable nodes. On the cluster, Lamb ran the PhyML 3.0 software package, a tool used to fit a statistical model to the aligned sequence data and provide estimates for the model’s parameters.

The broad distribution and pattern of representation of PARP genes indicated to the researchers that all existing eukaryotes encoded proteins of this type and that ancestral PARP proteins had different functions. They also found that the diversity of the PARP superfamily is larger than previously documented, suggesting that the gene family will grow as more eukaryotic genomes become available.

Lamb and her colleagues published their research, entitled “Evolutionary history of the poly(ADP-ribose) polymerase gene family in eukaryotes,” in a recent issue of the journal BMC Evolutionary Biology.

A Case for Open Source Health Care IT

By Matthew Dublin

A new study by researchers at the University of Warwick’s Institute for Digital Healthcare and the Centre for Health Informatics and Multiprofessional Education at UCL Medical School says that open source software may be more secure than closed source or proprietary software. The impetus for the study is the health care industry's well-documented troubles with vast costs, lack of fault tolerance, and incompatibility of systems that prevents database architectures and web applications from talking to each other.

The long-running debate is buoyed up by many a salient point on both sides, so the results of this study will hardly close the book. However, the authors maintain that open source software is a secure and cost-effective prescription to cure the health care industry’s IT woes. Their paper refutes critics’ claims that because the source code is available to the general public, the software is inherently more vulnerable to exploitation and attack.

"Proprietary systems often rely on a 'security through obscurity' argument, that is systems that hide their inner workings from potential attackers are more secure. However security through obscurity alone completely fails when code is disclosed or otherwise discovered using tools such as debuggers or disassemblers," says Jeremy Wyatt, a professor at the University of Warwick's Institute for Digital Healthcare. "Worse, it has been suggested that the cloak of obscurity tends to encourage poor-quality code. Opening the source allows independent assessment of the security of a system, makes bug patching easier and more likely, and forces developers to spend more effort on the quality of their code."

The researchers' paper entitled "Open Source, Open Standards, and Health Care Information Systems" has been published in the Journal of Medical Internet Research.

Open Source GPU Apps for Systems Bio & GWAS

By Matthew Dublin

Researchers from the Hong Kong University of Science and Technology have developed GBOOST, a GPU-based tool for detecting gene-gene interactions in genome-wide case-control studies. Ling Sing Yung and Weichuan Yu of the university's laboratory for bioinformatics and computational biology are using GPUs to accelerate Boolean operation based screening and testing (BOOST), which usually completes gene-gene interaction analysis in roughly two and a half days on a typical desktop computer. Yung and Yu report that GBOOST achieves a 40-fold speedup compared to standard BOOST running on CPUs and is capable of analyzing Wellcome Trust Case Control Consortium Type 2 Diabetes genome data in 1.34 hours on a desktop computer with an Nvidia card.

GBOOST code is freely available here.
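
To give a feel for what "Boolean operation based" screening means, here is a minimal Python sketch of the general idea: genotypes are packed into bit strings so that the 3x3 contingency table for a pair of SNPs can be collected with bitwise AND and bit counting. This is an illustration under our own assumptions, not code from BOOST or GBOOST; the function names and the 0/1/2 genotype encoding are hypothetical.

```python
def pack_genotypes(genotypes):
    """Pack one SNP's genotypes (values 0, 1 or 2) into three bitsets.

    Bit i of masks[g] is set when sample i carries genotype g.
    Python ints serve as arbitrary-length bitsets in this sketch.
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def contingency_table(snp_a, snp_b):
    """Build the 3x3 genotype contingency table for two SNPs.

    Cell [i][j] is the number of samples with genotype i at SNP A
    and genotype j at SNP B, obtained with one AND plus one popcount.
    """
    masks_a = pack_genotypes(snp_a)
    masks_b = pack_genotypes(snp_b)
    return [[popcount(ma & mb) for mb in masks_b] for ma in masks_a]
```

In a case-control setting one such table would be built for cases and another for controls before any statistical test is applied. The appeal of the representation is that each table cell costs only a handful of word-wide operations, which is the kind of work that maps well onto GPUs.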

Researchers at Oxford University are also using GPUs to explore stochasticity in biological systems, where there is currently a need for more efficient software to create realistic stochastic simulations. A new software tool called STOCHSIMGPU exploits GPUs for parallel stochastic simulations of biological reaction systems; it can be integrated into MATLAB and is compatible with the Systems Biology Toolbox 2. The open source software implementation of the Gillespie stochastic simulation algorithm, the logarithmic direct method, and the next reaction method (NRM) is roughly 85 times faster than NRM on a CPU.

Download STOCHSIMGPU here.
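
For readers unfamiliar with the Gillespie stochastic simulation algorithm mentioned above, the following is a minimal single-threaded Python sketch of the classic direct method. It is not taken from STOCHSIMGPU; the function name, argument layout, and the toy decay reaction in the usage note are our own assumptions for illustration.

```python
import math
import random


def gillespie_direct(x0, stoich, propensities, t_end, rng):
    """Simulate one trajectory with the Gillespie direct method.

    x0           -- initial molecule counts, one per species
    stoich       -- stoich[j] is the count change applied by reaction j
    propensities -- propensities[j](x) returns the rate of reaction j
    t_end        -- simulation stops once the next event would pass t_end
    rng          -- a random.Random instance (seed it for repeatability)
    """
    t = 0.0
    x = list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        rates = [f(x) for f in propensities]
        total = sum(rates)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate `total`.
        tau = -math.log(1.0 - rng.random()) / total
        if t + tau > t_end:
            break
        t += tau
        # Pick reaction j with probability rates[j] / total.
        threshold = rng.random() * total
        cumulative = 0.0
        chosen = len(rates) - 1
        for j, rate in enumerate(rates):
            cumulative += rate
            if threshold < cumulative:
                chosen = j
                break
        x = [xi + d for xi, d in zip(x, stoich[chosen])]
        trajectory.append((t, tuple(x)))
    return trajectory
```

As a usage example, a simple decay reaction A -> nothing with rate constant 1.0 and 20 starting molecules can be run with gillespie_direct([20], [[-1]], [lambda x: 1.0 * x[0]], 10.0, random.Random(1)). Realistic studies need many thousands of such independent trajectories, which is why running them in parallel on a GPU is attractive.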

CSCS Adopts Next-Gen Supercomputer

By Matthew Dublin

The Swiss National Supercomputing Centre in Manno, Switzerland, has announced that it will install a next-generation Cray XMT supercomputer to accelerate a range of research projects including the HP2C platform, an initiative that supports several efforts focused on developing programming and database design approaches to take advantage of the generation of supercomputers expected to become available by 2013. These projects include the massive "Selectome" database, which stores information on genes that have evolved under Darwinian selection and are usually thought to be involved in complex diseases.

CSCS, a long-time Cray site, currently has a Cray XT5 supercomputer called "Rosa" and was also the first recipient of the Cray XE6 system. The site will use its next-generation Cray XMT supercomputer, which will become part of a new project called 'Eureka,' for solving problems that require large-scale data analysis. The proposed facility will be used for large-scale analysis of unstructured data and data mining, and is designed for parallel applications that are dynamically changing, require random access to shared memory and typically do not run well on conventional systems.

Each processor in the Cray XMT system can handle up to 128 concurrent threads and is designed to scale from 16 processors up to multiple thousands of processors that can operate on multiple terabytes of shared physical memory.

SDSC Expands Flash-Based Supercomputer Development

By Matthew Dublin

The San Diego Supercomputer Center has added another supercomputer to its growing arsenal of flash memory-based systems. Named "Trestles," the new system was built with a $2.8 million award from the National Science Foundation and will be made available to researchers as part of TeraGrid. With 10,368 processor cores, a peak speed of 100 teraflops, 20 terabytes of memory, and 38 terabytes of flash memory, Trestles will be among the five largest HPC systems in the TeraGrid network. Flash, also known as solid-state memory, is a form of memory that is standard in many devices including mobile phones, laptop computers, and USB drives. Flash is relatively new to the world of large-scale computing, and SDSC is exploring how best to exploit its benefits in supercomputers. The advantages include gains in I/O speed and, more significantly, power savings, as flash devices have no moving parts, unlike disk-based storage with its spinning motorized components.

SDSC's other flash memory supercomputers currently include the Dash system and the larger Gordon, which is slated to become operational later this year.

Bridget Carragher and Clint Potter, directors at the National Resource for Automated Molecular Microscopy at The Scripps Research Institute, have already utilized Trestles with some success. Their project focuses on establishing a portal on the TeraGrid for structural biology researchers to facilitate electron microscopy image processing using the Appion pipeline, an integrated database-driven system. "We are very excited about this early opportunity to use the Trestles infrastructure for high performance structural biology projects," says Carragher in a release announcing the new system. "Based on our initial experience, we are optimistic that this system will have a dramatic impact on the scale of projects we can undertake, and on the resolution that can be achieved for macromolecular structure."

TeraGrid Helps Users Navigate Diverse Resources

By Matthew Dublin

A tool offered by TeraGrid aims to help users easily navigate the multiple systems that make up the geographically distributed grid resource, a very diverse hodgepodge of HPC hardware. TeraGrid's new Common User Environment simplifies access by identifying commonalities across systems and eliminating many of the differences.

"CUE was the solution to a number of nagging problems from an EOT (education, outreach, and training) point of view," says Jeff Pummill, a TeraGrid Campus Champion Leadership Team Member based at the University of Arkansas-Fayetteville. "New TeraGrid users want to test-drive several resources. Without CUE to unify the machine environments, they can become discouraged spending more time learning about each machine's unique features than on their research."

CUE consists of three major components:

Management System: Using provided examples of core modules that have been developed for all TeraGrid production systems, users can customize a command-line interface for their unique needs as well as a default interface that features the same naming conventions for basic functions on multiple resources.

Variables Collection: Allows users to take advantage of common environment variables and parameters in the global ssh_config file with values that are defined across all platforms. This includes a short-hand method of configuring TeraGrid host aliases, which eliminates the need to enter fully qualified login host names when using ssh or gsissh, as well as a standardized shell prompt that indicates which resource an open shell window is connected to.

Testing Platform: Example programs that can be compiled and executed through the CUE demonstrate its use on all TeraGrid machines and let users make valuable comparisons. Users can learn how to write, compile, and run fully functional programs by following examples of simple scientific MPI code written in the C and Fortran programming languages.

New Automated Benchmarking Tool for Clusters

By Matthew Dublin


Raul Gomez, aka NachoGomez, has developed an automated benchmarking tool for HPC clusters called ClusterNumbers. Gomez's new tool incorporates a set of benchmarks capable of providing a thorough overview of CPU, memory, and network performance. ClusterNumbers aims to cover the following metrics:

PTRANS (network): Allows you to measure the total network transfer capacity of the cluster using large blocks of data, performing a matrix transpose.

High Performance Linpack (HPL): The classical Linpack measures floating point operations per second (FLOPS) across the whole cluster.

DGEMM (CPU): Similar to Linpack, but just matrix-matrix multiply on separate nodes without communications between them.

STREAM (memory): Measures bandwidth between CPU and memory using vector-scalar multiply-add operations.

RandomAccess (memory): Measures the update rate of integers in random memory locations.

FFTE (CPU): Measures CPU execution rate in FLOPS for Discrete Fast Fourier Transforms.

b_eff (network): Measures latency and bandwidth of the cluster’s network using small data sets and MPI routines.

Netperf (network): Used for peer-to-peer network testing.

IOzone (disk): Measures various I/O operations.
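
As a rough illustration of the "vector-scalar multiply-add" kernel behind the STREAM entry above, here is a small Python sketch of a triad-style operation, a[i] = b[i] + q * c[i], together with the usual bytes-moved-per-second arithmetic. It is not part of ClusterNumbers or the real STREAM benchmark, and the function names are our own. Because the loop runs in the Python interpreter, the figure it returns reflects interpreter overhead rather than true memory bandwidth; treat it only as a sketch of how the metric is computed.

```python
import time
from array import array


def triad(b, c, q):
    """Return a new array with a[i] = b[i] + q * c[i]."""
    return array("d", (bi + q * ci for bi, ci in zip(b, c)))


def triad_bandwidth(n, q=3.0):
    """Run the triad kernel on n doubles and estimate bytes moved per second.

    The kernel reads two arrays and writes one, so 3 * n * itemsize
    bytes are counted as traffic between CPU and memory.
    """
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    a = triad(b, c, q)
    elapsed = time.perf_counter() - start
    bytes_moved = 3 * n * a.itemsize
    return a, bytes_moved / elapsed
```

Calling triad_bandwidth(1_000_000) returns the result array and an estimated rate in bytes per second. A production benchmark would use compiled code, arrays much larger than the CPU caches, and the best time over several repetitions.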

ClusterNumbers is coded in Python and at the moment only runs on Linux x86 and x86_64. The tool is broken into three parts: the Server Module, the Communications Module and the Graphical Module.

The Server Module is in charge of analyzing the entire cluster by walking each node and gathering information about processor type and quantity, OS architecture and amount of memory of each node. The Communications Module consists of a standard SSH server running on the admin node of the cluster and a set of libraries and routines on the client side called by the Graphical Module. And finally, the Graphical Module runs from a desktop PC and handles the interaction with the user. The first time you connect to a cluster using the Graphical Module, you’ll be presented with a log-in window.

The project, currently hosted at SourceForge, will be open for contributions from the user community by the end of the month.