Intel Believes in the Promise of OpenCL

By Matthew Dublin

The always über-cool dynamic duo of Clay and Katy over at Parallel Programming Talk has posted a video featuring an interview with Intel's OpenCL technical leader Yariv Aridor.

Originally developed by Apple, OpenCL is a software development framework for writing parallel programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.

The OpenCL movement is pushed forward by both the open source community and commercial vendors, like Intel. Aridor is one of Intel's group leaders who interfaces with the Khronos Group, a non-profit member-funded consortium at the center of the OpenCL community that is focused on developing standards for the OpenCL model.

OpenCL is an admirable idea in theory, but it has yet to really take off. The primary reason is that standardizing one programming approach to work with any type of hardware is a hugely ambitious challenge, not just because of the wide variation in processor architectures out there, but also because processor design, be it for CPUs, GPUs, or FPGAs, is forever a moving target.

But Intel's OpenCL software development kit (SDK) aims to level the playing field for OpenCL enthusiasts. Intel's SDK 1.1 for OpenCL supports Windows and Linux as well as most of the OpenCL extensions, including OpenGL — a cross-platform GPU application programming interface.

While OpenCL programmers have the option to write kernels with explicit vector operations, Aridor hopes that the Intel release will eliminate the need for developers to update their source code whenever the underlying processor architecture changes so they can focus on writing simple kernel code.

Aridor says that the most exciting feature of this release is the improved vectorization model, which enables users to take advantage of the SIMD instructions available on Intel CPU architectures.

With their OpenCL project, Aridor says that his OpenCL team wants to "put all the burden of optimized kernel binaries on the OpenCL compiler, leaving the programmers to focus on simple code based on the problem domain..."

Here is the interview; Aridor appears at 6:30:

XSEDE Project to Replace TeraGrid

By Matthew Dublin

The Extreme Science and Engineering Discovery Environment (XSEDE) officially launched this week with much fanfare in the HPC community. Funded with a five-year, $121-million grant from the National Science Foundation (NSF), XSEDE aims to expand on, and eventually replace, the NSF TeraGrid as the go-to virtual HPC resource for researchers in need of supercomputers, data collections, and the newest tools.

The project will be managed by a team at the National Center for Supercomputing Applications (NCSA), the Texas Advanced Computing Center, the San Diego Supercomputer Center, the National Institute for Computational Sciences, and the Pittsburgh Supercomputing Center, to name just a few.

Initially, XSEDE's compute resources will include 16 supercomputers from across the country and a slew of data collections culled from its various multisite partnerships.

User services for XSEDE will be accessed through the User Portal (XUP), a functional extension of the TeraGrid User Portal, which will provide researchers with direct command line access to computational resources and data management tools, such as information about services, user jobs, accounts, projects and allocations. In the near future, the project leaders hope to add more features to the XUP, including in-portal chat with the help desk, the integration of social media features, personalized views of XSEDE for each user, and a fully integrated ticketing system.

Below is a video with NCSA's John Towns discussing XSEDE:

Cornell Collaboration Tests MATLAB with GPUs

By Matthew Dublin

The Cornell University Center for Advanced Computing is working with Nvidia, Dell, and MathWorks to test the performance of GPUs with MATLAB. The collaboration is specifically focused on the use of multiple GPUs on the desktop via the MathWorks Parallel Computing Toolbox and a GPU cluster via MATLAB Distributed Computing Server.

Researchers at the Weill Cornell Medical Center, University of Michigan Health System, and the Rutgers Laboratory for Computational Imaging and Bioinformatics are currently using GPUs and MATLAB to speed up the diagnosis of cancer cells with template matching. Using GPUs, they were able to cut the code's processing time from 86.9 to 5.9 seconds. Exploiting GPUs in this manner would be particularly useful for pathologists looking to process many large-scale images per day.
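For readers unfamiliar with template matching, here is a minimal sketch in plain Python. It is not the researchers' MATLAB code; the function name `match_template`, the sum-of-squared-differences score, and the toy data are all assumptions made for illustration. The point to notice is that every window position is scored independently, which is the kind of work that spreads well across GPU cores. The last line works out the speedup implied by the timings quoted above.

```python
def match_template(image, template):
    """Slide the template over the image and return the best (row, col) and its score.

    The score is the sum of squared differences (lower is better).
    This is an illustrative choice, not necessarily what the researchers used.
    """
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_score, best_pos = None, None
    for r in range(image_h - tmpl_h + 1):
        for c in range(image_w - tmpl_w + 1):
            # Each window is scored on its own, so windows could run in parallel.
            score = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(tmpl_h)
                for dc in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


# Toy 4x5 "image" with the 2x2 template embedded at row 1, column 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]

position, score = match_template(image, template)

# Speedup implied by the reported timings (86.9 s down to 5.9 s).
speedup = 86.9 / 5.9
```

On the toy data the best match is the window at row 1, column 1, and the reported timings work out to a speedup of a little under 15 times.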

The GP-you Group, which develops high-level software tools to access GPUs from different platforms, has already released a freeware library named GPUmat that allows MATLAB users to exploit GPUs.

The basic foundation for exploiting GPUs in MATLAB is Jacket, a numerical computing platform built around the M (or MATLAB) language. Click here for a series of lectures on Jacket courtesy of Torben Larsen at Aalborg University.

Microsoft Releases Large-Scale Data Processing Cloud Tool

By Matthew Dublin

Yesterday, on the opening day of the 12th annual Microsoft Research Faculty Summit, Microsoft released a new platform designed to help researchers deal with big data on the cloud. Codenamed "Daytona," this is the first viable technology to come out of Microsoft Research’s eXtreme Computing Group (XCG), a development group focused on parallel programming models, cloud software, data center architectures, and specialty hardware accelerators.

The new platform aims to give researchers who have terabytes of data to analyze a user-friendly interface to Microsoft’s cloud. Daytona can scale out to hundreds of server cores for distributed data analysis.

Roger Barga, an architect in Microsoft Research’s XCG, says that Daytona will have an easy-to-use programming interface for developers to write machine-learning and data-analytics algorithms. “They don’t have to know too much about distributed computing or how they’re going to spread the computation out, and they don’t need to know the specifics of Windows Azure,” says Barga.

New releases of Daytona are scheduled monthly and will incorporate updates based on feedback from the scientific and research communities. The platform is also being distributed for free.

iPad-Controlled Supercomputing

By Matthew Dublin

Keep an eye out this fall for an iPad app that can control your local supercomputer or cloud computing account.

Manish Parashar, a professor at Rutgers University, recently demonstrated the application at an IEEE competition, which he and his team won. Their demonstration pulled together IBM supercomputers in New York and Saudi Arabia, adding and dropping groups of processors as end users tailored the configuration according to the details of the computational task at hand.

Called CometCloud, the new software enables on-the-fly federation of geographically disparate supercomputers located in data centers, public or private clouds, or enterprise grids.

So far, CometCloud has been used to support academic and engineering projects, but only as a research project. Parashar says that he expects the service to be commercialized before the new year.

Coriell Institute Teams up with IBM

By Matthew Dublin

The Coriell Institute for Medical Research has announced that it will be using IBM technology to manage its massive collection of living human cells that support genomics disease research. Coriell’s cryogenic freezers house up to 48,000 samples at any given time. In the past, when mechanical failures occurred, response teams were apparently alerted only in the event of a total failure of the unit, not a partial failure. In order to improve the alert system and preserve more samples, Coriell researchers are installing an IBM monitoring software system.

In addition, Coriell is implementing IBM’s XIV Storage System to manage data sets generated from over two million ampules of cells, one million vials of DNA, and hundreds of thousands of other biomaterials. The new storage system will also support the Coriell Personalized Medicine Collaborative Research Study, which is aiming to collect roughly 1.5 GB of genetic data per person from over 100,000 participants.
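To put the study's numbers in perspective, here is a quick back-of-the-envelope calculation. It is only a rough sketch: it assumes exactly 100,000 participants (the study is aiming for more), treats the per-person figure as exact, and uses decimal units (1 TB = 1,000 GB). The variable names are made up for illustration.

```python
GB_PER_PERSON = 1.5      # "roughly 1.5 GB" per participant, treated as exact here
PARTICIPANTS = 100_000   # lower bound; the study is aiming for more than this

total_gb = GB_PER_PERSON * PARTICIPANTS
total_tb = total_gb / 1000  # decimal terabytes (assumption: 1 TB = 1,000 GB)

print(f"At least about {total_tb:.0f} TB of genetic data")
```

That puts the raw genetic data alone at roughly 150 TB, before any copies, indexes, or derived data sets are counted.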

Coriell is also installing IBM Tivoli Maximo, IBM Tivoli Netcool, and IBM WebSphere Lombardi Edition.

Below is a video featuring Coriell’s president Michael Christman and CIO Scott Megill discussing the institute’s data management challenges:

UNC's Secure Cloud for Clinical Data

By Matthew Dublin

In order to meet the challenges of analyzing medical data, a team at the Renaissance Computing Institute (RENCI) and the School of Information and Library Science at the University of North Carolina at Chapel Hill is creating a secure cloud computing environment for sensitive medical data.

Michael Shoffner, a senior research software engineer at RENCI heading up the initiative, says that the main challenge is to get medical data to researchers while minimizing the risk of accidental release by protecting privacy and complying with regulations. Typically, when researchers use test results, doctor’s notes, or other clinical data in their research, the data is obtained on a disk or some other type of unsecured media that can easily be lost, stolen, or even just accidentally thrown away.

Shoffner’s technical team is working with medical researchers at the North Carolina Translational and Clinical Science Institute to develop the requirements for a secure medical workspace environment in the form of a private cloud that uses VMware ESXi. Each virtual machine contains a slew of analysis software tools, including the Microsoft Office suite and SAS analytics applications.

The cloud will be administered by UNC IT staff.


When Good Code Goes Bad

By Matthew Dublin

Paul Seibert over at Hub Tech Insider has a blog post on how to manage a software development project when you're not a developer, using the common-sense approach of a shared vocabulary to describe issues with the code. The ability to quickly evaluate the quality of code and describe issues is of obvious importance for researchers working on a slim budget or without a lot of resources, where efficiency is key. Seibert offers a list of terms for improving communication among a group of computer programmers discussing a section of code or an entire application.

1. Fragility: When changes in the software code cause the system to break in places that have no conceptual relationship to the part that was changed. This is a sign of poor design. The opposite of fragility is known as robustness.

2. Immobility: When the code is hard to reuse. The opposite of [immobility] is known as re-usability.

3. Needless complexity: When the design is more elaborate than it needs to be. This is sometimes also called “Gold plating”. The opposite of [needless complexity] is known as simplicity.

4. Needless repetition: This occurs when cut-and-paste of code segments is used too frequently. The opposite of [needless repetition] is known as parsimony.

5. Opacity: When the code is written in such a manner that it is not clear. The opposite of [opacity] is known as clarity.

6. Rigidity: When the design is hard to change because every time you change something, there are many other changes needed to other parts of the system. The opposite of [rigidity] is known as flexibility.

7. Viscosity: When it is easier to do the wrong thing, such as a quick and dirty fix, than the right thing. The opposite of [viscosity] is known as fluidity.
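To make one of these terms concrete, here is a small sketch of "needless repetition" and its opposite, parsimony. The example is not from Seibert's post; the function names and the label-cleaning task are invented purely for illustration.

```python
# Needless repetition: the same clean-up logic is pasted once per argument,
# so a fix to the rule has to be made in three places.
def clean_labels_repetitive(raw_a, raw_b, raw_c):
    a = raw_a.strip().lower().replace(" ", "_")
    b = raw_b.strip().lower().replace(" ", "_")
    c = raw_c.strip().lower().replace(" ", "_")
    return [a, b, c]


# Parsimony: the rule lives in exactly one place and is reused.
def normalize(label):
    """Trim whitespace, lowercase, and replace spaces with underscores."""
    return label.strip().lower().replace(" ", "_")


def clean_labels(*raw_labels):
    return [normalize(label) for label in raw_labels]


print(clean_labels(" Sample A", "Sample B ", "sample c"))
```

Both versions return the same result today, but only the second one stays correct when the clean-up rule changes, which is also a small illustration of why repetition tends to lead to rigidity.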

An HPC Storage Backstory

By Matthew Dublin

While it seems like every week there's a new announcement about the latest HPC system installation, it’s not often that we get the backstory on how one vendor solution was chosen over another. HPCwire profiles one such rare story by following up with the University of Utah’s Brian Haymore, director of the HPC storage team at Utah's Center for High Performance Computing (CHPC), to learn about their evaluation process for competing vendors during the upgrade of the center’s Updraft cluster and data storage facility.

Panasas, HP with its IBRIX system, the Dell and Terascala Lustre package, and the IBM and DDN GPFS solution, were all courting CHPC initially, so there were a lot of options, price points, and support offerings to choose from.

While Panasas provided no performance increase, Lustre did provide a three-fold increase; however, its performance was hampered by “mysterious I/O errors” that affected half of the runs. In addition, the fact that the Lustre file system was about to be handed off to Oracle created some practical concerns about stability. In the end, Haymore and his colleagues had to decide between the DDN/IBM GPFS and HP’s IBRIX solution, both of which had about the same level of performance. So instead of performance or price acting as the tipping point in the purchasing decision, the support model was the biggest factor. Whereas obtaining hardware from DDN and software from IBM would have required chasing support from two separate vendors, HP’s solution came with a unified support model.

Click here to read the whole article.

Sandia Researcher Advances Computer Cooling Technology

By Matthew Dublin

Sandia National Laboratories researcher Jeffrey Koplow may have made a serious dent in the heat problems plaguing large data centers. While all blades and desktop computers contain fans and a heat sink — a metal component attached to the motherboard that transfers heat away from the processor cores with “cooling fins” — Koplow’s team has shown that this standardized design is hardly an efficient one.

Typical heat sink:

While fans are usually positioned so that they blow heat away from the heat sink and not the actual chip, there is still a layer of motionless hot air that remains on the cooling fins creating a boundary or layer of insulation that resists the airflow from the fan. In addition, there is something known as the “heat sink fouling” problem where the heat exchanger is covered with dust or other airborne contaminants while the fan blades remain mostly clean, further preventing proper airflow and cooling of the chip. Cooling and the removal of heat from the heat exchanger is also restricted by a limitation on fan noise, which puts a cap on the speed and power of the fans integrated into the hardware.

According to Koplow, no one has devised a way to address these problems – until now.

His “Air Bearing Heat Exchanger” technology seems to solve all three of these problems by providing a several-fold reduction in the boundary layer, immunity to heat sink fouling, and noise reduction. According to the paper, this solution “is also very practical from the standpoint of cost, complexity, ruggedness, etc. Successful development of this technology is also expected to have far reaching impact in the IT sector from the standpoint of solving the ‘Thermal Brick Wall’ problem (which currently limits CPU clock speeds to ~ 3 GHz), and increasing concern about the electrical power consumption of our nation’s information technology infrastructure.”

Koplow with a prototype of his heat exchanger:

Click here to download the technical paper.

BGI's Latest Software Release Includes GPU & Cloud Tools

By Matthew Dublin

BGI has announced the early access release of its newest suite of bioinformatics tools, including some cloud and GPU-accelerated applications.

The new release includes an updated version of the Short Oligonucleotide Analysis Package (SOAP), including a GPU-accelerated version of the alignment tool, SOAP3-GPU. According to BGI, SOAP3-GPU performs up to ten times faster than SOAP2 at aligning short reads with a reference sequence.

“To tackle these difficulties, BGI and its collaborators are working on GPU accelerated bioinformatics tools, including alignment and variation detection, for example. The improvements in speed are impressive -- the prototype version alignment tool is ten fold faster than its CPU counterpart, while SNP detection codes are about two magnitudes faster,” stated Bingqiang Wang, director of the High Performance Bioinformatics Center of BGI, in a press release.

BGI's software package release also includes upgraded versions of Hecate and Gaea, two cloud-based distributed solutions that Evan Xiang, director of R&D at BGI's Flexible Computing Center, says will be used to solve research problems in a "flexible" manner. In other words, researchers can run de novo assembly jobs on the BGI cloud, or download the programs and run them on Amazon's EC2 cloud.

While there are no benchmarks available, the BGI announcement claims that Hecate, based on MapReduce, can reduce the cost of de novo assembly by more than 50 percent and Gaea, a SNP calling program, can improve the efficiency of cluster usage by more than 30 percent.

Bioinformatics and the Future of Hadoop

By Matthew Dublin

Pacific Northwest National Laboratory's (PNNL) Ronald Taylor has published an overview of Hadoop, the popular open-source software framework that supports data-intensive distributed applications. Taylor's paper in BMC Bioinformatics looks at how Hadoop has been adopted by the bioinformatics community, with a specific focus on next-generation sequencing.

Hadoop, an open source implementation of the MapReduce programming paradigm (a framework developed by Google for processing huge datasets), is a cost-effective method of analyzing data on commodity Linux clusters and the cloud. Taylor also discusses some of the major open source projects that are built on top of Hadoop, including Hive, a framework used for ad hoc querying with an SQL-type query language, and Pig, a high-level data-flow language for batch processing of data.
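For readers new to the MapReduce idea, here is a minimal single-process sketch of its three steps (map, shuffle, reduce), applied to counting k-mers in short reads. It does not use Hadoop or any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `count_kmers`, and the toy reads, are assumptions made for illustration. On a real cluster, the map and reduce calls would be spread across many machines.

```python
from collections import defaultdict


def map_phase(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def count_kmers(reads, k=3):
    # Each read could be mapped on a different node; here it is one loop.
    pairs = (pair for read in reads for pair in map_phase(read, k))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = count_kmers(["ACGTAC", "GTACGT"])
print(counts)
```

Because each map call only looks at one read and each reduce call only looks at one key, the work can be split up without the pieces needing to talk to each other, which is what makes the paradigm a good fit for commodity clusters.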

The Magellan project, a joint research effort of the National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory, and the Leadership Computing Facility at Argonne National Laboratory (ANL), runs Hadoop and HBase, a non-relational distributed database, on a cluster at NERSC, where BLAST computations have been run using Hadoop in streaming mode. NERSC is also evaluating the use of Hadoop with solid state storage, a low-energy memory technology that is being explored by the HPC community.

Taylor concludes that "for much bioinformatics work not only is the scalability permitted by Hadoop and HBase important, but also of consequence is the ease of integrating and analyzing various large, disparate data sources into one data warehouse under Hadoop, in relatively few HBase tables."

For a good breakdown of Hadoop and the history of MapReduce, check out this video:

OSC Team to Develop Apps for Intel's HPC Chip

By Matthew Dublin

Researchers at the Ohio Supercomputer Center (OSC) have announced that they will start developing code with Intel’s Many Integrated Core Architecture (Intel MIC) for scientific computing workloads. Over the next six months, OSC staff will evaluate the capabilities of the Intel MIC Architecture across a range of HPC application areas including computational chemistry, climate and ocean modeling, high-energy physics, and computational material sciences.

In order to help the OSC team kick the tires on this new chip architecture, Intel will provide the staff with early access to the first commercial Intel MIC coprocessor code-named “Knights Corner.” The new chip is a massively parallel x86 processor slated to debut with 50 cores. The technology behind “Knights Corner” is actually based in part on the canceled “Larrabee” project — Intel’s failed attempt at competing with Nvidia in the GPU race with a high-performance x86-based discrete graphics processor.

“We are excited to be an early evaluator of the Intel MIC Architecture since it promises to provide performance and power efficiency similar to GPU-based solutions on highly parallel workloads, but most importantly without the need for new programming models,” says David Hudak, OSC program director for cyber infrastructure and software development.

Intel says that the Intel MIC will target the high-performance computing, workstation, and data center markets. The company boldly states that this new design will eventually pave the way to exascale computing by 2018.

In a wise move, Intel has made the MIC architecture compatible with existing programming tools and methods, such as C, C++, and FORTRAN source code. Programs written for Intel’s MIC products can be compiled and run on standard Intel Xeon processors.