Cloud Computing App for Population Genetics

By Matthew Dublin

A team at the University at Buffalo has developed a cloud computing resource to help students understand aspects of evolutionary biology and population genetics. Called Pop! World, the cloud application uses Adobe Flash to provide a highly visual, interactive illustration of evolution using red and green lizards, although it can be applied to any population of organisms. The program is supported by a $250,000 National Science Foundation grant and currently runs on Google App Engine. Its designers hope to use the funding to roll out a more sophisticated version of the application by the fall.

"The cloud serves as a way to distribute resources for free without limits on how many people can access it and with no regard to what kind of computer you are downloading to," says Jessica Poulin, a research assistant professor in the Department of Biological Sciences in the College of Arts and Sciences, who developed Pop! World. "Almost all of evolutionary theory can be mathematically modeled if you know enough information to begin with. ... If you enter the correct parameters into the computer, the computer will tell you what will happen after one generation or a thousand generations. I wanted students to be exposed to something that made them feel they were actually watching evolution happen. I wanted it to be captivating."

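Poulin's point that evolution can be modeled generation by generation is easy to illustrate. The sketch below is not Pop! World's code, whose internals aren't described here; it is a textbook deterministic selection model, assuming two lizard color morphs (red and green) that reproduce asexually and differ only in a hypothetical relative fitness. Each generation, a morph's share of the population is weighted by its fitness and renormalized by the population's mean fitness.

```python
def next_generation(p: float, w_red: float, w_green: float) -> float:
    """Return the red-morph frequency after one round of selection.

    Deterministic haploid selection: each morph's share of the next
    generation is proportional to its current frequency times its fitness,
    divided by the population's mean fitness.
    """
    mean_fitness = p * w_red + (1.0 - p) * w_green
    return p * w_red / mean_fitness


def simulate(p0: float, w_red: float, w_green: float, generations: int) -> list[float]:
    """Return the frequency trajectory [p0, p1, ..., p_generations]."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_generation(trajectory[-1], w_red, w_green))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters: red lizards start rare (10%) with a 5% fitness edge.
    traj = simulate(p0=0.10, w_red=1.05, w_green=1.00, generations=1000)
    print(f"after 1 generation:     {traj[1]:.4f}")
    print(f"after 1000 generations: {traj[1000]:.4f}")
```

With these made-up numbers the change after one generation is small, but after a thousand generations the favored morph has essentially taken over the population, which is the kind of long-run outcome the quote describes.
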
In other cloud news today, Amazon Web Services has released a video tutorial that demonstrates how easy it allegedly is to set up and use an AWS cloud cluster specifically for HPC. The 18-minute video walks through establishing an 8-node, 64-core ad hoc cluster to run a molecular dynamics simulation.

ORNL Releases GPU Benchmarking Suite

By Matthew Dublin

The Oak Ridge National Laboratory has announced the release of its Scalable Heterogeneous Computing Benchmark Suite (SHOC), a collection of benchmark programs that test the performance and stability of systems that pair traditional CPUs with non-traditional architectures for general-purpose computing, such as GPUs and FPGAs, as well as the software used to program them.

SHOC, which supports both OpenCL and CUDA, lets users evaluate their clusters with stress and performance tests. The stress benchmarks put a system through its paces with demanding kernels that expose bad memory, inadequate cooling, and other hardware issues. The performance tests are organized somewhat like the BLAS API, with several levels of varying complexity depending on the capabilities of the devices.

SHOC also features cluster-level parallelism with MPI, node-level parallelism across multiple GPUs, stability tests for large-scale cluster resiliency testing, and easy reporting in spreadsheet format. SHOC runs on Linux and Mac OS X.

SHOC 1.0 is available for download.

Finland Invests $6.7M in Biomedical IT Infrastructure

By Matthew Dublin

Finland has invested $6.7 million in a national project called Biomedinfra, which aims to improve biobanking, translational medicine, and high-performance computing infrastructures. The hope is that by implementing a robust biomedical IT infrastructure, not only will health care in the country be improved, but international collaborations will have a seamless way to collect and utilize massive amounts of data.

According to Anu Jalanko, head of the Public Health Genomics Unit at Finland's National Institute for Health and Welfare, the new funding will play a key role in establishing the Finnish national biobank network, BBMRI.FI. "Biobanking requires close collaboration locally, nationally and internationally. Finland has strong traditions and expertise in biobank-related research and has created several world-class population biobank resources that are often also used in international collaborations," Jalanko says.

According to Tommi Nyrönen, who leads development at Finland's CSC-IT Center for Science, where the team is currently developing cloud-based high-performance computing and storage services tailored to the needs of biomedical researchers, the new funding will enable them "to develop technologies and policies to ensure that distributed biomedical data can be managed and used efficiently and securely. Research institutes should be able to access supercomputing resources easily and securely."

Biomedical Computing at NCSA

By Matthew Dublin

Here's a video of Victor Jongeneel, a senior researcher at both the National Center for Supercomputing Applications and the Institute for Genomic Biology (IGB), discussing how NCSA resources are helping biologists take advantage of large-scale computing to analyze massive data sets for a range of biomedical projects at the IGB. Jongeneel also outlines how NCSA is supporting ongoing collaborations between the University of Illinois and the Mayo Clinic aimed at developing remote sensing devices and integrating medical records to facilitate innovative approaches to patient therapy.

RNA Play

By Matthew Dublin

In yet another example of how crowdsourcing and gaming can advance life sciences research, a group from Carnegie Mellon University and Stanford University has developed an online game to help non-experts uncover RNA design principles. EteRNA scores players based on how well their virtual designs can be rendered as real, physical molecules.

The goal of the project is to help investigators design RNA knots, polyhedra, and other shapes that have not been identified before. At the end of each week, researchers synthesize the top designs to determine whether the resulting molecules fold themselves into the three-dimensional shapes predicted by computer models.

"Putting a ball through a hoop or drawing a better poker hand is the way we're used to winning games, but in EteRNA you score when the molecule you've designed can assemble itself," says Adrien Treuille, an assistant professor of computer science at Carnegie Mellon and co-leader of the EteRNA project. "Nature provides the final score — and nature is one tough umpire."

Treuille also helped design Foldit, another online game in which players compete against each other to work out how proteins fold into their three-dimensional structures.

In December, researchers from McGill released a similar online game called Phylo, which challenges players to find and compare similar regions of genetic sequences from several genomes.

And for those of you who would like to let all of your friends know what a great RNA designer you are, EteRNA is also integrated with Facebook.

Here are some videos made by Treuille and his colleagues to better acquaint you with EteRNA:

What is the EteRNA game?

What have we learned from EteRNA?

How was EteRNA created?

UCSF's Institute for Human Genetics Chooses Dell

By Matthew Dublin

The Institute for Human Genetics (IHG) at the University of California, San Francisco (UCSF) has elected to install a Dell HPC system to undertake a genotyping project that aims to analyze variations in over 700,000 different SNPs. It is anticipated that this project will generate over seven petabytes of data.

After comparing solutions from both HP and Dell, Brad Dispensa, director of IT and information security at the IHG and the UCSF Center for Cerebrovascular Research, says they finally decided on sixteen Dell blade servers running CentOS Linux “partly because we like the way the fabric integration on the chassis works, and partly because the Integrated Dell Remote Access Controllers are included at no charge with the server hardware.”

While Dell's promotional "article" detailing Dispensa's choice is obviously a bit slanted, it does paint a picture of the process an IT manager must go through when preparing to deal with an onslaught of genomics data. It touches on considerations such as why the Dell system was the ideal choice for dealing with downtime and SAN failure, as well as for providing enough processing power per rack unit with cheap licensing fees.

HPC Power and Cooling Primer

By Matthew Dublin

Dell’s John Fragalla has a series of informative posts that walk you through some of the basic issues related to power and cooling in the data center. By now, everybody who runs a cluster or data center knows the mantra: more multicore chips and GPUs, more storage arrays, and more network switches result in higher system densities (more and more gear crammed into rack units), which in turn demand more power both to keep the system running and to keep it cool.

Fragalla suggests that the first step IT managers can take in grappling with these issues is to become intimately acquainted with their specific power and cooling requirements by measuring power consumption while running LINPACK, and to familiarize themselves with metrics such as Amps, Cubic Feet per Minute (CFM), Tons of Cooling, and British Thermal Units per hour (BTU/hr).
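
To show how those metrics relate, here is a minimal sketch (not taken from Fragalla's posts) that turns a measured wall-power figure into a heat load, tons of cooling, and current draw. It relies on standard conversions: 1 W is about 3.412 BTU/hr and 1 ton of cooling is 12,000 BTU/hr. The 10 kW reading and the 208 V supply are hypothetical, and the current calculation assumes a single-phase feed with a power factor of 1.

```python
WATTS_TO_BTU_PER_HR = 3.412      # 1 W is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12_000.0    # 1 ton of cooling = 12,000 BTU/hr


def heat_load_btu_per_hr(watts: float) -> float:
    """Heat load, assuming essentially all power drawn by IT gear becomes heat."""
    return watts * WATTS_TO_BTU_PER_HR


def cooling_tons(watts: float) -> float:
    """Tons of cooling needed to remove the heat produced by `watts` of load."""
    return heat_load_btu_per_hr(watts) / BTU_PER_HR_PER_TON


def current_amps(watts: float, volts: float) -> float:
    """Single-phase current draw, assuming a power factor of 1 (an assumption)."""
    return watts / volts


if __name__ == "__main__":
    measured_watts = 10_000.0  # hypothetical reading taken while LINPACK runs
    print(f"heat load: {heat_load_btu_per_hr(measured_watts):,.0f} BTU/hr")
    print(f"cooling:   {cooling_tons(measured_watts):.2f} tons")
    print(f"current:   {current_amps(measured_watts, 208.0):.1f} A at 208 V")
```

Measuring under LINPACK matters here because it approximates a worst-case load, so the cooling estimate errs on the safe side.
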

In part three of his series on power and cooling, he discusses rack Power Distribution Units (PDUs), which come in various configurations: single-phase or three-phase power, with ratings such as 30A to 60A per phase.
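
The difference between those PDU configurations comes down to simple arithmetic: single-phase capacity is volts times amps, while three-phase capacity is √3 times the line-to-line voltage times amps. The sketch below assumes a power factor of 1 (so VA equals watts) and a 208 V supply; the default 80% derating reflects a common North American practice for continuous loads and is an assumption, not something from Fragalla's post.

```python
import math


def pdu_capacity_watts(volts: float, amps: float, phases: int = 1,
                       derate: float = 0.8) -> float:
    """Usable power of a rack PDU circuit, assuming a power factor of 1.

    volts:  line voltage (single-phase) or line-to-line voltage (three-phase)
    amps:   breaker rating per phase
    derate: fraction of the rating used for continuous load
            (0.8 mirrors the common 80% practice; pass 1.0 to disable)
    """
    if phases == 1:
        volt_amps = volts * amps
    elif phases == 3:
        volt_amps = math.sqrt(3) * volts * amps
    else:
        raise ValueError("phases must be 1 or 3")
    return volt_amps * derate


if __name__ == "__main__":
    for amps in (30.0, 60.0):
        for phases in (1, 3):
            watts = pdu_capacity_watts(208.0, amps, phases)
            print(f"{phases}-phase, 208 V, {amps:.0f} A: {watts:,.0f} W usable")
```

At the same voltage and amperage, a three-phase PDU delivers about 1.73 times the power of a single-phase one, which is why dense HPC racks often use it.
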

In his next post, Fragalla will introduce readers to Cubic Feet per Minute (CFM), a measure of the airflow that an HPC system's component fans produce.
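
Ahead of that post, a common rule of thumb links airflow to heat load: for air near sea-level conditions, BTU/hr ≈ 1.08 × CFM × ΔT, where ΔT is the inlet-to-outlet temperature rise in °F. The sketch below solves that for CFM. The 1.08 constant and the example figures are assumptions for illustration, not values from Fragalla's series.

```python
WATTS_TO_BTU_PER_HR = 3.412  # 1 W is about 3.412 BTU/hr


def required_cfm(watts: float, delta_t_f: float) -> float:
    """Airflow (CFM) needed to carry away `watts` of heat.

    Uses the sensible-heat rule of thumb BTU/hr = 1.08 * CFM * delta_T,
    with delta_t_f the inlet-to-outlet air temperature rise in deg F.
    """
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    return watts * WATTS_TO_BTU_PER_HR / (1.08 * delta_t_f)


if __name__ == "__main__":
    # Hypothetical rack: 10 kW of load with a 20 deg F rise across the servers.
    print(f"{required_cfm(10_000.0, 20.0):,.0f} CFM")
```

Note the inverse relationship: halving the allowed temperature rise doubles the airflow the fans must move.
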

Download Free High-Performance Scientific Computing Book

By Matthew Dublin

Finally, all of your questions about how large-scale scientific computing actually works, and how to take better advantage of your local cluster or supercomputer, are answered in a freely available book, Introduction to High Performance Scientific Computing, by Victor Eijkhout, a researcher at the Texas Advanced Computing Center.

In the preface to the book, Eijkhout writes that "the need for a book such as the present was especially apparent at the Texas Advanced Computing Center: users of the facilities there often turn out to miss crucial parts of the background that would make them efficient computational scientists. This book, then, comprises those topics that seem indispensible for scientists engaging in large-scale computations."

So get cozy with this 337-page page-turner, or stow it away and wait until the summer for an edifying beach read.

CPU and GPU, meet APU

By Matthew Dublin

Just when you thought you were safe from having to wrap your head around yet another new type of hardware, AMD announces the Accelerated Processing Unit, or APU, a processing architecture that aims to eliminate the interconnect bottleneck between CPUs and GPUs.

The APU is allegedly an attempt to help the field of heterogeneous computing (which usually means FPGA co-processors coupled with standard CPUs) reach its true potential. Heterogeneous computing has gained momentum over the last two or three years due to the widespread popularity of general-purpose computing on GPUs, which everyone wants to integrate into their clusters or workstations so they can be part of all the excitement...

In the video below, AMD explains how this new technology, which looks to be integrated initially into workstations and desktop computers, works and why it is an idea worth marketing. Like every great new hardware idea, this looks great on paper (on a white paper, that is) and in an animated promotional video with futuristic-sounding music, but only the programming gods know what hell may await those tasked with actually getting some use out of this new gizmo.

Online HPC Toolbox

By Matthew Dublin

There's finally a quality-controlled go-to site for HPC applications that includes everything from open source code and systems analysis tools to benchmarks, manuals, and documentation. According to its website, HPCtool.org is a community resource that, rather than hosting orphaned or incomplete software as most software repositories tend to do, aims to offer constantly updated code through expert development, monitoring, and testing.

The folks behind HPCtool.org include many HPC veterans who have worked on TOP500 systems, including some of the first systems to break the teraflop and, later, petaflop barriers. The tools on the site are available to everyone and could be used to kick the tires on, and maintain, a Beowulf cluster, supercomputer, or large-scale data center.

Keep an eye out for the February issue of Genome Technology, where I will outline a slew of new and hopefully useful tools for the data center and cluster, many of which were introduced to the community for the first time at SC10 in New Orleans this past November.

NCSA Deploys TeraChem on Hybrid Cluster

By Matthew Dublin

The National Center for Supercomputing Applications has ported TeraChem 1.41, a GPU-accelerated ab initio quantum chemistry package, to run on the center's hybrid CPU/GPU supercomputer, Lincoln.

"Before, users were on their own to port their software to graphics-processing units," said Thom Dunning, the center's director. "By providing TeraChem on Lincoln, we've made it much easier for chemists to harness the power of GPUs to accelerate their calculations and improve their productivity."

TeraChem was initially developed by Stanford University's Todd Martinez; NCSA's Alexey Titov has since implemented d-functions for the energy calculation within the Hartree-Fock and DFT methods, which will be available in the next production version of TeraChem.

Lincoln comprises both Nvidia Tesla S1070 accelerator units and Dell PowerEdge 1950 servers.

No mention is made of what porting TeraChem to a hybrid cluster actually involved, what snags the team ran into along the way, or whether it was fairly trivial, but employing GPUs as accelerators in this capacity certainly looks promising.

According to Nvidia, a workstation with 4 GPUs running TeraChem outperforms 256 quad-core CPUs running GAMESS.

Cancer Research UK/Intel Video on HPC and Cancer Genomics

By Matthew Dublin

Cancer Research UK at the Cambridge Research Institute has produced a video in collaboration with Intel that takes a look at the computational challenges facing cancer genomics research and the role high-performance computing plays in helping to grapple with storage and analysis. Stefan Graf, a computational biologist focused on breast cancer, and James Hadfield, head of genomics, discuss how the key to cancer genomics is what's under the hood of your local HPC resources.

Peter MacCallum, head of IT and scientific computing, outlines his approach to providing computational services to more than 20 research groups and 10 core facilities that use a slew of different software and analysis techniques, and the challenge of providing all of those services under a single infrastructure. MacCallum also offers a nice walk-through of the institute's server room, which houses a 512-core system built on Intel Xeon dual quad-core processors with 100 TB of storage, and describes plans to expand the database and analysis clusters to meet the growing demands of cancer genomics.

Supercomputer Creates Alzheimer's Protein Interaction Network

By Matthew Dublin

A group of researchers from IRB Barcelona and the Joint Programme IRB-BSC have utilized the Barcelona Supercomputing Center's MareNostrum supercomputer to discover new molecular mechanisms that may be involved in the development of Alzheimer's disease.

Instead of studying individual proteins, the scientists used the power of MareNostrum to analyze thousands of possible interactions between proteins thought to be involved in the disease and identified 200 new interactions. These bring the total number of known Alzheimer's-related interactions to 6,000, making it the largest network of interactions between proteins related to Alzheimer's disease.

The study was led by IRB Barcelona group leader and ICREA researcher Patrick Aloy and was published today as "Interactome mapping suggests new mechanistic details underlying Alzheimer's disease" in Genome Research.

The 94-teraflop MareNostrum is housed in the deconsecrated Torre Girona chapel at the Polytechnic University of Catalonia in Barcelona, Spain.