ORNL Researchers Identify Molecular "Fingerprints"

By Matthew Dublin

A research team at the Department of Energy's Oak Ridge National Laboratory (ORNL) has developed a theoretical technique, called dynamical fingerprints, that brings together simulations and experimental results. "Experiments tend to produce relatively simple and smooth-looking signals, as they only 'see' a molecule's motions at low resolution," says Jeremy Smith, the director of ORNL's Center for Molecular Biophysics. "In contrast, data from a supercomputer simulation are complex and difficult to analyze, as the atoms move around in the simulation in a multitude of jumps, wiggles and jiggles. How to reconcile these different views of the same phenomenon has been a long-standing problem."

Smith and his colleagues will publish a paper describing their fingerprint method in the Proceedings of the National Academy of Sciences. The method works by reconciling the different signals from experiments and computer simulations to solidify analyses of molecules in motion. Using ORNL’s supercomputers, the researchers will apply this approach to integrate simulation and experimental datasets so that they can delve into drug development and fundamental biological processes and gain a better understanding of molecular movement and interactions.

Combining the power of simulations and experiments will help researchers tackle scientific challenges in areas like biofuels, drug development, materials design and fundamental biological processes, which require a thorough understanding of how molecules move and interact.

UF Touts Fastest Reconfigurable Supercomputer

By Matthew Dublin

The University of Florida’s Novo-G is being touted by its designers as the world's fastest reconfigurable supercomputer. Novo-G is the brainchild of Alan George, a professor of electrical and computer engineering at the University of Florida and director of the National Science Foundation’s Center for High-Performance Reconfigurable Computing (CHREC). George and his colleagues are claiming that Novo-G is capable of performing some important science applications faster than China's Tianhe-1A system, currently the world's fastest supercomputer according to the Top500 List.

The reconfigurable computing paradigm is based on the concept that computer architectures should be tailor-made to address the needs of each application or project. While the hardware and software development for this approach is usually more expensive and requires more expertise than, say, setting up a standard cluster of CPUs, reconfigurable systems consistently outperform CPUs and GPUs.

Novo-G comprises 192 reconfigurable processors, is about the size of two home refrigerators, and consumes less than 8,000 watts. In a recent article in IEEE Computing in Science and Engineering magazine, CHREC researchers describe the architecture of Novo-G and its performance in scenarios including genome research, cancer diagnosis, plant science, and the analysis of large data sets.

Aachen University Upgrades Supercomputer

By Matthew Dublin

The North Rhine-Westphalia Technical University (RWTH) in Aachen, Germany, has chosen a bullx supercomputer to help accommodate researchers in the Engineering and Life Sciences faculties. The new system comprises over 28,000 processing cores and will deliver some 300 teraflops of computing power, along with three petabytes of disk storage.

The University's Center for Computing and Communication will be collaborating with Bull to optimize standard HPC applications, such as OpenFOAM, for hybrid cluster architectures that use numerous multiprocessor systems with large memory capacity together with a high-performance network. This architecture makes the most of the benefits offered by both Message Passing Interface (MPI) standards and the significant memory available through OpenMP, an application programming interface that researchers at the Aachen Center for Computing have used for quite some time.

Here's a system performance breakdown:

The overall power of the system -- at 300 teraflops (300 teraflops = 300 x 10^12 floating-point operations a second) -- is virtually the same as 10,000 of the latest desktop PCs.

Light travels at 30 centimeters a nanosecond. In the same timespan, the RWTH supercomputer will complete 300,000 operations.

The system can write up to 19 GB of data a second to the attached storage system, filling the equivalent of four DVDs every second.

The disk storage system has a total capacity of three petabytes, or 3,000 terabytes (3 x 10^15 bytes). It would take an MP3 player 6,000 years of continuous operation to play the equivalent amount of data.

Measured against the most recently published TOP500 list, its processing power would rank the new computer among the 30 most powerful supercomputers in the world.
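
The comparisons in the breakdown above can be re-derived with a few lines of arithmetic. The Python sketch below does so; the DVD capacity (4.7 GB, single layer), the MP3 bit rate (128 kbit/s) and the 365.25-day year are assumptions, since the article does not say which values were used.

```python
# Back-of-the-envelope check of the RWTH figures quoted above.
# From the article: 300 teraflops, 19 GB/s write rate, 3 PB of disk.
FLOPS = 300e12
WRITE_BYTES_PER_S = 19e9
STORAGE_BYTES = 3e15

# Assumptions (not stated in the article).
DVD_BYTES = 4.7e9                      # single-layer DVD
MP3_BITS_PER_S = 128e3                 # typical MP3 bit rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Operations completed in one nanosecond (the time light needs for 30 cm).
ops_per_nanosecond = FLOPS * 1e-9

# Implied speed of one "latest desktop PC" if 10,000 of them equal the system.
flops_per_desktop = FLOPS / 10_000

# DVDs filled per second at the peak write rate.
dvds_per_second = WRITE_BYTES_PER_S / DVD_BYTES

# Years of continuous MP3 playback needed to stream the full disk capacity.
mp3_years = STORAGE_BYTES / (MP3_BITS_PER_S / 8) / SECONDS_PER_YEAR

print(f"operations per nanosecond: {ops_per_nanosecond:,.0f}")
print(f"implied desktop speed:     {flops_per_desktop / 1e9:.0f} gigaflops")
print(f"DVDs filled per second:    {dvds_per_second:.2f}")
print(f"MP3 playback time:         {mp3_years:,.0f} years")
```

Under these assumptions the results land close to the article's rounded figures of 300,000 operations, four DVDs and 6,000 years.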

GPUs Accelerate Protein Interaction Networks Analysis

By Matthew Dublin

Two researchers, Jun Sung Yoon of the Allegro Viva Corporation in California and Won-Hyong Chung of the Korea Research Institute of Bioscience & Biotechnology in Daejeon, Korea, have presented a paper describing an attempt to exploit GPUs for biological network analysis. In the paper, entitled “A GPU-accelerated bioinformatics application for large-scale protein networks,” which was presented at last month’s Asia Pacific Bioinformatics Conference in Incheon, Korea, the authors describe a new parallel implementation of MCODE, a well-known sequential complex-finding algorithm, using commodity graphics hardware. The approach was motivated by two architectural limitations of MCODE as a plugin for the open-source Cytoscape visualization and analysis platform. The first is serial computation, which forces users to wait a long time for the analysis of large interaction networks to complete. The second is that standalone systems often lack sufficient computing power, so researchers usually have to upgrade their hardware. Using Nvidia GPU hardware, the authors achieved a speedup of two orders of magnitude over the original MCODE running on the latest CPU for large-scale protein-protein interaction networks.

Virginia Tech's GPU Testbed Cluster

By Matthew Dublin

Kevin Shinpaugh, director of high-performance computing at Virginia Tech, has posted a video presentation on some of the university's Advanced Research Computing projects.

In the video, Shinpaugh discusses the Athena GPU test bed cluster, which provides visualization and simulation capabilities for data-intensive applications. Athena supports projects headed up by researchers at the Virginia Bioinformatics Institute, such as EpiSimdemics, a computational epidemiology application that combines a Google Maps-like interface with a highly scalable, parallel algorithm to simulate the spread of contagion in large, realistic social contact networks using individual-based models.

Virginia Tech - GPU Computing for Computational Sciences and Engineering from Appro International on Vimeo.

Best Practices for NAMD

By Matthew Dublin

The HPC Advisory Council, an industry consortium of high-performance computing hardware and software vendors, has published a new best practices presentation entitled "NAMD Performance Benchmark and Profiling" for the 12-core AMD Opteron 6174 (Magny-Cours) processors running at 2.2 GHz. NAMD is a widely used parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. The Theoretical and Computational Biophysics Group (TCB) and the Parallel Programming Laboratory (PPL) at the University of Illinois at Urbana-Champaign developed NAMD in collaboration. The code is freely available, is based on Charm++ parallel objects, and scales to hundreds of processors on high-end parallel platforms and tens of processors on commodity clusters using gigabit Ethernet. As of March 2010, the first NAMD paper, entitled “Scalable molecular dynamics with NAMD” and published in the Journal of Computational Chemistry, had over 1,000 citations. The best practices presentation walks readers through running NAMD on various cluster components and provides data on performance capabilities.

Argonne Selects IBM for Next Supercomputer

By Matthew Dublin

Argonne National Laboratory has announced that its new supercomputer, “Mira,” will be based on IBM’s Blue Gene/Q supercomputer technology. At 10 petaflops, the new supercomputer will be 20 times faster than Argonne’s current supercomputer, Intrepid, and twice as fast as the current fastest supercomputer in the world. Mira is slated to become operational in 2012 and will enable research in a range of fields, including life sciences.

"Computation and supercomputing are critical to solving some of our greatest scientific challenges, like advancing clean energy and understanding the Earth's climate," said Rick Stevens, associate laboratory director for computing, environment and life sciences at Argonne National Laboratory, in an IBM press announcement. "Argonne's new IBM supercomputer will help address the critical demand for complex modeling and simulation capabilities, which are essential to improving our economic prosperity and global competitiveness."

Argonne researchers are characterizing Mira as a stepping stone to the world of exascale computing, in that users will be required to scale their code to run on upwards of 750,000 individual processing cores. Exascale-class supercomputers, if the concept ever comes to fruition, will require programmers to code for hundreds of millions of cores.

Blocks of time on Mira will be awarded to investigators through the Department of Energy’s Innovative and Novel Computational Impact on Theory and Experiment and the Advanced Scientific Computing Research Leadership Computing Challenge programs.

Before a new supercomputer is rolled out, Argonne “shakes down” the system with the help of a small community of users running production applications, known as the Early Science projects. The process began in October of last year with a workshop on Mira’s architecture, configuration, and software environment, at which project teams kicked off intensive efforts to adapt their software to take advantage of the Blue Gene/Q architecture. These projects span a range of scientific computing applications, including molecular dynamics and computational chemistry experiments.

Spain's CNIO Tests Out Cloud

By Matthew Dublin

The Bioinformatics Unit of the Structural Biology and Biocomputing Program at the Spanish National Cancer Research Center (CNIO) has teamed up with The Server Labs, an IT design firm, to implement a cloud computing solution to deal with genomic data. Working within requirements that included the need for a 64-bit hardware architecture, CNIO researchers and The Server Labs set up a proof-of-concept system that included a base image with the Ubuntu Linux operating system on an Amazon EC2 large instance with 7.5 GB of memory, 4 EC2 Compute Units (two virtual cores with 2 EC2 Compute Units each) and 850 GB of local instance storage.

Using the cloud-management platform RightScale, they were able to separate the selection of instance type and base image from the installation and configuration of software specific to genomic processing. Using their new cloud, CNIO researchers were able to process 20 to 25 sequencing runs from a sequencer. They expect to analyze about 150 sequencing lanes per year, each generating 30 gigabytes of input data on average, for a total of 3 to 4.5 terabytes in storage and processing requirements.

While they found that processing times in the cloud were comparable to running the same workflow in-house on similar hardware, data transfer to the cloud, probably one of the biggest barriers for cloud computing, was a significant bottleneck. CNIO’s IT staff worked around the bottleneck by processing data in Amazon’s European data center and avoiding uploads during peak usage hours.
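
The storage estimate and the transfer bottleneck described above can be put into rough numbers. The Python sketch below is only an illustration: the function names are hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and the 100 Mbit/s sustained uplink is an assumed figure, not one reported by CNIO.

```python
GB_PER_TB = 1000  # decimal units; an assumption


def annual_input_tb(lanes_per_year, gb_per_lane):
    """Total yearly input data in terabytes."""
    return lanes_per_year * gb_per_lane / GB_PER_TB


def upload_hours(gigabytes, uplink_mbit_per_s):
    """Hours needed to upload `gigabytes` at a sustained uplink rate."""
    bits = gigabytes * 1e9 * 8
    seconds = bits / (uplink_mbit_per_s * 1e6)
    return seconds / 3600


# Figures from the article: about 150 lanes a year, 30 GB per lane.
yearly_tb = annual_input_tb(150, 30)

# Hypothetical sustained uplink of 100 Mbit/s (assumption).
per_lane_hours = upload_hours(30, 100)
per_year_hours = upload_hours(150 * 30, 100)

print(f"yearly input data: {yearly_tb} TB")
print(f"upload per lane:   {per_lane_hours:.2f} h")
print(f"upload per year:   {per_year_hours:.0f} h")
```

At 150 lanes the yearly total matches the upper end of the quoted 3 to 4.5 terabyte range; the lower end would correspond to about 100 lanes at the same per-lane size.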

Another Smith-Waterman Record Broken

By Matthew Dublin

DRC Computer Corporation (DRC), a reconfigurable computing hardware vendor, has achieved 9.4 trillion cell updates per second running the Smith-Waterman algorithm with an affine gap model on the company's latest coprocessors. DRC claims its new Smith-Waterman implementation is five times better in price-performance than any other published result. The benchmark was achieved by running 200 base-pair DNA reads against a 650,000,000-nucleotide database on a clustered server operating as a cloud computing environment, using the SSEARCH35 tool within the FASTA genomics tool kit.
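
For readers unfamiliar with the algorithm, the Python sketch below shows a textbook Smith-Waterman local alignment score with an affine gap model (the Gotoh recurrences). It is not DRC's or SSEARCH's implementation, and the scoring parameters are illustrative assumptions. The function also counts how many matrix cells it fills, which is the quantity that "cell updates per second" measures.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return (best local alignment score, number of cells updated).

    Affine gap model: a gap of length k costs gap_open + (k - 1) * gap_extend.
    All scoring parameters are illustrative assumptions.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols          # H values of the previous row
    f_prev = [neg_inf] * cols    # vertical-gap state of the previous row
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf              # horizontal-gap state, reset for each row
        for j in range(1, cols):
            # Open a new gap from H, or extend an existing gap.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            diag = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
            cells += 1
        h_prev, f_prev = h_cur, f_cur
    return best, cells


score, cells = smith_waterman_affine("ACGTACGT", "ACGTTTACGT")
print(score, cells)
```

Every pair of positions in the two sequences costs one cell update, so the work grows with the product of the sequence lengths. That quadratic cost is why the algorithm is a popular target for hardware acceleration.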

Smith-Waterman has always been regarded by the reconfigurable computing community and FPGA vendors as one of the best bioinformatics algorithms to port to an FPGA. In the last few months alone, several Smith-Waterman implementations have been released, each touting more impressive benchmarks than its predecessor. Among them is SGI and Pervasive Software's Smith-Waterman implementation, with benchmarks that the companies claimed were 43 percent faster than any other implementation at the time. Their version used a Java framework to analyze 10 million protein sequences in 81.1 seconds across a 384-core SGI Altix UV 1000 for a sustained 986 billion cell updates per second.
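
To put the two benchmark figures above side by side, the Python sketch below works through what the quoted rates imply. It assumes the usual definition of a cell update (one query position compared against one database position). The two runs used different workloads (DNA reads versus protein sequences), so the ratio is a rough indication only, not a like-for-like comparison.

```python
# Figures quoted in the article.
DRC_CUPS = 9.4e12            # DRC: cell updates per second
READ_LENGTH = 200            # DRC: base pairs per DNA read
DATABASE_LENGTH = 650_000_000  # DRC: nucleotides in the database
SGI_CUPS = 986e9             # SGI/Pervasive: sustained cell updates per second
SGI_SECONDS = 81.1           # SGI/Pervasive: run time

# One read compared against the whole database fills this many cells.
cells_per_read = READ_LENGTH * DATABASE_LENGTH

# Reads the DRC system could align per second at the quoted rate.
reads_per_second = DRC_CUPS / cells_per_read

# Total cells implied by the SGI/Pervasive rate and run time.
sgi_total_cells = SGI_CUPS * SGI_SECONDS

# Raw ratio of the two quoted rates (different workloads, so indicative only).
rate_ratio = DRC_CUPS / SGI_CUPS

print(f"cells per read:      {cells_per_read:.2e}")
print(f"reads per second:    {reads_per_second:.1f}")
print(f"SGI total cells:     {sgi_total_cells:.2e}")
print(f"DRC/SGI rate ratio:  {rate_ratio:.1f}")
```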

The University of South Carolina's Heterogeneous and Reconfigurable Computing Group is one academic effort aimed at exploring reconfigurable computing for various bioinformatics tasks. So far they have adopted Convey's hybrid-FPGA solutions to explore the acceleration of phylogenetic inference methods.

LIBR Overhauls Networking Infrastructure

By Matthew Dublin

The Laureate Institute for Brain Research (LIBR) in Tulsa, Oklahoma, has recently installed a new networking infrastructure to ramp up its ability to correlate clinical content with imaging and genetic data of the human brain.

LIBR is a private, non-profit organization with approximately 100 computer scientists, physicists, neuroscientists, medical doctors, mathematicians and chemists who rely on high-performance computing to elucidate root causes of mood and eating disorders.

“There are two main factors to our success: network performance and reliability," said Alex Barclay, IT director at the Laureate Institute for Brain Research. "Because we are processing high volumes of complex research, we cannot afford delays in intensive brain research modeling. Any type of network congestion or irregularity can translate to significant setbacks in medical breakthroughs."

Barclay selected a 10 Gigabit Ethernet (GbE) system from Brocade after carefully vetting solutions from a number of vendors on overall network performance, security (a crucial component when handling patient data) and operational costs. The Brocade system is capable of reliably processing critical data at wire speed with a 200-terabit network-attached storage infrastructure.

TeraGrid Continues to Grow

By Matthew Dublin

The TeraGrid has a post describing the process its Resource Allocation Committee (TRAC) goes through when awarding time on the TeraGrid network for large-scale scientific computing projects. TRAC awarded a total of 1.75 billion service units, the equivalent of 200,000 years on a single processor, to roughly 1,300 applicants in 2010 for a range of disciplines, from clinical investigations of broken bones to drug design.
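
The 200,000-year figure follows from simple unit conversion. The Python sketch below assumes that one service unit corresponds to one processor hour and that a year has 365 days; both are assumptions, as the article does not define the unit. The per-applicant average is a derived number, not one reported by TeraGrid.

```python
HOURS_PER_YEAR = 24 * 365   # non-leap year; an assumption

TOTAL_SERVICE_UNITS = 1.75e9   # from the article
APPLICANTS = 1300              # "roughly 1,300", from the article


def processor_years(processor_hours):
    """Convert processor hours to years of one processor running nonstop."""
    return processor_hours / HOURS_PER_YEAR


# Assumes 1 service unit = 1 processor hour.
single_processor_years = processor_years(TOTAL_SERVICE_UNITS)
average_award = TOTAL_SERVICE_UNITS / APPLICANTS

print(f"single-processor equivalent: {single_processor_years:,.0f} years")
print(f"average award per applicant: {average_award:,.0f} service units")
```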

This year three new supercomputing systems will be added to the TeraGrid network:

Lonestar 4, a 302 teraflops Dell cluster that will go into production at the Texas Advanced Computing Center in February.

Trestles, a 100 teraflops system at the San Diego Supercomputer Center that is expected to come online in January.

Blacklight, at the Pittsburgh Supercomputing Center, an Altix UV1000 system that came online in October.

In addition, the Kraken supercomputer, housed at the National Institute for Computational Sciences, was recently upgraded to achieve a peak performance of 1,174 teraflops. In total, the three new supercomputing systems plus the upgrade will offer researchers an additional 350 million hours of compute time.

Rommie Amaro, an assistant professor of pharmaceutical sciences at the University of California, Irvine, is running large-scale simulations of biomolecular systems on TeraGrid in an effort to identify novel druggable target sites.

"We've been successful in identifying new lead compounds for several infectious diseases, including African sleeping sickness and influenza, as well as cancer," Amaro says. "TeraGrid machines enable us to perform these large-scale simulations easily and efficiently, helping us drive discovery in collaboration with experimental labs."

Researchers Use Supercomputer to Explore Transcription Factor Proteins

By Matthew Dublin

The Texas Advanced Computing Center (TACC) has a post describing how a group led by Vishy Iyer, an associate professor in the Institute for Cellular and Molecular Biology at The University of Texas at Austin, took advantage of TACC’s Ranger supercomputer to explore the role of transcription factor proteins in gene regulation.

Last year the group published research in Science entitled “Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans,” which is one of the first studies to use next-gen sequencing and high-performance computing to explore the expression of genes related to a specific regulatory transcription factor (called CTCF) and the role of heredity in the transcription binding process.

Iyer and his colleagues used the thousands of processing cores on Ranger to align short sequence reads generated by ChIP-Seq to a reference genome. According to TACC, the project used more than 175,000 “processor hours,” or the equivalent of 20 years on a single processor.
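
To illustrate why spreading the work across many cores matters, the Python sketch below converts the quoted processor hours into single-processor years and into idealized wall-clock time. The core counts are hypothetical, and perfect parallel scaling is assumed, which real alignment jobs rarely achieve.

```python
HOURS_PER_YEAR = 24 * 365      # non-leap year; an assumption
PROCESSOR_HOURS = 175_000      # from the article


def ideal_wall_clock_hours(processor_hours, cores):
    """Wall-clock hours if the work divides perfectly across `cores`."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return processor_hours / cores


single_processor_years = PROCESSOR_HOURS / HOURS_PER_YEAR
print(f"single processor: {single_processor_years:.1f} years")

# Hypothetical core counts; the article says only "thousands" of cores.
for cores in (1024, 4096):
    hours = ideal_wall_clock_hours(PROCESSOR_HOURS, cores)
    print(f"{cores} cores: about {hours / 24:.1f} days (ideal scaling)")
```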

GNS Healthcare Assists NCI Drug Research With Supercomputer Models

By Matthew Dublin

GNS Healthcare has entered into a subcontract with SAIC-Frederick to support an effort with its supercomputing resources to analyze National Cancer Institute data generated from the application of several cancer drugs to the NCI-60 cell line panel.

GNS will use its supercomputer-driven “Reverse Engineering and Forward Simulation” technology platform in support of SAIC-Frederick's contract with NCI to construct models in a hypothesis-free, unbiased manner for the purpose of identifying key genetic and molecular mechanisms involved in a drug’s effectiveness or lack thereof. Some of the drugs that will be evaluated include doxorubicin X2, bortezomib, paclitaxel, dasatinib, sunitinib, and rapamycin.

According to GNS, the initial phase of this effort will involve machine-learning algorithms and massively parallel computers that will probe the data with the algorithms to uncover which genes and proteins drive drug efficacy and cancer biology. GNS will then build versions of the computer models that may be made available to cancer scientists for their own research through a Web interface.

"GNS is excited to be undertaking this radically new approach to unraveling mechanisms of cancer drugs that is complementary to the expert-driven, but biased, approaches that have been the standard in cancer research for decades," says Iya Khalil, co-founder of GNS.