Apple's New iCloud

By Matthew Dublin

Well, it was bound to happen. Apple has now gotten in on the cloud computing mix with something they're calling the "iCloud" — what else would it be called?

While it's too soon to tell what the iCloud will consist of exactly, Apple's "upcoming cloud service offering" initially looks like it will be geared towards iTunes users and entertainment consumers. In a very uncharacteristic move, Apple let the news slip out earlier this week with a "pre-announcement" related to their upcoming Worldwide Developers Conference that left a lot of people speculating about what's to come.

There have been some unverified reports that iCloud will integrate with iTunes to mirror media files stored on a user's hard drive. The key word here is "mirrored," not "uploaded," which in theory means that bandwidth isn't an issue when it comes to data replication and access.

What does this have to do with genomics research? Well, if one of the main concerns with cloud computing in genomics is bandwidth and latency when dealing with analytics data, maybe there is something bioinformatics developers can learn from Apple's mirroring approach.

And while it's doubtful that scientific cloud computing customers even factor into Apple's iCloud plans right now, the fact that the company has just completed construction of a $1B, 500,000-square-foot data center somewhere in North Carolina might mean that it ultimately has designs on competing with Google and Amazon cloud services. With more providers vying for market share, prices should go down, which is good for researchers. In addition, more cloud computing technology development may result as Apple and its large developer community turn their attention to the cloud.

Cray Releases GPU-Enabled Supercomputer

By Matthew Dublin

Earlier this week at a user group meeting in Fairbanks, Alaska, Cray announced that they have officially joined the "hybrid supercomputing" bandwagon with the rollout of their Cray XK6 supercomputer. The new Cray XK6 combines their patented Gemini interconnect (their not-so-secret secret sauce) with high-performance AMD CPUs and Nvidia Tesla GPUs. If ordered with all the bells and whistles, the XK6 is capable of more than 50 petaflops of computing power.

The Swiss National Supercomputing Centre in Manno, Switzerland, became the first XK6 customer this week, upgrading from their Cray XE6 system called "Piz Palu." Researchers at CSCS have used Cray systems for quite some time for a range of disciplines, including biology, genetics, and experimental medicine.

According to CSCS director Thomas Schulthess, "Given the remarkable interest in GPU technology from the Swiss computational science community, it is essential that CSCS adopt this technology into its high-end production systems soon. However, we are not looking for another GPU based stunt to place high on any Top500 lists. The Cray XK6 promises to be the first general-purpose supercomputer based on GPU technology, and we are very much looking forward to exploring its performance and productivity on real applications relevant to our scientists."

Part of the pitch for the new systems is that Cray XT4, Cray XT5, Cray XT6 or Cray XE6 systems are easily upgradeable to the Cray XK6 system, which is expected to be available in the second half of 2011. It can also be configured in a single cabinet or a multi-cabinet system with tens of thousands of compute nodes.

IBM Aims to Build a Facebook for Researchers

By Matthew Dublin

Big Blue has announced a new effort to help researchers secure funding, identify collaborators across the globe, and locate the most recent findings in their respective fields using social networking. The effort, which is headed up by IBM and the University of Rhode Island's College of Pharmacy, aims to apply IBM's data analytics technology, including the IBM Content Analytics software, cloud computing, and social networking, to develop a Facebook-like solution that can help institutions manage projects within budget and achieve their objectives.

Under the collaboration, the IBM and URI teams are working to apply IBM's data analytics technologies, cloud computing, and a patented IBM technology to develop a social media application that will give researchers individual profile pages that can connect to the profiles of other researchers with similar interests. The platform will "crawl" across the network to recommend potential collaborators and provide data on grants, journal publications, and other information.

"This IBM-URI College of Pharmacy project holds promise for accelerating the process of locating research support opportunities, forming winning research teams and efficiently collaborating in the creation of research funding proposals," said Ronald Jordan, dean of URI's College of Pharmacy, in a statement. "The rate of change we are experiencing in scientific discovery and processes, which support academic research, is exponential. This technology gives us the opportunity to not only keep pace, but potentially further advance it, to the advantage of the University and our state."

NCSA Rolls Out New GPU-CPU Cluster

By Matthew Dublin

The National Center for Supercomputing Applications announced last week the roll-out of a new 153-teraflop supercomputer dubbed Forge. The new system has a "hybrid" architecture that combines GPUs and CPUs, and is set to replace NCSA's previous hybrid system, called Lincoln.

Forge will combine 18 Dell blades that contain 36 nodes of dual-socket/eight-core AMD processors and Nvidia GPUs (eight Fermi GPU units per node, for a total of 288), and will use an InfiniBand interconnect fabric. The system will have 700 terabytes of file system space with an I/O bandwidth its designers hope will surpass 16 GB per second at full operational performance.

The new cluster will be housed in a cutting-edge 20,000-square-foot machine room at the University of Illinois' National Petascale Computing Facility.

Forge's predecessor, Lincoln, was NCSA's first foray into large-scale hybrid-core computing, and its track record so far has validated the concept of hybrid-core cluster computing for life sciences research. Klaus Schulten's research team at the University of Illinois at Urbana-Champaign used Lincoln to run NAMD to study the organization and function of proteins and protein complexes within cells. Schulten's team found that two of Lincoln's GPUs were equivalent to 24 of Lincoln's CPU cores, while eight of its GPUs were equivalent to 96 CPU cores. And the University of Utah's Thomas Cheatham, who is studying how proteins behave in solutions and how drugs interact with them, was able to accelerate AMBER on the system, achieving a 15-fold speedup per node.
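The hardware figures quoted above are easy to sanity-check. The short Python sketch below uses only the numbers reported in this post; the variable and function names are illustrative, not anything from NCSA. It shows that Schulten's two data points work out to the same cores-per-GPU ratio, which suggests NAMD scaled roughly linearly from two to eight GPUs on Lincoln.

```python
# Figures quoted in this post for NAMD on NCSA's Lincoln cluster.
# Each pair is (GPUs used, CPU cores reported as equivalent).
lincoln_equivalences = [(2, 24), (8, 96)]


def cores_per_gpu(gpus: int, cpu_cores: int) -> float:
    """Return how many CPU cores one GPU was reported to replace."""
    return cpu_cores / gpus


# The same ratio at both sizes implies roughly linear scaling.
ratios = [cores_per_gpu(g, c) for g, c in lincoln_equivalences]

# Forge's GPU count as described in this post: 36 nodes, 8 Fermi GPUs each.
forge_nodes = 36
forge_gpus_per_node = 8
forge_total_gpus = forge_nodes * forge_gpus_per_node

if __name__ == "__main__":
    print("CPU cores per GPU on Lincoln:", ratios)
    print("Total GPUs in Forge:", forge_total_gpus)
```

Note that the Lincoln ratio should not be projected onto Forge, since the two systems use different GPU generations.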

As of July 1st, the new system will be allocated through the National Science Foundation Translational Research in the Academic Community process.

New GPUs Set AMBER Performance Record

By Matthew Dublin

The fastest acceleration to date of AMBER 11, a molecular modeling software package used to simulate the behavior of biomolecules, has been reported by researchers at the San Diego Supercomputer Center. Ross Walker, a principal contributor to the AMBER code and professor at SDSC, used four Nvidia Tesla M2090 GPUs together with four standard CPUs to achieve a record performance of 69 nanoseconds of simulation per day, compared to the previous supercomputer record of 46 nanoseconds of simulation per day.
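To put those two throughput numbers in perspective, here is a minimal Python sketch that computes the relative speedup and the wall-clock time each rate would need for a given trajectory length. The two rates come from this post; the 1,000-nanosecond target is a hypothetical example chosen for illustration, and the function names are made up.

```python
def speedup(new_ns_per_day: float, old_ns_per_day: float) -> float:
    """Ratio of the new simulation throughput to the old one."""
    if old_ns_per_day <= 0:
        raise ValueError("old_ns_per_day must be positive")
    return new_ns_per_day / old_ns_per_day


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns at a given rate."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    return target_ns / ns_per_day


gpu_rate = 69.0         # ns/day, the new GPU record reported above
previous_record = 46.0  # ns/day, the earlier supercomputer record
target_ns = 1000.0      # hypothetical 1-microsecond trajectory

if __name__ == "__main__":
    print(f"Speedup: {speedup(gpu_rate, previous_record):.2f}x")
    print(f"GPU run: {days_to_simulate(target_ns, gpu_rate):.2f} days")
    print(f"Previous record: {days_to_simulate(target_ns, previous_record):.2f} days")
```

In other words, the new record is a 1.5-fold improvement, which for the hypothetical microsecond-long run would cut about a week off a roughly three-week job.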

These new GPU chips will be made available in servers such as the new HP ProLiant SL390 G7 4U server, which is built to cater to computing environments that require both CPUs and GPUs.

In other GPU news, Nortech has installed an 88,000-core GPU cluster in the "world's greenest data center" at Syracuse University. The cluster, which is being used for physics research and data analysis, is built on Nvidia chips and draws 100,000 watts of power when running at maximum load.

FPGA Coprocessor Solution for De Novo Genome Assembly

By Matthew Dublin

A little over a year after announcing the start of their life sciences division, hybrid-core hardware maker Convey Computer has rolled out a new addition to their bioinformatics suite aimed at accelerating common bioinformatics applications. The new Convey GraphConstructor (CGC) is a hardware and software solution designed for de Bruijn graphs, which are used in short-read genome assembly applications like Velvet and ABySS. According to their announcement, the CGC is capable of providing users with performance speed-ups of 2.2 to 8.4 times using Convey’s FPGA-Intel x86 processor combo and a parallel memory subsystem.
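For readers unfamiliar with de Bruijn graphs, the toy Python sketch below shows the basic construction step that assemblers of this kind rely on: every k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is a plain software illustration under simplifying assumptions (no reverse complements, no error correction) and is not Convey's FPGA implementation or the actual data structure used by Velvet or ABySS; the function name and sample reads are made up.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph as {prefix (k-1)-mer: [suffix (k-1)-mers]}.

    Repeated edges are kept, so the number of times a suffix appears
    under a prefix reflects how many reads covered that k-mer.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (hypothetical data).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)

if __name__ == "__main__":
    for node in sorted(graph):
        print(node, "->", graph[node])
```

Even in this tiny example the graph holds one entry per k-mer occurrence, which hints at why memory footprint becomes the bottleneck on real short-read datasets.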

The use cases for Convey’s CGC include the Department of Energy's Joint Genome Institute researchers sequencing and analyzing 268 GB of metagenomic DNA from microbes incubated in cow rumen. With the CGC, the researchers were able to speed up the discovery process by as much as 2.8 times and reduce the memory requirements by roughly 83 percent. The results so far have been the discovery of nearly 30,000 new enzymes that could improve biofuel production.

It’s worth noting that Convey’s “hybrid solution” is essentially a standard x86 processor with an FPGA as a “co-processor” - certainly not a novel combination for bioinformatics. Possibly a better name for what Convey’s hardware does is “coherent co-processing” as the secret sauce is their specific implementation architecture where they've placed the FPGA very close to the CPU so that both processors work in lockstep with the same memory system model.

US & EU Informatics Leaders Collaborate on Health IT Policy

By Matthew Dublin

The ARGOS eHealth Consortium, a project funded by the European Commission to develop and promote common methods for responding to global "eHealth" (healthcare practice supported by electronic processes and communication) challenges, held a final meeting this week in Budapest. The ARGOS meeting was intended to foster agreement among health information technology, or HIT, leaders in Europe and the US. More than 75 participants attended the final meeting, including representatives of the US Department of Health and Human Services' Office of the National Coordinator for Health Information Technology.

The attendees finalized a series of recommendations in the following areas:

• Interoperability in eHealth and HIT, and certification of Electronic Health Record systems, or EHRs
• Defining a common, consistent approach and identifying indicators and tools for measuring adoption, usage, and benefits of HIT and eHealth
• Simulating human physiology and diseases with a focus on development of the Virtual Physiological Human and its use to support diagnosis and treatment of rare diseases
• Workforce and e-health capacity-building to address current and projected global shortages of individuals trained to develop, implement, maintain, and use HIT and EHRs

According to the American College of Medical Informatics board chair Nancy Lorenzi, a professor of Biomedical Informatics at Vanderbilt University School of Medicine, "the world is increasingly more interconnected, and information and communication technologies are supporting these interconnections." Lorenzi added, "Cooperative efforts such as ARGOS support investigations to address health policy challenges in a world that is becoming virtually border-less in terms of health care."

The idea is that the ARGOS recommendations will ultimately be synthesized to develop coordinated policy actions to implement in both Europe and the US.

Meryl Bloomrosen, AMIA vice president for Policy and Government Relations, explained AMIA's role as the US convener of ARGOS, saying, "AMIA and its members have worked on interoperability, standards, benefits of health information technology, and workforce issues for decades. ... AMIA welcomes the recently signed Memorandum of Understanding between the US Dept. of Health and Human Services and the EU because it promotes a common approach on the interoperability of EHRs and education programs for information technology and health professionals. The more interoperability that exists in health IT, the greater the consistency is in quality of care delivered to patients."

The "Memorandum of Understanding" was signed late last year by the vice president of the European Commission Neelie Kroes and the US Secretary of Health and Human Services Kathleen Sebelius to promote a common approach on the interoperability of electronic health records and on education programs for information technology and health professionals.

New Network Tool for Private Clouds & Healthcare IT

By Matthew Dublin

Diagnosing performance issues in an IT infrastructure is never easy, and as we’ve seen with the recent Amazon cloud crash, these issues are even more difficult to track in a cloud computing environment. But a new technology from AppNeta may help IT managers obtain better information about server, network, and desktop performance.

AppNeta's PathView microAppliance is being touted as an easy-to-use network administration monitoring tool that’s the size of a cell phone. It can be implemented at remote locations where you want to monitor network activity, requires very little power, and uses a standard Ethernet connection. Once the PathView microAppliance is plugged in, it uploads network performance data to an AppNeta cloud server.

John D. Halamka, chief information officer of Beth Israel Deaconess Medical Center and chief information officer at Harvard Medical School, writes on his blog Life as a Healthcare CIO, “as we all roll out EHRs to small provider offices, often with challenging internet connections, remote monitoring of cloud network performance becomes even more critical… Currently we have deployed these devices at our central EHR private cloud site and two of [our] major remote practices. The level of detail and depth of available metrics and reports is amazing. A low cost, zero administration, cloud-based, network sniffer that is truly plug and play. That's cool!”

As the post points out, for electronic health record cloud providers, configuration is simple: you can send the device via UPS to a provider practice with installation instructions and gather performance data sans complex onsite network sniffing setups.

AWS Supports Mobile Cloud Computing

By Matthew Dublin

Amazon Web Services has expanded support for the AWS software development kits for mobile device operating systems Android and Apple's iOS for the iPhone and iPad. Developers can now create applications that will enable users to access Amazon EC2, Amazon CloudWatch, Amazon Simple Email Service, Elastic Load Balancing, and Auto Scaling, all from their mobile devices.

AWS already supports mobile device software development for Amazon S3, Amazon SimpleDB, Amazon Simple Queue Service, and Amazon SNS.

Probably the most significant addition to the mobile development canon is AWS Compute, which allows cloud users to launch and manage Amazon EC2 instances, and Monitoring, which allows for monitoring of Amazon EC2 instances as well as EBS volumes, Elastic Load Balancers, and RDS database instances in real-time using Amazon's CloudWatch, a Web service that provides monitoring for AWS cloud resources.

Having access to these AWS services could prove to be quite useful for investigators on the go who want to check in and manage research projects running on the cloud.

There has already been a lot of mobile app development for bioinformatics, going all the way back to 2002, when a group of researchers from the University of Turku in Finland released the bioWAP service, which provides mobile access to bioinformatics databases and software tools. But this area of bioinformatics development really took off with the advent of the iPad and iPhone. Some of the most recent bioinformatics apps for iPhone and iPad include iProto Human and iProto Yeast, applications that let you search the whole human proteome and yeast proteome, respectively, on your iPhone, with quick, easy, offline access.

Another application is genomePad for iPhone, iPod touch, and iPad. With the rapid development and release of powerful mobile computing devices, performing bioinformatics tasks on the go is becoming increasingly feasible and useful for busy scientists. GenomePad takes advantage of features of both the iPhone and the genomic maps from the UCSC Genome Browser to make portable browsing of the UCSC Genome Browser possible on the iPhone and iPod touch.

In 2009, researchers from Carnegie Mellon University led by Eugene Marinelli released Hyrax, a cloud computing solution for mobile devices that uses MapReduce. If the idea with cloud computing is to have limitless computing and storage, and do away with hardware costs and management, then it only makes sense for mobile devices, which are now increasingly marketed as portable mini-personal computers, and cloud computing to become more and more integrated.

There is also talk, courtesy of Mac Rumors, that in an effort to make the iPhone and iPad even cheaper and more lightweight, Apple will do away with most of the devices' internal memory in favor of using a cloud solution for media backup.

Supercomputers Aid Nanopore Sequencing Design

By Matthew Dublin

Earthsky.org has a podcast by Aleksei Aksimentiev, an assistant professor at the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign, describing his approach to personalized medicine with supercomputers.

Aksimentiev is using the Ranger supercomputer at the Texas Advanced Computing Center to develop cheap DNA sequencing using nanopores. He and his colleagues are using Ranger to simulate in atomic detail the process of DNA transport through these nanometer-scale pores and to develop a numerical model of the nanopore sensor. Recently, they carried out the first-ever atomistic simulations of DNA translocation through synthetic nanopores.

$25 Key Chain Computer Next Big Thing in Distributed Computing?

By Matthew Dublin

While not related to HPC or bioinformatics per se, an effort by the UK-based Raspberry Pi Foundation might have some interesting implications for ultra-cheap distributed computing. Led by David Braben, who is well known in the gaming world as the developer of the legendary Elite, Raspberry Pi is looking to mass-produce a key-chain-sized PC that they hope to sell for $25.

Here are the provisional specs on their tiny computer:

• 700MHz ARM11
• 128MB of SDRAM
• OpenGL ES 2.0
• 1080p30 H.264 high-profile decode
• Composite and HDMI video output
• USB 2.0
• SD/MMC/SDIO memory card slot
• General-purpose I/O
• Open software (Ubuntu, Iceweasel, KOffice, Python)

The Raspberry Pi Foundation is a UK registered charity aimed at promoting the study of computer science and related topics in an educational setting. Its website says the foundation expects the computer to eventually have many other applications in both the developed and developing world. Currently, the prototype is the size of a USB drive and can be plugged into a composite or HDMI-compatible TV or monitor and used with a mouse and keyboard, or combined with a touch screen to make a cheap tablet computer.

This effort will probably remind folks of the One Laptop per Child initiative, which is a bit higher-end and has seen relatively little adoption compared to the scope of the program's original vision.

Raspberry Pi device running Ubuntu 9.04:

Raspberry Pi device with attached 12MPixel camera module:

Harvard Researchers Use IBM Business Software For Drug Study

By Matthew Dublin

Harvard Medical School and Brigham and Women's Hospital are using IBM business analytics technology to study the effectiveness and potential safety of prescription drugs. Using IBM's Netezza data warehouse appliance, Harvard researchers are conducting pharmacoepidemiology studies to analyze data from millions of de-identified patient records that include insurance claims data to develop novel data-intensive drug safety research methods. Netezza data warehouse appliances architecturally integrate database, server and storage components into a single unit.

Netezza's appliances use a proprietary Asymmetric Massively Parallel Processing architecture that combines open blade-based servers and disk storage with a proprietary data filtering process using field-programmable gate arrays.

"We wanted a computing platform with massive analytics power, but was extremely simple to administer," said Sebastian Schneeweiss, Associate Professor of Medicine, Harvard Medical School and Vice Chief of the Brigham & Women's Hospital Division of Pharmacoepidemiology and Pharmacoeconomics, in a release. "As global health care evolves toward a learning healthcare system with a need for ongoing comparative effectiveness and safety research integrated in routine care, it is imperative that research methods evolve in parallel. IBM Netezza will accelerate our ability to devise, test and publish new computationally intensive algorithms applied to ever larger longitudinal healthcare databases that we hope will become the gold standard for researchers globally."

This project will also use the Netezza technology for health economics and outcomes research (HEOR) once the effectiveness of a particular medication is established. Pharmaceutical companies and health insurers have teams of HEOR analysts who demonstrate the value of new medical products for pricing and coverage decisions, with the aim of providing higher quality of care. The hope is that by intelligently mining claims data with a powerful analytics platform like IBM Netezza, they may be able to answer questions about the most effective therapies faster, which has obvious economic benefits for both patients and health care providers.

Using business data architectures and IT technologies is not uncommon in life sciences research. In the journal Bioinformatics, a paper by Lauren Boyd and her colleagues entitled "The caBIG Life Science Business Architecture Model" describes how the cancer Biomedical Informatics Grid, or caBIG, has adopted "Business Architecture Models," or BAMs (models that describe what a business does and how activities are accomplished), in order to establish a "Life Science BAM," or LS BAM. The caBIG LS BAM provides a shared understanding of vocabulary and processes common in life sciences research. The latest version of the caBIG LS BAM includes 90 goals and 61 researchers within Use Case and Activity Unified Modeling Language (UML) diagrams.

The authors report that "to facilitate system interoperability across the cancer research enterprise, the LS BAM (and the LS DAM) can be used beyond caBIG to support their requirements definition efforts. Common business and information models provide a consistent understanding of business processes and data to be collected and exchanged, which should lower the barrier to interoperability. The model may therefore be a resource to train software engineers, facilitate development of standards and underpin software validation in many organizations, such as cancer centers, commercial tool providers or other large research institutions."

NSF Leadership Focus of Report on Cyberinfrastructure in Academia

By Matthew Dublin

The results of a 2010 workshop on how universities and research sites can interface with the national cyberinfrastructure are now available in a report. The workshop, hosted by Indiana University, looked at software and services, user support, and information technology, all with two goals in mind: Identifying common elements of widely used software stacks across the world and policy documents that research universities should have in place.

Discussed at length in the report is "campus bridging," a concept wherein a scientist or engineer’s personal cyberinfrastructure is seamlessly integrated with the cyberinfrastructure on the scientist’s campus; cyberinfrastructure at other campuses; and cyberinfrastructure at the regional, national, and international levels.

Indiana University is leading a task force on campus bridging that is funded by the National Science Foundation.

In early 2009, the NSF Advisory Committee for Cyberinfrastructure charged six different task forces to make strategic recommendations to the NSF on campus bridging; data; grand challenges and virtual organizations; high-performance computing; software and tools; and workforce development.

Some of the recommendations for fostering campus grid interconnectedness in this report ask the NSF to step up to the plate and lead the establishment of a coordinated, national cyberinfrastructure support system for users, develop a blueprint for a "National Cyberinfrastructure," and emphasize reliability and usability in its grant review process.

You can read the full report here.