
CLC Bio, IBM Launch Combined Hardware, Software System for NGS Data Analysis


BOSTON - CLC Bio and IBM have partnered to develop a combined hardware and software solution for next-generation sequencing data analysis.

The partners launched the offering, called the CLC Bio/IBM Genomics Analytics Solution, at the Bio-IT World conference held here this week. The system includes a computing cluster built on IBM hardware, along with the CLC Genomics Server software for large-scale genomics sequencing data analysis and the CLC Genomics Workbench client software for analyzing, comparing, and visualizing high-throughput sequencing data.

The cluster compute nodes are IBM System x3550 M4 rack servers powered by Intel Xeon E5-2650 processors. The nodes are connected to an IBM Storwize V7000 Unified network-attached storage system, which consolidates block and file workloads. Storwize V7000 Unified systems support file data storage using the IBM General Parallel File System.

The combined solution comes in three configurations: a small system with 48 CPU cores and 192 GB of memory, a medium system with 96 CPU cores and 384 GB of memory, and a large system with 192 CPU cores and 768 GB of memory.

In a conversation with BioInform at the conference, IBM representatives said the company first broached the idea of a partnership with CLC Bio at the Bio-IT conference three years ago. Last year, the companies benchmarked the genomic read mapping and variant calling performance of each of the three configurations.

According to their tests, the smallest system, which consists of three compute nodes and offers 20 terabytes of effective storage capacity, can analyze seven full human genomes at 37x coverage per week. The medium-sized system, which has six compute nodes and 55 TB of storage, can process 14 genomes per week, while the largest system, with 12 compute nodes and 90 TB of storage, can process 28 genomes per week.
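As a rough check on how the reported figures relate, they can be normalized per compute node. The short Python sketch below does this; it assumes the three core/memory configurations listed earlier correspond, in order, to the three-, six-, and 12-node systems, and the names `CONFIGS` and `per_node` are illustrative only.

```python
from fractions import Fraction

# Figures as reported in the article. Pairing each core/memory
# configuration with a node count by size (small, medium, large)
# is an assumption: the article gives them in separate paragraphs.
CONFIGS = {
    "small":  {"nodes": 3,  "cores": 48,  "mem_gb": 192, "storage_tb": 20, "genomes_per_week": 7},
    "medium": {"nodes": 6,  "cores": 96,  "mem_gb": 384, "storage_tb": 55, "genomes_per_week": 14},
    "large":  {"nodes": 12, "cores": 192, "mem_gb": 768, "storage_tb": 90, "genomes_per_week": 28},
}


def per_node(cfg):
    """Normalize the reported figures by node count (exact fractions)."""
    n = cfg["nodes"]
    return {
        "cores_per_node": Fraction(cfg["cores"], n),
        "mem_gb_per_node": Fraction(cfg["mem_gb"], n),
        "genomes_per_node_week": Fraction(cfg["genomes_per_week"], n),
        "mem_gb_per_core": Fraction(cfg["mem_gb"], cfg["cores"]),
    }


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        r = per_node(cfg)
        print(f"{name:>6}: {r['cores_per_node']} cores/node, "
              f"{r['mem_gb_per_node']} GB/node, "
              f"{float(r['genomes_per_node_week']):.2f} genomes/node/week, "
              f"{r['mem_gb_per_core']} GB/core")
```

Under that pairing, every configuration works out to 16 cores and 64 GB of memory per node (4 GB per core), and throughput scales linearly at about 2.33 genomes per node per week. Storage does not scale the same way: 20, 55, and 90 TB for 3, 6, and 12 nodes.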

Janis Landry-Lane, IBM's director for worldwide deep computing, said the partnership benefits IBM because it provides an opportunity to learn about the performance needs of genomic researchers and the kinds of analysis jobs they need to run on the hardware, as well as to explore the full NGS analysis pipeline.

She added that the partnership also lets CLC Bio and its customers benefit from HPC "best practices" that IBM has developed over the years, in addition to receiving systems that are optimized to provide maximum performance.

An additional benefit to CLC Bio is access to potential IBM customers who might be interested in exploring and using genomic data, Lasse Görlitz, CLC Bio's vice president of communications, told BioInform.

It also provides customers with ready-made infrastructure that has been optimized for their needs, he said. "There is a huge difference [between] buying a professional enterprise setup" from IBM and trying to cobble together a similar cluster internally using the same hardware. "You are going to get a totally different performance on that system because it's not going to be optimized even though you have the same hardware," he said.

Further, the partnership moves the burden of providing high-performance hardware for NGS analysis to a company that has the appropriate expertise while allowing CLC Bio to focus on algorithm development, he said.

"We are really good on the algorithm side but we have no idea about how to build a cluster system and optimize it," he said.

For example, IBM can make improvements to the cluster's file system "that we have no clue" how to do, but which "will accelerate the user experience significantly," he said.

Other optimization "tricks" to improve the system's performance focus on relieving input/output bottlenecks and addressing memory issues, Kathy Tzeng, a life sciences senior technical staff member at IBM, told BioInform.

IBM's system also addresses issues associated with power and cooling system costs, which are often not taken into account with smaller "entry level" clusters, Görlitz said.

Görlitz said that pricing for the CLC Bio/IBM Genomics Analytics Solution varies but that the starting price would be in the six figures.