CLC Bio, IBM Launch Combined Hardware, Software System for NGS Data Analysis


BOSTON - CLC Bio and IBM have partnered to develop a combined hardware and software solution for next-generation sequencing data analysis.

The partners launched the offering, called the CLC Bio/IBM Genomics Analytics Solution, at the Bio-IT World conference held here this week. The system includes a computing cluster built on IBM hardware, along with the CLC Genomics Server software for large-scale genomics sequencing data analysis and the CLC Genomics Workbench client software for analyzing, comparing, and visualizing high-throughput sequencing data.

The cluster compute nodes are IBM System x 3550 M4 rack servers, powered by Intel Xeon E5-2650 processors. The nodes are connected to an IBM Storwize V7000 Unified network-attached storage system, which consolidates block and file workloads. Storwize V7000 Unified systems support file data storage using the IBM General Parallel File System.

The combined solution comes in three configurations: one that offers 48 CPU cores and 192 gigabytes of memory, one that provides 96 CPU cores and 384 GB of memory, and a third that offers 192 CPU cores and 768 GB of memory.

In a conversation with BioInform at the conference, IBM representatives said the company first broached the idea of a partnership with CLC Bio at the Bio-IT conference three years ago. Last year, the companies benchmarked the performance of each of the three configurations on genomic read mapping and variant calling.

According to their tests, the smallest computing system, which comprises three compute nodes and offers 20 terabytes of effective storage capacity, can analyze seven full human genomes at 37x coverage per week. The medium-sized system, which has six compute nodes and 55 TB of storage, can process 14 genomes per week, while the largest system, with 12 compute nodes and 90 TB of storage, can process 28 genomes per week.
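The reported figures can be put on a per-node basis with simple arithmetic. The sketch below is illustrative only: it assumes the three core-and-memory configurations map, in order, onto the three-, six-, and 12-node systems, which the article implies but does not state outright. The names `CONFIGS` and `per_node` are hypothetical.

```python
# Figures are taken from the article; the pairing of each core/memory
# configuration with a node count is an assumption (smallest to largest).
CONFIGS = [
    {"name": "small", "nodes": 3, "cores": 48, "memory_gb": 192,
     "storage_tb": 20, "genomes_per_week": 7},
    {"name": "medium", "nodes": 6, "cores": 96, "memory_gb": 384,
     "storage_tb": 55, "genomes_per_week": 14},
    {"name": "large", "nodes": 12, "cores": 192, "memory_gb": 768,
     "storage_tb": 90, "genomes_per_week": 28},
]


def per_node(config):
    """Divide a configuration's totals by its node count."""
    nodes = config["nodes"]
    return {
        "cores_per_node": config["cores"] / nodes,
        "memory_gb_per_node": config["memory_gb"] / nodes,
        "genomes_per_node_week": config["genomes_per_week"] / nodes,
    }


if __name__ == "__main__":
    for config in CONFIGS:
        stats = per_node(config)
        print(
            f"{config['name']}: "
            f"{stats['cores_per_node']:.0f} cores/node, "
            f"{stats['memory_gb_per_node']:.0f} GB/node, "
            f"{stats['genomes_per_node_week']:.2f} genomes/node/week"
        )
```

Under that assumption, the per-node figures are identical across all three systems, so the quoted throughput scales linearly with node count.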

Janis Landry-Lane, IBM's director for worldwide deep computing, said the partnership benefits IBM because it provides an opportunity to learn about the performance needs of genomic researchers and the kinds of analysis jobs they need to run on the hardware, as well as to explore the full NGS analysis pipeline.

She added that the partnership also lets CLC Bio and its customers benefit from HPC "best practices" that IBM has developed over the years, in addition to receiving systems that are optimized to provide maximum performance.

An additional benefit to CLC Bio is access to potential IBM customers who might be interested in exploring and using genomic data, Lasse Görlitz, CLC Bio's vice president of communications, told BioInform.

It also provides customers with ready-made infrastructure that has been optimized for their needs, he said. "There is a huge difference [between] buying a professional enterprise setup" from IBM and trying to cobble together a similar cluster internally using the same hardware. "You are going to get a totally different performance on that system because it's not going to be optimized even though you have the same hardware," he said.

Further, the partnership moves the burden of providing high-performance hardware for NGS analysis to a company that has the appropriate expertise while allowing CLC Bio to focus on algorithm development, he said.

"We are really good on the algorithm side but we have no idea about how to build a cluster system and optimize it," he said.

For example, IBM can make improvements to the cluster's file system "that we have no clue" how to do, but which "will accelerate the user experience significantly," he said.

Other optimization "tricks" to improve the system's performance focus on relieving input/output bottlenecks and addressing memory issues, Kathy Tzeng, a life sciences senior technical staff member at IBM, told BioInform.

IBM's system also addresses issues associated with power and cooling system costs, which are often not taken into account with smaller "entry level" clusters, Görlitz said.

Görlitz said that pricing for the CLC Bio/IBM Genomics Analytics Solution varies but that the starting price would be in the six figures.