IBM Opens Up Informatics Toolkit to Accelerate COVID-19 Research, Drug Development

CHICAGO – Add IBM to the growing list of companies developing new information technology to expedite research into the novel SARS-CoV-2 coronavirus in hopes of finding treatments for COVID-19.

IBM Research said that it has been "actively developing" new cloud and artificial intelligence-based technologies to help researchers accelerate drug discovery and treatment in response to the COVID-19 pandemic. In collaboration with IBM Watson Health, the company has made a series of informatics tools freely available to aid the search for answers.

Big Blue has released a "deep search" tool that lets users query data across several public and licensed databases, including the White House's COVID-19 Open Research Dataset (CORD-19), DrugBank, and the National Institutes of Health's GenBank. Another tool is an AI engine with information on 3,000 molecules with "COVID-related qualities" that is intended to help researchers identify "desirable properties" for potential treatments, the company said.

IBM is also allowing free access to its Functional Genomics Platform, a cloud-based, interactive data repository of more than 300 million microbial genome sequences. Formerly called OMXware, the Functional Genomics Platform is a relational database meant to help researchers link genotypes to phenotypes for all these sequences, pulled from sources including GenBank and the Sequence Read Archive.

IBM annotates the sequences to identify each gene and protein, drills down to identify protein domains, then assigns standardized codes to describe related molecular functions and biological processes for each domain.
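The annotation hierarchy described above (genome, then gene, then protein, then protein domain, then standardized function codes) can be pictured as a nested data model. The Python sketch below is purely illustrative and assumes a simplified in-memory layout: the class names, field names, the one-protein-per-gene simplification, and placeholder codes such as "MF:0001" are hypothetical and are not taken from IBM's platform or its developer toolkit. The article does not say which coding scheme IBM uses (Gene Ontology terms are a common choice for this kind of annotation), and a relational database would store these levels as linked tables rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import List, Set


# Hypothetical model of the annotation levels described in the article.
@dataclass
class Domain:
    name: str
    # Placeholder codes for molecular functions and biological processes.
    molecular_functions: Set[str] = field(default_factory=set)
    biological_processes: Set[str] = field(default_factory=set)


@dataclass
class Protein:
    protein_id: str
    domains: List[Domain] = field(default_factory=list)


@dataclass
class Gene:
    gene_id: str
    protein: Protein  # simplification: one protein per gene


@dataclass
class Genome:
    accession: str
    genes: List[Gene] = field(default_factory=list)


def genomes_with_function(genomes: List[Genome], code: str) -> List[str]:
    """Return accessions of genomes with at least one domain tagged with `code`."""
    hits = []
    for genome in genomes:
        if any(
            code in d.molecular_functions or code in d.biological_processes
            for g in genome.genes
            for d in g.protein.domains
        ):
            hits.append(genome.accession)
    return hits


# Usage with made-up records: only genome_1 carries the tagged domain.
dom = Domain("hypothetical_domain_A", {"MF:0001"}, {"BP:0001"})
g1 = Genome("genome_1", [Gene("gene_1", Protein("prot_1", [dom]))])
g2 = Genome("genome_2", [Gene("gene_2", Protein("prot_2", []))])
print(genomes_with_function([g1, g2], "MF:0001"))  # ['genome_1']
```

Attaching function codes at the domain level, rather than to whole proteins, is what lets a query link a functional annotation back to every genome that carries the relevant domain.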

Lead bioinformatician Kristen Beck said April 1 at a virtual conference hosted by the Stanford Institute for Human-Centered Artificial Intelligence that IBM has processed more than 25,000 viral genomes on this platform, including SARS-CoV-2 and other coronaviruses. The SARS-CoV-2 collection contains more than 3 million genes, genomes, proteins, and functional domains, and IBM continues to process publicly available coronavirus sequences as they are released.

The platform includes a developer toolkit so research organizations can integrate Functional Genomics Platform searches into their own computing systems and workflows and share their findings with other investigators. Beck said that IBM hopes to produce new AI algorithms from the learnings of coronavirus researchers.

The company also has opened up access to its Micromedex drug compendium and DynaMed search tool for clinicians. "Making Micromedex and DynaMed jointly available for free to the frontline clinicians is a way for us to take our expertise in assembling disparate information and make it available in a way that's natural to the way that clinicians might seek information," said Anil Jain, vice president and chief health information officer at IBM Watson Health.

"We believe that it's the frontline clinicians who need access. The last thing they need is to have barriers to get them that information. That could be life-saving in some cases," he added.

Jain, himself a physician, noted that many public databases are not organized in ways that could best help researchers. He said that the pandemic has accelerated the application of the scientific method.

"It's very early, but it's moving very, very quickly," Jain said of COVID-19 research and the analytics that support the quest for treatments and vaccines.

"We're looking at an environment that is in a hyper phase where everything is happening much, much more quickly, including collaboration," he continued. "Some of the other things that we've been focusing on is not just that the actual tooling itself, but how do we make the tooling much more collaborative using open-source notebooks where people can share information and things of that sort."

This work dovetails with that of the COVID-19 High Performance Computing Consortium, created by the White House last month to bring government, industry, and academic partners together to apply computing resources to combat the pandemic. That group is led by the US Department of Energy and IBM in conjunction with the White House Office of Science and Technology Policy and Michael Kratsios, who serves as CTO of the United States.

"COVID-19 is not something that is going to be solved by one individual company or by one individual lab or academic research group. It's going to have to take a community to do it," Jain said of the consortium.

The consortium's 31 members include tech giants Amazon Web Services, Google Cloud, Hewlett Packard, Microsoft, and Nvidia, plus academic researchers from the Massachusetts Institute of Technology, Rensselaer Polytechnic Institute, the University of Texas, Carnegie Mellon University, and others. Together, they currently offer a combined 418 petaflops of supercomputing power across 3.8 million CPU cores.

As of Wednesday, the group had received 55 proposals from researchers, 29 of which had been matched with supercomputing systems, an IBM spokesperson told GenomeWeb. Twenty-three of those experiments have already begun.

Projects fall into three main categories: improving the understanding of the protein structure of the SARS-CoV-2 coronavirus; applying AI to pinpoint cellular binding sites or identify molecular candidates for drug discovery; and forecasting the spread of the pandemic.

"The focus is on how we can take some of the tooling that we've been working on for some time and either refocusing energy on COVID-19 or taking disparate pieces and putting them together and making the COVID-19 corpus much more useable and available for free," Jain said.

In a blog post last week, HPC Consortium co-chairs Dario Gil, director of IBM Research, and Paul Dabbar, Department of Energy undersecretary for science, discussed some of the proposals accepted to date.

One, headed by an Argonne National Laboratory computational biologist, seeks to apply AI to improve understanding of "fundamental biological mechanisms" of the coronavirus and related respiratory disease in search of potential therapies. A similar plan from German AI firm Innoplexis, which currently lacks access to many of its own computing resources due to facility closures, aims to generate novel molecules in pursuit of a new drug.

A team from MIT is trying to mimic ACE2 receptors in silico to "computationally evolve soluble receptor decoys," an approach intended to minimize side effects of potential therapies. The ACE2 enzyme has been identified as the primary receptor for SARS-CoV-2.

A fourth group, led by bioscientists at the National Aeronautics and Space Administration, is looking to tap into the space agency's Ames supercomputer to analyze the whole genomes of COVID-19 patients in an effort to define risk groups. According to the consortium leaders, the NASA researchers want to identify patients predisposed to developing acute respiratory distress, which could inform clinical trial enrollment.

Separately, IBM has joined a blockchain-driven data control and communication consortium called MiPasa for COVID-19 outbreak surveillance. Jain said that this network can help the company identify which data sources are trustworthy and which are not, and can help safeguard patient privacy.

IBM is one of many organizations that responded to the March 16 "call to action" by the White House Office of Science and Technology Policy to create a machine-readable dataset for understanding COVID-19. As part of the call to action, the Allen Institute for AI, the Chan Zuckerberg Initiative, the Georgetown University Center for Security and Emerging Technology, Microsoft, and the US National Library of Medicine last month released the CORD-19 collection of literature.

IBM started working on understanding the novel coronavirus and its genome late last year, shortly after the first human cases of COVID-19 appeared in Wuhan, China, Jain said. IBM computing platforms also ingested data on the related SARS-CoV-1 and MERS-CoV genomes so researchers could potentially learn from earlier coronavirus epidemics.

These platforms include technology developed by Explorys, a big-data analytics company that Jain cofounded and served as CMO of until IBM acquired it in 2015, when IBM unveiled its Watson Health business unit. A year later, IBM purchased Truven Health Analytics to bring healthcare administrative data capabilities to Watson Health. Those pieces are helping with disease surveillance.

Drawing on clinical data technology such as Explorys and administrative data technology such as Truven, IBM and its partners are starting to understand the time lag between when procedure codes for COVID-19 become available and when information systems start capturing these codes in the course of patient care. According to Jain, this can help spotlight the rate of COVID-19 testing, diagnoses, and treatment in specific geographies.

This knowledge also can help manage healthcare system capacity.

"It's probably not appropriate to tell a diabetic patient to come in and get checked out by their physician" in a facility or area experiencing a spike in COVID-19 cases, Jain said. "It's probably better now for you [to] have a telehealth visit if they're not feeling well."