Updated Watson Can Help Researchers Identify Patterns in Data, Formulate New Hypotheses, IBM Says

NEW YORK (GenomeWeb) – IBM has added capabilities to Watson that it claims will be a boon for researchers in the pharmaceutical industry and other arms of the life science market, enabling them to better use the large quantities of data available in the scientific literature.

The company has announced the availability of the so-called Watson Discovery Advisor, a cloud-based system that is able to understand and use scientific language like chemical and biological terms as well as legal and intellectual property language to suggest potential research directions. While historically "Watson has been known for analyzing data to find answers … with Watson Discovery Advisor, we're moving beyond the answers and helping researchers come up with important new questions" and "formulate [new] hypotheses," Lauren O'Donnell, global general manager and vice president, IBM Life Sciences, told BioInform in an email.

It's "a first of a kind system that will allow scientists and researchers to discover and act on new patterns and connections in disparate [scientific] data," she said. Furthermore, since it can ingest and combine information from millions of articles, journals, and studies much faster than human beings can, it can help researchers accomplish more in far less time, she added.

It's "a natural extension of Watson's cognitive computing capability," Mike Rhodin, senior vice president, IBM Watson Group, said in a statement. "We're empowering researchers with a powerful tool which will help increase the impact of investments organizations make in R&D."

Among the scientists in academia and industry who are using Watson Discovery Advisor's capabilities in their projects is a group from Baylor College of Medicine. The team, led by Olivier Lichtarge, a BCM professor of molecular and human genetics, biochemistry, and molecular biology, published the results of a retrospective study in the Association for Computing Machinery's digital library this week. In the study, they used the Knowledge Integration Toolkit (KnIT), internally developed software based on Watson's technology, to identify kinases that modify p53, an important protein related to different kinds of cancer.

BCM's KnIT combines Watson's text-mining capabilities with a method of representing biological data, and a reasoning algorithm that is used to make inferences from the data, Lichtarge, who is also the director of BCM's Center of Computational and Integrative Biomedical Research, told BioInform. The system is able to extract facts from biomedical literature, and to represent those facts as elements of a network — one that covers both known relationships as well as new potential relationships between previously unrelated elements — "so that we could actually make hypotheses based on the facts," he said. Lichtarge and his colleagues began working with a team from IBM, led by Scott Spangler, principal data scientist at IBM, on KnIT roughly two years ago. At the time IBM was looking for partner institutions interested in using Watson's capabilities to interpret biological, medical, and clinical data.
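As a rough illustration of the idea of representing extracted facts as a network and proposing new links from it, the following minimal Python sketch ranks candidate kinases by how much their network neighborhood resembles that of a kinase already linked to the target. This is not KnIT's algorithm, which the article does not specify; the Jaccard-overlap scoring is a simple stand-in, and all entity names (`kinase_A`, `protein_X`, and so on) and function names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical "facts" as pairs of co-mentioned entities.
# These are placeholders for illustration, not real findings.
facts = [
    ("kinase_A", "p53"),
    ("kinase_A", "protein_X"),
    ("kinase_A", "protein_Y"),
    ("kinase_B", "protein_X"),
    ("kinase_B", "protein_Y"),
    ("kinase_C", "protein_Z"),
]


def build_network(pairs):
    """Store each fact as an undirected edge in an adjacency map."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def jaccard(s, t):
    """Overlap of two neighbor sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def rank_candidates(neighbors, target, candidates):
    """Score each candidate by its best similarity to an entity already linked to target."""
    known = set(neighbors[target])
    scores = {}
    for cand in candidates:
        if cand in known:
            continue  # already a known relationship, not a new hypothesis
        scores[cand] = max(
            (jaccard(neighbors[cand] - {target}, neighbors[k] - {target}) for k in known),
            default=0.0,
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


network = build_network(facts)
print(rank_candidates(network, "p53", ["kinase_B", "kinase_C"]))
```

In this toy data, `kinase_B` shares both of its neighbors with `kinase_A`, which is already linked to p53, so it ranks above `kinase_C`. A real system would need far richer evidence and scoring than this.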

The basic premise of the BCM project is that there are far too many papers currently available — and more being published daily — in the scientific literature for scientists to read and understand properly in a time frame that is conducive to research. For example, there are more than 70,000 papers published on the p53 protein alone. "On average, a scientist might read between one and five research papers on a good day," Lichtarge said in a statement. "To put this in perspective with p53 ... even if I'm reading five papers a day, it could take me nearly 38 years to completely understand all of the research already available today on this protein."
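The arithmetic behind that estimate can be checked with a short Python sketch. It assumes a researcher reads every day of the year, which is an assumption made here for illustration; the function name is likewise illustrative.

```python
PAPERS_ON_P53 = 70_000  # approximate figure quoted in the article
DAYS_PER_YEAR = 365     # assumption: reading every single day


def years_to_read(papers: int, papers_per_day: float) -> float:
    """Years needed to read `papers` at a steady daily rate."""
    return papers / papers_per_day / DAYS_PER_YEAR


# The article quotes a range of one to five papers per day.
for rate in (1, 5):
    print(f"{rate} paper(s) per day: {years_to_read(PAPERS_ON_P53, rate):.1f} years")
```

At five papers a day this comes to a little over 38 years, consistent with the quoted figure, and it ignores papers published in the meantime.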

The idea was "that we might be able to build on top of Watson tools that will adapt it to biomedical and molecular biology vocabularies and in so doing we could extract facts from the literature on a very, very large scale," he told BioInform.

After they had developed the system, the partners tested its ability to make hypotheses about kinases that phosphorylate p53 a priori. Their first step was to feed the system related papers published prior to 2003 and ask it, based on that information, to suggest kinases that would target p53. KnIT suggested 74 kinases as potential modifiers. Of these, prior to 2003, 10 were known to phosphorylate p53, and nine were discovered at a later date. KnIT identified the 10 known p53 phosphorylators but, more importantly, it successfully predicted seven of the nine p53 modulators that were found after 2003.
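The retrospective test described above amounts to a time-split evaluation: predictions made from pre-2003 literature are scored against discoveries made afterward. A minimal Python sketch of that scoring, using the counts reported in the article, might look like the following. The kinase identifiers are made-up placeholders, since the article does not name the individual kinases.

```python
def recall(predicted: set, actual: set) -> float:
    """Fraction of the actual items that appear among the predictions."""
    if not actual:
        raise ValueError("actual set must not be empty")
    return len(predicted & actual) / len(actual)


# Placeholder identifiers; only the counts come from the article.
known_before_2003 = {f"known_{i}" for i in range(10)}  # 10 known p53 kinases
found_after_2003 = {f"later_{i}" for i in range(9)}    # 9 discovered later
correctly_predicted = {f"later_{i}" for i in range(7)}  # 7 of those 9 predicted
other_candidates = {f"other_{i}" for i in range(74 - 10 - 7)}  # rest of the 74

predicted = known_before_2003 | correctly_predicted | other_candidates

print(len(predicted))                                    # total candidates suggested
print(recall(predicted, known_before_2003))              # recovery of known kinases
print(round(recall(predicted, found_after_2003), 2))     # recovery of later discoveries
```

Recall on the later discoveries is the more informative number, because those kinases could not have been read directly out of the pre-2003 papers.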

Buoyed by that success, the researchers then used the system to explore the roughly 70,000 papers currently available in Medline on p53. They have begun testing the kinases suggested by KnIT in the lab and so far have evidence of two new kinases. These findings suggest that their approach can be used to extract useful information from the literature, but the results are preliminary, Lichtarge stressed. "We need to do more assays, we need to do assays on the other predictions, and we need to compare with negative controls."

The BCM researchers presented the results of their proof-of-principle study using p53 data at the Association for Computing Machinery's conference in New York City this week. They and their collaborators at IBM are continuing to work on KnIT, which is still very early in its development, Lichtarge said.

Long-term plans for the system include "strengthen[ing] the robustness of our text mining to become better at disambiguating terms so that we can recognize that different names refer to [the same protein] and that sometimes the same name refers to different proteins," he said. Also, "we need to gather more information on different types of elements in the abstracts so that we can refine our text mining to make it richer in terms of representing the context in which the facts are being mined."

Other plans include developing methods of assessing "the certainty or uncertainty of the facts so that we can assess the risk … of the hypothesis we generate [and] eventually we want to try to develop a way to score the potential value of those hypotheses so that people have an idea of the risk and potential reward associated with those hypotheses," he said.

Ultimately, "the hope is to offer to experimentalists, hypotheses that are based on the totality of the literature, which they themselves simply do not have time to read, and also eventually [to] suggest the risk and reward that may come from those different hypotheses," he said. "We know that the system will never replace reading of the literature by a highly trained specialist … but the strength of the system is that even if its analysis of the literature is very superficial, it can do it on a scale vastly larger than the scientist can. It remains up to the scientist to examine those hypotheses, put them in the context of what he or she knows, critically evaluate them, and decide whether they want to act on it or not."

Watson Discovery Advisor is also being used at the New York Genome Center. Earlier this year, the NYGC said it planned to use a prototype of Watson specifically designed to handle genomic data for a clinical research study aimed at finding better treatments for glioblastoma, an aggressive and malignant type of brain cancer.

On the commercial front, Johnson & Johnson is collaborating with the IBM Watson Discovery Advisor team to teach Watson to read and understand scientific papers that detail clinical trial outcomes. Such outcomes are used to develop and evaluate medications and other treatments as part of comparative effectiveness studies of drugs. The goal of the project is to provide researchers with a system that allows them to ask questions of the data in order to determine the effectiveness of a treatment compared to other medications, as well as any side effects.
