
Natural Language Processing to Play Major Role in Bringing Watson into Clinics


By Uduak Grace Thomas

Under the terms of a recently inked agreement between IBM and Nuance, Watson's deep question answering, natural language processing, and machine learning capabilities will be linked with Nuance's speech recognition and Clinical Language Understanding (CLU) solutions to help physicians more accurately diagnose and treat their patients (BI 02/11/2011).

In the months leading up to the first offerings from the collaboration, researchers at IBM and Nuance will work with collaborators at Columbia University and the University of Maryland to figure out how Watson can best help in the clinical setting and to incorporate some healthcare-specific adaptations into the system, Jennifer Chu-Carroll, a member of the Watson Research Team, told BioInform.

"For the most part, the natural language analytics, the machine learning, and the whole architecture are domain independent, so we expect to be able [to] plug these into the medical domain," she said. However, "there [will] be some ... research and development that is specific to the medical domain that we are going to have to bring in."

She said that Watson will have access to several "dimensions of evidence" that will help the system suggest diagnoses for individual patients. These will include medical information in medical textbooks and other published literature; past case studies that, for example, describe the correlation between symptoms and diseases; patient-specific data including medical history; and experimental results and other kinds of data.

She also said that the partners "don’t know at this point" whether genomic data will be incorporated into Watson's knowledgebase.

She noted that Nuance's CLU solution, which also includes natural language-processing capabilities that are tailored to the healthcare space, will "supplement" Watson's more general abilities.

Nick van Terheyden, chief medical information officer for Nuance Healthcare, concurred and further distinguished the capabilities of the two tools.

"Watson has the capacity to consume large amounts of data in an automated fashion and to present that in a form that can be answered in a question-and-answer-type style," he explained to BioInform. "CLU is technology that takes narrative in the healthcare setting and extracts out distinct data elements but it does not link those data elements to research papers or practical activities ... the sum of the two is much greater than the two individually."

CLU is based on a natural language processing engine that includes an ontology "for structuring and capturing clinical information" and then connecting it to standardized medical vocabularies such as the Systematized Nomenclature of Medicine--Clinical Terms, ICD-9, and ICD-10, among others, van Terheyden said.

"What all that translates to is the ability to extract out from a free-form narrative document structured, tagged clinical data, what I would term actionable information because it's now semantically interoperable, meaning that a computer can consume it without human intervention," he said. "That data is then available for use within clinical and research systems [and] within all of the tools in the healthcare setting that are driven by data that can be generated from the natural form."

An additional feature of the software is its ability to standardize data contained in medical documents using its built-in ontology. To illustrate this point, van Terheyden noted that while there tends to be less variation in the terminology used in medical research papers, physicians' notes are notorious for using different language to describe the same condition. For example, a physician may refer to a heart attack as a myocardial infarction, MI, or simply call it a heart attack.

CLU's ontology incorporates more than 1.5 million concepts, which it uses to normalize data.

Using the example of the heart attack, the tool records 'myocardial infarction' as a term linked to a set of references that capture other ways the term could be reported, as well as the contexts in which it could occur in a clinical setting: whether it is the patient's current health complaint, a past condition, or part of the family history. Other relationships defined within the ontology include where in the body the condition occurs, whether it leads to other diseases, and so on.
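The kind of normalization described above can be illustrated with a toy sketch. It is not Nuance's implementation: the synonym table, the cue phrases, and the names `SYNONYMS`, `CONTEXT_CUES`, `Mention`, and `find_mentions` are all hypothetical, and a real ontology-driven engine would rely on far richer linguistic analysis than simple string matching.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mini-ontology: surface forms mapped to one canonical concept.
SYNONYMS = {
    "myocardial infarction": "myocardial infarction",
    "mi": "myocardial infarction",
    "heart attack": "myocardial infarction",
}

# Hypothetical cue phrases that precede a term and signal its clinical context.
# The longer cue is listed first so it wins over the shorter one.
CONTEXT_CUES = [
    ("family history of", "family_history"),
    ("history of", "past_condition"),
]


@dataclass(frozen=True)
class Mention:
    concept: str  # canonical term the surface form was normalized to
    context: str  # current_complaint, past_condition, or family_history


def find_mentions(note: str) -> List[Mention]:
    """Normalize known surface forms in a note and tag each with a context."""
    text = note.lower()
    mentions = []
    for surface, concept in SYNONYMS.items():
        # Word boundaries keep "mi" from matching inside words like "family".
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            prefix = text[:match.start()].rstrip()
            context = "current_complaint"  # default when no cue is found
            for cue, label in CONTEXT_CUES:
                if prefix.endswith(cue):
                    context = label
                    break
            mentions.append(Mention(concept, context))
    return mentions
```

In this sketch, "Patient presents with MI" and "Family history of heart attack" both resolve to the same concept but carry different context labels, which is the distinction the ontology is described as capturing.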

"What the engine does, through a variety of steps, is to process through this information [and] understand or find those links based on the context of the information," van Terheyden said. "[CLU] essentially creates a three-dimensional model of the data that says 'based on what I understand, here are all the linkages to my ontology that allows me to say this is what this term is.'"

CLU will also add value to Nuance's speech recognition software, which is being incorporated into Watson so that physicians can simply speak to the system, since it can also mine the text based on its ontology.


Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.
