Challenge of Developing IBM's Dr. Watson Not So Much Technical as Cultural, Researchers Say


Originally published Feb. 28.

By Turna Ray

The day after IBM's Watson blew away Jeopardy! champions Ken Jennings and Brad Rutter earlier this month, the technology giant announced its intent to apply the computer's analytical power to providing healthcare solutions. According to a researcher involved in helping IBM develop the healthcare-focused version of the computer, dubbed Dr. Watson, the computing system could be an invaluable aid to physicians in the delivery of personalized medicine, which requires doctors to consider a complex mix of patient data, such as family history, environmental factors, and genomic information.

"As we begin beefing up the electronic medical record of patients, which right now is almost a representation of the paper record, and is not organized or structured in any way, … there's going to be more of a push toward including genomic and proteomic data," Eliot Siegel, director of the University of Maryland's Imaging Research Technologies Laboratory and one of the researchers working with IBM on the Watson healthcare project, told PGx Reporter. "What we're going to need, to fully take advantage of Watson, is to have the genetic and epigenetic information of patients so that the program will be able to utilize and organize that also in medical-decision making."

As genomic medicine becomes more prevalent in healthcare, doctors are being inundated by rapidly evolving scientific knowledge, complex new molecular tests, and vast amounts of DNA data from patients. A physician's assistant in the form of Dr. Watson could be a solution for medical professionals, since it has the ability to not only quickly process large chunks of information from different sources, such as databases and published articles, but through its natural language processing capabilities, the computer can also connect its existing knowledge base with the information found in patients' charts, doctors' notes, and in EMRs.

Dr. Watson will operate on a massively parallel probabilistic evidence-based architecture called Deep Question Answering, or DeepQA, and will possess natural language processing capacity and machine learning capabilities. IBM is working with Nuance to equip the expert system with speech recognition and clinical language understanding solutions. The companies are hoping that the system will help hospitals, physicians, and payors make faster and more accurate treatment decisions.

Nuance and IBM expect the first commercial offerings from this effort to be available in the next year and a half to two years. However, Siegel believes that developing and broadly implementing a physician assistant-type computer system that can help doctors deliver genomically-guided personalized medicine could take more than a decade, depending on how quickly healthcare stakeholders are willing to change the way they store and share data.

"There's not necessarily a technical barrier" to developing Dr. Watson to enable personalized medicine, "but it's about changing the culture, so people share," Siegel said. "And we also have to make sure that genomic data is collected more routinely from people who have cancer or other conditions.

"So it's definitely going to take five to ten years for places to more routinely start collecting these types of data," he said. "And then how long it's going to take for people to start sharing data outside of their own systems — that's so hard to predict."

IBM hasn't revealed how much funding it's pumping into the Dr. Watson project. However, according to several published reports, the company recently told analysts that it spent more than $30 million over four years developing the DeepQA architecture that underlies Watson's analytical muscle.

Getting Started

Researchers from Columbia University Medical Center and the University of Maryland's School of Medicine are also collaborating with IBM and Nuance on the Dr. Watson project. Columbia researchers will identify areas in medicine where Dr. Watson will have the most utility, while UMd researchers will work on figuring out how such a system can best help doctors.

According to Siegel, the first step will entail developing a knowledge base by going through published literature, information on the internet, and in various databases. Then, researchers will develop algorithms with which Watson can draw out important patient information from unstructured data in patient records, hospital charts, test results, and doctors' notes. Finally, Dr. Watson should be able to link individual patient data with information in its knowledge base to provide physicians with a number of diagnostic options and their probabilities of success.
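The three steps Siegel describes can be sketched in miniature. Everything below (the toy knowledge base, the clinical findings, and the evidence weights) is invented purely for illustration and is not IBM's actual design:

```python
# Illustrative sketch of the three-step pipeline: build a knowledge base,
# extract findings from unstructured text, then rank candidate diagnoses.
# All data and weights here are hypothetical.

# Step 1: a toy knowledge base linking clinical findings to candidate
# diagnoses with rough evidence weights.
KNOWLEDGE_BASE = {
    "anemia":      {"iron deficiency": 0.6, "b12 deficiency": 0.3},
    "fatigue":     {"iron deficiency": 0.4, "b12 deficiency": 0.4,
                    "hypothyroidism": 0.5},
    "weight gain": {"hypothyroidism": 0.7},
}

def extract_findings(note):
    """Step 2: naively pull known findings out of unstructured text."""
    note = note.lower()
    return [f for f in KNOWLEDGE_BASE if f in note]

def rank_diagnoses(note):
    """Step 3: link the patient's findings to the knowledge base and
    rank candidate diagnoses by accumulated evidence weight."""
    scores = {}
    for finding in extract_findings(note):
        for dx, weight in KNOWLEDGE_BASE[finding].items():
            scores[dx] = scores.get(dx, 0.0) + weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

note = "Patient reports fatigue and weight gain; labs show mild anemia."
for dx, score in rank_diagnoses(note):
    print(f"{dx}: {score:.2f}")
```

A real system would replace the substring matching with natural language processing and the additive weights with calibrated probabilities, but the shape of the pipeline is the same.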

A key challenge for Dr. Watson's developers is the fact that much of this patient-specific information doesn't currently exist electronically in any standardized format. "EMRs are really poorly organized; there's a lot of overlapping information and a lot of it is repetitious," Siegel said. "It can often take a good deal of time to read through an EMR. More and more people are copying notes and more and more progress reports are looking like text messages or short e-mails. So, [for Watson] to able to summarize, analyze, and synthesize what's in the EMR will be a significant step."

Furthermore, disparate entities in healthcare — doctors' offices, hospitals, managed care systems, academic centers, and insurance companies — each have their own records systems and databases that they don't share with others. And there is little incentive right now for them to open up their proprietary resources.

"Any one system … is really a small drop in the bucket compared to data on a national level," Siegel said. But getting people to agree to share data and connect data systems is going to be difficult, since "everyone seems to be rewarded by all the funding mechanisms for not sharing.

"If I have a big collection of data, then that's my competitive advantage to get funding from agencies such as the National Institutes of Health and others. So, how can I convince" hospitals and academic centers with diverse patient databases to "make their data public?"

On a smaller scale, particularly in genomic medicine, industry, academia, and the government are starting to share databases and are conducting pre-competitive exploratory studies to advance medical knowledge.

For example, the International Serious Adverse Events Consortium recently announced that it would mine the EMRs at nine HMOs to identify gene markers linked to drug-related adverse events. The international research consortium, which involves drug developers, academic researchers, and health regulators, makes the raw data from its studies available to all researchers, as long as they agree not to jeopardize the anonymity of individual study subjects and agree to use the information for "legitimate" drug development purposes (PGx Reporter 02/02/11).

Earlier this year, the National Institutes of Health announced it would create the National Center for Advancing Translational Science, which aims to build a national infrastructure for translational medicine, support the development of new diagnostics and therapeutics, and provide resources for collaborations between academia and the private sector. Through this new center, NIH is hoping to conduct the types of large genomic studies and research into treatments for rare diseases that drug developers are unable or unwilling to do themselves (PGx Reporter 01/26/11).

Even in the pharma industry, drug developers are more open to collaborating with their competitors when it comes to genomic medicine. Last year, three of the largest drug developers — Lilly, Merck, and Pfizer — formed the Asian Cancer Research Group, aiming to build "one of the most extensive pharmacogenomic cancer databases known to date." They also agreed to share precompetitive genomic data to advance treatments for lung and gastric cancers in Asia (PGx Reporter 02/24/10).

Up in the Cloud

In order to harness the full capability of Dr. Watson as IBM envisions it, genomic data will have to be collected much more readily in healthcare, all patient records will have to be stored in an electronic system, and data sharing will need to happen on a national and global scale. None of these activities is standard practice in healthcare today, and changing the system will take time.

According to Siegel, one idea for implementing Dr. Watson so it's available to the largest number of people is to put its computational power in the cloud. The use of cloud computing is on the rise in research settings and somewhat in the clinic, Siegel said. "You can certainly imagine it operating like Google Earth, for example, where on demand it would allow you access to the large medical databases, and you would then combine information that you had from a particular patient."

IBM and Aetna subsidiary ActiveHealth Management last year launched a new cloud computing and clinical decision support service, called the Collaborative Care Solution, that they said will enable doctors and hospitals to create a more detailed patient record from multiple sources and use advanced analytical software to more accurately diagnose patients.

Separately, Explorys, which operates a massively parallel cloud-computing platform, announced deals last year with MetroHealth System, Summa Health System, and University Hospitals health system. All three healthcare providers wanted access to Explorys' more than a billion curated clinical records to "search and analyze patient populations, treatment protocols, and clinical outcomes [in a] HIPAA and HITECH compliant environment." Explorys is aiming to expand its network nationwide.

These efforts to apply cloud computing technology don't specifically address DNA data, however, raising the question of whether people might be more sensitive about the dangers of sharing genomic data over the cloud.

Cloud computing is such a new technology that the privacy and ethical parameters of using this approach on a national or global scale still need to be worked out. Even though drug companies and healthcare providers have used "private clouds" to share data in internal networks, many of these players might balk at the idea of sending patient data out of their protected networks into a "public cloud" that grants them access to external computing networks.

"Will hospitals feel comfortable with the computational power being external to the hospital and then potentially shipping some information out of the hospital? Or will they want that to be located within the confines of their healthcare enterprise?" Siegel posited. "I think that will have to be something that is debated. I think the cloud approach will win as long as it's something that's private and secure, and the data are analyzed in a de-identified way."

Russ Altman, chairman of the bioengineering department at Stanford University, expressed similar reservations about the willingness of healthcare providers to analyze patient data through cloud computing, but felt that the privacy and ethical challenges weren't insurmountable.

"As an informatician I would love that, and I kind of drool at that thought. But as a practical matter, in terms of privacy — who has control of the record, and who's in charge of making sure that the records are accurate — that seems like a very difficult thing to do," Altman told PGx Reporter.

"Like the human genome, cloud computing is so new, we haven't really gotten our arms around all the security and policy issues," he said. "We know we can set up a lot of processors that can do a lot of processing and store a lot of data. But the technology is ahead of the social mechanisms for modulating all this."

The Work Ahead

The application of artificial intelligence to assist doctors with clinical diagnoses has been tried in various academic settings. Watson's precursors include the MYCIN system, one of the first AI platforms, developed at Stanford in the 1970s to help doctors identify the bacteria causing infections. Then came INTERNIST-I and CADUCEUS from the University of Pittsburgh (see end of article for more information). However, the slow analytical speed and complexity of these systems kept them restricted to academia and out of broad commercial use.

Altman described Watson's AI precursors as "brittle" systems that were hard to maintain and update. They were difficult to operate without the aid of hardware that is commonplace today, such as the mouse and high-resolution screens. These early systems "performed well under normal operating circumstances with a motivated user, but when there were odd features to a case or if the case fell outside their expertise, they failed, not gracefully," Altman said. Also, back then, "MDs didn't want that much help with diagnosis — it's a fun part of the job, and so there was not as much of a market as people thought."

As pressures on doctors' time and breadth of knowledge continue to grow, Dr. Watson's analytical power could potentially overcome many of the limitations of past expert systems when it comes to delivering accurate and timely clinical diagnoses. For the Jeopardy! challenge, Watson comprised 90 IBM Power 750 servers with a total of 2,880 processor cores, clustered over a 10-gigabit Ethernet network; the system could process 500 gigabytes of data per second.

All of the answers Watson gave during Jeopardy! came from knowledge that was stored in its memory, including the entire contents of Wikipedia, dictionaries, encyclopedias, novels, religious texts, and many other types of entries. Watson's computational power enabled it to answer questions in two to six seconds.

Dr. Watson may not need to answer physicians' queries that fast, but past AI systems took between 30 minutes and 90 minutes for an average consultation with a doctor, and that would be too slow for doctors today. In Altman's view, if an expert system like Watson is to have utility for doctors and hospitals, then it would have to answer a query about a particular patient in 30 seconds or less.

"Ten minutes is actually a lot of time for some practitioners who are seeing 60 patients a day," Altman said. "Physicians are time limited and the real value is talking with patients and gathering data. They will not be willing to type stuff in, so there must be voice recognition or even video recognition of images."

The development of an advanced AI system like Watson at the same time genomic medicine is evolving is fortuitous, according to Altman. "We don't even know how to incorporate genomic data across the healthcare continuum right now and we're just learning how to do it. So, that's kind of good, because as we figure it out we can use the help of advanced information technologies" like Watson, he said.

When it comes time for Watson to process genomic data, Altman noted that it may not be necessary to input all three billion base pairs of each person's genomic sequence into the medical record. "You'd probably want to do a more processed version, with the phenotypic consequences of that sequence," he said. "But we don't know how it's going to go."

Parsing Words

For Dr. Watson to become a reality in the delivery of personalized medicine, it will have to apply numerous algorithms specifically designed to extract genomic knowledge out of databases and the published literature. Many of these algorithms are currently under development by bioinformatics research groups.

For example, Altman and several colleagues at Stanford University are developing ontologies for extracting pharmacogenetics knowledge from the published literature. In a paper published in the Journal of Biomedical Informatics, Altman and his colleagues used a natural language parsing program developed by Stanford University's Chris Manning to map the grammatical structure of 87 million sentences from 17 million MEDLINE abstracts. The network of 40,000 semantic relationships developed out of this exercise may be "used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery," the study authors wrote.

"In the paper, we showed that we could go through a large number of PubMed abstracts and extract out genes and drugs and their relationships," Altman said. "We had previously done this by their co-occurrence in a sentence, but that can be filled with problems." For example, even if a sentence states that a gene is not linked to a drug, a co-occurrence approach will still conclude they are linked, simply because the gene and the drug appear together in the sentence.

"By looking at the verbs that are used you can get much higher precision of extraction of information. So, in a sense we're building a gene-drug type resource that's a little bit similar to Watson," Altman said. "We haven't gone to the point where this can answer questions very quickly, or put it into a question form, but the core idea of creating a knowledge base and having a system pull it out of the literature based on natural language is very similar to the ideas in Watson."
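The contrast Altman describes can be made concrete with a toy example. The sentences, gene and drug lists, and crude negation check below are hypothetical stand-ins for the real dependency-parsing approach, which inspects the full grammatical structure rather than a keyword pattern:

```python
# Sketch: sentence-level co-occurrence links every gene to every drug in
# a sentence, even when the sentence denies the relationship. Checking
# the words connecting the two mentions (here, a crude negation pattern
# standing in for a real parse) filters such false positives.
import re

SENTENCES = [
    "CYP2C9 variants alter the required dose of warfarin.",
    "BRCA1 status was not associated with response to tamoxifen.",
]
GENES = {"CYP2C9", "BRCA1"}
DRUGS = {"warfarin", "tamoxifen"}

def cooccurrence_pairs(sentence):
    """Link every gene to every drug that shares the sentence."""
    genes = [g for g in GENES if g in sentence]
    drugs = [d for d in DRUGS if d in sentence]
    return [(g, d) for g in genes for d in drugs]

def verb_aware_pairs(sentence):
    """Keep a pair only if no negation cue sits between the mentions."""
    pairs = []
    for g, d in cooccurrence_pairs(sentence):
        span = sentence[sentence.find(g):sentence.find(d)] or sentence
        if not re.search(r"\b(not|no|neither|lacked)\b", span):
            pairs.append((g, d))
    return pairs

for s in SENTENCES:
    print(cooccurrence_pairs(s), "->", verb_aware_pairs(s))
```

On the second sentence, co-occurrence extracts the BRCA1-tamoxifen pair while the verb-aware pass correctly discards it.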

This effort to build a semantic PGx network is related to the researchers' larger work cataloging gene variations and their impact on drug response for an online resource called the Pharmacogenomics Knowledge Base. Altman believes that the most immediate application of genomic knowledge in medical care is in the PGx space.

For Watson to be able to process and analyze genomic data, it will have to use similar semantic algorithms. For the Jeopardy! challenge, Watson's natural language processing capabilities were enabled by Princeton University's WordNet and other lexical resources.

Consequences of 'Toronto'

Currently, IBM is highlighting the use of Watson as a physician's assistant-type tool. However, the company plans to eventually make the system available to disparate entities in healthcare. University of Maryland's Siegel acknowledged that Dr. Watson could eventually have myriad applications in healthcare and meet the varying needs of different stakeholders.

"We have concentrated on the clinical applications, because we think it will have the highest visibility and the highest potential applicability," Siegel said. "But once we have a better understanding of how we can analyze large numbers of patient data, then that would be a really interesting and important use for it, particularly as there are increasing questions about the effectiveness of certain types of treatments."

The US government has launched an independent institute to conduct research on the relative effectiveness and cost value of medical interventions. This comparative-effectiveness data stands to be an important consideration when health regulators review products for marketing approval and insurers decide whether to pay for them. As such, payors, drug developers, and researchers are all ramping up efforts to conduct comparative effectiveness research. In this vein, a system like Watson could potentially be used to compare how patients respond to different medications.

Ultimately, the questions a payor, a doctor, or a researcher might ask of Dr. Watson will vary based on their interests. As such, it isn't hard to imagine instances when Watson may offer conflicting advice on the course of action for a particular patient, depending on whether a payor, a researcher, or a doctor is querying the system.

"This is a pretty standard problem in decision analysis," according to Altman. "It is more straightforward to model the decision-making of an individual with clear utilities for different outcomes. When multiple stakeholders are involved, it becomes an ill-defined problem.

"The patient, the family, doctors, the insurance company, and the hospital all have different motives and rewards, although they are all generally directed at patient welfare (some more than others)," he said via e-mail. "In the end, you must define which utility function is going to be maximized."
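Altman's point can be illustrated with a toy expected-utility calculation. All probabilities and utility values below are invented, purely to show how the recommended action can flip depending on whose utility function is maximized:

```python
# Toy decision analysis: identical outcome probabilities, scored under
# two invented stakeholder utility functions, yield different choices.

OUTCOME_PROBS = {                      # P(outcome | treatment)
    "aggressive":   {"cured": 0.7, "complication": 0.3},
    "conservative": {"cured": 0.5, "complication": 0.1, "unchanged": 0.4},
}

UTILITIES = {                          # invented stakeholder preferences
    "patient": {"cured": 100, "complication": -80,  "unchanged": 0},
    "payor":   {"cured": 60,  "complication": -200, "unchanged": -20},
}

def expected_utility(stakeholder, treatment):
    """Sum utility of each outcome weighted by its probability."""
    return sum(prob * UTILITIES[stakeholder][outcome]
               for outcome, prob in OUTCOME_PROBS[treatment].items())

def best_treatment(stakeholder):
    """Recommend the treatment maximizing this stakeholder's utility."""
    return max(OUTCOME_PROBS, key=lambda tx: expected_utility(stakeholder, tx))

print(best_treatment("patient"))   # aggressive (EU 46 vs. 42)
print(best_treatment("payor"))     # conservative (EU 2 vs. -18)
```

With these numbers the patient's utility favors the aggressive treatment while the payor's favors the conservative one, which is exactly the conflict Altman describes.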

Also, before a system like Watson can be broadly implemented, health regulators will need to okay its use, and policymakers will need to develop rules for privacy and ethical use guidelines. "If physicians are using it to make decisions, the US Food and Drug Administration may want to get involved," Altman pointed out. But because "it is hard to validate performance over all possible inputs, and there is no clear regulatory and ethical precedent for approval, there would have to be lots of thought about this."

Another question that has yet to be addressed with regard to the widespread use of Dr. Watson is the possibility that the system could misdiagnose a patient. When Watson answered "Toronto" to a Final Jeopardy! question under the US cities category, the penalty was only $947. However, if the technology is applied to the national healthcare space as IBM envisions, the consequences of a wrong answer could be dire.

Siegel isn't too worried about mistakes Dr. Watson might make, since the system is not meant to replace human doctors, who will hopefully catch inconsistencies in the computer's diagnosis. "I'm okay with Watson once in a while coming up with an answer like 'Toronto,'" Siegel said. "I'm used to people who are my assistants making mistakes just as I do. But I can tell you my day is much more efficient and effective, and safe having" fellows, medical students, and nurses, "work with me, even though I'm the final arbiter of decisions."

Likewise, in an editorial in Common Health, Isaac Kohane, codirector of Harvard Medical School's Center for Biomedical Informatics, wrote that an AI-enabled physician's assistant can ease the growing information burden on doctors and improve medical care in the US.

"Many will argue that correctly posing Jeopardy! questions is a long way from clinical diagnosis, but I am quite sure that 10 years ago, we would have been similarly skeptical that Watson could achieve its current performance," Kohane wrote. "Moreover, the corpora of medical literature and evidence that are growing exponentially and that so intimidate most training clinicians are exactly the grist that Dr. Watson requires to be accurate and timely. And by the way, Dr. Watson is going to find digesting all the genomic data a lot more palatable than most clinicians."

Even though Watson may beat Jeopardy! champions and answer difficult medical questions more quickly than doctors can, ultimately AI's biggest limitation remains that it will never be human.

"I think Watson may be okay at diagnosis, at being able to come up with statistical prognoses that are reasonable. It may be able to estimate prescriptions and doses, [but] it will have trouble touching patients in a human way," Altman said. Still, "people are working on that."


Have topics you'd like to see covered in Pharmacogenomics Reporter? Contact the editor at tray [at] genomeweb [.] com.


Dr. Watson's Precursors:

MYCIN was a rules-based artificial intelligence system developed at Stanford University in the early 1970s. It asked doctors a series of "yes" or "no" questions, used their answers to identify the bacteria likely causing a patient's infection, and suggested a course of treatment. The program suggested the correct treatment strategy in 69 percent of cases, a better record than Stanford faculty had at the time. However, MYCIN was never used outside of academia due to concerns about using a computer in the diagnosis of disease. Also, the system was hard to use and required lengthy sessions with the doctor: it took around 30 minutes to yield a diagnosis.

INTERNIST-I was a computer-assisted clinical diagnostics tool developed in the 1970s at the University of Pittsburgh. The system was used as part of a long-standing course at the university on problem solving in clinical diagnosis. The program employed ranking algorithms based on stored disease profiles, a method that attempted to come close to how doctors process complex information to arrive at a diagnosis. The system was never used outside the academic setting because the program wasn't easy to navigate and it took doctors too long to consult the system. According to published reports, average consultations with the system lasted between 30 and 90 minutes.
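The idea of ranking candidate diseases against stored profiles can be sketched roughly as below. The profiles, weights, and penalty factor are invented for illustration and are far simpler than INTERNIST-I's actual scoring scheme:

```python
# Sketch of profile-based diagnostic ranking: each disease profile maps
# findings to evidence weights; candidates are scored by how much of the
# patient's presentation they explain, penalized for expected findings
# that are absent. Profiles and weights are invented.

DISEASE_PROFILES = {
    "hepatitis":     {"jaundice": 3, "fatigue": 1, "nausea": 2},
    "gallstones":    {"jaundice": 2, "abdominal pain": 3, "nausea": 2},
    "mononucleosis": {"fatigue": 3, "fever": 2, "sore throat": 3},
}

def rank(findings):
    """Score each disease by explained evidence minus a penalty for
    profile findings the patient does not exhibit."""
    results = []
    for disease, profile in DISEASE_PROFILES.items():
        explained = sum(w for f, w in profile.items() if f in findings)
        missing = sum(w for f, w in profile.items() if f not in findings)
        results.append((disease, explained - 0.5 * missing))
    return sorted(results, key=lambda r: r[1], reverse=True)

for disease, score in rank({"jaundice", "nausea", "abdominal pain"}):
    print(f"{disease}: {score:.1f}")
```

For this patient, gallstones ranks first because its full profile is present, while mononucleosis scores poorly because none of its expected findings appear.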

CADUCEUS was an expert system developed in the mid-1980s by Harry Pople at the University of Pittsburgh and is considered one of the most advanced medical knowledge systems of its era. It was an attempt to improve on MYCIN and move beyond diagnosing bacterial infections to tackle diagnostic issues across a variety of diseases. The system applied advanced logic concepts to manage complex issues in internal medicine.