Jax, Microsoft Team to Mine Medical Literature for Curating Clinical Knowledgebase


CHICAGO – The Jackson Laboratory has turned to Microsoft artificial intelligence technology to curate its fast-growing Jax Clinical Knowledgebase (CKB) in hopes of delivering up-to-the-minute clinical insights for the treatment of cancer. The collaboration is also helping to inform research efforts.

Specifically, Jax is applying Project Hanover, a Microsoft AI product that relies on machine reading, deep learning, and probability-based logic to enable tumor boards to sort through the ever-expanding universe of biomedical research in search of appropriate cancer therapies. Redmond, Washington-based Microsoft said that this technology assists human curators in their quest to "leave no fact behind."

Peter Lee, corporate VP of Microsoft Healthcare, announced the collaboration at the HLTH conference in Las Vegas last week, and Microsoft shared some of his thoughts in a blog post.

"For something that really matters like cancer treatment where there are thousands of new research papers being published every day, we actually have a shot at having the machine read them all and help a board of cancer specialists answer questions about the latest research," Lee said in the post.

In a study of Hanover at Jax, a human curator extracted just two relevant patient responses to a drug targeting a specific mutation out of a sample of 823 scientific papers mentioning the drug. Hanover reduced the stack of papers to 43 within seconds and was able to extract 23 relevant pieces of information on patient response, according to Susan Mockus, associate director of clinical genomic market development for the Bar Harbor, Maine-based lab.

"The efficiency, the scale, and the relevancy, all of those components together make the Hanover platform so powerful," said Mockus, who works out of the Jackson Laboratory for Genomic Medicine in Farmington, Connecticut.

Mockus said that Jax and Microsoft are currently writing up their results for submission to at least one unspecified academic journal.

The Jax CKB contains structured data about cancer-related genetic mutations, cancer drugs, and patient responses to those drugs. This knowledgebase helps clinicians and researchers alike match mutation information to potential therapies and clinical trials.

However, like so many other biomedical research institutes, Jax has struggled to keep up with the estimated 4,000 new papers indexed by PubMed each day, of which about 200 are related to cancer, according to Mockus.

Project Hanover helps Jax triage the literature. "Reading and finding the relevant papers is not a good use of subject-matter expertise. We can use machines to make that process automated," she said.

Hanover ranks each paper by relevance. It does not make decisions on behalf of clinicians; instead, curators and clinicians at Jax and its partner provider organizations can go straight to the potentially relevant literature and examine only a few candidate papers.
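
The triage step described above can be pictured with a minimal sketch. It assumes each paper already carries a relevance score from some upstream model; the class name, field names, scores, and cutoffs are hypothetical and are not drawn from Project Hanover itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPaper:
    paper_id: str     # hypothetical identifier, not a real PubMed ID
    relevance: float  # hypothetical model score between 0 and 1


def shortlist(papers, top_k=3, min_score=0.5):
    """Rank papers by relevance and keep a short list for human review."""
    ranked = sorted(papers, key=lambda p: p.relevance, reverse=True)
    # Drop low-scoring papers, then cap the list so curators see only a few.
    return [p for p in ranked if p.relevance >= min_score][:top_k]


# Made-up scores purely for illustration.
papers = [
    ScoredPaper("paper-A", 0.12),
    ScoredPaper("paper-B", 0.91),
    ScoredPaper("paper-C", 0.47),
    ScoredPaper("paper-D", 0.78),
]

candidates = shortlist(papers)
```

The machine only orders and filters the stack here; whether each short-listed paper actually matters is still left to the curator.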

"It has really helped make the team more efficient," Mockus said.

Jax has so far implemented the Microsoft technology at a small scale. "It's at the point where we have high precision and high recall," she said. "We have used that to extract [knowledge] in a manual process [from] what comes off of Hanover, and now they are building the tools to make it automated into that Clinical Knowledgebase workflow."
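
For readers unfamiliar with the terms, precision is the share of flagged papers that are truly relevant, and recall is the share of truly relevant papers that get flagged. The sketch below computes both from two sets of paper identifiers; the function name and the sample identifiers are illustrative assumptions, not figures from the Jax study.

```python
def precision_recall(flagged, relevant):
    """Return (precision, recall) for a set of flagged papers."""
    true_positives = len(flagged & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: the system flags four papers, three of which are relevant,
# and it misses one relevant paper ("p5").
flagged = {"p1", "p2", "p3", "p4"}
relevant = {"p1", "p2", "p3", "p5"}

precision, recall = precision_recall(flagged, relevant)
```

In this toy case both values come out to 0.75: one flagged paper was a false alarm, and one relevant paper was missed.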

Jackson Laboratory started building CKB in 2014 and launched it two years later to support interpretation of data coming out of its in-house genomics lab.

"As we started to build this database, we knew it was a great tool that we wanted to share within the community to empower oncologists to make decisions as well," Mockus said. "But I always knew that there was going to be a problem with scale."

Mockus said that a colleague introduced her to Project Hanover lead researcher Hoifong Poon as someone who could potentially address the scale issue.

Poon said that the collaboration will allow Microsoft and Jax to explore a "new frontier in real-world evidence" by breaking through a bottleneck in the interpretation of molecular data caused by the sheer size of the literature.

Poon, director of precision health natural language processing for Microsoft's research operations, said that humans are good at curating and discerning relevant parts of text and datasets. However, they are not so good at doing so on the scale needed to keep up with the thousands of new biomedical papers that show up on PubMed every day.

"Human and computer actually have a lot of complementary aspects," Poon said.

He likened Hanover to automating the search for needles in a haystack.

"Most of the haystack does actually not even remotely look like needles, so that's where AI can come in," he said. "Even though machine reading is far from perfect at this point, it could actually be very good … to weed out a lot of those irrelevant things and then put the spotlight on the things that actually look like plausible needle candidates."

That allows users to narrow their manual work to this short list of candidates. "They are focusing on validating whether each candidate fact is indeed actually correct" and worthy of passing along to tumor boards for interpretation, Poon said.

One such user is the Maine Cancer Genomics Initiative (MCGI), a Jax-run project that works with health systems and oncology practices across Maine to deliver somatic testing results and interpretation services.

MCGI, which launched in 2016 and has been enrolling patients since 2017, has been using the Clinical Knowledgebase to help generate reports and to inform tumor boards that it convenes on behalf of its member hospitals. It has enrolled 1,100 patients, mostly from sparsely populated areas of what is largely a rural state with no academic clinical oncology practices.

"Having support from an AI-based system is ultimately going to be helpful because it sifts through all the information in a quicker way and a more efficient and effective way, so ultimately you're really getting the most up-to-date information for the individual patient at the time that you need it," said MCGI Medical Director Jens Rueter, a practicing oncologist in rural Brewer, Maine.

"If you're doing a tumor board today with a test result that was delivered a week ago, which is a fairly common scenario, then you want to know the most up-to-date information today and not a week ago," Rueter said. "You would want to know about any evidence that has accumulated in that one-week time span."

MCGI actually is taking minutes of genomic tumor board discussions and disseminating those notes to clinicians to add a subjective component to the decision-making process, according to Rueter. He said that clinicians often employ the minutes to justify genome-informed treatments to payors, particularly off-label prescriptions.

Rueter said that the statewide project is working with Jax to design a strategy for incorporating outcomes information into CKB. "I could imagine that down the road, this experiential data could also feed back into any sort of AI algorithm," he added.

Microsoft, of course, is one of dozens, if not hundreds, of companies offering technology to comb through medical literature.

Mockus said that Hanover goes deeper than just presenting drug names to verify the relevance of literature, but it is not trying to be an all-encompassing technology that promises to come up with clinical recommendations. It is more than a "black box" in that curators know what the system is presenting to them, she said.

Poon said that curators need the most help with finding relations between mutations and potential treatments, such as whether a mutation in a tumor is susceptible or resistant to a particular drug.

Natural-language processing outside biomedicine tends to look at binary relations or single phrases in text, but medicine is far too complex for such simple concepts.
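
As a rough illustration of why binary relations fall short, the sketch below contrasts a two-field relation with a richer record that also carries context such as cancer type and evidence type. All class names, field names, and values are hypothetical placeholders, not Project Hanover's actual data model.

```python
from typing import NamedTuple


class BinaryRelation(NamedTuple):
    drug: str
    mutation: str


class NaryRelation(NamedTuple):
    drug: str
    mutation: str
    cancer_type: str
    evidence_type: str
    response: str  # e.g. "sensitive" or "resistant"


# Placeholder facts: same drug and mutation, different context and outcome.
facts = [
    NaryRelation("drug-X", "mutation-Y", "cancer-A", "clinical study", "sensitive"),
    NaryRelation("drug-X", "mutation-Y", "cancer-B", "preclinical", "resistant"),
]


def collapse(fact):
    """Reduce a richer fact to a drug-mutation pair, discarding context."""
    return BinaryRelation(fact.drug, fact.mutation)


# Both facts collapse to the same pair, so the distinction between them is lost.
binary_view = {collapse(f) for f in facts}
```

Here two facts with opposite outcomes become indistinguishable once reduced to a drug-mutation pair, which is the kind of information loss a richer relation avoids.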

"We need to go beyond binary reads," and to start taking into account the cancer type and the evidence type, Poon said. 

"The coverage is really important because if you miss a fact, it could mean life or death for patients," Poon said. "You don't really have a lot of redundancy because your most valuable fact is probably only mentioned once in this paper, so we really need to tackle this more complex linguistic phenomenon."

Mockus said that the Jax-Microsoft collaboration for now is focused exclusively on somatic mutations in solid and hematological tumors, but there is potential to expand the scope in the future.

For example, she envisions using Hanover to look at the influence that epigenetics and microbiomes might have on oncology drugs. "Clinical Knowledgebase is ideally suited to incorporate that, but again, we're going to need Hanover to help us identify what's relevant," Mockus said.