Dana-Farber Releases Source Code for MatchMiner Cancer Trial Matcher


CHICAGO (GenomeWeb) – The Dana-Farber Cancer Institute has released the source code for MatchMiner, an open-source computational platform that matches patient genomic data to clinical trials.

The Boston institution's cBio Center last week published the full MatchMiner application on GitHub, including an application programming interface and user interface. Dana-Farber had previously released the matching engine.

MatchMiner automates the process of pairing candidates with clinical trials and vice versa, for the benefit of clinicians, trial investigators, and cancer patients alike.

"Finding the appropriate patients with rare genomic profiles, tumor types, etc., is challenging, so we were tasked with developing this computational platform," explained Catherine Del Vecchio Fitz, senior research scientist in clinical genomics at Dana-Farber.

According to a prepublication article about MatchMiner on bioRxiv, fewer than 5 percent of current cancer patients are participating in clinical trials, but Memorial Sloan Kettering Cancer Center in New York has seen its enrollment rate hit 11 percent for genotype-matched trials after automating notification of new patient matches. MD Anderson Cancer Center in Houston reported similar improvements.

"[T]here is emerging evidence that advanced automated informatics platforms can greatly improve trial matching and increase participation rates," the Dana-Farber paper said.

Fitz added that at Dana-Farber, "everyone realizes that there's a great need for helping to facilitate the interpretation of genomic profiling tests. One avenue is providing therapy recommendations. Another avenue is providing relevant clinical trial matches."

"But there's not really a great platform that exists that does this in a computational manner, partly because the trials themselves don't really exist in the structured formats," she added. Fitz noted that the US National Cancer Institute and others are trying to structure some eligibility criteria, but that did not exist when MatchMiner development began in late 2016, and it still has not been fully developed.

"It's really hard to design a computational system that will actually do automated matching because you need structured content on both sides, both in terms of a patient's genomic profile and in terms of eligibility requirements for the trial," Fitz said.

MatchMiner pulls from four main streams of data: genomic alterations, clinical data, structured clinical trial information, and real-time clinical trial status information. Because clinical trial information in particular tends to be unstructured, the Dana-Farber cBio Center developed a markup language called CTML, short for Clinical Trial Markup Language, to add structure.
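
To make the idea of matching structured data on both sides concrete, here is a minimal sketch in Python. It is not MatchMiner's code and does not use actual CTML syntax: the nested "and"/"or" criteria layout, the field names (gene, variant, tumor_type, status), and the trial identifiers are all hypothetical, chosen only to show how the four data streams could meet in one function.

```python
from typing import Any

# Hypothetical structured trial records (illustrative only, not real CTML).
# Each trial carries a status plus a tree of eligibility criteria.
trials = [
    {
        "id": "TRIAL-A",
        "status": "open",
        "criteria": {
            "and": [
                {"genomic": {"gene": "BRAF", "variant": "V600E"}},
                {"or": [
                    {"clinical": {"tumor_type": "Melanoma"}},
                    {"clinical": {"tumor_type": "Colorectal Cancer"}},
                ]},
            ]
        },
    },
    {
        "id": "TRIAL-B",
        "status": "closed",
        "criteria": {"genomic": {"gene": "BRAF"}},
    },
]

# Hypothetical patient record combining clinical data and genomic alterations.
patient = {
    "tumor_type": "Melanoma",
    "alterations": [
        {"gene": "BRAF", "variant": "V600E"},
        {"gene": "TP53", "variant": "R175H"},
    ],
}


def matches(node: dict[str, Any], patient: dict[str, Any]) -> bool:
    """Recursively evaluate a criteria tree against one patient."""
    if "and" in node:
        return all(matches(child, patient) for child in node["and"])
    if "or" in node:
        return any(matches(child, patient) for child in node["or"])
    if "genomic" in node:
        wanted = node["genomic"]
        # True if any single alteration satisfies every requested field.
        return any(
            all(alt.get(key) == value for key, value in wanted.items())
            for alt in patient["alterations"]
        )
    if "clinical" in node:
        return all(patient.get(k) == v for k, v in node["clinical"].items())
    raise ValueError(f"Unrecognized criteria node: {node!r}")


def open_matches(patient: dict[str, Any], trials: list[dict[str, Any]]) -> list[str]:
    """Return IDs of trials that are open and whose criteria the patient meets."""
    return [
        trial["id"]
        for trial in trials
        if trial["status"] == "open" and matches(trial["criteria"], patient)
    ]


print(open_matches(patient, trials))
```

In this toy example the closed trial is filtered out by status even though the patient's genomic profile would satisfy it, which is why up-to-date trial status matters alongside the eligibility criteria.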

"It's essentially a human-readable, machine-writable language, such that you can structure the content," said Fitz. CTML is meant for someone like Fitz, who has a background in biology rather than software programming, to be able to structure trial criteria in the language.

Even with the early success, much work remains.

Challenges include fitting MatchMiner into clinical workflows and integrating non-genomic eligibility criteria.

"Given the wide array of such criteria, this is a much greater challenge than integrating genomic data," the prepublication paper said. The team will have to modify CTML to include these non-genomic criteria and to pull structured clinical data from electronic medical records.

"We envision proposing a formal standard for CTML via widely used collaborative standard development processes such as the Global Alliance for Genomics and Health (GA4GH)," the paper said.

Since the paper went up on bioRxiv in November, Dana-Farber has been integrating MatchMiner into clinical workflows, including the institution's Epic Systems EMR. Fitz hopes the EMR integration will go live in the next couple of months.

"That will be a big asset in terms of clinicians being able to access trial matches for their patients during a visit, as they are navigating the EMR, without having to open another window," she said.

The bioinformatics team at Dana-Farber has also been trying to develop customized reports for oncologists. In addition to trial eligibility information, these reports might contain treatment recommendations and other means of expanding a knowledge base, according to Fitz.

"Also, we're experimenting with a model where we integrate with scheduling data so that we can say for a given oncologist, 'Here are your patients with appointments next week and here are the potential trial options,' to try to streamline the process for them," Fitz added.

With the release of the source code, Fitz wants to see MatchMiner deployed at multiple cancer centers, which is something else the technology developers have been working on since last year. Several other centers, which Fitz did not name, are "actively deploying MatchMiner" and curating trials to load onto the platform, she said.

Other institutions might store their data in different formats, which presents another potential roadblock.
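
As a rough illustration of how records in differing formats might be reconciled, the Python sketch below renames site-specific fields to one common shape. The field names on both sides, the mapping table, and the "p." prefix handling are assumptions made for this example, not a description of how MatchMiner or any particular cancer center stores its data.

```python
# Hypothetical mapping from site-specific field names to a shared schema.
FIELD_MAP = {
    "gene_symbol": "gene",
    "Hugo_Symbol": "gene",
    "protein_change": "variant",
    "HGVSp_Short": "variant",
}


def normalize(record: dict) -> dict:
    """Rename known fields to the shared schema; pass unknown fields through."""
    out = {}
    for key, value in record.items():
        out[FIELD_MAP.get(key, key)] = value
    # Assumption: some sources prefix protein changes with "p.", others do not.
    variant = out.get("variant")
    if isinstance(variant, str) and variant.startswith("p."):
        out["variant"] = variant[2:]
    return out


site_a = {"gene_symbol": "BRAF", "protein_change": "V600E"}
site_b = {"Hugo_Symbol": "BRAF", "HGVSp_Short": "p.V600E"}

# Both records reduce to the same normalized form.
print(normalize(site_a) == normalize(site_b))
```

A lookup table like this keeps the site-specific knowledge in one place, so a new institution would in principle only need to supply its own mapping.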

"In terms of the genomic information coming from patients, it might be in a slightly different format, but a lot of it can be mapped to something very similar," Fitz said. The MatchMiner developers are helping at least a couple of other cancer centers learn how to encode their own trials in CTML, she said.

The MatchMiner team also has had discussions with the National Cancer Institute about standardizing structured trial information. "We certainly would be happy to ingest the information from them if they were able to reliably curate and make it accessible for people," Fitz said.

The long-term hope is to establish a consortium for MatchMiner use beyond Dana-Farber.