NCI's Sharpless Calls Data Science 'Critical' to Cancer Research, Precision Oncology

CHICAGO – In the dawning age of precision oncology, bioinformatics has become indispensable to cancer research, and will continue to grow in importance, according to National Cancer Institute (NCI) Director Norman "Ned" Sharpless.

"I think the cancer research enterprise has really come to a point where the integration of data science into our work is critical," Sharpless said during a keynote address to the virtual American Medical Informatics Association (AMIA) Informatics Summit this week.

"I think data science is really an area where the NCI has a tremendously relevant role," Sharpless added, citing a "commitment to generating high-quality data and embracing really robust policies on data sharing."

Sharpless was at AMIA to discuss how medical informatics is transforming cancer research and, in a way, to sell the field of cancer research to bioinformaticians, because there is so much data to manage and analyze. He said that part of NCI's role involves bridging the worlds of cancer research and data science and training people who "speak both languages" to accelerate data-driven research.

"I know some in the audience are not card-carrying cancer researchers so much as data scientists interested in cancer," Sharpless said before launching into a review of some recent cancer research that could not have been achieved without bioinformatics.

He specifically mentioned the RAS Initiative, part of a collaboration between the NCI and the US Department of Energy to bring the DOE's supercomputing infrastructure, data analytics, and expertise in algorithm development and other areas to bear on cancer research efforts.

Although RAS was the first cancer gene identified in humans and is a common mutation site in several types of cancers, there are no drugs that specifically target it. Molecular modeling developed as part of the initiative has started to produce insights that have led to clinical trials. Sharpless said that this work is highly dependent on deep learning to narrow the search space for potential compounds.

Sharpless, who also served as acting commissioner of the US Food and Drug Administration in 2019, called it a bit of good news and a bit of bad news that researchers have come to understand that cancer really is hundreds or even thousands of different diseases, and even that each case might be unique.

"I think it's probably the most important change in the paradigm of cancer thinking in the last 30 years, but it also has made caring for patients much more difficult because every patient really is different and every tumor really is different," he said. "It has caused this real fragmentation of clinical care and it has brought to the fore the obvious need for data science."

Precision oncology is taking some of the trial and error out of therapy matching. "The precision oncology movement is to really find who's going to respond to what drug by looking at the molecular features of their tumor, the DNA mutations that drive the tumor, and the RNA expression profile of the tumor and epigenetic signatures," Sharpless noted.

Sharpless called precision oncology great for patients in that it has led to more effective treatments with less toxicity and fewer side effects. "But boy, does it create some data problems," he noted.

When Sharpless took the helm of NCI in 2017, he named four focus areas, one of which was "getting serious about the usage of big data." Though he quickly learned that data scientists didn't much care for the term "big data," he saw how passionate they always have been about managing, analyzing, and applying knowledge from the vast piles of information generated in the research lab as well as the medical clinic.

Sharpless, a medical oncologist, used to treat leukemia when he was still in clinical practice. For certain patients with acute myeloid leukemia, there had long been only two therapeutic options: an aggressive course of chemotherapy called 7 + 3, for the number of days that cytarabine and an anthracycline, respectively, are administered, and a milder treatment with DNA hypomethylating agents.

"It was effectively a coin flip," Sharpless said. "Sometimes we'd have a great outcome with one therapy or sometimes we'd do very poorly with one therapy [without] really understanding why."

He realized that if there was so much trial and error in a relatively common cancer, the odds of hitting on the proper treatment for a rare form were incredibly low.

"This really convinced me in a personal way that we needed to get to weight-aggregated datasets on our patients that had genomics and epigenetics and imaging and longitudinal clinical outcomes and any other sorts of other data" to allow more accurate study of tumor evolution, Sharpless said.

Today, molecularly informed clinical decision-making can still match cancer patients to targeted therapies only about 15 to 20 percent of the time, according to the NCI director. "We believe with further refinements and improvements in clinical care, we can expand on that," he said.

Sharpless said that the NCI has come to realize that building architecture to support data science in research really is part of the agency's mission.

"We're really understanding that the tumor and the human are a unique interaction that has to be understood in a precise way to really make the best progress for patients," he said.

Similarly, Sharpless said, "cancer isn't just a tumor encased in a human." Patient characteristics including the tumor microenvironment, an individual's immunome, comorbidities, diet, and socioeconomic status also have a bearing on outcomes. "Data science and informatics really has to play a role in understanding this part of care as well," he said.

He noted that it is becoming more common to treat tumors by molecular characteristics than by tissue type. "Now we know that certain subtypes of breast cancer, for example, have more in common with ovarian cancer than the other subtypes of breast cancer," Sharpless said.

Accordingly, the FDA has begun designating some therapeutic approvals by genomics and proteomics rather than the affected organ. "I think we will see many more tissue-agnostic approvals, as we are really learning the molecular features of a tumor matter more than the site of origin," he said.

Sharpless also discussed some of the integrated datasets that NCI already offers, including the Genomic Data Commons, a core component of the Cancer Moonshot program that is supported by funding from the 21st Century Cures Act.

"We are really understanding the need to get sufficient volume of data that's highly representative of the population that we study, that is machine-readable and well annotated," Sharpless said. He said that NCI also is concerned about issues such as eliminating bias and noise, providing large enough samples for many types of research, and protecting subjects from the inadvertent disclosure of their identities.

"The data have to be truly diverse and trained on tools that are not inherently biased, or else we run the risk of perpetuating and exacerbating racial and ethnic and gender and other biases that contribute to disparities," Sharpless said.

He also called the more nascent Cancer Research Data Commons (CRDC) a "signature effort" of NCI on the informatics front.

The CRDC, which follows the "FAIR" principles of data findability, accessibility, interoperability, and reusability, hosts genomic, proteomic, imaging, and clinical oncology data for both humans and canines. The infrastructure features cloud-based analytics tools including visualization technology that Sharpless said serve as a foundation for development of AI and machine learning models.

Sharpless took several questions submitted by the online audience near the end of the hourlong AMIA session.

Session host Samuel Volchenboum, director of the Center for Research Informatics and a pediatric oncologist at the University of Chicago, asked Sharpless whether oncology is headed to a place where physicians might have to make computer-aided medical decisions without being able to explain the algorithms or methodology to patients.

Sharpless said that this is already happening today in specialties like radiology that rely so heavily on FDA-cleared medical devices. "I think that predicting why certain markers are associated with a certain outcome is a scientific endeavor where clinicians are pretty comfortable with the idea that we don't fully understand why this works," he said.

"I think this is going to happen more and more, and we will have to learn how to explain that to patients," Sharpless added.

During his time at the FDA, Sharpless grappled with regulation of AI-enabled medical devices. "You have to really regulate the process around that device" rather than a specific algorithm in the device itself, he said.

Sharpless also said that the NCI has had to walk a fine line in the realm of open data, balancing the needs of the research community to have adequate study cohorts with the desire of individual scientists to protect some of their intellectual property while also guarding patient privacy.

Sharpless called the NCI's Cancer Genome Atlas (TCGA) "one of the most successful data-sharing initiatives ever in the history of American research." He said that TCGA has proven that the research community will embrace and "do great things with" high-quality, aggregated datasets put in the public domain.

While he called the potential for inadvertently reidentifying aggregated genomic information "really significant," Sharpless said that the institute generally leans toward open data. "The NCI really believes that we have to make these data available to the research community because there's really no other way to make progress without them," he said.

The way to address this risk is through credentialed access, he said, much as NCI has done with TCGA.

Another area in need of clarity is international data-sharing, which Sharpless said has been complicated by privacy laws like the European Union's General Data Protection Regulation (GDPR), in effect since 2018. According to Sharpless, NIH hasn't struck a data-sharing agreement with any continental European country since GDPR came about, potentially hindering cross-border research.

"It has really snarled the ability to work across the Atlantic because of concerns about the liability issues in Europe related to GDPR," he said. Willful violation of GDPR could carry fines as high as €20 million ($23.7 million) or 4 percent of a company's annual global revenues, whichever is higher, but that does not change the importance of having access to broad datasets in cancer research, according to Sharpless.