In Sixth Year, Project GENIE Diversifying, Integrating Local Ancestry and Enhancing Clinical Data


NEW ORLEANS – The American Association for Cancer Research's Project GENIE will issue a call for new institutions to join the open-access clinical-genomic registry with the goal of increasing data from underserved, diverse communities.

At AACR's annual meeting, Philippe Bedard, a medical oncologist at the Princess Margaret Cancer Center in Toronto and a member of the steering committee for GENIE, said that the association wants to increase the proportion of minority patients and patients from rural populations represented in GENIE data. Institutions both inside and outside the US can apply. AACR will accept applications from May 1 to June 15 and announce new members in the winter of this year.

AACR launched Project GENIE, an international, open-source real-world data-sharing effort, in 2015. Over six years, the effort has grown from eight to 18 institutional cancer centers in North America and Europe that every six months contribute somatic DNA data from patients who have had their tumors evaluated using next-generation sequencing panels. They also share structured phenotype data including information about a sequenced patient's tumor type, histology, demographics, and basic vital and survival status.

The registry contains data on around 136,000 sequenced tumors from 121,000 patients with 110 major cancer types. Data on sequenced patients are made publicly available a year after testing was performed. In addition to the 18 institutional partners, Sage Bionetworks and cBioPortal are technology strategic partners.

So far, there have been 11 data releases, and the genomic and clinical data have generated nine publications. More than 10,000 users have registered to use the data for their own research. Bedard highlighted among a list of GENIE's accomplishments that Amgen used data from GENIE to support its regulatory filing for Lumakras (sotorasib) in advanced KRAS G12C-mutated non-small cell lung cancer.

Despite the registry's growth in content and utility, more than 80 percent of the clinical and genomic data currently comes from white patients, while around 7 percent comes from Black patients and 6 percent from Asian patients. AACR recognizes that these demographics don't reflect the patients who are treated at the institutions participating in GENIE, and "one of our key priorities is to expand the diversity within GENIE," Bedard said.

At AACR's annual meeting, researchers from GENIE's partner cancer centers described other efforts to enhance the data within the registry so researchers can start to deconvolute the complex interplay between genetics, ancestry, and environmental exposures and explore in more detail how patients fare on treatments according to cancer biomarkers.

Adding local ancestry

Memorial Sloan Kettering, one of the cancer institutions submitting data to GENIE, has amassed data on 50,000 Black cancer patients who have been profiled using its MSK-IMPACT NGS platform. Jian Carrot-Zhang, an assistant professor and data scientist conducting cancer research at MSK, pointed out that despite having access to genomic sequencing, Black cancer patients still had worse outcomes compared to their white counterparts.

While clearly there are other genetic and non-genetic factors at play that can explain this outcomes difference, "the complex relationship between race, somatic mutations, and clinical outcome is still not fully understood," Carrot-Zhang said. "The lack of knowledge about the somatic differences between populations is a major barrier to implementing precision medicine for the underserved population."

When researchers try to study and understand the multifactorial causes of healthcare disparities, they run into several challenges. Race or ethnicity isn't typically recorded in real-world care settings. Making things even more complicated is the fact that in the US, African-American and Hispanic populations have mixed ancestries and cannot be separated easily into discrete groups, she said. And while self-identified race can track certain shared social determinants of health between populations, it doesn't account for the genetic complexity.

To address some of these challenges, MSK exploited a feature of NGS panels, which is that they end up analyzing random DNA regions outside of the target regions. "We have developed computational methods to translate these off-target reads into rich information about genetic ancestry," Carrot-Zhang said.

Most GENIE samples don't have matched sequencing data on normal tissue samples, so MSK researchers confirmed that their ancestry inference method was accurate using tumor-only sequencing data. "Our method is not platform dependent," Carrot-Zhang added. "We can make accurate ancestry calls across [sequencing] platforms performed at different institutions" participating in GENIE.

To establish the performance of the ancestry inference method, her team used it to infer ancestry for 334,000 patients' tumor samples sequenced by Foundation Medicine, which lacked any associated ancestry or race data. The method inferred European ancestry for the majority (about three-quarters) of samples, but also assigned non-European ancestry to a significant portion of the samples.

Around 10 percent of the cohort, representing more than 32,000 patient samples, was associated with African ancestry, "making it one of the largest oncology cohorts with African ancestry," she said. "Our ancestry inference method allows for the inclusion of diverse populations that would otherwise be ignored due to a lack of reported information and provided substantial statistical power to identify ancestry-associated somatic driver mutations."

In Foundation's cohort, the researchers identified 165 ancestry-gene associations across 14 cancer types. Then, using data within GENIE, which has somatic DNA and clinical data on thousands of non-European patients, including patients treated and tested at MSK, they explored whether these ancestry-associated genes could explain outcomes disparities in Black patients.

The researchers showed, for example, that African-American renal cell cancer patients with NF2 mutations have significantly worse outcomes. Statistical analysis further indicated that African-American ancestry and NF2 mutations are independently associated with outcomes, which suggests that these somatic mutations can, at least partially, explain the outcome disparity.

MSK researchers are also using their off-target computational approach and tumor-only panel sequencing data to calculate ancestry at chromosomal segments within the genome — also known as local ancestry — which is important to determine in population genetic studies involving diverse populations.

"We can use local ancestry to infer the heritability of ancestry associations," Carrot-Zhang explained. "We're working with GENIE to make local ancestry calls for all GENIE samples and use local ancestry to identify somatic and outcome differences driven by germline alleles."

Adding this data to the real-world data already in GENIE can help researchers better understand the influence of ancestry-associated genes on patient outcomes, she noted. "Moreover, analyzing ancestry admixture will allow us to deconvolute the roles of genetics and social determinants in outcome disparities," Carrot-Zhang said, adding that MSK will also work with GENIE to include behavior and lifestyle data from electronic health records in analysis models.

Enhancing clinical and outcomes data

Researchers within another project started under GENIE, called the BioPharma Collaborative (BPC), are working on bolstering the phenotypic data in the registry with structured medical and treatment outcomes information on disease-specific cohorts.

While GENIE has rich genomic data on cancer patients, the associated clinical data is limited, according to Gregory Riely, vice chair of clinical research within MSK's thoracic oncology service. BPC exists to obtain the clinical data "that will allow us to explore how these genomic findings translated into patient outcomes," he said.

BPC involves 10 drug companies and four cancer centers that are annotating patients' tumor sequencing results with more detailed clinical and outcomes data in a standardized fashion using the PRISSMM model developed by Dana-Farber Cancer Institute. Using this data, researchers can look at the prevalence of tumor mutations according to patients' sites of cancer and even explore how patients with certain genomic tumor profiles do on treatments.

Riely noted that, on top of the basic demographic data already in GENIE, the BPC has layered additional data such as when patients started and stopped treatment and their real-world outcomes, including overall survival and progression-free survival.

The metastatic non-small cell lung cancer cohort in BPC, for example, contains data on up to five lines of therapy. Using it, researchers can investigate how long lung cancer patients live on conventional therapy, Riely said, and by adding the sequencing data in GENIE, they can further stratify outcomes on first-line platinum chemotherapy by tumor mutation. For example, using GENIE and BPC data together, researchers can see that metastatic NSCLC patients with STK11 mutations have worse overall survival than those without these mutations.
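The kind of mutation-stratified survival comparison described above is commonly done with a Kaplan-Meier estimate per group. The following is a minimal sketch of that general technique, not the BPC's actual analysis pipeline: the function names, the record layout, and the patient rows are all hypothetical, and the numbers are made up purely to exercise the code.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time.

    durations: follow-up time per patient (e.g. months from start of therapy)
    events:    1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy records, NOT GENIE data:
# (months of follow-up, death observed, STK11 mutated)
patients = [
    (3, 1, True), (5, 1, True), (7, 1, True), (9, 0, True), (11, 1, True),
    (8, 1, False), (14, 1, False), (20, 0, False), (23, 1, False), (30, 0, False),
]


def curve_for(mutated):
    """Kaplan-Meier curve for the subgroup with the given mutation status."""
    rows = [(t, e) for t, e, m in patients if m == mutated]
    return kaplan_meier([t for t, _ in rows], [e for _, e in rows])


km_mutant = curve_for(True)
km_wildtype = curve_for(False)
print("median OS, STK11-mutant (toy data):", median_survival(km_mutant))
print("median OS, STK11 wild-type (toy data):", median_survival(km_wildtype))
```

A real analysis would also need a formal comparison between groups (for example a log-rank test or a Cox model) and adjustment for confounders, which this sketch leaves out.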

Extracting outcomes data from real-world cohorts isn't easy, however. At the meeting, Riely described how BPC collaborators brought in progression-free survival data for patients. Curators evaluated CT scans and patient visits with oncologists that took place after a patient's index diagnosis date and scored whether the patient was "improving, stable, mixed, progressing, or indeterminate."

"This data allows us to get two flavors of progression-free survival," Riely said: progression-free survival-I, based on imaging reports, and progression-free survival-M, based on oncologists' notes.
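To make the two "flavors" concrete, here is a small sketch of how a progression-free survival interval could be derived from curated assessments for one evidence source at a time. The event rule (earliest "progressing" score or death, otherwise censor at the last assessment), the source labels `"imaging"` and `"med_onc"`, the function name, and the example dates are all assumptions for illustration; the article does not spell out the BPC's exact rules.

```python
from datetime import date


def pfs_days(index_date, assessments, source, death_date=None):
    """Return (days, event_observed) for one patient and one evidence source.

    assessments: list of (date, source, score) tuples, where source is
                 "imaging" or "med_onc" (hypothetical labels).
    Assumed rule: the event is the earliest "progressing" score from that
    source or death, whichever comes first; if neither occurred, the patient
    is censored at the last assessment from that source.
    """
    # Only assessments from this source made after the index date count.
    relevant = sorted(
        (d, score) for d, s, score in assessments
        if s == source and d > index_date
    )
    progressed = [d for d, score in relevant if score == "progressing"]
    event_dates = progressed[:1] + ([death_date] if death_date else [])
    if event_dates:
        return (min(event_dates) - index_date).days, True
    if relevant:
        return (relevant[-1][0] - index_date).days, False
    return 0, False  # no follow-up at all from this source


# Hypothetical curated record for a single patient (made-up dates).
index = date(2020, 1, 1)
curated = [
    (date(2020, 3, 1), "imaging", "stable"),
    (date(2020, 3, 10), "med_onc", "stable"),
    (date(2020, 6, 1), "imaging", "progressing"),
    (date(2020, 7, 15), "med_onc", "progressing"),
]

pfs_i = pfs_days(index, curated, "imaging")   # imaging-based interval
pfs_m = pfs_days(index, curated, "med_onc")   # oncologist-note-based interval
print("PFS-I:", pfs_i)
print("PFS-M:", pfs_m)
```

Keeping the source as a parameter means the same function yields both intervals from one curated record, so the two can be compared patient by patient.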

BPC researchers then examined how curator-derived progression-free survival compared with progression-free survival determined by RECIST criteria, the gold-standard system for measuring how solid tumor patients respond to treatments, in a cohort of NSCLC patients who received immune checkpoint inhibitors. They found that when curators used imaging data, their estimates were closer to RECIST measurements, but when they used the oncologists' notes, the progression-free survival estimate "lags a little bit," Riely said.

Data from 8,000 patients across six cancers (lung, colorectal, breast, pancreas, prostate, and bladder) have been curated as part of the BPC. These cohorts should become publicly available over the next year or so, with the NSCLC data slated for release later this month. According to Riely, BPC collaborators plan to curate an additional 18,000 tumors in other cancer types.