Risk-Assessment Phase of eMERGE Looks to Diversify Polygenic Risk Scoring


CHICAGO – This month, the US National Institutes of Health announced $75 million in funding over five years to add risk-assessment capabilities to the Electronic Medical Records and Genomics (eMERGE) network, marking the beginning of the fourth phase of eMERGE.

Specifically, NIH's National Human Genome Research Institute (NHGRI) is funding the creation of the Genomic Risk Assessment and Management Network, which will set protocols and methodologies for measuring and applying genotypic and phenotypic risk for potentially dozens of diseases among diverse populations. About $61 million of the money is earmarked for four general and six "enhanced diversity" clinical sites, while $13.4 million will go to Vanderbilt University Medical Center to coordinate the project.

Vanderbilt has hosted the coordinating center for earlier phases of eMERGE.

The four general sites, at Mayo Clinic, Vanderbilt, Boston's Brigham and Women's Hospital, and Northwestern University, plan on recruiting a total of 10,000 patients, of whom 35 percent should come from racial or ethnic minority groups, underserved populations, or populations that typically have poor medical outcomes, NIH said.

The remaining sites will together recruit 15,000 patients, including at least 75 percent from one of the diverse backgrounds. These "enhanced diversity" centers include University of Alabama at Birmingham, Icahn School of Medicine at Mount Sinai in New York, Cincinnati Children's Hospital Medical Center, Columbia University, Children's Hospital of Philadelphia, and the University of Washington Medical Center.

Each participating institution chose in advance whether to apply at the 35 percent or 75 percent diversity level.
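As a rough check on those recruitment targets, the short Python sketch below works out how many participants from diverse or underserved backgrounds the NIH figures imply. The function name is illustrative, and treating the 35 percent figure as a floor is an assumption; NIH described it only as a target.

```python
def diverse_minimum(total_recruits: int, percent_diverse: int) -> int:
    """Participants from diverse backgrounds implied by a percentage target."""
    return total_recruits * percent_diverse // 100


# Figures as reported by NIH; how each site meets them is not specified.
general = diverse_minimum(10_000, 35)    # four general sites combined
enhanced = diverse_minimum(15_000, 75)   # six enhanced-diversity sites combined
combined = general + enhanced
share = combined / (10_000 + 15_000)

print(f"General sites: {general:,}")
print(f"Enhanced-diversity sites: {enhanced:,}")
print(f"Network-wide: {combined:,} of 25,000 ({share:.0%})")
```

Under those assumptions, the targets imply at least 3,500 and 11,250 participants, respectively, or about 59 percent of the 25,000-person cohort.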

The enhanced-diversity sites are meant to create risk assessments that are more suitable for nonwhite populations than current approaches.

"It appears that the ability of [existing] predictive scoring systems based on the European ancestry don't work very well for many situations with African ancestry," said John Harley, director of the Center for Autoimmune Genomics and Etiology at Cincinnati Children's, who leads the eMERGE project's enhanced-diversity clinical site there.

These enhanced-diversity sites will be looking to overcome the historic slant toward white populations in the calculation of polygenic risk scores and add new types of clinical data including age, body-mass index, and lifestyle factors such as alcohol use, according to NIH. All 10 participating institutions also will seek to incorporate risk scores into electronic health records and clinical decision support to inform future prevention and care.

To get there, eMERGE Genomic Risk Assessment and Management Network participants will have to develop metrics for measuring risk scores, or at least update existing metrics based on new data they expect to generate, according to Harley.

The eMERGE clinical sites will be conducting low-depth whole-genome sequencing, just enough to produce genotypes. Harley said that Cincinnati Children's is looking at sequencing at a read depth of 1 to 1.4x. "We'll compare that to what the arrays do and how well they perform for imputation and whether it makes a difference to the polygenic risk scores," he said.

Risk scores will consider phenotypic and environmental data as well. The participants will have to develop and test algorithms for this purpose.

Later, the eMERGE leads will be working with IT departments in their organizations to integrate risk scores into the electronic health records and clinical decision support, ideally for clinical as well as research purposes.

"We would hope that the polygenic risk scores and the environmental risk estimates would have actual utility," Harley said. He did acknowledge that it could take a while to develop a tool that clinicians would want to adopt, though.

"The whole focus for eMERGE is to try to figure out how to implement these," Harley noted. That includes the technology infrastructure, the creation of polygenic risk scores, and the application of risk scores to answer medical questions.

Building on history

This fourth phase builds on earlier eMERGE work.

The program started in 2007 with requirements that seem laughable now: each of the five participating sites had to bring 3,000 patients who simply had DNA samples and their medical records in electronic form. This actually predated a $35 billion federal incentive program that started in 2011 to encourage healthcare providers to ditch their paper charts for EHRs.

Each site had to propose a phenotype to study in that cohort of 3,000 patients, such as susceptibility to coronary artery disease, susceptibility to dementia, or a range of normal electrocardiographic intervals, then conduct a genome-wide association study, according to Dan Roden, senior VP for personalized medicine at Vanderbilt, who has been involved in the eMERGE network since its inception.

The initial round was meant to explore how EHRs could serve as a research tool. "We didn't think much in Round 1 of returning much to patients and we didn't think much about how what we were doing would bend any kind of cost curve for care or for efficiency of care," Roden recalled.

In the second round, NHGRI and its grant recipients began thinking about how knowledge of these genotypes and phenotypes could fit into clinical care, such as through pharmacogenomics.

In the third phase, which started in 2015, NHGRI awarded more than $48.6 million over four years to support research into potential medical effects of rare genomic variants in about 100 clinically relevant genes. That round resulted in 25,000 patients being sequenced.

From a data perspective, the coordinating center looked at how it could help the research sites use the EHR as a phenotyping tool for such jobs as establishing function for variants of uncertain significance and assessing penetrance for pathogenic variants, Roden said.

"Now, the [new] round is really quite different," Roden said. "This next cycle really is all about finding people who are at increased risk for the diseases that we will focus on."

Moving to phase 4

The new grants for the Genomic Risk Assessment and Management Network cover five years of work, which is an eternity in the genomics field. "The rest of the world won't stand still while we try to do this project, so we expect a lot of changes to happen under our feet as we continue forward," Harley said.

Right now, the grant recipients are trying to make sense of the "complicated and confusing" genomics related to the diseases they are studying simply so they can figure out what questions they want to answer, according to Harley.

After the first year or two, Harley expects to be actively recruiting patients, to have built a working pipeline, to have polygenic risk scores in development, and to be testing ideas for integrating these scores with environmental risk factors. He also hopes to be at the point where his team is receiving feedback from physicians and the parents of newborns in the cohort.

"Once we have the infrastructure as a foundation, we can ask those questions, and it would be wonderful if we could find the capacity to continue the birth cohort into childhood and adolescence and young adulthood at least," Harley said.

Cincinnati Children's will be enrolling newborns. The medical center has relationships with area hospitals that account for about 2,000 births annually, about half to minority women, according to Harley. "We thought that we could concentrate on that group, especially since the healthcare disparities are acute and they have more difficult problems," he said.

The Ohio pediatric hospital is concentrating on mother-newborn dyads, and will include fathers and siblings who wish to participate. The hospital will follow the birth cohort in hopes of turning the eMERGE work into a longitudinal study. "Ideally, if you're setting up a birth cohort, we'd find the resources to be able to continue it for longer than five years," Harley said.

"We'll be following them forward, making these estimates of their genetic risks and taking into account the environmental issues that they face," he added. The hospital will then create interventions to reduce the likelihood of these children developing the conditions predicted by their genotype-phenotype combinations.

The initial list of conditions that Cincinnati Children's wants to look at includes asthma, atopic dermatitis, obesity, hypercholesterolemia, hypertension, prematurity, and breast cancer. Harley said that, for the time being, the eMERGE grant recipients plan to meet every few weeks, remotely during the COVID-19 pandemic, to come up with "phenotypes of interest."

He said that the center is still in the organizational stage in terms of how many people will be involved in developing metrics, collecting data, and deciding what questions they hope to answer with the research.

"We'll decrease the risk and see if we can have an impact on overall health and reduce the burden of some of these disorders," such as atopic dermatitis, Harley said. "I think we might make a difference to the whole lifetime of these new babies."

Many of the diseases under study do not manifest until adulthood, but Cincinnati Children's will be producing polygenic risk scores on the infants, their mothers, and those fathers who opt in. Apart from Cincinnati Children's and Children's Hospital of Philadelphia, all the Genomic Risk Assessment and Management Network centers are adult hospitals.

Having polygenic risk scores on young children for adult-onset diseases will raise some interesting questions, according to Harley. For example, Ohio gives parents the right to make healthcare decisions for their children until they turn 18. What responsibility does a pediatrician have to inform parents if their teenage child has a BRCA mutation?

Heide Aungst, operational lead for the Precision Genomics Midwest program at the Cincinnati Children's Center for Pediatric Genomics, noted that bioethicists for the last several years have been grappling with what is known as the child's right to an "open future" about their genetics. She said that the hospital will have to work through what to tell children and adolescents as they mature.

Roden leads Vanderbilt's clinical site for this new phase of eMERGE.

Roden said that the eMERGE steering committee among the 10 sites will have to decide how to weigh clinical risk, family history, and the polygenic risk score in determining which patients need early interventions.

"I think we're also going to have to think about whether we're going to look for rare variants for certain diseases," he said. "For breast cancer, you couldn't say to somebody, 'You're at low risk' without looking at BRCA1 and 2."

The NIH request for applications asked each applicant to propose 20 diseases to study, including five that would get extra emphasis. With 10 sites receiving grants, this eMERGE risk-assessment network could potentially look at as many as 200 different conditions, though there is crossover in more common ailments including coronary disease, diabetes, and chronic kidney disease.

Roden said that the site leaders will have to come up with a "rough outline" for this new phase, including deciding on the accuracy of risk predictors and how to act on findings that patients are at elevated risk. These discussions are likely to take at least the rest of the summer, he said.

"What we'd like to do is decide on the diseases, decide on a standardized set of risk predictors for those diseases, decide on the metrics that we will use to decide that somebody is at high-enough risk that they should be notified, and then decide on how to capture healthcare outcomes after that notification occurs," Roden said. All of that then needs to be automated within the EHR, and each participating institution has different parameters and customizations for its EHR system.

Also still to be decided is how to define "high risk" to determine which patients need to be closely followed on their healthcare journeys. "Is it the top 1 percent of the population? The top 3 percent?" Roden wondered.
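To make the percentile question concrete, here is a minimal Python sketch of how a "top X percent" rule could translate into a score cutoff. It is illustrative only: the function names are hypothetical, the eMERGE sites have not settled on a definition, and a real rule would also have to weigh clinical risk and family history rather than rank on a single score.

```python
def high_risk_cutoff(scores, top_fraction):
    """Return the lowest score that still falls within the top `top_fraction`."""
    if not 0 < top_fraction < 1:
        raise ValueError("top_fraction must be between 0 and 1")
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    # Always flag at least one person, even in a very small cohort.
    n_flagged = max(1, round(len(ranked) * top_fraction))
    return ranked[n_flagged - 1]


def flag_high_risk(scores, top_fraction):
    """Mark each score True if it is at or above the cutoff."""
    cutoff = high_risk_cutoff(scores, top_fraction)
    return [score >= cutoff for score in scores]


# Toy cohort of 1,000 distinct scores (not real data).
toy_scores = list(range(1000))
for fraction in (0.01, 0.03):
    flagged = flag_high_risk(toy_scores, fraction)
    print(f"top {fraction:.0%}: {sum(flagged)} of {len(toy_scores)} flagged")
```

The choice matters at scale: in a cohort of 25,000, a 1 percent rule would flag roughly 250 people, while a 3 percent rule would flag roughly 750, tripling the follow-up workload.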

Based on earlier eMERGE stages, Roden said that it is likely that all 10 sites will agree to define high risk the same way, for the sake of simplicity.

"There are other little nuances that we're going to have to settle," Roden added, citing the "open future" question of whether children should be told of their risk for developing Alzheimer's disease. On the opposite end of the age spectrum, should 75-year-olds be informed of their risk for coronary disease?

"I would argue that there's probably not much to be gained there because 75-year-olds aren't at high risk for much because they got to be 75," Roden said.

Adding a polygenic risk score for heart disease for patients in their 50s does not add a whole lot to the calculation of clinical risk, according to Roden. Nongenetic risk factors, such as diabetes, hypertension, hyperlipidemia, and smoking are strong predictors of heart attacks in that age range. But polygenic risk for coronary disease is more important in younger patients.

"We don't know what to do with a bunch of other diseases where we have polygenic risk scores and they may be useful," Roden said. "I'd like to see where they fit into the grand scheme of things rather than telling everybody, 'Here you are. You're at high risk for atrial fibrillation or prostate cancer.'"

Ultimately, the sites are expected to incorporate the risk scoring into clinical decision support, something that has not yet been done successfully on a wide scale anywhere.

"Polygenic risk scores have really burst on the scene pretty recently," Roden said. While the idea has been around for a while, genomic researchers have only begun in the last couple of years to look at whether such scores have clinical utility.

EHR integration will have to be handled locally because of the differences not only in vendors, but in customizations and underlying IT infrastructure, Roden said.

"But I think what we're going to try to do is set up the ground rules for how that integration occurs," Roden said. That means standardizing rules for when and what information is delivered to patients and providers.

The data perspective

While the new phase focuses on polygenic risk scores, Vanderbilt bioinformatician Josh Peterson, who leads the eMERGE Coordinating Center, cautioned against ignoring monogenic risk.

"Part of our applications and the requests for applications the NIH sent out was the idea that clinical risks, family history, monogenic risks, and polygenic risks would all be data that will be collected and used in this phase of eMERGE," he said.

Polygenic risk data adds an extra layer of complexity to the work of the coordinating center, which has to centralize information in a way that supports research not only across those four data classes, but also on individual patients.

Peterson gave the example of a hypothetical 45-year-old woman with some family history of breast cancer, a polygenic risk score for breast cancer, potentially a BRCA mutation, and several clinical risk factors for breast cancer, including an elevated Gail score. Peterson noted that the Gail score does not use any genetics at all.

"Those are the four different types of things that you might want to consider before you communicate with her about an individualized risk for breast cancer," he said.

Thus, it is the job of the coordinating center to assemble all this data and present it in a form that is easily accessible and understandable for any of the participating researchers.
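One way to picture that assembly task is as a single per-participant record holding the four data classes Peterson described. The Python sketch below is a hypothetical illustration, not the coordinating center's actual data model: the class name, field names, and types are all assumptions, and it deliberately stops short of combining the inputs into one risk estimate, since the network has not yet decided how to weigh them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskInputs:
    """Hypothetical per-participant record for one disease (e.g. breast cancer)."""
    clinical_risk: Optional[float] = None      # e.g. a Gail score; uses no genetics
    family_history: Optional[bool] = None      # any relevant family history reported
    monogenic_finding: Optional[bool] = None   # e.g. a pathogenic BRCA variant
    polygenic_score: Optional[float] = None    # polygenic risk score for the disease

    def missing(self) -> list[str]:
        """Names of the data classes not yet collected for this participant."""
        return [name for name, value in vars(self).items() if value is None]

    def is_complete(self) -> bool:
        """True only when all four data classes are available."""
        return not self.missing()


# Example: three of the four inputs are on file (values are made up).
record = RiskInputs(clinical_risk=2.1, family_history=True, polygenic_score=1.3)
print(record.is_complete())
print(record.missing())
```

Keeping "not yet collected" (None) distinct from a negative result (False) is the main design point here: a missing BRCA test is not the same as a BRCA test that found nothing.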

Peterson noted that eMERGE has historically had a rather even split between phenotypic and genotypic data. In the previous round, the coordinating center subcontracted with the University of Washington to manage phenotypic data. In trying to bring multisite genomic data together in a single cloud environment, NIH in 2018 created the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL), a resource for computing across large genomic and related datasets generated by NHGRI-funded centers and projects.

While the Broad Institute and Johns Hopkins University are the lead institutions building and managing AnVIL, Vanderbilt is a collaborator. The AnVIL resource shares some of the same cloud infrastructure as the data center for NIH's All of Us population genomics research project.

Peterson said that the coordinating center has migrated data collected during eMERGE Phase 3 to the AnVIL environment. Data from the new fourth phase will go there as well.

The eMERGE dataset that has been loaded onto the AnVIL platform contains information on more than 145,000 participants from the three previous phases. AnVIL was designed to support cloud analytics without researchers having to download the entire dataset or even specific cohorts, Peterson said.

The risk-assessment network's work can itself be broken into two phases, according to Peterson. In the first, Vanderbilt and its partners will validate polygenic risk scores using data from both eMERGE and large external sources, potentially including All of Us, the Million Veteran Program, UK Biobank, and Vanderbilt's own BioVU biobank. Later, the coordinating center will return prospective risk scores.

Peterson eventually wants to be able to reproduce those results across ethnicities. "That's obviously one of the major pitfalls of polygenic risk scores right now," he said.

"We anticipate that we will have a high proportion of underrepresented minorities in the prospective cohort, but before we even get to the point of saying something to those participants, we need to make sure we have validated the polygenic risk scores in a group of their peers," Peterson said. That will require sufficient retrospective data on diverse populations to validate the scores.