NEW YORK — A Broad Institute-led team has developed a new approach to generate polygenic risk scores that leads to improved predictions across populations.
PRSs combine the effects of sometimes hundreds or thousands of different genetic variants to gauge disease risk, but such scores have decreased accuracy when they are developed using data from one population and then applied to a more genetically distant population.
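As a rough illustration of how such a score is assembled, the sketch below computes a PRS as a weighted sum of risk-allele dosages. The function name `polygenic_score`, the toy dosage matrix, and the weights are hypothetical values chosen for illustration, not figures from the study.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one score per individual: the weighted sum of allele dosages.

    dosages: (n_individuals, n_variants) array of risk-allele counts (0, 1, or 2).
    weights: (n_variants,) array of per-variant effect-size estimates.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("one weight is required per variant")
    # Matrix-vector product: sums dosage * weight across variants per person.
    return dosages @ weights


# Hypothetical toy data: 3 individuals, 4 variants.
toy_dosages = [[0, 1, 2, 0],
               [1, 1, 0, 2],
               [2, 0, 1, 1]]
toy_weights = [0.10, -0.05, 0.20, 0.03]
toy_scores = polygenic_score(toy_dosages, toy_weights)
```

In practice the weights come from GWAS effect-size estimates, which is why a score built from one population's GWAS can transfer poorly to another.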
As genome-wide association studies have largely been conducted within populations of European ancestry, PRSs are less applicable to non-European populations, which could exacerbate existing healthcare disparities as scores are implemented in the clinic. PRSs that can be applied across ancestries have therefore drawn commercial interest.
"Polygenic risk scores have been a promising area right now. And a lot of people are talking about clinical translations and using polygenic risk scores to do early diagnosis or stratify patients," said the Broad's Tian Ge, also an assistant professor at Massachusetts General Hospital and Harvard Medical School. "But I think it is well recognized in the field that the predictive performance of polygenic risk scores has been quite imbalanced across populations and that is largely due to the Eurocentric bias of our GWAS studies."
To counteract some of that bias, Ge and his colleagues developed a new method for building PRSs, called PRS-CSx, that integrates summary statistics from genome-wide association studies of multiple populations. This approach, which relies on linkage disequilibrium data from the GWAS datasets, outperformed other means of constructing PRSs using both simulated and real data, as the researchers reported on Thursday in Nature Genetics.
In particular, PRS-CSx is based on a previously developed Bayesian polygenic modeling approach called PRS-CS. The newer method relies on the small though increasing number of GWAS using non-European populations. Even though these studies are orders of magnitude smaller than European GWAS, Ge said they provide key information on the genetics of non-European populations that PRS-CSx can harness. They can reveal, for instance, genetic concordance between European and non-European populations as well as highlight population-specific allele frequencies and linkage disequilibrium patterns.
The researchers created a computational model of those similarities and differences that they integrated to develop one PRS using data from multiple populations.
Previous methods to construct PRSs have largely relied on one population in which the PRS is trained, though other methods that rely on multiple discovery GWAS populations have also emerged. Ge noted that those other multiple discovery population approaches are largely post hoc and train PRSs independently in the different populations before then combining them, while PRS-CSx instead relies on a mathematical framework to integrate them going forward.
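The post hoc strategy Ge describes can be sketched roughly as follows: scores trained separately in each discovery population are standardized and then mixed with weights learned by ordinary least squares in a validation sample. This is a simplified illustration under stated assumptions, not the implementation of any of the tools named in this article; `combine_scores_post_hoc` and the simulated inputs are hypothetical.

```python
import numpy as np


def combine_scores_post_hoc(scores, phenotype):
    """Learn linear mixing weights for independently trained scores.

    scores: (n_individuals, n_populations) array, one column per
            population-specific PRS evaluated in the validation sample.
    phenotype: (n_individuals,) array of observed trait values.
    Returns (mixing weights, combined score).
    """
    scores = np.asarray(scores, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    # Standardize each score so the weights are comparable across columns.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    # Least-squares fit of the centered phenotype on the standardized scores.
    coef, *_ = np.linalg.lstsq(z, phenotype - phenotype.mean(), rcond=None)
    return coef, z @ coef


# Simulated, purely illustrative validation data (not from the study).
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
prs_a = signal + rng.normal(scale=1.0, size=n)  # less noisy score
prs_b = signal + rng.normal(scale=2.0, size=n)  # noisier score
trait = signal + rng.normal(scale=0.5, size=n)
mix, combined = combine_scores_post_hoc(np.column_stack([prs_a, prs_b]), trait)
```

Here the populations interact only at the final mixing step, after each score has already been fixed. A joint approach of the kind reported in the paper instead models the populations together while the variant weights are being estimated.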
The researchers compared their PRS-CSx method to three single-discovery and four multiple-discovery approaches: LD-informed pruning and P value thresholding (PT), PRS-CS, LDpred2, PT-meta, PT-mult, LDpred2-mult, and PRS-CS-mult.
Using both simulated and biobank data, Ge and his colleagues found that multiple-discovery approaches to PRS construction improved cross-ancestry predictions, with PRS-CSx often yielding the largest gains. For instance, when they applied these scores to 33 anthropometric or blood panel traits from UK Biobank and Biobank Japan data, they found that the Bayesian multi-discovery methods typically outperformed the single-discovery or PT-based multi-discovery methods.
In particular, when trained on UK Biobank (UKBB) and Biobank Japan (BBJ) data and applied to a European target population, PRS-CSx led to a consistent though small improvement in prediction ability compared to LDpred2 and PRS-CS. But when the target population was East Asian, PRS-CSx led to more pronounced improvements. The researchers noted a median relative improvement of 52.3 percent for PRS-CSx, as compared to LDpred2 trained on UKBB data, and 69.8 percent when LDpred2 was trained on BBJ data.
The researchers also applied their approach to predict schizophrenia risk using European and East Asian datasets from the Psychiatric Genomics Consortium. Again, the Bayesian multi-discovery approach increased prediction accuracy, in this case by 45.5 percent as compared to LDpred2 trained on East Asian GWAS and 104.9 percent as compared to LDpred2 trained on European GWAS.
As additional GWAS include more diverse populations, Ge said that it will become even more important to appropriately model data from different populations. "Right now, in some cases if the non-European GWAS was pretty small, the effects are still dominated by the European GWAS," he said. "But going forward, we expect a more balanced sample size and then this modeling framework will be more important."
Cross-ancestry PRSs are also of commercial interest. Earlier this week, for instance, Allelica announced it is partnering with Invitae to develop a breast cancer PRS that can be used in women no matter their ancestry, something that Myriad Genetics has also been working on.
But Ge said his lab is not immediately interested in commercializing their approach. Instead, they are working on further refining their method and integrating PRSs with established risk factors and electronic health records, which he noted would help them further improve their predictions.
In particular, Ge said some sites of the Electronic Medical Records and Genomics (eMERGE) study are implementing PRS-CSx to generate PRSs for type 2 diabetes. "We hope that in a couple of years, this method can be implemented in clinical settings, so we can return these predictions to patients," he added.
But Ge noted that while improved statistical approaches like the one he and his colleagues developed are important, they cannot completely overcome the limited number of GWAS in non-European populations. "It's important to both expand our non-European genomic studies and develop appropriate and more advanced statistical methods," he said. "Both are important for [addressing] health disparities."