NEW YORK — A dearth of genome-wide association studies focused on non-European populations, plus issues related to applying polygenic risk scores in populations with multiple ancestries, has spurred researchers to design new methods and tools to address these questions.
Multiple presentations at the recent European Society of Human Genetics meeting focused on questions related to diversity in GWAS as well as applying risk scores in admixed populations, issues that are inextricably linked. The event was held in Vienna and virtually earlier this week.
Elizabeth Atkinson, an assistant professor of molecular and human genetics at Baylor College of Medicine, noted in her talk that while around 80 percent of participants in genome-wide association studies are Europeans, only 16 percent of the global population is European.
In her words, this "Eurocentricity" in association studies trickles down into health disparities and the implementation of genomic medicine. "Obviously, there is an equity issue," Atkinson said.
Population groups with multiple ancestries are often left out of association studies too, because researchers are concerned that population stratification will seep into their analyses and bias the final results, Atkinson added.
Yet populations around the world are admixed, a fact that companies such as MyOme, Myriad, and Invitae have sought to address in recent years by launching test services that factor ancestry into their analyses. Elsewhere at ESHG, participants heard from researchers focused on African, Asian, Latin American, and Greenlandic populations. These researchers face the challenge of applying polygenic risk scores derived from European datasets, such as the UK Biobank, in populations different enough that the scores have limited utility.
"The predictivity of a polygenic risk score decreases the farther away you get from the UK," noted Luca Pagani, an associate professor of population genetics at the University of Tartu's Institute of Genomics in Estonia. "That is a reflection of how population structure around the world affects predictivity, but also changes the less European you are in your genome," he said.
Pagani's presentation focused on ancestry deconvolution and polygenic scores. He also highlighted previously published work on the topic, such as a 2020 Nature Communications paper in which the various ancestral components of an admixed individual were first deconvoluted, and then subjected to analysis with population-specific polygenic scores.
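The deconvolute-then-score idea can be illustrated with a minimal Python sketch. It is not the method of the paper Pagani cited: the function names, the toy dosages and weights, and the choice to combine partial scores by simple summation are all assumptions made for illustration. It assumes local ancestry inference has already assigned each effect allele to an ancestry.

```python
import numpy as np


def ancestry_partial_scores(dosage_by_ancestry, weights_by_ancestry):
    """Compute one partial polygenic score per ancestry for one individual.

    dosage_by_ancestry: maps an ancestry label to per-variant counts of the
        effect allele carried on segments assigned to that ancestry (0, 1 or 2).
    weights_by_ancestry: maps the same labels to per-variant effect sizes
        estimated in a matching population (hypothetical values here).
    """
    scores = {}
    for ancestry, dosage in dosage_by_ancestry.items():
        weights = np.asarray(weights_by_ancestry[ancestry], dtype=float)
        # Weighted sum of effect-allele counts for this ancestry component.
        scores[ancestry] = float(np.dot(np.asarray(dosage, dtype=float), weights))
    return scores


def combined_score(partial_scores):
    # Assumption: partial scores are simply added; other weightings are possible.
    return sum(partial_scores.values())


# Toy individual with three variants and two ancestry components.
dosage = {"EUR": [1, 0, 2], "AFR": [0, 1, 0]}
weights = {"EUR": [0.10, 0.20, 0.05], "AFR": [0.30, 0.10, 0.02]}

partial = ancestry_partial_scores(dosage, weights)
total = combined_score(partial)
print(partial, total)
```

In this toy case the same variant contributes with a different weight depending on which ancestral segment carries it, which is the point of scoring the components separately.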
Baylor's Atkinson also highlighted such an approach, and discussed a statistical tool and software package called Tractor, which her lab has used to include admixed individuals in association studies. The group published work featuring Tractor last year, in which it claimed that the tool can generate ancestry-specific effect-size estimates and P values.
Atkinson noted in her ESHG talk that Tractor earned its name because it can "scoop out" different ancestral components and compare them to similar populations to obtain ancestry-specific estimates. She said it could be applicable in "post-GWAS applications," such as building polygenic risk scores for admixed populations. She added that her group is currently using Tractor to improve the transferability of polygenic risk scores, and is preparing a list of best practices for ancestry deconvolution.
In a follow-up email, Atkinson said the group is finalizing a draft manuscript related to best practices at the moment.
One session at ESHG focused specifically on applying polygenic risk scores in diverse populations. Participants heard from Segun Fatumo, an associate professor of genetic epidemiology and bioinformatics at the London School of Hygiene and Tropical Medicine, who warned that the number of GWAS focused on African populations has actually decreased in recent years.
Fatumo also discussed a study, published this month in Nature Medicine, in which he and fellow researchers sought to determine whether genetic risk scores developed in African Americans would perform better in sub-Saharan Africans than European-derived scores. They found that the scores derived from African Americans did perform better, but only in some African populations. For example, the genetic risk scores performed better in Ugandans than in South African Zulu. However, Fatumo cautioned, other factors could be at play, such as diet or differences between an urbanized cohort and a rural one.
Earlier this year, Fatumo and colleagues published a roadmap in Nature Medicine on how to improve diversity in genomic studies, and said that more studies on African populations are needed in order to translate those findings into genomic medicine.
"African genomic data are scarce and transferability of prediction varies within Africa," said Fatumo. "There is a need for improved representation of Africans in genomic studies."
Weang-Kee Ho, an associate professor in applied mathematics at the University of Nottingham Malaysia, discussed similar challenges in implementing European-derived polygenic risk scores for breast cancer in Asian subjects.
Ho noted that European-derived risk scores do tend to be predictive for breast cancer in Asian women, but their performance is lower. Other factors are also at play: Asian populations are not only genetically diverse but also culturally diverse, with different diets and different ages at childbirth, all of which can influence an individual's risk of developing breast cancer.
She is the lead author on a paper on developing polygenic risk scores for predicting breast cancer in Asian populations that was published in Genetics in Medicine earlier this year. She stressed that population calibration is important in applying any risk scores in Asians, because differing frequencies of certain scores can lead to either overestimating or underestimating individual risk.
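Why calibration matters can be shown with a small Python sketch. It is an illustration only, not the procedure from Ho's paper: the function name and reference values are hypothetical, and it assumes scores in the reference population are roughly normally distributed.

```python
from statistics import NormalDist, mean, stdev


def calibrated_percentile(raw_score, reference_scores):
    """Place a raw polygenic score within a reference population.

    Standardizes the raw score against the reference mean and standard
    deviation, then converts the z-score to a percentile under a normal
    approximation (an assumption of this sketch).
    """
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)
    z = (raw_score - mu) / sigma
    return NormalDist().cdf(z)


# Hypothetical reference score distributions for two populations.
reference_a = [0.0, 1.0, 2.0, 3.0, 4.0]
reference_b = [1.0, 2.0, 3.0, 4.0, 5.0]

raw = 3.0
pct_a = calibrated_percentile(raw, reference_a)
pct_b = calibrated_percentile(raw, reference_b)
print(pct_a, pct_b)
```

The same raw score sits above the average in the first reference population and exactly at the average in the second, so interpreting it against the wrong reference would overstate or understate an individual's relative risk.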
Ho also noted that Asian-derived polygenic risk scores have performed worse than European-derived scores, probably because of smaller sample sizes in the studies from which the data is sourced. Instead, she recommended improving European-derived risk scores by including Asian-specific variants.
Tamar Sofer, an assistant professor in the department of biostatistics at Harvard University, also spoke during the session. In her talk she covered some of the ethical and social implications of ancestry-specific polygenic risk scores, such as the potential for higher disease incidence to feed stereotypes about minority populations.
Ancestry isn't everything, Sofer pointed out, as evidence on whether multiethnic association studies deliver better polygenic risk scores has been mixed. And while polygenic risk scores derived from a single continental population, such as Europeans, seem to perform better, it is not clear if this will be the case in every population, Sofer said. "Ancestry matters, but it is still hard to tell to what extent," she said. Sofer added that environmental and lifestyle factors are also influential.
When asked how researchers could best develop risk scores while taking into account admixture and ancestral differences, Sofer said that there is no single method that outperforms others. Instead, she suggested that multiple studies and methods might be helpful. Machine-learning tools, she noted, could be used to combine multiple layers of information, including population-specific genetic data as well as data on environmental factors and lifestyle.