NEW YORK (GenomeWeb) – Disease risk estimates from genome-wide association studies conducted in one population cannot always be extended to other groups, a new study cautions.
The majority of GWAS have relied on populations of European ancestry, raising the question of whether their findings are applicable to populations of different ancestries. Genetic risk scores built on GWAS findings are likely to include alleles that are polymorphic or found at intermediate frequency in the study population. However, these alleles may not be present at the same frequencies in other populations, skewing the scores.
Researchers from the Georgia Institute of Technology have now compared the frequencies of disease-linked alleles in African and non-African populations. As they reported yesterday in Genome Biology, they found differences in risk allele frequencies between the populations, including one that conflicted with clinical data, suggesting that disease risk could be estimated incorrectly among individuals of African ancestry.
"Only by understanding population genetics and the effects of SNP ascertainment bias can accurate predictive models of genetic disease risks be built," Georgia Tech's Joseph Lachance and his colleagues wrote in their paper.
The researchers examined allele frequencies at more than 3,000 GWAS loci for each continental population in the 1000 Genomes Project. Contrary to the null expectation that individuals from different populations would have similar frequencies of risk alleles, they found that African populations had significantly higher risk allele frequencies than non-African populations. This, they noted, was true across a range of disease categories, from metabolic to morphological and neurological conditions, but not in all diseases.
African populations, for instance, had lower risk allele frequencies at cardiovascular disease loci, which contrasts with clinical data. This indicated that genetic disease risks might not be estimated correctly in populations of African ancestry.
Due to human population history, the genomes of individuals of African ancestry are more likely to be heterozygous for derived alleles while non-African genomes are more likely to be homozygous, the researchers noted. They found that 69.2 percent of ancestral risk alleles have higher frequencies in African populations, while derived risk alleles occur at moderately lower frequencies in them.
This leads to risk allele frequencies that are 1.15 percent higher in Africans than in other populations, they reported, a result that runs counter to expectations. Non-African populations, because of the bottlenecks they went through, are expected to have a greater genetic load, they explained.
The researchers traced this effect both to the reliance of GWAS on non-African populations and to the genotyping arrays used in those studies.
They simulated a large number of GWAS results in which they varied the continental ancestry makeup of the study population for the different simulations but kept the hereditary disease risk across the populations the same. From this, they found that allele frequencies like the ones observed don't have to be due to underlying differences in disease risk.
In particular, they noted that simulations using American, East Asian, European, or South Asian cohorts produced sets of disease-associated loci with elevated ancestral risk allele frequencies and reduced frequencies of derived alleles in African populations. However, the researchers noted that their findings also suggested that pooling samples of different ancestries isn't likely to fully address the issue of mistaken genetic disease risk estimates.
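The logic of such a simulation can be illustrated with a toy model. The sketch below is not the researchers' code; the frequency distribution, the noise term standing in for drift, and the detection rule are all simplifying assumptions chosen for illustration. Every locus is treated identically with respect to disease risk, and only the chance of being detected depends on how common the derived allele is in the non-African discovery cohort.

```python
import random


def simulate_loci(n_loci, drift_sd, rng):
    """Return (african_freq, non_african_freq) pairs of derived-allele frequencies.

    Assumption: African frequencies are skewed toward rare derived alleles,
    and non-African frequencies are the same values plus symmetric noise,
    a crude stand-in for drift during population bottlenecks.
    """
    loci = []
    for _ in range(n_loci):
        p_afr = rng.betavariate(0.5, 2.0)
        p_non = min(1.0, max(0.0, p_afr + rng.gauss(0.0, drift_sd)))
        loci.append((p_afr, p_non))
    return loci


def discover(loci, rng):
    """Keep each locus with probability 4p(1-p) in the discovery cohort.

    Assumption: detection power tracks heterozygosity, so alleles at
    intermediate frequency in the study population are found most often.
    """
    hits = []
    for p_afr, p_non in loci:
        power = 4.0 * p_non * (1.0 - p_non)
        if rng.random() < power:
            hits.append((p_afr, p_non))
    return hits


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
loci = simulate_loci(20000, 0.15, rng)
hits = discover(loci, rng)

mean_afr = mean([p_afr for p_afr, _ in hits])
mean_non = mean([p_non for _, p_non in hits])
print(f"detected loci: {len(hits)}")
print(f"mean derived-allele frequency, African: {mean_afr:.3f}")
print(f"mean derived-allele frequency, non-African: {mean_non:.3f}")
```

In this toy setup, the detected loci carry derived alleles at lower average frequency in the African sample than in the discovery cohort, even though the model contains no difference in disease risk between the two.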
GWAS, the researchers pointed out, largely rely on commercial genotyping arrays that were developed using a fairly small number of Europeans and are enriched for SNPs with higher derived allele frequencies outside of Africa. This, they said, also contributes to differences in risk allele frequencies at known disease-associated loci in different ancestry populations.
However, the researchers also found through their simulations that whole-genome sequencing data were less biased but would not fully alleviate this effect.
The effect is then carried over to genetic risk scores derived from GWAS data, they said. While corrections could reduce some population-level differences in predicted disease risk, they did not eliminate them.
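How allele frequency differences feed into a risk score can be seen in a minimal sketch of a simple additive score, in which each risk allele a person carries adds a fixed weight. The frequencies and effect sizes below are made-up values for illustration, not figures from the study, and the function name is hypothetical.

```python
def expected_risk_score(risk_allele_freqs, effect_sizes):
    """Population-average additive score.

    A person carries on average 2 * p copies of a risk allele with
    frequency p, so the mean score is the sum of 2 * p * beta over loci.
    """
    if len(risk_allele_freqs) != len(effect_sizes):
        raise ValueError("need one effect size per locus")
    return sum(2.0 * p * beta for p, beta in zip(risk_allele_freqs, effect_sizes))


# Hypothetical values: identical effect sizes in both populations,
# but different risk allele frequencies at the same three loci.
effect_sizes = [0.2, 0.1, 0.3]
freqs_population_a = [0.30, 0.55, 0.10]
freqs_population_b = [0.42, 0.61, 0.18]

score_a = expected_risk_score(freqs_population_a, effect_sizes)
score_b = expected_risk_score(freqs_population_b, effect_sizes)
print(f"mean predicted score, population A: {score_a:.3f}")
print(f"mean predicted score, population B: {score_b:.3f}")
```

Because the weights are the same for both groups, the gap between the two average scores comes entirely from the frequency differences at the chosen loci, which is the situation the researchers warn can arise from ascertainment bias rather than from true differences in risk.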
These differences, they said, could have consequences for precision medicine and personal genomics and could also obscure health disparities.
"Our results imply that caution must be taken when extrapolating GWAS results from one population to predict disease risks in another population," they wrote.