GWAS With Higher Genetic Diversity Lead to More Accurate Use of Polygenic Risk Scores

NEW YORK – An international team of researchers studying the effects of genetic diversity in genome-wide association studies of lipids has determined that increased diversity will lead to a more accurate and equitable application of polygenic scores in clinical practice, especially as the emphasis of GWAS expands beyond the identification of genes and toward the use of genetic variants for preventive and precision medicine.

In a study published on Thursday in Nature, the researchers noted that heart disease remains the leading cause of death worldwide, despite advances in prevention and treatment, particularly through the reduction of low-density lipoprotein cholesterol levels. GWAS of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most GWAS have been conducted in populations of European ancestry and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups because of differences in allele frequencies, effect sizes, and linkage-disequilibrium patterns.

For their paper, the investigators conducted a multi-ancestry, genome-wide, genetic discovery meta-analysis of lipid levels in nearly 1.7 million individuals, including 350,000 of non-European ancestries. They quantified the gain in studying non-European ancestries and provided evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes.

"We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction," the authors wrote. "Modest gains in the number of discovered loci and ancestry-specific variants were also achieved."

The researchers began by analyzing data from the Global Lipids Genetics Consortium, which aggregated GWAS results from 1,654,960 individuals in 201 primary studies, representing five genetic ancestry groups: admixed African or African (99,432 individuals or 6.0 percent of the total cohort); East Asian (146,492 individuals or 8.9 percent of the cohort); European (1,320,016 individuals or 79.8 percent of the cohort); Hispanic (48,057 individuals or 2.9 percent of the cohort); and South Asian (40,963 individuals or 2.5 percent of the cohort).
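The cohort breakdown above can be checked with a few lines of arithmetic. The sketch below uses only the per-ancestry counts quoted in this article; the variable names are illustrative and are not taken from the study.

```python
# Per-ancestry sample counts as quoted in the article.
# The percentages are recomputed here rather than copied.
ancestry_counts = {
    "Admixed African or African": 99_432,
    "East Asian": 146_492,
    "European": 1_320_016,
    "Hispanic": 48_057,
    "South Asian": 40_963,
}

total = sum(ancestry_counts.values())

# Share of the total cohort, rounded to one decimal place.
ancestry_share = {
    group: round(100 * count / total, 1)
    for group, count in ancestry_counts.items()
}

for group, share in ancestry_share.items():
    print(f"{group}: {ancestry_counts[group]:,} ({share} percent)")
print(f"Total: {total:,}")
```

The five counts sum to the reported total of 1,654,960, and the recomputed shares match the rounded percentages given in the article.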

They performed a GWAS for five blood lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides (TGs), total cholesterol (TC), and non-high-density lipoprotein cholesterol (non-HDL-C). Of the 91 million variants imputed from the Haplotype Reference Consortium or 1000 Genomes Phase 3 that successfully passed variant-level quality control, 52 million variants were present in at least two cohorts and had sufficient minor allele counts to be evaluated as a potential index variant.

Overall, the researchers found 773 lipid-associated genomic regions that contained 1,765 distinct index variants that reached genome-wide significance from variants that have been previously reported as associated with any of the five lipid traits. Of these loci, 76 percent were identified only in the European ancestry-specific analyses. Of the non-European ancestries, the African ancestry GWAS identified more ancestry-specific loci than any other non-European ancestry group — 15 loci were unique to admixed African or African individuals, six were unique to East Asian individuals, six to Hispanic individuals, and one to South Asian individuals.

In a subsequent analysis, the researchers evaluated the potential of polygenic risk scores to predict increased LDL-C levels, a major causal risk factor for coronary artery disease, in diverse ancestry groups. They created three non-overlapping datasets: one to perform ancestry-specific or multi-ancestry GWAS to estimate variant effect sizes, one to optimize risk score parameters, and one to evaluate the utility of the resulting scores.
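To illustrate why the three datasets must not overlap, here is a minimal sketch of a three-way split of sample identifiers. It is not the authors' procedure: the function name `three_way_split`, the 80/10/10 fractions, and the random assignment are all assumptions made for illustration, and the article does not say how the study's datasets were assembled.

```python
import random


def three_way_split(sample_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample IDs into three disjoint sets (hypothetical helper).

    The sets stand in for the three roles described in the article:
    effect-size estimation (GWAS), score-parameter tuning, and evaluation.
    The default fractions are illustrative, not taken from the study.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible

    n_gwas = int(len(ids) * fractions[0])
    n_tune = int(len(ids) * fractions[1])

    gwas_set = ids[:n_gwas]
    tuning_set = ids[n_gwas:n_gwas + n_tune]
    evaluation_set = ids[n_gwas + n_tune:]  # everything left over
    return gwas_set, tuning_set, evaluation_set


# Usage with made-up sample identifiers.
all_ids = [f"sample_{i}" for i in range(1000)]
gwas_set, tuning_set, evaluation_set = three_way_split(all_ids)
print(len(gwas_set), len(tuning_set), len(evaluation_set))
```

Keeping the sets disjoint means that no individual used to estimate effect sizes or tune the score also appears in the evaluation set, so the measured performance is not inflated by reusing the same people.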

Overall, they found that polygenic prediction for LDL-C in all ancestries appeared to benefit the most from adding samples of diverse ancestries, given a scenario where large numbers of European ancestry individuals had already been included. However, they added, additional studies are needed to determine whether this applies to other phenotypes with different genetic architectures and heritability.

"Our results suggest that diversifying the populations under study, rather than simply increasing the sample size, is now the single most efficient approach" to improving understanding of the biology underlying disease, identifying potential therapeutic targets, and identifying individuals at high risk of adverse health outcomes across population groups, at least in the case of blood lipids, the authors wrote.

"Taken together, our results strongly support ongoing and future large-scale recruitment efforts targeted at the enrollment and DNA collection of non-European ancestry participants," they added. "Geneticists and those responsible for cohort development should continue to diversify genetic discovery datasets, while increasing sample size in a cost-effective manner, to ensure that genetic studies reduce rather than exacerbate existing health inequities across race, ancestry, geographical region, and nationality."