Polygenic Risk Score Performance Improved With Expression-Related Rare Variant Insights

NEW YORK – A team from Stanford University and other centers in the US and China has demonstrated that polygenic risk scores (PRS) based on common variants can be bolstered by incorporating clues from rare variants linked to significant gene expression shifts, dubbed expression outliers.

"As individual PRS estimates are comprised of variants across hundreds to thousands of genes, we reasoned that the disease effects of outlier-associated rare variants might have the greatest impact in individuals with a relatively larger burden of outlier-associated rare variant effects mapping to disease-relevant genes," Stanford University researchers Craig Smail and Stephen Montgomery, first and senior authors on a study appearing in the American Journal of Human Genetics on Wednesday, explained in an email.

Using genotyping data from UK Biobank participants, together with expression quantitative trait locus data from the GTEx project and a computational method known as CrossMap that takes rare variant-related gene expression changes into account, the researchers first flagged rare variants with outsized effects on gene expression.

From nearly 1.8 million rare variants found in the UK Biobank set and in the gnomAD database, they flagged nearly 90,900 rare outlier variants that appeared to impact the expression of 15,871 genes, based on rare variant annotations gleaned from whole-genome and transcriptome sequences in version 8 of GTEx.

From there, the team used its "independent outlier gene count" (IOGC) score to demonstrate that the expression outlier-related rare variants could improve the performance of body mass index (BMI) PRSs in more than 96,600 of the UK Biobank participants, distinguishing between individuals at higher or lower risk of so-called severe obesity or early bariatric surgery. The findings were further validated using data for Million Veteran Program participants.

"We have demonstrated that a high burden of rare variants identified by their association with outlier gene expression can lead to substantial deviations in PRS-predicted phenotype," the authors wrote. "Furthermore, by integrating these rare variants into genetic risk prediction using the IOGC score, we demonstrated improvements in predicting risk for obesity beyond what was achievable with common variant-based PRSs."

In particular, the investigators found that predictions made using expression outlier-linked rare variants outperformed those made by incorporating insights on protein-truncating rare variants into PRSs. Based on UK Biobank GWAS data spanning more than 1,900 traits or conditions, meanwhile, they saw signs that gene expression outlier-linked variants were somewhat enriched compared to rare variants that did not shift gene expression.

The findings so far suggest that "prediction for multiple complex diseases will benefit from integrating outlier-associated rare variants, including coronary heart disease, type 2 diabetes, and breast cancer," Smail and Montgomery wrote, noting that preliminary work points to improved predictive accuracy in at least two populations, hinting that a similar strategy may boost efforts to apply PRSs in non-European cohorts.

The investigators cautioned that the current approach relies on rare variant annotations from a relatively small set of GTEx donors so far, and may be further improved by tapping into growing RNA sequence datasets and the resulting variant-gene expression annotations. Likewise, whole-genome sequences from the UK Biobank project are expected to reveal many more rare variants that were missed by the more limited sequence data used in the current analysis.

Consequently, Smail and Montgomery called the current work "a baseline for phenotypic prediction of complex diseases by integrating outlier-associated rare variants."

"Future extensions to our model will include more rare variants as we continue to sequence both transcriptomes and genomes in populations, will look at tissue-specific outlier effects, and incorporate longer-range gene expression impacts such as outlier-associated rare variants in enhancer regions," they noted.