UK Biobank Dataset Helps Elucidate Pathogenicity of Rare Genetic Variants

NEW YORK (GenomeWeb) – Researchers from the University of Exeter Medical School used data from nearly 380,000 participants in the UK Biobank (UKB) to assess the pathogenicity and penetrance of putatively clinically important rare genetic variants, finding that studying these variants in large populations was key to understanding them better.

As they reported online today in the American Journal of Human Genetics, the researchers analyzed data from 379,768 UKB participants of European ancestry and were able to classify 1,244 of 4,585 putatively clinically relevant rare variants genotyped on the UKB microarray as high quality. They defined variants as "clinically relevant" if they were classified as either pathogenic or likely pathogenic in ClinVar or were in genes known to cause maturity-onset diabetes of the young (MODY) or severe developmental disorders (DDs).

The investigators then assessed the penetrance and pathogenicity of these high-quality variants by testing their association with 401 clinically relevant traits and found that 27 of the variants were associated with such a trait in the UKB. Of these, 13 variants had previously been linked with a dominant disease, although most were considered to be only risk factors or low-penetrance variants rather than true highly pathogenic monogenic variants, the researchers noted. Another 11 variants were causally linked to disease and the team observed that they were associated with a related trait in the population-based cohort.

These analyses allowed the researchers to refine the penetrance estimates for some of the variants. For example, they specifically investigated known pathogenic variants and protein-truncating variants in MODY genes and found two rare variants that were high quality, definitely pathogenic, and strongly associated with diabetes in the UKB: a very rare stop-gain variant in GCK and a nonsynonymous variant in HNF4A, p.Arg114Trp. The penetrance of the HNF4A variant had previously been estimated, on the basis of a large MODY cohort, to be up to 75 percent at age 40, but the researchers estimated its minimum penetrance to be less than 10 percent based on their analysis of the UKB cohort data.

"This has important implications for the attributable risk associated with the variant in different cohorts and for the interpretation of genetic test results: if the p.Arg114Trp variant was found in an affected individual after clinical testing, it might still be the primary cause of that person's diabetes, although incidental discovery of the variant in an unaffected individual would not be predictive," the authors wrote.
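As a rough illustration of the arithmetic behind such an estimate, penetrance in a cohort can be approximated as the fraction of variant carriers who have the condition. The sketch below is not the authors' method: the function name and the carrier counts are hypothetical, and the Wilson score interval is simply one common way to express the uncertainty that comes with a small number of carriers.

```python
import math


def penetrance_with_interval(affected_carriers, total_carriers, z=1.96):
    """Return (estimate, lower, upper) for the fraction of carriers affected.

    The bounds are a Wilson score interval; z=1.96 gives roughly 95% coverage.
    """
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must be between 0 and total_carriers")

    n = total_carriers
    p = affected_carriers / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts for illustration only; these are NOT figures from the study.
estimate, lower, upper = penetrance_with_interval(affected_carriers=12, total_carriers=150)
print(f"penetrance ~ {estimate:.1%} (interval {lower:.1%} to {upper:.1%})")
```

Because a population cohort skews toward healthy volunteers, a figure computed this way is best read as a lower bound, whereas the same calculation in a clinically ascertained cohort tends to overestimate penetrance.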

They also observed associations with relevant traits for heterozygous carriers of some rare recessive conditions. For example, they noted that heterozygous carriers of the ERCC4 p.Arg799Trp variant that causes xeroderma pigmentosum were more susceptible to sunburn.

Importantly, the team was able to refute the previous disease association of RNF135 with developmental disorders. There was no association between stop-gain or frameshift variants in RNF135 and any developmental traits in the UKB, they noted.

"Given the high-quality genotyping of these variants in the UKB and a lack of association with any clinically relevant traits, together with a pLI [score for haploinsufficiency] of zero for both genes, the age of the original publications, and the lack of enrichment of de novo mutations within the [Deciphering Developmental Disorders] study, we suggest that haploinsufficiency in these genes is not a cause of a severe DD," the authors added.

The team concluded that large population cohorts such as the UKB provide an opportunity to shed light on the pathogenicity and penetrance of rare variants, and that its method of combining intensity plots for individual variants across all genotyping batches is an efficient way to evaluate the validity of rare variants genotyped by microarray.

"We have shown that population genetic data can be used for estimating lower bounds for the effect size and penetrance of pathogenic, disease-causing variants and refined our understanding of the links between rare variants and monogenic diseases," the investigators wrote. "Although population-based studies will be biased in the opposite direction from clinical studies, i.e. towards healthy individuals, they are nonetheless crucial for informing minimum and age-dependent penetrance estimates, interpreting incidental or secondary findings from clinical testing, and informing direct-to-consumer genetic testing."