NEW YORK — By relying on haplotype sharing within the UK Biobank cohort, researchers from the Broad Institute have imputed exome-wide variants into the full cohort. Through this and additional analyses, the researchers identified several very rare variants likely to be causal for a range of quantitative traits.
The UK Biobank encompasses data on about 500,000 individuals, including genetic and phenotypic information. The researchers noted that exome sequencing data from about 10 percent of the UK Biobank would not, on its own, allow them to analyze the effects of ultrarare variants; however, relying on haplotype sharing among biobank participants could allow them to impute variants into the full cohort and power their analyses.
As they reported in Nature Genetics on Monday, the Broad's Po-Ru Loh and his colleagues used the imputed data to analyze 54 quantitative traits and uncovered more than 1,000 significant trait-variant associations, including large-effect rare variants influencing height.
"In general, the main idea of imputation is to leverage reference data to enable analysis of genetic variants that are not directly measured in a cohort — thereby expanding the utility of existing datasets without incurring additional cost," Loh said in an email. "In the past, this approach has primarily been applied to variants that are commonly observed in the population, but in this work, we applied it to very rare coding variants, which traditionally have been viewed as only accessible via direct sequencing."
In particular, the researchers used whole-exome sequencing data from 49,960 UK Biobank participants and SNP-array genotyping data on the full cohort to impute rare variants — including ones with minor allele frequencies of about 0.00005 — with high accuracy. They further tested whether any of these imputed variants were associated with 54 quantitative traits like anthropometric traits, blood pressure, and lung function, using linear mixed-model association on the more than 459,000 UK Biobank participants of European ancestry. This identified tens of thousands of associations.
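A rough back-of-the-envelope calculation illustrates why the sequenced subset alone offers little power at this frequency. The sketch below uses only the figures reported above (a minor allele frequency of about 0.00005, 49,960 exome-sequenced participants, and roughly 459,000 participants in the association analysis). It assumes the expected number of minor-allele copies is simply 2 × N × MAF and that the same frequency holds in both groups; the function and variable names are illustrative and not taken from the study.

```python
def expected_minor_allele_copies(n_individuals: int, maf: float) -> float:
    """Expected minor-allele copies among n diploid individuals (2 * N * MAF)."""
    return 2 * n_individuals * maf


MAF = 0.00005               # approximate frequency cited in the article
EXOME_SEQUENCED = 49_960    # participants with whole-exome sequencing data
ANALYZED_COHORT = 459_000   # approximate size of the European-ancestry analysis set

copies_sequenced = expected_minor_allele_copies(EXOME_SEQUENCED, MAF)
copies_cohort = expected_minor_allele_copies(ANALYZED_COHORT, MAF)

print(f"Expected copies in sequenced subset: {copies_sequenced:.1f}")
print(f"Expected copies in analyzed cohort:  {copies_cohort:.1f}")
```

Under these assumptions, a variant at that frequency would be expected in only about five copies among the sequenced participants, versus roughly 46 in the larger analysis set, which is the gap imputation is meant to close.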
After stringent filtering, the researchers homed in on a set of 1,189 associations with 675 unique protein-altering variants that were likely causal. They estimated that 30 percent of these associations could only have been identified by imputation using the UK Biobank exome sequencing data. For blood-cell traits and height — two traits the researchers focused their analyses on — they additionally calculated that about a quarter of the blood-cell trait-linked variants they uncovered and 45 percent of the height-linked variants were not captured by a recent large association study.
About half the likely causal variant-trait associations uncovered occurred in genes with multiple likely causal rare coding variants for the same trait, underscoring the role of long allelic series. The researchers further teased out long allelic series affecting core genes for numerous traits, one harboring 45 different likely causal variants.
The researchers also identified new large-effect variants, such as ones affecting height. Some of the genes involved, such as NPR2, COL2A1, and HERC1, had previously been implicated in Mendelian diseases, but the variants themselves were novel.
According to the researchers, their study foreshadows what future work will be able to do as exome association studies grow larger, since very large exome sequencing datasets represent natural genetic perturbation experiments. They added that whole-exome imputation into larger cohorts could help gauge the effect of coding variation within the human genome. Loh noted that their approach could be applied to other large genetic datasets.
He and his team are now "pursuing a few related research directions examining penetrance of pathogenic variants and ascertaining rare gene-altering copy-number variants," he added.