NEW YORK – A team of researchers from Helix, as well as the Renown Institute of Health Innovation and the Desert Research Institute in Nevada, has analyzed rare variants in the exomes of more than 70,000 individuals from the UK Biobank (UKB) and the Healthy Nevada Project (HNP) against nearly 6,200 phenotypes, identifying 64 statistically significant gene-based associations.
In a study published on Tuesday in Nature Communications, the researchers said they applied a gene-based collapsing analysis method to 49,960 UKB participants for 4,264 phenotypes and to an additional 21,866 participants from the HNP cohort for 1,934 traits. In addition to the 64 gene-based associations they uncovered in the meta-analysis of the two cohorts, they also found 37 statistically significant associations for phenotypes available in only one cohort.
"We show the unique power of including rare variants from exome sequence data in analyses by demonstrating the significant contributions of singletons to our results and identifying associations that could not have been discovered with a genotyping chip," the authors wrote. "Our analysis makes rare-variant discoveries by combining tens of thousands of exomes with thousands of phenotypes across multiple health systems."
The gene-based collapsing analysis identified genes in which rare variants were, in aggregate, associated with a phenotype, the researchers noted. They explored two gene-based collapsing models: a coding model that included all non-benign coding variants and a loss-of-function (LoF) model that included only putative LoF variants. The LoF model was used to identify associations where only putative LoF variants had an effect. In the coding model, the researchers included nearly 1.1 million qualifying variants across 16,341 genes in the UKB cohort and 754,459 variants across 17,023 genes in the HNP cohort. In the LoF model, they included 165,480 qualifying variants across 15,276 genes in the UKB cohort and 111,735 variants across 14,848 genes in the HNP cohort. There were 15,999 coding model genes and 13,474 LoF model genes that overlapped between the two cohorts.
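The general idea behind a gene-based collapsing test can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy data layout, and the use of Fisher's exact test on a binary phenotype are all assumptions made for illustration. Each participant is reduced to a carrier or non-carrier of at least one qualifying variant in a gene, and carrier status is then compared between cases and controls.

```python
from math import comb


def collapse_carriers(genotypes, qualifying_variants):
    """Return the samples carrying at least one qualifying variant in a gene.

    genotypes: dict mapping sample ID -> set of variant IDs the sample carries
    (a hypothetical layout chosen for this sketch).
    """
    return {s for s, variants in genotypes.items() if variants & qualifying_variants}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carrier cases given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def gene_collapsing_test(genotypes, qualifying_variants, cases):
    """Collapse variants to carrier status, then test carriers vs. case status."""
    samples = set(genotypes)
    carriers = collapse_carriers(genotypes, qualifying_variants)
    non_carriers = samples - carriers
    a = len(carriers & cases)        # carriers who are cases
    b = len(carriers - cases)        # carriers who are controls
    c = len(non_carriers & cases)    # non-carriers who are cases
    d = len(non_carriers - cases)    # non-carriers who are controls
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Toy example with made-up sample and variant IDs.
toy_genotypes = {"s1": {"v1"}, "s2": {"v2"}, "s3": set(),
                 "s4": set(), "s5": {"v9"}, "s6": set()}
table, p_value = gene_collapsing_test(toy_genotypes, {"v1", "v2", "v3"}, {"s1", "s2", "s3"})
print(table, p_value)
```

In this sketch, switching between a coding model and an LoF model amounts to passing a different set of qualifying variants for the same gene. Quantitative traits and covariate adjustment would require a regression-based test, which is omitted here.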
After performing several analyses, the researchers found that the vast majority of the significant gene-phenotype associations they identified were consistent with current knowledge in the field. For example, rare variants in PCSK9 and APOB were associated with low-density lipoprotein levels, and rare LoF variants in TUBB1 were associated with platelet count.
They also found several associations that could be expected given current knowledge in the field, but which had not been previously identified in this type of population. For example, rare coding variants in GP1BB were associated with higher mean platelet volumes in the general population, consistent with their previous association with some familial bleeding and platelet disorders. Further, the researchers found associations between rare coding variants in TYRP1 and blonde hair. A variant in this gene has previously been shown to cause blonde hair in dark-skinned individuals of Melanesian ancestry from the Solomon Islands.
The team also made novel discoveries. For example, it found that rare coding variants in STAB1 were associated with MRI measures in several brain structures, with the strongest association in the putamen. STAB1 is a transmembrane receptor that is thought to play a role in angiogenesis, so this finding provides novel hypotheses for further study, the researchers noted.
Finally, they investigated what proportion of the gene-based signals could be traced back to one causal variant, and they found that while some significant associations had single variants that made major contributions to their effects, few were completely explained by individual variants.
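One common way to ask whether a gene-level signal rests on a single variant is a leave-one-variant-out check, sketched below in Python. The article does not describe how the authors did this, so the approach, the function names, the toy data layout, and the use of a continuity-corrected odds ratio as the effect measure are all assumptions for illustration.

```python
def carrier_table(genotypes, qualifying_variants, cases):
    """Count carriers/non-carriers among cases and controls for one gene."""
    a = b = c = d = 0
    for sample, variants in genotypes.items():
        carrier = bool(variants & qualifying_variants)
        is_case = sample in cases
        if carrier and is_case:
            a += 1   # carrier, case
        elif carrier:
            b += 1   # carrier, control
        elif is_case:
            c += 1   # non-carrier, case
        else:
            d += 1   # non-carrier, control
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio with a Haldane-Anscombe correction (0.5 added to each cell)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def leave_one_variant_out(genotypes, qualifying_variants, cases):
    """Recompute the gene-level odds ratio with each variant dropped in turn."""
    full = odds_ratio(*carrier_table(genotypes, qualifying_variants, cases))
    dropped = {}
    for variant in sorted(qualifying_variants):
        reduced = qualifying_variants - {variant}
        dropped[variant] = odds_ratio(*carrier_table(genotypes, reduced, cases))
    return full, dropped


# Toy example with made-up sample and variant IDs.
toy_genotypes = {"s1": {"v1"}, "s2": {"v1"}, "s3": {"v2"},
                 "s4": set(), "s5": set(), "s6": set()}
full_or, per_variant = leave_one_variant_out(toy_genotypes, {"v1", "v2"}, {"s1", "s2", "s4"})
print(full_or, per_variant)
```

If dropping one variant makes the effect collapse while dropping the others barely changes it, that variant is making a major contribution. A signal that survives every single-variant removal is not explained by any one variant alone.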
The authors did note that their analysis had some limitations. For example, it included rigid criteria for variant qualification and grouped variants at the gene level. They suggested that future studies in this dataset could utilize more complex weighting algorithms and could look at different ways of grouping rare variants, such as by gene family or by exon. The researchers also noted that this study used a simple dominant model of inheritance, but that recessive models and models that include gene-gene or gene-variant interactions should also provide novel insights.
"This analysis presents one of the first forays into a new standard for human genetics research," the authors concluded. "As the sample sizes of cohorts with extensive phenotypic data and next-generation sequencing grows, both through publicly available cohorts such as the UKB and population-based screening efforts such as the Healthy Nevada Project, we are now able to investigate the biological impact of rare variants with the same fine-tuned precision with which we currently assess the effects of common variants."
The researchers made their results available for interactive browsing in a web app.