NEW YORK – A team from the Broad Institute, Massachusetts General Hospital, and other centers has turned to sequence data for individuals in the UK Biobank to assess rare variant associations with thousands of phenotypes, along with natural selection patterns tied to trait- or disease-related variants or genes.
"Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale," co-first and corresponding author Konrad Karczewski, a medical and population genetics researcher with the Broad and Mass General, and his colleagues wrote in Cell Genomics on Monday.
They added that population biobank exomes "provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease."
The team is sharing the rare single-variant associations, gene-based associations, and related summary statistics with other investigators through a publicly available "gene-biobank association summary statistics" (Genebass) browser framework hosted on the Google Cloud platform.
"Our web application features a unique layout and navigational scheme for rapidly browsing phenome-wide associations by integrating results across genes and variants," the authors explained. "Customizable controls, plots, and tables enable flexible filtering and visualization of phenotypes, genes, and variants of interest, results can be exported for downstream analyses, and variant associations across traits can be compared with inform pathways associated with complex traits and develop therapeutic hypotheses."
Starting with protein-coding sequence data for 454,697 individuals sequenced through the UK Biobank Exome Sequencing Consortium, which includes UK Biobank investigators and eight biopharma firms, the team focused on 450,953 high-quality exomes, including 394,841 sequences from individuals of European ancestry.
From nearly 23.9 million high-quality variants detected in the European ancestry exomes, the researchers used functional annotations to narrow the set to nearly 8.1 million variants and 75,767 variant groups across more than 19,400 human genes, which they tested for associations with 4,529 quantitative or binary traits or conditions. Each phenotype was represented by at least 200 individuals in the UK Biobank cohort analyzed, they noted.
With this strategy, the team tracked down tens of thousands of significant associations, including 18 single-variant associations, on average, for each phenotype of interest, along with an average of 1.7 group test associations per phenotype. In addition to known gene-based associations with traits related to cholesterol, bone density, red blood cell, and other measures, the analyses uncovered a group of white matter brain imaging-related variants with predicted loss-of-function effects on the SCRIB gene.
"To our knowledge, this gene has not been associated in previous genome-wide association studies, although it is a constrained gene … that shows evidence for neural tube defects in mice with ultra-rare occurrences in humans," the authors noted.
While the associations involving individual variants tended to be more significant, the investigators explained, the group analyses uncovered more than 2,200 phenotype associations that might have been missed by looking only at single variants in a given gene.
Likewise, they noted that relatively rare predicted loss-of-function variants played an outsized role in the associations identified by group testing, despite the preponderance of more common missense and synonymous variants making up individual phenotype associations.
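To illustrate why grouping rare variants can surface signals that single-variant tests miss, here is a minimal Python sketch of a generic collapsing, or "burden," comparison. It is not the statistical method used in the study, which the article does not describe. The function names, the toy genotype matrix, and the phenotype values are all hypothetical and chosen only to show the idea: when each rare variant has a single carrier, no per-variant comparison is possible, but pooling carriers across the gene yields a testable group.

```python
from math import sqrt
from statistics import mean, variance


def carrier_flags(genotypes):
    """Collapse a variants-by-individuals matrix of allele counts
    into one 0/1 flag per individual (1 = carries any variant)."""
    n_individuals = len(genotypes[0])
    return [int(any(row[i] > 0 for row in genotypes))
            for i in range(n_individuals)]


def welch_t(phenotype, flags):
    """Welch's t statistic comparing carriers with non-carriers.
    Returns None when either group is too small to estimate a variance."""
    carriers = [y for y, f in zip(phenotype, flags) if f]
    others = [y for y, f in zip(phenotype, flags) if not f]
    if len(carriers) < 2 or len(others) < 2:
        return None
    se = sqrt(variance(carriers) / len(carriers)
              + variance(others) / len(others))
    return (mean(carriers) - mean(others)) / se


# Hypothetical toy data: 3 rare variants in one gene, 12 individuals.
# Each variant is carried by exactly one person.
genotypes = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
# Made-up quantitative trait, shifted upward in the three carriers.
phenotype = [2.1, 1.9, 2.0, 0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0]

# Single-variant view: one carrier per variant, so no statistic can be computed.
single_variant_t = [welch_t(phenotype, [int(g > 0) for g in row])
                    for row in genotypes]

# Group view: pooling the three carriers gives a usable comparison.
burden_t = welch_t(phenotype, carrier_flags(genotypes))

print("single-variant statistics:", single_variant_t)
print("burden statistic:", burden_t)
```

In this toy setup, each single-variant comparison returns `None` because a lone carrier gives no variance estimate, while the pooled comparison produces a large positive statistic. Real analyses at biobank scale use mixed models and account for relatedness, covariates, and case-control imbalance, none of which this sketch attempts.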
From the large collection of phenotype associations, the team was able to start delving into still other variant features behind these associations — from allele frequency and natural selection patterns to gene function and apparent effect sizes.
The study's authors cautioned that the results represent individuals of European ancestry, or just a "slice of human genetic diversity," and noted that "expanding to additional ancestries has been shown to increase power and resolution for genetic discovery."
They further suggested that "[c]oncentrated efforts in building large biobanks with diverse participants will be required to overcome these limitations and provide more insight into the contribution of rare variants to common disease etiology."