NEW YORK – A team from the Broad Institute and other centers in the US, the Netherlands, Finland, and Germany has tallied phenotypic and disease ties for rare variants in hundreds of genes with the help of large-scale exome and genome sequencing population datasets.
"Our results have implications for multi-ancestry and cross-biobank approaches in sequencing association studies for human disease," senior and corresponding author Patrick Ellinor, a researcher affiliated with the Broad Institute's Cardiovascular Disease Initiative, the University of Amsterdam, and Massachusetts General Hospital, and his colleagues wrote in Nature Genetics on Thursday.
Using exome sequence or whole-genome sequence data generated for nearly 748,900 diverse participants from the UK Biobank, Mass General Brigham Biobank, and the All of Us research projects, the researchers carried out gene-based analyses on rare variants in relation to 601 disease phenotypes, dubbed "phecodes," that were selected from a larger set of 1,866 phecodes.
"For the 601 phecode endpoints, we then performed exome-wide, gene-based burden testing in each dataset followed by a meta-analysis," the authors explained, adding that "[a]lthough ancestry is not truly categorical, we grouped individuals into principal continental ancestry groups based on their genetic similarity to samples from the 1000 Genomes project."
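To illustrate the general idea of testing each dataset separately and then pooling the results, the sketch below combines per-cohort effect estimates for a single gene-phenotype pair using fixed-effect, inverse-variance weighting. This is an assumption chosen for illustration: the article does not say which meta-analysis method the authors used, the function name `inverse_variance_meta` is hypothetical, and the input numbers are invented rather than taken from the study.

```python
import math


def inverse_variance_meta(estimates):
    """Pool (beta, standard_error) pairs with fixed-effect inverse-variance weights.

    Illustrative only: not necessarily the method used in the study.
    """
    # Each cohort is weighted by the inverse of its variance (1 / se^2),
    # so more precise estimates contribute more to the pooled effect.
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical per-cohort burden-test estimates (log odds ratio, standard error).
cohort_estimates = [(0.80, 0.20), (0.60, 0.40), (0.90, 0.30)]
beta, se, p = inverse_variance_meta(cohort_estimates)
print(f"pooled beta={beta:.3f}, se={se:.3f}, p={p:.2e}")
```

Under this scheme the pooled standard error is always smaller than that of any single cohort, which is one reason combining biobanks can surface rare-variant associations that no individual dataset has the power to detect.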
Based on exome sequences for 454,162 UK Biobank (UKB) participants and 51,815 individuals from the Mass General Brigham Biobank (MGB), together with genome sequences for 242,902 All of Us participants, the team's analyses led to 363 significant associations between new and known phenotypes and rare variants found in disease-related genes.
While several phenome-wide association studies (PheWAS) of protein-coding variants have been published using European-ancestry participants from UKB, All of Us and MGB represent less well-characterized cohorts, the authors noted.
Along with known associations, including ties between variants in the CFTR gene and cystic fibrosis, the researchers unearthed previously unappreciated ties between rare variants and new phenotypes. For rare variants in the Marfan syndrome-causing gene FBN1, for example, they identified relationships with more than a dozen conditions, ranging from cardiovascular disease to chromosomal abnormalities.
In the case of the UBR3 gene, meanwhile, the team saw associations with cardiometabolic conditions or traits such as hypertension or type 2 diabetes. For a gene called YLPM1, on the other hand, which contains common variants previously implicated in mood instability, depressed affect, or neuroticism, the PheWAS pointed to a relationship between rare YLPM1 variants and psychiatric disease.
Together, these and other PheWAS findings highlighted rare and ultrarare variants — and related effect sizes — that coincided with specific phenotypes or conditions within and across ancestry groups, the authors suggested, noting that such results "are of relevance given the important and continued efforts to sequence underrepresented populations."
"Future studies with even larger diverse datasets might be needed to identify benefits for rare variant burden testing, especially considering our focus on binary outcomes," the authors wrote, adding that "we have made our results available for download and browsing in the Human Disease Knowledge Portal."