NEW YORK — By combining genomic data from two large population cohorts, researchers have discovered new variants in protein-coding genes that are associated with human disease.
An international team of researchers combined whole-exome sequencing data on more than 390,000 UK Biobank participants with imputed genotypes for more than 260,000 FinnGen participants, enabling them to better examine the contribution of rare and low-frequency variants to disease. Genome-wide association studies, they noted, are often underpowered to detect these types of variants.
As they reported Wednesday in Nature, the researchers uncovered nearly 1,000 associations between protein-coding variants in those cohorts and 744 disease endpoints and were able to begin to describe possible disease mechanisms.
"Our results showcase the benefits of combining large population cohorts to discover and replicate novel associations, explain disease mechanisms across a range of common and rare diseases, and shed light on a substantial gap in the allelic spectrum that neither genotyping nor sequencing studies have previously been able to address," senior author Heiko Runz, head of human genetics at Biogen, and colleagues wrote in their paper.
Runz and his colleagues conducted a coding-wide association study of the different disease endpoints, teasing out 975 associations that met genome-wide significance and 717 that met a more conservative multiple-testing threshold for significance. Of the 975 associations, the researchers estimated that 387 would not have been detected if the two datasets had been analyzed separately.
In particular, they noted, their analysis benefited from the enrichment of rare alleles among Finns compared with non-Finnish Europeans, as Finns represent a distinct gene pool. This enrichment boosted their ability to detect associations.
Cross-referencing their findings with previous genome-wide association studies and ClinVar entries, the researchers found that about a third of their associations had not been reported before and that 177 of the distinct loci were in genes not previously mapped to the associated disease.
Further, 482 genes were associated with 148 distinct disease clusters. Most genes were tied to one disease cluster, but about a dozen were linked to at least five clusters. For instance, variants in CHEK2 were associated not only with breast cancer but also with colorectal and thyroid cancer risk as well as with ovarian cysts and benign meningeal tumors.
With these findings, the researchers could also begin to home in on disease-linked pathways. For example, they uncovered two rare variants linked to pulmonary embolism risk, both of which are in genes encoding coagulation cascade proteins and affect the circulating levels of those factors.
Meanwhile, a deletion in SLC34A1, which codes for the sodium-phosphate cotransporter NPT2a, was linked to increased risk of renal and urinary tract stones, as well as to increased serum calcium and reduced phosphate. These and other findings suggested that therapies targeting NPT2a-related pathways might benefit deletion carriers.
The researchers also focused on variants associated with atrial fibrillation risk. A low-frequency missense variant in the methyltransferase gene METTL11B, for example, was associated with increased atrial fibrillation risk. That variant falls in a ligand-binding site and is expected to affect the methylation of other disease risk genes. Separately, the researchers noted variants within the SCN5A–SCN10A locus that were associated with reduced atrial fibrillation risk, a lower pulse rate, and increased atrioventricular block risk, suggesting that loss of function of the sodium channel encoded by SCN10A might affect vagus nerve activity on the atria.
In all, the researchers noted that their approach enabled them to uncover additional rare and low-frequency variants linked to disease. "Our results foreshadow the discovery of many additional coding and non-coding associations from cross-biobank analyses at even larger sample sizes," they added.