Exome Sequencing of UK Biobank Participants Highlights Natural Variation, Gene Function

NEW YORK — By sequencing the exomes of more than 450,000 people from the UK Biobank, researchers at the Regeneron Genetics Center and their colleagues have uncovered hundreds of genes that contribute to different traits and diseases.

The researchers used natural genetic variation among the biobank participants to begin to tease out the effects of variants in protein-coding genes, which could help determine the biological functions of some genes and point to new therapeutic strategies. In all, they identified 12 million coding variants and tested whether they were associated with about 4,000 health-related traits. As they reported in Nature on Monday, the researchers found more than 500 genes with such trait associations, including some that increase the risk of diseases, like liver disease, as well as others that decrease the risk of diseases, like hypertension and asthma.

"This unique catalog of coding variation, combined with the large sample size and thousands of available phenotypes, provides a unique opportunity to assess gene function at unprecedented scale," the researchers, led by Regeneron's Manuel Ferreira and Gonçalo Abecasis, wrote in their paper.

The UK Biobank Exome Sequencing Consortium sequenced the exomes of 454,787 biobank participants to a 20X or higher depth. Of the 12.3 million variants identified, 3.4 million were synonymous, 7.9 million were missense, and about 915,000 were putative loss-of-function (pLOF) variants. The scientists then tested the associations between the deleterious missense and pLOF variants and 3,994 health-related traits that had been measured in the cohort.

Following some 2.3 billion association tests, focusing at first on whole-exome sequencing (WES) data from biobank participants of European ancestry, the researchers uncovered 8,865 significant associations that involved 564 genes, 492 traits, and 2,283 gene-trait pairs. A total of 81 percent of the associations were replicated in a separate cohort of 133,370 people.
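For a sense of scale, the sketch below shows how strict a per-test significance cutoff becomes with this many tests. It assumes a plain Bonferroni correction at a 0.05 family-wise error rate. That is an illustrative assumption, not necessarily the procedure the researchers used, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


N_TESTS = 2.3e9  # approximate number of association tests cited in the article
ALPHA = 0.05     # assumed family-wise error rate (not stated in the article)

threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"Per-test significance threshold: {threshold:.2e}")
```

Under these assumptions, an association would need a p-value on the order of 10^-11 to count as significant.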

Some variant-trait associations appeared to be protective and linked to a lower risk of disease. For instance, the researchers noted previously reported ties between PCSK9, APOB, and APOC3 and protection against hyperlipidemia. In addition, using a more liberal significance threshold, they found associations between SLC9A3R2, which encodes a scaffolding protein that is expressed in the kidney, and lower risk of hypertension, as well as between variations in SLC27A3 and lower risk of childhood asthma.

Another 131 genes, meanwhile, had protective effects on quantitative traits. For instance, pLOFs in FAM234A were associated with lower serum glucose levels, and alterations in ASGR1 were linked to lower apolipoprotein B levels.

The researchers noted that protective variants like these are of particular interest because they could point to genes that could be blocked by antibodies or other inhibitors as treatments.

The researchers also conducted association analyses using WES data collected from individuals of African, East Asian, and South Asian ancestry. They found that many of the associations held across ancestries, especially for quantitative traits. Fewer held for binary traits, which the researchers attributed to low statistical power.

At a related session at the American Society of Human Genetics annual meeting this week, Joshua Backman from the Regeneron Genetics Center, the first author of the paper, noted that whole-genome sequencing of UK Biobank participants is already underway.

"Accomplishing our original goal of understanding the health consequences of genetic variation in each human gene will likely require sequencing millions of well-characterized and diverse individuals," the researchers wrote in their paper. "In our view, our results not only show this goal is within reach but also suggest that sequencing 5 million individuals would enable the identification of 500+ heterozygous LOF carriers for ~15,000 genes – that is, for the great majority of human protein-coding genes."
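The arithmetic behind that projection can be sketched roughly as follows. The figures come from the quote and from the cohort size reported earlier in the article. The assumption that carrier counts scale linearly with cohort size is a simplification made here for illustration, and the variable names are hypothetical.

```python
TARGET_CARRIERS = 500         # "500+ heterozygous LOF carriers" per gene
PROJECTED_COHORT = 5_000_000  # proposed number of sequenced individuals
CURRENT_COHORT = 454_787      # exomes sequenced in the present study

# Carrier frequency at which a gene would just reach 500 carriers
# in a cohort of 5 million people.
one_in = PROJECTED_COHORT // TARGET_CARRIERS

# Expected carriers for such a gene in the current cohort, assuming
# carrier counts scale linearly with cohort size (a simplification).
expected_now = CURRENT_COHORT * TARGET_CARRIERS / PROJECTED_COHORT

print(f"Implied carrier frequency: 1 in {one_in:,}")
print(f"Expected carriers in the current cohort: {expected_now:.0f}")
```

Under that assumption, a gene that would just reach 500 carriers among 5 million people would have only a few dozen carriers in the current cohort.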