NEW YORK – A team from Regeneron Genetics Center, GlaxoSmithKline, and elsewhere has demonstrated the variant profiles and disease clues that can be gleaned from the exome sequences of individuals enrolled in the ongoing, prospective UK Biobank study.
"These data greatly extend the current genetic resource, particularly in ascertainment of rare coding variation, which we demonstrate is useful for resolving variant-to-gene links and the directionality of gene-to-phenotype associations," co-corresponding authors Aris Baras, with the Regeneron Genetics Center, and Laura Yerges-Armstrong, at GlaxoSmithKline, and their colleagues wrote.
Prior to expanding exome sequencing to the more than 502,500 individuals participating in the UK Biobank study, the researchers set out to profile exome sequences in a subset of almost 50,000 participants for a study published in Nature on Wednesday. Across the individuals profiled, they saw examples of loss-of-function (LOF) variants in some 97 percent of the protein-coding genes considered, represented by nearly 198,300 specific autosomal variants.
The team noted, for example, that pathogenic or likely pathogenic variants in the BRCA1 and BRCA2 genes were over-represented in participants who had been diagnosed with breast, ovarian, prostate, or other cancer types. Around 2 percent of the exome-sequenced individuals had at least one medically actionable variant in their germline.
"We illustrate the unique value of this expanded [whole-exome sequencing] resource in the [UK Biobank] to assess pathogenic and likely pathogenic variants in a disease-agnostic, large-scale, population-based study with longitudinal follow-up," the authors wrote.
For the study, the researchers sifted through protein-coding sequences generated for 49,960 participants in the UK Biobank effort by collaborators at the Regeneron Genetics Center, uncovering almost 10.2 million autosomal variants, a set that included some 4.5 million single-nucleotide variants, nearly 212,500 small insertions and deletions, and 198,269 LOF variants.
After quality control steps, the team was left with around 4 million SNPs in protein-coding sequences, with the vast majority falling at frequencies below 1 percent. Variants detected in the exome sequence set included those not found in prior array-based profiles and imputation analyses on UK Biobank participants.
"[Whole-exome sequence] data have identified new associations that are unique to the exome sequence and detected in only approximately one-tenth of the sample size," the authors noted, adding the new findings "highlight the considerable power of [whole-exome sequencing] for the discovery of LOF variants and rare variant associations and the further promise of new biological insights through the sequencing of all participants in the [UK Biobank] resource."
By bringing in additional phenotype data already collected from the participants, the researchers also detected LOF variants in PIEZO1, COL6A1, and MEPE that appeared to coincide with varicose veins, corneal features, and bone density, respectively, along with LOF variants in IQGAP2 and GMPR that were associated with blood cell traits.
Exome sequence data from the current study are available to other researchers, they noted, adding that the UK Biobank team ultimately plans to expand the exome analyses to all of the project's more than 500,000 participants.
"Coupled with the rich laboratory, biomarker, health record, imaging, and other health-related data that are continually added to the [UK Biobank] resource, [whole-exome sequencing] will enhance the power for discovery and will continue to yield many important findings and insights," the authors concluded.