Analysis of 150K UK Biobank Genomes Leads to Discovery of New Variants, Trait Associations

NEW YORK – UK Biobank investigators at Amgen subsidiary Decode Genetics, Reykjavik University, and other centers have shown that the vast collection of genetic variants revealed with whole-genome sequencing in more than 150,000 of the study's participants can improve efforts to find informative trait or disease associations.

"The large-scale sequencing described here, as well as the continued effort in sequencing the entire [UK Biobank cohort], promises to vastly increase our understanding of the function and impact of the noncoding genome," first and co-corresponding author Bjarni Halldorsson, a researcher at Decode Genetics and Reykjavik University, and his colleagues wrote in Nature on Wednesday.

"When combined with the extensive characterization of phenotypic diversity in the [UK Biobank]," they explained, "these data should greatly improve our understanding of the relationship between human genome variation and phenotype diversity."

As they reported at the American Society of Human Genetics annual meeting last year, members of the UK Biobank team at Decode and the Wellcome Sanger Institute performed whole-genome sequencing — to an average depth of more than 30-fold coverage — on 150,119 of the study's 500,000 participants. The effort was supported by firms such as Amgen, AstraZeneca, GlaxoSmithKline, and Johnson & Johnson, as well as the UK government.

In the newly published paper, UK Biobank researchers from centers in Iceland and Denmark described single nucleotide variants, small insertions and deletions, and larger structural variants found in the data, while highlighting three main ancestry clusters and related haplotype features.

"We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort, and a South Asian cohort," the authors reported, noting that "[a] haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals."

The team's search turned up more than 585 million single nucleotide variants, more than 58.7 million indels, and almost 895,100 structural variants, along with more than 2.5 million microsatellites. The researchers used that collection to search for rare variants influencing conditions ranging from type 1 hemiplegic migraines or myotonic dystrophy to epilepsy, episodic ataxia type 2, or spinocerebellar ataxia type 6, both within and across the genetic ancestry-based cohorts.

"Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation," the authors reported.

The team noted that association analyses appeared to benefit from distinguishing between parts of the genome that do or do not vary among individuals with similar ancestral backgrounds. In particular, an analysis of a so-called depleted region (DR) score, representing regions that were relatively devoid of genetic diversity, suggested that strong conservation often turns up outside of the protein-coding portions of the genome covered by exome sequencing.

"We expect the DR score presented here to be an important resource for identifying genomic regions of functional importance, although further evaluations should be taken to understand its properties and implications and how it compares to other measures of conservation and sequence constraint," the authors explained.

The investigators reportedly plan to sequence the genomes of all 500,000 UK Biobank participants in the coming years. Individuals enrolled in the study have already been assessed using exome sequencing, phenotypic profiling, and other approaches.

"Data of this type and quantity are going to revolutionize our ability to identify and characterize intergenic sequences of importance to human diversity, be it to risk of disease and response to treatment or some other attributes," senior and corresponding author Kari Stefansson, Decode founder and CEO, said in a statement.
