NEW YORK – A University of Oxford-led team is turning to identity-by-descent (IBD) features found in UK Biobank participant genotypes to home in on disease or trait associations involving ultra-rare variants, associations that are apt to be missed in genome-wide association studies based on common variants.
At the American Society of Human Genetics virtual meeting on Thursday, University of Oxford statistics researcher Georgios Kalantzis outlined an IBD-based strategy for finding pathogenic, loss-of-function mutations associated with a given trait or condition. The approach digs into stretches of shared IBD sequence in individuals who have been profiled by array-based genotyping but have not necessarily been sequenced.
"If we pick two random individuals and analyze their genealogies, we will eventually reach a common ancestor," Kalantzis said, noting that IBD sequences from seemingly distant but evolutionarily close relatives can contain recent mutations, including rare variants.
To explore these regions, he and his colleagues came up with a new two-step analytical tool called "fast sequentially Markovian coalescent," or FastSMC, designed for finding and validating stretches of sequences inherited from a shared ancestor in related individuals within large biobank collections.
For example, the researchers found some 214 billion IBD segments dating back as many as 50 generations in haploid genome sequences from the full UK Biobank collection, and used these segments to examine fine-scale population structure in the region.
They also demonstrated that it was possible to predict ultra-rare variant sharing in more than 487,000 genotyped UK Biobank participants with phased genomes — results backed up by their analyses on protein-coding sequences from a subset of 49,000 individuals who had their exomes sequenced for a UK Biobank pilot effort.
"Individuals sharing IBD segments with known carriers of pathogenic mutations are also likely carriers of these variants (by inheriting them from a shared ancestor) and are thus at increased risk for disease," Kalantzis and co-authors wrote in the ASHG presentation abstract. "IBD sharing can therefore be utilized to detect association between complex traits and causal variants, even when these have not been fully sequenced."
Indeed, after using IBD profiles, "loss-of-function segment burden" scores, and other analyses to impute variant patterns in nearly 500,000 genotyped or exome-sequenced UK Biobank participants, the team demonstrated that it was possible to track down 29 loci with exome-wide significant associations between rare and ultra-rare variants and seven measurable blood traits in some 303,000 genotyped individuals from the biobank.
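The reasoning quoted above, that people who share an IBD segment with a known carrier are likely carriers themselves, can be illustrated with a minimal Python sketch. This is not the FastSMC implementation or the team's burden score. The `IBDSegment` record, the `putative_carriers` function, and the sample identifiers are hypothetical names introduced only for illustration. The sketch also ignores which haplotype carries the variant, so a real analysis would need phased data and a probabilistic treatment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IBDSegment:
    """Hypothetical record for one IBD segment shared by two samples."""
    sample_a: str
    sample_b: str
    chrom: str
    start: int  # first base pair of the segment (inclusive)
    end: int    # last base pair of the segment (inclusive)


def putative_carriers(segments, known_carriers, chrom, pos):
    """Return samples that share an IBD segment covering (chrom, pos)
    with at least one known carrier and are not known carriers themselves.

    Simplifying assumption: any IBD segment overlapping the variant
    position is treated as evidence that the variant is shared.
    """
    found = set()
    for seg in segments:
        # Skip segments that do not cover the variant position.
        if seg.chrom != chrom or not (seg.start <= pos <= seg.end):
            continue
        if seg.sample_a in known_carriers and seg.sample_b not in known_carriers:
            found.add(seg.sample_b)
        elif seg.sample_b in known_carriers and seg.sample_a not in known_carriers:
            found.add(seg.sample_a)
    return found
```

Given a list of segments and a set of sequenced carriers, calling `putative_carriers(segments, {"A"}, "1", 255)` would return the genotyped-only samples whose IBD segments with sample `A` span position 255 on chromosome 1.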
A handful of the associations involved loci that were also picked up with data from the pilot exome sequencing study, Kalantzis noted, and the IBD-based associations appeared to overlap to some extent with genes and loci found in past genome-wide association studies on specific blood traits. But the IBD-based strategy also led to previously unappreciated blood trait loci.
"This analysis highlights the utility of leveraging IBD detection in a hybrid sequenced/genotyped cohort to both identify novel associations and characterize the contribution of rare genomic variation in the architecture of complex heritable traits," he and his co-authors wrote.
In a related conference presentation, University of Oxford statistician and human genetics researcher Pier Palamara provided additional details on the types of investigations made possible by uncovering relatedness among individuals from a given population, including examples of analyses based on IBD and time to most recent common ancestor using biobank data from the UK, Japan, and the Netherlands.
Analyses hinging on IBD and other evolutionary features are also showing promise for uncovering complex trait contributors, population structure, historical demographic patterns, and natural selection pressures that different populations have faced, he explained.
Large and diverse biobank collections are needed for such analyses because of the distinct geographic, environmental, and demographic events that have helped shape the human genome in populations from different parts of the world.
"Human populations, as we know, have been somewhat isolated from each other in recent history, which means that we're looking for variants that are private to different populations, and also low frequency," Palamara said. "So we need large data sets to be able to observe them."
Using IBD sharing and postal code data for UK Biobank participants, for example, Palamara, Kalantzis, and their team narrowed in on specific stretches of IBD sequence that could be used to place randomly selected individuals from the biobank to within a median of around 45 kilometers (about 28 miles) of their birthplace.
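To make the reported accuracy figure concrete, the following Python sketch shows one way a median birthplace error could be computed from predicted and true coordinates. It is an illustration only. The article does not say how the team measured distance, so the great-circle (haversine) formula, the spherical Earth radius of 6,371 km, and the function names `haversine_km` and `median_error_km` are all assumptions.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(predicted, actual):
    """Median distance between paired (lat, lon) predictions and true birthplaces."""
    return median(
        haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(predicted, actual)
    )
```

A median is less sensitive than a mean to a small number of individuals placed very far from their true birthplace, which may be why it is the summary quoted here.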
Similar results, along with findings from the IBD-based blood trait association analysis and other applications of the FastSMC analytical tool, are described in a preprint the researchers posted to bioRxiv this spring. They also developed a publicly available website for visualizing and exploring some of the fine-scale population features that are being teased out of the biobank data with FastSMC analyses of IBD sequences, particularly geographic regions within the UK that have contained related pairs of individuals over time.
"[L]ooking at downstream applications, a direction of future work will be to leverage FastSMC to better control for subtle population stratification for both rare and common variants in association studies," they noted in the manuscript. "Our results show that birth coordinates can be effectively inferred from recent IBD sharing, and suggest that this may be a path towards capturing subtle environmental covariates that are missed by genome-wide [identical-by-state]-based approaches."