UK10K Project Demonstrates Utility of Population-Scale Sequencing

NEW YORK (GenomeWeb) – Researchers leading the UK10K project to sequence 10,000 individuals today published initial results in two papers in Nature and one paper in Nature Communications.

The UK10K project launched in 2010 with the goal of sequencing the whole genomes of 4,000 people at low coverage and the exomes of 6,000 people in order to discover rare genetic variants that are important in human disease.

The project involved researchers at the Wellcome Trust Sanger Institute, Bristol University, King's College London, the Medical Research Council, the UK Department of Health, the University of Oxford, the University of Cambridge, the University of California Los Angeles, and Edinburgh University. The Wellcome Trust was the primary funder, awarding £10.5 million ($16.2 million) to the Sanger Institute and clinical collaborators.

In the Nature study that detailed the main findings and lessons learned from completing a project of such scale, the researchers characterized over 24 million novel genome-wide sequence variants and identified novel pathogenic variants for rare diseases. In addition, the data allowed the researchers to develop a genotype imputation reference panel that enabled further discovery of disease-associated variants.

The team also explored the occurrence of secondary findings from the exome dataset, finding that, consistent with other studies, approximately 2.3 percent of individuals harbor a pathogenic variant in one of 56 genes deemed medically important and interpretable by the American College of Medical Genetics and Genomics.

The UK10K project included two main arms. The first was a cohorts arm with a goal of assessing the contribution of genetic variation to 64 phenotypes related to obesity, diabetes, cardiovascular and blood biochemistry, blood pressure, and dynamic measurements of ageing, birth, heart function, lung function, liver function, and renal function. For this portion, researchers assessed the whole genomes of 3,781 healthy individuals from two extensively studied cohorts of European ancestry — the Avon Longitudinal Study of Parents and Children and TwinsUK. Whole genomes were sequenced to an average of 7x coverage.

From this arm, over 24 million novel SNVs were discovered. The researchers evaluated the association of the low-frequency and rare variants with 31 core traits, finding that 27 independent loci were associated, two of which had not been previously identified — a low-frequency intronic variant in ADIPOQ, which was associated with decreased adiponectin levels; and a rare splice variant in APOC3, which was associated with plasma triglyceride levels and was published prior to the full UK10K Nature publication.

The other 25 loci included common, low-frequency, and rare variants with known associations to adiponectin levels, lipid traits, hemoglobin levels, and fasting glycemic traits. Somewhat surprisingly, there was "no evidence of low-frequency alleles with large effects upon traits with classical lipid alleles identifying extremes of single-variant genetic contributions for these traits," the authors wrote. "This suggests that few, if any, low-frequency variants with stronger effects than those we see are likely to be detected in the general European population for the wide range of traits that we considered."

To search for additional variants with a moderate effect or variants with a rarer frequency, the researchers used the UK10K reference panel that they created as part of the project and published in a subsequent paper in the same Nature issue. They applied it to over 22,000 additional samples from the 14 cohorts imputed to the panel. That effort identified two novel associations with low-density lipoprotein cholesterol.

In the second arm of the UK10K study, researchers sequenced the exomes of 5,182 individuals to 80x coverage, discovering 842,646 SNVs and 6,067 indels, of which more than 60 percent were found in only one individual.

Approximately 1,000 exomes were from individuals with one of eight rare diseases. The researchers identified 25 novel genetic causes for five of those, including 14 ciliopathies, seven neuromuscular disorders, two eye malformations, one congenital heart defect, and one case of intellectual disability.

In contrast, analyses of three complex diseases — obesity, autism spectrum disorder, and schizophrenia — did not yield causative variants.

Secondary findings were identified in 42 of 1,805 individuals, or 2.3 percent. The researchers identified two main challenges when searching for and reporting on secondary findings. The first is interpreting potentially pathogenic variants, which requires clinical expertise and more complete databases. Second, the researchers found that for some disorders, "the frequency of carriers is likely to be too high compared to the disease frequency," the authors wrote, suggesting that further study is needed on the penetrance of variants and on accurate estimates of population frequencies.

A third publication, in Nature Communications, highlighted the utility of combining the whole-genome and exome UK10K data with the genotype reference panel. In that study, researchers identified a low-frequency noncoding variant with large effects on bone mineral density, low levels of which are linked to osteoporosis.

"Overall, this effort has given us both new genomic tools and insights into the role of low-frequency and rare variation on human complex traits, and will inform strategies for future association studies," the authors wrote.
