High-Coverage WGS of Expanded 1,000 Genomes Project Cohort

The high-coverage whole-genome sequencing (WGS) and analysis of the original 1,000 Genomes Project (1kGP) cohort, along with additional samples that complete hundreds of parent-child trios, is presented in Cell this week. The 1kGP represents the largest fully open resource of WGS data. Its final release, based primarily on low-coverage WGS, comprised 2,504 unrelated samples from 26 populations and included 84.7 million single-nucleotide variants (SNVs), 3.6 million short insertions and deletions (INDELs), and a separate set of 68,818 structural variants (SVs). While this dataset captured the vast majority of common SNVs in the population, shortcomings in the bioinformatic tools available at the time limited the detection of rare SNVs, as well as of INDELs and SVs across the entire frequency spectrum. In this week's report, a team led by Broad Institute scientists describes using the Illumina NovaSeq 6000 System to perform WGS of the original 1kGP samples, along with an additional 698 related samples that completed 602 parent-child trios in the project's cohort, bringing the total number of sequenced and jointly genotyped samples to 3,202. The researchers performed SNV and INDEL discovery and generated a comprehensive set of SVs by integrating multiple analytic methods through a machine learning model. They show gains in the sensitivity and precision of variant calls compared with the phase 3 release and build an improved reference imputation panel, making the variants discovered here accessible for association studies. Through this work, "we have updated this critical resource with benchmarks and standards for the next generation of large-scale international WGS initiatives," the authors write. "Although many larger sequencing projects have now been conducted, the open nature of the 1kGP samples will continue to make this a foundational resource for the community in the years to come."