NEW YORK (GenomeWeb) – In a study in Nature Genetics today, members of the Haplotype Reference Consortium published a panel of variants gleaned from tens of thousands of human haplotypes, a resource meant to improve human genotype imputation.
"[O]ur aim is to provide a single centralized resource for human genetics researchers to carry out genotype imputation," the study's authors wrote. "This first release of the HRC is the largest human genetic variation resource thus far and has been created via an unprecedented collaboration of data sharing across many groups."
The team used whole-genome sequences for 64,976 haplotypes from 32,488 diploid samples from individuals of European ancestry collected for 20 prior studies; most were sequenced at between four- and eightfold coverage. With this data, the team assembled a reference panel that makes it possible to impute variants with minor allele frequencies as low as 0.1 percent in individuals of European descent.
After uncovering nearly 95.9 million SNPs with minor allele counts of two or more, the team performed a series of quality control and analytical steps, ultimately focusing on the roughly 39.2 million SNPs with minor allele counts of five or more across the 32,488 samples.
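The minor allele count thresholds described above can be illustrated with a minimal sketch. This is not the consortium's pipeline: the function names, the toy sites, and the 0/1 allele encoding are all assumptions made for illustration. The only figures taken from the study are the sample and haplotype counts.

```python
from typing import Dict, List, Sequence


def minor_allele_count(haplotypes: Sequence[int]) -> int:
    """Count copies of the less common allele at one biallelic site.

    Each entry is one haplotype's allele: 0 (reference) or 1 (alternate).
    This encoding is an assumption for the sketch.
    """
    alt = sum(haplotypes)
    return min(alt, len(haplotypes) - alt)


def minor_allele_frequency(haplotypes: Sequence[int]) -> float:
    """Minor allele count divided by the number of haplotypes."""
    return minor_allele_count(haplotypes) / len(haplotypes)


def filter_sites(sites: Dict[str, Sequence[int]], min_mac: int = 5) -> List[str]:
    """Return the names of sites whose minor allele count is at least min_mac."""
    return [name for name, haps in sites.items()
            if minor_allele_count(haps) >= min_mac]


# Hypothetical toy data: three sites across 20 haplotypes (10 diploid samples).
toy_sites = {
    "site_a": [1] * 2 + [0] * 18,   # alternate allele seen twice
    "site_b": [1] * 5 + [0] * 15,   # alternate allele seen five times
    "site_c": [1] * 17 + [0] * 3,   # alternate is the major allele here
}
kept = filter_sites(toy_sites, min_mac=5)

# Figures from the study: each diploid sample contributes two haplotypes.
N_SAMPLES = 32_488
N_HAPLOTYPES = 2 * N_SAMPLES
# A 0.1 percent minor allele frequency corresponds to about 65 allele copies.
copies_at_0_1_percent = 0.001 * N_HAPLOTYPES
```

On the toy data only `site_b` passes a threshold of five. Note that `site_c` is filtered out even though its alternate allele is common, because the count is taken on whichever allele is rarer.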
The resulting HRC appeared to improve the accuracy of genotype imputation relative to the 1000 Genomes Project dataset, the researchers reported, based on their comparison to array-based genotyping patterns. Likewise, in simulated and real GWAS datasets, HRC-based imputation unearthed variants that were missed when imputing with 1000 Genomes Project data.
To demonstrate the potential utility of the HRC set, the researchers used it to impute genotypes in 1,210 individuals enrolled in the InCHIANTI study, a genome-wide association study of circulating blood markers. Compared with the 11.9 million SNPs identified with 1000 Genomes Project-based imputation, the 15.5 million SNPs they imputed with the HRC provided a more refined look at variants involved in blood lipid levels, inflammatory markers, and more.
To assist investigators using the HRC, the team also established online imputation tools that can be applied to phased or unphased genotyping data, as well as a tool for estimating haplotypes based on genotyping data and rare variant sharing patterns.
"It is our intention to make a limited number of HRC haplotypes available for researchers via the European Genome-Phenome Archive for the sole purpose of phasing and imputation," the team wrote.
The researchers noted that they are already planning a second release from the HRC, which will include genotyping data from more diverse ancestral populations. That data set will reportedly contain not only SNP profiles, but also information on small insertions and deletions in the genome.