NEW YORK (GenomeWeb News) – Members of the International HapMap 3 Consortium reported in Nature today that they have used a combination of genotyping and targeted re-sequencing to catalog common and rare genetic variants in 11 populations from around the world.
"In this paper, we really have an integrative analysis," co-author Fuli Yu, a researcher with Baylor College of Medicine's Human Genome Sequencing Center, told GenomeWeb Daily News. "We have both common and rare (very low frequency) genetic variation in this study."
The team genetically characterized individuals from 11 different populations, genotyping nearly 1,200 individuals and re-sequencing specific genomic regions in almost 700 individuals from 10 of the populations. In the process, they were able to find and compare both SNP and copy number variation patterns across the populations tested, and explore the accuracy with which such variants can be detected through imputation.
"This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease," the team wrote, "and serves as a step towards a high-resolution map of the landscape of human genetic variation."
The HapMap 3 effort was led by Richard Gibbs, director of the BCM Human Genome Sequencing Center; David Altshuler, a geneticist at Harvard University and director of the Broad Institute's Program in Medical and Population Genetics; and the late Leena Peltonen, who was head of human genetics at the Wellcome Trust Sanger Institute.
"Despite the remarkable achievements following from the Human Genome Project, our knowledge of human genetic variation remains limited," Gibbs said in a statement. "Here we have studied more populations and were able to include [copy number polymorphisms] in genome-wide studies."
The project involved genotyping individuals from 11 populations, including many of the samples collected in the US, China, Japan, and Nigeria for the first two phases of HapMap, along with additional samples from those locations and samples from seven more populations.
The newly sampled populations include Luhya and Maasai populations from Kenya, Tuscan populations from Italy, and individuals from the US with African, Chinese, Gujarati Indian, or Mexican ancestry.
The team initially genotyped 1,486 samples using the Affymetrix Human SNP 6.0 array and 1,284 samples using the Illumina Human1M beadchip array. After quality control, the researchers were left with data for 1,326 of the Affymetrix-genotyped samples and 1,211 of the Illumina-genotyped samples. They then compiled that information and pared it down to a consensus genotype set covering 1,184 individuals at more than 1.4 million SNPs.
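To make the sample counts concrete, the short Python sketch below works out what share of samples survived quality control on each platform, using only the figures reported above. It also shows one simple way a consensus set could be built from two platforms, by keeping calls that appear on both and agree. That merge rule, the function names, and the toy sample and SNP identifiers are assumptions for illustration; the consortium's actual procedure is not described here.

```python
def retention_rate(passed: int, genotyped: int) -> float:
    """Fraction of genotyped samples that survived quality control."""
    if genotyped <= 0:
        raise ValueError("genotyped must be positive")
    return passed / genotyped


def consensus_genotypes(calls_a: dict, calls_b: dict) -> dict:
    """Keep only (sample, SNP) calls present on both platforms and in agreement.

    Hypothetical merge rule for illustration; the consortium's actual
    procedure is not described in the article.
    """
    return {
        key: genotype
        for key, genotype in calls_a.items()
        if calls_b.get(key) == genotype
    }


# Sample counts reported in the article.
affymetrix_rate = retention_rate(1326, 1486)
illumina_rate = retention_rate(1211, 1284)

# Toy calls keyed by (sample ID, SNP ID); all IDs and genotypes are made up.
toy_affymetrix = {("s1", "rs_a"): "AG", ("s1", "rs_b"): "CC", ("s2", "rs_a"): "AA"}
toy_illumina = {("s1", "rs_a"): "AG", ("s1", "rs_b"): "CT", ("s3", "rs_a"): "GG"}
toy_consensus = consensus_genotypes(toy_affymetrix, toy_illumina)

print(f"Affymetrix retention: {affymetrix_rate:.1%}")
print(f"Illumina retention: {illumina_rate:.1%}")
print(toy_consensus)
```

In the toy example, only the call that both platforms report identically is kept; discordant calls and samples seen on a single platform are dropped.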
For 692 of the unrelated participants, the team used Sanger sequencing to read 10 genomic regions, each about 100,000 base pairs long, that had garnered the attention of researchers involved in the Encyclopedia of DNA Elements, or ENCODE, project, Yu explained. And, he noted, ENCODE members had previously sequenced half of these regions in HapMap1 and 2 samples.
Of the SNPs identified through sequencing, 77 percent were new, meaning they were not yet housed in the dbSNP database.
Overall, the team's genotyping and sequence data suggest population-specific differences are particularly pronounced for less common and rare variants — consistent with the notion that common variants are older and, consequently, shared by more human populations.
"As expected, lower-frequency variation is less shared across populations, even closely related ones, highlighting the importance of sampling widely to achieve a comprehensive understanding of human variation," they noted.
Interestingly, though, the researchers found that some rare variants showed up in more than one of the populations tested. These rare variants sat on distinct haplotype backgrounds that varied by population, Yu explained, suggesting the changes arose independently in multiple populations and have been maintained at low levels within them.
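The sharing pattern described above can be illustrated with a small Python sketch that groups variants by frequency class and counts how many populations carry each one. Everything in it is an assumption for illustration: the 5 percent cutoff between "common" and "low frequency," the function name, and the toy frequencies, which are invented and are not HapMap 3 data.

```python
from collections import defaultdict

# Hypothetical threshold separating "common" from "low-frequency" variants.
COMMON_THRESHOLD = 0.05


def sharing_by_class(frequencies: dict) -> dict:
    """Average number of populations carrying a variant, per frequency class.

    `frequencies` maps variant ID -> {population: allele frequency}.
    A variant is classed by its highest frequency in any population.
    """
    counts = defaultdict(list)
    for variant, by_pop in frequencies.items():
        present = [f for f in by_pop.values() if f > 0]
        if not present:
            continue  # variant not observed in any population
        label = "common" if max(present) >= COMMON_THRESHOLD else "low_frequency"
        counts[label].append(len(present))
    return {label: sum(n) / len(n) for label, n in counts.items()}


# Toy frequencies; populations and values are invented for illustration.
toy = {
    "var1": {"pop_a": 0.30, "pop_b": 0.25, "pop_c": 0.40},
    "var2": {"pop_a": 0.10, "pop_b": 0.00, "pop_c": 0.12},
    "var3": {"pop_a": 0.01, "pop_b": 0.00, "pop_c": 0.00},
    "var4": {"pop_a": 0.00, "pop_b": 0.02, "pop_c": 0.01},
}
result = sharing_by_class(toy)
print(result)
```

In this toy set, common variants are found in more populations on average than low-frequency ones, while one low-frequency variant still appears in two populations, mirroring the exception the researchers described.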
Using the composite of multiple signals, or CMS, method, the team also found dozens of genes that appear to be under selection in specific populations. For instance, they explained, they found signals of selection associated with immune genes in populations from Kenya. The Tuscan population, on the other hand, appears to have signals of selection affecting genes related to pigmentation and other processes.
Meanwhile, the researchers noted, their findings support the idea that the ability to impute copy number changes and low frequency SNPs hinges on access to appropriate reference and genotyping data.
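One common way to score imputation accuracy is the squared correlation between imputed allele dosages and directly measured genotypes. The Python sketch below shows that calculation under stated assumptions: the article does not say which accuracy metric the consortium used, and the function name and toy genotype values are hypothetical.

```python
def squared_correlation(truth: list, imputed: list) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    if len(truth) != len(imputed) or len(truth) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(truth) / len(truth)
    mean_i = sum(imputed) / len(imputed)
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov * cov / (var_t * var_i)


# Toy data: genotypes coded as 0, 1, or 2 copies of an allele (invented values).
true_genotypes = [0, 1, 2, 1, 0]
imputed_dosages = [0.1, 0.9, 1.8, 1.2, 0.3]

r2 = squared_correlation(true_genotypes, imputed_dosages)
print(f"imputation r^2: {r2:.3f}")
```

A value near 1 means the imputed dosages track the measured genotypes closely. The score is undefined when every individual has the same genotype, which is one reason rare variants are harder to evaluate in small reference panels.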
"There's a lot of [imputation] power if you interrogate a large number of individuals — larger than the previous HapMap1 and 2. If you incorporate multiple populations, you do see improvement in imputation accuracy," Yu explained, noting that such information also improves the imputation of rarer genetic variations and copy number changes.
Data from the HapMap 3 project are available online.