COLD SPRING HARBOR (GenomeWeb News) – The initial analyses of 1000 Genomes Project pilot data are uncovering new common genetic variants in the human genome and offering insights into the genetic variation between individuals and populations.
Speaking at the Biology of Genomes meeting here this morning, University of Michigan biostatistician Goncalo Abecasis brought attendees up to speed on the progress made so far in analyzing data from 1000 Genomes pilot datasets.
The international effort, which began early last year, involves sequencing more than 1,000 human genomes in an effort to catalog and create a public database of all common genetic variants.
For the pilot stage of the project, the team set out to get low-coverage sequence for the genomes of 180 individuals, deep-coverage sequence for two parent-offspring trios (one European and one African), and targeted re-sequencing of about 1,000 genes in 1,000 individuals. By the end of this year, the team plans to generate sequence data on 400 individuals each from European, East Asian, and African populations.
The project has had two public data releases so far — the first last December and another this January.
The team has identified differences in the number of SNPs per trio, uncovering some four million in the European trio and about five million in the African trio. Results from both datasets showed good concordance with the HapMap genotypes already available for these individuals, Abecasis explained. In general, though, SNPs found in the European trio showed more overlap with dbSNP than those in the African trio.
The team's initial analysis of low-coverage samples has yielded nearly 22 million SNP calls, including 11.2 million new SNPs. Nearly five million of the SNPs are shared between individuals from all of the populations tested.
Abecasis noted that the team also plans to start depositing newly identified SNPs into dbSNP shortly.
The results to date suggest that applying this new SNP information to genome-wide association studies may provide new insights into existing data, Abecasis explained. For instance, he said, the researchers used 1000 Genomes SNP calls to re-evaluate Wellcome Trust Case Control Consortium data on type 1 and type 2 diabetes, turning up additional loci that were significantly associated with each condition.
The data are also being evaluated for short insertions and deletions as well as copy number variants (CNVs), Abecasis noted. Since CNVs are more complicated than SNPs, the team is carefully validating all of these variants. So far, roughly 4,000 CNVs have been validated from the 1000 Genomes pilot data.