MONTREAL (GenomeWeb News) – Analyses of data for nearly 1,100 individuals assessed through phase 1 of the 1000 Genomes Project have uncovered more than 40 million genetic variants in the human genome, including almost 30 million SNPs not detected previously.
At the International Congress of Human Genetics here this week, University of Oxford researcher and 1000 Genomes Project Consortium representative Gil McVean described findings from integrated analyses of SNPs, small insertions and deletions, and large deletions. These variants were identified in low-coverage whole-genome data and deeper-coverage exome sequence data for 1,092 individuals.
Data from this integrated analysis were released this week. The release covers variant discovery, variant integration, and haplotype integration.
Overall, the researchers identified some 37.9 million SNPs in the dataset, including 29.7 million new SNPs. The team also found 3.8 million short indels and 14,000 large deletions. McVean noted that these indel and deletion calls form a highly conservative set that passed through the integrated analyses.
Sensitivity for finding low-frequency variants has been high, he said. With the data and analyses currently available, the team has more than 96 percent sensitivity to detect the variants present in any given genome.
The sequence-based genotype calls appear to be more than 99 percent accurate at sites that genotyping chips identify as heterozygous. For common variants, imputation accuracy is believed to exceed 95 percent.
McVean noted that the exome sequencing data have been particularly useful for finding rarer genetic variants that low-coverage whole-genome sequence data alone may have missed.
When they compared the genome and exome sequence data within and between the populations tested, the researchers found substantial differentiation in rare-variant patterns, even among populations that appear to be closely related.
"The new data allow us to describe the sharing of rare variants across related and admixed populations and to evaluate the benefit of different data types and experimental designs for population-scale sequencing," McVean explained in the abstract for his ICHG presentation.