NEW YORK (GenomeWeb News) – In a paper appearing online today in Nature, members of the 1000 Genomes Project Consortium reported on the sequencing and analysis strategies used for the pilot phase of the study and described the genetic variation identified through these pilot efforts.
"Already, just in the pilot phase, we've identified over 15 million genetic differences by looking at 179 people," corresponding author Richard Durbin, a genome informatics researcher and group leader at the Wellcome Trust Sanger Institute, told reporters during a press briefing this week. As such, he said, the findings provide "a more complete catalog of variation than was available previously."
Improvements in DNA sequencing technology have made it possible to find human genetic variation by sequencing many individuals, he noted, explaining that members of the international 1000 Genomes Project are employing such large-scale genome sequencing strategies to put together a catalog of common variants — information that is already being used to inform studies of common and rare variants and their role in disease.
For the pilot stage of this effort, they focused on three projects: low-coverage, whole-genome sequencing of 179 individuals from four populations, more in-depth sequencing of mother-father-child trios, and exon sequencing targeting 8,140 exons in 697 individuals from seven populations.
So far, the team has uncovered roughly 15 million SNPs, along with a million small insertions and deletions, and 20,000 structural variants.
Moreover, those involved in the collaboration touted the project for helping to spur improvements in the accuracy, efficiency, and availability of genome sequencing and analysis methods.
By analyzing this data, the researchers have also found evidence suggesting each individual carries some 250 to 300 genes with at least one defective copy, as well as 50 to 100 variants that have been linked to inherited genetic conditions.
In addition to mapping the variants — and characterizing their frequency and haplotype patterns — researchers explained that data from the pilot projects are also being used to explore everything from de novo mutation rates in the sequenced families to natural selection patterns and recent evolution patterns in human populations.
Meanwhile, in an accompanying paper appearing online in Science, University of Washington genome science researcher Evan Eichler and his co-workers outlined how they have used low-coverage 1000 Genomes pilot data, combined with data for more than a dozen high-coverage genomes, to examine copy number profiles in parts of the genome that were once considered out of reach.
To do this, the researchers used short-read sequence data on 159 human genomes, generated using the Illumina sequencing platform, to focus on about 1,000 repetitive, duplicated genes that "have been largely inaccessible to traditional genetic study as a result of their repetitive nature," Eichler told reporters.
By mapping short-read genome sequences to the human reference genome with the mrsFAST aligner, the team was able to evaluate read depth in the genome and create sequence tags to help distinguish between genes. That, in turn, helped assess both the copy number and content in specific regions of the genome.
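The read-depth idea described above can be illustrated with a minimal Python sketch. It is not the team's pipeline and does not use mrsFAST; it only assumes that read depth scales roughly linearly with copy number and that a set of control regions is present at two copies. The function name `estimate_copy_number` and all depth values are hypothetical, made up for illustration.

```python
from statistics import median


def estimate_copy_number(region_depths, control_depths, control_copies=2):
    """Estimate a region's copy number from read depth.

    Hypothetical helper: assumes depth is proportional to copy number and
    that the control regions are present at `control_copies` copies.
    """
    if not region_depths or not control_depths:
        raise ValueError("depth lists must not be empty")
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    # Depth expected from a single copy of sequence.
    depth_per_copy = baseline / control_copies
    return median(region_depths) / depth_per_copy


# Made-up per-window read depths, for illustration only.
control = [30, 31, 29, 30, 32, 28]   # assumed two-copy regions
duplicated = [61, 59, 60, 62, 58]    # roughly double the control depth
deleted = [15, 14, 16, 15]           # roughly half the control depth

print(estimate_copy_number(duplicated, control))
print(estimate_copy_number(deleted, control))
```

Real analyses also have to correct for factors such as GC content and mapping ambiguity, which this sketch ignores.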
"We're looking at individual level variation, as opposed to variation at the population level," Eichler noted, explaining that his team's analysis points to a "whole new level of genetic diversity" in terms of copy number differences in specific gene families.
For instance, they found that many of the genes that seem to be the most copy number variable fall in duplicated regions of the genome. "You can think of these almost as accordions of the genome," Eichler explained, "expanding and contracting in terms of their copy number."
Their findings suggest that there are often copy number differences within these regions between different human populations tested.
In addition, the researchers noted, comparisons between copy number patterns in humans and primates pointed to 53 gene families that have expanded in the human lineage since separating from the chimpanzee and gorilla lineages, including gene families implicated in neural development and disease.
In the future, Eichler said, he and his team plan to begin doing functional studies of these sorts of repetitive, duplicated genes. With the newfound ability to assess copy number in these genomic regions comes the opportunity to do association studies involving gene families in these regions, he added.
Meanwhile, 1000 Genomes Project investigators are currently working on sequencing 2,500 individuals for the main phase of the project, which is expected to produce an even more complete catalog of genetic variation in the human genome.
Although the project itself is not a medical study, it is already finding favor as a foundational tool for genetic studies, 1000 Genomes steering committee member David Altshuler, director of the Broad Institute's medical and population genetics program, told reporters. And in the future, he explained, having this catalog of genetic variants is going to help researchers distinguish background genetic variation from common and/or rare variants involved in disease.
"It's clear from the history of genetics of humans — but also of many model systems — that following the genetic contributors to disease can be a powerful tool to discover new clues about the genes and underlying biological basis of diseases, both rare and common," Altshuler said.
Data, genetic variant analysis, and other information on the 1000 Genomes Project are available here.