PHILADELPHIA (GenomeWeb News) – The 1000 Genomes Project collaborators plan to begin releasing data early next year and expect to finish sequencing 1,200 human genomes by around the end of 2009, project representative David Altshuler announced yesterday at the American Society of Human Genetics meeting here.
The team anticipates an official data release starting in January 2009, following a pilot data release this December, said Altshuler, an associate professor of genetics and medicine at Harvard Medical School and a lead investigator for the project. After January, new data will likely be released quarterly.
Meanwhile, the three 1000 Genomes pilot projects — which began in January and are aimed at achieving low coverage of 180 individuals, high coverage of two parent-offspring trios, and targeted sequencing of 1,000 genes in approximately 1,000 individuals — are nearing completion, Altshuler said. Those efforts seem to be generating high-quality data and have already uncovered new genetic variants, he added.
“We declared the pilot project very much a success at this point,” Altshuler told reporters at a press briefing yesterday.
So far, the 1000 Genomes Project has generated 3.8 terabases of data. This September and October, Altshuler said, the team deposited as much data each week as was present in GenBank when the effort began. In 2009, the project is expected to increase that output dramatically, producing a petabyte of data.
Along with the sequencing effort itself, Altshuler emphasized a need for developing shared data formats for different stages of the analysis. In the absence of standard formats or a clear framework for such analysis, he added, efforts to decipher the genetic information would be delayed. Consequently, team members are working to develop draft formats to aid this analysis.
The goal of the 1000 Genomes Project, an international effort, is to uncover the genetic variants that are present at a frequency of one percent or more in human populations.
Some have suggested that the large-scale sequencing effort may also help researchers impute new information for the more than 100,000 genotyped genomes already available. While that has not been shown for rare variants, Altshuler explained, the data could add value to the multitude of samples already scanned with chips.
But beyond the direct implications for the 1000 Genomes Project, the effort has spurred researchers to pioneer and evaluate methods that benefit other research efforts as well. For instance, researchers have been working with high-throughput sequencing and developing new approaches for exchanging and analyzing data, discovering SNPs and CNVs, and making imputations based on next-generation sequence data.
Discussing the project at the press briefing yesterday, former National Human Genome Research Institute director Francis Collins noted that while the project itself is not aimed at linking genotypes to phenotypes, “It will be the engine of many follow up studies.”