This story was originally published June 21.
The 1000 Genomes Project said this week that data from its three pilot projects is now available from public databases as well as through Amazon Web Services. Under the full-scale effort, the consortium plans to sequence the genomes of 2,500 individuals from 27 populations.
Building on the International HapMap Project, the 1000 Genomes Project was launched in 2008 (IS 1/22/2008) to expand the catalog of variation in the human genome using next-generation sequencing technologies.
The effort started off with three pilot projects, and the consortium said last year that it would collect additional samples and analyze about 2,000 genomes in total (IS 9/8/2010).
Data from the three pilot projects — 7.3 terabytes in total — is now available at the project's website and can be downloaded from the National Center for Biotechnology Information or the European Bioinformatics Institute. In addition, researchers can access the data through Amazon Web Services, where it is integrated into Amazon's Elastic Compute Cloud. Other cloud computing services are also permitted to provide the data to their customers.
The first pilot project involved sequencing the genomes of two parent-child trios at 20- to 60-fold average coverage and used Illumina's Genome Analyzer, 454's GS FLX, and Life Technologies' SOLiD. All platforms "were able to sequence 85 to 90 percent of a genome and produce high-quality data," according to the consortium.
The second pilot project sequenced the genomes of 179 people at three-fold average coverage, and "the results of the pilot project confirmed that this strategy is effective and will allow the project to meet its goal of discovering sequence variants that are shared with other people."
For the third pilot, the researchers sequenced the exons of 1,000 genes in about 700 people, and this project "provided unprecedented sample size to learn about the patterns of rare variation in the human population."
Under the full-scale effort, the project now plans to sequence the genomes of 2,500 individuals from 27 populations who have consented to the release of their DNA samples and full sequence data.
The project is also working on a database for the full-scale effort that will contain several types of variations, including SNPs, small insertions and deletions, structural variants, and copy number variants.
Later this year, the consortium plans to publish a description of the pilot data and the design of the full project in a peer-reviewed journal.