1000 Genomes Project Nears 3 Terabases of Sequence Data; Berlin's MPI Joins Effort

COLD SPRING HARBOR, NY (GenomeWeb News) – The 1000 Genomes Project has almost tripled the amount of sequence data it has produced during its pilot phase since this spring, to 2.8 terabases, or approximately 100 terabytes.
 
The project has also added another production center, the Max Planck Institute for Molecular Genetics in Berlin, which has recently begun to generate data for the effort.
 
Paul Flicek, head of the vertebrate genomics group at the European Bioinformatics Institute in Hinxton, UK, and co-leader of the 1000 Genomes Project's data flow group, gave an update on the project's progress at the Personal Genomes meeting at Cold Spring Harbor Laboratory last week.
 
Raw sequence data generated by the production centers is collected at the EBI, where researchers, in collaboration with colleagues from the Wellcome Trust Sanger Institute, recalibrate it to obtain accurate and uniform quality scores that allow data from different centers and sequencing platforms to be compared.
 
It is then uploaded to both the EBI’s and the National Center for Biotechnology Information's FTP sites for public access. Long term, the data will be stored in the NCBI’s Short Read Archive and the EBI’s European Read Archive.
 
The next batch of data, resulting from a data freeze in August, will be ready for download early this week, according to Flicek. As data production grows, transferring data between the production centers and the data storage centers is becoming increasingly difficult, he added.
 
The next data freeze, which is planned for Oct. 24, is expected to complete data production for the first two of the three pilot projects.
 
Under the first pilot project, researchers are sequencing 60 HapMap samples from three different populations at low coverage. The second pilot involves sequencing two trios (parents and child), one of European and one of African descent, at high coverage. The third pilot project aims to sequence 1,000 genes in 1,000 individuals at high coverage.
 
Later this year, following a meeting in November, the scientists are planning to release a first genetic variation map, according to Flicek.
 
Following the pilot phase, the entire project, he said, will probably generate about 20 terabases of sequence data. Sequencing production worldwide, he estimated, will soon be just an order of magnitude smaller than data generation by the Large Hadron Collider that recently opened in Geneva, which is expected to produce 15 petabytes of data per year.
 
The 1000 Genomes Project, a three-year effort, was launched in January. Its goal is to produce a detailed catalog of genetic variants in the human genome.
 
In May, the project's organizers announced they had generated 300 gigabases of sequence data, more than the amount of data stored in GenBank.
 
The following month, Illumina, Roche/454, and Applied Biosystems joined the project as data producers. The existing producers were the Sanger Institute, BGI Shenzhen, the Broad Institute of MIT and Harvard, Washington University School of Medicine’s Genome Center, and Baylor College of Medicine’s Human Genome Sequencing Center.
 
The MPI in Berlin is the latest production center to join the effort, Flicek told GenomeWeb Daily News.
