‘1,000 Genomes’ Challenges Data Flow


Managing and analyzing the data from the recently announced 1,000 Genomes Project will pose a number of challenges, some related to the nature of the next-generation sequencing platforms, but organizers say the results will boost both medical research and scientists' understanding of human evolutionary history.

The study consortium, which aims to sequence at least 1,000 and up to 2,000 human genomes within three years, includes two data-related working groups. A data flow group will be responsible for collecting and archiving the sequence reads, helping map them to a reference genome, and making the data available to the research community in different formats and levels of detail. Meantime, an analysis group will focus on aligning the reads, reconstructing the 1,000 genomes from the data, calling genetic variants, and interpreting the results.

A major challenge will be the "sheer volume of data," according to Gil McVean, a professor of statistical genetics at the University of Oxford, who co-chairs the analysis group. The consortium expects the study to produce on the order of 6 terabases of data, or 60 times the sequence data that has been deposited in public DNA databases over the last 25 years.
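The two figures in that comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 6-terabase total and the factor of 60 come from the article, while the 3-gigabase genome size and the even split across exactly 1,000 genomes are assumptions made here, not figures stated by the consortium.

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
# From the article: ~6 terabases expected, ~60x the public archives' holdings.
TOTAL_BASES = 6e12
FOLD_OVER_PUBLIC = 60

# Assumptions (not from the article): a ~3-gigabase human genome and an
# even split of the output across exactly 1,000 genomes.
HUMAN_GENOME_BASES = 3e9
GENOMES = 1000


def implied_public_archive_bases(total_bases, fold):
    """Size of the public archives implied by the 'N times' comparison."""
    return total_bases / fold


def mean_depth_per_genome(total_bases, genomes, genome_bases):
    """Average sequencing depth if the total were spread evenly."""
    return total_bases / genomes / genome_bases


if __name__ == "__main__":
    archive = implied_public_archive_bases(TOTAL_BASES, FOLD_OVER_PUBLIC)
    depth = mean_depth_per_genome(TOTAL_BASES, GENOMES, HUMAN_GENOME_BASES)
    print(f"Implied public archive size: {archive / 1e9:.0f} gigabases")
    print(f"Mean depth per genome under these assumptions: {depth:.1f}x")
```

Under these assumptions the comparison implies roughly 100 gigabases in the public databases, and an average depth of only a few fold per genome if the output were divided evenly.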

According to McVean, the analytical tasks fall into three broad areas: technology-related tasks that focus on translating the raw data into DNA sequence and mapping the sequence reads to a reference genome; calling genetic variants such as SNPs and structural variations and reconstructing individual genomes; and using the results to help disease studies and other research projects.

On the technology side, data analysis experts have to grapple with the fact that the nature of the data produced by existing next-generation sequencers is still in flux. "The data that comes out of the machines is changing pretty much month by month as the engineering improves," McVean said.

Julia Karow

Sequencing Notes

Knome, a personal whole genome sequencing startup, lined up its first two customers and announced a partnership with the Beijing Genomics Institute. Based in Cambridge, Mass., Knome offers whole-genome sequencing and genomic analysis for $350,000.

DNA sequencing technology startup Genome Corp. raised $250,000 in venture funding to continue work on its massively parallel Sanger sequencing tool. It also named three founding members of its scientific advisory board: Norm Dovichi, Annelise Barron, and Patrick Doyle.

Pacific Biosciences said it expects to commercialize a next-gen sequencer by 2010 that could eventually generate 100 gigabases of sequence per hour, or 10x coverage of a human genome in 15 minutes.
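As a rough check on that throughput claim, the sketch below converts bases per hour into fold coverage. The 100-gigabase-per-hour rate and the 15-minute window come from the item above. The 3-gigabase genome size is an assumption made here, so the result is only an approximation of the quoted "10x".

```python
# Rough conversion from sequencing throughput to genome coverage.
# From the item above: 100 gigabases per hour, over 15 minutes.
THROUGHPUT_BASES_PER_HOUR = 100e9
RUN_MINUTES = 15

# Assumption (not from the article): a ~3-gigabase human genome.
HUMAN_GENOME_BASES = 3e9


def coverage(bases_per_hour, minutes, genome_bases):
    """Fold coverage of one genome produced in the given run time."""
    bases_produced = bases_per_hour * minutes / 60
    return bases_produced / genome_bases


def minutes_for_coverage(target_fold, bases_per_hour, genome_bases):
    """Run time in minutes needed to reach a target fold coverage."""
    return target_fold * genome_bases / bases_per_hour * 60


if __name__ == "__main__":
    fold = coverage(THROUGHPUT_BASES_PER_HOUR, RUN_MINUTES, HUMAN_GENOME_BASES)
    mins = minutes_for_coverage(10, THROUGHPUT_BASES_PER_HOUR, HUMAN_GENOME_BASES)
    print(f"Coverage after {RUN_MINUTES} minutes: about {fold:.1f}x")
    print(f"Time to reach 10x: about {mins:.0f} minutes")
```

With a 3-gigabase genome, 15 minutes at that rate yields a little over 8x, and 10x takes about 18 minutes, so the company's figure appears to be a round number or to assume a slightly smaller genome size.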


$12 million
NHLBI and NHGRI set aside $12 million in grants for developing cheaper methods of exon sequencing.

Funded grants

Sequencing DNA by transverse electrical measurements in nanochannels
Grantee: Robert Riehn, North Carolina State University
Began: Aug. 1, 2007; Ends: July 31, 2009
With this exploratory grant, Riehn and his team plan to build a sequencing technology using stretched and linearized DNA in nanofluidic channels, and detection using nanoelectrodes, according to the grant abstract. Riehn says this kind of technology would enable ultralong read frames of more than 100 kilobases.

Exon Specific Sequencing of Whole Genomic DNA
Grantee: Darren Link, RainDance Technologies
Began: July 1, 2007; Ends: June 30, 2009
Link says his long-term goal is to build a way to simultaneously sequence thousands of different exons from a genomic DNA sample with 30 to 50 times coverage of each exon. The technology will be based on RainDance's microfluidics platform and 454 Life Sciences sequencing, according to the abstract.
