NEW YORK (GenomeWeb News) - An international team of researchers, led by the US National Institutes of Health’s National Human Genome Research Institute, the Wellcome Trust Sanger Institute in England, and the Shenzhen branch of the Beijing Genomics Institute in China, will sequence at least 1,000 genomes using new DNA sequencing technologies, the consortium said this week.
The goal of the three-year “1,000 Genomes Project” is to produce a detailed catalog of common as well as less common genetic variants in the human genome that will help researchers pinpoint genetic causes of diseases and better understand human biology.
While existing databases list genetic variations found in at least 10 percent of the population, the new map will also include variants present in as few as 1 percent of people across the entire genome, and in as few as 0.5 percent of people within genes.
The first 1,000 samples in the project will come from the International HapMap project as well as from the extended HapMap set. These anonymous samples – a total of 1,085 – were collected from several populations originating in Africa, Japan, China, Europe, India, and Mexico.
Unlike the HapMap project, this study will not only map SNPs but also produce a high-resolution map of structural variants, such as insertions, deletions, and rearrangements.
“Between these two types of genetic variants – very rare and fairly common – we have a significant gap in our knowledge,” said David Altshuler, a researcher at Massachusetts General Hospital and the Broad Institute, and co-chair of the consortium’s steering committee, in a statement. “The 1,000 Genomes Project is designed to fill that gap, which we anticipate will contain many important variants that are relevant to human health and disease.”
Findings from the project could also help researchers interpret the results of genome-wide association studies. In these studies, researchers often find genomic regions that correlate with disease but cannot tell which variant within the region is causal.
A steering committee, co-chaired by Altshuler and Richard Durbin, a principal investigator at the Sanger Institute, manages the project. In addition, the consortium consists of a sequence production group, an analysis group, a data coordinating group, and a samples and ethical, legal, and social issues group.
The project will kick off with three pilot studies that are expected to last for about a year, followed by a two-year production phase. The entire study is expected to add at least 6,000 gigabases of data to the public databases, 60 times more than all sequence data deposited in these databases over the last 25 years.
The anticipated cost of the project is $30 million to $50 million, a “ballpark estimate” of the sequencing costs, according to Adam Felsenfeld, director of the large-scale sequencing program at NHGRI. That estimate is based on “our best information of what we think the new technology platforms can do.” Using Sanger-based sequencing, the project would likely cost more than $500 million, which would be “prohibitive,” he said.
Five centers will generate the sequence data: the Sanger Institute, BGI Shenzhen, and the National Human Genome Research Institute’s three large-scale sequencing centers, namely the Broad Institute of MIT and Harvard, Washington University School of Medicine’s Genome Sequencing Center, and Baylor College of Medicine’s Human Genome Sequencing Center.
A comprehensive version of this article will appear in today's issue of In Sequence.