HONOLULU (GenomeWeb News) – Researchers involved in the 1000 Genomes Project are wrapping up their pilot studies and undertaking the main phase of the project: low-coverage sequencing of hundreds of genomes each from several different populations, attendees at the American Society of Human Genetics annual meeting heard here yesterday.
In a session yesterday morning on "The 1000 Genomes Project and Medical Genetic Uses of Sequencing," University of Oxford statistics researcher Gil McVean, co-chair of the 1000 Genomes Project analysis group, offered an overview of the project so far, noting that the remaining data from the pilot phase is expected to be released within a month or two.
Meanwhile, Broad Institute researcher David Altshuler, co-chair of the project's steering committee, discussed the potential of sequencing studies such as the 1000 Genomes Project for filling in gaps left by genome-wide association studies and identifying the biological mechanisms behind these associations.
The 1000 Genomes Project is an international collaboration aimed at sequencing well over 1,000 human genomes to find common variants present in at least one percent of the population. Those involved say the effort will not only provide a resource for accelerating the identification of disease mechanisms, but also provide the opportunity for assessing the feasibility of next-generation platforms for population-scale genome sequencing.
McVean said the team has completed the data collection for the three pilot projects: low-coverage (two to four times) sequencing of 60 unrelated individuals, deeper sequencing (about 20 times coverage) of an African trio and a European trio, and exon sequencing (to about 20 times coverage) in roughly 1,000 individuals.
Some of that data is already online, he noted. The team is currently validating the remaining pilot data, which it plans to release this November or December, McVean said. He hopes to see others use the data for their own research — for instance to impute SNPs and fill in gaps in their data.
By analyzing data from these pilot efforts, the team has identified millions of new SNPs as well as many insertions, deletions, and other structural variants. The researchers are also finding many SNPs that have different frequencies in different populations, along with hints about some of the biological features and mutations that appear to knock out gene function.
Based on these findings, McVean said the team believes it is cobbling together a nearly complete record of common SNPs in the human genome. "We really are producing a very, very comprehensive catalog," he said.
Still, McVean added, the team has more work ahead of them. "The pilot is already complete in many ways," he said. "The machines haven't stopped running though."
For the main phase of the project, the team plans to sequence the genomes of at least 400 individuals each from European, East Asian, African, and Central/South/North American populations to about four times coverage. They expect to finish generating that data sometime next year.
During the session's panel discussion, Richard Durbin, co-chair of the 1000 Genomes steering committee, lauded the potential of doing low-coverage sequencing in many individuals, explaining that the approach seems to be well-suited to finding the sorts of variants they are looking for — as well as those present at even lower frequencies in the population.
Data so far supports that notion, Altshuler explained. Still, he cautioned, it will be important when looking at 1000 Genomes and other sequencing data to distinguish between low-frequency, ancestral polymorphisms and the rare, private mutations that may have larger effects.