By Julia Karow
Complete Genomics plans to sequence and contribute 500 human genomes to the 1000 Genomes Project, In Sequence has learned.
Under an agreement it signed with the project in October, the company will sequence 500 samples chosen by the project organizers at a coverage of 40x within a 12-month period. The 1000 Genomes Project will store the sequence data.
According to Jay Kaufman, Complete's vice president of product management, most of the samples analyzed by the 1000 Genomes Project to date have been sequenced at relatively shallow depth. "This opportunity to look deeply at samples will enable the community to collect unique data and insights into human variation," including identifying novel variants and confirming previously discovered variants, he said.
In addition, the company's participation "nicely complements" its internal work on file formats and file compression, he said, "which will also benefit our customers and the broader scientific community."
"The 1000 Genomes Project is delighted to have Complete Genomics join as a member, and we are looking forward to their contribution of complementary data that will add value to the public human genome variation reference resource we are building," said Richard Durbin, co-chair of the project, in a statement.
Regarding the samples to be sequenced, the project is considering parent-child trios as one possible option, according to Jan Korbel, co-chair of the project's structural variation group.
Information from trios would be "highly valuable to verify structural variants that we observe," he told In Sequence last month (see Q&A, this issue). Also, there would likely be overlap with samples already sequenced with Illumina technology, which would improve SNP, indel, and structural variation mapping.
This is not the first time that Complete Genomics has provided human genome sequence data to scientific projects for free. Earlier this year, for example, the company released data for 60 human genomes, sequenced at 55x coverage, to the research community.
Those samples were drawn from the National Institute of General Medical Sciences' Human Genetic Cell Repository and the National Human Genome Research Institute's Sample Repository for Human Genetic Research. They included a 17-member, three-generation family from the CEPH population, a Yoruban trio, and a Puerto Rican trio, along with unrelated samples. Most of these had previously been analyzed as part of the International HapMap Project or the 1000 Genomes Project.
Also, the company said in October that it would sequence 1,000 genomes from healthy individuals over the age of 80 for the Wellderly Study, which is being conducted by the Scripps Translational Science Institute (Clinical Sequencing News 10/5/2011).