NEW YORK (GenomeWeb News) – Researchers from Complete Genomics, Harvard Medical School and Washington University in St. Louis today reported on three of the human genomes that they have sequenced using Complete Genomics' nanoarray technology.
In a paper in the advance, online version of Science, the team described their efforts to sequence the whole genomes of three individuals: a Caucasian individual and an African individual from the HapMap project and a Personal Genome Project participant. In so doing, the team found around three million SNPs in each of the two Caucasian genomes and more than four million SNPs in the African genome. They also detected between about 270,000 and half a million small insertions and deletions in each genome.
As reported in GenomeWeb Daily News' sister publication In Sequence last fall, Complete Genomics sequenced its first human genome, that of a HapMap individual of European descent, in the summer of 2008. This September the company announced that it had sequenced 14 human genomes since its limited trial launch in March.
But the current paper marks the first publication on Complete Genomics' nanoarray human genome sequencing method, an approach that uses combinatorial probe anchor ligation (cPAL) chemistry in conjunction with nanoarrays that cause the DNA to assemble into so-called nanoballs.
The patterned DNA arrays amplify DNA into a square grid, Clifford Reid, Complete Genomics' chairman, president, and CEO, told GWDN. That keeps the bits of DNA close together, he added, allowing high-throughput sequencing with low reagent use.
Reid also noted that the paper's lead author, Radoje Drmanac, chief scientific officer at Complete Genomics, developed a method for getting unchained base reads, meaning the identification of one base does not depend on knowing the identity of the base before it.
For the current study, the team used Complete Genomics' research-grade instruments to sequence three human genomes: one Caucasian man of European descent from the HapMap project, a Yoruban woman from HapMap whose genome is also being sequenced as part of the 1000 Genomes Project, and a Caucasian man from the Personal Genome Project.
The team used a range of approaches, including Sanger sequencing and comparisons with known SNP genotypes, to verify the accuracy of their sequencing approach, Drmanac told GWDN. For instance, Drmanac said the error rate in the Caucasian HapMap genome is estimated at about one in every 100,000 bases.
For each genome, the team generated reads covering between 86 and 95 percent of the reference genome at 45- to 87-fold depth.
The researchers' analyses of the genomes revealed around three million SNPs in the Caucasian HapMap and Personal Genome Project genomes. Ten percent of these were new. Meanwhile, they found more than four million SNPs in the Yoruban genome, 19 percent of which had not been identified in past studies.
The African genome also contained nearly 500,000 indels, 42 percent of which were new, and the Caucasian HapMap genome contained almost 338,000 indels, 37 percent of which were new.
The Personal Genome Project genome, previously reported as belonging to Complete Genomics' scientific advisory board member and Personal Genome Project leader George Church, contained nearly 270,000 small insertions and deletions. Consistent with results from the Caucasian HapMap man, 37 percent of these indels were new.
Reid said Church is currently analyzing data from his own genome. "We believe he'll make that information public in the relatively near future," he said.
The average consumables cost to sequence the three genomes was $4,400. While the basic technology used to sequence all three genomes was the same, Drmanac said, the team made improvements to the library preparation method, eliminating the GC bias that they had in early sequencing efforts. They also made improvements to the assembly steps.
The company plans to sequence 10,000 complete human genomes at its Mountain View facility next year. The current price per genome for its customers is around $20,000, though that should come down as the number of genomes being sequenced goes up, Reid noted, landing somewhere around $5,000 within the next year.
"We are at the very early stages of scaling up our genome output," he said.
Earlier this week, Complete Genomics and the Institute for Systems Biology announced plans to collaborate on a project to sequence 100 Huntington's disease samples. The two groups worked together in the past on a family sequencing study to find genes involved in Miller syndrome and a lung condition.