Julia Karow
Several early-access customers of Complete Genomics are talking about large-scale human genome sequencing projects with the company, In Sequence has learned, while others are eagerly awaiting their first data from pilot projects.
Last week, Complete Genomics said that more than a dozen customers have signed up for pilot projects, and that it has delivered 14 genomes to customers since March (see In Sequence 9/8/2009).
In addition to the previously announced Broad Institute and Institute for Systems Biology, the customers include Pfizer, the Flanders Institute for Biotechnology, Duke University, Brigham & Women's Hospital, the HudsonAlpha Institute for Biotechnology, and the Ontario Institute for Cancer Research, the company disclosed last week. The firm has also sequenced a single genome for Harvard Medical School's Personal Genome Project.
But apart from those for ISB and the PGP, all the genomes delivered so far have gone to undisclosed customers, and none of the other named customers have received data from the company to date, Complete Genomics' vice president of marketing Jennifer Turcotte told In Sequence last week. The customers that it disclosed last week "are all in various stages of projects with us," she said, adding that "a couple will receive data within the next four to six weeks."
An ISB scientist is scheduled to present results from its pilot project with Complete Genomics at the Personal Genomes meeting at Cold Spring Harbor Laboratory this week, she said. The title of the talk is "Full genome sequences of a four-person family," according to the CSHL website.
Overall, the pilot customers represent a "pretty good mixture" of academic groups, not-for-profit institutes, and pharmaceutical companies, Complete Genomics Chairman, President, and CEO Cliff Reid told In Sequence. "We are seeing a pretty broad spectrum of the genomics research community," he said. "We take that in part as a validation that delivering data, not instruments, is something that is going to have a very broad-based appeal to the genomics community."
He said that in the future, the company expects about two-thirds of its business to come from academic and not-for-profit groups, and about a third from commercial companies, "typically the big biopharmaceutical companies."
The company hopes that customers, after examining data from the pilot projects, which consist of five to ten human genomes each, will sign up for larger projects.
"It's really a learning experience for [the customers] to go through this data, work with the data, work with our bioinformatics team to fully understand [the data]," Reid said. "And after this, we sit down and discuss a much larger project."
Several early pilot projects are "now moving into conversations about large-scale projects next year," he said, adding that the scale of these projects is "typically in the hundreds of genomes."
For the pilots, Complete Genomics sequences samples at a price of $20,000 per genome, and it will start offering larger projects at a price of $5,000 per genome next year.
The company's internal reagent costs per genome are currently $4,000, according to Turcotte, not including equipment, materials, labor, and overhead. The firm does not disclose its total cost associated with sequencing a human genome.
For the commercial service next year, Complete Genomics scientists have been developing a new generation of sequencing instruments, using the company's proprietary hardware, software, and biochemistry, that will be able to read "well over" one terabase per run.
The company says it will achieve this by increasing the density of its DNA nanoarrays to 2.85 billion spots of DNA per array with a read length of 70 bases. Also, each instrument will be able to read out several nanoarrays in parallel.
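As a rough check on these figures, the arithmetic can be sketched in a few lines of Python. The spot count and read length are the ones the company quotes; the assumptions that every spot yields one full-length read and that a run spans all arrays read in parallel are ours, and the function names are purely illustrative.

```python
# Back-of-the-envelope throughput estimate (an upper bound, not a company figure).
SPOTS_PER_ARRAY = 2_850_000_000  # DNA spots per nanoarray, as quoted
READ_LENGTH_BASES = 70           # bases read per spot, as quoted
TERABASE = 10**12                # one terabase, in bases


def bases_per_array(spots=SPOTS_PER_ARRAY, read_length=READ_LENGTH_BASES):
    """Maximum bases from one array, assuming every spot gives a full read."""
    return spots * read_length


def arrays_needed(target_bases=TERABASE):
    """Smallest whole number of arrays whose combined output reaches the target."""
    per_array = bases_per_array()
    return -(-target_bases // per_array)  # integer ceiling division


if __name__ == "__main__":
    print(f"Per array: {bases_per_array() / 1e9:.1f} gigabases")
    print(f"Arrays needed for one terabase: {arrays_needed()}")
```

Under those idealized assumptions, a single array tops out at 199.5 gigabases, so reaching one terabase per run would take at least six arrays, which fits the plan to read several nanoarrays in parallel.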
Reid said that the company currently has eight prototypes of the upgraded instruments running in house but did not comment on their current throughput. It is currently building out its genome center, which is expected to be completed in November. "Then, we will start installing the new generation of instruments in November and December, and in January, we will start doing commercial projects out of the new genome center," he said.
Getting Data into Researchers' Hands
Harvard Medical School is one of the few named institutions that have received data from Complete Genomics, as part of the Personal Genome Project.
The company sequenced sample PGP1, the genome of George Church, principal investigator of the project and a member of Complete Genomics' scientific advisory board.
Church told In Sequence last week that the project also has exome sequence data for his sample available, generated on the Illumina Genome Analyzer after enriching the exome using Agilent technology, as well as sequence data of recombined V-D-J regions in immune cells, the so-called VDJome, generated by 454 sequencing.
He said that at 45-fold coverage, a little less than 90 percent of the genome has high-quality coverage of both alleles, though another 40-fold coverage by dilution haplotyping is almost complete and "should push this higher."
Besides comparing Complete Genomics' data with the Illumina exome data, he said he and his colleagues "prioritize likely false positives by deleteriousness using Trait-o-matic software," which correlates variations in the genome with phenotypes, "and then check these with targeted sequencing."
According to Church, Complete Genomics has committed to sequencing nine additional PGP genomes in the near future, "and it is likely that we will do many more together."
In addition, the PGP will continue other collaborations, he said, including one that will contribute two genomes sequenced on the Illumina platform, another that will contribute a genome sequenced on the Helicos BioSciences platform, and one with 454, which is sequencing VDJ-omes.
Other customers are still waiting for their first data to arrive, which will allow them to evaluate Complete Genomics' service.
Mark Veugelers, integration manager at the Flanders Institute for Biotechnology, or VIB, a not-for-profit research institute in Belgium that includes groups from Ghent University, K.U. Leuven, the University of Antwerp, and the Vrije Universiteit Brussel, said that several researchers at VIB became interested in Complete Genomics a year ago, when the company first announced its business plans (see In Sequence 10/7/2008).
Early this year, the institute sent Complete Genomics five human samples from an undisclosed disease area for sequencing but has not yet received any data because of the company's six-month funding delay (see In Sequence 4/14/2009).
However, he and his colleagues have been in frequent contact with the company to discuss data analysis, data formats, and how the data will be delivered.
One initial challenge might be that right now, not much software has been developed to analyze Complete Genomics' types of reads. "So we have to see, if we get the data, how are we going to move further with it? How are we going to get the stuff that we are interested out of that?" Veugelers said.
He said he expects to receive both sequence reads and variants called by Complete Genomics. Analyzing the variants will likely not pose any issues, but "what is still kind of difficult is when you have the raw reads and you want to do more advanced analysis with that," he said. "If you want to do your own analysis on those reads, and the maps, then you need to develop your own pipelines."
Complete Genomics, he said, is also working on making its software and data accessibility "much more user-friendly."
Once the data arrives, the VIB researchers are planning to evaluate it both through bioinformatics -- checking how much of the genome is covered, how many gaps exist, and how many SNPs are in the dbSNP database -- and experimentally, probably by resequencing portions with interesting variations.
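The bioinformatic checks Veugelers lists can be illustrated with a minimal Python sketch. It is not VIB's pipeline: the interval and variant representations, the function names, and the toy inputs are all assumptions made for the example, and a real evaluation would read coverage and variant files rather than in-memory lists.

```python
def coverage_summary(genome_length, covered_intervals):
    """Return (fraction covered, number of gaps) for half-open (start, end)
    intervals that lie within [0, genome_length)."""
    merged = []
    for start, end in sorted(covered_intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching interval: extend the previous one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    covered = sum(end - start for start, end in merged)

    # Count uncovered stretches, including any at either end of the genome.
    gaps = 0
    position = 0
    for start, end in merged:
        if start > position:
            gaps += 1
        position = end
    if position < genome_length:
        gaps += 1

    return covered / genome_length, gaps


def fraction_in_database(called_snps, known_snps):
    """Share of called SNPs already present in a reference set such as dbSNP.
    Each SNP is a hypothetical (chromosome, position, ref, alt) tuple."""
    called = set(called_snps)
    if not called:
        return 0.0
    return len(called & set(known_snps)) / len(called)


if __name__ == "__main__":
    fraction, gaps = coverage_summary(100, [(0, 30), (20, 50), (60, 90)])
    print(f"covered fraction: {fraction:.2f}, gaps: {gaps}")
```

A low share of known SNPs would not by itself indicate errors, but it would flag calls worth checking experimentally, in line with the resequencing step described above.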
The pilot project with Complete Genomics is the institute's first foray into human whole-genome sequencing, according to Veugelers, because the cost has been prohibitive until now. "This seemed like an interesting opportunity to see what would come out of sequencing several genomes," he said.
Whether larger projects will follow "depends on what type of data we will get," he said.
Though the institute is "quite interested" in working further with Complete Genomics, "our scientists really want to see the primary data first."
Veugelers said that VIB has also talked to other providers of sequencing technology. "At this point, Complete Genomics clearly makes an interesting proposal cost-wise," he said. "But the other technologies, for example, Illumina and SOLiD, also have quite some advantages of their own. We would still be interested in checking whether we could use their platforms for some of the studies we have."
"It really depends on what we get out of Complete Genomics, how some of the other technologies are developing, and what the main interests of our scientists are," he said.
For example, scientists might want long reads for one type of application, lots of short reads for another, and to sequence large numbers of samples for a third project. "What I try to do is match what the scientists want to do with the different technologies that are out there," Veugelers said.
Researchers at Duke University are also waiting to receive data from Complete Genomics, expected to arrive next month. In the meantime, they have been investing heavily in Illumina's sequencing technology in house.
When they first decided to sign up for a pilot project with the company early this year, "we were just getting started with the sequencing in the lab," Anna Need, an assistant research professor in the Center for Human Genome Variation at the Duke Institute for Genome Sciences and Policy, told In Sequence. "What appealed to us about Complete Genomics was that as well as doing the sequencing, they would also assemble the sequence, and do the variant calling as well."
In the meantime, the researchers have upgraded their in-house sequencing facility to 11 Illumina Genome Analyzers. They have also sequenced in house one of the five pilot samples submitted to Complete Genomics -- all from schizophrenia patients -- and plan to compare the two datasets. "We want to compare specifically in one patient, but also in general, the coverage, the accuracy, the kind of output that we get from them, as compared to our in-house variant calling and alignment processes."
"Probably, there will be some variants that are called in one but not the other [dataset], and we would like to learn from the nature of those variants," she said.
Although the results of the comparison are not yet available, "the likelihood is that we will be doing most of our sequencing in house," Need said. "At the moment, it seems that we are actually generating sequence data very fast in the lab, and the variant calling tends to be going very well."
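The kind of comparison Need describes, finding variants called in one dataset but not the other, reduces to set operations once each call is given a common key. The Python sketch below is only an illustration under that assumption: the (chromosome, position, ref, alt) key, the function name, and the concordance measure (shared calls divided by all distinct calls) are our choices, not the Duke group's method.

```python
def compare_call_sets(calls_a, calls_b):
    """Split two variant call sets into shared and dataset-specific calls.

    Each call is a hypothetical (chromosome, position, ref, alt) tuple.
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": shared,
        "only_a": a - b,  # called in the first dataset only
        "only_b": b - a,  # called in the second dataset only
        # Two empty call sets are treated as fully concordant.
        "concordance": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    in_house = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
    service = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")]
    result = compare_call_sets(in_house, service)
    print(sorted(result["only_a"]), sorted(result["only_b"]))
    print(f"concordance: {result['concordance']:.2f}")
```

The dataset-specific calls are the ones whose "nature" would then be examined, for example by looking at local coverage or read quality at those positions.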
Sequencing a human genome at 30-fold coverage on the Illumina GA currently costs the Duke researchers less than $40,000, according to Kevin Shianna, director of the genomic analysis facility at the Duke IGSP. That figure includes labor, reagents, and instrument costs to produce and QC the sequence, but not overhead or storage costs. He projects this cost will fall to $10,000 early next year.
Complete Genomics expects to offer its service for $5,000 per genome next year, but "the difference between $5,000 and $10,000 could easily be made up for by speed," Need said. "I think in our lab, if we are getting the data very quickly, and it's standardized ... then that might be more valuable for us."
"But of course, if the [Complete Genomics] data looks completely different in October, or Complete Genomics could do it much cheaper or faster ... then we could certainly consider using them as a supplementation, if they were significantly better than what we can do here in house."