Complete Genomics said this week that it plans to start offering a human genome sequencing service for companies and academic institutes next year, charging $5,000 per genome.
For the last two years, the company has been developing a short-read sequencing-by-probe-ligation technology behind closed doors. It sequenced its first human genome this summer but has not made the results public.
In the coming months, Complete Genomics said it plans to sequence five more genomes for its first customer and partner, the Institute for Systems Biology. Next year it intends to sequence 1,000 genomes and 20,000 in 2010.
Complete Genomics was founded in 2006 by scientist Rade Drmanac and entrepreneur Cliff Reid. The company has approximately 100 employees and operates from a 32,000-square-foot facility in Mountain View, Calif.
Prior to starting the company, Drmanac had been developing a human genome-sequencing technology, based on combinatorial probe ligation chemistry, at Callida Genomics, a former subsidiary of Hyseq, which he also co-founded (see In Sequence’s sister publication, GenomeWeb Daily News 5/30/2006).
To date, Complete Genomics has raised $46 million in venture capital from three funding rounds. Investors include Enterprise Partners Venture Capital, OVP Venture Partners, Prospect Venture Partners, Highland Capital Management, and Genentech.
The company plans to start selling its services in the second quarter of next year to pharmaceutical companies, biotechnology firms, personal genomics companies, and academic research organizations.
“We want to be the wholesaler of complete human genomes to everybody who can get scientific value from them,” Cliff Reid, Complete Genomics’ chairman, president, and CEO, told In Sequence last week.
Unlike its competitors in the second-generation sequencing space — 454 Life Sciences, Illumina, Applied Biosystems, Helicos Biosciences, and Danaher Motion — Complete Genomics has chosen a pure service model for its business, and analyzing entire human genomes will be its exclusive application.
These choices were driven both by perceived customer demand and by the expected data volume, according to Reid.
“Pharma doesn’t want to buy instruments. They have no interest in building large-scale genome sequencing centers internally,” he said.
Also, he said, “the amount of data that is generated by sequencing thousands of complete human genomes is well beyond what any of our customers can deal with. It simply would not work as a model.”
To deal with the data, the company has built a data center with 400 terabytes of disk storage and 600 processors. Next year, it plans to scale up to 5 petabytes of disk storage and 10,000 processors, and by 2010, it wants to ramp up capacity another sixfold, to 30 petabytes of storage and 60,000 processors. The center will be equipped with “bank-level data security,” according to Reid.
The Service Offering
The $5,000 cost per genome includes $1,000 in materials costs and $4,000 for labor, equipment, and overhead. For that price, customers will receive 120 gigabases worth of reads, generated from two paired-end libraries, that are mapped to the NCBI reference genome. Since the company is able to sequence parental chromosomes independently with its technology, it will sequence each haploid genome at 20-fold coverage.
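As a rough check on those figures, the short sketch below reproduces the read-volume and price arithmetic. It assumes a haploid human genome of about 3 gigabases, a value the article does not state, and the function and constant names are illustrative only.

```python
# Assumption (not from the article): a haploid human genome is ~3 gigabases.
HAPLOID_GENOME_GB = 3.0


def total_read_gigabases(coverage_per_haploid,
                         haploid_gb=HAPLOID_GENOME_GB,
                         haploid_copies=2):
    """Gigabases of reads needed to cover each haploid copy at the given depth."""
    return coverage_per_haploid * haploid_gb * haploid_copies


def price_per_genome(materials_usd, other_usd):
    """Quoted price as the sum of materials and labor/equipment/overhead."""
    return materials_usd + other_usd


# 20-fold coverage of each of the two parental copies.
print(total_read_gigabases(20))
# $1,000 in materials plus $4,000 for labor, equipment, and overhead.
print(price_per_genome(1_000, 4_000))
```

Under that assumption, 20-fold coverage of two haploid copies works out to the 120 gigabases quoted by the company.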
Complete Genomics will completely sequence more than 90 percent of each genome and more than 99 percent of its exome, according to Reid, who added that “we are hoping to do a whole lot better than 99 percent of the exome, but we don’t have the data yet.”
In July, the company sequenced its first human genome, a HapMap sample of European origin, reportedly for less than $4,000 in materials costs. However, it is not releasing or publishing the results because “we are already a lot better than that,” Reid said. Potential customers interviewed by In Sequence had also not yet seen data for an entire human genome generated by the company. Reid told In Sequence that 98.9 percent of the SNPs called by Complete Genomics agreed with the HapMap SNPs.
Rather than release that first genome, the firm plans to publish five human genomes from a single family, which it is going to sequence within the next few months for its first customer and partner, the Institute for Systems Biology, Reid said.
ISB president Lee Hood is a member of Complete Genomics’ scientific advisory board, and the project is part of a collaboration between ISB and the Center for Systems Biology Luxembourg that aims to sequence the genomes of at least 100 individuals for a disease-focused study (see In Sequence 6/10/2008).
“We are going to get those five genomes done and then really use that to define the quality of future genomes,” Reid said.
Following this proof-of-concept study, Complete Genomics plans to sequence 1,000 genomes next year and 20,000 in 2010, Reid said. By the end of 2010, it wants to do 200 genomes a day. ISB will have access to 10 percent of the firm’s sequencing capacity over the next two years.
The company does not currently plan to participate in the X Prize for Genomics because this would take away too much of its sequencing capacity for commercial pilot projects, Reid said.
“Twenty-thousand sounds big, but on the other hand, that’s nothing. That’s just a few clinical trial-sized studies,” he said. To be able to tackle even larger numbers, the company wants to build another 10 genome centers across the world over the next five years that use its technology, in partnership with governments, companies, or research institutions. Collectively, these 10 centers would produce a million genomes a year by 2014.
But the company first needs to demonstrate that its technology can live up to its promises. “We have to make our genome center work before we can start building a second one,” Reid said. By the end of the year, it wants to increase the number of sequencers to 16, and then to 192 in 2010. Before launching the service, the firm also needs to get the materials cost down to $1,000 per genome.
“We are most of the way there already, but we are refining a lot of the process in preparation for real scale-up,” he said.
The company will also need to raise a yet-undetermined amount of funding and has recently engaged Morgan Stanley as its financial advisor for its next financing round, which it intends to close in the first quarter of 2009. Reid did not disclose how much capital the company has on its balance sheet.
Finally, after spending the last two years on technology development, the firm now needs to focus on attracting customers. It is offering prospective customers $100,000 pilot projects that will start in mid-2009, hoping they will translate into bigger contracts later on.
“2009 is our pilot project year,” Reid explained. “We will do tens of pilots with tens of early adopters across the commercial and academic sectors, and then based on the results of the pilots, we will queue up big genome sequencing projects for 2010.”
Initially, the company will offer two types of services: sequencing so-called “normal” human genomes, and human cancer genomes. Later in 2009, it will also provide transcriptome sequencing services. According to the company’s website, future applications include small-RNA-profiling and -discovery and DNA-methylation analysis.
Reid said he believes that the service will be especially attractive for cancer and mental illness research. “We have an investment from Genentech, and we have been working with them on cancer sequencing,” he said. “We think that is really the first major application of complete genome sequencing.”
It is unclear whether big pharma will develop an appetite for human genome sequencing, but others have indicated that pharmaceutical companies are not interested in buying large numbers of sequencing instruments. For example, Illumina has mentioned in presentations that pharmaceutical companies represent a minority of customers for its Genome Analyzer.
Also, Expression Analysis, a commercial genomic-services provider that bought the first Helicos Genetic Analysis system earlier this year, said one reason for its purchase decision was that pharma customers were interested in using its sequencing services.
“We had lengthy discussions with our pharmaceutical partners prior to making this decision [and] very few were interested in actually acquiring next-generation sequencers,” Expression Analysis CEO Steve McPhail said at the time. “Most were looking for a reliable outsourcing partner.”
Several potential academic customers said they were intrigued by Complete Genomics’ price but stressed that they needed to see more data to assess the quality of the product.
“I think that we would be interested in using their technology, assuming that it’s cheaper and better than what we do,” said Chad Nusbaum, co-director of the genome sequencing and analysis program at the Broad Institute. “We all realize that one tumor or one human genome doesn’t teach you anything, and that you need 10 or 100 to really learn something, and at $5,000 a crack, you could do 10 or 100.”
Also, the service model is appealing because sequencing is only “a path to an end,” he noted, “and if I can get my collaborators to that end without having to do the sequencing [myself], and it costs less money, then I’ve done my job.”
He added, though, that he and his colleagues “would want to be closely involved with how this works” rather than remain on the sidelines.
In the absence of data, some potential customers also find it difficult to sign up for a $100,000 pilot project. “That didn’t seem to make sense at this early stage,” said Elaine Mardis, co-director of the Genome Center at Washington University in St. Louis. Rather, her center would like to start out with a single genome to test the technology.
George Church, a professor at Harvard Medical School and a scientific advisor to Complete Genomics, said he is considering the service for his Personal Genome Project, a large-scale study that aims to enroll 100,000 participants.
The project, which began last year, is currently using Illumina’s Genome Analyzer and the Polonator, a platform developed by Church’s own lab in collaboration with Danaher Motion (see In Sequence 2/5/2008), but Church said “we would be happy to use a three-way mixture. We will use whatever the least expensive and highest quality sequencing method is.”
“I can’t see any reason in principle why we would not outsource sequencing to a service provider if that helped get the science done in a time- and cost-efficient manner without compromising quality,” said Richard Durbin, co-chair of the steering committee for the 1000 Genomes Project, an international effort to sequence the genomes of at least 1,000 HapMap individuals.
454 Life Sciences, Illumina, and Applied Biosystems are contributing data to the project, and “we would be open to others joining if the necessary conditions and commitments were met,” he said, noting that he had not talked with Complete Genomics yet.
Nano-Balls on Micro-Chips
The company’s technology uses a new method to amplify millions of DNA clones in solution and condense the products into nanoscale balls. It then deposits these on patterned arrays and assays them using pools of fluorescently labeled ligation probes.
Much of it was conceived by Drmanac, Complete Genomics’ chief scientific officer, and his co-workers at Callida Genomics and Hyseq, and he exclusively licensed his technologies to Complete Genomics. “We did not license any technologies from other institutions,” he said last week.
“They have approached some of the challenges with some real innovation, for example in terms of how they generate and array their sequencing features,” said Jay Shendure, a professor at the University of Washington. In 2005, while in George Church’s lab at Harvard, he published a paper on a related technology, polony-based sequencing by ligation. “In terms of competitive advantage, it will probably be more in the realm of array density rather than read length,” he added.
The company can make three different types of libraries to cover different types of structural variation in the human genome: a 500-base-pair library and a 5- to 10-kilobase library for paired-end sequencing with two 35-base paired reads, and a 100-kilobase long-fragment read library to sequence parental chromosomes separately.
For the latter, the company starts with DNA fragments of 100 kilobases or more, and places about 10 percent of a genome, or 300 megabases of DNA, in different wells of standard 384-well plates.
“By doing that, we have created statistical separation of these 100 kb fragments between parental chromosomes,” Drmanac explained. “In each well, most of the DNA is from different chromosomes.”
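The reasoning behind that separation can be illustrated with a simple probability model. This is a minimal sketch, not the company's method: it assumes fragments land in wells independently and uniformly, and that the roughly 10 percent of a genome in each well is split evenly between the two parental copies. The function name and the even-split assumption are hypothetical.

```python
def both_parents_given_covered(haploid_fraction_per_well=0.10):
    """Chance that a locus covered in a well carries DNA from both parents.

    Assumption: the DNA in a well is split evenly between the maternal and
    paternal copies, so each copy covers a given locus with probability p.
    """
    p = haploid_fraction_per_well / 2
    both = p * p                     # maternal and paternal fragment present
    at_least_one = 1 - (1 - p) ** 2  # locus is covered at all in this well
    return both / at_least_one


print(round(both_parents_given_covered(), 4))
```

Under these assumptions, only a few percent of the loci covered in any one well would carry fragments from both parents, so reads from a single well mostly reflect one parental chromosome at a time.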
Before amplifying genomic DNA, the company converts it into circular templates, each consisting of approximately 80 bases of genomic DNA that is broken up by four adaptor sequences.
“Most of our solution is really at that point, at the DNA engineering, where we insert specific synthetic adaptors in genomic DNA,” Drmanac said. “One of the core developments [is] to make these libraries efficiently. And the next stage is to use simplified protocols to implement them automatically on robotic liquid handlers.”
Using rolling-circle amplification, these circles are amplified in solution, resulting in concatemers that each consist of hundreds of copies of the template. These concatemers are then coaxed into forming so-called DNA nano-balls, more than 10 billion per milliliter of reaction volume.
The company then washes these DNA bundles across gridded arrays, which are made using standard silicon-processing techniques, where the DNA adheres to “sticky spots” approximately 300 nanometers in size. An array the size of a standard slide holds about a billion spots, but company researchers are already working on increasing that density severalfold.
About 90 percent of the spots are each occupied by a single nano-ball, which due to its charge prevents others from binding to the same spot.
Using DNA nano-balls on an ordered array “allows us to minimize the imaging demand,” Drmanac said. “And that’s actually one of the biggest costs.” The nano-balls are denser than other types of amplified DNA, he explained, thus providing a stronger signal, “so you can get very short image exposure.”
The DNA sequence is read by so-called combinatorial probe-anchor ligation, which the company says uses the advantages of sequencing-by-hybridization but stays clear of its limitations. The approach uses separate pools of fluorescently labeled probes to read one of 10 positions adjacent to one of the adaptors that are built into the circular template. These probes hybridize to their target and are ligated to an anchor sequence, and the fluorescent signal is recorded. After each cycle, the anchor-probe complex is washed away and a new anchor is hybridized to read the next position.
The technology generates 35-base-pair reads. However, these reads are gapped, consisting of 10 bases, a gap, 20 bases, another gap, and another 5 bases. “We have software that maps these types of reads,” Drmanac said.
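To make the gapped layout concrete, the sketch below pulls the three read segments out of a template string. The gap lengths are not given in the article, so they are passed in as parameters, and the function name and toy sequence are hypothetical. It illustrates the read structure only and is not the company's mapping software.

```python
# Segment lengths as described in the article: 10 + 20 + 5 = 35 read bases.
SEGMENTS = (10, 20, 5)


def gapped_read(template, start, gaps):
    """Return the read segments from template, skipping the gaps between them.

    gaps holds the (assumed) number of unread bases after the first and
    second segments; the article does not specify these lengths.
    """
    if len(gaps) != len(SEGMENTS) - 1:
        raise ValueError("need one gap length between each pair of segments")
    pieces = []
    pos = start
    for i, seg_len in enumerate(SEGMENTS):
        pieces.append(template[pos:pos + seg_len])
        pos += seg_len
        if i < len(gaps):
            pos += gaps[i]  # skip the unread bases
    return pieces


# Toy template with hypothetical gaps of 2 and 3 bases, marked with N.
toy = "A" * 10 + "NN" + "C" * 20 + "NNN" + "G" * 5
print(gapped_read(toy, 0, (2, 3)))
```

A mapper for such reads would need to align the three segments jointly while allowing for the spacing between them.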
Because it has different types of libraries available, the company believes it can resolve any type of structural variation in the human genome. “On paper, we have a solution for every one of them,” Reid said. “By the time we do thousands of genomes, we will address all of the strange structures inside the human genomes” as well as be able to assemble missing pieces of the genome de novo.
Sample preparation takes about a week. Each run lasts approximately a week as well, and each instrument will be able to process between three and six slides. The company estimates that by the time it launches its service, its raw output will be 420 gigabases per run, of which at least half will be “mappable” data.
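Those output estimates can be turned into a rough per-run yield. The sketch below assumes that exactly half of the raw output is mappable (the company says at least half, so this is a lower bound) and that the 120 gigabases delivered per genome counts only mapped reads. Both are assumptions, and the function names are illustrative.

```python
def mappable_gigabases(raw_gb_per_run, mappable_fraction):
    """Mappable output of one run, in gigabases."""
    return raw_gb_per_run * mappable_fraction


def genomes_per_run(raw_gb_per_run, mappable_fraction, gb_per_genome):
    """Genomes' worth of mapped reads produced by one run."""
    return mappable_gigabases(raw_gb_per_run, mappable_fraction) / gb_per_genome


# 420 Gb raw per run, half mappable (lower bound), 120 Gb delivered per genome.
print(mappable_gigabases(420, 0.5))
print(genomes_per_run(420, 0.5, 120))
```

On these assumptions, a single run of about a week would yield somewhat less than two genomes' worth of mapped reads per instrument at launch.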
The turnaround time for a 100-genome project will be on the order of 90 days, according to Reid. “This is intended to do large-scale clinical discovery work as opposed to point-of-care diagnostic work,” he said.