An earlier version of this article was published Feb. 5.
Complete Genomics said last week that it has signed up the Broad Institute of Harvard and MIT to test its human genome sequencing services in a pilot project.
The company, which plans to sequence 1 million human genomes over the next five years, also presented for the first time data from a human genome that it sequenced in house as a proof of concept, and is making these data publicly available.
Starting in June, Complete Genomics plans to offer a $5,000 commercial human genome sequencing service to genome centers, research institutes, and direct-to-consumer companies, following a series of pilot projects in the coming months.
Last week during a talk at the Advances in Genome Biology and Technology conference in Marco Island, Fla., Complete Genomics Chairman, President, and CEO Cliff Reid presented the company’s business model as well as results from a HapMap sample the firm finished sequencing last month, its sixth human genome to date.
Genome No. 6
Using its proprietary short-read sequencing-by-probe-ligation technology (see In Sequence 10/7/2008), Complete Genomics sequenced a Caucasian HapMap sample to 91-fold coverage. The project generated 630 gigabases of short-read paired-end sequence data in nine instrument runs of eight days each, for an average yield of 70 gigabases per run.
Applying internally developed alignment software, company researchers mapped approximately 250 gigabases of the data to the National Center for Biotechnology Information’s human reference genome, covering about 92 percent of the genome.
The remaining 8 percent of the genome, to which the short reads could not be mapped, includes long repeats and duplications; the researchers plan to tackle these regions with the company’s Long Fragment Read technology in the future.
Only 40 percent of the reads mapped to the genome because the technology is still “an immature process,” according to Reid. The problem was compounded by a power outage during data collection for the project, which forced the company to shut down its instruments temporarily.
At launch, Complete Genomics expects that about 60 percent of its data can be mapped, and plans to improve this percentage to 80 percent by the end of the year.
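The yield and mapping figures reported above can be checked with simple arithmetic. The following is a minimal sketch, not anything the company has published: the variable and function names are illustrative, and the inputs are only the rounded figures quoted in this article.

```python
# Figures as reported in the article; all names here are illustrative.
total_gb = 630    # gigabases of raw sequence generated
runs = 9          # instrument runs
mapped_gb = 250   # gigabases mapped to the reference (approximate)

# Average raw yield per instrument run.
yield_per_run_gb = total_gb / runs

# Share of the raw data that mapped to the reference genome.
mapped_fraction = mapped_gb / total_gb


def mapped_gb_at(raw_gb, fraction):
    """Gigabases expected to map, given a raw yield and a mapping rate."""
    return raw_gb * fraction


print(f"Average yield per run: {yield_per_run_gb:.0f} Gb")
print(f"Fraction mapped: {mapped_fraction:.1%}")

# Mapping rates the company cites: current, at launch, and by year's end.
# Applied here to the current 70 Gb run yield purely for comparison.
for rate in (0.40, 0.60, 0.80):
    usable = mapped_gb_at(yield_per_run_gb, rate)
    print(f"At {rate:.0%} mapped: {usable:.0f} Gb usable per run")
```

The mapped fraction works out to just under 40 percent, consistent with the figure Reid cited.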
As part of their human genome study, the researchers called approximately 3.3 million SNPs — about 400,000 of which are novel — as well as almost 400,000 small insertions and deletions. Comparing those SNPs with a set of high-quality SNPs previously determined by microarray technology, they found the accuracy of their assembly to be better than 99.99 percent.
Complete Genomics is making the data publicly available through its website and has submitted the sequence reads and base-quality scores to the NCBI. It hopes eventually to publish the results in a peer-reviewed journal, Reid told In Sequence.
Gearing up for Launch
Complete Genomics plans to sequence 1,000 human genomes as a service by the end of this year, at $5,000 per genome, and 20,000 next year.
Having just doubled its office and lab space by leasing another 32,000 square feet, the company, based in Mountain View, Calif., is currently building a “pilot” genome center that is scheduled to be completed this summer.
The firm said it hopes the center will serve as a “blueprint” for other genome centers it plans to open elsewhere in the future. Indeed, over the next five years, the company said it wants to build approximately 10 genome centers across the world in conjunction with partners, and to sequence a million human genomes.
Complete Genomics has also increased its headcount by about 20 percent since last October, to 120 staffers, and expects a “big increase” in headcount when the genome center opens.
By the time its service launches in June, the company plans to increase the output per sequence run from 70 gigabases on its current R&D instruments to 200 gigabases on production machines, and to 600 gigabases by the end of the year.
Reid declined to reveal how many sequencers the company currently has on site but said that it has capacity to sequence approximately 10 genomes per month, which will increase to 100 genomes per month once the genome center opens.
The firm’s data center currently has about 1,000 processors and a petabyte of disk space. By the end of the year, it will host 5,000 processors and 5 petabytes of storage, and next year it is slated to grow to 60,000 cores and 30 petabytes of storage.
The company’s target is to pay $1,000 in internal materials cost to sequence a single genome, which it is hoping to reach by launch time. A complete price list detailing the service offerings will be available in June, according to Reid.
Complete Genomics, which does not disclose details of its funding, is currently “working on” a new financing round, he said.
The company plans to provide human genome sequencing services to large genome centers, research centers, and direct-to-consumer companies, Reid said.
Under the agreement with the Broad Institute, signed last week, Complete Genomics will sequence five human genomes. It is also currently sequencing five human genomes for the Institute for Systems Biology under a partnership established last year.
According to Chad Nusbaum, co-director of the Broad Institute’s genome-sequence and -analysis program, the institute believed it was worth testing the service.
“It’s pretty exciting, and the potential is pretty great,” he told In Sequence during the AGBT conference last week. However, “I want to know first-hand what it’s going to do and how it’s going to work, and I want to look at the data.”
Complete Genomics stresses that it has no intent to compete with large genome centers but rather wants to collaborate with them. Nusbaum admits that the idea of having someone else provide the sequencing technology takes getting used to. “We are used to honing the technology,” he said. “But we are not in it to sequence, we are in it to do the science. So I’m happy to collaborate with them if that’s effective.”
However, he said he doesn’t want to be left out of technology questions completely. “We want to be involved with the technology because we know about technology and [how to] make things better,” he said.
The Broad has not yet decided which five human genomes to sequence, but they will be samples that have previously been genetically analyzed and are fully consented for sequencing, such as the HapMap samples. “We want to start with something we understand,” Nusbaum explained.
‘Ambitious’ Price Points
Complete’s business model is unusual in that it focuses exclusively on providing a single product: sequencing human genomes. Researchers will not even be able to seek sequencing services for closely related species, such as mice, Reid pointed out.
“The only way we can operate at these prices and with these volumes is by building a streamlined and standardized factory that takes exactly one input and generates exactly one output,” he said. “And that’s our niche in the marketplace.”
Several experts at last week’s AGBT conference told In Sequence that it will be challenging for Complete Genomics to become profitable at the prices it promises.
“I think their price points are ambitious” given the fully loaded cost of sequencing, said George Grills, director of operations of core facilities at Cornell University’s life sciences core laboratories center.
The majority of the company’s cost will likely be in its computational infrastructure, he said. “Obviously, they are trying to go for bulk volume to reduce unit cost to an affordable level. The question is, will they be able to get those bulk volumes?”
On the other hand, Complete Genomics will have lower expenses for customer support. “The reason things are so cheap is that they are doing things totally in-house,” said the Broad’s Nusbaum. “They don’t have to worry about supporting things in the field and having an army of field service engineers, and shipping reagents, etcetera.”
Some researchers appreciate the fact that Complete Genomics is making its human-genome data publicly available for scrutiny. “I’m impressed by the fact that they are trying to put their data into people’s hands,” said Grills. “I think that’s a very good way to go about what they are trying to do.”
But several scientists interviewed at the AGBT meeting expressed concern about sending precious clinical samples out to a service, and about the privacy of the sequencing data.
When asked by a member of the audience about privacy concerns and compliance with the US Department of Health and Human Services’ Health Insurance Portability and Accountability Act privacy rule, which “protects the privacy of individually identifiable health information,” Reid said that “today, we have done nothing.”
“We are a pre-commercial company just coming out of the startup phase and launching into the commercial phase,” he explained. “We are relying heavily on our partners to guide us through that process.”
In terms of HIPAA compliance, he added, Complete Genomics envisions receiving anonymous, bar-coded samples from its customers.
However, some potential users don’t believe privacy issues will be that easy to resolve. “The concerns of privacy are a major impediment to using their type of model for certain types of studies,” Grills said. “I think commercial entities will not be comfortable with their data in that type of environment, as well as any clinical-based studies.”
Grills said he would think about using Complete’s service if other technologies cannot provide the same low cost. “I definitely would consider it for certain types of projects that are just not otherwise doable,” he said.
But competitors are also working on decreasing the cost of sequencing, he cautioned. “What [Complete Genomics has] is extremely promising at this particular snapshot in time,” Grills added. “The question is, things are moving rapidly, and when they come out of the gate, who else is going to be on the racetrack?”