NEW YORK (GenomeWeb) – Newly minted Dovetail Genomics is preparing to launch its first set of commercial sequencing and assembly services, based on a proprietary method for constructing mate-pair libraries and on accompanying software designed to handle the distinctive style and format of its library data.
Richard Green, the company's founder and a member of its scientific advisory board, told GenomeWeb this week that the company has yet to set a firm launch date but that it expects to begin marketing the first services based on its technology sometime soon. "Our de novo genome assembly and genome improvement approach is mature and reliable for most genomes [but] of course there are other applications of this technology that we are working furiously to explore and develop," he said.
For now, the company is focused on testing and improving its technology as part of an ongoing beta. So far, several dozen customers, largely from academia, have signed on to try out the methodology and the company continues to field inquiries from other potential clients, Green said.
At the core of Dovetail's offering is a method of constructing mate-pair libraries that is based on in vitro chromatin assembly. Creating the libraries takes a few days and requires no expensive equipment or reagents, nor does it require living cell cultures or samples; instead, it uses just a few micrograms of naked DNA. The resulting Chicago libraries (not named for the US city or for any physical resemblance to it) contain inserts that span all distances up to the size of the input DNA. After a sample is sequenced and assembled de novo with standard software packages, Dovetail uses its own software and its mate-pair libraries to scaffold the preassembled contigs. The result, according to the company, is highly accurate assemblies with a 100-fold increase in scaffold contiguity.
Dovetail presented its technology publicly for the first time at the Plant and Animal Genome conference held in San Diego earlier this month. "The feedback from the PAG talk was phenomenal," Green told GenomeWeb. "I knew from being on the other side of this that there is a frustratingly low ceiling for making contiguous genome assemblies from high-throughput data. It seems there are many folks who share our experience and we're happy to be working with many of them now to quickly improve their genomes."
The company's underlying technology was developed in Green's laboratory at the University of California, Santa Cruz, where he is also an assistant professor of biomolecular engineering in the School of Engineering. Both the library method and the scaffolding software grew out of his team's efforts to generate higher-quality assemblies from high-throughput sequencing data. Although current sequencing technologies make it possible to sequence genomes at high coverage, the resulting assemblies still generally have fairly low contiguity.
"We stumbled onto this rather simple idea of using the proximity information that one gets from Hi-C [data]," Green told GenomeWeb. Hi-C data, which is used to reveal the physical proximity of all segments of the genome within living cells, has two components, he explained. On the one hand, it is useful for seeing long-range contacts on chromosomes that indicate biologically interesting interactions, such as enhancer-promoter interactions. On the other hand, it also contains information that is useful for genome assembly. Dovetail's approach is to isolate the information relevant to assembly by performing in vitro chromatin reconstitution, which gets rid of the biological signal, he said.
It does so using a fundamentally different approach from standard methods of constructing mate-pair libraries, which, generally speaking, work by ligating the ends of large DNA fragments of defined sizes into circular structures. That is a difficult process because "the kinetics of circularization are disfavored," Green said. It is also time-consuming and expensive, and "the data that you get out [of the process] have lots of problems," he added.
Dovetail's approach is to take long pieces of DNA, typically 150 kilobases, and perform in vitro chromatin reconstitution, essentially condensing the DNA onto histones using standard commercial kits, Green explained. Next, a fixative is added to lock in the conformation of the condensed DNA. The fixed DNA is then cut into smaller pieces with a restriction enzyme. This yields several pieces of DNA, and therefore several free ends, from a single input fragment, rather than one large piece with only two ends as in the circularization approach used in standard library construction methods, he said.
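A back-of-the-envelope sketch shows why digesting the fixed DNA yields so many more ends than circularization does. The article does not name the restriction enzyme, so the example assumes a hypothetical four-base recognition site (GATC is used only as an illustration) and uniform random base composition, under which a given k-base site occurs roughly once every 4^k bases.

```python
def expected_cut_sites(length_bp: int, site_len: int) -> float:
    """Expected occurrences of one specific site of length site_len in random
    DNA with equal base frequencies (about one per 4**site_len bp).
    Assumption: uniform composition; real genomes deviate from this."""
    return length_bp / 4 ** site_len


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of `site` in `seq`, including overlapping ones."""
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


def free_ends(n_cuts: int) -> int:
    """A linear molecule cut n times yields n + 1 pieces with two ends each."""
    return 2 * (n_cuts + 1)


if __name__ == "__main__":
    # Hypothetical 4-bp cutter on a 150-kb input molecule.
    cuts = round(expected_cut_sites(150_000, 4))
    print(cuts, free_ends(cuts))  # 586 1174
```

Under these assumptions a single 150-kilobase molecule offers on the order of a thousand ligatable ends, versus the two ends available to a circularization-based library.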
Next, through a series of steps, the free ends of the fragments are marked with biotin and randomly ligated to other fragments that they happen to be in proximity to. What that means is "we get lots of information about what was close by in this big piece of DNA that had been condensed," Green said. Also, because these data are not of a defined insert distance, "we are getting all of the distance information," he added. "So we see read pairs that are like one kilobase away, lots that are 10 kilobases away [or] 100 kilobases away, up to whatever size we had in that input DNA in the beginning."
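To illustrate what getting "all of the distance information" means, the toy model below draws read-pair separations for a 150-kilobase input molecule. The distance falloff is an assumption chosen for illustration, not a property reported for Chicago libraries: contact probability is taken to be proportional to 1/d between a hypothetical 1-kilobase floor and the input length. That makes separations log-uniform, so each full order of magnitude of distance receives about the same share of pairs.

```python
import math
import random
from collections import Counter
from typing import List


def sample_pair_separations(n_pairs: int, input_len_bp: int,
                            min_sep_bp: int = 1_000, seed: int = 0) -> List[float]:
    """Draw read-pair separations under a toy model where p(d) is proportional
    to 1/d on [min_sep_bp, input_len_bp). Inverse-CDF sampling of that density
    gives d = min_sep * (input_len / min_sep) ** u with u uniform on [0, 1)."""
    rng = random.Random(seed)
    ratio = input_len_bp / min_sep_bp
    return [min_sep_bp * ratio ** rng.random() for _ in range(n_pairs)]


def bin_by_decade(separations: List[float]) -> Counter:
    """Group separations by order of magnitude (1 kb, 10 kb, 100 kb, ...)."""
    return Counter(10 ** math.floor(math.log10(d)) for d in separations)


if __name__ == "__main__":
    seps = sample_pair_separations(10_000, 150_000)
    for decade, n in sorted(bin_by_decade(seps).items()):
        print(f">= {decade // 1000} kb: {n} pairs")
```

Unlike a fixed-insert library, which concentrates pairs around one size, this sample contains pairs at roughly 1, 10, and 100 kilobases, capped by the length of the input DNA.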
The data generated by this approach is in a different style and format from what would typically be used by existing assembly software, so Green's lab developed separate software that would be able to make sense of the information and combine the contigs into scaffolds. Since the cut-up and stitched-back fragments don't have defined insert distances "we've refactored the scaffolding algorithm [to] expect data that is telling you about short-range, medium-range, and long-range connectivity in the genome," he said.
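Why a distance-aware model helps can be sketched with a toy scoring function. Assuming the same hypothetical 1/d contact model, no gap between contigs, and forward-strand coordinates for each read, the code below scores all four orientations for joining two contigs and keeps the one whose implied pair separations are most plausible. The article does not describe Dovetail's actual scoring; the function names and parameters here are illustrative assumptions, not its algorithm.

```python
import math
from typing import Dict, List, Tuple

Orientation = Tuple[str, str]
ORIENTATIONS: List[Orientation] = [("+", "+"), ("+", "-"), ("-", "+"), ("-", "-")]


def log_score(d: float, min_sep: float = 1_000, max_sep: float = 150_000) -> float:
    """Unnormalized log-score under a toy p(d) proportional to 1/d. Pairs farther
    apart than the input DNA length (max_sep) are treated as impossible."""
    if d > max_sep:
        return float("-inf")
    return -math.log(max(d, min_sep))  # clamp short distances to avoid log(0)


def implied_separation(x: int, y: int, len_a: int, len_b: int,
                       orient: Orientation) -> int:
    """Separation between a read at x on contig A and its mate at y on contig B
    (forward coordinates) if A is placed directly left of B with no gap."""
    left = len_a - x if orient[0] == "+" else x   # x to A's joined end
    right = y if orient[1] == "+" else len_b - y  # B's joined end to y
    return left + right


def best_orientation(pairs: List[Tuple[int, int]], len_a: int,
                     len_b: int) -> Tuple[Orientation, Dict[Orientation, float]]:
    """Score every orientation of the A-B join and return the highest-scoring one."""
    scores = {o: sum(log_score(implied_separation(x, y, len_a, len_b, o))
                     for x, y in pairs)
              for o in ORIENTATIONS}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical read pairs clustered near A's right end and B's left end.
    pairs = [(49_000, 1_000), (48_000, 3_000), (45_000, 500)]
    best, _ = best_orientation(pairs, 50_000, 50_000)
    print(best)  # ('+', '+')
```

Because short, medium, and long separations all carry information, the same pairs that say two contigs belong together also indicate how they should be oriented. A real scaffolder would additionally need to estimate gap sizes and account for spurious links, which this sketch ignores.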
After sequencing, the first step is to use standard software packages such as DISCOVAR and Meraculous to assemble the reads into contigs, typically around 50 kilobases long for a vertebrate-sized genome. Only after this step is complete does Dovetail's technology come in, Green said. The company uses its scaffolding software to map the contigs assembled with the standard software to its Chicago libraries. Because these libraries contain read pairs separated by a wide range of distances in the genome, they provide a much clearer picture of which contigs should be adjacent to each other in a correct assembly. Green compared the task to putting together a puzzle with several thousand pieces, and said the approach offers a "very fast, easy, inexpensive way to put these puzzle pieces together into long segments that in many cases represent entire chromosome arms."
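The puzzle analogy can be made concrete with a simple greedy heuristic: treat contigs as pieces, count the read pairs linking each pair of contigs, and repeatedly join the most strongly linked pair, as long as no contig gains more than two neighbors and no loop forms. This is a generic approach chosen for illustration, not Dovetail's method; the function, contig names, and link counts are hypothetical, and orientation and gap sizes are ignored.

```python
from typing import Dict, List, Tuple


def greedy_scaffold(contigs: List[str], links: Dict[Tuple[str, str], int],
                    min_links: int = 1) -> List[List[str]]:
    """Order contigs into scaffolds by greedily accepting the strongest links.

    A link is accepted only if both contigs still have fewer than two neighbors
    and the two contigs are not already in the same scaffold (which would make
    a loop). Orientation and gap sizes are ignored in this sketch."""
    parent = {c: c for c in contigs}  # union-find to detect loops

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    neighbors: Dict[str, List[str]] = {c: [] for c in contigs}
    # Strongest links first; ties broken by contig names for determinism.
    for (a, b), n in sorted(links.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_links or len(neighbors[a]) == 2 or len(neighbors[b]) == 2:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue
        parent[root_a] = root_b
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Every component is now a simple path; walk each one from an end.
    scaffolds: List[List[str]] = []
    seen = set()
    for start in contigs:
        if start in seen or len(neighbors[start]) == 2:
            continue
        path, prev, cur = [start], None, start
        seen.add(start)
        while True:
            nxt = [m for m in neighbors[cur] if m != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            seen.add(cur)
        scaffolds.append(path)
    return scaffolds


if __name__ == "__main__":
    contigs = ["c1", "c2", "c3", "c4", "c5"]
    links = {("c1", "c2"): 40, ("c2", "c3"): 35, ("c1", "c3"): 5,
             ("c3", "c4"): 30, ("c2", "c4"): 2}
    print(greedy_scaffold(contigs, links))  # [['c1', 'c2', 'c3', 'c4'], ['c5']]
```

Raising `min_links` trades contiguity for caution: weakly supported joins are dropped, leaving more, shorter scaffolds.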
Dovetail plans to offer its technology to the market as a production service, Green said, meaning that instead of selling a do-it-yourself kit to customers, "we take a sample … make the library, do the sequencing and the scaffolding, and return to them a genome." However, there will be several offerings under this model, he said, including a service for customers who have nothing other than a biological sample and need help with making all of the requisite libraries, arranging sequencing (either at the company's partner core facilities or with a vendor of the client's choice), and performing the assembly, scaffolding, and quality control.
The company will also offer a genome improvement service for customers who've already sequenced and assembled their samples and are unhappy with the results, he said. So far, much of the demand for the company's technology has come from academia, although Dovetail has also seen some interest from industry, with customers typically seeking to improve existing genomes, Green said.
The company is targeting the research community primarily, Green said, and its initial offering will focus on assembling genomes from animals and plants in that context, but it will be keeping an eye on the clinical market and possible applications of its technology there.
Dovetail is still fleshing out the details of its pricing structure, Green said. However, the cost will vary depending on the stage the sample is at and the amount of work required. For example, the company estimates that basic pricing for sequencing and assembling a vertebrate genome from sample collection to finished product, assuming no special circumstances such as high ploidy or exotic base composition, will be in the neighborhood of $50,000 per genome, Green said. The cost will drop the further along in the sequencing and assembly process the sample or samples already are. The company will also offer bulk pricing options and could potentially have a separate pricing structure for industry customers as well, he added.
As expected, turnaround times also vary — from a few weeks to a month — depending on the stage of the client's project, among other factors. For example, "if they've done lots of shotgun sequencing and have de novo assembly and contigs already [that are] in good shape, then it takes less than two weeks to make and QC our libraries," Green said, but that time is also impacted by how quickly the libraries get sequenced by the selected sequencing partner. It takes a little bit of CPU time to do the scaffolding, because the company performs multiple iterations adjusting the parameters each time, but that's not a rate-limiting step, Green said.