SAN FRANCISCO (GenomeWeb) – Researchers from Illumina and the University of Washington have developed a bead-based haplotyping method dubbed contiguity-preserving transposition sequencing on beads (CPTv2-seq) that they said can be scaled up to process many samples in an automated fashion.
The researchers described the method recently in Nature Biotechnology. CPTv2-seq builds on a previous version of the method by incorporating bead-immobilized barcodes, which enable barcoded library preparations to be performed on many DNA molecules in one tube. That feature makes it amenable to automation and to processing many samples at once without relying on microfluidics or any ancillary equipment, the researchers wrote in the study.
The method "produces megabase-scale haplotyping blocks with very low error rates," Frank Steemers, associate director of scientific research at Illumina and lead author of the study, said. It could have applications in a number of areas, including haplotype-resolved sequencing, assembly, phasing and characterization of structural variants — for instance in cancer genomes — and metagenomics assembly, he said. In particular, because the method can be scaled to process many samples, it could be especially useful for population sequencing projects, he added.
Steemers declined to disclose whether Illumina plans to commercialize the method or whether it has filed for patents on it or would license intellectual property from the University of Washington. The Illumina team developed the method in collaboration with Jay Shendure at UW.
Although Illumina already markets a long-read technology, the TruSeq Synthetic Long Read technology it acquired from Moleculo, Steemers noted that it generates synthetic reads only up to around 10 kilobases, while the CPTv2-seq method can generate linked reads up to 500 kilobases. In addition, he said the two methods would have different applications. While the long synthetic reads would be useful for genome assembly, the longer linked reads generated by CPTv2-seq would be more useful for phasing, "as phasing quality and accuracy is a function of read length."
In addition, current haplotyping methods require "complicated microfluidics when scaling up the number of compartments or genomes," Steemers said.
Peter Fraser, a researcher at the Babraham Institute in Cambridge, UK, who was not involved with the study, said that the method was "very clever" and "seems to be fantastic for haplotype phasing." However, he noted that it appeared to have some problems with calling structural variants. Fraser recently demonstrated in a study published in Genome Biology that Hi-C sequencing could detect structural variants in tumors at lower sequencing depth than what was described for the CPTv2-seq method. One advantage of the Hi-C method, he noted, is that it does not require high sequencing depth and so can identify structural variants at a very low cost. In the Genome Biology study, the researchers estimated that their Hi-C method would cost around £376 per sample versus £1,314 per sample for deep whole-genome sequencing (about $484 versus $1,690).
In the original version of CPT-seq, which the Illumina team described in 2014, a DNA transposase enzyme, Tn5, is used to label long DNA molecules with adaptor-transposase complexes. When Tn5 introduces an adaptor sequence, it doesn't break up the DNA molecule, and it remains associated with the molecule after transposition. After the long DNA molecules are labeled, they are distributed into new pools, where they are fragmented further and labeled again. After sequencing, the labels provide longer-range information that can be used to phase the genome.
The main difference in the new version of the method is the use of beads, which enables the process to be done in a single tube. "The key innovation is simplicity," Steemers said. "Indexed beads and enzymes generate indexed linked reads all in a single tube without the use of an instrument."
The researchers tested two versions of the method on the extensively characterized HapMap sample NA12878. In one version, which they called a hybrid approach, they first manually prepared 96 transposome beads, which they used to tag around 3 nanograms of starting long DNA molecules. The molecules were then diluted into a 384-well plate, where a second tagmentation reaction was performed — further fragmenting the DNA for sequencing and adding a second barcode.
But to scale up the process, the team developed a second iteration of the method, which used combinatorial indexing to enable the use of 150,000 beads, each with a unique index combination, in a single tube.
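The arithmetic behind combinatorial indexing can be sketched in a few lines. The sketch below is illustrative only and is not taken from the study: the article does not say how many barcodes or indexing rounds were used to build the 150,000-bead pool, so the pool sizes passed to the hypothetical rounds_needed function are assumptions. The only figures drawn from the article are the 96 transposome beads and 384 wells of the hybrid approach, whose product gives an upper bound on the number of distinct barcode pairs.

```python
from math import prod


def index_combinations(pool_sizes):
    """Upper bound on distinct index combinations: the product of the
    number of barcodes available at each indexing level."""
    return prod(pool_sizes)


def rounds_needed(pool_size, target):
    """Smallest number of indexing rounds, each drawing from `pool_size`
    barcodes, whose combinations reach at least `target`.
    Both arguments are hypothetical inputs, not values from the study."""
    if pool_size < 2 or target < 1:
        raise ValueError("pool_size must be >= 2 and target >= 1")
    rounds, combos = 1, pool_size
    while combos < target:
        combos *= pool_size
        rounds += 1
    return rounds


# Hybrid approach from the article: 96 bead barcodes x 384 well barcodes.
hybrid_pairs = index_combinations([96, 384])

# Assumed pool size of 96 barcodes per round (illustrative only).
rounds_for_150k = rounds_needed(96, 150_000)
```

Under these assumptions, the hybrid approach allows at most 36,864 barcode pairs, and a pool of 96 barcodes per round would need three rounds to exceed 150,000 combinations. The point is that the number of unique indexes grows multiplicatively with each round, so a large bead pool does not require a correspondingly large number of separately synthesized barcodes.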
Overall, the team demonstrated that the method had a phasing accuracy of greater than 99 percent and that it could reconstruct phase blocks with an N50 of just over 1 megabase using the one-tube combinatorial indexing strategy and more than 2 megabases using the hybrid approach.
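For readers unfamiliar with the N50 statistic, it is the block length at which blocks of that size or larger account for at least half of the total phased length. A minimal sketch of the standard calculation follows; the block lengths in the example are made up for illustration and are not data from the study.

```python
def n50(block_lengths):
    """Return the N50 of a collection of block lengths: the length L such
    that blocks of length >= L cover at least half of the summed length."""
    lengths = sorted(block_lengths, reverse=True)
    if not lengths:
        raise ValueError("block_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical phase-block lengths in base pairs (not from the study).
example_blocks = [2_000_000, 1_200_000, 800_000, 500_000, 300_000, 200_000]
example_n50 = n50(example_blocks)
```

In this made-up example the blocks total 5 megabases, and the two largest blocks together pass the halfway mark, so the N50 is 1.2 megabases. A higher N50 means more of the genome sits in long phase blocks.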
The team noted that the method would enable high-throughput plate-based automation of 96 to more than 1,500 samples in less than three hours with 30 minutes of hands-on time. In addition, they wrote that the method "shifts the challenge from making many physical partitions to simply building a high-complexity barcoded-bead pool," a process they wrote was akin to making a bead pool for Illumina's Infinium bead array products.
Steemers noted that Illumina is continuing to work on the method and said that there are a number of factors that can be optimized. For instance, optimizing the bead pool geometry and complexity could further improve phasing performance. Structural variant calling performance and sequence uniformity were also lower for the method than for the PCR-free TruSeq workflow, the researchers noted, with repetitive regions still being difficult to analyze. The researchers also cited improving the efficiency with which the transposomes capture DNA as a way to improve the method.
Over the years, there has been increased interest in generating long-range genomic sequence data, which can help with phasing genomes, de novo assembly, and identifying structural variants. And as costs of next-generation sequencing have come down, a number of solutions have emerged, both commercial and homebrewed. For instance, 10x Genomics' system makes use of linked reads to phase genomes. Other researchers have turned to Hi-C sequencing, which is also available as a commercial service from Dovetail Genomics. In addition, Pacific Biosciences' and Oxford Nanopore Technologies' platforms generate longer sequencing reads.
Steemers noted that he has so far not done any head-to-head comparisons of the CPTv2-seq method with other technologies. He said that the researchers' goal in this study was to develop an "extremely simple phasing method" that did not require instrumentation, "just enzymes and beads in a single physical compartment," and that would be scalable, fast, and low cost. "Such methods are desirable for broad adoption in the context of population-scale human genome sequencing projects," he said.