SAN FRANCISCO (GenomeWeb) – A number of recently developed sample prep techniques are enabling researchers to take advantage of long reads in targeted sequencing experiments.
At last month's Advances in Genome Biology and Technology meeting in Orlando, Florida, Adam Ameur, a bioinformatician from Uppsala University, presented posters on two techniques he is testing, using both Pacific Biosciences' single-molecule sequencing technology and Oxford Nanopore Technologies' nanopore sequencer. Denmark-based startup Samplix is now looking to commercialize one of those techniques, which is droplet-based.
In addition, PacBio is working to optimize a CRISPR/Cas9-based enrichment method for use with its Sequel instrument and described the method last year in a publication on the bioRxiv preprint server. It has also been working with select customers, including researchers at the Parkinson's Institute and Clinical Center in California, who are using it to analyze repeat expansions that could cause Parkinson's disease or ataxias.
Long reads have the advantage of being able to capture structural variants and to sequence through repetitive regions and other parts of the genome that are difficult to sequence with short reads, but most commercially available capture and enrichment technologies have been developed for short-read sequencing.
"We want to isolate fragments of DNA that are longer than what's possible to do using long-range PCR or other methods," Ameur said. He is testing both Samplix's technology and CRISPR/Cas9 for targeted long-read sequencing.
Samplix is developing a droplet-based technology for targeted sequencing. The method takes input DNA that is sheared into long molecules, uses PCR to target a small region of interest, and then encapsulates the DNA fragments in droplets. The droplets are then sorted, and those containing the target molecule are identified by fluorescence. Following that, a sequencing library can be prepared.
Lars Kongsbak, CEO of Samplix, said that the firm has developed a system called Xdrop that generates the droplets and plans to commercially launch it by the end of the year. Currently, the firm is working with a number of early-access users, including Ameur.
The instrument includes microfluidic cartridges and will be able to run eight samples in parallel in about one hour, Kongsbak said. Currently, the technology isolates DNA fragments between 20 kilobases and 40 kilobases in length, but the goal is to enable 100-kb molecules.
In 2014, the firm described an enrichment technology it called PINS, which separated a sample into wells rather than droplets. Samplix is now focused solely on developing Xdrop.
Ameur said that so far, he has been able to generate DNA fragments 10 kilobases in size, and is now aiming for 40-kilobase fragments. He has tested the platform in conjunction with sequencing on PacBio and also plans to test it with Oxford Nanopore's MinION.
One application he is interested in is identifying viral integration sites, for example those of human papillomavirus (HPV). This application is particularly amenable to the technology, he said, because primers can be designed around a small piece of known sequence, the HPV genome, and long-read sequencing can then be used to "easily see where this piece of DNA is integrating into the genome," he said. Similarly, he also wants to use it for applications such as identifying gene fusions and translocations, where one gene partner is known and long-read sequencing could help resolve the complexity.
Ameur has also been using CRISPR/Cas9 for target enrichment. The benefit of this approach is that it does not require any amplification. Genomic regions of interest are cut out using guide RNAs and can then be sequenced directly.
"We've been using it to look at repeat expansions, repeating over several thousand base pairs," he said. "Those regions can be quite difficult to get products from using PCR reactions."
Currently, he said, the lab is using the method in research, but the ultimate goal is to develop it for diagnostics, which will require further optimization. In addition, the lab has been running it on the RS II, but wants to switch over to the Sequel to enable higher throughput and lower sequencing costs.
Also, he said, because the method does not require amplification, it can detect base modifications and determine on which allele methylation occurs.
In some research studies in which Ameur's team has used the method to evaluate repeat expansions in patient samples, it found cases where different molecules from the same patient had different repeat counts. "That would indicate mosaicism," he said. In the future, his lab is interested in looking at these cases in more depth to see how mosaicism impacts phenotype.
PacBio CSO Jonas Korlach said that the company had a successful "alpha program" working with customers on the CRISPR/Cas9 targeted sequencing method and plans to launch it as a product sometime this year. He said the firm had initially developed the protocol on the RS II and is now in the process of migrating it to the Sequel.
Another group, from Stanford University, has also been testing CRISPR/Cas9 target enrichment, but in combination with Sage Science's HLS platform, a library prep system that extracts high-molecular-weight DNA fragments up to 2 megabases in size from cells or nuclei. The Stanford group coupled this with linked-read sequencing, using 10x Genomics' Chromium instrument and Illumina platforms, to obtain long-range phased sequence information.
GiWon Shin, a postdoctoral researcher in Hanlee Ji's Stanford laboratory, described in a presentation at AGBT how the lab has used the method to phase larger targets, such as the BRCA1 gene and the MHC locus, using target molecules of 200 kilobases and 4 megabases, respectively.
For BRCA1, the method enabled a local assembly with two phased haploid copies that had a contig N50 of more than 142 kilobases and a scaffold N50 of more than 190 kilobases and was highly concordant with Genome in a Bottle reference material.
To sequence the 4-megabase MHC locus, Shin said the group first fragmented the molecule into 200-kilobase fragments. After sequencing, they were able to generate an assembly that was 3.86 megabases long, with a scaffold N50 of 882 kilobases. In addition, Shin said, they were able to genotype and phase 30 HLA genes and assemble 38 candidate structural variants.
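The contig and scaffold N50 values reported for the BRCA1 and MHC assemblies are a standard measure of assembly contiguity: the length L such that pieces of length L or longer together contain at least half of all assembled bases. The short Python sketch below shows that calculation; the function name `n50` and the input lengths are hypothetical, for illustration only, and are not data from the Stanford study.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    N50 is the length L such that pieces of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest piece down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first piece that brings the running sum to half the total.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in kilobases (not from the study).
print(n50([100, 80, 60, 40, 20]))  # 300 kb total; 100 + 80 >= 150, so N50 is 80
```

Because N50 depends on how the assembled bases are distributed across pieces, two assemblies of the same total length can have very different N50 values. This is why the metric is reported alongside total assembly length, as with the 3.86-megabase MHC assembly.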
Ameur said that while all these technologies are still in development, they could become important alternatives to existing capture and enrichment technologies that are geared primarily for short-read sequencing. For instance, he anticipated that such methods could enable "screening lots of clinical samples with repeat expansions, resolving complex regions, looking at base modifications, and sequencing full-length RNAs," among other potential applications.