NEW YORK (GenomeWeb) – The promise of CRISPR genome editing as a way to treat or cure various diseases has always received the biggest headlines. But as researchers and companies learn more about what the technology is capable of, its utility as an advanced research tool has become more apparent.
CRISPR's applications for the efficient creation of animal models, for genome-wide screening, and in drug discovery have been well-documented for the past few years. The method saves time and money, and often produces more accurate results than older approaches. More recently, a startup firm called Genetics Research patented a CRISPR-based method for sample preparation called negative enrichment.
There is also a great deal of interest in using CRISPR in targeted sequencing. A team from Pacific Biosciences, the Icahn School of Medicine at Mount Sinai, Uppsala University, and Harvard Medical School published a study on the BioRxiv preprint server in October 2017 describing the development of a novel, amplification-free enrichment technique that used CRISPR-Cas9 for specific targeting of multiple genomic loci. The researchers then combined that approach with long reads generated with PacBio's single-molecule, real-time sequencing technology, noting that their method enabled enrichment and sequencing of complex genomic regions that couldn't be investigated with other technologies.
Researchers from the Parkinson's Institute and Clinical Center in Sunnyvale, California, published a study in NPJ Parkinson's Disease in September 2017 in which they described using CRISPR-Cas9-based capture enrichment combined with sequencing on PacBio's platform to identify pathogenic repeat expansions, including slight differences in the expansions that can cause different phenotypes in Parkinson's disease.
Now, Ann Arbor, Michigan-based Arbor Biosciences is using its expertise in hybridization capture-based targeted sequencing to optimize CRISPR-based techniques for targeted sequencing on NGS and third-generation sequencing platforms.
"Right now, there are two options if you want to resolve a long region of a genome. You can throw short-read sequencing at it, but if it's highly complex or repeat-rich or has a weird structure, that can be really difficult to resolve with short-read sequencing," Arbor Senior Scientist Jacob Enk said. "The other option is to just whole-genome sequence it on one of these long-read sequencers like PacBio or Oxford Nanopore. But if your region of interest is just a fraction of the genome, you're spending a lot of money to sequence that long region."
"So, since hybridization capture — which is the typical targeted sequencing approach for high-throughput sequencing — doesn't work on templates much longer than 7 kilobase pairs, if you want to sequence a 30-kilobase-pair region economically, the technology that's currently available and that works very well is CRISPR-based targeted sequencing," Enk added.
The approach generally works by pairing a Cas enzyme with guide RNAs to cut on either side of a specific region of the genome, and then isolating the excised fragment using size selection or another method such as a pulsed-field gel electrophoresis system.
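The cut-then-size-select idea can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the function names `excise_region` and `size_select`, the guide sequences, the forward-strand exact matching, and the single blunt cut at a fixed offset inside each guide site (real enzymes differ, and Cas12a in particular leaves staggered ends). It is not Arbor's protocol.

```python
def excise_region(genome: str, left_guide: str, right_guide: str,
                  cut_offset: int = 17) -> str:
    """Return the fragment released by cutting at two guide sites.

    Toy model: guides are matched exactly on the forward strand, and each
    enzyme is assumed to make one blunt cut `cut_offset` bases into its site.
    """
    left = genome.find(left_guide)
    if left == -1:
        raise ValueError("left guide site not found")
    right = genome.find(right_guide, left + len(left_guide))
    if right == -1:
        raise ValueError("right guide site not found downstream of left guide")
    return genome[left + cut_offset:right + cut_offset]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical 20-nt guide sites flanking a 100-base region of interest.
LEFT = "GATTACAGATTACAGATTAC"
RIGHT = "CCGGTTAACCGGTTAACCGG"
toy_genome = "A" * 50 + LEFT + "C" * 100 + RIGHT + "T" * 50

target = excise_region(toy_genome, LEFT, RIGHT)
print(len(target))  # 120
```

In this toy case the excised fragment is the only one inside a 100 to 150 base window, which is what lets size selection separate it from the flanking pieces.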
One major advantage to this approach is that it doesn't require the use of Cas9. In fact, according to Enk, Arbor prefers not to use Cas9 because it's "just not as specific as we'd like." The company prefers Cas12a instead. "That's a really attractive enzyme to us right now both for targeted excision as well as for depletion, which is the other really powerful feature of this technology," he said. If a researcher working with an Illumina or PacBio sequencing library is looking to remove certain molecules — repetitive elements that might not be of interest, for instance — Cas12a could be paired with a guide RNA in a targeted capture system to deplete those molecules from the library simply by cutting them to render them non-sequenceable.
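As a rough illustration of depletion, the Python sketch below drops every library molecule that carries a guide target site on either strand, on the simplifying assumption that any cut molecule becomes non-sequenceable. The function names, the exact-match rule, and the example sequences are hypothetical. PAM requirements and mismatch tolerance are ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def deplete(library: list[str], guides: list[str]) -> list[str]:
    """Return the molecules that survive depletion.

    Toy model: a molecule is cut, and so lost from sequencing, if it
    contains an exact guide target site on either strand.
    """
    sites = set(guides) | {reverse_complement(g) for g in guides}
    return [mol for mol in library if not any(site in mol for site in sites)]


# Hypothetical guide aimed at an unwanted repeat element.
guide = "GATTACAGATTACAGATTAC"
library = [
    "AAAA" + guide + "TTTT",                      # repeat, forward strand
    "CCCC" + reverse_complement(guide) + "GGGG",  # repeat, reverse strand
    "ACGTACGTACGTACGTACGTACGTACGT",               # molecule of interest
]
print(deplete(library, [guide]))
```

Only the third molecule is retained, since the first two each contain the target site on one strand.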
"That's the new side of the targeted sequencing capabilities of the CRISPR system that's not available with hybrid capture right now — the ability to deplete molecules and potentially a very complex set of targets," Enk said. "This has been used for depleting mitochondrial DNA molecules from ATAC-Seq libraries. People are fielding it … where you can deplete uninteresting molecules from RNA-seq libraries. Ultimately, a lot of people want to deplete host DNA from clinical samples and just study the microbiological profile in those samples. And if you have a complex enough guide RNA library, like something that Arbor is really good at producing, then there's the potential for targeted depletion of extremely complex targets."
There is one drawback to Cas-driven targeted sequencing, however. Unlike hybridization capture, which tolerates a lot of mismatch and can therefore recover genomic regions in closely related organisms using a reference genome, Cas-driven methods may restrict researchers to studying or targeting regions for which they have rather well-resolved reference information, Enk noted. But even this problem may have a solution: the previously maligned Cas9. Because Cas9 is a little less specific than its counterparts, it could be exploited for broader phylogenetic targeted sequencing.
Even off-target effects — the bane of any CRISPR researcher's existence — aren't really an issue with CRISPR-based targeted capture and sequencing, Enk explained. "With hybridization capture, the go-to targeted sequencing method, you have the same issue as you potentially have with Cas-driven targeted sequencing in that occasionally you're going to have cutting occur not exactly in your target," he said. "On the one hand, the fact that you're doing targeted sequencing as opposed to gene editing makes that less critical. So, sure, you might have a few extra cuts but you're only detracting from your percent-on-target by a couple of percentage points at most in those situations."
Enk also noted that Arbor's expertise in designing and synthesizing complex pools of guide RNAs, as well as its design platform, can give its customers some sense of where such off-target cuts might occur, and how much that might affect the specificity of their CRISPR-based targeted sequencing experiments.
The company has a versatile array synthesis platform and offers services for design of custom guide RNAs, whether for depletion or targeted enrichment. "Arbor is really well-positioned because we can make thousands and thousands of unique guide RNAs very inexpensively that are really perfect for targeted sequencing applications," Enk said. "We're one of the few [companies] on the market that can do this and we're certainly the best priced on the market."
Among its many products, the company sells target capture kits under its myBaits line, providing customers with pools of in-solution biotinylated RNA probes plus reagents for efficient, scalable targeted sequencing on any NGS platform. Under the myCRISPR line, Arbor manufactures custom, error-free DNA templates for in vivo or in vitro transcription of sgRNAs. And now the firm is introducing its myNGS Guides product line of guide RNA libraries to support CRISPR-based targeted sequencing.
Matthew Moscou, group leader at the Sainsbury Laboratory in the UK, is beta-testing the technology. CRISPR-based targeted sequencing, he noted, is quite powerful for working with large genomes.
"Despite the reduction in cost of sequencing technologies, sequencing genomes is still not cheap," Moscou said. "We work on barley, which has a 5-gigabase genome. That still costs $2,000 dollars to sequence a genome using Illumina. And the other problem is that you're still using short-read technology, which is fine for non-repetitive regions of the genome. But if you want to study a repetitive region — and when I say repetitive, not just transposable elements but also any gene family that might have tandem duplications on a locus — we want to use long read-based sequencing technologies, whether it be Pacbio or Oxford Nanopore to actually sequence the region and determine the structure of it. This approach is now pretty much the best way to do it."
Moscou's lab is specifically focusing on an immune receptor region in the barley genome where the sequence does not match between two different accessions of barley but has flanking regions that are conserved between accessions. "Historically, we would have had to build Pac[Bio] libraries, which would cost $20,000 for barley, and even then a library is limited by the type of enzyme that you have to use to digest," he said. "With this [CRISPR-based] technology, you can specifically target any region of the genome, you make specific cuts, and then you're isolating the DNA fragments for that specific region, and then sequencing on [Oxford] Nanopore. This is unheard of for us, because one of the challenges we have with our region is we have what looks like a 40-kb duplication that has occurred at least three times. So, it really represents one of these really difficult regions of the genome."
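A back-of-the-envelope Python calculation shows why untargeted sequencing is wasteful for a locus like this. The 5-gigabase genome size comes from Moscou's description of barley. The locus size of about 120 kb (three copies of the 40-kb duplication) and the 30x coverage goal are assumptions chosen purely for illustration, as are the function names.

```python
def on_target_fraction(region_bp: int, genome_bp: int) -> float:
    """Fraction of uniformly distributed whole-genome bases that hit the region."""
    return region_bp / genome_bp


def bases_needed(span_bp: int, coverage: float) -> float:
    """Bases that must be sequenced to cover `span_bp` at the given depth."""
    return span_bp * coverage


GENOME_BP = 5_000_000_000   # barley, per the article
REGION_BP = 3 * 40_000      # assumed: three copies of a 40-kb duplication
COVERAGE = 30               # assumed coverage goal

fraction = on_target_fraction(REGION_BP, GENOME_BP)
untargeted = bases_needed(GENOME_BP, COVERAGE)      # sequence everything
ideal_targeted = bases_needed(REGION_BP, COVERAGE)  # perfect enrichment

print(f"on-target fraction without enrichment: {fraction:.1e}")
print(f"fold reduction with perfect enrichment: {untargeted / ideal_targeted:,.0f}")
```

Under these assumptions only a few bases in every hundred thousand from a whole-genome run would land on the locus. Real enrichment is never perfect, so the achievable saving would be smaller than the ideal figure computed here.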
Moscou also noted that the utility of this method isn't limited to researchers who study barley. Anyone who studies immunity — in humans as well as other organisms — is likely working with a region of the genome that is under very strong selection. These regions are often full of structural differences — SNPs, transposable element insertions or inversions, or other major genomic changes that an Illumina sequencing platform wouldn't be able to assess, Moscou said, adding, "And that's where PacBio and Oxford Nanopore … allow us the resolution that we never had before."
Add CRISPR to that and you add cost-savings and speed to the increased resolution. "The Pac[Bio] library approach, with probing and everything, and then sequencing would take months. I mean, we're talking about anywhere between, even if we're focused, four to six months," Moscou said. "But [with CRISPR], you make the high-molecular-weight DNA which might take two days, then you have your Cas9 guides and complex target, and then you generate your library, and then probably within a matter of about a week you could get your initial results. That's a huge, dramatic change. This is actually the first opportunity to resolve these extremely complex loci, barring sequencing the whole genome with Oxford Nanopore, which is also hugely cost prohibitive."
Overall, he believes this approach will be useful for any researcher working with large genomes or genomes that contain a great deal of variation — that would make it especially useful for plant scientists. "There are other examples, too, where they're trying to finish genomes and they might need to specifically target a few regions of the genome in order to finish it. This is clearly a solution for that," Moscou added. "Basically, there's still regions of the genome that are not finished in some species. Despite when they say that they're sequenced, there are still places that are unfinished, so this technology would start to allow you to really [finish them], as long as it's still within the length that you can manage with nanopore sequencing."
And in human research, this system could be applied to studying genomic structural variation or an especially complex locus, above all when that variation may contribute to a specific phenotype or disease. "If you know a disease is associated with some kind of genomic instability of a region, this would help to try to resolve that," he said.