NEW YORK (GenomeWeb) – Genomic screening firm TransOmic has partnered with researchers from Cold Spring Harbor Laboratory and Cedars-Sinai Bioinformatics and Functional Genomics Center to offer a promising but unproven artificial intelligence-driven CRISPR/Cas9 guide RNA design tool.
Last week, Huntsville, Alabama-based TransOmic launched its transEdit-dual CRISPR Arrayed Library for knockouts. It's a system developed by Cedars-Sinai Medical Institute researcher Simon Knott that uses two single guide RNAs (sgRNAs) to target the same gene at different places, in an attempt to boost efficiency.
Through a paid partnership with Knott and his collaborator, CSHL researcher Greg Hannon, this technology is available exclusively through TransOmic and won't show up at the plasmid repository Addgene.
Double sgRNA targeting could help improve the efficiency of CRISPR-based knockout screens, though the product can also be ordered on a per-gene basis. It might also help empower deletion-based screens.
"In functional screen data analysis, we look for consistency between multiple independent reagents targeting the same gene, so it is certainly possible that double sgRNA constructs can make each individual reagent more potent by virtue of having more shots on goal," said Neville Sanjana, a researcher at the New York Genome Institute and an expert on CRISPR screening who was not involved in the partnership or its collaborators. "This kind of approach sounds like a nice way to further boost consistency between reagents targeting the same gene or genome element."
"Two sgRNAs improving gene knock-out makes sense," although success can be had with just one, he added. "Lentiviral delivery and constitutive expression of single sgRNAs often results in complete modification of alleles. You can think of this as 'letting time do the work' since just one cutting event might be sufficient to get the job done and produce an indel."
TransOmic's transEdit-dual CRISPR Arrayed Library has its limitations, since it's only available for protein-coding genes at the moment and not for regulatory elements. But Knott says it has performed as well as or better than many of the leading sgRNA design and vector systems out there, including ones from leading academic and commercial CRISPR outfits like the Broad Institute and GE Dharmacon.
That's largely because of the CRoatan algorithm he and CSHL's Nicolas Erard put together, combining three different ways to find more potent guides.
Like TransOmic, Knott has previously worked with short hairpin RNA screening. While a postdoc with Hannon, he helped develop an algorithm called shERWOOD using so-called random forest machine learning to select potent shRNAs. When CRISPR took the world by storm, Knott saw an opportunity to take what he'd learned with shRNAs and apply it to CRISPR sgRNAs.
They're far from the first group to apply machine learning to sgRNA design. Last year, Broad Institute researcher John Doench published a study from his collaboration with Microsoft Research's Jennifer Listgarten and Nicolo Fusi.
But the random forest method is helpful because it does a lot of the work identifying what's important all on its own. "You feed it the data and all the variables, and it selects for you which variables and combination of variables are the most predictive," Knott said. "We can feed it the whole target sequence and it will find the most important bases," to make a potent sgRNA.
The process starts with a data set — in this case an sgRNA target sequence and a corresponding measurement of efficacy: how well it knocks out a gene.
The algorithm works by selecting subsets of data to train on, creating decision trees that help separate effective from ineffective guides.
"It will look into the most important variable in that subset, then split the data into two nodes and continue splitting the data into two nodes until it has developed a perfect decision tree," Knott said. "If you keep going back and selecting subsets of the data, you will eventually build a 'forest' of decision trees that together are generalizable to the entire data set."
Hence, the name "random forest." That's also why Knott decided to call this algorithm CRoatan, named for a national forest in North Carolina. It's a play on his previous random forest-driven algorithm for shRNAs, dubbed shERWOOD. "The 'CR' is for CRISPR," Knott said.
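For readers who want to see the idea in miniature, the following is a minimal Python sketch of a random forest classifier, not CRoatan itself. It assumes a one-hot encoding of the target sequence and a binary potent/not-potent label. The six-base guide sequences, the labels, and all function names (`one_hot`, `build_tree`, `train_forest`, `forest_score`) are hypothetical and invented for illustration. Each tree is grown on a bootstrap resample of the data and splits on the best of a random subset of features until its nodes are pure.

```python
import math
import random
from collections import Counter

BASES = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into 0/1 features, four per position."""
    return [1 if base == b else 0 for base in seq for b in BASES]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels, n_features, rng):
    """Recursively split on the best of a random subset of features."""
    if len(set(labels)) == 1:            # pure node: stop splitting
        return labels[0]
    best_score, best_feature = None, None
    for f in rng.sample(range(len(rows[0])), n_features):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:        # this feature separates nothing here
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best_score is None or score < best_score:
            best_score, best_feature = score, f
    if best_feature is None:             # no usable split: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    f = best_feature
    left = [(x, y) for x, y in zip(rows, labels) if x[f] == 0]
    right = [(x, y) for x, y in zip(rows, labels) if x[f] == 1]
    return (f,
            build_tree([x for x, _ in left], [y for _, y in left], n_features, rng),
            build_tree([x for x, _ in right], [y for _, y in right], n_features, rng))

def predict_tree(tree, x):
    """Walk from the root to a leaf; internal nodes are (feature, left, right)."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if x[f] else left
    return tree

def train_forest(rows, labels, n_trees=25, seed=0):
    """Grow each tree on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n_features = max(1, int(math.sqrt(len(rows[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_features, rng))
    return forest

def forest_score(forest, x):
    """Fraction of trees that vote 'potent' for feature vector x."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)

# Hypothetical toy data: shortened "guides" with made-up potency labels.
guides = ["ACGTAG", "TTGCAG", "GGATCG", "CATGGG",
          "ACGTAT", "TTGCAC", "GGATCA", "CATGGT"]
potent = [1, 1, 1, 1, 0, 0, 0, 0]
rows = [one_hot(g) for g in guides]

forest = train_forest(rows, potent)
score = forest_score(forest, one_hot("AAGTCG"))
print(f"Fraction of trees voting 'potent': {score:.2f}")
```

The forest's score for a new sequence is simply the fraction of trees that vote "potent." A production model would be trained on far larger sets of measured guides than this toy example.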
To build CRoatan, Knott used real-world experimental data sets from both Harvard University's George Church and the Broad Institute. Church's data came from inserting random DNA sequences into cells and designing guides against them, then looking for mutational burden. The Broad tiled sgRNAs across cell surface proteins, using fluorescence-activated cell sorting to find those that had lost protein expression.
He also added two more "sub-algorithms" to account for findings from published studies. The first was a 2015 study in Nature Biotechnology that showed editing functional regions of a protein helps ablate the target.
"If you're not hitting a functional region, you're relying upon a frameshift mutation, but in a functional region, even an in-frame indel gets you some impact on protein function," Knott said. Unfortunately, functional regions aren't known genome-wide, so he used conserved amino acid sequences as a surrogate, which worked well in validation.
The last piece of CRoatan was incorporating information on how likely a cut is to induce an indel via non-homologous end joining (NHEJ). The fewer homologous sequences upstream and downstream, the likelier an indel.
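The article does not spell out how CRoatan measures this, so the following Python sketch is only one plausible way to look for short matching stretches on either side of a cut site. The flanking sequences, the `min_len` threshold, and the function name `shared_microhomologies` are assumptions made for illustration.

```python
def shared_microhomologies(upstream, downstream, min_len=2):
    """Collect substrings of at least min_len bases found on both sides of the cut."""
    shared = set()
    longest = min(len(upstream), len(downstream))
    for length in range(min_len, longest + 1):
        up_kmers = {upstream[i:i + length]
                    for i in range(len(upstream) - length + 1)}
        down_kmers = {downstream[i:i + length]
                      for i in range(len(downstream) - length + 1)}
        shared |= up_kmers & down_kmers
    return shared

# Hypothetical flanks around two candidate cut sites.
with_homology = shared_microhomologies("ACGTTGCA", "TTGCCATG")
without_homology = shared_microhomologies("AAAA", "CCCC")
print(sorted(with_homology, key=len, reverse=True))
print(len(without_homology))
```

By the reasoning above, a site whose flanks share few or no such stretches would be scored as more likely to yield an indel.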
Knott said he's validated that a higher CRoatan score signifies sgRNA potency, that two guides are better than one, and that the CRoatan dual-guide system works as well as other systems used in screening, such as the Church lab's sgRNA Scorer algorithm and vector, Dharmacon's Edit-R platform, and the Broad Institute's algorithm and vectors.
"When we take the level of depletion of essential gene-targeting constructs and apply a Wilcoxon Rank Sum statistical test to compare that level of depletion to each of the corresponding depletion levels of the other algorithms, we see a significant increase in depletion for our construct," he said.
While Knott hasn't yet published his work on the CRISPR system (he said he's submitted a manuscript to Molecular Cell), TransOmic has already had several labs across academia, pharma, and the National Institutes of Health take it for a test-drive and is ready to sell.
The firm has freedom to operate, thanks to a non-exclusive license from the Broad Institute for the use of CRISPR/Cas9 in research products, and it brings experience in genomic screening.
"This all mimics what we do with shRNAs," TransOmic CEO Blake Simmons said in an interview.
Researchers with a protein-coding gene of interest can enter it into the website and order the transEdit-dual product from there, at $495 for unpackaged plasmid vectors or, in a lentiviral format, at $1,495 for 100 microliters with 10 million transforming units per milliliter.
The firm is also offering custom arrayed libraries for subsets of genes, "like a kinase set," Simmons said.
TransOmic is also looking to offer a service for combinatorial screening, where each sgRNA in a vector targets a different gene.
Matthew Pipkin, a T cell researcher at the Scripps Institute, has already ordered the transEdit-dual CRISPR product. He's been working with the firm's RNAi products for years and that was enough for him to trust the new product. "Computationally-guided designs for RNAi definitely has a benefit," he said. "Our experience is that the TransOmic tools are superior to other vendors we had tried."
He plans to use CRISPR screening as a follow-up to RNAi, or as an alternative in cases where RNAi isn't so effective. "It was worth it to make the investment and give it a try," he said.
As for Knott, he's also planning on making a version for mouse. "There are a lot of people interested in that," he said. But what he really is looking forward to is using CRoatan and CRISPR screening alongside shRNA screening to investigate drug resistance in breast cancer.
"While we to try to improve the tools, what we really want to do is apply them to understand some biological phenomenon," he said. "We're looking at how and why the breast cancer cells have failed to respond well to anti-angiogenic therapies."