
NEW YORK (GenomeWeb) – Researchers at the Wellcome Sanger Institute and the University of Cambridge have developed a machine learning tool to predict the exact mutations that can result in a cell from CRISPR-Cas9 gene editing, based on the sequence of DNA being edited and the guide RNA (gRNA) being used.
As they reported today in Nature Biotechnology, the researchers systematically studied edits generated by 41,630 gRNAs in synthetic constructs, in a range of genetic backgrounds and using various CRISPR-Cas9 reagents. In total, they gathered data for more than 10^9 mutational outcomes and found that single-base insertions, short deletions, or longer microhomology-mediated deletions made up the majority of the resulting mutations.
"Each gRNA has an individual cell-line-dependent bias toward particular outcomes," the authors wrote. "We uncover sequence determinants of the mutations produced and use these to derive a predictor of Cas9 editing outcomes. Improved understanding of sequence repair will allow better design of gene editing experiments."
The researchers began by designing an assay to measure a large number of repair outcomes at once. They generated several libraries of gRNA-target pairs totaling more than 40,000 constructs, delivered them into cells, and then sequenced the cells at high coverage to measure the frequency of the insertions and deletions that had occurred. They observed that the assay faithfully and reproducibly captured most endogenous mutational outcomes.
The researchers then went on to survey a collection of 6,568 gRNAs that target human genes and found single-nucleotide insertions and deletions to be most common, with larger insertions occurring only rarely. They also saw that shorter deletions occurred more often than longer ones, but that a long tail of larger deletion events was present.
"Despite shorter deletions being more frequent, most of the Cas9-generated mutations (58 percent) resulted in a deletion of at least three base pairs. About half of these (31 percent of the total) occurred between repeating sequences of at least 2 [nucleotides] ('microhomology')," the authors wrote, adding that deletions of one or two base pairs made up 18 percent of the observed mutations, and that insertions of a single base made up 13 percent. Larger insertions were rare at 3 percent.
Overall, half of the measured gRNAs had a single outcome that contributed at least 20 percent of the observations, and 11 percent of them had an outcome that contributed at least 40 percent of the observed mutations, according to the researchers.
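To make the microhomology category described above concrete, the short Python sketch below lists deletions that span a cut site and are flanked by a repeat of at least two nucleotides. It is an illustration only, not the authors' code: the function name, the `max_del` limit, and the example sequence and cut position are all assumptions made for this example.

```python
def microhomology_deletions(seq, cut, min_mh=2, max_del=30):
    """List deletions seq[start:end] that span `cut` and are flanked by
    at least `min_mh` matching bases (microhomology).

    Hypothetical helper for illustration. Returns sorted (start, end, mh)
    tuples, keeping one entry per distinct repaired sequence.
    """
    seen = {}
    n = len(seq)
    for start in range(max(0, cut - max_del), cut + 1):
        last_end = min(n, start + max_del)
        for end in range(max(cut, start + 1), last_end + 1):
            # Count bases at the start of the deleted stretch that match
            # the bases immediately after it.
            mh = 0
            while (end + mh < n and start + mh < end
                   and seq[start + mh] == seq[end + mh]):
                mh += 1
            if mh >= min_mh:
                # Shifted copies of the same deletion give the same product,
                # so keep only the first (left-most) one.
                product = seq[:start] + seq[end:]
                seen.setdefault(product, (start, end, mh))
    return sorted(seen.values())


# Made-up target: "ACG" occurs on both sides of an assumed cut at position 7.
hits = microhomology_deletions("TTACGGATCACGTT", cut=7)
print(hits)  # [(0, 12, 2), (2, 9, 3)]
```

In this toy case, the entry `(2, 9, 3)` removes seven bases between the two `ACG` copies and leaves a single copy behind, which is the pattern the authors describe as a deletion "between repeating sequences."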
"Together with evidence of profile reproducibility above, this paints a picture of a complex yet not completely random repair process for Cas9-generated breaks," the authors wrote. "Repair outcomes depend on local sequence properties."
These observations and those from follow-on experiments suggested to the researchers that mutations generated by Cas9 ought to be predictable from sequence alone. To test this hypothesis, they developed a computational predictor of the mutational outcomes of a given gRNA, which they called FORECasT (favored outcomes of repair events at Cas9 targets).
They began by generating candidate mutations for each gRNA and deriving features for them based on local sequence characteristics. They then split the set of available gRNAs into training, validation, and test sets and trained a multi-class logistic regression model. They found that it achieved "good accuracy" not only on the K562 cells that it was trained on, but on other cell lines as well.
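The general idea of a multi-class logistic regression over candidate mutations can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the FORECasT implementation: the three features (insertion flag, scaled deletion length, scaled microhomology length), the two made-up gRNA profiles, and the learning-rate settings are hypothetical, and the train/validation/test split is omitted for brevity.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_profile(X, w):
    """Predicted frequency of each candidate mutation for one gRNA.
    X holds one feature row per candidate; w is a shared weight vector."""
    scores = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    return softmax(scores)

def cross_entropy(profiles, w):
    """Mean cross-entropy between observed and predicted frequencies."""
    total = 0.0
    for X, y in profiles:
        p = predict_profile(X, w)
        total -= sum(yi * math.log(pi) for yi, pi in zip(y, p))
    return total / len(profiles)

def fit(profiles, n_features, lr=0.5, epochs=300):
    """Plain gradient descent on the cross-entropy loss."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for X, y in profiles:
            p = predict_profile(X, w)
            for row, pi, yi in zip(X, p, y):
                for j, xj in enumerate(row):
                    grad[j] += (pi - yi) * xj  # d(loss)/d(w_j)
        w = [wj - lr * gj / len(profiles) for wj, gj in zip(w, grad)]
    return w

# Hypothetical features per candidate:
# [is_insertion, deletion_length / 10, microhomology_length / 5]
toy_profiles = [
    ([[1, 0.0, 0.0], [0, 0.1, 0.0], [0, 0.8, 0.6]], [0.2, 0.3, 0.5]),
    ([[1, 0.0, 0.0], [0, 0.2, 0.0], [0, 0.5, 0.4]], [0.3, 0.3, 0.4]),
]
weights = fit(toy_profiles, n_features=3)
print(predict_profile(toy_profiles[0][0], weights))
```

Because the weights are shared across gRNAs and the softmax is taken over each gRNA's own candidate list, a model of this shape can score targets with different numbers of candidate mutations.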
The team has made the predictor available as a web tool and as a command-line tool on GitHub.
"The Cas9-generated alleles show strong sequence-dependent biases that are reproducible and predictable for dominant categories of mutation … despite some variability between genetic backgrounds and species," the authors concluded.
They also noted that genetic diseases such as Huntington's disease or fragile X syndrome — which are due to expansions of short tandem repeats — are potential candidates for microhomology-mediated repair with Cas9 editing, especially as a future therapy might only involve a contraction of these expansions without a need to replace the malfunctioning allele. "Indeed, a few preliminary efforts in this direction have already given promising results, but given the possible unintentional genomic damage, utmost rigor is required to demonstrate safety before any applications in humans," the researchers added. "The data and model presented here will help in guiding gRNA design towards the desired outcomes for genome-wide screens and custom edits."