NEW YORK – Researchers from the Chan Zuckerberg Biohub, Stanford University, and the University of California, San Francisco have developed and trained a machine learning model that can accurately predict the repair outcomes that arise in primary human T cells after editing with CRISPR-Cas9.
As they reported today in Nature Biotechnology, the researchers sequenced repair outcomes at 1,656 on-target genomic sites in primary human T cells, and then used the data to train the model, which they named CRISPR Repair Outcome (SPROUT).
"SPROUT accurately predicts the length, probability and sequence of nucleotide insertions and deletions, and will facilitate design of [Streptococcus pyogenes Cas9 (SpCas9)] guide RNAs in therapeutically important primary human cells," the authors wrote.
In therapeutic genome editing, primary T cells can be engineered ex vivo and adoptively transferred to patients. However, researchers currently lack detailed information about the genomic outcomes of Cas9-dependent editing in primary human cells. For this study, the team systematically characterized SpCas9 repair outcomes in primary T cells from 18 healthy blood donors, sequencing 1,656 unique genomic locations within 559 genes in primary CD4+ T cells.
The researchers quantified the distribution of repair outcomes at each target site from the generated amplicon library, and found that 31 percent of reads contained deletions centered around the cut site with an average deletion length of 13 base pairs. They also found that 20 percent of the reads had insertions at the cut site and that 95 percent of the insertions were one nucleotide in length. Only 0.008 percent of the reads contained both an insertion and a deletion.
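The per-site summary statistics described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Read` record, the `summarize_outcomes` function, and the choice to count reads carrying both an insertion and a deletion in a separate category are all assumptions made for illustration, and it presumes that indel calls have already been made for each amplicon read.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical per-read indel call at one target site."""
    insertion_len: int  # bases inserted at the cut site (0 if none)
    deletion_len: int   # bases deleted around the cut site (0 if none)


def summarize_outcomes(reads):
    """Summarize the repair-outcome distribution for one target site."""
    total = len(reads)
    if total == 0:
        raise ValueError("no reads for this target site")

    # Assumption: reads with both an insertion and a deletion are
    # counted separately rather than in either single category.
    ins = [r for r in reads if r.insertion_len > 0 and r.deletion_len == 0]
    dels = [r for r in reads if r.deletion_len > 0 and r.insertion_len == 0]
    both = [r for r in reads if r.insertion_len > 0 and r.deletion_len > 0]

    return {
        "frac_insertion": len(ins) / total,
        "frac_deletion": len(dels) / total,
        "frac_both": len(both) / total,
        "mean_deletion_len": (
            sum(r.deletion_len for r in dels) / len(dels) if dels else 0.0
        ),
        "frac_single_nt_insertions": (
            sum(1 for r in ins if r.insertion_len == 1) / len(ins)
            if ins else 0.0
        ),
    }
```

Run once per target site, a function of this kind would yield the type of figures quoted above, such as the share of reads with deletions and the average deletion length.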
"The repair outcomes from each target site were similar between donors, but very different across target sites," the investigators said. "Comparisons of repair outcomes between all sites showed that outcomes for replicate editing experiments from individual target sites were significantly more similar to each other than to outcomes from different sites. We hypothesized that the variation in repair outcomes across cut sites was largely due to sequence variation near the cut site."
To test their hypothesis, the researchers developed SPROUT to predict SpCas9 repair outcomes. The model took the 20 nucleotides of the spacer sequence plus the protospacer adjacent motif (PAM) as input, and then predicted the fraction of indel mutant reads with an insertion or deletion and the average length of insertions and deletions at each target site.
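The report does not say how SPROUT represents its sequence input. One common choice for sequence-based models is one-hot encoding, sketched below purely as an assumption. The function name `one_hot_encode` and the example sequences in the usage note are hypothetical; the three-nucleotide PAM length reflects the NGG motif recognized by SpCas9.

```python
BASES = "ACGT"


def one_hot_encode(spacer, pam):
    """Encode a 20-nt spacer plus a 3-nt PAM as a flat 0/1 feature vector.

    Assumption: one-hot encoding is used here for illustration only;
    it is not confirmed to be SPROUT's actual feature representation.
    """
    if len(spacer) != 20:
        raise ValueError("spacer must be 20 nucleotides long")
    if len(pam) != 3:
        raise ValueError("SpCas9 PAM must be 3 nucleotides long")

    features = []
    for base in (spacer + pam).upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        # Four positions per nucleotide, exactly one of which is set to 1.
        features.extend(1 if base == b else 0 for b in BASES)
    return features
```

For a made-up spacer such as `GACGTTACCGGATCAAGCTT` with PAM `TGG`, the function returns a vector of 23 × 4 = 92 binary features, which could then be passed to any standard regression or classification model.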
On an independent set of 304 target sites in primary T cells, the researchers found that SPROUT was able to accurately predict the fraction of indel mutant reads with an insertion and the fraction of total reads with an insertion. SPROUT was also able to predict whether a target had a high, medium, or low fraction of frameshift repair outcomes with high accuracy.
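For readers unfamiliar with the term, an indel causes a frameshift when its net length is not a multiple of three. The sketch below shows how a frameshift fraction could be computed for one target site under that definition. The function names are hypothetical, and the sketch assumes the fraction is taken over edited reads only and that a net length of zero means the read was unedited; the report does not specify either detail.

```python
def is_frameshift(net_indel_len):
    """Return True if a net indel length shifts the reading frame.

    net_indel_len is inserted bases minus deleted bases, so deletions
    are negative. Lengths that are multiples of 3 preserve the frame.
    """
    return net_indel_len % 3 != 0


def frameshift_fraction(net_indel_lens):
    """Fraction of edited reads whose indel causes a frameshift.

    Assumption: reads with a net length of 0 are treated as unedited
    and excluded from the denominator.
    """
    edited = [n for n in net_indel_lens if n != 0]
    if not edited:
        return 0.0
    return sum(is_frameshift(n) for n in edited) / len(edited)
```

Grouping sites into high, medium, or low frameshift classes would additionally require cutoffs, which are not given here.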
SPROUT can also be used for in silico gRNA design, the team added. For each of the 532 genes with multiple gRNAs, the investigators used the predictions from SPROUT to rank the targets in a gene from the most likely to have frameshift repair outcomes to the least likely. SPROUT correctly identified the best-performing frameshift gRNA in 54 percent of the genes.
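The evaluation described above amounts to a per-gene top-1 comparison: does the gRNA with the highest predicted frameshift fraction match the gRNA with the highest observed one? A minimal sketch follows. The data layout (a dictionary mapping each gene to a list of `(grna_id, predicted, observed)` tuples) and the function name `top1_accuracy` are assumptions for illustration, not the authors' code.

```python
def top1_accuracy(genes):
    """Share of genes where the top-predicted gRNA is also the top-observed.

    genes maps a gene name to a list of (grna_id, predicted, observed)
    tuples, where both scores are frameshift fractions for that gRNA.
    Ties are broken by list order, which is an arbitrary choice here.
    """
    if not genes:
        raise ValueError("no genes to evaluate")

    hits = 0
    for grnas in genes.values():
        best_predicted = max(grnas, key=lambda g: g[1])
        best_observed = max(grnas, key=lambda g: g[2])
        if best_predicted[0] == best_observed[0]:
            hits += 1
    return hits / len(genes)
```

Applied to the 532 multi-gRNA genes, a calculation of this form would produce a figure comparable to the 54 percent reported.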
When they investigated whether SPROUT could correctly select which SpCas9 target site in a gene was the most likely to have an enrichment of insertions over deletions, the researchers found that SPROUT correctly chose the top gRNA for 73 percent of the genes, and correctly predicted the complete ranking of all the candidate gRNAs by insertion proportion for 60 percent of the genes, a significant improvement over random guessing.
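To see why matching a complete ranking is a demanding test, note that a gene with k candidate gRNAs has k! possible orderings, so a random guess gets the full ranking right with probability 1/k!. The sketch below compares an exact-ranking match rate with that chance baseline. The function names and data layout (each gene mapped to `(grna_id, predicted, observed)` tuples of insertion proportions) are hypothetical, and the sketch assumes there are no tied scores.

```python
from math import factorial


def ranking(grnas, index):
    """Return gRNA ids sorted from highest to lowest score.

    index selects the score column: 1 for predicted, 2 for observed.
    """
    ordered = sorted(grnas, key=lambda g: g[index], reverse=True)
    return [g[0] for g in ordered]


def exact_ranking_rate(genes):
    """Share of genes whose predicted ranking equals the observed ranking."""
    matches = sum(ranking(g, 1) == ranking(g, 2) for g in genes.values())
    return matches / len(genes)


def chance_exact_ranking_rate(genes):
    """Expected exact-match rate if each gene's ranking were guessed at random."""
    return sum(1 / factorial(len(g)) for g in genes.values()) / len(genes)
```

For a gene with three candidate gRNAs, for example, the chance of guessing the full order is 1 in 6.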
"These results demonstrate that SPROUT is a state-of-the-art method for predicting SpCas9 editing outcomes in both T cells and human iPSCs, two cell types in which concerted efforts are underway to harness CRISPR for engineered cellular therapies," the authors wrote. "The potential therapeutic applications of CRISPR in primary T cells and other human cells warrant further investigations into the mechanisms and prevalence of insertions and other rearrangements during genome editing."