Machine Learning Algorithm Predicts Efficiency of Prime Editing Insertions

NEW YORK – A team led by researchers at the Wellcome Sanger Institute has come up with a machine learning method for predicting genome editing efficiency when using a CRISPR-Cas9-based approach known as prime editing.

The approach relies on prime editors capable of making precise changes to single bases or small stretches of sequence using a nicking version of Cas9 that introduces DNA single-strand breaks rather than double-strand breaks.

To achieve prime editing precision, the investigators explained, the nicking Cas9 enzyme is fused to a reverse transcriptase domain and paired with a prime editing guide RNA that both specifies the target sequence of interest and carries the desired edit.

"Double-strand breaks are problematic because they can sometimes lead to large-scale, potentially catastrophic rearrangements in the genome. In addition, Cas9 sometimes makes off-target breaks at sites in the genome that look similar to the target site," Wellcome Sanger Institute researchers Leopold Parts, Jonas Koeppel, and Juliane Weller explained in an email. "Prime editing, in contrast, requires multiple, independent DNA binding events, which leads to less off-target effects."

This combination of precision and versatility, coupled with a reduced risk of unintended edits, has made prime editing a compelling system for attempting to correct disease-related mutations, alter protein functions, or change gene regulatory features, the team explained in a paper appearing in Nature Biotechnology on Thursday.

To develop their method, the investigators began by systematically profiling insertion rates for a series of prime editing experiments involving more than 3,600 DNA sequences with lengths ranging from a single base to 69 bases, targeted to four locations in the genome in three different human cell lines.

"The potential of prime editing to improve human health is vast, but first we need to understand the easiest, most efficient, and safest ways to make these edits," Parts, the study's senior author, said in a statement. "It's all about understanding the rules of the game, which the data and tool resulting from this study will help us to do."

Based on patterns in the genomes of the human cell lines, the team found that insertion sizes, nucleotide composition, secondary DNA structures, and broader DNA repair contexts all contributed to insertion rates in a given genome region.

"The variables involved in successful prime edits of the genome are many, but we’re beginning to discover what factors improve the chances of success," co-first author Koeppel said in a statement.

"Length of sequence is one of these factors, but it's not as simple, as the longer the sequence, the more difficult it is to insert," Koeppel explained. "We also found that one type of DNA repair prevented the insertion of short sequences, whereas another type of repair prevented the insertion of long sequences."

For example, insertion rates for longer sequences, those spanning at least 30 nucleotides, were reduced in the presence of the 3' flap nucleases TREX1 or TREX2. Short insertions of fewer than five nucleotides, by contrast, had high overall insertion rates that were dampened by the mismatch repair (MMR) mechanism.

Medium-length sequences spanning 15 to 21 nucleotides, on the other hand, tended to have insertion rates that exceeded those of both long and short sequences, the researchers explained. That suggests there may be a benefit to lengthening short sequences in MMR-proficient cells to improve the chances of successful insertion during prime editing.
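The length ranges reported above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name `length_class` and the "unclassified" label are assumptions made here, and the article does not say how the researchers treated lengths that fall between the reported ranges.

```python
def length_class(insert_length: int) -> str:
    """Bucket an insert length using the ranges quoted in the article.

    Hypothetical helper for illustration only; it is not part of MinsePIE.
    """
    if insert_length < 1:
        raise ValueError("insert length must be at least 1 nucleotide")
    if insert_length < 5:
        # Short inserts: high insertion rates, dampened by mismatch repair.
        return "short"
    if 15 <= insert_length <= 21:
        # Medium inserts: reported to outperform both short and long ones.
        return "medium"
    if insert_length >= 30:
        # Long inserts: rates reduced in the presence of TREX1 or TREX2.
        return "long"
    # Lengths 5-14 and 22-29 are not characterized in the article.
    return "unclassified"


for n in (3, 18, 45):
    print(n, length_class(n))
```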

Expanding on such findings, the team came up with a machine learning algorithm for predicting insertion frequency during prime editing experiments.

"[T]here are hundreds of ways to edit a gene to achieve the same outcome at the protein level," co-first author Weller said in a statement. "By feeding these potential gene edits into a machine learning algorithm, we have created a model to rank them on how likely they are to work."

"We hope this will remove much of the trial and error involved in prime editing and speed up progress considerably," Weller added.

The approach, known as "modeling insertion efficiency for prime insertion experiments" (MinsePIE), combines features ranging from insert length to prime editing guide RNA folding energy to model prime editing efficiency. Its predictions are aimed at helping researchers design more efficient prime edits.

"[M]insePIE can be used to select the best nucleotide sequence from many codon variants to modify a protein or to identify optimal barcodes for insertion," Parts, Koeppel, and Weller explained. "We also demonstrate that it is possible to increase the editing rates by optimally lengthening the insert sequence with additional nucleotides."

When the researchers applied the MinsePIE model to gene editing experiments designed to introduce half a dozen different protein tags, it sorted the proposed combinations of protein tag sequence and prime editing guide RNA into groups with high or low anticipated prime editing performance, and those predictions were backed up by the experimental data.
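To see why a single protein tag can correspond to hundreds of candidate inserts, the following Python sketch enumerates every DNA sequence that encodes a short peptide under the standard genetic code and then ranks the candidates with a scoring function. The peptide used, the function names, and especially `placeholder_score` are assumptions made for illustration. The placeholder has no biological meaning and merely stands in for a real predictor such as MinsePIE, whose actual interface is not described in the article.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid: str) -> list[str]:
    """Return all codons that encode one amino acid (one-letter code)."""
    codons = sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return codons


def codon_variants(peptide: str) -> list[str]:
    """Enumerate every DNA sequence that encodes the given peptide."""
    per_residue = [synonymous_codons(aa) for aa in peptide]
    return ["".join(codons) for codons in product(*per_residue)]


def placeholder_score(sequence: str) -> float:
    """Hypothetical stand-in for a trained model; NOT MinsePIE."""
    return sequence.count("C") / len(sequence)


def rank_variants(variants, score_fn):
    """Sort candidate inserts from highest to lowest predicted score."""
    return sorted(variants, key=score_fn, reverse=True)


# Illustrative eight-residue peptide, not taken from the study.
variants = codon_variants("DYKDDDDK")
print(len(variants))  # 256 candidate inserts for one peptide
best_first = rank_variants(variants, placeholder_score)
```

In practice the scoring function would be replaced by a model trained on measured insertion rates, and the ranked list would guide which insert to test first.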

"Our improved understanding of insertion efficiency using the prime editing system naturally leads to recommendations for experimental design," the authors wrote, noting that "our model can help prioritize targets and pick high-efficiency replacement sequences" when attempting to repair pathogenic sequences.