Genome editing researchers based at Stanford and Emory Universities have developed a method for tracking the outcome of editing experiments using single-molecule, real-time, or SMRT, sequencing with Pacific Biosciences' RS instrument.
As they reported recently in Cell Reports, the investigators turned to the long reads generated by SMRT sequencing to span sites targeted for genome editing, using circular consensus reads to boost the accuracy of the sequence data.
By amplifying and sequencing DNA from a site targeted for editing, along with sequences on either side of each site, they showed that it's possible to pick up a range of potential genome editing outcomes in a population of molecules without introducing additional sequence changes to encode a reporter protein, for example.
"With this tool, you can analyze what's going on in a population of molecules," co-first author Ayal Hendel, a post-doctoral researcher in Matthew Porteus' pediatric hematology and oncology lab at Stanford University, told In Sequence.
"The most important application right now is using this technique to optimize current and future genome editing technologies to improve the efficiency and accuracy," he added, noting that genome editing is of interest for both research and medical applications, including gene therapy.
Hendel noted that because most cell types are difficult to expand from a single clone, it is especially beneficial in the clinical realm to be able to accurately identify populations of cells carrying a genome edit of interest.
For their part, he and his colleagues are particularly interested in continuing to improve genome editing itself in the hopes of using it to come up with modified hematopoietic cell treatments for blood disorders such as sickle cell anemia, beta-thalassemia, and an X chromosome-linked form of severe combined immunodeficiency, SCID-X1.
Several genome editing methods have been developed to date, but all share the same general principle: cutting a particular site in the genome with nuclease enzymes as a means of removing or introducing specific bits of sequence.
"The way it works, basically, is that we engineer or introduce a double-strand break at the specific site that we want to modify," Hendel said. "Then we take advantage of the DNA repair machinery of the cell to introduce the precise modification."
In particular, the cell's own DNA repair pathways, such as non-homologous end joining and homology-directed repair, make it possible to delete stretches of sequence or introduce new insertions or larger sequence cassettes.
For instance, non-homologous end joining can be used to remove or knock out a given gene. On the other hand, supplying new sequence whose ends match the sequence found on either side of the newly introduced break is often used to add an insertion, gene, or other genetic element via homology-directed repair.
While nuclease-based methods for editing genomes are finding favor amongst researchers interested in everything from plant improvement to gene therapy, it can still be tricky to accurately assess all the potential outcomes of genome editing without producing additional cell lines or introducing extra changes at the targeted site.
Some of the assays involve gel-based steps, while others rely on reporters such as green fluorescent protein that need to be encoded near the site targeted for editing, Hendel noted, explaining that "once you introduce these reporters, you change the original sequence and then you measure it."
Moreover, many approaches that detect editing events resolved by non-homologous end joining do not pick up those resolved by homologous repair mechanisms, and vice versa, he added.
"There are methods that can measure either non-homologous end joining or homologous recombination," Hendel said. "But with our method we can now measure both outcomes simultaneously."
His team's paper is not the first to apply sequencing to solve this problem. Researchers have also taken a crack at using other next-generation sequencing technologies such as Illumina or Roche 454 to sequence sites targeted for genome editing.
While those methods can offer a look at some DNA inserts or modifications, authors of the new study argued that they produce reads that are too short to capture larger sequence additions or inserts flanked by long homology arms designed to match the genome sequence neighboring the targeted site.
They noted that Illumina and 454 sequencing "have recently been used to measure [homology directed repair] and [non-homologous end joining] outcomes when single-stranded oligodeoxynucleotides or plasmids with short homology arms are used as donor templates."
"But," they continued, "the read-length limitations of these platforms do not allow analysis of longer arms of homology that drive more efficient [homology directed repair] and provide the flexibility to target long gene cassettes."
"If you look at the population of cells, in some cells you have precise modifications and in other cells you have small insertions or large insertions or small and large deletions," Hendel added. "Using the long reads, you can see all the possible outcomes and start to identify them."
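The idea of sorting a population of reads into outcome classes can be illustrated with a minimal Python sketch. This is not the study's analysis pipeline: the function names, the outcome labels, and the length-based rules are assumptions made for illustration, and a real analysis would align each read to the reference rather than compare lengths.

```python
from collections import Counter


def classify_read(read: str, wild_type: str, hdr_expected: str) -> str:
    """Label one amplicon read by comparing it with two reference sequences.

    wild_type is the unedited amplicon; hdr_expected is the amplicon that a
    precise homology-directed repair event would produce. Both names are
    hypothetical. Anything that matches neither is labeled by its length
    relative to the wild type, which is a crude stand-in for alignment.
    """
    if read == wild_type:
        return "unmodified"
    if read == hdr_expected:
        return "precise_hdr"
    if len(read) > len(wild_type):
        return "insertion"
    if len(read) < len(wild_type):
        return "deletion"
    return "substitution"


def summarize(reads, wild_type, hdr_expected):
    """Count outcome labels across a population of reads."""
    return Counter(classify_read(r, wild_type, hdr_expected) for r in reads)
```

Calling `summarize` on the reads from one barcoded sample would give a per-class tally, which is the kind of population-level view Hendel describes. Because the sketch relies on read length, it cannot tell a precise edit from an imprecise one of the same size unless the read matches the expected sequence exactly.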
In their Cell Reports study, he and his colleagues employed two PacBio RS instruments housed at Stanford to consider the feasibility of the SMRT sequencing-based approach for evaluating modifications made at several sites in the genomes of human cell lines and primary cell samples.
After amplifying the region of interest (typically around 600 to 1,000 base pairs on either side of the modified site, as well as sequences introduced by the editing process, if any), the team produced barcoded SMRTbell libraries before performing circular consensus sequencing of the amplicons on the PacBio RS.
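The arithmetic behind that amplicon design can be sketched in a few lines of Python. The function names, the coordinate convention, and the example numbers are assumptions for illustration rather than details from the study.

```python
def amplicon_window(cut_site: int, flank: int, insert_len: int = 0):
    """Return (start, end, expected_length) for an amplicon around a cut site.

    start and end are 0-based, half-open reference coordinates. insert_len is
    the length of any sequence added by the edit, so expected_length is the
    size of the edited amplicon rather than the reference window.
    """
    if flank <= 0 or insert_len < 0:
        raise ValueError("flank must be positive and insert_len non-negative")
    if cut_site < flank:
        raise ValueError("cut site is too close to the start of the reference")
    start = cut_site - flank
    end = cut_site + flank
    return start, end, (end - start) + insert_len


def read_spans_amplicon(read_len: int, amplicon_len: int) -> bool:
    """True when a single read is long enough to cover the whole amplicon."""
    return read_len >= amplicon_len
```

With a hypothetical cut site at position 5,000, an 800-base flank on each side, and a 300-base insert, `amplicon_window(5000, 800, 300)` describes a 1,900-base edited amplicon. A read of about 3,000 bases would span it, while a read of a few hundred bases would not.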
Though the amplification step used to target sequence around the modification site may introduce some errors, Hendel noted that he and his colleagues got results that coincided well with those obtained with standard methods.
Still, he said it would be helpful to avoid amplification altogether — a possibility that some researchers are reportedly pursuing with the single-molecule PacBio sequencing instrument.
The group applied circular consensus sequencing to cover each position several times — an approach that's been used to boost the accuracy of PacBio sequencing for a range of SMRT sequencing applications.
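Why repeated coverage improves accuracy can be shown with a simple per-position majority vote in Python. This is only an illustration of the principle and not PacBio's consensus algorithm: it assumes the passes are already aligned, have equal length, and contain substitution errors only, and the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Build a consensus sequence by majority vote at each position.

    passes is a list of equal-length strings, one per pass over the same
    molecule. Ties are broken in favor of the base seen first in that column.
    """
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")
    consensus = []
    for column in zip(*passes):
        # most_common(1) returns the most frequent base in this column
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)
```

If an error appears at a given position in only one of several passes, the other passes outvote it, so isolated random errors drop out of the consensus. Errors that recur at the same position in most passes would survive this simple vote.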
A significant fraction of the resulting sequences in each experiment did match the sought-after genome editing modifications. Nevertheless, Hendel noted, the results also revealed some unanticipated editing events, making it possible to understand the range of editing outcomes across the population of edited cells and to gauge the performance of the genome editing process.
For instance, the researchers tracked down inserted sequence from chromosome 12 when they looked at outcomes of experiments designed to modify an IL2RG region implicated in SCID-X1 with CRISPR/Cas9-based genome editing. Other insertions and deletions were detected by SMRT sequencing as well, including sequences from plasmids or other chromosomes that were captured at other editing targets.
So far, the team has successfully used the SMRT sequencing-based approach to assess genome editing experiments done with the most widely used genome editing nuclease systems: the CRISPR/Cas9 RNA-guided endonuclease system, transcription-activator-like effector nucleases (TALENs), and zinc finger nucleases (ZFNs).
More broadly, the long-read sequencing method for measuring editing outcomes is expected to be agnostic to the approach used to introduce the changes, Hendel said.
"It's not really important which … engineering strategy you're using," he said. "Once you've introduced the editing and you have the genomic DNA, then DNA is DNA and you can use our approach to study the genome editing outcomes."
The same approach is also expected to prove useful for gauging genome editing outcomes in a wide range of other plant and animal cell types.
At the time the study was done, SMRT reads came in at around 3,000 bases apiece, on average, stretching out to 15,000 bases on occasion. Read lengths have since increased further as PacBio continues pressing for longer reads and higher throughput.
While SMRT sequencing remains more expensive than Illumina sequencing, making it difficult to apply in large-scale efforts to assess genome editing outcomes, Hendel noted that the approach is a fast and cost-effective way to evaluate the outcomes of individual genome editing experiments.
"If you want to optimize a given nuclease for something like 20,000 genes, it will be very, very expensive," he said. "But if you want to develop nuclease [editing] for a specific disorder, this is very cost effective."
In addition to assessing the consequences of genome editing at the target site, Hendel explained, it's also important to look for potential off-target sites of such modification.
To that end, he noted that collaborators at Emory University and the Georgia Institute of Technology, including co-first author of the Cell Reports study Eli Fine, have been developing off-target editing assessment methods that use SMRT sequencing and bioinformatics tools for finding sites prone to off-target modifications.
Efforts are also ongoing to develop and refine genome editing protocols, not only to allow for more precise modifications and less frequent capture of foreign DNA at the editing site, but also to reduce the risk of off-target modifications.
The study's authors pointed out that a similar sequencing method may prove useful for exploring ways in which epigenetic features influence genome editing efficiency and the double-strand break repair method the cell uses.
On the sequencing side of the genome editing equation, Hendel said he expects SMRT sequencing to become applicable to larger scale studies as the throughput, accuracy, and read length of PacBio RS sequencing continue to improve.
Hendel and his colleagues did not collaborate with researchers at PacBio while developing their method for gauging genome editing outcomes, though they have since discussed it with representatives from the company.