NEW YORK – A recent study on a new class of transposon-encoded RNA-guided nucleases suggests not only that they could be used as biotechnology tools, but also that such mechanisms could permeate all domains of life.
The paper, published earlier this month in Science by the Broad Institute's Feng Zhang, the National Institutes of Health's Eugene Koonin, and their colleagues, showed that IscB proteins, which are the likely ancestors of the RNA-guided endonuclease Cas9, are putative nucleases encoded in a distinct family of IS200/IS605 transposons. Using evolutionary analysis, RNA-seq, and biochemical experiments, they reconstructed the evolution of CRISPR-Cas9 systems from IS200/IS605 transposons and showed that IscB uses a single non-coding RNA for RNA-guided cleavage of double-stranded DNA.
The researchers also experimented with the RNA-guided nuclease activity of TnpB, another IS200/605 transposon-encoded protein and the likely ancestor of Cas12 endonucleases. Overall, they said, this work revealed a widespread class of transposon-encoded RNA-guided nucleases, which they named OMEGA, for Obligate Mobile Element Guided Activity.
Importantly, the researchers concluded not only that these nucleases could be used for purposes such as genome editing, but that they could also yield new insights into the evolution of both Cas9 and Cas12. Further, they said, the broad distribution of the OMEGA systems suggests that RNA-guided mechanisms are even more widespread in prokaryotes than previously thought, and may even extend into eukaryotic genomes.
The ISC transposons, a group of bacterial and archaeal DNA transposons that encode Cas9 homologs, were first described in 2016 by Koonin and his colleagues at the NIH in a study in the Journal of Bacteriology.
Bacterial genomes encode numerous homologs of Cas9, with the homology region including the arginine-rich helix and the HNH nuclease domain (which cleaves the DNA strand complementary to the RNA guide) that is inserted into the RuvC-like nuclease domain (which initiates cleavage of the DNA strand not complementary to the guide RNA). But some of these genes aren't linked to cas genes or CRISPR, the researchers wrote in that paper. Instead, they showed that certain Cas9 homologs represented a distinct group of non-autonomous transposons, which they called ISC, for insertion sequences Cas9-like.
They identified many diverse families of full-length ISC transposons and demonstrated that their terminal sequences (particularly the 3' termini) were similar to those of IS605 superfamily transposons. Their findings implied that the ISC transposons evolved from IS605 family transposons and that Cas9 subsequently evolved via immobilization of an ISC transposon.
In their new paper, Zhang and his colleagues took that research one step further, finding that IscB (which is about 400 amino acids long) has an architecture similar to that of Cas9 — it contains an RuvC endonuclease domain split by the insertion of a bridge helix and an HNH endonuclease domain. When the researchers performed a comprehensive search for proteins containing an HNH or a split RuvC endonuclease domain, they found that Cas9 and IscB were the only proteins that contained both domains, suggesting that all extant Cas9s descended from a single ancestral IscB.
According to Broad researchers and the paper's co-first authors Han Altae-Tran and Soumya Kannan, one of the most exciting applications of the knowledge they've gleaned from this study is the potential for large-scale evolution of Cas9 and Cas12, learning how to engineer the scaffolds of the proteins in order to add different kinds of functional domains.
When they investigated the evolutionary relationships between IscB, Cas9, and other homologous proteins, they detected another group of shorter IscB homologs, about 350 amino acids long, that were also encoded in IS200/605 superfamily transposons. They named these proteins IsrB, for insertion sequence RuvC-like OrfB. Finally, they identified a family of even smaller (about 180 amino acids) proteins that contained only the PLMP domain and HNH domain but no RuvC domain, which they named IshB, for insertion sequence HNH-like OrfB.
"One of the key things that we've seen is that IscB and IsrB are very, very small relative to Cas9, and so on one level, they provide information about how Cas9 has gained all these additional domains to make it what it is," Altae-Tran said. "But you can imagine the reverse as well. Not everything that Cas9 has is necessarily useful, and you can start subtracting from it. And people have started to think about doing this, but evolution has already started off somewhere. And so, you can try to trace back to understand really what the minimal domains are for this type of system to work."
Additionally, he said, IsrB is very similar to IscB, except it doesn't have the HNH domain, making it a nickase. That type of nickase could be very useful for certain applications, and understanding how a domain gets inserted into the nuclease could also help the researchers understand how to reverse that process.
"If we have a nuclease that has two domains, could we remove that?" Altae-Tran posited. "On the flip side, we could also understand how to add additional domains into the protein that could be useful. So instead of HNH, you could imagine adding other domains."
Interestingly, the researchers were also able to identify two distinct groups of Cas9s. The first is a new subtype called II-D, a group of relatively small Cas9s about 700 amino acids long that are not associated with any other known cas genes. The second is a distinct clade branching from within the II-C subtype, which includes exceptionally large Cas9s (more than 1,700 amino acids) that are associated with TnpA.
According to Altae-Tran, the small Cas9 gene group is the most minimal full Cas9 gene group that's known to exist. He also noted that the larger genes are associated with the IS200-like transposases. And while their function isn't yet completely clear, the larger systems possess several genes that are DNA-interacting.
"Whether or not they all function together in various large systems is unknown," he added. "But it is a really interesting area for follow-on research."
Indeed, the researchers suggested several areas for follow-up investigation or new areas of inquiry in genome editing or engineering.
Kannan noted that the newly discovered Cas9 systems might have some of the same utility as the Cas9 nucleases that are currently used in genome editing applications, as well as Cas12, though more research would have to be done on their activity in mammalian cells. However, she added, "we also showed that TnpB has collateral activity, which is similar to some Cas12s and also Cas13s. So that might be applied for nucleic acid diagnostics and other types of detection technologies."
Further, while the small Cas9 system could prove more useful than the commonly used SpCas9 nuclease for therapeutic applications because of its small size, IscBs could also fill that niche, she said.
"We also show that it is possible to use them for human genome editing and they're actually even smaller than those Type II-D Cas9s, probably half [their] size," Kannan noted. "So, they would also be probably more amenable to fitting into delivery vectors."
According to Zhang, the small size of the IscB also changes the structure of its R-loop. In CRISPR systems, the CRISPR RNAs form a hybrid with a matching protospacer on an invading DNA, which leads to the displacement of the noncomplementary strand. The resulting R-loop creates a signal for DNA degradation.
The IscB R-loop structure differs from Cas9's and exposes more DNA, Zhang said, which may be beneficial for applications that require engineering of the R-loop, such as prime editing.
There's also a lot of potential for multiplexing the various IS proteins with different Cas nucleases to create new systems that could have various, though as yet unknown, applications, Altae-Tran said. The new scaffolds could provide different types of cuts or nicks, and multiplexing them together could also provide different scaffolds and multiple R-loops in the same system.
"In the new age of structural biology and the ability to predict more types of systems, understanding that these scaffolds even exist and knowing what they are can help us provide insights into [using] de novo modeling approaches potentially in the short-term future, a few years from now, to just create new RNA-guided systems altogether."
For Kannan, the most exciting insight from the study was the idea that RNA-guided mechanisms exist beyond CRISPR, and perhaps even beyond prokaryotes and into every domain of life.
"These IS200/IS605 systems are extremely widespread. We were able to see that the RNA-guided mechanism not only exists beyond these canonical examples that people have found already, but also in systems that are, themselves, very abundant," she said. "With CRISPR, the thing that makes it so useful for biotechnology is the ability to just reprogram the guide. To discover that type of mechanism in systems that perform different types of enzymatic reactions or interact with DNA and RNA in different ways, we believe, expands that space a lot."