NEW YORK – Two teams of researchers have published papers in Science describing new methods to rearrange the genome and create previously unseen structural variants, including large deletions, translocations, and inversions.
In one proof-of-concept study, researchers led by George Church of Harvard Medical School and the Wyss Institute and Leopold Parts of the Wellcome Sanger Institute in the UK demonstrated the use of CRISPR prime editing to add recombination sites to repetitive LINE elements, generating structural variants that were read out by bulk long-read sequencing.
Meanwhile, researchers from Jay Shendure's lab at the University of Washington used the piggyBac transposon to insert the recombination sites into the genome as part of a DNA "cassette" that also included identifying barcodes and a promoter for T7 phage RNA polymerase. This enabled readout by single-cell RNA sequencing, as the T7-generated transcripts could be fished out from all the other RNAs. The authors likened their method, dubbed Genome-shuffle-seq, to Perturb-seq, a CRISPR-based method that knocks out genes in single cells and measures changes in transcriptome-wide expression.
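In concept, the readout amounts to junction decoding: each cassette's barcode is first mapped to its genomic insertion site, and a novel barcode pairing in a cell's T7-derived transcripts then reveals which distant sites have been joined by recombination. The Python sketch below illustrates that logic under simplified assumptions; the barcode names, coordinates, and classification rules are illustrative, not the published Genome-shuffle-seq implementation.

# Hypothetical sketch of junction decoding from barcode pairs observed in
# single-cell transcripts. Cassette design, barcode names, and coordinates
# are illustrative assumptions, not the published Genome-shuffle-seq design.

# Barcode -> genomic insertion site (chromosome, position, strand), mapped
# up front, e.g. by sequencing the cell pool before shuffling.
BARCODE_SITES = {
    "BC01": ("chr1", 1_200_000, "+"),
    "BC02": ("chr1", 3_500_000, "+"),
    "BC03": ("chr2", 800_000, "-"),
}

def classify_junction(bc_a, bc_b):
    """Infer the rearrangement implied when two barcodes from distant
    insertion sites appear joined in a single T7-derived transcript."""
    chrom_a, _, strand_a = BARCODE_SITES[bc_a]
    chrom_b, _, strand_b = BARCODE_SITES[bc_b]
    if chrom_a != chrom_b:
        return "translocation"  # sites on different chromosomes were joined
    if strand_a != strand_b:
        return "inversion"      # same chromosome, flipped orientation
    return "deletion"           # same chromosome and orientation: span lost

print(classify_junction("BC01", "BC02"))  # deletion (~2.3 Mb span on chr1)
print(classify_junction("BC01", "BC03"))  # translocation (chr1 joined to chr2)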
Jonas Koeppel, a former graduate student in Parts' lab and one of the lead authors of the first study, said that while the methods are technically different, they are implementations of the same vision "to understand structural changes in the genome."
"We see a lot of them," he said. "If you think of all the mutations that happen in, for example, the germ line, most nucleotides are affected by these structural changes. And then also in cancer, we know that there are all these structural changes and the genome deletions, inversions, [and] duplications, but we didn't really have a way to experimentally make more than one of those at a time."
Now, researchers will be able to generate large numbers of structural variants and follow up on what happens to the cells that carry them.
The methods are comparable to some of those used to measure the impact of SNPs and indels in human biology, said Fritz Sedlazeck, a researcher at Baylor College of Medicine and an expert on structural variation in the genome. "This is one of the first studies that looks really promising for doing this for structural variants themselves, which is really fascinating," he said.
According to Koeppel, the method has roots in the Church lab's work on editing repetitive elements to inactivate retroviruses in pig cells in order to develop pig organs for transplantation into humans. Researchers, including Sudarshan Pinglay, first author of the UW group's paper, have also developed similar methods in yeast to explore the effects of large-scale structural engineering.
Pinglay, who recently started his own lab at UW and the Seattle Hub for Synthetic Biology, noted that approaches that worked in yeast, such as synthesizing entire chromosomes to contain recombination sites, were simply not feasible in human cells. Moreover, the process of picking clones and analyzing them by whole-genome sequencing was not scalable.
Around 2019, at the beginning of his doctorate in Parts' lab, Koeppel began discussing the use of prime editing to place recombination sites in repetitive elements and shuffle them.
Specifically, the method uses prime editing within the long interspersed nuclear element-1 (LINE-1) retrotransposon to incorporate thousands of recognition sites for Cre recombinase. Recombination between these sites then reengineers the genome, leading to deletions, inversions, extrachromosomal DNAs, fold-backs, and translocations.
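Which variant type results from a given recombination event follows from the geometry of the two sites involved: same-orientation sites on one chromosome excise the intervening DNA, opposite orientations invert it, and sites on different chromosomes can join to produce translocations. The following toy Python model captures the two single-chromosome cases; the element names and list representation are illustrative, not drawn from the paper.

# Toy model of Cre recombination between two loxP sites on one chromosome.
# The list representation and element names are illustrative, not taken
# from either paper.

def cre_recombine(chromosome, site_a, site_b):
    """chromosome: ordered list of (element, strand) pairs, two of which are
    loxP sites. Returns (rearranged chromosome, excised circle, if any)."""
    i, j = sorted((chromosome.index(site_a), chromosome.index(site_b)))
    between = chromosome[i + 1 : j]
    if chromosome[i][1] == chromosome[j][1]:
        # Same orientation: the span is excised as a circle (a deletion on
        # the chromosome); one loxP stays behind, the other joins the circle.
        return chromosome[: i + 1] + chromosome[j + 1 :], between + [chromosome[j]]
    # Opposite orientation: the span is inverted in place, reversing the
    # order of its elements and flipping each one's strand.
    inverted = [(name, "-" if s == "+" else "+") for name, s in reversed(between)]
    return chromosome[: i + 1] + inverted + chromosome[j:], []

chrom = [("geneA", "+"), ("loxP1", "+"), ("geneB", "+"), ("loxP2", "-"), ("geneC", "+")]
print(cre_recombine(chrom, ("loxP1", "+"), ("loxP2", "-")))  # geneB inverted

chrom2 = [("geneA", "+"), ("loxP1", "+"), ("geneB", "+"), ("loxP2", "+"), ("geneC", "+")]
print(cre_recombine(chrom2, ("loxP1", "+"), ("loxP2", "+")))  # geneB deleted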
The UW group instead used a barcoding approach in which recombination sites, paired with an RNA polymerase promoter, are tagged and read out through single-cell RNA sequencing. This method was able to detect deletions, inversions, and translocations.
These technical differences give each method its own advantages and disadvantages: Genome-shuffle-seq enables a single-cell readout, while the prime editing method can generate more, and smaller, variants.
"That's actually quite important, because you have these essential genes that are scattered throughout the genome," Koeppel said. "So if you want to learn something about a part of the genome, but your variant is so large that it also hits an essential gene, this variant would just die, and you have a hard time learning something." More and smaller variants offer "a bit more fine-grained control over which parts you delete."
But the prime editing method only generates single clones, "and that's not very scalable," he said, while Genome-shuffle-seq can reengineer and analyze millions of cells. However, the single-cell method does not provide information on cell viability.
That's important because one application of these techniques is to find out which parts of the genome are necessary for cell viability. Both papers mentioned using them to whittle down the genome to create a minimum viable cell.
"We have these massive genomes and only like 1.5 percent or something encode for proteins," Koeppel said. "A method to more or less randomly create lots of deletions everywhere in the genome and then see which ones have an effect, which ones seem to be tolerated … would tell us a lot about the organization or principles of our genomes."
Sedlazeck noted that his lab has published a preprint identifying multiple repeat combinations occurring at the single-cell level. Mutations seen in a very low fraction of cells show potential impacts in neurological diseases, he said.
Another application is in cell line engineering for pharmaceutical development.
"Probably the last thing a cell wants to do is make a ton of virus for like a vaccine or something, but somehow, we still use basically almost unmodified cells to create these viruses," Koeppel said. "You could imagine that, basically, if you have this flexibility of generating cell pools where every cell has … these really large genome rearrangements, and then you screen for which cells are maybe best in producing a virus or producing an antibody … maybe you find certain kinds of genotypes that are much better in producing these compared to the cell lines we currently use."
Koeppel noted that he explored commercializing this aspect of the method, having applied for intellectual property related to the work. "I think there's probably something there, but I'm not actively pursuing it at the moment," he said.
Instead, he is now a postdoc at UW, where he splits his time between Shendure's and Pinglay's labs, working to marry the methods and capture the advantages of each.
"An ideal method would have a single-cell-compatible or generally very high-throughput readout, where you don't need to whole-genome sequence your entire population," he said. Moreover, "both of our methods were actually sort of held back by not being able to control how many variants you generate. So in both other cases, essentially, 99 percent of the cells died after this process because there's just too much [mutation]. You want every cell to have maybe something between one and 10 variants, so that they can survive but are significantly different from their parental cell." Pinglay added that they're looking to increase the diversity of types of structural variants they can introduce and do it in more cell types to one day provide more information to researchers like Sedlazeck, who could be looking to bridge their findings.
"Something that just fascinates me a lot is just, 'Do we need all these repetitive elements in our genomes?'" Koeppel said. "Half of our genome is essentially an archive of dead viruses, and I think there just never was a selection pressure to get rid of them."
"Could you remove a significant chunk of them and still have a normally functioning genome? I think that's something that we can uniquely answer," he said.