NEW YORK – New long-read sequencing-based methods for analyzing RNA isoforms are proving a valuable complement to short-read sequencing.
Using sequencing platforms from Pacific Biosciences and Oxford Nanopore Technologies, researchers are able to more precisely analyze RNAs thousands of bases long and their effect on gene transcription, translation into proteins, and ultimately cellular function. Applications include single-cell studies, analysis of exotic RNA species, and even research on RNA viruses, including SARS-CoV-2.
In a study of RNA isoforms in single hematopoietic stem cells using PacBio's Iso-Seq protocol, researchers led by Laura Mincarelli, a postdoc in Iain Macaulay's lab at the Earlham Institute in the UK, were able to detect more than one isoform in half the genes they studied. "The genes express different isoforms," she said. "And isoforms can have functional diversity and encode for proteins with different functions."
For projects where cell typing is important, such as the Human Cell Atlas, this information can provide a clearer picture of cellular heterogeneity. "We give you extra information," Mincarelli said. "To become different cells is to produce different RNAs."
While nanopore sequencing helped Yi Xing, a researcher at Children's Hospital of Philadelphia (CHOP) Research Institute, create a catalog of 107,147 full-length circRNA isoforms across 12 human tissues and one cell line, he'll continue to use short reads as well, as will Mincarelli.
"Most likely we'll use a hybrid approach," Xing said. "There are lots of advantages of short reads as well. In the foreseeable future, a combination is probably a very good approach to certain transcriptome problems."
Short-read RNA sequencing, in bulk and in single cells, has been the workhorse technology for differential gene expression analysis. Its contributions to the current state of genomics would be impossible to summarize in just a few sentences. But the fact that it can only sequence several hundred bases at a time has always meant that analyzing full-length RNAs, which can be thousands of bases long, must rely on some amount of computational guesswork.
In 2014, PacBio began offering its Iso-Seq protocol on its RSII. Now, on the Sequel II, researchers can analyze full-length cDNA sequences up to 10 kb in length. According to PacBio, characterizing isoforms can be done using one SMRT Cell 8M chip per sample, or about $1,300 in reagent costs excluding sample preparation.
In addition to isoform analysis, including connecting alternative polyadenylation motifs to isoforms in ag-bio applications, Iso-Seq can help sequence novel fusion transcripts and long non-coding RNAs.
A few years ago, several groups started working on protocols to do isoform sequencing for single cells isolated in droplet-based preparations. The first of these, the straightforwardly named but more creatively acronymed single-cell isoform RNA-seq (ScISOr-seq), was published in 2018 in Nature Biotechnology.
Mincarelli said her method is similar, but is optimized for PacBio's Sequel II platform and applied to the hematopoietic system. Her method uses 10x Genomics Chromium to prepare single-cell cDNA libraries, then splits each library so that it can be analyzed on Illumina's instruments, for gene expression, and on PacBio's instruments, for isoform analysis. The Illumina barcodes appended to transcripts in the 10x preparation are also present in the PacBio data.
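Because the same barcodes appear in both datasets, reads from the two platforms can in principle be matched to the same cell. The Python sketch below illustrates that idea only; it is not Mincarelli's pipeline. The function name `link_by_barcode`, the record layout of (barcode, label) pairs, and the example barcodes and isoform names are all hypothetical.

```python
def link_by_barcode(short_reads, long_reads):
    """Group short-read gene calls and long-read isoform calls by cell barcode.

    short_reads: iterable of (barcode, gene) pairs       (hypothetical layout)
    long_reads:  iterable of (barcode, isoform) pairs    (hypothetical layout)
    Returns a dict mapping each barcode to the genes and isoforms seen in it.
    """
    cells = {}
    for barcode, gene in short_reads:
        entry = cells.setdefault(barcode, {"genes": set(), "isoforms": set()})
        entry["genes"].add(gene)
    for barcode, isoform in long_reads:
        entry = cells.setdefault(barcode, {"genes": set(), "isoforms": set()})
        entry["isoforms"].add(isoform)
    return cells


# Made-up example records, for illustration only.
short_reads = [("AAAC", "Mpl"), ("AAAC", "Gata1"), ("CCGT", "Mpl")]
long_reads = [("AAAC", "Mpl-isoform-A"), ("CCGT", "Mpl-isoform-B")]
cells = link_by_barcode(short_reads, long_reads)
```

In this toy case, the two cells both express Mpl according to the short reads, while the long reads assign each cell a different Mpl isoform.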
In a bioRxiv preprint published in April 2020 and currently under review for publication, Mincarelli analyzed gene and isoform expression of key regulators of hematopoiesis. Some of those important genes, such as transcription factors and cytokines, contained multiple isoforms that short reads would not have had the resolution to reveal. In addition to seeing the different isoforms, they were able to validate that different stem cells seemed to express different isoforms. "None of them express all the isoforms and none of them seem to express only one isoform," she said. "All express a combination, it's a complex heterogeneity going on."
Some are even functionally different. One of the exons in a gene called Mpl, which encodes the thrombopoietin receptor, is translated into a trans-membrane domain and doesn't show up in all isoforms, suggesting the final protein from that isoform will likely have a different function in the cell.
For now, using long reads to quantify isoform expression at the single-cell level in thousands of cells would be very expensive. "Maybe one day, if long-read sequencing becomes higher throughput," it could really be used for that purpose, she said. Still, 90 percent of the sample goes onto the Sequel II, because it requires a high amount of input material (PacBio recommends starting with at least 160 ng of input cDNA for one SMRT Cell).
Oxford Nanopore's technology, on the other hand, allows researchers to directly detect RNA bases, a feature launched in 2016 on the MinION instrument and now also available on the GridION and PromethION. Avoiding conversion to cDNA removes PCR biases and preserves chemical modifications, although Oxford Nanopore also allows researchers to sequence cDNA from RNA.
For Xing, nanopore sequencing was key to developing isoCirc, his lab's method for circRNA isoform analysis, published in January in Nature Communications.
"Circular RNAs are the predominant transcript isoform in hundreds of human genes," he said, and they play various biological functions, including acting as molecular sponges for RNA-binding proteins and microRNAs that might otherwise bind to linear RNAs to various effect. "They're considered to be more stable [than linear RNAs]," he said.
The isoCirc protocol relies on rolling circle amplification to create concatemers several kb long containing multiple copies of the target circRNA. The method uses size selection to grab reads averaging around 4 kb; with the average circRNA around 0.4 kb, that means reads often have more than 10 copies of each circRNA.
Having many copies of the same circRNA in a single read helps offset the higher error rates seen in nanopore sequencing, compared to PacBio's HiFi reads and Illumina's sequencing-by-synthesis, as the errors can be spotted during computation.
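The copy-number arithmetic and the idea of spotting errors across repeated copies can be sketched in a few lines of Python. This is an illustration, not the isoCirc implementation. The function names are hypothetical, and the sketch assumes the circRNA length is known and that errors are substitutions only. Real nanopore reads also contain insertions and deletions, which require alignment to handle.

```python
from collections import Counter


def expected_copies(read_length_bp, circ_length_bp):
    """Approximate number of circRNA copies in one concatemer read."""
    return read_length_bp / circ_length_bp


def split_copies(read, unit_length):
    """Cut a concatemer into consecutive full-length copies.

    Assumes a known unit length and no indels (a simplification).
    Any trailing partial copy is dropped.
    """
    n_full = len(read) // unit_length
    return [read[i * unit_length:(i + 1) * unit_length] for i in range(n_full)]


def majority_consensus(copies):
    """Take the most common base at each position across the copies."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


# A 4 kb read over a 0.4 kb circRNA holds about 10 copies.
copies_per_read = expected_copies(4000, 400)

# Toy concatemer: four copies of an 8-base unit, two carrying a
# single substitution error each, plus a trailing partial copy.
read = "ACGTTGCA" + "ACGTAGCA" + "ACGTTGCA" + "TCGTTGCA" + "ACG"
consensus = majority_consensus(split_copies(read, 8))
```

In the toy read, each error appears in only one of the four copies, so the per-position majority vote recovers the original unit. The more copies a read contains, the more errors the vote can outweigh.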
While the paper cataloged many isoforms, "I think we're still missing a lot," Xing said. As with single-cell isoform data, cataloging circRNA isoforms "is an important step towards addressing function," he said. "It provides a way to fill some pieces of the puzzle."
Already, other researchers are inquiring about the isoform catalog, and collaborators at CHOP are interested in applying this to disease models, especially in pediatric cancers.