NEW YORK – Researchers at the Institute for Basic Science, Seoul National University, and the Korea Centers for Disease Control & Prevention in South Korea have mapped the transcriptomic architecture of the SARS-CoV-2 virus that is responsible for the COVID-19 pandemic.
In a journal pre-proof published in Cell on Friday, the researchers said they used two complementary sequencing techniques to develop a high-resolution map of the SARS-CoV-2 transcriptome and epitranscriptome. They found that sequencing-by-synthesis (SBS) methods such as the Illumina and MGI platforms conferred high accuracy and coverage but were limited by short read length. And while nanopore-based direct RNA sequencing (DRS) was limited in sequencing accuracy, they added, it enabled long-read sequencing, which was particularly useful for the analysis of long nested CoV transcripts. Moreover, because DRS detects RNA instead of cDNA, RNA modification data could be obtained directly during sequencing. Therefore, they combined DRS and SBS.
SBS showed that the transcriptome is highly complex, owing to numerous discontinuous transcription events — in addition to the canonical genomic and nine sub-genomic RNAs, the researchers found that SARS-CoV-2 produces transcripts encoding unknown open reading frames with fusions, deletions, and frameshifts. DRS revealed at least 41 RNA modification sites on viral transcripts, with the most frequent motif being AAGAA, the researchers further reported. Modified RNAs had shorter poly(A) tails than unmodified RNAs, suggesting a link between the modification and the 3' tail.
"Functional investigation of the unknown transcripts and RNA modifications discovered in this study will open new directions to our understanding of the life cycle and pathogenicity of SARS-CoV-2," the authors wrote.
To delineate the SARS-CoV-2 transcriptome, the researchers first performed DRS runs on an Oxford Nanopore Technologies MinION sequencer using total RNA extracted from Vero cells infected with SARS-CoV-2. They obtained 879,679 reads from infected cells, and the majority of the reads mapped to SARS-CoV-2, indicating that viral transcripts dominate the transcriptome while host gene expression is strongly suppressed.
They also performed DNA nanoball sequencing based on the sequencing-by-synthesis principle (DNBseq) and obtained more than 305 million reads with an average insert length of 220 nt. The results were overall consistent with the DRS data, and the depth of DNB sequencing allowed the researchers to confirm and examine RNA junctions on an unprecedented scale for a CoV genome.
Altogether, they found that SARS-CoV-2 expresses nine canonical sub-genomic RNAs (S, 3a, E, M, 6, 7a, 7b, 8, and N) together with the genomic RNA. In addition to these canonical sub-genomic RNAs, with their expected structure and length, the researchers also found many minor junction sites.
They observed three main types of fusion events: the RNAs in the first group had the leader combined with the body at unexpected 3' sites in the middle of open reading frames or untranslated regions; the second group showed a long-distance fusion between sequences that didn't have similarity to the leader; and the last group underwent local fusion, which led to smaller deletions, mainly in structural and accessory genes, including the S open reading frame.
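The three-way grouping described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the leader boundary, the distance cutoff separating local from long-distance fusion, the canonical body-site positions, and the names `Junction` and `classify` are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values reported in the study.
LEADER_END = 80            # assumed: a 5' site at or before this position lies in the leader
DISTANT_THRESHOLD = 5000   # assumed cutoff (nt) between local and long-distance fusion
CANONICAL_BODY_SITES = frozenset({21000, 25000, 28000})  # placeholder 3' sites


@dataclass(frozen=True)
class Junction:
    five_prime: int   # last genomic position kept before the skipped region
    three_prime: int  # first genomic position kept after the skipped region


def classify(junction: Junction, tolerance: int = 10) -> str:
    """Assign a junction to the canonical class or one of three fusion groups."""
    if junction.three_prime <= junction.five_prime:
        raise ValueError("three_prime must lie downstream of five_prime")

    if junction.five_prime <= LEADER_END:
        # Leader-to-body junction: canonical only if it lands near a known body site.
        near_canonical = any(
            abs(junction.three_prime - site) <= tolerance
            for site in CANONICAL_BODY_SITES
        )
        return "canonical" if near_canonical else "leader_to_unexpected_site"

    # Leader-independent junction: split by the length of the skipped region.
    span = junction.three_prime - junction.five_prime
    return "long_distance_fusion" if span > DISTANT_THRESHOLD else "local_fusion"
```

For example, under these assumed constants `classify(Junction(70, 23000))` falls in the first group, because the leader is joined to a 3' site that is not near any placeholder canonical site.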
"Functionality of sub-genomic RNAs are not clear, and some of them have been considered as parasites that compete for viral proteins, hence referred to as 'defective interfering RNAs' (DI-RNAs)," the authors wrote. "While the non-canonical transcripts may arise from erroneous replicase activity, it remains an open question if the fusion has an active role in viral life cycle and evolution."
They also examined the epitranscriptomic landscape of SARS-CoV-2 using the DRS data. By comparing the viral transcripts with unmodified controls, they identified at least 41 RNA modification sites, with AAGAA the most frequently observed motif. Long viral transcripts were more frequently modified than shorter RNAs, suggesting a modification mechanism that is specific for certain RNA species.
Since DRS allows for the simultaneous detection of multiple features on individual molecules, the researchers then cross-examined the poly(A) tail length and internal modification sites. They found that modified RNA molecules had shorter poly(A) tails than unmodified ones, suggesting a link between the internal modification and the 3' end tail.
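A per-molecule comparison of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the study's analysis: it assumes each read has already been reduced to a modified/unmodified flag and an estimated tail length, the function name `median_tail_by_status` is hypothetical, and the toy values are made up.

```python
from statistics import median


def median_tail_by_status(reads):
    """Return the median poly(A) tail length for modified and unmodified reads.

    `reads` is an iterable of (is_modified, tail_length) pairs, one per molecule.
    A group with no reads is reported as None.
    """
    groups = {True: [], False: []}
    for is_modified, tail_length in reads:
        groups[bool(is_modified)].append(tail_length)
    return {
        "modified": median(groups[True]) if groups[True] else None,
        "unmodified": median(groups[False]) if groups[False] else None,
    }


# Made-up toy values for illustration only; these are not data from the study.
toy_reads = [(True, 40), (True, 45), (True, 50), (False, 55), (False, 60), (False, 70)]
summary = median_tail_by_status(toy_reads)
```

The median is used here only because it is robust to a few very long tails. The sketch makes no claim about which statistic the authors used.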
"Since poly(A) tail plays an important role in RNA turnover, it is tempting to speculate that the observed internal modification is involved in viral RNA stability control," the authors wrote. "It is also plausible that RNA modification is a mechanism to evade host immune response."