NEW YORK – RNA sequencing is not as advanced as it should be, according to Vivian Cheung, a researcher at the University of Michigan who is leading a call to develop better methods to directly detect and characterize ribonucleic acid.
"We don't really know RNA sequences and we really do need to know them," she said.
As with DNA, RNA bases can have chemical modifications; however, the set of possible modifications is even larger — more than 140 have been discovered to date, compared to more than 40 for DNA — and the prevailing methods of RNA-seq, which are based on converting RNA to cDNA, are blind to nearly all of them.
The problem couldn't be more pressing: "For SARS-CoV-2, there are 30 to 40 bases [for which] we don't know the identity of the modifications," Cheung said. "If one of those modified bases that we don't know mutates, we will never know." Moreover, as RNA therapeutics, such as antisense oligos, advance, they "will work better if we know the underlying RNA that we are targeting," she said.
In a paper published this month in Nature Genetics, Cheung and five colleagues from US universities laid out the current problems with RNA sequencing and proposed a large, international project to solve them. At least 50 other researchers cosigned the paper in support.
"I fully agree that this is very much needed," Eva Maria Novoa, a researcher at Spain's Centre for Genomic Regulation who has developed computational tools to identify modified RNA bases but was not involved with the paper, said in an email. "We are now starting to appreciate the large diversity of biological functions and dynamic behavior that these chemical moieties can have in RNA molecules. Therefore, such [a] project would greatly contribute to our understanding of how cells work and how they are regulated."
RNA-seq methods that first convert RNA into cDNA have taken off in recent years, especially with the advent of single-cell transcriptomics. Special sample preparation methods can provide information on some of the most common RNA base modifications, such as 5-methylcytosine, N6-methyladenosine, and pseudouridine, but otherwise that information is lost with the most popular RNA sequencing methods.
The best existing methods to directly analyze RNA, mass spectrometry and nanopore-based sequencing, each have their limitations. Mass spec, specifically liquid chromatography tandem mass spec, is restricted by the length of the sequence it can analyze and is very expensive.
Nanopore sequencing is "limited by the lack of base-calling algorithms that directly call modified and unmodified bases from the current intensity," Novoa said, noting that all existing algorithms that detect RNA modifications, including her own, use so-called "post-base-calling" approaches.
These two technologies are the most promising ones available at the moment, but "whether they will be the 'right' tools to characterize the RNAome, we still don’t know," she said. Absent a direct base-calling algorithm from Oxford Nanopore Technologies, "we might then need a third or fourth way."
Oxford Nanopore Technologies did not directly respond to questions about whether it is developing a base-calling algorithm to detect modifications from changes in current intensity.
"We have long been advocates of making technology that can characterize base modifications from native molecules, and it's great to see the biological significance of these modifications being uncovered. The nanopore platform is the only one that can sequence RNA in its native form, and we're working on analysis tools to interrogate both RNA and DNA modifications," Clive Brown, Oxford Nanopore's chief technology officer, said in an email.
But the challenges facing Cheung and her colleagues are perhaps even more fundamental. "We know how to draw [the modified bases], but don’t know how to make them," she said. Before new detection technologies can be developed, the field needs chemical standards and tools for manipulating biomolecules. "There are lots of enzymes that cut DNA, but relatively few RNases," she said. "What seem to be basic things for DNA, we don't have the equivalent for RNA."
Brown agreed that the "lack of reliable ground truth datasets for base modifications" was a challenge and said Oxford Nanopore is working on a way to use synthetic oligos to train its algorithms to identify modifications.
Novoa also proposed using synthetic molecules. Doing so would not be cheap, said Novoa, who looked into it a few years ago, "but if this is done in a collaborative manner within a big consortium, or in collaboration with organic chemistry laboratories, I believe it should be feasible to cover at least a significant proportion of the currently known universe of RNA modifications."
An international consortium is exactly what Cheung and her co-authors are asking for. "We call for an investment of funds and infrastructure to develop technologies to sequence full-length RNA and the informatics to detect and identify all modifications," they wrote. "Innovations to develop standards for each modification, instruments that sequence RNA directly, and computational methods that support those instruments and analyze the results are all needed."
"Admittedly, the resources, technology, and informatics necessary to sequence RNA directly will be on the scale of the Human Genome Project (HGP). By building on the Human Genome Project’s success, complete RNA sequences should advance understanding of gene regulation and lead to new frontiers in health and medicine," they wrote.
Cheung said she has reached out to Eric Lander, a leader in the HGP and now director of the White House Office of Science and Technology Policy. "He has the resources, and he certainly has the know-how of organizing," she said. However, he has not yet responded to their proposal.
Some efforts to study RNA modifications already exist in the US and Europe. In 2016, the National Cancer Institute began offering grants for pilot and feasibility studies to evaluate the role of RNA modifications in cancer biology. According to the National Institutes of Health's RePORTER database, the institute has so far awarded a total of $3.9 million across 21 grants.
Also, in 2017, researchers in Europe banded together under the banner Epitran, for epitranscriptomics, to study RNA modifications, Novoa said. "We have not pursued the approach of 'big science,' perhaps because in Europe there are fewer opportunities to fund such efforts, but joint efforts, ongoing collaborations, and newer grants across European laboratories have been created and started, as a fruit of this Epitran network," she said.
Instrument companies, such as Oxford Nanopore, Pacific Biosciences, and Illumina, will also be key to advancing the field, Cheung said.
Cheung's ultimate goal is to reveal new biology obscured by RNA-seq's current blind spots. She's confident that new methods will do so. "In transfer RNA, the shapes are determined by the modifications," she said, and m6A modifications affect transcript stability and translation. "I assume they affect the shape of mRNA, and therefore interactions of RNA and protein. I think of it as the regulatory code we don't know of today, because we don’t know the spelling of the RNAs."
"I think part of the problem is that with DNA, there's only one copy," she said. "RNA is so dynamic, with many modifications coming on and off. The magnitude is just much larger."