This article was originally published Feb. 7.
A California team developing various molecular indexing-based RNA quantification tools has applied its barcoding technology in the RNA sequencing realm, coming up with applications that show promise for quantifying absolute levels of targeted transcripts or assessing the quality of RNA sequencing libraries.
The schemes, detailed in a paper published last week in Proceedings of the National Academy of Sciences, center on a stochastic labeling approach similar to that described by Affymetrix researchers a few years ago, first author Glenn Fu told In Sequence. "The concept was basically to use DNA barcodes or tags to label individual DNA molecules or RNA molecules so you can actually track it in various processes — for example, sequencing."
Fu, who was formerly the director of genotyping research at Affymetrix, went on to co-found a California startup called Cellular Research, where he is currently a senior scientist.
The first approach involves adding barcodes to complementary DNA molecules as they are generated from targeted messenger RNAs at the early stages of RNA sequencing library preparation, making it possible to match each copy of the cDNA back to the original transcript after amplification. That, in turn, lets researchers quantify absolute, rather than relative, levels of transcripts.
"RNA-seq is a fantastic approach for sampling molecules," Fu said. "But one of the main disadvantages is that the information that you get is a relative value."
"What happens with stochastic labeling — this molecular indexing approach — is that it allows you to get an absolute measurement, which is very, very different from sampling and trying to say whether gene A is more frequent than gene B," he said. "It tells you how many copies you have of gene A and how many copies of gene B."
That application can combat errors introduced through amplification biases and so on, but it does not speak to the overall quality of a given RNA sequencing library or the efficiency of the protocol used to prepare it.
For that, the researchers developed a set of nearly 1,000 barcoded synthetic RNAs, which they have used to assess RNA molecule movement through the library preparation process.
In the PNAS study, Fu and co-authors from Cellular Research and Stanford University's Genome Technology Center introduced these barcoding-based targeted capture and synthetic RNA spike-in methods, also highlighting the poor RNA sequencing library efficiency they detected when applying these techniques.
"The quantitative targeted sequencing revealed extremely low efficiency in standard library preparations, which were further confirmed by using synthetic barcoded RNA molecules," they wrote. "This finding shows that standard library preparation methods result in the loss of rare transcripts and highlights the need for monitoring library efficiency and developing more efficient sample preparation methods."
Fu added that the team's main purpose "was not, initially, to understand what the efficiency of the [RNA sequencing] library construction approach is. But as a result of doing this work, we came to quickly realize that library construction was very inefficient."
Whereas early RNA sequencing protocols required large amounts of messenger RNA, researchers have steadily reduced the amount of starting material they use, helped in some cases by specialized low-input RNA sequencing protocols.
Nevertheless, Fu noted that there have been ongoing challenges with quantifying the definitive transcript levels in a given sample by RNA sequencing, in part because most gene expression measurements obtained by sequencing are relative, but also because of the broad dynamic range in transcript representation.
In particular, the measured abundance of relatively rare transcripts is often skewed by amplification biases, loss of input material during library prep, and the like, making those transcripts easy to miss even in deeply sequenced libraries.
"One misconception that's quite common these days is that many researchers think about how deep they sample their input source in terms of how much sequencing they've done," Fu said.
"Though sequencing really does tell you how much you've sampled, it only tells you how much of the library you've sampled," he explained. "If the library has limited complexity and it doesn't represent the input material well, then you're not going to be able to find the small differences present in the original input material."
To address such issues, he and his colleagues turned to molecular indexing as a means of marking individual transcripts for quantification, as well as following the fate of mRNA inputs more generally.
"One of the utilities of having the barcodes is that you can actually distinguish molecules that were generated as part of the replication process, such as PCR, versus identical copies that were originally present in the sample," Fu explained.
For their targeted capture experiments, for example, the researchers did Illumina MiSeq sequencing on libraries prepared from RNA that had been isolated from human lymphocyte cells and spiked with specific quantities of RNA standards.
By using molecular indexing adaptors in place of typical Illumina adaptors, the study's authors could group reads with shared barcode and transcript sequences. The number of different barcodes present for a given gene, in turn, provided information about the number of transcripts for each gene of interest in the original library.
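The counting logic described here can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `count_molecules`, the gene labels, and the four-base barcodes are hypothetical, and the sketch assumes each read has already been assigned to a gene and had its barcode extracted without sequencing errors.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct barcodes per gene.

    `reads` is an iterable of (gene, barcode) pairs, one pair per
    sequenced read. Reads that share both gene and barcode are treated
    as PCR copies of a single starting molecule (an assumption of this
    sketch, which ignores barcode collisions and sequencing errors).
    """
    barcodes_by_gene = defaultdict(set)
    for gene, barcode in reads:
        barcodes_by_gene[gene].add(barcode)
    return {gene: len(barcodes) for gene, barcodes in barcodes_by_gene.items()}


# Hypothetical toy input: three reads for GENE_A, two of them duplicates.
reads = [
    ("GENE_A", "ACGT"),
    ("GENE_A", "ACGT"),  # same barcode: counted as an amplification copy
    ("GENE_A", "TTAG"),
    ("GENE_B", "ACGT"),  # same barcode, different gene: a separate molecule
]
molecule_counts = count_molecules(reads)
print(molecule_counts)  # {'GENE_A': 2, 'GENE_B': 1}
```

In the toy input, GENE_A has three reads but only two distinct barcodes, so it is counted as two molecules rather than three.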
"With the added information gained from molecular indexing, fragments of identical sequence become distinct, and re-sampling of clonal duplicates can be identified," they wrote.
For their proof-of-principle experiments, the researchers demonstrated that they could quantify transcript levels representing seven RNA standards that had been spiked into the original sample.
The same approach is expected to prove useful for quantifying the expression of hundreds or even thousands of genes, Fu said. "It really has no disadvantage, because the data you get from the sequencing results can still be analyzed using conventional methods."
The molecular indexing approach doesn't significantly change the library construction process, since the barcodes are added to sequencing adaptors. Because the sequence of the starting mRNA itself is also available, the same barcode set can be used to detect transcripts from multiple genes simultaneously.
Consequently, not all that many barcodes are needed for a given experiment. At the moment, the group's approach includes nearly 10,000 different barcode sequences, meaning it can be used to track roughly that many different transcripts from each targeted gene.
For the current study, the researchers paired the barcodes with Illumina sequencing adaptors, though the same approach is expected to be compatible with other instruments as well. "It's really flexible enough that you could do this on any platform by simply including barcodes on the right adaptors," Fu said.
The quantification experiment, however, did more than provide information on the number and nature of the transcripts present for the seven spike-in RNAs. It also hinted at low efficiency in the library preparation process — something the researchers explored further in follow-up experiments using a newly developed set of 960 synthetic RNAs containing internal barcodes.
Results from those experiments suggested that just one or two transcripts remain in a typical RNA sequencing library for every 1,000 transcripts in the starting sample.
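One way to turn a count of detected spike-in barcodes into a per-molecule efficiency figure is sketched below. This is an illustrative model, not the calculation used in the study. It assumes that each barcoded spike-in species is added at a known average copy number, that copy numbers are Poisson distributed, and that each molecule survives library preparation independently. The function name and the example numbers are hypothetical.

```python
import math


def estimate_survival(detected, total, mean_copies):
    """Estimate per-molecule survival probability from barcode detection.

    Assumed model (not from the paper): each of `total` barcode species
    is spiked in at a Poisson-distributed copy number with mean
    `mean_copies`, and each molecule survives independently with
    probability p. A barcode is then detected with probability
    1 - exp(-mean_copies * p), which is solved here for p.
    """
    if not 0 <= detected < total:
        raise ValueError("detected must be in [0, total); saturation gives no estimate")
    if mean_copies <= 0:
        raise ValueError("mean_copies must be positive")
    fraction_detected = detected / total
    return -math.log(1.0 - fraction_detected) / mean_copies


# Hypothetical numbers: 96 of 960 barcodes seen, 100 copies of each on average.
p = estimate_survival(detected=96, total=960, mean_copies=100)
print(f"about {p * 1000:.2f} surviving molecules per 1,000 input molecules")
```

The estimate is only meaningful when some barcodes go undetected. If every barcode is recovered, the model can give only a lower bound on efficiency, which is why the function rejects that case.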
The poor efficiency may seem somewhat surprising at first, Fu said. But given the large number of steps in a standard RNA sequencing library protocol, the analysis suggests that it is easy to end up with only a fraction of the input material at the end, even with up to 70 percent yield for individual steps.
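The arithmetic behind this argument can be sketched in a few lines of Python. The 70 percent per-step yield comes from the passage above. The number of steps is not given in the article, so the step count used here is an assumption chosen for illustration, as are the function names.

```python
import math


def cumulative_yield(step_yield, n_steps):
    """Fraction of input left after n_steps, each retaining step_yield."""
    return step_yield ** n_steps


def steps_to_reach(step_yield, target_yield):
    """Smallest number of steps at which cumulative yield falls to or
    below target_yield. Both arguments must lie strictly between 0 and 1."""
    return math.ceil(math.log(target_yield) / math.log(step_yield))


# Hypothetical protocol lengths; the article does not state a step count.
for n_steps in (5, 10, 18):
    remaining = cumulative_yield(0.7, n_steps)
    print(f"{n_steps:2d} steps at 70% yield: {remaining * 1000:.1f} per 1,000 remain")

# How many 70% steps until at most 2 in 1,000 molecules remain?
n_needed = steps_to_reach(0.7, 0.002)
print(n_needed)
```

Under these assumptions, about 18 steps at 70 percent yield each leave roughly 0.16 percent of the input, in line with the one or two transcripts per 1,000 reported.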
"If one assumes even moderate stepwise yield going through all of the steps, the cumulative yield … matches pretty well with the measurements that we had," Fu said.
Even so, he and his co-authors argued that researchers should be able to keep tabs on this incremental loss — and more accurately interpret their RNA sequencing data — by using barcoded RNAs such as those described in the study.
"Somebody making an RNA library would take a bit of [the barcoded synthetic RNA set], add it into their sample and process it in the standard way," Fu explained.
"Even before you go into sequencing, one way of doing a quality control check on … your library prep, is to take a bit of the library, amplify the barcodes from the spike-in RNA using PCR, and then put that onto a very simple hybridization detector to count the number of barcodes," he said.
Researchers at Cellular Research worked with the Austin-based company Bioo Scientific to develop a Bioo-branded molecular indexing adaptor kit compatible with Illumina instruments.
As reported earlier this week in PCR Insider, Cellular Research is gearing up to introduce a platform that uses the molecular indexing method to detect the levels of individual genes in single cells.
Fu said the company also hopes to develop highly efficient library preparation methods that are suitable for very small amounts of input RNA material.