By comparing and contrasting data generated with three low-input RNA sequencing methods, a California team has characterized the technical variations introduced at increasingly lower RNA levels, while considering ways to mitigate that variation in the future.
"My goal in this comparison was two-fold," Shankar Subramaniam, a bioinformatics, systems biology, and bioengineering researcher at the University of California, San Diego, told In Sequence.
"One was to say, 'What are the limits of technical variation?'" explained Subramaniam, who led the comparison. "The second motivation was for ourselves, to understand how we can fine-tune our primers and methodology so we are analyzing, regime-by-regime, what the noise contribution is."
In a paper published in Scientific Reports earlier this month, Subramaniam and his co-authors used Smart-seq, CEL-seq, and a "designed primer-based RNA-sequencing" (DP-seq) approach developed by their own team to do RNA sequencing on serially diluted messenger RNA libraries from the same type of growth factor-treated and untreated mouse embryonic stem cells profiled in a Scientific Reports study that introduced DP-seq last year.
The team's comparison of the resulting RNA sequence data indicated that technical variation increases at ever-lower mRNA input levels for all three of the amplification-based approaches. In particular, the analysis pointed to distorted coverage of mRNA transcripts found at low or medium levels in the starting sample.
"High technical variations ultimately masked subtle biological differences," Subramaniam and colleagues wrote, "mandating the development of improved amplification-based strategies for quantitative transcriptomics from limiting amounts of mRNA."
Even so, the investigators found that some forms of technical variation were specific to the method at hand. For instance, their data indicated that the Smart-seq approach tends to show poor amplification and representation of long transcripts, while DP-seq is prone to spurious PCR products during amplification at low mRNA inputs.
For CEL-seq, the most costly and time-consuming method, they detected reduced long transcript coverage as well as a pronounced dip in the coverage of transcripts with low expression when RNA inputs waned.
For the time being, authors of the study are continuing to employ Smart-seq and DP-seq for their own low-input RNA sequencing experiments. The team has also applied for patents related to its DP-seq technology.
Subramaniam noted that, going forward, dramatically different RNA sequencing strategies may be needed to avoid some of the technical difficulties associated with amplification-based RNA sequencing methods for low-input samples.
"There may be a radically different method — for example, in the future, some optical method — that would read [mRNA] without shearing or having to do any amplification," he said.
Subramaniam said his group is also exploring the possibility of using targeted primers to amplify and sequence panels of mRNA transcripts — for instance, transcripts known for distinct expression profiles in cancer or other diseases. Should that approach pan out, it would theoretically make it possible to do RNA sequencing of several targeted transcripts on many patient samples simultaneously.
"We're having technical limitations in designing these primers because of the fact that the primers are very promiscuous across many genes," he said. "I'm sure with hard work we'll be able to come up with primers for a specific disease, a specific gene set, and so forth."
The current analysis of variation in low-input RNA-seq studies comes on the heels of a Nature Methods study by researchers at the Broad Institute that compared Smart-seq with four other methods for sequencing low-quality or scant RNA.
For their new comparison, Subramaniam and colleagues focused on the approaches that were commonly available for low-input RNA sequencing when the analysis started (Smart-seq and CEL-seq), adding their own DP-seq approach into the mix.
The DP-seq method was developed as an offshoot of a National Heart, Lung, and Blood Institute project the group was working on that involved extensive gene expression profiling, Subramaniam explained.
Following some early attempts to do RNA sequencing with available methods, the researchers came up with the notion of amplifying RNA with primers designed to target heptamer sequences that flank transcripts, rather than with random primers. The resulting DP-seq approach also pairs polymerases that function optimally at different temperatures, one low and one high.
"We came up with the idea of using two polymerases … one polymerase that amplifies to a particular extent and then a thermophilic polymerase — a high temperature polymerase — that can take the slightly amplified fragments and amplify them further," Subramaniam said.
As they reported last year, that RNA sequencing approach has shown promise for detecting transcripts across a wide dynamic range, even with low starting RNA inputs on the order of 50 picograms or so.
Still, the group wanted to get a sense of how this method and others performed as input RNA levels declined. They were also keen to determine when and how such methods break down.
To interpret authentic biological variation from RNA sequence data, it's necessary to know how much technical variation occurs for a given RNA-sequencing protocol — and the nature of that variation, Subramaniam explained.
In general, researchers employing RNA-sequencing want an accurate idea of the representation of not only the most highly expressed transcripts, but also those found at relatively low levels in a sample, which tend to show pronounced variation, he noted. "Our goal was to ask the question, 'If I push the technology hard, how much better can I get [information] on the lowly expressed transcripts?'"
Along with that metric, authors of the new analysis considered features such as the reproducibility of the tag distribution and noise across experiments using the various amplification methods, as well as the accuracy of mRNA quantification and the dynamic range of each sequencing approach.
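One common way to summarize technical variation of this kind is the coefficient of variation (standard deviation divided by mean) across technical replicates, grouped by how highly each transcript is expressed. The sketch below is only an illustration of that general idea and is not the metric or code used in the study; the function names, bin edges, and toy counts are all hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return stdev(values) / m if m > 0 else 0.0


def cv_by_expression_bin(replicate_counts, bin_edges):
    """Average the per-transcript CV within bins of mean expression.

    replicate_counts: dict mapping a transcript name to a list of read
        counts, one per technical replicate (at least two replicates).
    bin_edges: ascending upper bounds; means at or above the last edge
        fall into a final catch-all bin.
    """
    bins = {}
    for counts in replicate_counts.values():
        m = mean(counts)
        # Label the transcript with the first edge its mean falls below.
        label = next(
            (f"<{edge}" for edge in bin_edges if m < edge),
            f">={bin_edges[-1]}",
        )
        bins.setdefault(label, []).append(coefficient_of_variation(counts))
    return {label: mean(cvs) for label, cvs in bins.items()}


# Made-up counts chosen only to exercise the function; not study data.
toy_counts = {
    "low_a": [2, 9, 4],
    "low_b": [1, 6, 8],
    "mid_a": [40, 55, 62],
    "high_a": [980, 1010, 1005],
}
cv_per_bin = cv_by_expression_bin(toy_counts, bin_edges=[10, 100])
for label, cv in cv_per_bin.items():
    print(f"mean count {label}: average CV = {cv:.3f}")
```

A per-bin summary like this makes it easier to see whether noise is concentrated among weakly expressed transcripts, which is the regime the researchers were most concerned about.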
To look at such features, the team did serial dilutions of mRNA from mouse embryonic stem cells that had or had not been treated with activin A, a compound that kicks some early developmental events into gear.
For these samples, they used Smart-seq or DP-seq exponential amplification protocols or CEL-seq — which involves linear amplification via in vitro transcription by a T7 polymerase enzyme — to prepare and sequence libraries from between 25 picograms and 1 nanogram of mRNA apiece.
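The input range described here, from 1 nanogram down to 25 picograms, corresponds to a 40-fold dilution. As a rough back-of-the-envelope illustration that is not taken from the study, the sketch below assumes that the number of copies of a transcript captured in a sample follows Poisson sampling, in which case the coefficient of variation is one over the square root of the expected copy number. The copy numbers are hypothetical, and real protocols add amplification noise on top of this sampling floor.

```python
import math


def poisson_cv(expected_molecules):
    """CV of a Poisson count with the given mean: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_molecules)


HIGH_INPUT_PG = 1000.0  # 1 nanogram, the largest input mentioned
LOW_INPUT_PG = 25.0     # the smallest input mentioned
dilution_factor = HIGH_INPUT_PG / LOW_INPUT_PG

# Hypothetical copy numbers at the 1-nanogram input (assumed, not measured).
copies_at_high_input = {"rare": 40, "moderate": 4_000, "abundant": 400_000}

for name, copies in copies_at_high_input.items():
    diluted = copies / dilution_factor
    print(
        f"{name}: {copies} -> {diluted:g} expected copies, "
        f"sampling CV {poisson_cv(copies):.3f} -> {poisson_cv(diluted):.3f}"
    )

# Under this assumption the sampling CV grows by sqrt(dilution factor)
# for every transcript, regardless of its abundance.
cv_ratio = math.sqrt(dilution_factor)
print(f"CV increase from dilution alone: {cv_ratio:.2f}x")
```

Under these assumptions a transcript with only a few dozen copies at the highest input would be expected to appear about once at the lowest input, which is one simple way to think about why weakly expressed transcripts are hit hardest.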
Their subsequent comparison relied on reads generated from the libraries on Illumina's HiSeq 2000 platform, with either single-end sequencing (in the case of Smart-seq and DP-seq) or paired-end sequencing (for CEL-seq).
Because the RNA-sequencing methods considered involve differences primarily at the amplification stage of the protocol, Subramaniam noted that the findings from the current analysis are expected to hold regardless of the sequencing instrument used.
The team also prepared samples using a standard RNA sequencing approach, which is typically done with much larger RNA inputs (around 1 to 10 nanograms of mRNA), and tested for levels of specific transcripts using quantitative real-time PCR.
Not surprisingly, transcript coverage and accuracy were high for libraries prepared with the largest initial mRNA inputs with all three amplification-based schemes, the researchers reported.
But the representation of transcripts with low expression dropped off dramatically in the 25-picogram libraries, particularly in samples prepared using CEL-seq, they found. The coverage tended to remain somewhat higher for very low-input samples prepared with Smart-seq, though that approach was marked by lower-than-usual representation of long transcripts.
On the other hand, sequence data generated with DP-seq showed a jump in reads associated with spurious PCR products when mRNA levels were low, despite maintaining relatively robust coverage overall.
In samples with relatively high amounts of starting mRNA, the group was able to pick up gene expression patterns and differentially expressed gene profiles with Smart-seq, CEL-seq, and DP-seq that were consistent with those found by standard RNA sequencing of samples containing 1 nanogram of mRNA.
Again, though, there was a decline in the accuracy of such data at lower input levels such as 25 picograms of starting mRNA.
"Regardless of the method used, we noticed a significant increase in technical variations in the libraries prepared from low amounts of mRNA," they noted. "This resulted in poor quantification of the vast majority of low expressed transcripts including the transcription factor family genes."
Based on their results so far, the researchers believe that either the Smart-seq approach or their own DP-seq approach will continue to find favor with those who want to do transcriptome profiling on small numbers of cells without extensive amplification.
Even so, as with the other RNA-sequencing methods that have been designed to deal with small RNA inputs, Subramaniam noted, researchers need to be aware of the types of technical biases that can occur.
"Regardless of the method used, increased technical variations in low-input sequencing libraries prevented accurate quantification of the majority of the low to moderately expressed transcripts," he and his co-authors wrote. "We expect biological interpretation of the transcriptome data to suffer further as the amounts of mRNA are reduced to single-cell levels and biological variations are incorporated."