By Julia Karow
A research team led by the Hubrecht Institute and the University Medical Center Utrecht in the Netherlands has found that current sequencing-based methods for small RNA digital gene expression profiling are strongly biased towards certain types of small RNAs.
The findings could limit the use of these methods for absolute measurements of small RNAs, the researchers said, even though the methods are "digital."
The scientists, who published their results in Nature Methods this month, determined that the bias is "largely independent of the sequencing platform" but is instead "strongly determined" by the method used for preparing the small RNA library.
"Systems biology approaches are hampered by [the bias]," said Edwin Cuppen, a researcher at the Hubrecht Institute and the senior author of the paper. "You cannot really see the data as quantitative at the molecule level in a sample."
But the researchers also found the biases to be highly reproducible, meaning that DGE profiling of small RNA is still suitable to assess expression differences between samples, and has advantages over microarrays for that application.
"I think this is extremely interesting information to know about," said Frank Slack, an associate professor at Yale University who recently published a paper on the expression of small non-coding RNAs in C. elegans. He said via e-mail that he had known about the biases "through the grapevine" but had not seen them firsthand. In his experiments, he has been comparing small RNA levels from different life stages of the worm, "which we are happy to see is not affected by these biases."
Cuppen and his colleagues decided to do a systematic study of RNA DGE profiling methods after they found that one of the most commonly used library preparation methods — modban adaptor ligation — showed a clear preference for a certain class of microRNAs called let-7. "We were wondering whether that was really the case because that [result] came back in every tissue, every sample that you analyzed," he said.
For their study, they compared three popular methods for preparing small RNA sequencing libraries: modban adaptor ligation, which uses a pre-adenylated adaptor; polyadenylation, which uses a bacterial- or yeast-derived enzyme to add a poly(A)-tail to the RNA; and SREK, a small RNA expression kit made by Life Technologies' Ambion.
When they started their experiments almost two years ago, there was a lot of discussion about biases of second-generation sequencing platforms, so they decided to sequence the libraries independently on three platforms: the Roche/454 GS FLX and the Applied Biosystems SOLiD, both of which they had in-house, as well as cloning followed by traditional Sanger sequencing.
After preparing duplicate libraries from a single rat brain sample and sequencing them on all three platforms, they noticed that the results did not vary significantly between sequencing platforms, but they differed a lot depending on the library prep method they used. "Those differences were enormous," Cuppen said. "Based on those results, we could not tell which method reflected the biology most closely."
For example, the 10 most frequently sequenced microRNAs from each library-prep method differed substantially, but were consistent within each method, suggesting a systematic bias.
Furthermore, none of the three methods correlated well with results from qPCR, considered to be the gold standard for quantitative molecule detection.
To study the differences between the methods further, the researchers used two of them (modban followed by Illumina sequencing and SREK followed by SOLiD sequencing) to sequence a set of approximately 500 synthetic human microRNAs that were mixed in equal amounts. They also profiled the same set by qPCR as a control.
Somewhat to their surprise, they found substantial differences, of up to four orders of magnitude, between the most and the least frequently detected microRNAs with all three methods, including qPCR. The biases differed between the methods, so they did not result from errors in mixing the microRNAs.
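To make the "orders of magnitude" figure concrete, here is a minimal Python sketch of how such a spread could be computed for an equimolar pool. The function name `detection_spread` and the read counts are hypothetical illustrations, not data or code from the study.

```python
import math


def detection_spread(read_counts):
    """Return log10(max / min) over the detected (nonzero) read counts.

    For a pool mixed in equal amounts, an unbiased method would give a
    value near 0; a value of 4 means four orders of magnitude.
    """
    detected = [count for count in read_counts.values() if count > 0]
    return math.log10(max(detected) / min(detected))


# Hypothetical read counts for three synthetic microRNAs mixed in equal
# amounts (illustrative numbers only, not results from the paper).
counts = {"miR-A": 250000, "miR-B": 4000, "miR-C": 25}
spread = detection_spread(counts)
print(f"Spread: {spread:.1f} orders of magnitude")
```

Because every molecule in the pool is present at the same concentration, any spread well above zero reflects the method rather than the sample.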
"We were sort of surprised that these biases were so big," Cuppen said. "Four orders of magnitude difference in efficiency of how you can capture [the RNAs] in the sequencing — that amazed us."
The fact that the library prep methods favor some small RNAs so strongly over others "limits the usability of the technology to some extent," he said, because the sequence reads are dominated by these molecular species. On the other hand, he said, because the sequencing depth on today's platforms is so high, "you still have many sequencing reads that are informative for studying other types of molecules."
By studying the types of synthetic RNAs that were favored by each method, the researchers tried to figure out the basis for the bias. Part of it can be attributed to the final nucleotide or final few nucleotides of the RNA molecules, but they were unable to come up with "a satisfactory correction model" that is based on the RNA sequence or secondary structure, according to the article. "We have also not been able to correct the biological sample with the synthetic RNA data," Cuppen said.
At the root of the problem is the fact that any library prep method requires an enzyme to put on adaptors or manipulate the small RNA molecules in other ways, he said, and enzymes are well known to have preferences. Biases could stem from RNA ligase, the reverse transcriptase reaction, or from PCR, for example.
For RNA-seq, which sequences entire mRNA transcripts, sample prep biases are probably "less of a problem," according to Cuppen, because the start sites for the adaptor ligation are random. "You may get very local biases at the nucleotide level, but if you take a broad picture of the whole messenger RNA, I do believe that the effect is much less for mRNA sequencing," he said.
The small RNA bias could be overcome, in theory, by a technique that sequences RNA molecules directly, he suggested. "I am not aware of any technology that can do that yet," he said.
Even Helicos BioSciences' single-molecule sequencing approach, which requires no amplification or ligation steps to sequence small RNA and allows for "virtually unbiased small RNA quantitation and discovery," according to the company's website, converts RNA to cDNA prior to sequencing, so "it is not sequencing the molecule itself," he said.
The finding that small RNA DGE profiling introduces bias means that researchers cannot rely on absolute molecule counts in their analyses, according to Cuppen, because they "[don't] really say anything about the actual levels in the sample."
That, he said, could be a problem for systems biology approaches, which rely on absolute counts of messenger or microRNA molecules for their models. "Then, quantitative information becomes very important for the system itself, and I think those things cannot be done with DGE," he said.
On the upside, he and his team showed that the method can still be used for differential expression analysis — comparing levels of small RNAs in different samples.
Because the methods' biases are very reproducible, "you can compare samples with each other very easily and reliably," he said, and fold-differences obtained with different methods are very similar.
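The reasoning can be illustrated with a small Python sketch. It rests on a simplifying assumption that is not stated in the paper: each method's bias is modeled as a fixed, sample-independent capture efficiency per microRNA that multiplies the true abundance. All names and numbers below are hypothetical.

```python
def observed_reads(true_abundance, capture_efficiency):
    """Apply an assumed multiplicative, per-microRNA bias to true abundances."""
    return {
        name: true_abundance[name] * capture_efficiency[name]
        for name in true_abundance
    }


def fold_changes(sample_a, sample_b):
    """Per-microRNA ratio of sample B to sample A."""
    return {name: sample_b[name] / sample_a[name] for name in sample_a}


# Hypothetical true abundances in two samples (arbitrary units).
true_a = {"let-7a": 100.0, "miR-X": 100.0}
true_b = {"let-7a": 200.0, "miR-X": 50.0}

# Hypothetical capture efficiencies for two library prep methods.
method_1 = {"let-7a": 50.0, "miR-X": 0.01}
method_2 = {"let-7a": 2.0, "miR-X": 5.0}

fc_1 = fold_changes(observed_reads(true_a, method_1), observed_reads(true_b, method_1))
fc_2 = fold_changes(observed_reads(true_a, method_2), observed_reads(true_b, method_2))

# Within one sample, the apparent let-7a : miR-X ratio is distorted by the bias,
# even though the true ratio in sample A is 1.
reads_a1 = observed_reads(true_a, method_1)
within_sample_ratio = reads_a1["let-7a"] / reads_a1["miR-X"]

print(fc_1, fc_2, within_sample_ratio)
```

Under this assumption the efficiency term cancels in the between-sample ratio, so both methods report the same fold changes, while the within-sample ratio is far from the true value. If a real method's bias varied between samples, the cancellation would not hold.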
Using small RNA DGE profiling has a number of advantages over microarray-based profiling, he added. For example, molecules can be detected and new ones discovered at the same time, and there is no background noise to correct for. Also, molecules differing in length can be distinguished with nucleotide resolution, making it possible to discriminate between isoforms.
"The positive part is in the fact that we can still do differential expression determination between samples, and that's what most people are interested in," he said.