Professor, functional cancer genomics/applied bioinformatics, Ghent University
• Founder, BioGazelle — 2007
• PhD, medical genetics, Ghent University — 2002
• MS, bioengineering, Ghent University — 1997
This week, a team of researchers from Ghent University published a report in Genome Biology detailing the creation of a new method for normalizing microRNA RT-qPCR data.
Current methods used to normalize miRNA expression data involve the use of "multiple reference genes" such as reference miRNAs or other small non-coding RNAs, according to the investigators. But using the mean expression value of all expressed miRNAs in a given sample "outperforms the current normalization strategy in terms of better reduction of technical variation and more accurate appreciation of biological changes."
RNAi News this week spoke with Jo Vandesompele, the senior author on the paper, about the findings.
The starting point for the study was this notion that RT-PCR is good for microRNA profiling, but there can be problems controlling for variables, correct?
Exactly. There is a lot of variability in the entire workflow, getting from the living cells to the final data result, and most of this is arguably related to the quality and quantity of the input. By doing a proper normalization, you can actually compensate for most of the experimental variability.
It is a well-known concept, and people in the field of mRNA gene-expression analysis have been doing this for many years.
In the paper, you note that in doing microRNA profiling, people traditionally use reference microRNAs and other non-coding RNAs, but that there are some drawbacks to the approach. Can you talk a little about this?
I want to draw a parallel to the work we've done here in the microRNA field and the work we and others have done in the coding messenger RNA field.
Fifteen or twenty years ago, people normalized their Northern blots for measuring mRNAs using ribosomal RNAs. … [But this is essentially] using an RNA molecule that has little, if anything, to do with the molecule that you want to measure … and it has been shown extensively in the literature that this can be a dangerous strategy that can introduce a lot of error.
The same occurs now; with the lack of stably expressed microRNA controls, people rely on small nuclear or small nucleolar RNAs. In fact, we have to recognize that these have little, if anything, to do with microRNAs. They are RNAs and they are small, but that's about it.
In our study, we've shown that if you rely on these conventional controls, you miss a lot; you introduce a lot of false positives and get a lot of false negatives.
Your approach is to use the mean expression value to normalize data. Can you talk about what this is and how it works?
All studies were performed using the stem-loop RT-qPCR method from Applied Biosystems, and when we developed our method about a year ago, there were some 450 microRNA assays available. We did a qPCR analysis of all of them in two 384-well plates.
At the beginning, we simply wanted to use the mean expression value of all microRNAs that are expressed in a given sample as a kind of control to see variability in large series of the experiments we did. While doing so, we noticed that this is not only a quality control to assess the reproducibility of the workflow, but in fact it proved very useful to normalize sample-related differences.
What we do is calculate the mean expression values of all the microRNAs that are expressed in a given cell. As I've already said, we measured 450 at that time. Now it's 650 microRNAs, and on average, two-thirds to three-quarters are expressed in a given cell type or tissue under investigation.
For these two-thirds or three-quarters that are expressed, you can calculate the average expression level, and this is what we now consider as a universal, powerful method to normalize the sample.
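As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes expression is recorded as quantification cycle (Cq) values and that a microRNA counts as "expressed" when its Cq falls below a detection cutoff. The cutoff of 35 cycles, the function name, and the toy values are assumptions made for illustration; they do not come from the interview or the paper.

```python
from statistics import mean

CQ_CUTOFF = 35.0  # assumed detection cutoff, chosen only for this sketch


def mean_center(sample_cq, cutoff=CQ_CUTOFF):
    """Subtract the mean Cq of all expressed miRNAs from each expressed miRNA.

    sample_cq maps miRNA name -> Cq value (None means no amplification).
    Returns normalized values for the expressed miRNAs only.
    """
    expressed = {
        name: cq
        for name, cq in sample_cq.items()
        if cq is not None and cq < cutoff
    }
    if not expressed:
        raise ValueError("no expressed miRNAs in this sample")
    sample_mean = mean(expressed.values())
    return {name: cq - sample_mean for name, cq in expressed.items()}


# Toy values: sample_b is the same profile as sample_a, but with less
# input material, so every expressed miRNA comes up one cycle later.
sample_a = {"miR-1": 22.0, "miR-2": 26.0, "miR-3": 30.0, "miR-4": 38.5}
sample_b = {"miR-1": 23.0, "miR-2": 27.0, "miR-3": 31.0, "miR-4": None}

norm_a = mean_center(sample_a)
norm_b = mean_center(sample_b)
print(norm_a)
print(norm_b)
```

In this toy case the one-cycle input difference disappears after centering, and the unexpressed miR-4 is left out of both the mean and the result.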
What sort of things did you do to confirm this approach?
We did three experiments, [the first of which] simply looked at the variability, the noise, in a dataset.
The very purpose of normalization is to reduce the experimental noise as much as possible. By measuring the noise in a system before and after normalization, or [in comparisons of] different normalization strategies, you can already have a very good clue about which method is best. Mean normalization was able to reduce the experimental variation much better than the other methods that we evaluated [including] the use of small nuclear or nucleolar RNAs.
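One way to picture this comparison is the following Python sketch, which scores each normalization strategy by the average standard deviation of each microRNA across replicate samples. The noise measure, the reference name "snoRNA-x", and all Cq values are hypothetical choices made for illustration; the paper's own noise metric may differ.

```python
from statistics import mean, stdev

MIRNAS = ["miR-a", "miR-b", "miR-c"]
REFERENCE = "snoRNA-x"  # hypothetical small RNA control

# Toy replicates of one profile with different input amounts. The
# reference control also carries noise of its own.
replicates = [
    {"miR-a": 20.0, "miR-b": 24.0, "miR-c": 28.0, "snoRNA-x": 25.5},
    {"miR-a": 21.0, "miR-b": 25.0, "miR-c": 29.0, "snoRNA-x": 25.5},
    {"miR-a": 19.0, "miR-b": 23.0, "miR-c": 27.0, "snoRNA-x": 24.0},
]


def raw(sample):
    """No normalization: keep the measured Cq values."""
    return {m: sample[m] for m in MIRNAS}


def by_reference(sample):
    """Normalize every miRNA against the single reference RNA."""
    return {m: sample[m] - sample[REFERENCE] for m in MIRNAS}


def by_mean(sample):
    """Normalize every miRNA against the mean of all miRNAs."""
    centre = mean([sample[m] for m in MIRNAS])
    return {m: sample[m] - centre for m in MIRNAS}


def average_sd(profiles):
    """Average, over miRNAs, of the standard deviation across samples."""
    return mean([stdev([p[m] for p in profiles]) for m in MIRNAS])


noise = {
    "raw": average_sd([raw(s) for s in replicates]),
    "reference": average_sd([by_reference(s) for s in replicates]),
    "mean": average_sd([by_mean(s) for s in replicates]),
}
print(noise)
```

With these made-up numbers the residual noise is lowest after mean normalization, because the single reference adds its own variation to every microRNA it is used to correct.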
The second one was to see what the contribution of normalization is on the identification of differentially expressed genes.
This was the trickiest part [of the work] because when evaluating a normalization strategy, you need to know the answer you are looking for in advance. Often, this involves circular reasoning, so it was very difficult to find microRNAs for which we knew in advance their expression level or fold change.
The way we approached this was to use microRNAs that were under the control of a transcription factor. It was clear from the literature that these should be up-regulated when the transcription factor was active, so these were our positive controls. Only when we used the new normalization method were we able to show that, indeed, these microRNAs were expressed upon induction of the transcription factor.
[In regards to] the third, if you do gene-expression analysis, irrespective of the method itself … the underlying assumption is that some genes are stably expressed and don't show any differential expression, some are up-regulated, and some are down-regulated.
If you look at all the datasets that are published, the up-regulated and down-regulated genes are well balanced. What we noticed in many of the microRNA datasets we could analyze is that existing normalization methods biased the results toward the identification of more up-regulated or more down-regulated microRNAs. That is simply incorrect … [and] our new method does a much better job in finding the true positives and balancing the down-regulated versus the up-regulated genes.
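The bias described here can be sketched in Python with a toy comparison of a control and a treated sample. The example assumes Cq values, 100 percent PCR efficiency (so one cycle equals a two-fold change), a two-fold calling threshold, and a hypothetical reference "snoRNA-x" that itself changes between conditions. None of these numbers come from the paper.

```python
from statistics import mean

MIRNAS = ["miR-1", "miR-2", "miR-3", "miR-4", "miR-5", "miR-6"]
REFERENCE = "snoRNA-x"  # hypothetical control that is not stably expressed

# Toy data: in the treated sample miR-1 is truly up and miR-2 truly down,
# all values are shifted one cycle by lower input, and the reference
# itself changes by 1.5 cycles.
control = {"miR-1": 20.0, "miR-2": 22.0, "miR-3": 24.0, "miR-4": 26.0,
           "miR-5": 28.0, "miR-6": 30.0, "snoRNA-x": 25.0}
treated = {"miR-1": 19.0, "miR-2": 25.0, "miR-3": 25.0, "miR-4": 27.0,
           "miR-5": 29.0, "miR-6": 31.0, "snoRNA-x": 24.5}


def log2_fc_mean(ctrl, trt):
    """log2 fold change (treated vs control) after mean normalization."""
    c_mean = mean([ctrl[m] for m in MIRNAS])
    t_mean = mean([trt[m] for m in MIRNAS])
    # Lower Cq means higher expression, hence control minus treated.
    return {m: (ctrl[m] - c_mean) - (trt[m] - t_mean) for m in MIRNAS}


def log2_fc_reference(ctrl, trt):
    """log2 fold change after normalizing to the single reference RNA."""
    return {m: (ctrl[m] - ctrl[REFERENCE]) - (trt[m] - trt[REFERENCE])
            for m in MIRNAS}


def count_up_down(log2_fc, threshold=1.0):
    """Count miRNAs called up or down at the given log2 threshold."""
    up = sum(1 for v in log2_fc.values() if v >= threshold)
    down = sum(1 for v in log2_fc.values() if v <= -threshold)
    return up, down


calls_mean = count_up_down(log2_fc_mean(control, treated))
calls_reference = count_up_down(log2_fc_reference(control, treated))
print("mean normalization (up, down):", calls_mean)
print("reference normalization (up, down):", calls_reference)
```

In this constructed case the unstable reference pushes every microRNA in the same direction, so the calls become lopsided and the truly up-regulated microRNA is missed, while mean normalization recovers one up and one down.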
Have any other groups used this method and gotten the same results?
We actually are acting as a core facility in our university, so we've done a lot of collaborations in different fields … and these [collaborators] followed the same approach we developed to do the normalization. They, too, can confirm that mean expression normalization is definitely much better than the previously accepted method of using small nuclear or nucleolar RNAs.
I'm not aware of any other method that actually does something similar except for one on mRNA normalization [in a] paper that was recently published by John Quackenbush [at the Harvard School of Public Health]. To a much lesser extent, [he and colleagues] have evaluated the problem, too, of high-dimensional RT-qPCR data and came up with the conclusion that it's better to rely on all the expressed values instead of single controls.
This paper came out while our study was under review.
At this point, is there additional work being done with this method?
I'd like to note that when presenting this method to people at scientific conferences, I often hear the comment that it's impossible to use the mean expression value because [those particular researchers] are not interested in all microRNAs.
It's very important to realize that our method only works if you measure the expression of a lot of microRNAs, an unbiased set of microRNAs, so that you can be pretty certain the actual mean expression value is an unbiased value for the input amount.
One workaround to this is to set up a pilot experiment, which we explained in the paper … in which you measure all microRNAs, or as many as possible, and you determine the mean. Then, you look for microRNAs with an expression pattern across your samples that is very similar to the mean.
We are now following up on this with collaborators to be sure that in validation studies this works very well.
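The pilot-experiment workaround might be sketched in Python as follows. Each microRNA is scored by how much its distance from the sample mean varies across samples, and the lowest scores mark the candidates that track the mean most closely. This similarity score, the function name, and the toy Cq values are assumptions made for illustration; the paper may select its reference microRNAs with a different stability measure.

```python
from statistics import mean, stdev


def rank_by_similarity_to_mean(samples):
    """Rank miRNAs by how closely they follow the per-sample mean.

    samples is a list of dicts mapping miRNA name -> Cq, all sharing the
    same keys. Returns (names sorted best first, score per miRNA).
    """
    names = list(samples[0])
    sample_means = [mean([s[n] for n in names]) for s in samples]
    scores = {}
    for n in names:
        # Distance of this miRNA from the mean in each sample; a miRNA
        # that tracks the mean keeps this distance nearly constant.
        diffs = [s[n] - m for s, m in zip(samples, sample_means)]
        scores[n] = stdev(diffs)
    ranked = sorted(names, key=lambda n: scores[n])
    return ranked, scores


# Toy pilot data: three samples, four miRNAs.
pilot = [
    {"miR-a": 20.0, "miR-b": 24.0, "miR-c": 28.0, "miR-d": 24.0},
    {"miR-a": 21.5, "miR-b": 25.0, "miR-c": 28.0, "miR-d": 25.5},
    {"miR-a": 19.0, "miR-b": 23.0, "miR-c": 29.0, "miR-d": 21.0},
]

ranked, scores = rank_by_similarity_to_mean(pilot)
print(ranked)
```

In follow-up experiments that measure only a handful of microRNAs, the top-ranked candidates could then stand in for the full mean, which is the idea behind the workaround.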
Coming back to your question, do we stop here? No. In collaboration with Applied Biosystems, we are analyzing a huge dataset that [the company] has prepared and in which the expression of over 1,000 different mature human microRNAs in 40 different human tissues has been measured.
With them, we're doing an analysis, and preliminary results confirm that mean expression normalization does work very well when dealing with such a huge and complex heterogeneous dataset. We will try to finalize this during the summer.
I'd also like to mention one small point to consider, which is that most of the software to do real-time PCR analysis is not compatible yet with this method. Our group has been a crusader for 10 years to convince people to use more than one reference gene for coding mRNA normalization. Finally, people are actually doing this and software is finally available to use multiple reference genes.
But here, we don't talk about multiple reference genes, we talk about all the genes. So we still await modifications of [existing] software that can do the calculations for the scientist.
Our own software [available from BioGazelle], qBase Plus, is fully compatible and can be used to apply this new method, but I'm sure in the near future all the other software out there to do gene-expression analysis will be compatible.