A team of researchers in Germany has proposed a set of non-protein-coding RNAs that can serve as stable internal controls for data normalization in real-time PCR for studies involving these molecules.
The researchers, who describe their work in a recent issue of RNA, note that despite the sensitivity of real-time qPCR for measuring RNA abundance in small amounts of material, "the lack of appropriate internal controls necessary for accurate data analysis is a limiting factor for its application in [non-protein-coding RNA] research."
While several protein-coding reference genes, or "housekeeping genes," are commonly used as references in many PCR studies, the authors note that these molecules have their limits because their expression levels "vary among tissues and different experimental conditions." Furthermore, the use of these genes as reference in npcRNA studies is "questionable," they said, "due to the differences in biogenesis."
The team, comprising researchers from the Max Planck Institute for Molecular Genetics and the Institute of Experimental Pathology at the University of Münster, started out with a set of 18 RNAs from different structural and functional classes and then used real-time qPCR to assess their expression levels across 20 different tissues.
They ultimately came up with a final list of 11 RNAs that exhibited stable expression across all the tissues. Of those, five (7SL scRNA, U6 snRNA, U1 snRNA, 5.8S rRNA, and U87 scaRNA) consistently exhibited the most stable expression across all human tissues and also demonstrated "significantly higher expression stability than the best protein-coding [housekeeping genes] in this study, ACTB and B2M."
The authors recommend that three of the five top-ranking RNAs — 7SL scRNA, U6 snRNA, and U87 scaRNA — be included in the "minimal set of reference candidates to be evaluated for normalization in any given npcRNA transcriptome analysis."
PCR Insider spoke to Zoltán Konthur, a Max Planck researcher and a co-author on the paper, to learn more about these "housekeeping RNAs" and how they might be used in future studies. The following is an edited version of the interview.
What made you decide to look into the use of housekeeping RNAs as opposed to housekeeping genes?
There were a couple of reasons. We got involved in a project to analyze bioinformatically predicted RNA molecules. There are quite a lot of ESTs in databases that do not match with any protein-coding genes, so a bioinformatics colleague of ours came across these sequences in the human genome and thought that could be [a] quite interesting [project]. So we got involved in this project, and we wanted to see if these [bioinformatically predicted RNAs] really existed or not.
The predictions were in the range of 15,000 transcripts for the human genome, and obviously you cannot analyze that with a Northern blot, so we thought that maybe real-time PCR would be an appropriate tool.
What we then looked at was what controls or what reference genes are generally used in real-time PCR, and since most people are investigating expression levels of protein-coding genes, there was not much knowledge about non-coding RNAs at the time we started this project, which was 2006.
So we thought that it may be worthwhile to look into non-coding molecules in general. Many of them were analyzed only in certain tissues, like brain, which is very rich in non-coding RNAs, or placenta, which is also quite rich. [But no one had looked at these molecules] on a very broad basis. So we thought that it would be generally interesting to see whether non-coding RNA molecules are ubiquitously expressed in all different tissues or not.
Then, looking further into it, we said, 'Let's compare them with the more generally used protein-coding reference genes, the housekeeping genes, to see if we do have expression for all these non-coding RNA molecules in all these different tissues.' And by looking at that data we can see that some of these non-coding RNA molecules are simply much more stable.
Also, because of the biogenesis of the molecules, we thought it might be better if [researchers working with npcRNAs weren't] relying on the protein-coding genes.
What was quite interesting and quite astonishing in a way was that some of these molecules not only show very high stability in the way that they are ubiquitously expressed at more or less the same level in all tissues we analyzed, but the dynamic range was also quite interesting — these non-coding RNA molecules basically [exhibit] eight orders of magnitude difference … [while] most of the protein-coding housekeeping genes [exhibit] only maybe three or four orders of magnitude. So the non-coding molecules are basically spread out along the expression level much further.
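To put the dynamic-range comparison above in qPCR terms, the following is a minimal sketch that converts between orders of magnitude and threshold-cycle (Ct) differences. It assumes perfect doubling per cycle (an amplification efficiency of 2.0), which real assays only approximate; the function names and the example Ct values are hypothetical and are not taken from the paper.

```python
import math

def ct_span_to_orders_of_magnitude(ct_low, ct_high, efficiency=2.0):
    """Convert a Ct difference into a log10 fold difference in abundance.

    Assumption: every cycle multiplies the product by `efficiency`
    (2.0 means perfect doubling).
    """
    delta_ct = ct_high - ct_low
    return delta_ct * math.log10(efficiency)

def orders_of_magnitude_to_cycles(orders, efficiency=2.0):
    """Number of cycles that separates two transcripts differing by 10**orders."""
    return orders / math.log10(efficiency)

if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    print(round(ct_span_to_orders_of_magnitude(10.0, 20.0), 2))
    # Cycles spanned by eight versus four orders of magnitude.
    print(round(orders_of_magnitude_to_cycles(8), 1))
    print(round(orders_of_magnitude_to_cycles(4), 1))
```

Under that assumption, eight orders of magnitude correspond to a spread of roughly 26 to 27 cycles, while three or four orders correspond to about 10 to 13 cycles.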
That was unexpected?
Not necessarily, no. We actually had no idea, so we looked at it and found that this is quite interesting. Some of these molecules we of course knew were strongly expressed because Northern blots use a lot of ribosomal RNA and so on as controls, so we knew from the results of colleagues and the literature that some of them are really heavily expressed. But other molecules, we also knew from the literature, can hardly be detected in Northern blots, like U105 [snoRNA], for instance, which also turned out to be one of the lowest expressed molecules and also showed some variation between the tissues.
Your paper notes that housekeeping genes actually exhibit a great deal of variation across different samples. It sounds like what you've seen so far indicates that won't be the case with housekeeping RNAs.
That's something that others have reported quite frequently [for reference genes], starting back in 2002. Quite a lot of people reported that many of the reference genes aren't really that stable if you look at them under different conditions. If you compare the same tissue or the same sample, then it might be fine to use a single reference gene, but if you want to compare the expression levels within different tissues, then you must have more than one reference. That's exactly what Vandesompele [and colleagues] came up with [in a paper published in Genome Biology in 2002] — the fact that you need more than one reference.
So this is why we said, 'OK, we'll go through the whole procedure and take all of these different approaches and look for stability in these non-coding RNAs.' We analyzed quite a lot of these molecules.
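One common way to "look for stability" among candidate references is the pairwise-variation measure M from the 2002 Genome Biology paper mentioned above. The sketch below illustrates that idea only; it is not the authors' pipeline. For each candidate it averages the standard deviation of the log2 expression ratios against every other candidate, and a lower M suggests a more stable reference. The candidate labels and quantities are invented for illustration, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev

def stability_m(quantities):
    """Return a geNorm-style stability value M for each candidate.

    quantities: dict mapping a candidate name to a list of relative
    quantities (linear scale), one per sample, in the same sample order.
    """
    genes = list(quantities)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # log2 ratio of candidate j to candidate k in each sample
            log_ratios = [
                math.log2(a / b)
                for a, b in zip(quantities[j], quantities[k])
            ]
            # variation of that ratio across samples (sample std dev)
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values

if __name__ == "__main__":
    # Hypothetical relative quantities across four tissues.
    data = {
        "candidate_1": [1.0, 1.0, 1.0, 1.0],
        "candidate_2": [2.0, 2.0, 2.0, 2.0],
        "candidate_3": [1.0, 4.0, 1.0, 4.0],
    }
    for name, m in sorted(stability_m(data).items(), key=lambda kv: kv[1]):
        print(name, round(m, 3))
```

In this toy data set, candidate_3 swings fourfold between tissues and therefore receives the highest M, while the two flat candidates tie for the lowest.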
How did you come up with that first set of 18 initial housekeeping RNAs, when there are so many to choose from?
We discussed this with our colleagues from Münster, who are co-authors on this. Primarily what we wanted was to have a fairly diverse set of molecules that originate from different classes and are maybe even located in different parts of the cell. So we wanted to have a broad set, because if you take a set that is too narrow — reference genes that are involved in the same pathway, or whatever — then the experimental conditions might have much more influence on the expression level. So we wanted to spread them out quite broadly.
That was one thing. The other thing was that of the 18, some of them we chose basically because they were reported to be tissue-specific, so they were making our test case. The BC200 [scRNA] and HBI-36 [snoRNA] are reported to be strongly expressed in brain and not expressed, or, at least by Northern blot, not really detectable in other tissues, so we took them knowing that they're not going to be good housekeeping genes.
And then, of course, some of these show different expression levels. Ribosomal RNAs are obviously very abundant, while some other molecules are not.
Given that noncoding RNA is a relatively new area of research, what are your thoughts on the longevity of the set of housekeeping RNAs that you ended up with? Is this likely to change as new RNAs are discovered?
Well, the harder you look the more you find. Obviously, this set is not going to be universal, necessarily, for all applications in the next [several] years. However, we think that at least the top five candidates are pretty well characterized and pretty well known RNA molecules, and they are of different abundance within the cell as well. They seem to be really expressed quite stably. I'm not that sure that there are so many RNA molecules around that are expressed very stably and haven't been found yet.
In a way, it also depends on your experimental conditions and the scientific questions you want to answer. But I think it's a very good starting point, and that's exactly what we propose here: that within this set, there are certainly some molecules that would be suitable as a starting point to analyze in the experimental condition one would want to use. So, for example, if you want to use cells in different stress conditions or whatever, then you might find differences, but [using] a few of these [molecules in your experiment] should eliminate that.
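As a rough illustration of why using a few reference molecules together dampens condition-specific shifts, the sketch below normalizes a target against the geometric mean of several references, the approach popularized by the 2002 Genome Biology paper discussed earlier. It assumes perfect doubling per cycle when converting Ct values to relative quantities; all function names and numbers are hypothetical.

```python
import math

def relative_quantity(ct, calibrator_ct, efficiency=2.0):
    """Relative quantity of a transcript versus a calibrator sample.

    Assumption: each cycle multiplies the product by `efficiency`.
    """
    return efficiency ** (calibrator_ct - ct)

def normalization_factor(reference_quantities):
    """Geometric mean of the reference quantities measured in one sample."""
    logs = [math.log(q) for q in reference_quantities]
    return math.exp(sum(logs) / len(logs))

def normalize(target_quantity, reference_quantities):
    """Target quantity divided by the multi-reference normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)

if __name__ == "__main__":
    # Hypothetical Ct values for one sample and for a calibrator sample.
    target = relative_quantity(ct=22.0, calibrator_ct=25.0)
    references = [
        relative_quantity(ct=15.0, calibrator_ct=15.0),
        relative_quantity(ct=18.0, calibrator_ct=20.0),
        relative_quantity(ct=11.0, calibrator_ct=15.0),
    ]
    print(normalize(target, references))
```

Because the geometric mean works on the log scale, one reference that drifts under a given condition moves the normalization factor far less than it would if it were the only control.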
What would be the next steps to further this work?
Obviously, you can go into more detail within certain tissues. If you go into brain for instance, it would be interesting to look at the different brain regions, or it would be interesting to look into certain types of cells — nerve cells or whatever. So I think it's a matter of how deeply you want to get into this.
We've taken a fairly broad approach to it, and taken 20 tissues that are really, really different, mainly because the general idea in the beginning was to look at molecules that were bioinformatically predicted, so obviously we had no clue whether they were tissue-specific or not. So we needed a very broad base.
Given that non-coding RNA is such a hot area of research, why do you suppose this is the first work to propose a set of reference molecules for normalization that is specific to these types of experiments?
For microRNAs people have looked into this, but that work was also published very recently. We didn't focus on microRNAs because we're using SYBR Green as a detection method and not TaqMan probes, and obviously for microRNAs it would be basically impossible to set up probes. Because this came from the idea of testing the predicted molecules, these were rather longer molecules, so that's why we stuck to the set we described here.
So for microRNAs, people have looked into this, but not so much generally for non-coding RNAs. I think one of the reasons for that is that real-time PCR has not really been picked up in this community that strongly yet. Most of the work is based on different experimental methods — Northern blots and looking at function and so on, and not so much in this high-throughput fashion.
What do your findings mean for most researchers who are using real-time PCR to study non-coding RNAs? Will it be a simple change for them to begin including the recommended RNAs in their experiments?
I think it's very simple. That's the beauty of it. These molecules are now pretty well-characterized. For the SYBR Green platform, it should be as simple. It's just a matter of time and people might pick up on it. Some of these non-coding molecules have been used as a reference before, but they've never really been systematically characterized. Ribosomal RNA has been used, U6 [snRNA] has been used before, but it's been in one or two papers. Most people have not really looked into that yet.
What about the concept of using multiple reference genes for an experiment? I understand that there is still some resistance to doing that, at least for protein-coding gene-expression experiments.
I think our publication might have come out just in time for the RNA field, since [researchers in this area] are just starting to pick up [real-time PCR], so people might not make the same mistakes that they have in other fields.
I can understand in a way why people use single references, because the housekeeping genes were simply expected to be stable from cell to cell before it was reported that there are variations and that you need to use more.
What's next for you and your colleagues?
We are actually still looking at the bioinformatically predicted molecules, so hopefully we will come up with something there soon. And we are also looking into the concept of an expression ruler. Since we discovered that there is such a huge difference in orders of magnitude in expression strength, one might actually be able to use some of these non-coding RNA molecules as an indicator of expression for comparison with Northern blot data.
That would be a bit of a standardization effort, but we think that the data that we have produced so far will actually [prove] that there are possibilities to use these real-time PCR methods as a kind of quality standard or reference for other experimental methods.
That's one of the reasons why we incorporated the [Minimum Information for Publication of Quantitative Real-Time PCR Experiments] standards in our work.