Researchers conducting global gene expression studies may be assuming too much when it comes to interpreting their data, according to Whitehead Institute scientists. In a paper published in October in Cell, the researchers question the assumption that all cells produce similar levels of messenger RNA — an assumption they say has likely led to erroneous interpretations about the relative regulation of genes in different cell types. The researchers use three different gene expression analysis methods to demonstrate that such a problem exists, and propose a standardization method using synthetically produced RNA "spike-ins" to produce better assessments of changes in steady-state levels of RNA. Ben Butkus recently spoke with Whitehead's Tony Lee about the findings. What follows is an excerpt of their conversation.
Genome Technology: Your main finding is that there is cell-to-cell variation in the amount of mRNA produced, correct?
Tony Lee: Yes, more or less. In this particular experiment we're doing this in [different] cell types or cell conditions, so we're not looking at single-cell resolution.
GT: Had this potential variation in mRNA production by cell type previously been hypothesized?
TL: It's something that is relatively well known. For instance, when expression arrays first came out, many researchers used them to study what happens to transcription genome-wide when you knock out certain factors that are important for transcription. At that time, there was recognition that if you knocked out something that was important for general transcription, you were going to affect all of transcription.
We … formally showed this effect across a number of different gene expression platforms, and more importantly tried to show one way to … tackle this potential problem in the future.
GT: What gene expression analysis technologies did you use to interrogate this variation?
TL: So far we've used DNA microarrays from Affymetrix; RNA sequencing technology [from Illumina]; and NanoString's [digital counting] technology. The important part of the paper wasn't really testing the platforms against one another. We were testing whether all three … end up with the same bottom line, which is that with the standard normalization you get results indicating that many genes are unchanging, a few change up, and a few change down; and when you revisit normalization you get a very different impression. The effect of the normalization was consistent across all three platforms.
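[The effect Lee describes can be illustrated with a deliberately simplified sketch. The gene names and counts below are invented for illustration and this is not the paper's actual analysis pipeline; the point is only that total-count normalization hides a global shift in mRNA per cell, while a per-cell spike-in reference preserves it.]

```python
# Raw read counts for the same genes under two conditions. In condition B,
# transcription of every gene has doubled -- a global amplification.
genes_a = {"gene1": 100, "gene2": 200, "gene3": 300}
genes_b = {"gene1": 200, "gene2": 400, "gene3": 600}

# Synthetic spike-in RNA added in proportion to cell number, so its counts
# track cells, not total mRNA output. (Illustrative numbers.)
spike_a, spike_b = 50, 50

def normalize_total(counts):
    """Standard normalization: scale each gene by total counts.
    Implicitly assumes all samples have the same mRNA content per cell."""
    total = sum(counts.values())
    return {gene: c / total for gene, c in counts.items()}

def normalize_spike(counts, spike):
    """Spike-in normalization: scale each gene by the spike-in counts,
    anchoring the measurement to cell number rather than total mRNA."""
    return {gene: c / spike for gene, c in counts.items()}

# Total-count normalization makes the two conditions look identical --
# the global doubling vanishes. Spike-in normalization recovers it:
# every gene comes out twofold higher in condition B.
```

Under total-count normalization, both conditions yield the same relative expression profile, so a uniform twofold increase would be reported as "no change"; against the spike-in reference, every gene shows the twofold shift.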
GT: What are the implications of this for prior and future gene expression analysis research?
TL: The bottom line is that it's difficult to anticipate when this kind of phenomenon might be happening. For [future] gene expression experiments, we think it's probably a good idea to use this type of standardized control.
GT: Do your results imply that all previous gene expression analysis studies need to be looked at again to make sure this didn't happen?
TL: We haven't really come up with a way to salvage, so to speak, old data. You could probably reconstruct it if you had tracked the cell numbers and the total amount of RNA you were getting. That's a little bit complicated because most people don't track total RNA. In addition, it's actually changes in total mRNA production that are the most problematic, because that's what you're measuring with gene expression analysis. We actually have seen situations [in which] the total RNA doesn't seem to change very much, but the total mRNA is changing quite a lot. Normalization back to the cell number is actually the important factor.
We don't want to suggest this is a situation where everything that has been done is wrong. It's just another thing where researchers should be aware of the assumptions that went into their experiments.