By Ben Butkus
Duke University researchers have created a "pipeline" for choosing genes potentially useful as control genes in quantitative real-time PCR-based comparisons of gene expression among different species and tissue types, according to a research paper published last week.
Although the method was designed to help the Duke researchers identify candidate RT-qPCR control genes for their research comparing human and nonhuman primate gene expression, it is adaptable to other species for which expression data exist and may serve as a useful general tool for normalizing RT-qPCR experiments, they said.
The method, described in the September issue of PLoS One, uses published microarray data to select a pool of candidate control genes, which are then whittled down using tissue-specific expression assays to identify an optimal subset of genes with high expression stability across the species and tissues being examined in a given experiment.
Olivier Fedrigo, a postdoc in the laboratory of Greg Wray at Duke's department of biology and Institute for Genome Sciences and Policy, told PCR Insider this week that his group developed the method as part of its work to quantify how differential gene expression in humans and chimpanzees confers differences in disease susceptibility between the species despite their genomes being 98.8 percent similar.
"There are several ways to look at gene expression, but the most common way is to look at RNA abundance with qPCR," Fedrigo said. "This method is very robust, but we need to use control genes to be able to normalize RNA abundance to account for the number of cells present in a sample, or total RNA, or other external factors that may influence gene expression."
Fedrigo said that a common pitfall in RT-qPCR experiments is that researchers use so-called "housekeeping genes" as control genes because they are thought to be expressed in the same manner across tissue types, life stages, and even species.
"People have been using a priori genes based on the idea that it's what everybody else used," Fedrigo said. "But … genes vary quite a bit between life stages, and organisms, and tissues. You cannot use the same control genes for primates and for birds, right? You really can't use the same control genes for liver and for brain tissue."
Another problem is that even if researchers identify a useful control gene, they might employ it as the sole normalization gene in their experiment, which is "very risky," Fedrigo said. "If the control gene you use is not optimal you will introduce bias. If you use several control genes, outliers will be kind of eliminated from the pool. So that was the big idea behind this project. Really our goal is to be able to find control genes that are applicable to a certain project."
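To illustrate why several control genes dampen the effect of one poorly behaved gene, here is a minimal Python sketch. It assumes normalization by the geometric mean of the control genes' relative quantities, a common choice in the RT-qPCR literature that the article itself does not specify. The function names and the sample values are hypothetical.

```python
import statistics

def normalization_factor(control_quantities):
    """Geometric mean of the control genes' relative quantities for one sample.

    Assumption: quantities are already on a linear (not Cq) scale and positive.
    """
    return statistics.geometric_mean(control_quantities)

def normalize(target_quantity, control_quantities):
    """Divide a target gene's relative quantity by the normalization factor."""
    return target_quantity / normalization_factor(control_quantities)

# Hypothetical sample: three well-behaved control genes and one outlier (4.0).
controls = [1.0, 1.1, 0.9, 4.0]
print("single outlier control:", normalize(2.0, [4.0]))
print("four controls combined:", normalize(2.0, controls))
```

With a single control gene, the outlier fully determines the normalized value; with four, its influence is diluted because the geometric mean is less sensitive to one extreme value than the arithmetic mean.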
To identify a set of control genes for their RT-qPCR experiments comparing human and chimpanzee gene expression, Fedrigo's group developed a three-part "pipeline." First, they used published microarray data to determine a set of genes with low variation between and within species, and across tissues.
They then designed primers for the genes and tested their specificity, and finally performed expression assays and variation analyses to determine the best sets of control genes. "We basically started from scratch to find new genes," Fedrigo said.
Overall, the researchers computed the "evenness score," a calculation of the evenness of expression across all tissues, for 22,667 genes from the Novartis expression atlas for 27 human tissues, and examined variation within and between humans and chimpanzees for 4,365 genes in five tissues, according to the PLoS One paper.
They subsequently calculated combined variation scores for 3,556 genes present in both the Novartis expression atlas and the human-chimpanzee microarray dataset, and homed in on the top 5 percent of the list, about 178 genes, with the smallest scores.
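The selection step can be sketched in a few lines of Python. This is only an illustration of ranking genes by score and keeping the lowest 5 percent. The article does not say how the combined variation score is computed, so the scores and gene names below are placeholders, and `top_fraction` is a hypothetical helper.

```python
def top_fraction(scores, fraction=0.05):
    """Return the genes with the smallest scores, keeping `fraction` of the list."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = round(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get)  # smallest score first
    return ranked[:keep]

# Placeholder scores for 3,556 made-up genes (not the study's data).
scores = {f"gene_{i}": (i * 7919) % 3556 for i in range(3556)}
shortlist = top_fraction(scores)
print(len(shortlist))  # 178, since 5 percent of 3,556 is 177.8
```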
Next, they whittled this list down even further by applying the geNorm algorithm, which was developed by Ghent University's Jo Vandesompele to determine the most stable reference genes from a set of tested candidate reference genes in a given sample panel.
They ended up with 13 genes, drawn from their pipeline and, for comparison's sake, from commonly used control genes, then tested the genes and validated their expression stability across species. They found that for at least three tissue types (cerebral cortex, liver, and skeletal muscle), the genes EIF2B2, EEF2, HMBS, and SDHA are useful for normalizing human and chimpanzee expression using qPCR.
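For readers unfamiliar with geNorm, the following Python sketch shows the core idea as it is usually described: each gene's stability measure M is its average pairwise variation with every other candidate, where pairwise variation is the standard deviation of the log2 expression ratios across samples, and the least stable gene is removed step by step. This is a simplified illustration, not the geNorm software itself. The use of the sample standard deviation is an assumption, and the gene labels and expression values are invented.

```python
import math
import statistics

def m_values(expr):
    """Stability measure M for each gene.

    expr maps gene name -> list of relative quantities, one per sample,
    with samples in the same order for every gene.
    """
    m = {}
    for j in expr:
        pairwise = []
        for k in expr:
            if k == j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise.append(statistics.stdev(log_ratios))  # assumption: sample SD
        m[j] = statistics.mean(pairwise)
    return m

def rank_by_stability(expr, keep=2):
    """Repeatedly drop the gene with the highest M until `keep` genes remain."""
    remaining = dict(expr)
    removed = []
    while len(remaining) > keep:
        m = m_values(remaining)
        worst = max(m, key=m.get)
        removed.append(worst)
        del remaining[worst]
    return list(remaining), removed

# Invented data: A and B keep a constant ratio across samples, C does not.
expr = {"A": [1, 2, 4, 8], "B": [2, 4, 8, 16], "C": [1, 8, 2, 16]}
stable, dropped = rank_by_stability(expr)
print(stable, dropped)
```

A lower M means the gene's expression tracks the other candidates more consistently across the sample panel.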
The researchers' results also suggested that commonly used control genes TBP, GAPDH, and especially ACTB do not perform as well.
"When people established that these genes were useful before, they had a very limited sample size," Fedrigo said. "They were looking at one tissue, so they had no idea how it is expressed in other tissues."
The researchers wrote in their paper that their method "can be easily adapted and applied to other tissue and species comparisons." They also noted that "although many control gene lists have been previously published … they are limited to their own specific application," and that the new approach is notable because it is not based entirely on a priori candidate genes.
And, while other studies have proposed microarray-based methods to detect candidate control genes, "to our knowledge our pipeline is the first attempt to implement an approach appropriate for comparisons among species," they added.
Fedrigo conceded that when choosing control genes for RT-qPCR experiments, researchers must strike a balance between "what is the best thing to do and what is the most practical thing to do. The more control genes you use, the better you're going to be, but the more expensive it's going to be. You can imagine using 10 control genes — that would be fantastic — but you'd fill your plate with control genes rather than samples."
He added that researchers should consider using at minimum two or three control genes for a given experiment.
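The trade-off Fedrigo describes can be put in rough numbers. The sketch below assumes a 96-well plate, triplicate reactions, and one target gene per sample. None of these figures come from the article, and the calculation ignores wells reserved for no-template controls or standard curves.

```python
def samples_per_plate(n_control_genes, n_target_genes=1, replicates=3, wells=96):
    """How many samples fit on one plate when every gene is run in replicate."""
    wells_per_sample = (n_control_genes + n_target_genes) * replicates
    return wells // wells_per_sample

for n in (1, 3, 10):
    print(n, "control genes ->", samples_per_plate(n), "samples per plate")
```

Under these assumptions, three control genes leave room for eight samples per plate, while 10 control genes leave room for only two.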
The researchers next plan to optimize their pipeline method by employing next-generation sequencing, either in place of or as a supplement to microarray data analysis.
"Microarrays have a very small dynamic range," Fedrigo said. "When you select genes for microarrays you have a tendency to select highly expressed genes … but lower expression can be good as well, and microarrays are going to miss that."
Next-gen sequencing, Fedrigo added, "will allow you to pick up more genes, as well as new genes that people don't know about. I would bet that in a few years a lot more of these studies will be out, and it will be much easier to select control genes."