
Gene Expression Technical Guide

Table of Contents

Letter from the Editor
Index of Experts
Q1: How do you evaluate which approach — microarrays, real-time PCR, or sequencing — is right for a gene expression study?
Q2: What quality control steps do you take?
Q3: What steps do you take to ensure that your power will be adequate?
Q4: What bioinformatic tools do you generally turn to for analyzing gene expression data?
Q5: What approach do you take to normalize your data?
Q6: How do you determine absolute expression levels?
Q7: How do you confirm your findings?
List of Resources

Letter from the Editor

The expression levels of genes vary in different tissues and under different circumstances. These days, there are a number of approaches that researchers may take — microarrays, real-time PCR, or sequencing — to untangle differences in gene expression levels. This installment of Genome Technology's technical guide series focuses on such gene expression studies, including how to choose which of those approaches to take. (Hint: Which is best often depends on the goal of the research project at hand.)

In addition, the researchers queried for this guide address the issues of quality control and data analysis. They also offer suggestions on how to best confirm findings from gene expression studies. Ralph Schlapbach from the Swiss Federal Institute of Technology in Zurich notes that "the ultimate confirmation is done by functional studies that are to be carried out by the research groups and which may lead to new analyses at the level of gene expression, ultimately leading to multiple cycles of hypothesis generation, data generation, and hypothesis validation."

Read on for more.

— Ciara Curtin

Index of Experts

Genome Technology would like to thank the following contributors for taking the time to respond to the questions in this tech guide.

Craig Praul
Director, Expression Analysis, Genomics Core Facility
Pennsylvania State University

Gary Hardiman
Director, BioMedical Genomics Microarray Facility
University of California, San Diego

Chris Harrington
Director, Integrated Genomics Laboratory
Oregon Health and Science University

Ralph Schlapbach
Functional Genomics Center
Swiss Federal Institute of Technology Zurich

Q1: How do you evaluate which approach — microarrays, real-time PCR, or sequencing — is right for a gene expression study?

The answer is largely dependent on the objectives of the study. Traditional Q-PCR is still best suited to small numbers of targets and large numbers of samples. Microarrays and RNA-seq are best suited to whole-transcriptome profiling. The choice between microarrays and RNA-seq is guided by issues such as cost, ease of data manipulation, and detection sensitivity.

Microarray experiments are advantageous in that they are fast and relatively inexpensive, and data storage and manipulation are easy. In a short period of time, a list of genes that are differentially expressed between two samples can be generated, and this list can be used to guide pathway, gene ontology, and network/interactome analysis, giving clues as to the biological differences between the samples. This can also be done with sequencing, although the process is a little more involved. If alternative splicing needs to be studied, then sequencing is clearly the way to go.

Arrays can be confounded by multiple problems, including cross-hybridization of related species, poor hybridization kinetics, poor sensitivity for low-abundance transcripts, and an inability to distinguish between genes of interest and pseudogenes. All of this leads to noisy data.

Commercial microarray builds generally do not age well. They have the disadvantage of being frozen in time, dependent on a genome build that may be 12 months old or more, and sometimes carrying annotation that is no longer current. This can lead to probe content that is no longer relevant. Additionally, important probes pertinent to a particular study may be missing from the array. This is not as great a problem with mouse and human genome arrays, but it can be with other organisms (e.g., zebrafish).

High-throughput sequencing does not have these limitations. The end result is a series of sequence tags, which can be mapped to transcripts, and the data improve as genomes become better annotated. Sequencing is an ordered process that does not possess the noise inherent in array technology and can therefore reliably report very small numbers of true hits. Microarrays, on the other hand, rely on the energetics of binding, so a low-abundance transcript can easily get swamped by non-specific hybridization and therefore remain undetected.

Massively parallel sequencing will clearly replace DNA microarray technology for monitoring transcriptomic changes. However, the cost of these experiments and the computational demands of analyzing them have made sequencing less accessible than microarray technology. This is changing rapidly as sequencers generate more output, as clever barcoding schemes are implemented, and as robust analytical tools gain traction. That said, if the immediate aim is to uncover differentially expressed genes between two samples, at this point in time I would still probably run an array experiment.

— Gary Hardiman

To determine which gene expression profiling technology is right for a particular study, we consider several different things in consultation with the research client. What is the focus of the overall study, and what are the specific goals of the experiment? Does the client want to measure how particular genes or networks are changing? Or is the study more about discovery or examining global patterns of expression? What about alternative transcripts and non-coding RNAs? Is measurement of very rare RNAs important? Is this a clinical research study in which sample size requirements may be high, or a well-controlled model system experiment in which fewer replicates may be adequate? We use this information to guide the selection of the platform(s) and protocol that will meet the research goals and to determine the cost of running the experiment on the different platforms. If the client's research question is focused on a small number of pathways or cellular functions, or can be answered by examining patterns of change detected by array hybridization, then a qPCR or microarray approach may be adequate and cost-effective.

Another set of important questions concerns how much RNA is available for the experiment and what its quality is. At present, we are still turning to microarrays and qPCR for expression profiling when we have total RNA input amounts of less than about 50 ng, but we expect to have robust RNA-seq protocols implemented for smaller amounts of RNA soon.

The final set of questions for determining which technology platform to use revolves around the plans and resources for data analysis. For the most part, our core is not a full-service facility when it comes to data analysis. While robust and fairly easy-to-use analysis pipelines are in place for qPCR and microarray data, RNA-seq analysis is still a challenge for many of our researchers. Before we recommend a particular technological approach, we try to ensure that the client understands what is required for effective data analysis and has the resources and/or collaborators needed to reach a successful end point.

In summary, we work with our clients to determine which gene expression technology best meets their specific experiment goals, fits their budget, and will work with the amounts and type of RNA they have available. If RNA, money, or data analysis resources are limited, we tend to steer clients to microarrays or qPCR.

— Chris Harrington

The goals of the experiment and cost guide the choice of methodology. The three methods occupy different experimental space in terms of the number of genes analyzed and the number of samples that can be analyzed at a reasonable cost. Next-generation sequencing and microarrays are obviously suited to genome-wide expression analysis, while real-time PCR is of course much more limited in the number of genes that can be assessed in a single experiment. Sample number is also an important consideration. If, for example, large numbers of samples from mammalian cells need to be analyzed, next-generation sequencing may be cost-prohibitive, with microarray experiments costing one-half to one-third as much. If the number of genes of interest can be restricted, then real-time PCR is a cost-effective method for analyzing large numbers of samples.

Finally, if discovery of novel transcripts or splice variants is one of the goals of a gene expression study then next-generation sequencing is the only choice.

— Craig Praul

The selection of the analytical technology is based on an in-depth analysis of the research project's needs and the research group's preference and pre-existing knowledge and data.

Depending on the level of information needed to clarify a particular research question, the most precise, cost-effective, time-saving — overall, the most efficient — approach is selected. This process is carried out in close collaboration between the analytical and bioinformatics experts of our center and the members of the research group approaching us with the scientific project.

With a large array of current next-generation sequencing technologies (HiSeq, MiSeq, SOLiD, Roche/454, Ion Proton and PGM, PacBio RS) and array options (Affymetrix cartridges and Atlas arrays, Agilent arrays) at hand, the identification of the most suitable technology can be supported by pilot experiments that we run in order to base the decision on actual data.

If accurate quantification, high sensitivity, or discovery-related aspects are of central importance, we clearly recommend using next-generation sequencing. For scientists working with organisms without a reference genome or any previous knowledge of expressed gene sequences, for example, we recommend de novo transcriptome sequencing, where the generation of the reference transcriptome and differential gene expression analysis are carried out in one sequencing effort. We also see very strong interest in the analysis of non-coding RNA species, for which no approach other than sequencing has the potential to provide data on known and new factors at comparable speed and cost. If the goal of the study can be achieved by providing gene-level expression data, many researchers still opt for microarrays, as they produce data that are well compatible with existing studies, are generated in a rather short time and at reasonably low cost, and can draw on analysis knowledge and experience that may already exist in the group.

Another consideration in selecting the appropriate approach is the available sample amount and the preciousness of the sample. If only minute amounts of material are available, we can increasingly use next-generation sequencing for data generation because more and more protocols for low-input material are becoming available. However, in some cases we will opt for microarrays to be on the safe side and generate less, but very likely still valuable, data. If a sample is available in sufficient amounts but would be impossible, or at least very hard, to collect or generate again, we would recommend using the technology with the richest data output, in most cases high-throughput sequencing, ideally combining different protocols to capture not only gene expression values based on RNA analysis but also small RNAs and long non-coding RNA sequences. At the other end of the spectrum, if cost and turnaround time are the most important factors, real-time PCR is still a very valuable option that is available at many institutes and therefore can be used easily for characterizing gene expression at lower throughput, as well as for validating sequencing and microarray experiments.

— Ralph Schlapbach

Q2: What quality control steps do you take?

For all applications involving RNA: QC of RNA samples using an Agilent Bioanalyzer/TapeStation or a related approach is a must to ensure that the RNA is intact and not degraded. For arrays, Bioanalyzer analysis of the cRNA target can be useful to ensure uniform transcript amplification before hybridization. The behavior of control probes on the arrays (spike-ins, et cetera) can serve as an alert to any potential problems with the data. For sequencing RNA, the RNA is typically fragmented and reverse transcribed into cDNA, and adaptors are ligated on. It is important to QC the RNA-seq library using the Agilent Bioanalyzer or Agilent 2200 TapeStation to ensure that inserts are present and that the library is not composed of oligonucleotide primer dimers. Prior to sequencing, the library should be subjected to accurate DNA quantification by qPCR; this will guide the amount of DNA to load to reach the optimal cluster density. For qPCR expression studies, a suitable housekeeping gene must be used to correct for different sample input amounts, and it can also serve as a measure of RNA integrity.

— Gary Hardiman

All RNA samples received in the core lab are evaluated using our RNA Assessment service. This involves measurements of quantity, purity, and quality (integrity). Samples received at estimated concentrations greater than 20 ng/µl are quantified using UV spectrophotometry. More dilute samples are either concentrated prior to UV measurement or, if quantities are limited, quantified using the RiboGreen fluorescence assay. Following quantification, RNA samples are examined for their size distribution using either the Agilent Bioanalyzer or the Caliper LabChip GX. We then evaluate the results of the concentration measurements, the UV260/280 and UV260/230 ratios (where available), and the electropherograms to determine the overall quality of each sample. For any given study, we find it is optimal to use samples of similar quality in the expression assay so as not to introduce a sample quality bias. However, assuming the RNA is not highly degraded, very limited, or contaminated with DNA or other material, the final selection of samples for the expression profiling assay is decided based on the experiment requirements and the importance of any particular sample to the client or the study design.

After RNAs are selected for profiling, additional QC steps occur during sample processing, depending on the technology platform being used. In general, both cDNA libraries (RNA-seq) and cRNA or cDNA targets (microarrays) are quantified by UV spectrophotometry and sized on the Bioanalyzer to determine whether the material is of adequate quality for the next step in the expression profiling assay. Libraries for sequencing are also generally quantified by PCR prior to the template preparation steps.

— Chris Harrington

Good RNA quality is essential for producing accurate and consistent results in gene expression studies. Many laboratories fail to appreciate the importance of isolating high-quality RNA and how difficult it can be to isolate good RNA from some tissues.

We employ both the NanoDrop and the Agilent Bioanalyzer, and occasionally the Qubit, for RNA quality control. It is important for researchers to clearly understand the capabilities of these instruments so they can be used properly for quality control. The NanoDrop is used primarily to assess the purity of RNA by examining 260/280 and 260/230 absorbance ratios. Poor ratios are indicative of contaminants that can potentially interfere with enzymatic reactions. The NanoDrop, however, cannot assess the quality of RNA, nor can it detect contaminating DNA, as the spectral properties of DNA are very similar to those of RNA.

The Agilent Bioanalyzer is used to assess RNA quality. This instrument produces an RNA integrity number (RIN), an objective measure of RNA quality. RIN scores range from 1 to 10, with 10 being the highest quality. We do not use a hard cutoff for quality scores, as it may be very difficult to obtain high-quality RNA from some tissues. However, we generally don't want to use any samples below RIN 6 to 7 for difficult tissues, and we certainly expect RINs of 9 to 10 for RNA from tissue culture. Perhaps as important as the absolute RIN score is the range of RIN scores across a group of samples. We like to see a range of at most 1 to 1.5 on the RIN scale, and we generally recommend a new isolation for any samples that are outliers from the group. For example, if most of the samples in a large group have RINs of 9 to 10 but a few are at 7 to 8, we would be concerned about those few. What we are trying to avoid is having differential degradation be mistaken for differential expression.

Most kit-based RNA isolation methods seem to produce RNA with very little contaminating DNA, but in some cases DNA can be a problem. To accurately assess the RNA concentration and the concentration of contaminating DNA, we use the Qubit. Employing specific binding of fluorescent dyes, this instrument allows accurate quantitation of RNA and DNA even when they are present in a mixture.

— Craig Praul
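
Praul's rules of thumb translate directly into a simple screening check. The sketch below is a hypothetical R helper, not a published tool; the cutoff and range values are the ones suggested above, and the sample names and scores are made up.

# Hypothetical helper applying the RIN rules of thumb above: flag samples
# below a minimum acceptable RIN, or more than `max_range` RIN units below
# the best samples in the group, as candidates for re-isolation.
flag_rin_outliers <- function(rin, min_rin = 6, max_range = 1.5) {
  low     <- rin < min_rin
  outlier <- (max(rin) - rin) > max_range
  names(rin)[low | outlier]
}

rins <- c(s1 = 9.6, s2 = 9.8, s3 = 9.2, s4 = 7.4)  # made-up RIN scores
flag_rin_outliers(rins)  # returns "s4": an outlier relative to its group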

The very first step of quality control consists of an extensive discussion between our experts on the technology and data analysis sides and the users from the research groups, evaluating and defining the options and goals for a gene expression study. Already at this point, we can ensure that all relevant information on protocols for sample isolation, on experimental design, and on data analysis and interpretation is addressed and taken into account in all subsequent workflow steps.

Once the above issues have been clarified and the samples have been generated in the research group, the samples are sent to our center and undergo strict initial quality control. Regardless of the technology chosen, microarrays or sequencing, all samples are run on the Bioanalyzer to check for integrity and to determine whether there is any degradation of the RNA or DNA. The sample concentration is also double-checked by fluorometric quantitation (in our case, the Qubit). As some applications require a certain fragment length to be tightly controlled, we check the size distribution and concentration of the prepared libraries on the Bioanalyzer, Caliper GX, or Agilent TapeStation. If the approach requires an emulsion PCR, the enrichment is also measured to ensure monoclonal amplification of the library on a bead and to avoid excessive multiclonal amplification, which would lead to mixed reads during sequencing. If all these QC steps are passed successfully, we can confidently ensure a defined analytical output: in the case of HiSeq sequencing, for example, 25 to 30 Gb of Q30 data per lane.

On the data level, next-generation sequencing reads are first quality controlled in terms of base qualities and sequence content (e.g., GC bias, adapter contamination, et cetera). For de novo transcriptome sequencing projects, raw reads are first preprocessed to remove low-quality data and contaminants. We further check microarray hybridization data as well as next-gen sequencing data for potential indications of RNA degradation in the form of a 3' bias. Additionally, we verify the expression data at the global level with respect to the consistency of replicates and the consistency of effects with the experimental design.

— Ralph Schlapbach

Q3: What steps do you take to ensure that your power will be adequate?

This is again guided by the study at hand. The power of a statistical test is the probability that the test (correctly) decides that there is a difference in expression when there truly is a difference. The quantities that determine power are the sample size, the effect size, and the alpha level, which is the probability of detecting an effect when in fact there isn't one (i.e., the type I error rate). Since 'sample' has a very different meaning in biology, we use the term number of replicates instead. Given the sought-after effect size, the alpha level, and the desired power, we calculate the number of replicates needed to achieve this power using the pwr.norm.test function of the pwr library for R (R Development Core Team, 2005).

— Gary Hardiman
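
To make Hardiman's calculation concrete, here is a minimal R sketch using the pwr package he names; the effect size, alpha, and power values are illustrative assumptions, not recommendations.

library(pwr)  # install.packages("pwr") if not already installed

# Solve for the number of replicates: n is left unspecified, so
# pwr.norm.test computes it from the other three quantities.
res <- pwr.norm.test(d = 1.0,           # sought-after standardized effect size
                     sig.level = 0.05,  # alpha level
                     power = 0.8,       # desired power
                     alternative = "two.sided")
ceiling(res$n)  # replicates needed, rounded up (8 for these values)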

Whenever possible, we coordinate with a biostatistician to ensure that a particular experiment is adequately powered. If needed, we make representative data available for the power analysis or identify data sources described in the literature that can be used for this purpose. There are many papers on sample size and power analysis in microarray studies, and our advising statisticians (Dr. Tomi Mori and Dr. Shannon McWeeney) provided these papers as good examples: http://www.ncbi.nlm.nih.gov/pubmed/15845654 and http://www.ncbi.nlm.nih.gov/pubmed/22204525 (see Publications section for full citations).

Dr. McWeeney, Head of our Division of Bioinformatics and Computational Biology, reminds collaborators and investigators that for RNA-seq experiments, we need to not only consider the number of samples, but also the coverage and read depth (depending on the question being addressed).

— Chris Harrington

To some extent this has to be determined empirically for every experiment, because you need some measure of the variance of your system before you can accurately determine the power that you require for your study. That being said, many researchers using microarrays and next-generation sequencing are limited in the number of replicates they can use simply by the cost of the experiments. Many expression studies therefore end up using three to four replicates.

— Craig Praul

As a standard procedure, we do not perform any power estimates. The reason is that power estimates would require as input the sample-to-sample variation, which we usually do not know before having carried out the study. While the available financial resources of the research groups in most cases limit the experimental design options, we suggest as a general rule a minimum number of replicates per analysis: three replicates for cell lines or animal studies under well-controlled conditions, and five replicates for human samples and for observational studies in general. If the study involves small amounts of tissue, we advise doubling the number of replicates; if the expected effect is suspected to be small, we suggest multiplying the number of replicates by a further factor of two.

— Ralph Schlapbach
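
Schlapbach's rule of thumb is simple enough to encode directly. The R function below is a hypothetical illustration of those multipliers, not a tool his center provides.

# Toy encoding of the replicate heuristic above: start from a base number
# and double it for low-input samples, and again for small expected effects.
suggested_replicates <- function(study = c("controlled", "observational"),
                                 low_input = FALSE, small_effect = FALSE) {
  study <- match.arg(study)
  n <- if (study == "controlled") 3 else 5  # cell lines/animals vs. human
  if (low_input)    n <- n * 2              # small amounts of tissue
  if (small_effect) n <- n * 2              # small expected effect
  n
}

suggested_replicates("observational", low_input = TRUE)  # returns 10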

Q4: What bioinformatic tools do you generally turn to for analyzing gene expression data?

* QC of reads using in-house scripts
* Mapping to the reference sequence (Bowtie 2)
* Novel splicing detection (Cufflinks)
* Transcript quantification in FPKM (fragments per kilobase of exon per million fragments mapped; a worked example follows this answer)
* Differential expression analysis, including false discovery rate (edgeR, SAMR, Cuffdiff)
* Coverage maps of selected genes or transcripts
* Gene Ontology and pathway analysis
* Interactome analysis (gene networks)
* Clustering of differentially expressed genes or significantly represented gene sets

— Gary Hardiman
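
The FPKM unit in the list above reduces to a one-line calculation. Below is a small illustrative R function; the gene names, counts, and lengths are made-up values, and the sum of the listed counts stands in for the library's total mapped fragments.

# FPKM = fragments mapped to a transcript * 1e9 /
#        (transcript exon length in bp * total mapped fragments)
fpkm <- function(counts, lengths_bp, total_fragments = sum(counts)) {
  counts * 1e9 / (lengths_bp * total_fragments)
}

counts     <- c(geneA = 500, geneB = 2000, geneC = 50)    # mapped fragments
lengths_bp <- c(geneA = 1500, geneB = 4000, geneC = 800)  # exon model lengths
round(fpkm(counts, lengths_bp), 1)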

We use the equipment vendor-provided software for initial processing of the data from each of the expression profiling platforms we work with. Our core does not generally provide comprehensive data analysis, so after data delivery, many of our clients use open-source software tools or collaborate with bioinformatics faculty experienced in RNA-seq or microarray analysis, who generally use open-source software (including the highly regarded Bioconductor framework) or their own custom pipelines. In addition, they routinely utilize public knowledge bases like Pathway Commons for downstream annotation and visualization. The core does, however, in-license several commercial software packages that we make available to our clients. We have found that GeneSifter, Partek Genomics Suite, and MetaCore are relatively easy to use and meet the needs of a number of our clients, particularly for small, nonclinical studies. Within the core, we use all three of these software tools for our internal projects.

— Chris Harrington

For microarray gene expression analysis we rely on R/Bioconductor. This framework provides dedicated packages for each step of the microarray analysis: preprocessing, normalization, differential expression computation, and gene ontology analysis. The Bioconductor packages combine the virtues of (a) supporting all array types from all manufacturers, (b) implementing state-of-the-art statistical models, (c) supporting reproducible research (more specifically, reproducible data analysis, since all analyses can be driven by human-readable scripts), and (d) being open source. Specifically, we use the affy package for preprocessing Affymetrix data. For all other array types we have written our own pipelines, which implement the preprocessing steps that are also available in the limma package. For differential expression analysis we use the limma package, independent of the array type.

For the analysis of NGS transcriptome data, we rely on TopHat for mapping the reads. We assess the quality of the data by checking the reads for contaminants, evaluating the percentage of reads attributable to repeats, and, on top of this, evaluating fragment size distributions, 3' bias, and coverage of splice junctions. We quantify isoform abundances using the RSEM software. For differential expression we rely on the Bioconductor package edgeR. We functionally annotate gene lists with GO categories, KEGG pathways, etc. using the goseq package. For in-depth mining of associated pathways we recommend a commercial software product like GeneGo's MetaCore, because its curated pathway database is more comprehensive and accurate than publicly available pathway resources.

In cases where a reference genome is not available, we use Trinity to assemble a reference transcriptome from quality-filtered and trimmed reads, followed by RSEM isoform quantification and edgeR differential expression analysis. The reference transcriptome is annotated with the best hit in the NCBI non-redundant protein database, GO categories, KEGG pathways, EC numbers, etc. using BLAST followed by BLAST2GO. We also annotate with Pfam using pfam_scan, and COG/KOG using RPSTBLASTN.

— Ralph Schlapbach
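
As one concrete example of the limma step Schlapbach describes, a minimal two-group comparison might look like the sketch below. The expression matrix exprs_mat and the group layout are placeholders; the matrix is assumed to hold already-normalized log2 intensities, probes in rows and samples in columns.

library(limma)

# Placeholder design: first three columns are controls, last three treated.
group  <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))
design <- model.matrix(~ group)       # intercept + treated-vs-control term

fit <- lmFit(exprs_mat, design)       # fit a linear model per probe
fit <- eBayes(fit)                    # moderated t-statistics
topTable(fit, coef = 2, number = 20)  # top 20 differentially expressed probes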

Q5: What approach do you take to normalize your data?

The goal of normalization is to eliminate systematic variation and allow appropriate data comparison across different samples. When sequencing RNA, each sample generates a discrete number of transcript counts, and it is very important to adjust the counts for varying sequencing depth and other potential technical effects. We primarily use the total count normalization method for comparison across different samples in expression studies. This is a global procedure in which only a single factor is used to scale the counts: the transcript counts in each individual sample column are summed, the median of those sums is set as the reference, and all the transcript counts are scaled relative to this reference.

— Gary Hardiman
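
That scaling can be written in a few lines of R. In this sketch, counts is a placeholder matrix of transcript counts, genes in rows and samples in columns.

# Total count normalization as described above: scale each sample by a
# single factor so that every column sum equals the median column sum.
total_count_normalize <- function(counts) {
  lib_sizes <- colSums(counts)                # total counts per sample
  factors   <- median(lib_sizes) / lib_sizes  # one scaling factor per sample
  sweep(counts, 2, factors, `*`)              # rescale each sample column
}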

In the core lab, we do not generally perform data normalization except as part of the standard processing of microarray data for QC and distribution. For microarrays, as part of this processing, we use the default normalization algorithms provided by the array vendor. For qPCR we have normalized data using both single reference genes and groups of reference genes. Again, this is where we facilitate collaborations with our Biostatistics and Bioinformatics faculty, who have expertise in these analyses.

— Chris Harrington

The choice of normalization method is determined by the assumptions about the expression levels and the potential presence of biases in the data. The standard assumption is that within a study the majority of genes do not change in expression and that up- and down-regulation are symmetric. Given this assumption, the median or the mean of the logarithmic expression values are equally good choices. If the data exhibit signal-dependent biases, then a signal-dependent normalization, as implemented by the loess or quantile normalization methods, is warranted. Many well-established processing methods have normalization built in; for instance, the RMA summarization method for Affymetrix microarrays includes quantile normalization. In general, we treat expression estimates the same way regardless of whether they were generated by sequencing or by microarrays. However, it has to be noted that count-based differential expression methods (e.g., DESeq, edgeR) expect unnormalized counts as input and compute the normalization factors internally. Here, we recommend specifying methods that rely on median or logarithmic-mean statistics. We do not recommend methods that rely on total counts, because total counts are prone to being driven by just a few ultra-high-count genes.

In the case of microarrays, it is also worth applying detection filtering in order to exclude probes that show near background values. This avoids statistical testing of genes with expression below the detection level.

— Ralph Schlapbach
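
For the count-based route mentioned above, the classic edgeR workflow keeps the raw counts and computes scaling factors internally; TMM (trimmed mean of M-values) is one robust log-mean-style method of the kind recommended here. In this sketch, raw_counts and group are placeholders.

library(edgeR)

# Raw, unnormalized counts go in; TMM factors are computed internally.
y <- DGEList(counts = raw_counts, group = group)
y <- calcNormFactors(y, method = "TMM")  # trimmed mean of M-values
y <- estimateCommonDisp(y)               # dispersion estimation
y <- estimateTagwiseDisp(y)
et <- exactTest(y)                       # two-group differential expression
topTags(et)                              # top differentially expressed genes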

Q6: How do you determine absolute expression levels?

Ideally, absolute quantification should be done using a digital PCR method such as that provided by the Bio-Rad QX100 instrument. In the absence of a digital instrument, a more traditional standard curve-based approach can be employed, in which the concentration of an unknown target in a sample of interest is determined by comparison to a known quantity.

— Gary Hardiman
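
The standard-curve approach amounts to a linear fit of Ct against log10 copy number, inverted for the unknown. The R sketch below uses made-up dilution-series values purely for illustration.

# Fit the standard curve: Ct = b0 + b1 * log10(copies)
std_copies <- c(1e3, 1e4, 1e5, 1e6, 1e7)       # known input copy numbers
std_ct     <- c(30.1, 26.8, 23.4, 20.1, 16.7)  # measured Ct values (made up)
fit <- lm(std_ct ~ log10(std_copies))
b   <- coef(fit)

# Invert the curve to estimate copies for an unknown sample's Ct.
ct_unknown     <- 24.9
copies_unknown <- 10^((ct_unknown - b[1]) / b[2])
copies_unknown  # roughly 3.6e4 copies for these illustrative numbers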

Neither microarrays nor next-gen sequencing provide absolute expression levels in terms of the number of transcript copies per cell. As mentioned above, we use RSEM to compute the relative abundance of each isoform in the sample that was sequenced. This approach assumes that the total number of RNA copies per cell is either constant, or that a change in the total number of RNA copies is not relevant. If these assumptions do not hold, then we recommend sequencing with additional spike-ins to control for the amount of starting material.

— Ralph Schlapbach

Q7: How do you confirm your findings?

Expression data on targets of interest from RNA-seq and microarrays are confirmed by Q-PCR.

— Gary Hardiman

In general, we do not do confirmation studies ourselves, but we encourage our clients to use RT-PCR or other molecular or staining methods for detecting individual RNAs or proteins to validate the expression differences detected in an expression profiling study. We find that PCR is the most common method of validation among our researchers.

— Chris Harrington

I believe that if researchers are careful to compare "apples to apples," they should find that real-time PCR, microarrays, and next-generation sequencing yield similar results. As the MAQC study pointed out, though, you need to make sure you are examining expression levels of the identical probes when comparing microarray experiments to real-time PCR. I would assume the same when comparing next-generation sequencing results to real-time PCR or microarrays; one must be careful to compare only reads covering the same regions as the probes. You must constantly be aware that differences in expression at the gene level seen between platforms can simply be due to the fact that the platforms may be measuring different exons or collections of exons. Failure to take differential expression of splice variants into account when measuring "gene" expression certainly accounts for some of the discordance seen between platforms.

All that being said, I would encourage anyone using next-generation sequencing or microarrays who discovers what they believe to be differentially expressed genes to confirm those findings by analyzing large numbers of biological replicates using real-time PCR. Real-time PCR allows one to examine a much larger number of samples at a more reasonable cost than can be accomplished with the other technologies. Analyzing this much higher number of replicates will give the researcher significantly more statistical confidence than can be obtained from the very small number of replicates usually examined by next-generation sequencing or microarrays.

— Craig Praul

We verify the consistency of our findings with existing knowledge of gene and protein interactions, known metabolic pathways and dependencies, as well as with the research groups' pre-existing knowledge about individual factors and networks. However, consistency is not equivalent to confirmation. The ultimate confirmation is done by functional studies that are to be carried out by the research groups and which may lead to new analyses at the level of gene expression, ultimately leading to multiple cycles of hypothesis generation, data generation, and hypothesis validation.

— Ralph Schlapbach

List of Resources

Publications

Churchill GA. (2004). Using ANOVA to analyze microarray data. Biotechniques. 37(2):173-5, 177.

Hu J, Zou F, Wright FA. (2005). Practical FDR-based sample size calculations in microarray experiments. Bioinformatics. 21(15):3264-72.

Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. (2011). Synthetic spike-in standards for RNA-seq experiments. Genome Research. 21(9):1543-51.

Jung SH, Young SS. (2012). Power and sample size calculation for microarray studies. Journal of Biopharmaceutical Statistics. 22(1):30-42.

Jung SH. (2005). Sample size for FDR-control in microarray data analysis. Bioinformatics. 21(14):3097-104.

Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. (2008). RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Research. 18(9):1509-17.

Nilsson B, Håkansson P, Johansson M, Nelander S, Fioretos T. (2007). Threshold-free high-power methods for the ontological analysis of genome-wide gene-expression studies. Genome Biology. 8(5):R74.

Nookaew I, Papini M, Pornputtapong N, Scalcinati G, Fagerberg L, Uhlén M, Nielsen J. (2012). A comprehensive comparison of RNA-seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae. Nucleic Acids Research. 40(20):10084-97.

Okoniewski MJ, Leśniewska A, Szabelska A, Zyprych-Walczak J, Ryan M, Wachtel M, Morzy T, Schäfer B, Schlapbach R. (2012). Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage. Nucleic Acids Research. 40(9):e63.

Sudo H, Mizoguchi A, Kawauchi J, Akiyama H, Takizawa S. (2012). Use of non-amplified RNA samples for microarray analysis of gene expression. PLoS One. 7(2):e31397.

Trachtenberg AJ, Robert JH, Abdalla AE, Fraser A, He SY, Lacy JN, Rivas-Morello C, Truong A, Hardiman G, Ohno-Machado L, Liu F, Hovig E, Kuo WP. (2012). A primer on the current state of microarray technologies. Methods in Molecular Biology. 802:3-17.

Verdugo RA, Deschepper CF, Muñoz G, Pomp D, Churchill GA. (2009). Importance of randomization in microarray experimental designs with Illumina platforms. Nucleic Acids Research. 37(17):5610-8.