Clinical Focus Technical Guide

Table of Contents

Letter from the Editor
Index of Experts
Sample Prep: FFPE in the Clinic: Mark Bouzyk, Stephen Hewitt, Mohammad Ilyas, and Rolf Jaggi
Clinical qPCR: Greg Shipley and Jo Vandesompele
Multivariate Biomarkers and Molecular Diagnostics: Janine Erler
List of Resources

Letter from the Editor

The bright days of summer are here, but at Genome Technology, school is still in session. However, there's no need to cram for finals because here at GT, we give you all the answers. From the tried and true — formalin-fixed paraffin-embedded tissue sample prep and clinical qPCR methods — to the new and challenging — multivariate biomarkers — we called on several experts from around the world to bring you their expertise and advice.

This guide contains tips for everything from the latest methods of nucleic acid extraction from FFPE samples to cutting-edge data standards for qPCR and the best ways to validate multivariate biomarkers. Our information comes straight from the source — our experts are pioneers in their field, and are still researching better and faster ways to extract samples and validate biomarkers. In our section on FFPE, you'll get the latest advice from researchers who are still working on sample extraction methods, while in the section on PCR, you'll read about data standards from the people who developed them. And in our multivariate biomarkers section, you'll see the opinions of a researcher who has written one of the latest papers on this developing field. If you really want to know about the best primer design software for qPCR or data mining software for multivariate biomarkers, the answers are in this guide.

— Christie Rizk

Index of Experts

Many thanks to our experts for taking the time to contribute to this technical guide, which would not be possible without them.

Mark Bouzyk
Emory University School of Medicine

Janine Erler
Institute of Cancer Research

Stephen Hewitt

Mohammad Ilyas
Queen's Medical Centre, Nottingham University

Rolf Jaggi
University of Bern

Greg Shipley
University of Texas Health Science Center

Jo Vandesompele
Ghent University

Sample Prep: FFPE in the Clinic: Mark Bouzyk, Stephen Hewitt, Mohammad Ilyas, and Rolf Jaggi

Genome Technology: Which method of RNA or DNA extraction yields the purest samples?

Mark Bouzyk: Over the last few years we have evaluated a number of commercial kits and further optimized [them] for higher yield and throughput. These kits, suitable for extracting DNA and/or RNA from FFPE samples, come from vendors including Ambion, Roche, and Qiagen. Some comparison data and protocols can be found in our publications [see Resources], which led us to choose Ambion as our preferred kit for nucleic acid extractions from FFPE, even though this method is column-based and the throughput is low to medium. We have recently had excellent experience and results in terms of yield, purity, and throughput using products from Omega Bio-Tek to isolate DNA from a variety of sources including whole blood, blood products, and saliva. It is a magnetic bead-based chemistry and we have automated the protocol on a Thermo KingFisher Flex magnetic particle processor robot, in 96-well format. We are now planning to evaluate the Omega Bio-Tek Mag-Bind FFPE KF kit, including automation. The purified DNA should be compatible with a variety of downstream analyses, including real-time PCR and multiplexed genotyping.

Genome Technology: Which method of RNA or DNA extraction yields the purest samples?

Stephen Hewitt: The best method is the one that takes the longest. We have worked with a number of in-house protocols as well as commercial kits. In general, we find that most protocols are adequate; however, we have a protocol [see Resources] that takes more than three days and provides a small margin of improvement in quantity. Especially with RNA, quantity is the best measure of quality. More is better. We do not go to this extreme in all circumstances, but it is an option if we need it.

GT: How do PCR protocols that can be used without the need for DNA extraction or paraffin removal compare with older methods?

SH: These new extraction-free protocols are impressive and for many applications are more than adequate. DNA in FFPE is generally stable and of reasonable quality. That does not mean it is without nicks and other alterations that may show up with sequencing; however, it can be easily amplified. Many of these protocols use heat and detergents to achieve sufficient de-paraffination for PCR. It becomes a simple issue of sufficient template for the polymerase. Long amplicons are certainly harder.

GT: What's the best method for getting around spurious copy-number changes in FFPE tissue samples in array-CGH profiles?

SH: We are still trying to get to the bottom of this issue ourselves. Our general finding is that starting with more DNA is the easiest approach. Our model is that the DNA has undergone some level of nicking and modification, and if you provide more template, you are more likely to get even and consistent amplification.

Genome Technology: Which method of RNA or DNA extraction yields the purest samples?

Mohammad Ilyas: In our experience, the best yield of both RNA and DNA comes from the special commercially available columns. The amount of nucleic acid available from FFPE tissue is usually limited (since some of the original tissue, if coming from a clinical case, should be retained for the purposes of diagnostic records). The quality of nucleic acid is also limited after formalin fixation and processing into paraffin. The key is to ensure that the material is fully digested by proteinase K; usually overnight digestion is sufficient although occasionally it may require a longer period (especially for mesenchyme-rich tissue). Following that, it is important to ensure that all the residual debris is spun down prior to taking off the supernatant, which can then be run down the column. Whilst nucleic acid precipitation is an acceptable method, it is more time-consuming and is more likely to leave contaminants and a high salt concentration, both of which may compromise downstream applications. A number of columns are available which are specifically designed for nucleic acid extraction from FFPE tissue. They are from a number of different sources and our experience is that they are all pretty similar in their performance and it boils down to individual preference. Good sample preparation and adherence to protocols will give an acceptable yield in most cases. However, to repeat a hackneyed truism: it is all about experimental design. DNA derived from FFPE tissue, even of the highest quality, will support amplicons only up to about 300 to 400 base pairs, beyond which the PCR will become unreliable. These are considerations which need to be taken into account when designing experiments using FFPE tissue.

GT: How do PCR protocols that can be used without the need for DNA extraction or paraffin removal compare with older methods?

MI: This question can only be answered bearing in mind two considerations:

(a) How robust is the PCR reaction? Without doubt FFPE tissue can be incubated with proteinase K without de-waxing and the resulting digest can be used directly for PCR (after inactivation of the proteinase K). This has some advantages inasmuch as the melted wax floats to the surface and forms a seal for the digestion reaction. However, such a crude preparation will inevitably contain contaminants and (notwithstanding the commercially available reagents which purportedly bind the contaminants) will be more prone to PCR inhibition. If the proposed PCR is robust, it [will] not be affected by minor variations in template quality and the experiments will be successful. If, however, the PCR requires more stringent conditions, there is a much greater likelihood that template variation will interfere with the efficiency of the PCR and may result in PCR failure in some samples. Thus savings in time may be offset by reduction in the data set and the more PCR tests that are planned, the greater the likelihood that one of them will be fickle and prone to failure. If quantitative PCR is being considered, we would recommend that it is definitely not performed on crude DNA template.

(b) What are the planned downstream applications? PCR with low-quality template is more likely to generate PCR artifacts such as non-specific products. If there are plans to perform other procedures with the PCR product, then some may be highly susceptible to data corruption by these artifacts. Methods such as high-resolution melting, single-strand conformation polymorphism, and denaturing high-performance liquid chromatography are very stringent and will pick up PCR artifacts, giving false positive signals and poor data. Similarly, if there are plans to clone PCR products, then the presence of non-specific products will inevitably mean more colonies have to be screened. Other reactions (such as bisulfite modification for methylation-specific PCR) may be inhibited by the contaminants in the template. Again, the most important thing is to understand the purpose of the experiment and design it appropriately. As a general rule, the purer the DNA template, the better and more reliable the PCR data. Time can be saved by not doing formal DNA extraction, but this may be lost by having to repeat experiments or having a reduced sample size from PCR failures.

GT: What's the best method for getting around spurious copy-number changes in FFPE tissue samples in array CGH profiles?

MI: We don't have much experience in this, but from first principles we would say:
(a) ensure the highest quality of template for the whole genome amplification step, and
(b) validate your data using another method.

Genome Technology: Which method of RNA or DNA extraction yields the purest samples?

Rolf Jaggi: We did not perform very systematic comparisons related to this issue but the impression I get from various users is that purity is not a major issue when they use their RNA for regular applications like cloning or real-time PCR. All the reagents that we use in the laboratory give high-quality RNA (E260/280 close to 2, regular spectrum between 220 nm and 320 nm). We have not used phenol/chloroform-based methods (e.g. Trizol) for many years.
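The 260/280 check Jaggi mentions is simple arithmetic, and a minimal Python sketch of it might look like the following. The function names and the 1.8 to 2.2 acceptance window are assumptions made for illustration; the interview only says the ratio should be "close to 2."

```python
def absorbance_ratio(a260: float, a280: float) -> float:
    """Return the 260 nm / 280 nm absorbance ratio of a nucleic acid sample."""
    if a280 <= 0:
        raise ValueError("A280 must be a positive absorbance reading")
    return a260 / a280


def ratio_close_to_two(a260: float, a280: float,
                       low: float = 1.8, high: float = 2.2) -> bool:
    """Flag whether the ratio falls inside an ASSUMED acceptance window.

    The default window (1.8 to 2.2) is a hypothetical choice for this
    sketch, not a threshold stated by the expert.
    """
    return low <= absorbance_ratio(a260, a280) <= high
```

A ratio alone says nothing about the shape of the spectrum between 220 nm and 320 nm, which Jaggi also inspects, so a check like this would be a first screen at most.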

GT: How do PCR protocols that can be used without the need for DNA extraction or paraffin removal compare with older methods?

RJ: We regularly use RNA and DNA isolated from FFPE material and we have invested a considerable effort to optimize RNA extraction with regard to recovery and quality. We found that the quality of the RNA can be improved considerably by adding a separate incubation at 50°C to 90°C in a special buffer. This incubation apparently eliminates part of the chemical cross-links in RNA molecules, which are a result of fixation with formalin; thereby the efficiency of reverse transcription is improved. We usually use valuable and unique patient material and, therefore, we do not accept any steps that may reduce quantity and/or quality. The protocol and reagents which regularly give optimal results are commercially available from AmpTec, a German company that specializes in RNA amplification technologies.

Clinical qPCR: Greg Shipley and Jo Vandesompele

Genome Technology: How do you work around the lack of proper normalization techniques?

Greg Shipley: The reason there is no standardized normalization method is that if you use one or more transcripts to normalize transcript data, the ones you choose depend on what effect the experimental conditions have on the target cells, tissue, or organism. Thus, it is not possible to standardize on a single transcript or a small group of transcripts for all experimental conditions. It's one of the conundrums of real-time qPCR. For a single treatment, single target, and specific set of target transcripts or genes, it would be possible to set up a specific standardization set of transcripts. But this only works when you have a repetitive experiment and assay, which is how it would work for a clinical test. But for research, we have to find the best transcripts, or use total RNA, or 18S rRNA. Or in some cases you have to measure total RNA using a fluorescent assay separate from real-time qPCR. When folks come to talk with me about a new project and I ask, "What are you going to use to normalize for loading?" the invariable answer is, "What do you suggest or what does everyone else do?" Then I have to tell them there is no answer to their question. If someone has run a microarray, they can look there for transcripts that did not change. Or they can check the literature to see whether someone has reported a transcript from an experiment with identical conditions. However, that carries a lot of risk, as many folks use GAPDH because that's what folks used for northerns. But guess what? We can really quantify things now and GAPDH hops around at the slightest perturbation of the cell. It's on a case-by-case basis.

GT: Do you use real-time PCR data markup language, and how does it compare to other PCR data standards?

GS: This is a standard data export that was suggested by Jo Vandesompele and Jan Hellemans for all vendors. If every instrument made it possible to do an RDML export of their data, it would be much easier for the end user to import their data into downstream software packages for data analysis. However, that is not possible now for all instrument software. One company told me that RDML did not allow the export of all the features their software can bring to the data set so they were not going to implement it. I see their point, but I also think they are being shortsighted. So, the answer is I'm not sure anyone is using RDML, as it hasn't been universally implemented yet.

GT: What kind of primer design software do you use, and why?

GS: I use Beacon Designer and AlleleID from Premier Biosoft. The reasons are: 1) Both programs work well. The designs I have received from the programs work well empirically when tested in the lab. And 2) I am a consultant for Premier Biosoft and so am able to use the programs for free. I receive no other remuneration from them and suggest improvements to the software or point out bugs that I find. Lest you think this association is unduly influencing my comments, I can tell you as a core lab director I cannot afford to have bad primers and probe sets being purchased on a regular basis. We also purchase a long oligo for each assay spanning the longest PCR amplicon so we have a sizable investment in each assay that we order. We make a lot of new assays here. If the software didn't work well, I wouldn't use it and that's that. I will say that I used Primer Express in my early days when it was still running on the Mac. I ran the old Mac software long after it migrated to the PC, as the PC software didn't work as well. I learned to compensate for the shortcomings in that software, which had some excellent features, and made good assays for years. However, since I'm not a PC user (although I run XP on my Mac) I preferred software that runs under the Mac OS. Both of these programs will work with either OS X or Windows. Since I have 14 years of probe-based assay design experience, I know how to choose optimal primer pairs. I always order four primers around a probe (same for UPL or SYBR assays) and try them in all four possible combinations. The highest-rated primer pair is not always the one that works best empirically. I'm not sure the software will ever completely replace the empirical test, but I don't want to say never. With the low cost of primers these days, the added cost is negligible compared to the benefits of knowing you have the best assay possible and the only way to know that is through empirical testing. 
One feature of the Premier Biosoft software packages that sets them apart from the pack is the ability to run an mfold analysis from within the software (it goes out to the mfold server, of course) and then use that information to make assays that avoid strong stem structures. This is a wonderful feature and extremely helpful.
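Shipley's habit of ordering four primers and testing them "in all four possible combinations" can be sketched in a few lines of Python. This assumes the four primers are two forward and two reverse candidates, which is what four combinations implies but is not spelled out in the interview. The primer labels and the idea of ranking pairs by lowest measured Cq are likewise assumptions for illustration.

```python
from itertools import product


def primer_pair_combinations(forward_primers, reverse_primers):
    """Return every forward/reverse pairing to be tested at the bench."""
    return list(product(forward_primers, reverse_primers))


def pick_best_pair(cq_by_pair):
    """Pick the pair with the lowest measured Cq.

    Ranking by Cq alone is an ASSUMED criterion for this sketch; a real
    comparison would likely also weigh efficiency and specificity.
    """
    return min(cq_by_pair, key=cq_by_pair.get)


# Hypothetical labels for two forward and two reverse candidates.
pairs = primer_pair_combinations(["F1", "F2"], ["R1", "R2"])
for forward, reverse in pairs:
    print(forward, reverse)
```

The measured values fed to `pick_best_pair` would come from the empirical test itself, which is Shipley's point: the software's top-rated pair is a starting hypothesis, not the answer.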

Genome Technology: How do you work around the lack of proper normalization techniques?

Jo Vandesompele: The gold standard and universally applicable normalization method is the use of multiple, stably expressed reference genes. Such reference genes are typically identified in a pilot experiment on a representative set of samples in which a list of candidate reference genes is measured. This pilot data set is then analyzed using a dedicated algorithm (such as geNorm built into qbasePLUS, Hellemans et al., Genome Biology, 2007; [see Resources]). We and others have demonstrated that such a normalization strategy better reduces experimental noise, yields statistically more significant results, and can reliably detect small expression differences.
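To make the idea of multiple reference genes concrete, here is a minimal Python sketch in the spirit of geNorm. It is not the geNorm or qbasePLUS implementation. It assumes each candidate gene already has a relative quantity per sample, scores stability as the average standard deviation of pairwise log2 ratios (lower is more stable), and normalizes with the geometric mean of the chosen reference genes. All function names and gene labels are hypothetical.

```python
import math
from statistics import stdev


def stability_scores(quantities):
    """Score each candidate gene; lower means more stably expressed.

    quantities maps a gene name to its relative quantities, one per sample
    (same sample order for every gene, all values positive).
    """
    genes = list(quantities)
    scores = {}
    for gene in genes:
        pairwise_sds = []
        for other in genes:
            if other == gene:
                continue
            # Variation of the log ratio between two genes across samples.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(quantities[gene], quantities[other])]
            pairwise_sds.append(stdev(log_ratios))
        scores[gene] = sum(pairwise_sds) / len(pairwise_sds)
    return scores


def normalization_factors(quantities, reference_genes):
    """Per-sample geometric mean of the selected reference genes."""
    n_samples = len(quantities[reference_genes[0]])
    factors = []
    for sample in range(n_samples):
        log_sum = sum(math.log(quantities[g][sample]) for g in reference_genes)
        factors.append(math.exp(log_sum / len(reference_genes)))
    return factors
```

In use, the pilot data set would be scored first, the least stable candidates dropped, and each target gene's quantity divided by the sample's normalization factor. The sketch needs at least two candidate genes and two samples.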

GT: Do you use real-time PCR data markup language and how does it compare to other PCR data standards?

JV: As co-author on the RDML paper (Lefever, Hellemans et al., Nucleic Acids Research, 2009 [see Resources]), I'm a strong advocate of making your real-time PCR data publicly available (e.g. as a supplemental data file with your publication). The RDML data format is really the perfect instrument to do so. Various leading qPCR instrument companies are acknowledging this and currently making their software RDML compatible. In addition to the online RDML generator [see Resources], third-party qPCR data-analysis software is also becoming RDML compatible. Biogazelle's qbasePLUS software is the first of its kind to be fully RDML compatible (import and export of qPCR data and run annotation information).

GT: What kind of primer design software do you use, and why?

JV: There are many primer design tools available, both free and commercial, and many do a good job. Most important is that state-of-the-art bioinformatic tools are used to predict the performance of a qPCR assay, in terms of absolute specificity and absence of sequence polymorphisms and secondary structures in the primer annealing regions. While such in silico quality control tools are separately available online (e.g. UNAfold, BiSearch, dbSNP [see Resources]), we have integrated these concepts in our own primer design pipeline, primerXL. This not only results in a high success rate, but also enables high-throughput designs.
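Two of the checks Vandesompele describes, polymorphisms and self-complementarity in the primer annealing region, can be illustrated with a deliberately crude Python sketch. It does not reproduce UNAfold, BiSearch, dbSNP queries, or primerXL. The function names, the 1-based inclusive coordinates, and the four-base 3' tail are all assumptions chosen for this example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def snps_in_annealing_region(start: int, end: int, snp_positions):
    """List known polymorphic positions that fall inside a primer site.

    Coordinates are 1-based and inclusive; snp_positions would come from
    a variant database lookup done elsewhere.
    """
    return [pos for pos in snp_positions if start <= pos <= end]


def three_prime_self_complementary(primer: str, tail_length: int = 4) -> bool:
    """Crude heuristic: can the 3' tail anneal somewhere on the primer itself?

    A real secondary-structure prediction uses thermodynamic folding;
    this only looks for an exact reverse-complement match of the tail.
    """
    tail = primer[-tail_length:]
    return reverse_complement(tail) in primer
```

A primer would be flagged for redesign if `snps_in_annealing_region` returns any positions or `three_prime_self_complementary` is true. Dedicated tools apply far more sensitive criteria than this exact-match test.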

Multivariate Biomarkers and Molecular Diagnostics: Janine Erler

Genome Technology: What is the best method to validate a multivariate biomarker, and how does it differ from the validation of a univariate biomarker?

Janine Erler: A multivariate biomarker is best validated through repeated testing under multiple conditions. For example, the activation status of a protein should be investigated in multiple physiologically relevant conditions. It differs from a univariate biomarker because it reflects the network state of the cell or disease. Univariate biomarkers are a snapshot of a situation and do not really reflect what is going on in the system.

GT: What kind of data-mining software do you use for multivariate discovery and why?

JE: We mostly use mass spectrometry as we are interested in dynamic protein interactions because these are what determine cell behavior, phenotype, or disease. Mass spectrometry is an extremely sensitive technique. We tend to focus on phosphoproteomic data, and analyze this with NetPhorest and NetworKIN algorithms developed by our collaborator, Rune Linding at the Institute of Cancer Research [in London]. These allow assignment of probabilities to protein-protein events. However, we always integrate these data with behavioral and phenotype data. When analyzing microarray data, we collaborate with Monica Nicolau of Stanford University and use a DSGA algorithm she has developed. This allows disease-specific analysis of data.

GT: What type of analysis do you use to evaluate the stability of candidate multivariate biomarkers?

JE: We will attempt to look at the protein half-life to have an idea of turnover. However, through network biology studies, we will be able to determine stability as a biomarker.

List of Resources

If you need more information, here are some resources that can help answer your questions.


McSherry EA, McGoldrick A, Kay EW, Hopkins AM, Gallagher WM, Dervan PA. (2007). Formalin-fixed paraffin-embedded clinical tissues show spurious copy number changes in array-CGH profiles. Clinical Genetics. 72(5): 441-447.

Abramovitz M, Ordanic-Kodani M, Wang Y, Li Z, Catzavelos C, Bouzyk M, Sledge GW Jr, Moreno S, Leyland-Jones B. (2008). Optimization of RNA extraction from FFPE tissues for expression profiling in the DASL assay. BioTechniques. 44(3):417-23.

Tang W, David F, Wilson M, Barwick B, Leyland-Jones B, Bouzyk M. (2009). DNA extraction from formalin-fixed paraffin-embedded tissue. CSH Protocols. 4: 1-5.

Chung JY, Braunschweig T, Hewitt SM. (2006). Optimization of recovery of RNA from formalin-fixed, paraffin-embedded tissue. Diagnostic Molecular Pathology. 15(4):229-36.

Johnson NA, Hamoudi RA, Ichimura K, Liu L, Pearson DM, Collins VP, Du MQ. (2006). Application of array CGH on archival formalin-fixed paraffin-embedded tissues including small numbers of microdissected cells. Laboratory Investigation. 86(9):968-78.

Kristensen LS, Wojdacz TK, Thestrup BB, Wiuf C, Hager H, Hansen LL. (2009). Quality assessment of DNA derived from up to 30 years old formalin fixed paraffin embedded (FFPE) tissue for PCR-based methylation analysis using SMART-MSP and MS-HRM. BMC Cancer. 9:453.

Gnanapragasam VJ. (2010). Unlocking the molecular archive: the emerging use of formalin-fixed paraffin-embedded tissue for biomarker research in urological cancer. BJU International. 105(2):274-8.

Little SE, Vuononvirta R, Reis-Filho JS, Natrajan R, Iravani M, Fenwick K, Mackay A, Ashworth A, Pritchard-Jones K, Jones C. (2006). Array CGH using whole genome amplification of fresh-frozen and formalin-fixed, paraffin-embedded tumor DNA. Genomics. 87(2):298-306.

Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J. (2007). qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biology. 8(2): R19.

Lefever S, Hellemans J, Pattyn F, Przybylski DR, Taylor C, Geurts R, Untergasser A, Vandesompele J; RDML consortium. (2009). RDML: structured language and reporting guidelines for real-time quantitative PCR data. Nucleic Acids Research. 37(7):2065-9.

Taylor S, Wakem M, Dijkman G, Alsarraj M, Nguyen M. (2010). A practical approach to RT-qPCR-Publishing data that conform to the MIQE guidelines. Methods. 50(4):S1-5.

Bustin SA. (2010). Why the need for qPCR publication guidelines? The case for MIQE. Methods. 50(4):217-26.

VanGuilder HD, Vrana KE, Freeman WM. (2008). Twenty-five years of quantitative PCR for gene expression analysis. BioTechniques. 44(5):619-26.

Rouleau E, Lefol C, Bourdon V, Coulet F, Noguchi T, Soubrier F, Bièche I, Olschwang S, Sobol H, Lidereau R. (2009). Quantitative PCR high-resolution melting (qPCR-HRM) curve analysis, a new approach to simultaneously screen point mutations and large rearrangements: application to MLH1 germline mutations in Lynch syndrome. Human Mutation. 30(6):867-75.

Erler JT, Linding R. (2010). Network-based drugs and biomarkers. Journal of Pathology. 220: 290-296.

Ptitsyn AA, Weil MM, Thamm DH. (2008). Systems biology approach to identification of biomarkers for metastatic progression in cancer. BMC Bioinformatics. 9 Suppl 9:S8.

Pal NR, Aguan K, Sharma A, Amari S. (2007). Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering. BMC Bioinformatics. 8:5.

De Coen WM, Janssen CR. (2003). A multivariate biomarker-based model predicting population-level responses of Daphnia magna. Environmental Toxicology and Chemistry. 22(9):2195-201.

Habeck C, Foster NL, Perneczky R, Kurz A, Alexopoulos P, Koeppe RA, Drzezga A, Stern Y. (2008). Multivariate and univariate neuroimaging biomarkers of Alzheimer's disease. NeuroImage. 40(4):1503-15.

Web Tools

Qbase Plus PCR algorithm

Real time PCR data markup language generator

PCR primer design quality control tools


Following MIQE Recommendations
Jul 7-9 / Heidelberg, Germany

Advances in qPCR
Sept 14-15 / Dublin
Select Biosciences

European Biomarkers Summit
Nov 9-10 / Florence, Italy
Select Biosciences