Cancer Research Technical Guide

Table of Contents

Letter from the Editor
Index of Experts
Cancer Sequencing: Elaine Mardis
MicroRNAs in Cancer: Ondrej Slabý and Muneesh Tewari
Methylation in Cancer: Irma Russo, Sandra Fernandez, Lifang Hou, and Simon Lin
Cancer of Unknown Primary: F. Anthony Greco and David Bowtell
List of Resources


Letter from the Editor

Cancer is tricky. But equipped with a vast arsenal of tools, investigators are on the offensive against the duplicitous disease. Still, even the most robust cancer research techniques come with their own sets of challenges. From sequencing to interrogating microRNAs, this technical guide aims to give you a fresh look at what a number of cancer researchers are doing to improve their day-to-day results at the bench in an effort to supply optimal diagnostics and care at the bedside.

The following pages contain tips for genome sequence analysis, optimizing qRT-PCR procedures for the investigation of miRNAs, the merits of various methylation interrogation techniques, isolating RNA from paraffin-embedded tissues, and what to do when gene expression microarray data is inconsistent with the clinical presentation in diagnosing cancers of unknown primary site. Be sure to consult the list of resources at the end of this guide for citations to the methods and research papers our experts have referred to in their responses.

— Tracy Vence

Index of Experts

Many thanks to our experts for taking the time to contribute to this technical guide, which would not be possible without them.

David Bowtell
Peter MacCallum Cancer Centre

Sandra Fernandez
Fox Chase Cancer Center

F. Anthony Greco
Sarah Cannon Cancer Center

Lifang Hou
Northwestern University Feinberg School of Medicine

Simon Lin
Northwestern University Bioinformatics Consulting Core Facility

Elaine Mardis
Washington University in St. Louis School of Medicine

Irma Russo
Fox Chase Cancer Center

Ondrej Slabý
Masaryk Memorial Cancer Institute

Muneesh Tewari
Fred Hutchinson Cancer Research Center

Cancer Sequencing: Elaine Mardis

Genome Technology: Which sequencing platform do you use for your studies, and why?

Elaine Mardis: We use the Illumina GAIIx system for our initial sequencing data production. At present, we are using 2 × 100 basepair paired-end reads in the 50G configuration. We typically construct two libraries with size fractions that differ by about 100 basepairs for each genomic DNA sample. Approximately equal numbers of flow cell lanes are generated from each library. We use the Roche/454 Titanium platform to validate our predicted somatic point mutations and insertions/deletions, following site-specific amplification (PCR) of each locus in the tumor and normal DNA from the patient.
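The run layout above sets how much raw sequence each genome receives. As a rough worked example, expected mean depth is the total number of sequenced bases divided by genome size. The sketch below assumes a haploid human genome of about 3.1 Gb and uses hypothetical read-pair yields and library names; the interview gives no actual yields, and duplicate removal and unaligned reads would lower the usable depth.

```python
# Rough fold-coverage arithmetic for 2 x 100 bp paired-end runs.
# Genome size and per-library yields are illustrative assumptions,
# not figures from the interview.

GENOME_SIZE_BP = 3_100_000_000  # approximate haploid human genome (assumption)


def fold_coverage(read_pairs: int, read_length: int = 100,
                  genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected mean depth = total sequenced bases / genome size."""
    total_bases = read_pairs * 2 * read_length  # two reads per pair
    return total_bases / genome_size


# Two libraries whose size fractions differ by about 100 bp
# (hypothetical names and yields).
library_yields = {
    "library_fraction_A": 450_000_000,
    "library_fraction_B": 450_000_000,
}

per_library = {name: fold_coverage(n) for name, n in library_yields.items()}
total_depth = sum(per_library.values())

if __name__ == "__main__":
    for name, depth in per_library.items():
        print(f"{name}: {depth:.1f}x")
    print(f"combined raw depth: {total_depth:.1f}x")
```

Splitting the yield roughly evenly across two libraries, as described above, means neither library's particular biases dominate the combined coverage.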

GT: How do you account for, and resolve, artifacts in your sequence?

EM: Although our library methods are tuned to reduce duplicate sequences, which essentially arise from PCR bias in the amplification steps of library construction, we still see a low percentage of reads that appear to be true duplicates. These are detected after alignment, and we reduce the coverage contributed by that fragment to one representative read pair that remains aligned to the genome reference. The other "artifact" is that we see lower-than-average coverage in regions of the genome that have more than 95 percent G+C or A+T content. Unfortunately, there's not much we can do about this. It is a known representation issue in the Illumina system that we first reported when we re-sequenced the C. elegans type strain in a 2008 Nature Methods paper.
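Both issues can be illustrated in a few lines. The sketch below is a simplified illustration, not the Genome Center's pipeline. It treats read pairs that share chromosome, both mapped start positions, and orientation as copies of one fragment and keeps the pair with the highest summed base quality. It also flags fixed-size windows whose G+C or A+T fraction exceeds 95 percent, the regions described above as under-covered. The `ReadPair` record and the 100 bp window are hypothetical. Real duplicate markers, for example Picard MarkDuplicates, also account for soft clipping and library identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Minimal aligned read-pair record (hypothetical, not a BAM parser)."""
    name: str
    chrom: str
    start1: int       # leftmost aligned position of read 1
    start2: int       # leftmost aligned position of its mate
    strand1: str      # orientation of read 1: "+" or "-"
    quality_sum: int  # summed base qualities, used to pick a representative


def collapse_duplicates(pairs):
    """Keep one read pair per (chrom, start1, start2, strand1) key.

    Pairs sharing all four values are treated as PCR copies of one fragment;
    the pair with the highest summed base quality is retained.
    """
    best = {}
    for pair in pairs:
        key = (pair.chrom, pair.start1, pair.start2, pair.strand1)
        if key not in best or pair.quality_sum > best[key].quality_sum:
            best[key] = pair
    return list(best.values())


def extreme_composition_windows(seq, window=100, threshold=0.95):
    """Return start offsets of non-overlapping windows whose G+C or A+T
    fraction exceeds `threshold`; such regions tend to be under-covered."""
    seq = seq.upper()
    flagged = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        at = (chunk.count("A") + chunk.count("T")) / window
        if gc > threshold or at > threshold:
            flagged.append(start)
    return flagged
```

Flagged windows are useful mainly as an annotation, so that low coverage there is not mistaken for a deletion.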

GT: Which tools do you use for sequence analyses?

EM: For alignment, most of our analysis pipelines use the BWA aligner. For SNV detection in tumor genomes, we use our own glfSomatic algorithm. For indel detection, we use the SAMtools indel detection software with modified parameters, but we are also investigating Pindel and like its performance very much. For detecting structural variants, we use the BreakDancer algorithm that Ken Chen published recently. Once we have identified the putative structural variants, a filtering process lowers the false-positive rate, which is driven largely by the repetitive content of the genome. We further investigate each structural variant by a localized assembly of the reads that identify it. For these assemblies, we've developed the Tigra assembler (co-developed by Lei Chen and Ken Chen, not yet published). We use the assembled sequence for each structural variant to design PCR primers that are used to validate (or fail to validate) the variant region. We find that if a good assembly of a suspected structurally altered region can be obtained after all the filtering steps, then it typically will validate.
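The interview does not describe the filter itself. As a hypothetical illustration of the idea, the sketch below drops putative intra-chromosomal structural variant calls that have few supporting read pairs, or whose breakpoint neighborhoods fall mostly inside annotated repeats. The `SVCall` record, the repeat-interval dictionary, and every threshold are assumptions for the example, not BreakDancer output fields or the group's actual settings.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    """Intra-chromosomal SV call with two breakpoints (hypothetical record)."""
    chrom: str
    pos1: int
    pos2: int
    sv_type: str
    supporting_pairs: int


def overlap_fraction(start, end, intervals):
    """Fraction of [start, end) covered by non-overlapping (start, end) intervals."""
    covered = sum(max(0, min(end, e) - max(start, s)) for s, e in intervals)
    return covered / (end - start)


def filter_sv_calls(calls, repeats, min_pairs=3, flank=200, max_repeat_frac=0.5):
    """Drop calls with weak read support or breakpoints that sit mostly in repeats.

    repeats: {chrom: [(start, end), ...]} from a repeat annotation.
    All thresholds are illustrative defaults, not published settings.
    """
    kept = []
    for call in calls:
        reps = repeats.get(call.chrom, [])
        worst = max(
            overlap_fraction(call.pos1 - flank, call.pos1 + flank, reps),
            overlap_fraction(call.pos2 - flank, call.pos2 + flank, reps),
        )
        if call.supporting_pairs >= min_pairs and worst <= max_repeat_frac:
            kept.append(call)
    return kept
```

Calls that survive a filter like this would then go on to local assembly and primer design, as described above.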

MicroRNAs in Cancer: Ondrej Slabý and Muneesh Tewari

Genome Technology: How do you normalize experimental qRT-PCR data to generate candidate miRNA markers?

Ondrej Slabý: We use two standard approaches to get normalized miRNA qRT-PCR data. First, we quantify RNA amount and quality in our samples with an Agilent 2100 Bioanalyzer to ensure equivalent sample loading. Following ABI's recommendations, all samples are diluted to a final concentration of 2 ng/μl before proceeding to miRNA-specific RT. Second, we use endogenous controls to correct for potential biases in RNA input or RT efficiency. Our projects focus on miRNA significance in solid cancers, and in each study we usually work with one type of tissue for which a stable endogenous control can be identified. Therefore, we select an endogenous control for each study. In our experience, a good starting point is three candidate controls (RNU6B, RNU44, and RNU48), from which we select the assay with the lowest variability in the particular tissue or, in the case of in vitro experiments, the highest stability under different treatments. We then use the ΔΔCt relative quantitation method.
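The two steps above can be sketched as follows, assuming "lowest variability" is measured as the standard deviation of a candidate's Ct values across samples. Dedicated stability algorithms such as geNorm or NormFinder are more thorough. Relative expression then follows the 2^-ΔΔCt formula, which assumes roughly 100 percent amplification efficiency for both assays. All Ct values shown are made up for illustration.

```python
from statistics import stdev


def pick_endogenous_control(cts_by_control):
    """Return the candidate whose Ct values vary least across samples (lowest SD)."""
    return min(cts_by_control, key=lambda name: stdev(cts_by_control[name]))


def fold_change_ddct(target_ct, control_ct, ref_target_ct, ref_control_ct):
    """Relative quantity of a target miRNA by the 2^-ddCt method.

    target_ct / control_ct: Cts in the sample of interest.
    ref_target_ct / ref_control_ct: Cts in the reference (calibrator) sample.
    """
    delta_sample = target_ct - control_ct
    delta_reference = ref_target_ct - ref_control_ct
    delta_delta = delta_sample - delta_reference
    return 2 ** (-delta_delta)


# Illustrative (invented) Ct values for the three candidate controls.
candidate_cts = {
    "RNU6B": [24.1, 25.9, 23.5, 26.2],
    "RNU44": [26.0, 26.2, 25.9, 26.1],
    "RNU48": [22.0, 22.8, 21.5, 23.0],
}
chosen = pick_endogenous_control(candidate_cts)
```

With these numbers RNU44 is chosen. A target whose ΔCt is 2 cycles lower in the sample than in the calibrator comes out as a fourfold increase.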

GT: How can you optimize your approach to obtain sensitive and specific miRNA detection?

OS: We exclusively use Applied Biosystems' pre-designed and well-validated TaqMan assays to measure mature miRNAs. Although they are more expensive, we use ABI's TaqMan assays because they are more specific than SYBR Green. We use the ABI 7900HT Real-Time PCR Instrument. Cts in the low twenties are good for mature miRNA detection, whereas Ct values of 25 to 30 are typical. When you use TaqMan probes, higher Cts are also acceptable, because they are usually associated with very little background or noise. ABI's approach is based on stem-loop primers for reverse transcription, which are specific for mature miRNAs only. They do not reverse-transcribe pri- or pre-miRNAs at all and therefore produce no interference.

GT: What steps do you take to ensure data reproducibility?

OS: We run qRT-PCR assays in 96-well plates for individual miRNA expression assays and use TaqMan Human MicroRNA Set Cards v2.0 (754 miRNAs) for miRNA expression profiling. To get robust and reproducible data, we perform three technical replicates and three experimental replicates per treatment group. No-template and no-primer controls are included.
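One way to summarize such a design is to average the technical replicates within each experimental replicate and then report the mean and spread across experimental replicates. The sketch below does this and flags any experiment whose technical replicates disagree by more than a hypothetical 0.5-cycle standard deviation. That cutoff is an illustrative assumption, not a value given in the interview.

```python
from statistics import mean, stdev


def summarize_replicates(cts_by_experiment, max_technical_sd=0.5):
    """Summarize Ct values for one miRNA in one treatment group.

    cts_by_experiment: {experimental_replicate_id: [technical-replicate Cts]}.
    Technical replicates are averaged within each experimental replicate;
    experiments whose technical SD exceeds max_technical_sd (an illustrative
    cutoff) are flagged for review rather than silently dropped.
    Returns (mean across experiments, SD across experiments, flagged ids).
    """
    experiment_means = []
    flagged = []
    for exp_id, technical in cts_by_experiment.items():
        if stdev(technical) > max_technical_sd:
            flagged.append(exp_id)
        experiment_means.append(mean(technical))
    return mean(experiment_means), stdev(experiment_means), flagged


# Three experimental replicates, each with three technical replicates (invented).
group = {
    "exp1": [25.0, 25.1, 24.9],
    "exp2": [26.0, 26.0, 26.0],
    "exp3": [24.0, 25.5, 27.0],
}
group_mean, group_sd, needs_review = summarize_replicates(group)
```

Keeping the two levels separate matters because technical scatter reflects pipetting and instrument noise, while the spread across experiments reflects the biological and procedural variability that statistical tests should use.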

Genome Technology: How do you normalize experimental qRT-PCR data to generate candidate miRNA markers?

Muneesh Tewari: For miRNA analysis of RNA from cells or tissues, we use one or more snoRNAs as endogenous controls. For miRNA analysis from plasma or serum, we do not have much confidence in specific miRNAs as endogenous controls, because little is known about how miRNA levels in these kinds of samples vary from person to person (or even from time to time in the same person), nor about which factors cause that variability and which miRNAs they affect. Therefore, we simply spike in a set of three synthetic versions of nonhuman C. elegans miRNAs, which lets us adjust for variations in RNA recovery from plasma or serum. RNA yields from a few hundred microliters of plasma or serum are typically too low to quantify spectrophotometrically, so the spiked-in internal normalizers are useful. They help correct for variability due to differences in RNA isolation efficiency or the presence of endogenous PCR inhibitors, but they don't correct for inherent biological variability, which is poorly understood.
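A minimal sketch of spike-in normalization, assuming equal amounts of each synthetic spike-in are added to every sample before RNA isolation. Each sample's mean spike-in Ct is compared with the mean over all samples, and the difference is removed from that sample's target Cts. The identifiers `cel-miR-39`, `cel-miR-54`, and `cel-miR-238` are used here as example spike-in names, and `miR-X` is a hypothetical target. This is an illustration of the principle, not the lab's exact calculation.

```python
from statistics import mean

# Example identifiers for three synthetic C. elegans spike-ins (assumed names).
SPIKE_INS = ("cel-miR-39", "cel-miR-54", "cel-miR-238")


def spike_in_normalize(cts_by_sample, spike_ins=SPIKE_INS):
    """Adjust target-miRNA Cts for sample-to-sample differences in recovery.

    cts_by_sample: {sample: {assay: Ct}} including the spike-in assays.
    Each sample's mean spike-in Ct is compared with the mean over all samples;
    that offset is subtracted from the sample's target Cts, so a sample with
    poor recovery (high spike-in Cts) has its target Cts shifted down.
    """
    spike_mean = {
        sample: mean(cts[name] for name in spike_ins)
        for sample, cts in cts_by_sample.items()
    }
    overall = mean(spike_mean.values())
    normalized = {}
    for sample, cts in cts_by_sample.items():
        offset = spike_mean[sample] - overall
        normalized[sample] = {
            assay: ct - offset for assay, ct in cts.items() if assay not in spike_ins
        }
    return normalized


# Invented example: plasma_B recovered less RNA (spike-ins two cycles later).
data = {
    "plasma_A": {"cel-miR-39": 20.0, "cel-miR-54": 21.0, "cel-miR-238": 22.0, "miR-X": 30.0},
    "plasma_B": {"cel-miR-39": 22.0, "cel-miR-54": 23.0, "cel-miR-238": 24.0, "miR-X": 31.0},
}
adjusted = spike_in_normalize(data)
```

After adjustment, miR-X appears more abundant in plasma_B than in plasma_A, even though its raw Ct was higher, because plasma_B's poorer recovery has been corrected. As noted above, this corrects for technical recovery and inhibition, not for biological variation.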

GT: How can you optimize your approach to obtain sensitive and specific miRNA detection?

MT: Use large reaction volumes so you can input the greatest amount of RNA possible, use TaqMan probe-based approaches if specificity is critical, and run dilutions of the RNA sample to be certain that measurements obtained are in a linear range of quantification.
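Checking that measurements fall in the linear range usually means fitting Ct against the log of input across a dilution series. The sketch below performs an ordinary least-squares fit and reports the slope, R², and the amplification efficiency implied by the slope, 10^(-1/slope) - 1. A slope near -3.32 corresponds to about 100 percent efficiency, and a window of roughly 90 to 110 percent is a commonly used rule of thumb. The dilution values are invented.

```python
import math


def standard_curve(rna_inputs, cts):
    """Fit Ct = slope * log10(input) + intercept by least squares.

    Returns (slope, intercept, r_squared, efficiency), where efficiency is
    10**(-1/slope) - 1 (1.0 means perfect doubling each cycle).
    """
    xs = [math.log10(v) for v in rna_inputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, r_squared, efficiency


# Tenfold dilution series in arbitrary input units (invented Cts).
inputs = [1, 10, 100, 1000]
cts = [30.0, 26.68, 23.36, 20.04]
slope, intercept, r2, efficiency = standard_curve(inputs, cts)
```

If the most dilute points drift off the fitted line, a sample's Ct in that region should not be trusted for quantification.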

GT: What steps do you take to ensure data reproducibility?

MT: This is done primarily by running replicates within an experiment and repeating the entire experiment in an independent run.

Methylation in Cancer: Irma Russo, Sandra Fernandez, Lifang Hou, and Simon Lin

Genome Technology: Which techniques do you use for genome-wide screenings of methylation status? Why?

Irma Russo, Sandra Fernandez: Initially, we analyzed the methylation status of CpG islands in the estrogen receptor gene in bisulfite-treated genomic DNA that was amplified by PCR and analyzed by pyrosequencing on the PSQ 96MA from Qiagen. We found that pyrosequencing was excellent for direct quantitative sequencing; however, the method was limited by sequence read length, so we could assess only a few CpGs in any one pyrosequencing reaction. This major drawback made the technique expensive, time-consuming, and too limited for testing multiple tissues and experimental conditions, as well as clinical specimens. To address the need for genome-wide screening, we established a collaboration to perform restriction landmark genomic scanning (RLGS). This method provides a quantitative assessment of cytosine methylation at thousands of CpG islands in a single gel. Genomic DNA is digested with a restriction enzyme that cannot cleave methylated sites within CpG islands, such as NotI or AscI. The cleaved ends are radiolabeled, digested with a second restriction enzyme, and electrophoresed through a tube-shaped agarose gel. The DNA is then digested by a third, more frequently cutting restriction enzyme and electrophoresed, perpendicular to the first separation, through a non-denaturing polyacrylamide gel, which is autoradiographed; the radiolabeled NotI or AscI sites serve as landmarks. The resulting RLGS profile displays both the copy number and the methylation status of the CpG islands. We analyze RLGS fragments using software with automated spot-detection algorithms, including Conime. We validate RLGS results by testing the same specimens with methylation-specific PCR (MSP).
This technique is sensitive and specific for methylation of virtually any block of CpG sites in a CpG island, and it also distinguishes DNA that has reacted incompletely with bisulfite, because marked sequence differences exist between bisulfite-modified and unmodified DNA. Methylation-specific PCR requires very small quantities of DNA, is sensitive to 0.1 percent methylated alleles of a given CpG island locus, and can be performed on DNA extracted from paraffin-embedded samples. Although MSP is a simple and rapid method for determining the methylation pattern of a particular locus, it allowed us to study only a limited number of genes. For profiling DNA methylation at genome-wide scale, we use methylated DNA immunoprecipitation combined with DNA chip technology (mDIP-Chip). The mDIP-Chip method identifies DNA methylation in gene promoters, CpG islands, introns, exons, and intergenic regions by enriching methylated DNA fragments through immunoprecipitation with a monoclonal antibody against 5-methylcytosine. We've used this technique to study DNA methylation changes in human breast epithelial cells treated with the xenoestrogen bisphenol A and in different breast lesions. We have also isolated methylated double-stranded DNA via binding to the methyl-CpG binding domain of the human MBD2 protein. The high affinity of MBD2 for CpG-methylated DNA provides greater sensitivity than the antibody. We use sonication to fragment the DNA (150 to 500 basepairs) and isolate the methylated DNA via binding to MBD2. We amplify methylated fragments using the GenomePlex Whole Genome Amplification kit from Sigma and then hybridize them to the Affymetrix Human Promoter 1.0R Array. We analyze our data using CisGenome, and we validate genes identified as methylated using MSP as well as by evaluating mRNA expression levels with the Affymetrix U133 Plus 2.0 chip.
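Both pyrosequencing and MSP rely on the same chemistry: bisulfite turns unmethylated cytosines into uracil (read as T after PCR), while methylated cytosines stay C. The sketch below converts a top-strand sequence in silico under two simplifying assumptions: conversion is complete, and methylation occurs only at CpG sites. It produces the "methylated" and "unmethylated" templates against which MSP primer pairs are designed. It also shows how a per-CpG methylation percentage follows from the C and T signals, as in quantitative pyrosequencing. Function names and signal values are illustrative.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Return the top strand as read after bisulfite treatment and PCR.

    Assumes complete conversion and methylation only at CpG sites:
    if methylated_cpg is True, C in CpG context stays C; every other C reads as T.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (in_cpg and methylated_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)


def percent_methylation(c_signal, t_signal):
    """Methylation at one CpG as the C fraction of the C/T signal."""
    return 100.0 * c_signal / (c_signal + t_signal)


original = "ACGTCCGA"
methylated_template = bisulfite_convert(original)                           # "ACGTTCGA"
unmethylated_template = bisulfite_convert(original, methylated_cpg=False)   # "ATGTTTGA"
```

Because the two templates differ at every CpG, primers that end on those positions amplify only one methylation state. This is also why a leftover non-CpG C, a sign of incomplete conversion, is detectable.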

Genome Technology: Which techniques do you use for genome-wide screenings of methylation status? Why?

Lifang Hou: We use the Illumina HM27 BeadChip, which has high reproducibility in technical replicates. In contrast to the tiling arrays used on other platforms, the Infinium assay is the only microarray technology that can quantitatively measure DNA methylation at single-CpG-site resolution. This allows direct comparison with pyrosequencing, which also has single-site resolution, for further verification and validation. Pre-microarray processing of DNA samples for the HM27 BeadChip relies on a bisulfite treatment that is better standardized, less laborious, and less expensive. The sample requirement is low (500 ng of DNA), which enables analysis of limited DNA sources, and microarray costs are low. The platform is therefore also suitable for large-scale population studies.
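Single-CpG quantification on Infinium arrays is usually reported as a beta value. As commonly described for Illumina's software, this is the methylated probe signal divided by the total signal plus an offset of 100, with negative intensities floored at zero. The sketch below implements that definition. The intensity values are invented, and the offset is exposed as a parameter in case a pipeline uses a different one.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Infinium-style methylation level for one CpG probe.

    beta = M / (M + U + offset), with negative intensities floored at zero.
    The offset keeps beta stable when both signals are low.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Invented intensities for three probes.
mostly_methylated = beta_value(900, 0)    # 0.9
unmethylated_site = beta_value(0, 500)    # 0.0
half_methylated = beta_value(450, 450)    # 0.45
```

Because beta runs from 0 to 1, it can be compared directly with the percent methylation that pyrosequencing reports for the same CpG.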

GT: Which assay types do you employ in your studies, and how do you optimize them for PCR?

LH: HM27 offers the opportunity to run 12 methylation arrays in tandem with 12 mRNA expression arrays (using HT-12). As such, epigenetic profiles can be measured and interpreted together with gene expression profiles. Given the relatively low price of both product lines, we think this combination will become a popular test.
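With paired methylation and expression measurements from the same samples, a common first step is to correlate each gene's methylation level with its expression across samples. Promoter hypermethylation often shows up as a negative correlation. The sketch below assumes both inputs are already summarized per gene, with samples in the same order. Probe-to-gene mapping is not shown, and the gene names are hypothetical. A gene with zero variance in either measurement would raise a division error and should be filtered out first.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def methylation_expression_correlations(beta_by_gene, expr_by_gene):
    """Correlate methylation (beta) with expression for genes present in both.

    Both inputs: {gene: [value per sample]}, samples in the same order.
    """
    shared = sorted(set(beta_by_gene) & set(expr_by_gene))
    return {gene: pearson(beta_by_gene[gene], expr_by_gene[gene]) for gene in shared}


# Hypothetical genes measured in three samples.
betas = {"GENE1": [0.1, 0.5, 0.9], "GENE2": [0.2, 0.4, 0.6]}
expression = {"GENE1": [9.0, 5.0, 1.0], "GENE3": [1.0, 2.0, 3.0]}
correlations = methylation_expression_correlations(betas, expression)
```

With only a few samples, correlations like these are unstable, so they are better treated as a way to rank candidates than as evidence on their own.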

GT: Which analytical tools do you use to map the methylome?

Simon Lin: Currently, we use the methylumi package in Bioconductor to load the Illumina methylation array data. Bioconductor is an open-source, versatile toolset for computational and statistical data analysis. Each package in Bioconductor is like a Lego piece; the user can creatively put them together to form a custom data analysis pipeline. The methylumi package is, however, outdated: it was originally designed for GoldenGate-based assays. Even so, our group finds that it can robustly handle Infinium-based data, too. Because methylumi is open source, we can easily modify it and bring it up to date, and we will share these modifications with the Bioconductor user community. The real challenge is to interpret the methylome data in the context of the transcriptome. Gene Ontology can be a focal point for both data integration and biological interpretation.
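One simple way to use Gene Ontology as that focal point is an over-representation test. It asks whether genes that are, for example, both differentially methylated and differentially expressed are enriched among the genes annotated to a GO term, relative to a background set. The sketch below computes the one-sided hypergeometric p-value using only the standard library. Term names and gene sets are hypothetical, and a real analysis would also correct for multiple testing and account for the GO hierarchy.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (one-sided enrichment p-value)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def go_enrichment(hits, background, go_terms):
    """Return {term: (overlap, p_value)} for each term.

    hits: genes of interest; background: all tested genes;
    go_terms: {term: set of annotated genes}.
    """
    n_background = len(background)
    n_hits = len(hits & background)
    results = {}
    for term, genes in go_terms.items():
        annotated = genes & background
        overlap = len(annotated & hits)
        results[term] = (overlap, hypergeom_sf(overlap, n_background, len(annotated), n_hits))
    return results


# Hypothetical example: 10 background genes, 2 hits, two made-up terms.
background = {f"g{i}" for i in range(10)}
hits = {"g0", "g1"}
terms = {"term_A": {"g0", "g1"}, "term_B": {"g5", "g6", "g7"}}
enrichment = go_enrichment(hits, background, terms)
```

Here both hits fall in term_A, which has only two annotated genes, giving p = 1/45. Term_B has no overlap and p = 1.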

Cancer of Unknown Primary: F. Anthony Greco and David Bowtell

Genome Technology: How do you best isolate RNA from FFPE tissues for gene expression microarray profiling?

F. Anthony Greco: I have most often used the RT-PCR platform, wherein unstained, or "blank," slides are made from the paraffin-embedded block. A pathologist reviews a section, and the tumor is then isolated: it is scraped from the adjacent unstained sections, deparaffinized, and incubated with proteinase K overnight. This is a standard procedure for this type of RT-PCR assay. The mRNA is then extracted, amplified, and analyzed. I send the biopsy specimens to a laboratory that specializes in these assays. In my opinion, RT-PCR is much simpler and easier than microarray profiling.

GT: What should you do when the primary site predicted by gene expression microarray is inconsistent with the clinical scenario?

AG: First of all, primary-site prediction based on gene expression profiling in patients with cancer of unknown primary site is at a very early stage of development. These tests are not yet fully validated, but they look promising. These gene expression microarray (GEM) and RT-PCR assays are used in concert with clinical features and standard pathology, particularly immunohistochemical markers or stains. All of this information needs to be considered together: if a gene expression profiling test told me one thing, and it was not consistent with the clinical picture and the other pathology, most of the time I would think that the gene expression assay may not be correct. There are exceptions. Sometimes we verify that the gene expression assay prediction is likely correct by doing additional immunohistochemical marker stains that were not done initially. All three sources of information (specialized pathology, mainly immunohistochemical markers; the clinical setting and features; and, lastly, the molecular or gene expression assay) should be used in concert to help decide the primary tumor site. Immunohistochemical marker stains are standard. The gene expression profiling assays look at genes rather than proteins, but the principle is the same. One can look at many genes with a single molecular assay, whereas with immunohistochemistry one looks at a single protein with each antibody.

Genome Technology: What should you do when the primary site predicted by gene expression microarray is inconsistent with the clinical scenario?

David Bowtell: It depends on the confidence in the clinical information and the known accuracy of the test for the specific classification. The prediction may be extremely unlikely; for example, prostate cancer in a woman would be extremely improbable. The next most certain clinical situation is one in which the test is ordered to clarify which of a restricted number (two or three) of possibilities the cancer is. Beyond that, given that we are dealing with CUP (cancer of unknown primary), I'd say that any clinical information will be shaky and that the clinical information and the GEM prediction have to be evaluated on their relative merits. Given the state of development of these assays, they should be used to guide further, hopefully confirmatory, investigations rather than being treated as the final word.
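One way to formalize the weighing of clinical confidence against test accuracy is Bayes' theorem. This is not a method either expert describes; it is an illustrative framing with invented numbers. The clinical picture supplies a prior probability that a given site is the primary, the test's known sensitivity and false-positive rate for that site supply the likelihoods, and the result is the probability that a prediction of that site is correct.

```python
def posterior_correct(prior, sensitivity, false_positive_rate):
    """Probability that the predicted site is the true primary, given the call.

    prior: clinical probability (before the test) that this site is the primary.
    sensitivity: P(test predicts this site | site is the true primary).
    false_positive_rate: P(test predicts this site | site is not the primary).
    """
    numerator = sensitivity * prior
    denominator = numerator + false_positive_rate * (1 - prior)
    return numerator / denominator if denominator > 0 else 0.0


# Invented accuracy figures for illustration only.
impossible_site = posterior_correct(0.0, 0.9, 0.1)   # e.g. prostate in a woman
two_way_question = posterior_correct(0.5, 0.9, 0.1)  # one of two clinical candidates
vague_prior = posterior_correct(0.1, 0.9, 0.1)
```

A prior of zero cannot be overturned by any test result, which matches the prostate example. When the clinical question is narrowed to two possibilities, a reasonably accurate test moves the probability substantially (0.9 here). With a weak prior, the same test leaves roughly even odds (0.5 here), which is why such a prediction is better used to direct confirmatory work than taken as final.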

List of Resources

Sometimes you need more information. Here are more sources that may help you answer your cancer research questions.


Calvanese V, Horrillo A, Hmadcha A, Suarez-Alvarez B, Fernandez AF, Lara E, Casado S, Menendez P, Bueno C, Garcia-Castro J, Rubio R, Lapunzina P, Alaminos M, Borghese L, Terstegge S, Harrison NJ, Moore HD, Brüstle O, Lopez-Larrea C, Andrews PW, Soria B, Esteller M, Fraga MF. (2008). Cancer genes hypermethylated in human embryonic stem cells. PLoS One. 3(9): e3294.

Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, McGrath SD, Wendl MC, Zhang Q, Locke DP, Shi X, Fulton RS, Ley TJ, Wilson RK, Ding L, Mardis ER. (2009). BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature Methods. 6(9): 677-81.

Fernandez SV, Snider KE, Wu YZ, Russo IH, Plass C, Russo J. (2010). DNA methylation changes in a human cell model of breast cancer progression. Mutation Research. Epub ahead of print.

Herman JG, Graff JR, Myöhänen S, Nelkin BD, Baylin SB. (1996). Methylation-specific PCR: A novel PCR assay for methylation status of CpG islands. Proceedings of the National Academy of Sciences. 93: 9821-6.

Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang W, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK, Mardis ER. (2008). Whole-genome sequencing and variant discovery in C. elegans. Nature Methods. 5(2): 183-8.

Kroh EM, Parkin RK, Mitchell PS, Tewari M. (2010). Analysis of circulating microRNA biomarkers in plasma and serum using quantitative reverse transcription-PCR (qRT-PCR). Methods. 50(4): 298-301.

Linsen SE, de Wit E, Janssens G, Heater S, Chapman L, Parkin RK, Fritz B, Wyman SK, de Bruijn E, Voest EE, Kuersten S, Tewari M, Cuppen E. (2009). Limitations and possibilities of small RNA digital gene expression profiling. Nature Methods. 6(7): 474-6.

Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC, Fulton RS, Delehaunty KD, McGrath SD, Fulton LA, Locke DP, Magrini VJ, Abbott RM, Vickery TL, Reed JS, Robinson JS, Wylie T, Smith SM, Carmichael L, Eldred JM, Harris CC, Walker J, Peck JB, Du F, Dukes AF, Sanderson GE, Brummett AM, Clark E, McMichael JF, Meyer RJ, Schindler JK, Pohl CS, Wallis JW, Shi X, Lin L, Schmidt H, Tang Y, Haipek C, Wiechert ME, Ivy JV, Kalicki J, Elliott G, Ries RE, Payton JE, Westervelt P, Tomasson MH, Watson MA, Baty J, Heath S, Shannon WD, Nagarajan R, Link DC, Walter MJ, Graubert TA, DiPersio JF, Wilson RK, Ley TJ. (2009). Recurring mutations found by sequencing an acute myeloid leukemia genome. New England Journal of Medicine. 361(11): 1058-66.

Slaby O, Svoboda M, Michalek J, Vyzula R. (2009). MicroRNAs in colorectal cancer: translation of molecular biology into clinical application. Molecular Cancer. 8: 102.

Tothill RW, Kowalczyk A, Rischin D, Bousioutas A, Haviv I, van Laar RK, Waring PM, Zalcberg J, Ward R, Biankin AV, Sutherland RL, Henshall SM, Fong K, Pollack JR, Bowtell DD, Holloway AJ. (2005). An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin. Cancer Research. 65(10): 4031-40.

Varadhachary GR, Talantov D, Raber MN, Meng C, Hess KR, Jatkoe T, Lenzi R, Spigel DR, Wang Y, Greco FA, Abbruzzese JL, Hainsworth JD. (2008). Molecular profiling of carcinoma of unknown primary and correlation with clinical evaluation. Journal of Clinical Oncology. 26(27): 4442-8.