Validating Cancer Biomarkers

Table of Contents

Letter from the Editor
Index of Experts
Q1: What are your standards for sample collection to ensure the best outcomes for biomarker screening?
Q2: How do you ensure sensitivity and specificity in looking for cancer biomarkers?
Q3: What genomic or proteomic technologies do you use in biomarker discovery, and why?
Q4: What is your process for validating a cancer biomarker?
Q5: How do you statistically evaluate biomarker discovery data?
Q6: In scientific studies from other labs, what key characteristics do you look for to ascertain that a biomarker has been properly validated for a clinical setting?
List of Resources

Letter from the Editor

This month, GT brings you a technical guide on validating cancer biomarkers. While the topic spans a breadth of literature, we decided to focus on the nuts and bolts of what it takes for a biomarker to make it to clinical validity.

While many new 'omics tools have come onto the scene for discovering tumor markers, especially advanced mass spectrometry and other protein expression analysis methods, validating cancer biomarkers continues to pose serious challenges. Not only do the limitations of serum for comparative protein profiling need to be better understood, but improved catalogs of cancer cell secretomes could also greatly speed progress.

Some of the biggest challenges in validating cancer biomarkers involve making them trustworthy for both clinical detection and diagnosis of tumors. Study design, especially enrolling significant numbers of test subjects, and reliable statistical methodology are both essential for proper clinical validation. In this guide, we've taken an A-to-Z look at what it takes to validate a tumor marker, and we've gathered experts from a wide range of clinical posts. In their answers, our experts cover standards for sample collection, ensuring sensitive and specific detection, and the statistical challenges of evaluating biomarker discovery data for use in a clinical setting. In addition to their instructive remarks, be sure to check out our handy resource section at the back of the guide.

— Jeanene Swanson

Index of Experts

Genome Technology would like to thank the following contributors for taking the time to respond to the questions in this tech guide.

Lucila Ohno-Machado
Harvard Medical School
Brigham and Women's Hospital

David Rimm
Yale University School of Medicine

Daniel Sargent
Mayo Clinic

Mark Sherman
National Cancer Institute

David Wong
UCLA School of Dentistry

Yan Xiao
National Institute of Standards and Technology

Zhen Zhang
Johns Hopkins Medical Institutions

Q1: What are your standards for sample collection to ensure the best outcomes for biomarker screening?

For tissue biomarkers, the most reliable specimen is a core biopsy that is formalin-fixed immediately after harvest. Resection specimens have variable cold ischemic time, depending on the surgical manipulations required in each resection and, possibly even more, on the time and temperature of the specimen after resection and prior to sampling or formalin fixation.

— David Rimm

This depends greatly on the stage of biomarker development. In the initial stages, sample collection should be highly standardized to ensure minimal heterogeneity. However, in later stages, conditions similar to how the biomarker would be used in practice are allowable, and in some senses even desirable. The best way to test whether a marker will work in the 'real world' is to perform the validation studies in conditions as close as possible to the clinical environment in which the biomarker will be used.

— Daniel Sargent

For systemic markers (e.g., blood or urine), patient factors such as recent testing or biopsy, stress, food or medication intake, menstrual cycling, diurnal variation, and other factors may be important. If these factors are not measured or cannot be standardized, their impact on test results should be assessed. A pitfall in biomarker studies is to focus on a particular molecular target while ignoring broader issues concerning whether the subjects are representative of those with the disease in the population. For example, one cannot assume that patients whose tumors express the same target but differ in age, race, and socioeconomic status have biologically identical tumors.

— Mark Sherman

Standardization of sample collection, processing, and stabilization is key to biomarker studies. We routinely have subjects come in between 9 and 11 am after fasting for two hours, and rinse twice with water before providing a saliva sample. Collected saliva is always kept on ice. Sample processing always occurs within an hour of saliva collection.

— David Wong

As a member of the Early Detection Research Network, our laboratory follows the guidelines for creating the Standard Specimen Reference Sets (SSRSs) in specimen selection, collection, processing, and storage.

— Yan Xiao

Q2: How do you ensure sensitivity and specificity in looking for cancer biomarkers?

There will always be a tradeoff between sensitivity and specificity, and it is therefore sensible to combine results from different methods to ensure that the biomarkers that are discovered are not related to other processes (e.g., inflammation) and are specific for cancer. Depending on the application, ensuring high specificity may be more important than high sensitivity or vice versa. Characterizing the diagnostic value of a cancer biomarker is a critical point in understanding how/if it can be used in practice.

— Lucila Ohno-Machado

The highest sensitivity and specificity are generally desirable, but the right balance can be a function of the test. For example, there are some tests where sensitivity is so important that the prescribers of the test are willing to tolerate many false positive results (low specificity). Arguably, the estrogen receptor test in breast cancer is this sort of test. Since hormonal therapy is generally of low toxicity and can be very beneficial, often any positivity (even just a few cells) is considered a positive test (even though the threshold in most clinical trials of the drug was 10 percent of the cells or more). The low specificity is tolerated to give the patient every chance to receive the therapy. From a strict mathematical perspective, sensitivity and specificity are best assessed together using a Receiver Operating Characteristic (ROC) curve. The value of the test is then assessed by measuring the area under the curve (AUC). An AUC of 1.00 is a perfect test. In practice, most AUC values are between 0.7 and 0.9, and less than 0.7 is considered a failed test by statisticians. Most current tests are single-marker tests (ER, CEA, PSA, etc.) with limited AUCs. Tests are being developed that are multiplexed (many markers combined) to give optimal results. It seems likely these multiplexed tests will ultimately achieve higher AUCs.

— David Rimm
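
Rimm's AUC yardstick is easy to make concrete. The sketch below (a hypothetical illustration, not code from any contributor's lab) computes the area under the ROC curve via the rank-based identity: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting half.

```python
# Illustrative sketch: AUC from raw marker scores via pairwise comparison.
# AUC = P(random case scores higher than random control); 1.0 is a perfect
# test, ~0.5 is uninformative, and below 0.7 is conventionally a failure.

def roc_auc(case_scores, control_scores):
    """Area under the ROC curve; ties between case and control count as half."""
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical marker values for five cases and five controls
cases = [3.1, 2.7, 4.0, 1.9, 3.5]
controls = [1.2, 2.0, 0.8, 2.6, 1.5]
print(round(roc_auc(cases, controls), 2))  # prints 0.92
```

By this measure, the invented marker scores 0.92, comfortably above the 0.7 floor Rimm cites for a useful test.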

This is highly related to question #1. Maximal sensitivity and specificity will be obtained when the samples, and the associated patients, are as homogeneous as possible. The choice of the patient population is critical.

— Daniel Sargent

Sensitivity and specificity can be interpreted in two ways: analytically or clinically. Assessing analytical performance is difficult because assay performance using controls and subject samples may differ secondary to sample matrix, collection procedures, processing, storage, and other factors. Reproducibility may represent the first and most critical metric to assess because poor performance renders other parameters moot. Clinical assessment of sensitivity and specificity is strongly related to subject characteristics, especially ensuring that cases and controls are comparable and that their samples were tested under identical conditions. Although many biological relationships are linear, clinical decision-making often is not. Assay performance may be most critical around the threshold for determining a negative, positive, or categorical result. If measurements fall to extremes, then some loss of sensitivity is tolerable. However, specificity is almost always critical because false positive results may lead to biological misunderstanding or harmful clinical interventions. For rare diseases, poor specificity is especially problematic because false positives can easily outnumber true positives, resulting in low positive predictive value and high expense, patient anxiety, and wasted effort. Whereas lack of sensitivity may lead to withholding a targeted treatment, poor specificity may lead to false conclusions about efficacy in research or clinical harm.

— Mark Sherman
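
Sherman's warning about rare diseases follows directly from Bayes' rule, and a short calculation makes it vivid. The numbers below are invented for illustration: a test with 90 percent sensitivity and 95 percent specificity applied at a prevalence of 1 in 1,000.

```python
# Hypothetical illustration of Sherman's point: for rare diseases,
# even a fairly specific test yields mostly false positives.

def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive test results that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# 1-in-1,000 prevalence, 90% sensitivity, 95% specificity (invented numbers)
ppv = positive_predictive_value(0.001, 0.90, 0.95)
print(f"PPV = {ppv:.1%}")  # roughly 1.8%: about 55 false positives per true positive
```

Even with seemingly strong sensitivity and specificity, fewer than 2 in 100 positives are true positives, which is exactly the low positive predictive value Sherman describes.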

There are five attributes that influence this: 1) the quality of clinical samples; 2) the technology tools for biomarker discovery; 3) the biology of the biomarkers; 4) the statistical evaluation accompanying the discovery and validation process, which reveals discriminatory biomarkers and their disease-discriminatory value; and 5) the ability to validate the biomarkers. While the biology of the markers is a given, the technology tools (proteomics, genomics, metabolomics, microbial, miRNA, etc.) and statistical competence significantly factor into the outcome.

— David Wong

The key is proper study design, so that studies are sufficiently powered to detect clinically important markers. We include three different categories of control samples: ones with no disease, ones with benign diseases, and ones with other types of cancer. In this way, the specificity of biomarkers against potentially confounding conditions can be evaluated. Another point to consider is that a panel of biomarkers may provide a multi-factor combination of markers that is more sensitive and specific than any single marker.

— Yan Xiao

These have to be determined by the expected/planned use of the biomarkers. They differ from disease to disease (e.g., prostate cancer vs. ovarian cancer) and from one application to another (e.g., screening vs. early detection vs. diagnosis vs. prognosis vs. monitoring).

— Zhen Zhang

Q3: What genomic or proteomic technologies do you use in biomarker discovery, and why?

In my biomarker discovery lab, we exclusively use AQUA technology for quantitative assessment of absolute amounts of protein expression within a tissue sample. It is highly reproducible and easily compared to user-defined standards so measurements can be made that are independent of machine or operator or many other variables that creep into tissue biomarker discovery assays.

— David Rimm

Historically, molecular epidemiologic research has been hypothesis driven, and as such, biomarker studies have focused on analysis of candidate markers. This approach is now complemented by more "agnostic" marker profiling methods in which conclusions are often driven by statistical analyses rather than a priori hypotheses. The candidate approach is limited by our biological insights; profiling is limited by technical advances, false discovery secondary to low prior probabilities, and cost. In general, for cost savings/efficiency, we try to employ hierarchical approaches in which profiling is done on a subset of representative samples to identify specific markers, which can then be tested using targeted assays on entire study populations. We are moving increasingly towards multi-platform approaches that can strengthen findings through cross-validation. For example, identifying loss of protein expression by immunohistochemistry affirms methylation silencing; increased or decreased mRNA levels may correlate with gene amplification or deletion.

— Mark Sherman

For genomic tools, we engage in transcriptome profiling using gene-based (Affymetrix 133+2) and exon-based arrays (all exon). For proteomics discovery, 2D or LC/MS are standard approaches. These are the first two diagnostic alphabets in saliva.

— David Wong

Mitochondrial DNA is considered more susceptible to environmental damage than genomic DNA. As a result, mtDNA lesions may accumulate in the cell and ultimately lead to malignant transformation. In recent years, mtDNA lesions have been found to be characteristic of many cancers. We have recently initiated a project to identify oxidatively induced mtDNA lesions in cancer cell lines as potential biomarkers for early detection. We use liquid chromatography/mass spectrometry and gas chromatography/mass spectrometry to measure the levels of DNA base lesions in the mitochondrial genome.

— Yan Xiao

Genomic profiling, proteomic profiling (mass spectrometry, protein arrays).

— Zhen Zhang

Q4: What is your process for validating a cancer biomarker?

Although there is no substitute for wet lab experiments, we have computational tools to screen potential cancer biomarkers using publicly available data (from GEO and other sources), so that at least we can verify the potential of a biomarker to be specific for cancer, or even for a particular type of cancer.

— Lucila Ohno-Machado

Validation means many things to many people. To some, validation means reproducibility of the assay results across multiple large populations. Validation may also refer to the reagents used in the assay. For antibody reagents, this means using extensive extrinsic control series run with each assay to generate standard curves that show both linearity and a reproducible dynamic range with a reproducible detection threshold.

— David Rimm

Multiple steps are necessary. I become involved at the point the biomarker is ready to be tested on a moderate to large number of patients. A key item that is frequently overlooked is the need for a) pre-specification and b) replication in the validation process. For a biomarker to be truly validated, the study must pre-specify all aspects of the analytical and statistical plan. This includes the primary endpoint, the biomarker cut-off values, and the statistical methodology. Even in this case, if multiple (hundreds or thousands of) possible markers are being studied, the chance of a false positive is high. This is why independent replication in a second study is critical; it greatly reduces the chance of a false positive if the same pre-specified marker is identified as important in two independent studies.

— Daniel Sargent
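
The arithmetic behind Sargent's replication argument is worth spelling out. Assuming, hypothetically, 1,000 candidate markers that are all truly null and a 0.05 significance threshold:

```python
# Back-of-the-envelope sketch (invented numbers): expected chance
# findings when screening many null markers at significance level alpha.

n_markers = 1000
alpha = 0.05

# Expected false positives in a single study if no marker is real:
single_study = n_markers * alpha
# Expected to clear the threshold again by chance in an independent study:
replicated = single_study * alpha

print(single_study, replicated)  # about 50 chance hits; about 2.5 replicate by chance
```

Roughly 50 markers clear the threshold by chance in a single study, but only about 2.5 would be expected to do so again in an independent second study, which is why pre-specified replication is so protective.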

Biomarker validation typically requires work by multiple different research groups. The successful development of HPV testing demonstrated the importance of ensuring assay sensitivity and reliability; demonstrating expected correlations with other findings (cytologic and histologic disease); optimizing specimen collection and handling; and improving predictive value for clinically important disease states by identifying which women to test.

— Mark Sherman

After discovery, a biomarker must go through four validation steps: (I) clinical assay and validation, (II) retrospective longitudinal, (III) prospective screening, and (IV) cancer control. We conduct analytical validation that determines the capacity of biomarkers to distinguish between cancer and normal samples. The process comprises: (1) development of a validation strategy, (2) sample acquisition, (3) data acquisition and analysis, and (4) statistical analysis.

— Yan Xiao

Q5: How do you statistically evaluate biomarker discovery data?

Statistical evaluation requires a sufficient number of disease and control samples, as well as a significant number of different types of cancer material. Most available studies do not report on a large number of samples, and hence we find it often necessary to combine results from different experiments to achieve the power needed to detect statistically significant differences across different samples.

— Lucila Ohno-Machado

Statistical validation of a biomarker is usually done by discovering the biomarker cut-points and multiplex combinations in one cohort, followed by validation in a second and third independent cohort. Once a biomarker assay has been defined, it is evaluated by ROC curve analysis or misclassification statistics. A good biomarker may have a statistically significant p-value, but that does not mean it will be valuable in real-world testing. The sensitivity and specificity must fit the clinical situation and solve a clinical problem or produce some added value, such that prescribers are willing to order the test and payors are willing to reimburse it.

— David Rimm

For a definitive study, all aspects must be pre-specified, just as in a therapeutic clinical trial. Many statistical approaches are possible and valid, provided that they are pre-specified. Once you have looked at the data, trying multiple statistical methods in the hope that one identifies a significant result is clearly improper and increases the risk of false discovery.

— Daniel Sargent

Analysis of biomarker discovery data, particularly data derived from profiling methods, has placed bioinformatics and biostatistics in a lead role. Optimizing analysis typically requires a multi-disciplinary effort. Although statistical correction for false discovery is important, understanding pathway biology and considering whether results support a priori hypotheses or represent unexpected findings are also essential. Finally, developing a common understanding of the exact question that has been asked and translating this into the best-suited analysis is central to success. The approach to an analysis itself sometimes serves to clarify or refine exact goals. For example, to define the biological interaction between pathways, one might examine correlations between multiple biomarkers. To demonstrate the public health importance of a particular marker, the prevalence of a biomarker may be as important as its strength of association with disease. Specifically, eliminating common exposures that pose small disease risk may lessen disease burden. In contrast, clinical management typically requires a marker that is strongly associated with outcome, but the marker may have relevance even if uncommon or rare. For example, only a minority of cancers show HER2 amplification, yet identifying this biomarker permits the use of effective targeted treatment. Similarly, many highly penetrant genes are rare, but identifying individuals with these genetic markers may guide screening or prevention.

— Mark Sherman

This is a critical component of the biomarker discovery and validation process. Statistical competency is crucial to avoid data over-fitting and to assess the discriminatory value of biomarkers. Our statistical evaluation process depends on the type of discovery platform under consideration. For gene expression data, we import the raw microarray data into the statistical software package R. Within R, we perform data preprocessing, including background correction, normalization, expression index computation, and chip quality control. Based on the processed data, we identify biomarker candidates using a combination of statistical metrics, including multiple-testing-corrected t-tests (for two-group comparisons), presence/absence calls, and the fold change between groups.

— David Wong
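
The multiple-testing correction Wong mentions typically takes the form of a false discovery rate adjustment. The sketch below is a hypothetical illustration in Python (Wong's pipeline runs in R) of the Benjamini-Hochberg step-up procedure combined with a fold-change filter; all p-values and fold changes are invented.

```python
# Illustrative sketch: Benjamini-Hochberg FDR adjustment of raw per-gene
# p-values, then candidate selection by adjusted p-value and fold change.

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values), preserving input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_end, i in enumerate(reversed(order)):
        rank = m - rank_from_end          # 1-based rank of p-value i
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented results for five genes
raw_p = [0.001, 0.008, 0.039, 0.041, 0.50]
fold_change = [3.2, 2.5, 1.1, 2.0, 1.0]

q = benjamini_hochberg(raw_p)
hits = [i for i in range(len(q)) if q[i] < 0.05 and fold_change[i] >= 2.0]
print([round(v, 3) for v in q], hits)  # prints [0.005, 0.02, 0.051, 0.051, 0.5] [0, 1]
```

Genes 0 and 1 survive both filters; gene 3 has a small raw p-value (0.041) but falls just short once the correction is applied, which is the kind of chance finding the adjustment is designed to screen out.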

We use the software Interrelation Miner (http://www.interrelationminer.com/) from Systaim to analyze our data. However, the method for statistical analysis depends largely on the method used for biomarker discovery and is also very much a personal/institutional choice. Regardless of which method or software is used, caution must be taken not to over-fit the data. Many multivariate statistical and machine-learning algorithms used in biomarker discovery are prone to over-fitting, meaning that the number of parameters in a model is too great relative to the number of samples, which leads to the consequence that an algorithm may perform well on the original sample set, but predict poorly for independent validation samples. Cross-validation and validation of independent datasets, as well as large sample sizes, are necessary to avoid data over-fitting.

Please note that the mention of a specific statistical analysis software does not imply a NIST endorsement of that product.

— Yan Xiao
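
The over-fitting safeguard Xiao describes, cross-validation, can be sketched with a toy example. In the hypothetical code below, a deliberately simple one-dimensional nearest-centroid classifier is scored only on samples held out of each training fold; none of the data come from an actual study.

```python
# Illustrative sketch: k-fold cross-validation, where every sample is
# scored by a model that never saw it during fitting.

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f in range(k) if f != held_out for i in folds[f]]
        yield sorted(train), sorted(test)

def nearest_centroid_predict(train_x, train_y, value):
    """Toy 1-D classifier: assign value to the class with the closer mean."""
    means = {}
    for label in set(train_y):
        vals = [train_x[i] for i in range(len(train_x)) if train_y[i] == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda label: abs(value - means[label]))

# Invented 1-D marker values: class 0 low, class 1 high, with one overlap
x = [1.0, 1.2, 0.9, 1.4, 3.0, 3.2, 2.9, 1.3]
y = [0,   0,   0,   0,   1,   1,   1,   1]

correct = 0
for train_idx, test_idx in k_fold_indices(len(x), 4):
    tx = [x[i] for i in train_idx]
    ty = [y[i] for i in train_idx]
    for i in test_idx:
        correct += nearest_centroid_predict(tx, ty, x[i]) == y[i]
print(f"cross-validated accuracy: {correct}/{len(x)}")  # prints 7/8
```

Because every prediction is made on an unseen sample, the 7/8 accuracy is an honest estimate; scoring the classifier on its own training data would look better but, as Xiao warns, would say little about performance on independent validation samples.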

Q6: In scientific studies from other labs, what key characteristics do you look for to ascertain that a biomarker has been properly validated for a clinical setting?

A careful description of subject characteristics, as well as a detailed description of histological findings and sample preparation are critical to assess whether the biomarker can be useful in a real clinical setting. Furthermore, it is important to check whether the biomarker really adds value to the standard of care in a particular cancer domain: How predictive and how expensive is the biomarker when compared to variables that are currently routinely collected in the process of care? Does the new biomarker have the potential to change the way the patient is managed? This is important because ultimately we want to know whether measuring that biomarker results in better management and improved outcomes for the patients.

— Lucila Ohno-Machado

Four key characteristics are 1) reproducibility, 2) robustness of the assay to variability in reagents, 3) objectivity, and 4) testing against appropriate criterion standards. Too often a biomarker is tested against another biomarker, which is not reflective of the actual disease condition or response. For example, testing a Her2 IHC test against a standard of DNA amplification by FISH is testing against a false standard. The true standard for prognosis is patient recurrence-free survival or, for prediction, response to Herceptin or other Her2 targeted therapy.

— David Rimm

The key element is pre-specification. What was the patient population? How many patients were tested but a biomarker result was not available, and how was this accounted for in the analysis? Did the sample represent in some sense a complete population, or were they highly selected? Did the lab run trials of multiple assays on at least a subset of the samples, and demonstrate reproducibility?

— Daniel Sargent

Validation should address both technical aspects of assay performance and clinical application of the test. Validation procedures may vary with the marker, the specimen for testing and the clinical implications of test results. However, certain factors are always critical. Excellent assay reproducibility is essential, and preferably, reliability should be robust to variability in collection, handling, and storage practices that are commonly encountered clinically. It is important to know that the assay performs well in clinical laboratories as well as in research laboratories and procedures for quality assurance should be available. The data should support the utility of the assay for cost-effective clinical decision-making and there should be evidence-based guidelines that indicate which patients to test and how to interpret the results. It is equally important to know for which patients or clinical situations testing is not useful. Finally, validation requires replication in many settings because of the problem of over-fitting data. A promising association that emerges from one data set may be an artifact of the statistical analysis that cannot be generalized widely.

— Mark Sherman

All of the above characteristics of a well-designed, sufficiently powered discovery and validation study are mandatory. If a study cannot be properly designed, powered, and followed through, it is better to defer it or not do it at all. One final note: biomarker studies are population-based studies, and the ability to validate disease-discriminatory biomarkers in independent populations is key; there is no substitute for this.

— David Wong

Proper validation of a biomarker is extremely important if it is to be used in a clinical setting. Without proper validation, the meaning of the biomarker is ambiguous or essentially useless. We look for the following key characteristics in scientific studies from other labs to ascertain that a biomarker has been properly validated for a clinical setting: (1) Samples were acquired properly. For example, the population was that intended for clinical application and was sufficiently general; multiple types of cases and controls were used; controls matched cases; systematic differences between cases and controls in sample collection and processing were avoided; etc. (2) Tests were conducted properly. For example, the assay method used for analyzing the biomarker was the one intended for general clinical use; the assay was conducted blinded (without knowing whether a sample came from a case or a control); etc. (3) Statistical analysis was conducted properly. For example, the sample size was large enough to avoid over-fitting; cross-validation and validation on independent datasets were properly conducted; etc. Finally, as performance criteria for properly validated biomarkers, the sensitivity and specificity of the biomarker should meet the predefined values.

— Yan Xiao

The keys are: a) independent/blind, multi-center samples that are appropriate for the stated clinical use of the biomarkers; b) a statistically sufficiently powered study; and c) a clinically meaningful performance level.

— Zhen Zhang

List of Resources

Our panel of experts referred to a number of publications and online tools that may help you get a handle on validating cancer biomarkers. Whether you're a novice or a pro, these resources are sure to come in handy.

Publications

Bensalah K, Montorsi F, Shariat SF. Challenges of cancer biomarker profiling. Eur Urol. 2007 Dec;52(6):1601-9. Epub 2007 Oct 1.

Brozkova K, Budinska E, Bouchal P, Hernychova L, Knoflickova D, Valik D, Vyzula R, Vojtesek B, Nenutil R. Surface-enhanced laser desorption/ionization time-of-flight proteomic profiling of breast carcinomas identifies clinicopathologically relevant groups of patients similar to previously defined clusters from cDNA expression. Breast Cancer Res. 2008;10(3):R48. Epub 2008 May 29.

Liew M, Groll MC, Thompson JE, Call SL, Moser JE, Hoopes JD, Voelkerding K, Wittwer C, Spendlove RS. Validating a custom multiplex ELISA against individual commercial immunoassays using clinical samples. Biotechniques. 2007 Mar;42(3):327-8, 330-3.

Ralhan R, Desouza LV, Matta A, Chandra Tripathi S, Ghanny S, Datta Gupta S, Bahadur S, Siu KW. Discovery and verification of head-and-neck cancer biomarkers by differential protein expression analysis using iTRAQ labeling, multidimensional liquid chromatography, and tandem mass spectrometry. Mol Cell Proteomics. 2008 Jun;7(6):1162-73. Epub 2008 Mar 13.

Simpson RJ, Bernhard OK, Greening DW, Moritz RL. Proteomics-driven cancer biomarker discovery: looking to the future. Curr Opin Chem Biol. 2008 Feb;12(1):72-7. Epub 2008 Mar 11.

Sturgeon CM, Hoffman BR, Chan DW, Ch'ng SL, Hammond E, Hayes DF, Liotta LA, Petricoin EF, Schmitt M, Semmes OJ, Söletormos G, van der Merwe E, Diamandis EP; National Academy of Clinical Biochemistry. National Academy of Clinical Biochemistry Laboratory Medicine Practice Guidelines for use of tumor markers in clinical practice: quality requirements. Clin Chem. 2008 Aug;54(8):e1-e10. Epub 2008 Jul 7.

Sullivan Pepe M, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, Winget M, Yasui Y. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst. 2001;93(14):1054-61.

Yoon SK. Recent advances in tumor markers of human hepatocellular carcinoma. Intervirology. 2008;51 Suppl 1:34-41. Epub 2008 Jun 10.

Conferences

CHI's Biomarker Discovery Summit
http://www.healthtech.com/bmks/overview.aspx

GTCbio's Oncology Biomarkers Conference
http://www.gtcbio.com/conferenceDetails.aspx?id=142

Web Tools
http://edrn.nci.nih.gov/resources/samplereference-sets
http://www.interrelationminer.com/