Protein Microarrays

Table of Contents

Letter from the Editor
Index of Experts
Q1: In developing your protein or capture arrays, how do you optimize functional expression when choosing proteins and/or ligands?
Q2: How do you reduce cross-reactivity between ligands and/or protein coupling to the slide surface?
Q3: How do you choose a detection method that offers the widest dynamic range?
Q4: What steps do you take to ensure sensitivity, specificity, and low background during detection?
Q5: What methods do you use to confirm your results?
Q6: How do you optimize and standardize data analysis?
List of Resources


Letter from the Editor

For this month's installment of Genome Technology's technical guide series, we decided to focus on yet another cutting-edge technology: protein microarrays. As the postgenomic era slowly but surely paves the way for proteomics, with functional analysis, biomarker discovery, and the inner workings of signaling pathways all at the forefront of proteomic research, protein microarrays are becoming an invaluable systems biology tool.

Our experts are culled from a broad base; they work on all types of arrays, including capture arrays, reverse phase protein microarrays, and functional arrays. Whether used to identify and quantify a protein in a mixture, immunoprofile serum samples, or examine protein interactions of all types, protein microarrays are like all assays: sensitivity, specificity, and reproducibility are the keys to good data.

Since we chose to focus broadly, the questions cover everything from how to optimize functional expression to how to reduce cross-reactivity, choose a suitable detection method, and confirm results. At the end of the guide, we've also included the usual list of resources to get you up to speed on this powerful assay. So, if you're already studying proteins, or hope to one day, don't miss out on this reliable reference tool.

— Jeanene Swanson

Index of Experts

Genome Technology would like to thank the following contributors for taking the time to respond to the questions in this tech guide.

Dolores Cahill
Professor of Translational Science
Conway Institute
University College Dublin

Mark Gerstein

Professor of Biomedical Informatics
Yale University
(special credit to Andrea Sboner and Alexander Karpikov)

Thomas Joos
Head of Biochemistry Department
Natural and Medical Sciences Institute
University of Tuebingen, Germany

Steve Kornblau
Associate Professor
MD Anderson Cancer Center

Joshua LaBaer

Director
Harvard Institute of Proteomics
(special credit to Niro Ramachandran)

Gordon Mills

Chairman, Department of Molecular Therapeutics
MD Anderson Cancer Center

Satoshi Nishizuka

Assistant Professor
Iwate Medical University School of Medicine, Japan

Emanuel Petricoin III

Co-director, Center for Applied Proteomics and Molecular Medicine
George Mason University

Heng Zhu

Assistant Professor
Johns Hopkins Medical Institute

Q1: In developing your protein or capture arrays, how do you optimize functional expression when choosing proteins and/or ligands?

First of all, I would like to define "protein arrays." Protein microarray-based assays can be grouped according to different formats and different types of applications. Currently, forward phase protein microarray assays are the most frequently used format. They allow the simultaneous analysis of large numbers of different parameters from one sample using an array of well-defined capture molecules. Examples of forward-phase microarray assays include antibody microarrays that are used to identify and quantitate target proteins of interest, and protein affinity assays that are used to study the interactions between proteins and immobilized binding molecules such as proteins, peptides, low molecular weight compounds, oligosaccharides, or DNA.

On a reverse phase array, a multitude of different samples such as tissue or cell lysates are immobilized in a microarray format. Each microspot contains the whole proteome repertoire of the tissue or cell. Single soluble probes such as highly specific antibodies are used to simultaneously screen these tiny spots for the presence or absence of distinct target proteins. This allows a set of parameters in large collections of tissue or cell samples or sample fractions to be determined.

At the NMI we are predominantly working with miniaturized multiplexed sandwich immunoassays. For those assays for which no commercial standards are available, we use recombinant protein expression systems, and purify the recombinant proteins using TAG specific antibodies.

— Thomas Joos

We use reverse phase protein arrays (RPPA) and print whole cell lysates from leukemia patient samples or from cell lines onto the slides. In the case of the patient samples, we are probing them for their native characteristics, so no manipulation is performed to modify the baseline protein expression. Some of the cell lines are grown under different conditions, +/- a perturbing agent, e.g. growth factor, cytokine, drug, siRNA, etc. We use that sort of manipulation more if we are testing something and want to look for differences in expression between two states.

— Steve Kornblau

We produce our protein arrays by expressing the proteins in situ on the array surface using cell-free expression systems. The proteins are expressed as fusion proteins with polypeptide tags fused to the N- or C-terminus and captured to the surface using capture agents that recognize the polypeptide tags. One advantage is that the proteins are produced fresh at the time of the experiment, minimizing concerns about protein shelf life. The abundance levels also span a fairly narrow range, with most proteins falling within threefold of the mean. As a general rule, we express full-length proteins, but there are certain assays where we might supplement this with specific protein domains. To ensure protein function, it is always best to test it in the assay that is planned for the specific experiment at hand. It is generally not feasible to test all proteins to ensure that they are functional, so we will often include positive controls in early experiments and optimize for those.

— Joshua LaBaer

In the lysate array application (aka 'reverse phase' protein lysate microarray), we primarily print cell lysates that consist of the full complement of proteins from a cell, which can be heterogeneous and dynamic. Therefore, optimization in sample collection takes into account how and when we collect samples.

Proteins associated with a cell are roughly divided into two major groups from the perspective of protein detection: (a) those that do not change their expression levels under any condition; and (b) those that do (Nishizuka, S. et al. PNAS, 100:14229-14234, 2003). The former includes cytoskeletal or constitutively active proteins; the latter includes proteins involved in processes such as signal transduction. Cell signaling is the consequence of biochemical reaction cascades that take place in a time- or input-dependent manner. Hence, the quality of signal detection is related not only to the detection method itself, but also to the set of protein species that is collected. It is important to select the proper conditions for test samples to represent the signals of interest.

— Satoshi Nishizuka

The reverse phase arrays can be used to analyze analyte concentrations down to the yoctomole range depending on the antibody used. The arrays can analyze proteins in raw serum, cellular lysates, and expression products. One of the great advantages of the reverse phase arrays that we invented is that the analytes are completely denatured before they are queried, so we don't have to worry about antigen retrieval, or two-site assays.

— Emanuel Petricoin

Our group is mainly dealing with functional protein microarrays that are composed of individually purified proteins. We are constantly facing the challenge of how to produce and purify full-length proteins with expected activity. We have found that the budding yeast is a wonderful system to produce active fusion proteins encoded by various eukaryotes and mammalian viruses. Using native conditions during the purification is another key to preserving protein activities.

— Heng Zhu

Q2: How do you reduce cross-reactivity between ligands and/or protein coupling to the slide surface?

In the RPPA system the protein is printed on the slide, so there is no "capture" issue (as long as the protein binds to the membrane), and each slide is probed with a single antibody. We use nitrocellulose-coated slides (formerly FAST Slides from S&S, now Whatman slides) designed to bind protein with high affinity, and have no problems with protein binding. Since all proteins are theoretically present in each dot, the crucial issue for specificity is to use highly validated antibodies.

— Steve Kornblau

The arrays we use are functional arrays that have different proteins at each feature on the array. Thus, we are not concerned about the kind of cross-reactivity that can be observed with capture arrays, for which a capture molecule might inadvertently capture the wrong analyte. However, in our format, the proteins are transcribed, translated, and captured locally on the array, so it is possible that an expressed protein could diffuse away and bind to a neighboring feature. To control this effect, we do two things. We always print replicate spots for each protein and we typically print them away from each other so we don't get inappropriate enhancement of effects by having them next to each other. We also surround each replicate with different neighbors so that if there are spillover effects, they will affect different proteins. A large CV between replicates of the same protein will be a clue that there might be a spillover effect. The most important element of cross-reactivity, however, is to measure it directly so that its effect is known and will be considered. To do this, the characteristic of a particular protein can be measured for the protein itself on the array and at its neighbors to see how much of it is found there.

— Joshua LaBaer
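A spillover check like the one described above can be sketched in a few lines. This is a minimal illustration, not LaBaer's actual pipeline; the data layout (protein name mapped to replicate spot intensities) and the 15% CV threshold are assumptions:

```python
from statistics import mean, stdev

def replicate_cv(intensities):
    """Coefficient of variation (%) across replicate spots of one protein."""
    m = mean(intensities)
    return 100.0 * stdev(intensities) / m if m else float("inf")

def flag_spillover(spots, cv_threshold=15.0):
    """Flag proteins whose replicate CV exceeds a threshold, a possible
    sign of diffusion from a neighboring feature (assumed cutoff)."""
    return {protein: replicate_cv(vals)
            for protein, vals in spots.items()
            if replicate_cv(vals) > cv_threshold}

# Hypothetical replicate intensities from three scattered spots per protein
spots = {"p53": [1050, 1010, 1080], "EGFR": [400, 950, 420]}
flagged = flag_spillover(spots)  # only the inconsistent protein is flagged
```

With replicates scattered across the slide and surrounded by different neighbors, a spillover artifact should inflate the CV of the affected protein only, so a high CV singles it out for inspection.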

The key challenge is to have high-quality antibodies. Thus, we spend many weeks characterizing and validating each antibody. To be validated, the antibody must have a high dynamic range on western blots and arrays. The western blots and arrays must have a strong correlation. Each protein is manipulated with either signaling events or with siRNA or by using cell lines with markedly different amounts to ensure utility. The antibodies must behave consistently across many arrays.

— Gordon Mills

We use a nitrocellulose slide. Another fantastic attribute of the reverse phase arrays, and an advantage over antibody or forward phase arrays, is that we immobilize the entire lysate on the slide, and then block the slide just like a western blot. In fact, the reverse phase arrays are set up just like a miniaturized immunoassay. That is why the CVs of the reverse phase array rival FDA-approved immunoassays — in the 5% range. We use third-generation amplification technology to generate tremendous sensitivity.

— Emanuel Petricoin

Because each ligand and/or protein is likely to possess very different biochemical properties, we normally have to test various types of surface chemistry on the slides as well as different blocking reagents. This optimization step is crucial to the success of the experiment. For example, when a ligand is highly charged, you probably want to increase the salt concentration to offset non-specific interactions due to electrostatic forces.

— Heng Zhu

Q3: How do you choose a detection method that offers the widest dynamic range?

A highly sensitive alternative to confocal optics is the application of planar waveguide excitation devices combined with CCD cameras or photomultipliers as detectors. This technology shows greater sensitivity with regard to signal intensity, linearity, signal-to-noise ratio, and background. Zeptosens AG, now a division of Bayer Technologies (www.zeptosens.com, Witterswil, CH), has developed the Zepto Reader, which is based on planar waveguide technology. Capture molecules are immobilized in a microarray format on a thin (100-200 nm) film (the planar waveguide), which consists of a material with a high refractive index (e.g. Ta2O5) deposited on a transparent support. A laser beam is optically coupled into the planar waveguide via a diffractive grating and propagates in the thin layer, thereby creating a strong, surface-confined evanescent electromagnetic field. The penetration depth of this evanescent field into the adjacent medium is limited to about 200 nm. Thus, only surface-confined fluorophores are excited and emit fluorescent light. Fluorophores in the bulk medium are not excited and therefore not detectable. A CCD camera is used to detect the fluorescent light with high spatial resolution. Parallel excitation and parallel detection of binding events on different spots is performed and is both highly selective and highly sensitive, even in solution.

— Thomas Joos

We evaluated both fluorescent and dye-precipitation techniques. To our surprise, we found a loss of sensitivity at very low levels with the fluorescently conjugated antibodies. The 3,3'-diaminobenzidine tetrahydrochloride precipitation seemed to give us better results, maintaining linearity of signal at low levels of protein (higher dilutions). We plan to go back and revisit using fluorescently labeled antibodies, as this would permit us to get more data from each array.

— Steve Kornblau

The choice of detection method depends largely on the specific experiment. Wide dynamic range is one consideration, but the method should fit the biochemistry. For example, in some cases radioactivity might make sense (e.g., kinase activity), for others immune detection is necessary (e.g., immunoprofiling), and still other circumstances might demand more specialized detection schemes. As an academic lab on a limited budget, we tend to work within what equipment is already available, often tailoring the biochemistry if needed. We select the detection method on a case-by-case basis.

— Joshua LaBaer

We use a tyramide signal amplification system almost exclusively, which gives one of the widest ranges of detection for immunostaining. The array format requires many scans, so the scanning speed is critical. Ordinary flatbed optical scanners are suitable for these requirements.

— Satoshi Nishizuka

We print our samples in miniature dilution curves, just like an ELISA — this ensures a dynamic range spanning many logs, tremendous sensitivity, and reproducibility.

— Emanuel Petricoin

Q4: What steps do you take to ensure sensitivity, specificity, and low background during detection?

An important aspect to take into account is the image processing software. Protein arrays present peculiar features that need to be properly addressed in order to obtain reliable measurements. Currently, most image processing software has been developed for DNA microarrays, where the spot is typically well defined and all the spots have the same geometrical features across the array. For protein arrays, instead, a critical aspect is the identification of the area of the spot that contains the bound protein, i.e., the segmentation. A proper segmentation algorithm maintains good signal quality and therefore good quality of the analysis. To compare the performance of different image analysis algorithms for protein microarrays (including the algorithms we develop in our lab), we use a test protein microarray where the same protein is spotted many times. In addition, a proper normalization scheme has to be defined. It should take into account the sources of variability that may arise from the technical preparation of the array, such as printing issues, scanning, etc. Running a calibration experiment can also help in defining the parameters of the normalization procedure.

— Mark Gerstein
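To illustrate the segmentation problem raised above, here is a minimal, hypothetical sketch: rather than assuming a fixed circular footprint, classify a pixel as part of the spot when it exceeds the local background (estimated from the window's border pixels) by k standard deviations. The window values and the choice of k are invented for illustration:

```python
def segment_spot(window, k=3.0):
    """Segment one spot window: background is estimated from the border
    pixels, and a pixel belongs to the spot if it exceeds the background
    mean by more than k background standard deviations (assumed rule)."""
    h, w = len(window), len(window[0])
    border = [window[r][c] for r in range(h) for c in range(w)
              if r in (0, h - 1) or c in (0, w - 1)]
    mu = sum(border) / len(border)
    sd = (sum((v - mu) ** 2 for v in border) / len(border)) ** 0.5
    thresh = mu + k * sd
    return [[v > thresh for v in row] for row in window]

# A tiny hypothetical 4x4 pixel window around one spot
window = [
    [10, 11, 12, 10],
    [11, 90, 95, 12],
    [10, 88, 92, 11],
    [12, 11, 10, 13],
]
mask = segment_spot(window)  # True marks pixels assigned to the spot
```

Production packages use far more robust segmentation than this, but the principle is the same: the bound-protein area is derived from the data rather than from a fixed geometry shared by all spots.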

A lot of time is spent on antibody validation, and we have a big box of "not so specific" antibodies left over from testing. First we perform a western blot against a panel of cell lines, mostly leukemia but some solid tumors, and look for a single band in the correct location. We try to include a cell line that is known to lack the protein where possible. If a protein has a known phosphorylation form, or a cleavage form, we will accept a western blot that shows those variants. For phosphorylation antibodies we try to include a state where the protein is not phosphorylated and look for an increase in phosphorylation after stimulation with a known agent. Sometimes we will use siRNA to knock down expression to verify specificity.

Once we have a good candidate antibody from the western blot, we probe our cell line RPPA with it, using the antibody at different concentrations. We compare signal strength on the RPPA to that on a western blot and insist on a correlation of r > 0.7. Since the cell line array has about 150 cell lines, there is usually one that is known to lack, or overexpress, the protein in question. We make sure that the protein shows up, or is absent, in the right locations. In one of our arrays we included 138 purified peptides, including non-phospho and phospho forms, and could determine sensitivity against known concentrations of peptide. We found sensitivity down to femtomolar levels. We adjust the primary and secondary antibody concentrations to find the combination that gives the least background.

We also employ a technique that we developed (Gordon Mills, Kevin Coombes and I) called topographical normalization. We spot a positive control, a mixture of 11 cell lines that we hope will be positive for everything (and which has been for 108 different sites so far), or a negative control of lysis buffer at the end of each patient sample. Since we use five dilutions from each patient, this makes every sixth column a control column across the slide, or 48 columns in total. Since the same protein appears across the slide, this creates a 3D topographical map across the slide. The topographical map of the control protein gives the background correction, and the map of the cell line mixture gives the scale. We can thus correct background and scale (signal intensity) across the slide. We also print replicates on each slide.

— Steve Kornblau
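The idea behind topographical normalization can be sketched as follows, reduced to one dimension for clarity: control spots at known positions define a local background map (from the negative control) and a local scale map (from the positive control), which are interpolated at each sample position. The positions, values, and linear interpolation here are all simplifying assumptions, not the published method:

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of control values at position x."""
    if x <= xs[0]:
        return ys[0]
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]

def topographic_normalize(samples, ctrl_pos, neg_vals, pos_vals):
    """Correct each sample for local background (negative-control map)
    and local scale (positive-control map) across the slide."""
    out = {}
    for x, signal in samples.items():
        bg = interpolate(x, ctrl_pos, neg_vals)
        scale = interpolate(x, ctrl_pos, pos_vals) - bg
        out[x] = (signal - bg) / scale
    return out

# Hypothetical control columns at positions 0, 6, 12 and one sample spot
ctrl_pos = [0, 6, 12]
neg_vals = [100, 120, 140]      # local background drifts across the slide
pos_vals = [1100, 1120, 1140]   # positive control tracks the same drift
normalized = topographic_normalize({3: 610}, ctrl_pos, neg_vals, pos_vals)
```

Because the background and scale surfaces move together across the slide, a sample's corrected value is comparable regardless of where on the slide it was printed.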

Again, this largely depends on the goal of the experiment. The user must understand what he or she plans to measure. Generally speaking, the user has to define the appropriate positive and negative controls that best represent the experiment. This will address the issues of sensitivity and specificity. For example, to measure protein-protein interactions, the user may first want to establish the range of affinities of interest and then test the system using a set of interacting protein pairs with known affinities and specificity. For immunoprofiling of antigen arrays using serum samples, the user may want to test sensitivity and specificity by spiking various amounts of a specific antibody into their serum. Often the user must sort out whether the problem is due to weak signal or high background. Each requires a different course of action: low signal might call for more antigen to be displayed, more sensitive detection systems, etc., whereas high background might require improvements to surface chemistry, better formulations of blocking solutions that specifically address the source of the background, etc.

— Joshua LaBaer

Lysate arrays are not designed to detect low amounts of proteins. Because the lysate array employs an antibody-based detection system, the sensitivity depends largely on the performance of the primary antibody. Therefore, sensitivity is not a primary concern, because we eliminate antibodies that yield low signals. Specificity can be ensured prior to lysate array signal detection. The signal from a dot format is the sum of specific and non-specific antibody binding. The western procedure using membrane strips is designed to test how specific the signals produced by a given sample-antibody combination can be. We do not subtract the local background around a dot. A lysate sample on a typical (and the original) lysate array is printed as a dilution series (ten twofold serial dilutions) for better quantitative analysis. We assume that the background level is relatively similar across the dilutions; hence, there should be no significant effect whether the background is subtracted or not. Rather, the advantage of lysate arrays is the ability to capture relative differences between many samples, which can be preserved across the dilution series (Nishizuka, S. Biotechniques, 40:442-448, 2006).

— Satoshi Nishizuka
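The quantitative value of printing a dilution series can be illustrated with a simple linearity check: for twofold serial dilutions, each step should halve the signal, so the slope of log2(signal) versus dilution step should be close to -1 within the linear range. A sketch, with hypothetical intensities:

```python
from math import log2

def dilution_slope(signals):
    """Least-squares slope of log2(signal) vs. dilution step.
    Twofold dilutions in the linear range give a slope near -1."""
    xs = list(range(len(signals)))
    ys = [log2(s) for s in signals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Hypothetical spot intensities for five twofold dilutions of one lysate
signals = [3200, 1600, 800, 400, 200]
slope = dilution_slope(signals)  # near -1 indicates the linear range
```

Dilution steps that deviate strongly from this behavior, such as saturated high-concentration spots or near-background low ones, fall outside the linear range and can be excluded before comparing samples.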

We thoroughly validate our antibodies prior to use: single band at the right molecular weight, peptide competition, etc. Then we use a tyramide precipitation reaction with biotin/avidin-based amplification — we can detect a few hundred molecules per spot, rivaling the best immunoassays. We block the slides just like a western, and do background subtraction. Since every sample is printed in a miniature dilution curve on the reverse phase array format, we know where the linear dynamic range is, and since we print a calibrator on the array, we have fantastic precision and accuracy.

— Emanuel Petricoin

Choosing the right labeling method is key to ensuring high sensitivity. To increase specificity, it is always a good idea to include a meaningful negative control. For example, a wild-type protein probe and its mutant, labeled with different dyes, can be incubated together on a protein chip, and based on the ratios, interactions specific to the WT protein can be easily identified.

— Heng Zhu

Q5: What methods do you use to confirm your results?

It depends on the application, but we generally use multiple replicate experiments, westerns, ELISA, coprecipitation, co-immunoprecipitation, mass spectrometry, and BIAcore. We also use FACS colocalization, tissue microarrays, gene expression analysis, siRNA, and, when dealing with sera studies, large numbers of subjects and controls.

— Dolores Cahill

We analyze the data from protein chips probed with different serum samples (obtained from normal patients as well as from patients with autoimmune disease) at different dilutions. We then compare the results of the protein chip data analysis against established biological knowledge.

— Mark Gerstein

If the antibody has passed the validation described above, then we feel comfortable trusting the data from the patient samples. When we were validating the methodology, we correlated results with western blot. Our current leukemia RPPA slide set (two slides) has over 1,070 different leukemia samples printed on them. We have probed this set with 108 different antibodies, so it is not feasible to double-check with western blot. When we get interesting results, we do carry out further experiments to validate them. In one case we suspected that cases with very high levels of p53 would harbor mutations. We sequenced exons 5 to 9 and found that 50% of high expressors, vs. none of the low expressors, had mutations. In another, we developed a signature that would predict the presence of the FLT3-ITD from cases of AML with known FLT3 status. We then prospectively predicted whether cases lacking FLT3 analysis would have a mutation or not, and then determined the FLT3 status of those cases. Our accuracy was over 80%, suggesting that the signatures were reliably associated with the mutation.

— Steve Kornblau

This generally falls into two categories, depending on whether the intention is to confirm the technical validity of the result or the biological validity. Technical confirmation could involve reproducibility of the signal using the same technology and/or confirmation using a complementary approach, which in many cases might be a standard ELISA. Biological validation of the result may involve a wide variety of assays, ranging from immunoprecipitation and westerns to phenotypic assays, depending entirely on the question at hand.

— Joshua LaBaer

Data validation can be very challenging depending on the assay types. In many cases, we first try to generate a robust hit list by integrating available knowledge in the literature and databases. Next, we will focus on the selected candidates and perform a quick in vitro assay to validate the chip results. For those that pass the test, we will use various in vivo methods to further validate the results.

— Heng Zhu

Q6: How do you optimize and standardize data analysis?

We have control regions on the chip to ensure the proteins are present; each protein is spotted up to eight times, at different points on the array. We have extensive positive and negative control regions on the array containing standard proteins and antibodies over a broad concentration range. We also have empty control regions for experiment-specific positive and negative controls.

— Dolores Cahill

We typically carry out the analysis by means of Matlab, R, or our own software, ProCAT. The main advantage of these tools is that they provide enough flexibility to analyze the data, from quality controls to actual data mining. In our experience, this flexibility enables the adaptation of the analysis to the particular biological question one has in mind.

— Mark Gerstein

Standardization: As much as possible, we standardize how things are performed and what materials are used. All our clinical samples have been handled uniformly since 1999. The cell concentration is fixed at 10,000 cells/μL. We print our arrays in batches of 100 per day on sequential days. We stain a slide with Coomassie Blue to ensure that everything printed as anticipated. We look at the patterns of staining across the slide set to see if there are light or heavy print areas. Slides are stored together until use. We try to use antibodies from the same batch. The same controls are present on each array. The slides are scanned and analyzed on the same scanner and by the same person.

Optimization: We optimize the antibody conditions by testing various combinations of primary and secondary antibody as necessary. Surprisingly, our first guess is good over half the time. When the slides are analyzed using the MicroVigene software, we look at the quality of the analysis and verify that the results are linear. We use a technique called supercurve to test the quality of the signal from each dot on each slide as well. We check the replicate correlation between samples on the same slide, but this generally isn't an issue (R > 0.9 in 78%, R > 0.8 in 96% on our current slide set). We check the range of signal between slides as well, but again this hasn't been an issue. Across the dataset we look to see if a given sample has very low or high levels of all proteins.

— Steve Kornblau
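The replicate-correlation check mentioned above reduces to a Pearson correlation between paired replicate measurements on a slide; the intensity vectors below are hypothetical:

```python
def pearson(a, b):
    """Pearson correlation coefficient between two replicate vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

# Hypothetical intensities of the same samples printed twice on one slide
rep1 = [1.0, 2.1, 3.0, 4.2, 5.1]
rep2 = [1.1, 2.0, 3.2, 4.0, 5.0]
r = pearson(rep1, rep2)  # a well-behaved slide should give R > 0.9
```

A slide whose replicate correlation falls below the chosen cutoff (0.9 here, by assumption) would be a candidate for reprinting or closer inspection rather than analysis.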

This area is very much in development; currently there are no set standards for how to analyze data. That said, some basic steps are needed: accurately establishing background, correcting for zone variations, and normalizing the signal for array-to-array comparisons. The data can then be processed using a variety of biostatistical methods to identify potentially interesting "hits." This may involve methods designed to measure how well signals separate from the negative control. This could be done within a single array or across multiple arrays; one could also factor in existing knowledge to prioritize the "hits." Standardizing these approaches will rely heavily on the false positive and false negative rates of the various methods, which can be determined using known positive and negative controls included in the experiment. This will allow the user to dial in the optimum methods for data analysis.

— Joshua LaBaer

We have adopted MicroVigene to obtain spot intensity and have developed our own software to facilitate determining concentration. We do not have major problems with quantification. The challenge is visualization of the data. Representing the data by heat maps, bar graphs, and other approaches remains a major challenge to providing a good visualization.

— Gordon Mills

This is accomplished by a variety of techniques. We have optimized the primary and secondary antibody concentrations. We stain our slides with FDA-approved platforms such as DAKO's Autostainer; these automated machines work very reliably. We use software designed by VigeneTech that performs local background subtraction, secondary subtraction, linear range detection, etc.

— Emanuel Petricoin

Unlike the DNA/oligo microarray assay, assays performed on protein chips cover a wide range of biochemical analyses, so a one-size-fits-all approach to data analysis simply does not exist. However, data normalization plays a crucial role and can be standardized to some extent. To that end, we are collaborating with bioinformatics groups to come up with a standard algorithm for data normalization. We also build various negative and positive control spots into the protein chips to assist the normalization.

— Heng Zhu

List of Resources

Our panel of experts referred to a number of publications and online tools that may be able to help you get a handle on protein microarrays. Whether you're a novice or pro at these types of arrays, these resources are sure to come in handy.

Publications

Hall DA, Ptacek J, Snyder M. Protein microarray technology. Mech Ageing Dev. Jan;128(1): 161-7 (2007). Epub 2006 Nov 28.

MacBeath G, Schreiber SL. Printing proteins as microarrays for high-throughput function determination. Science. Sep 8;289(5485): 1760-3 (2000).

Speer R, Wulfkuhle JD, Liotta LA, Petricoin EF 3rd. Reverse-phase protein microarrays for tissue-based analysis. Curr Opin Mol Ther. Jun;7(3): 240-5 (2005).

Zhu H, et al. Global analysis of protein activities using proteome chips. Science. Sep 14;293(5537): 2101-5 (2001). Epub 2001 Jul 26.

Zhu H, et al. Analysis of yeast protein kinases using protein chips. Nat Genet. Nov;26(3): 283-9 (2000).

Zhu X, Gerstein M, Snyder M. ProCAT: a data analysis approach for protein microarrays. Genome Biol 7(11): R110 (2006).

Books

Protein Microarrays
Edited by Mark Schena
(July 2004) Jones and Bartlett Publishers, Inc.
ISBN-10: 0763731277; ISBN-13: 978-0763731274

Web tools

www.mathworks.com
www.R-project.org
www.vigenetech.com/MicroVigene.htm