At A Glance
Martin Bilban, research assistant, University of Vienna, department of medical and chemical laboratory diagnostics
PhD in molecular and cell biology, The Scripps Research Institute: Studied mechanisms regulating cell migration and invasion in normal and malignant tissue. Developed metastasis cDNA microarray
Postdoctoral fellowship, University of Vienna
Recently published a paper in BMC Genomics entitled "Defining signal thresholds in DNA microarrays: exemplary application for invasive cancer"
What is your current project at the University of Vienna?
We are starting projects to study the molecular details of vascular diseases such as atherosclerosis, and the role macrophages play in them. In this project, we will use genome-wide microarrays [from Affymetrix] to screen for responsive genes in macrophages.
What role do microarrays play in your research?
We have developed a focused microarray system [in collaboration with Mary Hendrix at the University of Iowa] for studying how cells interact with their extracellular matrix. These interactions are necessary for cell migration and invasion during processes like wound healing, cancer dissemination, and inflammation. It’s a cDNA microarray, [and] we call it focused because it has only a subset of genes, but these genes are known to be involved in these biological processes, migration and invasion. [It contains] 100 genes, including controls, housekeeping genes, and the test genes. The test genes were integrins, proteases, and some members of the extracellular matrix protein families, for example laminins. Our hypothesis was that highly aggressive cancer cells [here, melanoma cells] may have different amounts of proteases [and] may secrete different patterns of extracellular matrix proteins than poorly invasive melanoma cells.
Very soon we realized that we had to deal with a number of technical problems. [For example], there are several options for normalizing differences between the Cy3 and Cy5 channels, [depending] largely on the design of the microarray. If you have a microarray that contains tens of thousands of genes, you can normalize using all genes, … or you can use housekeeping genes. You can also spike external control RNAs that don’t hybridize to your test genes into your labeling mix, and a fourth method would be [to] select invariant genes with mathematical methods. If you only have a focused microarray with a few hundred genes, the global method might not work because a large percentage of your genes might change, so you would normalize out the biological difference between those samples. The best method that we came up with was spiking external RNAs into the labeling mix.
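To make spike-in normalization concrete, here is a minimal sketch. It assumes, hypothetically, that each external control RNA was added in equal amounts to the Cy3 and Cy5 labeling reactions, so any systematic Cy5/Cy3 offset on the spikes is dye or labeling bias; the function name, probe ids, and intensities are illustrative and not taken from the BMC Genomics paper.

```python
import math
from statistics import median

def spike_normalize(cy3, cy5, spike_ids):
    """Rescale the Cy5 channel so that spiked-in control RNAs have a median
    log2(Cy5/Cy3) of zero.

    Assumes (hypothetically) that each spike was added in equal amounts to both
    labeling reactions, so any Cy5/Cy3 offset on the spikes reflects dye or
    labeling bias rather than biology.
    cy3, cy5: dicts mapping probe id -> background-corrected intensity (> 0).
    """
    log_ratios = [math.log2(cy5[p] / cy3[p]) for p in spike_ids]
    factor = 2 ** median(log_ratios)  # dye bias estimated from the spikes only
    return {probe: value / factor for probe, value in cy5.items()}

# Hypothetical intensities: both spikes read twice as bright in Cy5.
cy3 = {"spike_A": 1000.0, "spike_B": 400.0, "test_gene_1": 800.0}
cy5 = {"spike_A": 2000.0, "spike_B": 800.0, "test_gene_1": 3200.0}
normalized_cy5 = spike_normalize(cy3, cy5, ["spike_A", "spike_B"])
print(normalized_cy5["test_gene_1"] / cy3["test_gene_1"])  # 2.0 after removing the twofold dye bias
```

Because the scaling factor comes only from the spikes, a focused array on which most test genes genuinely change is not forced back toward a ratio of 1, which is the failure mode of global normalization described above.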
The second problem that we encountered was, if you have a low signal coming [from] your test genes, how can you say it’s specific? And that’s actually a huge problem with microarray data. If you have a large microarray with lots of genes, in any given cell type only a small fraction of the whole genome might be expressed. If you have genes that are not expressed, you would assume you get zero fluorescence signal from the genes on your arrays. But that’s actually not the case because you always get a very low signal from these genes...you might have some degree of non-specific hybridization, and that occurs because in microarray hybridizations you only use one hybridization stringency that has to be met by all genes.
But you can overcome this problem by using positive and negative controls on your microarray, and if you evaluate this control set, then you might be able to determine a reasonable cutoff. Your positive controls might be a couple of genes that you spike into your labeling mix at different amounts, and the negative controls would be genes that you don’t spike into the labeling mix and that you know your sample will not hybridize to. For example, if you have a human microarray, then for positive controls you could use bacterial genes. On our metastasis chips we used plant genes, Arabidopsis genes, for negative controls, and for positive [controls] we used Bacillus subtilis genes, which are actually the positive controls that Affymetrix recommends for their arrays.
In your recent BMC Genomics paper, you describe a statistical method for filtering false positives and false negatives called receiver operating characteristic analysis, or ROC. Can you tell me more about this?
It’s actually an old method that [was originally developed] for analyzing signals that come from radar. It turned out to be a nice method for determining the sensitivity and specificity of many different medical and diagnostic tests.
We just thought that microarrays might one day be used as a diagnostic test, so why not try to apply this method to microarrays. All you need is a set of positive controls and a set of negative controls, and then you can determine the region where the signals of these two sets overlap. For example, if you have a large overlap, then you have lots of non-specific hybridization, and if you have no overlap, then you are fine and your hybridization worked very well. And you can determine that with the ROC method.
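The ROC idea can be sketched in a few lines: sweep a signal threshold across all observed control signals and record, at each step, how many positive controls are called present (sensitivity) and how many negative controls are correctly called absent (specificity). Choosing the threshold by Youden's J is an assumption made here for illustration, not necessarily the criterion used in the paper, and all signal values are hypothetical.

```python
def roc_points(pos, neg):
    """Sensitivity and specificity at every candidate signal threshold.

    pos: signals of spiked-in positive controls.
    neg: signals of negative controls (probes nothing in the sample should bind).
    A spot is called "expressed" when its signal is >= the threshold.
    """
    points = []
    for t in sorted(set(pos) | set(neg)):
        sensitivity = sum(s >= t for s in pos) / len(pos)  # true-positive rate
        specificity = sum(s < t for s in neg) / len(neg)   # true-negative rate
        points.append((t, sensitivity, specificity))
    return points

def youden_threshold(pos, neg):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1
    (one common criterion; assumed here, not taken from the paper)."""
    return max(roc_points(pos, neg), key=lambda p: p[1] + p[2] - 1)

positive_controls = [120, 300, 450, 800]   # hypothetical spike signals
negative_controls = [20, 35, 60, 90, 150]  # hypothetical negative-control signals
print(youden_threshold(positive_controls, negative_controls))  # (120, 1.0, 0.8)
```

Here the two sets overlap between 120 and 150, so no threshold achieves perfect sensitivity and specificity at once; the sweep makes that trade-off explicit.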
What it will give you is a threshold with a certain degree of specificity and sensitivity. The area under this [ROC] curve can serve as an estimator of hybridization quality, because [it] is a measure of the degree of overlap between negative and positive controls. If they don’t overlap, then you have a value of 1.0, [and] if the signals overlap almost completely, then you have a value close to 0.5 [and] you have almost no discrimination. If you run two or three microarray hybridizations for one experiment,…and you have one microarray that has a low…value, then you should actually exclude this microarray from your analysis and probably run one more hybridization. So you could define a gold standard that your individual hybridizations have to meet.
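The area under the ROC curve can be computed without drawing the curve, as the probability that a randomly chosen positive-control signal exceeds a randomly chosen negative-control signal (the Mann-Whitney formulation). The sketch below uses it as a per-hybridization quality gate; the 0.9 cutoff, the function names, and the signals are hypothetical choices for illustration.

```python
def roc_auc(pos, neg):
    """Area under the ROC curve: probability that a random positive-control
    signal exceeds a random negative-control signal (ties count one half).
    1.0 means no overlap; about 0.5 means no discrimination."""
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))

def flag_hybridizations(arrays, min_auc=0.9):
    """Names of hybridizations whose control AUC falls below min_auc.

    arrays: dict name -> (positive-control signals, negative-control signals).
    min_auc = 0.9 is a hypothetical cutoff, not a published recommendation.
    """
    return [name for name, (pos, neg) in arrays.items() if roc_auc(pos, neg) < min_auc]

replicates = {
    "hyb_1": ([500, 600, 700], [10, 20, 30]),  # fully separated controls
    "hyb_2": ([480, 650, 720], [15, 25, 40]),
    "hyb_3": ([10, 20, 30], [10, 20, 30]),     # complete overlap
}
print(flag_hybridizations(replicates))  # ['hyb_3']
```

The flagged hybridization would be the candidate to exclude and repeat.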
What is the main advantage of your method over others?
The advantage over current methods for selecting a threshold is that with current methods, you would use a ratio threshold: let’s say anything that’s higher than a twofold change is real, and anything that gives you a fold change of less than two you would exclude from the analysis. But, for example, if you have signals in the Cy3 and in the Cy5 channel that are very high but have only a 1.5-fold change, that might be more important than a fourfold change that comes from very low signals.
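The contrast between a ratio-only filter and an intensity-aware one can be shown with the two genes from this example. This is a sketch under assumptions: the signal threshold of 300 stands in for one derived from ROC analysis of the controls, and the gene names and intensities are hypothetical.

```python
def fold_change(cy3, cy5):
    """Fold change regardless of direction (always >= 1)."""
    ratio = cy5 / cy3
    return max(ratio, 1 / ratio)

def keep_by_ratio(cy3, cy5, min_fold=2.0):
    """Conventional filter: keep a gene only if it changes at least min_fold."""
    return fold_change(cy3, cy5) >= min_fold

def keep_by_intensity(cy3, cy5, signal_threshold):
    """Intensity-aware filter: trust the ratio only when both channels clear
    a signal threshold (e.g. one chosen by ROC analysis of the controls)."""
    return cy3 >= signal_threshold and cy5 >= signal_threshold

threshold = 300.0  # hypothetical ROC-derived signal threshold
genes = {
    "bright_gene": (8000.0, 12000.0),  # 1.5-fold, high signal in both channels
    "dim_gene": (50.0, 200.0),         # 4-fold, both signals near background
}
for name, (cy3, cy5) in genes.items():
    print(name, fold_change(cy3, cy5),
          keep_by_ratio(cy3, cy5), keep_by_intensity(cy3, cy5, threshold))
# bright_gene 1.5 False True
# dim_gene 4.0 True False
```

Requiring both channels above the threshold is one possible design choice; a gene that is switched off entirely in one sample would need separate handling.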
The important point is that you have to look at the absolute signal intensities rather than the ratios. One other method that uses signal intensities is to look at the median value of your negative controls. For example, many cDNA microarray formats use plant or bacterial genes that wouldn’t cross-hybridize with human genes. But the disadvantage of using just the negative controls is that you don’t know the sensitivity and specificity of your cutoff; for that, you need to evaluate a set of positive and negative controls.
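As an illustration of that limitation, the sketch below sets the cutoff at the median negative-control signal and then scores it against both control sets. The signal values are hypothetical. Specificity can be read off the negative controls alone, but sensitivity cannot be computed at all without spiked positive controls.

```python
from statistics import median

def negative_median_cutoff(neg):
    """The simpler cutoff: the median signal of the negative controls."""
    return median(neg)

def evaluate_cutoff(cutoff, pos, neg):
    """Sensitivity and specificity a cutoff achieves; needs positive controls too."""
    sensitivity = sum(s >= cutoff for s in pos) / len(pos)
    specificity = sum(s < cutoff for s in neg) / len(neg)
    return sensitivity, specificity

negative_controls = [20, 35, 60, 90, 150]  # hypothetical signals
positive_controls = [120, 300, 450, 800]   # hypothetical spike signals
cutoff = negative_median_cutoff(negative_controls)
print(cutoff, evaluate_cutoff(cutoff, positive_controls, negative_controls))  # 60 (1.0, 0.4)
```

By construction, roughly half of the negative-control spots sit at or above their own median, so such a cutoff passes many non-specific signals.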
Can you use this method for any microarray?
It is actually applicable to any microarray format, oligo arrays or cDNA microarrays, as long as you have these controls printed on your arrays. You also need the positive controls as RNA molecules to spike into your labeling mix.
What are the limitations?
You really need to be careful about spiking. The signal distribution of your spikes has to match the signal distribution of your test genes. If your spike signals are very high, then they will always be separate from your negative control signals, [so the ROC analysis would overstate how well you can discriminate signals in the range of your test genes]. So you really have to spike carefully, in the signal range of your test genes.
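One way to check this before running the ROC analysis is to ask what fraction of spike signals fall inside the bulk of the test-gene signal distribution. This is a minimal sketch, assuming a hypothetical 5th to 95th percentile window; the function name and signals are illustrative.

```python
from statistics import quantiles

def spike_range_coverage(spike_signals, test_signals):
    """Fraction of spike-in signals lying between the 5th and 95th percentiles
    of the test-gene signals. The 5-95% window is a hypothetical choice."""
    cuts = quantiles(test_signals, n=20, method="inclusive")  # 19 cut points
    low, high = cuts[0], cuts[-1]                             # 5th and 95th percentiles
    inside = sum(low <= s <= high for s in spike_signals)
    return inside / len(spike_signals)

test_genes = list(range(10, 1010, 10))  # hypothetical test-gene signals, 10..1000
well_spiked = [100, 400, 900]
over_spiked = [5000, 8000, 12000]
print(spike_range_coverage(well_spiked, test_genes))  # 1.0
print(spike_range_coverage(over_spiked, test_genes))  # 0.0
```

A low coverage value suggests the spikes are too bright to say anything about discrimination at the signal levels where the test genes actually sit.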
Where do you see the microarray field going?
I hope that [the microarray] will find its way into use as a standard diagnostic tool for many kinds of disease analysis and diagnosis. I think that if you have a small subset of genes that you know is involved in…disease progression, then you can, for example, spot many replicates of each gene, so you have high statistical power in your microarray data.
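The gain from replicate spots can be illustrated with a one-sample t statistic on a gene's log2 ratios: with the same spread, more replicates shrink the standard error roughly with the square root of the number of spots. This is a sketch with hypothetical values; note that replicate spots on one array capture technical rather than biological variation.

```python
import math
from statistics import mean, stdev

def replicate_t(log_ratios):
    """Mean log2 ratio across replicate spots, its standard error, and the
    one-sample t statistic against 0 (no change)."""
    m = mean(log_ratios)
    se = stdev(log_ratios) / math.sqrt(len(log_ratios))
    return m, se, m / se

spots = [0.8, 1.2, 0.8, 1.2]   # hypothetical log2 ratios for one gene
few = replicate_t(spots)       # 4 replicate spots
many = replicate_t(spots * 4)  # 16 replicate spots, same spread
print(f"4 spots:  SE={few[1]:.3f}, t={few[2]:.1f}")
print(f"16 spots: SE={many[1]:.3f}, t={many[2]:.1f}")
```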
[With regard to technical issues], I would like to see improvements in being able to design probes … that have equal hybridization efficiency. Because often, for example with oligo arrays, a low signal doesn’t necessarily mean that there is no mRNA in your sample, that the gene is not expressed: it might actually come from low hybridization efficiency. So if you can develop software or strategies to design oligos to genes efficiently and quickly, that would be a great improvement. The other one would be to be able to design probes for splice variants efficiently. Obviously you can’t do that with cDNA microarrays, but you could do it with short oligos.
What is your opinion on the usefulness of protein microarrays?
Protein chips are a very efficient tool to study protein abundance, similar to what you can do with DNA microarrays. “[But] proteins often undergo posttranslational modifications, and may move within a cell, bind with other proteins or cofactors, and vary in abundance and activity level over time, temperature, and pH. Standard protein arrays provide no information about such modifications and whether the proteins detected are in an active or inactive state…. Analyzing the abundance of proteins may not be sufficient. Consequently, ‘functional proteomics’ looking at protein interactions or activity has to be applied.”