Senior Research Scientist/Assistant Professor
Harvard Medical School
At A Glance:
Name: Zoltan Szallasi
Title: Senior Research Scientist, Children's Hospital Informatics Program, Assistant Professor, Harvard Medical School, Boston, Mass.
2001 — present — Senior Research Scientist, Children's Hospital Informatics Program, Assistant Professor, Harvard Medical School, Boston, Mass.
1996-2001 — Assistant Professor (Tenure Track), Department of Pharmacology, Uniformed Services University of the Health Sciences, Bethesda, MD
1995-1996 — Visiting Scientist, Laboratory of Cellular Carcinogenesis and Tumor Promotion, Division of Cancer Etiology, National Cancer Institute, National Institutes of Health, Bethesda, MD
1992-1995 — Visiting Associate, Laboratory of Cellular Carcinogenesis and Tumor Promotion, Division of Cancer Etiology, National Cancer Institute, National Institutes of Health, Bethesda, MD
1989-1992 — Visiting Fellow, Laboratory of Cellular Carcinogenesis and Tumor Promotion, Division of Cancer Etiology, National Cancer Institute, National Institutes of Health, Bethesda, MD
1988 — MD, University Medical School, Debrecen, Hungary
1987 — Graduate student, Department of Histology, University of Lund, Sweden
1984 — Graduate student, Fritz Verzar International Laboratory for Experimental Gerontology, Debrecen, Hungary
Zoltan Szallasi is a senior research scientist at the Children's Hospital Informatics Program at Harvard Medical School in Boston, and is also a fixture on the microarray conference circuit, where BioArray News has bumped into him on numerous occasions.
Two weeks ago Szallasi, an MD originally from Hungary who previously worked as an investigator at the National Cancer Institute, gave a presentation at Cambridge Healthtech Institute's Beyond Genome conference in San Francisco on a set of probe sequence analysis tools that he says may be useful in describing the role cross-hybridization plays in affecting the results of a microarray experiment.
While Szallasi said his group is not ready to present a "publishable tool" to the research community, he spoke with BioArray News last week about the technology and why it is important.
What are you working on at the moment?
We are investigating whether genome scale approaches, such as microarray analysis or high throughput screening for mutations, can take us closer to the rational design of combinatorial therapy in human malignancies, in particular, breast cancer.
And this is at CHIP?
Yes, my lab is part of the Children's Hospital Informatics Program but our work involves several collaborations, for example with Lyndsay Harris at the Dana Farber Cancer Institute or with the group of Roderick Jensen at the University of Massachusetts, at Boston.
What kind of team exists for your research?
I have three students working on computational issues.
You are working on some research on cross-hybridization on probes. How wide scale do you think cross-hybridization is?
As we and others have shown, there is widespread cross-hybridization on microarrays. This is kind of obvious. Microarrays are run under a single, rather permissive hybridization condition, therefore the expectation that most microarray probes will produce a specific signal is unrealistically optimistic. For a large portion of microarray probes, a significant level of homology exists between the probe sequence and transcripts that are not targeted by the given probe. Basic biochemical considerations suggest that this level of sequence homology will readily produce cross-hybridization signals. I am quite sure that most users with basic training in biochemistry have known this since the beginning of microarrays, but it has not been that easy to discuss this issue in a meaningful manner.
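The biochemical point here, that shared sequence stretches between a probe and an off-target transcript can produce signal, can be illustrated with a minimal sketch. This is not Szallasi's tool; the function names, sequences, and the 15-base threshold are invented for illustration, and a real predictor would also weigh mismatches, GC content, and hybridization free energy.

```python
# Hypothetical sketch: flag probes that share a long exact stretch with an
# off-target transcript, a crude proxy for cross-hybridization potential.

def shared_stretch(probe, transcript, k=15):
    """Return True if probe and transcript share an exact k-base stretch."""
    kmers = {probe[i:i + k] for i in range(len(probe) - k + 1)}
    return any(transcript[i:i + k] in kmers
               for i in range(len(transcript) - k + 1))

def flag_cross_hyb(probe, off_targets, k=15):
    """List off-target transcript IDs sharing a k-base stretch with the probe."""
    return [tid for tid, seq in off_targets.items()
            if shared_stretch(probe, seq, k)]

probe = "ATGCGTACGTTAGCATGCGTACGTA"          # invented 25-mer probe
off_targets = {
    "tx1": "GGGATGCGTACGTTAGCATCCCGGG",      # shares a long stretch with probe
    "tx2": "TTTTTTTTTTTTTTTTTTTTTTTTT",      # no meaningful homology
}
print(flag_cross_hyb(probe, off_targets))    # prints ['tx1']
```

In practice this screen would be run against a whole transcript database rather than a hand-picked dictionary, typically with an alignment tool rather than exact k-mer matching.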
Why hasn't it been easy to discuss?
Two reasons. The first has a lot to do with the way microarray technology developed and the "historical context" can explain a lot. It might be worth pointing out that the introduction of microarrays was more of a result of quantitative development than a qualitative jump in technology. Long before microarrays, there had been something called "dot-blots" that biochemists had toyed with for decades.
The problem was and still is that the "dot blot" approach eliminates size measurements of the measured transcript — important "confirmatory information" biochemists heavily relied on while running Northern blots. The big thing [with the advent of microarrays] was that by "dot-blotting" the probes instead of the RNA samples, and by exploiting miniaturization and automation, the whole technology became very efficient.
There were of course other nifty ideas as well, such as light directed oligonucleotide synthesis, the principle the Affymetrix technology is based on. These three or four major technical advantages made the technology really happen.
[Still], the introduction of the technology did not really solve the basic measurement issue. For every oligonucleotide hybridization there is an ideal condition that ensures the maximum attainable specificity, even if that specificity is limited by sequence homology. For a given microarray hybridization, a single condition is chosen and one hopes that most probes are going to be specific. Whether this is achieved by a given technology has never been shown or discussed based on scientific facts or data. It is only now, ten years after the introduction of the technology, that researchers have started to investigate this issue, using very indirect data sets that were not really designed to investigate this question at all.
[This] leads us back to the "historical context" again. In the case of oligonucleotide-based microarrays produced by industry, there was no probe sequence information available until 2001, which is obviously essential for these types of studies; otherwise there is no way of matching sequence homology to cross-hybridization. It is important to take a good hard look at probe sequences. It was not that easy to design good microarray probes, especially starting from fragmented, unconfirmed transcript databases.
Therefore, in hindsight it is perhaps not that surprising that, as we and others have shown, depending on the microarray platform up to 40 percent of the probes do not have the correct sequence for the transcript they were supposed to target. I am sure the end users would have appreciated this bit of information. Simply releasing the probe sequence information does not suffice; the lack of carefully designed experiments slowed down further investigation.
[Also], this is still an expensive technology and after the initial [investments in the late 1990s], there has been no grant support to aid this type of technological research, which is quite understandable. In any case, it is not quite clear to what extent it is worth investing money and resources into determining the level of cross-hybridization and accuracy in microarray measurements.
But who would the burden fall on [to optimize the technology]? Would it fall on the manufacturer to create a better product?
Right now it's changing and I am really reluctant to criticize anybody. But for a long time, about six or seven years, the manufacturers did not release any probe sequence information. Now there is a large-scale effort with the US Food and Drug Administration and some other government organizations. [For example], Leming Shi, [an investigator at the FDA's National Center for Toxicological Research], put together a large-scale microarray quality consortium that involves all the manufacturers, with multiple sites for all platforms using the very same RNA, and they look at what's happening, what's going on. Very soon the tools will be out to do some meaningful analysis, so soon we can probably put this whole issue to rest. Right now there are some people trying to [deal with] these things, like the [National Institute of Standards and Technology], but there is no good-quality data out there that we can use and that we can build models on. So hopefully with NIST, we can start designing experiments that deal with that very issue. With NIST backing, if they are really serious about this, within a year or two these things could be solved. I mean, it's not rocket science.
Are you disappointed with the technology?
No. We have never had a chance to evaluate it. If you had asked me in 2001 'Am I disappointed?' I would have said 'I have no idea. I don't know what we're measuring. There are tens or a hundred thousand spots on this thing. They are very nice. They are dark or they light up, but I have no idea what I am measuring.' And until I have a fairly good idea of the sequence information and of how the hybridization occurs, we have no idea.
Maybe it's a great technology; maybe we can improve it with very simple tools. We have some ideas about how, with a single chip, if you keep changing hybridization conditions, you can probably squeeze out much more reliable information. But the problem is that nobody has ever really tried very hard. What people did is work with some very questionable data sets that manufacturers released, and they tried to tweak and improve their algorithms on that, but there was nothing serious going on.
Now you said that you have developed some kind of probe sequence tools.
We are working on probe sequence-based tools to predict what's going to cross-hybridize and what's not. But we do not have a publishable tool yet. There are some obvious sequence characteristics that "guarantee" strong cross-hybridization and certain sequence characteristics that prevent any binding under a given hybridization condition. These probes can be essentially thrown out. But these are rather [undeveloped] tools and we must be able to do much better.
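The idea of throwing out probes whose sequence either "guarantees" cross-hybridization or prevents binding can be sketched as a first-pass triage. This is not the group's unpublished tool; the thresholds (an 18-base exact off-target match, a 30-70 percent GC window as a crude stand-in for bindability at a fixed condition) are invented for illustration.

```python
# Hypothetical probe triage: discard probes with a long exact off-target
# match (strong cross-hybridization expected) or with extreme GC content
# (crude proxy for poor binding at a single hybridization condition).

def longest_shared_run(a, b):
    """Length of the longest exact substring shared by sequences a and b."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best

def triage(probe, off_targets, max_shared=18, gc_range=(0.30, 0.70)):
    """Return 'discard' or 'keep' for one probe against off-target sequences."""
    gc = sum(base in "GC" for base in probe) / len(probe)
    if not (gc_range[0] <= gc <= gc_range[1]):
        return "discard"   # unlikely to bind well at one fixed condition
    if any(longest_shared_run(probe, seq) >= max_shared for seq in off_targets):
        return "discard"   # strong cross-hybridization expected
    return "keep"
```

A real tool would replace the GC heuristic with a thermodynamic model of duplex stability and allow near-matches with mismatches, not just exact runs.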
Are you drawing on existing technology for those tools?
We are working with Affy and [GE Healthcare's] CodeLink simply because raw data and probe sequence information are available for those. A lot is already known; for example, the longer the probes are, the more prone they are to cross-hybridization, and stuff like that. So bits and pieces are known. I think the basic knowledge is pretty good, and without that much extra effort, if the manufacturers are interested, the whole thing can be cleaned up within a relatively short period of time.
Is there any estimated date you have for making these tools available to researchers?
It all depends on producing the data. If [NIST] is interested and [they] can produce the benchmark data soon, we can have a first level of solution not long after that, perhaps by the end of this year. We know how to do this. We just need a lot of good quality measurements.