AT A GLANCE
Peter Tolias, director, Center for Applied Genomics, Public Health Research Institute
Associate professor, UMDNJ-New Jersey Medical School
Adjunct associate professor, Rutgers University-Newark and the New Jersey Institute of Technology
BSc 1981 and PhD 1987, McGill University, molecular biology
Postdoctoral fellow, 1987-1991, Harvard University
Research interests: Use and development of novel DNA microarray technology and data mining software for biomedical research applications such as cancer, infectious disease, spinal cord injury, and space flight.
At the Public Health Research Institute Center for Applied Genomics, you developed the first TB microarray last year. What other types of microbial microarrays are you developing now?
We have three in house right now: one for TB, one for cytomegalovirus, and one for Bacillus subtilis. We are about to print another one for Group A Streptococcus. The institute specializes in infectious disease, and a number of our investigators use these arrays in their work on the biology of infectious disease.
Are more and more researchers getting into microarrays?
Absolutely. We’re still in the growth phase; there’s no question about it. We also offer a human oligo chip representing approximately 19,000 genes, and we have two different rat chips. But our Affymetrix use has plateaued. What has really taken off is our own spotted arrays, because they are cheaper than Affymetrix chips and you can spot exactly what you want. We have been processing approximately 100 to 125 Affy chips a month, about half of them human GeneChips. Moving forward, though, we have many in-house projects that require a lot of human arrays. A very good example is a project we have with the University of Medicine and Dentistry of New Jersey. UMDNJ received a grant from the Department of Defense for chip-based work on bioterrorism, looking at blood infections resulting from agents such as plague, anthrax, and tularemia. Our goal is to use our human chip to analyze the transcriptional responses of human blood cells infected with these agents and to determine a discrete signature for each pathogen. The idea is that if there is an attack or an exposure and you can no longer detect the pathogen itself or whatever agent was used, you may still be able to detect an immunological signature in the gene expression profile. That program involves running almost 2,000 chips a year to look at ten different bugs that UMDNJ is getting from the Department of Defense.
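For scale, the figures quoted above work out as follows (assuming, for illustration only, an even split of chips across the ten pathogens, which the interview does not specify):

```latex
\frac{2000\ \text{chips/year}}{10\ \text{pathogens}} = 200\ \text{chips per pathogen per year}
\qquad
(100 \text{ to } 125)\ \text{chips/month} \times 12 = 1200 \text{ to } 1500\ \text{chips/year}
```

So this one project alone would call for more chips per year than the 1,200 to 1,500 Affymetrix chips the core was processing at the time.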
How are you handling the data analysis for such a massive project?
We have our own informatics group, which has developed nonparametric statistics in house for comparing large sets of array data to determine what is unique to any particular bug. We have used these methods very successfully in cancer research, looking at differences among lung, ovarian, breast, and prostate cancer rather than only at the gene expression profile of one cancer at different stages of progression or at differences in epithelial cells. The same algorithms are used for a wide range of applications, including the work we are doing with UMDNJ on pathogens.
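The answer above mentions nonparametric statistics without giving their details. As a point of reference, here is a minimal Python sketch of the textbook Wilcoxon rank-sum (Mann-Whitney U) test, used to rank genes by how differently they are expressed between two groups of arrays. It is not the group's in-house method; the function names, the normal approximation, and the toy data are assumptions for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U for group_a, two-sided p-value). The variance is not
    tie-corrected, a simplification acceptable for a sketch.
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


def rank_genes(expr_a, expr_b):
    """Rank genes (hypothetical dicts: gene -> list of values) by p-value."""
    results = []
    for gene in expr_a:
        u, p = rank_sum_test(expr_a[gene], expr_b[gene])
        results.append((gene, u, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    ranked = rank_genes({"G1": [5, 6, 7, 8], "G2": [1, 4, 5, 8]},
                        {"G1": [1, 2, 3, 4], "G2": [2, 3, 6, 7]})
    for gene, u, p in ranked:
        print(gene, u, round(p, 4))
```

With thousands of genes on a chip, p-values like these would normally also need a multiple-testing correction before any gene is called a marker.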
Are those the algorithms developed by your bioinformatics leader, Michael Recce, which rank gene expression somewhat like a Wilcoxon rank-sum test?
Exactly. We are using this approach to rank genes within tumors, not in order of fold change but in order of values equivalent to inverse p-values, which we refer to as event ratios. This lets us look for consistency in the direction of expression changes and rank the genes accordingly. We then examine the number of ranked replicate samples to compute the actual fold change. When the reproducibility of an experiment is high, the corresponding event ratio (rank number or confidence value) is close to 1. You can then compare this number in, say, breast cancer to the corresponding value in prostate or ovarian cancer. What you are looking for is an event ratio that is very high in only one of the datasets, so that you can distinguish that set from many others. That is the power of this really simple algorithm: It allows you to look at very large datasets and quickly pick out a very small subset of genes that act as markers to distinguish one group from another.
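The interview does not give a formal definition of the event ratio. One hypothetical reading consistent with the description is sketched below in Python: for each gene, count the fraction of tumor-versus-reference sample pairs that change in the gene's dominant direction, so the value approaches 1 when replicates agree; then keep genes whose ratio is high in one cancer type and low in all the others. This pairwise fraction is closely related to a normalized Mann-Whitney U statistic. The definition, the thresholds (0.9 and 0.6), the function names, and the toy data are all assumptions, not Recce's actual algorithm.

```python
from itertools import product


def event_ratio(tumor, reference):
    """Hypothetical event ratio for one gene.

    Fraction of (tumor, reference) sample pairs that change in the gene's
    dominant direction; ties count for neither direction. Returns
    (ratio, direction) with direction +1 (up in tumor) or -1 (down).
    """
    pairs = list(product(tumor, reference))
    up = sum(1 for t, r in pairs if t > r)
    down = sum(1 for t, r in pairs if t < r)
    if up >= down:
        return up / len(pairs), 1
    return down / len(pairs), -1


def specific_markers(datasets, target, high=0.9, low=0.6):
    """Genes with ratio >= high in `target` and < low in every other dataset.

    datasets: {dataset_name: {gene: (tumor_values, reference_values)}}
    The thresholds are illustrative assumptions.
    """
    markers = []
    for gene, (tumor, reference) in datasets[target].items():
        ratio, direction = event_ratio(tumor, reference)
        if ratio < high:
            continue
        other_ratios = [
            event_ratio(*datasets[name][gene])[0]
            for name in datasets
            if name != target and gene in datasets[name]
        ]
        if all(r < low for r in other_ratios):
            markers.append((gene, ratio, direction))
    # Most consistent markers first.
    return sorted(markers, key=lambda m: m[1], reverse=True)


if __name__ == "__main__":
    toy = {
        "breast": {"GENE_A": ([9, 10, 11], [1, 2, 3]),
                   "GENE_B": ([5, 6, 7], [5, 6, 7])},
        "prostate": {"GENE_A": ([2, 5, 3], [4, 1, 6]),
                     "GENE_B": ([8, 9, 10], [1, 2, 3])},
    }
    print(specific_markers(toy, "breast"))    # [('GENE_A', 1.0, 1)]
    print(specific_markers(toy, "prostate"))  # [('GENE_B', 1.0, 1)]
```

With real data the thresholds would need to account for the number of replicates, since with only a few samples per group a ratio near 1 can arise by chance.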
We have successfully applied this to classify different cancers, and are now applying it to look at pathogenic agents used by bioterrorists.
So you take a little blood sample and look at gene expression profiles...
In whole blood. We are also purifying different types of blood cells to discover genes that undergo transcriptional changes in a subpopulation of cells. However, signatures deciphered from isolated cell types may be diluted in whole blood. The idea is to develop these markers for whole blood so that you could use real-time, molecular beacon-based RT-PCR or a Cepheid-type device that would give you the answer within an hour and a half. Also, [in the cancer arena,] we would like to do a large-scale study in which we profile all tissues and all different types of cancer and use an algorithm such as this one to obtain a unique set of markers that distinguishes different diseases.
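The dilution problem can be made concrete with a back-of-the-envelope Python sketch. Under the simplifying assumptions that only one cell type responds, all other cells are unchanged, and cell proportions stay fixed, the whole-blood fold change is a weighted mix of the subpopulation's fold change and no change at all. The parameter `share` (the fraction of a gene's baseline whole-blood transcripts contributed by the responding cells) and both function names are hypothetical.

```python
def diluted_fold_change(share, subpop_fold_change):
    """Fold change seen in whole blood when only one cell type responds.

    share: hypothetical fraction (0..1) of the gene's baseline whole-blood
           transcripts contributed by the responding cell type.
    Assumes the other cells are unchanged and cell proportions are fixed.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return share * subpop_fold_change + (1.0 - share)


def required_subpop_fold_change(share, observed_fold_change):
    """Inverse: subpopulation fold change needed to produce a given
    whole-blood fold change, under the same assumptions."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return (observed_fold_change - (1.0 - share)) / share


if __name__ == "__main__":
    # Tenfold induction in cells contributing 5% of baseline transcripts.
    print(diluted_fold_change(0.05, 10.0))  # about 1.45
    # Subpopulation fold change needed for a 2-fold whole-blood signal.
    print(required_subpop_fold_change(0.05, 2.0))  # about 21
```

Under these assumptions, a tenfold induction confined to cells that contribute 5% of a gene's baseline transcripts shows up as only about a 1.45-fold change in whole blood, which is why markers have to be validated in whole blood rather than only in purified cells.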
You might have to run that study on a supercomputer.
Maybe not a supercomputer. UMDNJ just got a relatively powerful system: a Sun Fire 6800 midrange server with 24 UltraSPARC III processors. It has excellent computing power. We have all of our databases and all of our algorithms sitting on the back end, so we have the computing power to eventually attack these problems.
Finally, you recently got a supplement to your NCI grant. What are you planning to do with it?
Part of the work in my NCI grant is going to be published in the August issue of Genome Research. We’ve come up with a method whereby we can use what we call a universal oligonucleotide substrate to screen expression libraries, in a filter-type binding assay, for clones and proteins that bind DNA and RNA. It’s very simple. It’s just like the assays used to clone a site-specific DNA-binding protein: You take an oligo sequence defined by either footprinting or a gel shift assay, make many copies of it, radiolabel it, screen an expression library for binding, and make phage clones. We’ve adapted this strategy to a single oligo that combines features typically recognized by DNA- and RNA-binding proteins, including mismatches. The oligo also reduces false positives from ionic interactions and from proteins that bind DNA and RNA nonspecifically. Out of approximately 150 positive clones that were isolated, only one did not encode a known DNA- or RNA-binding motif. It works very well. We are now adapting this technique for spotted protein chips, as these may one day replace expression libraries in activity-screening applications.
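Taking the approximate clone count at face value, the fraction of positive clones that encoded a known nucleic-acid-binding motif works out to:

```latex
\frac{150 - 1}{150} = \frac{149}{150} \approx 0.993
```

That is, roughly 99% of the positives, with the caveat that the total of about 150 clones is itself approximate as reported.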