AT A GLANCE Chris Stoeckert
Research Associate Professor, Department of Genetics and Center for Bioinformatics, University of Pennsylvania
PhD in Biophysics, Johns Hopkins University
Postdoc in Human Genetics, Yale University
You recently published a paper in Physiological Genomics where you compared three different labeling methods. What prompted this study?
This study resulted from a collaboration with Klaus Kaestner in the genetics department, who uses microarrays. A question that he faced, like so many others, was: ‘What method do you use to label your material?’ We embarked on a fairly comprehensive study to address that question, looking at three labeling approaches: direct incorporation of the Cy5 and Cy3 dyes, an amino-allyl or indirect labeling method, and Genisphere’s dendrimer labeling method [which uses a branched structure of hybridized oligos that contain fluorophors].
One of the appeals of the dendrimer method has been that you start with about ten times less material, although there had been mixed reports, often through the grapevine, as to how reproducible that approach was.
So how did you go about testing the labeling methods, and what did you find?
In our study, we used a microarray that we built called the PancChip: it contains cDNAs that we determined to be expressed in the pancreas. We used pancreatic RNA for both channels because we expected a majority of the spots to have a positive signal this way. By comparing the two channels and by looking at eight replicates for each of our labeling methods, we studied whether we got the same signal each time.
The first question that we looked at was: If you see a spot give a signal over background, how consistently does that happen? All three methods did fairly well here, but the indirect and the dendrimer methods were consistently better than the direct method. We then studied how distinguishable a signal was above background. When we did that evaluation, the dendrimer method actually came out on top, particularly at higher intensities.
When we looked at the third parameter, how much variation you see, again the dendrimer came out well, although the indirect method did also. What we concluded from that was, when you do get a signal, in particular with the dendrimer you get a strong signal.
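The three replicate-based questions above (how often a spot is detected, how far the signal sits above background, and how much it varies) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the function names, the simple signal-versus-background comparison, and the intensity values are all hypothetical, and none of this is taken from the statistics actually used in the Physiological Genomics paper.

```python
from statistics import mean, stdev

def detection_rate(signals, backgrounds, margin=0.0):
    """Fraction of replicates where the spot signal exceeds background + margin."""
    hits = sum(1 for s, b in zip(signals, backgrounds) if s > b + margin)
    return hits / len(signals)

def signal_to_background(signals, backgrounds):
    """Mean ratio of spot signal to local background across replicates."""
    return mean(s / b for s, b in zip(signals, backgrounds))

def coefficient_of_variation(signals):
    """Sample standard deviation divided by the mean (lower = more reproducible)."""
    return stdev(signals) / mean(signals)

# Hypothetical intensities for one spot across eight replicate hybridizations.
signals = [1200.0, 1150.0, 1300.0, 1250.0, 90.0, 1180.0, 1220.0, 1270.0]
backgrounds = [100.0] * 8  # assumed constant local background

print("detection rate:", detection_rate(signals, backgrounds))
print("signal/background:", signal_to_background(signals, backgrounds))
print("CV:", coefficient_of_variation(signals))
```

In this made-up example one replicate falls below background, so the spot is detected in seven of eight replicates, and that single dropout also inflates the coefficient of variation.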
However, the last evaluation we did was to look at the predictive ability of the signal. If you have more of a particular transcript, do you get more signal? This was not as detailed a study as the replicate study; we just did five different dilutions of one of the input samples. We found that the direct method gave the best predictive results. The indirect method was next, and it wasn’t too bad, but the dendrimer method did fairly poorly. If you got a signal, it was very strong but did not accurately reflect how much material you put in.
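One common way to check whether signal tracks input amount in a dilution series is to fit a straight line to log signal versus log input: a slope near 1 with a high R² suggests the signal is roughly proportional to the amount of material. The sketch below assumes this log-log fit as the measure; it is not necessarily the analysis used in the paper, and the function name and all numbers are hypothetical.

```python
from math import log2

def fit_log_line(inputs, signals):
    """Least-squares fit of log2(signal) against log2(input).

    Returns (slope, intercept, r_squared).
    """
    xs = [log2(v) for v in inputs]
    ys = [log2(v) for v in signals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical five-point, two-fold dilution series of one input sample.
relative_input = [1.0, 0.5, 0.25, 0.125, 0.0625]
proportional = [8000.0, 4000.0, 2000.0, 1000.0, 500.0]   # signal halves with input
saturating = [8000.0, 7600.0, 7200.0, 6900.0, 6500.0]    # strong but barely changes

for name, series in [("proportional", proportional), ("saturating", saturating)]:
    slope, intercept, r2 = fit_log_line(relative_input, series)
    print(f"{name}: slope={slope:.3f}, R^2={r2:.3f}")
```

The second made-up series mimics the pattern described above: the signal is strong at every dilution, but the fitted slope is far below 1, so the signal says little about how much material went in.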
What did you conclude from this study?
Our conclusion was that overall, the indirect method that we used is best, particularly when you have sufficient material. If your starting material is limiting, there are amplification methods that can still allow you to use this method. However, the dendrimer method should be considered when the starting material is limiting and you don’t want to use amplification methods, and if you are interested in whether your gene is expressed or not.
What role do microarrays play in your overall research?
My group is a bioinformatics group. Microarrays play a large role in the data that we help analyze and store. We collaborate with a number of investigators, and we see our role as helping to work out the design of various aspects of microarray experiments, including technical details such as how to put together the array and, as in this case, which methods to use.
But we aren’t just focused on microarrays in my group. We use that method as one component to understand genes and gene regulation. We have been focusing on mouse and human, but we are also involved with the Plasmodium genome. PlasmoDB [a Plasmodium genome database] is a result of our work with David Roos’ group: the full malaria genome is in fact scheduled to be published on October 3 in Nature. One of the utilities of having that genome is to help identify genes and study their behavior.
The Malaria Research and Reference Reagent Resource Center (MR4), which was established by the National Institute of Allergy and Infectious Diseases, has been giving out microarrays for Plasmodium falciparum to various groups, and those investigators will be depositing their microarray data into PlasmoDB. We have already been a central repository for genome sequence, and we look to be a central repository for microarray data for Plasmodium as well. However, we certainly will also be putting all of our data, including the labeling data from the Physiological Genomics paper, into ArrayExpress.
You are a member of the Microarray Gene Expression Database, or MGED, Society. Has there been any progress in making the group’s standard, Minimum Information About a Microarray Experiment, or MIAME, commonly accepted?
MGED has just sent out an open letter to a number of journals recommending the adoption of the MIAME guidelines, and the feedback so far has been very positive.
Where do you see the main obstacles to having a working gene expression database?
There are a couple of obstacles. One has been the experimental annotation, but I think we are making progress there. One of the major obstacles remaining is the processing of the data, how to adequately describe that and capture that information.
MGED’s normalization working group is addressing this issue; the other three working groups are MIAME, MAGE (microarray and gene expression), and ontologies. Gavin Sherlock and Cathy Ball at Stanford, and John Quackenbush at TIGR, are in charge of this effort, one aim of which is to come up with standards for quality control. The question is: ‘How do you assess whether this experiment is any good or not?’
Do you think ArrayExpress will see more entries in the near future?
The EBI has been getting the technology up to transfer data and organize it the way they want. One part of this has been the ontologies working group, which I lead for MGED. We are struggling with which terms people should generally be using to annotate their experiments. All of this is ongoing, but it has finally developed to the point where we are implementing it. As these tools become available (and a number of companies, like Rosetta and Affymetrix, have been involved with this), we will see the number of experiments deposited in ArrayExpress increase dramatically.