Postdoctoral Fellow in Computational Biology
National Center for Biotechnology Information
Name: Damir Herman
Title: Postdoctoral Fellow in Computational Biology, National Center for Biotechnology Information.
Professional Background: 2003 — present, Postdoctoral Fellow in the Computational Biology Branch at NCBI in Bethesda, Md.
Education: 2003 — PhD, theoretical physics, Case Western Reserve University; 1998 — BSc, theoretical physics, University of Zagreb, Croatia.
This is a busy time of year for anybody, but it has been especially busy for Damir Herman. Herman is a leader of the Microarray Quality Control's Probe-Sequence-Based Cross-Platform Comparison group. On Dec. 1-2, the National Center for Biotechnology Information investigator will attend a meeting of the Microarray Quality Control project in Palo Alto, Calif., to try to clear up any remaining ambiguities among group members before MAQC publishes the results of its study. Then, on Dec. 7, Herman will report on MAQC to the National Institutes of Health.
All of this activity underscores how much energy goes into bringing the MAQC project to completion and into disseminating the knowledge it has generated. The project is spearheaded by the US Food and Drug Administration's National Center for Toxicological Research and includes all major array manufacturers.
To get a better picture of what to expect from the MAQC meeting in Palo Alto and afterwards, BioArray News spoke with Herman this week.
Which samples did the MAQC project test across platforms?
It's Stratagene's Universal Human Reference RNA, UHRR I believe it's called. That's an RNA mixture from 10 different cell lines, mainly cancer cell lines. The other sample is Ambion's brain RNA, an RNA mixture from about 20 individuals of various ages. In addition, we looked at titrated mixtures of the two samples, in 75/25 and 25/75 proportions.
How are you testing them?
We are testing them across seven different microarray platforms. There are six big ones (ABI, Affymetrix, Agilent, GE Healthcare, Illumina, and NCI-Operon) with tens of thousands of probes. These are all oligo arrays. There's also one cDNA array, made by Eppendorf. And there are three validation platforms: two are QRT-PCR, one from Gene Express and the other ABI's TaqMan assays, and the third is Genospectra's QuantiGene. Altogether, I believe that's 10.
Who is sponsoring this consortium?
Well, the entire thing is sponsored in-house. The FDA doesn't sponsor microarrays, samples, or time, so everything is more or less on a voluntary basis.
MAQC is spearheaded by the NCTR in Arkansas. NCBI is just one of the data analysis sites. Because we are on neutral ground, we are unbiased with respect to probe sequence information. There's a strong push to make everything as open as possible. Nonetheless, I had to sign non-disclosure agreements with ABI and Eppendorf, which are trying to keep their probe sequences confidential. But the rest of the participants are trying to make them [available].
In addition, there is a group at the University of Massachusetts, Boston, led by Rick Jensen, that is working on the same cross-platform mapping. Even though we have different ideas, we actively compare results.
How has the MAQC project progressed since it was started?
I think that Leming Shi [the National Center for Toxicological Research investigator who heads MAQC] did a very good job in terms of timing.
Initially, four samples were used in the pilot study. During the meeting at the FDA in Rockville, Md., in May this year, it was decided to go forward with the two reference RNA samples for the main study.
We planned to start the main study in late August and complete it within five to eight weeks of receiving the RNA samples. The data was distributed to 21 data-testing sites in late October. We got both the raw data and the manufacturers' suggested normalized and pre-processed data in a clean tab-delimited format, which greatly simplifies data analysis.
The data will be made publicly available once the first round of work has been submitted as a series of publications in February next year.
The timeline now is to see where we stand in Palo Alto and to discuss the next step.
We will meet again in October or November next year to discuss microarray quality control and data analysis. Finally, the FDA plans to issue guidance on microarray quality control and data analysis in late 2007.
What will the publication concern?
The publication will be a series of articles that came out of the project. There are several topics that we are trying to address. MAQC stands for Microarray Quality Control project. There's no gold standard or reference, but we are moving in that direction. I am not saying we are going to come up with gold standards, but we are trying to come up with a set of thresholds and metrics that can be used in further analysis. For instance, once the whole consortium has seen certain values for these two samples on all these platforms, the idea is that before you run your own experiments, you get those samples, run them on your arrays, and see whether you get similar results. If you do, you're good to move into uncharted territory. But if there are problems, then you have to stop and think about what went wrong.
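The check Herman describes, running the reference samples on your own arrays and comparing the results against the consortium's values, can be sketched in a few lines of Python. This is a minimal illustration, not an MAQC metric: it assumes both the consortium values and a lab's results are per-gene log2 ratios of one reference sample over the other, and the function names and the 0.9 cut-offs are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def check_against_reference(reference, measured,
                            min_corr=0.9, min_direction_agreement=0.9):
    """Compare a lab's per-gene log2 ratios with consortium reference values.

    reference, measured: dicts mapping gene ID -> log2(sample A / sample B).
    min_corr, min_direction_agreement: hypothetical cut-offs for illustration.
    """
    # Only genes measured in both datasets can be compared.
    genes = sorted(set(reference) & set(measured))
    ref = [reference[g] for g in genes]
    obs = [measured[g] for g in genes]
    corr = pearson(ref, obs)
    # Fraction of genes whose log ratio has the same sign in both datasets
    # (a ratio of exactly 0 is counted as "not up").
    agree = sum(1 for r, o in zip(ref, obs) if (r > 0) == (o > 0)) / len(genes)
    return {
        "genes": len(genes),
        "correlation": corr,
        "direction_agreement": agree,
        "passed": corr >= min_corr and agree >= min_direction_agreement,
    }


# Made-up values; "g5" has no reference value and is ignored.
reference = {"g1": 2.0, "g2": -1.5, "g3": 0.5, "g4": -3.0}
measured = {"g1": 1.8, "g2": -1.2, "g3": 0.7, "g4": -2.5, "g5": 1.0}
print(check_against_reference(reference, measured))
```

Correlation captures overall agreement, while direction agreement catches genes that move the opposite way; any real pass/fail threshold would have to come from the consortium's own data.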
The main goal, I would say, is to assess microarray cross-platform concordance. The big question for the FDA is which microarray data submitted for drug approval it can trust, and to what degree. As I said earlier, we are trying to make all the details publicly available and usable for microarray quality control.
These papers are going to contain some introductory information about where the FDA is heading with the experiment. We are going to discuss probe sequence-based mapping across microarray platforms, which is what I am going to talk about most here at the NIH. Basically, we compared probe sequences with the Human RefSeq database, which NCBI curates as a high-quality mRNA database. We are also going to look into different normalization and gene-selection methods; there is a whole jungle of options for how you can handle microarray data.
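Probe-to-transcript mapping of this kind can be illustrated with exact substring matching. The Python sketch below is a toy under stated assumptions: a probe is taken to map to a transcript when the probe, or its reverse complement, occurs verbatim in the mRNA sequence, and the sequences and accessions are made up rather than real RefSeq records. A real pipeline would more likely use an alignment tool such as BLAST that tolerates mismatches. Probes from two platforms are then paired when they hit a common transcript; note how a probe in a region shared by two splice variants maps to both.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def map_probes(probes, transcripts, check_antisense=True):
    """Map each probe to every transcript containing it as an exact substring.

    probes: dict probe ID -> probe sequence.
    transcripts: dict accession -> mRNA sequence (toy stand-ins for RefSeq).
    Returns dict probe ID -> sorted list of matching accessions.
    """
    hits = {}
    for probe_id, seq in probes.items():
        queries = {seq.upper()}
        if check_antisense:
            # Probe sequences may be reported in either orientation, so try both.
            queries.add(reverse_complement(seq))
        hits[probe_id] = sorted(
            acc for acc, mrna in transcripts.items()
            if any(q in mrna.upper() for q in queries)
        )
    return hits


def cross_platform_pairs(hits_a, hits_b):
    """Pair probes from two platforms that hit at least one common transcript."""
    pairs = []
    for probe_a, accs_a in hits_a.items():
        for probe_b, accs_b in hits_b.items():
            shared = sorted(set(accs_a) & set(accs_b))
            if shared:
                pairs.append((probe_a, probe_b, shared))
    return pairs


# Hypothetical splice variants that share their 5' end.
transcripts = {
    "toy_variant_1": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "toy_variant_2": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCTTTTAG",
}
platform_a = map_probes({"A1": "GTAATGGGCCGC"}, transcripts)
platform_b = map_probes({"B1": "GGTGCCCGATAG", "B2": "CTATCGGGCACC"}, transcripts)
print(platform_a)
print(platform_b)
print(cross_platform_pairs(platform_a, platform_b))
```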
We will also look into validation, which goes along the lines of probe sequence-based mapping: we would like to understand concordance and discordance across the platforms. We are also going to use the titration datasets to assess the accuracy of the microarray platforms. This is a move towards standardization.
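One way titration mixtures can be used is to check that signals move in the expected direction and by roughly the expected amount. The sketch below assumes linear-scale signals and, as a simplification, that both RNA samples contribute equal mRNA per unit of total RNA, so a 75/25 mixture should read about 0.75 * A + 0.25 * B. The sample labels A (UHRR), B (brain), C (75% A) and D (25% A), and the function names, are hypothetical.

```python
def expected_mixture(signal_a, signal_b, fraction_a):
    """Expected linear-scale signal for a mixture containing fraction_a of sample A.

    Simplifying assumption: both samples contribute the same amount of mRNA per
    unit of total RNA, so the mixture signal is a weighted average.
    """
    return fraction_a * signal_a + (1.0 - fraction_a) * signal_b


def titration_consistent(a, c, d, b):
    """True if a gene's signals follow the expected order A, C (75% A), D (25% A), B."""
    if a > b:
        return a > c > d > b
    if a < b:
        return a < c < d < b
    return False  # A equals B, so the titration carries no information


def titration_report(genes):
    """genes: dict gene ID -> (A, C, D, B) observed linear-scale signals."""
    # Genes with identical A and B signals cannot be checked.
    informative = {g: s for g, s in genes.items() if s[0] != s[3]}
    consistent = [g for g, s in informative.items() if titration_consistent(*s)]
    max_rel_error = {}
    for g, (a, c, d, b) in informative.items():
        errors = []
        for observed, fraction_a in ((c, 0.75), (d, 0.25)):
            expected = expected_mixture(a, b, fraction_a)
            errors.append(abs(observed - expected) / expected)
        max_rel_error[g] = max(errors)
    return {
        "informative": len(informative),
        "consistent": sorted(consistent),
        "max_rel_error": max_rel_error,
    }


genes = {
    "g1": (1000.0, 775.0, 325.0, 100.0),  # matches the expected mixtures exactly
    "g2": (200.0, 300.0, 650.0, 800.0),   # right order, C somewhat low
    "g3": (500.0, 400.0, 600.0, 100.0),   # order violated
    "g4": (300.0, 300.0, 300.0, 300.0),   # A equals B: uninformative
}
print(titration_report(genes))
```

The ordering test needs no assumption about mRNA content, which makes it a more robust first screen than the relative-error figure.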
Also, a group at NIST will consider inter-platform and inter-laboratory variability. Each of the four samples was tested at three different test sites, in five technical replicates per site. So there is plenty of room to do statistics, to see what can go wrong within a lab, and to compare labs based on these experiments alone.
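With four samples, three sites and five replicates per site, each platform contributes 4 * 3 * 5 = 60 hybridizations, assuming every platform followed that design. A simple descriptive summary of within-site versus between-site variability for one gene might look like the following Python sketch. It uses coefficients of variation, is not the NIST group's method, and its function names and data are hypothetical.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def variability_summary(measurements):
    """Summarize variability for one gene in one sample.

    measurements: dict site name -> list of replicate signals (linear scale).
    Returns the average within-site CV (repeatability across technical
    replicates) and the CV of the site means (site-to-site spread).
    """
    within_site = mean(cv(reps) for reps in measurements.values())
    site_means = [mean(reps) for reps in measurements.values()]
    return {"within_site_cv": within_site, "between_site_cv": cv(site_means)}


# Hypothetical signals: 3 sites x 5 technical replicates for one gene.
measurements = {
    "site_1": [100, 102, 98, 101, 99],
    "site_2": [110, 112, 108, 111, 109],
    "site_3": [90, 92, 88, 91, 89],
}
print(variability_summary(measurements))
```

A large gap between the two numbers, as in this made-up example, would suggest that site effects dominate replicate noise for that gene.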
We are also interested in cross-hybridization, which is a big open area, and we are also going to compare one-color versus two-color designs. Agilent and NCI are two-dye platforms, whereas the others are one-dye platforms. We are interested in seeing how these two microarray technologies perform.
Also, because we would like to make everything available and as user-friendly as possible, there is work being done on bioinformatics. We are looking at tools that will make massaging this data as easy as possible.
The idea is to have all these topics covered in a number of publications.
Have you decided to which journal you will submit the study?
We discussed the possibility of submitting to Nature Biotechnology. The editor seems to be excited about publishing a separate issue that will cover the MAQC project.
What would it take for you to consider next week's meeting a success?
Success would be if we cleared up all the issues. We are still debating how to do the cross-platform comparison based on probe sequences. We are not certain how to deal with alternatively spliced variants. That's just from my perspective, which is very subjective.
I think if we iron out all the ambiguities we have with probe-based mapping, and with the platform concordance we are seeing versus what we expect to see, [then it will be a success].
Different manufacturers have different agendas in probe design. If we don't take that into consideration we are going to interpret results in the wrong way. I wish I could get more information from the manufacturers about their probe design strategies. I understand it is a business decision, but I believe having open access to probe sequences and [making them] subject to scientific scrutiny can only propel microarray technology forward. I also believe that the MAQC project is an ideal ground for experimental validation of various probe design philosophies.
Some manufacturers do a nice job of publishing white papers and making them available to the public, but there are a lot of holes that need to be plugged before these white papers can be considered probe design recipes.