When it comes to hot topics in microarray analysis, the latest clustering algorithms and statistical calisthenics don’t necessarily top the list for many pharmaceutical companies, who are still struggling to determine how microarrays can help them speed drug discovery. Last week, at the CHI Microarray Data Analysis conference in Baltimore, Md., researchers from several big pharmas capped three days of talks on cutting-edge analysis methods with results from early gene expression experiments conducted to help them determine not how, but whether, microarrays could be of use to their projects. The talks, from Schering-Plough, GlaxoSmithKline, and Bristol-Myers Squibb, gave attendees a taste of how microarray data behaves — and often misbehaves — in real-world applications.
Jun Zou, a principal scientist in Schering-Plough Research Institute’s allergy research group, discussed how his team is identifying asthma-related genes through a combination of microarray experiments and traditional genetic linkage studies in asthmatic families. Zou’s team used microarrays to profile lung tissue from cynomolgus monkeys, which have a natural nematode hypersensitivity that produces symptoms similar to allergic asthma in humans.
From an initial pool of around 40,000 ESTs, the Schering-Plough team identified 149 genes that were regulated more than 2.5-fold in monkey lungs treated with the nematode antigen. They carried out standard hierarchical cluster analysis to break that set into five subgroups, but Zou said that this alone wasn’t enough to prove the validity of the findings. At the time of the experiment, four years ago, Schering-Plough wasn’t sure “how reliable and accurate microarray technology was,” so Zou said they decided to verify their results with Taqman RT-PCR.
The results were somewhat surprising. Taqman “gave the same trend” as the microarray analysis when it came to whether genes were up- or down-regulated, but “was more sensitive,” Zou said. In an overall comparison of the Taqman results with the microarray results, the two indicated the same fold change in differential expression 55 percent of the time, with Taqman indicating a higher fold change 35 percent of the time and a lower fold change the remaining 10 percent of the time. The take-home message from the study, Zou said, is that microarrays are a powerful and effective tool for identifying differentially expressed genes in a disease model, but when it comes to validating them, it’s best to work “in combination with real-time PCR.”
Amber Anderson, senior statistician for GlaxoSmithKline’s clinical pharmacology statistics and programming group, described GSK’s early experiences with microarray analysis using one of the company’s initial clinical microarray studies, conducted two years ago. GSK wanted to evaluate the effect of two non-steroidal anti-inflammatory drugs (Oxaprozin and Rofecoxib) versus a placebo in healthy patients. Each of the 13 patients in the study received each of the three regimens, with four blood samples taken for each. With a total of 137 samples and more than 12,600 genes, Anderson said the team found itself facing a “multiplicity problem” that required them to calculate adjusted p-values using the Bonferroni correction, false discovery rate, and other methods. The problem, however, was that “even the least conservative p-value adjustment was unable to identify any differentially expressed genes,” Anderson said.
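To make the multiplicity problem concrete, the sketch below shows how two of the adjustments Anderson named, the Bonferroni correction and a false discovery rate adjustment, rescale raw p-values. It is an illustration only, not GSK's code: the Benjamini-Hochberg procedure is assumed here as the FDR method, and the five p-values are hypothetical. For scale, with roughly 12,600 genes tested at a nominal 0.05 level (also an assumption), a Bonferroni cutoff works out to about 0.000004 per gene.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
print("Bonferroni:        ", bonferroni(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

The Bonferroni values grow in direct proportion to the number of tests, which is why a study with thousands of genes and a small number of subjects can end up with no gene passing the cutoff.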
The GSK team was able to loosen the constraints a bit to detect differentially expressed genes while remaining statistically accurate by using “permuted p-values” — a method that recalculates p-values for the 200 genes with the lowest original p-values by randomly re-ordering regimens within a subject for comparisons between regimens, and re-ordering time points for comparisons between times. Anderson said that because the process is computationally intensive, the GSK researchers were limited to 1,000 permutations for each of the 200 genes. The result, she said, was worth the effort, as it offered a more reliable assessment of the accuracy of the original ranking. As with Zou’s experience, the method provided an additional metric to validate the microarray results.
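The within-subject permutation idea can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions rather than GSK's procedure: the test statistic (a mean within-subject difference), the regimen names, the function names, and the data are all hypothetical. Only the shuffling of regimens within each subject and the 1,000 permutations come from the talk.

```python
import random


def mean_difference(data, a, b):
    """Mean within-subject difference between regimens a and b."""
    diffs = [subject[a] - subject[b] for subject in data]
    return sum(diffs) / len(diffs)


def permuted_p_value(data, a, b, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for one gene.

    data: one dict per subject, mapping regimen name -> expression value.
    Regimen labels are shuffled within each subject, never across subjects.
    """
    rng = random.Random(seed)
    observed = abs(mean_difference(data, a, b))
    hits = 0
    for _ in range(n_permutations):
        shuffled = []
        for subject in data:
            labels = list(subject.keys())
            values = list(subject.values())
            rng.shuffle(values)  # re-order regimens within this subject
            shuffled.append(dict(zip(labels, values)))
        if abs(mean_difference(shuffled, a, b)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values for one gene in six subjects.
example = [
    {"drug_a": 10.1, "drug_b": 5.2, "placebo": 5.0},
    {"drug_a": 9.8, "drug_b": 5.6, "placebo": 5.3},
    {"drug_a": 10.4, "drug_b": 4.9, "placebo": 5.1},
    {"drug_a": 9.9, "drug_b": 5.4, "placebo": 4.8},
    {"drug_a": 10.2, "drug_b": 5.0, "placebo": 5.2},
    {"drug_a": 10.0, "drug_b": 5.3, "placebo": 4.9},
]
print(permuted_p_value(example, "drug_a", "placebo"))
```

Repeating a loop like this for each of 200 genes means 200,000 reshuffles of the data set, which gives a sense of why the number of permutations had to be capped.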
Moving into lead discovery, Petra Ross-Macdonald, a senior research investigator in applied genomics at Bristol-Myers Squibb, discussed how BMS embarked upon a pilot project three years ago to test gene expression profiling as a means of detecting toxic effects, off-target binding, or other undesirable properties in chemical compounds before they hit the clinic. BMS first ran an experiment with Troglitazone, a diabetes drug that Warner-Lambert pulled off the market in 2000 after it was found to cause liver failure in some people, to see if gene expression profiling would have indicated a problem with the compound earlier in the process. Sure enough, the red flags were there: Initial heat maps showed that Troglitazone’s gene expression profile was very similar to Farglitazar, a compound that was dropped prior to clinical trials.
After the success of the pilot, Ross-Macdonald said, “We decided to use microarray gene expression profiling on actual development compounds last year.” An initial obstacle, she said, was accounting for the experimental process when analyzing the data. BMS has a rigorous QC/QA process, Ross-Macdonald said, that uses Agilent chips to validate its Affymetrix chips, but even the most stringent controls were unable to eliminate bias due to the experimental design: Samples from the same process block tend to cluster together, producing effects in the analysis that “are more prominent than the biology,” she said. The key to the problem is being aware of it, however, and Ross-Macdonald said that her team gained confidence in its findings once it began correcting for these effects.
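The talk did not specify how BMS corrects for process-block effects. As one simple possibility, the sketch below centers each gene's values within a block by subtracting the block mean. The approach, the function name, and the data are assumptions chosen for illustration, not a description of the BMS pipeline.

```python
from collections import defaultdict


def center_by_block(values, blocks):
    """Subtract each process block's mean from the samples in that block."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, block in zip(values, blocks):
        totals[block] += value
        counts[block] += 1
    means = {block: totals[block] / counts[block] for block in totals}
    return [value - means[block] for value, block in zip(values, blocks)]


# Hypothetical log-expression values for one gene across six samples
# processed in two blocks; block "B" reads about 2 units higher overall.
values = [5.0, 5.5, 6.0, 7.0, 7.5, 8.0]
blocks = ["A", "A", "A", "B", "B", "B"]
print(center_by_block(values, blocks))
```

Centering of this kind is only safe when every block contains a mix of the conditions being compared. If a treatment group was processed entirely in one block, subtracting the block mean would remove the biological signal along with the process effect.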
In a study to determine the selectivity of kinases using gene expression data for normal cells, diseased cells, and treated cells for 22,000 genes, the research team found that compounds with selectivity issues “did pop out,” Ross-Macdonald said, although she added that a few lingering questions remain before they can rely entirely on microarray data to support compound selection: “How distinct does a compound have to be?” she asked, and “How many cell types do you have to use?”