There’s no shortage of commercial software packages for enterprise-scale microarray analysis, so why did Johnson & Johnson’s Pharmaceutical R&D group in Raritan, NJ, turn to academic collaborators and public domain tools when it wanted to automate its gene expression analysis pipeline? According to Javier Cabrera, a Rutgers University statistician who worked on the project, commercial statistical packages aren’t as easy to use as the collection of R programs he and his J&J collaborators have written for microarray analysis.
The software package, called DNAMR (for DNA Microarray Routines), is a group of R programs that Cabrera and J&J biostatistician Dhammika Amaratunga wrote for the statistical analysis of microarray data. Although similar to BioConductor, an open source microarray analysis package also based on R, DNAMR was developed to be “more user-friendly” than BioConductor, Cabrera told BioInform. The difference between the two packages is primarily one of “philosophies,” he said.
At the BioArrays 2003 conference held in New York last week, Cabrera said that the J&J collaboration grew from a backlog of microarray data at the pharmaceutical company. With four sites to support, hundreds of experiments being conducted at each site, and “not enough manpower to analyze all the data,” J&J was looking for a way to automate its analysis pipeline without compromising statistical rigor in the process.
Cabrera, Amaratunga, and the J&J bioinformatics department created a system called JNJarray, which uses Perl scripts to link the DNAMR software engine to a centralized repository of gene expression data and a web-browser user interface. Biologists, bioinformaticists, and biostatisticians can log on to the system to download any gene expression data within the company that they are authorized to access. The system guides the user through the steps of normalization, analysis, and the selection of differentially expressed genes, as well as the integration of pathway data and gene annotations.
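For readers unfamiliar with the three steps named above, the following is a minimal sketch of what normalization, analysis, and gene selection can look like in practice. It is written in Python rather than R and is not drawn from DNAMR or JNJarray. The function names, the choice of median centering and Welch's t statistic, the cutoff of 4.0, and the toy intensities are all assumptions made for illustration.

```python
import math
import statistics


def normalize(arrays):
    """Log2-transform each array, then subtract that array's median.

    Median centering is one simple normalization choice (an assumption
    here, not necessarily what DNAMR does).
    """
    normalized = []
    for intensities in arrays:
        logged = [math.log2(x) for x in intensities]
        center = statistics.median(logged)
        normalized.append([value - center for value in logged])
    return normalized


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if se == 0:
        return 0.0  # no variation in either group: treat as no evidence
    return (statistics.mean(a) - statistics.mean(b)) / se


def select_genes(gene_names, control, treated, threshold=4.0):
    """Return the genes whose |t| exceeds a hypothetical cutoff."""
    selected = []
    for i, name in enumerate(gene_names):
        t = welch_t([arr[i] for arr in treated],
                    [arr[i] for arr in control])
        if abs(t) > threshold:
            selected.append(name)
    return selected


# Toy data: each row is one array, each column one gene (made-up values).
genes = ["g1", "g2", "g3", "g4"]
control_raw = [
    [100, 200, 400, 800],
    [110, 190, 420, 780],
    [95, 210, 390, 820],
]
treated_raw = [
    [102, 205, 395, 3200],
    [98, 195, 410, 3100],
    [105, 200, 405, 3300],
]

selected = select_genes(genes, normalize(control_raw), normalize(treated_raw))
print(selected)
```

In this toy data set only the fourth gene is changed between groups (roughly fourfold), so it is the one the sketch is built to flag. A production pipeline would also need multiple-testing correction and a more robust normalization, which are left out here.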
“The idea was to provide a system where the user can sit down and push some buttons without having to go to a statistician or a bioinformaticist for help,” Cabrera said, but noted that the approach does not go so far as to be a black-box “expert system” where users simply plug their data in and get results out. Attempts to create such systems for microarray analysis have not been successful, Cabrera noted, because researchers will always have to make critical choices about what kind of statistical analysis to run and when to run it in order to get the best results.
JNJarray currently offers a laundry list of statistical approaches, with new methods for cluster analysis and data classification on the way, Cabrera said. In addition, the system uses qualitative analysis to graph all the possible outcomes of each analytical step in order to guide later steps. For example, Cabrera said, the system can determine if there are missing values, and if so, what steps to take to ensure statistical significance in later analytical steps.
Cabrera was unable to disclose many details about how J&J is using JNJarray, but did note that in an erythropoietin study using two drugs at two dosage levels over a series of time points, the J&J research team uncovered two new pathways for EPO using the system.
Cabrera said that samples of some of the DNAMR routines are available at his website (http://www.rci.rutgers.edu/~cabrera/DNAMR/). In the future, he said, he hopes to make the entire package available through Rutgers.