Because statistical analysis of gene expression data is currently limited by several factors, including chip-to-chip and biological variation, researchers at Paradigm Genetics have found a way to improve the reproducibility of the data from their expression analysis platform.
“The real challenge in expression data is not just from one chip to the other, but in a collection of hundreds or thousands of chips, how can you compare any of those results to anything else,” said Crag Liddell, vice president of informatics at Paradigm. “Can you compare chips made in a subsidiary in France to those made in your home base in California that are made on different dates with slightly different technologies? The answer to that at the moment is no. It’s very hard to understand that data.”
In response to this challenge, Paradigm designed its FunctionFinder bioinformatics system to account for the process and biological variation that can plague statistical analysis of microarray data.
Liddell and his colleagues described the approach in this month’s Proceedings of the SPIE BIOS 2001 Conference. The team has built a database of baseline expression levels for Arabidopsis to serve as a control. For each Arabidopsis gene of interest, Paradigm ran a series of replicated microarray experiments on wild-type samples, measuring expression levels across the major growth stages to determine a mean expression level and standard deviation for each condition.
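As a rough illustration of what such a baseline table might hold, the Python sketch below reduces replicated wild-type measurements to a mean and sample standard deviation per gene and growth stage. The input format, the `build_baseline` helper, and the gene and stage labels are all hypothetical; the article does not describe how FunctionFinder actually stores or computes its baseline.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(replicates):
    """Return {(gene_id, stage): (mean, sample_sd)} from replicate data.

    Hypothetical input: an iterable of (gene_id, stage, expression_level)
    tuples taken from replicated wild-type arrays.
    """
    grouped = defaultdict(list)
    for gene_id, stage, level in replicates:
        grouped[(gene_id, stage)].append(level)

    baseline = {}
    for key, levels in grouped.items():
        # A sample standard deviation needs at least two replicates.
        if len(levels) < 2:
            raise ValueError(f"need at least two replicates for {key}")
        baseline[key] = (mean(levels), stdev(levels))
    return baseline


# Hypothetical replicate measurements for one gene at two growth stages.
replicates = [
    ("GENE_A", "seedling", 10.0),
    ("GENE_A", "seedling", 12.0),
    ("GENE_A", "seedling", 11.0),
    ("GENE_A", "flowering", 20.0),
    ("GENE_A", "flowering", 22.0),
    ("GENE_A", "flowering", 24.0),
]
baseline = build_baseline(replicates)
print(baseline[("GENE_A", "seedling")])  # (11.0, 1.0)
```

Keying on (gene, stage) reflects the article's point that normal expression is defined per condition rather than once per gene.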
The baseline data serves as “a very solid statistical method for normalizing the results of expression analysis,” according to Liddell, and acts as a control to establish whether observed results are actually statistically different from the normal state of genes in a cell.
Said Liddell, “All the most valid parametric statistical approaches will let you say [an expression measurement] is significantly more than another chip. But the real question is, is that outside the bounds of a normal biological system?” The baseline data provides a statistical definition of a normal biological system: it indicates how much variation around the mean is expected, and therefore how far an observation must deviate before it can be considered biologically significant.
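One simple way to express "outside the bounds of a normal biological system" is to measure how many baseline standard deviations an observation lies from the baseline mean. The sketch below uses a z-score with a fixed cutoff of 3; both the z-score test and the cutoff are assumptions chosen for illustration, not the statistical method Paradigm published, and the function names are hypothetical.

```python
def baseline_z_score(observed, baseline_mean, baseline_sd):
    """Distance of an observation from the baseline mean, in baseline SDs."""
    if baseline_sd <= 0:
        raise ValueError("baseline SD must be positive")
    return (observed - baseline_mean) / baseline_sd


def outside_normal_bounds(observed, baseline_mean, baseline_sd, z_cutoff=3.0):
    """True if the observation deviates from the baseline by more than
    z_cutoff standard deviations (assumed cutoff, in either direction)."""
    return abs(baseline_z_score(observed, baseline_mean, baseline_sd)) > z_cutoff


# Hypothetical baseline for one gene at one growth stage: mean 11.0, SD 1.0.
print(outside_normal_bounds(12.5, 11.0, 1.0))  # False: within normal variation
print(outside_normal_bounds(16.0, 11.0, 1.0))  # True: 5 SDs above the mean
```

This captures the distinction Liddell draws: a reading of 12.5 might be statistically higher than a single comparison chip, yet still fall within the normal range the baseline defines.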
In practice, Paradigm first runs the expression data gathered from Agilent chips through the Rosetta Resolver system, “refining” it into a form that FunctionFinder can accept. FunctionFinder then performs the baseline-based statistical analysis and compares the expression data with metabolic, phenotype, and sequence data.
Liddell said that the baseline data and other FunctionFinder features are used only internally, to generate the data Paradigm delivers to its clients, which include Monsanto and Bayer. He noted, however, that Paradigm is considering licensing the system as it develops.
— BT