As a microarray services provider, Expression Analysis doesn't sell gene-expression analysis software, but that doesn't mean it doesn't know a thing or two about microarray analysis. In the past two months, the firm has added two new analytical methods to its in-house tool menu for interpreting data from Affymetrix GeneChips, and it's in the process of developing several more.
In August, the company introduced REDI (reduction of invariant probes), which identifies probes within an Affy probe set that may not respond to biological changes. These probes are eliminated from subsequent analytical steps, leading to a better assessment of differential expression. Last week, Expression Analysis released PADE (permutation analysis for differential expression), a statistical method that the company describes as a more effective option for microarray analysis than the commonly used t-test.
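The article does not say how REDI decides that a probe is unresponsive, so the following Python sketch only illustrates the general idea of screening out invariant probes. It assumes, purely for illustration, that a probe counts as invariant when its variance across samples falls below a fixed threshold. The function name, the threshold, and the probe IDs are all hypothetical.

```python
from statistics import pvariance


def drop_invariant_probes(probe_set, min_variance=0.01):
    """Return the probes whose intensities vary across samples.

    probe_set maps a probe ID to its log-scale intensities, one value per
    sample. The variance threshold is an illustrative assumption, not the
    criterion REDI actually uses.
    """
    kept = {
        probe_id: values
        for probe_id, values in probe_set.items()
        if pvariance(values) >= min_variance
    }
    # If every probe looks flat, keep the full set so the probe set can
    # still be summarized downstream.
    return kept or dict(probe_set)


# Hypothetical probe set: one flat probe, two that move with the samples.
example = {
    "probe_1": [7.0, 7.0, 7.0, 7.0],
    "probe_2": [6.0, 8.0, 6.1, 8.2],
    "probe_3": [5.9, 7.8, 6.0, 8.1],
}
print(sorted(drop_invariant_probes(example)))
```

A real filter would likely need a threshold tuned to the noise level of the platform rather than a fixed constant.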
"The problem is that the t-test and the p-value were really designed to test only a handful of individual features simultaneously," said Wendell Jones, senior statistician at Expression Analysis. With microarray experiments that involve a control group and an experimental group, however, "we're not just testing one or two or three features — we're testing over 50,000 features, because we're actually testing every transcript on a microarray when we compare one group of samples to another."
The result, he said, is that applying even a relatively low p-value cutoff to every transcript in a microarray experiment can yield a dramatically large number of false positives.
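To put rough numbers on that point: if none of the 50,000 transcripts truly differed between the two groups, a per-transcript cutoff of 0.01 would still be expected to flag about 500 of them by chance alone. The short Python sketch below works through that arithmetic. The function name, the cutoffs, and the fraction_null parameter are illustrative assumptions, not figures from Expression Analysis.

```python
def expected_false_positives(num_tests, alpha, fraction_null=1.0):
    """Expected number of unchanged features that pass a p-value cutoff.

    num_tests:     how many features are tested (about 50,000 in the article)
    alpha:         the per-feature p-value cutoff
    fraction_null: assumed share of features with no real difference
    """
    return num_tests * fraction_null * alpha


# Compare a few commonly used cutoffs across 50,000 transcripts.
for alpha in (0.05, 0.01, 0.001):
    count = expected_false_positives(50_000, alpha)
    print(f"cutoff {alpha}: about {round(count)} false positives expected")
```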
Jones said that PADE differs from the standard t-test in two ways. First, instead of the t distribution, it uses a "computationally intensive" resampling method to "rerandomize" the samples and generate a different reference distribution. Second, it compares an entire set of transcripts to that reference distribution, rather than one transcript at a time, which yields a much more accurate estimate of the false discovery rate.
According to Expression Analysis, PADE helps reduce false positives and enables the company to estimate the false discovery rate for a set of potential differentially expressed transcripts.
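Expression Analysis has not published PADE's internals in this article, so the Python sketch below is not the company's method. It shows one generic way to combine the two ideas Jones describes: shuffle the sample labels to build a reference distribution, then count how many transcripts in the whole set pass a cutoff under the shuffled labels versus the real ones. The function names, the choice of a Welch-style t statistic, the cutoff, and the number of permutations are all assumptions made for illustration.

```python
import random
from statistics import mean, variance


def t_stat(a, b):
    """Welch-style t statistic for two groups (each needs two or more values)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def permutation_fdr(data, labels, cutoff, n_perm=200, seed=0):
    """Estimate the false discovery rate for transcripts with |t| >= cutoff.

    data:   one row per transcript, one value per sample
    labels: 0 or 1 for each sample (control vs. experimental)
    Returns (observed hits, mean hits under shuffled labels, estimated FDR).
    """
    rng = random.Random(seed)

    def count_hits(current_labels):
        hits = 0
        for row in data:
            group_a = [v for v, g in zip(row, current_labels) if g == 0]
            group_b = [v for v, g in zip(row, current_labels) if g == 1]
            if abs(t_stat(group_a, group_b)) >= cutoff:
                hits += 1
        return hits

    observed = count_hits(labels)

    # Re-randomize the sample labels to see how many transcripts pass the
    # cutoff when group membership carries no biological meaning.
    null_counts = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_counts.append(count_hits(shuffled))

    expected_false = mean(null_counts)
    fdr = min(1.0, expected_false / observed) if observed else 0.0
    return observed, expected_false, fdr
```

Because the whole list of transcripts is scored against the shuffled data at once, the output is an estimated error rate for the list rather than a separate p-value for each transcript. With small groups only a limited number of distinct relabelings exist, so the estimate is coarse.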
Steve Casey, founder and COO of Expression Analysis, said that all of the statistical methods the company develops serve to enhance its role as a service provider. "It's almost an obligation on our part to develop these sorts of tools," he said. "This is not something we do to supplement the company's revenue stream — providing outstanding service to our clients is the company's revenue stream."
Casey added that the company's experience as a service provider has given it a great deal of insight into both the strengths and the shortcomings of the Affy platform.
"We process every type of array. We have probably processed every type of tissue that's out there. We've done everything from human brain to rat liver and everything in between, and we've done thousands and thousands of them," he said. "So what that gives us is access to a wealth of information about how probes respond, how well the arrays are working, how the processes and protocols work, and how we can make them work better."
The company employs about 25 people, about five of whom work in bioinformatics and statistical analysis. "When we see things on a recurring basis, or we see things because we know them to be problematic or systematic, then we will on our own initiative develop methods to overcome those issues," Casey said.
Expression Analysis is currently working with some of Affy's new chips, so Jones and his colleagues have a few more analytical methods in the pipeline.
One of these, developed for Affy's 500K genotyping array, is for SNP imputation, or estimating the SNP calls that are missing from the data. "A 1 percent no-call rate on a 100K chip is exponentially magnified for a 500K chip, so what we're able to develop here are tools that can impute what those no-calls are supposed to be, thereby increasing the call rates," Casey said. That tool should be available within 30 days, he added.
Another method is under development for Affy's exon arrays. "Now that we are able to detect splice variants, the question is how do you interpret the data properly from an exon-oriented array?" Jones said. Analyzing these arrays will prove difficult, he said, because a gene may not appear to be differentially expressed between two experimental groups, but variants of that gene could be differentially expressed. "Properly interpreting that becomes much more problematic, because before we were looking at 50,000 transcripts, but now if you consider looking at all the exon-related splice variants that are associated with that, it could be an order of magnitude higher."
Jones described another method, probe reassociation, as an "extension" of REDI that accounts for new genomic information that has emerged since Affy designed its current batch of chips. "We now know that what Affy may have put together originally as part of a probe set that interrogates a transcript may actually interrogate two different transcripts," he said. Reassociation will allow Expression Analysis to mix and match data from different probe sets in order to account for this new information, he said.
Despite the quick pace of tool development at Expression Analysis, Casey stressed that the company is "not a research organization, we're a service organization. We do have development activities, but they are there strictly to improve upon the services that we provide to our clients."
Casey admitted that the company has had "conversations" about rolling its suite of analytical methods into a software package, but has decided against it. "It's the nature of Expression Analysis to focus solely on providing the service that we do to our clients," he said. "We think that developing any other products, or any other activities besides that, could take our eye off the ball."
He added that the company is GLP compliant and CLIA registered, "so when it comes time to pull the trigger on a diagnostic, we want to make sure that's where our attention is."
— Bernadette Toner