Iconix Pharmaceuticals, a Mountain View, Calif., chemogenomics company, has been happily compiling a database of chemical compounds and gene expression profiles for the past three years. But by the time the database, called DrugMatrix, began creeping toward 100 million data points a year ago, it was simply too large for Iconix’s researchers to extract meaningful information from it using standard data-mining tools, according to COO Leslie Browne.
Luckily, an Iconix scientist was aware of a research group led by Laurent El Ghaoui in the department of electrical engineering and computer science at the University of California, Berkeley. An expert in analyzing very large data sets for the financial industry and other fields, El Ghaoui had developed a set of linear discriminant analysis algorithms that had not yet been applied to life science data.
“We realized that the tools we were using needed to be supplemented,” Browne said, so he and his colleagues compared El Ghaoui’s algorithms to software from several commercial vendors in a bake-off-style evaluation using “a slice” of the data in DrugMatrix. “We said, ‘Here’s the data, tell us what you see.’ And El Ghaoui’s algorithms did the best job,” Browne explained.
Last week, Iconix announced that it had signed an exclusive license for the UC Berkeley technology, which will be incorporated into a new set of data-mining tools it is developing for its database. Browne said that Iconix has already used the new algorithms to extract several new “Drug Signatures” (collections of biomarkers that characterize unknown compounds with regard to toxicity, on- and off-target effects, and mechanism of action) from the information in DrugMatrix.
According to Browne, the new algorithms offered an order-of-magnitude improvement over the data-mining methods the company had previously used to predict compound characteristics. “We started out modestly,” he said, “and we were able to use off-the-shelf tools to analyze the data.” However, he noted, the company has now completed gene expression experiments on tissues exposed to 550 different chemical compounds. “The chip has 10,000 genes, and there are about 24 arrays per experiment, so when you multiply 550 by 24 by 10,000 genes, you get some kind of an idea about what’s in this database,” Browne said.
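Browne's multiplication can be worked through as a quick sketch. It assumes that each gene measured on each array counts as one data point, which the article does not state explicitly; the constant and function names are illustrative only.

```python
# Back-of-envelope size of DrugMatrix.
# Assumption (not stated in the article): one data point per gene per array.
COMPOUNDS = 550              # compounds profiled, per Browne
ARRAYS_PER_EXPERIMENT = 24   # approximate figure, per Browne
GENES_PER_CHIP = 10_000      # genes on the chip, per Browne


def estimated_data_points(compounds: int, arrays: int, genes: int) -> int:
    """Multiply the three factors Browne cites."""
    return compounds * arrays * genes


total = estimated_data_points(COMPOUNDS, ARRAYS_PER_EXPERIMENT, GENES_PER_CHIP)
print(f"{total:,}")  # 132,000,000
```

Under that assumption the database holds roughly 132 million data points, consistent with its having been on the way toward 100 million a year earlier.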
Iconix sells subscriptions to DrugMatrix, which includes a set of data-mining tools. The new algorithms “will ultimately make their way into the product, but they’re not embedded at the moment … depending how the business develops, we’d like to offer it to customers as well,” Browne said.
For now, the company is keeping the new technology to itself while it identifies additional Drug Signatures, the key building block of its predictive chemogenomics platform. In a typical analysis, Iconix uses the Drug Signatures to predict the toxicological effects and mechanisms of action of several compounds of interest to a pharmaceutical company. “We can see the properties of the compound because we can see the individual signatures embedded in the overall global expression profile of the candidate compound,” Browne said.
Iconix is already benefiting from the improved analysis technology, according to Browne, and just wrapped up the analysis of five different data sets for a pharmaceutical company using its new Drug Signatures.