PALO ALTO, Calif.--Molecular Applications Group and Affymetrix announced a collaboration last week in which Affymetrix's GeneChip will serve as the platform for coupling quantitative differential gene expression data with Molecular Applications' data mining and visualization technologies.
The companies, which will jointly market and profit from the commercialization of their software, said they will develop data mining and visualization tools that enable scientists to discover biological patterns in gene expression data and integrate them with bioinformatics data on sequence, structure, and predicted gene function. Affymetrix and Molecular Applications will also team up to develop a gene index system that clusters expressed sequence tags (ESTs) from public and private sources into a clean index of distinct genes, with comprehensive annotations of the resulting sequences.
Myra Williams, president and CEO of Molecular Applications, told BioInform that the technology developed through the collaborative effort "will create a paradigm shift in the way drugs are discovered, ultimately."
She explained, "An enormous amount of data is generated using the DNA arrays. It's very powerful technology for understanding disease processes, locating the genes involved in disease, or deducing what genes interact. What's required now is the development of the new science and software capabilities to move from the results of those experiments to providing that sort of understanding."
The arrays generate millions of different probes and experiments for analysis, she explained. Technology to be developed under the collaboration will help researchers "move from what you see on a probe in terms of differential gene expression under a number of different conditions, to being able to do something about the role that a gene plays in disease," Williams said.
The partnership, which evolved from longstanding professional relationships between Williams and Affymetrix CEO Stephen Fodor and Chairman John Diekman, "gives us an opportunity to work with Affymetrix to mine these data," Williams added.
Although she said it is "premature to try to quantify what the collaboration could mean" for the growth of Molecular Applications, a privately held firm founded in 1993, Williams said she expects the deal to translate into "a very significant opportunity for both companies."
"Our focus has been on proteins and protein function. The next thing you need to understand is how that relates to disease. The Affymetrix chip technology can help us understand that," Williams continued. Adding Molecular Applications' data analysis technology will "provide a major expanded capability for scientists," she claimed.
Williams gave this example of how the new technology will be useful: "You've done an experiment and you are looking at disease tissue versus normal tissue. You see that there are 20 genes that are profoundly altered in their expression in the presence of disease. By linking the results from that experiment to our ability to very quickly define everything that can be known or deduced about those genes and their function, you will be able to draw a conclusion from that experiment."
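The first step in Williams's example, picking out the genes whose expression differs sharply between disease and normal tissue, can be sketched roughly as follows. This is only an illustration and not the collaborators' software: the function name `altered_genes`, the gene labels, the expression values, the pseudocount, and the log2 fold-change cutoff are all hypothetical choices made for the sketch.

```python
import math


def altered_genes(normal, disease, min_abs_log2_fold=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for strongly altered genes.

    Hypothetical helper: `normal` and `disease` map gene names to
    expression levels. The pseudocount avoids division by zero and
    damps ratios between very small values.
    """
    hits = {}
    # Only genes measured in both tissues can be compared.
    for gene in normal.keys() & disease.keys():
        ratio = (disease[gene] + pseudocount) / (normal[gene] + pseudocount)
        log2_fold = math.log2(ratio)
        if abs(log2_fold) >= min_abs_log2_fold:
            hits[gene] = log2_fold
    return hits


# Made-up expression levels, for illustration only.
normal = {"geneA": 100.0, "geneB": 50.0, "geneC": 200.0}
disease = {"geneA": 820.0, "geneB": 55.0, "geneC": 20.0}

hits = altered_genes(normal, disease, min_abs_log2_fold=2.0)
for gene, log2_fold in sorted(hits.items()):
    direction = "up" if log2_fold > 0 else "down"
    print(f"{gene}: {direction} in disease (log2 fold change {log2_fold:.2f})")
```

The genes that pass such a filter would be the ones handed on for functional annotation, which is where Williams said the companies' combined tools are meant to help.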
The index of human gene ESTs planned by the collaborators, which Williams described as a less important but highly useful product, is intended to improve on gene indices now in the public domain. "The gene index available through the National Center for Biotechnology Information has lots of errors and problems in it. That can now be significantly enhanced based upon new information. We can improve the quality of the genes defined there and couple that with information about what can be predicted about gene function. It will be a much broader product than what is currently available," she said.
Initially the gene index will rely on public sources of EST data, but an open-architecture design will allow customers to merge proprietary data into the index. As for market demand for such a product, Williams explained, "Cleaning it up becomes particularly important as people start designing probes to go on the DNA array, because the sequence needs to be the best possible with the fewest errors. Otherwise you won't get good hybridizations of those probes."
"There's a great demand for annotation," she continued. "One of the problems that we face is that there's a lot of information of varying quality. By doing comprehensive information retrieval you can frequently eliminate some of the poor information."
For instance, Williams said, "If you see three different sources in three different analyses, all of which lead to the same conclusion, and yet you have a fourth analysis that gives you something that is not at all supported by the preceding three, then it can help you to improve the quality of the conclusions you draw."
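The cross-checking Williams describes amounts to a simple consensus test, which might be sketched as below. This is an illustrative assumption, not a description of how GeneMine works: the function `consensus_annotation`, the source names, and the annotation labels are hypothetical, and a strict-majority rule stands in for whatever weighting a real system would use.

```python
from collections import Counter


def consensus_annotation(calls):
    """Return (consensus, outliers) for one gene.

    Hypothetical helper: `calls` maps a source name to the annotation
    it reports. A consensus requires a strict majority of sources;
    otherwise the consensus is None and every source is listed as
    unresolved.
    """
    if not calls:
        return None, []
    counts = Counter(calls.values())
    top, votes = counts.most_common(1)[0]
    if votes * 2 <= len(calls):
        return None, sorted(calls)
    # Sources that disagree with the majority are flagged for review.
    outliers = sorted(src for src, call in calls.items() if call != top)
    return top, outliers


# Made-up annotations: three analyses agree, a fourth does not.
calls = {
    "analysis1": "kinase",
    "analysis2": "kinase",
    "analysis3": "kinase",
    "analysis4": "transporter",
}
consensus, outliers = consensus_annotation(calls)
print(consensus, outliers)
```

Flagging the dissenting source rather than discarding it leaves the scientist free to drill down to the original record and judge it directly.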
"The kind of technology we can bring to annotations helps scientists understand all of the information available, and lets them drill down to the original source of that information and draw conclusions as to what is relevant and what is erroneous," Williams added.
Targeting the unmet need
Williams said bioinformatics leaders and executives she has spoken with at pharmaceutical and biotech companies have identified the analysis of differential expression data as an area in need of major development. "Many people have developed their own software, but without exception, those we've talked with have said that they have just scratched the surface of what is the potential in this area," she commented.
Williams said it is still "too early to talk about an exact date of availability," but Molecular Applications' GeneMine software--a product introduced late last year that performs comprehensive retrieval, analysis, and clustering of information into a compact visual format--will be connected to Affymetrix's DNA array analysis software within the next few months.
Additional joint products will be rolled out over time, she added. Furthermore, the same technologies could be applied to agriculture, Williams said. "There's no question that all of the techniques we have developed with a primary focus on human are highly relevant to pathogens, plants, and other organisms. Because our technology is modular and open in its architecture, we will be developing a plant-specific version," she predicted.