KINGSTON, Ontario--The moment isn’t exactly an auspicious one for launching a new bioinformatics software company. A shakeout in the competitive market has been predicted for years, and, in recent months, early entrants to the industry, such as Compugen, Neomorphic, and Pangea Systems, have drastically altered their strategies. Another trailblazing bioinformatics company, Molecular Applications Group, was dissolved last month not long after its CEO had predicted that most companies in the space would ultimately "either die or merge."
But Evan Steeg, president of the startup Molecular Mining, a drug discovery and datamining software provider here, isn’t worried. Steeg’s firm was incorporated in 1999 and closed an initial $2 million financing round led by SR One, the wholly owned affiliate of SmithKline Beecham. Now, Molecular Mining’s technology is being tested in two secret, exploratory collaborations with life sciences companies--one in biology research and the other in chemistry.
Among Molecular Mining’s founders are several computational biology pioneers: Suzanne Fortier, a crystallographer, is vice-president of research at Queen’s University here and vice-president of Canada’s Natural Sciences and Engineering Research Council; Janice Glasgow is head of Queen’s Department of Computing and Information Science; Lawrence Hunter is section chief of molecular statistics and bioinformatics at the US National Cancer Institute; and Donald Weaver, professor of neurology and associate professor of chemistry at Queen’s, is an expert in computer-assisted molecular design and the application of computational chemistry to drug discovery and design. Steeg, the fifth founder, has experience applying machine learning and bioinformatics to the automation of protein structure determination through x-ray crystallography.
In an exclusive interview with BioInform, Steeg explained how his team plans to make Molecular Mining prosper in a climate that does not seem favorable to bioinformatics software companies.
BioInform: What will enable Molecular Mining to survive in a marketplace where others have failed?
Steeg: Companies fail for one of two reasons: Either they don’t produce what the market wants--what biopharmaceutical companies want--or they fail to access the capital they need. We’re confident we have what R&D scientists and managers need to develop targets and leads more effectively, and that we can maintain the infrastructure needed to deliver it. We can add value to the drug discovery and development process, and the resources required to deliver and get paid for that value are within our reach.
BioInform: What are you offering to customers?
Steeg: Our product and service offering is based on a set of very powerful tools for discovering hidden relationships--correlative, hierarchical, and predictive--in huge biological and chemical datasets. We have the domain expertise to know where, how, and in what combinations to apply these tools and we have the flexible software architecture that allows us to rapidly deploy solutions for particular applications. These solutions combine our proprietary tools with customers’ in-house systems and tools and databases from both the public domain and other vendors.
We provide an extremely valuable computational platform for our customers, but we’re not trying to be their complete software infrastructure. Nor, of course, do we provide expensive laboratory instrumentation or other wetware platforms. This keeps our infrastructure costs manageable compared to some of our competitors, and it makes it less likely that we’ll starve from lack of capital.
BioInform: Is your strategy then to help pharmaceutical companies by adding value to their proprietary data?
Steeg: When we look at the landscape of our marketplace, we identify what you might call the end-users--the pharma and biotech companies that develop leads. Then we have these database generators and integrators--people who produce large amounts of data. At various points along the drug-discovery and development pipeline, these companies can either sell data through subscriptions, or they can sell a platform that enables the end-customer to generate lots of data. But the market is going to shift from just raw data to data-plus-tools needed to get the most out of that data. We view that as a major opportunity.
BioInform: Do you envision partnering with data providers such as Incyte or Celera?
Steeg: I don’t want to talk about specific companies, but I can certainly envision relationships to help data generators sell a dataset with tools that immediately add value for the end user. Our tools can help in that way.
The tools can also be licensed directly to let the data generator dig out more valuable data and just sell that. So, there might still be a role for data-only sales to pharmaceutical and biopharmaceutical customers, but it wouldn’t be the raw data; rather, it’s a transformed content sale.
BioInform: Do you consider it a higher priority to develop relationships directly with end users?
Steeg: We would like to explore both simultaneously. It’s been said of bioinformatics companies that they have squishy business models or that they change their business models as often as they change their socks. Some of that criticism is deserved, but it’s also a bit misguided because we have in front of us one of the most fluid technological markets in the history of business.
We think we can add enough value to a pharmaceutical company with our tools directly in target discovery and lead discovery that it merits multi-year collaborative relationships that may even have milestone and royalty payments on the backend. But I also understand that pharmaceutical companies are wary of that [model] and we have to prove ourselves.
We are doing this through initial proof-of-concept collaborations and by applying our tools to public datasets. For example, a few weeks ago we ran a well-known gene expression dataset and were able to improve upon results published in Science, finding sets of two and three genes, out of 7,000, that are highly diagnostic of leukemia type.
BioInform: Will you generate your own data?
Steeg: At this point, no.
BioInform: Who do you see being your competition right now? Are you worried by ventures such as DoubleTwist.com that will give free access to tools?
Steeg: I’m not worried about any particular competitor or seemingly novel business model, but I’m keeping my eye on them. I don’t know to what extent the e-commerce model is going to work in bioinformatics. Certainly the market size, in terms of the number of users in bioinformatics and cheminformatics, is not of the same order of magnitude as the markets that faced Netscape when they were deciding whether to give away certain of their tools for market penetration and branding.
Competition will still come down to what functionalities we can provide versus what our competitors can provide at various points along the drug discovery and development pipeline. We’re focusing on maintaining and enhancing our very powerful tools, on building the domain expertise to know which problems to apply them to and how to apply them, and on building relationships. Besides, the e-commerce bioinformatics companies are also potential marketing channels for our tools and transformed content products.
BioInform: Will you explain what you call the KDD core?
Steeg: Getting past all the buzzwords, KDD--knowledge discovery in databases, or knowledge discovery and datamining--can be boiled down to finding relationships among variables, in particular among numbers of variables so large that older, more standard methods don’t work so well.
I would identify three kinds of relationships among variables: correlative, hierarchical, and predictive. If you buy into that view of the world, then you can approach a lot of particular problems in drug discovery and elsewhere. The variables could stand for chemical descriptors within a QSAR or assay database. Or they could stand for particular residues or positions in an aligned set of protein sequences. If you can find these correlative, hierarchical, and predictive relationships in these datasets, you can find the hidden value.
For example, we take our tools and put them together in different combinations to tackle several different problems. One is gene expression analysis. We can apply our tools to discover disease-linked genes. Part of that is finding correlations between gene-expression values and attributes that relate to disease versus healthy state. It’s sort of a fancy way of doing differential display.
We can classify genes into particular functional classes.
We can identify co-regulated genes and build predictive gene network models.
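The "correlative" step Steeg describes for disease-linked genes can be illustrated with a very small sketch. This is not Molecular Mining's proprietary method, which the interview does not detail; it is a plain Pearson-correlation ranking on synthetic data, and every name in it (`rank_genes`, `gene_planted`, the toy sample sizes) is a hypothetical chosen for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(vx * vy)

def rank_genes(expression, labels):
    """Return (gene, r) pairs sorted by |r| against a 0/1 disease label."""
    scores = [(gene, pearson(values, labels)) for gene, values in expression.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Synthetic toy data (an assumption, not a real dataset):
# 20 samples, 10 healthy (label 0) and 10 diseased (label 1).
rng = random.Random(0)
labels = [0] * 10 + [1] * 10
expression = {f"gene_{i}": [rng.gauss(0, 1) for _ in labels] for i in range(50)}
# Plant one gene whose expression shifts strongly with disease state.
expression["gene_planted"] = [rng.gauss(0, 1) + 4 * y for y in labels]

ranking = rank_genes(expression, labels)
for gene, r in ranking[:3]:
    print(f"{gene}: r = {r:+.2f}")
```

A single-gene ranking like this is only the simplest case. Finding diagnostic sets of two or three genes, as mentioned earlier in the interview, would require scoring combinations rather than individual columns.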
One step in the process that we can do on the chemistry side is to find which of a huge number of chemical descriptors correlate with a particular assay result. We’ve done this in preliminary work with a pharmaceutical company. Let’s say there are more than 500 different chemical descriptors and tens of thousands of compounds. We can add value to that dataset by reducing its size and bringing out the most important relationships, finding, for example, a much smaller basis set of chemical descriptors, or which attributes of a compound library are highly predictive of bioactivity.
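As a rough illustration of what reducing a descriptor set to a "smaller basis set" could look like, the sketch below keeps descriptors that correlate with an assay result and drops near-duplicates of descriptors already kept. It is a generic greedy filter, not the company's actual technique. The descriptor names (`logp`, `logp_copy`, `weight`), the thresholds, and the synthetic data are all assumptions made for the example.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_descriptors(columns, assay, min_relevance=0.3, max_redundancy=0.9):
    """Greedy filter: keep descriptors that track the assay result and are
    not near-duplicates of a descriptor that has already been kept."""
    relevance = {name: abs(pearson(col, assay)) for name, col in columns.items()}
    kept = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if relevance[name] < min_relevance:
            break  # everything after this point is even less relevant
        if all(abs(pearson(columns[name], columns[k])) < max_redundancy for k in kept):
            kept.append(name)
    return kept

# Synthetic toy library (an assumption): 200 compounds, 23 descriptors.
rng = random.Random(1)
n = 200
columns = {f"noise_{i}": [rng.gauss(0, 1) for _ in range(n)] for i in range(20)}
columns["logp"] = [rng.gauss(0, 1) for _ in range(n)]
columns["weight"] = [rng.gauss(0, 1) for _ in range(n)]
# A redundant descriptor: almost the same information as "logp".
columns["logp_copy"] = [v + rng.gauss(0, 0.05) for v in columns["logp"]]
# The assay depends on two descriptors plus measurement noise.
assay = [2.0 * a + 1.5 * b + rng.gauss(0, 0.5)
         for a, b in zip(columns["logp"], columns["weight"])]

kept = select_descriptors(columns, assay)
print(kept)
```

On this toy data the 23 descriptors shrink to two: one of the `logp` pair and `weight`. Real descriptor sets would need nonlinear and multivariate checks that a pairwise correlation filter cannot provide.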
BioInform: Why would a company buy your package instead of a commercially available off-the-shelf package?
Steeg: Two reasons. We can actually do more things within a single application. For example, we can take some of the analysis on gene expression deeper into pathway analysis.
Also, we have this general set of tools and the software architecture for integrating applications.
BioInform: Meaning, you’re not just selling something off the shelf, you’re customizing it?
Steeg: Right. I view the shrinkwrap-versus-service issue as a spectrum. We have tools, and the tools deliver the functionality that we in the industry agree a gene expression analysis software tool needs. But we can customize to such a high degree that it becomes more of a collaborative R&D relationship than a tool sale. The fact that we do things on both the biology and chemistry sides enhances that. To some extent, for a large pharmaceutical company, the division between biology and chemistry, between targets and leads, is artificial and arbitrary. If you look at the whole discovery pipeline there are clearly advantages to being able to tie those together.
BioInform: Are Molecular Mining’s tools adaptable to different hardware?
Steeg: This is one of our strong points. We build portability from the very beginning for all kinds of hardware, operating systems, and database platforms, whether Oracle, Sybase, or flat files. We’re not trying to provide a complete computational service for a company. We recognize people have invested money and expertise and time in using particular tools for multiple alignment, protein secondary structure prediction, chemical database management, and so forth. We’re not trying to do those. We wanted our tools to fit in, so we designed that kind of flexibility from the beginning.