SANTA CRUZ, Calif., March 27 - Bioinformatics companies have had their share of ups and downs over the last several months. Consolidation is a growing force, some companies are on the ropes, and the market has cast doubt on the whole business model.
But at the same time, bioinformatics efforts in academic and government labs are booming. Researchers continue to pour data into GenBank and other public repositories, and academic resources, such as the popular tools and data bank run by the University of California, Santa Cruz, continue to expand.
Indeed, UCSC's Genome Project, run by David Haussler, has grown into an approximately $2 million annual operation with 20 staff scientists. Two years ago, a six-figure budget paid for four staffers.
The UCSC site, www.genome.ucsc.edu, attracts about 50,000 hits each day, with a significant number of users coming from the private sector, according to Haussler. He spoke to a GenomeWeb reporter recently about why academic bioinformatics appears to be coming into its own while the commercial effort seems to have lost its footing (at least for now, he says).
GenomeWeb: Do you think the success of open-source bioinformatics may have hurt commercial enterprises?
David Haussler: I think it did affect certain aspects of the commercial bioinformatics business, and I feel bad about that, because I'm excited about bioinformatics from the commercial point of view. There's no question that certain types of products are no longer as attractive a business for commercial bioinformatics companies, because good resources, government-funded ones in this case, are now available free over the web. [But] I think there are still many situations where companies are reluctant to use the Internet; they have very sensitive data that they don't want to send out over the web.
And there's a lot more to bioinformatics than we can fit under this metaphor of mapping and tracking functional genome sequences. So there's still a huge opportunity there, but I think there are some issues.
I don't mean to suggest at all that we had any influence here. I think it was the market, and I think it was also the realization that this is not a huge consumer industry; you're not going to sell a lot of copies of your software. You have to charge a lot for each copy you sell, so it has to be something special, and for that price I think pharma and biotech wanted a custom solution. You can't provide a fully custom solution to every customer. That's why it makes sense that some pharmas may decide to build their own bioinformatics groups: their situation is so special that they want people totally devoted, one hundred percent, to working on their bioinformatics needs.
GW: What drives your growth?
DH: We now have a number of institutions funding us, including NHGRI, NCI and QB3, and this is quite a change. We're able to marshal a lot of effort toward putting out the best bioinformatics tools we can through this web-based browser setup.
GW: What recent changes in bioinformatics has your team focused on?
DH: We've shifted from generating the sequence to annotating it, and that means everyone is working with the same coordinates. It sounds like a simple thing, but it's much easier to have everybody sharing data in an open-source way when the data pertains to positions on the human genome. Everybody uses the same coordinates, and that is true now. One result is that we have a unique system for adding your own information to the browser.
GW: How does it work?
DH: You can add a file, and you can choose to share your tracks. The way the browser is set up, you can zoom in and out along a chromosome to see whatever features you've elected to see.
Show me the genes, show me the Ensembl gene predictions, show me the Genscan gene predictions, show me mRNAs, spliced ESTs, show me TIGR's ESTs, show me Tetraodon sequence matched to it, show me mouse sequence matched to it, show me SNPs. There are 30 different standard tracks.
But the exciting thing is that you can also see your own information. When users upload their own information, the browser formats it and displays it back to them. They can see where the microsatellites associated with a disease are. They can look at the SNPs they've genotyped. They can even do project planning: they can look at the SNPs that are in the pipeline and the regions those SNPs will cover.
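As a rough illustration of the custom-track idea Haussler describes, the Python sketch below turns a user's own genotyped SNPs into a plain-text track file. It assumes the tab-separated BED layout (chromosome, 0-based start, end, name) headed by a `track` line, which the UCSC browser accepts for custom tracks; the function name `snps_to_custom_track`, the track name `mySNPs` and the sample SNPs are hypothetical, for illustration only.

```python
from typing import Iterable, Tuple

# Hypothetical record layout: (chromosome, 1-based position, SNP identifier)
Snp = Tuple[str, int, str]


def snps_to_custom_track(snps: Iterable[Snp], track_name: str = "mySNPs") -> str:
    """Render genotyped SNPs as a BED-format custom track (a sketch)."""
    # Header line: names the track and asks for full display (visibility=2).
    lines = [f'track name={track_name} description="Genotyped SNPs" visibility=2']
    for chrom, pos, snp_id in snps:
        # BED uses 0-based, half-open intervals, so the single base at
        # 1-based position `pos` spans [pos - 1, pos).
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{snp_id}")
    return "\n".join(lines) + "\n"


# Usage with made-up data: write the track to a file that could be uploaded.
if __name__ == "__main__":
    track = snps_to_custom_track([("chr1", 1000, "snp_1"), ("chr2", 2500, "snp_2")])
    with open("my_snps.bed", "w") as handle:
        handle.write(track)
```

Because every line is keyed only by chromosome and position, the same file lines up with the standard tracks for anyone using the same genome coordinates, which is the point Haussler makes about shared coordinates.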
GW: What kind of applications do you envision for this new function?
DH: It's unlimited. This is a completely flexible interface, where you can put whatever data you have. The new metaphor guiding this whole thing is that data has to be associated with a position in the human genome. And we're not limited to the human genome; we just released a mouse genome. ...
There are many other species that are sequenced or getting close to being sequenced, and we can make browsers [for them]. ... There is an enormous opportunity in bioinformatics at this point to develop more sophisticated bioinformatics software and statistical models that will really capture the promise of comparative genomics.