IBM Barges into Bioinformatics

DiscoveryLink is coming. Will it live up to the hype?

by Adrienne Burke

Any day now IBM’s long-awaited data-integration middleware for the life sciences market will be, as they say around the Big Blue marketing department, “GA.” The product, whose origin dates back six years, will be assigned an SKU number and become, company officials promise, “generally available for commercial purchase in the second quarter.” This will be IBM’s moment of truth in the life sciences.

The company started, as a spokeswoman says, “going after this market in a very aggressive way” about 18 months ago. But in a space where Compaq dominates in high-performance computing and where Oracle is the standard database technology, IBM needed to be clever to compete.

Clever indeed. Its strategy for picking up new life sciences service contracts will be to offer the industry what it seems to need most: a killer app for integrating genomic data.

In mid-May, DiscoveryLink, a federated data management tool that employs wrappers to link varied data from multiple, disparate databases, existed only as a prototype. Two unnamed pharmaceutical companies were running pilot tests, and NetGenics and Incyte had projects underway to incorporate it into their own yet-to-be-released drug discovery tools, DiscoveryCenter and Genomics Knowledge Platform, respectively.

The product’s lack of availability, however, was not keeping IBM from hyping it. Janet Perna, IBM’s general manager for data management, called DiscoveryLink “the silver bullet, diamond-in-the-rough technology” that will change the way drug development data is handled. And a press release announcing IBM’s partnership with BioQuebec called IBM “the most advanced computer company in the world in terms of bioinformatics.”

Not surprisingly, that sort of talk from a life sciences latecomer has raised the hackles of many veterans of the genomics computing and data integration businesses. In this small, tight-knit industry, IBM’s bald ambition is ruffling feathers.

DiscoveryLink’s Garlic Breath

DiscoveryLink’s earliest incarnation was in the mid-1990s as the pet project of computer scientist Laura Haas at IBM’s Almaden research facility. Known internally as Garlic, the project aimed to enable the integration of heterogeneous data sources in a single, cross-source query. Almaden Web pages dated from 1995 expound on the technology’s possibilities for helping interior decorators store information on wallpapers, kitchen cabinets, appliances, and floor tiles or for enabling hospitals to store lab reports, MRI scans, and EKGs in one place.

Jill Mesirov, CIO of the Whitehead Institute Center for Genome Research, who did a stint as IBM’s manager of bioinformatics and computational biology from 1995 to 1997, recalls that it was clear at the time that Garlic held promise as a drug discovery tool.

It was around 1995 when IBM entered into a hush-hush deal with Merck to codevelop Garlic as a pharma industry tool. Independent IT consultant Arthur Thomas of Proteus Associates, who says he has been “involved in a number of briefings that IBM has given to pharmas,” considers Garlic “one of the most innovative things that’s been done in this area.”

But, observers say, IBM seemed to lose interest in the life science market. Merck fell out of the picture and Garlic didn’t resurface until 1999, when IBM made a $2 million investment in bioinformatics firm NetGenics and rekindled the Garlic product development in partnership. Mesirov counts the current campaign as “the third time IBM has made a big push in life sciences.”

Pushy PR

This time around, some in the industry are accusing the computer giant of pushy tactics and of overstating DiscoveryLink’s capabilities. They suggest that IBM is buying its way into the market.

Since January 2000, IBM has made a series of equity investments (see timeline) binding its beneficiaries to install the DiscoveryLink-enhanced version 8.0 of the DB2 database when it is delivered. In addition, IBM’s minority stakes in LabBook and NetGenics are designed to expand its reach into the sector and elicit tools optimized to run on IBM hardware.

Jeff Augen, director of strategy for IBM’s life sciences group, says the company’s goal is not to compete with bioinformatics vendors or even Oracle, but to leverage DiscoveryLink to win service contracts. “We believe there are services required. The more important part of this is the infrastructure required,” he says.

Thomas observes: “More than half of IBM’s revenues come from services. They made a billion-and-a-half dollar deal with Aventis to build a research infrastructure. They’re interested in selling services.”

Bristling Bioinformaticians

Still, other data integration technology vendors bristle at IBM’s approach, which seems antithetical to the open and collaborative ways that computing vendors such as Compaq and Sun do business.

For instance, Bill Blake, vice president for high-performance technical computing at Compaq, says his strategy is: “We’ll do the best job on hardware, and there are better companies at building the additional layer.”

Blake says that Compaq, which has partnerships with Oracle, InforMax, and Lion Bioscience, “prefers to allow best-of-breed third parties to come in” to provide middleware solutions.

Sun, which is in fourth place in life science market share after Compaq, IBM, and HP (according to IDC data), is known among genomics companies for having established collaborative efforts such as its Informatics Advisory Council and the industry-wide I3C, which aims to develop an open-source platform for integrating life sciences data and tools.

When IBM joined the I3C recently, its bullying approach didn’t sit well with other members. “They pissed off a lot of people by saying they want to be part of the I3C but saying, ‘We don’t code, we just provide the plumbing.’ It was really ridiculous,” says one member, who added that Big Blue reps went out of their way to make amends at a subsequent meeting.

Friedrich von Bohlen, CEO of Lion Bioscience, whose SRS data integration technology is widely used throughout the industry, notes that not one of his 50 customers has ever asked to have SRS optimized to run on IBM hardware or database products. He asks rhetorically: “Why does IBM need to use equity to get customers? Because in a free market no one would choose their solution.” Von Bohlen says he is skeptical about whether IBM’s approach “will convince the highly scientific community [that] wants the best solution support, not mere technology.”

InforMax CSO Steve Lincoln says DiscoveryLink is one of several interesting technologies for data integration, but “the important thing about integration is that it’s a piece of enabling technology. Just because you can build a data warehouse environment doesn’t mean you’ve told me what a toxicologist is going to do with gene expression data.” Lincoln says that for that reason, InforMax is focusing more on “understanding why you would do this and what it means.”

Nevertheless, the IBM threat exists for data integration tools providers: Big Blue has the resources to provide the applications, hardware, and services, and steamroll over smaller providers of integration technology and their partners. Says Arthur Thomas, “There may be an element of IBM throwing its weight around, but at the end of the day you would be foolish to underestimate them. They have the resources, and … any technology that’s required they can either build or acquire from the outside.”

Proof and Perception

To be sure, the specter of going head-to-head with the limitless resources of IBM would strike fear in any small-by-comparison bioinformatics company. The reactions of small vendors could be just sour grapes. But to anyone who has invested significant time and money struggling with the genomic data integration problem, IBM’s self-assuredness is understandably hard to swallow.

NetGenics VP Beth Sump-Kleinhenz says that while her team “recognized [DiscoveryLink] early on as a very nice enabling technology,” she will be surprised if a single integration solution for drug discovery ever emerges. “There’s not one key solution, there can’t be one,” she says. “Integration is a tough problem.”

Meanwhile, IBM’s trouble getting acceptance from the industry seems to have more to do with how its perception of itself differs from what outsiders see than with its technology. Big Blue doesn’t see itself as the newcomer. Scientists at the company’s Almaden research facility were working on this technology before some bioinformatics companies existed. Says IBM’s Augen, “We spent many years developing the optimization built into DiscoveryLink. We’re very proud of that technology.”

And while Augen claims that DiscoveryLink has “capabilities … built into it that those small companies can’t develop,” he sees them as “features that would enhance Lion’s capabilities.”

Sump-Kleinhenz, who says she hasn’t seen “anything else that purports to deal at [the same] level of complexity” as DiscoveryLink, offers this example for how a pharma scientist might use the tool: “Given that I am interested in proteins from gene family X, and specifically in those family members that have a structure Y, please provide me a list of chemicals containing a specific R-group (Z) that inhibit these proteins by 10-fold at a concentration of less than one micromolar, and which are present in our combinatorial chemistry library in a quantity of more than 10 mg.”

To do such a search without DiscoveryLink, she says, would require separate searches, including, for example, keyword searches on GenBank, SwissProt, and any internal databases; a motif search; a chemical substructure search on one or more chemical libraries; a series of searches on high-throughput screening results; and an inventory search of a chemical library.

Sump-Kleinhenz says, “Once the wrappers to the individual databases have been integrated with DiscoveryLink, its job is to deconstruct the query and bring the final result back to you in the form of ‘Here are three compounds that meet your criteria.’”
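To make the idea concrete, here is a minimal sketch in Python of how a federated query of this kind might be decomposed across wrapped sources. It is not IBM's implementation: the `Wrapper` class, the `find_compounds` function, the field names, and the sample records are all hypothetical, and a real system would push each sub-query down to the source database and optimize the order of the steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Wrapper:
    """Hypothetical adapter exposing one data source through a common interface."""
    name: str
    rows: List[dict]

    def select(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # A real wrapper would translate this into the source's native query.
        return [row for row in self.rows if predicate(row)]


def find_compounds(proteins, screening, inventory, family, structure, r_group,
                   min_fold=10.0, max_conc_um=1.0, min_qty_mg=10.0):
    """Split one question into per-source sub-queries and join the results."""
    # Sub-query 1: proteins in the gene family that have the given structure.
    targets = {r["protein"] for r in proteins.select(
        lambda r: r["family"] == family and r["structure"] == structure)}

    # Sub-query 2: screening hits against those proteins that are potent enough.
    hits = {r["compound"] for r in screening.select(
        lambda r: r["protein"] in targets
        and r["fold_inhibition"] >= min_fold
        and r["conc_um"] < max_conc_um)}

    # Sub-query 3: hits with the R-group that are in stock in sufficient quantity.
    stocked = inventory.select(
        lambda r: r["compound"] in hits
        and r_group in r["r_groups"]
        and r["qty_mg"] > min_qty_mg)

    return sorted(r["compound"] for r in stocked)


# Invented sample records standing in for three separate databases.
protein_db = Wrapper("proteins", [
    {"protein": "P1", "family": "X", "structure": "Y"},
    {"protein": "P2", "family": "X", "structure": "W"},
    {"protein": "P3", "family": "Q", "structure": "Y"},
])
screening_db = Wrapper("screening", [
    {"compound": "C1", "protein": "P1", "fold_inhibition": 12.0, "conc_um": 0.5},
    {"compound": "C2", "protein": "P1", "fold_inhibition": 15.0, "conc_um": 2.0},
    {"compound": "C3", "protein": "P2", "fold_inhibition": 20.0, "conc_um": 0.1},
    {"compound": "C4", "protein": "P1", "fold_inhibition": 10.0, "conc_um": 0.9},
])
inventory_db = Wrapper("inventory", [
    {"compound": "C1", "r_groups": {"Z"}, "qty_mg": 25.0},
    {"compound": "C2", "r_groups": {"Z"}, "qty_mg": 40.0},
    {"compound": "C3", "r_groups": {"Z"}, "qty_mg": 30.0},
    {"compound": "C4", "r_groups": {"Z"}, "qty_mg": 5.0},
])

result = find_compounds(protein_db, screening_db, inventory_db,
                        family="X", structure="Y", r_group="Z")
print(result)
```

The scientist asks one question and gets back one short list; the three lookups and the joins between them happen behind the common wrapper interface rather than as separate manual searches.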

Sure sounds promising. But as IBM’s Augen acknowledges, “I guess now the onus is on us to prove it.”

1995

First mention of Garlic on IBM website refers to data integration applications for kitchen design and advertising agencies

IBM begins working with Merck to develop Garlic for pharma applications

1999

IBM invests $2 million in NetGenics, begins collaborating on DiscoveryLink development

2000

January: IBM makes $10 million equity investment in MDS Proteomics; makes MDS's future use of DiscoveryLink contractual

August: IBM announces that it has $100 million to invest in life sciences partners over the next 30 months.

September: Incyte says it will embed DiscoveryLink into its Genomics Knowledge Platform product

November: IBM takes minority stake in Structural Bioinformatics; in focus group gathered at industry conference, IBM asks industry players what it will take for IBM to break into this market

December: IBM says it will build a 7.5-teraflop computer cluster for NuTec Sciences; also says it may double the $100 million available for life sciences partner investments

2001

March: IBM establishes Global Life Sciences Consulting and Solutions unit

April 11: IBM takes equity stake in LabBook; LabBook to build front end to DiscoveryLink

April 24: IBM buys Informix for $1 billion, says it will jointly develop and market information management systems

April 26: NetGenics says it will integrate DiscoveryLink into its DiscoveryCenter platform and will be a DiscoveryLink distributor

May 3: IBM partners with BioQuebec and says its Net Generation Life Sciences division has "several hundred million to invest in firms operating life science and IT"

May 8: DiscoveryLink is not yet commercially available, nor are products from NetGenics, Incyte, or MDS Proteomics that promise to integrate DiscoveryLink
