The Bioinformatics Reading Room Fills Up: IBM Signs First Text Mining Licensee


A recent license agreement with Japanese biotech firm Celestar Lexico-Sciences has put IBM on the biological text-mining map. CLS is the first licensee for IBM’s MedTAKMI technology, a biomedical variation of the general-purpose TAKMI (text analysis and knowledge mining) system that has been under development at IBM Research since 1998.

The license, part of a deal that included an eServer p690 system running AIX, comes just as a viable commercial market for tools to mine the biological literature is beginning to emerge. Only within the last year or so have small startup firms begun marketing solutions to extract meaningful biological relationships from journal articles (see table, p. 8). But as with most new technologies, the target market of biotech and pharmaceutical companies has been slow to adopt these tools. IBM’s entry into the market is a sign of growing interest in the commercial potential of the technology — at least from the vendor side of the equation.

In this case, two separate arms of the big blue behemoth — IBM Research and the company’s life science business unit — worked in tandem to develop MedTAKMI. The two units are independent entities, but they often work together to tackle problems encountered by potential life science customers. Such was the case with CLS and MedTAKMI, according to Koichi Takeda, group leader of information integration at IBM’s Tokyo Research Lab. TAKMI (which means “craftsman” in Japanese and is pronounced “tak-u-mi”) was originally developed as a research project and later used to track patterns in data records from the company’s PC call centers in Tokyo and Raleigh, NC.

While the Tokyo research team was looking for ways to expand the scope of the technology, it didn’t hit upon the biomedical literature until a year ago, when CLS mentioned to the IBM life science sales team that it was looking for a useful tool to mine Medline abstracts.

Takeda said his team put a proof of concept together and began collaborating with CLS on some of the domain-specific terminology required to adapt TAKMI to the biomedical literature. The biggest challenge, according to Takeda, was the combination of the biological terminology and the abstract-ese syntax of the Medline abstracts. Unlike call center records, “it looks like English, but I can’t understand it at all,” Takeda noted.

Hisayuki Horai, chief of system development at CLS, said the company considered using similar technology from Fujitsu “because Fujitsu seems to be a leading company in the field of natural language processing technology in Japan.” However, he said, “We concluded that IBM has an advantage in processing English texts.”

CLS, which plans to use MedTAKMI to study gene-disease associations and the specificity of gene expression, contributed biological synonym dictionaries and category dictionaries to the project, Horai said.

Statistics-based search methods, such as word frequency and keyword extraction, are used in TAKMI, but the technology also relies on natural language processing techniques to identify words, terms, and phrases semantically, Takeda said. MedTAKMI draws on the National Library of Medicine’s MeSH (Medical Subject Headings) vocabulary to help categorize terms, and it adds 17 further categories developed for a custom biomedical ontology.
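To give a rough sense of what frequency-based keyword extraction with synonym and category dictionaries can look like, here is a minimal Python sketch. It is not IBM's implementation: the dictionaries, the stopword list, and the function names (`tokenize`, `keyword_counts`, `categorized_keywords`) are all hypothetical, and the sample sentences are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; the actual MedTAKMI dictionaries are not public in this article.
SYNONYMS = {"p53": "TP53", "tp53": "TP53", "brca1": "BRCA1",
            "tumour": "tumor", "tumors": "tumor", "tumor": "tumor"}
CATEGORIES = {"TP53": "gene", "BRCA1": "gene", "tumor": "disease"}
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_counts(abstracts):
    """Count terms across abstracts, mapping synonyms to one canonical form."""
    counts = Counter()
    for text in abstracts:
        for token in tokenize(text):
            if token in STOPWORDS:
                continue
            counts[SYNONYMS.get(token, token)] += 1
    return counts


def categorized_keywords(abstracts, top_n=5):
    """Return the most frequent terms with their dictionary category, if any."""
    counts = keyword_counts(abstracts)
    return [(term, n, CATEGORIES.get(term, "uncategorized"))
            for term, n in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = ["p53 mutations are frequent in tumors.",
              "TP53 and BRCA1 are altered in the tumour."]
    for row in categorized_keywords(sample, top_n=3):
        print(row)
```

The point of the synonym step is that "p53" and "TP53" are counted as one term, which is the kind of normalization the CLS-supplied dictionaries would support. The semantic analysis Takeda describes goes well beyond this sort of counting.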

Once the information is extracted, machine learning and pattern recognition techniques are applied to the data, and a number of visualization options allow users to explore the results.

Takeda said his group is currently working with the IBM life science business unit on integrating MedTAKMI into middleware solutions it provides through its life sciences framework.

A Threat to the Little Guys?

IBM has maintained since it launched its life science business unit in August 2000 that it would not compete in the life science application space. However, the company has recently begun offering application-level capabilities as part of its infrastructure deals. Last month, IBM announced that AxCell Biosciences would use Intelligent Miner, a data-mining feature in the DB2 database, to study protein-protein interactions.

And CLS is not IBM’s only partner in the text mining area. The company has also entered a collaborative agreement with Virtual Genetics, a Swedish data- and text-mining informatics firm. According to a brief note on Virtual Genetics’ website, the company’s Virtual Adapt text mining software will be listed in IBM’s Global Systems Directory, a list of applications “that have been approved by IBM Life Sciences.” The two companies are also working together to “produce a commercial offer in which one of Virtual Genetics’ products is a key component.”

A Virtual Genetics spokesman was unable to provide further details about the partnership, but noted that “IBM has its own solutions group for text mining and treats us like any other product, in parallel with its own products.”

Although IBM may not intend to go head-to-head with bioinformatics tool providers, it does plan to actively broaden its range of solutions — either through its own R&D or through partnerships with tool companies — giving prospects whatever it takes to sign infrastructure deals.

Regarding the AxCell agreement, Sharon Nunes, director of IBM’s life-science solutions division, told BioInform’s sister publication in August that “We will not be making our mining tools specifically available for [life-science] companies, but when we have a company that is interested in using them and testing them ... we’re certainly going to sell to them.” Could this strategy end up choking out the startups?

Right now, the competition isn’t worried. Uli Berresheim, CEO of Definiens, a German startup gearing up to launch its Polymind text-mining solution in the second quarter of 2003, said he sees IBM’s involvement in this area as no threat at all. “IBM has a very, very good consulting force and has an understanding of text mining, therefore they can consult the pharmaceutical companies a little better,” he said. However, he noted, “We are in contact with companies using IBM Intelligent Miner for Text and they still tell us this problem is not solved; they are not able to get much further than with a standard search engine.”

Berresheim added that Definiens’ system could be used “in conjunction with” IBM’s text miner.

Gordon Baxter, CEO of UK-based Biowisdom, voiced a similar opinion. “The IBM guys have always disappointed me,” he said. “They try to come and sell their kit — and they have some wonderful technology — but they characterize what the problem is with all the big IT guys [who are] trying to get into the life sciences: They don’t understand the domain.”

— BT
