Two Text-Mining Firms Take Different Paths to Grow in Pharma, Academic Markets


SAN FRANCISCO — Bioalma and Linguamatics are both looking to use their text-mining technology to aid life-science researchers, but the two firms are following different routes to grow their businesses in 2009.

Linguamatics, a Cambridge, UK-based firm founded in 2001, has so far concentrated on building a customer base among large pharmaceutical firms for its I2E information-extraction technology — a strategy that the company plans to continue in the year ahead, Phil Hastings, director of business development, told BioInform at Cambridge Healthtech Institute's Molecular Medicine Tri-Conference here this week.

Bioalma, meanwhile, based in Madrid, Spain, is eyeing the academic market to help drive growth in 2009. Founded in 2000, the company earlier this month expanded beyond its flagship AlmaKnowledgeServer text-mining system by launching a new product called Novoseek, a free, web-based literature search engine it expects to appeal to academic researchers, Ramón Alonso-Allende, director of marketing and business development, told BioInform.

Linguamatics' Hastings said that 2008 was a "good year" for the privately held firm. While he did not disclose specific revenue figures, he said that the company has seen more than 50 percent growth year-over-year for the past several years and that it is profitable. Linguamatics currently employs around 25 people and is looking to hire a "handful" of additional staffers this year, Hastings said.

Linguamatics currently claims eight of the top 10 pharmaceutical companies among its customers, including Pfizer, AstraZeneca, and Eli Lilly. Hastings said that in the year ahead, the company hopes to add more pharma and biotech firms to its customer list, but also to expand adoption of its technology within its existing client base.

Linguamatics also plans to expand its presence in the US market. Last June, the company opened a US subsidiary in the Boston area, and is now looking to "grow our presence on the ground in the US over the next couple of years," Hastings said.

So far, he said, "our strategy to focus on pharma and biotech is paying off," but Linguamatics believes there is plenty more opportunity for growth within that market. "The floodgates haven't opened yet, but momentum is starting to build."

The company's I2E technology uses natural language processing to help "structure unstructured data," Hastings said. Query results are returned in table form, sorted by key terms and relationships. "It treats the literature as a database that you can query over," he said.

Hastings said that to date, pharmas have primarily used I2E for target discovery and prioritization, biomarker discovery, and toxicology, but the company is seeing a migration toward more downstream applications such as the clinical arena and even post-market surveillance.

One goal for Linguamatics in the year ahead is to integrate I2E with various "enterprise" tools to make it accessible to a broader range of users within pharma. As an example, Hastings said, "Everywhere we go, we hear about SharePoint." The company is currently working on this integration, he said, adding that he envisions a "portal" that would let users run I2E queries directly from the Microsoft collaboration tool.

In addition, he said, the company is building "resources around I2E" such as "libraries" of common queries for particular applications that researchers can quickly customize for their own purposes.

Hastings said that at most pharmaceutical customers, the number of researchers who work directly with I2E on a regular basis is currently in the "tens," though hundreds of end users within those organizations benefit from the results via reports and databases populated with information I2E extracts. Linguamatics views those end users as potential customers for the software as well, he said, but they will first need the ability to access it through a more familiar interface.

A Web-Based Business

For Bioalma, the primary goal in 2009 is to build a "broader audience" among academic researchers, Alonso-Allende said. The company is aiming Novoseek, a free, web-based search engine that enables customized literature querying, primarily at this market.

Novoseek relies on the same technology that powers AlmaKnowledgeServer (AKS). However, while AKS is installed on a dedicated server at a customer site and is designed for high-volume querying, Novoseek is free and accessible through any web browser, though it does not offer the same analytical muscle.

Novoseek "doesn't have the same power as AKS, but it provides a lot of information very quickly for a lot of users," Alonso-Allende said.

Eventually, Bioalma hopes to make Novoseek an advertising-supported offering, but Alonso-Allende said the company must first attract a steady base of regular users. "We'd like it to be the first step in a researcher's daily work," he said.

Since the product launched earlier this month, traffic has been "better than expected," Alonso-Allende said, but he declined to provide details on the number of users. The opportunity for growth is considerable, however, given that PubMed handles around 80 million searches per month.

Alonso-Allende acknowledged that PubMed is the primary competition for Novoseek, but said that the company's product offers a number of advantages, particularly for complex searches that would require extensive knowledge of MeSH terms to perform in PubMed.

Bioalma has "pre-analyzed" the literature with its text-mining technology to flag key concepts and relationships and to disambiguate terms that might have multiple meanings depending on the context of the query, such as "cat" the animal and "CAT" the chloramphenicol acetyltransferase protein.

Because of this, the Novoseek results page includes a side bar that allows users to narrow their search by homing in on different concepts related to the query, such as diseases, biological functions, genes and proteins, or chemicals.

The company downloads around 2,000 new articles each day, performs the same analysis, and then adds them to the site, Alonso-Allende said.

In addition to papers, the site also returns results from US grants databases, with the thinking that "2009's grants will be the research papers in 2010 and 2011," Alonso-Allende said.

Bioalma also plans to add new features to the site, such as a "myNovoseek" interface that would track a user's search history and generate alerts when new papers match a particular query. The company is also considering adding the capability to search patent databases, but its priority right now is to build the user base for the current site.

"We'd like it to be their home page," Alonso-Allende said.