New York-based Cognia announced last week that it is opening a European subsidiary in Edinburgh, Scotland, where it will collaborate with the University of Edinburgh’s School of Informatics to develop text-mining technology for life science research.
The subsidiary, called Cognia EU, and the university will share a three-year, £5.3 million ($10.2 million) grant from ITI Life Sciences, a Scottish economic-development agency.
The goal of the project is to develop software that will enable researchers to create their own biological databases by extracting information from the scientific literature. Cognia will have the rights to commercialize the text-mining software in the life science sector, although the technology is expected to have applications in other vertical markets.
“The actual aim is to create an engine that is generic and can be transported to other domains,” said David Milroy, senior analyst at ITI Life Sciences.
Milroy said that the life science industry was targeted as the first commercial market for the technology because “it relies on the literature so heavily — probably more so than many of the other industries — for its products.” In other industries, Milroy said, text mining is “very much a support effort, whereas here you really can come up with new targets and new leads from looking at the literature. So I think this is where we’re likely to get the best return.”
Milroy said that ITI Life Sciences conducted “quite a lot of market research” on the potential demand for text-mining technology for life science research. While conceding that “it’s an emerging area, and quite difficult to put numbers on,” he said the organization estimates that the market for these tools could reach £200 million by 2014.
The core technology for the system is a set of natural-language-processing techniques developed by the University of Edinburgh’s School of Informatics. Cognia will contribute two key things to the project: its internally developed biological ontologies and its existing customer base. The company’s primary product is its Cognia Molecular data-management system, but it also serves as a distributor for a number of biological databases from BioBase and John Wiley & Sons and claims to have a distribution channel throughout biotech and pharma.
Cognia will develop the overall processing system, including parts of the document-retrieval process as well as output-handling from the information extraction system.
“We’ve developed a lot of tools that help researchers input their own information into a system — whether it’s manual curation, or batch-loading information from spreadsheets, or other structured forms of information,” said David Rubin, Cognia’s CEO. “In the first case, where it’s manual, it’s a little bit too slow to really bulk up your content offering, and in the case of the structured information, you’re still missing all of the information that exists in papers and patents, and we estimate that 90-95 percent or greater of the interesting information is in these unstructured formats.”
The goal, according to Rubin, is “to have manual curators interfacing with the high-throughput information and providing a greater level of quality assurance to the system.”
Rubin said that Cognia is currently recruiting staff for the Edinburgh subsidiary, but declined to provide details on how many hires are planned.
Text mining has been an active area of bioinformatics development for several years. According to Milroy, however, “there are many disparate efforts, but we believe that nobody’s got a particularly flexible system, and nobody’s put all the bits together, if you like, and that’s really where we think we can really improve on the state of the art.”
In fact, about 200 miles south of Cognia’s new home, another text-mining initiative is about to have its formal launch. The UK’s National Centre for Text Mining, or NaCTeM, based in Manchester and funded with £1 million from the UK government [BioInform 11-29-04], is scheduled to open its doors on March 21.
Milroy said that ITI Life Sciences may in-license additional text-mining IP from organizations beyond Cognia and the University of Edinburgh if necessary, and may collaborate with researchers from NaCTeM.
“The industry is at a point where it needs a push to get the technology over a certain hurdle,” Rubin said, “so the goal here is really to develop new intellectual property, but not start in a vacuum everywhere.”