Text mining software developer Linguamatics has been selected as a commercial partner in a two-year research project funded by the European Commission’s Seventh Framework Program that is focused on improving multilingual terminologies in biomedicine.
The company announced last week that it will participate in the Multilingual Annotation of Named entities and Terminological Resource Acquisition, or MANTRA, project.
MANTRA aims to develop and provide two community resources — automatically enhanced multilingual terminologies and semantically annotated multilingual documents — that will improve the accessibility of scientific information from documents in various European languages.
Specifically, the project will provide large-scale multilingual biomedical corpora that are annotated with biomedical entities such as genes, proteins, diseases, drugs, and chemicals; link monolingual and multilingual terminological resources from the life sciences; encourage community participation through an international challenge; and make its results publicly available for translation purposes and for text-mining multilingual documents.
The project is expected to cost about €2.3 million ($3 million), of which the EU is contributing €1.8 million ($2.4 million) through FP7.
Participants in MANTRA aim to enrich multilingual terminologies in biomedicine by exploiting parallel corpora that exist in several different languages. This will make it possible to identify the same patent in multiple languages, for example. It will also ensure that terminologies in one language and the same documents in other languages can be mined simultaneously to provide enriched terminologies.
As an example, Linguamatics noted that if an English patent refers to "branching enzyme," a terminological resource that recognizes the German equivalent, "verzweigungsenzym," could be used to mine documents in that language.
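The idea of mining parallel text for term equivalents can be illustrated with a minimal sketch. It is not MANTRA's or Linguamatics' method, which the announcement does not describe; it simply assumes sentence-aligned English-German pairs and scores each German token by how strongly it co-occurs with a known English term (a Dice coefficient). The function name `propose_translation` and the four toy sentence pairs are hypothetical and invented for illustration.

```python
from collections import Counter

# Toy sentence-aligned corpus (invented for illustration, not project data).
PAIRS = [
    ("The branching enzyme is purified", "Das Verzweigungsenzym wird gereinigt"),
    ("A branching enzyme from potato", "Ein Verzweigungsenzym aus Kartoffeln"),
    ("The protein is purified", "Das Protein wird gereinigt"),
    ("Starch from potato", "Stärke aus Kartoffeln"),
]


def propose_translation(pairs, english_term):
    """Return the target-language token most associated with english_term.

    Association is measured with the Dice coefficient over aligned sentences.
    Returns None if the English term never occurs in the corpus.
    """
    term = english_term.lower()
    with_term = Counter()  # token -> aligned sentences containing term and token
    overall = Counter()    # token -> target sentences containing token
    term_count = 0         # English sentences containing the term

    for english, target in pairs:
        tokens = set(target.lower().split())
        overall.update(tokens)
        if term in english.lower():
            term_count += 1
            with_term.update(tokens)

    if term_count == 0:
        return None

    def dice(token):
        return 2 * with_term[token] / (term_count + overall[token])

    return max(with_term, key=dice)


if __name__ == "__main__":
    print(propose_translation(PAIRS, "branching enzyme"))
```

A real pipeline would need proper tokenization, multi-word term handling, and far more data; the sketch only shows why documents that exist in several languages are useful for extending a terminology.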
David Milward, Linguamatics’ chief technology officer, said that the firm will work on improving “the range of existing terminologies in languages other than English by exploiting a number of existing resources together,” including "documents that already exist in more than one language, the more extensive English terminologies, and automatic language processing pipelines.”
He told BioInform that Linguamatics decided to participate in the project because of “increasing demand for information search and extraction in languages other than English” from its customers.
Furthermore, MANTRA’s goal is to develop “the sorts of resources that we know we will want to exploit in our future products,” he said.
“We believe the program has a good chance of success, especially as we will be able to build upon good working relationships established with several of the other program participants in an earlier program,” he added.
The EU encourages commercialization of resources that are developed during FP7-funded projects. In addition to Linguamatics, a second commercial firm, Averbis, is participating in MANTRA. Averbis, which spun out of the University Medical Center in Freiburg, Germany, in 2007, develops search technologies and text-mining software for hospital information systems providers, medical publishers, and pharmaceutical companies.
Academic participants in the project include the University of Zurich, the Friedrich-Schiller University of Jena, the Erasmus Medical Centre Rotterdam, and the European Bioinformatics Institute.
The group will initially work on patents, research articles, and public health information documents in German, French, Spanish, and Dutch, though the project isn’t limited to these languages, Milward said.
According to the consortium, the results from this project will feed into EBI's public services as well as commercial solutions developed by Linguamatics and Averbis.
Milward said the manner in which MANTRA’s results will be made available in Linguamatics’ product portfolio will be “driven by our customers.”
Linguamatics also sees the project as an opportunity to “demonstrate the benefits of the improved terminologies for real-world use cases involving our software,” Milward said.
The company this week launched a new version of its I2E software that includes an application programming interface as well as enhancements to its virtual data integration and chemistry capabilities.