BioMinT, a €1.4 million ($1.7 million) biological text-mining project funded by the European Commission, is coming to a close at the end of March, and researchers will now have access to some of the technologies developed under the three-year effort.
PharmaDM, a three-person bioinformatics firm based in Leuven, Belgium, released the first tool from the project, the Gene and Protein Synonyms Database, this week at http://biomint.pharmadm.com. A previous version of the database was described in an applications note in Bioinformatics last year [Bioinformatics. 2005 Apr 15;21(8):1743-4], but the company described the new version as the first "production-quality" release of the resource.
The database of 1,760,234 gene and protein clusters, with an average of 8.46 synonyms per cluster, is freely available to all users, but PharmaDM views it as a first step toward a broader commercial offering.
"We hope that the [GPSDB] service is useful enough to attract lots of users, but we also hope that we can at a later stage offer things that are more relevant and interesting so that people who are using the tool already might want to pay for extras," Luc Dehaspe, PharmaDM's CSO, told BioInform.
Dehaspe described GPSDB as "kind of a side result, a spin-off from the BioMinT project," which was funded with the intention of developing an automated text-mining system to assist curators for the Swiss-Prot (now UniProt) and PRINTS databases. Over the three-year effort, PharmaDM and the other BioMinT consortium members have developed a suite of tools for submitting queries, building them out with GPSDB, returning and ranking abstracts, and then extracting relevant information about gene and protein function from those abstracts using natural language processing.
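To give a rough sense of what "building out" a query with synonyms and then ranking abstracts can look like, here is a minimal Python sketch. It is not the BioMinT code: the function names, the single toy synonym cluster, the sample abstracts, and the mention-count score are all assumptions made for illustration, and the natural language processing step is left out entirely.

```python
import re

# Toy synonym cluster; a hypothetical stand-in for a resource such as GPSDB.
SYNONYM_CLUSTERS = [
    {"TP53", "p53", "tumor protein p53"},
]

def expand_query(term, clusters):
    """Return the term plus every name in any cluster that contains it."""
    expanded = {term}
    for cluster in clusters:
        if term.lower() in {name.lower() for name in cluster}:
            expanded |= cluster
    return expanded

def count_mentions(text, terms):
    """Count non-overlapping, case-insensitive whole-word matches of any term."""
    # Longest names first, so a full name is not also counted as its short form.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    pattern = r"(?<!\w)(?:" + alternatives + r")(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def rank_abstracts(abstracts, terms):
    """Return (score, abstract) pairs with at least one mention, best first."""
    scored = [(count_mentions(a, terms), a) for a in abstracts]
    hits = [(s, a) for s, a in scored if s > 0]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)

# Made-up abstracts, for illustration only.
abstracts = [
    "Mutations in p53 are common; p53 loss alters apoptosis.",
    "We study kinase signalling in yeast.",
    "Tumor protein p53 regulates the cell cycle.",
]
terms = expand_query("TP53", SYNONYM_CLUSTERS)
ranked = rank_abstracts(abstracts, terms)
for score, abstract in ranked:
    print(score, abstract)
```

A query for "TP53" alone would match none of these abstracts literally. The synonym expansion is what brings the first and third into the ranked list.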
Curators at the Swiss Institute of Bioinformatics currently have access to the complete curation pipeline, but Dehaspe said that most of the components are not yet available as off-the-shelf or web-based tools. The company can, however, provide those capabilities to interested users under a services contract.
Other partners in the BioMinT consortium include the University of Manchester, the Austrian Research Institute for Artificial Intelligence, the Swiss Institute of Bioinformatics, the University of Antwerp, and the University of Geneva Artificial Intelligence Lab. PharmaDM holds the commercialization rights for technology developed as part of the project.
The launch of GPSDB comes as PharmaDM is readying its first set of shrink-wrapped software products: one for cheminformatics and another for bioinformatics.
PharmaDM was founded in 2000, but the company has kept a low profile. It got its start in an earlier EU-funded project, when its founders — researchers from the Catholic University of Leuven, the University of Aberystwyth in Wales, and Oxford University — met as part of a consortium to develop inductive logic programming technology in the mid-1990s.
After realizing that the relational data mining tools they were developing had applications in drug discovery, Dehaspe and his colleagues launched PharmaDM. To date, the firm has used its technology in service partnerships with companies like Pfizer, but over the next few months, it will begin rolling out DMax Chemistry Assistant and DMax Biology Assistant.
Both tools rely on the company's underlying relational data mining approach. Dehaspe said the approach offers a number of advantages over table-based data-mining methods, which are "constrained" in their ability to find unknown relationships between objects and so to generate hypotheses.
As an example, Dehaspe said that a biological molecule can't be fully described with a simple "vector of descriptors" in the form of columns and rows. Molecules can be described in terms of atoms that are connected in very specific ways, and can also be described in terms of their structures and substructures — "relational aspects that are bound to have an impact on the biological activity of the molecule," he said. PharmaDM's technology allows users to rely on the "expressivity of natural language" to generate new hypotheses from data, he said.
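The contrast Dehaspe draws can be illustrated with a small Python sketch. It is an illustration of the general idea only, not PharmaDM's technology: the dictionary layout and the helper names `flat_descriptor` and `has_bonded_pair` are assumptions. Ethanol and dimethyl ether share the formula C2H6O, so a flat table of element counts cannot tell them apart, while a question about which atoms are bonded to which can.

```python
from collections import Counter

# Each molecule is a set of atoms (id -> element) plus bonds between atom ids.
ETHANOL = {
    "atoms": {1: "C", 2: "C", 3: "O", 4: "H", 5: "H",
              6: "H", 7: "H", 8: "H", 9: "H"},
    "bonds": [(1, 2), (2, 3), (3, 9), (1, 4), (1, 5), (1, 6), (2, 7), (2, 8)],
}
DIMETHYL_ETHER = {
    "atoms": {1: "C", 2: "O", 3: "C", 4: "H", 5: "H",
              6: "H", 7: "H", 8: "H", 9: "H"},
    "bonds": [(1, 2), (2, 3), (1, 4), (1, 5), (1, 6), (3, 7), (3, 8), (3, 9)],
}

def flat_descriptor(molecule):
    """Table-style view: element counts only, connectivity discarded."""
    return tuple(sorted(Counter(molecule["atoms"].values()).items()))

def has_bonded_pair(molecule, element_a, element_b):
    """Relational view: is any atom of element_a bonded to one of element_b?"""
    atoms = molecule["atoms"]
    wanted = {element_a, element_b}
    for left, right in molecule["bonds"]:
        pair = {atoms[left], atoms[right]}
        # For two different elements the bond must join one of each;
        # for the same element twice, both ends must be that element.
        if pair == wanted:
            return True
    return False

# Identical rows in a descriptor table ...
print(flat_descriptor(ETHANOL) == flat_descriptor(DIMETHYL_ETHER))  # True
# ... but different answers to a relational question (is there an O-H bond?).
print(has_bonded_pair(ETHANOL, "O", "H"))         # True
print(has_bonded_pair(DIMETHYL_ETHER, "O", "H"))  # False
```

The O-H bond found only in ethanol is the hydroxyl group, the kind of substructure that a column of element counts would hide.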
Dehaspe said that the company has not yet finalized the timing for the launch of DMax Chemistry Assistant and DMax Biology Assistant.
The company's longer-term plans include commercial development of some of the technologies developed in the BioMinT project, but GPSDB will remain freely available and will be updated every three months.
By Bernadette Toner