Cognia is preparing to launch its first internally developed database product, Cognia Catabolism, in October, a company official told BioInform this week.
The database, which contains information on the ubiquitin protein system, is the first of an extensive line of similar resources that the company plans to create with the help of its proprietary text-mining technology.
Robert Merold, Cognia CEO, said that the company has identified approximately 75 databases that would be of interest to the biomedical community, "So our thought is how quickly can we get to 75?"
The answer, at least in Cognia's view, is a natural-language processing platform that the New York City-based firm is developing at its R&D facility in Edinburgh, Scotland.
Cognia began developing the NLP platform last March, when it partnered with the University of Edinburgh's School of Informatics under a three-year, £5.3 million ($10.2 million) grant from ITI Life Sciences, a Scottish economic-development agency. [BioInform 03-21-05]
The text-mining platform will be completed later this year, Merold said, enabling the company to launch "at least five" new databases in 2007, with a goal of one new database per month in 2008.
Merold was previously the chief operating officer and a member of the board of directors of Proteome, a pioneer in the hand-curated database market that Incyte Genomics acquired in 2000. He described text-mining technology as "a major sea change" in developing such resources, which have historically relied on teams of PhDs poring over the scientific literature, extracting relevant information, and adding it to the database in a structured manner.
"We will still have human PhDs involved, because we want quality, but we can dramatically reduce their time with the technology we've built," he said. "That's why we think we'll be able to do one a month when other people do one a year."
The company's first database product, Cognia Catabolism, is a prime example of the effort that goes into a manually curated database. Cognia began developing the database in 2001 under a one-year SBIR award from the National Institute of General Medical Sciences, and has been pecking away at it ever since.
Cognia was launched in 1998 with the goal of developing and marketing its own database products, but it's taken a series of strategic moves to get to that point. The firm initially served as the US distributor for Biobase's Transfac and Transpath databases, and supported its internal R&D through SBIR grants. It launched its first internally developed product, a data-management system called Cognia Molecular, in 2003.
After Cognia opened its subsidiary in Scotland last year, it began accelerating its development plans, which was one reason that Merold was brought on board. The company is currently seeking venture capital funding, and Merold said that he was hired to bring more "operational experience" to the firm.
"As we've been raising capital and expanding our operation, we felt it was time to start up the database side of our business model that we've always wanted to get to," Merold said. Since the ubiquitin database was already under development, it made sense to begin there, he said.
Although the text-mining platform isn't complete yet, Merold said that the company has been using prototype input and output tools "to manage the process" of finishing the ubiquitin database, "and that makes it a whole lot more efficient than a classic manual curation program." In addition, he said, Cognia has outsourced some of the curation to GGA, a firm based in St. Petersburg, Russia.
"We had already started this project, so doing it manually was less onerous than if we had started from scratch," he said.
Cognia will be showcasing the upcoming database at the Ubiquitin for Drug Discovery and Development conference, to be held in Philadelphia June 26-27. Merold said that the company will disclose details on its second database product in August.
The official launch of Cognia Catabolism, which will contain information on around 3,000 proteins and draw from approximately 5,000 peer-reviewed papers, will be in October.
The company is not disclosing pricing for the database at this time.
Merold said he expects the database to be of broad interest in the biomedical research and drug-discovery markets because the ubiquitin system plays an important role in oncology, central nervous system disorders, and inflammation. The company has signed two undisclosed beta customers for the database, and Merold said that one of them has extended its license through 2008.
The firm should have an advantage in the marketplace, as Cognia Catabolism is the first database to focus solely on the ubiquitin system, but that doesn't mean that the firm is without competition, especially as it plans to expand its product line. Several other well-established firms, including Aureus Pharma and Biobase (which acquired Proteome from Incyte in January 2005), sell curated databases that focus on particular protein families of interest to biomedical research.
Merold acknowledged the competition, but said that the company sees plenty of room in the marketplace for multiple players. "We don't need to own the world here," he said. He noted that the company's Cognia Molecular informatics platform should provide an advantage because it allows customers to integrate Cognia databases with internal data, third-party databases, and information from the public domain, making it a "turnkey way to get access to data."
While some still consider the bioinformatics database model to be risky after the high-profile failures of Incyte and Celera in that area, Merold said that the market for manually curated data is relatively safe. "There was a great market for gene sequence data back in the 1990s … and then the human genome project turned that into an afterthought," he said. Nevertheless, "proteomic databases are much more complicated, so they won't be pre-empted in the same way."
The cost and "physical complexity" of managing a large staff of curators have raised the barrier to entry for the curated database market, he noted. "It's manual, it's slow, so you haven't had a compelling case to justify building a lot of these databases, even though it's a proven market — the Proteome products are still around," he said. "It's just the physical barriers to creating this is why people have not gone after it."
— Bernadette Toner