Cognia Plans to Launch Catabolism DB in October, Readies NLP Platform to Build Out Product Pipeline


Cognia is preparing to launch its first internally developed database product, Cognia Catabolism, in October, a company official told BioInform this week.

The database, which contains information on the ubiquitin protein system, is the first of an extensive line of similar resources that the company plans to create with the help of its proprietary text-mining technology.

Robert Merold, Cognia CEO, said that the company has identified approximately 75 databases that would be of interest to the biomedical community. "So our thought is how quickly can we get to 75?" he said.

The answer, at least in Cognia's view, is a natural-language processing platform that the New York City-based firm is developing at its R&D facility in Edinburgh, Scotland.

Cognia started developing the NLP platform last March, when it partnered with the University of Edinburgh's School of Informatics under a three-year, £5.3 million ($10.2 million) grant from ITI Life Sciences, a Scottish economic-development agency. [BioInform 03-21-05]

The text-mining platform will be completed later this year, Merold said, enabling the company to launch "at least five" new databases in 2007, with a goal of one new database per month in 2008.

Merold was previously the chief operating officer and a member of the board of directors of Proteome, a pioneer in the hand-curated database market that Incyte Genomics acquired in 2000. He described text-mining technology as "a major sea change" in developing such resources, which have historically relied on teams of PhDs poring over the scientific literature, extracting relevant information, and adding it to the database in a structured manner.

"We will still have human PhDs involved, because we want quality, but we can dramatically reduce their time with the technology we've built," he said. "That's why we think we'll be able to do one a month when other people do one a year."

The company's first product, Cognia Catabolism, is a prime example of the effort that goes into a manually curated database. Cognia began developing the database in 2001 under a one-year SBIR award from the National Institute of General Medical Sciences, and has been pecking away at it ever since.

Cognia was launched in 1998 with the goal of developing and marketing its own database products, but it's taken a series of strategic moves to get to that point. The firm initially served as the US distributor for Biobase's Transfac and Transpath databases, and supported its internal R&D through SBIR grants. It launched its first internally developed product, a data-management system called Cognia Molecular, in 2003.

After Cognia opened its subsidiary in Scotland last year, it began accelerating its development plans, which was one reason that Merold was brought on board. The company is currently seeking venture capital funding, and Merold said that he was hired to bring more "operational experience" to the firm.

"As we've been raising capital and expanding our operation, we felt it was time to start up the database side of our business model that we've always wanted to get to," Merold said. Since the ubiquitin database was already under development, it made sense to begin there, he said.

Although the text-mining platform isn't complete yet, Merold said that the company has been using prototype input and output tools "to manage the process" of finishing the ubiquitin database, "and that makes it a whole lot more efficient than a classic manual curation program." In addition, he said, Cognia has outsourced some of the curation to GGA, a firm based in St. Petersburg, Russia.

"We had already started this project, so doing it manually was less onerous than if we had started from scratch," he said.

Cognia will be showcasing the upcoming database at the Ubiquitin for Drug Discovery and Development conference, to be held in Philadelphia June 26-27. Merold said that the company will disclose details on its second database product in August.

The official launch of Cognia Catabolism, which will contain information on around 3,000 proteins and draw from approximately 5,000 peer-reviewed papers, will be in October.

The company is not disclosing pricing for the database at this time.

Merold said he expects the database to be of broad interest in the biomedical research and drug-discovery markets because the ubiquitin system plays an important role in oncology, central nervous system disorders, and inflammation. The company has signed two undisclosed beta customers for the database, and Merold said that one of them has extended its license through 2008.

The firm should have an advantage in the marketplace, as Cognia Catabolism is the first database to focus solely on the ubiquitin system, but that doesn't mean that the firm is without competition, especially as it plans to expand its product line. Several other well-established firms, including Aureus Pharma and Biobase (which acquired Proteome from Incyte in January 2005), sell curated databases that focus on particular protein families of interest to biomedical research.

Merold acknowledged the competition, but said that the company sees plenty of room in the marketplace for multiple players. "We don't need to own the world here," he said. He noted that the company's Cognia Molecular informatics platform should provide an advantage because it allows customers to integrate Cognia databases with internal data, third-party databases, and information from the public domain, making it a "turnkey way to get access to data."

While some still consider the bioinformatics database model to be risky after the high-profile failures of Incyte and Celera in that area, Merold said that the market for manually curated data is relatively safe. "There was a great market for gene sequence data back in the 1990s … and then the human genome project turned that into an afterthought," he said. Nevertheless, "proteomic databases are much more complicated, so they won't be pre-empted in the same way."

The cost and "physical complexity" of managing a large staff of curators have raised the barrier to entry for the curated database market, he noted. "It's manual, it's slow, so you haven't had a compelling case to justify building a lot of these databases, even though it's a proven market — the Proteome products are still around," he said. "It's just the physical barriers to creating this is why people have not gone after it."

Bernadette Toner
