Pathway databases are becoming an essential tool for illustrating genomic data within a biological context, but representing the complexities of pathway information in a consistent, computable manner is no easy task.
A growing number of firms are meeting the challenge presented by complex subcellular pathways by using ontologies and other structured information approaches to model pathway data in an understandable, yet scalable, form (see box below). Ingenuity, the latest company to set out on the pathway database trail, is taking a similar approach to the others, but claims that its comprehensive ontology of more than 280,000 concepts gives it an edge.
The company’s Pathway Knowledgebase, which it officially launched in late May, is built upon an ontology-driven knowledge management system it is developing in collaboration with Millennium Pharmaceuticals. Ingenuity relies upon a worldwide network of PhD-level curators — it won’t disclose how many — to extract interactions between genes and proteins from the scientific literature and represent them using the ontology. These interactions, which now number well over one million, are presented along with their accompanying annotations in a fully computable framework — an accomplishment the company claims is not possible with other methods.
“We can have up to a hundred different fields of information tied to a particular interaction,” said Peter DiLaura, head of corporate development at Ingenuity. Descriptive information such as cell type, organism, and experimental context accompanies each data point, “so that a bioinformatics organization can say, ‘Okay, for this particular algorithm we’re only interested in using findings that are extracted from mouse, or we’re only interested in human, or we’re only interested in things that were done using a certain experimental paradigm,’ and so on,” DiLaura said.
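The kind of selection DiLaura describes, restricting an analysis to findings from one organism or one experimental paradigm, can be sketched as a simple filter over annotated interaction records. This is a minimal illustration only: the `Interaction` fields, the `select_findings` helper, and the sample records are hypothetical and do not reflect Ingenuity's actual schema or interface.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interaction:
    """One curated finding plus a few of its annotations (hypothetical field names)."""
    source: str
    target: str
    relation: str
    organism: str
    cell_type: str
    method: str  # experimental paradigm used to obtain the finding


def select_findings(interactions: Iterable[Interaction],
                    organisms: Optional[set[str]] = None,
                    methods: Optional[set[str]] = None) -> list[Interaction]:
    """Keep findings whose annotations satisfy every given constraint.

    A constraint left as None is not applied.
    """
    return [
        i for i in interactions
        if (organisms is None or i.organism in organisms)
        and (methods is None or i.method in methods)
    ]


# Illustrative records, not taken from any real knowledgebase.
findings = [
    Interaction("TP53", "MDM2", "binds", "human", "HeLa", "co-immunoprecipitation"),
    Interaction("Tp53", "Mdm2", "binds", "mouse", "fibroblast", "yeast two-hybrid"),
    Interaction("AKT1", "GSK3B", "phosphorylates", "human", "HEK293", "kinase assay"),
]

mouse_only = select_findings(findings, organisms={"mouse"})
human_kinase = select_findings(findings, organisms={"human"}, methods={"kinase assay"})
print([(i.source, i.target) for i in mouse_only])  # [('Tp53', 'Mdm2')]
print(len(human_kinase))  # 1
```

Because each constraint is optional, the same function serves an algorithm that wants only mouse data and one that wants only a particular assay type.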
The Pathway Knowledgebase was built for bioinformaticists, not biologists, said Frank Mara, senior vice president of marketing. The resource acts as a “core” dataset and knowledge representation framework to which customers add in-house or third-party data and analysis methods, he said. “Most of the other content plays to date have really been about going deep in a particular area of biology, and then capturing and structuring the information in a way that you could browse and search it, but not compute it,” said DiLaura. Ingenuity expects that its structured content will allow bioinformatics teams to focus less on content issues and more on “the things that they’re good at, which is building algorithms specific to their customers,” he said.
Mara said a typical configuration for a large pharma or biotech would be priced between $300,000 and $500,000 for an annual license, and scaled-down versions of the knowledgebase are also available for smaller firms at a lower price point.
Several potential customers are evaluating the product. The head of an informatics group at a large pharmaceutical firm looking at the product told BioInform that he’s pleased with what he’s seen so far, four months into an evaluation that could last an additional three months. “The ontology has been evolving over time and it’s a very good representation of biology,” he said. “It’s structured in a much better way than, say, the Gene Ontology, which is a good concept, but it’s not as complete.”
The informatics manager said his team began to develop its own pathway representation methods, but found the task much more complex than initially anticipated. “The relationships between biological entities are not linear, and they can’t easily be parceled out into a named pathway,” he said. Automated text-mining approaches for extracting relationships from the literature “have their ups and downs, too,” he remarked. For now, he said, his group is layering the output of its own text mining, which he described as a “low-quality information network,” onto Ingenuity’s “high-quality information network.” Using the resource as a reference dataset helps identify false positives, he said, “which helps tune the engine that generates the automatic interaction network.”
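One way to picture that layering is to check each text-mined interaction against a curated reference network. The sketch below is an assumption about how such a check might work, not the manager's actual pipeline: edges are treated as undirected gene-name pairs, a mined edge between two genes the reference covers but does not connect is flagged as a possible false positive, and edges involving uncovered genes are left unjudged. Absence from a curated set does not prove an interaction is false, so the "suspect" list is only a candidate list. The `audit` function and sample data are hypothetical.

```python
Edge = tuple[str, str]


def normalize(edge: Edge) -> frozenset[str]:
    """Treat an interaction as an undirected, case-insensitive gene pair."""
    a, b = edge
    return frozenset((a.upper(), b.upper()))


def audit(mined: list[Edge], reference: list[Edge]):
    """Split mined edges into confirmed, suspect, and unjudged groups.

    Returns the three lists plus precision over the judged edges
    (None if no mined edge could be judged).
    """
    ref = {normalize(e) for e in reference}
    ref_genes: set[str] = set().union(*ref) if ref else set()
    confirmed, suspect, unknown = [], [], []
    for e in mined:
        key = normalize(e)
        if key in ref:
            confirmed.append(e)   # reference agrees with the text miner
        elif key <= ref_genes:
            suspect.append(e)     # both genes curated, but no such link
        else:
            unknown.append(e)     # reference has no opinion on these genes
    judged = len(confirmed) + len(suspect)
    precision = len(confirmed) / judged if judged else None
    return confirmed, suspect, unknown, precision


# Illustrative data only.
reference = [("TP53", "MDM2"), ("AKT1", "GSK3B"), ("MDM2", "AKT1")]
mined = [("mdm2", "tp53"), ("TP53", "GSK3B"), ("BRCA1", "BARD1"), ("AKT1", "GSK3B")]

confirmed, suspect, unknown, precision = audit(mined, reference)
print(suspect)    # [('TP53', 'GSK3B')]
print(precision)  # 0.6666666666666666
```

A precision figure of this kind, tracked as text-mining parameters change, is one plausible way a curated reference could "tune the engine" the manager mentions.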
Although Ingenuity is targeting bioinformatics developers for the initial release of the knowledgebase, Mara said the company intends to expand its reach into the broader market of end-user biologists as well. It plans to release a set of analytical applications within the next few months that will run on top of the database and act as “a turnkey solution” for research groups that don’t have a team of bioinformaticists at their disposal.
Many content providers focus on either bioinformaticists or biologists, but not both. Ingenuity intends to serve both groups, but as separate markets. “We see them really as different markets, and they have different needs,” said Mara. “One of them wants the pieces, and one of them wants the solution.”
Other Pathway Database Platforms
- 3rd Millennium
PIMS (Pathway Information Management System), an ontology-driven framework for storing pathway data: http://www.3rdmill.com/initiatives/PIMS.html
- GeneGo
MetaCore, database of human pathways: http://www.genego.com/about/products.shtml#metacore
- Gene Network Sciences
Cell Navigator, a pathway database expressed in the company’s Diagrammatic Cell Language: http://www.gnsbiotech.com/pathwaydb.shtml
- InCellico
CELL (Coded Electronic Life Library), ontological database of more than 30 million biological entities and 200 million relationships between them: http://www.incellico.com/products.html
- Physiome Sciences
PathwayPrism, signal transduction modeling platform: http://www.physiome.com/code/framesets/applications/content/bodycontent/signal.htm
Model repository, contains about 125 models expressed in CellML format: http://www.cellml.org/examples/introduction/index.html
- Kyoto University Bioinformatics Center
KEGG (Kyoto Encyclopedia of Genes and Genomes), database with more than 10,500 pathways for around 130 organisms: http://www.genome.ad.jp/kegg/
- National Center for Genome Resources
PathDB, data repository and a system for building, visualizing, and comparing cellular networks: http://www.ncgr.org/pathdb/
Model repository, contains 18 models in SBML format: http://www.sbw-sbml.org/ModelsWebPages/ModelRepository.htm
- SRI International
BioCyc, database of 477 pathways from several species: http://biocyc.org