NEW YORK, Sept. 18 - Future understanding of genomic data may be severely limited unless bioinformaticists gain a better understanding of knowledge representation, according to Peter Karp, director of SRI International's Bioinformatics Research Group.
In a paper that appeared in the September 14, 2001, issue of Science, Karp outlined the role that symbolic computing and artificial intelligence would play as bioinformatics begins to address biological systems and scientific theories. Karp wrote that systems biology theories are quickly growing too complex to be understood by individual scientists. In response, he argued, bioinformaticists will have to turn to artificial intelligence methods that will enable computers to verify a theory's internal consistency, its global properties, and its consistency with external data.
In the paper, Karp highlighted SRI’s EcoCyc Project (www.ecocyc.org) as an example of an effective use of AI-based methods in biological research. EcoCyc is a symbolic pathway database that describes the metabolic, transport, and genetic-regulatory networks of Escherichia coli. EcoCyc is structured according to an ontology, or database schema, that captures semantic distinctions and precisely defines the meaning of different database fields. This structure provides an interconnected web of frames stored in a frame knowledge representation system that enables computer-based reasoning across the network.
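To make the idea of frames and an ontology concrete, the following is a minimal Python sketch of a frame-style knowledge base with a simple consistency check. It is an illustration only, not EcoCyc's actual schema or the Pathway Tools API: the `Frame` class, the `SCHEMA` table, the `check_consistency` function, and the placeholder names such as `compound-A` and `enzyme-1` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A hypothetical frame: a named object with slots and one parent frame."""
    name: str
    parent: "Frame | None" = None
    slots: dict = field(default_factory=dict)

    def get(self, slot):
        """Look up a slot locally, then inherit it from ancestor frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        return None

    def is_a(self, cls):
        """Return True if this frame is cls or descends from it."""
        frame = self
        while frame is not None:
            if frame is cls:
                return True
            frame = frame.parent
        return False


# Class frames: a toy ontology (assumed for illustration, not EcoCyc's).
THING = Frame("Thing")
COMPOUND = Frame("Compound", THING)
ENZYME = Frame("Enzyme", THING)
REACTION = Frame("Reaction", THING, {"reversible": False})

# Slot-type constraints: slot name -> class whose instances may fill it.
SCHEMA = {"substrates": COMPOUND, "products": COMPOUND, "catalyzed_by": ENZYME}


def check_consistency(frames):
    """Return messages for Reaction slot values that violate SCHEMA."""
    problems = []
    for f in frames:
        if not f.is_a(REACTION):
            continue
        for slot, expected in SCHEMA.items():
            for value in f.get(slot) or []:
                if not value.is_a(expected):
                    problems.append(
                        f"{f.name}.{slot}: {value.name} is not a {expected.name}"
                    )
    return problems


# Instance frames with placeholder names (not real biological data).
compound_a = Frame("compound-A", COMPOUND)
compound_b = Frame("compound-B", COMPOUND)
enzyme_1 = Frame("enzyme-1", ENZYME)
good = Frame("reaction-1", REACTION, {
    "substrates": [compound_a], "products": [compound_b],
    "catalyzed_by": [enzyme_1],
})
bad = Frame("reaction-2", REACTION, {
    "substrates": [enzyme_1], "products": [compound_b],
    "catalyzed_by": [enzyme_1],
})

print(check_consistency([good, bad]))
```

Because each slot's meaning is fixed by the schema, a program can flag the second reaction, which lists an enzyme where a compound is expected. A free-text description of the same reaction would offer no such handle for automated checking.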
According to Karp, this structure holds a significant advantage over conventional bioinformatics approaches, which typically rely on text-based repositories of theories and data. “Although the scientific community clearly accepts the need to encode the ever-expanding quantity of scientific data within databases,” Karp wrote, “databases of scientific theories, such as a theory describing the transcriptional regulation of E. coli genes, are much rarer.”
As biological research grows increasingly dependent on information technology to make sense of mounting volumes of genomic data, Karp wrote, it will be crucial for bioinformaticists to keep up with new developments in symbolic computing. “The genome revolution is increasing the need for pathway databases in the biological sciences, and similar developments will occur in other sciences. However, effective implementation of this paradigm is hampered because most biologists (and most other scientists) receive essentially no education in databases or knowledge representation.”
According to Karp, equipping scientists with a better understanding of knowledge representation concepts (such as data models, ontologies, database query languages, logical inference, database design, and formal grammars) will be necessary to carry the field forward.
EcoCyc was developed using SRI's Pathway Tools software environment, which encodes systems biology theories and supports query, analysis, and visualization operations for pathway and genome databases. SRI is currently applying Pathway Tools to a number of other organisms.