
Study Says AI-Based Methods Will Be Key to Advances in Systems Biology Research


Bioinformaticists may be ill-equipped to handle the demands of systems biology without a better understanding of knowledge representation, according to Peter Karp, director of SRI International’s Bioinformatics Research Group.

In a paper that appeared in the September 14, 2001, issue of Science, Karp outlined the role that symbolic computing and artificial intelligence would play as bioinformatics begins to address increasingly complex biological systems and scientific theories.

“Systems biology is going to define the behavior of an entire biological system — the full molecular parts list of a system and the behavior of all those parts — and we simply can’t begin to approach that without using computer models,” Karp told BioInform.

This complexity will require bioinformaticists to turn to artificial intelligence methods that will allow computers to verify a theory’s internal consistency, its global properties, and its consistency with external data, Karp said.

“We need to look at the genome as defining the biochemical machine and we need to have tools that look at that machinery and determine if it fits together in a coherent way,” said Karp.

“If you open the hood of your car and you saw a banana plugged into where one of your spark plugs should go, you’d know that something was out of whack there. But we don’t have tools for looking at genomes in the same way.”

Qualitative Relationships Give Structure to Data

In the Science paper, Karp highlighted SRI’s EcoCyc Project (www.ecocyc.org) as an example of an effective use of AI-based methods in biological research. EcoCyc is a symbolic pathway database that describes the metabolic, transport, and genetic-regulatory networks of Escherichia coli. The database is structured according to an ontology that captures semantic distinctions and precisely defines the meaning of different database fields. The EcoCyc ontology contains about 1,000 classes that encode concepts in biochemistry and molecular biology, and over 200 slots that define properties of and relationships among those classes. This structure provides an interconnected web of frames stored in a frame knowledge representation system that enables computer-based reasoning across the network.
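The sketch below is a minimal Python illustration of the frame idea: classes form an is-a hierarchy, and frames carry slots whose values can name other frames, so a query can follow links across the network. It is not EcoCyc's ontology or the frame system SRI actually uses. The class names (`Compounds`, `Enzymes`, `Reactions`), the slot names (`left`, `right`, `enzyme`), and the `FrameKB` and `enzymes_consuming` helpers are hypothetical, chosen only for illustration.

```python
class FrameKB:
    """Hypothetical toy frame store: an is-a class hierarchy plus frames with named slots."""

    def __init__(self):
        self.superclass = {}  # class name -> parent class name (None for a root class)
        self.frames = {}      # frame name -> {"class": ..., slot name -> list of values}

    def add_class(self, name, parent=None):
        self.superclass[name] = parent

    def add_frame(self, name, cls, **slots):
        # Slot values are stored as lists; values may name other frames,
        # which is what links the frames into an interconnected web.
        self.frames[name] = {"class": cls, **{k: list(v) for k, v in slots.items()}}

    def is_instance(self, frame, cls):
        # Walk up the is-a chain starting from the frame's own class.
        c = self.frames[frame]["class"]
        while c is not None:
            if c == cls:
                return True
            c = self.superclass.get(c)
        return False

    def get(self, frame, slot):
        return self.frames[frame].get(slot, [])


def enzymes_consuming(kb, compound):
    """Follow slot links: compound -> reactions using it as input -> catalyzing enzymes."""
    hits = []
    for name in kb.frames:
        if kb.is_instance(name, "Reactions") and compound in kb.get(name, "left"):
            hits.extend(kb.get(name, "enzyme"))
    return sorted(hits)


kb = FrameKB()
kb.add_class("Things")
kb.add_class("Compounds", "Things")
kb.add_class("Proteins", "Things")
kb.add_class("Enzymes", "Proteins")
kb.add_class("Reactions", "Things")

for compound in ["glucose", "ATP", "glucose-6-phosphate", "ADP"]:
    kb.add_frame(compound, "Compounds")
kb.add_frame("glucokinase", "Enzymes")
kb.add_frame("glucokinase-rxn", "Reactions",
             left=["glucose", "ATP"], right=["glucose-6-phosphate", "ADP"],
             enzyme=["glucokinase"])

print(kb.is_instance("glucokinase", "Proteins"))  # True (inherited via Enzymes)
print(enzymes_consuming(kb, "glucose"))           # ['glucokinase']
```

Because the enzyme is filed under `Enzymes`, a subclass of `Proteins`, the class-membership query succeeds through inheritance. This kind of structure-aware query is impossible to run reliably over free-text annotations.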

According to Karp, this approach has a significant advantage over conventional bioinformatics resources, which typically store theories and data as free text. “Although the scientific community clearly accepts the need to encode the ever-expanding quantity of scientific data within databases,” Karp wrote, “databases of scientific theories, such as a theory describing the transcriptional regulation of E. coli genes, are much rarer.”

Karp said that while quantitative modeling is still an important and useful approach, it has a number of limitations. “You need a lot of quantitative parameters that are very hard to measure, very expensive to measure, and very time consuming to measure, the measurements are hard to do accurately, and those measurements just don’t exist for a lot of systems that we’re interested in understanding,” he said.

However, many qualitative relationships, such as protein-protein interactions or the metabolic and transport network information in EcoCyc, have already been extracted and described. What remains is to encode that information in properly structured databases so that researchers can perform symbolic computations on the data.
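As an illustration of a symbolic computation that needs no kinetic parameters, the sketch below propagates reachability through a reaction network. Starting from a set of nutrients, it repeatedly fires any reaction whose inputs are all present. Reactions that can never fire flag a gap in the model, loosely the "banana in the spark plug" kind of incoherence Karp describes. This is a generic technique (sometimes called network expansion), not a description of EcoCyc's or Pathway Tools' own algorithms. The function `expand` and the toy network are hypothetical.

```python
def expand(nutrients, reactions):
    """Qualitative forward chaining over a reaction network.

    A reaction 'fires' once every one of its inputs is available; its outputs
    then become available. No rate constants or concentrations are needed.
    Returns the reachable compounds and the names of reactions that never fire.
    """
    available = set(nutrients)
    fired = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new (a fixpoint)
        changed = False
        for name, (inputs, outputs) in reactions.items():
            if name not in fired and set(inputs) <= available:
                fired.add(name)
                available |= set(outputs)
                changed = True
    blocked = set(reactions) - fired
    return available, blocked


# Hypothetical toy network: compound and reaction names are placeholders.
reactions = {
    "R1": (["A", "ATP"], ["B", "ADP"]),
    "R2": (["B"], ["C"]),
    "R3": (["C", "X"], ["D"]),  # nothing produces X, so R3 can never fire
}

available, blocked = expand({"A", "ATP"}, reactions)
print(sorted(available))  # ['A', 'ADP', 'ATP', 'B', 'C']
print(blocked)            # {'R3'}
```

The answer depends only on which compounds participate in which reactions. That is the point of qualitative modeling when the kinetic measurements Karp mentions are unavailable.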

“When scientists reason about biological systems, I don’t believe they have complex quantitative models of these systems in their heads. They’re using some other kind of reasoning. And that’s where artificial intelligence comes in,” explained Karp.

Lack of Training Is Primary Hurdle

But getting biologists up to speed on the fundamentals of knowledge representation and AI-based methods remains a challenge. “The genome revolution is increasing the need for pathway databases in the biological sciences, and similar developments will occur in other sciences,” Karp wrote. “However, effective implementation of this paradigm is hampered because most biologists (and most other scientists) receive essentially no education in databases or knowledge representation.”

Karp singled out the database area of bioinformatics as “the one area where practice lags the state of the art.”

“I think we’re seeing that lack of basic knowledge about databases has come home to roost,” he added. As an example, Karp cited his May 11, 2001, paper in Comparative and Functional Genomics, which found that the majority of GenBank entries for complete microbial genomes do not comply with the GenBank standard. Adherence to the standard by submitters and, more importantly, its enforcement by NCBI, EMBL, and DDBJ would be a straightforward way to ensure that genomic data is represented in a format that permits symbolic reasoning, Karp said.

One positive sign that Karp sees in the field is the work of the Gene Ontology Consortium to develop a controlled vocabulary for the functions of gene products, as well as ontology projects underway at the University of Manchester and Stanford University.

Karp noted that SRI’s Pathway Tools software could also be used to convert a text string description of gene products into a formal ontology description of gene products.

SRI used Pathway Tools (http://bioinformatics.ai.sri.com/ptools/) to develop the EcoCyc database, and SRI and several academic partners are now applying the software to other organisms. Pathway Tools has three components: PathoLogic, which creates a database containing the predicted metabolic pathways of an organism; Pathway/Genome Navigator, which is used to publish a pathway/genome database on the web for querying, visualization, and analysis; and Pathway/Genome Editors, which provide interactive editing capabilities for the databases.

So far, Pathway Tools has been used to create pathway/genome databases for eight microorganisms, and Karp expects a number of new databases to be completed over the next few months.

Right now, Karp’s group at SRI is working to improve Pathway Tools’ ability to work with eukaryotic genomes and to refine the user interface. In addition, the group plans to enable the conversion from free text string to symbolic representation of function for transport proteins as well as for enzymes.

Karp stressed that the main task at SRI is getting people to think in terms of symbolic reasoning across whole biological systems.

“When we sequence a genome, we should be able to do a lot more wonderful things with it than we can right now,” Karp said.

— BT
