SAN FRANCISCO — SRI International’s Pathway Tools software is currently in use at more than 1,300 academic and commercial research groups and has generated nearly 400 pathway databases to date, according to Peter Karp, director of SRI’s Bioinformatics Research Group.
Speaking at Cambridge Healthtech Institute’s Molecular Medicine Tri-Conference here this week, Karp discussed a number of new features available in the software suite, which automatically generates pathway databases for organisms based on their genome sequence data.
Karp said that SRI’s BioCyc collection now includes 371 pathway/genome databases that it has generated internally. In addition, a “partial list” of externally developed pathway/genome databases on the BioCyc website now numbers 17 projects, ranging from microbes to higher-level organisms such as mouse and Arabidopsis.
Representatives from several of these projects spoke at the conference, providing insight into the benefits and limitations of the software.
For example, Carol Bult, a staff scientist in the informatics group at the Jackson Laboratory, noted that Pathway Tools proved extremely useful in creating MouseCyc, a mouse pathway database that complements the information in the Mouse Genome Informatics database.
However, she said the software produced a very high number of false predictions: it originally predicted 304 pathways, which the MGI curators have since pruned to 174. One reason, she explained, is that PathoLogic, the predictive engine in Pathway Tools, takes as input the genome sequence data of the organism in question (in this case, mouse) along with SRI’s MetaCyc database of manually curated metabolic pathways, which is “heavily weighted” with information on microbial biochemistry.
Since mammalian biochemistry is obviously very different from that of microbes, the false predictions are understandable, Bult said. She added that the MGI curators are beginning to add mammalian-centric pathways to MouseCyc, and they plan to add these to MetaCyc as well in order to improve the predictive ability of the resource for other mammalian model organism database projects.
Karp noted that one challenge SRI faces is that the false-positive rate is actually increasing as MetaCyc grows. He said that his team is developing an improved version of PathoLogic to account for this, and that it should be available in about a year.
Bult said that the MGI group is also working to add more biological context to the pathways in MouseCyc, such as information about tissue and cell specificity, which Pathway Tools currently doesn’t predict. Karp said that SRI plans to support the prediction of pathways for multiple cell types in future versions of the software.
Bult said that MouseCyc is proving to be a useful tool for “many applications” of interest to MGI users. For example, she said that researchers can view alleles of interest within the context of particular pathways, rather than in the database’s default tabular view. In addition, she said that researchers can draw on the MGI ontology of phenotype terms to determine which phenotypes are associated with genes in a particular pathway.
Other model organism databases are also benefiting from Pathway Tools, although for different reasons. For example, Rex Chisholm, director of genetic medicine at the RH Lurie Cancer Center at Northwestern University, described his group’s effort to create a pathway/genome database for the social amoeba Dictyostelium.
Chisholm noted that for model organisms like Dictyostelium that are not as well annotated as the mouse, software like Pathway Tools offers an opportunity to increase the level of annotation through comparative genomics. This could prove to be a big benefit for future genome sequencing projects, he added. “There are so many new genomes being sequenced, and they won’t all be able to have large curation teams,” he said. “Therefore, these types of tools could prove to be very valuable.”
Lukas Mueller, a researcher in the department of plant breeding at Cornell University, also discussed the benefits of Pathway Tools for genomes that are not well characterized. His group is leading the development of SolCyc, a pathway/genome database that will complement the Solanaceae Genomics Network, a clade-oriented database that includes genomic data for tomato, potato, eggplant, pepper, petunia, and other species.
Because none of these organisms has a sequenced genome, Mueller said, the SGN researchers are using EST data rather than genome sequence data as input for the PathoLogic predictor. They must first assemble the ESTs into unigenes, then BLAST them against the Arabidopsis genome and the GenBank non-redundant database, and finally run them through PathoLogic.
In addition, he said that the SGN group has performed very little manual curation on the predicted pathways, but plans to adopt a “community annotation” model that would provide web-based tools to allow the plant research community to annotate the database.