Until recently, the bioinformatics tools available for evolutionary analysis have been limited to tracing the history of a single enzymatic function. But a team of computational biologists at Argonne National Laboratory has developed software that allows a systems-level evolutionary analysis of hundreds of enzymatic functions and their pathways.
Chisel is an online, open-access interface that uses enzymatic sequences taken from both users and public databases to generate function-specific and taxonomy-specific sequence clusters. From these clusters, the program builds libraries of computational models that can be used to predict the functions of unannotated sequences. Users can search Chisel to study how particular enzymatic functions differ from microorganism to microorganism, in the context of metabolic pathways, over the course of evolution.
Prior to Chisel, researchers were forced to trawl through annotated enzymatic functions in protein families from the major databases, says Alex Rodriguez, lead developer of Chisel at Argonne. But Chisel offers researchers something different. “We have not found any clusters that are similar to Chisel,” says Rodriguez. “There were many protein families, but they are not specific enough in function, so this goes a little further.”
The program works by clustering sequences together based on domain and phylogenetic analyses; it then determines the separations between taxonomic groups. The taxonomy-specific sequence clusters are presented as computational models, such as multiple sequence alignments, hidden Markov models (HMMs), position-specific scoring matrices (PSSMs), and consensus sequences, all of which are available to download from Chisel’s Web portal. These models can then be used to annotate functions or to predict the functions of unannotated sequences. At present, Chisel contains more than 900 separate enzymatic functions and more than 8,500 clusters, including 90 models for Staphylococcus, 126 models for Streptococcus, and 250 models for enzymes related to the Enterobacteriaceae, a large family of bacteria that includes E. coli and the pathogens responsible for typhoid fever.
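To give a sense of what two of these model types look like in practice, the short Python sketch below derives a consensus sequence and a simple PSSM from a small alignment, then uses the PSSM to score candidate sequences. It is not Chisel's code: the toy alignment, the function names, the uniform background frequencies, and the pseudocount of 1 are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment (not real Chisel data): equal-length rows, '-' marks a gap.
ALIGNMENT = [
    "MKTAY",
    "MKSAY",
    "MRTAY",
    "MKTA-",
]

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_counts(alignment):
    """Count residues in each alignment column, ignoring gap characters."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        Counter(row[i] for row in alignment if row[i] != "-")
        for i in range(length)
    ]


def consensus(alignment):
    """Most common residue per column; ties are broken alphabetically."""
    result = []
    for counts in column_counts(alignment):
        if not counts:  # column made up entirely of gaps
            result.append("-")
            continue
        result.append(min(counts, key=lambda r: (-counts[r], r)))
    return "".join(result)


def pssm(alignment, pseudocount=1.0):
    """Log2-odds scores against a uniform background, with additive pseudocounts."""
    background = 1.0 / len(ALPHABET)  # assumption: every residue equally likely
    matrix = []
    for counts in column_counts(alignment):
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        matrix.append({
            r: math.log2(((counts.get(r, 0) + pseudocount) / total) / background)
            for r in ALPHABET
        })
    return matrix


def score(sequence, matrix):
    """Sum the per-position scores for an ungapped sequence of matching length."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the number of PSSM columns")
    return sum(column[residue] for residue, column in zip(sequence, matrix))


if __name__ == "__main__":
    model = pssm(ALIGNMENT)
    print("consensus:", consensus(ALIGNMENT))
    print("cluster-like sequence:", round(score("MKTAY", model), 2))
    print("unrelated sequence:  ", round(score("WWWWW", model), 2))
```

A sequence that resembles the cluster scores higher than an unrelated one, which is the basic idea behind assigning an unannotated sequence to the best-matching model. Production tools use more careful background frequencies, sequence weighting, and gap handling than this sketch does.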
To test Chisel’s abilities, the team conducted more than 200,000 experiments predicting the functions of hypothetical proteins and unannotated sequences. According to Rodriguez, Chisel performed well, predicting the functions with roughly 95 percent accuracy. Chisel also predicted enzymatic functions more accurately than the annotated enzymatic functions in protein families from popular databases. Rodriguez hopes that in the future the tool will offer even more options in the clustering process, such as phenotypic or structure-based clustering, to help researchers make sense of the steady flow of genomic and enzymatic data.