In Print: Bioinformatics Tool-Related Papers of Note, April 2006


Berg J, Lässig M. Alignment of biological networks. [arXiv pre-print archive]: Discusses an "evolutionarily grounded" method for the cross-species analysis of interaction networks, which maps functional relationships between genes in different organisms. Network alignment is based on a scoring function that measures similarities between networks in their interaction patterns as well as sequence similarities between their nodes.

Canaran P, Stein L, Ware D. Look-Align: an interactive web-based multiple sequence alignment viewer with polymorphism analysis support. [Bioinformatics 2006 22(7):885-886]: Presents Look-Align, a web-based viewer for displaying pre-computed multiple sequence alignments. The viewer was initially developed to support the maize diversity website Panzea, but the authors note that it is a generic tool "that can be easily integrated into other websites." Availability:

Chatterji S, Pachter L. Reference based annotation with GeneMapper. [Genome Biology 2006, 7:R29]: Describes GeneMapper, a program for transferring annotations from a well-annotated genome to other genomes. Availability:

Dutheil J, Gaillard S, Bazin E, Glemin S, Ranwez V, Galtier N, Belkhir K. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. [BMC Bioinformatics. 2006 Apr 4;7(1):188]: Presents Bio++, a set of object-oriented libraries written in C++. Available components include classes for data storage and handling, various input/output formats, basic sequence manipulation, phylogenetic analysis, Markov models, population genetics and genomics, and algorithms for numerical calculus. Availability:

Ein-Dor L, Zuk O, Domany E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. [Proc Natl Acad Sci USA. 2006 Apr 11;103(15):5923-8]: Describes a mathematical method called probably approximately correct (PAC) sorting for evaluating the robustness of gene expression signatures used to predict the prognosis and metastatic potential of cancer. According to the authors, achieving an overlap of 50 percent between two predictive lists of genes would require the expression profiles of several thousand early discovery patients.
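
The "overlap" between two predictive gene lists can be made concrete with a small sketch. This is not the PAC sorting method itself, only an illustration of the quantity being discussed; the function name and the gene symbols are hypothetical examples, and the overlap is assumed to be measured relative to the shorter list.

```python
def list_overlap(genes_a, genes_b):
    """Fraction of genes shared by two predictive gene lists.

    Assumption: overlap is measured against the shorter list, so two
    identical lists score 1.0 and two disjoint lists score 0.0.
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both gene lists must be non-empty")
    return len(a & b) / min(len(a), len(b))


# Hypothetical top-4 lists derived from two different patient cohorts.
cohort_1 = ["TP53", "BRCA1", "MYC", "EGFR"]
cohort_2 = ["MYC", "TP53", "KRAS", "PTEN"]
print(list_overlap(cohort_1, cohort_2))
```

Under this definition, the 50 percent figure quoted above would correspond to half of the genes in one list also appearing in the other.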

Johnson WE, Rabinovic A, Li C. Adjusting batch effects in microarray expression data using Empirical Bayes methods. [Biostatistics. 2006 Apr 21 (e-pub ahead of print)]: Discusses parametric and nonparametric empirical Bayes frameworks for adjusting microarray data for batch effects. According to the authors, the frameworks are robust to outliers in small sample sizes. Availability:

Kohler J, Munn K, Ruegg A, Skusa A, Smith B. Quality Control for Terms and Definitions in Ontologies and Taxonomies. [BMC Bioinformatics. 2006 Apr 19;7(1):212]: Introduces computational methods that automatically identify terms and definitions that are defined in a circular or unintelligible way in biomedical ontologies. The authors demonstrate the potential of the methods by applying them to a subset of 6,001 "problematic" GO terms.
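
One simple way to picture a circularity check is a word-containment test. The sketch below is an assumed heuristic for illustration only and is not the authors' method; the function name and the example terms and definitions are hypothetical.

```python
import re


def is_circular(term, definition):
    """Flag a definition that reuses every word of the term it defines.

    Assumption: a definition counts as circular when all words of the
    term appear verbatim in the definition (no stemming, no stop-word
    handling).
    """
    term_words = set(re.findall(r"[a-z0-9]+", term.lower()))
    definition_words = set(re.findall(r"[a-z0-9]+", definition.lower()))
    return bool(term_words) and term_words <= definition_words


# Hypothetical term/definition pairs.
print(is_circular("vesicle transport", "The transport of a vesicle."))
print(is_circular("apoptosis", "A form of programmed cell death."))
```

A production check would likely need stemming and stop-word handling, since this version misses inflected forms such as "transporting".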

Lee DY, Yun C, Cho A, Hou BK, Park S, Lee SY. WebCell: a web-based environment for kinetic modeling and dynamic simulation of cellular networks. [Bioinformatics 2006 22(9):1150-1151]: Introduces WebCell, a web-based environment for managing quantitative and qualitative information on cellular networks and for "interactively exploring their steady-state and dynamic behaviors in response to systemic perturbations," according to the authors. Availability:

Misura KM, Chivian D, Rohl CA, Kim DE, Baker D. Physically realistic homology models built with ROSETTA can be more accurate than their templates. [Proc Natl Acad Sci USA. 2006 Apr 4;103(14):5361-6]: Introduces a method that combines the Rosetta de novo protein-folding method with distance constraints derived from homologous structures "to build homology models that are frequently more accurate than their templates," according to the authors.

Ren Y, Gong W, Xu Q, Zheng X, Lin D, Wang Y, Li T. siRecords: an extensive database of mammalian siRNAs with efficacy ratings. [Bioinformatics 2006 22(8):1027-1028]: Introduces siRecords, a database of siRNAs experimentally tested by researchers with "consistent efficacy ratings." The authors expect the database to help siRNA researchers develop more reliable siRNA design rules. The database currently includes more than 4,100 siRNA sequences obtained from more than 1,200 studies. Availability:

Schilstra MJ, Li L, Matthews J, Finney A, Hucka M, Le Novere N. CellML2SBML: conversion of CellML into SBML. [Bioinformatics 2006 22(8):1018-1020]: Introduces CellML2SBML, a suite of XSLT style sheets for converting biological models expressed in CellML into SBML without significant loss of information. The converter is based on CellML version 1.1 and SBML Level 2 Version 1. Availability:

Shlomi T, Segal D, Ruppin E, Sharan R. QPath: a method for querying pathways in a protein-protein interaction network. [BMC Bioinformatics 2006, 7:199]: Presents a framework for protein network searches. Given a linear query pathway and a network of interest, the algorithm searches the network for homologous pathways, allowing both insertions and deletions of proteins in the identified pathways. Matched pathways are automatically scored according to their variation from the query pathway in terms of the protein insertions and deletions they employ, the sequence similarity of their constituent proteins to the query proteins, and the reliability of their constituent interactions.
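
The three scoring ingredients named above (insertions and deletions, sequence similarity, and interaction reliability) can be combined in a toy scoring function. This is a minimal sketch under stated assumptions, not QPath's actual scoring scheme: the additive form, the penalty values, and all protein names are hypothetical.

```python
def score_pathway(alignment, similarity, reliability,
                  insertion_penalty=1.0, deletion_penalty=1.0):
    """Score a candidate match between a linear query and a network path.

    alignment: list of (query_protein, network_protein) pairs in order.
        (q, None) is a deletion (query protein left unmatched);
        (None, p) is an insertion (extra network protein in the path).
    similarity: dict mapping (query_protein, network_protein) to a score.
    reliability: dict mapping frozenset({p1, p2}) to an edge reliability.
    Returns -inf when consecutive network proteins do not interact.
    """
    score = 0.0
    previous = None  # last network protein placed on the matched path
    for query_protein, network_protein in alignment:
        if network_protein is None:
            score -= deletion_penalty
            continue
        if query_protein is None:
            score -= insertion_penalty
        else:
            score += similarity[(query_protein, network_protein)]
        if previous is not None:
            edge = frozenset((previous, network_protein))
            if edge not in reliability:
                return float("-inf")
            score += reliability[edge]
        previous = network_protein
    return score


# Hypothetical query q1-q2-q3 matched to network path p1-p2-p3,
# with p2 inserted and q2 deleted.
alignment = [("q1", "p1"), (None, "p2"), ("q2", None), ("q3", "p3")]
similarity = {("q1", "p1"): 0.9, ("q3", "p3"): 0.8}
reliability = {frozenset(("p1", "p2")): 0.5, frozenset(("p2", "p3")): 0.7}
print(score_pathway(alignment, similarity, reliability))
```

The sketch only scores a given candidate; searching the network for the best-scoring candidates is the harder part and is not shown here.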

Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, Fragoso G, Game L, Heiskanen M, Morrison N, Rocca-Serra P, Sansone SA, Taylor C, White J, Stoeckert CJ Jr. The MGED Ontology: a resource for semantics-based description of microarray experiments. [Bioinformatics 2006 22(7):866-873]: Describes the MGED Ontology (MO), developed by the Ontology Working Group of the Microarray Gene Expression Data Society. The MO provides terms for annotating all aspects of a microarray experiment, from the design of the experiment and array layout to the preparation of the biological sample and the protocols used to hybridize the RNA and analyze the data. Availability:

Wiese KC, Hendriks A. Comparison of P-RnaPredict and mfold-algorithms for RNA secondary structure prediction. [Bioinformatics. 2006 Apr 15;22(8):934-42]: Describes P-RnaPredict, a parallel evolutionary algorithm for RNA secondary structure prediction. According to the authors, P-RnaPredict can predict structures with higher true positive base pair counts and lower false positives than mfold on certain sequences. Availability: from the author upon request.

Xirasagar S, Gustafson SF, Huang CC, Pan Q, Fostel J, Boyer P, Merrick BA, Tomer KB, Chan DD, Yost KJ 3rd, Choi D, Xiao N, Stasiewicz S, Bushel P, Waters MD. Chemical effects in biological systems (CEBS) object model for toxicology data, SysTox-OM: design and application. [Bioinformatics 2006 22(7):874-882]: Discusses the use of an object model, SysBio-OM, which has been designed to facilitate the integration of microarray gene expression, proteomics, and metabolomics data in the CEBS data repository. Availability:

Yu H, Paccanaro A, Trifonov V, Gerstein M. Predicting interactions in protein networks by completing defective cliques. [Bioinformatics 2006 22(7):823-829]: Discusses a method for predicting missed protein-protein interactions in datasets derived from high-throughput methods. The method searches the protein interaction network for "defective cliques," which the authors define as "nearly complete complexes of pairwise interacting proteins," to predict the interactions that complete them. Availability:
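
The idea of completing a "defective clique" can be sketched in a few lines. The version below is a simplified reading and not the authors' implementation: it assumes a defective clique is a pair of maximal cliques that differ by exactly one protein each, and it predicts an interaction between those two proteins. The function names, the min_shared threshold, and the example network are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Yield the maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def predict_missing_edges(edges, min_shared=2):
    """Predict interactions that would merge two nearly identical cliques.

    Assumption: two maximal cliques sharing at least min_shared proteins
    and differing by one protein each form a defective clique; the edge
    between the two differing proteins is the prediction.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    predicted = set()
    for c1, c2 in combinations(list(maximal_cliques(adj)), 2):
        only1, only2 = c1 - c2, c2 - c1
        if len(c1 & c2) >= min_shared and len(only1) == 1 and len(only2) == 1:
            (u,), (v,) = only1, only2
            predicted.add(tuple(sorted((u, v))))
    return predicted


# Hypothetical network: A, B, C, D interact pairwise except C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")]
print(predict_missing_edges(edges))  # {('C', 'D')}
```

Enumerating maximal cliques is exponential in the worst case, so this sketch is only suitable for small example networks.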

