In Print: Bioinformatics Tool-Related Papers of Note, June 2005


Aloisio G, Cafaro M, Fiore S, Mirto M. ProGenGrid: A Grid-Enabled Platform for Bioinformatics. [Stud Health Technol Inform. 2005;112:113-26]: Describes ProGenGrid (Proteomics and Genomics Grid), a virtual laboratory for simulating biological experiments, composing existing analysis and visualization tools, monitoring their execution, storing the intermediate and final output and saving the model of the experiment for updating or reproducing it.

Boulesteix A, Strimmer K. Predicting transcription factor activities from combined analysis of microarray and ChIP data: a partial least squares approach. [Theoretical Biology and Medical Modelling 2005, 2:23]: Proposes a statistical approach based on partial least squares regression to infer the transcription factor activities from a combination of mRNA expression and DNA-protein binding measurements. Availability:

Cannataro M, Cuda G, Veltri P. Modeling and Designing a Proteomics Application on PROTEUS. [Methods Inf Med. 2005;44(2):221-226]: Discusses PROTEUS, a software platform for modeling and executing biomedical applications on a computational grid. The platform uses domain ontologies and workflow techniques for modeling biomedical applications, and grid middleware for high performance execution.

Cartharius K, Frech K, Grote K, Klocke B, Haltmeier M, Klingenhoff A, Frisch M, Bayerlein M, Werner T. MatInspector and beyond: promoter analysis based on transcription factor binding sites. [Bioinformatics 2005 21(13):2933-2942]: Presents a new version of the program MatInspector that identifies transcription factor binding sites in nucleotide sequences using a large library of weight matrices. Availability:
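As a rough illustration of what scanning a sequence with a weight matrix involves, the sketch below scores every window of a sequence against a generic log-odds matrix. It is not MatInspector's scoring scheme, which the summary above does not describe. The function names, the four-column count table, the pseudocount, the uniform background, and the threshold are all assumptions made for the example.

```python
import math

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical counts for a 4-bp TATA-like site (illustrative only).
example_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
example_matrix = log_odds_matrix(example_counts)
example_hits = scan("GGTATACC", example_matrix, threshold=5.0)
```

Real matrix libraries are built from curated binding-site alignments, and the threshold is usually tuned per matrix rather than fixed as it is here.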

Corcoran DL, Feingold E, Dominick J, Wright M, Harnaha J, Trucco M, Giannoukakis N, Benos PV. Footer: A quantitative comparative genomics method for efficient recognition of cis-regulatory elements. [Genome Res. 2005 Jun;15(6):840-7]: Describes an algorithm for identifying mammalian DNA regulatory regions using evolutionary information to reduce the number of false-positive predictions. The pattern-identification method compares a pair of putative binding sites in two species and assigns two probability scores based on the relative position of the sites in the promoter and their agreement with a known model of binding preferences. In tests, the algorithm exhibited 83 percent sensitivity and 72 percent specificity. Availability:

Crowe M. SeqDoC: rapid SNP and mutation detection by direct comparison of DNA sequence chromatograms. [BMC Bioinformatics 2005, 6:133]: Introduces SeqDoC, a web-based tool to carry out direct comparison of ABI sequence chromatograms. According to the authors, the tool can identify SNPs and point mutations without the need to install or learn more complicated analysis software. Availability:

Hennequet-Antier C, Chiapello H, Piot K, Degrelle S, Hue I, Renard J, Rodolphe F, Robin S. AnovArray: a set of SAS macros for the analysis of variance of gene expression data. [BMC Bioinformatics 2005, 6:150]: Presents AnovArray, a package implementing analysis of variance for gene expression data using SAS statistical software. Availability:

Karchin R, Diekhans M, Kelly L, Thomas D, Pieper U, Eswar N, Haussler D, Sali A. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. [Bioinformatics 2005 21(12):2814-2820]: Introduces LS-SNP, a genomic-scale software pipeline to annotate non-synonymous SNPs. The software maps nsSNPs onto protein sequences, functional pathways, and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. Availability:

Li D, Fu Y, Sun R, Ling C, Wei Y, Zhou H, Zeng R, Yang Q, He S, Gao W. pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. [Bioinformatics 2005 21(13):3049-3050]: Introduces a database-searching software system called pFind (peptide/protein Finder), which employs a previously reported peptide-scoring algorithm. Availability:

Muhammad AJ, Markram H. NEOBASE: Databasing the Neocortical Microcircuit. [Stud Health Technol Inform. 2005;112:167-77]: Introduces NEOBASE, a project to archive neocortical microcircuit data in order to "facilitate development of advanced data mining applications, statistical and bioinformatics analyses tools, custom microcircuit builders, and visualization and simulation applications." The database architecture is based on ROOT, a software environment that enables the creation of an object-oriented database with relational capabilities.

Park J, Hu Y, Murthy T, Vannberg F, et al. Building a human kinase gene repository: Bioinformatics, molecular cloning, and functional validation. [Proc Natl Acad Sci USA. 2005 Jun 7;102(23):8114-9]: Describes a project to mine public databases for the sequence information of all identified human kinase genes and to clone the corresponding ORFs. The authors identified 663 genes: 511 encoding protein kinases and 152 encoding nonprotein kinases.

Sarda D, Chua G, Li K, Krishnan A. pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties. [BMC Bioinformatics 2005, 6:152]: Presents a new algorithm called pSLIP that uses support vector machines in conjunction with multiple physicochemical properties of amino acids to predict protein subcellular localization in eukaryotes across six different locations: chloroplast, cytoplasmic, extracellular, mitochondrial, nuclear and plasma membrane. Availability:

Smilde A, Jansen J, Hoefsloot H, Lamers R, van der Greef J, Timmerman M. ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data. [Bioinformatics 2005 21(13):3043-3048]: Discusses ASCA, a method that can analyze complex multivariate datasets containing an underlying experimental design, such as metabolomics datasets. It is a direct generalization of analysis of variance (ANOVA) for univariate data to the multivariate case, according to the authors. Availability:

Song J, Tang H. A new 2-D graphical representation of DNA sequences and their numerical characterization. [J Biochem Biophys Methods. 2005 Jun 30;63(3):228-39]: Presents a novel 2D graphical representation of DNA sequences according to chemical structures of bases, reflecting distribution of bases with different chemical structure, preserving information on sequential adjacency of bases, and allowing numerical characterization. The representation avoids loss of information accompanying alternative 2D representations in which the curve standing for DNA overlaps and intersects itself, according to the authors.

Stumpf MPH, Ingram P, Nouvel I, Wiuf C. Statistical model selection methods applied to biological networks. [ArXiv pre-print archive:]: Describes the use of statistical model-selection methods to determine which functional form best describes degree distributions of protein interaction and metabolic networks. According to the authors, current protein interaction and metabolic network data from different organisms "suggests that simple scale-free models do not provide an adequate description of real network data."

Yan C, Burleigh JG, Eulenstein O. Identifying optimal incomplete phylogenetic data sets from sequence databases. [Mol Phylogenet Evol. 2005 Jun;35(3):528-35]: Describes a method for identifying optimal incomplete data sets from large sequence databases. The method is based on the graph-theoretic concept of alpha-quasi-bicliques and identifies useful phylogenetic data sets with a specified amount of missing data while maintaining the necessary amount of overlap among genes and taxa.
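The alpha-quasi-biclique idea in the Yan et al. entry can be made concrete with a small check. The sketch below assumes the usual definition: in a selection of taxa and genes, every taxon has data for at least (1 - alpha) of the selected genes, and every gene is sampled in at least (1 - alpha) of the selected taxa. It only tests a given selection and does not search for an optimal one, so it is not the authors' method. The function name, the coverage table, and the taxon and gene labels are hypothetical.

```python
def is_alpha_quasi_biclique(coverage, taxa, genes, alpha):
    """Check a taxa-by-genes selection against the assumed alpha-quasi-biclique definition.

    coverage maps each taxon to the set of genes for which it has sequence data.
    """
    taxa, genes = list(taxa), list(genes)
    if not taxa or not genes:
        return False
    gene_set = set(genes)
    min_genes_per_taxon = (1 - alpha) * len(genes)
    min_taxa_per_gene = (1 - alpha) * len(taxa)
    # Every selected taxon must cover enough of the selected genes.
    for taxon in taxa:
        if len(coverage.get(taxon, set()) & gene_set) < min_genes_per_taxon:
            return False
    # Every selected gene must be sampled in enough of the selected taxa.
    for gene in genes:
        sampled = sum(1 for taxon in taxa if gene in coverage.get(taxon, set()))
        if sampled < min_taxa_per_gene:
            return False
    return True

# Hypothetical coverage table: which genes are available for which taxa.
example_coverage = {
    "taxon1": {"geneA", "geneB", "geneC"},
    "taxon2": {"geneA", "geneC"},
    "taxon3": {"geneB", "geneC"},
}
example_taxa = ["taxon1", "taxon2", "taxon3"]
example_genes = ["geneA", "geneB", "geneC"]

tolerant = is_alpha_quasi_biclique(example_coverage, example_taxa, example_genes, alpha=0.4)
strict = is_alpha_quasi_biclique(example_coverage, example_taxa, example_genes, alpha=0.0)
```

With alpha set to 0 the check reduces to requiring a complete data matrix; larger values tolerate more missing data at the cost of less overlap among genes and taxa.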

