Bard J, Rhee S, Ashburner M. An ontology for cell types. [Genome Biology 2005, 6:R21]: Describes an ontology for cell types that covers the prokaryotic, fungal, animal and plant worlds. It includes over 680 cell types that are classified under several generic categories and are organized as a directed acyclic graph. The ontology is designed to be used in the context of model organism genome and other biological databases. Availability: http://obo.sourceforge.net/.
Eddy S. A model of the statistical power of comparative genome sequence analysis. [PLoS Biol 3(1): e10]: Presents a mathematical model for predicting how many genomes are needed for comparative genomics, and at what evolutionary distances. One “rule of thumb” the author notes is that, “For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected.”
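As a rough illustration of the quoted rule of thumb only (not the paper's full model), the sketch below treats the number of genomes as strictly proportional to 1/feature size at a fixed evolutionary distance and stringency. The function name and the example figures are hypothetical.

```python
def genomes_needed(reference_genomes, reference_feature_len, feature_len):
    """Rescale a known genome requirement to a different feature size.

    Assumes N is proportional to 1/L at fixed evolutionary distance and
    fixed statistical stringency, which is one reading of the quoted rule.
    """
    if feature_len <= 0 or reference_feature_len <= 0:
        raise ValueError("feature lengths must be positive")
    return reference_genomes * reference_feature_len / feature_len


# Hypothetical figures: if 5 genomes suffice for a 50-nt feature,
# a 25-nt feature would need twice as many under this assumption.
print(genomes_needed(5, 50, 25))
```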
Griffiths-Jones S. RALEE-RNA Alignment Editor in Emacs. [Bioinformatics 2005 21(2):257-259]: Describes RALEE (RNA Alignment Editor in Emacs), an environment for RNA multiple sequence alignment editing based on the Emacs text editor. Availability: http://www.sanger.ac.uk/Users/sgj/ralee/.
Guryev V, Berezikov E, Cuppen E. CASCAD: a database of annotated candidate single nucleotide polymorphisms associated with expressed sequences. [BMC Genomics 2005, 6:10]: Describes CASCAD, a database designed for presentation and query of candidate SNPs that are retrieved by in silico mining of high-throughput sequencing data. Availability: http://cascad.niob.knaw.nl.
Hamoudi R, El-Hamidi A, Du M. Identification of novel prognostic markers in cervical intraepithelial neoplasia using LDMAS (LOH Data Management and Analysis Software). [BMC Bioinformatics 2005, 6:18]: Describes LDMAS (Loss of Heterozygosity Data Management and Analysis Software), which was designed for data retrieval, management, and integrated analysis of clinico-pathological data and molecular results from independent databases. LDMAS can be used to stratify disease stages according to clinical stage or histological features and to correlate various clinico-pathological features with molecular findings to obtain relevant prognostic markers. Availability: http://molpath.his.path.cam.ac.uk/bioinformatics/LDMAS.shtml.
Hayes K, Vollrath A, Zastrow G, et al. EDGE: A centralized resource for the comparison, analysis, and distribution of toxicogenomic information. [Mol Pharmacol. 2005 Jan 20 (e-pub ahead of print)]: Describes a centralized microarray data resource called EDGE (Environment, Drugs, Genes and Expression) with uniform informatics tools for the analysis and sharing of toxicogenomic data. Availability: http://edge.oncology.wisc.edu/edge.php.
Kahraman A, Avramov A, Nashev L, et al. PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. [Bioinformatics 2005 21(3):418-420]: Introduces PhenomicDB, a multi-species genotype/phenotype database that merges public data from a wide range of model organisms and Homo sapiens. Availability: http://www.phenomicDB.de.
Krause A, Stoye J, Vingron M. Large scale hierarchical clustering of protein sequences. [BMC Bioinformatics. 2005 Jan 22;6(1):15]: Presents a new method for identifying the cluster of sequences to which a given query sequence belongs. The approach groups all known protein sequences hierarchically into superfamily and family clusters and uses graph-based algorithms that take into account “the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning.” Availability: http://systers.molgen.mpg.de/.
Liwo A, Khalili M, Scheraga H. Ab initio simulations of protein-folding pathways by molecular dynamics with the united-residue model of polypeptide chains. [Proc Natl Acad Sci U S A. 2005 Jan 26 (e-pub ahead of print)]: Discusses the use of Langevin dynamics with a physics-based united-residue (UNRES) force field in protein-folding simulations. Folding with Langevin dynamics required 2-10 hours of CPU time on average with a single AMD Athlon MP 2800+ processor. With the advantage of parallel processing, the authors note, “this process leads to the possibility to explore thousands of folding pathways and to predict not only the native structure but also the folding scenario of a protein together with its quantitative kinetic and thermodynamic characteristics.”
Massar J, Travers M, Elhai J, Shrager J. BioLingua: a programmable knowledge environment for biologists. [Bioinformatics 2005 21(2):199-207]: Discusses BioLingua, a web-based programming environment that enables biologists to combine knowledge and data through direct end-user programming. BioLingua embeds a symbolic programming language in a frame-based knowledge environment, integrating genomic and pathway knowledge about a class of similar organisms. The BioLingua language provides interfaces to “numerous” bioinformatics tools, according to the authors. Availability: http://nostoc.stanford.edu/Docs/.
Nix D, Eisen M. GATA: a graphic alignment tool for comparative sequence analysis. [BMC Bioinformatics 2005, 6:9]: Presents GATA, a platform-independent graphic alignment tool for comparative sequence analysis. GATA uses the NCBI-BlastN program and “extensive” post-processing to align two DNA sequences. It functions independently of sequence feature ordering or orientation, and visualizes both large and small sequence inversions, duplications, and segment shuffling. Availability: http://gata.sourceforge.net/.
Singh AV, Knudsen KB, Knudsen TB. Computational systems analysis of developmental toxicity: design, development and implementation of a Birth Defects Systems Manager (BDSM). [Reprod Toxicol. 2005 Jan-Feb;19(3):421-39]: Presents the initial design, development, and implementation of a Birth Defects Systems Manager (BDSM), a bioinformatics infrastructure to manage functional genomics data specifically engineered for the analysis of developmental processes and toxicities. Availability: http://systemsanalysis.louisville.edu/.
Stanislaus R, Chen C, Franklin J, Arthur J, Almeida J. AGML central: web based gel proteomic infrastructure. [Bioinformatics. 2005 Jan 12 (e-pub ahead of print)]: Presents an open-source public infrastructure for dissemination of 2D gel electrophoresis proteomics data in AGML (Annotated Gel Markup Language) format. It includes several converters from proprietary formats, such as those produced by PDQuest (Bio-Rad), Phoretix 2D (Nonlinear Dynamics) and Melanie (GenBio). Availability: http://bioinformatics.musc.edu/agmlcentral.
Sugimoto M, Kikuchi S, Arita M, Soga T, Nishioka T, Tomita M. Large-scale prediction of cationic metabolite identity and migration time in capillary electrophoresis mass spectrometry using artificial neural networks. [Anal Chem. 2005 Jan 1;77(1):78-84]: Describes a computational technique to assist in the large-scale identification of charged metabolites. An ensemble of artificial neural networks is used to predict the electrophoretic mobility of metabolites in capillary electrophoresis/mass spectrometry. When used to characterize all metabolites listed in the KEGG ligand database, the method predicted the correct compounds among the top three candidates in 78 percent of cases.
Wang M, Yang J, Xu Z, Chou K. SLLE for predicting membrane protein types. [J Theor Biol. 2005 Jan 7;232(1):7-15]: Describes the use of the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction to extract the essential features from high-dimensional pseudo amino acid composition space in protein attribute prediction.
Washietl S, Hofacker I, Stadler P. Fast and reliable prediction of noncoding RNAs. [Proc Natl Acad Sci U S A. 2005 Feb 15;102(7):2454-9]: Describes an approach that combines comparative sequence analysis and structure prediction to detect functional RNAs. The method consists of two basic components: a measure of RNA secondary structure conservation, and a measure of thermodynamic stability that is normalized with respect to both sequence length and base composition. The method is implemented in the program RNAz. Availability: http://www.tbi.univie.ac.at/wash/RNAz.
Yiu S, Wong P, Lam T, Mui Y, et al. Filtering of ineffective siRNAs and improved siRNA design tool. [Bioinformatics 2005 21(2):144-151]: Describes a program that filters out “ineffective” siRNAs designed by software tools based on the Max Planck Institute’s basic siRNA design principles. The filtering algorithm is based on “new observations” about the secondary structure of siRNAs. Availability: http://www.cs.hku.hk/~sirna/.
Zhang K, Qin Z, Chen T, Liu J, Waterman M, Sun F. HapBlock: haplotype block partitioning and tag SNP selection software using a set of dynamic programming algorithms. [Bioinformatics 2005 21(1):131-134]: Presents a suite of computer programs to analyze linkage-disequilibrium patterns and to select corresponding tag SNPs. The authors claim this program uses dynamic-programming algorithms that are “guaranteed to find the block partition with a minimum number of tag SNPs for the given criteria of blocks and tag SNPs.” Availability: http://www.cmb.usc.edu/~msms/HapBlock.
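The dynamic-programming idea behind block partitioning in the HapBlock entry can be sketched on toy data. This is a minimal illustration, not HapBlock's code: the block criterion (at most `max_patterns` distinct haplotype patterns), the brute-force tag-SNP count, and all function names are assumptions chosen for brevity.

```python
from itertools import combinations


def distinct(haps, i, j):
    """Distinct haplotype patterns over SNP columns i..j-1."""
    return {h[i:j] for h in haps}


def is_block(haps, i, j, max_patterns=3):
    # Toy block criterion (an assumption): few distinct patterns in the segment.
    return len(distinct(haps, i, j)) <= max_patterns


def tag_count(haps, i, j):
    # Smallest number of columns that still separates every distinct
    # pattern in the segment, found by brute force over column subsets.
    patterns = distinct(haps, i, j)
    for k in range(j - i + 1):
        for subset in combinations(range(j - i), k):
            if len({tuple(p[c] for c in subset) for p in patterns}) == len(patterns):
                return k
    return j - i


def partition(haps, max_patterns=3):
    """Return (total tag SNPs, blocks) minimizing tag SNPs over all partitions.

    best[j] is the minimum tag-SNP total for the first j SNPs. Assumes every
    single SNP qualifies as a block, so a valid partition always exists.
    """
    n = len(haps[0])
    inf = float("inf")
    best = [0] + [inf] * n
    prev = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] < inf and is_block(haps, i, j, max_patterns):
                cost = best[i] + tag_count(haps, i, j)
                if cost < best[j]:
                    best[j], prev[j] = cost, i
    # Walk the back-pointers to recover block boundaries (half-open intervals).
    blocks, j = [], n
    while j > 0:
        blocks.append((prev[j], j))
        j = prev[j]
    return best[n], blocks[::-1]


# Four toy haplotypes over four biallelic SNPs.
print(partition(["0000", "0011", "1100", "1111"]))
```

Because each prefix's optimum is built from the optima of shorter prefixes, the result is the minimum over all partitions allowed by the chosen block criterion, which is the sense in which such algorithms can be "guaranteed" to find the best partition.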