Alter O, Golub G. Integrative analysis of genome-scale data by using pseudoinverse projection predicts novel correlation between DNA replication and RNA transcription. [Proc. Natl. Acad. Sci. USA 2004 101(47):16577-82]: Presents an integrative, data-driven mathematical framework that formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples. Using pseudoinverse projection, the molecular biological profiles of the data samples are least-squares-approximated as superpositions of the basis profiles.
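The least-squares step can be illustrated with a minimal NumPy sketch. It assumes genes as rows and samples (or basis profiles) as columns; the function name `pseudoinverse_projection` is hypothetical and this is not the authors' code:

```python
import numpy as np

def pseudoinverse_projection(basis, data):
    """Approximate each column of `data` as a superposition of the columns of `basis`.

    Hypothetical helper. basis: (n_genes, k) basis profiles; data: (n_genes, m) profiles.
    Returns the (k, m) least-squares coefficients and the (n_genes, m) approximation.
    """
    coeffs = np.linalg.pinv(basis) @ data   # Moore-Penrose pseudoinverse gives least-squares coefficients
    approx = basis @ coeffs                 # orthogonal projection onto the span of the basis profiles
    return coeffs, approx
```

A profile that already lies in the span of the basis is reproduced exactly; for any other profile the residual is orthogonal to every basis profile.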
Bornheimer S, Maurya M, Farquhar M, Subramaniam S. Computational modeling reveals how interplay between components of a GTPase-cycle module regulates signal transduction. [Proc. Natl. Acad. Sci. USA 2004 101(45): 15899-15904]: Presents a computational model of the GTPase-cycle module that predicts how the interplay of local G protein, receptor (R), and GTPase-activating protein (GAP) concentrations gives rise to 16 distinct signaling regimes and numerous intermediate signaling phenomena.
Chikayama E, Kurotani A, Kuroda Y, Yokoyama S. ProteoMix: an integrated and flexible system for interactively analyzing large numbers of protein sequences. [Bioinformatics 2004 20(16):2836-2838]: Presents ProteoMix, a suite of Java programs for identifying, annotating, and predicting regions of interest in large sets of amino acid sequences according to systematic and consistent criteria. It is based on two concepts: integrating results from different sequence analysis tools increases prediction reliability, and the integration protocol is critical and must be easy to adapt case by case. Availability: http://bio.gsc.riken.jp/ProteoMix/.
Corney D, Buxton B, Langdon W, Jones D. BioRAT: extracting biological information from full-length papers. [Bioinformatics 2004 20(17):3206-3213]: Describes BioRAT (Biological Research Assistant for Text mining), a new information-extraction tool for biomedical text that is able to locate and analyze both abstracts and full-length papers. Availability: http://bioinf.cs.ucl.ac.uk/biorat.
Daruwala R, Rudra A, Ostrer H, et al. A versatile statistical analysis algorithm to detect genome copy number variation. [Proc. Natl. Acad. Sci. USA 2004 101(46): 16292-16297]: Presents a statistical analysis algorithm for detecting genomic aberrations in human cancer cell lines. The algorithm analyzes genomic data obtained from a variety of array technologies and uses a priorless maximum a posteriori estimator and a dynamic programming implementation.
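The role of dynamic programming in copy-number segmentation can be sketched as follows. This is a swapped-in, simpler technique (penalized least-squares "optimal partitioning" of log-ratios into constant segments), not the authors' priorless MAP estimator; the function name and the `penalty` parameter are hypothetical:

```python
from typing import List, Tuple

def segment_copy_number(values: List[float], penalty: float) -> List[Tuple[int, int, float]]:
    """Split `values` into constant segments minimizing squared error + `penalty` per segment.

    Hypothetical sketch; returns (start, end, mean) triples with `end` exclusive.
    """
    n = len(values)
    # Prefix sums let each segment's squared error be computed in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i: int, j: int) -> float:
        """Squared error of values[i:j] around its own mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost of values[:j]
    back = [0] * (n + 1)               # back[j]: start index of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i

    # Trace the optimal segmentation back from the end.
    segments = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j, (s[j] - s[i]) / (j - i)))
        j = i
    return segments[::-1]
```

The O(n²) recursion is the textbook form; larger `penalty` values yield fewer breakpoints.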
Dou Y, Baisnée P, Pollastri G, Pécout Y, Nowick J, Baldi P. ICBS: a database of interactions between protein chains mediated by β-sheet formation. [Bioinformatics 2004 20(16):2767-2777]: Describes a database of interchain β-sheet (ICBS) interactions that is updated weekly. The database uses an index to quantify the relative contributions of the β-ladders to the overall interchain interaction and to compute first- and second-order statistics on amino acid composition and pairing at different relative positions in the β-strands. Availability: http://www.igb.uci.edu/servers/icbs/.
Grad Y, Roth F, Halfon M, Church G. Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D.pseudoobscura. [Bioinformatics 2004 20(16):2738-2750]: Introduces a computational method for identifying cis-regulatory modules that combines aspects of two current methods: phylogenetic footprinting and searches for combinations of transcription factor binding motifs. Availability: http://arep.med.harvard.edu/enhancers.
King A, Pržulj N, Jurisica I. Protein complex prediction via cost-based clustering. [Bioinformatics 2004 20(17):3013-3020]: Introduces the Restricted Neighborhood Search Clustering (RNSC) algorithm to efficiently partition networks into clusters using a cost function. The authors applied this cost-based clustering algorithm to protein-protein interaction networks of Saccharomyces cerevisiae, Drosophila melanogaster, and Caenorhabditis elegans to identify and predict protein complexes. Availability: upon request.
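The general idea of cost-based clustering can be sketched as a local search that moves single vertices between clusters while a cost function decreases. The cost used here (inter-cluster edges plus intra-cluster non-edges) is a simplified assumption, and the sketch omits RNSC's actual cost functions, tabu lists, and diversification; all names are hypothetical:

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

def clustering_cost(nodes: List[Hashable], edges: Set[frozenset], assign: Dict) -> int:
    """Assumed naive cost: edges between clusters plus missing edges inside clusters."""
    cost = 0
    for u, v in combinations(nodes, 2):
        linked = frozenset((u, v)) in edges
        same = assign[u] == assign[v]
        if linked != same:
            cost += 1
    return cost

def local_search_clustering(nodes: List[Hashable], edge_list: List[Tuple]) -> Dict:
    """Greedy single-vertex moves until no move lowers the cost (hypothetical sketch)."""
    edges = {frozenset(e) for e in edge_list}
    assign = {v: i for i, v in enumerate(nodes)}  # start from singleton clusters
    current = clustering_cost(nodes, edges, assign)
    improved = True
    while improved:
        improved = False
        for v in nodes:
            # Candidate targets: every existing cluster plus one new, empty cluster.
            targets = set(assign.values()) | {max(assign.values()) + 1}
            for c in targets:
                if c == assign[v]:
                    continue
                old = assign[v]
                assign[v] = c
                new = clustering_cost(nodes, edges, assign)
                if new < current:
                    current, improved = new, True
                    break  # keep this move and go on to the next vertex
                assign[v] = old  # undo a move that did not help
    return assign
```

Recomputing the full O(n²) cost after every move keeps the sketch short; a practical implementation would update the cost incrementally.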
Kumar C, LeDuc R, Gong G, et al. ESTIMA, a tool for EST management in a multi-project environment. [BMC Bioinformatics 2004, 5:176]: Introduces ESTIMA (Expressed Sequence Tag Information Management and Annotation), a web-based software system for EST annotation and data management. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema accepts output from any EST processing and assembly pipeline. Availability: http://titan.biotec.uiuc.edu/ESTIMA/.
Kyoda K, Baba K, Onami S, Kitano H. DBRF-MEGN method: an algorithm for deducing minimum equivalent gene networks from large-scale gene expression profiles of gene deletion mutants. [Bioinformatics 2004 20(16):2662-2675]: Introduces the DBRF-MEGN (difference-based regulation finding-minimum equivalent gene network) algorithm, which deduces the most parsimonious signed directed graphs consistent with expression profiles of gene deletion mutants. Signed directed graphs are commonly used to represent gene networks in genetics and cell biology. The authors validated the method by applying it to the gene expression profiles of 265 Saccharomyces cerevisiae deletion mutants, which led to the prediction of 132 transcriptional targets and modulators of transcriptional activity of 18 transcriptional regulators. Availability: upon request.
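A simplified sketch of the two stages named in the method's title: signed "regulation finding" from deletion-mutant expression differences, then removal of edges already explained by an alternative path of the same sign. It assumes an acyclic graph and a single fixed threshold; the authors' algorithm handles cycles and its details differ. All names are hypothetical:

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]

def accessibility(expr_wt: Dict[str, float],
                  expr_del: Dict[str, Dict[str, float]],
                  threshold: float) -> Dict[Edge, int]:
    """Signed edge a->b when deleting a shifts b's expression by more than `threshold`."""
    edges: Dict[Edge, int] = {}
    for a, profile in expr_del.items():
        for b, level in profile.items():
            if b == a:
                continue
            diff = level - expr_wt[b]
            if abs(diff) > threshold:
                # Deleting an activator lowers its target, so a drop means a positive edge.
                edges[(a, b)] = 1 if diff < 0 else -1
    return edges

def signed_paths(edges: Dict[Edge, int], start: str, goal: str, skip: Edge) -> Set[int]:
    """Signs of all start->goal paths that avoid edge `skip` (assumes no cycles)."""
    signs: Set[int] = set()

    def dfs(node: str, sign: int) -> None:
        for (u, v), s in edges.items():
            if u != node or (u, v) == skip:
                continue
            if v == goal:
                signs.add(sign * s)
            else:
                dfs(v, sign * s)

    dfs(start, 1)
    return signs

def minimum_equivalent(edges: Dict[Edge, int]) -> Dict[Edge, int]:
    """Drop each edge whose effect is reproduced by another path with the same sign."""
    reduced = dict(edges)
    for e in sorted(edges):
        if e in reduced and reduced[e] in signed_paths(reduced, e[0], e[1], skip=e):
            del reduced[e]
    return reduced
```

For a cascade a→b→c, deleting a also changes c, so the first stage reports a→c as well; the reduction removes it because a→b→c already explains it with the same sign.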
Martin D, Berriman M, Barton G. GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. [BMC Bioinformatics 2004, 5:178]: Describes GOtcha, a method for predicting gene product function by annotation with GO terms. GOtcha predicts GO term associations with term-specific probability measures of confidence.
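A rough sketch of term-specific scoring over the GO hierarchy, in the spirit of GOtcha: each similarity-search hit contributes a weight derived from its E-value to every term it is annotated with and to all ancestors of those terms, and scores are normalized by the root's score. The weighting and normalization are assumptions, and GOtcha's calibration of scores into probabilities is not reproduced; the GO identifiers in the example are placeholders:

```python
import math
from typing import Dict, List, Set, Tuple

def with_ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The term itself plus every ancestor reachable through `parents` links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def term_scores(hits: List[Tuple[float, Set[str]]],
                parents: Dict[str, List[str]],
                root: str) -> Dict[str, float]:
    """hits: (E-value, GO terms annotated to the hit). Returns scores relative to the root."""
    raw: Dict[str, float] = {}
    for evalue, terms in hits:
        # Assumed weight: -log10(E), floored at zero; tiny floor avoids log10(0).
        weight = max(0.0, -math.log10(max(evalue, 1e-300)))
        covered: Set[str] = set()
        for t in terms:
            covered |= with_ancestors(t, parents)
        for t in covered:  # each hit counts once per term, even if reached via several paths
            raw[t] = raw.get(t, 0.0) + weight
    total = raw.get(root, 0.0)
    return {t: s / total for t, s in raw.items()} if total else {}
```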
Matsumoto T, Yukawa W, Nozaki Y, et al. Novel algorithm for automated genotyping of microsatellites. [Nucleic Acids Res. 2004 32(20):6069-77]: Describes an algorithm for microsatellite genotyping that interprets peak patterns from individual alleles via pattern recognition of various types of noise peaks, such as stutter peaks and additional peaks. The method achieves an average accuracy of 94 percent for allele calling, according to the authors.
Müller H, Kenny E, Sternberg P. Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature. [PLoS Biol. 2004 2(11):e309]: Describes Textpresso, a text-mining system for scientific literature based on two major elements: a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. According to the authors, full-text access increases recall of biological data types from 45 percent to 95 percent. Availability: http://www.textpresso.org.
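The two elements described above (sentence-split full text plus term categories) can be sketched as a tiny in-memory index. The naive sentence splitter, the single-word matching, and the category names are illustrative assumptions, not Textpresso's implementation:

```python
import re
from typing import Dict, List, Set, Tuple

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_index(articles: Dict[str, str],
                categories: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Tag every sentence with the categories whose (lowercase) terms it contains."""
    index = []
    for art_id, text in articles.items():
        for sentence in split_sentences(text):
            words = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
            matched = {name for name, terms in categories.items() if words & terms}
            index.append((art_id, sentence, matched))
    return index

def search(index: List[Tuple[str, str, Set[str]]], required: Set[str]) -> List[Tuple[str, str]]:
    """Return (article, sentence) pairs whose sentence matches every required category."""
    return [(a, s) for a, s, cats in index if required <= cats]
```

Searching at sentence level is what lets a query ask for co-occurrence of categories (for example a gene and a cell type) within a single statement rather than anywhere in a paper.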
Oinn T, Addis M, Ferris J, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. [Bioinformatics 2004 20(17):3045-3054]: Presents Taverna, a tool for the composition and enactment of bioinformatics workflows. The tool includes a workbench application that provides a graphical user interface for composing workflows. These workflows are written in a new language, the Simple Conceptual Unified Flow Language (Scufl), in which each step within a workflow represents one atomic task. Availability: http://taverna.sourceforge.net.
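The execution model implied above (atomic tasks connected by data dependencies) can be sketched with the standard library's topological sorter (Python 3.9+). This is not Scufl syntax or Taverna's enactor; the task names are hypothetical:

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

def run_workflow(tasks: Dict[str, Callable[..., object]],
                 inputs_of: Dict[str, List[str]],
                 initial: Dict[str, object]) -> Dict[str, object]:
    """Run each atomic task once every task feeding it has produced its result."""
    results = dict(initial)
    for name in TopologicalSorter(inputs_of).static_order():
        if name in results:  # a workflow input supplied by the caller
            continue
        args = [results[dep] for dep in inputs_of.get(name, [])]
        results[name] = tasks[name](*args)
    return results
```

For example, with `initial={"sequence": "atgc"}`, tasks `upper=str.upper`, `length=len`, and a `report` task that joins both results, `report` runs only after `upper` and `length` have finished.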
Pachter L, Sturmfels B. Parametric inference for biological sequence analysis. [Proc. Natl. Acad. Sci. USA 2004 101(46): 16138-16143]: Introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
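Polytope propagation can be illustrated in two dimensions: "sum" becomes the convex hull of a union and "product" becomes a Minkowski sum, substituted into a forward-algorithm recursion. The toy two-state model below is a hypothetical example (each path's monomial exponent counts state "stays" and emission mismatches), not a model from the paper:

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]

def hull(points: Iterable[Point]) -> List[Point]:
    """Vertices of the 2-D convex hull (Andrew's monotone chain); collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> int:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def poly_sum(P: List[Point], Q: List[Point]) -> List[Point]:
    """Newton polytope of f + g (positive coefficients): hull of the union."""
    return hull(list(P) + list(Q))

def poly_product(P: List[Point], Q: List[Point]) -> List[Point]:
    """Newton polytope of f * g: Minkowski sum of the two polytopes."""
    return hull((p[0] + q[0], p[1] + q[1]) for p in P for q in Q)

def observation_polytope(obs: str, states: str = "AB") -> List[Point]:
    """Forward recursion with polytopes in place of numbers (hypothetical toy model)."""
    stay, switch = (1, 0), (0, 0)  # transition monomials as exponent vectors

    def emit(state: str, symbol: str) -> List[Point]:
        return [(0, 0)] if state == symbol else [(0, 1)]

    forward = {s: emit(s, obs[0]) for s in states}
    for symbol in obs[1:]:
        new = {}
        for s in states:
            incoming: List[Point] = []
            for prev in states:
                step = stay if prev == s else switch
                incoming = poly_sum(incoming, poly_product(forward[prev], [step]))
            new[s] = poly_product(incoming, emit(s, symbol))
        forward = new
    total: List[Point] = []
    for s in states:
        total = poly_sum(total, forward[s])
    return total
```

For `obs="AB"` the four hidden paths give exponents (1,1), (0,0), (0,2), (1,1), and the propagated polytope is the triangle with vertices (0,0), (1,1), (0,2). Its vertices are the count vectors that can be MAP-optimal for some parameter values, which is the parametric question the polytope answers.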
Pasquier C, Girardot F, Jevardat de Fombelle K, Christen R. THEA: ontology-driven analysis of microarray data. [Bioinformatics 2004 20(16):2636-2643]: Discusses THEA (Tools for High-throughput Experiments Analysis), an integrated information processing system for automatically annotating microarray data. Availability: http://thea.unice.fr/.
Pedrioli PG, Eng JK, Hubley R, et al. A common open representation of mass spectrometry data and its application to proteomics research. [Nat Biotechnol. 2004 22(11):1459-66]: Describes the mzXML format, an open, generic XML representation of mass spectrometry (MS) data applicable to a broad range of mass spectrometers used in proteomics research. An accompanying suite of supporting programs is also described.
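As an illustration of how spectra are stored, the sketch below decodes a `peaks` element assuming the usual mzXML encoding: base64 text of interleaved m/z and intensity values as big-endian ("network" byte order) 32- or 64-bit floats. It ignores XML namespaces and optional features such as compression, so treat it as a sketch rather than a complete reader:

```python
import base64
import struct
import xml.etree.ElementTree as ET
from typing import List, Tuple

def decode_peaks(peaks_elem: ET.Element) -> List[Tuple[float, float]]:
    """Return (m/z, intensity) pairs from an uncompressed, namespace-free <peaks> element."""
    precision = int(peaks_elem.get("precision", "32"))
    code = "f" if precision == 32 else "d"
    raw = base64.b64decode(peaks_elem.text.strip())
    n = len(raw) // (precision // 8)
    values = struct.unpack(">" + code * n, raw)  # '>' = big-endian (network byte order)
    return list(zip(values[0::2], values[1::2]))  # assumes m/z-intensity pair order
```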
Rubin D, Thorn C, Klein T, Altman R. A Statistical Approach to Scanning the Biomedical Literature for Pharmacogenetics Knowledge. [J Am Med Inform Assoc. 2004 Nov 23; (Epub ahead of print)]: Presents an automated method for identifying Medline citations that contain pharmacogenetics data on gene-drug relationships. A pharmacologist reviewed a sample of the identified articles to assess the method's precision. The approach identified 4,892 pharmacogenetics articles in the literature, with 92 percent precision. Availability: http://pharmdemo.stanford.edu/pharmdb/main.spy.
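A generic statistical text-scoring approach can be sketched as per-word log-odds learned from known relevant abstracts versus background abstracts, with add-one smoothing, summed over the words of a new citation. This is an assumed stand-in for illustration and not necessarily the authors' model; all names are hypothetical:

```python
import math
import re
from collections import Counter
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def train_log_odds(positive: List[str], background: List[str]) -> Dict[str, float]:
    """Per-word log P(word | relevant) - log P(word | background), add-one smoothed."""
    pos = Counter(w for doc in positive for w in tokenize(doc))
    bg = Counter(w for doc in background for w in tokenize(doc))
    vocab = set(pos) | set(bg)
    pos_total = sum(pos.values()) + len(vocab)
    bg_total = sum(bg.values()) + len(vocab)
    return {w: math.log((pos[w] + 1) / pos_total) - math.log((bg[w] + 1) / bg_total)
            for w in vocab}

def score(text: str, weights: Dict[str, float]) -> float:
    """Sum of word log-odds; words never seen in training contribute nothing."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))
```

Citations would then be ranked by score, with a cutoff chosen by manually checking precision on a sample, as the reviewers did here.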