In Print: Bioinformatics Tool-Related Papers of Note, March 2005


Bensmail H, Golek J, Moody M, Semmes O, Haoudi A. A novel approach for clustering proteomics data using Bayesian Fast Fourier Transform. [Bioinformatics. 2005 March 15, (e-pub ahead of print)]: Presents algorithms that can organize, cluster, and derive meaningful patterns of expression from large-scale proteomics experiments. Raw data are transformed from a real-space representation to a complex-space representation using the discrete Fourier transform; a thresholding approach is then used to denoise each spectrum and reduce its length; and Bayesian clustering is applied to the reconstructed data.
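
The transform-and-threshold step in the Bensmail et al. entry can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the naive O(n^2) DFT, and the `keep_fraction` cutoff rule are assumptions chosen for illustration, and the Bayesian clustering stage is omitted.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def denoise(signal, keep_fraction=0.5):
    """Zero every coefficient whose magnitude is below keep_fraction times
    the largest magnitude (an assumed rule), then reconstruct the signal.
    Returns the reconstruction and the number of coefficients retained."""
    coeffs = dft(signal)
    cutoff = keep_fraction * max(abs(c) for c in coeffs)
    kept = [c if abs(c) >= cutoff else 0 for c in coeffs]
    retained = sum(1 for c in kept if c != 0)
    return inverse_dft(kept), retained


# Toy spectrum: one cosine plus a small alternating perturbation.
clean = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
noisy = [x + 0.05 * (-1) ** t for t, x in enumerate(clean)]
smoothed, retained = denoise(noisy)
```

In this toy case only the two coefficients that carry the cosine survive the cutoff, so the spectrum is both denoised and described by far fewer numbers than the raw trace.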

Best C, Zimmer R, Apostolakis J. Probabilistic methods for predicting protein functions in protein-protein interaction networks. [ArXiv pre-print archive:]: Discusses probabilistic methods for predicting protein functions from protein-protein interaction networks based on previous work using Markov random fields, which is extended and compared to a general machine-learning theoretic approach.

Bokhari S, Sauer J. A parallel graph decomposition algorithm for DNA sequencing with nanopores. [Bioinformatics 2005 21(7):889-896]: Discusses methods for DNA sequence assembly using nanopore devices that can sense the bases of translocating single-stranded DNA, generating reads of 10^5 bases in large numbers and at high speed. The assembly algorithm is a variation of the Eulerian path approach that searches over a space of de Bruijn graphs until it finds one in which the impact of errors is eliminated and both possible orientations of the two ssDNA sequences can be identified separately and unambiguously.
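
For readers unfamiliar with the Eulerian path approach mentioned in the Bokhari and Sauer entry, the following Python sketch shows the textbook version on error-free reads. It does not reproduce the paper's parallel search over a space of de Bruijn graphs or its handling of strand orientation; the function names and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Each distinct k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph actually has an Eulerian path."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            in_degree[node] += 1
    nodes = set(graph) | set(in_degree)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_degree.get(n, 0) == 1),
        min(nodes),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical overlapping reads drawn from the sequence ATGGCGTGCA.
reads = ["ATGGCGT", "GCGTGCA", "GGCGTG"]
contig = assemble(reads, 4)
```

With sequencing errors, spurious k-mers add extra edges and the path is no longer unique, which is the problem the paper's graph search is designed to address.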

Cai JJ, Smith DK, Xia X, Yuen KY. MBEToolbox: a Matlab toolbox for sequence data analysis in molecular biology and evolution. [BMC Bioinformatics. 2005 Mar 22;6(1):64]: Introduces a Matlab toolbox, called MBEToolbox, that offers implementations of the most needed functions in molecular biology and evolution. It can be used to manipulate aligned sequences, calculate evolutionary distances, estimate synonymous and nonsynonymous substitution rates, and infer phylogenetic trees. Availability:
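
As an example of the kind of evolutionary distance calculation the MBEToolbox entry refers to, here is a short Python sketch of the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. This is not MBEToolbox code (the toolbox is written in Matlab), and the function names and gap handling are assumptions.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    skipping any column that contains a gap character."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Two hypothetical aligned sequences that differ at 1 of 10 sites.
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
```

The corrected distance is slightly larger than the raw proportion of differences because the model accounts for multiple substitutions at the same site.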

Campagna D, Romualdi C, Vitulo N, Del Favero M, Lexa M, Cannata N, Valle G. RAP: a new computer program for de novo identification of repeated sequences in whole genomes. [Bioinformatics 2005 21(5):582-588]: Introduces RAP (Repeat Analysis Program), which is based on a new word-counting algorithm optimized for high-resolution repeat identification using gapped words. Availability: Upon request, [email protected].
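
The idea of counting gapped words, mentioned in the RAP entry, can be sketched in Python as follows. This is not the RAP algorithm itself: the mask notation ("1" for a position that must match, "0" for a position that is ignored), the function names, and the toy sequence are assumptions made for illustration.

```python
from collections import Counter


def gapped_word_counts(sequence, mask):
    """Count every window of the sequence after masking out the "0" positions.
    Masked positions are written as "." so that windows differing only
    there are counted as the same word."""
    span = len(mask)
    counts = Counter()
    for i in range(len(sequence) - span + 1):
        window = sequence[i:i + span]
        word = "".join(c if m == "1" else "." for c, m in zip(window, mask))
        counts[word] += 1
    return counts


def repeated_words(sequence, mask, min_count=2):
    """Gapped words that occur at least min_count times."""
    counts = gapped_word_counts(sequence, mask)
    return {word: n for word, n in counts.items() if n >= min_count}


# ACGT and ACTT differ only at the masked third position.
repeats = repeated_words("ACGTAAACTTA", "1101")
```

Allowing mismatches at the masked positions lets diverged copies of a repeat be counted together, which exact word counting would miss.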

Colbourne JK, Singan VR, Gilbert DG. wFleaBase: the Daphnia genome database. [BMC Bioinformatics. 2005 Mar 7;6(1):45]: Describes wFleaBase, a database for curating, archiving, and sharing genetic, molecular, and functional genomic data and protocols for an emerging model organism, the microcrustacean Daphnia, commonly known as the water flea. The database is built primarily from core components of the Generic Model Organism Database project. Availability:

Dhar P, Meng T, Somani S, Ye L, Sakharkar K, Krishnan A, Ridwan A, Ho Kok Wah S, Mand C, Hao Z. Grid Cellware: the first grid-enabled tool for modelling and simulating cellular processes. [Bioinformatics 2005 21(7):1284-1287]: Introduces Grid Cellware, an integrated modeling and simulation tool that implements various pathway simulation algorithms along with an adaptive swarm algorithm for parameter estimation. Availability:

Di Bernardo D, Thompson M, Gardner T, Chobot S, Eastwood E, Wojtovich A, Elliott S, Schaus S, Collins J. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. [Nature Biotechnology 23, 377-383 (2005)]: Presents an integrated computational-experimental approach for computing the likelihood that gene products and associated pathways are targets of a compound. This is achieved by filtering the mRNA expression profile of compound-exposed cells using a reverse-engineered model of the cell's gene regulatory network.

Dror G, Sorek R, Shamir R. Accurate identification of alternatively spliced exons using support vector machine. [Bioinformatics 2005 21(7):897-901]: Describes a study that used machine-learning methods to generate a robust classifier for identifying alternatively spliced exons. The authors identify seven attributes that are dominant for the task of classification, and several less informative features that help to slightly increase the performance of the classifier, which achieves a true positive rate of 50 percent for a false positive rate of 0.5 percent. Availability: Upon request, [email protected].

Grienberg I, Benayahu D. Osteo-Promoter Database (OPD) — Promoter analysis in skeletal cells. [BMC Genomics. 2005 Mar 25;6(1):46]: Describes the Osteo-Promoter Database, a collection of genes and promoters expressed in skeletal cells. Availability:

Huang Y, Pumphrey J, Gingle A. ESTminer: a Web interface for mining EST contig and cluster databases. [Bioinformatics 2005 21(5):669-670]: Presents ESTminer, a web application and database schema for interactive mining of EST contig and cluster datasets. Availability:

Middendorf M, Ziv E, Wiggins C. Inferring network mechanisms: The Drosophila melanogaster protein interaction network. [Proc Natl Acad Sci USA. 2005 Mar 1;102(9):3192-7]: Describes a method, built on discriminative tools from machine learning, for inferring which mechanism most accurately captures a given network topology. The method classifies the Drosophila melanogaster protein network as a duplication-mutation-complementation network rather than one produced by preferential attachment, a small-world mechanism, or duplication-mutation without complementation.

Rahman S, Advani P, Schunk R, Schrader R, Schomburg D. Metabolic pathway analysis web service (Pathway Hunter Tool at CUBIC). [Bioinformatics 2005 21(7):1189-1193]: Describes Pathway Hunter Tool, a fast tool for analyzing the shortest paths in metabolic pathways. Availability:
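
A shortest-path query of the kind described in the Pathway Hunter Tool entry can be sketched with a breadth-first search over a directed metabolite graph. This Python example is not the tool's algorithm or interface; the graph, the single-letter metabolite labels, and the function name are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search over a directed graph of metabolites.
    Returns the list of metabolites on a shortest route, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical network: an edge X -> Y means some reaction converts X into Y.
toy_network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": ["F"],
    "E": ["D"],
}
route = shortest_path(toy_network, "A", "F")
```

Breadth-first search treats every reaction step as equally costly; a weighted search would be needed if steps were scored differently.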

Robins H, Li Y, Padgett R. Incorporating structure to predict microRNA targets. [Proc Natl Acad Sci USA. 2005 Mar 15;102(11):4006-9]: Presents an algorithm for predicting microRNA targets that does not rely on evolutionary conservation. As one of the features of this algorithm, the authors incorporate the folded structure of mRNA.

Tang F, Chua CL, Ho LY, Lim YP, Issac P, Krishnan A. Wildfire: distributed, Grid-enabled workflow construction and execution. [BMC Bioinformatics. 2005 Mar 24;6(1):69]: Introduces Wildfire, a graphical user interface for constructing and running workflows. Wildfire borrows user interface features from Jemboss and adds a drag-and-drop interface allowing the user to compose EMBOSS and other programs into workflows. Availability:

Wang L, Liu S, Niu T, Xu X. SNPHunter: a bioinformatic software for single nucleotide polymorphism data acquisition and management. [BMC Bioinformatics 2005, 6:60]: Introduces a software program, SNPHunter, that allows for both ad hoc-mode and batch-mode SNP searching, automatic SNP filtering, and retrieval of SNP data, including physical position, function class, flanking sequences at user-defined lengths, and heterozygosity from NCBI dbSNP. Availability:

Westbrook J, Ito N, Nakamura H, Henrick K, Berman H. PDBML: the representation of archival macromolecular structure data in XML. [Bioinformatics 2005 21(7):988-992]: Describes the Protein Data Bank's recently released versions of the PDB Exchange dictionary and the PDB archival data files in XML format, collectively named PDBML. Availability:

