In Print: Bioinformatics Tool-Related Papers of Note, July 2005


Clark T, Jurek J, Kettler G, Preuss D. A Structured Interface to the Object-Oriented Genomics Unified Schema for XML-Formatted Data. [Appl. Bioinformatics. 2005;4(1):13-24]: Describes XMLGUS, a framework for transferring data into the Genomics Unified Schema developed at the Computational Biology and Informatics Laboratory at the University of Pennsylvania. XMLGUS defines an XML interface that includes relational database key constraint definitions, regularizes traversal through that XML, enables automatic processing of the XML with database key constraints, and supports special handling of input data within the same automated framework. Availability:

De la Grange P, Dutertre M, Martin N, Auboeuf D. FAST DB: a Website Resource for the Study of the Expression Regulation of Human Gene Products. [Nucleic Acids Res. 2005 Jul 28;33(13):4276-84]: Describes FAST DB, a bioinformatics suite to define the exon content of all known transcripts produced by human genes. The suite also includes software tools for the graphical presentation of gene products, a sequence multi-alignment of all gene transcripts, and an in silico PCR computer program. Availability:
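
As a rough illustration of what an in silico PCR program does, the following minimal sketch predicts an amplicon from a template and a primer pair. It is not the FAST DB implementation: the function names, the exact-match rule, and the example sequences are all assumptions made for illustration, and real tools typically tolerate mismatches and search both strands.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Return the predicted amplicon, or None if either primer has no site.

    Hypothetical helper: exact matches only, forward strand only.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its site on the
    # template strand is the primer's reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Made-up sequences for illustration only.
template = "TTGACCGTAAGGCTTACGATCCGGA"
amplicon = in_silico_pcr(template, "GACCGT", "GGATCG")
print(amplicon)
```

With these made-up inputs the sketch reports a 20-base product spanning both primer sites.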

Jeffries N. Algorithms for Alignment of Mass Spectrometry Proteomic Data. [Bioinformatics 2005 21(14):3066-3073]: Describes two algorithms for aligning data sets from mass spectrometers to ensure that the same protein intensities are correctly identified in each sample. Without an alignment procedure, "it is possible to make errors in identifying the signals from peptides with similar molecular weight," according to the authors. The paper discusses an algorithm designed to work with SELDI data, and another that can be used with data in a more general format. Availability:

Johnson T. Bayesian Method for Disease QTL Detection and Mapping, using a Case and Control Design and DNA Pooling. [arXiv pre-print archive:]: Describes a Bayesian statistical method for determining the genetic basis of a complex genetic trait. The method uses a sample of unrelated individuals classified into two groups. Each group is assumed to have been genotyped at a battery of marker loci using DNA pooling. The method works by conducting an exact Bayesian analysis under a number of simplifying population genetic assumptions.

Jones P, Vinod N, Down T, Hackmann A, Kahari A, Kretschmann E, Quinn A, Wieser D, Hermjakob H, Apweiler R. Dasty and UniProt DAS: a Perfect Pair for Protein Feature Visualization. [Bioinformatics 2005 21(14):3198-3199]: Introduces two resources for the Distributed Annotation System: a DAS reference server that provides up-to-date sequence and annotation from UniProt, and a DAS client implemented using Java and Macromedia Flash that is optimized for the display of protein features. Availability:

Livny J, Fogel MA, Davis BM, Waldor MK. sRNAPredict: an Integrative Computational Approach to Identify sRNAs in Bacterial Genomes. [Nucleic Acids Res. 2005 Jul 26;33(13):4096-105]: Describes sRNAPredict, a program that uses coordinate-based algorithms to integrate the respective positions of individual predictive features of small non-coding bacterial RNAs (sRNAs) and identify putative intergenic sRNAs.

Löytynoja A, Goldman N. An Algorithm for Progressive Multiple Alignment of Sequences with Insertions. [Proc. Natl. Acad. Sci. USA, Early Edition]: Presents a modification of the traditional sequence-alignment algorithm that can distinguish insertions from deletions and avoid repeated penalization of insertions. According to the authors, the algorithm infers a greater number of insertion events and creates gaps that are phylogenetically consistent but spatially less concentrated.

Margolin A, Greshock J, Naylor T, et al. CGHAnalyzer: a Stand-Alone Software Package for Cancer Genome Analysis using Array-based DNA Copy Number Data. [Bioinformatics 2005 21(15):3308-3311]: Discusses CGHAnalyzer, a software suite for displaying and analyzing array-based comparative genomic hybridization data. CGHAnalyzer can be used to load copy number data from multiple platforms, query and describe large, heterogeneous datasets, and export results. The software also includes several algorithms for hierarchical clustering and class differentiation of microarray data. Availability:

Ohler U, Shomron N, Burge C. Recognition of Unknown Conserved Alternatively Spliced Exons. [PLoS Comp. Biol. 1(2): e15]: Proposes a computational approach called UNCOVER (unknown conserved variable exon recognition) that uses a pair hidden Markov model to discover conserved coding exonic sequences subject to alternative splicing that have so far gone undetected. According to the authors, the method was used to predict skipped exons in 1 percent of the human genome represented by the ENCODE regions, leading to more than 50 new exon candidates. Five novel predicted AS exons were validated by RT-PCR and sequencing analysis.

Price T, Regan R, Mott R, Hedman Å, et al. SW-ARRAY: a Dynamic Programming Solution for the Identification of Copy-Number Changes in Genomic DNA Using Array Comparative Genome Hybridization Data. [Nucleic Acids Res. 2005 33(11):3455-3464]: Describes an adaptation of the Smith-Waterman algorithm to provide an analytic approach for detecting copy-number changes in array CGH data.
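
The general idea of applying Smith-Waterman-style scoring to a one-dimensional series can be sketched as follows. This is a minimal illustration, not the SW-ARRAY code: it assumes each clone's log ratio is reduced by a fixed threshold and that the running score resets whenever it falls to zero, so the highest-scoring contiguous run marks a candidate gain. The function name, threshold value, and data are hypothetical.

```python
def best_segment(log_ratios, threshold):
    """Return (score, start, end) for the highest-scoring contiguous run.

    Scores follow the local-alignment recurrence
    S_i = max(0, S_{i-1}) + (x_i - threshold); `end` is exclusive.
    A score of 0.0 with start == end means no segment scored above zero.
    """
    best_score, best_start, best_end = 0.0, 0, 0
    running, run_start = 0.0, 0
    for i, value in enumerate(log_ratios):
        if running <= 0.0:
            # A non-positive running score cannot help a later segment,
            # so restart the candidate segment here.
            running = 0.0
            run_start = i
        running += value - threshold
        if running > best_score:
            best_score, best_start, best_end = running, run_start, i + 1
    return best_score, best_start, best_end


# Made-up log ratios for eight consecutive clones (illustration only).
ratios = [0.0, 0.1, -0.1, 0.6, 0.7, 0.5, 0.0, -0.1]
score, start, end = best_segment(ratios, threshold=0.3)
print(score, start, end)
```

In this toy series the run covering clones 3 through 5 is reported as the candidate region. Detecting losses would require negating the series first, and assessing significance is outside the scope of the sketch.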

Tang Y, Jin B, Zhang YQ. Granular Support Vector Machines with Association Rules Mining for Protein Homology Prediction. [Artif. Intell. Med. 2005 Jul 14 (e-pub ahead of print)]: Describes a new learning model called granular support vector machines (GSVM) for protein homology prediction using protein sequences. GSVM systematically and formally combines the principles from statistical learning theory and granular computing theory to first build a sequence of information granules and then build support vector machines in some of these information granules on demand.

Thompson JD, Holbrook SR, Katoh K, Koehl P, Moras D, Westhof E, Poch O. MAO: a Multiple Alignment Ontology for Nucleic Acid and Protein Sequences. [Nucleic Acids Res. 2005 Jul 25;33(13):4164-71]: Introduces MAO, an ontology for multiple alignments of nucleic and protein sequences. MAO is designed to improve data sharing between different alignment protocols. Availability:

Tsirigos A, Rigoutsos I. A Sensitive, Support-Vector-Machine Method for the Detection of Horizontal Gene Transfers in Viral, Archaeal and Bacterial Genomes. [Nucleic Acids Res. 2005 33(12):3699-3707]: Describes an improved version of a previously developed computational framework for identifying horizontal transfers, offering increased sensitivity and the ability to work with smaller genomes. The method, called Wn-SVM, uses a one-class support-vector machine and can learn from relatively small training sets. Availability:

Yun H, Lee DY, Jeong J, Lee S, Lee SY. MFAML: a Standard Data Structure for Representing and Exchanging Metabolic Flux Models. [Bioinformatics 2005 21(15):3329-3330]: Introduces MFAML, a data standard for the formal representation and exchange of metabolic flux models. MFAML enables the description of stationary states of a metabolic system by defining environmental and genetic conditions of the system, such as flux measurements. Availability:

Zhou X, Cao X, Perlman Z, Wong ST. A Computerized Cellular Imaging System for High Content Analysis in Monastrol Suppressor Screens. [J. Biomed. Inform. 2005 Jul 9 (e-pub ahead of print)]: Introduces a bioimage informatics system for high-content screening applications. The system is based on an algorithm called multi-phenotypic mitotic analysis, which is integrated with algorithms for correlation analysis and compound clustering used in gene microarray studies.

