In Print: Bioinformatics Tool-Related Papers of Note, August 2003


Cherepinsky V, Feng J, Rejali M, Mishra B. Shrinkage-based similarity metric for cluster analysis of microarray data. [Proc. Natl. Acad. Sci. 2003 100(17): 9668-9673]: Introduces a “mathematically rigorous” correlation coefficient for the analysis of microarray data that improves upon the current standard correlation coefficient, introduced by Eisen et al., which the authors deem “rather arbitrary.”

Frank M, et al. Dynamic molecules: molecular dynamics for everyone. An internet-based access to molecular dynamic simulations: basic concepts. [J Mol Model 2003 Aug 8 (epub ahead of print)]: Describes an internet portal with publicly available software for setting up, performing, and analyzing molecular dynamics simulations.

Herron-Olson L, et al. MGView: an alignment and visualization tool to enhance gap closure of microbial genomes. [Nucleic Acids Research, 2003, Vol. 31, No. 17 e106]: Describes a software tool, MGView, that graphically depicts the alignment of a set of microbial contigs against a completed microbial genome in order to increase the efficiency of gap closure in shotgun genome sequencing.

Hubley R, Zitzler E, Roach J. Evolutionary algorithms for the selection of single nucleotide polymorphisms. [BMC Bioinformatics 2003, 4:30]: Describes MAGMA (Multiobjective Analyzer for Genetic Marker Acquisition), a modified version of the Strength-Pareto evolutionary algorithm implemented in Java that supports genetic marker selection for large-scale SNP-selection projects. Software and source code available at

Killion P, Sherlock G, Iyer V. The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD). [BMC Bioinformatics 2003, 4:32]: Introduces the Longhorn Array Database, a MIAME-compliant, open-source version of the Stanford Microarray Database that runs on PostgreSQL and Linux instead of Oracle and Solaris. LAD is available at

Li K. ClustalW-MPI: ClustalW analysis using distributed and parallel computing. [Bioinformatics 2003 Vol. 19 no. 12: 1585-1586]: Discusses ClustalW-MPI, a distributed and parallel implementation of ClustalW. The source code is available at

Lieberfarb M. Genome-wide loss of heterozygosity analysis from laser capture microdissected prostate cancer using single nucleotide polymorphic allele (SNP) arrays and a novel bioinformatics platform dChipSNP. [Cancer Res. 2003 Aug 15;63(16):4781-5]: Introduces an informatics platform, dChipSNP, used to automate the definition of statistically valid regions of loss of heterozygosity (LOH), assign LOH genotypes to prostate cancer samples, and organize prostate cancers by hierarchical clustering based on their patterns of LOH.

Lu X, Olson W. 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. [Nucleic Acids Research, 2003, Vol. 31, No. 17 5108-5121]: Describes 3DNA, a software package for the analysis, reconstruction, and visualization of three-dimensional nucleic acid structures using coordinate files from the Protein Data Bank.

Schou Larsen T, Krogh A. EasyGene — a prokaryotic gene finder that ranks ORFs by statistical significance. [BMC Bioinformatics 2003, 4:21]: Presents an automated gene-finding method that estimates the statistical significance of a predicted gene using a hidden Markov model that is automatically estimated for a new genome. The software and pre-trained models are available at

Sebastiani P, et al. Minimal haplotype tagging. [Proc. Natl. Acad. Sci. 2003 100(17): 9900-9905]: Describes a method, called BEST (best enumeration of SNP tags), that finds the minimum set of SNPs needed to uniquely identify a haplotype. The software is available at

Whelan S, de Bakker P, Goldman N. Pandit: a database of protein and associated nucleotide domains with inferred trees. [Bioinformatics 2003 Vol. 19 no. 12: 1556-1563]: Describes the Pandit (Protein and Associated Nucleotide Domains with Inferred Trees) database, which contains 4,341 families of sequences derived from the seed alignments of the Pfam database. The database is available at


