In Print: Bioinformatics Tool-Related Papers of Note, January 2004


Alexandrov V, Gerstein M. Using 3D hidden Markov models that explicitly represent spatial coordinates to model and compare protein structures. [BMC Bioinformatics 2004, 5:2]: Presents a hidden Markov model formalism that explicitly uses 3D coordinates in its match states, along with methods for aligning query structures against 3D HMMs and scoring the results probabilistically. Availability:

Berglund A, et al. ProVal: a protein-scoring function for the selection of native and near-native folds. [Proteins 2004 54(2):289-302.]: Describes a low-resolution scoring function, ProVal, for the selection of native and near-native structures from a set of predicted structures for a given protein sequence.

Chang J, et al. GAPSCORE: finding gene and protein names one word at a time. [Bioinformatics 2004 20(2):216-225]: Introduces GapScore, a method to identify gene and protein names in text by scoring words based on a statistical model of gene names that quantifies their appearance, morphology, and context. Availability:

Comeau S, et al. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. [Bioinformatics 2004 20(1):45-50]: Presents an algorithm for filtering docked conformations with good surface complementarity, and ranking them based on their clustering properties. Robustness was tested on sets of 2,000 docked conformations generated for 48 pairs of interacting proteins. In 31 of these cases, the top 10 predictions include at least one near-native complex, with an average RMSD of 5 Å from the native structure. Availability:

Denoeud F, Vergnaud G. Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: a web-based resource. [BMC Bioinformatics 2004, 5:4]: Presents a tool that automatically identifies tandem repeats whose length differs between the genome sequences of two or more closely related bacterial strains. Genome comparisons are pre-computed, and the results of the comparisons are parsed into an online database. Availability:

Diaz-Uriarte R. A simple method for finding molecular signatures from gene expression data. [arXiv pre-print archive:]: Describes an R-based method for searching for gene expression signatures using principal components analysis.

Dugan J, Altman R. Using surface envelopes for discrimination of molecular models. [Protein Sci. 2004 Jan;13(1):15-24]: Describes a method for using shape information to distinguish structural models of biological macromolecules. The method uses a data structure called a surface envelope to represent the shape of the molecule, along with a fitness score for the shape of a particular molecular model. A hybrid algorithm is described that aligns the model to the surface envelope in three-dimensional space and assesses the degree to which atoms in the model fill the surface envelope.

Elias J, et al. Intensity-based protein identification by machine learning from a library of tandem mass spectra. [Nature Biotechnology: Published online: 18 January 2004, doi:10.1038/nbt930]: Introduces an algorithm that exploits the intensity patterns present in mass spectra in order to improve peptide and protein identification from MS/MS spectra.

Farina L, Mogno I. A fast reconstruction algorithm for gene networks. [arXiv pre-print archive:]: Describes a method for reconstructing gene networks in which the amount of available data is largely insufficient to uniquely determine the network.

Gajer P, et al. Automated correction of genome sequence errors. [Nucleic Acids Research 2004 32(2):562-569]: Discusses a base-calling program called AutoEditor that improves the overall accuracy of genome sequences for polymorphism discovery. In a large set of recent genome sequencing projects, the number of erroneous base calls was reduced by 80 percent, the authors wrote. Availability:

Hofacker I, et al. Prediction of locally stable RNA secondary structures for genome-wide surveys. [Bioinformatics 2004 20(2):186-190]: Describes algorithms for computing locally stable RNA structures at genome-wide scales. Availability:

Horne B, Camp N. Principal component analysis for selection of optimal SNP-sets that capture intragenic genetic variation. [Genet Epidemiol. 2004 Jan;26(1):11-21]: Describes a PCA-based method for identifying linkage disequilibrium (LD) groups and selecting an optimal set of group-tagging SNPs that capture sufficient intragenic genetic diversity. According to the authors, the PCA method differs from haplotype block and haplotype-tagging SNP methods because an LD group of SNPs does not have to be a contiguous DNA fragment.

Huang Y. Prediction of protein subcellular locations using fuzzy k-NN method. [Bioinformatics 2004 20(1):21-28]: Introduces a fuzzy k-nearest neighbors algorithm to predict proteins’ subcellular locations from their dipeptide composition. Using a data set derived from version 41.0 of Swiss-Prot, an overall predictive accuracy of about 80 percent has been achieved. Availability:

Karney C, et al. Method for computing protein binding affinity. [arXiv pre-print archive:]: Describes a Monte Carlo method for computing the binding affinity of a ligand to a protein.

Kurtz S, et al. Versatile and open software for comparing large genomes. [Genome Biology 2004, 5:R12]: Discusses the latest version of MUMmer, version 3.0, a software system for comparing large eukaryotic genomes at varying evolutionary distances. The updated system includes two new graphical viewing tools and is the first version of MUMmer to be released as open-source software. Availability:

Liu L, et al. Multi-species comparative mapping in silico using the COMPASS strategy. [Bioinformatics 2004 20(2):148-154]: Describes COMPASS (comparative mapping by annotation and sequence similarity), which uses existing comparative genome maps based on conserved regions to predict map locations of a sequence. Availability:

Meyer I, et al. Gene structure conservation aids similarity-based gene prediction. [Nucleic Acids Research 2004 32(2):776-783]: Describes a gene-prediction algorithm, implemented in a computer program called Projector, that combines comparative and similarity approaches. Compared to Genewise on a test set of 491 pairs of independently confirmed mouse and human genes, the algorithm was found to be more accurate for genes whose proteins are less than 80 percent identical.

Poeschel T, et al. Online tool for the discrimination of equidistributions. [arXiv pre-print archive:]: Describes a software tool for determining the distribution of point mutations in coding genes, based on an equiprobability distribution. Availability:

Pollard D, et al. Benchmarking tools for the alignment of functional noncoding DNA. [BMC Bioinformatics 2004, 5:6]: Describes a simulated set of alignments developed to measure the ability of various tools to align non-coding sequence.

Qu Y, et al. Protein structure prediction using sparse dipolar coupling data. [Nucleic Acids Res. 2004 32(2):551-561]: Presents a computer program called RDC-Prospect, for solving protein structures using data from residual dipolar coupling (RDC), an emerging NMR technique for protein structure studies. The software predicts a structure based on a structural homolog or analog of the target protein in the PDB that best aligns with the RDC data for the protein.

Robertson G, et al. Identification and interrogation of highly informative single nucleotide polymorphism sets defined by bacterial multilocus sequence typing databases. [J Med Microbiol. 2004 Jan;53(1):35-45]: Introduces a bioinformatics-driven, SNP-based approach to microbial genotyping that uses multilocus sequence typing (MLST) databases that consist of known variants of standardized housekeeping genes. The approach uses a computer program that can identify highly informative sets of SNPs in the entire MLST database.

Schuster-Bockler B, et al. HMM Logos for visualization of protein families. [BMC Bioinformatics 2004, 5:7]: Describes a visualization method that uses both emission and transition probabilities of profile hidden Markov models (pHMMs) for protein family research. Availability:

Tu Q. MedBlast: searching articles related to a biological sequence. [Bioinformatics 2004 20(1): 75-77]: Discusses a literature-mining tool called MedBlast, which uses natural language processing techniques to retrieve articles related to a given biological sequence. Availability:

Van Vlijmen H, et al. A novel database of disulfide patterns and its application to the discovery of distantly related homologs. [J Mol Biol. 2004 Jan 23;335(4):1083-1092]: Describes a comprehensive database of disulfide bonding patterns and a search method to find proteins with similar disulfide patterns. The disulfide database was constructed using disulfide annotations extracted from SwissProt, and was expanded significantly from 16,736 to 94,499 disulfide-containing domains by an inference method that combines SwissProt annotations with Pfam multiple alignments.

Wang J, et al. PARIS: a proteomic analysis and resources indexation system. [Bioinformatics 2004 20(1): 133-135]: Describes a system for managing data from two-dimensional electrophoresis-based proteomics experiments. Availability:

