In Print: Bioinformatics Tool-Related Papers of Note, May 2005


Atchley WR, Zhao J, Fernandes AD, Drüke T. Solving the protein sequence metric problem. [Proc Natl Acad Sci USA 2005 May 3 102(18):6395-400]: Discusses the use of multivariate statistical analyses on almost 500 amino acid attributes to produce a small set of highly interpretable numeric patterns of amino acid variability. These high-dimensional attribute data are summarized by five multidimensional patterns of attribute covariation that reflect polarity, secondary structure, molecular volume, codon diversity, and electrostatic charge.

Davis FP, Sali A. PIBASE: a comprehensive database of structurally defined protein interfaces. [Bioinformatics 2005 21(9):1901-1907]: Introduces PIBASE, a relational database of structurally defined interfaces between pairs of protein domains extracted from structures in the PDB and the Probable Quaternary Structure server using domain assignments from the Structural Classification of Proteins and CATH fold classification systems. PIBASE currently contains 158,915 interacting domain pairs between 105,061 domains from 2,125 SCOP families. Availability:

Eilbeck K, Lewis S, Mungall C, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. [Genome Biology 2005, 6:R44]: Describes the Sequence Ontology (SO), a structured controlled vocabulary for the parts of a genomic annotation. SO provides a common set of terms and definitions to facilitate the exchange, analysis and management of genomic data.

Emonet T, Macal C, North MJ, Wickersham CE, Cluzel P. AgentCell: a digital single-cell assay for bacterial chemotaxis. [Bioinformatics 2005 21(11):2714-2721]: Presents AgentCell, a model that uses agent-based technology to study the relationship between stochastic intracellular processes and the behavior of individual cells. As a test bed for the approach, the authors model bacteria, with each bacterium represented as an agent equipped with its own chemotaxis network, motors, and flagella, to show that digital chemotaxis assays reproduce experimental data obtained from both single cells and bacterial populations. Availability: Available upon request from the authors.

Falkner J, Andrews P. Fast tandem mass spectra-based protein identification regardless of the number of spectra or potential modifications examined. [Bioinformatics 2005 21(10):2177-2184]: Describes an algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named "peptide finite state machine" (PFSM), which can be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. Availability:

Goldberg IG, Allan C, Burel JM, et al. The Open Microscopy Environment (OME) Data Model and XML file: open tools for informatics and quantitative analysis in biological imaging. [Genome Biol. 2005;6(5):R47]: Presents the Open Microscopy Environment (OME), which defines a data model and a software implementation to serve as an informatics framework for imaging in biological microscopy experiments. OME is designed to support high-content cell-based screening as well as traditional image analysis applications.

Hart CE, Sharenbroich L, Bornstein BJ, Trout D, King B, Mjolsness E, Wold BJ. A mathematical and computational framework for quantitative comparison and integration of large-scale gene expression data. [Nucleic Acids Research 2005 33(8):2580-2594]: Introduces a mathematical and computational framework to help quantify, compare, visualize, and interactively mine clusters of gene expression data. Availability:

Janga SC, Collado-Vides J, Moreno-Hagelsieb G. Nebulon: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons. [Nucleic Acids Research 2005 33(8):2521-2530]: Describes a system that can build networks of functional relationships of gene products based on their organization into operons in any available genome. The operon predictions are based on intergenic distances.
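
As a rough illustration of predicting operons from intergenic distances, the sketch below groups adjacent genes on the same strand whenever the gap between them is small. It is a minimal sketch under stated assumptions, not Nebulon's implementation: the fixed `max_gap` cutoff of 50 bp, the `Gene` record, and the gene names and coordinates are all hypothetical, and the summary above does not say how the authors score distances.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def group_into_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes separated by at most max_gap bases.

    The fixed cutoff is an assumption made for illustration only.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    operons: List[List[str]] = []
    previous = None
    for gene in ordered:
        same_unit = (
            previous is not None
            and gene.strand == previous.strand
            # Number of bases strictly between the two genes
            # (negative when the genes overlap).
            and gene.start - previous.end - 1 <= max_gap
        )
        if same_unit:
            operons[-1].append(gene.name)
        else:
            operons.append([gene.name])
        previous = gene
    return operons


# Made-up coordinates, not data from the paper.
genes = [
    Gene("geneA", 100, 900, "+"),
    Gene("geneB", 920, 1800, "+"),
    Gene("geneC", 2500, 3300, "+"),
    Gene("geneD", 3310, 4000, "-"),
]
predicted = group_into_operons(genes)
```

Here `geneA` and `geneB` fall into one predicted unit because only 19 bases separate them, while `geneC` is split off by a 699-base gap and `geneD` by a change of strand.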

Jayaraj S, Reid R, Santi DV. GeMS: an advanced software package for designing synthetic genes. [Nucleic Acids Research 2005 33(9):3011-3016]: Presents a software package for gene design that comprises an integrated suite of programs that automatically performs the following tasks: restriction site prediction, codon optimization for any expression host, restriction site inclusion and exclusion, separation of long sequences into synthesizable fragments, Tm and stem-loop determinations, optimal oligonucleotide component design, and design verification/error-checking. The output is a complete design report and a list of optimized oligonucleotides to be prepared for subsequent gene synthesis.

Korbel J, Doerks T, Jensen L, et al. Systematic Association of Genes to Phenotypes by Genome and Literature Mining. [PLoS Biol. 2005 May;3(5):e134]: Presents an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis.

Liu M, Grigoriev A. Fast parsers for Entrez Gene. [Bioinformatics. 2005 May 6 (e-pub ahead of print)]: Describes four parsers developed to address the National Center for Biotechnology Information's transition from LocusLink to Entrez Gene. "Due to the widespread use of Locuslink and the popularity of Perl programming language in bioinformatics, a publicly available high performance Entrez Gene parser in Perl is urgently needed," the authors note. The fastest parser processes the entire human Entrez Gene annotation file in under 12 minutes on one Intel Xeon 2.4 GHz CPU. Availability:

Liu ZJ, Lin D, Tempel W, Praissman JL, Rose JP, Wang BC. Parameter-space screening: a powerful tool for high-throughput crystal structure determination. [Acta Crystallogr D Biol Crystallogr. 2005 May 1;61(Pt 5):520-527]: Describes a structural biology software tool that combines bioinformatics workflow-management techniques, cluster-based computing, and several crystallographic structure-determination software packages. Using the workflow manager, a researcher can set up hundreds of structure-determination jobs, each using a slightly different set of program input parameters, in order to screen parameter space for the set of parameters that leads to a successful structure determination.
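
The screening idea described above, many jobs that each differ slightly in their input parameters, can be sketched as a Cartesian product over candidate values. This is only an illustrative sketch: the parameter names and values below are hypothetical and are not taken from the paper, and submitting the jobs to a cluster or workflow manager is left out.

```python
from itertools import product
from typing import Dict, Iterator, List


def parameter_grid(options: Dict[str, List]) -> Iterator[Dict[str, object]]:
    """Yield one parameter dictionary per combination of candidate values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space; names and values are assumptions for illustration.
search_space = {
    "resolution_cutoff": [2.0, 2.5, 3.0],
    "solvent_fraction": [0.40, 0.50],
    "heavy_atom_sites": [4, 6, 8, 10],
}

# Each entry would become the input for one structure-determination job.
jobs = list(parameter_grid(search_space))
```

Three, two, and four candidate values give 24 jobs here. The count grows multiplicatively with every parameter added, which is why the summary above mentions cluster-based computing.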

Majoros WH, Pertea M, Salzberg SL. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. [Bioinformatics 2005 21(9):1782-1788]: Introduces an open source generalized pair hidden Markov model-based gene finder called TWAIN. When tested on two related Aspergillus species, A. fumigatus and A. nidulans, the program found 89 percent of the exons and predicted 74 percent of the gene models exactly in a test set of 147 conserved gene pairs. Availability:

Menten B, Pattyn F, De Preter K, et al. arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays. [BMC Bioinformatics 2005, 6:124]: Presents arrayCGHbase, an analysis platform for arrayCGH experiments that includes a MIAME-compliant database to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Availability:

Wilkinson M, Schoof H, Ernst R, Haase D. BioMOBY successfully integrates distributed heterogeneous bioinformatics web services. The PlaNet exemplar case. [Plant Physiol. 2005 May;138(1):5-17]: Presents an example in which web services developed as part of the BioMOBY interoperability initiative were used to integrate the online plant genome databases and analytical services provided by a European consortium of databases and data service providers.

Wong JWH, Cagney G, Cartwright HM. SpecAlign - processing and alignment of mass spectra datasets. [Bioinformatics 2005 21(9):2088-2090]: Presents a graphical computational tool, SpecAlign, that enables simultaneous visualization and manipulation of multiple mass spectra datasets. SpecAlign implements an algorithm that enables the complete alignment of each mass spectrum within a loaded dataset. Availability:

Zhang D, Wells MT, Smart CD, Fry WE. Bayesian normalization and identification for differential gene expression data. [J Comput Biol. 2005 May;12(4):391-407]: Proposes a Bayesian framework for normalizing microarray data that incorporates measurement errors in both total intensities and differential expression ratios. The authors also describe a method for Bayesian identification of differentially expressed genes to control the false discovery rate instead of the ad hoc thresholding of the posterior odds ratio.
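
To make the contrast with fixed thresholding concrete, the sketch below shows one generic way to choose a gene list from posterior probabilities so that the expected false discovery rate stays under a target. It is not the authors' procedure, which the summary above does not detail. It assumes each gene already has a posterior probability of differential expression, and the function name, gene labels, and probabilities are hypothetical.

```python
from typing import Dict, List


def select_by_bayesian_fdr(posterior: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the largest top-ranked gene list whose estimated FDR is <= alpha.

    The estimated FDR of a list is the mean of (1 - posterior probability)
    over the genes in it. This is a generic rule used for illustration.
    """
    ranked = sorted(posterior.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    error_sum = 0.0
    for gene, prob in ranked:
        candidate_error = error_sum + (1.0 - prob)
        # Genes are visited in order of decreasing probability, so the running
        # mean never decreases and the first violation ends the search.
        if candidate_error / (len(selected) + 1) > alpha:
            break
        error_sum = candidate_error
        selected.append(gene)
    return selected


# Made-up posterior probabilities of differential expression.
posterior = {"g1": 0.99, "g2": 0.97, "g3": 0.90, "g4": 0.60, "g5": 0.20}
called = select_by_bayesian_fdr(posterior, alpha=0.05)
```

With these values the first three genes are kept, since their average error probability is about 0.047, and adding `g4` would raise it to about 0.135. The cutoff therefore adapts to the data rather than being fixed in advance.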
