Bioinformatics Tool-Related Papers of Note, June 2007

Andrusier N, Nussinov R, Wolfson HJ. FireDock: Fast interaction refinement in molecular docking. [Proteins. 2007 Jun 27; (e-pub ahead of print)]: Describes FireDock, a method for refining and rescoring rigid-body docking solutions. The refinement process includes two main steps: rearrangement of the interface side chains and adjustment of the relative orientation of the molecules. According to the authors, “FireDock's prediction results are comparable to current state-of-the-art refinement methods while its running time is significantly lower.” Availability:

Bromberg Y, Rost B. SNAP: predict effect of non-synonymous polymorphisms on function. [Nucleic Acids Research 2007 35(11):3823-3835]: Describes SNAP (screening for non-acceptable polymorphisms), a neural network-based method for predicting the functional effects of non-synonymous SNPs. SNAP takes sequence information as input, as well as functional and structural annotations. According to the authors, in a cross-validation test on more than 80,000 mutants, SNAP identified 80 percent of the non-neutral substitutions at 77 percent accuracy and 76 percent of the neutral substitutions at 80 percent accuracy. “This constituted an important improvement over other methods,” according to the paper’s abstract. Availability:

Coghlan A, Durbin R. Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron–exon structure. [Bioinformatics 2007 23(12):1468-1475]: Describes Genomix, a method for combining gene-finders that selects the predicted exons that are best conserved within or between species in terms of sequence and intron–exon structure, and combines them into a gene structure. According to the authors, on a set of 1,500 confirmed C. elegans genes, Genomix increased the exon-level specificity by 10.1 percent and sensitivity by 2.7 percent compared to the best input gene-finder. Availability:
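
For readers unfamiliar with exon-level accuracy figures, the following is a minimal sketch of how such numbers are conventionally computed in gene-finder evaluations, where "specificity" usually means the fraction of predicted exons that are correct. The function name, the exact-match criterion, and the coordinates are illustrative assumptions, not taken from the Genomix paper.

```python
def exon_level_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the exon level.

    Exons are (start, end) tuples; a predicted exon counts as correct
    only if both boundaries match an annotated exon exactly
    (an assumption made for this sketch).
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    # Sensitivity: fraction of annotated exons that were predicted.
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    # Specificity (gene-finding convention): fraction of predicted
    # exons that are correct.
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical coordinates, for illustration only.
annotated = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted = [(100, 200), (300, 400), (500, 650)]
sensitivity, specificity = exon_level_accuracy(predicted, annotated)
```

Here two of the four annotated exons are recovered and two of the three predictions are correct, so an improvement in either figure reflects a different kind of error being reduced.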

Day A, Carlson MR, Dong J, O'Connor BD, Nelson SF. Celsius: a community resource for Affymetrix microarray data. [Genome Biology 2007, 8:R112]: Presents Celsius, a data warehousing system for aggregating Affymetrix CEL files and associated metadata. Celsius contains ten billion assay measurements and affiliated metadata. According to the paper’s authors, the resource is “the largest publicly available source of Affymetrix microarray data, and through sheer volume enables a sophisticated, broad view of transcription that has previously not been possible.”

Englebienne P, Fiaux H, Kuntz DA, Corbeil CR, Gerber-Lemaire S, Rose DR, Moitessier N. Evaluation of docking programs for predicting binding of Golgi alpha-mannosidase II inhibitors: A comparison with crystallography. [Proteins. 2007 Jun 7; (e-pub ahead of print)]: Describes a study that evaluated seven docking programs (GOLD, Glide, FlexX, AutoDock, eHiTS, LigandFit, and FITTED) using the structure of Drosophila melanogaster GMII complexed with three different inhibitors. “We found that small inhibitors could be accurately docked by most of the software, while docking of larger compounds (i.e., those with extended aromatic cycles or long aliphatic chains) was more problematic,” the authors wrote, noting that Glide provided the best docking results overall.

Gordon PM, Sensen CW. Seahawk: moving beyond HTML in Web-based bioinformatics analysis. [BMC Bioinformatics 2007, 8:208]: Introduces Seahawk, a MOBY-S client that allows biologists to link together web services “using a data-centric, rather than the customary service-centric approach,” according to the paper’s abstract. The system uses an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements that import existing user data into the MOBY-S format.

Ivan C. Rankenburg, Veit Elser. Protein structure prediction by an iterative search method. [arXiv preprint archive]: Presents a new algorithm, called the difference map, for finding protein conformations that minimize a non-bonded energy function. The algorithm searches for an atomic configuration that lies simultaneously in two constraint spaces: the first is the space of atomic configurations that have a valid peptide geometry, while the second is the space of configurations that have a non-bonded energy below a given target.
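
The idea of searching for a point that satisfies two constraints at once can be sketched on a toy problem. The example below uses the simplest form of the difference-map update (parameter beta fixed at 1) and two lines in the plane as stand-in constraint sets. The projection functions and starting point are illustrative assumptions; the paper's actual projections onto valid peptide geometry and low non-bonded energy are far more involved and are not reproduced here.

```python
def difference_map(project_a, project_b, x, iterations=200):
    """Difference-map iteration with beta = 1.

    Update: x <- x + P_A(2 * P_B(x) - x) - P_B(x).
    At a fixed point, P_B(x) lies in both constraint sets.
    """
    for _ in range(iterations):
        pb = project_b(x)
        reflected = tuple(2 * b - xi for b, xi in zip(pb, x))
        pa = project_a(reflected)
        x = tuple(xi + a - b for xi, a, b in zip(x, pa, pb))
    return project_b(x)


def project_onto_x_axis(p):
    """Toy constraint A: the line y = 0."""
    return (p[0], 0.0)


def project_onto_diagonal(p):
    """Toy constraint B: the line x - y = 1."""
    d = (p[0] - p[1] - 1.0) / 2.0
    return (p[0] - d, p[1] + d)


# The two lines intersect at (1, 0); the iteration should approach it.
solution = difference_map(project_onto_x_axis, project_onto_diagonal, (3.0, 3.0))
```

For two intersecting lines the iterate contracts toward the intersection at every step. With non-convex constraints such as those in protein structure prediction, convergence is not guaranteed and the search can wander before settling.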

Korbel JO, Urban AE, Grubert F, Du J, Royce TE, Starr P, Zhong G, Emanuel BS, Weissman SM, Snyder M, Gerstein MB. Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. [Proc Natl Acad Sci USA. 2007 Jun 12;104(24):10110-5]: Describes an approach, called BreakPtr, for fine-mapping copy number variations. The method statistically integrates sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Availability:

Li IT, Shum W, Truong K. 160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA). [BMC Bioinformatics 2007, 8:185]: Describes a method for accelerating the Smith-Waterman algorithm using FPGA-based hardware that includes a module for computing the score of a single cell of the SW matrix. According to the authors, “these modifications dramatically accelerated the algorithm's computation time by up to 160 fold compared to a pure software implementation running on the same FPGA with an Altera Nios II softprocessor.”
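
To make "the score of a single cell of the SW matrix" concrete, here is a minimal software sketch of the standard Smith-Waterman recurrence with a linear gap penalty. The scoring values and function names are illustrative assumptions and do not describe the paper's hardware design or parameters.

```python
def sw_cell(diag, up, left, a, b, match=2, mismatch=-1, gap=-1):
    """Score of one Smith-Waterman cell from its three neighbours.

    diag, up, left are the scores of the diagonal, upper, and left
    cells; a and b are the residues being compared.
    """
    substitution = match if a == b else mismatch
    # Local alignment: a cell's score never drops below zero.
    return max(0, diag + substitution, up + gap, left + gap)


def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences s and t."""
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = sw_cell(H[i - 1][j - 1], H[i - 1][j], H[i][j - 1],
                              s[i - 1], t[j - 1], match, mismatch, gap)
            best = max(best, H[i][j])
    return best


score = smith_waterman_score("GGACGTCC", "TTACGTAA")
```

Because every cell depends only on its three neighbours, the per-cell computation is small and regular, which is what makes it a natural target for a dedicated hardware module.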

Li Z, Srivastava S, Mittal S, Yang X, Sheng L, Chan C. A Three Stage Integrative Pathway Search (TIPS) framework to identify toxicity relevant genes and pathways. [BMC Bioinformatics 2007, 8:202]: Describes the Three Stage Integrative Pathway Search, or TIPS, approach to reconstruct active pathways involved in conferring a specific phenotype from a limited amount of perturbation data. According to the authors, TIPS can reconstruct active pathways that confer a particular phenotype by integrating gene expression and phenotypic profiles. Availability:

Lu LJ, Sboner A, Huang YJ, Lu HX, Gianoulis TA, Yip KY, Kim PM, Montelione GT, Gerstein MB. Comparing Classical Pathways and Modern Networks: Towards the Development of an Edge Ontology. [Trends Biochem Sci. 2007 Jul;32(7):310-321]: Introduces a prototype edge ontology for use in representing biological pathways in systems biology. According to the authors, “the current edge representation is inadequate to accurately convey all the information in pathways. Therefore, we suggest that a standardized, well-defined, edge ontology is necessary and propose a prototype here, as a starting point for reaching this goal.”

Michel R, Steinmeyer R, Falk M, Harms GS. A new detection algorithm for image analysis of single, fluorescence-labeled proteins in living cells. [Microsc Res Tech. 2007 Jun 7; (e-pub ahead of print)]: Describes a new algorithm for detecting single, fluorescence-labeled proteins in the analysis of images from living cells. The algorithm is “especially suited” for images with very few fluorescence peaks from individual proteins with high background and noise. The algorithm is implemented in Matlab.

Robbe Wunschiers and Martin Vellguth. OrfMapper: A Web-Based Application for Visualizing Gene Clusters on Metabolic Pathway Maps. [arXiv preprint archive]: Presents OrfMapper, a web-based database application for testing whether candidate gene-products are members of known metabolic processes. Availability: 

Salomonis N, Hanspers K, Zambon AC, Vranizan K, Lawlor SC, Dahlquist KD, Doniger SW, Stuart JM, Conklin BR, Pico AR. GenMAPP 2: New Features and Resources for Pathway Analysis. [BMC Bioinformatics 2007, 8:217]: Presents version 2 of Gene Map Annotator and Pathway Profiler, or GenMAPP, which includes a new GenMAPP database schema and integrated resources for pathway analysis. The GenMAPP database has been redesigned to support multiple gene annotations and species as well as custom species database creation for a “potentially unlimited number of species,” according to the authors.

Shatkay H, Höglund A, Brady S, Blum T, Dönnes P, Kohlbacher O. SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data. [Bioinformatics 2007 23(11):1410-1417]: Describes SherLoc, a system for predicting the localization of eukaryotic proteins. SherLoc uses support vector machines to select text sources and features and integrates those with sequence-based features. Availability: 

Toedling J, Sklyar O, Huber W. Ringo — an R/Bioconductor package for analyzing ChIP-chip readouts. [BMC Bioinformatics 2007, 8:221]: Presents Ringo, a free, open-source R package for analyzing ChIP-chip data.

Weniger M, Engelmann JC, Schultz J. Genome Expression Pathway Analysis Tool - Analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context. [BMC Bioinformatics 2007, 8:179]: Describes GEPAT, the Genome Expression Pathway Analysis Tool, which integrates statistical methods and data analysis with a biological interpretation for subsets of probes or single probes on the chip. Availability:

Yu GX, Snyder EE, Boyle SM, Crasta OR, Czar M, Mane SP, Purkayastha A, Sobral B, Setubal JC. A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case. [Nucleic Acids Research 2007 35(12):3953-3962]: Describes a bacterial genome computational-analysis pipeline called GenVar. The pipeline is based on the GeneWise program and is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames and those with insertions and deletions.

