Bioinformatics Tool-Related Papers of Note, January 2008

Christley S, Lobo NF, Madey G. Multiple organism algorithm for finding ultraconserved elements. [BMC Bioinformatics 2008, 9:15]: Introduces an algorithm that can find all of the ultraconserved sequences between genomes of multiple organisms. The authors define ultraconserved elements as nucleotide or protein sequences with 100 percent identity in the same organism or between two or more organisms. The algorithm uses a combinatorial approach “that finds all sequences without requiring the genomes to be aligned,” the authors note. “The algorithm is significantly faster than BLAST and is designed to handle very large genomes efficiently.” The algorithm is available as part of the BioCocoa library here.
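
To make the idea of alignment-free exact matching concrete, here is a minimal Python sketch. It is not the authors' combinatorial algorithm or the BioCocoa API: it simply indexes every fixed-length substring (k-mer) of each sequence in a hash table and keeps those present in all of them. The function name, the toy sequences, and the choice of k are assumptions made for illustration.

```python
from collections import defaultdict

def shared_kmers(genomes, k):
    """Return each k-mer found in every genome, with its start positions per genome.

    Illustrative only: real ultraconserved-element finders report maximal
    shared sequences, not fixed-length seeds.
    """
    index = []
    for seq in genomes:
        positions = defaultdict(list)
        for i in range(len(seq) - k + 1):
            positions[seq[i:i + k]].append(i)
        index.append(positions)

    # Keep only k-mers that occur in all genomes (100 percent identity).
    common = set(index[0])
    for positions in index[1:]:
        common.intersection_update(positions)

    return {kmer: [positions[kmer] for positions in index] for kmer in common}

# Hypothetical toy sequences standing in for genomes.
toy = ["ACGTACGTTT", "TTACGTAC", "GGACGTACC"]
print(shared_kmers(toy, 6))
```

Adjacent or overlapping shared k-mers could then be merged into longer exact matches, which is where a production algorithm would do most of its work.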

Doring A, Weese D, Rausch T, Reinert K. SeqAn - An efficient, generic C++ library for sequence analysis. [BMC Bioinformatics 2008, 9:11]: Introduces SeqAn, a library of data types and algorithms for sequence analysis in computational biology. SeqAn includes implementations of algorithmic components in order to provide “a sound basis for algorithm testing and development,” according to the paper’s abstract. SeqAn is available here.

Evans TW, Gillespie CS, Wilkinson DJ. The SBML discrete stochastic models test suite. [Bioinformatics 2008 24(2):285-286]: Introduces a test suite of stochastic models that can be used to check the accuracy of a stochastic simulator. The suite includes stochastic models that have been solved either analytically or by using numerical methods, which allows the accuracy of simulators to be tested against known results. The test suite is available here.
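
The kind of check such a suite enables can be sketched in Python as follows. This is an illustration, not code from the test suite: it simulates a linear birth-death process with the Gillespie direct method and compares the sample mean with the standard analytic mean X0 * exp((birth - death) * t). The function names, parameter values, and the z-score criterion are all assumptions chosen for the example.

```python
import math
import random

def simulate_birth_death(x0, birth, death, t_end, rng):
    """Gillespie direct method; returns the population size at time t_end."""
    t, x = 0.0, x0
    while x > 0:
        total_rate = (birth + death) * x
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() < birth / (birth + death):
            x += 1
        else:
            x -= 1
    return x

def check_mean(x0=100, birth=0.1, death=0.11, t_end=10.0, runs=2000, seed=1):
    """Compare the simulated mean with the analytic mean of the process."""
    rng = random.Random(seed)
    samples = [simulate_birth_death(x0, birth, death, t_end, rng) for _ in range(runs)]
    mean = sum(samples) / runs
    var = sum((s - mean) ** 2 for s in samples) / (runs - 1)
    expected = x0 * math.exp((birth - death) * t_end)
    z = (mean - expected) / math.sqrt(var / runs)  # standardized discrepancy
    return mean, expected, z

mean, expected, z = check_mean()
print(f"simulated mean {mean:.2f}, analytic mean {expected:.2f}, z = {z:.2f}")
```

A simulator whose z-score is repeatedly far from zero across many such models would be suspect; a single run inside a few standard errors is only weak evidence of correctness.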

Milenkovic T, Lai J, Przulj N. GraphCrunch: a tool for large network analyses. [BMC Bioinformatics 2008, 9:70]: Describes GraphCrunch, a software tool for analyzing large biological networks and comparing them against random graph models according to various network structural similarity measures. The software computes several standard global network measures “and thus supports the largest variety [of network measures],” the authors note, adding that GraphCrunch is also “the first software tool that compares real-world networks against a series of network models.” GraphCrunch is available here.

Nagarajan N, Keich U. FAST: Fourier transform based algorithms for significance testing of ungapped multiple alignments. [Bioinformatics. 2008 Jan 6 (e-pub ahead of print)]: Describes the FAST (Fourier transform based Algorithms for Significance Testing) package, an open-source collection of programs and libraries for computing the significance of ungapped local alignments. The package includes C++ implementations of various algorithms that can be used as stand-alone programs or as a library of subroutines. FAST is available here.

Stanislaus R, Arthur JM, Rajagopalan B, Moerschell R, McGlothlen B, Almeida JS. An open-source representation for 2-DE-centric proteomics and support infrastructure for data storage and analysis. [BMC Bioinformatics 2008, 9:4]: Discusses a data standard for two-dimensional gel electrophoresis called AGML (Annotated Gel Markup Language). The authors describe a public repository, called AGML Central, which includes a suite of tools for converting a variety of formats, as well as web-based visualization tools. AGML Central is available here.

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. [Bioinformatics. 2008 Jan 24 (e-pub ahead of print)]: Describes Augustus, a program that predicts genes in eukaryotic genomic sequences using several different evidence sources, including gene and transcript annotations from related species syntenically mapped to the target genome; evolutionary conservation of DNA; mRNA and ESTs of the target species; and retroposed genes. Augustus is available here.

Teyra J, Paszkowski-Rogacz M, Anders G, Pisabarro MT. SCOWLP classification: Structural comparison and analysis of protein binding regions. [BMC Bioinformatics 2008, 9:9]: Introduces a classification system for protein-binding regions that is based on agglomerative hierarchical clustering. “We use an accurate similarity index to compare binding regions in combination with the complete-linkage method to aggregate clusters,” the authors note in the paper’s abstract. “Complete-linkage is more suitable to isolate poorly separated binding regions than other standard methods because it increases the differences among clusters.” This classification system is implemented in the SCOWLP (Structural Characterization of Water, Ligands and Proteins) database and extends the SCOP classification with three additional family sub-levels: Binding Region, Interface, and Domain Contact. The SCOWLP web application is available here.
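
The effect of the linkage choice can be seen in a small Python sketch of agglomerative clustering with complete linkage. It is not the SCOWLP implementation: it works on a plain distance matrix (for example, one minus a similarity index), and the function name, the toy distances, and the stopping threshold are assumptions for illustration.

```python
def complete_linkage(dist, threshold):
    """Agglomerative clustering that stops once the closest pair of clusters
    is farther apart than `threshold` under complete linkage."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two most dissimilar members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical distances: items 1 and 2 are moderately close (0.4),
# but items 0 and 3 are far from everything outside their own pair.
toy = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.4, 0.9],
    [0.9, 0.4, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(complete_linkage(toy, 0.5))  # [[0, 1], [2, 3]]
```

With single linkage and the same threshold, the 0.4 distance between items 1 and 2 would chain all four items into one cluster; complete linkage keeps the two groups apart because their most dissimilar members are 0.9 apart.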

Vareková RS, Bradác I, Plchút M, et al. A new program for analyzing RNA Interference. [Comput Methods Programs Biomed. 2008 Jan 18 (e-pub ahead of print)]: Describes a new software tool, called RNA Workbench, for designing short interfering RNAs. In addition to “standard selection rules,” the software enables researchers to statistically analyze applied selection rules, the authors note in the paper’s abstract. RNA Workbench is available here.

White JR, Roberts M, Yorke JA, Pop M. Figaro: A novel statistical method for vector sequence removal. [Bioinformatics. 2008 Jan 17 (e-pub ahead of print)]: Describes Figaro, a software tool for identifying and removing the vector sequence from raw sequence data without prior knowledge of the vector sequence. The authors note in the paper’s abstract that sequences from Sanger sequencing machines frequently contain fragments of the cloning vector on their ends, but current software tools for identifying and removing the vector sequence “require knowledge of the vector sequence, specific splice sites, and any adapter sequences used in the experiment — information often omitted from public databases.” Figaro automatically infers the vector sequence by analyzing the frequency of occurrence of short oligo-nucleotides using Poisson statistics. Figaro is available as part of the AMOS package here.
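
One way to picture frequency-based vector detection is the Python sketch below. It is not Figaro's actual procedure or the AMOS interface: it counts short k-mers in a window at the start of each read, estimates how often each would be expected there from its frequency across the whole read set, and flags k-mers whose observed count is improbably high under a Poisson model. The function names, window size, significance cutoff, and synthetic reads are all assumptions.

```python
import math
import random
from collections import Counter

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by direct summation."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def overrepresented_prefix_kmers(reads, k, window, alpha):
    """Flag k-mers that are unexpectedly frequent in the first `window` bases."""
    prefix, overall = Counter(), Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            overall[kmer] += 1
            if i + k <= window:
                prefix[kmer] += 1
    total_prefix = sum(prefix.values())
    total_overall = sum(overall.values())
    flagged = {}
    for kmer, observed in prefix.items():
        # Expected prefix count if the k-mer were spread evenly over the reads.
        lam = total_prefix * overall[kmer] / total_overall
        p_value = poisson_sf(observed, lam)
        if p_value < alpha:
            flagged[kmer] = p_value
    return flagged

def make_demo_reads(n=50, insert_len=60, vector="GATTACA", seed=0):
    """Synthetic reads: a hypothetical vector fragment followed by random bases."""
    rng = random.Random(seed)
    return [vector + "".join(rng.choice("ACGT") for _ in range(insert_len))
            for _ in range(n)]

print(sorted(overrepresented_prefix_kmers(make_demo_reads(), k=5, window=7, alpha=1e-6)))
```

On real data the flagged k-mers would still need to be assembled into a vector boundary for each read, and the background model would have to account for genuine repeats in the insert.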

