Andronescu M, Bereg V, Hoos HH, Condon A. RNA STRAND: The RNA Secondary Structure and Statistical Analysis Database. [BMC Bioinformatics 2008, 9:340]: Describes the RNA secondary structure and statistical analysis database, or RNA STRAND, a curated database of known secondary structures of “any type and organism,” according to the paper’s abstract. Available here.
Cui J, Liu Q, Puett D, Xu Y. Computational Prediction of Human Proteins That Can Be Secreted into the Bloodstream. [Bioinformatics. 2008 Aug 12. (e-pub ahead of print)]: Presents a computational method for predicting which proteins from highly expressed genes in diseased human tissues, such as cancers, can be secreted into the bloodstream, “suggesting possible marker proteins for follow-up serum proteomic studies,” according to the paper’s abstract. The authors collected, through literature searches, human proteins that are known to be secreted into the bloodstream due to various pathological conditions as detected by previous proteomic studies and then identified a list of features such as signal peptides, transmembrane domains, glycosylation sites, disordered regions, secondary structural content, hydrophobicity, and polarity measures that show relevance to protein secretion. Using these features, they trained a support vector machine-based classifier to predict protein secretion to the bloodstream. Available here.
De Bona F, Ossowski S, Schneeberger K, Rätsch G. Optimal spliced alignments of short sequence reads. [Bioinformatics 2008 24(16):i174-i180]: Describes an approach called QPALMA that computes accurate spliced alignments of short reads from next-generation sequencing platforms. The method uses the read's quality information as well as computational splice site predictions. It uses a training set of spliced reads with quality information and known alignments and a “large margin approach similar to support vector machines to estimate its parameters to maximize alignment accuracy,” according to the paper’s abstract.
Filangi O, Beausse Y, Assi A, Legrand L, Larré JM, Martin V, Collin O, Caron C, Leroy H, Allouche D. BioMAJ: a flexible framework for databanks synchronization and processing. [Bioinformatics 2008 24(16):1823-1825]: Describes BioMAJ, an automated environment for managing biological databases. BioMAJ is a Java application that automates the data update cycle process and supervises locally mirrored data repositories. Available here.
Ganesan H, Rakitianskaia AS, Davenport CF, Tümmler B, Reva ON. The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage. [BMC Bioinformatics 2008, 9:333]: Introduces SeqWord Genome Browser, an applet that can be used to visualize the natural compositional variation of DNA sequences. It can also be used to identify divergent genomic regions in annotated sequences of bacterial chromosomes, plasmids, phages, and viruses, as well as in raw DNA sequences prior to annotation, by comparing local and global oligonucleotide usage patterns. Available here.
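As a rough illustration of what comparing local and global oligonucleotide usage can look like, here is a minimal sketch. It is not the SeqWord implementation: the function names, the word length `k`, the window and step sizes, and the distance measure (sum of absolute frequency differences) are all assumptions chosen for brevity.

```python
from collections import Counter
from itertools import product


def kmer_freqs(seq, k=4):
    """Relative frequencies of all 4**k DNA words in seq (overlapping words).

    Hypothetical helper; words containing characters other than A/C/G/T
    are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def window_divergence(genome, window=5000, step=2500, k=4):
    """Return (window_start, distance) pairs along the sequence.

    The distance is the sum of absolute differences between the word
    frequencies of the window (local) and of the whole sequence (global).
    This measure is an assumption made for this sketch.
    """
    global_f = kmer_freqs(genome, k)
    results = []
    for start in range(0, len(genome) - window + 1, step):
        local_f = kmer_freqs(genome[start:start + window], k)
        dist = sum(abs(local_f[w] - global_f[w]) for w in global_f)
        results.append((start, dist))
    return results
```

Windows with the largest distance are candidates for atypical regions; in practice a threshold would have to be calibrated per genome.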
Ji S, Sun L, Jin R, Kumar S, Ye J. Automated annotation of Drosophila gene expression patterns using a controlled vocabulary. [Bioinformatics 2008 24(17):1881-1888]: Introduces a computational framework for automatically annotating gene expression patterns in in situ hybridization data using a controlled vocabulary. In currently available high-throughput data, “annotation terms are assigned to groups of patterns rather than to individual images,” the authors note in the abstract. “We propose to extract invariant features from images, and construct pyramid match kernels to measure the similarity between sets of patterns.”
Jiang H, Wong WH. SeqMap: mapping massive amounts of oligonucleotides to the genome. [Bioinformatics. 2008 Aug 12. (e-pub ahead of print)]: Introduces SeqMap, a software tool for mapping large amounts of short sequences to the genome that is “designed for finding all the places in a reference genome where each sequence may come from,” according to the paper’s abstract. The authors claim that SeqMap can map tens of millions of short sequences to a genome of several billion nucleotides. Available here.
Lehmann J, Stadler PF, Prohaska SJ. SynBlast: assisting the analysis of conserved synteny information. [BMC Bioinformatics 2008, 9:351]: Describes SynBlast, a computational pipeline that constructs and evaluates local synteny information by using the genomic region around a focal reference gene to retrieve candidates for homologous regions from a collection of target genomes and then rank them based on the available evidence for homology. Available here.
Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. [Genome Res. 2008 Aug 19. (e-pub ahead of print)]: Discusses the software MAQ, which assembles genomes by mapping short reads from next-generation sequencing platforms to a reference genome. The method is based on the concept of “mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm,” according to the paper’s abstract. Available here.
Marchisio MA, Stelling J. Computational design of synthetic gene circuits with composable parts. [Bioinformatics 2008 24(17):1903-1910]: Describes a method for designing genetic circuits with composable parts that is based on concepts in the MIT Registry of Standard Biological Parts. The authors note in the paper’s abstract that gene expression requires four kinds of signal carriers: RNA polymerases, ribosomes, transcription factors, and environmental “messages,” which are inducers or corepressors. “The flux of each of these types of molecules is a quantifiable biological signal exchanged between parts,” which can be modeled using ordinary differential equations. According to the abstract, these ODEs are integrated into a software tool called ProMoT, or Process Modeling Tool. Available upon request from the authors.
Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Banyai L, Patthy L. Identification and correction of abnormal, incomplete, and mispredicted proteins in public databases. [BMC Bioinformatics 2008, 9:353]: Presents an approach called MisPred that can be used to identify incomplete, abnormal, or mispredicted entries in publicly available biological databases. MisPred uses five routines to flag these entries “based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins,” according to the paper’s abstract.
Samal BB, Eiden LE. pathFinder: a static network analysis tool for pharmacological analysis of signal transduction pathways. [Sci Signal. 2008 Aug 5;1(31):pt4]: Discusses pathFinder, a software tool that can find signal transduction pathways between a number of messengers and their targets within the cell. The software “can identify qualitatively all possible signal transduction pathways connecting any starting component and target within a database of two-component pathways,” according to the abstract. Available here.
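The idea of finding every pathway between a starting component and a target in a database of two-component pathways can be sketched as a depth-first search over a directed graph. This is a minimal illustration only, not the tool's actual algorithm: the function name `all_pathways`, the representation of the database as `(source, target)` tuples, and the restriction to cycle-free paths are all assumptions.

```python
from collections import defaultdict


def all_pathways(pairs, start, target):
    """Enumerate all cycle-free paths from start to target.

    `pairs` is a hypothetical stand-in for a database of two-component
    pathways: an iterable of (source, destination) tuples.
    """
    graph = defaultdict(list)
    for src, dst in pairs:
        graph[src].append(dst)

    paths = []

    def dfs(node, path):
        if node == target:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip components already visited on this path
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths


# Toy example with made-up component names
pairs = [
    ("ligand", "receptor"),
    ("receptor", "kinaseA"),
    ("receptor", "kinaseB"),
    ("kinaseA", "TF"),
    ("kinaseB", "TF"),
    ("TF", "receptor"),  # feedback edge; never followed twice on one path
]
for p in all_pathways(pairs, "ligand", "TF"):
    print(" -> ".join(p))
```

The number of simple paths can grow exponentially with network size, so an exhaustive search like this is practical only for modest networks or with a cap on path length.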
Zhang F, Liu J, Chen J, Deng HW. HAPSIMU: a genetic simulation platform for population-based association studies. [BMC Bioinformatics 2008, 9:331]: Describes a genetic simulation platform called HAPSIMU that can simulate heterogeneous populations with various known and controllable structures. The platform is intended for evaluating the impact of population structure on population-based association studies of human diseases and for comparing the performance of various population structure identification methods. HAPSIMU can simulate both qualitative and quantitative traits using an additive genetic model, according to the paper’s abstract.