Note: In addition to the listing below, papers for Nucleic Acids Research’s annual database issue, which will be published in January, are available under advance access here.
Cheng H, Sen TZ, Jernigan RL, Kloczkowski A. Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: Combining GOR V and Fragment Database Mining (FDM). [Bioinformatics 2007 23(19):2628-2630]: Describes an approach to overcome the 80 percent prediction accuracy barrier in protein secondary structure prediction. Instead of using a single algorithm that relies on a limited data set for training, the method combines two complementary methods with different strengths: Fragment Database Mining and the Garnier-Osguthorpe-Robson V algorithm. FDM uses known protein structures in the Protein Data Bank to predict secondary structure for sequentially similar structural fragments, while GOR V is based on information theory, Bayesian statistics, and PSI-Blast multiple sequence alignments. Availability: http://gor.bb.iastate.edu/cdm.
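As a rough illustration of combining two complementary predictors, the sketch below picks, residue by residue, between an FDM-style call and a GOR V-style call. The function name, the per-residue confidence list, and the 0.5 threshold are all hypothetical; this is not the combination rule published by the authors, only a minimal example of a consensus scheme.

```python
def consensus_prediction(fdm_states, fdm_confidence, gor_states, threshold=0.5):
    """Combine two per-residue secondary structure strings (H, E, C).

    Hypothetical rule: trust the fragment-based call where its confidence
    reaches `threshold`, and fall back to the GOR-style call elsewhere.
    """
    if not (len(fdm_states) == len(fdm_confidence) == len(gor_states)):
        raise ValueError("all inputs must cover the same residues")
    return "".join(
        fdm if conf >= threshold else gor
        for fdm, conf, gor in zip(fdm_states, fdm_confidence, gor_states)
    )


# Toy input: made-up predictions for a six-residue stretch.
combined = consensus_prediction(
    "HHHEEC", [0.9, 0.8, 0.2, 0.1, 0.7, 0.6], "HCCCEE"
)
print(combined)
```

In this toy case the combined string differs from both inputs, since each predictor is used only where the assumed confidence favors it.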
Edwards RJ, Davey NE, Shields DC. SLiMFinder: A Probabilistic Method for Identifying Over-Represented, Convergently Evolved, Short Linear Motifs in Proteins. [PLoS ONE. 2007 Oct 3;2(10):e967]: Discusses SLiMFinder, a software package for analyzing short linear motifs, or SLiMs, in proteins. SLiMFinder includes two algorithms: SLiMBuild, which identifies convergently evolved, short motifs in a dataset of proteins; and SLiMChance, which estimates the probability of returned motifs arising by chance, corrects for the size and composition of the dataset, and assigns a significance value to each motif. Availability: http://bioinformatics.ucd.ie/shields/software/slimfinder/.
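The idea of estimating how likely a motif is to arise by chance can be sketched with a simple binomial tail. The function below is an assumption-laden illustration, not SLiMChance itself: it takes the per-sequence chance of a match as a given input (a hypothetical parameter) and applies none of the corrections for dataset size and composition that the paper describes.

```python
from math import comb


def chance_of_support(n_sequences, n_hits, p_per_sequence):
    """Probability that a motif matches at least n_hits of n_sequences
    by chance, assuming independent sequences that each match with
    probability p_per_sequence (binomial upper tail)."""
    if not 0 <= n_hits <= n_sequences:
        raise ValueError("n_hits must lie between 0 and n_sequences")
    if not 0.0 <= p_per_sequence <= 1.0:
        raise ValueError("p_per_sequence must be a probability")
    q = 1.0 - p_per_sequence
    return sum(
        comb(n_sequences, i) * p_per_sequence**i * q ** (n_sequences - i)
        for i in range(n_hits, n_sequences + 1)
    )


# Toy numbers: a motif seen in 4 of 20 sequences, with an assumed
# 5 percent chance of a random match in any one sequence.
print(chance_of_support(20, 4, 0.05))
```

A small value suggests the motif is over-represented relative to the assumed background rate; a real analysis would also need to account for the number of motifs tested.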
Eppley JM, Tyson GW, Getz WM, Banfield JF. Strainer: Software for analysis of population variation in community genomic datasets. [BMC Bioinformatics. 2007 Oct 17;8(1):398]: Presents Strainer, a software package for analyzing and visualizing genetic variation in populations and reconstructing strain variants from otherwise co-assembled sequences.
Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ. Semi-supervised learning for peptide identification from shotgun proteomics datasets. [Nat Methods. 2007 Nov;4(11):923-5]: Describes an algorithm, called Percolator, for improving peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications. According to the authors, it can correctly assign peptides to 17 percent more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77 percent more spectra from non-tryptic digests, compared to a fully supervised approach.
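The role of decoy identifications can be illustrated with a common target-decoy estimate of the false discovery rate at a score threshold. This sketch does not reproduce Percolator's learning procedure; the function name and the toy scores are hypothetical, and it assumes equally sized target and decoy databases so that decoy matches approximate the number of incorrect target matches.

```python
def estimated_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate among target matches scoring at
    or above `threshold` as (accepted decoys) / (accepted targets)."""
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)


# Toy scores: higher means a better spectrum-to-peptide match.
targets = [9.0, 8.0, 7.0, 6.0, 3.0]
decoys = [6.5, 2.0, 1.0]
print(estimated_fdr(targets, decoys, threshold=6.0))
```

A better discriminating score lets more target matches through at the same estimated error rate, which is the kind of gain the entry above reports.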
Kreuz M, Rosolowski M, Berger H, Schwaenen C, Wessendorf S, Loeffler M, Hasenclever D. Development and Implementation of an Analysis Tool for Array-based Comparative Genomic Hybridization. [Methods Inf Med. 2007;46(5):608-13]: Introduces aCGHPipeline, an R-based analysis pipeline for array comparative genomic hybridization data that supports single- and multi-chip analyses.
Kuentzer J, Backes C, Blum T, Gerasch A, Kaufmann M, Kohlbacher O, Lenhof HP. BNDB — The Biochemical Network Database. [BMC Bioinformatics. 2007 Oct 2;8(1):367]: Introduces the Biochemical Network Database, a relational database platform for the semantic integration of external databases. BNDB is built upon an extensible object model called BioCore, which can model “most known biochemical processes” and is easily extensible to new biological concepts, according to the paper’s abstract. Availability: http://www.bndb.org.
Lombardot T, Kottmann R, Giuliani G, de Bono A, Addor N, Glockner FO. MetaLook: a 3D visualisation software for marine ecological genomics. [BMC Bioinformatics. 2007 Oct 22;8(1):406]: Introduces MetaLook, a software package for visualizing and analyzing marine ecological genomic and metagenomic data. MetaLook includes a 3D user interface for visualizing DNA sequences on a world map, based on a centralized geo-referenced database. Availability: http://www.megx.net/metalook/index.php?navi=documentation.
Nagarajan V, Elasri MO. SAMMD: Staphylococcus aureus Microarray Meta-Database. [BMC Genomics. 2007 Oct 2;8(1):351]: Discusses the Staphylococcus aureus Microarray meta-database, or SAMMD, which includes data from all the published transcriptional profiles for S. aureus. SAMMD helps researchers perform comparative studies of transcriptional profiles and allows users to use ORF IDs to search for all the regulatory mutants or growth conditions in which the query gene's expression is altered. Availability: http://www.bioinformatics.org/sammd/.
Negi SS, Schein CH, Oezguen N, Power TD, Braun W. InterProSurf: a web server for predicting interacting sites on protein surfaces. [Bioinformatics. 2007 Oct 12; (e-pub ahead of print)]: Describes a web server called InterProSurf, which predicts the amino acid residues in a protein that are most likely to interact with other proteins, given the 3D structures of the subunits of a protein complex. The prediction method is based on the solvent-accessible surface area of residues in the isolated subunits, a propensity scale for interface residues, and a clustering algorithm to identify surface regions with residues of high interface propensities, according to the paper’s abstract. Availability: http://curie.utmb.edu/.
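The three ingredients named in the abstract (surface accessibility, an interface propensity scale, and clustering) can be sketched as follows. Everything specific here is an assumption made for illustration: the residue records, the accessibility cutoff, the 8 Å contact distance, and the single-linkage grouping are hypothetical and are not taken from the InterProSurf paper.

```python
from math import dist


def rank_surface_patches(residues, sasa_cutoff=10.0, contact_distance=8.0):
    """Group exposed residues into spatial patches and rank the patches
    by summed interface propensity.

    residues: list of dicts with keys 'name', 'xyz', 'sasa', 'propensity'.
    Returns a list of (score, residue_names), highest score first.
    """
    # Keep only residues exposed enough to count as surface residues.
    surface = [r for r in residues if r["sasa"] >= sasa_cutoff]
    unvisited = set(range(len(surface)))
    patches = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        # Single-linkage growth: add any residue close to a patch member.
        while stack:
            i = stack.pop()
            near = [
                j for j in unvisited
                if dist(surface[i]["xyz"], surface[j]["xyz"]) <= contact_distance
            ]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
            members.extend(near)
        score = sum(surface[m]["propensity"] for m in members)
        patches.append((score, sorted(surface[m]["name"] for m in members)))
    return sorted(patches, reverse=True)


# Toy subunit: A3 is buried, B1 is exposed but far from A1 and A2.
toy = [
    {"name": "A1", "xyz": (0.0, 0.0, 0.0), "sasa": 50.0, "propensity": 1.5},
    {"name": "A2", "xyz": (5.0, 0.0, 0.0), "sasa": 40.0, "propensity": 0.5},
    {"name": "A3", "xyz": (2.0, 0.0, 0.0), "sasa": 2.0, "propensity": 2.0},
    {"name": "B1", "xyz": (30.0, 0.0, 0.0), "sasa": 60.0, "propensity": 0.5},
]
ranked = rank_surface_patches(toy)
print(ranked)
```

In the toy input the buried residue is ignored despite its high propensity, and the two neighboring exposed residues form the top-ranked patch.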
Seelow D, Hoffmann K, Lindner TH. AssociationDB: web-based exploration of genomic association. [Bioinformatics 2007 23(19):2643-2644]: Describes a graphical, web-based system called AssociationDB for analyzing the results of case-control studies alongside related gene information and tissue-specific expression data. Association results are presented as physical position-based vertical bars with known genes included as horizontal bars at their respective physical positions. Availability: http://genetik.charite.de/AssociationDB.
Thibaud-Nissen F, Campbell M, Hamilton JP, Zhu W, Buell CR. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome. [BMC Genomics. 2007 Oct 25;8(1):388]: Discusses the Eukaryotic Community Annotation Package, or EuCAP, an annotation tool that has been applied to the rice genome. Availability: http://sourceforge.net/projects/eucap/.