Note: In addition to the listing below, papers for Nucleic Acids Research’s annual database issue are available here.
Bryan K, Brennan L, Cunningham P. MetaFIND: A feature analysis tool for metabolomics data. [BMC Bioinformatics. 2008 Nov 5;9(1):470]: Describes Metabolomics Feature Interrogation and Discovery, or MetaFIND, an application for “post-feature selection” correlation analysis of metabolomics data. According to the paper’s abstract, factors such as experimental noise, choice of technique, and threshold selection “may adversely affect the set of selected features retrieved” in metabolomics studies, and “the high dimensionality and multi-collinearity inherent within metabolomics data may exacerbate discrepancies between the set of features retrieved and those required to provide a complete explanation of metabolite signatures.” In the paper, the authors demonstrate how MetaFIND can be used to determine metabolite signatures from a set of features selected by diverse techniques over two metabolomics data sets. Available here.
Chiang DY, Getz G, Jaffe DB, O'Kelly MJ, Zhao X, Carter SL, Russ C, Nusbaum C, Meyerson M, Lander ES. High-resolution mapping of copy-number alterations with massively parallel sequencing. [Nat Methods. 2008 Nov 30 (e-pub ahead of print)]: Presents SegSeq, a method for detecting copy-number alterations in massively parallel sequencing data. SegSeq can also estimate the breakpoints of copy-number alterations.
Christley S, Lu Y, Li C, Xie X. Human genomes as email attachments. [Bioinformatics. 2008 Nov 7 (e-pub ahead of print)]: Discusses the use of advanced compression techniques “as part of a standard data format for genomic data,” according to the paper’s abstract. “The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs,” the authors write. In the paper, they describe several techniques that, in combination, reduce a single genome to a size small enough to be sent as an email attachment. Available here.
Huntley D, Tang YM, Nesterova TB, Butcher S, Brockdorff N. Genome Environment Browser (GEB): a dynamic browser for visualising high-throughput experimental data in the context of genome features. [BMC Bioinformatics. 2008 Nov 27;9(1):501]: Describes the Genome Environment Browser, or GEB, a browser for visualizing data from large-scale, high-throughput experiments within the context of repeat sequence features in the genome. The browser includes dynamic scales that are adjustable in real time, “which enables scanning of large regions of the genome as well as detailed investigation of local regions on the same page without the need to load new pages,” according to the paper’s abstract. Available here.
Jensen JH, Hoeg-Jensen T, Padkjær SB. Building a BioChemformatics Database. [J Chem Inf Model. 2008 Nov 21 (e-pub ahead of print)]: Describes an approach for creating a “biocheminformatics database” that includes both sequence and chemistry information. According to the authors, the registration and search of chemically modified macromolecules “has so far posed formidable challenges performance-wise, since today's chemistry-oriented databases do not scale well to macromolecules.” In response, they developed protein format extensions and used pseudoatoms for representing natural amino acids in chemical structures in order to enable the registration and retrieval of large macromolecules.
Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH. An integrated software system for analyzing ChIP-chip and ChIP-seq data. [Nat Biotechnol. 2008 Nov;26(11):1293-300]: Introduces CisGenome, a software system for analyzing genome-wide chromatin immunoprecipitation data. CisGenome performs visualization, data normalization, peak detection, false discovery rate computation, gene-peak association, and sequence and motif analysis for both ChIP-chip and ChIP-seq studies. Available here.
Jones TR, Kang IH, Wheeler DB, Lindquist RA, Papallo A, Sabatini DM, Golland P, Carpenter AE. CellProfiler Analyst: data exploration and analysis software for complex image-based screens. [BMC Bioinformatics 2008, 9:482]: Describes CellProfiler Analyst, an open-source software package for analyzing multidimensional data from image-based screens, which can produce hundreds of features for hundreds of millions of individual cells in a single experiment. The system offers automated scoring for complex phenotypes that require combinations of multiple measured features per cell. Available here.
Mann B, Madera M, Sheng Q, Tang H, Mechref Y, Novotny MV. ProteinQuant Suite: a bundle of automated software tools for label-free quantitative proteomics. [Rapid Commun Mass Spectrom. 2008 Nov 4;22(23):3823-3834]: Describes ProteinQuant Suite, a set of tools for evaluating and quantifying high-throughput label-free quantitative proteomic data. The suite includes three standalone utilities: ProtParser, ProteinQuant, and Turbo RAW2MGF. ProtParser is a filtering utility; ProteinQuant performs quantification; and Turbo RAW2MGF enables ProteinQuant Suite to collect data from different types of mass spectrometers.
Ondov BD, Varadarajan A, Passalacqua KD, Bergman NH. Efficient mapping of Applied Biosystems SOLiD sequence data to a reference genome for functional genomic applications. [Bioinformatics. 2008 Dec 1;24(23):2776-7]: Describes SOCS, or short oligonucleotide color space, a program for mapping data from the Applied Biosystems SOLiD sequencer onto a reference genome. SOCS performs mapping within the context of color space, and “maximizes usable data by allowing a user-specified number of mismatches,” according to the paper’s abstract. Available here.
Pavlopoulos GA, O'Donoghue SI, Satagopam VP, Soldatos TG, Pafilis E, Schneider R. Arena3D: visualization of biological networks in 3D. [BMC Syst Biol. 2008 Nov 28;2(1):104]: Introduces a visualization tool called Arena3D, which uses the concept of “staggered layers” to represent biological networks in three-dimensional space. Related data, such as proteins, chemicals, or pathways, can be grouped onto separate layers and arranged via layout algorithms, such as Fruchterman-Reingold, distance geometry, and a novel hierarchical layout, according to the paper’s abstract. Data on a layer can be clustered via k-means, affinity propagation, Markov clustering, neighbor joining, hierarchical clustering, or UPGMA (unweighted pair-group method with arithmetic mean). Available here.
Reynolds C, Damerell D, Jones S. ProtorP: A Protein-Protein Interaction Analysis Server. [Bioinformatics. 2008 Nov 11 (e-pub ahead of print)]: Discusses PROTORP, a web server that analyses protein-protein associations in 3D structures. The server calculates a series of physical and chemical parameters — such as size and shape, intermolecular bonding, residue and atom composition, and secondary structure contributions — of the protein interaction sites that contribute to the binding energy of the association. The properties that are calculated can be compared with parameter distributions for datasets of different classes of protein associations. Available here.
Starcevic A, Zucko J, Simunkovic J, Long PF, Cullum J, Hranueli D. ClustScan: an integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structure. [Nucleic Acids Res. 2008 Dec;36(21):6882-92]: Presents Cluster Scanner, or ClustScan, a program for annotating DNA sequences encoding modular biosynthetic enzymes including polyketide synthases, non-ribosomal peptide synthetases, and hybrid enzymes. The program displays the predicted chemical structures of products and enables export of the structures in a standard format for analyses with other programs. Available here.