Arita M, Suwa K. Search extension transforms Wiki into a relational system: a case for flavonoid metabolite database. [BioData Min. 2008 Sep 17;1(1):7]: Describes a set of embeddable string-search commands for Wiki-based systems that can enable unstructured text entries to have some of the structure of a relational database. As proof of principle, the authors implemented a flavonoid database with 6,902 molecular structures from more than 1,687 plant species on MediaWiki, the background system of Wikipedia. Registered users can describe information in an arbitrary format, but the structured elements are “subject to text-string searches to realize relational operations,” according to the paper’s abstract.
Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: A Feature Density Estimator for High-Throughput Sequence Tags. [Bioinformatics. 2008 Sep 10. (e-pub ahead of print)]: Describes F-Seq, a software package that generates a continuous tag sequence density estimation for high-throughput sequencing that enables users to identify specific sequence features such as transcription factor binding sites or regions of open chromatin. Available here.
Brohée S, Faust K, Lima-Mendez G, Vanderstocken G, van Helden J. Network Analysis Tools: from biological networks to clusters and pathways. [Nat Protoc. 2008;3(10):1616-29]: Discusses Network Analysis Tools, or NeAT, a suite of tools for analyzing biological networks, including tasks such as comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering; and path finding. The tools are “interconnected to enable a stepwise analysis of the network through a complete analytical workflow,” the authors write in the paper’s abstract. Available here.
Creighton CJ, Nagaraja AK, Hanash SM, Matzuk MM, Gunaratne PH. A bioinformatics tool for linking gene expression profiling results with public databases of microRNA target predictions. [RNA. 2008 Sep 23.]: Presents a desktop software application that, for a given target prediction database, retrieves all microRNA:mRNA functional pairs represented by an experimentally derived set of genes. For each microRNA, the software computes an enrichment statistic for overrepresentation of predicted targets within the gene set, “which could help to implicate roles for specific microRNAs and microRNA-regulated genes in the system under study,” according to the authors. Available here.
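As an illustration of what an enrichment statistic for overrepresentation can look like, the sketch below computes a one-sided hypergeometric tail probability. The summary above does not say which statistic the software uses, so the choice of test, the function name, and the parameter names here are assumptions made for illustration only.

```python
from math import comb


def enrichment_p_value(population, targets_in_population,
                       gene_set_size, targets_in_gene_set):
    """One-sided hypergeometric tail P(X >= k).

    Hypothetical helper, not taken from the paper's software.
    population            -- N, all genes considered
    targets_in_population -- K, genes predicted as targets of one microRNA
    gene_set_size         -- n, genes in the experimentally derived set
    targets_in_gene_set   -- k, predicted targets found in that set
    """
    N, K, n, k = (population, targets_in_population,
                  gene_set_size, targets_in_gene_set)
    upper = min(K, n)  # X cannot exceed either K or n
    total = comb(N, n)  # number of ways to draw the gene set
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy numbers, invented purely to exercise the function.
    print(enrichment_p_value(10, 4, 3, 2))
```

A small p-value would suggest that the microRNA's predicted targets appear in the gene set more often than expected by chance; in practice a correction for testing many microRNAs would also be needed.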
Jacob L, Hoffmann B, Stoven V, Vert JP. Virtual screening of GPCRs: an in silico chemogenomics approach. [BMC Bioinformatics 2008, 9:363]: Introduces new methods for in silico chemogenomics that represent “an extension” of a recently proposed machine learning strategy, according to the paper’s abstract. The authors show that incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of the model. They also show that interaction prediction in the chemogenomics framework “outperforms state-of-the-art individual ligand-based methods in accuracy both for receptor with known ligands and without known ligands.”
Meyer F, Paarmann D, D'Souza M, Olson RD, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. The metagenomics RAST server — a public resource for the automatic phylogenetic and functional analysis of metagenomes. [BMC Bioinformatics 2008, 9:386]: Describes an annotation pipeline that produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. The open source service, called RAST (rapid annotation using subsystem technology), generates phylogenetic and functional summaries of metagenomes and offers tools for comparative metagenomics. According to the authors, the service removes “one of the primary bottlenecks in metagenome sequence analysis — the availability of high-performance computing for annotating the data.” Available here.
Neuweger H, Albaum SP, Dondrup M, Persicke M, Watt T, Niehaus K, Stoye J, Goesmann A. MeltDB: A software platform for the analysis and integration of metabolomics experiment data. [Bioinformatics. 2008 Sep 2. (e-pub ahead of print)]: Presents MeltDB, a web-based software platform for storing, analyzing, annotating, and integrating data from metabolomics experiments. MeltDB supports the netCDF, mzXML, and mzDATA file formats. Available here.
Ohtsubo Y, Ikeda-Ohtsubo W, Nagata Y, Tsuda M. GenomeMatcher: A graphical user interface for DNA sequence comparison. [BMC Bioinformatics. 2008 Sep 16;9(1):376]: Presents GenomeMatcher, a software package for Mac OS X that executes Blast and MUMmer and displays detected similarities in two-dimensional and parallel views with similarity values indicated by color. Available here.
Rabani M, Kertesz M, Segal E. Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes. [Proc Natl Acad Sci USA. 2008 Sep 30;105(39):14885-90]: Presents RNApromo, a computational tool for identifying structural elements within mRNAs that are involved in specifying post-transcriptional regulations. Available here.
Ranzinger R, Herget S, Wetter T, von der Lieth CW. GlycomeDB — integration of open-access carbohydrate structure databases. [BMC Bioinformatics 2008, 9:384]: Describes GlycomeDB, a database of carbohydrate structures gathered from the seven major databases, including glycosciences.de, the Consortium for Functional Glycomics, the Kyoto Encyclopedia of Genes and Genomes, and the Bacterial Carbohydrate Structure Database. The authors imported more than 100,000 datasets, resulting in more than 33,000 “unique sequences” that are in GlycomeDB. “Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators,” the authors note. Available here.
Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. [Genome Res. 2008 Aug 29]: Presents a new ab initio algorithm that identifies protein-coding genes in fungal genomes. The algorithm does not require a pre-determined training set to estimate parameters of the underlying hidden Markov model, but instead uses the anonymous genomic sequence in question as an input for iterative unsupervised training. Available here. (See last week’s issue of BioInform for an interview with one of the paper’s co-authors, Mark Borodovsky.)
Vera G, Jansen RC, Suppi RL. R/parallel — speeding up bioinformatics analysis with R. [BMC Bioinformatics 2008, 9:390]: Presents an add-on package for the statistical language R, called R/parallel, which “extends R by adding user-friendly parallel computing capabilities,” according to the paper’s abstract. Because R does not support parallel computation, several tools have been developed to modify the way R programs are written or run. “Although these tools can finally speed up the calculations, the time, skills, and additional resources required to use them are an obstacle for most bioinformaticians,” the authors write. With R/parallel, “any bioinformatician can now easily automate the parallel execution of loops and benefit from the multicore processor power of today's desktop computers.” Available here.
Wirawan A, Kwoh CK, Nim TH, Schmidt B. CBESW: Sequence Alignment on the Playstation 3. [BMC Bioinformatics 2008, 9:377]: Demonstrates how the PlayStation 3, based on the Cell Broadband Engine processor, can be used to accelerate the Smith-Waterman algorithm. “For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman, and CUDA,” according to the paper’s abstract. The implementation described in the paper achieves a peak performance of up to 3,646 MCUPS (million cell updates per second). Available here.
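For readers unfamiliar with the MCUPS metric, the following minimal sketch shows a plain Smith-Waterman local alignment score and how cell updates are counted. It is not the CBESW implementation: the scoring values, the linear gap penalty, and the function names are assumptions chosen for brevity, and pure Python will run far slower than the figures quoted above.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, number of matrix cells updated).

    Illustrative scoring scheme with a linear gap penalty (an assumption).
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    cells = 0
    for q in query:
        curr = [0]  # first column is always zero for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
            cells += 1
        prev = curr
    return best, cells


def mcups(cells, seconds):
    """Million cell updates per second for a run that took `seconds`."""
    return cells / seconds / 1e6


if __name__ == "__main__":
    score, cells = smith_waterman_score("GGACGTGG", "TTACGTTT")
    print(score, cells)
```

The cell count is simply the product of the two sequence lengths, so MCUPS normalizes running time by problem size and lets implementations on different hardware be compared on the same scale.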
Xiong B, Liu K, Wu J, Burk DL, Jiang H, Shen J. DrugViz: a Cytoscape plugin for visualizing and analyzing small molecule drugs in biological networks. [Bioinformatics. 2008 Sep 15;24(18):2117-8]: Introduces DrugViz, a Cytoscape plugin designed to visualize and analyze small molecules “within the framework of the interactome,” according to the paper’s abstract. DrugViz can import drug-target network information in an extended SIF file format to Cytoscape, display the 2D structures of small-molecule nodes, and identify small-molecule nodes via three different 2D structure searching methods: isomorphism, substructure, and fingerprint-based similarity searches. Available here.
Zhu Y, Li H, Miller DJ, Wang Z, Xuan J, Clarke R, Hoffman EP, Wang Y. caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data. [BMC Bioinformatics 2008, 9:383]: Introduces the Visual Statistical Data Analyzer, or VISDA, a software tool for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine hierarchical clustering, and visualization, “supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data,” according to the paper’s abstract. Available here.
Zimmermann O, Hansmann UH. LOCUSTRA: Accurate Prediction of Local Protein Structure Using a Two-Layer Support Vector Machine Approach. [J Chem Inf Model. 2008 Sep 3.]: Describes an approach for predicting local protein structure that uses two layers of support vector machines. The authors test the method’s prediction ability for a test set of 222 proteins and compare the method to three-class secondary structure prediction and direct prediction of dihedral angles and note that the results “compare favorably with related approaches.” Available here.