In Print: Aug 13, 2010


Bioinformatics Tool-Related Papers of Note, July 2010

Bare JC, Koide T, Reiss DJ, Tenenbaum D, Baliga NS. Integration and visualization of systems biology data in context of the genome. [BMC Bioinformatics. 2010 Jul 19;11(1):382]: Describes the Gaggle Genome Browser, a cross-platform desktop program for visualizing high-throughput genomic data in the context of the genome. The authors note that a "key aspect" of the browser is its interoperability with other bioinformatics tools in the Gaggle framework. Available here.

Carroll AJ, Badger MR, Harvey Millar A. The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets. [BMC Bioinformatics. 2010 Jul 14;11:376]: Describes MetabolomeExpress, an FTP server and web tool for storing, processing, visualizing, and analyzing publicly submitted GC/MS metabolomics datasets. Researchers can upload their own data to the server for online processing. Available here.

Di Tommaso P, Orobitg M, Guirado F, Cores F, Espinosa T, Notredame C. Cloud-Coffee: Implementation of a parallel consistency-based multiple alignment algorithm in the T-Coffee package and its benchmarking on the Amazon Elastic-Cloud. [Bioinformatics. 2010 Aug 1;26(15):1903-4]: Introduces a parallel implementation of the T-Coffee multiple sequence aligner. The authors benchmark it on the Amazon Elastic Cloud and show that the parallelization procedure is "reasonably effective" and that the cloud "provides a cost effective alternative to in-house deployment." Available here.

Hach F, Hormozdiari F, Alkan C, Hormozdiari F, Birol I, Eichler EE, Sahinalp SC. mrsFAST: a cache-oblivious algorithm for short-read mapping. [Nat Methods. 2010 Aug;7(8):576-7]: Describes mrsFAST, or the micro-read (substitutions only) fast alignment and search tool, a short-read mapping algorithm that "rapidly finds all mapping locations of a collection of short reads from a donor genome in the reference genome within a user-specified number of mismatches through indexing both the reference genome and the short reads, and executing a simple cache-oblivious, all-to-all list comparison algorithm," according to the paper's abstract. Available here.
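
As a rough illustration of mapping reads within a user-specified number of mismatches, the Python sketch below indexes the reference by k-mer, looks up exact seed matches from each read, and verifies candidates by Hamming distance. This is a generic seed-and-verify approach, not the cache-oblivious list comparison the paper describes, and every name in it (build_index, map_read, k, max_mismatches) is an assumption made for illustration. It finds all mapping locations only when the read contains more non-overlapping seeds than the allowed number of mismatches.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Count substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, index, k, max_mismatches):
    """Return sorted reference positions where the read aligns
    with at most max_mismatches substitutions (no indels)."""
    hits = set()
    # Non-overlapping seeds: if the read has more seeds than allowed
    # mismatches, at least one seed must match exactly (pigeonhole).
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            if hamming(read, window) <= max_mismatches:
                hits.add(start)
    return sorted(hits)


# Toy data, invented for this example.
reference = "ACGTACGTTTGACCA"
index = build_index(reference, 4)
print(map_read("ACGTTTGA", reference, index, 4, 0))  # [4]
print(map_read("ACGTTTGC", reference, index, 4, 1))  # [4]
```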

Hosseini P, Tremblay A, Matthews BF, Alkharouf NW. An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets. [BMC Res Notes. 2010 Jul 2;3(1):183]: Describes TASE (Tag counting and Analysis of Solexa Experiments), a tag-counting and annotation software tool designed for data sets analyzed with Illumina's Casava SNP-calling software. "TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given Casava-build," according to the paper's abstract. Analysis comprises two steps: DNA sequence or read concatenation, followed by tag-counting and annotation. "The end result produces output containing the functional annotation and respective gene expression measure signifying how many times sequenced reads were found within the genomic ranges of functional annotations." Available here.
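
The tag-counting and annotation step can be pictured with a minimal Python sketch: each read position is checked against annotated genomic ranges, and the per-gene count is reported alongside the functional annotation. The record layouts and the count_tags name are hypothetical and do not reflect TASE's actual input or output formats, and the read-concatenation step is not shown.

```python
from collections import defaultdict


def count_tags(read_positions, annotations):
    """Count reads falling inside each annotated range.

    read_positions: iterable of (chrom, pos) tuples.
    annotations: list of (chrom, start, end, gene, function) tuples,
                 with start and end inclusive.
    Returns a list of (gene, function, tag_count) tuples.
    """
    counts = defaultdict(int)
    for chrom, pos in read_positions:
        for a_chrom, start, end, gene, _function in annotations:
            if chrom == a_chrom and start <= pos <= end:
                counts[gene] += 1
    return [(gene, function, counts[gene])
            for _chrom, _start, _end, gene, function in annotations]


# Toy data, invented for this example.
annotations = [
    ("chr1", 100, 200, "geneA", "kinase"),
    ("chr1", 800, 1000, "geneB", "transporter"),
]
reads = [("chr1", 150), ("chr1", 180), ("chr1", 900), ("chr2", 150)]
for row in count_tags(reads, annotations):
    print(row)
```

The nested loop keeps the sketch short; for real data volumes an interval index over the annotations would be the more practical choice.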

Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed data sets. [Bioinformatics. 2010 Jul 17. (e-pub ahead of print)]: Discusses BigWig and BigBed, compressed binary indexed files containing data at several resolutions that enable the display of next-generation sequencing experiment results in the University of California, Santa Cruz, Genome Browser. Only the data needed to support the current browser view is transmitted, rather than the entire file, which enables fast remote access to large distributed data sets, according to the paper's abstract. Available here.
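
The idea of storing data at several resolutions and returning only what the current view needs can be sketched in a few lines of Python. The sketch is purely conceptual: it keeps everything in memory and says nothing about the actual BigWig or BigBed file layout, compression, or indexing, and the names build_levels and query, along with the bin sizes, are assumptions.

```python
def build_levels(values, bin_sizes=(1, 4, 16)):
    """Precompute mean summaries of a per-base signal at several bin sizes."""
    levels = {}
    for size in bin_sizes:
        levels[size] = [
            sum(values[i:i + size]) / len(values[i:i + size])
            for i in range(0, len(values), size)
        ]
    return levels


def query(levels, start, end, max_points):
    """Return (bin_size, bins) for the half-open region [start, end).

    Picks the finest resolution that needs roughly max_points bins or
    fewer, falling back to the coarsest level, and slices out only the
    bins that overlap the requested region.
    """
    span = end - start
    chosen = max(levels)
    for size in sorted(levels):
        if -(-span // size) <= max_points:  # ceiling division
            chosen = size
            break
    first = start // chosen
    last = -(-end // chosen)
    return chosen, levels[chosen][first:last]


# Toy signal, invented for this example.
signal = list(range(32))
levels = build_levels(signal)
print(query(levels, 8, 12, 8))   # zoomed in: (1, [8.0, 9.0, 10.0, 11.0])
print(query(levels, 0, 32, 2))   # zoomed out: (16, [7.5, 23.5])
```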

Li K, Stockwell TB. VariantClassifier: A hierarchical variant classifier for annotated genomes. [BMC Res Notes. 2010 Jul 13;3(1):191]: Discusses VariantClassifier, which receives a list of polymorphisms and genome annotation and then generates a hierarchically structured classification for each variant. Available here.

Li K, Venter E, Yooseph S, Stockwell TB, Eckerle LD, Denison MR, Spiro DJ, Methe BA. ANDES: Statistical tools for the ANalyses of DEep Sequencing. [BMC Res Notes. 2010 Jul 15;3(1):199]: Introduces ANDES (Analyses of Deep Sequencing), a software library and suite of applications for statistical analysis of sequencing data. According to the paper's abstract, the "fundamental data structure" underlying the software is the position profile, "which contains the nucleotide distributions for each genomic position resultant from a multiple sequence alignment." Available here.
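
A position profile of the kind the abstract describes can be approximated with a short Python sketch that tallies the residues in each column of a multiple sequence alignment. This is an illustration only, not the ANDES library's own data structure or API; the function names and the toy alignment are assumptions.

```python
from collections import Counter


def position_profile(alignment):
    """Return one Counter of residue counts per alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(column) for column in zip(*alignment)]


def frequencies(counts):
    """Convert one column's counts into relative frequencies."""
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


# Toy alignment, invented for this example; '-' marks a gap.
aligned = ["ACGT", "ACGA", "AC-T", "ATGT"]
profile = position_profile(aligned)
print(dict(profile[1]))          # {'C': 3, 'T': 1}
print(frequencies(profile[1]))   # {'C': 0.75, 'T': 0.25}
```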

Mobilio D, Walker G, Brooijmans N, Nilakantan R, Denny RA, Dejoannis J, Feyfant E, Kowticwar RK, Mankala J, Palli S, Punyamantula S, Tatipally M, John RK, Humblet C. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses. [Chem Biol Drug Des. 2010 Aug;76(2):142-53]: According to the authors, it is sometimes difficult to locate relevant structures with the Protein Data Bank search interface, particularly when searching for complexes containing specific interactions between protein and ligand atoms, and searching within a family of proteins "can be tedious." In response, they have developed three databases that contain structures from the Protein Data Bank. In the Protein Relational Database, atom-atom distances between proteins and ligands have been precalculated, "allowing for millisecond retrieval based on atom identity and distance constraints." In the Kinase Knowledge Base and the Matrix Metalloproteinase Knowledge Base, catalytic domains have been aligned into common residue numbering schemes.
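
To show why precalculated distances make such lookups fast, the Python sketch below stores toy protein-ligand atom pairs in an indexed SQLite table and retrieves structures by atom identity and a distance cutoff. The schema, the table and column names, and the rows are all hypothetical and are not taken from the authors' database.

```python
import sqlite3


def build_contact_db(rows):
    """Load precalculated (structure, protein atom, ligand atom, distance) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact ("
        "structure_id TEXT, protein_atom TEXT, ligand_atom TEXT, distance REAL)"
    )
    # The index lets the query below avoid scanning every row.
    conn.execute(
        "CREATE INDEX idx_contact "
        "ON contact (protein_atom, ligand_atom, distance)"
    )
    conn.executemany("INSERT INTO contact VALUES (?, ?, ?, ?)", rows)
    return conn


def find_structures(conn, protein_atom, ligand_atom, max_distance):
    """Return structure IDs with a matching atom pair within max_distance."""
    cursor = conn.execute(
        "SELECT DISTINCT structure_id FROM contact "
        "WHERE protein_atom = ? AND ligand_atom = ? AND distance <= ? "
        "ORDER BY structure_id",
        (protein_atom, ligand_atom, max_distance),
    )
    return [row[0] for row in cursor]


# Toy rows with made-up identifiers and distances (in angstroms).
rows = [
    ("toy1", "N", "O", 2.9),
    ("toy1", "O", "N", 3.4),
    ("toy2", "N", "O", 4.1),
    ("toy3", "N", "O", 3.0),
]
conn = build_contact_db(rows)
print(find_structures(conn, "N", "O", 3.2))  # ['toy1', 'toy3']
```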

Naegle KM, Gymrek M, Joughin BA, Wagner JP, Welsch RE, Yaffe MB, Lauffenburger DA, White FM. PTMScout: A web resource for analysis of high-throughput post-translational proteomic studies. [Mol Cell Proteomics. 2010 Jul 14. (e-pub ahead of print)]: Introduces PTMScout, a web-based interface for viewing, manipulating, and analyzing high-throughput experimental measurements of post-translational modifications. PTMScout is based on a custom database of PTM experiments and contains information from external protein and post-translational resources, including Gene Ontology annotations, Pfam domains, and Scansite predictions of kinase and phosphopeptide binding domain interactions. It includes dataset comparison tools, dataset summary views, and tools for protein assignments of peptides identified by mass spectrometry. Available here.

Rialle S, Felicori L, Dias-Lopes C, Pérès S, El Atia S, Thierry AR, Amar P, Molina F. BioNetCAD: design, simulation and experimental validation of synthetic biochemical networks. [Bioinformatics. 2010 Jul 13. (e-pub ahead of print)]: Discusses BioNetCAD, a system that helps users construct synthetic biochemical networks based on three steps: design, simulation, and experimental validation. The authors demonstrate in a case study that BioNetCAD "can rationalize and reduce further experimental validation during the construction of a biochemical network." Available here.

Stehr H, Duarte JM, Lappe M, Bhak J, Bolser DM. PDBWiki: added value through community annotation of the Protein Data Bank. [Database. 2010 Jul 6;2010:baq009. Print 2010]: Describes PDBWiki, a scientific wiki for the community annotation of protein structures. The wiki consists of one structured page for each entry in the Protein Data Bank and allows users to attach categorized comments to the entries. Each page also includes a user-editable list of cross-references to external resources. Available here.

Weeding E, Houle J, Kaznessis YN. SynBioSS designer: a web-based tool for the automated generation of kinetic models for synthetic biological constructs. [Brief Bioinform. 2010 Jul;11(4):394-402]: Presents SynBioSS Designer, a component of the Synthetic Biology Software Suite that takes as input the molecular parts involved in gene expression and regulation and automatically generates networks of reactions that represent the transcription, translation, regulation, induction, and degradation of those parts. Available here.

Wittig M, Helbig I, Schreiber S, Franke A. CNVineta: A data mining tool for large case-control copy number variation data sets. [Bioinformatics. 2010 Jul 6. (e-pub ahead of print)]: Introduces CNVineta, an R package for data-mining and visualization of copy number variants in large case-control data sets genotyped with SNP arrays. Available here.

Xia J, Wishart DS. MetPA: a web-based metabolomics tool for pathway analysis and visualization. [Bioinformatics. 2010 Jul 13. (e-pub ahead of print)]: Describes MetPA (Metabolomics Pathway Analysis), a web-based tool for analyzing and visualizing metabolomic data within the context of metabolic pathways. MetPA combines several pathway-enrichment analysis procedures, along with the analysis of pathway topological characteristics, to help identify the most relevant metabolic pathways involved in a given metabolomic study, according to the paper's abstract. The results are presented in a "Google-map style" network visualization system that supports "intuitive and interactive data exploration through point-and-click, dragging, and lossless zooming." Available here.

Zeitouni B, Boeva V, Janoueix-Lerosey I, Loeillet S, Legoix-né P, Nicolas A, Delattre O, Barillot E. SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. [Bioinformatics. 2010 Aug 1;26(15):1895-6]: Presents SVDetect, a program designed to identify genomic structural variations from paired-end and mate-pair next-generation sequencing data produced by the Illumina Genome Analyzer and Life Technologies SOLiD platforms. "Applying both sliding-window and clustering strategies, we use anomalously mapped read pairs provided by current short read aligners to localize genomic rearrangements and classify them according to their type, e.g. large insertions-deletions, inversions, duplications and balanced or unbalanced inter-chromosomal translocations," the authors state in the paper's abstract. Available here.