Bioinformatics Tool-Related Papers of Note, December 2009
Angly FE, Willner D, Prieto-Davó A, et al. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. [PLoS Comput Biol. 2009 Dec;5(12):e1000593]: GAAS, or Genome relative Abundance and Average Size, is a software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats, the authors state in the paper's abstract. GAAS uses a novel methodology to control for sampling bias via length normalization, to adjust for multiple Blast similarities by similarity weighting, and to select significant similarities using relative alignment lengths, according to the authors. In benchmark tests, GAAS was shown to be robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Available here.
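The ideas of length normalization and similarity weighting can be illustrated with a minimal Python sketch. This is not the GAAS implementation or its published formulas: the function names, the input layout (one dictionary of per-genome similarity weights for each read), and the choice to split each read proportionally to those weights are all assumptions made for illustration.

```python
def relative_abundance(read_hits, genome_lengths):
    """Estimate relative genome abundance from weighted read similarities.

    read_hits: one dict per read, mapping genome name -> similarity weight
               (hypothetical layout; weights must be non-negative).
    genome_lengths: dict mapping genome name -> genome length in bp.
    """
    totals = {}
    for hits in read_hits:
        weight_sum = sum(hits.values())
        if weight_sum <= 0:
            continue  # read has no usable similarity
        # Similarity weighting: a read matching several genomes is shared
        # among them in proportion to its weights.
        for genome, weight in hits.items():
            totals[genome] = totals.get(genome, 0.0) + weight / weight_sum
    # Length normalization: longer genomes yield more reads per copy,
    # so divide each count by the genome's length.
    normalized = {g: count / genome_lengths[g] for g, count in totals.items()}
    scale = sum(normalized.values())
    if scale == 0:
        return {}
    return {g: value / scale for g, value in normalized.items()}


def average_genome_size(abundance, genome_lengths):
    """Abundance-weighted mean genome length."""
    return sum(abundance[g] * genome_lengths[g] for g in abundance)
```

In this toy setup, four reads from a 4,000 bp genome and one read from a 1,000 bp genome imply equal abundance of the two genomes once read counts are divided by genome length.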
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. [BMC Bioinformatics;10(1):421]: The publication describes features and improvements of rewritten Blast software and introduces new command-line applications addressing the fact that Blast software is "suboptimal" for long queries or database sequences, according to the authors. Long query sequences are broken into chunks for processing, in some cases leading to "dramatically shorter" run times, the abstract states. For long database sequences, the team said it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. Available here.
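The query-splitting idea can be sketched in a few lines of Python. This is only an illustration of breaking a long query into chunks, not BLAST+ code: the function name, the chunk size, and the use of an overlap between neighboring chunks are assumptions.

```python
def chunk_query(sequence, chunk_size, overlap):
    """Split a long query into overlapping chunks.

    Returns a list of (offset, subsequence) pairs; the offset lets hits
    found in a chunk be mapped back to coordinates in the full query.
    chunk_size and overlap are hypothetical parameters for illustration.
    """
    if chunk_size < 1 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size >= 1 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sequence), step):
        chunks.append((start, sequence[start:start + chunk_size]))
        if start + chunk_size >= len(sequence):
            break  # this chunk already reaches the end of the query
    return chunks
```

For example, chunk_query("ABCDEFGHIJ", 4, 1) yields three chunks starting at offsets 0, 3, and 6, each sharing one character with its neighbor.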
Caulfield E, Hellander A. CellMC — a multiplatform model compiler for the Cell Broadband Engine and x86. [Bioinformatics. 2009 Dec 8. (e-pub ahead of print)]: Describes CellMC, an SBML model compiler that implements a vectorized version of Gillespie's stochastic simulation algorithm, used to study stochastic models of biochemical systems, for use with Cell/BE or x86 PCs. According to the paper's abstract, CellMC will run on a wide variety of x86 computers running Linux/MacOSX (Darwin) and on Cell/BE computers such as the Sony PlayStation3 and the IBM BladeCenter QS22. The code is available here.
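For readers unfamiliar with Gillespie's stochastic simulation algorithm, the following is a minimal scalar Python sketch of the standard direct method. It is not CellMC code and is not vectorized; the function name and the way reactions are passed in (a propensity function paired with a state change) are assumptions made for illustration.

```python
import math
import random


def gillespie(state, reactions, t_end, seed=0):
    """Simulate one trajectory with Gillespie's direct method.

    state: initial molecule counts, one entry per species.
    reactions: list of (propensity_fn, change) pairs, where propensity_fn
               maps the current state to a rate and change is the per-species
               count update applied when the reaction fires.
    """
    rng = random.Random(seed)
    t = 0.0
    state = list(state)
    trajectory = [(t, tuple(state))]
    while t < t_end:
        propensities = [rate(state) for rate, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for propensity, (_, change) in zip(propensities, reactions):
            cumulative += propensity
            if threshold < cumulative:
                break
        state = [count + delta for count, delta in zip(state, change)]
        trajectory.append((t, tuple(state)))
    return trajectory
```

A simple decay model, in which each of 20 molecules degrades at rate 0.5, can be run as gillespie((20,), [(lambda s: 0.5 * s[0], (-1,))], t_end=100.0).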
Clement K, Gustafson N, Berbert A, Carroll H, Merris C, Olsen A, Clement M, Snell Q, Allen J, Roper RJ. PathGen: A Transitive Gene Pathway Generator. [Bioinformatics. 2009 Dec 4. (e-pub ahead of print)]: The gene interaction network tool PathGen incorporates data from several sources to create connections spanning multiple gene interaction databases. Results are displayed in a graphical format, showing gene interaction type and strength, database source, and microarray expression data. PathGen interaction analyses were validated with genes connected to the altered facial development related to Down syndrome, according to the abstract. Available here.
Gerlach W, Jünemann S, Tille F, Goesmann A, Stoye J. WebCarma: a web application for the functional and taxonomic classification of unassembled metagenomic reads. [BMC Bioinformatics. 2009 Dec 18;10:430]: WebCarma is a refined version of the Carma software pipeline for the characterization of species composition and the genetic potential of microbial samples using short, unassembled reads. WebCarma is available as a web application for the taxonomic and functional classification of unassembled short reads from metagenomic communities. The authors note that they have analyzed how well ultra-short reads apply in metagenomics and show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is available here.
Harris EY, Ponts N, Levchuk A, Le Roch K, Lonardi S. BRAT: Bisulfite-treated Reads Analysis Tool. [Bioinformatics. 2009 Dec 22. (e-pub ahead of print)]: The authors present a new tool for mapping short reads obtained from the Illumina Genome Analyzer following sodium bisulfite conversion. BRAT supports single and paired-end reads and handles input files containing reads and mates of different lengths. It is faster, maps more unique paired-end reads, and has higher accuracy than existing programs, the researchers wrote in the abstract. The source code is available here.
Madar A, Greenfield A, Ostrer H, Vanden-Eijnden E, Bonneau R. The inferelator 2.0: A scalable framework for reconstruction of dynamic regulatory network models. [Conf Proc IEEE Eng Med Biol Soc. 2009;1:5448-51]: Introduces a method for reconstructing biological networks called Inferelator 2.0, which simultaneously learns both topology and kinetic parameters given a set of genome-wide measurements as input. Inferelator 1.0 was designed to learn a system of ordinary differential equations describing the rate of change in transcription of each gene or gene cluster as a function of environmental and transcription factors. The team has now developed, implemented, and tested a new Markov chain Monte Carlo dynamical modeling method, Inferelator 2.0, that works in tandem with Inferelator 1.0 and is designed to relax these approximations. Results for the prokaryote Halobacterium show the platform yields "a marked improvement" in predictive performance when modeling the regulatory dynamics of the system over longer timescales, the scientists wrote. Available here.
Salari K, Tibshirani R, Pollack JR. DR-Integrator: a new analytic tool for integrating DNA copy number and gene expression data. [Bioinformatics. 2009 Dec 22. (e-pub ahead of print)]: DNA/RNA-Integrator is a statistical software tool to perform integrative analyses on paired DNA copy number and gene expression data. It identifies genes with significant correlations between DNA copy number and gene expression, and implements a supervised analysis that captures genes with significant alterations in both DNA copy number and gene expression between two sample classes, the scientists stated in their paper. DR-Integrator is available for non-commercial use here. The R package is available under the name 'DRI' here.
Seaman JD, Sanford JC. Skittle: A 2-Dimensional Genome Visualization Tool. [BMC Bioinformatics. 2009 Dec 30;10(1):452]: Describes a new data visualization tool, called Skittle, which creates a two-dimensional nucleotide display by assigning four colors to the four nucleotides, and then text-wraps to a user adjustable width. This display is accompanied by a "repeat map," which displays all local repeating units, based upon analysis of all possible local alignments. Skittle includes a smooth-zooming interface that lets users analyze genomic patterns at any scale and is "especially useful" in identifying and analyzing tandem repeats, according to the authors. Available here.
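The text-wrapping display can be illustrated with a short Python sketch. It is not Skittle's code: the palette, the function names, and the column check are assumptions chosen to show why tandem repeats stand out when the wrap width matches the repeat period.

```python
# Hypothetical palette; the colors Skittle actually uses are not given here.
PALETTE = {"A": "black", "C": "red", "G": "green", "T": "blue"}


def wrap_sequence(sequence, width):
    """Text-wrap a nucleotide string into rows of at most `width` bases."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def color_grid(sequence, width):
    """Map each base of the wrapped sequence to a color name."""
    return [[PALETTE.get(base, "gray") for base in row]
            for row in wrap_sequence(sequence.upper(), width)]


def constant_columns(rows):
    """Count columns that hold the same base in every full-width row."""
    width = len(rows[0]) if rows else 0
    full_rows = [row for row in rows if len(row) == width]
    return sum(1 for column in zip(*full_rows) if len(set(column)) == 1)
```

Wrapping the repeat "ACGTACGTACGT" at width 4 stacks identical rows, so every column is a single color. At width 5 the rows drift out of phase and no column is uniform, which is why an adjustable width helps reveal the repeat period.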
Simpson JT, McIntyre RE, Adams DJ, Durbin R. Copy number variant detection in inbred strains from short read sequence data. [Bioinformatics. 2009 Dec 18. (e-pub ahead of print)]: The authors describe an algorithm to detect copy number variants from short read sequence data in homozygous organisms, such as inbred laboratory strains of mice. The approach exploits the fact that inbred mice are homozygous at virtually every position in the genome to detect CNVs using a hidden Markov model. This HMM uses both the density of sequence reads mapped to the genome, and the rate of apparent heterozygous single nucleotide polymorphisms, to determine genomic copy number. The scientists tested the algorithm on short read sequence data generated from re-sequencing chromosome 17 of mouse strains with the Illumina platform, identifying 118 copy number variants. They also investigated the algorithm's performance through comparison to CNVs previously identified by array-comparative genomic hybridization. Source code and pre-compiled binaries are available here.
Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, Masak C, Torrance G, Wagener J, Willighagen EL, Steinbeck C, Wikberg JE. Bioclipse 2: A scriptable integration platform for the life sciences. [BMC Bioinformatics. 2009 Dec 3;10(1):397]: Bioclipse 2.0 is a "complete rewrite" of the Bioclipse platform, according to the authors. The functionality is available from the graphical user interface and from a built-in "novel domain-specific language." New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, and a graphical editor for sequences and alignments. It was developed as a rich client but also takes advantage of web and cloud-based services for "more demanding calculations" or data retrieval, the developers stated. Bioclipse 2 has been released under the Eclipse Public License, an open source license that allows additional plugins to be of any license type. Source code and binaries are available here.
Tsiknakis M, Sfakianakis S, Zacharioudakis G, Umakis L, Kanterakis A, Potamias G, Kafetzopoulos D. A semantically aware platform for the authoring and secure enactment of bioinformatics workflows. [Conf Proc IEEE Eng Med Biol Soc. 2009;1:5625-8]: Describes a bioinformatics workflow authoring and execution environment intended to help with complex, long-running data and compute-intensive experiments. The researchers also present a "semantic framework" used for supporting specific user-requirements related to the reasoning and inference capabilities of the environment.
Yan RX, Si JN, Wang C, Zhang Z. DescFold: A web server for protein fold recognition. [BMC Bioinformatics;10(1):416]: The team reports it has improved DescFold, its machine-learning method for protein fold recognition, by incorporating "more powerful descriptors" and setting up a web server. DescFold was established by using support vector machines to combine four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast e-values and bit scores; a sequence-profile-alignment-based descriptor using Rps-blast e-values and bit scores; a descriptor based on secondary structure element alignment; and a descriptor based on the occurrence of PROSITE functional motifs. The team said they trained and tested the new DescFold on a total of 1,835 diverse proteins. With the new descriptors, DescFold boosted the performance of fold recognition "substantially." The DescFold server is accessible here.
Zhang KX, Ouellette BF. Pandora, a PAthway and Network DiscOveRy Approach based on common biological evidence. [Bioinformatics. 2009 Dec 22. (e-pub ahead of print)]: The scientists present an approach that uses network topology to predict biological pathways. It integrates four types of biological evidence (protein-protein interaction, genetic interaction, domain-domain interaction, and semantic similarity of GO terms) to generate a functionally associated network. This network was then used to develop a new pathway-finding algorithm to predict biological pathways in yeast. It discovered 195 biological pathways and 31 functionally redundant pathway pairs. The method, implemented in Perl, is available here.