Bioinformatics Tool-Related Papers of Note, November 2009
Note: In addition to the below listing, papers for Nucleic Acids Research's annual database issue are available under advance access here.
Blouin C, Perry S, Lavell A, Susko E, Roger AJ. Reproducing the manual annotation of multiple sequence alignments using a SVM classifier. [Bioinformatics. 2009 Dec 1;25(23):3093-3098]: Aligning protein sequences presents a challenge because some sites do not "respect the assumption of positional homology," which leads to the need for manual editing, according to the paper's abstract. In response, the authors note that they have trained a support vector machine classifier to reproduce decisions made during manual editing with an accuracy of 95 percent, which implies that manual editing can "be made reproducible and applied to large-scale analyses." The authors also claim that it is possible to retrain the classifier by providing examples of multiple sequence alignment annotation. Available here.
Dogrusoz U, Cetintas A, Demir E, Babur O. Algorithms for effective querying of compound graph-based pathway databases. [BMC Bioinformatics. 2009 Nov 16;10(1):376]: Describes a querying framework and a number of graph-theoretic algorithms applicable to graph-based pathway databases, from protein-protein interactions to metabolic and signaling pathways. The framework can account for compound or nested structures and ubiquitous entities present in the pathway data, according to the authors. The algorithms were implemented within the querying component of a new version of the software tool Pathway Analysis Tool for Integration and Knowledge Acquisition, or PATIKAweb. Available here.
Emde AK, Grunert M, Weese D, Reinert K, Sperling SR. MicroRazerS: Rapid alignment of small RNA reads. [Bioinformatics. 2009 Oct 29. (e-pub ahead of print)]: The authors have developed MicroRazerS, a tool for mapping small RNAs onto a reference genome to help scientists apply deep sequencing to determine the small RNA content of a cell. It is "an order of magnitude faster than MegaBlast and comparable in speed to other short-read mapping tools," the paper's abstract states. Available here.
Homer N, Merriman B, Nelson SF. BFAST: an alignment tool for large scale genome resequencing. [PLoS One. 2009 Nov 11;4(11):e7767]: Introduces a new algorithm for rapidly and accurately aligning billions of short DNA sequence reads to a large reference genome. The algorithm, called BFAST (Blat-like Fast Accurate Search Tool), can align data produced by any of the current sequencing platforms, and lets users set parameters for speed and accuracy, according to the paper's abstract. When compared to BLAT, MAQ, SHRiMP, and SOAP, BFAST can achieve "substantially" greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods, according to the authors. Available here.
Hutter H, Ng MP, Chen N. GExplore: a web server for integrated queries of protein domains, gene expression and mutant phenotypes. [BMC Genomics. 2009 Nov 16;10(1):529]: Presents GExplore, a user-friendly database interface for data-mining at the gene expression/protein function level. GExplore supports combinatorial searches for proteins "with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes," the abstract states. Available here.
Kono N, Arakawa K, Ogawa R, Kido N, Oshita K, Ikegami K, Tamaki S, Tomita M. Pathway projector: web-based zoomable pathway browser using KEGG atlas and Google Maps API. [PLoS One. 2009 Nov 11;4(11):e7710]: Introduces Pathway Projector, a web-based pathway browser that provides integrated pathway maps based on the KEGG Atlas, with additional nodes for genes and enzymes. It is implemented as a scalable, zoomable map using the Google Maps API. Users can search pathway-related data using keywords, molecular weights, nucleotide sequences, and amino acid sequences, or look up possible routes between compounds. Available here.
Kriseman J, Busick C, Szelinger S, Dinu V. BING: Biomedical informatics pipeline for next generation sequencing. [J Biomed Inform. 2009 Nov 16. (e-pub ahead of print)]: Describes the Biomedical Informatics Pipeline, or BING, for the analysis of next-generation sequencing data. According to the paper's abstract, BING offers "several novel computational approaches" for image alignment; signal correlation, compensation, separation, and pixel-based cluster registration; signal measurement and base calling; and quality control and accuracy measurement. The authors benchmarked the new algorithms against the Illumina Genome Analysis Pipeline and found that BING's pixel-based approach "produces a significant increase in the number of sequence reads, while reducing the computational time per experiment and error rate" by less than 2 percent.
Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Searching for SNPs with cloud computing. [Genome Biol. 2009 Nov 20;10(11):R134]: Describes Crossbow, a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85, according to the authors. Available here.
Muller J, Creevey CJ, Thompson JD, Arendt D, Bork P. AQUA: Automated quality improvement for multiple sequence alignments. [Bioinformatics. 2009 Nov 19. (e-pub ahead of print)]: Describes a "simple tool" for identifying the most reliable automatically generated multiple sequence alignment for a given protein family. The protocol, called AQUA (Automated quality improvement for multiple sequence alignments), relies on two alignment programs (MUSCLE and MAFFT), one refinement program (RASCAL), and one assessment program (NORMD), "but other programs could be incorporated at any of the three steps," the authors note. Available here.
Mutwil M, Usadel B, Schutte M, Loraine A, Ebenhoh O, Persson S. Assembly of an interactive correlation network for the Arabidopsis genome using a novel heuristic clustering algorithm. [Plant Physiol. 2009 Nov 4. (e-pub ahead of print)]: Presents the Heuristic Cluster Chiseling Algorithm, or HCCA, which aids visualization and interpretation of correlation relationships on a genome scale. Unlike many other methods, it supports weighted edges and may help control average cluster sizes, according to the authors. Comparative clustering analyses demonstrated that HCCA performed as well as, or better than, the commonly used Markov, MCode, and k-means clustering algorithms.
Naito Y, Yoshimura J, Morishita S, Ui-Tei K. siDirect 2.0: updated software for designing functional siRNA with reduced seed-dependent off-target effect. [BMC Bioinformatics. 2009 Nov 30;10(1):392]: Discusses siDirect 2.0, an update of the web-based software siDirect for functional and off-target siRNA design for mammalian RNAi. In the new version of the software, the siRNA design algorithm is "extensively updated to eliminate off-target effects by reflecting our recent finding that the capability of siRNA to induce off-target effect is highly correlated to the thermodynamic stability, or the melting temperature (Tm), of the seed-target duplex, which is formed between the nucleotides positioned at 2-8 from the 5' end of the siRNA guide strand and its target mRNA," the abstract states. Selection of siRNAs with lower seed-target duplex stabilities, followed by the elimination of unrelated transcripts with nearly perfect match, should minimize the off-target effects. Available here.
Néron B, Ménager H, Maufrais C, Joly N, Maupetit J, Letort S, Carrere S, Tuffery P, Letondal C. Mobyle: a new full web bioinformatics framework. [Bioinformatics. 2009 Nov 15;25(22):3005-11]: The paper describes Mobyle, a flexible web environment for defining and running bioinformatics analyses. It includes data-management features that let users reproduce analyses and combine tools using a hierarchical typing system. Mobyle can invoke services distributed over remote Mobyle servers, enabling a federated network of curated bioinformatics portals so the user does not have to install sophisticated software, according to the authors. Available here.
Qi J, Zhao F, Buboltz A, Schuster SC. inGAP: an integrated next-generation genome analysis pipeline. [Bioinformatics. 2009 Oct 30. (e-pub ahead of print)]: The authors present their mining pipeline, inGAP, which is guided by a Bayesian principle to detect single nucleotide polymorphisms and insertion/deletions by comparing high-throughput pyrosequencing reads with a reference genome of related organisms. The pipeline also lets users compare multiple genomes and helps with assembling bacterial genomes. According to the authors, experiments on simulated and experimental data show that this pipeline can achieve overall 97 percent accuracy in SNP detection and 94 percent accuracy in indel detection. Available here.
Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP. Mining gene functional networks to improve mass-spectrometry-based protein identification. [Bioinformatics. 2009 Nov 15;25(22):2955-61]: The paper addresses low-sensitivity and low-confidence protein identifications in high-throughput protein identification experiments based on tandem mass spectrometry. The researchers developed a method, MSNet, that analyzes MS/MS experiments in the larger context of the biological processes active in a cell. According to the authors, the method increases the number of proteins identified in the sample at a given error rate. Specifically, they identified 8 percent to 29 percent more proteins than the original MS experiment when applied to yeast grown in different experimental conditions analyzed on different MS/MS instruments, and 37 percent more proteins in a human sample. Available here.
Wu C, Orozco C, Boyer J, et al. BioGPS: an extensible and customizable portal for organizing and querying gene annotation resources. [Genome Biol. 2009 Nov 17;10(11):R130]: Introduces BioGPS, a centralized gene portal for aggregating distributed gene annotation resources. BioGPS "embraces the principle of community intelligence, enabling any user to easily and directly contribute to the BioGPS platform," the abstract states. Available here.