Castrignanò T, D'Antonio M, Anselmo A, et al. ASPicDB: A database resource for alternative splicing analysis. [Bioinformatics. 2008 May 15;24(10):1300-4]: Describes ASPicDB, a database of annotations of the alternative splicing pattern of human genes and functional annotation of predicted splicing isoforms. The database uses the ASPic algorithm to detect splice sites and model full-length transcripts. The algorithm is based on the multiple alignment of gene-related transcripts to the genomic sequences, “a strategy that greatly improves prediction accuracy compared to methods based on independent and progressive alignments,” according to the paper’s abstract. ASPicDB is updated on a monthly basis and is available here.
Daigle BJ Jr., Altman RB. M-BISON: microarray-based integration of data sources using networks. [BMC Bioinformatics 2008, 9:214]: Introduces Microarray-Based Integration of data SOurces using Networks, or M-BISON, a probabilistic model that integrates background biological knowledge with microarray data to predict individual differentially expressed genes. According to the paper’s authors, M-BISON “improves signal detection on a range of simulated data, particularly when using very noisy microarray data.”
Dandass YS, Burgess SC, Lawrence M, Bridges SM. Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research. [BMC Bioinformatics. 2008 Apr 15;9:197]: Describes techniques for improving the performance of the string set matching problem in computational proteomics. The authors focus on the process of matching peptide sequences against a genome translated in six reading frames. The method is an adaptation of the Aho-Corasick algorithm for field programmable gate array devices. “The FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation,” according to the paper’s abstract.
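The hardware design above builds on the classic Aho-Corasick automaton, which matches a whole set of patterns (here, peptides) against a text in a single pass. The following is a minimal software sketch of that underlying algorithm, not the authors' FPGA implementation; the peptide set and sequence are illustrative.

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton: goto trie, failure links, and outputs."""
    goto = [{}]       # goto[state][char] -> next state
    out = [set()]     # patterns recognized at each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # root's children fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]    # inherit matches ending at the fail state
    return goto, fail, out

def search(text, goto, fail, out):
    """Yield (end_position, pattern) for every match of any pattern in text."""
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield (i, pat)
```

In the proteomics setting described in the paper, the text would be a six-frame translation of the genome; the FPGA version gains its speedup by evaluating the automaton's state transitions in hardware rather than in a software loop like this one.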
Friedländer MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S, Rajewsky N. Discovering microRNAs from deep sequencing data using miRDeep. [Nat Biotechnol. 2008 Apr;26(4):407-15]: Introduces an algorithm called miRDeep, which uses a probabilistic model of microRNA biogenesis to score compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor. The authors demonstrate the algorithm’s accuracy using published Caenorhabditis elegans data and sequencing data for human and dog RNAs. “miRDeep reports altogether 230 previously unannotated miRNAs, of which four novel C. elegans miRNAs are validated by northern blot analysis,” they report.
Guo Y, Yu L, Wen Z, Li M. Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences. [Nucleic Acids Res. 2008 Apr 4 (e-pub ahead of print)]: Describes a sequence-based method for predicting novel protein-protein interactions that combines a new feature representation based on auto covariance with a support vector machine classifier. Using an independent data set of 11,474 yeast PPIs, the authors found that the prediction accuracy of the model is 88.09 percent. The software is available here.
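The auto covariance descriptor maps a variable-length protein sequence to a fixed-length vector by correlating a physicochemical property of residues separated by a given lag. A rough sketch of that feature computation follows, using the Kyte-Doolittle hydrophobicity scale as one example property; the paper uses several property scales and feeds the concatenated descriptors of both proteins to an SVM, which this sketch does not attempt to reproduce.

```python
# Kyte-Doolittle hydrophobicity values, one illustrative property scale.
HYDRO = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
         "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
         "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
         "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def auto_covariance(seq, prop, max_lag=3):
    """AC(lag) = (1/(N-lag)) * sum_i (x_i - mean)(x_{i+lag} - mean),
    for lag = 1..max_lag, where x_i is the property value of residue i."""
    x = [prop[aa] for aa in seq]
    n = len(x)
    mean = sum(x) / n
    return [
        sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]
```

Because the output length depends only on max_lag (and the number of property scales), proteins of any length yield vectors of the same dimension, which is what makes the representation usable as SVM input.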
Hoff KJ, Tech M, Lingner T, Daniel R, Morgenstern B, Meinicke P. Gene prediction in metagenomic fragments: a large scale machine learning approach. [BMC Bioinformatics 2008, 9:217]: Describes a new gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, the method uses linear discriminants for monocodon usage, dicodon usage, and translation initiation sites to extract features from DNA sequences. In the second stage, the approach uses an artificial neural network to combine these features with open reading frame length and fragment GC-content to compute the probability that the open reading frame encodes a protein. “The combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines,” according to the paper’s abstract.
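The two-stage structure the abstract describes can be sketched schematically: a linear discriminant reduces high-dimensional codon-usage features to a score, and a small neural network combines that score with ORF length and GC content into a coding probability. This is a structural illustration only, with hypothetical hand-set weights; the paper's discriminants and network are trained, and it uses additional features (dicodon usage, translation initiation sites) omitted here.

```python
import math
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

def monocodon_freq(orf):
    """64-dimensional monocodon usage vector (stage-1 feature input)."""
    counts = [0.0] * 64
    for i in range(0, len(orf) - 2, 3):
        counts[CODONS.index(orf[i:i + 3])] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coding_probability(orf, w_disc, W1, b1, w2, b2):
    """Stage 1: linear discriminant score over codon usage.
    Stage 2: one-hidden-layer net over [score, length, GC] -> probability."""
    score = dot(w_disc, monocodon_freq(orf))
    gc = (orf.count("G") + orf.count("C")) / len(orf)
    x = [score, len(orf) / 1000.0, gc]          # crude length scaling
    hidden = [math.tanh(dot(row, x) + b) for row, b in zip(W1, b1)]
    return 1.0 / (1.0 + math.exp(-(dot(w2, hidden) + b2)))  # sigmoid output
```

The appeal of the design is that stage 1 compresses each feature family independently, so the stage-2 network stays tiny and cheap to apply to the millions of candidate ORFs in a metagenomic sample.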
Kim PG, Cho HG, Park K. A scaffold analysis tool using mate-pair information in genome sequencing. [J Biomed Biotechnol. 2008;2008:675741]: Describes a scaffold analysis program called ConPath that orders and orients separate sequence contigs “by exploiting the mate-pair information between contig-pairs.” ConPath determines the relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports wrong assembly information by validating orientations and gap sizes. The authors note that they have used ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, “where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath.”
Li W, Carroll JS, Brown M, Liu S. xMAN: extreme MApping of OligoNucleotides. [BMC Genomics. 2008;9 Suppl 1:S20]: Introduces a new algorithm for rapidly mapping millions of oligonucleotide fragments to a reference genome. The algorithm, called extreme MApping of OligoNucleotides, or xMAN, converts oligonucleotides to integers hashed in RAM and scans through genomes using a bit-shifting operation. According to the authors, it achieves “at least one order of magnitude speed increase over existing tools” and can map the 42 million 25-mer probes on Affymetrix whole-human genome tiling arrays to the entire genome in less than six CPU hours. xMAN is available here.
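The core idea — packing each oligo into an integer two bits per base, hashing the probe set, and scanning the genome with one shift per position — can be sketched as follows. This is a simplified exact-match illustration under the assumption of an A/C/G/T-only alphabet, not xMAN's actual implementation; the function names and the small k in the example are illustrative (the paper's probes are 25-mers).

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base."""
    v = 0
    for ch in kmer:
        v = (v << 2) | CODE[ch]
    return v

def map_probes(probes, genome, k=25):
    """Hash probes as integers, then scan the genome with one bit-shift per base."""
    table = {}
    for p in probes:
        table.setdefault(encode(p), []).append(p)
    mask = (1 << (2 * k)) - 1          # keep only the last k bases in the window
    hits, v = [], 0
    for i, ch in enumerate(genome):
        v = ((v << 2) | CODE[ch]) & mask
        if i >= k - 1 and v in table:  # full window formed; check the hash table
            for p in table[v]:
                hits.append((i - k + 1, p))
    return hits
```

Updating the rolling integer costs a shift, an OR, and a mask per genome base, which is why this style of scan is so much faster than aligning each probe independently.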
Pavesi G, Zambelli F, Caggese C, Pesole G. Exalign: a new method for comparative analysis of exon-intron gene structures. [Nucleic Acids Res. 2008 Apr 8 (e-pub ahead of print)]: Introduces Exalign, an algorithm designed to retrieve, compare, and search for the exon-intron structure of existing gene annotations. Exalign is available here.
Quandt A, Hernandez P, Masselot A, Hernandez C, Maffioletti S, Pautasso C, Appel RD, Lisacek F. swissPIT: A novel approach for pipelined analysis of mass spectrometry data. [Bioinformatics. 2008 Apr 23 (e-pub ahead of print)]: Describes the Swiss Protein Identification Toolbox, or swissPIT, which is an expandable multi-tool platform for executing workflows to analyze tandem mass spectrometry-based data. “One of the major problems in proteomics is the absence of standardized workflows to analyze the produced data,” the authors note in the abstract. “The main idea of swissPIT is not only the usage of different identification tools in parallel but also the meaningful concatenation of different identification strategies at the same time.” The swissPIT software is available here.
Schütz F, Delorenzi M. MAMOT: Hidden MArkov MOdeling Tool. [Bioinformatics. 2008 Apr 25 (e-pub ahead of print)]: Discusses MAMOT, a command-line program for Unix-like operating systems that allows scientists to use hidden Markov models more easily. Users can define the architecture and initial parameters of a model in a text file and then use MAMOT for parameter optimization on example data, decoding, and the production of stochastic sequences generated according to the probabilistic model. MAMOT is available here.
Siepen JA, Belhajjame K, Selley JN, et al. ISPIDER Central: an integrated database web-server for proteomics. [Nucleic Acids Res. 2008 Apr 25 (e-pub ahead of print)]: Introduces the ISPIDER Central Proteomic Database search platform, which allows users to search for proteins and peptides that have been characterized in mass spectrometry-based proteomics experiments from different groups and stored in different databases. The resource relies on the Protein Identifier Cross-Reference service to resolve accessions from different sequence repositories. ISPIDER also offers custom-built clients that allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories. ISPIDER is available here.
Spasić I, Schober D, Sansone SA, Rebholz-Schuhmann D, Kell DB, Paton NW. Facilitating the development of controlled vocabularies for metabolomics technologies with text mining. [BMC Bioinformatics. 2008 Apr 29;9 Suppl 5:S5]: Introduces a text mining-based approach for rapidly developing controlled vocabularies. The authors present case studies involving two controlled vocabularies — for nuclear magnetic resonance spectroscopy and gas chromatography — that are being developed as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms, respectively. Using the new method, a total of 5,699 and 2,612 new terms were acquired automatically from the literature, respectively, the authors note. “The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts,” according to the paper’s abstract.