Zhang J, He S, Ling CX, Cao X, Zeng R, Gao W. PeakSelect: preprocessing tandem mass spectra for better peptide identification. [Rapid Commun Mass Spectrom. 2008 Mar 18;22(8):1203-1212]: Introduces a preprocessing method, called PeakSelect, that improves the accuracy and efficiency of peptide identification from tandem mass spectra. The authors propose “a new and important concept of an isotope pattern vector [IPV], which characterizes the isotope cluster of fragment ions,” so that noise and real peaks can be distinguished by their quantitative IPV values. According to the abstract, “experiments show that PeakSelect can help to reduce the Mascot searching time and increase the reliability of peptide identifications.”
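To make the isotope-cluster idea concrete, here is a minimal sketch that, for a given fragment peak, looks up the peaks expected at +1 and +2 isotope spacings and returns their intensities relative to the monoisotopic peak. This is an assumption-laden illustration, not PeakSelect's IPV definition: the vector layout, the 0.02 m/z tolerance, and the `looks_real` threshold rule are all hypothetical.

```python
# Hypothetical isotope-pattern check for a fragment peak.
# Assumption: the "vector" here is just the +0/+1/+2 isotope intensities
# normalized to the monoisotopic peak; PeakSelect's real IPV may differ.
from bisect import bisect_left

NEUTRON_MASS_SHIFT = 1.00335  # approx. 13C - 12C mass difference (Da)

def find_peak(mzs, target, tol):
    """Index of the peak closest to target within tol, or None (mzs sorted)."""
    i = bisect_left(mzs, target)
    best = None
    for j in (i - 1, i):
        if 0 <= j < len(mzs) and abs(mzs[j] - target) <= tol:
            if best is None or abs(mzs[j] - target) < abs(mzs[best] - target):
                best = j
    return best

def isotope_vector(mzs, intensities, idx, charge=1, n_isotopes=3, tol=0.02):
    """Relative intensities of peak idx and its expected isotope peaks."""
    base = intensities[idx]
    vec = [1.0]
    for k in range(1, n_isotopes):
        j = find_peak(mzs, mzs[idx] + k * NEUTRON_MASS_SHIFT / charge, tol)
        vec.append(intensities[j] / base if j is not None else 0.0)
    return vec

def looks_real(vec, min_ratio=0.1):
    """Toy rule (hypothetical): keep a peak if its +1 isotope peak is visible."""
    return len(vec) > 1 and vec[1] >= min_ratio
```

A peak with no isotope partners gets the vector `[1.0, 0.0, 0.0]` and is flagged as likely noise; a real classifier would compare the whole vector against expected isotope distributions.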
Bioinformatics Tool-Related Papers of Note, March 2008
Cahan P, Godfrey LE, Eis PS, Richmond TA, Selzer RR, Brent M, McLeod HL, Ley TJ, Graubert TA. wuHMM: a robust algorithm to detect DNA copy number variation using long oligonucleotide microarray data. [Nucleic Acids Res. 2008 Mar 11 (e-pub ahead of print)]: Discusses wuHMM, an algorithm for mapping copy number variants (CNVs) from array comparative genomic hybridization platforms with up to 3 million probes. The authors note in the abstract that long-oligonucleotide arrays offer the highest resolution for CNV studies, but that the performance of currently available analytical tools “suffers when applied to these data because of the lower signal:noise ratio inherent in oligonucleotide-based hybridization assays.” According to the abstract, wuHMM overcomes this problem by using sequence divergence information to reduce the false positive rate.
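Hidden Markov models for copy number segmentation generally treat each probe's log2 ratio as an emission from a hidden copy-number state. The sketch below is a generic three-state (loss/neutral/gain) HMM decoded with the Viterbi algorithm; it is not wuHMM. The state means, Gaussian emissions, and stay probability are assumptions, and wuHMM's use of sequence divergence information is not modeled.

```python
# Generic 3-state HMM segmentation of aCGH log2 ratios, Viterbi-decoded.
# NOT wuHMM: means, emission SD, and transition probabilities are assumptions.
import math

STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.5}  # assumed log2-ratio means

def log_gauss(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def viterbi(ratios, sd=0.2, p_stay=0.99):
    log_stay = math.log(p_stay)
    log_move = math.log((1 - p_stay) / (len(STATES) - 1))
    log_start = math.log(1 / len(STATES))
    # score[s]: best log-probability of any state path ending in s
    score = {s: log_start + log_gauss(ratios[0], MEANS[s], sd) for s in STATES}
    back = []
    for x in ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev, best = max(
                ((p, score[p] + (log_stay if p == s else log_move)) for p in STATES),
                key=lambda t: t[1],
            )
            new_score[s] = best + log_gauss(x, MEANS[s], sd)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back the most probable path from the best final state
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

The high stay probability penalizes state switches, which is what lets an HMM smooth over noisy single probes; lowering `p_stay` makes the segmentation more sensitive and more prone to false positives.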
Choi JH, Kim S, Tang H, Andrews J, Gilbert DG, Colbourne JK. A machine-learning approach to combined evidence validation of genome assemblies. [Bioinformatics 2008 24(6):744-750]: Presents a machine-learning method for detecting errors in genome assemblies. The method combines several existing assembly-validation measures, including good-minus-bad coverage, the good-to-bad ratio, the average Z-score, and the average absolute Z-score. According to the authors, the good-minus-bad measure “performs better than the others in both its sensitivity and its specificity for assembly error detection. Nevertheless, no single method performs sufficiently well to reliably detect genomic regions requiring attention for further experimental verification.” The combined approach, however, achieves a prediction accuracy of more than 90 percent.
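The core idea of combining several weak validation measures can be sketched as a classifier over per-region feature vectors. Below, plain logistic regression trained by batch gradient descent stands in for the paper's learner; the feature layout (good-minus-bad coverage, good-to-bad ratio, average Z-score, average absolute Z-score) follows the text, but the choice of logistic regression, the learning rate, and the epoch count are assumptions.

```python
# Combine per-window assembly-validation measures with logistic regression.
# Assumption: logistic regression is a stand-in; the paper's learner may differ.
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x, threshold=0.5):
    """True if the window is predicted to contain an assembly error."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold
```

Each feature vector is `[good_minus_bad, good_to_bad_ratio, avg_z, avg_abs_z]` for one genomic window, with label 1 for a known misassembly; the learned weights show how much each measure contributes to the combined call.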
Denisov G, Walenz B, Halpern AL, Miller J, Axelrod N, Levy S, Sutton G. Consensus Generation and Variant Detection by Celera Assembler. [Bioinformatics. 2008 Mar 4 (e-pub ahead of print)]: Describes an algorithm for identifying allelic variation given a whole-genome shotgun (WGS) assembly of haploid sequences. The algorithm, which was used to produce the first diploid genome sequence of an individual human, generates a set of haploid consensus sequences rather than a single consensus sequence. According to the abstract, other WGS assemblers “take a column-by-column approach to consensus generation, and produce a single consensus sequence, which can be inconsistent with the underlying haploid alleles and inconsistent with any of the aligned sequence reads.” The new algorithm avoids this problem with a “dynamic windowing approach” that detects alleles by “simultaneously processing portions of aligned reads spanning a region of sequence variation.” It then assigns reads to their respective alleles, phases adjacent variant alleles, and generates a consensus sequence corresponding to each confirmed allele. Available here.
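As a toy illustration of window-based allele separation, the sketch below finds variant columns in a set of aligned reads, groups reads by the exact string they show across a window, and builds one majority-vote consensus per group. Because a window that spans two variant columns keeps each read's bases together, it also phases those variants. Assumptions: reads are pre-aligned and padded to equal length, `.` marks uncovered columns, and the `min_support` cutoff is hypothetical; Celera Assembler's window selection and error handling are not reproduced.

```python
# Toy window-based allele separation on pre-aligned, equal-length reads.
# Assumptions: '.' = column not covered by the read; min_support is hypothetical.
from collections import Counter, defaultdict

def variant_columns(reads):
    """Columns where the covering reads disagree."""
    cols = []
    for c in range(len(reads[0])):
        bases = {r[c] for r in reads if r[c] != "."}
        if len(bases) > 1:
            cols.append(c)
    return cols

def split_alleles(reads, start, end, min_support=2):
    """Group reads that fully span [start, end) by their window string."""
    groups = defaultdict(list)
    for r in reads:
        window = r[start:end]
        if "." not in window:
            groups[window].append(r)
    return {w: rs for w, rs in groups.items() if len(rs) >= min_support}

def consensus(reads):
    """Majority base per column, ignoring uncovered positions."""
    out = []
    for c in range(len(reads[0])):
        counts = Counter(r[c] for r in reads if r[c] != ".")
        out.append(counts.most_common(1)[0][0] if counts else ".")
    return "".join(out)
```

In contrast to a single column-by-column consensus, each returned group yields its own consensus sequence, one per supported allele.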
Ferro M, Tardif M, Reguer E, Cahuzac R, Bruley C, Vermat T, Nugues E, Vigouroux M, Vandenbrouck Y, Garin J, Viari A. PepLine: A Software Pipeline for High-Throughput Direct Mapping of Tandem Mass Spectrometry Data on Genomic Sequences. [J Proteome Res. 2008 Mar 19 (e-pub ahead of print)]: Introduces PepLine, software that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on peptide sequence tags (PSTs) obtained from partial interpretation of quadrupole time-of-flight (QTOF) MS/MS spectra. These PSTs are then mapped onto translations of genomic sequences, and the resulting hits are clustered to detect potential coding regions. Available here.
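The mapping-and-clustering step can be sketched as: translate the genomic sequence in all six reading frames with the standard genetic code, search each frame for the tag strings, and group nearby hits on the same strand. This is a simplified stand-in for PepLine; exact string matching and the hypothetical `max_gap` distance are assumptions, and PepLine's tag scoring and mass-based checks are not reproduced.

```python
# Map peptide sequence tags onto six-frame translations and cluster hits.
# Assumptions: exact matching, standard genetic code, hypothetical max_gap.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Yield (strand, frame, protein) for all six reading frames."""
    rev = dna.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", dna), ("-", rev)):
        for frame in range(3):
            yield strand, frame, translate(seq[frame:])

def map_tags(dna, tags):
    """(tag, strand, nt_offset) hits; '-' offsets are in reverse-complement coordinates."""
    hits = []
    for strand, frame, protein in six_frames(dna):
        for tag in tags:
            pos = protein.find(tag)
            while pos != -1:
                hits.append((tag, strand, frame + 3 * pos))
                pos = protein.find(tag, pos + 1)
    return hits

def cluster_hits(hits, max_gap=300):
    """Group same-strand hits whose consecutive offsets differ by <= max_gap nt."""
    clusters = []
    for hit in sorted(hits, key=lambda h: (h[1], h[2])):
        last = clusters[-1][-1] if clusters else None
        if last and last[1] == hit[1] and hit[2] - last[2] <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters
```

Clusters containing several independent tag hits are the natural candidates for coding regions, since a single short tag can match a random translation by chance.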
Hernandez D, François P, Farinelli L, Osterås M, Schrenzel J. De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. [Genome Res. 2008 Apr 3 (e-pub ahead of print)]: Introduces a de novo assembly algorithm for short sequencing reads of around 35 base pairs. “Based on a classical overlap graph representation and on the detection of potentially spurious reads, our software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome,” according to the paper’s abstract. Available here.
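A classical overlap-graph assembler links reads whose suffix matches another read's prefix and then walks unambiguous paths to form contigs. The toy version below assumes error-free, single-strand reads with exact overlaps of at least `min_overlap` bases, and only extends a contig across an edge that is the sole way out of one read and the sole way into the next. It omits the spurious-read detection and two-strand handling the authors describe.

```python
# Toy overlap-graph assembly of error-free, single-strand short reads.
# Assumptions: exact suffix-prefix overlaps; no spurious-read filtering.
def overlap(a, b, min_overlap):
    """Longest suffix of a equal to a prefix of b (0 if shorter than min_overlap)."""
    for length in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def assemble(reads, min_overlap=3):
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    succ = {r: [] for r in reads}
    pred = {r: [] for r in reads}
    for a in reads:
        for b in reads:
            if a != b:
                olen = overlap(a, b, min_overlap)
                if olen:
                    succ[a].append((b, olen))
                    pred[b].append(a)
    contigs = []
    for start in reads:
        # Skip reads that are the unique continuation of another read
        if len(pred[start]) == 1 and len(succ[pred[start][0]]) == 1:
            continue
        contig, read = start, start
        # Extend while the path is unambiguous in both directions
        while len(succ[read]) == 1 and len(pred[succ[read][0][0]]) == 1:
            nxt, olen = succ[read][0]
            contig += nxt[olen:]
            read = nxt
        contigs.append(contig)
    return contigs
```

The all-pairs overlap loop is quadratic and would not scale to millions of reads; practical assemblers index read prefixes or k-mers to find overlap candidates.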
Laborde T, Tomita M, Krishnan A. GANDivAWeb: A Web Server for detecting early folding units ('foldons') from protein 3D structures. [BMC Struct Biol. 2008 Mar 7;8(1):15]: Discusses a web server, GANDivAWeb, that identifies “foldons,” also called autonomous folding units: small regions of proteins that tend to fold independently. The server takes an uploaded PDB file as input, identifies the modules using the GANDivA algorithm, and e-mails the results to the user. Results include the module decomposition of the protein, cartoon representations of the protein colored by module identity and connectivity, and contour plots of the hydrophobicity and relative accessible surface area distributions. Available here.
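Structure-based module detection typically starts by turning the uploaded PDB coordinates into a residue network. The sketch below parses C-alpha atoms using the PDB fixed-column layout and links residues whose C-alpha atoms lie within a distance cutoff. Whether GANDivA uses this network definition is an assumption; the 7 Å cutoff is hypothetical, multiple models and alternate locations are ignored, and the module-finding step itself is not shown.

```python
# Build a C-alpha contact network from PDB ATOM records.
# Assumptions: CA atoms only, hypothetical 7 A cutoff, no model/altloc handling.
import math

def parse_ca_atoms(pdb_text):
    """[(residue_id, (x, y, z))] for CA atoms, using PDB fixed columns."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_id = (line[21], int(line[22:26]))  # (chain, residue number)
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_id, xyz))
    return atoms

def contact_network(atoms, cutoff=7.0):
    """Adjacency sets linking residues whose CA atoms are within cutoff."""
    adj = {res: set() for res, _ in atoms}
    for i, (ri, pi) in enumerate(atoms):
        for rj, pj in atoms[i + 1:]:
            if math.dist(pi, pj) <= cutoff:
                adj[ri].add(rj)
                adj[rj].add(ri)
    return adj
```

A graph-partitioning method can then search this network for densely connected groups of residues as candidate modules.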
Lai W, Choudhary V, Park PJ. CGHweb: a tool for comparing DNA copy number segmentations from multiple algorithms. [Bioinformatics 2008 24(7):1014-1015]: Describes a web-based tool that applies several segmentation algorithms to a single user-submitted array comparative genomic hybridization profile and generates a heatmap panel of the segmented profiles from each method, along with a consensus profile. The interface calls algorithms written in the statistical language R. Available here.
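One simple way to derive a consensus from several segmentations of the same profile is to expand each method's segments to per-probe values and take the probe-wise median. The sketch assumes segments are given as `(start_index, end_index_exclusive, mean)` tuples; both that format and the median rule are assumptions, and CGHweb's own consensus procedure may differ.

```python
# Probe-wise median consensus across several segmentations of one profile.
# Assumptions: segment format (start, end_exclusive, mean) and the median rule.
from statistics import median

def expand(segments, n_probes):
    """Turn [(start, end, mean), ...] into one segment mean per probe."""
    values = [0.0] * n_probes
    for start, end, mean in segments:
        for i in range(start, end):
            values[i] = mean
    return values

def consensus_profile(segmentations, n_probes):
    """Median of the per-probe segment means reported by each method."""
    per_method = [expand(segs, n_probes) for segs in segmentations]
    return [median(vals) for vals in zip(*per_method)]
```

The per-method rows produced by `expand` are also exactly what a heatmap of segmented profiles would display, one row per algorithm.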
Parisien M, Major F. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. [Nature. 2008 Mar 6;452(7183):51-5]: Describes how “pipelining” two algorithms, MC-Fold and MC-Sym, “reproduces a series of experimentally determined RNA three-dimensional structures from the sequence,” according to the paper’s abstract. Available here.
Sanchez-Villeda H, Schroeder S, Flint-Garcia S, Guill KE, Yamasaki M, McMullen MD. DNAAlignEditor: DNA Alignment editor tool. [BMC Bioinformatics 2008, 9:154]: Describes DNAAlignEditor, a nucleotide sequence alignment editor that provides an interface for manually editing multiple sequence alignments, with functions for input, editing, and output of sequence alignments. DNAAlignEditor is a client/server tool with two main components: a relational database that stores the processed alignments and a user interface connected to the database through universal data access connectivity drivers. Available here.
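The client/server split can be illustrated with a small relational store: aligned sequences live in a table, and the editing client reads a row, changes a column, and writes it back. In this sketch an in-memory SQLite database stands in for the server, and the table layout and function names are hypothetical, not DNAAlignEditor's schema or driver stack.

```python
# Alignments stored in a relational table, edited and exported by a client.
# Assumptions: SQLite stands in for the server; schema and names are hypothetical.
import sqlite3

SCHEMA = """
CREATE TABLE alignment_row (
    alignment_id TEXT NOT NULL,
    seq_name     TEXT NOT NULL,
    aligned_seq  TEXT NOT NULL,
    PRIMARY KEY (alignment_id, seq_name)
)"""

def load_alignment(conn, alignment_id, rows):
    """Input: store {seq_name: aligned_seq} for one alignment."""
    conn.executemany(
        "INSERT INTO alignment_row VALUES (?, ?, ?)",
        [(alignment_id, name, seq) for name, seq in rows.items()],
    )

def edit_base(conn, alignment_id, seq_name, column, new_char):
    """Edit: replace one character (e.g. a gap '-') in a stored sequence."""
    (seq,) = conn.execute(
        "SELECT aligned_seq FROM alignment_row WHERE alignment_id = ? AND seq_name = ?",
        (alignment_id, seq_name),
    ).fetchone()
    edited = seq[:column] + new_char + seq[column + 1:]
    conn.execute(
        "UPDATE alignment_row SET aligned_seq = ? WHERE alignment_id = ? AND seq_name = ?",
        (edited, alignment_id, seq_name),
    )
    return edited

def export_fasta(conn, alignment_id):
    """Output: the alignment as aligned FASTA, sorted by sequence name."""
    rows = conn.execute(
        "SELECT seq_name, aligned_seq FROM alignment_row "
        "WHERE alignment_id = ? ORDER BY seq_name",
        (alignment_id,),
    )
    return "".join(f">{name}\n{seq}\n" for name, seq in rows)
```

Keeping alignments in a shared database rather than in local files lets several curators work against the same processed alignments.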
Sturm M, Bertsch A, Groepl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O. OpenMS — An open-source software framework for mass spectrometry. [BMC Bioinformatics 2008, 9:163]: Discusses OpenMS, a software framework for “rapid application development” in mass spectrometry, according to the paper’s abstract. “OpenMS has been designed to be portable, easy-to-use, and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis.” Available here.
Visvanathan M, Breit M, Pfeifer B, Baumgartner C, Modre-Osprian R, Tilg B. DMSP — Database for Modeling Signaling Pathways. Combining Biological and Mathematical Modeling Knowledge for Pathways. [Methods Inf Med. 2008;47(2):140-8]: Introduces the Database for Modeling Signaling Pathways (DMSP), which combines biological and mathematical modeling information on various signaling pathways. DMSP incorporates biological datasets from online databases such as BIND, DIP, PIP, and SPiD, as well as “modeling knowledge” gathered from a literature study. Users can design, visualize, and simulate pathway models with the information in the database.
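To show what simulating a pathway model involves, here is a hypothetical two-step cascade (a receptor activates a kinase, which phosphorylates a target) written as mass-action ordinary differential equations and integrated with the forward Euler method. The species, reactions, and rate constants are illustrative assumptions, not models or parameters taken from DMSP.

```python
# Hypothetical two-step signaling cascade as mass-action ODEs (forward Euler).
# Assumptions: species, reactions, and rate constants are illustrative only.
def derivatives(state, k):
    R, K, Kp, T, Tp = state["R"], state["K"], state["Kp"], state["T"], state["Tp"]
    v_act = k["act"] * R * K      # R + K  -> R + Kp  (kinase activation)
    v_deact = k["deact"] * Kp     # Kp     -> K       (kinase deactivation)
    v_phos = k["phos"] * Kp * T   # Kp + T -> Kp + Tp (target phosphorylation)
    v_dephos = k["dephos"] * Tp   # Tp     -> T       (dephosphorylation)
    return {
        "R": 0.0,                 # receptor level held constant
        "K": -v_act + v_deact,
        "Kp": v_act - v_deact,
        "T": -v_phos + v_dephos,
        "Tp": v_phos - v_dephos,
    }

def simulate(state, k, t_end=50.0, dt=0.01):
    """Integrate from t = 0 to t_end with fixed-step forward Euler."""
    state = dict(state)
    for _ in range(int(round(t_end / dt))):
        d = derivatives(state, k)
        state = {name: state[name] + dt * d[name] for name in state}
    return state
```

With `act=1.0, deact=0.5, phos=2.0, dephos=0.5` and unit initial amounts of R, K, and T, the system settles at Kp = 2/3 and Tp = 8/11. Forward Euler is used only for brevity; a stiff pathway model would call for an adaptive or implicit solver.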