In Print: Bioinformatics Tool-Related Papers of Note, October 2005


Buck M, Nobel A, Lieb J. ChIPOTle: A User-Friendly Tool for the Analysis of ChIP-Chip Data. [Genome Biology 2005, 6:R97]: Presents ChIPOTle (Chromatin Immunoprecipitation on Tiled Arrays), which "takes advantage of two unique properties of ChIP-chip data: the single-tailed nature of the data, caused by specific enrichment but not specific depletion of genomic fragments; and the predictable enrichment of DNA fragments adjacent to sites of direct protein-DNA interaction," the authors write. Availability:

Chinnasamy A, Mittal A, Sung W. Probabilistic Prediction of Protein-Protein Interactions from the Protein Sequences. [Comput Biol Med. 2005 Oct 24 (e-pub ahead of print)]: Introduces a method that predicts protein-protein interactions from protein sequence information using a tree-augmented naive Bayesian network. The framework provides a confidence level for every predicted interaction.

Churbanov A, Pauley M, Quest D, Ali H. A Method of Precise mRNA/DNA Homology-Based Gene Structure Prediction. [BMC Bioinformatics. 2005 Oct 21;6(1):261]: Describes a gene structure prediction tool called GIGOgene that is based on mRNA/DNA homology. The method uses a new affine gap penalty splice-enhanced global alignment algorithm for a high-quality annotation of splice sites, along with a new algorithm to assemble partial gene structure predictions using interval graphs. According to the authors, GIGOgene exhibited a sensitivity of 99.08 percent and a specificity of 99.98 percent on the Genie learning set. Availability:

Engelhardt B, Jordan M, Muratore K, Brenner S. Protein Molecular Function Prediction by Bayesian Phylogenomics. [PLoS Comput Biol 1(5): e45]: Discusses a statistical method called SIFTER (Statistical Inference of Function Through Evolutionary Relationships) to infer molecular function for unannotated protein sequences using homology. SIFTER predicts molecular function for members of a protein family given a reconciled phylogeny and available function annotations, even when the data are sparse or noisy, according to the authors, and offers improved accuracy over Blast, GeneQuiz, Gotcha, and Orthostrapper. Availability: upon request from the authors ([email protected]).

Goldovsky L, Janssen P, Ahrén D, Audit B, Cases I, Darzentas N, Enright A, López-Bigas N, Peregrin-Alvarez J, Smith M, Tsoka S, Kunin V, Ouzounis C. CoGenT++: An Extensive and Extensible Data Environment for Computational Genomics. [Bioinformatics 2005 21(19):3806-3810]: Describes CoGenT++, a data environment for computational research in comparative and functional genomics. The environment is built on ProXSim, a continually updated all-against-all similarity database that stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions, putative orthologs, protein families, phylogenetic profiles, and phylogenetic trees. Availability:

Huentelman M, Craig D, Shieh A, Corneveaux J, Hu-Lince D, Pearson J, Stephan D. SNiPer: Improved SNP Genotype Calling for Affymetrix 10K GeneChip Microarray Data. [BMC Genomics 2005, 6:149]: Describes SNiPer, an application that uses two clustering algorithms to yield increased call rates and equivalent concordance to Affymetrix-called genotypes. SNiPer can be retrained for lab-specific training sets. Availability:

Leser U. A Query Language for Biological Networks. [Bioinformatics 2005 21(suppl_2):ii33-ii39]: Introduces the pathway query language (PQL) for querying large protein interaction or pathway databases. PQL is based on a graph data model with extensions reflecting properties of biological objects. According to the author, "The syntax is easy to learn for anybody familiar with SQL." Availability: upon request from the author ([email protected]).

Lutteke T, Bohne-Lang A, Loss A, Goetz T, Frank M, von der Lieth C. An Internet Portal to Support Glycomics and Glycobiology Research. [Glycobiology. 2005 Oct 20 (e-pub ahead of print)]: Describes a bioinformatics portal for glycan-related data and applications from different resources using a single user interface. Availability:

Myers E. The Fragment Assembly String Graph. [Bioinformatics 2005 21(suppl_2):ii79-ii85]: Describes the concept of the string graph, which can be used to represent a DNA sequence from a collection of shotgun sequencing reads. According to the author, the paper "is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method." The concept is currently being used to build a next-generation whole-genome assembler called BOA (Berkeley Open Assembler) that is expected to scale to mammalian genomes.

Prlić A, Down T, Hubbard T. Adding Some SPICE to DAS. [Bioinformatics. 2005 Sep 1;21 Suppl 2:ii40-ii41]: Describes an extension of the distributed annotation system (DAS) protocol that is applicable to macromolecular structures. While the original DAS protocol was designed to serve annotation of genomic sequences, SPICE can be used to visualize protein sequence and structure annotations. Availability:

Riley R, Lee C, Sabatti C, Eisenberg D. Inferring Protein Domain Interactions from Databases of Interacting Proteins. [Genome Biology 2005, 6:R89]: Describes DPEA (domain pair exclusion analysis), a method for inferring domain interactions from databases of interacting proteins. DPEA features a log odds score that reflects the confidence that two domains interact. The authors analyzed 177,233 potential domain interactions underlying 26,032 protein interactions to infer 3,005 high-confidence domain interactions, which were evaluated using known domain interactions in the Protein Data Bank.

Shadforth I, Dunkley T, Lilley K, Bessant C. i-Tracker: For Quantitative Proteomics Using iTRAQ. [BMC Genomics 2005, 6:145]: Describes i-Tracker, software for extracting reporter ion peak ratios from non-centroided tandem MS peak lists in a format that can be integrated with the results of protein identification tools such as Mascot and Sequest. According to the authors, this functionality is currently not provided by ProQuant, the software that Applied Biosystems supplies with iTRAQ, "which is restricted to matching quantitative information to the peptide identifications from Applied Biosciences' Interrogator software." Availability:

Zhan Y, Kulp D. Model-P: A Basecalling Method for Resequencing Microarrays of Diploid Samples. [Bioinformatics 2005 21(suppl_2):ii182-ii189]: Describes a new basecalling method for resequencing microarrays called Model-P, which takes into consideration the expected feature intensities for different potential genotypes. According to the authors, Model-P has better performance at high call rates compared with Abacus, the current state-of-the-art method. Availability: upon request from the authors ([email protected]).

Zhang J, Wheeler D, Yakub I, Wei S, Sood R, et al. SNPdetector: A Software Tool for Sensitive and Accurate SNP Detection. [PLoS Comput Biol 1(5): e53]: Introduces SNPdetector, a software tool for automated identification of SNPs and mutations in fluorescence-based resequencing reads. Availability:
