Bioinformatics Tool-Related Papers of Note, December 2007

Adler P, Reimand J, Jänes J, Kolde R, Peterson H, Vilo J. 18056068 KEGGanim: pathway animations for high-throughput data. [Bioinformatics. 2007 Dec. 1 (e-pub ahead of print)]: Describes KEGGanim, a web-based tool for visualizing experimental data in biological pathways. KEGGanim produces animations and images of KEGG pathways using public or user-uploaded high-throughput data. “Pathway members are colored according to experimental measurements, and animated over experimental conditions,” according to the paper’s abstract. KEGGanim visualization “highlights dynamic changes over conditions and allows the user to observe important modules and key genes that influence the pathway,” the authors add. Available here.
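
The coloring idea in the abstract can be illustrated with a minimal sketch. This is not KEGGanim's code: the function names, the blue-white-red scale, and the clipping limit are all assumptions chosen for illustration. Each pathway member gets one color per experimental condition, and the per-condition color maps act as animation frames.

```python
def value_to_hex(value, limit=2.0):
    """Map a measurement (e.g. a log ratio) to a hex color.

    Hypothetical scale: -limit is pure blue, 0 is white, +limit is pure red.
    Values outside [-limit, limit] are clipped.
    """
    clipped = max(-limit, min(limit, value))
    # Fade the other channels toward 0 as the magnitude grows.
    fade = round(255 * (1 - abs(clipped) / limit))
    if clipped >= 0:
        r, g, b = 255, fade, fade
    else:
        r, g, b = fade, fade, 255
    return f"#{r:02x}{g:02x}{b:02x}"


def animation_frames(measurements, limit=2.0):
    """Build one {gene: color} map per condition.

    `measurements` maps each gene to a list of values, one per condition;
    all lists are assumed to have the same length.
    """
    n_conditions = len(next(iter(measurements.values())))
    return [
        {gene: value_to_hex(values[i], limit) for gene, values in measurements.items()}
        for i in range(n_conditions)
    ]


if __name__ == "__main__":
    # Made-up gene names and values, for illustration only.
    demo = {"geneA": [0.0, 2.0], "geneB": [-2.0, 0.0]}
    for frame in animation_frames(demo):
        print(frame)
```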

Brylinski M, Skolnick J. 18165317 A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation [Proc Natl Acad Sci USA. 2007 Dec 28 (e-pub ahead of print)]: Describes FINDSITE, a method for ligand-binding site prediction and functional annotation that is based on binding-site similarity across groups of weakly homologous template structures identified from threading. “In most cases, FINDSITE can accurately assign a molecular function to the protein model,” according to the paper’s abstract.

Droit A, Hunter JM, Rouleau M, Ethier C, Picard-Cloutier A, Bourgais D, Poirier GG. 18093328 PARPs Database: A LIMS systems for protein-protein interaction data mining or Laboratory Information management system. [BMC Bioinformatics. 2007 Dec 19;8(1):483]: Describes the PARPs database, a data-analysis and data-management pipeline for liquid chromatography tandem mass spectrometry proteomics. Features include experiment annotation, protein database searching, protein sequence management, and data mining. Available here.

Givan SA, Sullivan CM, Carrington JC. 18088438 The Personal Sequence Database: a suite of tools to create and maintain web-accessible sequence databases. [BMC Bioinformatics 2007, 8:479]: Introduces the Personal Sequence Database, a suite of tools to create and maintain “small- to medium-sized web-accessible sequence databases.” The suite also includes BLASTAgent, a hit-tracking system that automatically monitors public databases for new BLAST hits. Available here.

Gross SS, Do CB, Sirota M, Batzoglou S. 18096039 CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. [Genome Biol. 2007 Dec 20;8(12):R269 (e-pub ahead of print)]: Introduces CONTRAST, a gene predictor that incorporates information from multiple alignments rather than using phylogenetic models. According to the paper’s abstract, the method is based on discriminative machine-learning techniques, including a “novel training algorithm.” The method uses a two-stage approach, in which a set of binary classifiers designed to recognize coding region boundaries is combined with a global model of gene structure. Available here.

Helles G. 18077243 A comparative study of the reported performance of ab initio protein structure prediction algorithms. [J R Soc Interface. 2007 Dec 11 (e-pub ahead of print)]: Describes a comparison of the reported performance results for 18 recently published structure prediction algorithms. The author identifies “the general algorithmic settings most likely responsible for the difference in the reported performance.” Average normalized root-mean-square-deviation scores ranged from 11.17 to 3.48. According to the author, the best-performing prediction algorithm is the I-TASSER algorithm.
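
For readers unfamiliar with the metric, the following is a minimal sketch of a root-mean-square deviation and one published length normalization, RMSD100 (Carugo and Pongor). Whether the study above uses this exact normalization is an assumption, the function names are hypothetical, and the coordinates are assumed to be already superposed and paired atom by atom.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already optimally superposed; no fitting
    is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def rmsd100(rmsd_value, n_residues):
    """Length-normalized RMSD: RMSD / (1 + ln(sqrt(N / 100))).

    Rescales an RMSD to the value expected for a 100-residue protein.
    The denominator is not positive for very short chains, so those
    are rejected.
    """
    denominator = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if denominator <= 0:
        raise ValueError("chain too short for this normalization")
    return rmsd_value / denominator


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(model, native))
    print(rmsd100(5.0, 200))
```

Lower values indicate a closer match to the native structure, and the normalization makes scores for proteins of different lengths easier to compare.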

Laibe C, Le Novere N. 18078503 MIRIAM Resources: tools to generate and resolve robust cross-references in Systems Biology. [BMC Syst Biol. 2007 Dec 13;1(1):58 (e-pub ahead of print)]: Introduces MIRIAM Resources, a set of online services designed to catalog uniform resource identifiers and other information useful for complying with the “Minimal Information Requested In the Annotation of biochemical Models” guidelines. MIRIAM Resources are composed of several components: MIRIAM Database, which stores the information; MIRIAM Web Services, which allows users to programmatically access the database; MIRIAM Library, which provides access to the Web Services; and MIRIAM Web Application, a method for accessing and editing the data.

Li C, Li M. GWAsimulator: a rapid whole-genome simulation program. [Bioinformatics 2008 24(1):140-142]: Describes GWAsimulator, which simulates genotype data for case-control or population samples from genomic SNP chips. “As genome-wide association studies become increasingly popular and new GWA data analysis methods are being developed, we anticipate that GWAsimulator will be an important tool for evaluating performance of new GWA analysis methods,” the authors note. Available here.
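
To give a sense of what simulating case-control genotype data involves, here is a deliberately simplified single-SNP sketch. It is not GWAsimulator's algorithm: the function names, the Hardy-Weinberg assumption, the relative-risk parameters, and the rare-disease approximation for controls are all assumptions made for illustration.

```python
import random


def case_control_genotype_probs(risk_allele_freq, rr_het, rr_hom):
    """Return (case_probs, control_probs) for genotypes with 0, 1, 2 risk alleles.

    Population genotypes follow Hardy-Weinberg proportions. Case
    probabilities weight each genotype by its relative risk (Bayes' rule).
    Controls are approximated by the general population, which is
    reasonable only for a rare disease.
    """
    p = risk_allele_freq
    q = 1.0 - p
    population = [q * q, 2.0 * p * q, p * p]
    relative_risk = [1.0, rr_het, rr_hom]
    weighted = [f * r for f, r in zip(population, relative_risk)]
    total = sum(weighted)
    cases = [w / total for w in weighted]
    return cases, population


def sample_genotypes(probs, n, rng):
    """Draw n genotypes (coded 0, 1, 2) from the given probabilities."""
    return rng.choices([0, 1, 2], weights=probs, k=n)


if __name__ == "__main__":
    rng = random.Random(2007)  # fixed seed so repeated runs match
    case_probs, control_probs = case_control_genotype_probs(0.2, 1.5, 2.25)
    cases = sample_genotypes(case_probs, 1000, rng)
    controls = sample_genotypes(control_probs, 1000, rng)
    print("risk-allele count in cases:   ", sum(cases))
    print("risk-allele count in controls:", sum(controls))
```

A simulated data set like this has a known causal SNP and effect size, which is what makes it useful for checking whether an analysis method recovers the signal.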

Mayer U. 18095364 Protein Information Crawler (PIC): Extensive spidering of multiple protein information resources for large protein sets. [Proteomics. 2008 Jan;8(1):42-4]: Introduces Protein Information Crawler, a system that automatically bulk-collects protein data from multiple databases and prediction servers and summarizes them in a Microsoft Excel spreadsheet or HTML table. Available here.

Pandya GA, Holmes MH, Sunkara S, Sparks A, Bai Y, Verratti K, Saeed K, Venepally P, Jarrahi B, Fleischmann RD, Peterson SN. 18006572 A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip whole-genome resequencing platform. [Nucleic Acids Res. 2007;35(21):e148]: Describes the development of an array-based whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia, as well as a set of bioinformatic filters that targeted “systematic base-calling errors” that were caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence, according to the paper’s abstract.

Zemla A, Geisbrecht B, Smith J, Lam M, Kirkpatrick B, Wagner M, Slezak T, Zhou CE. 18039711 STRALCP — structure alignment-based clustering of proteins. [Nucleic Acids Res. 2007 Nov 26 (e-pub ahead of print)]: Describes an algorithm called STRALCP, or Structure Alignment-based Clustering of Proteins, that identifies regions of structural similarity within a set of protein structures, and then uses those regions for clustering. The method generates detailed information about global and local similarities between pairs of protein structures, identifies fragments that are structurally conserved among proteins, and uses these spans to group the structures accordingly. Available here.


Zimin AV, Smith DR, Sutton G, Yorke JA. 18057021 Assembly reconciliation. [Bioinformatics 2008 24(1):42-45]: Discusses a method called “assembly reconciliation” that can merge draft genome assemblies from multiple genome sequencing centers. The approach was designed to combat the limitations of multi-center genome sequencing projects, in which each center produces an assembly using its own assembly software and the collaborators then pick the single draft assembly that they judge to be the best and discard the others. Assembly reconciliation starts with one draft assembly, detects apparent errors, and, “when possible, patches the problem areas using pieces from alternative draft assemblies.” It also closes gaps in places where one of the alternative assemblies has spanned the gap correctly, according to the paper’s abstract. Available here.
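
The patching step described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the region identifiers, the status labels, and the assumption that error detection and alignment between assemblies have already been done are all hypothetical simplifications.

```python
def reconcile(primary, alternatives):
    """Patch flagged regions of a primary assembly from alternative assemblies.

    `primary` is an ordered list of (region_id, sequence, status) tuples,
    where status is "ok", "error", or "gap" (labels assumed for this sketch).
    `alternatives` is a list of dicts mapping region_id to a sequence that
    an alternative assembly is assumed to have resolved correctly.
    Flagged regions with no available patch are kept unchanged.
    """
    reconciled = []
    for region_id, sequence, status in primary:
        if status == "ok":
            reconciled.append((region_id, sequence, "ok"))
            continue
        # Take the first alternative assembly that covers this region.
        patch = next(
            (alt[region_id] for alt in alternatives if region_id in alt), None
        )
        if patch is None:
            reconciled.append((region_id, sequence, status))
        else:
            reconciled.append((region_id, patch, "patched"))
    return reconciled


if __name__ == "__main__":
    # Made-up regions and sequences, for illustration only.
    primary = [
        ("r1", "ACGT", "ok"),
        ("r2", "ACNNNNGT", "gap"),
        ("r3", "TTGACA", "error"),
    ]
    alternatives = [{"r2": "ACGGTTGT"}, {"r2": "ACGGTAGT"}]
    for region in reconcile(primary, alternatives):
        print(region)
```

In this toy run the gap in `r2` is closed from the first alternative, while `r3` stays flagged because no alternative covers it, mirroring the "when possible" qualifier in the abstract.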

