Bioinformatics Tool-Related Papers of Note, July and August 2007

Bhave SV, Hornbaker C, Phang TL, Saba L, Lapadat R, Kechris K, Gaydos J, McGoldrick D, Dolbey A, Leach S, Soriano B, Ellington A, Ellington E, Jones K, Mangion J, Belknap JK, Williams RW, Hunter LE, Hoffman PL, Tabakoff B. The PhenoGen Informatics website: tools for analyses of complex traits. [BMC Genet. 2007 Aug 30;8(1):59]: Describes the PhenoGen Informatics website, a toolbox for storing, analyzing, and integrating microarray data and related genotype and phenotype data. Researchers can use the site to conduct in silico microarray experiments using their own or shared data, according to the authors. Availability:

Chen SC, Zhao T, Gordon GJ, Murphy RF. Automated image analysis of protein localization in budding yeast. [Bioinformatics 2007 23(13):i66-i71]: Discusses computational methods for automatically analyzing images created by the University of California, San Francisco’s yeast green fluorescent protein fusion localization project. The system was trained to recognize the same location categories that were used in that study. According to the paper’s abstract, the authors applied the system to 2,640 images, and it assigned the same label as the previous annotation to 2,139 of them (about 81 percent). Availability:

Dyer MD, Murali TM, Sobral BW. Computational prediction of host-pathogen protein–protein interactions. [Bioinformatics 2007 23(13):i159-i166]: Introduces a method that integrates known intra-species protein-protein interactions with protein-domain profiles to predict interactions between host and pathogen proteins. Given a set of intra-species protein interactions, the method identifies the functional domains in each of the interacting proteins, according to the paper’s abstract.

Gerber GK, Dowell RD, Jaakkola TS, Gifford DK. Automated Discovery of Functional Generality of Human Gene Expression Programs. [PLoS Comput Biol. 2007 Aug 10;3(8):e148]: Describes GeneProgram, which uses expression data to identify “expression programs,” or sets of co-expressed genes that carry out normal or pathological processes. GeneProgram organizes tissues into groups and genes into overlapping programs with consistent temporal behavior in order to produce maps of expression programs that are sorted by generality scores.

Gräf S, Nielsen FG, Kurtz S, Huynen MA, Birney E, Stunnenberg H, Flicek P. Optimized design and assessment of whole genome tiling arrays. [Bioinformatics 2007 23(13):i195-i204]: Introduces the uniqueness score, or U, a quality measure for oligonucleotide probes. According to the paper’s abstract, U is equivalent to the number of shortest unique substrings in the probe. The paper describes an efficient greedy algorithm to design mammalian whole-genome tiling arrays using probes that maximize U. Availability:
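
As a rough illustration of the uniqueness score, the sketch below counts the minimal unique substrings of a probe against a toy genome string. It is a naive reading of the one-line definition quoted from the abstract, not the authors' algorithm: the function names `count_occurrences` and `uniqueness_score` are hypothetical, reverse complements are ignored, and the greedy probe-selection step is not shown.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def uniqueness_score(genome, probe):
    """Count substrings of probe that are unique in genome and minimal,
    meaning that no proper substring of them is itself unique.

    Assumes probe was taken from genome (illustrative toy version).
    """
    score = 0
    n = len(probe)
    for i in range(n):
        for j in range(i + 1, n + 1):
            candidate = probe[i:j]
            if count_occurrences(genome, candidate) == 1:
                # candidate is the shortest unique substring starting at i,
                # so dropping its last character already gives a repeat.
                # It is minimal only if dropping its first character
                # also gives a repeat (or nothing at all).
                shorter = candidate[1:]
                if shorter == "" or count_occurrences(genome, shorter) > 1:
                    score += 1
                break
    return score


genome = "ACGTACGA"
print(uniqueness_score(genome, "TACG"))    # counts only "T"
print(uniqueness_score(genome, "GTACGA"))  # counts "T" and "GA"
```

On this toy input the longer probe scores higher because it spans two minimal unique substrings, which matches the intuition that probes with larger U are less likely to match elsewhere in the genome. A genome-scale version would need an index structure rather than repeated string scans.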

Jiang X, Jiang X, Han G, Ye M, Zou H. Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics. [BMC Bioinformatics 2007, 8:323]: Describes the use of a predictive genetic algorithm for optimizing filtering criteria to maximize the number of identified peptides at a fixed false-discovery rate for SEQUEST database searching. “Compared with PeptideProphet, the GA based approach can achieve similar performance in distinguishing true from false assignment with only 1/10 of the processing time,” according to the paper’s abstract.
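
The underlying objective, keeping as many identifications as possible without exceeding a set false-discovery rate, can be sketched with a much simpler stand-in for the genetic algorithm. The example below scans a single score threshold instead of optimizing several criteria at once, and it assumes a target-decoy setup in which the FDR is estimated as decoy hits divided by target hits. The function name `best_threshold`, the data layout, and the scores are all hypothetical.

```python
def best_threshold(psms, max_fdr):
    """Pick the score cutoff that keeps the most target matches
    while the estimated FDR stays at or below max_fdr.

    psms: list of (score, is_decoy) pairs, one per peptide-spectrum match.
    Returns (threshold, n_targets), or None if no cutoff qualifies.
    Assumption: FDR is estimated as decoys / targets above the cutoff.
    """
    best = None
    for threshold in sorted({score for score, _ in psms}):
        targets = sum(1 for s, decoy in psms if s >= threshold and not decoy)
        decoys = sum(1 for s, decoy in psms if s >= threshold and decoy)
        if targets == 0:
            continue
        fdr = decoys / targets
        if fdr <= max_fdr and (best is None or targets > best[1]):
            best = (threshold, targets)
    return best


# Made-up matches: (score, is_decoy)
psms = [
    (3.5, False), (3.1, False), (2.8, False), (2.5, True),
    (2.2, False), (1.9, True), (1.5, False),
]
print(best_threshold(psms, max_fdr=0.25))  # (2.2, 4)
```

An exhaustive scan like this is only practical for one criterion; searching combinations of several filters is where a heuristic such as a genetic algorithm becomes attractive.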

Lee JK, Havaleshko DM, Cho H, Weinstein JN, Kaldjian EP, Karpovich J, Grimshaw A, Theodorescu D. A strategy for predicting the chemosensitivity of human cancers and its application to drug discovery. [Proc Natl Acad Sci USA. 2007 Aug 7;104(32):13086-91]: Describes an algorithm called “coexpression extrapolation,” or COXEN, which uses expression microarray data as a “Rosetta stone” for translating from drug activities in the NCI-60 cell lines to drug activities in other cell panels or clinical tumors, according to the paper’s abstract. The authors demonstrate that COXEN can “accurately predict drug sensitivity of bladder cancer cell lines and clinical responses of breast cancer patients treated with commonly used chemotherapeutic drugs.” Availability:

Pandey J, Koyutürk M, Kim Y, Szpankowski W, Subramaniam S, Grama A. Functional annotation of regulatory pathways. [Bioinformatics 2007 23(13):i377-i386]: Presents a framework for projecting gene regulatory networks onto the space of functional attributes using multigraph models, with the goal of deriving “statistically significant pathway annotations,” according to the paper’s abstract. The authors also describe an algorithm and software, called NARADA, for computing significant pathways in large regulatory networks. Availability:

Shtatland T, Guettler D, Kossodo M, Pivovarov M, Weissleder R. PepBank — a database of peptides based on sequence text mining and public peptide data sources. [BMC Bioinformatics 2007, 8:280]: Presents PepBank, a database of nearly 20,000 individual peptide entries. According to the authors, prior to PepBank, “there [did] not exist a single, searchable archive for peptide sequences or associated biological data. Rather, peptide sequences still have to be mined from abstracts and full-length articles, and/or obtained from the fragmented public sources.” Availability:

Siso-Nadal F, Ollivier JF, Swain PS. Facile: a command-line network compiler for systems biology. [BMC Systems Biology 2007, 1:36]: Presents Facile, a Perl command-line tool for analyzing the dynamics of a systems biology model. “For many biochemical systems, parameter values and even the existence of interactions between some chemical species are unknown,” the authors note in the paper’s abstract. “It is therefore important to be able to easily investigate the effects of adding or removing reactions and to easily perform a bifurcation analysis, which shows the qualitative dynamics of a model for a range of parameter values.” Facile uses the law of mass action to automatically compile a biochemical network into scripts for analytical analysis, simulation, and bifurcation analysis. Availability:
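
To make the mass-action step concrete, the sketch below turns a small reaction list into rates of change for each species. It does not use Facile's input format or reproduce its generated scripts; the function name `mass_action_rates`, the tuple layout, and the rate constants are assumptions chosen for illustration.

```python
def mass_action_rates(reactions, concentrations):
    """Return d[species]/dt for every species under the law of mass action.

    reactions: list of (reactants, products, rate_constant), where
        reactants and products are lists of species names; a species
        listed twice has stoichiometry 2.
    concentrations: dict mapping species name to current concentration.
    """
    derivatives = {name: 0.0 for name in concentrations}
    for reactants, products, rate_constant in reactions:
        # Flux is the rate constant times the product of reactant concentrations.
        flux = rate_constant
        for species in reactants:
            flux *= concentrations[species]
        for species in reactants:
            derivatives[species] -= flux
        for species in products:
            derivatives[species] += flux
    return derivatives


# Reversible binding: A + B -> C (k = 2.0) and C -> A + B (k = 0.5)
reactions = [
    (["A", "B"], ["C"], 2.0),
    (["C"], ["A", "B"], 0.5),
]
state = {"A": 1.0, "B": 3.0, "C": 4.0}
print(mass_action_rates(reactions, state))
# {'A': -4.0, 'B': -4.0, 'C': 4.0}
```

Keeping the network as plain data means that adding or removing a reaction is a one-line change, which is the kind of exploration the authors describe; the returned rates could be passed to any ODE integrator for simulation.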

Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV, Provart NJ. An "electronic fluorescent pictograph" browser for exploring and analyzing large-scale biological data sets. [PLoS ONE. 2007 Aug 8;2:e718]: Describes a tool called the electronic Fluorescent Pictograph (eFP) Browser for exploring microarray and other data for hypothesis generation. The eFP Browser engine “paints” data from large-scale data sets onto pictographic representations of the experimental samples used to generate the data sets, according to the paper’s abstract. Availability:

Xiang Z, Tian Y, He Y. PHIDIAS: a pathogen-host interaction data integration and analysis system. [Genome Biol. 2007, 8:R150]: Describes PHIDIAS, the Pathogen-Host Interaction Data Integration and Analysis System, a web-based database for searching, comparing, and analyzing integrated genome sequences, conserved domains, and gene-expression data related to pathogen-host interactions for pathogen species designated as high-priority agents for public health and biological security. Availability:
