Blanchette M, Green E, Miller W, Haussler D. Reconstructing large regions of an ancestral mammalian genome in silico. [Genome Research 14:2412-2423, 2004]: Describes the computational reconstruction of an ancient genome sequence around the CFTR locus using genomic sequences from 19 extant mammals. Detailed examination suggests the reconstruction is accurate and that it allows researchers to identify features in modern species, such as remnants of ancient transposon insertions, that were not identified by direct analysis.
Bowers P, Cokus S, Eisenberg D, Yeates T. Use of Logic Relationships to Decipher Protein Network Organization. [Science 2004 306: 2246-2249]: Describes a computational approach for identifying detailed relationships between proteins using logic analysis of phylogenetic profiles, which identifies triplets of proteins whose presence or absence obeys certain logic relationships.
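To make the idea of a logic relationship among a protein triplet concrete, here is a minimal Python sketch. It assumes each phylogenetic profile is a list of 0/1 values (one per genome) and tests only exact agreement with three illustrative logic functions. The function name `matching_relations` and the `LOGIC` table are hypothetical, and the sketch does not reproduce the authors' statistical scoring.

```python
# Illustrative logic functions combining the presence/absence of proteins A and B.
LOGIC = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}


def matching_relations(a, b, c):
    """Return the names of logic functions f such that c[i] == f(a[i], b[i])
    in every genome i. Profiles are equal-length sequences of 0/1 values."""
    return [
        name
        for name, f in LOGIC.items()
        if all(bool(f(x, y)) == bool(z) for x, y, z in zip(a, b, c))
    ]


# Protein C is present exactly in the genomes where both A and B are present.
profile_a = [1, 1, 0, 0, 1]
profile_b = [1, 0, 1, 0, 1]
profile_c = [1, 0, 0, 0, 1]
print(matching_relations(profile_a, profile_b, profile_c))  # ['AND']
```

Real profiles are noisy, so a practical method would score how well each relationship fits rather than demand an exact match in every genome.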
Chou K, Cai Y. Using GO-PseAA predictor to predict enzyme sub-class. [Biochem Biophys Res Commun. 2004 Dec 10;325(2):506-9]: Describes the GO-PseAA predictor, which predicts the sub-class of an enzyme within each of the six main enzyme families.
Cooper G, Singaravelu S, Sidow A. ABC: software for interactive browsing of genomic multiple sequence alignment data. [BMC Bioinformatics 2004, 5:192]: Presents the Application for Browsing Constraints (ABC), Java software for exploring multiple sequence alignments and data typically associated with alignments. Availability: http://mendel.stanford.edu/sidowlab/downloads.html.
Dal Palu’ A, Dovier A, Fogolari F. Constraint Logic Programming approach to protein structure prediction. [BMC Bioinformatics 2004, 5:186]: Discusses the use of constraint logic programming — “a declarative programming paradigm suitable for solving combinatorial optimization problems” — for protein structure prediction. Advantages of the approach include rapid software prototyping, and an easy method for encoding heuristics, according to the authors.
Ding L, Sabo A, Berkowicz N, et al. EAnnot: A genome annotation tool using experimental evidence. [Genome Res. 2004 Dec;14(12):2503-9]: Presents EAnnot (Electronic Annotation), a program originally developed for manually annotating the human genome. EAnnot builds gene models based on mRNA, EST, and protein alignments to genomic sequence, attaches supporting evidence to the corresponding genes, identifies pseudogenes, and locates poly(A) sites and signals.
Elefsinioti A, Bagos P, Spyropoulos I, Hamodrakas S. A database for G proteins and their interaction with GPCRs. [BMC Bioinformatics. 2004 Dec 24;5(1):208]: Describes gpDB, a publicly accessible G proteins/GPCRs relational database. Including species homologs, the database contains detailed information for 418 G protein monomers (272 Galpha, 87 Gbeta and 59 Ggamma) and 2782 GPCRs sequences belonging to families with known coupling to G proteins. Availability: http://bioinformatics.biol.uoa.gr/gpDB.
Han D, Kim H, Jang W, Lee S, Suh J. PreSPI: a domain combination based prediction system for protein-protein interaction. [Nucleic Acids Research 2004 32(21):6312-6320]: Discusses a probabilistic framework for predicting the interaction probability of proteins and develops an interaction possibility ranking method for multiple protein pairs. Using the ranking method, one can discern which protein pairs in a set are more likely to interact with each other.
Henkin J, Jennings M, Matthews D, Vigoreaux J. Mass Processing-An Improved Technique for Protein Identification with Mass Spectrometry Data. [Journal of Biomolecular Techniques, 15:230-237]: Describes a strategy called mass processing, in which the list of masses generated from a mass spectrometer undergoes two stages of data reduction before identification. According to the authors, mass processing improves the ability to identify in-gel tryptic-digested proteins by reducing the number of nonsample masses submitted to protein identification database search engines.
Hodas N, Aalberts D. Efficient computation of optimal oligo-RNA binding. [Nucleic Acids Research 2004 32(22):6636-6642]: Presents an algorithm called BINDIGO that calculates the optimal binding conformation and free energy of two RNA molecules, one or both oligomeric. The algorithm scales as the product of the sequence lengths.
Hu H, Pan Y, Harrison R, Tai P. Improved protein secondary structure prediction using support vector machine with a new encoding scheme and an advanced tertiary classifier. [IEEE Trans Nanobioscience. 2004 Dec;3(4):265-71]: Describes the use of a support vector machine as a machine-learning tool for the prediction of secondary structure, along with several encoding schemes, including orthogonal matrix, hydrophobicity matrix, BLOSUM62 substitution matrix, and combined matrix, which are applied and optimized to improve the prediction accuracy.
Moses A, Chiang D, Pollard D, Iyer V, Eisen M. MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. [Genome Biology 2004, 5:R98]: Introduces Monkey, software that identifies conserved transcription-factor binding sites in multispecies alignments using probabilistic models of factor specificity and binding-site evolution. Availability: http://rana.lbl.gov/monkey/.
Roberts M, Hayes W, Hunt B, Mount S, Yorke J. Reducing storage requirements for biological sequence comparison. [Bioinformatics 2004 20(18):3363-3369]: Describes a method for reducing the storage required by the “seed-and-extend” approach to string matching, in which occurrences of short subsequences called seeds are used to search for potentially longer matches in a large database of sequences. In this method, only a small fraction of seeds, called “minimizers,” needs to be stored, which can speed up string-matching computations by a large factor, according to the authors.
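The minimizer idea can be illustrated with a short Python sketch. It assumes the common (w, k) formulation, in which the lexicographically smallest k-mer in every window of w consecutive k-mers is kept as a seed. The function name `minimizers`, the parameter names, and the leftmost tie-breaking rule are illustrative choices, not details taken from the paper.

```python
def minimizers(seq, k, w):
    """Return a sorted list of (position, k-mer) pairs: the smallest k-mer
    in each window of w consecutive k-mers, leftmost on ties."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Pick the lexicographically smallest k-mer; break ties by position.
        offset = min(range(w), key=lambda j: (window[j], j))
        chosen.add((start + offset, window[offset]))
    return sorted(chosen)


print(minimizers("ACGTTGCA", k=3, w=3))
# [(0, 'ACG'), (1, 'CGT'), (2, 'GTT'), (5, 'GCA')]
```

Because the choice depends only on the contents of a window, two sequences that share an exact substring of at least w + k - 1 characters select at least one minimizer in common, so a match can still be seeded even though not every k-mer is stored.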
Samuelsson J, Dalevi D, Levander F, Rögnvaldsson T. Modular, scriptable and automated analysis tools for high-throughput peptide mass fingerprinting. [Bioinformatics 2004 20(18):3628-3635]: Discusses a set of new algorithms and software tools for automatic protein identification using peptide mass fingerprinting. The software modules perform peak extraction, peak filtering, and protein database matching, and communicate via XML. Availability: http://www.hh.se/staff/bioinf/.
Steinhauser D, Usadel B, Luedemann A, Thimm O, Kopka J. CSB.DB: a comprehensive systems-biology database. [Bioinformatics 2004 20(18):3647-3651]: Presents the comprehensive systems-biology database (CSB.DB), an open-access resource that provides the results of biostatistical analyses on gene expression data in association with additional biochemical and physiological knowledge. Availability: http://csbdb.mpimp-golm.mpg.de/.
Tang S, Tan S, Ramadoss S, et al. Computational method for discovery of estrogen responsive genes. [Nucleic Acids Research 2004 32(21):6212-6217]: Describes a computational method to predict a subclass of estrogen responsive genes that relies on the similarity of estrogen response element (ERE) frames across different promoters in the human genome.
Theilhaber J, Ulyanov A, Malanthara A, et al. GECKO: a complete large-scale gene expression analysis platform. [BMC Bioinformatics 2004, 5:195]: Introduces Gecko (Gene Expression: Computation and Knowledge Organization), a centralized gene expression analysis system based on a client-server architecture. Gecko includes automatic processing pipelines for uploading data from remote sites, a database, a computational engine implementing ~50 different analysis tools, and a client application. Availability: http://sourceforge.net/projects/geckoe.
Tian W, Arakaki A, Skolnick J. EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference. [Nucleic Acids Research 2004 32(21):6226-6239]: Presents EFICAz (Enzyme Function Inference by Combined Approach), an automatic engine for large-scale enzyme function inference that combines predictions from four different methods. Availability: http://www.bioinformatics.buffalo.edu/eficaz/.