Avila-Campillo I, Drew K, Lin J, Reiss DJ, Bonneau R. BioNetBuilder: automatic integration of biological networks. [Bioinformatics 2007 23(3):392-393]: Introduces BioNetBuilder, a Cytoscape plug-in that creates biological networks integrated from several databases. Availability: http://err.bio.nyu.edu/cytoscape/bionetbuilder/.
Davidovich O, Kimmel G, Shamir R. GEVALT: An integrated software tool for genotype analysis. [BMC Bioinformatics. 2007 Feb 1;8:36]: Introduces GEVALT (Genotype Visualization and Algorithmic Tool), a genotype analysis software package that provides a common interface to several analytical tasks. GEVALT combines the Haploview visualization tool with several algorithms for genotype phasing, tag SNP selection, and permutation testing. Availability: http://www.cs.tau.ac.il/~rshamir/gevalt/.
Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BO. Global reconstruction of the human metabolic network based on genomic and bibliomic data. [Proc Natl Acad Sci U S A. 2007 Feb 6;104(6):1777-82]: Describes a manually reconstructed human metabolic network based on build 35 of the genome annotation and information from more than 50 years of the scientific literature. The authors discuss the reconstruction process and demonstrate how the resulting genome-scale network can be used for the discovery of missing information, for the formulation of an in silico model, and as a structured context for analyzing high-throughput biological data sets. “The establishment of this network represents an important step toward genome-scale human systems biology,” the authors write.
Go EP, Rebecchi KR, Dalpathado DS, Bandu ML, Zhang Y, Desaire H. GlycoPep DB: A Tool for Glycopeptide Analysis Using a “Smart Search.” [Anal Chem. 2007 Feb 15;79(4):1708-13]: Describes a web-based tool, GlycoPep DB, for profiling glycan and glycopeptide structures. The software compares experimentally measured masses to all calculated glycopeptide masses from a carbohydrate database with N-linked glycans and uses a concept called “smart searching,” in which only biologically relevant carbohydrate compositions are searched, making glycopeptide compositional assignment more efficient.
Han A, Kim WY, Park SM. SNP2NMD: A database of human single nucleotide polymorphisms causing nonsense-mediated mRNA decay. [Bioinformatics 2007 23(3):397-399]: Introduces the SNP2NMD database for human SNPs that result in premature termination codons and trigger nonsense-mediated mRNA decay. Availability: http://variome.net.
Lau W, Kuo TY, Tapper W, Cox S, Collins A. Exploiting large scale computing to construct high resolution linkage disequilibrium maps of the human genome. [Bioinformatics 2007 23(4):517-519]: Describes LDMAP-cluster, a parallel program for constructing genome-wide linkage disequilibrium maps in a Linux cluster environment using more than 8.2 million SNPs from Phase II of the HapMap project. Availability: http://www.som.soton.ac.uk/research/geneticsdiv/epidemiology/LDMAP.
Mirabeau O, Perlas E, Severini C, Audero E, Gascuel O, Possenti R, Birney E, Rosenthal N, Gross C. Identification of novel peptide hormones in the human proteome by hidden Markov model screening. [Genome Res. 2007 Mar;17(3):320-7]: Presents a bioinformatics search tool based on a hidden Markov model that uses several peptide hormone sequence features to estimate the likelihood that a protein contains a processed and secreted peptide of this class.
Narayanan M, Karp R. Comparing Protein Interaction Networks via a Graph Match-and-Split Algorithm. [ArXiv preprint archive: http://arXiv.org/abs/q-bio/0702001]: Describes a method that compares the protein interaction networks of two species to detect functionally conserved protein modules between them. The method is based on an algorithm that identifies matching subgraphs between two graphs.
Pyysalo S, Ginter F, Heimonen J, Bjorne J, Boberg J, Jarvinen J, Salakoski T. BioInfer: A corpus for information extraction in the biomedical domain. [BMC Bioinformatics. 2007 Feb 9;8:50]: Introduces BioInfer, an annotated corpus of biomedical English for use in developing information-extraction software and components such as parsers and domain analyzers. The corpus contains 1,100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, and syntactic dependencies. It is targeted at protein, gene, and RNA relationships. Availability: http://www.it.utu.fi/BioInfer.
Schellhammer I, Rarey M. TrixX: structure-based molecule indexing for large-scale virtual screening in sublinear time. [J Comput Aided Mol Des. 2007 Feb 9 (e-pub ahead of print)]: Describes a structure-based screening method that avoids sequential searching and enables sublinear runtime behavior. The method has been implemented in the virtual screening tool TrixX. “With computing times clearly below one second per compound, TrixX counts among the fastest virtual screening tools currently available and is nearly two orders of magnitude faster than standard FlexX,” the authors write.
Shih AC, Lee DT, Peng CL, Wu YW. Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences. [BMC Bioinformatics 2007, 8:63]: Describes a software tool designed for aligning and visualizing many very long sequences. The software, called Phylo-mLogo, calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies and displays the global logo patterns of the whole alignment of multiple sequences, as well as local homologous logos for each clade hierarchically. Availability: http://biocomp.iis.sinica.edu.tw/phylomlogo.
Swertz MA, Jansen RC. Beyond standardization: dynamic software infrastructures for systems biology. [Nat Rev Genet. 2007 Mar;8(3):235-43]: Discusses the concept of “a minimal computer language and a software tool called a ‘generator’” that would serve as a software infrastructure for systems biology research. “Biologists need infrastructure that easily connects to work that is done in other laboratories, for which standardization is helpful,” the authors write. “However, the infrastructure must also accommodate the specifics of their biological system, but appropriate mechanisms to support variation are currently lacking.”
Warren RL, Sutton GG, Jones SJ, Holt RA. Assembling millions of short DNA sequences using SSAKE. [Bioinformatics 2007 23(4):500-501]: Presents SSAKE, a tool for “aggressively” assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. Availability: http://www.bcgsc.ca/bioinfo/software/ssake.
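
For readers unfamiliar with the overlap-extension idea summarized in the SSAKE entry, the following is a minimal sketch of greedy assembly by longest overlap. It is not the authors' implementation: it assumes a plain dictionary of read prefixes in place of a prefix tree, extends only to the right, and ignores reverse complements, sequencing errors, and consensus building. The function names and the min_overlap parameter are hypothetical.

```python
from collections import defaultdict


def build_prefix_index(reads, min_overlap):
    """Map each proper prefix of length >= min_overlap to the reads starting with it.

    A dictionary stands in for a prefix tree here (an assumption for brevity).
    """
    index = defaultdict(set)
    for read in reads:
        for k in range(min_overlap, len(read)):
            index[read[:k]].add(read)
    return index


def extend_right(seed, reads, min_overlap=3):
    """Greedily extend `seed` to the right, always taking the longest overlap."""
    index = build_prefix_index(reads, min_overlap)
    max_len = max(len(r) for r in reads)
    unused = set(reads) - {seed}
    contig = seed
    extended = True
    while extended:
        extended = False
        # Try the longest possible suffix of the contig first, then shorter ones.
        longest = min(len(contig), max_len - 1)
        for k in range(longest, min_overlap - 1, -1):
            suffix = contig[-k:]
            # Sort so that ties are broken deterministically.
            candidates = sorted(index.get(suffix, set()) & unused)
            if candidates:
                read = candidates[0]
                contig += read[k:]    # append only the non-overlapping tail
                unused.discard(read)  # each read is used at most once
                extended = True
                break
    return contig
```

With the toy reads ATGCGT, GCGTAC, and GTACCA, the seed ATGCGT, and a minimum overlap of 3, extend_right returns ATGCGTACCA: each step joins the read whose prefix matches the longest suffix of the growing contig.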