Ackermann M, Strimmer K. A general modular framework for gene-set enrichment analysis. [BMC Bioinformatics 2009, 10:47]: The authors conducted an "extensive survey" of statistical approaches for gene-set enrichment analysis and identified "a common modular structure underlying most published methods." Based on this finding, they have developed a general framework for detecting gene set enrichment that provides a "meta-theory" of gene-set analysis that "not only helps to gain a better understanding of the relative merits of each embedded approach but also facilitates a principled comparison and offers insights into the relative interplay of the methods," according to the paper's abstract.
Antonov AV, Dietmann S, Wong P, Igor R, Mewes HW. PLIPS, an automatically collected database of protein lists reported by proteomics studies. [J Proteome Res. 2009 Feb 13. (e-pub ahead of print)]: Describes a tool called PLIPS (Protein Lists Identified in Proteomics Studies) that accepts as input a list of protein/gene identifiers and then uses statistical analysis to identify recently published proteomics studies whose reported protein lists "significantly intersect" with the query list. Available here.
Benoukraf T, Cauchy P, Fenouil R, Jeanniard A, Koch F, Jaeger S, Thieffry D, Imbert J, Andrau JC, Spicuglia S, Ferrier P. CoCAS: A ChIP-on-chip Analysis Suite [Bioinformatics. 2009 Feb 4. (e-pub ahead of print)]: Describes CoCAS (ChIP-on-chip analysis suite), a software package that provides data normalization, peak detection, and quality-control reports for ChIP-chip and ChIP-seq experiments. Available here.
Biegert A, Söding J. Sequence context-specific profiles for homology searching. [Proc Natl Acad Sci USA. 2009 Feb 20. (e-pub ahead of print)]: Describes a sequence-alignment approach "that derives context-specific amino acid similarities from short windows centered on each query sequence residue," according to the paper's abstract. By using the context-specific approach, called CS-BLAST, in combination with NCBI Blast, the authors said they increased the sensitivity more than two-fold on a "difficult benchmark set," without loss of speed. Available here.
Bryant DW Jr, Wong WK, Mockler TC. QSRA - a quality-value guided de novo short read assembler. [BMC Bioinformatics 2009, 10:69]: "Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data," according to the paper's abstract. The authors have designed a new assembler, Quality-value guided Short Read Assembler, "created to take advantage of quality-value scores as a further method of dealing with error." The authors claim that the assembler shows significant improvements in speed and output quality when compared to previously published short-read assemblers. Available here.
Campagna D, Albiero A, Bilardi A, Caniato E, Forcato C, Manavski S, Vitulo N, Valle G. PASS: a program to align short sequences. [Bioinformatics. 2009 Feb 13. (e-pub ahead of print)]: Introduces PASS (Program to Align Short Sequences), which performs fast gapped and ungapped alignments of short DNA sequences onto a reference genome. The algorithm is based on a data structure that holds in RAM the index of the genomic positions of "seed" words and an index of the precomputed scores of short words that are aligned against each other. After building the genomic index, the program scans every query sequence performing three steps: it finds matching seed words in the genome; for every match, it checks the precomputed alignment of the short flanking regions; it performs an exact dynamic alignment of a narrow region around the match. Available here.
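The seed-based scan described for PASS can be illustrated with a much-simplified sketch. This is not the PASS implementation: it assumes an in-memory dictionary as the seed index, and it replaces both the precomputed flanking-region scores and the dynamic-programming step with a plain ungapped mismatch count. The function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import defaultdict


def build_seed_index(genome, k):
    """Map every k-letter seed word to the genome positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, genome, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) pairs for ungapped placements of read.

    Step 1: look up each seed word of the read in the genome index.
    Step 2 (simplified): verify the whole read around each seed hit by
    counting mismatches, standing in for the flank check and exact alignment.
    """
    hits = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for gpos in index.get(seed, ()):
            start = gpos - offset
            if start < 0 or start + len(read) > len(genome):
                continue  # read would run off the end of the reference
            mismatches = hamming(read, genome[start:start + len(read)])
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


genome = "ACGTACGTTAGCCGATAGGCTA"
index = build_seed_index(genome, 4)
print(align_read("TAGCCGAT", genome, index, 4))
print(align_read("TAGCAGAT", genome, index, 4))
```

Building the index once and reusing it for every read is what makes this style of aligner fast; a gapped aligner would swap the mismatch count for a banded dynamic-programming alignment around each seed hit.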
Eaves HL, Gao Y. MOM: Maximum oligonucleotide mapping. [Bioinformatics. 2009 Feb 19. (e-pub ahead of print)]: Describes a short-read mapping program called MOM (Maximum Oligonucleotide Mapping). According to the paper's abstract, MOM improves on current methods that assume that most sequencing errors occur near the 3' end of the read. It is based on a query-matching concept "that is designed to capture a maximal length match within the short read satisfying the user defined error parameters." According to the authors, the method demonstrates greater sensitivity and a higher percentage of uniquely mapped reads when compared to SOAP, MAQ, and SHRiMP. Available here.
Haynes BC, Brent MR. Benchmarking regulatory network reconstruction with GRENDEL. [Bioinformatics. 2009 Feb 2. (e-pub ahead of print)]: "In contrast to the massive effort that has gone into automated deconvolution of biological networks, relatively little effort has been invested in benchmarking the proposed algorithms," the authors write in the paper's abstract, noting that this is "largely due to a lack of fully understood biological networks to use as gold standards." In response, they have developed a system that generates synthetic regulatory networks for benchmarking reconstruction algorithms that "leads to conclusions about the relative accuracies of reconstruction algorithms that are significantly different from those obtained with A-BIOCHEM, an established in-silico benchmark." Available here.
Hur J, Schuyler AD, States DJ, Feldman EL. SciMiner: Web-based literature mining tool for target identification and functional enrichment analysis. [Bioinformatics. 2009 Feb 2. (e-pub ahead of print)]: Introduces SciMiner, a web-based literature-mining tool that identifies genes and proteins using a context-specific analysis of Medline abstracts and full texts. Available here.
Jones AR, Siepen JA, Hubbard SJ, Paton NW. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines. [Proteomics. 2009 Feb 27;9(5):1220-1229]: Describes a scoring method for protein search engine results called the FDR Score that is based on the false discovery rate and allows peptide identifications from different search engines to be combined. According to the paper's abstract, the combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, "allowing on average 35 percent more peptide identifications to be made at a fixed FDR than using a single search engine."
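For readers unfamiliar with how a false discovery rate is attached to individual peptide identifications, the sketch below shows one generic approach. It is not the FDR Score from the paper: it assumes each identification is labeled as a hit to a real ("target") or reversed/shuffled ("decoy") sequence, which the summary above does not state, and the function name and input layout are hypothetical.

```python
def decoy_fdr_qvalues(psms):
    """Estimate a q-value for each identification.

    psms: list of (score, is_decoy) pairs, where a higher score is better.
    The decoy labels are an assumption of this sketch.
    Returns (score, is_decoy, q_value) tuples sorted from best to worst score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each score threshold: decoys accepted / targets accepted.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at this threshold or any more permissive one,
    # which makes the values non-decreasing down the ranked list.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
for row in decoy_fdr_qvalues(example):
    print(row)
```

Because q-values of this kind share a common scale, they offer one plausible basis for comparing identifications that were scored differently by different search engines.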
Korbel JO, Abyzov A, Mu XJ, Carriero N, Cayting P, Zhang Z, Snyder M, Gerstein MB. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. [Genome Biol. 2009 Feb 23;10(2):R23]: Describes Paired-End Mapper (PEMer), a method for calling structural variants from paired-end sequencing data. PEMer includes a parallelizable analysis pipeline, simulation-based error models to provide confidence values for each structural variant, and a back-end database. Available here.
Li H, Ding G, Xie L, Li Y. PAnnBuilder: An R package for assembling proteomic annotation data. [Bioinformatics. 2009 Feb 23. (e-pub ahead of print)]: Describes PAnnBuilder, an R package for gathering protein annotation information from public resources to provide annotation data for large-scale proteomic studies. Available here.
Lu Y, Sze SH. Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues. [Nucleic Acids Research 2009 37(2):463-472]: "While most of the recent improvements in multiple sequence alignment accuracy are due to better use of vertical information, which include the incorporation of consistency-based pairwise alignments and the use of profile alignments, we observe that it is possible to further improve accuracy by taking into account alignment of neighboring residues when aligning two residues, thus making better use of horizontal information," the authors note in the paper's abstract. They show that their strategy can improve alignment accuracy by up to 3 percent on protein sequence alignment and up to 10 percent on DNA/RNA sequence alignment.
Paananen J, Wong G. FORG3D: Force-directed 3D graph editor for visualization of integrated genome scale data. [BMC Systems Biology 2009, 3:26]: Describes a visualization method and bioinformatics software tool called FORG3D that is based on real-time three-dimensional force-directed graphs and can be used to visualize integrated networks of genome-scale data, such as interactions between genes or gene products, signal transduction, metabolic pathways, functional interactions, and evolutionary relationships. Available here.
Pavlopoulos GA, Pafilis E, Kuhn M, Hooper SD, Schneider R. OnTheFly: A tool for automated document-based text annotation, data linking and network generation. [Bioinformatics. 2009 Feb 17 (e-pub ahead of print)]: Introduces OnTheFly, a web-based application that applies biological named entity recognition to enrich Microsoft Office, PDF, and plain text documents. Available here.
Reynolds C, Damerell D, Jones S. ProtorP: a protein–protein interaction analysis server. [Bioinformatics 2009 25(3):413-414]: Describes the ProtorP server, which analyzes protein-protein associations in 3D structures by calculating a series of physical and chemical parameters of the protein interaction sites that contribute to the binding energy of the association. Available here.
Schreyer A, Blundell T. CREDO: a protein-ligand interaction database for drug discovery. [Chem Biol Drug Des. 2009 Feb;73(2):157-67]: Discusses CREDO, a database of protein-ligand interactions, "which represents contacts as structural interaction fingerprints, implements novel features and is completely scriptable through its application programming interface," according to the paper's abstract. Available here.
Vanlier J, Wu F, Qi F, Vinnakota KC, Han Y, Dash RK, Yang F, Beard DA. BISEN: Biochemical Simulation Environment. [Bioinformatics. 2009 Feb 25. (e-pub ahead of print)]: Discusses the Biochemical Simulation Environment (BISEN), a suite of tools for simulating biochemical systems in the Matlab computing environment. Available here.