BioInform's Surfing Report: Web-Based Tools and Algorithms Discussed at ISMB 2008

Chemical Entities of Biological Interest, or ChEBI, is a freely available database of small molecular entities. Developed at EBI, it is a manually annotated chemoinformatics resource with data such as nomenclature, ontology, and chemical structures. New developments include chemical substructure and similarity searching.

Chipster from the Finnish IT Center for Science lets researchers benefit from R/Bioconductor method development. This open source software offers a graphical user interface to a comprehensive collection of up-to-date analysis methods. Chipster supports Affymetrix, Illumina, Agilent, and cDNA arrays and runs on Windows, Linux, and Mac. It is available here.

CiteXplore is a tool developed at EBI for querying PubMed, Agricola, biological patent abstracts, and more. Links are provided from citations to relevant biological database entries, such as UniProt proteins. Text-mining methods from the research community like “Whatizit” provide filters for enrichment of text with annotation.

The Comprehensive Phytopathogen Genomics Resource from Michigan State University is a web-based portal through which plant pathologists and diagnosticians can access genomic data for plant pathogens, along with tools to rapidly create candidate diagnostic markers. The CPGR features the Genome Warehouse, Genome Browser, rDNA database, and Transcript Assembly Database and is available here.

DIAL (DIhedral ALignment Server) and LocalMove are two algorithms developed at Boston College to classify three-dimensional RNA structural motifs and backbone atoms, currently one of the focal interests of the RNA Ontology Consortium. 

A software development pipeline and platform for eco-informatics from Iowa State University lets scientists apply data management principles to ecological information. Implemented systems are available here.

EHCO 2.0, the Encyclopedia of Hepatocellular Carcinoma genes Online, from National Taiwan University and Academia Sinica, has extended the number of HCC-related genes to 3,963. To decipher hepato-carcinogenesis at the transcriptome level, the team annotated and identified expression of alternative splicing variants. The results can be accessed through EHCO 2.0 here.

ERIC, the Enteropathogen Resource Integration Center, from SRA International is an NIAID Bioinformatics Resource Center focused on enteropathogens, and currently provides integrated access to data on 78 genomes. It provides curated genome annotation of these organisms with evidence codes, and tools for comparative genomics, microarray analysis, and text mining.
ERIC also offers a text-mining application that extracts gene roles, mutation-phenotype links, and organism-pathogenesis relationships from PubMed abstracts. The application and search tools are available here.

Natural Resources Canada has generated a dataset of 80,000 ESTs for the pest insect Choristoneura fumiferana from eight different cDNA libraries. More than 16,000 putative transcripts were functionally identified by searching the Gene Ontology and KEGG Orthology databases.

ExonMine, from the Institute of Molecular Medicine in Lisbon, Portugal, is a global meta-analysis tool that integrates data from microarray experiments in several cancer types.

FARMS (Factor Analysis for Robust Microarray Summarization), from Johannes Kepler University of Linz in Austria, is a probabilistic latent variable model for summarizing Affymetrix array data at the probe level.

Galaxy from Pennsylvania State University is an integrated set of tools for short-read analysis of metagenomic data obtained from second-generation sequencers.

GenColors, developed in Germany, is a web tool to facilitate genome annotation and comparative genomics. It was initially aimed at prokaryotic genomes; as a new feature, the system has now been adapted to handle eukaryotic genomes as well.

Genome Grid is a set of rapid genome analysis pipelines for arthropod genomes developed by Indiana University for the National Science Foundation’s TeraGrid as part of the GMOD project.

Infernal 1.0 performs RNA sequence analysis using covariance models for RNA homology search and alignment. Infernal introduces E-values and a query-dependent HMM filtering procedure for accelerating homology search. Developed at the Howard Hughes Medical Institute’s Janelia Farm Research Campus, it is available here.

InnateDB is a manually-curated interactome database and analysis platform for systems-level analysis of the mammalian immune response from Simon Fraser University.

An automatic tool from the Commissariat à l’énergie atomique in France and the European Molecular Biology Laboratory called InteroPorc predicts protein-protein interactions, and can be used for all sequenced organisms. Based on the interolog concept, this tool combines source interactions with clusters of orthologous proteins. This open-source Java application can either be run online through a web interface or downloaded here.
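The interolog idea mentioned above can be illustrated with a minimal Python sketch. This is not InteroPorc's code: the function name, the data layout, and the protein identifiers are all hypothetical, and the sketch assumes orthology has already been reduced to a simple lookup from each source protein to its orthologs in the target organism.

```python
from itertools import product


def predict_interologs(source_interactions, orthologs):
    """Transfer known interactions to a target organism (illustrative only).

    source_interactions: iterable of (protein_a, protein_b) pairs observed
        in one or more source organisms.
    orthologs: dict mapping a source protein to the set of its orthologs
        in the target organism (hypothetical layout).
    Returns a set of predicted, order-independent target protein pairs.
    """
    predicted = set()
    for a, b in source_interactions:
        targets_a = orthologs.get(a, set())
        targets_b = orthologs.get(b, set())
        # If both partners have orthologs in the target organism,
        # every ortholog pair becomes a predicted interaction.
        for ta, tb in product(targets_a, targets_b):
            predicted.add(tuple(sorted((ta, tb))))
    return predicted


# Made-up identifiers, purely for illustration.
source_interactions = [("srcA", "srcB"), ("srcB", "srcC")]
orthologs = {"srcA": {"tgtA1", "tgtA2"}, "srcB": {"tgtB"}}
predictions = predict_interologs(source_interactions, orthologs)
```

In this toy input, srcC has no ortholog in the target organism, so only the srcA/srcB interaction is transferred.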

The Jena Library of Biological Macromolecules, or JenaLib, from the Leibniz Institute for Age Research in Germany offers information for all PDB and NDB database entries, such as PDB/NDB atlas pages, QuickSearch, PDB/UniProt alignments, a Jmol-based molecule viewer, and SNP and PROSITE motif mapping. New features include PFAM domain mapping, sequence pattern search, and integration of SNP, exon, and domain data into PDB/UniProt alignments.

A kinase-specific phosphorylation prediction server from the University of Illinois at Chicago is available here.

MetNet 3 is an online platform to retrieve information on plant metabolic and regulatory networks from MetNetDB. Pathways and subnetworks visualized with MetNet 3 represent user-selected data types, including information flow from genes to metabolites, interactions, and feedback loops that induce post-transcriptional perturbations. The site, set up at Iowa State University, is available here.

mGene is a discriminative gene finder from the Max Planck Institute in Tübingen, Germany.

miRNAminer is a web-based tool from the Massachusetts Institute of Technology for homologous miRNA search, based on known miRNA characteristics of secondary structure and conservation. The software uses stringent criteria to increase specificity. According to its developers, miRNAminer identified several hundred high-confidence miRNAs in seven mammals, increasing the miRBase collection for these species by more than 50 percent. It is available here.

The MSU Solanaceae Comparative Genomics Resource is a web portal from Michigan State University that provides the Solanaceae community with a suite of comparative databases and tools to enable functional genomics studies in this large, agriculturally important, and morphologically diverse family. It is available here.

PROMALS3D, or PROfile Multiple Alignment with predicted Local Structures and 3D constraints, is a method that constructs multiple sequence and/or structure alignments by integrating information from database homologs, secondary structure predictions, and available 3D structures. This tool can help in the computational analysis of biological sequences and structures. The PROMALS3D webserver is available from the University of Texas Southwestern Medical Center here.

PubCurator from Johannes Gutenberg University in Mainz, Germany, is a biomedical text analysis platform.

PubMeth, created at Ghent University in Belgium, is an annotated and reviewed cancer methylation database based on automated text mining.

A pipeline called RAMMCAP from the University of California at San Diego was developed to quickly analyze extremely large metagenomic datasets. It includes ultra-fast sequence clustering, annotation, metagenome comparison, and a visualization interface.

RiceRBP is a rice RNA-binding protein database and analysis platform from Washington State University. 

RNA StrAT, or the RNA Structure Analysis Toolkit, is a server that offers tools to perform both RNA secondary structure comparison and database searching using an edit distance algorithm that considers a wide range of edit operations. It is available from the University of Montreal here.

SciTrends is a tool from Harvard Medical School to visualize more than 200,000 biomedical scientific trends. 

The Subcellular Location Image Finder (SLIF) from Carnegie Mellon University uses a combination of text-mining and image analysis techniques to extract assertions from figures and their associated captions. The publicly available, searchable database of over 30,000 fluorescence micrographs from PubMed Central is annotated with the depicted protein, cell type, or pixel resolution and is available here.

To handle differences in nomenclature that can interfere with integrating data from multiple databases, Harvard Medical School developed the Synergizer service, which allows mapping sets of database identifiers via a simple function call. A demonstration client may be found here.

The TMDU Clinical Omics Database from Tokyo Medical and Dental University integrates molecular biological and clinical information. It covers more than 500 cases, including hepatic, colon, and oral cancer, and can be accessed here.

The University of Minnesota BBD Pathway Prediction System uses information in the University of Minnesota Biocatalysis/Biodegradation Database to predict microbial catabolism of organic compounds. To improve pathway predictions, the team has added the ability to allow relative reasoning and variable aerobic likelihood to the code infrastructure.

UPS 2.0 evaluates probe-to-target hybridization under user-defined conditions to ensure high-performance hybridization and minimize the possibility of non-specific reactions. UPS is available from Academia Sinica here.

Violin is a prototype of a vaccine ontology and analysis system based on a collaborative effort of the University of Michigan, Duke University, and SUNY Buffalo. Vaccine data exchange formats have been standardized with the ontology and are used to represent vaccine data in the system.

