Robust Semi-Supervised Clustering with Application to Multi-Modal Database Categorization. Start date: Aug. 31, 2004. Expires: July 31, 2007. Cumulative award amount: $30,000. Principal investigator: Hichem Frigui. Sponsor: University of Louisville Research Foundation.
Supports development of clustering algorithms for categorizing massive multi-modal scientific data collections. The proposal addresses theoretical aspects of clustering algorithms and their applications in analyzing and organizing scientific data sets, namely: scientific text and related images in botanical biodiversity data produced in the last 50 years; and gene expression data from the Arabidopsis thaliana genome project.
Modeling Molecular Recognition. Start date: March 15, 2005. Expires: Feb. 28, 2006. Current year award amount: $165,717. Principal investigator: Carlos Camacho. Sponsor: University of Pittsburgh.
Supports a project to model the early molecular-recognition events responsible for the specificity of protein-protein interactions and to predict protein interactions. The specific aims of this project are to develop a "suitable" set of proteins for benchmarking specific and non-specific interactions; and to model the relevant interactions between proteins and develop a semi-rigid, full-atom Brownian dynamics platform for evaluating the time scale on which two proteins move apart from local minima on the free-energy landscape of partially desolvated encounter complexes. The method will be implemented as a web server that automatically evaluates the likelihood that any two proteins interact, based on estimates of the escape time from relevant free-energy minima.
Mining Salient Localized Patterns in Complex Data. Start date: March 15, 2005. Expires: Feb. 28, 2010. Current year award amount: $415,000. Principal investigator: Wei Wang. Sponsor: University of North Carolina Chapel Hill.
Supports development of new methods and tools to find "significant and non-obvious patterns" within immense and complex data sets. The project is focused on four problems related to bioinformatics: integrative genetics of cancer susceptibility; HIV salivary gland disease pathogenesis; discovering family-specific residue packing patterns of proteins; and integrative functional annotation of proteins. The software will be made publicly available through a web portal.
A Unified Architecture for Data Mining Large Biomedical Literature Databases. Start date: March 15, 2005. Expires: Feb. 28, 2010. Current year award amount: $415,000. Principal investigator: Xiaohua Hu. Sponsor: Drexel University.
Funds a project to investigate the efficiency and effectiveness of information-retrieval procedures, as well as the effectiveness and robustness of pattern-learning methods for information extraction. The project aims to develop a semantic-based query expansion method for large biomedical literature databases, to design an automatic pattern generation and evaluation method from unlabeled text files based on mutual bootstrapping and dynamic programming, and to develop a set of text-mining algorithms such as ontology-enhanced textual clustering and text summarization.
Quantitative Atomistic Simulations of Molecular Associations. Start date: April 1, 2005. Expires: March 31, 2006. Current year award amount: $174,085. Principal investigator: Adrian Elcock. Sponsor: University of Iowa.
Supports a project to apply molecular dynamics (MD) simulation methods to study small-molecule associations in aqueous solutions, and to develop faster but approximate simulation methods. First, unforced MD simulations will be used to study the association of pairs of rigid small molecules in explicitly modeled aqueous solution. Second, similar MD simulations of rigid small-molecule associations will be performed in a wide range of salt solutions. Third, comparison of MD simulations in which small molecules are allowed internal degrees of freedom will reveal the extent to which conformational flexibility modulates associations.
A New Approach for Identifying Regulatory Motifs in Groups of Co-regulated Genes. Start date: April 1, 2005. Expires: March 31, 2006. Current year award amount: $209,480. Principal investigator: Robert Gross. Sponsor: Dartmouth College.
Supports further development of the BEAM (Beam-search Enumerative Algorithm for Motif finding) software. BEAM is a pattern-driven motif-finding program whose pruned search strategy reduces the search space by several orders of magnitude compared with a complete enumeration of all possible motifs, according to the developers. The funds from this award will be used to expand BEAM into a suite of programs that can find additional types of regulatory motifs: those containing ambiguous characters as well as bipartite motifs.
Applications of Probability Measures on the Self-Motion Manifold of Deformable Fragments in Proteins. Start date: April 1, 2005. Expires: March 31, 2006. Current year award amount: $274,359. Principal investigator: Jean-Claude Latombe. Sponsor: Stanford University.
Supports a project to develop a new mathematical model for a redundant, closed, protein-like kinematic chain, using techniques from algebraic geometry and differential topology and probabilistic roadmap techniques from robotics to determine the structure of its configuration space. The objective of the program is to enable crystallographers to retrieve and study important, dynamic properties of the molecule, and to develop new algorithms for protein model-building in areas of weak or ambiguous electron density.
Sequence Assembly for High-Throughput Technologies. Start date: July 1, 2005. Expires: June 30, 2008. Current year award amount: $330,018. Principal investigator: Steven Skiena. Sponsor: SUNY Stony Brook.
Funds development of improved sequence-assembly methods for high-throughput sequencing technologies. The investigators plan to build assembly programs for sequencing technologies slated to become commercially available over the next one to three years, including technologies that produce massive numbers of very short reads cheaply, as well as those that provide extremely long reads with high error rates.
Combinatorial Algorithms for Pattern Discovery with Applications to Data Mining and Computational Biology. Start date: Aug. 1, 2005. Expires: July 31, 2007. Awarded amount to date: $185,000. Principal investigator: Stefano Lonardi. Sponsor: University of California Riverside.
Supports a project centered on a new compressed representation of data, called a "sketch," based on a novel family of gapped patterns. The new method will be applied in three areas: databases; data compression; and computational biology. One goal of the proposal is to establish the software development component for an interdisciplinary bioinformatics curriculum.
Optimal Utilization of Genomic Information for Dissecting Complex Traits. Start date: Sept. 1, 2005. Expires: Aug. 31, 2006. Current year award amount: $126,843. Principal investigator: Shizhong Xu. Sponsor: University of California Riverside.
Funds development of new statistical methods for optimal utilization of established genomic databases for genetically dissecting complex traits. The statistical approach is Bayesian and will be implemented via Markov chain Monte Carlo algorithms. Specific areas to be studied include development of optimal statistical methods and computational algorithms for mapping quantitative trait loci with epistatic effects using markers of the entire genome.