A new homology prediction method developed by researchers at Carnegie Mellon University has raised questions about the applicability of commonly used sequence similarity tools such as Blast for analyzing the evolution of multidomain proteins. The method, called Neighborhood Correlation, was developed specifically to deal with the challenge of multidomain proteins, which are composed of multiple sequence segments. While these proteins represent around 40 percent of the proteome in metazoans, they present a hurdle for current homology analysis tools because it is difficult to determine whether a shared sequence is the result of common ancestry or of domain insertion.
Neighborhood Correlation relies on a weighted sequence similarity network in which gene duplication and domain insertion give rise to very different "neighborhood structures," which enables the method to distinguish true homologs from domain-only matches, according to the authors. Dannie Durand, a computational biologist at Carnegie Mellon, and colleagues demonstrated that the method outperformed sequence similarity methods like Blast and Psi-Blast on a curated benchmark data set of sequences known to share common ancestry.
"The paper really tackles a fundamental problem that people have been hoping to avoid," says David Haussler, director of biomolecular engineering at the University of California, Santa Cruz. "No one has really faced it head-on before so I commend [Durand] for [that]. … She puts her method up against some others and demonstrates better performance."
Neighborhood Correlation takes a geographical view, looking at the genomic neighborhood, Durand says. "Basically we make a network … in which every dot or node is a sequence … a line between two dots means there is a meaningful Blast score," she says. "If the neighborhoods are similar, the genes are related; if they are not similar, we say the genes aren't related."
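The network idea Durand describes can be illustrated with a minimal sketch. It assumes that "similar neighborhoods" can be approximated by the Pearson correlation between two sequences' similarity-score profiles across the whole network; that choice, the function name neighborhood_similarity, the floor parameter, and the toy scores are all illustrative assumptions, not the authors' published scoring function or data.

```python
from math import sqrt

# Hypothetical, symmetric similarity scores (stand-ins for Blast scores).
# A1, A2 and A3 play the role of true homologs; B1 shares only a domain with A1.
scores = {
    "A1": {"A1": 100, "A2": 80, "A3": 75, "B1": 40},
    "A2": {"A2": 100, "A1": 80, "A3": 70},
    "A3": {"A3": 100, "A1": 75, "A2": 70},
    "B1": {"B1": 100, "B2": 85, "A1": 40},
    "B2": {"B2": 100, "B1": 85},
}


def neighborhood_similarity(scores, a, b, floor=0.0):
    """Pearson correlation between the score profiles of sequences a and b."""
    # Every sequence that appears anywhere in the toy network.
    nodes = sorted(set(scores) | {n for nbrs in scores.values() for n in nbrs})
    # Score profile: similarity to each node, or `floor` when there is no edge.
    pa = [scores.get(a, {}).get(n, floor) for n in nodes]
    pb = [scores.get(b, {}).get(n, floor) for n in nodes]
    mean_a = sum(pa) / len(nodes)
    mean_b = sum(pb) / len(nodes)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(pa, pb))
    var_a = sum((x - mean_a) ** 2 for x in pa)
    var_b = sum((y - mean_b) ** 2 for y in pb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no neighborhood signal
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    print("A1 vs A2:", neighborhood_similarity(scores, "A1", "A2"))
    print("A1 vs B1:", neighborhood_similarity(scores, "A1", "B1"))
```

In this toy network, A1 and A2 match the same set of sequences and so correlate strongly, while A1 and B1 have a direct match but otherwise different neighbors, so their correlation is negative even though a line connects them.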
— Vivien Marx
Bioinformatics Notes
GeneBio has announced that it will distribute Protagen AG's Modiro software tool for the automated detection of post-translational modifications in MS/MS datasets. The two companies will work together to link Modiro's functions with GeneBio's Phenyx MS data analysis software tool.
The Center for Information Technology at the National Institutes of Health has licensed Bioalma's AlmaKnowledgeServer 2, which is distributed by Active Motif, to support an annotation platform for the Human Salivary Proteome Project.
The Fred Hutchinson Cancer Research Center's Translational and Outcomes Research group selected GenoLogics to assist in developing its biomedical informatics infrastructure for a new biorepository.
Datapoint
71
Linguamatics, a British life sciences software company, is the 71st group to join the Microsoft BioIT Alliance.
Funded Grants
$54,961/FY2008
Biomolecular folding by ultrafast spectroscopy and high-performance computing
Grantee: Jeffrey Evanseck, California Institute of Technology
Began: Jan. 1, 2008; Ends: Dec. 31, 2008
Researchers hope to elucidate a microscopic basis for biomolecular folding codes using data from ultrafast spectroscopy and multiple large-scale explicit water simulations. The fastest-known folding sequence, the chicken villin headpiece subdomain, will be studied using ultrafast experiments and computational all-atom simulations.
$206,770/FY2008
Statistical Methods to Study Dynamic Transcriptional Regulatory Networks
Grantee: Ning Sun, Yale University
Began: May 5, 2008; Ends: Apr. 30, 2010
Researchers aim to develop an integrated approach to modeling and inferring dynamic transcriptional regulatory networks. They propose to provide a comprehensive framework to integrate gene expression data, protein-DNA interaction data, mRNA decay data, nucleosome occupancy data, and other data for regulatory network inference. Statistical methods will also be developed to systematically model and integrate the data.
CMU Tackles Homology Analysis for Proteins