CMU Tackles Homology Analysis for Proteins


A new homology prediction method developed by researchers at Carnegie Mellon University has raised questions about the applicability of commonly used sequence similarity tools such as Blast for analyzing the evolution of multidomain proteins.

The method, called Neighborhood Correlation, was developed specifically to deal with the challenge of multidomain proteins, which are composed of multiple sequence segments. While these proteins represent around 40 percent of the proteome in metazoans, they present a hurdle for current homology analysis tools because it is difficult to determine whether a common sequence is the result of shared ancestry or of domain insertion.

Neighborhood Correlation relies on a sequence similarity network that is weighted to give gene duplication and domain insertion very different "neighborhood structures," which enables the method to distinguish true homologs from domain-only matches, according to the authors.

Dannie Durand, a computational biologist at Carnegie Mellon, and colleagues demonstrated that the method outperformed sequence similarity methods like Blast and Psi-Blast against a curated benchmark data set of sequences known to share common ancestry.

"The paper really tackles a fundamental problem that people have been hoping to avoid," says David Haussler, director of biomolecular engineering at the University of California, Santa Cruz. "No one has really faced it head-on before so I commend [Durand] for [that]. … She puts her method up against some others and demonstrates better performance."

Neighborhood Correlation takes a geographical view, looking at the genomic neighborhood, Durand says. "Basically we make a network … in which every dot or node is a sequence … a line between two dots means there is a meaningful Blast score," she says. "If the neighborhoods are similar, the genes are related; if they are not similar, we say the genes aren't related."
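The network idea Durand describes can be illustrated with a minimal Python sketch. The article does not give the scoring formula, so the sketch assumes, purely for illustration, that two neighborhoods are compared with a Pearson correlation over each sequence's scores against every sequence in the network. The sequence names and scores are invented toy values, and `score` and `neighborhood_similarity` are hypothetical helper names, not part of any published tool.

```python
import math

# Toy, made-up Blast-like scores (not data from the study). A missing pair
# means "no meaningful hit", i.e. no edge in the network.
SCORES = {
    ("A1", "A1"): 200, ("A2", "A2"): 200, ("A3", "A3"): 200,
    ("B1", "B1"): 200, ("B2", "B2"): 200,
    ("A1", "A2"): 150, ("A1", "A3"): 140, ("A2", "A3"): 145,  # family A
    ("B1", "B2"): 150,                                         # family B
    ("A1", "B1"): 60,  # domain-only match between the two families
}
SEQUENCES = ["A1", "A2", "A3", "B1", "B2"]


def score(a, b):
    """Symmetric lookup; 0.0 when the two sequences share no edge."""
    return float(SCORES.get((a, b), SCORES.get((b, a), 0.0)))


def neighborhood_similarity(x, y):
    """Pearson correlation between the score profiles of x and y.

    Assumption: correlation is used here as one plausible way to compare
    neighborhoods; the article itself does not specify the measure.
    """
    px = [score(x, s) for s in SEQUENCES]
    py = [score(y, s) for s in SEQUENCES]
    mx, my = sum(px) / len(px), sum(py) / len(py)
    cov = sum((a - mx) * (b - my) for a, b in zip(px, py))
    var_x = sum((a - mx) ** 2 for a in px)
    var_y = sum((b - my) ** 2 for b in py)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no neighborhood information
    return cov / math.sqrt(var_x * var_y)


print(neighborhood_similarity("A1", "A2"))  # same family: shared neighbors
print(neighborhood_similarity("A1", "B1"))  # domain-only hit: different neighbors
```

In this toy network A1 and B1 are directly connected by a hit, yet their neighborhoods barely overlap, so their similarity comes out negative, while A1 and A2 share nearly all their neighbors and score strongly positive.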

Vivien Marx

Bioinformatics Notes

GeneBio has announced that it will distribute Protagen AG's Modiro software tool for the automated detection of post-translational modifications in MS/MS datasets. The two companies will work together to link Modiro's functions with GeneBio's Phenyx MS data analysis software tool.

The Center for Information Technology at the National Institutes of Health has licensed Bioalma's AlmaKnowledgeServer 2, which is distributed by Active Motif, to support an annotation platform for the Human Salivary Proteome Project.

The Fred Hutchinson Cancer Research Center's Translational and Outcomes Research group selected GenoLogics to assist in developing its biomedical informatics infrastructure for a new biorepository.


Linguamatics, a British life sciences software company, is the 71st group to join the Microsoft BioIT Alliance.

Funded Grants

Biomolecular Folding by Ultrafast Spectroscopy and High-Performance Computing
Grantee: Jeffrey Evanseck, California Institute of Technology
Began: Jan. 1, 2008; Ends: Dec. 31, 2008

Researchers hope to elucidate a microscopic basis for biomolecular folding codes using data from ultrafast spectroscopy and multiple large-scale explicit-water simulations. The chicken villin headpiece subdomain, the fastest-folding sequence known, will be studied using ultrafast experiments and all-atom computational simulations.

Statistical Methods to Study Dynamic Transcriptional Regulatory Networks
Grantee: Ning Sun, Yale University
Began: May 5, 2008; Ends: Apr. 30, 2010

Researchers aim to develop an integrated approach to modeling and inferring dynamic transcriptional regulatory networks. They propose to provide a comprehensive framework to integrate gene expression data, protein-DNA interaction data, mRNA decay data, nucleosome occupancy data, and other data for regulatory network inference. Statistical methods will also be developed to systematically model and integrate the data.
