
CMU Tackles Homology Analysis for Proteins


A new homology prediction method developed by researchers at Carnegie Mellon University has raised questions about the applicability of commonly used sequence similarity tools such as Blast for analyzing the evolution of multidomain proteins.

The method, called Neighborhood Correlation, was developed specifically to deal with the challenge of multidomain proteins, that is, proteins composed of multiple domains, or distinct sequence segments. While these proteins represent around 40 percent of the proteome in metazoans, they present a hurdle for current homology analysis tools because it is difficult to determine whether a shared sequence segment is the result of common ancestry or of domain insertion.

Neighborhood Correlation relies on a weighted sequence similarity network in which gene duplication and domain insertion produce very different "neighborhood structures," which enables the method to distinguish true homologs from domain-only matches, according to the authors.
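To make the idea of a weighted similarity network concrete, here is a minimal sketch of how such a network might be assembled from Blast output. It is not the authors' implementation: it assumes Blast results arrive as (query, subject, bit score) triples, and both the cutoff `MIN_BIT_SCORE` and the log10 edge weighting are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical cutoff for what counts as a "meaningful" Blast bit score.
MIN_BIT_SCORE = 20.0


def build_similarity_network(hits, min_score=MIN_BIT_SCORE):
    """Build an undirected weighted graph from (query, subject, bit_score) hits.

    Each sequence is a node; an edge joins two sequences whose hit passes
    min_score. The edge weight is log10 of the bit score (an assumed
    weighting that damps very large scores).
    """
    graph = defaultdict(dict)
    for query, subject, score in hits:
        if query == subject or score < min_score:
            continue  # skip self-hits and weak matches
        weight = math.log10(score)
        # If a pair appears more than once, keep the strongest evidence.
        best = max(weight, graph[query].get(subject, 0.0))
        graph[query][subject] = best
        graph[subject][query] = best  # keep the graph symmetric
    return dict(graph)
```

For example, `build_similarity_network([("A", "B", 250.0), ("A", "C", 45.0), ("B", "C", 12.0)])` links A to both B and C but drops the weak B-C hit. Thresholding and weighting are the main knobs here; the published method's exact choices may differ.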

Dannie Durand, a computational biologist at Carnegie Mellon, and colleagues demonstrated that the method outperformed sequence similarity methods like Blast and Psi-Blast against a curated benchmark data set of sequences known to share common ancestry.

"The paper really tackles a fundamental problem that people have been hoping to avoid," says David Haussler, director of biomolecular engineering at the University of California, Santa Cruz. "No one has really faced it head-on before so I commend [Durand] for [that]. … She puts her method up against some others and demonstrates better performance."

Neighborhood Correlation takes a geographical view, looking at each sequence's neighborhood in a similarity network, Durand says. "Basically we make a network … in which every dot or node is a sequence … a line between two dots means there is a meaningful Blast score," she says. "If the neighborhoods are similar, the genes are related; if they are not similar, we say the genes aren't related."
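Durand's description translates into a small computation: represent each sequence by its row of edge weights, then ask how alike two rows are. The sketch below is an illustration under stated assumptions, not the published scoring code. It measures neighborhood similarity as a Pearson correlation of weight profiles, takes the network as a dict of dicts of similarity weights, and uses a toy network (`TOY_NETWORK`, with hypothetical sequences A1, A2, C, Y1, Z1 and made-up weights).

```python
import math


def neighborhood_correlation(network, x, y):
    """Score how similar the neighborhoods of sequences x and y are.

    network maps each sequence to {neighbor: weight}. A missing entry means
    no meaningful Blast score and is treated as 0. Self-hits (for example
    network["A"]["A"]) should be included when available.
    Returns the Pearson correlation of the two weight profiles, taken over
    every node in the network (an assumed scoring rule).
    """
    # Every sequence that appears anywhere in the network is a profile position.
    nodes = sorted(set(network).union(*network.values()))
    if not nodes:
        return 0.0
    xs = [network.get(x, {}).get(node, 0.0) for node in nodes]
    ys = [network.get(y, {}).get(node, 0.0) for node in nodes]
    n = len(nodes)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no neighborhood signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy network: A1 and A2 are duplicates sharing domains X and Y,
# C shares only domain X with them, Y1 and Z1 are single-domain relatives.
TOY_NETWORK = {
    "A1": {"A1": 3.0, "A2": 2.5, "C": 1.2, "Y1": 1.5},
    "A2": {"A2": 3.0, "A1": 2.5, "C": 1.1, "Y1": 1.4},
    "C": {"C": 3.0, "A1": 1.2, "A2": 1.1, "Z1": 1.6},
    "Y1": {"Y1": 3.0, "A1": 1.5, "A2": 1.4},
    "Z1": {"Z1": 3.0, "C": 1.6},
}

if __name__ == "__main__":
    print("duplicates A1/A2:", neighborhood_correlation(TOY_NETWORK, "A1", "A2"))
    print("domain-only A1/C:", neighborhood_correlation(TOY_NETWORK, "A1", "C"))
```

In this toy case the duplicates, which share most of their neighbors, score close to 1, while the pair linked only through a shared domain scores much lower even though both pairs have a direct Blast edge. That contrast is the intuition behind comparing neighborhoods rather than single pairwise scores.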

Vivien Marx

Bioinformatics Notes

GeneBio has announced that it will distribute Protagen AG's Modiro software tool for the automated detection of post-translational modifications in MS/MS datasets. The two companies will work together to link Modiro's functions with GeneBio's Phenyx MS data analysis software tool.

The Center for Information Technology at the National Institutes of Health has licensed Bioalma's AlmaKnowledgeServer 2, which is distributed by Active Motif, to support an annotation platform for the Human Salivary Proteome Project.

The Fred Hutchinson Cancer Research Center's Translational and Outcomes Research group selected GenoLogics to assist in developing its biomedical informatics infrastructure for a new biorepository.


Linguamatics, a British life sciences software company, is the 71st group to join the Microsoft BioIT Alliance.

Funded Grants

Biomolecular Folding by Ultrafast Spectroscopy and High-Performance Computing
Grantee: Jeffrey Evanseck, California Institute of Technology
Began: Jan. 1, 2008; Ends: Dec. 31, 2008

Researchers hope to elucidate a microscopic basis for biomolecular folding codes using data from ultrafast spectroscopy and multiple large-scale, explicit-water simulations. The fastest-folding sequence known, the chicken villin headpiece subdomain, will be studied using ultrafast experiments and all-atom computer simulations.

Statistical Methods to Study Dynamic Transcriptional Regulatory Networks
Grantee: Ning Sun, Yale University
Began: May 5, 2008; Ends: Apr. 30, 2010

Researchers aim to develop an integrated approach to modeling and inferring dynamic transcriptional regulatory networks. They propose to provide a comprehensive framework to integrate gene expression data, protein-DNA interaction data, mRNA decay data, nucleosome occupancy data, and other data for regulatory network inference. Statistical methods will also be developed to systematically model and integrate the data.
