Rensselaer Data Mining Duo Puts Protein Structure Prediction on the Informatics Map


Mohammed Zaki and Chris Bystroff, two researchers at Rensselaer Polytechnic Institute, are applying new data mining techniques to the protein structure prediction problem. Zaki, an assistant professor of computer science, and Bystroff, an assistant professor of biology, are collaborating to build a library of protein “contact maps,” two-dimensional renderings of unique three-dimensional tertiary protein structures.

The approach places a protein’s amino acid sequence along the x- and y-axes of a matrix. Interactions between amino acids are plotted on the matrix, resulting in a distinct pattern for each protein that can be manipulated and mined like any other 2D data set. Secondary structures such as alpha helices, beta sheets, and beta turns are revealed as clusters of contacts in the 2D map. Alpha helices, for example, appear as bands along the main diagonal, while beta sheets appear as thicker bands parallel or anti-parallel to the main diagonal. Zaki and Bystroff are compiling a library of contact map profiles based on known structures from the Protein Data Bank that they believe can serve as a useful new protein structure prediction resource.
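As a rough illustration of the idea, a contact map can be sketched in a few lines of Python. This is a minimal sketch and not the Rensselaer team's code: the 8-angstrom distance cutoff, the minimum sequence separation of four residues, the function names, and the toy coordinates are all assumptions made for the example.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Return a symmetric boolean matrix that is True where two residues
    lie within `cutoff` angstroms of each other (cutoff is an assumption)."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap

def nonlocal_contacts(cmap, min_separation=4):
    """List (i, j) contacts whose residues are at least `min_separation`
    positions apart in the sequence (threshold is an assumption)."""
    n = len(cmap)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_separation, n)
            if cmap[i][j]]

# Toy coordinates (hypothetical, not taken from any PDB entry):
# residues 0-3 run along one strand, residues 4-7 run back alongside it.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
          (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]

cmap = contact_map(coords)
pairs = nonlocal_contacts(cmap)

# Print the map: '#' marks a contact, '.' marks no contact.
for row in cmap:
    print("".join("#" if c else "." for c in row))
```

In this toy hairpin, the long-range contacts fall along a band running perpendicular to the main diagonal, which is the kind of pattern the article describes for anti-parallel beta sheets.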

The goal is to use contact map prediction as a first step toward 3D structure prediction. Bystroff’s HMMSTR structure prediction program, a hidden Markov model-based approach that he developed with David Baker, uses the same I-sites library of sequence-structure motifs that underpins Baker’s Rosetta algorithm. The Rensselaer team first uses HMMSTR to predict the local structural elements that make up the contact map, and then adds a data mining layer to capture non-local interactions between the amino acids and provide further insight into the tertiary structure of the protein.

The two are slowly working their way through the PDB in an effort to compile a representative set of “contact rules” for each protein family that can be used to improve the performance of their predictive methods. Just as the I-sites library has been a useful source of common motifs in short, contiguous residues, the new resource would serve as a similar record for non-local interaction patterns.

The library will eventually be made available to the public, but Zaki said the work is still at too early a stage to release. All of Bystroff’s work is available, however, at:

Other researchers are using protein contact maps to aid their structural proteomics work. For example, Gianluca Pollastri and Pierre Baldi at the University of California, Irvine, have developed a protein contact map predictor that is available at:

Zaki and Bystroff’s research, funded under a three-year, $333,928 DOE award, will appear in the IEEE journal, Transactions on Systems, Man and Cybernetics, in early 2003.

— BT
