Rensselaer Data Mining Duo Puts Protein Structure Prediction on the Informatics Map


Mohammed Zaki and Chris Bystroff, two researchers at Rensselaer Polytechnic Institute, are applying new data mining techniques to the protein structure prediction problem. Zaki, an assistant professor of computer science, and Bystroff, an assistant professor of biology, are collaborating to build a library of protein “contact maps,” two-dimensional renderings of unique three-dimensional tertiary protein structures.

The approach places a protein’s amino acid sequence along the x- and y-axes of a matrix. Interactions between amino acids are plotted on the matrix, resulting in a distinct pattern for each protein that can be manipulated and mined like any other 2D data set. Secondary structures such as alpha helices, beta sheets, and beta turns are revealed as clusters of contacts in the 2D map. Alpha helices, for example, appear as bands along the main diagonal, while beta sheets appear as thicker bands parallel or anti-parallel to the main diagonal. Zaki and Bystroff are compiling a library of contact map profiles based on known structures from the Protein Data Bank that they believe can serve as a useful new protein structure prediction resource.
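To make the idea concrete, here is a minimal sketch of how a contact map can be built from residue coordinates. It is an illustration only, not Zaki and Bystroff's code: the function name `contact_map`, the 8-angstrom distance cutoff, the choice to leave the diagonal empty, and the toy coordinates are all assumptions made for this example.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Build a symmetric N x N matrix of 0/1 flags from residue coordinates.

    coords: one (x, y, z) tuple per residue, in sequence order.
    cutoff: distance in angstroms below which two residues count as
            "in contact" (8.0 is an assumed value for illustration).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # A contact between i and j is the same as one between j and i.
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical toy coordinates (not a real structure):
# four residues on a straight line, 5 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)

for row in toy_map:
    print(" ".join(str(v) for v in row))
```

In this toy case only sequence neighbors fall within the cutoff, so the contacts sit right next to the main diagonal. In a real folded protein, residues far apart in sequence can also be close in space, which is what produces the off-diagonal patterns described above.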

The goal is to use contact map prediction as a first step toward 3D structure prediction. Bystroff’s HMMSTR structure prediction program, a hidden Markov model-based approach that he developed with David Baker, uses the same I-sites library of sequence-structure motifs that underpins Baker’s Rosetta algorithm. The Rensselaer team first uses HMMSTR to predict the local structural elements that make up the contact map, and then adds a data mining layer to capture non-local interactions between the amino acids and provide further insight into the tertiary structure of the protein.
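The distinction between local and non-local contacts can be sketched in a few lines. This is not the team's data mining method, only an illustration of what "non-local" means: the helper `non_local_contacts`, the minimum sequence separation of 3, and the hairpin-like toy contacts are hypothetical choices for this example.

```python
def non_local_contacts(cmap, min_separation=3):
    """List contacts (i, j), i < j, whose residues are at least
    min_separation positions apart in the sequence.

    cmap: symmetric N x N matrix of 0/1 contact flags.
    min_separation: assumed threshold; contacts closer in sequence
                    than this are treated as local and skipped.
    """
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    ]

# Hypothetical six-residue map: every sequence neighbor is in contact,
# plus two pairings that bring distant residues together.
n_residues = 6
toy_pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
hairpin_map = [[0] * n_residues for _ in range(n_residues)]
for i, j in toy_pairs:
    hairpin_map[i][j] = hairpin_map[j][i] = 1

long_range = non_local_contacts(hairpin_map)
print(long_range)
```

Only the pairs (0, 5) and (1, 4) survive the filter. Contacts of this kind, between residues that are distant in sequence but close in space, are the ones a motif library built from short, contiguous stretches of residues does not describe.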

The two are slowly working their way through the PDB in an effort to compile a representative set of “contact rules” for each protein family that can be used to improve the performance of their predictive methods. Just as the I-sites library has been a useful source of common motifs in short, contiguous residues, the new resource would serve as a similar record for non-local interaction patterns.

The library will eventually be made available to the public, but Zaki said the work is still at too early a stage to release. All of Bystroff’s work is available, however, at:

Other researchers are using protein contact maps to aid their structural proteomics work. For example, Gianluca Pollastri and Pierre Baldi at the University of California, Irvine, have developed a protein contact map predictor that is available at:

Zaki and Bystroff’s research, funded under a three-year, $333,928 DOE award, will appear in the IEEE journal, Transactions on Systems, Man and Cybernetics, in early 2003.

— BT
