Separating proteins by chromatography has been a bit of a dark art, requiring lots of trial and error to find the right conditions for a protein of interest. Now researchers at Rensselaer Polytechnic Institute have published a new computational method that predicts protein separation behavior in ion exchange chromatography from protein structure, and industry is taking note.
The approach has potential applications in proteomics, "where you need to separate out a larger number of proteins and optimize conditions in order to do this," said Curt Breneman, a professor in the department of chemistry and chemical biology at RPI and one of the authors of the study, which was published online in PNAS this month.
Several pharmaceutical and biotechnology companies are interested in licensing the technology for preparative use, he said, and GE Healthcare, which supported the work, is "interested in making use of it as well."
The method is based on earlier technology that Breneman and his collaborators developed for drug design. Instead of drug-protein interactions, they examined interactions between proteins and chromatography resins, and developed a set of protein descriptors similar to those they had previously developed for small molecules. Among them are descriptors that capture elements of the shape of the protein surface and the distribution of properties across it, which Breneman called a novel approach.
The scientists then used these descriptors to develop a machine learning model, using 16 proteins of known chromatographic behavior. "What this means is that we are not setting out to do an a priori prediction of the protein property, but we have to have some examples," Breneman said.
Depending on the sophistication of the model, they need to know either the three-dimensional structures of the model proteins, or homology-based structures, or just their amino acid sequences. "We are developing the method further to see under what circumstances we need [just] sequence information, and under what circumstances we need sequence and either homology or 3D structure," according to Breneman. "It's not a good thing to need crystal structure data to make a prediction, because oftentimes that is not available on a novel protein."
The researchers then tested their model on two proteins that were not contained in the model training set, and found that their predictions came close to the experimental data. The reason they did not test it on more proteins was that "we are data-limited here because we have a low number of determined cases," Breneman said. He added that this might change in the future.
The Rensselaer scientists believe their approach is better than that of others who have tried to predict protein separation in the past. Their method mainly differs in the type of descriptors they chose and the way they built their model, which uses approaches such as support vector machine regression, according to Breneman.
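As a rough illustration of the workflow described above (train a regression model on descriptors of 16 proteins, then predict for two held-out proteins), here is a minimal sketch. It is not the researchers' implementation: the three descriptors and the retention values are synthetic, every function name is hypothetical, and a simple linear support vector regression trained by subgradient descent stands in for whatever support vector machine variant the group actually used.

```python
import random

def train_linear_svr(X, y, epsilon=0.1, lam=0.01, lr=0.02, epochs=3000):
    """Fit a linear model by subgradient descent on the
    epsilon-insensitive loss plus an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]      # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            resid = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(resid) > epsilon:         # errors inside the tube cost nothing
                sign = 1.0 if resid > 0 else -1.0
                for j in range(d):
                    grad_w[j] -= sign * xi[j] / n
                grad_b -= sign / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Predicted retention for each descriptor vector in X."""
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]

def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (an assumption, not from the study): each
# "protein" has three made-up surface descriptors and a retention value
# that depends linearly on them, plus a little noise.
random.seed(0)

def fake_protein():
    x = [random.uniform(-1, 1) for _ in range(3)]
    retention = 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + 3.0 + random.gauss(0, 0.05)
    return x, retention

proteins = [fake_protein() for _ in range(18)]
train, held_out = proteins[:16], proteins[16:]   # 16 to train, 2 to test

train_X = [x for x, _ in train]
train_y = [r for _, r in train]
test_X = [x for x, _ in held_out]
test_y = [r for _, r in held_out]

w, b = train_linear_svr(train_X, train_y)
held_out_pred = predict(w, b, test_X)

for actual, predicted in zip(test_y, held_out_pred):
    print(f"actual retention {actual:.2f}, predicted {predicted:.2f}")
```

With only two held-out proteins, a comparison like this gives a spot check rather than a reliable error estimate, which is the data limitation Breneman describes.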
At the moment, he and his colleagues are working on extending the method to hydrophobic interaction chromatography, and on building a model to select small molecule displacers for protein ion exchange chromatography.
Several aspects of the technology developed originally for drug-protein interactions are patented or copyrighted and have been licensed by a number of pharmaceutical companies, including Pfizer Global Research. The new application for protein separation prediction is patent-pending.
Breneman and his colleagues are also developing a software package, funded by the NIH, that will be released publicly within the next two years. The package will contain a tool for generating protein-based descriptors, visualization tools, and machine learning tools.
Julia Karow