Emboldened by a vision of computationally derived leads filling up their dwindling pipelines, pharmas are enthusiastically pushing new developments in virtual screening — and vendors are answering the challenge.
Last week, at CHI’s Structure Based Drug Design conference in Boston, a full day was set aside for pharmaceutical executives to share their virtual screening experiences. While it was evident that high-throughput docking is no silver bullet, the consensus was that the technology, when used properly, can identify promising leads that would remain undetected by biochemical assays and other methods. Most speakers conceded that the technology is still in its early days and that there is much room for improvement. In particular, the scoring functions that docking algorithms rely on to determine the best fit for ligands within 3D protein structures still require some refinement. In addition, most docking methods are still unable to account for protein flexibility, a biologically crucial feature of the binding process that has so far proven too computationally intensive to model.
But even with these limitations, most pharmaceutical firms have found that the speed and low cost of virtual screening have made the approach an indispensable tool in the drug discovery process — and one that fills a very important gap. As Juan Alvarez, associate director of chemical and screening services at Wyeth, noted, the technology’s ability to screen virtual combinatorial chemistry collections gives it a key advantage over other methods that are limited to those chemicals that have already been synthesized: “If it doesn’t exist in your compound library, you’ll never find it with high-throughput screening,” he said.
Improving Scoring Functions
Alvarez said that his group at Wyeth has developed its own virtual screening method based on the UCSF Dock algorithm. Called phDock, for pharmacophore Dock, the method uses “ensembles” of similar conformers that can account for ligand flexibility without increasing the computational time, he said. PhDock screens the ensembles against a given target based on their underlying pharmacophore structure, and then uses a two-step scoring process that first ranks the poses based on their contact, and then rescores them with a “chemically aware” function that Wyeth developed to assess more “reasonable” poses. This tiered scoring approach reduces the false positive rate of the method, Alvarez said, and was also able to “rescue” false negatives that other scoring functions would have eliminated.
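The two-step idea can be illustrated with a minimal Python sketch. It is not Wyeth's phDock code: the function name `tiered_rescore`, the two scoring callables, and the `keep` shortlist size are hypothetical stand-ins, and the sketch assumes that a higher score means a better pose and that only a shortlist from the first pass is rescored.

```python
def tiered_rescore(poses, contact_score, chem_score, keep=10):
    """Rank poses with a cheap score, then re-rank the shortlist.

    poses         -- any list of pose objects
    contact_score -- hypothetical fast first-pass scoring callable
    chem_score    -- hypothetical, more selective second-pass callable
    keep          -- assumed number of poses carried into the second pass
    Higher scores are assumed to be better for both functions.
    """
    # First pass: order every pose by the fast contact-style score.
    shortlisted = sorted(poses, key=contact_score, reverse=True)[:keep]
    # Second pass: re-order only the shortlist with the second function.
    return sorted(shortlisted, key=chem_score, reverse=True)


# Toy data: made-up pose records with precomputed scores.
example_poses = [
    {"id": "a", "contact": 5.0, "chem": 0.1},
    {"id": "b", "contact": 4.0, "chem": 0.9},
    {"id": "c", "contact": 1.0, "chem": 1.0},
]
ranked = tiered_rescore(
    example_poses,
    contact_score=lambda p: p["contact"],
    chem_score=lambda p: p["chem"],
    keep=2,
)
```

In this toy run, pose "c" is dropped by the first pass, and the second pass moves "b" ahead of "a". A pose that the first pass ranks poorly is never seen by the second function, so the shortlist size is a trade-off between speed and the chance of discarding a good pose.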
Ignatius Turchi, a research fellow in computer-assisted drug discovery at Johnson & Johnson, described a J&J study that assessed the performance of various combinations of docking algorithms and scoring functions. As it turns out, none of the combinations stood out as a universal winner for all protein families, and performance varied widely for the different target molecules studied. In the case of COX2, for example, the best combination was Schrödinger’s Glide algorithm with its own GlideScore scoring function, but for CDK2, FlexX and FlexScore came out on top. The same FlexX algorithm provided very different results, however, when used with different scoring functions. In addition, the study found a few surprising pairing successes, such as Accelrys’ LigandFit algorithm and Tripos’ PMF consensus scoring method. The study showed how important it is for researchers to be aware of the different docking options available, Turchi said, and to have a good working knowledge of which scoring functions work best for a given target. The “major conclusion” of the study, he added, was that researchers should run a similar test on a known data set before any virtual screening production run, in order to ensure they are using the best scoring function for a given target.
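A pre-production test of this kind could be sketched as follows. The article does not say which metric the J&J study used; the sketch assumes the enrichment factor, a common virtual screening measure that compares the share of known actives in the top-ranked fraction with their share in the whole set. The function names and combination labels are hypothetical.

```python
def enrichment_factor(ranked_is_active, top_fraction=0.1):
    """Enrichment of known actives in the top-ranked fraction of a list.

    ranked_is_active -- booleans ordered best score first,
                        True where the compound is a known active
    """
    n = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(n * top_fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    # Active rate in the top slice divided by active rate overall.
    return (hits_top / n_top) / (total_actives / n)


def best_combination(results, top_fraction=0.1):
    """Return the label of the docking/scoring pair with the highest enrichment."""
    return max(results, key=lambda name: enrichment_factor(results[name], top_fraction))


# Toy rankings of the same 10 compounds (2 known actives) for one target.
results_for_target = {
    "dock_A+score_X": [True, True] + [False] * 8,   # both actives ranked first
    "dock_A+score_Y": [False] * 8 + [True, True],   # both actives ranked last
}
winner = best_combination(results_for_target, top_fraction=0.2)
```

Repeating the comparison for each target, rather than reusing one winner everywhere, matches the study's finding that no single combination performed best across protein families.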
Researchers at Biogen are taking a similar approach, said Juswinder Singh, associate director of structural informatics. His team is currently developing scoring functions that are “configured to specific protein families,” and can account for the unique binding behavior of different target groups.
Mark McGann, a software developer at OpenEye Scientific Software, said that the secret to creating an effective scoring function is to look beyond the “positive contributions” of intermolecular behavior. “You also need to take into account negative contributions” for certain ligand/target combinations, he said.
Dances with Proteins
In addition to scoring functions, researchers are also trying to improve the accuracy of virtual screening methods through a better understanding of protein flexibility. “It takes two to tango,” said Tomas Lundqvist, a structural chemist at AstraZeneca. “The ligand isn’t the only molecule that exhibits different conformations — the protein also dances,” he said. While protein flexibility has been very closely studied over the years, it is still “poorly understood,” Lundqvist said, and nearly impossible to predict using 3D structures in the PDB and other databases. The result, he said, is that current docking methods assume a rigid binding site — an approach that he likened to “dancing with a partner with rigor mortis.” Not only difficult, but unpleasant.
Lundqvist advised treating each target as “an individual” during virtual screening. While many assumptions about structure and binding behavior can be made about proteins in the same family, he warned that every family can have “a few black sheep.”
Jon Erickson, a research scientist at Eli Lilly, said that his team found in a recent study that protein flexibility and ligand flexibility were the most important factors that affected docking accuracy (with “accuracy” defined as less than 2 Å root-mean-square deviation compared to the x-ray co-complex). On the ligand side, he said, accuracy declined rapidly beyond eight rotatable bonds, while on the protein side, movement of greater than 1.5 Å “significantly reduced” docking accuracy. This finding has important implications for homology modeling, Erickson said. Many researchers in the field are interested in using homology-based models in cases where crystal structures are not available, but the accuracy of those modeling methods will have to improve substantially before reliable docking is possible, he said.
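The accuracy criterion Erickson described can be written down directly. The sketch below assumes that the docked pose and the crystal pose list the same heavy atoms in the same order and share the receptor's coordinate frame, so no superposition is applied; the function names are illustrative and not taken from the Lilly study.

```python
import math


def pose_rmsd(docked, reference):
    """Root-mean-square deviation in angstroms between two poses.

    Each pose is a sequence of (x, y, z) tuples with matching atom order.
    """
    if len(docked) != len(reference) or len(docked) == 0:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(docked, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(docked))


def is_accurate(docked, reference, cutoff=2.0):
    """True if the pose is within the 2 Å RMSD cutoff quoted in the article."""
    return pose_rmsd(docked, reference) < cutoff


# Toy coordinates: a three-atom "ligand" and two rigidly shifted copies.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
near_pose = [(x + 1.0, y, z) for x, y, z in crystal]   # shifted by 1 Å
far_pose = [(x + 3.0, y, z) for x, y, z in crystal]    # shifted by 3 Å
```

Here `near_pose` has an RMSD of 1 Å and counts as accurate, while `far_pose` at 3 Å does not. A production version would also need to handle symmetry-equivalent atoms, which this sketch ignores.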
Others said that homology modeling is good enough for some high-throughput docking experiments, especially for proteins that are difficult to crystallize. Andrew Maynard, a computational chemist at AstraZeneca, said that high-resolution 3D structures are not required for “pharmacophore queries,” which have looser requirements than structure-based queries.
Biogen’s Singh agreed, saying that a great deal can be learned using pharmacophore filters with homology models, even though docking with these models “may be fuzzy.”
Schrödinger, meanwhile, is working on a virtual screening method that can account for protein flexibility. Shi-Yu Liu, vice president of marketing at Schrödinger, told BioInform that the method uses the Prime homology modeling program the company released last year in combination with its Glide docking algorithm to “relax” the active site of a target so that it can adjust to fit a given ligand.
The method can provide an alternative model for a protein that can then be used to screen other ligands, Liu said. The company is not actively marketing this particular solution, Liu said, noting that the company’s researchers only “stumbled upon it ourselves in-house two or three months ago.” She added, however, that Schrödinger has seen significant interest from the community in the combined solution and has put together a series of scripts that can link the two programs together.
Accelrys is also working to improve its offerings for the virtual screening community. Dave Edwards, director of computational biology at Accelrys, said that the company’s researchers are working on classifying binding sites into six different classes in order to improve docking success. Ranging from small, well-defined pockets to ill-defined sites that result from hinges in the protein structure, these classes can be used to determine which docking solution is most appropriate, or, in extreme cases, whether to proceed with the protein target at all, Edwards said. In the case of a shallow site, for example, Edwards said that it might be necessary to go back to the biologists and ask them to “find a different protein target in the same pathway.”
Accelrys is also expanding its suite of docking tools. As part of the upcoming spin-off of Pharmacopeia’s PDD drug discovery group, Accelrys will begin marketing LibDock, a docking algorithm developed within Pharmacopeia. Marguerita Lim-Wilby, a product specialist in structure-based design at Accelrys, said that LibDock takes a “different approach” to docking than the company’s LigandFit algorithm, and is better suited to the company’s Catalyst database of ligands than LigandFit is.
With LibDock, researchers can select “hot spots” for binding in the target molecule, and the program runs faster than LigandFit, Lim-Wilby said. The company has not yet compared the accuracy of the two algorithms, however. An evaluation version of LibDock is currently available through the Cerius2 interface, and it is expected to be available as a standalone program in December of this year.
Some Popular Docking Algorithms and Scoring Functions
Docking algorithms:
- Cerius2/LigandFit (Accelrys)
- DOCK (UCSF)
- DockVision (University of Alberta)
- Fred (OpenEye)
- FlexX (Tripos)
- Glide (Schrödinger)
- Gold (Cambridge Crystallographic Data Centre)
- ICM (MolSoft)
- MCDock (Yale University)
Scoring functions:
- C2 DockScore (Accelrys)
- C2 LigScore (Accelrys)
- ChemScore (Tripos)
- Dock Energy Score (UCSF, Tripos)
- DockVision Energy Score (University of Alberta)
- FlexX (Tripos)
- GlideScore (Schrödinger)
- Gold (Tripos)
- Potential of Mean Force (Tripos, Accelrys)