Blogging has been a little slow lately mainly because there have been exciting new developments with one of the projects I have been involved in and I was in meetings related to this. One of the topics that was discussed at the conference I was at last week was the accurate prediction of free energies of binding, one of the holy grails of drug discovery. Free-energy perturbation (FEP) still remains the gold standard to get relative free energies of binding, but the procedure is very computer intensive and therefore can be carried out only with small changes in congeneric series of inhibitors. The goal remains elusive and extremely challenging.
A poor man's way of quickly obtaining such ∆Gs is MMGBSA (Molecular Mechanics Generalized Born Surface Area). The GBSA model is well-established as a continuum solvation model for taking solvation into account. What MMGBSA does is take a docked ligand structure and then calculate the free energy of binding as the difference between the bound and unbound states using a force field, including implicit solvation.
Therefore, it calculates
∆G (binding) = ∆G (protein-ligand complex) - ∆G (protein) - ∆G (ligand)Clearly it has to calculate the energies of the free ligand and free protein. Much of the challenge lies in these two terms. For starters, one has to calculate the strain energy penalty that the protein has to pay in order to bind the ligand. The binding energy that we see experimentally emerges after the protein has paid this strain penalty. How much this strain energy can be has been a controversial topic recently and I will get into it in another post. Suffice it to say that it's a challenging calculation that is not always handled well by MMGBSA. This is because in calculating the ligand free energy, MMGBSA essentially uses a force field to relax the ligand from the bound conformation to the nearest local energy minimum. However, a complex ligand exists in several local energy minima in solution and this force field local minimum may not correspond to any of them. Thus, one has to consider the global strain penalty that the protein has to pay. For this the method also has to consider the multiple conformations that a ligand adopts in solution. Sadly there are very few techniques that will deconvolute the Boltzmann population of a ligand's real conformations in solution and give us the global minimum. This problem in calculating strain energies remains an important drawback of the method.
Calculating ∆G (protein) is also not a trivial matter. We need to consider the entropy of the protein. One can get this from time-consuming MD simulations but it's not certain if the force field is parametrized well and if conformational space has been sampled comprehensively. Another uncertain factor is the induced fit effects involved in binding. A lot of these effects can be subtle and may extend to second shell amino acid residues.
Given these drawbacks, MMGBSA has nonetheless been quite successful in improving agreement with experiment. One of the reasons it works so well is that when you are dealing with congeneric series of ligands for a given target, many of the terms like conformational entropy and protein reorganization energy are the same or very similar and cancel, although there can be surprises. It seems now that at least one of the problems in MMGBSA- not considering the multiple conformations of the ligand in solution- can be tackled. A simple way to get multiple conformations of a ligand in solution is to do a conformational search. Assuming that the search is "complete", one can then calculate the conformational entropy penalty that the ligand has to pay in order to sacrifice all conformations except one in which it binds to the protein. There has been an implicit way to take this into account- many docking programs include a penalty of 0.65 kcal/mol per frozen rotatable bond. But clearly this penalty may be quite less if there are hundreds of conformations in solution that would lead to a large conformational penalty.
Now a group from Amgen has done such multi-conformational MMGBSA rescoring for four important targets and their ligands- CDK2, Thrombin, Factor Xa and HIV-RT. They compare scores obtained with Schrodinger's GlideXP routine with experimental binding affinities. Then they compare scores obtained with MMGBSA rescoring either with a single ligand conformer representation or with a multiple conformer representation that takes ligand conformational entropy into account. The comparison between single and multiple conformers gives somewhat mixed results and sometimes the single conformer representation also does fairly well; however, one thing is strikingly clear, that MMGBSA rescoring can radically improve correlation with experimental affinities compared to simple GlideXP scoring. In some cases the correlation coefficient jumps from essentially 0.00 to a whopping (by current standards) 0.75. There is a lot of interesting methodology described in the paper worth taking a look at. But it's quite clear how including some of the explicit physical effects involved in protein-ligand binding can substantially improve correlation with experiment. In this case the extra effort expended is a fraction of the cost involved in FEP calculations and the methods can also tackle more diverse ligands.
Even if we are not close to conquering the free energy fort, at least we seem to be getting concrete footholds on it.
Guimaraes, C.R., Cardozo, M. (2008). MM-GB/SA Rescoring of Docking Poses in Structure-Based Lead Optimization. Journal of Chemical Information and Modeling, 48(5), 958-970. DOI: 10.1021/ci800004w