Hybrid Quantum Mechanics / Molecular Mechanics (QM/MM) Approaches - Practical aspects: Size of the QM region, initial structure (classical equilibration), QM/MM protocol; computational requirements Mauro Boero Institut de Physique et Chimie des Matériaux de Strasbourg University of Strasbourg - CNRS, F-67034 Strasbourg, France and @Institute of Materials and Systems for Sustainability, Nagoya University - Oshiyama Group, Nagoya Japan 1
Time and Length scales in simulations continuum mesoscale coarse grain 10 -6 molecular dynamics 10 -8 e -( D E / kT ) ab initio F = m a 10 -12 H y = E y 10 -10 10 -8 10 -6 10 -4 Length Scale (m) 2
From the crystallographic data to the initial structure: Starting a QM/MM calculation • First question to be addressed: Do we really need QM ? (Sometimes MM is enough !) • Second question: Do we have all what we need ? Not always: the structure of proteins and or nucleic acids available via the web-accessible Protein Data Bank or provided by experiments, are generally obtained by X-ray scattering. As it is well known that X-rays are insensitive to H atoms*, thus only the coordinates of heavier atoms are available. *X-rays are scattered by the electron cloud of the atom and hence the scattering amplitude of X-rays increases with the atomic number 3
From the crystallographic data to the initial structure: Starting a QM/MM calculation Trp-cage, the second smallest protein made of 20 residues. Protein Data Bank www.pdb.org access codes: 2JOF, 1L2Y, 1RIJ C = gray N = blue O = red …and H ? 4
Then, before starting any calculation we need to complete the structure… • First warning: Hydrogens addition leaves in general a certain freedom in the selection of the protonation state of the various residues or bases • To complement the missing information, a careful analysis a posteriori of the local hydrogen bond (H-bond) network is used • And (or jointly with) theoretical estimations of the p K While the former operation is easily done with most of the available graphic software, the second one is more demanding and generally implies the solution of the Poisson-Boltzmann equation. 5
Poisson-Boltzmann equation (1) • The method has been extensively tested and assessed over the years and makes use of the Poisson equation 4 V = 2 ( r ) ( r ) where the total electrostatic potential V ( r ) is due to the charge distribution ( r ), i.e. all the positive and negative ions of the system, including also the solvent, if present. The dielectric function e is generally assumed to be a constant and typical values e ~ 80 represent the standard assumption in the case of aqueous solutions (D. Andelman, Electrostatic properties of membranes: The Poisson- Boltzmann Theory in “Handbook of Biological Physics”, vol. 1, p. 603-638, ed. By R. Lipowsky and E. Sackmann, Elsevier Science B.V. 1995). 6
Poisson-Boltzmann equation (2) • In the limit of a continuum density distribution, the chemical potential m i of the i -th charge in the system is defined by m = ( r ) eZ V ( r ) k T ln ( r ) i i B i where e is the electron charge, Z i the valence of the i -th ion, V the electrostatic potential, k B the Boltzmann constant, T the temperature and i the charge density of the i -th ion. • The first addend in the right-hand side of Eq. (2) comes from the pure electrostatic contribution • The second one is due to the entropy contribution of the ions under the hypothesis of weak solution. 7
Poisson-Boltzmann equation (3) • This leads to a Boltzmann distribution of the charges = eZ V ( r ) / k T 0 ( r ) e i B i i and the explicit form of the Poisson-Boltzmann equation reads 4 e = 2 0 eZ V ( r ) / k T 0 eZ V ( r ) / k T V ( r ) Z e Z e B B The contributions from positive and negative ions are written separately. This equation is solved numerically by free computer codes (N. A. Baker et al. (2001) Proc. Natl. Acad. Sci. USA 98 , 10037 - http://apbs.sourceforge.net/) or by web-based services (http://bioserv.rpbs.jussieu.fr/PCE http://biophysics.cs.vt.edu/H++ etc.). 8
Poisson-Boltzmann equation (4) • Once that the Poisson-Boltzmann is solved, one can compute the electrostatic free energy of the system by integrating over the volume W of the system, = 2 3 F V ( r ) d r k T el B 8 W d 3 ( r ) ln[ ( r ) / ( r )] ( r ) ln[ ( r ) / ( r )] [ ( r ) ( r ) 2 ( r )] r 0 0 W and estimate the theoretical theoretical p K is computed by introducing the free energy difference between two (or more) states, A and B, D F el = F el (B) – F el (A), 1 = D p K F el ln 10 k T B 9
The complete structure: • The Poisson-Boltzmann / H-bond network inspection allows to recover the missing H coordinates 10
…and this is not yet the end of the story: • Once the protonation state of the system has been addressed, the protein or nucleic acid system has to be solvated in water, since this would correspond to a natural environment of the biomolecule. 304 atoms →2306 at. 11
How can I add water ? • Few O atoms of H 2 O molecules can be already present in the crystal structure (crystallographic water) • Several software tools are available - Xleap or tleap (Amber / glycam/parm99 add force fields) http://www.chem.hamilton.edu/~kkirschn/Amber_Tutorial/ xleap_glycam_tutorial.pdf - MOIL , http://cbsu.tc.cornell.edu/software/moil/moil.html - GROMACS , http://www.gromacs.org/ - PACKMOL , http://code.google.com/p/packmol/ • Add sufficient amount of solvent water around the crystallographic structure at standard density (1.0 g/cm 3 ) 12
The next step: MM-equilibrating your system • The Cartesian positions ( x 1 , y 1 , z 1 ),…,( x N , y N , z N ) of our whole system, hereafter identified by vectors R I = ( x I , y I , z I ), are interacting atoms • We have then to select an appropriate classical potential that will be our MM description of the interactions among atoms (AMBER, CHARMM, GROMOS, GROMACS, OPLS-AA, etc.) • These potentials are analytic functions V ( R I ) of the positions R I • The forces f I on each particle are simply the gradients (derivatives) of this potential, V ( R ) = I f I R I and the analytical 3 x N dimensional function f I is called force field 13
MM: what is a Force Field ? Which interactions ? bond stretch = 0 2 v ( r , k ) k ( r r ) 1 IJ IJ IJ IJ IJ 2 torsion v ( f IJKL ) intermolecular interactions bending intramolecular nonbonded = 0 2 v ( ) k ( ) 1 IJK IJK 2 interactions 14
Files in input (whatever MM code): • Basically we need to handle three pieces of information that can be either contained in a single file (almost never) or separated in three different files (practically always) COORDINATES : R I = ( x I , y I , z I ), TOPOLOGY : ???? INPUT : list of parameters (integration step, total simulation time, etc.) and keywords activating specific functions, e.g. MD Files in output: • ENERGIES, TRAJECTORY, CHARGES, MULTIPOLES, etc. 15
TOPOLOGY ? Any force field (of course) does not do any electronic structure calculation. So giving the coordinates ( x , y , z ) is insufficient. We must supply to the MM code also the chemical species (this we do also in QM approaches) and which type of interaction bewteen atoms (=parameters of the force field). Example: Suppose we have an atom at ( x , y , z ). The numbers x , y , z are listed in the COORDINATES file. Now we want this atom to be C and we want it in sp 3 configuration. Evidently a force field for sp 3 is an analytical function different from the one for sp 2 . 3 2 f sp sp V r , , V ( r , , z ) This information is given in the file called TOPOLOGY. 16
TOPOLOGY ?... I do not have it… • The first thing to do is complaining with the person(s) who gave you only the coordinates • The second alternative is to re-generate yourself the topology via Amber_tools, e.g. 1. Write a tentative input file tleap.in as → load glycam and parm99 additional ff source leaprc.cell cellmod=loadpdb <your-PBD-file>.pdb set cellmod box {110.466 110.466 110.466} → cell size # save coordinates, topology and pdb files ! saveamberparm cellmod cellmod.top cellmod.crd savepdb cellmod cellmod.pdb quit 2. Run tleap/xleap tleap -f tleap.in 17
Equilibrating your system • Once the full system has been obtained, before doing any quantum calculation one should better do a full classical (=MM) dynamical run …why ? 1) You had just frozen coordinates of heavy atoms only 2) You added H atoms a posteriori 3) You added solvent water molecules 4) So, you have to reach an equilibrium state solute-solvent Monitor your MM dynamics, especially crucial quantities: T (temperature of the system) and partial T solute/solvent Root Mean Square Displacement (RMSD) of heavy atoms heavy N 1 atoms 2 = RMSD R ( t ) R ( 0 ) I I heavy N = I 1 atoms 18
The RMSD: comparison with pristine crystallographic data Check if your system is still compatible with the experimental resolution* ! (at very best 1-2 Å. Mostly about 3 Å) 19
Recommend
More recommend