Q2MM (PON) Parameterisation

An ideal way of doing MM force-field parameterisation, based on QM data.
Can treat also transition states.
"Automated Molecular Mechanics Parameterization with Simultaneous Utilization of Experimental and Quantum Chemical Data", Per-Ola Norrby and Tommy Liljefors, J.Comput.Chem. 1998, 19, 1146-1166.
Review: "Hansen, E.; Rosales, A. R.; Tutkowski, B.; Norrby, P.-O.; Wiest, O. Prediction of Stereochemistry using Q2MM. Acc. Chem. Res. 2016, 49, 996-1005."

Implementation for Amber, Dec. 2006: "P. Rydberg, L. Olsen, P.-O. Norrby & U. Ryde (2007) "A general transition-state force field for cytochrome P450 hydroxylation", J. Chem. Theory Comput., 3, 1765-1773"

All files and scripts needed are in /away/bio/Pon_param, as well as some sample input files.
Some info is found in the README file
And more info about the programs ParEval, ParMod and ParUtil can be found in a pdf file.

Note that the period and phase of dihedrals will not be optimised (neither will the non-bonded parameters or charges).

All instances of X in the text below denotes a number series. Preferably 1,2,3.. etc. This is to enable all scripts to function with multiple structures (for example an enzyme active site with different substrates)
  1. Optimise the desired structure(s) with QM (it is wise to put the atoms in a sensible order, viz. the same order as in Amber).
    Calculate RESP charges and the Hessian (by Gaussian).
    Alternatively, Turbomole can now be used for the Hessian calculation (but still with RESP charges).
    Construct a pdb file from each of the optimised structure (gausstopdb).

  2. Build an amber topology file (a *.in file) for every structure from the data. This can be done automatically by describe (or alternatively by changepdb, command p).
    Be very careful NOT to renumbering any atom (can be cured, see below).
    Set appropriate atom types and insert the RESP charges.
    Check the tree symbols (M, S, B, 3, E).

    Note that with Gaussian, you need to use the coordinates from the archive entry, not from the standard orientation or input coordinates, otherwise the correlation of the Hessian elements will be poor (~0.5, rather than >0.95). This should now be automatically done by describe.

  3. Construct a template Amber parameter file for all missing parameters (if you used describe in the previous point, this is already done). This file must be named frcmod.hemts.
    You should set reasonablee start values (but not zero values) for those parameters you want to parameterise (e.g. analogous values from the amber force field).
    Good starting points can be taken from parameters generated by describe (it even makes Amber template files).

    The program MkPar assumes the following strict format:
    Force constants for bonds are given in format f8.3, starting from column 6
    Equilibrium bond lengths in format f10.5, starting from column 16
    Force constants and equilibrium values for angles in format f8.3, starting from columns 10 and 22.
    Force constants for dihedrals (and impropers) in format f8.3, starting from column 19

    See the sample frcmod.hemts file below.

  4. Set up a leap command file for each structure, leapcom_X, which sucessfully construct the prmtop and prmcrd file for the calculation (check that no atoms are renumbered).
    It should read the three files (pdb, *.in, and frcmod.dat) constructed above.

    You run it by
    tleap -f leapcom_X

  5. Carefully check the values in the frcmod.hemts file, especially the periodicity and phase of the dihedrals (which are not changed in the parameterisation).

    This can be done by changeparm, commands e (the bond and angle energies should be close to zero) and dc (there should be no dihedrals in the erroneous phase or large angles lists; also check all dihedrals with suboptimal values).

    Dihedrals with large angles should be removed by setting the force constants to zero.

  6. Construct a nmodemm.in file for MM minimisation (optionally, you can use a sander.in file instead).
    Construct a nmode.in file for Hessian calculation.
    Use the modified version of nmode in $AMBERHOME/src/nmode/nmode (/away/bio/Bin/Linux/nmodeh), that writes out the Hessian. See below for the modifications needed.
    Test the programs with
    nmode -O -i nmodemm.in -o nmodemm.out -p prmtop -c prmcrd -r mdrest
    and then
    nmodeh -O -i  nmode.in -o  nmode.out -p prmtop -c mdrest -r temp
    Check that a file hessian is created.

  7. Run
    MkPar > paropt
    It takes the *dat file and writes a list of all parameters in the right format to stdout.
    Note that the titles of the parts of the frcmod file must be:
    BOND
    ANGLE
    DIHEDRAL
    IMPROPER
    Otherwise, the file will not be properly  read.

    Note that MkPar does not work with atom types with only one character!
    mkpar probably works better:
    mkpar

    Note that paropt is tab-delimited, so be careful not to delete the tabs.

    Check the file paropt and remove parameters you do not want to fit.
     
    If you want to force two parameters to be the same, add the number in first column (the line number in forcmod) of the second parameter as a sixth column (i.e. a new final column) to the first parameter and delete the row with the second parameter from paropt (the program assumes that the parameters come in the same column in the two rows).
    Example:
    15      6         49.0190          4.7009       %8.3f   16
    (now the parameter in column 6 on line 16 will be the same as the one in column 6 on line 15).

  8. Create a tethering file if needed (to tether some values to the QM values).
    Create a file called parteth. This should be a copy of paropt, but with only bond distances and angles in degrees. The fourth column should be changed to the weighting of the tethering (10 for bonds and 1 for angles is a good start guess). The third column should be the equilibrium data from QC calculations. Run the script AveTeth to get average data over all your structures for all bonds and angles in your frcmod file.

  9. With Gaussian frequencies:
    Run the script
    GetInvHess gauss.out (if you have a transition state)
    or the script
    GetHess gauss.out (if you have a ground state)
    on the Gaussian frequency output file (note that the prmtop file also needs to be in the same directory).

    NOTE: gauss.out should only contain ONE frequency calculation. If you have several, then just delete all except the one you want to use in the output file.

    This is the mass-weighted and reformed Hessian (ts eigenvalue made positive).

    If you get only the error message in the file
    Unknown atomic number, 12, exiting...
    you have to first run
    GetScale
    to get the file scaling
    and then
    ParUtil -l gauss.out -w12,24.305 -i D -f1185.8211,2240.877 -s1,.01 -2 -4 scaling > gauss.out.cal
    or for GetInvHess
    ParUtil -l $1 -w12,24.305 -e > D.eigen
    awk 'i==0 {print} i==1 {i=0;x=0;n=0;for(k=1;k<=NF;k++) if($k<x) {x=$k;n=k}\
        for(k=1;k<=NF;k++) {y=k==n?1.:$k;printf "\t%16.8E",y} printf "\n"}\
        /Eigenvalues/ {i=1}' D.eigen > D.eigmod
    ParUtil -er D.eigmod -i D -w12,24.305 -f1185.8211,2240.877 -s1,.01 -2 -4 scaling > $1.cal
    i.e., you should give the missing atomic weight after the -w command to ParUtil.
    For Turbomole use 
    GetHess_t force.log
    or
    GetInvHess_t force.log
    with the output from aoforce. The file with the hessian needs to be named hessian and be in
    the current directory.
    Note that you need to take a backup copy of the file hessian, because this name is later used also for the MM hessian.
    cp hessian hessian-qm

    If you want to run ParUtil manually the command is the same as for a
    gaussian log file, but with -l replaced by -m (t was allready taken):

    ParUtil -m force.log -w12,24.305 -i D -f1185.8211,2240.877 -s1,.01 -2 -4 scaling > force.log.cal

  10. Only needed if the atom numbers in Gaussian and Amber do not coincide
    Correct the atom numbers in the hessian D.cal files:
    For each structure do the following

  11. Run the script PonInit
    This creates files named dihlist_X with list of dihedrals from changeparm.
    You may remove from this file dihedrals that you don't want to be included in the penalty function.
    Dihedrals that contain angles that are larger than 150 degrees are automatically removed to avoid that small changes causes large problems.

  12. Currently buggy, skip this point.
    Run script LinSearch to provide good starting values for all parameters.
    Output in stdout.
    Files paropt and frcmod.hemts are modified.
    Note that the program may remove parameters. Check this thoroughly.
    If the LinSearch script goes into an infinite loop for a variable, edit the paropt file manually and modify this variable slightly to get it out of the loop


  13. Run the script CalcLoop
    This is the main optimisation program. It runs iterative Newton-Raphson (NR) optimisation of the parameters and Simplex optimisation of some parameters that are poorly behaving in NR.
    It calculates numerical derivatives by changing one parameter in the frcmod file at a time (both up and down, giving 2*#parameters modified frcmod files, called frcmod.nnn).

    The output is in loop.log
    It modifies the paropt and frcmod.hemts files.
    Other files are described below.

    Options:
    Skip Simplex optimization by creating the file NoSimplex.
    Use Sander MM minimisation instead of Nmode by creating the file CalcOneSander
    The program can be softly stopped by
    touch Loop.stop

    There are some bugs when a parameter goes towards 0, which usually makes the program crash.
    If this happen, remove the parameter or change the period or phase of a dihedral.

  14. During the run of CalcLoop, you should check:


  15. After convergence, you should check the following
    (all this is automatically done with the script ponresult):
  16. An error estimate can be found by checking how large changes in one parameter gives 1% worse penalty function.
    This is done by the script pon_errorcheck.py

    It outputs two files error_results and error_par.anal.
    error_results contains all force constants and uncertainties (error estimates) for these.
    The first column is line number in paropt (or # for a poor parameter).
    The second is the line number in frcmod file.
    The third is equilibrium value from paropt.
    The fourth is the error estimate.
    The  fifth is an alternative error estimate (from the smallest solution of the second degree equation).
    grep '\#' error_results

    error_par.anal contains the label, the line number, the weight, the reference value, the  error, the quadratic errors (= their influence on the penalty function), sorted by their quadratic error.
    This shows which terms gives the largest errors in the penalty function

    If your optimization crashes you can also get the latter file by:
    cut -f1-4 par.tot > par.anal
    ParEval -z par.anal | sort -n -k6 -r > error_par.anal

  17. If you want to change a parameter and rerun CalcLoop, you should change it in both paropt  and frcmod.hemts.
    You can use the old files and make changes only in frcmod.hemts and then run the program
    ponmkpar
    It will copy the old paropt file to paropt.old and update it with the data from frcmod.hemts, asking for appropriate delta values.

  18. When the parametrisation has converged, you can optimise the parameters a bit further.
    To do this manually, decrease the step lengths in paropt to 10.0 for bonds and lower for angles and dihedrals. Then start CalcLoopHC (this script has tighter convergence than CalcLoop and does not update step lengths in paropt).
    At convergence decrease the step lengths even more.
    Continue until you can't get improvements with CalcLoopHC by lowering the step lengths.

  19. When all jobs are finished, you can run the script purpon to remove unnessesary files.
  20. If you want to save additional space, you could also remove D1.cal and par.tot.
    Finally, gzip all files.
    gzip *&

Tips & Tricks


Files from CalcLoop

Files with useful information


Sample leap_com file

source leaprc.ff03
loadAmberPrep heme_ts.in
loadAmberParams frcmod.hemts
x=loadpdb hemets.pdb
check x
saveamberparm x prmtop prmcrd
quit

To avoid errors like in tleap:
+Currently only Sp3-Sp3/Sp3-Sp2/Sp2-Sp2 are supported
+---Tried to superimpose torsions for: *-C1-C6-*
+--- With Sp0 - Sp0
+--- Sp0 probably means a new atom type is involved
+--- which needs to be added via addAtomTypes

Issue the following command for all new parameters in tleap:
addAtomTypes {
  {"c1" "C" "sp2"}
  {"c2" "C" "sp2"} }

For transition states, you should define both the breaking and the forming bond.
You do that with
bond x.1.C3 x.2.HN1

Often, you then get the error message:
Bond: Maximum coordination exceeded on .R<TSP 2>.A<HN1 2>
      -- setting atoms pert=true overrides default limits
which you solve by issuing the command:
set x.2.HN1 pert true
bond x.1.C3 x.2.HN1


Sample nmodemm.in file

MM minimisation with nmode
 &data
  ntrun = 4,
  nprint = 100,
  nsave = 1000,
  idiel = 1,
  scnb = 2.0,
  scee = 1.2,
  cut = 999.0
 &end


Sample sander.in file

Title
&cntrl
  irest=0,ntx=1,
  imin=1,maxcyc=1000,drms=0.001,
  ntc=1,ntf=1,
  nsnb=40,cut=999.0,dielc=1.0,
  ntpr=100,ntwx=0,ntwv=0,ntwe=0,
  ntb=0,ntp=0,
  ipol=0,igb=0,
  scnb=2.0,scee=1.2,
  ncyc=10,ntmin=1,dx0=0.002
 &end


Sample nmode.in file

Title
 &data
  ntx    = 1,
  ntrun  = 1,       nvect  = 0,
  drms   = 1,
  dielc  = 1.0,     idiel  = 1,
  scnb   = 2.0,     scee   = 1.2,
  cut    = 999.0,
  iulf=1
 &end
END

Sample paropt file

The first column is the line in the *dat file, the second the column, the third is the current value, the fourth is a step factor and the last the format to rewrite it to the file.

15      6         49.0190          4.7009       %8.3f
15      16         2.0194          0.0088       %10.5f
16      6         50.6960          3.2380       %8.3f
16      16         2.0116          0.0039       %10.5f
17      6        286.9810          7.0154       %8.3f
17      16         1.3676          0.0032       %10.5f
18      6        316.9810          7.6348       %8.3f
18      16         1.3660          0.0060       %10.5f
19      6        272.0190          3.7641       %8.3f
19      16         1.4427          0.0060       %10.5f
20      6        390.0190          3.2305       %8.3f
20      16         1.3835          0.0060       %10.5f
21      6        417.0190          5.1356       %8.3f
21      16         1.3623          0.0058       %10.5f
22      6        337.0190          2.9041       %8.3f
22      16         1.0882          0.0064       %10.5f
23      6        337.0190          4.2527       %8.3f
23      16         1.0834          0.0063       %10.5f
24      6        200.9810          6.2876       %8.3f
24      16         1.4079          0.0064       %10.5f
25      6        200.9810          7.4684       %8.3f
25      16         1.1511          0.0040       %10.5f
26      6        199.0190          3.2894       %8.3f


Sample parteth file

The first column is the line in the *dat file, the second the column, the third is the average QC value, the fourth is the weight factor and the last the format to rewrite it to the file.

15      16         2.0194          10.0       %10.5f
16      16         2.0116          10.0       %10.5f
17      16         1.3676          10.0       %10.5f
18      16         1.3660          10.0       %10.5f
19      16         1.4427          10.0       %10.5f
20      16         1.3835          10.0       %10.5f
21      16         1.3623          10.0       %10.5f
22      16         1.0882          10.0       %10.5f
23      16         1.0834          10.0       %10.5f
24      16         1.4079          10.0       %10.5f
25      16         1.1511          10.0       %10.5f


Sample frcmod.hemts file

Title - P. Rydberg 11/11-05

MASS
NP 14.01
NO 14.01

BOND
FE-NP   87.661     2.01280
FE-NO   49.891     2.03930

ANGLE
NP-FE-NP    32.753     173.285
NO-FE-NO    38.190     179.739

DIHEDRAL
X-NO-FE-X     1       1.000     180.000       2.000
X-NP-FE-X     1       2.000     180.000       2.000

IMPROPER
c1-c1-NP-FE   0       1.041     180.000       2.000
c1-c1-NO-FE   0       0.842     180.000       2.000

NONBON
  FE         1.20000   0.05000   0.00000
  HQ         1.00      0.020     0.00000


Modifications of Amber nmode

Note that the Hessian must be mass-weighted.
Three files are modified: inpdat.h, rdinp.f, and nmode.f

diff inpdat.h~ inpdat.h
10c10
<       ipol,i3bod,ismem
---
>   ipol,i3bod,ismem,iulf

diff rdinp.f~ rdinp.f
38c38
<          ipol,i3bod,ismem
---
>          ipol,i3bod,ismem,iulf
91a92
>    iulf=0

diff nmode.f~   nmode.f (Amber9)
186d185
<          if ((iulf.eq.0).and.(iulf1.eq.0)) then
188d186
<          endif
191,210d188
<       !       ----- Dump coordinates force and Hessian
<       !       ----- Added by Ulf Ryde, 20/9-06
<       if (iulf1.gt.0) then
<          write(6,*)' '
<          write(6,*)mx,mf,mh
<          open(73,file='hessian',status='unknown',form='formatted')
<          write(73,'(i9)')natom
<          write(73,'(a)')'Coordinates'
<          write(73,'(3f16.8)')(x(mx+i),i=0,3*natom-1)
<          write(73,'(a)')'Forces (not mass weighted)'
<          write(73,'(3e16.8)')(x(mf+i),i=0,3*natom-1)
<          write(73,'(a)')'Hessian (not mass weighted)'
<          k=0
<          do i=1,natom*3
<             write(73,'(6e16.8)')(x(mh+k+j),j=0,i-1)
<             k=k+i
<          enddo
<          close(73)
<       endif
<      
215,233d192
<
<       !       ----- Dump coordinates force and Hessian
<       !       ----- Added by Ulf Ryde, 28/11-05
<       if (iulf.gt.0) then
<          write(6,*)mx,mf,mh
<          open(73,file='hessian',status='unknown',form='formatted')
<          write(73,'(i9)')natom
<          write(73,'(a)')'Coordinates'
<          write(73,'(3f16.8)')(x(mx+i),i=0,3*natom-1)
<          write(73,'(a)')'Forces (mass weighted)'
<          write(73,'(3e16.8)')(x(mf+i),i=0,3*natom-1)
<          write(73,'(a)')'Hessian (mass weighted)'
<          k=0
<          do i=1,natom*3
<             write(73,'(6e16.8)')(x(mh+k+j),j=0,i-1)
<             k=k+i
<          enddo
<          close(73)
<       endif
300,308d258
< !    Code added by Jacob Kongsted 2007 for new MM/PBSA entropy
< ! JK
<          nvecs =  3*natsys - 6
<          call thermo(natsys,nvecs,ilevel,x(mx),x(mamass),x(mcval), &
<                x(mh),x(mh+ns3),x(mh+2*ns3), &
<                x(mh+3*ns3),t,patm)
< !    End of new code
<
<


Compiling the programs
Most of the programs are unix shell scripts.
However, there are also a few c programs:
The needed files are found in the Source directory:
ParEval.c  ParMod.c  ParUtil.c  ParUtil_t.c  jacobi.c  nrutil.c

ParUtil is compiled by:
gcc ParUtil.c jacobi.c nrutil.c -o ParUtil -lm

ParUtil_t is compiled by:
gcc ParUtil_t.c jacobi.c nrutil.c -o ParUtil_t -lm

However, I failed to compile the remaining two program, owing to missing *.c files:

gcc ParMod.c nrutil.c -lm
ParMod.c:(.text+0x71b): undefined reference to `indexx'
file indexx.c is missing

gcc ParEval.c nrutil.c -lm
ParEval.c:(.text+0x2675): undefined reference to `dsvdcmp'
ParEval.c:(.text+0x2b17): undefined reference to `dsvbksb'
ParEval.c:(.text+0x2bf9): undefined reference to `choldc'
ParEval.c:(.text+0x2c23): undefined reference to `cholsl'
collect2: error: ld returned 1 exit status
Four *.c files are missing.



The README file

Automated parameterization example

Copyright (C) 1997 Per-Ola Norrby

See: "Automated Molecular Mechanics Parameterization with
Simultaneous Utilization of Experimental and Quantum
Chemical Data", Per-Ola Norrby, Tommy Liljefors,
J.Comput.Chem. 1998, 19, 1146-1166.

Please don't give this away. I like to keep track of
who's using it, refer interested people to me,
pon .at. kemi.dtu.dk or http://organisk.kemi.dtu.dk/PON/

Per-Ola Norrby


In this example, some parameters in MM3* are refined.
The reference data in this example are structures, energies,
and cartesian mass-weighted Hessians from B3LYP/6-31G*
calculations. The structures have been exported to
files X.mae, D.mae, and E.mae.
ESP charges were calculated from the wavefunction,
and can be found in the D.mae-file. X.mae contains
reference structures, D.mae have all structures for which
Hessians have been calculated (together with reference charges),
and E.mae have structures for calculation of energy
differences, with the QM energy for each structure.
The mass-weighted Hessians have been
extracted into file D.cal in lower triangular form. For
creating D.cal, see GetHess.

The current force field comparison data is calculated by
the following MacroModel batch files:
Mass-weighted Hessian: D.com
Minimized structures : X.com
Relative energies : E.com
There is also the command file XXR.com, which uses X.mae as input
and evaluates all structural elements (bonds, angles, torsions)
for the file of reference data. It also minimizes all structures
AFTER outputting structure data. Thus, by reading the file XXR-out.mae
into MacroModel 2 structures at a time, you can compare reference
and minimized structures on the screen (not necessary, but nice :-).

For structural data, note that only structure elements with the string
OPT in the comment portion of the force field will be extracted and
used in comparisons. To use non-default scale factors for bonds,
angles, and torsions, respectively, the second word of the structure
name must be "Scaling", and the third through fifth should be the new
scale factors. Note that scaling applies to ALL subsequent structures,
unless a new set is given.


The files mm3.fld and atom.typ should be modified so that all
structures can be calculated (and preferably be recognizable)
before starting the parameter refinement. All parameters to be
refined should be defined in file paropt. The first two colums
correspond to row and position in the force field (position values
1-3 are interpreted as default MacroModel offsets, otherwise as the
number of characters before the parameter. Negative position values
allow the parameter to get a negative value). The third column is
the current parameter value, the fourth (optional) is the numerical
differentiation step, and the fifth (optional) is a C format for
writing the parameter into the force field file (default: %9.4f).
The Unix script MkPar can be modified and used to create an
initial paropt. You may want to modify paropt.

Parameters to be tethered are defined together with
the harmonic tethering constant (column 4) in file parteth.

Three executables are included, ParEval, ParMod, and ParUtil (together
with C-source: Numerical Recipes routines are not included). These
are not described in detail, but the main functions are:

ParEval: Evaluate force field results, suggest new solutions.
ParMod : Modify the force field and parameter definition files
(mm3.fld and paropt).
ParUtil: Whatever needs doing, including extraction and manipulating
QM Hessians, and a Unix paste for
very large, tab-separated column files.

Several Unix scripts are included. These are all files that start
with the letters C, M, N, or S. Only CalcRef and CalcOne will actually
start MM calculations. These should be modified for each application.
MkPar was mentioned above, the others are called by the two top-level
commands CalcLoop and NR_Loop, as follows:

CalcLoop ; Uses both NR and Simplex, first does a
LinSearch ; Line Search (once only, see below), then loops until
; no improvement can be obtained.
NR_Jacob ; Calculates the Jacobian by numerical differentiation,
; suggests several new parameter sets
NR_CalcAll ; Calculates all reference values and all data points for
; every force field, puts the complete matrix > par.tot
CalcRef ; Scaling factors and reference values > par.ref
CalcOne ; Calculates values from one force field > par.one
NR_Eval ; Evaluates all suggested force fields, selects the best
NR_CalcAll
CalcOne
Simplex ; Performs a Simplex optimization based on parameter
; definition file simpopt
SimpAll ; Similar to NR_CalcAll, output > simp.tot
CalcRef
CalcOne

LinSearch ; Makes a Line Search with parabolic interpolation
; for each parameter in paropt; use only initially
NR_CalcAll
CalcOne
CalcRef

NR_Loop ; Loops like CalcLoop, but using only NR (not updated)
NR_Jacob
NR_CalcAll
CalcRef
CalcOne
NR_Eval
NR_CalcAll
CalcOne

Looping can be controlled by "touch"-ing specific control files.
For example, CalcLoop can be stopped gracefully by "touch Loop.stop".
The numerical differentiation steps should be "balanced", modify
them so that all derivatives (eg., in nr.lin after NR) are "similar".
What is similar? Say, within two orders of magnitude, if possible.

Manuals for the C-commands are found in the files Par*.txt

Good luck!

Send comments to Per-Ola Norrby, pon .at. chem.gu.se