GCMC simulations

Majda 28/6-19

This document describes how to setup, run and analyze grand canonical Monte Carlo (GCMC) simulations using the ProtoMS software. More about the method can be found in this paper:

Ross, Bodnarchuk, & Essex, 2015

Prerequisites:


1. Protein

It is recommended that the protein is minimized before running a GCMC calculation. This can be done with Amber. Execute the two following scripts:

prot1.sh
cat << EOF > prot_leap.in
source leaprc.ff14SB
x=loadpdb protein.pdb
saveamberparm x prot.prm prot.rst
quit
EOF

tleap -f prot_leap.in > prot_leap.out

prot2.sh
cat << EOF > prot_min.in
 
  &cntrl
     irest=0,ntx=1,
     imin=1,maxcyc=100,drms=0.0001,ntmin=2,
     ntc=1,ntf=1,
     cut=20.0,
     ntpr=100,ntwx=0,ntwv=0,ntwe=0,
     ipol=0,igb=0,ntb=0,
   &end
EOF

sander -O -i prot_min.in -o prot_min.out -p prot.prm -c prot.rst -r prot_mincrd

ambpdb -p prot.prm < prot_mincrd > prot_mincrd.pdb
header=$(sed -n '1p' protein.pdb)
echo $header > protein.pdb
sed "/REMARK/d" prot_mincrd.pdb >> protein.pdb

The residue and atom names in the protein.pdb file should then be converted to the ProtoMS format:

python2.7 $PROTOMSHOME/tools/convertatomnames.py -p protein.pdb -o protein_pms.pdb

This gives protein_pms.pdb file that is used in the GCMC simulations.


2. Ligand

First, minimize the ligand:

lig1.sh
cat << EOF > lig_leap.in
source leaprc.gaff2
loadAmberPrep lig.in
loadAmberParams lig.dat
x=loadpdb ligand.pdb
saveamberparm x lig.prm lig.rst
quit
EOF

tleap -f lig_leap.in > lig_leap.out

lig2.sh
cat << EOF > lig_min.in
 
  &cntrl
     irest=0,ntx=1,
     imin=1,maxcyc=100,drms=0.0001,ntmin=2,
     ntc=1,ntf=1,
     cut=20.0,
     ntpr=100,ntwx=0,ntwv=0,ntwe=0,
     ipol=0,igb=0,ntb=0,
   &end
EOF

sander -O -i lig_min.in -o lig_min.out -p lig.prm -c lig.rst -r lig_mincrd

ambpdb -p lig.prm < lig_mincrd > lig_mincrd.pdb
header=$(sed -n '1p' ligand.pdb)
echo $header > ligand.pdb
sed "/REMARK/d" lig_mincrd.pdb >> ligand.pdb

Next, build a ProtoMS template file ligand.tem (assumed residue name ~LIG~):

python2.7 $PROTOMSHOME/tools/build_template.py -p lig.in -f lig.dat -n LIG -o ligand.tem


3. GCMC box

The next step is to setup the box region around the binding site. Usually, a box that is centered on the ligand and extends at least 3 Å on each side of the ligand is used. To define this box, use:

python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -s ligand.pdb -p 3 -o gcmc_box.pdb


4. Water

The protein is then solvated in a droplet of water molecules extending at least 10 Å from the solute:

python2.7 $PROTOMSHOME/tools/solvate.py -pr protein_pms.pdb -g droplet -p 10 -b $PROTOMSHOME/data/wbox_tip4p.pdb -o solvent_water.pdb

The program requires that water molecules within the box (GCMC water molecules) are kept in a separate file and have a different residue name (WA1) from the regular solvent water molecules in solvent_water.pdb. Therefore, we must generate a file containing these water molecules:

python2.7 $PROTOMSHOME/tools/solvate.py -s gcmc_box.pdb -g flood -o gcmc_water.pdb -b $PROTOMSHOME/data/wbox_tip4p.pdb

and delete the corresponding water molecules from the solvent_water.pdb file:

python2.7 $PROTOMSHOME/tools/clear_gcmcbox.py -b gcmc_box.pdb -s solvent_water.pdb -o solvent_water_clear.pdb

For more options regarding the setup of your system, please check ProtoMS documentation webpage, or use -h argument to check the usage of a script.


5. Executing ProtoMS

We now have all the files needed to run the simulations.

protein_pms.pdb
ligand.pdb
ligand.tem
gcmc_box.pdb
gcmc_water.pdb
solvent_water_clear.pdb

This is how a typical GCMC run file looks like:
run_gcmc.in
parfile $PROTOMSHOME/parameter/amber14SB.ff
parfile $PROTOMSHOME/parameter/solvents.ff
parfile $PROTOMSHOME/parameter/amber14SB-residues.ff
parfile $PROTOMSHOME/parameter/gaff14.ff
parfile ligand.tem
protein1 protein_pms.pdb
solute1 ligand.pdb
solvent1 solvent_water_clear.pdb
outfolder out
streamheader off
streamdetail off
streamwarning warning
streaminfo info
streamfatal fatal
streamresults results
streamaccept accept
cutoff 10.0
feather 0.5
temperature 25.0
ranseed 5793757
boundary solvent
pdbparams on
#  GCMC specific parameters
gcmc 0
parfile $PROTOMSHOME/data/gcmc_tip4p.tem
grand1 gcmc_water.pdb
multigcmc -20.000 -19.000 -18.000 -17.000 -16.000 -15.000 -14.000 -13.000 -12.000 -11.000 -10.000 -9.000 -8.000 -7.000 -6.000 -5.000 -4.000 -3.000 -2.000 -1.000
originx -32.947
originy 1.616
originz -5.943
x 26.277
y 12.870
z 14.993
#  End of GCMC specific parameters
dump 200000 results write results
dump 200000 pdb all file=all.pdb standard
dump 200000 restart write restart
dump 200000 averages reset
chunk equilibrate 5000000 solvent=0 protein=0 solute=0 insertion=200 deletion=200 gcsolute=200
chunk equilibrate 5000000 solvent=200 protein=200 solute=200 insertion=200 deletion=200 gcsolute=200
chunk simulate 200000000 solvent=200 protein=200 solute=200 insertion=200 deletion=200 gcsolute=200      #############################################################################################

Under multigcmc specify the B values for your run. Values for the originx, originy, originz, x, y and z can be read from the gcmc_box.pdb file. Again, check the ProtoMS documentation  webpage for more options.

To run the simulations on Aurora, use:
q-gcmc
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 20
#SBATCH -t 168:00:00
#SBATCH -J gcmc
#SBATCH -A snic2018-2-24

cd $SLURM_SUBMIT_DIR

export PROTOMSHOME=/home/majda/protoms-3.4

mpirun -n 20 $PROTOMSHOME/protoms3 run_gcmc.in
##############################################################################################

It is also possible to use $PROTOMSHOME/tools/protoms.py to automatically setup the calculations.


6. Analysis

In the out directory specified in the run file, you can find subdirectories for each B value (e.g. b_-3.000), which contain the output files. These are:

warning
restart.prev
results
all.pdb
restart
info
accept

The simulations can be checked visually with Pymol, for example:

pymol out/b_-3.000/all.pdb

Check the warning files as well to make sure nothing unexpected has happened.

Check to see if the simulations are equilibrated. We can look at the average number of inserted GCMC water molecules for each snapshot:

python2.7 $PROTOMSHOME/tools/calc_series.py -f out/b_-3.000/results -s solventson

To perform a grand canonical integration (GCI), use:

python2.7 $PROTOMSHOME/tools/calc_gci.py -d out/b_* -v 112.0

where -v is the volume of the GCMC region.

Once the optimal B value is known, the corresponding all.pdb file can be used to obtain water densities and water clusters (assumed residue name for GCMC water WA1 and atom name O00 for oxygen atoms):

python2.7 $PROTOMSHOME/tools/calc_density.py -f all.pdb -r WA1 -a O00 -o water_density.dx

python2.7 $PROTOMSHOME/tools/calc_clusters.py -i all.pdb -m WA1 -a O00 -o water_clusters.pdb
export PROTOMSHOME=/home/majda/protoms-3.4

mpirun -n 20 $PROTOMSHOME/protoms3 run_gcmc.in