GCMC simulations
Majda 28/6-19
This document describes how to setup, run and analyze grand
canonical Monte Carlo (GCMC) simulations using the ProtoMS software.
More about the method can be found in this paper:
Ross, Bodnarchuk, & Essex, 2015
Prerequisites:
- Amber software
- ProtoMS software
- A pdb file of your protein which was prepared following the
instructions found here,
protein.pdb
-
A pdb file of your ligand, ligand.pdb,
as well as Amber lig.in and lig.dat files
1. Protein
It is recommended that the protein is minimized before running a
GCMC calculation. This can be done with Amber. Execute the two
following scripts:
prot1.sh
cat << EOF > prot_leap.in
source leaprc.ff14SB
x=loadpdb protein.pdb
saveamberparm x prot.prm prot.rst
quit
EOF
tleap -f prot_leap.in > prot_leap.out
prot2.sh
cat << EOF > prot_min.in
&cntrl
irest=0,ntx=1,
imin=1,maxcyc=100,drms=0.0001,ntmin=2,
ntc=1,ntf=1,
cut=20.0,
ntpr=100,ntwx=0,ntwv=0,ntwe=0,
ipol=0,igb=0,ntb=0,
&end
EOF
sander -O -i prot_min.in -o prot_min.out -p prot.prm -c prot.rst -r
prot_mincrd
ambpdb -p prot.prm < prot_mincrd > prot_mincrd.pdb
header=$(sed -n '1p' protein.pdb)
echo $header > protein.pdb
sed "/REMARK/d" prot_mincrd.pdb >> protein.pdb
The residue and atom names in the protein.pdb file should then be
converted to the ProtoMS format:
python2.7 $PROTOMSHOME/tools/convertatomnames.py -p protein.pdb -o
protein_pms.pdb
This gives protein_pms.pdb file that is used in the GCMC
simulations.
2. Ligand
First, minimize the ligand:
lig1.sh
cat << EOF > lig_leap.in
source leaprc.gaff2
loadAmberPrep lig.in
loadAmberParams lig.dat
x=loadpdb ligand.pdb
saveamberparm x lig.prm lig.rst
quit
EOF
tleap -f lig_leap.in > lig_leap.out
lig2.sh
cat << EOF > lig_min.in
&cntrl
irest=0,ntx=1,
imin=1,maxcyc=100,drms=0.0001,ntmin=2,
ntc=1,ntf=1,
cut=20.0,
ntpr=100,ntwx=0,ntwv=0,ntwe=0,
ipol=0,igb=0,ntb=0,
&end
EOF
sander -O -i lig_min.in -o lig_min.out -p lig.prm -c lig.rst -r
lig_mincrd
ambpdb -p lig.prm < lig_mincrd > lig_mincrd.pdb
header=$(sed -n '1p' ligand.pdb)
echo $header > ligand.pdb
sed "/REMARK/d" lig_mincrd.pdb >> ligand.pdb
Next, build a ProtoMS template file ligand.tem (assumed residue name
~LIG~):
python2.7 $PROTOMSHOME/tools/build_template.py -p lig.in -f lig.dat
-n LIG -o ligand.tem
3. GCMC box
The next step is to setup the box region around the binding site.
Usually, a box that is centered on the ligand and extends at least 3
Å on each side of the ligand is used. To define this box, use:
python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -s ligand.pdb -p 3 -o
gcmc_box.pdb
4. Water
The protein is then solvated in a droplet of water molecules
extending at least 10 Å from the solute:
python2.7 $PROTOMSHOME/tools/solvate.py -pr protein_pms.pdb -g
droplet -p 10 -b $PROTOMSHOME/data/wbox_tip4p.pdb -o
solvent_water.pdb
The program requires that water molecules within the box (GCMC water
molecules) are kept in a separate file and have a different residue
name (WA1) from the regular solvent water molecules in
solvent_water.pdb. Therefore, we must generate a file containing
these water molecules:
python2.7 $PROTOMSHOME/tools/solvate.py -s gcmc_box.pdb -g flood -o
gcmc_water.pdb -b $PROTOMSHOME/data/wbox_tip4p.pdb
and delete the corresponding water molecules from the
solvent_water.pdb file:
python2.7 $PROTOMSHOME/tools/clear_gcmcbox.py -b gcmc_box.pdb -s
solvent_water.pdb -o solvent_water_clear.pdb
For more options regarding the setup of your system, please check
ProtoMS
documentation
webpage, or use -h argument to check the usage of a script.
5. Executing ProtoMS
We now have all the files needed to run the simulations.
protein_pms.pdb
ligand.pdb
ligand.tem
gcmc_box.pdb
gcmc_water.pdb
solvent_water_clear.pdb
This is how a typical GCMC run file looks like:
run_gcmc.in
parfile $PROTOMSHOME/parameter/amber14SB.ff
parfile $PROTOMSHOME/parameter/solvents.ff
parfile $PROTOMSHOME/parameter/amber14SB-residues.ff
parfile $PROTOMSHOME/parameter/gaff14.ff
parfile ligand.tem
protein1 protein_pms.pdb
solute1 ligand.pdb
solvent1 solvent_water_clear.pdb
outfolder out
streamheader off
streamdetail off
streamwarning warning
streaminfo info
streamfatal fatal
streamresults results
streamaccept accept
cutoff 10.0
feather 0.5
temperature 25.0
ranseed 5793757
boundary solvent
pdbparams on
# GCMC specific parameters
gcmc 0
parfile $PROTOMSHOME/data/gcmc_tip4p.tem
grand1 gcmc_water.pdb
multigcmc -20.000 -19.000 -18.000 -17.000
-16.000 -15.000 -14.000 -13.000 -12.000 -11.000 -10.000 -9.000
-8.000 -7.000 -6.000 -5.000 -4.000 -3.000 -2.000 -1.000
originx -32.947
originy 1.616
originz -5.943
x 26.277
y 12.870
z 14.993
# End of GCMC specific parameters
dump 200000 results write results
dump 200000 pdb all file=all.pdb standard
dump 200000 restart write restart
dump 200000 averages reset
chunk equilibrate 5000000 solvent=0 protein=0 solute=0 insertion=200
deletion=200 gcsolute=200
chunk equilibrate 5000000 solvent=200 protein=200 solute=200
insertion=200 deletion=200 gcsolute=200
chunk simulate 200000000 solvent=200 protein=200 solute=200
insertion=200 deletion=200
gcsolute=200
#############################################################################################
Under multigcmc specify the B values for your run. Values for the
originx, originy, originz, x, y and z can be read from the
gcmc_box.pdb file. Again, check the
ProtoMS
documentation
webpage
for more options.
To run the simulations on Aurora, use:
q-gcmc
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 20
#SBATCH -t 168:00:00
#SBATCH -J gcmc
#SBATCH -A snic2018-2-24
cd $SLURM_SUBMIT_DIR
export PROTOMSHOME=/home/majda/protoms-3.4
mpirun -n 20 $PROTOMSHOME/protoms3 run_gcmc.in
##############################################################################################
It is also possible to use $PROTOMSHOME/tools/protoms.py to
automatically setup the calculations.
6. Analysis
In the out directory specified in the run file, you can find
subdirectories for each B value (e.g. b_-3.000), which contain the
output files. These are:
warning
restart.prev
results
all.pdb
restart
info
accept
The simulations can be checked visually with Pymol, for example:
pymol out/b_-3.000/all.pdb
Check the warning files as well to make sure nothing unexpected has
happened.
Check to see if the simulations are equilibrated. We can look at the
average number of inserted GCMC water molecules for each snapshot:
python2.7 $PROTOMSHOME/tools/calc_series.py -f out/b_-3.000/results -s solventson
To perform a grand canonical integration (GCI), use:
python2.7 $PROTOMSHOME/tools/calc_gci.py -d out/b_* -v 112.0
where -v is the volume of the GCMC region.
Once the optimal B value is known, the corresponding all.pdb file
can be used to obtain water densities and water clusters (assumed
residue name for GCMC water WA1 and atom name O00 for oxygen atoms):
python2.7 $PROTOMSHOME/tools/calc_density.py -f all.pdb -r WA1 -a
O00 -o water_density.dx
python2.7 $PROTOMSHOME/tools/calc_clusters.py -i all.pdb -m WA1 -a
O00 -o water_clusters.pdb
export PROTOMSHOME=/home/majda/protoms-3.4
mpirun -n 20 $PROTOMSHOME/protoms3 run_gcmc.in