GCMC simulations
    Majda 28/6-19
    
    This document describes how to setup, run and analyze grand
    canonical Monte Carlo (GCMC) simulations using the ProtoMS software.
    More about the method can be found in this paper:
    
    Ross, Bodnarchuk, & Essex, 2015
    
    Prerequisites:
    
      - Amber software
- ProtoMS software
- A pdb file of your protein which was prepared following the
        instructions found here,
        protein.pdb
        
      
- 
        
        
         A pdb file of your ligand, ligand.pdb,
        as well as Amber lig.in and lig.dat files
    
    1. Protein 
    
    It is recommended that the protein is minimized before running a
    GCMC calculation. This can be done with Amber. Execute the two
    following scripts:
    
    prot1.sh
    cat << EOF > prot_leap.in
    source leaprc.ff14SB
    x=loadpdb protein.pdb
    saveamberparm x prot.prm prot.rst
    quit
    EOF
    
    tleap -f prot_leap.in > prot_leap.out
    
    prot2.sh
    cat << EOF > prot_min.in
      
      &cntrl
         irest=0,ntx=1,
         imin=1,maxcyc=100,drms=0.0001,ntmin=2,
         ntc=1,ntf=1,
         cut=20.0,
         ntpr=100,ntwx=0,ntwv=0,ntwe=0,
         ipol=0,igb=0,ntb=0,
       &end
    EOF
    
    sander -O -i prot_min.in -o prot_min.out -p prot.prm -c prot.rst -r
    prot_mincrd
    
    ambpdb -p prot.prm < prot_mincrd > prot_mincrd.pdb
    header=$(sed -n '1p' protein.pdb)
    echo $header > protein.pdb
    sed "/REMARK/d" prot_mincrd.pdb >> protein.pdb
    
    The residue and atom names in the protein.pdb file should then be
    converted to the ProtoMS format:
    
    python2.7 $PROTOMSHOME/tools/convertatomnames.py -p protein.pdb -o
    protein_pms.pdb
    
    This gives protein_pms.pdb file that is used in the GCMC
    simulations.
    
    
    2. Ligand 
    
    First, minimize the ligand:
    
    lig1.sh
    cat << EOF > lig_leap.in
    source leaprc.gaff2
    loadAmberPrep lig.in
    loadAmberParams lig.dat
    x=loadpdb ligand.pdb
    saveamberparm x lig.prm lig.rst
    quit
    EOF
    
    tleap -f lig_leap.in > lig_leap.out
    
    lig2.sh
    cat << EOF > lig_min.in
      
      &cntrl
         irest=0,ntx=1,
         imin=1,maxcyc=100,drms=0.0001,ntmin=2,
         ntc=1,ntf=1,
         cut=20.0,
         ntpr=100,ntwx=0,ntwv=0,ntwe=0,
         ipol=0,igb=0,ntb=0,
       &end
    EOF
    
    sander -O -i lig_min.in -o lig_min.out -p lig.prm -c lig.rst -r
    lig_mincrd
    
    ambpdb -p lig.prm < lig_mincrd > lig_mincrd.pdb
    header=$(sed -n '1p' ligand.pdb)
    echo $header > ligand.pdb
    sed "/REMARK/d" lig_mincrd.pdb >> ligand.pdb
    
    Next, build a ProtoMS template file ligand.tem (assumed residue name
    ~LIG~):
    
    python2.7 $PROTOMSHOME/tools/build_template.py -p lig.in -f lig.dat
    -n LIG -o ligand.tem
    
    
    3. GCMC box
    
    The next step is to setup the box region around the binding site.
    Usually, a box that is centered on the ligand and extends at least 3
    Å on each side of the ligand is used. To define this box, use:
    
    python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -s ligand.pdb -p 3 -o
    gcmc_box.pdb
    
    
    4. Water
    
    The protein is then solvated in a droplet of water molecules
    extending at least 10 Å from the solute:
    
    python2.7 $PROTOMSHOME/tools/solvate.py -pr protein_pms.pdb -g
    droplet -p 10 -b $PROTOMSHOME/data/wbox_tip4p.pdb -o
    solvent_water.pdb
    
    The program requires that water molecules within the box (GCMC water
    molecules) are kept in a separate file and have a different residue
    name (WA1) from the regular solvent water molecules in
    solvent_water.pdb. Therefore, we must generate a file containing
    these water molecules:
    
    python2.7 $PROTOMSHOME/tools/solvate.py -s gcmc_box.pdb -g flood -o
    gcmc_water.pdb -b $PROTOMSHOME/data/wbox_tip4p.pdb 
    
    and delete the corresponding water molecules from the
    solvent_water.pdb file:
    
    python2.7 $PROTOMSHOME/tools/clear_gcmcbox.py -b gcmc_box.pdb -s
    solvent_water.pdb -o solvent_water_clear.pdb 
    
    For more options regarding the setup of your system, please check
    
    ProtoMS
          documentation
    
    
    webpage, or use -h argument to check the usage of a script. 
    
    
    5. Executing ProtoMS 
    
    We now have all the files needed to run the simulations. 
    
    protein_pms.pdb
    ligand.pdb
    ligand.tem
    gcmc_box.pdb
    gcmc_water.pdb
    solvent_water_clear.pdb
    
    This is how a typical GCMC run file looks like:
    run_gcmc.in
    parfile $PROTOMSHOME/parameter/amber14SB.ff
    parfile $PROTOMSHOME/parameter/solvents.ff
    parfile $PROTOMSHOME/parameter/amber14SB-residues.ff
    parfile $PROTOMSHOME/parameter/gaff14.ff
    parfile ligand.tem
    protein1 protein_pms.pdb
    solute1 ligand.pdb
    solvent1 solvent_water_clear.pdb
    outfolder out
    streamheader off
    streamdetail off
    streamwarning warning
    streaminfo info
    streamfatal fatal
    streamresults results
    streamaccept accept
    cutoff 10.0
    feather 0.5
    temperature 25.0
    ranseed 5793757
    boundary solvent
    pdbparams on
    #  GCMC specific parameters
    gcmc 0
    parfile $PROTOMSHOME/data/gcmc_tip4p.tem
    grand1 gcmc_water.pdb
    multigcmc -20.000 -19.000 -18.000 -17.000
      -16.000 -15.000 -14.000 -13.000 -12.000 -11.000 -10.000 -9.000
      -8.000 -7.000 -6.000 -5.000 -4.000 -3.000 -2.000 -1.000
    originx -32.947
      originy 1.616
      originz -5.943 
      x 26.277
      y 12.870
      z 14.993
    #  End of GCMC specific parameters
    dump 200000 results write results
    dump 200000 pdb all file=all.pdb standard
    dump 200000 restart write restart
    dump 200000 averages reset
    chunk equilibrate 5000000 solvent=0 protein=0 solute=0 insertion=200
    deletion=200 gcsolute=200
    chunk equilibrate 5000000 solvent=200 protein=200 solute=200
    insertion=200 deletion=200 gcsolute=200 
    chunk simulate 200000000 solvent=200 protein=200 solute=200
    insertion=200 deletion=200
    gcsolute=200     
#############################################################################################
    
    Under multigcmc specify the B values for your run. Values for the
    originx, originy, originz, x, y and z can be read from the
    gcmc_box.pdb file. Again, check the
    
    ProtoMS
          documentation 
    webpage
    
    
     for more options.
    
    To run the simulations on Aurora, use:
    q-gcmc
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -n 20
    #SBATCH -t 168:00:00
    #SBATCH -J gcmc
    #SBATCH -A snic2018-2-24
    
    cd $SLURM_SUBMIT_DIR
    
    export PROTOMSHOME=/home/majda/protoms-3.4
    
    mpirun -n 20 $PROTOMSHOME/protoms3 run_gcmc.in
##############################################################################################
    
    It is also possible to use $PROTOMSHOME/tools/protoms.py to
    automatically setup the calculations. 
    
    
    6. Analysis
    
    In the out directory specified in the run file, you can find
    subdirectories for each B value (e.g. b_-3.000), which contain the
    output files. These are: 
    
    warning
    restart.prev
    results
    all.pdb 
    restart
    info
    accept 
    
    The simulations can be checked visually with Pymol, for example:
    
    pymol out/b_-3.000/all.pdb
    
    Check the warning files as well to make sure nothing unexpected has
    happened.
    
    Check to see if the simulations are equilibrated. We can look at the
    average number of inserted GCMC water molecules for each snapshot:
    
    python2.7 $PROTOMSHOME/tools/calc_series.py -f out/b_-3.000/results -s solventson
    
    To perform a grand canonical integration (GCI), use:
    
    python2.7 $PROTOMSHOME/tools/calc_gci.py -d out/b_* -v 112.0
    
    where -v is the volume of the GCMC region.
    
    Once the optimal B value is known, the corresponding all.pdb file
    can be used to obtain water densities and water clusters (assumed
    residue name for GCMC water WA1 and atom name O00 for oxygen atoms):
    
    python2.7 $PROTOMSHOME/tools/calc_density.py -f all.pdb -r WA1 -a
    O00 -o water_density.dx
    
    python2.7 $PROTOMSHOME/tools/calc_clusters.py -i all.pdb -m WA1 -a
    O00 -o water_clusters.pdb
    export PROTOMSHOME=/home/majda/protoms-3.4
    
    mpirun -n 20 $PROTOMSHOME/protoms3 run_gcmc.in