Computing relative free energies with FEP using Q

This guide will show how to set up free energy perturbation (FEP) calculations in Q. The FEP calculations are used to compute the relative free energy between to ligand molecules, typically drug molecules. FEP calculations will be done when the ligand is bound to a protein and when it is free in solution.

This guide will show how to setup the calculations more or less black-box. A detailed explanation of the script can be found here.

A description of the program pre-requisite can be found here.

An additional guide, that will show how to set up these calculations for a number of ligands can be found here.

Samuel Genheden, 2011-2012

Prerequisite

You need to have:

two leap prep-files with file ending .in, e.g. l01.in and l02.in. It is important the ligand atoms have the same name in both .in-files, apart from the atoms that will be perturbed.
a pdb-file with hydrogen atoms and with the first ligand (l01) bound. It should be named prot-l01.pdb, where prot is the protein name and l01 is the name of the first ligand.

Note!: The direction off the perturbation is important in the setup, due to technical reasons. The first ligand (l01) should have more or equal number of atoms than the second ligand (l02).
Note 2!:The naming of the input files are important. For instance the protein and ligand name in the pdb-file should be separated by a dash (-), not an underscore.

The bash script will create leap files that use the Amber99SB and GAFF force field for the protein and ligand, respectively. Water molecules will be described using TIP3P.
If the system requires additional parameters, these should be added to a file called leap_lines

Before we start, make sure that the $QPREP path is set. Locally it should be,

export QPREP=/away/bio/Q/Q/Utils

Creating preparatory files

If the complex contains ions or co-factors, or if the ligand itself needs additional parameters, the following steps need to be executed.

First, we will create a file called leap_lines that tells leap to load the additional information. This will be added to the leap input file that the script is created. Here is an example for a Ca2+ ion.

cat << EOF > leap_lines loadamberprep ca.in loadamberparams ca.dat EOF
Next, we need to create a qlib file for the ion, which is the corresponding file in Q.

python $QPREP/in2qlib.py ca.in > ca.qlib
And then we need to create a qprep_lines file that tells Qprep5 to read this qlib-file

cat << EOF > qprep_lines readlib ca.qlib EOF

Now, we are ready to execute the main script.

Executing prep_fepq.bash

Type:
bash $QPREP/prep_fepq.bash L01 L02 fxa-l01.pdb 20

L01 and L02 are the residue names of the two ligands. L01 will be perturbed to L02. fxa-l01.pdb is the name of the protein-ligand pdb-file. The protein name will be taken as the characters before the dash (-). Finally, 20 is the desired size of the water droplet and thereby the simulated system size.

This script will, in order do:

Center the complex on the ligand

Solvate the complex using Qprep5

When this is finnished, there will be text telling if a pdb-file was made successfully. If not, some error occured with QPrep5.

Solvate the free ligand using Qprep5

When this is finnished, there will be text telling if a pdb-file was made successfully. If not, some error occured with QPrep5.

Create Amber prmtop and prmcrd files

When this is finnished, the script will list the prmtop files (one for the complex and one for the ligand). If an error occurred, they will not exists or have zero-size

Minimize the complex and the free ligand

Create Q prmtop files

changeparm has a mysterious bug, so sometimes there will be some error messages. Usually, it is nothing to worry about.
When this is finnished, the script will list the qparams files (one for the complex and one for the ligand). If an error occurred, they will not exists.

Create perturbation files

You will be asked to enter corresponding atoms for atoms that differ between the two ligands. If a corresponding atom do not exists, i.e., it should be perturbed to a dummy atom, just enter blank.

Create Q input-files and queue scripts

You will be asked to enter the length of the simulations, the number of transformations (1 or 2), the lambda-values, and the cluster where the calculations should be executed.

When this script is finnished, the preparatory files will be in Fep_prep and the files necessary for the simulations in Fep_sim. Copy the files in Fep_sim to a cluster and execute the queue scripts. Usually you only need to execute a bash script that was created in the very last step above.

Cluster library

The last task in the list above, will create queue script that is appropriate for a specific cluster. The script that generates these scripts (make_fepinp.py) uses a file called cluster_lib to know cluster-specific commands. One can edit this file manually, to support additional clusters. An example is shown below for the Akka cluster at HPC2N.

%Akka #!/bin/bash #PBS -l nodes=1:ppn=8 #PBS -l walltime=WWW:00:00 #PBS -A SNIC014-10-24 cd $PBS_O_WORKDIR module add intel-mpi/3.1.026 module add openmpi/intel export QHOME=/home/g/genheden/pfs/Q/bin/ export PATH=$QHOME:$PATH export PATH=$PBS_O_PATH:$PATH qsub Qdyn5 mpirun -np 8 Qdyn5p %end

The first line should start with a percentage sign (%) followed by the name of the cluster.

The cluster specification should end with the line %end.

The last three lines of the specifications are special. First the command used on the cluster to add jobs to the queue are specified (e.g. qsub), second the command used to execute the serial version of Qdyn5, and third the command used to execute the parallel version of the same program.

All other lines in the specification are appended to the start of the queue script and should contains queue-specific lines (e.g. starting with #PBS), modules that need to be loaded and paths that need to be set. The QHOME path must be set. Also, the number of nodes need to be set manually. At the moment, the make_fepinp.py assumes that at least 8 cores are available for execution of the calculations. The WWW mask is special and will be replaced by an estimated walltime (in hours).

Calculating the free energies

This is done with the calc_dg.py script

Detailed guide to the prep_fepq.bash script

To perform this guide, one need to complete the steps in the guide above, before the execution of prep_fepq.bash

Before we begin preparation, let us create two folders Fep_prep and Fep_sim. Then move all the interesting files to this folder and finally, change to this folder.

mkdir Fep_prep mkdir Fep_sim mv *in *dat *lib *lines *frcmod fxa-l01.pdb Fep_prep/ cd Fep_prep

First, the protein--ligand complex is centered on the centre of geometry of the ligand using changepdb. It is also of interest to remove far-away water molecules. The default is to remove any water that is more than R-2 Å, where R is the simulation radius. The pdb filename and the ligand residue name is taken from the command line arguments

changepdb << EOF > center.log fxa-l01.pdb m l L01 row 18 w EOF

Next, changepdb will be used again to renumber the atoms from 1, and creating a cyx.dat file that can be used in leap to create cys-cys linkages

changepdb << EOF > cyx.log fxa-l01.pdb REN 1 CYX w EOF

Now, when all the preparatory steps have been done, we will use Qprep5 to solvate the complex and the free ligand

First, we need to create a qlib file, which is a file, similar to an in-file, that describes the ligand

python/away/bio/Q/Q/Utils/in2qlib.py l01.in > l01.qlib

Second, the Amber-formatted pdb-file need to be converted to a Q-formatted pdb file

python /away/bio/Q/Q/Utils/amb2q.py fxa-l01.pdb > fxa-l01q.pdb

Then we need to create the input file to Qprep5 from several files. The qprep_lines file is provided by the user.

cat << EOF > qprep_prot.inp readlib /away/bio/Q/Q/Data/qamber95.lib readlib /away/bio/Q/Q/Data/qamber95_neu.lib readlib l01.qlib readprm /away/bio/Q/Q/Data/qamber95.prm readprm /away/bio/Q/Q/Data/qgaff.prm EOF cat qprep_lines >> qprep_prot.inp cat << EOF > >> qprep_prot.inp readpdb fxa-l01q.pdb boundary 1 0 0 0 20 solvate boundary 20 1 HOH maketop qtop writepdb fxa-l01solvq.pdb y quit EOF Qprep5 < qprep_prot.inp > qprep_prot.log Now, we need to convert the Q-formatted file to Amber format, and we also need to be sure to add TER records.

python /away/bio/Q/Q/Utils/q2amb.py fxa-l01solvq.pdb > fxa-l01solv.pdb changepdb << EOF > ter.log fxa-l01solv.pdb ter 1 w EOF

And now the same for the free ligand:

grep L01 fxa-l01.pdb > l01.pdb python /away/bio/Q/Q/Utils/amb2q.py l01.pdb > l01q.pdb cat << EOF > qprep_lig.inp readlib /away/bio/Q/Q/Data/qamber95.lib readlib /away/bio/Q/Q/Data/qamber95_neu.lib readlib l01.qlib EOF cat qprep_lines >> qprep_lig.inp cat << EOF >> qprep_lig.inp readprm /away/bio/Q/Q/Data/qamber95.prm readpdb l01q.pdb boundary 1 0 0 0 20 solvate boundary 20 1 HOH maketop qtop writepdb l01solvq.pdb y quit EOF Qprep5 < qprep_lig.inp > qprep_lig.log python /away/bio/Q/Q/Utils/q2amb.py l01solvq.pdb > l01solv.pdb

The next step is to create the Amber prmtop and prmcrd files. The script is a little bit cumbersome, because we need information from several external files:

First we create a leapcom file by merging information from several files. leap_lines is provided by the user, and cyx.dat was created above

cat << EOF > leap_solv.inp source leaprc.ff99SB loadamberparams gaff.dat loadamberprep l01.in EOF cat leap_lines >> leap_solv.inp cat << EOF >> leap_solv.inp x = loadpdb fxa-l01solv.pdb EOF cat cyx.dat >> leap_solv.inp cat << EOF >> leap_solv.inp solvateCap x TIP3PBOX {0 0 0} 20 1000 saveamberparm x fxa-l01.prm fxa-l01.rst x = loadpdb l01solv.pdb solvateCap x TIP3PBOX {0 0 0} 20 1000 saveamberparm x l01.prm l01.rst quit EOF

Finally, we can execute tleap

tleap -f leap_solv.inp >& leap_solv.log

Next, we will perform a short constrained minimization of the water molecules and hydrogen atoms. This can be done with sander

cat << EOF > sander.in Restrained minimization &cntrl irest=0,ntx=1, imin=1,maxcyc=100,drms=0.0001,ntmin=2, ntc=2,ntf=2, cut=20.0, ntpr=100,ntwx=0,ntwv=0,ntwe=0, ipol=0,igb=0,ntb=0, ntr=1,restraint_wt=100, restraintmask="!:WA= & !@H=" &end EOF sander -O -i sander.in -o sander.out_prot -p fxa-l01.prm -c fxa-l01.rst -ref fxa-l01.rst -r fxa-l01.mdrest0 sander -O -i sander.in -o sander.out_wat -p l01.prm -c l01.rst -ref l01.rst -r l01.mdrest0

Alternatively, the minimization can be done with the nab program nabmin which is a good cost-free alternative:

Start with creating a pdb-file of the system

changeparm << EOF > min.log fxa-l01.prm p unmin.pdb m fxa-l01.rst q EOF

then execute the program:

/away/bio/Q/Q/Utils/nabmin fxa-l01.prm unmin.pdb min.pdb >> min.log

and finally, convert the pdb-file to an mdrest-file

changepdb << EOF >> min.log min.pdb wm fxa-l01.mdrest0 q EOF

And then for the free-ligand

changeparm << EOF >> min.log l01.prm p unmin.pdb m l01.rst q EOF

/away/bio/Q/Q/Utils/nabmin l01.prm unmin.pdb min.pdb >> min.log

changepdb << EOF >> min.log min.pdb wm l01.mdrest0 q EOF

Now, when we have the minimized systems, we can create the Q prmtop file. First for the complex and then for the free ligand. Unfortunately, there is some mysterious bug in changeparm, so the commands might have to be repeated until the qparm-files exists.

changeparm << EOF >& create_qparm1.log fxa-l01.prm qparm m fxa-l01.mdrest0 20 20 fxa-l01.qparm q EOF changeparm << EOF >& create_qparm2.log l01.prm qparm m l01.mdrest0 20 20 l01.qparm q EOF

Now it is time to introduce the second ligand, and create a perturbation file that describes the difference between the two ligands.

First, we create a dummy prmtop file for the two ligands

cat << EOF > leap_lig.inp source leaprc.gaff loadamberprep l01.in loadamberprep l02.in EOF cat leap_lines >> leap_lig.inp cat << EOF >> leap_lig.inp saveamberparm L01 a.prm a.rst saveamberparm L02 b.prm b.rst quit EOF leap -f leap_lig.inp >& leap_lig.out

Then, we use changeparm to get charge and van der Waals parameters for our two ligands. This will create a .charges and a .vdws for each of the ligands.

changeparm << EOF > info1.log a.prm qfep L01 q EOF changeparm << EOF > info2.log b.prm qfep L02 q EOF

The .charges and .vdws files will then be used by a Python script to create the Q perturbation files. This script will ask for atoms in L01 that do not have any corresponding atom in L02. If such atoms should be perturbed to dummy atoms, just leave the corresponding atom blank.

python /away/bio/Q/Q/Utils/make_pert.py l01 l02

The script will create three perturbation files to be used with the complex. One for only electrostatic perturbation, one for only van der Waals perturbation and finally, one if these should be done at the same time. We will create the corresponding files for use with the free ligand by removing the offset-line:

sed "/OFF/d" ele_prot.fep > ele_wat.fep sed "/OFF/d" vdw_prot.fep > vdw_wat.fep sed "/OFF/d" elevdw_prot.fep > elevdw_wat.fep

Then, we will substitue the OFF mask with the real offset, i.e. the first ligand atom minus 1

temp=`grep L01 fxa-l01.pdb | head -n 1 | awk '{print substr($0,6,6)}'` nprot=`echo " ${temp} - 1" | bc` sed -i "s/OFF/$nprot/" ele_prot.fep sed -i "s/OFF/$nprot/" vdw_prot.fep sed -i "s/OFF/$nprot/" elevdw_prot.fep

Before proceeding, we will clean up a little bit

tar -cjf sav.tar.bz2 a.* b.* *charges *vdws *solv.pdb *q.pdb *log *dat *inp mdinfo sander.out_* rm -f a.* b.* *charges *vdws *solv.pdb *q.pdb *log *dat *inp mdinfo sander.out_*

Finally, we will move to the Fep_sim folder and create input files for Q and queue-scripts. The python script will ask about equilibration and production length, number of transformations (1 or 2), the lambda-values and the cluster that the calculation should be run on.

cp -p *qparm *fep /away/bio/Q/Q/Utils/cluster_lib ../Fep_sim cd ../Fep_sim python /away/bio/Q/Q/Utils/make_fepinp.py fxa l01 20
cluster_lib contains information about the clusters, the way Q should be executed and modules that need to be loaded.

Program pre-requisite

The following programs needs to exists on your machine:

Q - with modifications by me and G. Wallin.
AmberTools including tleap and nab. antechamber is required to create new force fields.
A set of utility scripts, these can be found in /away/bio/Q/Q/Utils
A nab program called nabmin needs to be compiled. It is located in /away/bio/Q/Q/Utils as well.
multirun.py that are located in /away/bio/Q/Q/bin and always should be in bin folder.
changepdb
changeparm

Compiling Q

The code downloaded from Aqvist homepage do not work properly with MPI on the clusters. G. Wallin sent me an updated code that made it work.

Thereafter, I have made changes to the code to support soft-core Coloumb potentials. Therefore, it exsist three versions of md.f90, the original one, md_scc.f90 which implements the soft-core Coloumb but retains the original soft-core Lennard-Jones potential, and md_sc2.f90 which implements a lambda-dependent soft-core Lennard-Jones potential. I recommend to use md_sc2.f90 instead of md_scc.f90. Consequently there is three different versions of Qdyn5 and Qdyn5p (the parallel version of the former), with similar endings. Qdyn5p_sc2 will be compiled upon parallel installation.

I have not been able to compile to code with any other compiler than the Intel ifort compiler.

Install locally:
Locally, ifort is installed only on the gefion machine. Type the following commands:
export PATH=$PATH:/opt/intel/bin/ make ifort_unopt mv Q* ../bin/

Install on Abisko (similar on the other clusters):
module add intel-fortran module add openmpi/intel make ifc make ifort_unopt mv Q* ../bin/

Compiling nabmin

export AMBERHOME=/away/bio/AMBER/Amber11/ $AMBERHOME/exe/nab nabmin.nab -o nabmin