Calculating entropies

This guide will show how to estimates various entropies using different programs. All of the program that are in-house are in the development stage and are not thoroughly tested. The usage of the program will be explained with a number of examples from recent usage of them.

All the programs assumes that you have an Amber prmtop-file and at least one mdcrd-file, which contains a trajectory of snapshots. It is recommended to create a copy of the mdcrd-file where the solvent has been stripped off. In this way, the analysis is much faster.

All the in-house programs can be found in /temp4/bio/Ent/ and is written either in Fortran77 or Fortran95.

Samuel Genheden, 2012


Rules of thumb (for Galectin project)


Conformational entropy from dihedral histogramming

The program calcent exists that is able to calculate several kinds of entropies. One of those entropies is dihedral conformational entropy by histogramming.

Here is an example:

/temp4/bio/Ent/calcent << EOF > ent.out
prmtop_stripped
all_mdcrd
1
50000
1
1000
1
1
HB
1
0
72
a
EOF

The program will always start by asking after a prmtop-file and an mdcrd-file. Thereafter, it will ask for in turn the first snapshot to process, the last snapshot, and the frequency of analysis. Next, it will ask for the output frequency and at which snapshot it should start printing output.

Then, the HB command is executed, which is histogramming. This command will start by asking for which atom it should start analysing the entropy. Then it will ask for the number of mask atoms, which makes it possible to calculate entropy for a subset of atoms (however, this is not used in the above script). Finally it will ask for the bin width and the type of output, which can be t=total, r=residue, or a=atom.

The HB command will write the total entropy after the specified number of snapshots as well as the backbone entropy. At the end it will write out the total entropy, or decomposed on a residue or atom-wise basis.

The entropies are calculated atom-wise as:
      S = R [ 0.5 - ln72 - Sum(i) (p(i) lnp(i) ]
where each dihedral is binned into 72 bins (each of 5 degrees) and p(i) is the probability that a dihedral is found in bin i.
If the dihedral is freely rotating, all p(i) = 1/72 and then S = R/2 = 4.2 J/mol/K = 1.2 kJ/mol (at 300 K) (because the last two terms cancel). This is the entropy of a free rotor.
If the dihedral is fixed, then one p(i)=1 and the others = 0. Since ln(1)=0, the sum vanishes and S = R(0.5-ln72) = -31.4 J/mol/K = -9.4 kJ/mol.

As a reference for the entropies, we have used the free rotor, +R/2. This means that nearly all entropies are negative, which is somewhat confusing. A large entropy then is weakly positive entropy (up to 4.2 J/mol/K), whereas fixed groups have a strongly negative entropy (down to -31.4 J/mol/K).


Conformational entropy from MIST

Octav 12/12-18

To estimate correlation, entropies can also be calculated by employing the maximum information spanning tree algorithm [1] (MIST), with the pdb2entropy program, developed by Fogolari et al. [2].
Entropies were calculated to the tenth nearest neighbor to account for high-order correlations, whereas entropies calculated to the first nearest neighbor were considered correlation-free. A 2 ns windowing of the MD simulations was used (i.e. each of the 10 simulations was divided into 5 parts of equal length). MIST entropies are reported as TdS at 300 K.
Unfortunately, our results indicate that the entropies depend very strongly on the windowing size (100 kJ/mol difference going from 2 to 10 ns).
Therefore, we use this approach mainly to decide the importance of correlations by comparing the results using 10th nearest neighbours with 1st nearest neighbours (cf. Hyp article).
  1. King, B. M.; Silver, N. W.; Tidor, B. Efficient Calculation of Molecular Configurational Entropies Using an Information Theoretic Approximation. J. Phys. Chem. B 2012, 116, 2891-2904.
  2. Fogolari, F.; Maloku, O.; Dongmo Foumthuim, C. J.; Corazza, A.; Esposito, G. PDB2ENTROPY and PDB2TRENT: Conformational and Translational–Rotational Entropy from Molecular Ensembles. J. Chem. Inf. Model. 2018, 58, 1319-1324.


Conformational entropy from von Mises approach

Entropy from all dihedral angles

To calculate the conformational entropy using the von Mises approach in a similar fashion to the histogramming above, use also calcent:

/temp4/bio/Ent/calcent << EOF > ent.out
prmtop_stripped
all_mdcrd
1
50000
1
1000
1000
VM
1
EOF

The VM command will only asks for the start atom to analyse. It will in the end write out the entropy on a atom-wise level.

Two-dimensional backbone entropy

There is an alternative to calcent for calculating entropies using the von Mises approach. This is a two-step process: first one run a program that will write out one and two-dimensional distribution-files, and secondly another program is used to calculate entropies from these distributions. The two-dimensional is only for the psi and phi angles of the protein backbone.

Start with creating an atom correspondence file. For most situations it will just be a list of all atom numbers, e.g.,

for X in {1..2522}
do
echo $X >> corr
done

where 2522 is the total number of protein atoms. The correspondence list were created for situations in which the prmtop and the mdcrd-file do not have the same atom order. This is for instance the situation when the prmtop was created in-house but the mdcrd-file was created by a third-party (e.g. D. E. Shaw Research).

Next, create a text file called calcdist.in that look something likes this

$prmtop
prmtop_stripped
$distfile
dist
$distnr
1 10000
$corr
corr
$crds
1
all_mdcrd 1 50000

The numbers after the $distnr line is an start number and the output-frequency, respectively. The line after $distfile is an output-file prefix.

Execute

/temp4/bio/Ent/calcdist calcdist.in > calcdist.out

The program will create distribution files that are called dist_1d1, dist_2d1, and so on until dist_1d5, dist_2d5. The files with 1d contains one-dimensional distribution information for all dihedral angles, and the files with 2d contains two-dimensional distribution information. The output from the program contains important connectivity information that is used below.

To calculate the backbone entropy from the distribution files, execute the following

/temp4/bio/Ent/hist2vm2d << EOF
158
dist_2d1
dist_2d2
dist_2d3
dist_2d4
dist_2d5

EOF

The program will first ask for the number of atoms, which in this case is the number of residues. Then it will ask for a number of distribution files to analyse, it can analyse one or several. End the input with a space.

The program will write the entropy for each backbone vector, i.e. each residue, except the first.

If the entropies should be compatible with the dictionary of Bruschweiler, you have to execute hist2vm2d once for each of the distribution files and then average the entropies over all windows.

Dictionary-compatible sidechain entropy

To calculate sidechain entropies that are compatible with the dictionary of Bruschweiler (J. Am. Chem. Soc. 2009,131, 7226-7227, DOI: 10.1021/ja902477s), we can use the one-dimensional distribution files created by calcdist above.

First, execute

for X in {1..5}
do
/
temp4/bio/Ent/hist2vm << EOF > ent.out${X}
2522
dist_1d${X}

EOF
done

The program will write out the entropy for each dihedral angle, similar to the execution of calcent above. However, all of them are not relevant for the dictionary. Therefore, we need to extract certain entropies and we also need to sum up entropies for the residues.

Execute

/temp4/bio/Ent/extract_dic_ent.py pdb calcdist.out ent.out 5

This script will take pdb, which is a pdb-file with the same atom order as the prmtop-file, and the connectivity information from calcdist.out and extract the relevant residue entropies from the entropy output-files. The last two arguments are an output-file prefix and the number of output-files, respectively.

The program will for each residue write out the residue name and number, the number of dihedral angles summed up and the entropy estimate with a standard error.


Conformational entropy using quasi-harmonic analysis

There is a developmental implementation of quasi-harmonic analysis in the calcent program, but it is recommended to use the ptraj program from Amber.

Here is an example:

cat << EOF > ptraj.in
trajin all_mdcrd 1 50000
matrix mwcovar name allres :1-57 mass
analyze matrix allres thermo vecs 0
EOF

ptraj prmtop_stripped ptraj.in >& ptraj.out

The matrix and the analyze commands are fairly well explained in the ptraj manual. ptraj will write out the total entropy when it has processed all snapshots.


Rotational and translational entropy

The calcent program is able to calculate translational and rotational entropies according to an approach suggested by Carlsson and Aqvist (Phys. Chem. Chem. Phys. 8, p5385, 2006, DOI:10.1039/B608486A). It is based on fluctuations of principal components and Euler angles, for the translation and rotation, respectively.

Start by creating a copy of the mdcrd-file where everything but the ligand as been stripped off, and where the overall translation and rotation has been removed by an RMS-fit

cat << EOF > ptraj.in
trajin r1_mdcrd17.gz
rms first @C,CA,N
strip !:IN1
trajout lig_mdcrd
EOF

ptraj prmtop ptraj.in

Next execute something similar to this

/temp4/bio/Ent/calcent << EOF > transrot.out
prmtop_lig
lig_mdcrd
1
50000
1
1000
1000
TR
EOF

The TR command will write out the entropy at the specified snapshots as well as at the end.


Compile the program
(SG 16/6-15)
1) Flyttade lapack och blas filerna till en egen folder (detta är bara ifall man vill ha kvar dem)
2) Kopierade liblapack.so.3 från /usr/lib till /temp4/bio/Ent
3) Ersatte användandet av de gamla *.a filerna med liblapack.so.3
$(Linker) -o calcent -L/usr/lib/ $(Lopt1) $(calcento) $(OBJ1) eigen.o liblapack.so.3 $(Lopt2)
in makefile 4) Ersatte RdParm med AmbRdParm i calcent.f (hoppas det är rätt ändring) Lyckats kompilera alla programmen i Ent mappen. calcent kompileras mycket riktigt med make calcent Jag tror bestämt att jag tog *.a filerna från en gammal Amber distribution, men kunde inte hitta nyare så därför gjorde jag 2) istället.
*************

Downloaded new lapack and blas from
http://www.netlib.org/lapack/
http://www.netlib.org/blas/

On Alarik, I tried
cp /usr/lib64/liblapack.so.3 .
but I got
/usr/bin/ld: skipping incompatible /usr/lib//libm.so when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib//libm.so when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib//libc.so when searching for -lc
Still, the program compiled and gave an executable.

22/1-21: Compiled calcdist successfully on gefion simply by
make calcdist
after deleting a number of *.o and *.mod files.

22/1-21: Managed to compile calcent locally:
Downloaded lapack from the link above (includes blas already).
tar -xvf
cp make.inc.example make.inc
make all
copied liblapack.a and librefblas.a to /temp4/bio/Ent
inserted them into makefile (instead of liblapack.so.3
make calcent


Leap file to use TIP4P water and special boxes

addPath /lunarc/nobackup/users/martin/GPU/Galectin3/Leap

source leaprc.ff14SB
source leaprc.gaff
loadAmberPrep l1c.in
loadAmberParams frcmod.tip4pew
loadAmberParams gal3.dat
loadOff TIP4PEWBOX18
WAT = T4E
HOH = T4E

x=loadpdb gal3-l1c.pdb
solvateOct x SBOX 10
saveamberparm x prmtop prmcrd

quit


Sample sander.in files for a long MD simulation with PBE (e.g. entropy)

Use TIP4P water


sander.in1
Restrained minimization
  &cntrl
   irest=0,ntx=1,
   imin=1,maxcyc=1000,drms=0.0001,ntmin=2,
   ntc=2,ntf=2,
   cut=8.0,
   ntpr=100,ntwx=0,ntwv=0,ntwe=0
,ioutfm=0,ntxo=1,
   ipol=0,igb=0,
   ntr=1,restraint_wt=100,
   restraintmask="!:WA= & !@H="
  &end

sander.in2
Restrained NVT equilibration
  &cntrl
   irest=0,ntx=1
,ig=-1,
   nstlim=10000,dt=0.002,
   temp0=300.0,ntt=3,gamma_ln=2.0,
   ntc=2,ntf=2,
   cut=8.0,
   ntpr=500,ntwx=0,ntwv=0,ntwe=0,
iwrap=1,ioutfm=0,ntxo=1,
   ntb=1,
   ipol=0,igb=0,
   ntr=1,restraint_wt=50,
   restraintmask="!:WA= & !@H="
  &end

sander.in3
Restrained NPT equilibration
  &cntrl
   irest=1,ntx=5,
   nstlim=10000,dt=0.002,
   temp0=300.0,ntt=3,gamma_ln=2.0,
   ntc=2,ntf=2,
   cut=8.0,
   ntpr=500,ntwx=0,ntwv=0,ntwe=0
,iwrap=1,ioutfm=0,ntxo=1,
   ntb=2,ntp=1,pres0=1.0,taup=1.0,
   ipol=0,igb=0,
   ntr=1,restraint_wt=50,
   restraintmask="!:WA= & !@H="
  &end

sander.in4
Equilibration, 1 ns NPT
  &cntrl
   irest=1,ntx=5,
   nstlim=500000,dt=0.002,
   temp0=300.0,ntt=3,gamma_ln=2.0,
   ntc=2,ntf=2,
   cut=8.0,
   ntpr=500,ntwx=0,ntwv=0,ntwe=0,iwrap=1
,ioutfm=0,ntxo=1,
   ntb=2,ntp=1,pres0=1.0,taup=1.0,
   ipol=0,igb=0,
   ntr=0
  &end

sander.in5
Production 10 ns, sampling frequency 10 ps
  &cntrl
   irest=1,ntx=5,
   nstlim=5000000,dt=0.002,
   temp0=300.0,ntt=3,gamma_ln=2.0,
   ntc=2,ntf=2,
   cut=8.0,
   ntpr=5000,ntwx=5000,ntwv=0,ntwe=0,iwrap=1
,ioutfm=0,ntxo=1,
   ntb=2,ntp=1,pres0=1.0,taup=1.0,
   ipol=0,igb=0,
   ntr=0
  &end