calc_dg.py

This document describes calc_dg.py, which is a complete program to read output files from Amber, Gromos, or Q, and estimate free energy differences.

Currently, the program is able to read the following output files:

Amber output files with only <DV/DL> records
Amber output files with both <DV/DL> and BAR-energies (Amber11 and newer)
Amber output files with <DV/DL> decomposition energies
Gromos output files with <DV/DL> records
Pure energy files with records of V0_tot and V1_tot energies. See below
Q output files

The program can be used to estimate Bennet's acceptance ratio (BAR), free energy peturbation (FEP) for forward or reverse direction as well as their average, and thermodynamic integration (TI). The following estimators can be used with the different output files

BAR: 2, 5, 6
FEP: 2, 5, 6
TI: 1, 2, 3, 4, 6

The recommended usage of the program is:

calc_dg.py calc.in

where calc.in is a textfile controlling the behaviour of calc_dg.py.

The code is written in Pyhton and located locally in /away/bio/Dg_tools/

Control file

The control file contains values of variables determining the execution of calc_dg.py. Each variable name starts with an ampersand (&) and the rows that follow the variable name sets the variable and associated variables.
Note that the file cannot contain any emptly lines - it will give an error.

Currently, the following variables can be set:

&base_path - sets a prefix to the output filenames
The value of a shell variable can be used if the string starts with a dollar sign ($), e.g. $FEP_PATH An absolute path must end with a slash (/)
&filenames - the filenames of the output names
First line after this should contain an integer, n, that is the number of output files.
The n lines after this should contain the filenames
&expand_filenames - if the filenames should be expanded to concatenate output files
If for instance the specified filname is wat_l005.out3, and &expand_filenames=2, then the program reads first wat_l005.out3, then wat_l005.out4, and finally, wat_l005.out5. It only works if the last character of the filename is an integer. Note: at the moment, this is only implemented for Q-files.
&select_frames - controls how many snapshots to read
The line should be a string with one, two or, three columns of an integer.
If there is a single integer, this is the first snapshot to read
If there is two columns, the first integer is the first snapshot (set to 0 to read from the beginning), and the second is the last snapshot.
If there is three columns, the first integer is the first snapshot to read (set to 0 to read from the beginning), the second integer is the last snapshot to read (set to -1 to read until the end), and the third integer is the stride.
&application - sets the type of output files
The recognized values are
1. Amber
2. Amber11
3. Decomp
4. Gromos
5. Pure
6. Q
&estimates - sets the type of estimator
Can be one or several of: BAR,FEP, or TI. If several estimates are desired, separate them with space. As a default, all the supported estimators for the specified application is used.
&nsim - sets the number of independent simulations
Should be a positive integer, n. Set to 0 or omit it if there is no independent simulations. If larger than 0, the program will substitute an asterix (*) in the filenames with an order number from 1 to n.
&bootstrap - turn on bootstrapping of uncertainties
Should be a positive integer, equal to the number of bootstrap samples.
&qextract - turn on customized Q parsing
The first line should be a string that is matched in the output files. The following strings are recongnized:
1. "Q-Q " - extract internal energies
2. "Q-prot " - extract interaction energies with the protein
3. "Q-wat " - extract interaction energies with the water
4. "Q-surr. " - extract intermolecular energies
5. "Q-any " - extract total energies
The second line should be a positive integer, either 3 or 4 (or both separated by a space), and tells the program to extract the electrostatic or van der Waals component(s).
&dextract - turn on customized Amber decomposition parsing
The first line should be either TDC, BDC, or SDC, and tells the program to extract the total, backbone, or side-chain component of the decomposition, respectively. The default is TDC.
The second line should be a positive integer, 2, 3, or 4 (or several separated by space(s)), and tells the program to extract the internal, electrostatic, or van der Waals component(s) of the free energy. The defaul is all three of them.
The third line is a residue specification. If there is only a single integer, the program extract the free energy for only this residue. If there is two columns of integer, this is a range specification (use : as a wild card to specify start or end). If there are more integers, each of them are interpreted as individual residues.
&weighted - turn on non-simulated weighting of energies
The specification is identical to &filename. Note that the number of specified filenames need to be identical. Each file should contain a number of single-column rows with the total energy differences. The weights will be calculated in the program by exponentiating the differences.
&extrapolate - turn on extrapolation to 0 and 1
Set to 1 to turn it on (default), and 0 to turn if off.

&diagnostics - turn on/off diagnostics
The first line is an integer. If it is 1 it turns on diagnostics, other values turn it off. Default is turned off. This can only be run with output filetypes that supports BAR and FEP estimates. The second line that is optional specifies the prefix to output files generated by the diagnostics.

1) Omega vtot ; beräknat överlapp i total energin mellan två lambda-grannar.
dvs u(lambda) så överlappet är med u(lambda) samplat vid lambda och u(lambda+1) samplat vid lambda+1
2) Omega_f : här har jag tagit U(lambda) samplat vid lambda och beräknat överlapp med U(lambda) samplat vid lambda+1
3) Omega_r ; här har jag tagit U(lambda+1) samplat vid lambda och beräknat överlapp med U(lambda+1) samplat vid lambda+1
4) Omega_d ; här tittar jag på work-values, dvs W1=U(lambda+1)-U(lambda) samplat vid lambda och W2=U(lambda+1)-U(lambda) samplat vid lambda+1 och beräknat överlappet på det.
5) Omega_dt ; som Omega_d, men jag har försökt och uppskattat överlappet bara på svansregionen, som jag godtyckligt har tagit att vara den lägre 25% kvartilen. 

Maxratf är delen av den största boltzmann faktorn som bidrar till exp averaging för framåt perten, maxratb är samma men åt andra hållet (båda i procent).

Filerna som skrivs ut skall vara histogrammen som ligger bakom överlappen. Första kol är bin, andra och tredje är histogrammen, fjärde är överlappet, sqrt(n1*n2).

&print - details the printing
Should be two integers, separated by space. If the first is 1, the total dG is printed out, and if the second is 1, the uncertainty of the total dG is printed out. Other values turns this printing off.
&verbosity - sets the print level
Should be a positive integer that sets the level of printed text.

Example of basic control files:

Read 3 Amber11 output files and perform bootstrap and diagnostics:
&base_path $FEP_PATH &filenames 3 prot_l005_v0_sander.out4 prot_l050_v0_sander.out4 prot_l095_v0_sander.out4 &application Amber11 &bootstrap 200 &diagnostics 1 prot_

Read 3 Q output files divided into two output files for each lambda, and calculate TI of intermolecular energies:
&base_path $FEP_PATH &filenames 3 prot_l005_q.out3 prot_l050_q.out3 prot_l095_q.out3 &expand_filenames 1 &application Q &estimates TI &qextract Q-surr. 3 4

Similar to the one above, but here we have a simulated ligand in water with Amber
&base_path $FEP_PATH &filenames 3 prot_l005_q.out4 prot_l050_q.out4 prot_l095_q.out4 &application Decomp &estimates TI &dextract TDC 3 4 2 :

We have run 5 independent simulations with Amber 11
&base_path $FEP_PATH &filenames 3 prot_r*_l005_v0_sander.out4 prot_r*_l050_v0_sander.out4 prot_r*_l095_v0_sander.out4 &nsim 5 &application Amber11

Pure energy file format

The pure-energy files should have the following format

The first line should contain two numbers: n, the number of data points, and l, the value of the coupling parameter (lambda).
The next n lines should contain, in free-format, two columns of data values. The first should be the total energy of V0, and the second should be the total energy of V1.

Note! The mixing of of V0 and V1, is as follow: V = l V0 + (1 - l) V1, i.e., the opposite of what is done in for instance Amber, where V = (1 - l) V0 + l V1 is used. If you want the Amber-type of mixing, just specify 1 - l on the first row.

Example:
20 0.99 -765.1850 -733.4273 -790.8062 -752.5439 -717.3305 -680.7185 -679.1150 -653.5069 -699.8720 -669.2577 -671.9884 -637.7892 -760.1654 -718.7986 -771.3860 -726.2168 -757.0964 -721.2894 -796.7219 -765.2958 -693.3311 -665.8190 -745.6914 -728.7871 -696.9361 -673.9320 -684.1877 -666.7317 -710.9796 -681.1000 -724.3771 -686.8643 -718.2849 -700.4401 -773.8563 -732.4859 -741.6559 -711.3101 -744.6763 -708.1315

This is an energy file for 20 snapshots sampled with Amber at l = 0.01.

Paulius' result.bash script
#!/bin/bash
# 2013-05-31
# PM

cat << EOF > prot.in
&base_path
/platon/nobackup/users/ulf/Sampl4/Oa/Fep/1-4/
&filenames
11
prot_l005_v0_sander.out4
prot_l010_v0_sander.out4
prot_l020_v0_sander.out4
prot_l030_v0_sander.out4
prot_l040_v0_sander.out4
prot_l050_v0_sander.out4
prot_l060_v0_sander.out4
prot_l070_v0_sander.out4
prot_l080_v0_sander.out4
prot_l090_v0_sander.out4
prot_l095_v0_sander.out4
&application
Amber11
&estimates
TI BAR
&nsim
1
&select_frames
0 -1 10
&verbosity
3
&diagnostics
0
&bootstrap
100
EOF
cat << EOF > wat.in
&base_path
/platon/nobackup/users/ulf/Sampl4/Oa/Fep/1-4/
&filenames
11
wat_l005_v0_sander.out4
wat_l010_v0_sander.out4
wat_l020_v0_sander.out4
wat_l030_v0_sander.out4
wat_l040_v0_sander.out4
wat_l050_v0_sander.out4
wat_l060_v0_sander.out4
wat_l070_v0_sander.out4
wat_l080_v0_sander.out4
wat_l090_v0_sander.out4
wat_l095_v0_sander.out4
&application
Amber11
&estimates
TI BAR
&nsim
1
&select_frames
0 -1 10
&verbosity
3
&diagnostics
0
&bootstrap
100
EOF
calc_dg.py prot.in #> prot-tibar.txt
calc_dg.py wat.in #> wat-tibar.txt