calc_dg.py
This document describes calc_dg.py, which is a complete program
to read output files from Amber, Gromos, or Q, and estimate free energy
differences.
Currently, the program is able to read the following output files:
- Amber output files with only <DV/DL> records
- Amber output files with both <DV/DL> and BAR-energies
(Amber11 and newer)
- Amber output files with <DV/DL> decomposition energies
- Gromos output files with <DV/DL> records
- Pure energy files with records of V0tot and V1tot
energies. See below
- Q output files
The program can be used to estimate Bennet's acceptance ratio (BAR),
free energy peturbation (FEP) for forward or reverse direction as well
as their average, and thermodynamic integration (TI). The following
estimators can be used with the different output files
- BAR: 2, 5, 6
- FEP: 2, 5, 6
- TI: 1, 2, 3, 4, 6
The recommended usage of the program is:
calc_dg.py calc.in
where calc.in
is a textfile controlling the behaviour of calc_dg.py.
The code is written in Pyhton and located locally in /away/bio/Dg_tools/
Control file
The control file contains values of variables determining the execution
of calc_dg.py. Each variable name starts with an ampersand
(&) and the rows that follow the variable name sets the variable
and associated variables.
Note that the file cannot contain any emptly lines - it will give an
error.
Currently, the following variables can be set:
&base_path
- sets a prefix to the output
filenames
The value of a shell variable can be used if the string starts with a
dollar sign ($), e.g. $FEP_PATH
An absolute path must end with a slash (/)
&filenames
- the filenames of the output names
First line after this should contain an integer, n, that is the
number of output files.
The n lines after this should contain the filenames
&expand_filenames
- if the filenames should be
expanded to concatenate output files
If for instance the specified filname is wat_l005.out3, and &expand_filenames
=2,
then
the
program reads first wat_l005.out3, then wat_l005.out4,
and
finally, wat_l005.out5. It only works if the last character
of the filename is an integer. Note: at the moment, this is
only implemented for Q-files.
&select_frames
- controls how many snapshots to
read
The line should be a string with one, two or, three columns of an
integer.
If there is a single integer, this is the first snapshot to read
If there is two columns, the first integer is the first snapshot (set
to 0 to read from the beginning), and the second is the last snapshot.
If there is three columns, the first integer is the first snapshot to
read (set to 0 to read from the beginning), the second integer is the
last snapshot to read (set to -1 to read until the end), and the third
integer is the stride.
&application
- sets the type of output files
The recognized values are
- Amber
- Amber11
- Decomp
- Gromos
- Pure
- Q
&estimates
- sets the type of estimator
Can be one or several of: BAR,FEP, or TI. If
several estimates are desired, separate them with space. As a default,
all the supported estimators for the specified application is used.
&nsim
- sets the number of independent
simulations
Should be a positive integer, n. Set to 0 or omit it if there
is no independent simulations. If larger than 0, the program will
substitute an asterix (*) in the filenames with an order number from 1
to n.
&bootstrap
- turn on bootstrapping of
uncertainties
Should be a positive integer, equal to the number of bootstrap samples.
&qextract
- turn on customized Q parsing
The first line should be a string that is matched in the output files.
The following strings are recongnized:
"Q-Q "
- extract
internal energies
"Q-prot "
- extract interaction
energies with the protein
"Q-wat "
- extract interaction
energies with the water
"Q-surr. "
- extract intermolecular
energies
"Q-any "
- extract total
energies
The second line should be a positive integer, either 3 or 4 (or both
separated by a space), and tells the program to extract the
electrostatic or van der Waals component(s).
&dextract
- turn on customized Amber
decomposition parsing
The first line should be either TDC, BDC, or SDC,
and
tells
the program to extract the total, backbone, or side-chain
component of the decomposition, respectively. The default is TDC.
The second line should be a positive integer, 2, 3, or 4 (or several
separated by space(s)), and tells the program to extract the internal,
electrostatic, or van der Waals component(s) of the free energy. The
defaul is all three of them.
The third line is a residue specification. If there is only a single
integer, the program extract the free energy for only this residue. If
there is two columns of integer, this is a range specification (use :
as a wild card to specify start or end). If there are more integers,
each of them are interpreted as individual residues.
&weighted
- turn on non-simulated weighting of
energies
The specification is identical to &filename
. Note
that the number of specified filenames need to be identical. Each file
should contain a number of single-column rows with the total energy
differences. The weights will be calculated in the program by
exponentiating the differences.
&extrapolate
- turn on extrapolation to 0 and 1
Set to 1 to turn it on (default), and 0 to turn if off.
&diagnostics
- turn on/off diagnostics
The first line is an integer. If it is 1 it turns on diagnostics, other
values turn it off. Default is turned off. This can only be run with
output filetypes that supports BAR and FEP estimates.
The second line that is optional specifies the prefix to output files
generated by the diagnostics.
1) Omega vtot ; beräknat överlapp i total energin mellan två lambda-grannar.
dvs u(lambda) så överlappet är med u(lambda) samplat vid lambda och u(lambda+1) samplat vid lambda+1
2) Omega_f : här har jag tagit U(lambda) samplat vid lambda och beräknat överlapp med U(lambda) samplat vid lambda+1
3) Omega_r ; här har jag tagit U(lambda+1) samplat vid lambda och beräknat överlapp med U(lambda+1) samplat vid lambda+1
4) Omega_d ; här tittar jag på work-values, dvs W1=U(lambda+1)-U(lambda) samplat vid lambda och W2=U(lambda+1)-U(lambda) samplat vid lambda+1 och beräknat överlappet på det.
5) Omega_dt ; som Omega_d, men jag har försökt och uppskattat överlappet bara på svansregionen, som jag godtyckligt har tagit att vara den lägre 25% kvartilen.
Maxratf är delen av den största boltzmann faktorn som bidrar till exp averaging för framåt perten, maxratb är samma men åt andra hållet (båda i procent).
Filerna som skrivs ut skall vara histogrammen som ligger bakom överlappen. Första kol är bin, andra och tredje är histogrammen, fjärde är överlappet, sqrt(n1*n2).
&print
- details the printing
Should be two integers, separated by space. If the first is 1, the
total dG is printed out, and if the second is 1, the uncertainty of the
total dG is printed out. Other values turns this printing off.
&verbosity
- sets the print level
Should be a positive integer that sets the level of printed text.
Example of basic control files:
Read 3 Amber11 output files and perform bootstrap and diagnostics:
&base_path
$FEP_PATH
&filenames
3
prot_l005_v0_sander.out4
prot_l050_v0_sander.out4
prot_l095_v0_sander.out4
&application
Amber11
&bootstrap
200
&diagnostics
1
prot_
Read 3 Q output files divided into two output files for each lambda,
and calculate TI of intermolecular energies:
&base_path
$FEP_PATH
&filenames
3
prot_l005_q.out3
prot_l050_q.out3
prot_l095_q.out3
&expand_filenames
1
&application
Q
&estimates
TI
&qextract
Q-surr.
3 4
Similar to the one above, but here we have a simulated ligand in
water with Amber
&base_path
$FEP_PATH
&filenames
3
prot_l005_q.out4
prot_l050_q.out4
prot_l095_q.out4
&application
Decomp
&estimates
TI
&dextract
TDC
3 4
2 :
We have run 5 independent simulations with Amber 11
&base_path
$FEP_PATH
&filenames
3
prot_r*_l005_v0_sander.out4
prot_r*_l050_v0_sander.out4
prot_r*_l095_v0_sander.out4
&nsim
5
&application
Amber11
Pure energy file format
The pure-energy files should have the following format
- The first line should contain two numbers: n, the number
of data points, and l, the value of the coupling parameter
(lambda).
- The next n lines should contain, in free-format, two
columns of data values. The first should be the total energy of V0, and
the second should be the total energy of V1.
Note! The mixing of of V0 and V1, is as follow: V = l V0
+ (1 - l) V1, i.e., the opposite of what is done in for
instance Amber, where V = (1 - l) V0 + l V1 is used. If
you want the Amber-type of mixing, just specify 1 - l on the
first row.
Example:
20 0.99
-765.1850 -733.4273
-790.8062 -752.5439
-717.3305 -680.7185
-679.1150 -653.5069
-699.8720 -669.2577
-671.9884 -637.7892
-760.1654 -718.7986
-771.3860 -726.2168
-757.0964 -721.2894
-796.7219 -765.2958
-693.3311 -665.8190
-745.6914 -728.7871
-696.9361 -673.9320
-684.1877 -666.7317
-710.9796 -681.1000
-724.3771 -686.8643
-718.2849 -700.4401
-773.8563 -732.4859
-741.6559 -711.3101
-744.6763 -708.1315
This is an energy file for 20 snapshots sampled with Amber at l
= 0.01.
Paulius' result.bash script
#!/bin/bash
# 2013-05-31
# PM
cat << EOF > prot.in
&base_path
/platon/nobackup/users/ulf/Sampl4/Oa/Fep/1-4/
&filenames
11
prot_l005_v0_sander.out4
prot_l010_v0_sander.out4
prot_l020_v0_sander.out4
prot_l030_v0_sander.out4
prot_l040_v0_sander.out4
prot_l050_v0_sander.out4
prot_l060_v0_sander.out4
prot_l070_v0_sander.out4
prot_l080_v0_sander.out4
prot_l090_v0_sander.out4
prot_l095_v0_sander.out4
&application
Amber11
&estimates
TI BAR
&nsim
1
&select_frames
0 -1 10
&verbosity
3
&diagnostics
0
&bootstrap
100
EOF
cat << EOF > wat.in
&base_path
/platon/nobackup/users/ulf/Sampl4/Oa/Fep/1-4/
&filenames
11
wat_l005_v0_sander.out4
wat_l010_v0_sander.out4
wat_l020_v0_sander.out4
wat_l030_v0_sander.out4
wat_l040_v0_sander.out4
wat_l050_v0_sander.out4
wat_l060_v0_sander.out4
wat_l070_v0_sander.out4
wat_l080_v0_sander.out4
wat_l090_v0_sander.out4
wat_l095_v0_sander.out4
&application
Amber11
&estimates
TI BAR
&nsim
1
&select_frames
0 -1 10
&verbosity
3
&diagnostics
0
&bootstrap
100
EOF
calc_dg.py prot.in #>
prot-tibar.txt
calc_dg.py wat.in
#> wat-tibar.txt