PYMBAR and Non-Boltzmann Bennett
Pymbar is a Python implementation of the multistate Bennett
acceptance ratio method
https://github.com/choderalab/pymbar
Shirts MR and Chodera JD. Statistically optimal analysis of
samples from multiple equilibrium states. J. Chem. Phys.
129:124105 (2008) Publication
The Non-Boltzmann Bennet (NBB) method is now implemented by Pär in
the PYMBAR module and in the standalone program /temp4/bio/PYMBAR/pymbar/pymbar_alchemical_nbb.py.
Some instructions how to use it.
It is completely analogous to the "normal" pymbar_alchemical.py program
that
computes TI,
FEP, BAR and MBAR energies, so they both take the same arguments:
pymbar_alchemical.py prefix
temperature ndiscard [n] [lambda0 lambda1
lambda2 ... ]
pymbar_alchemical_nbb.py
prefix temperature ndiscard [nb] [lambda0
lambda1 lambda2 ... ]
- prefix - the
prefix for datafiles (possibly including a
directory). The names of the files must be prefix.number
- temperature - temperature in K
- ndiscard - number of data points that should be discarded from
the beginning of the file
- 'n' - turn off filtering (otherwise it automatically selects a
non-correlated subset of the data points)
- nb - turn off filtering and on bootstrapping (recommended for
NBB, but time consuming)
- lambda0 lambda1... - lambda values for the various files
(only used
for TI), default: even spacing between 0.01 and 0.99
For a calculation with N lambda values the program expects N data
files, one for each simulation, (which are sorted "numerically" on
the part of
the filenames after the prefix), which should have the columns
time V dv/dL DeltaV(L_1) DeltaV(L_2) .... DeltaV(L_N)
for pymbar_alchemical.py, and
time V dv/dL DeltaV(L_1) DeltaV(L_2) .... DeltaV(L_N) BIAS
for pymbar_alchemical_nbb.py
All energies are in kJ/mol. DeltaV(L_j) in the file for L_k equals
V(L_j)-V(L_k) evaluated for the snapshots from the L_k simulation ,
so the (k+3)'rd column should be zero. BIAS is the difference
between the "simulated" potential energy and the target potential
energy that you would like to calculated the free energy for,
evaluated for this particular snapshot. For example, if you do a MM
simulation but have the QM energies for all snapshots, the BIAS(t)
in the L_k file should equal V(MM, L_k, t) - V(QM, L_k, t), each of
which you would probably construct by a linear combination
(corresponding to L_k) of the endpoint energies (V(L) = (1-L)V(0) -
L V(L)). The V and DeltaV(L_k) columns should contain the QM data.
This format is the one needed for MBAR (and generated from Amber by
pymbar_extract.pl, except for the bias column), so BAR and NBB do
not use all the data, but only the "neighboring" columns. dV/dL is
also not needed, it can be set to zero. If you have your own format
you
can just use the python function pymbar.computeNBB directly.
The precision given in the output should not be trusted, I just
experimented by plugging in the exponential of the bias in the
precision formula in a similar way as it is plugged in into the real
formula. One
should probably use some bootstrapping technique instead.
Pär 30/5-13
# Script to run PyMbar using
ambtombar
# UR 8/10-13
\rm mbar-*
for c in prot wat; do
for a in 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80
0.90 0.95 1.00; do
aa=$(echo $a | cut -c 1,3-4 )
ambtombar <<EOF
${c}-${a}-sander.out4
mbar-${c}.${aa}
13
0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00
0
EOF
done
pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.05 0.10 0.20 0.30
0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00 >mbar-results-$c
done
It is necessary to remove old mbar-* files if you change the number
of lambda values
The cut command is necessary to remove the decimal point in the
number; otherwise pymbar read the files in wrong order.
The files must be named file.lambda with a dot before lambda and no
decimal dots)
For lambda = 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
0.95 1.00,
you can use -13 and skip the lambda line in ambtombar
# Special extractscript for
this particular setup (modify for any other setup):
# Lambda values: 005 010 020 030 040 050 060 070 080 090 095
# bar_l_min=0.05,bar_l_max=0.95,bar_l_incr=0.05
# bar_intervall same as ntpr
# Each directory contains the files prot_lXXX_v0_sander.out4
and wat_lXXX_v0_sander.out4
# where XXX is the lambda value.
rm results*
echo
'
TI
(kJ/mol)
TI-CUBIC
(kJ/mol)
DEXP
(kJ/mol)
IEXP
(kJ/mol)
BAR (kJ/mol) MBAR (kJ/mol)'
>xresults_prot
echo
'
TI
(kJ/mol)
TI-CUBIC
(kJ/mol)
DEXP
(kJ/mol)
IEXP
(kJ/mol)
BAR (kJ/mol) MBAR (kJ/mol)'
>xresults_wat
#for b in 1-2 2-3 1-4 1-5 1-6 6-7 6-8 6-9
for b in 1-4 6-7
do
cd $b
for c in prot wat; do
for a in 005 010 020 030 040 050 060 070 080 090 095; do
pymbar_extract.pl ${c}_l${a}_v0_sander.out4 >tmp
#Remove the missing columns 015, 025 etc.
grep -v "#" tmp | awk '{print
$1,$2,$3,$4,$5,$7,$9,$11,$13,$15,$17,$19,$21,$22}' >
${c}dhdl.${a}
done
pymbar_alchemical.py ${c}dhdl 300 0 >res$c
tail -1 res$c >>../results_$c
done
cd ..
done
Overlap
Once the mbar files are constructed, you can also run the program
overlap.
It calculates the same energies, except MBAR.
In addition, it calculates a number of overlap measures, which check
that a proper number of lambda values has been used.
Enter the number of lambda values
The starting name of the mbar files
Sample output
Protein
Calculation # Olap(l) l+1 diff KAB
KBA BAR TI
estimate ExpAv Cumulant Average
Pure average PI Range Weight ExpAv Cumulant
Average Pure average PI Range Weight
Hyst
.000.005 2000 0.99 1.00 0.96
1.03 1.03 0.54
0.54+- 0.12 0.54
0.61+- 0.04 0.19+- 0.03
3.2 4. 0.002
0.56 0.58+- 0.03
0.84+- 0.03 2.5 5. 0.003
0.03
.005.010 2000 0.99 0.99 0.95
1.06 1.06 0.40
0.40+- 0.13 0.40
0.47+- 0.04 0.02+- 0.03
3.1 5. 0.004
0.42 0.45+- 0.03
0.74+- 0.03 2.5 7. 0.007
0.02
.010.020 2000 0.99 0.99 0.86
1.03 1.02 0.39
0.42+- 0.19 0.41
1.37+- 0.18 -1.27+- 0.08
2.5 10. 0.012
0.26 0.50+- 0.08
1.62+- 0.05 1.9 15. 0.104 0.15
.020.030 2000 1.00 1.00 0.87
1.05 1.05 0.07
0.12+- 0.18 0.13
0.89+- 0.16 -1.51+- 0.08
2.9 10. 0.012
0.16 0.21+- 0.08
1.26+- 0.05 1.8 9. 0.011
0.03
.030.040 2000 0.99 0.99 0.87
1.06 1.05 0.07
0.11+- 0.17 0.05
0.55+- 0.11 -1.38+- 0.07
2.8 9. 0.010
0.18 0.23+- 0.07
1.24+- 0.05 1.9 10. 0.016 0.14
.040.050 2000 0.99 0.99 0.90
1.02 1.01 0.47
0.52+- 0.16 0.47
0.78+- 0.09 -0.65+- 0.06
2.7 8. 0.009
0.51 0.51+- 0.06
1.43+- 0.05 2.1 9. 0.016
0.05
.050.060 2000 0.99 0.99 0.91
1.06 1.05 1.40
1.39+- 0.15 1.43
1.57+- 0.07 0.51+- 0.05
2.5 8. 0.009
1.36 1.32+- 0.07
2.25+- 0.05 2.4 9. 0.012
0.07
.060.070 2000 0.99 0.99 0.91
1.08 1.08 2.36
2.30+- 0.16 2.49
2.54+- 0.08 1.42+- 0.05
2.5 11. 0.027
2.33 2.13+- 0.08
3.34+- 0.05 2.3 9. 0.012
0.17
.070.080 2000 1.00 1.00 0.87
1.02 1.01 2.72
2.68+- 0.18 2.59
2.77+- 0.08 1.48+- 0.06
1.9 8. 0.007
2.71 2.10+- 0.14
4.16+- 0.07 2.7 10. 0.015 0.13
.080.090 2000 0.99 0.99 0.84
1.06 1.05 2.35
2.25+- 0.20 2.36
2.91+- 0.14 0.62+- 0.08
2.3 10. 0.015
2.49 1.80+- 0.17
4.08+- 0.08 2.0 11. 0.023 0.13
.090.095 2000 0.99 0.99 0.94
1.02 1.02 0.76
0.76+- 0.16 0.78
0.82+- 0.04 0.27+- 0.04
2.7 8. 0.011
0.73 0.71+- 0.04
1.24+- 0.04 2.7 6. 0.005
0.05
.095.100 2000 0.99 0.99 0.95
1.04 1.03 0.44
0.45+- 0.16 0.47
0.49+- 0.04 -0.04+- 0.04
2.6 7. 0.008
0.40 0.37+- 0.05
0.93+- 0.04 2.7 5. 0.004
0.07
TOTAL 2000 0.99 0.99
0.84 1.02 1.02 11.98
11.95+- 0.57 12.11 15.78+-
0.35 -0.35+- 0.20 1.9 11.
0.027 12.12 10.91+-
0.30 23.13+- 0.17 1.8 15.
0.104 0.17
Water
Calculation # Olap(l) l+1 diff KAB
KBA BAR TI
estimate ExpAv Cumulant Average
Pure average PI Range Weight ExpAv Cumulant
Average Pure average PI Range Weight
Hyst
.000.005 2000 0.99 0.99 0.97
1.09 1.09 0.32
0.33+- 0.11 0.31
0.36+- 0.03 0.06+- 0.03
3.0 3. 0.002
0.31 0.35+- 0.02
0.55+- 0.02 2.8 6. 0.006
0.00
.005.010 2000 0.99 0.99 0.96
1.00 1.00 0.16
0.16+- 0.12 0.17
0.25+- 0.04 -0.15+- 0.03
3.5 3. 0.002
0.19 0.22+- 0.02
0.41+- 0.02 2.4 5. 0.004
0.02
.010.020 2000 0.99 0.99 0.89
1.08 1.07 -0.01 -0.02+-
0.18 0.04 0.74+-
0.15 -1.37+- 0.07 2.3
8. 0.007 -0.38 0.06+-
0.07 0.99+- 0.05 2.2
13. 0.049 0.42
.020.030 2000 0.99 0.99 0.88
1.04 1.03 -0.32 -0.31+-
0.18 -0.30 0.36+-
0.14 -1.72+- 0.07 2.6
8. 0.006 -0.40 -0.28+-
0.07 0.76+- 0.05 2.0
11. 0.022 0.09
.030.040 2000 0.99 0.99 0.86
1.00 0.99 -0.53 -0.49+-
0.18 -0.62 0.04+-
0.14 -2.06+- 0.07 2.8
7. 0.005 -0.39 -0.33+-
0.07 0.68+- 0.05
1.9 9. 0.014 0.23
.040.050 2000 1.00 1.00 0.89
1.09 1.08 -0.28 -0.26+-
0.17 0.22 0.19+-
0.12 -1.45+- 0.06 2.9 16.
0.151 -0.41 -0.31+-
0.07 0.71+- 0.05 1.8
10. 0.019 0.63
.050.060 2000 0.99 0.99 0.90
1.05 1.04 0.47
0.46+- 0.16 0.44
0.67+- 0.08 -0.54+- 0.05
2.1 8. 0.008
0.24 0.27+- 0.12
1.41+- 0.05 2.7 14. 0.083 0.19
.060.070 2000 0.99 0.99 0.89
1.02 1.01 1.20
1.15+- 0.16 1.18
1.39+- 0.09 0.13+- 0.06
2.4 11. 0.023
1.18 1.04+- 0.08
2.26+- 0.06 2.4 10. 0.019 0.00
.070.080 2000 0.99 0.99 0.88
1.04 1.03 1.47
1.41+- 0.18 1.53
1.84+- 0.11 0.13+- 0.07
2.3 9. 0.013
1.44 1.22+- 0.11
2.77+- 0.06 2.2 12. 0.041 0.09
.080.090 2000 0.99 0.99 0.85
1.03 1.02 1.11
1.11+- 0.19 1.06
1.28+- 0.10 -0.42+- 0.07
2.1 10. 0.014
1.05 0.71+- 0.13
2.68+- 0.07 2.3 10. 0.017 0.01
.090.095 2000 0.99 0.99 0.96
1.08 1.08 0.38
0.37+- 0.13 0.41
0.39+- 0.03 0.05+- 0.03
2.8 7. 0.009
0.36 0.33+- 0.04
0.72+- 0.03 2.8 5. 0.003
0.05
.095.100 2000 0.99 0.99 0.97
1.02 1.02 0.29
0.29+- 0.12 0.31
0.28+- 0.03 0.02+- 0.03
2.6 5. 0.003
0.27 0.22+- 0.04
0.59+- 0.03 3.0 4. 0.002
0.04
TOTAL 2000 0.99 0.99
0.85 1.00 1.00
4.27 4.22+- 0.55
4.74 7.78+- 0.34
-7.31+- 0.19 2.1 16.
0.151 3.47 3.49+-
0.27 14.53+- 0.17 1.8 14.
0.083 0.63
Difference
TOTAL 2000 0.99 0.99
0.84 1.00 1.00
7.71 7.73+- 0.80
7.37 8.00+- 0.49
6.96+- 0.28 2.0 16.
0.151 8.65 7.41+-
0.41 8.60+- 0.24 1.8
15. 0.104 0.63
All energies are in kJ/mol.
The first column (#) is the number of snapshot
Olap(l) is the overlap between lambda-1 and lambda. It should be
over 0.7
l+1 is the overlap between lambda and lambda+1. It should be
over 0.7
diff is the difference overlap between lambda-lambda-1 and
lambda-lambda+1. It should be over 0.7 (but less strict)
KAB and KBA are the Kab and Kba overlap measures by Wu and Kofke.
They should be larger than 0.7.
BAR is the BAR measure
TI is the TI measure with uncertainty
ExpAv is the exponential averaging measure (FEP)
Cumulant average is the cumulant approximation to the FEP estimate,
with uncertainty.
Pure average is the pure average (aprroximate FEP measure) with
uncertainty
PI is th pi overlap measure by Wu and Kofke. It should be larger
than 0.5. This often the best measure.
Range is the range of the values over which the averages are taken.
It should be lower than ~20 to give stable results.
Weight is the weight of the largest term in the exponential average.
It should be less than 0.2
These seven measures are given both in the forward and backward
direction.
Hyst is the hysteresis between the two ExpAv measures. It should be
below 4.
A final, useful measure is the difference between the TI and BAR
estimate. It should also be below 4.
Besst Regards,
Meiting
On cosmos:
# Script to run PyMbar using ambtombar on cosmos
# Meiting 24/07-23
\rm mbar-*
for c in free; do
for a in 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00; do
aa=$(echo $a | cut -c 1,3-4 )
/lunarc/nobackup/projects/teobio/Bio/Bin/ambtombar <<EOF
${c}-${a}-sander.out4
mbar-${c}.${aa}
11
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
0
EOF
done
/lunarc/nobackup/projects/teobio/Meiting/Program/miniconda3/envs/py_27/bin/python2.7 /lunarc/nobackup/projects/teobio/Bio/Bin/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 >mbar-results-$c
done
On Tetralith:
# Script to run PyMbar using ambtombar on Tetralith
# Meiting 14/03-24
\rm mbar-*
for c in free; do
for a in 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.700.80 0.09 1.00; do
aa=$(echo $a | cut -c 1,3-4 )
/proj/teobio/Bio/Bin/ambtombar <<EOF
${c}-${a}-sander.out4
mbar-${c}.${aa}
11
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.09 1.00
0
EOF
done
/proj/teobio/Meiting/program/miniconda3/envs/py27/bin/python2.7 /proj/teobio/Bio/PYMBAR/pymbar/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.09 1.00 >mbar-results-$c
done
On Dardel:
# Script to run PyMbar using ambtombar on Dardel
# Meiitng 24/07-23
\rm mbar-*
for c in free; do
for a in 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00; do
aa=$(echo $a | cut -c 1,3-4 )
/cfs/klemming/projects/snic/teobio/Bio/Bin/ambtombar <<EOF
${c}-${a}-sander.out4
mbar-${c}.${aa}
11
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
0
EOF
done
/cfs/klemming/projects/snic/teobio/Meiting/program/miniconda3/envs/py_27/bin/python2.7 /cfs/klemming/projects/snic/teobio/Bio/PYMBAR/pymbar/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 >mbar-results-$c
done
Installation of PYMBAR
On COSMOS 8/3-24:
cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar
module add GCC/12.2.0 OpenMPI/4.1.4 Python-bundle/3.10.8
Vilhelm writes 20/8-21:
I
_pymbar.c står det början under rubriken INSTALLATION
INSTRUCTIONS hur man kompilerar, vilket blir följande för linux:
gcc -O3 -lm -fPIC -shared
-I(directory with Python.h) -I(directory with
numpy/arrayobject.h) -o _pymbar.so _pymbar.c
Efter att kompileringen har
lyckats så får man kanske lägga till PYTHONPATH i .bash_profile
enligt:
export
PYTHONPATH="/path/to/pymbar/:$PYTHONPATH"
on Aurora 18/11-21
cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar
module add icc/2016.1.150-GCC-4.9.3-2.25
impi/5.1.2.150 numpy/1.9.2-Python-2.7.10
python setup.py build
After that it worked
on Aurora 24/6-21
cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar
module add icc/2015.3.187-GNU-4.9.3-2.25 impi/5.0.3.048
numpy/1.8.2-Python-2.7.11
python setup.py build
[The newest version gave segmentation fault]
Tried to install the latest version of pymbar (3.0.5):
Downloaded it from https://pypi.org/project/pymbar/
But
conda install -c omnia pymbar
and
pip install pymbar
Both failed.