PYMBAR and Non-Boltzmann Bennett

Pymbar is a Python implementation of the multistate Bennett acceptance ratio method
https://github.com/choderalab/pymbar
Shirts MR and Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129:124105 (2008) Publication

The Non-Boltzmann Bennet (NBB) method is now implemented by Pär in the PYMBAR module and in the standalone program /temp4/bio/PYMBAR/pymbar/pymbar_alchemical_nbb.py.

Some instructions how to use it.

It is completely analogous to the "normal" pymbar_alchemical.py program that computes TI, FEP, BAR and MBAR energies, so they both take the same arguments:

pymbar_alchemical.py prefix temperature ndiscard [n] [lambda0 lambda1 lambda2 ...  ]
pymbar_alchemical_nbb.py prefix temperature ndiscard [nb] [lambda0 lambda1 lambda2 ...  ]
For a calculation with N lambda values the program expects N data files, one for each simulation, (which are sorted "numerically" on the part of the filenames after the prefix), which should have the columns

time  V  dv/dL  DeltaV(L_1) DeltaV(L_2) .... DeltaV(L_N)

for pymbar_alchemical.py, and

time  V  dv/dL  DeltaV(L_1) DeltaV(L_2) .... DeltaV(L_N)    BIAS

for pymbar_alchemical_nbb.py


All energies are in kJ/mol. DeltaV(L_j) in the file for L_k equals V(L_j)-V(L_k) evaluated for the snapshots from the L_k simulation , so the (k+3)'rd column should be zero. BIAS is the difference between the "simulated" potential energy and the target potential energy that you would like to calculated the free energy for, evaluated for this particular snapshot. For example, if you do a MM simulation but have the QM energies for all snapshots, the BIAS(t) in the L_k file should equal V(MM, L_k, t) - V(QM, L_k, t), each of which you would probably construct by a linear combination (corresponding to L_k) of the endpoint energies (V(L) = (1-L)V(0) - L V(L)). The V and DeltaV(L_k) columns should contain the QM data.

This format is the one needed for MBAR (and generated from Amber by pymbar_extract.pl, except for the bias column), so BAR and NBB do not use all the data, but only the "neighboring" columns. dV/dL is also not needed, it can be set to zero. If you have your own format you can just use the python function pymbar.computeNBB directly.

The precision given in the output should not be trusted, I just experimented by plugging in the exponential of the bias in the precision formula in a similar way as it is plugged in into the real formula. One should probably use some bootstrapping technique instead.

Pär 30/5-13


# Script to run PyMbar using ambtombar
# UR 8/10-13

\rm mbar-*
for c in prot wat; do
 for a in 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00; do
  aa=$(echo $a | cut -c 1,3-4 )
  ambtombar <<EOF
${c}-${a}-sander.out4
mbar-${c}.${aa}
13
0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00
0
EOF
  done
 pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00 >mbar-results-$c
done

It is necessary to remove old mbar-* files if you change the number of lambda values

The cut command is necessary to remove the decimal point in the number; otherwise pymbar read the files in wrong order.
The files must be named file.lambda with a dot before lambda and no decimal dots)

For lambda = 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00,
you can use -13 and skip the lambda line in ambtombar


# Special extractscript for this particular setup (modify for any other setup):
#  Lambda values: 005 010 020 030 040 050 060 070 080 090 095
#  bar_l_min=0.05,bar_l_max=0.95,bar_l_incr=0.05
#  bar_intervall  same as ntpr
#  Each directory contains the files prot_lXXX_v0_sander.out4 and wat_lXXX_v0_sander.out4
#    where XXX is the lambda value.

rm results*
echo '               TI (kJ/mol)  TI-CUBIC (kJ/mol)      DEXP (kJ/mol)      IEXP (kJ/mol)       BAR (kJ/mol)      MBAR (kJ/mol)' >xresults_prot
echo '               TI (kJ/mol)  TI-CUBIC (kJ/mol)      DEXP (kJ/mol)      IEXP (kJ/mol)       BAR (kJ/mol)      MBAR (kJ/mol)' >xresults_wat

#for b in 1-2 2-3 1-4 1-5 1-6 6-7 6-8 6-9
for b in 1-4 6-7
do
cd $b
for c in prot wat; do
for a in 005 010 020 030 040 050 060 070 080 090 095; do
pymbar_extract.pl ${c}_l${a}_v0_sander.out4 >tmp

#Remove the missing columns 015, 025 etc.
grep -v "#" tmp | awk '{print $1,$2,$3,$4,$5,$7,$9,$11,$13,$15,$17,$19,$21,$22}' > ${c}dhdl.${a}

done
pymbar_alchemical.py ${c}dhdl 300 0 >res$c
tail -1 res$c >>../results_$c
done
cd ..
done


Overlap
Once the mbar files are constructed, you can also run the program overlap.
It calculates the same energies, except MBAR.
In addition, it calculates a number of overlap measures, which check that a proper number of lambda values has been used.

Enter the number of lambda values
The starting name of the mbar files

Sample output
Protein
Calculation  # Olap(l) l+1  diff  KAB   KBA     BAR     TI estimate    ExpAv Cumulant Average   Pure average  PI Range Weight   ExpAv Cumulant Average   Pure average  PI Range Weight  Hyst
.000.005  2000  0.99  1.00  0.96  1.03  1.03    0.54    0.54+-  0.12    0.54    0.61+-  0.04    0.19+-  0.03  3.2    4. 0.002    0.56    0.58+-  0.03    0.84+-  0.03  2.5    5. 0.003  0.03
.005.010  2000  0.99  0.99  0.95  1.06  1.06    0.40    0.40+-  0.13    0.40    0.47+-  0.04    0.02+-  0.03  3.1    5. 0.004    0.42    0.45+-  0.03    0.74+-  0.03  2.5    7. 0.007  0.02
.010.020  2000  0.99  0.99  0.86  1.03  1.02    0.39    0.42+-  0.19    0.41    1.37+-  0.18   -1.27+-  0.08  2.5   10. 0.012    0.26    0.50+-  0.08    1.62+-  0.05  1.9   15. 0.104  0.15
.020.030  2000  1.00  1.00  0.87  1.05  1.05    0.07    0.12+-  0.18    0.13    0.89+-  0.16   -1.51+-  0.08  2.9   10. 0.012    0.16    0.21+-  0.08    1.26+-  0.05  1.8    9. 0.011  0.03
.030.040  2000  0.99  0.99  0.87  1.06  1.05    0.07    0.11+-  0.17    0.05    0.55+-  0.11   -1.38+-  0.07  2.8    9. 0.010    0.18    0.23+-  0.07    1.24+-  0.05  1.9   10. 0.016  0.14
.040.050  2000  0.99  0.99  0.90  1.02  1.01    0.47    0.52+-  0.16    0.47    0.78+-  0.09   -0.65+-  0.06  2.7    8. 0.009    0.51    0.51+-  0.06    1.43+-  0.05  2.1    9. 0.016  0.05
.050.060  2000  0.99  0.99  0.91  1.06  1.05    1.40    1.39+-  0.15    1.43    1.57+-  0.07    0.51+-  0.05  2.5    8. 0.009    1.36    1.32+-  0.07    2.25+-  0.05  2.4    9. 0.012  0.07
.060.070  2000  0.99  0.99  0.91  1.08  1.08    2.36    2.30+-  0.16    2.49    2.54+-  0.08    1.42+-  0.05  2.5   11. 0.027    2.33    2.13+-  0.08    3.34+-  0.05  2.3    9. 0.012  0.17
.070.080  2000  1.00  1.00  0.87  1.02  1.01    2.72    2.68+-  0.18    2.59    2.77+-  0.08    1.48+-  0.06  1.9    8. 0.007    2.71    2.10+-  0.14    4.16+-  0.07  2.7   10. 0.015  0.13
.080.090  2000  0.99  0.99  0.84  1.06  1.05    2.35    2.25+-  0.20    2.36    2.91+-  0.14    0.62+-  0.08  2.3   10. 0.015    2.49    1.80+-  0.17    4.08+-  0.08  2.0   11. 0.023  0.13
.090.095  2000  0.99  0.99  0.94  1.02  1.02    0.76    0.76+-  0.16    0.78    0.82+-  0.04    0.27+-  0.04  2.7    8. 0.011    0.73    0.71+-  0.04    1.24+-  0.04  2.7    6. 0.005  0.05
.095.100  2000  0.99  0.99  0.95  1.04  1.03    0.44    0.45+-  0.16    0.47    0.49+-  0.04   -0.04+-  0.04  2.6    7. 0.008    0.40    0.37+-  0.05    0.93+-  0.04  2.7    5. 0.004  0.07
TOTAL     2000  0.99  0.99  0.84  1.02  1.02   11.98   11.95+-  0.57   12.11   15.78+-  0.35   -0.35+-  0.20  1.9   11. 0.027   12.12   10.91+-  0.30   23.13+-  0.17  1.8   15. 0.104  0.17
Water
Calculation  # Olap(l) l+1  diff  KAB   KBA     BAR     TI estimate    ExpAv Cumulant Average   Pure average  PI Range Weight   ExpAv Cumulant Average   Pure average  PI Range Weight  Hyst
.000.005  2000  0.99  0.99  0.97  1.09  1.09    0.32    0.33+-  0.11    0.31    0.36+-  0.03    0.06+-  0.03  3.0    3. 0.002    0.31    0.35+-  0.02    0.55+-  0.02  2.8    6. 0.006  0.00
.005.010  2000  0.99  0.99  0.96  1.00  1.00    0.16    0.16+-  0.12    0.17    0.25+-  0.04   -0.15+-  0.03  3.5    3. 0.002    0.19    0.22+-  0.02    0.41+-  0.02  2.4    5. 0.004  0.02
.010.020  2000  0.99  0.99  0.89  1.08  1.07   -0.01   -0.02+-  0.18    0.04    0.74+-  0.15   -1.37+-  0.07  2.3    8. 0.007   -0.38    0.06+-  0.07    0.99+-  0.05  2.2   13. 0.049  0.42
.020.030  2000  0.99  0.99  0.88  1.04  1.03   -0.32   -0.31+-  0.18   -0.30    0.36+-  0.14   -1.72+-  0.07  2.6    8. 0.006   -0.40   -0.28+-  0.07    0.76+-  0.05  2.0   11. 0.022  0.09
.030.040  2000  0.99  0.99  0.86  1.00  0.99   -0.53   -0.49+-  0.18   -0.62    0.04+-  0.14   -2.06+-  0.07  2.8    7. 0.005   -0.39   -0.33+-  0.07    0.68+-  0.05  1.9    9. 0.014  0.23
.040.050  2000  1.00  1.00  0.89  1.09  1.08   -0.28   -0.26+-  0.17    0.22    0.19+-  0.12   -1.45+-  0.06  2.9   16. 0.151   -0.41   -0.31+-  0.07    0.71+-  0.05  1.8   10. 0.019  0.63
.050.060  2000  0.99  0.99  0.90  1.05  1.04    0.47    0.46+-  0.16    0.44    0.67+-  0.08   -0.54+-  0.05  2.1    8. 0.008    0.24    0.27+-  0.12    1.41+-  0.05  2.7   14. 0.083  0.19
.060.070  2000  0.99  0.99  0.89  1.02  1.01    1.20    1.15+-  0.16    1.18    1.39+-  0.09    0.13+-  0.06  2.4   11. 0.023    1.18    1.04+-  0.08    2.26+-  0.06  2.4   10. 0.019  0.00
.070.080  2000  0.99  0.99  0.88  1.04  1.03    1.47    1.41+-  0.18    1.53    1.84+-  0.11    0.13+-  0.07  2.3    9. 0.013    1.44    1.22+-  0.11    2.77+-  0.06  2.2   12. 0.041  0.09
.080.090  2000  0.99  0.99  0.85  1.03  1.02    1.11    1.11+-  0.19    1.06    1.28+-  0.10   -0.42+-  0.07  2.1   10. 0.014    1.05    0.71+-  0.13    2.68+-  0.07  2.3   10. 0.017  0.01
.090.095  2000  0.99  0.99  0.96  1.08  1.08    0.38    0.37+-  0.13    0.41    0.39+-  0.03    0.05+-  0.03  2.8    7. 0.009    0.36    0.33+-  0.04    0.72+-  0.03  2.8    5. 0.003  0.05
.095.100  2000  0.99  0.99  0.97  1.02  1.02    0.29    0.29+-  0.12    0.31    0.28+-  0.03    0.02+-  0.03  2.6    5. 0.003    0.27    0.22+-  0.04    0.59+-  0.03  3.0    4. 0.002  0.04
TOTAL     2000  0.99  0.99  0.85  1.00  1.00    4.27    4.22+-  0.55    4.74    7.78+-  0.34   -7.31+-  0.19  2.1   16. 0.151    3.47    3.49+-  0.27   14.53+-  0.17  1.8   14. 0.083  0.63
Difference
TOTAL     2000  0.99  0.99  0.84  1.00  1.00    7.71    7.73+-  0.80    7.37    8.00+-  0.49    6.96+-  0.28  2.0   16. 0.151    8.65    7.41+-  0.41    8.60+-  0.24  1.8   15. 0.104  0.63


All energies are in kJ/mol.
The first column (#) is the number of snapshot
Olap(l) is the overlap between lambda-1 and lambda. It should be over 0.7
 l+1 is the overlap between lambda and lambda+1. It should be over 0.7
diff is the difference overlap between lambda-lambda-1 and lambda-lambda+1. It should be over 0.7 (but less strict)
KAB and KBA are the Kab and Kba overlap measures by Wu and Kofke. They should be larger than 0.7.
BAR is the BAR measure
TI is the TI measure with uncertainty
ExpAv is the exponential averaging measure (FEP)
Cumulant average is the cumulant approximation to the FEP estimate, with uncertainty.
Pure average is the pure average (aprroximate FEP measure) with uncertainty
PI is th pi overlap measure by Wu and Kofke. It should be larger than 0.5. This often the best measure.
Range is the range of the values over which the averages are taken. It should be lower than ~20 to give stable results.
Weight is the weight of the largest term in the exponential average. It should be less than 0.2
These seven measures are given both in the forward and backward direction.
Hyst is the hysteresis between the two ExpAv measures. It should be below 4.

A final, useful measure is the difference between the TI and BAR estimate. It should also be below 4.


Installation of PYMBAR

Vilhelm writes 20/8-21:

I _pymbar.c står det  början under rubriken INSTALLATION INSTRUCTIONS hur man kompilerar, vilket blir följande för linux:


gcc -O3 -lm -fPIC -shared -I(directory with Python.h) -I(directory with numpy/arrayobject.h) -o _pymbar.so _pymbar.c


Efter att kompileringen har lyckats så får man kanske lägga till PYTHONPATH i .bash_profile enligt:


export PYTHONPATH="/path/to/pymbar/:$PYTHONPATH"



on Aurora 18/11-21
cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar
module add icc/2016.1.150-GCC-4.9.3-2.25  impi/5.1.2.150 numpy/1.9.2-Python-2.7.10
python setup.py build

After that it worked


on Aurora 24/6-21

cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar
module add icc/2015.3.187-GNU-4.9.3-2.25  impi/5.0.3.048 numpy/1.8.2-Python-2.7.11
python setup.py build

[The newest version gave segmentation fault]

Tried to install the latest version of pymbar (3.0.5):
Downloaded it from https://pypi.org/project/pymbar/
But
conda install -c omnia pymbar
and

pip install pymbar
Both failed.