PYMBAR

PYMBAR and Non-Boltzmann Bennett

Pymbar is a Python implementation of the multistate Bennett acceptance ratio method
https://github.com/choderalab/pymbar
Shirts MR and Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129:124105 (2008) Publication

Unfortunately, the installed pyMBAR version works only under Python 2.7.

The Non-Boltzmann Bennet (NBB) method is now implemented by Pär in the PYMBAR module and in the standalone program /temp4/bio/PYMBAR/pymbar/pymbar_alchemical_nbb.py.

Some instructions how to use it.

It is completely analogous to the "normal" pymbar_alchemical.py program that computes TI, FEP, BAR and MBAR energies, so they both take the same arguments:

pymbar_alchemical.py prefix temperature ndiscard [n] [lambda0 lambda1 lambda2 ... ]
pymbar_alchemical_nbb.py prefix temperature ndiscard [nb] [lambda0 lambda1 lambda2 ... ]

prefix - the prefix for datafiles (possibly including a directory). The names of the files must be prefix.number
temperature - temperature in K
ndiscard - number of data points that should be discarded from the beginning of the file
'n' - turn off filtering (otherwise it automatically selects a non-correlated subset of the data points)
nb - turn off filtering and on bootstrapping (recommended for NBB, but time consuming)
lambda0 lambda1... - lambda values for the various files (only used for TI), default: even spacing between 0.01 and 0.99

For a calculation with N lambda values the program expects N data files, one for each simulation, (which are sorted "numerically" on the part of the filenames after the prefix), which should have the columns

time V dv/dL DeltaV(L_1) DeltaV(L_2) .... DeltaV(L_N)

for pymbar_alchemical.py, and

time V dv/dL DeltaV(L_1) DeltaV(L_2) .... DeltaV(L_N) BIAS

for pymbar_alchemical_nbb.py

All energies are in kJ/mol. DeltaV(L_j) in the file for L_k equals V(L_j)-V(L_k) evaluated for the snapshots from the L_k simulation , so the (k+3)'rd column should be zero. BIAS is the difference between the "simulated" potential energy and the target potential energy that you would like to calculated the free energy for, evaluated for this particular snapshot. For example, if you do a MM simulation but have the QM energies for all snapshots, the BIAS(t) in the L_k file should equal V(MM, L_k, t) - V(QM, L_k, t), each of which you would probably construct by a linear combination (corresponding to L_k) of the endpoint energies (V(L) = (1-L)V(0) - L V(L)). The V and DeltaV(L_k) columns should contain the QM data.

This format is the one needed for MBAR (and generated from Amber by pymbar_extract.pl, except for the bias column), so BAR and NBB do not use all the data, but only the "neighboring" columns. dV/dL is also not needed, it can be set to zero. If you have your own format you can just use the python function pymbar.computeNBB directly.

The precision given in the output should not be trusted, I just experimented by plugging in the exponential of the bias in the precision formula in a similar way as it is plugged in into the real formula. One should probably use some bootstrapping technique instead.

Pär 30/5-13

# Script to run PyMbar using ambtombar
# UR 8/10-13

\rm mbar-*for c in prot wat; do for a in 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00; do aa=$(echo $a | cut -c 1,3-4 ) ambtombar <<EOF${c}-${a}-sander.out4mbar-${c}.${aa}130.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.000EOF done pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00 >mbar-results-$cdone

On Cosmos:
/lunarc/nobackup/projects/teobio/Meiting/Program/miniconda3/envs/py_27/bin/python2.7 /lunarc/nobackup/projects/teobio/Bio/Bin/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00 >mbar-results-$c On Dardel: /cfs/klemming/projects/snic/teobio/Meiting/program/miniconda3/envs/py_27/bin/python2.7 /cfs/klemming/projects/snic/teobio/Bio/PYMBAR/pymbar/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 >mbar-results-$c.dat
It is necessary to remove old mbar-* files if you change the number of lambda values

The cut command is necessary to remove the decimal point in the number; otherwise pymbar read the files in wrong order.
The files must be named file.lambda with a dot before lambda and no decimal dots)

For lambda = 0.00 0.05 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 0.95 1.00,
you can use -13 and skip the lambda line in ambtombar

The installed version of pymbar is now old and outdated.
To run it on Cosmos:
/lunarc/nobackup/projects/teobio/Meiting/Program/miniconda3/envs/py_27/bin/python2.7 pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 >mbar-results-$c
possibly, you need to copy pymbar_alchemical.py, pymbar.py and timeseries.py from /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar

Some versions of Amber give erroneous mbar outputs
A correct mbar file should have one column with zeros (corresponding to the current lambda value, i.e. the third for lambda=0 or the last one for lambda=1.
Amber 22 changed the format of MBAR, so that it has more decimals than Vpot, then the column should be ~1E-4 and not exactly zero.
ambtombar had a bug until 19/8-24 which gave incorrect results for Amber 22 (the zero column gave values of ~1E-1.

Versions that give correct results:
Amber22 both cpu (pmemd) and gpu (Cosmos).
Amber 18 gpu (Kebnekaise)
Amber 16 sander (Tetralith)
Amber 14 pmemd (Aurora)

# Special extractscript for this particular setup (modify for any other setup):
# Lambda values: 005 010 020 030 040 050 060 070 080 090 095
# bar_l_min=0.05,bar_l_max=0.95,bar_l_incr=0.05
# bar_intervall same as ntpr
# Each directory contains the files prot_lXXX_v0_sander.out4 and wat_lXXX_v0_sander.out4
#    where XXX is the lambda value.

rm results*
echo '               TI (kJ/mol) TI-CUBIC (kJ/mol)      DEXP (kJ/mol)      IEXP (kJ/mol)       BAR (kJ/mol)      MBAR (kJ/mol)' >xresults_prot
echo '               TI (kJ/mol) TI-CUBIC (kJ/mol)      DEXP (kJ/mol)      IEXP (kJ/mol)       BAR (kJ/mol)      MBAR (kJ/mol)' >xresults_wat

#for b in 1-2 2-3 1-4 1-5 1-6 6-7 6-8 6-9
for b in 1-4 6-7
do
cd $b
for c in prot wat; do
for a in 005 010 020 030 040 050 060 070 080 090 095; do
pymbar_extract.pl ${c}_l${a}_v0_sander.out4 >tmp

#Remove the missing columns 015, 025 etc.
grep -v "#" tmp | awk '{print $1,$2,$3,$4,$5,$7,$9,$11,$13,$15,$17,$19,$21,$22}' > ${c}dhdl.${a}

done
pymbar_alchemical.py ${c}dhdl 300 0 >res$c
tail -1 res$c >>../results_$c
done
cd ..
done

Overlap
Once the mbar files are constructed, you can also run the program overlap.
It calculates the same energies, except MBAR.
In addition, it calculates a number of overlap measures, which check that a proper number of lambda values has been used.

Enter the number of lambda values
The starting name of the mbar files

Sample output
Protein Calculation # Olap(l) l+1 diff KAB KBA BAR TI estimate ExpAv Cumulant Average Pure average PI Range Weight ExpAv Cumulant Average Pure average PI Range Weight Hyst .000.005 2000 0.99 1.00 0.96 1.03 1.03 0.54 0.54+- 0.12 0.54 0.61+- 0.04 0.19+- 0.03 3.2 4. 0.002 0.56 0.58+- 0.03 0.84+- 0.03 2.5 5. 0.003 0.03 .005.010 2000 0.99 0.99 0.95 1.06 1.06 0.40 0.40+- 0.13 0.40 0.47+- 0.04 0.02+- 0.03 3.1 5. 0.004 0.42 0.45+- 0.03 0.74+- 0.03 2.5 7. 0.007 0.02 .010.020 2000 0.99 0.99 0.86 1.03 1.02 0.39 0.42+- 0.19 0.41 1.37+- 0.18 -1.27+- 0.08 2.5 10. 0.012 0.26 0.50+- 0.08 1.62+- 0.05 1.9 15. 0.104 0.15 .020.030 2000 1.00 1.00 0.87 1.05 1.05 0.07 0.12+- 0.18 0.13 0.89+- 0.16 -1.51+- 0.08 2.9 10. 0.012 0.16 0.21+- 0.08 1.26+- 0.05 1.8 9. 0.011 0.03 .030.040 2000 0.99 0.99 0.87 1.06 1.05 0.07 0.11+- 0.17 0.05 0.55+- 0.11 -1.38+- 0.07 2.8 9. 0.010 0.18 0.23+- 0.07 1.24+- 0.05 1.9 10. 0.016 0.14 .040.050 2000 0.99 0.99 0.90 1.02 1.01 0.47 0.52+- 0.16 0.47 0.78+- 0.09 -0.65+- 0.06 2.7 8. 0.009 0.51 0.51+- 0.06 1.43+- 0.05 2.1 9. 0.016 0.05 .050.060 2000 0.99 0.99 0.91 1.06 1.05 1.40 1.39+- 0.15 1.43 1.57+- 0.07 0.51+- 0.05 2.5 8. 0.009 1.36 1.32+- 0.07 2.25+- 0.05 2.4 9. 0.012 0.07 .060.070 2000 0.99 0.99 0.91 1.08 1.08 2.36 2.30+- 0.16 2.49 2.54+- 0.08 1.42+- 0.05 2.5 11. 0.027 2.33 2.13+- 0.08 3.34+- 0.05 2.3 9. 0.012 0.17 .070.080 2000 1.00 1.00 0.87 1.02 1.01 2.72 2.68+- 0.18 2.59 2.77+- 0.08 1.48+- 0.06 1.9 8. 0.007 2.71 2.10+- 0.14 4.16+- 0.07 2.7 10. 0.015 0.13 .080.090 2000 0.99 0.99 0.84 1.06 1.05 2.35 2.25+- 0.20 2.36 2.91+- 0.14 0.62+- 0.08 2.3 10. 0.015 2.49 1.80+- 0.17 4.08+- 0.08 2.0 11. 0.023 0.13 .090.095 2000 0.99 0.99 0.94 1.02 1.02 0.76 0.76+- 0.16 0.78 0.82+- 0.04 0.27+- 0.04 2.7 8. 0.011 0.73 0.71+- 0.04 1.24+- 0.04 2.7 6. 0.005 0.05 .095.100 2000 0.99 0.99 0.95 1.04 1.03 0.44 0.45+- 0.16 0.47 0.49+- 0.04 -0.04+- 0.04 2.6 7. 0.008 0.40 0.37+- 0.05 0.93+- 0.04 2.7 5. 0.004 0.07 TOTAL 2000 0.99 0.99 0.84 1.02 1.02 11.98 11.95+- 0.57 12.11 15.78+- 0.35 -0.35+- 0.20 1.9 11. 0.027 12.12 10.91+- 0.30 23.13+- 0.17 1.8 15. 0.104 0.17 Water Calculation # Olap(l) l+1 diff KAB KBA BAR TI estimate ExpAv Cumulant Average Pure average PI Range Weight ExpAv Cumulant Average Pure average PI Range Weight Hyst .000.005 2000 0.99 0.99 0.97 1.09 1.09 0.32 0.33+- 0.11 0.31 0.36+- 0.03 0.06+- 0.03 3.0 3. 0.002 0.31 0.35+- 0.02 0.55+- 0.02 2.8 6. 0.006 0.00 .005.010 2000 0.99 0.99 0.96 1.00 1.00 0.16 0.16+- 0.12 0.17 0.25+- 0.04 -0.15+- 0.03 3.5 3. 0.002 0.19 0.22+- 0.02 0.41+- 0.02 2.4 5. 0.004 0.02 .010.020 2000 0.99 0.99 0.89 1.08 1.07 -0.01 -0.02+- 0.18 0.04 0.74+- 0.15 -1.37+- 0.07 2.3 8. 0.007 -0.38 0.06+- 0.07 0.99+- 0.05 2.2 13. 0.049 0.42 .020.030 2000 0.99 0.99 0.88 1.04 1.03 -0.32 -0.31+- 0.18 -0.30 0.36+- 0.14 -1.72+- 0.07 2.6 8. 0.006 -0.40 -0.28+- 0.07 0.76+- 0.05 2.0 11. 0.022 0.09 .030.040 2000 0.99 0.99 0.86 1.00 0.99 -0.53 -0.49+- 0.18 -0.62 0.04+- 0.14 -2.06+- 0.07 2.8 7. 0.005 -0.39 -0.33+- 0.07 0.68+- 0.05 1.9 9. 0.014 0.23 .040.050 2000 1.00 1.00 0.89 1.09 1.08 -0.28 -0.26+- 0.17 0.22 0.19+- 0.12 -1.45+- 0.06 2.9 16. 0.151 -0.41 -0.31+- 0.07 0.71+- 0.05 1.8 10. 0.019 0.63 .050.060 2000 0.99 0.99 0.90 1.05 1.04 0.47 0.46+- 0.16 0.44 0.67+- 0.08 -0.54+- 0.05 2.1 8. 0.008 0.24 0.27+- 0.12 1.41+- 0.05 2.7 14. 0.083 0.19 .060.070 2000 0.99 0.99 0.89 1.02 1.01 1.20 1.15+- 0.16 1.18 1.39+- 0.09 0.13+- 0.06 2.4 11. 0.023 1.18 1.04+- 0.08 2.26+- 0.06 2.4 10. 0.019 0.00 .070.080 2000 0.99 0.99 0.88 1.04 1.03 1.47 1.41+- 0.18 1.53 1.84+- 0.11 0.13+- 0.07 2.3 9. 0.013 1.44 1.22+- 0.11 2.77+- 0.06 2.2 12. 0.041 0.09 .080.090 2000 0.99 0.99 0.85 1.03 1.02 1.11 1.11+- 0.19 1.06 1.28+- 0.10 -0.42+- 0.07 2.1 10. 0.014 1.05 0.71+- 0.13 2.68+- 0.07 2.3 10. 0.017 0.01 .090.095 2000 0.99 0.99 0.96 1.08 1.08 0.38 0.37+- 0.13 0.41 0.39+- 0.03 0.05+- 0.03 2.8 7. 0.009 0.36 0.33+- 0.04 0.72+- 0.03 2.8 5. 0.003 0.05 .095.100 2000 0.99 0.99 0.97 1.02 1.02 0.29 0.29+- 0.12 0.31 0.28+- 0.03 0.02+- 0.03 2.6 5. 0.003 0.27 0.22+- 0.04 0.59+- 0.03 3.0 4. 0.002 0.04 TOTAL 2000 0.99 0.99 0.85 1.00 1.00 4.27 4.22+- 0.55 4.74 7.78+- 0.34 -7.31+- 0.19 2.1 16. 0.151 3.47 3.49+- 0.27 14.53+- 0.17 1.8 14. 0.083 0.63 Difference TOTAL 2000 0.99 0.99 0.84 1.00 1.00 7.71 7.73+- 0.80 7.37 8.00+- 0.49 6.96+- 0.28 2.0 16. 0.151 8.65 7.41+- 0.41 8.60+- 0.24 1.8 15. 0.104 0.63

All energies are in kJ/mol.
The first column (#) is the number of snapshot
Olap(l) is the overlap between lambda-1 and lambda. It should be over 0.7
l+1 is the overlap between lambda and lambda+1. It should be over 0.7
diff is the difference overlap between lambda-lambda-1 and lambda-lambda+1. It should be over 0.7 (but less strict)
KAB and KBA are the Kab and Kba overlap measures by Wu and Kofke. They should be larger than 0.7.
BAR is the BAR measure
TI is the TI measure with uncertainty
ExpAv is the exponential averaging measure (FEP)
Cumulant average is the cumulant approximation to the FEP estimate, with uncertainty.
Pure average is the pure average (aprroximate FEP measure) with uncertainty
PI is th pi overlap measure by Wu and Kofke. It should be larger than 0.5. This often the best measure.
Range is the range of the values over which the averages are taken. It should be lower than ~20 to give stable results.
Weight is the weight of the largest term in the exponential average. It should be less than 0.2
These seven measures are given both in the forward and backward direction.
Hyst is the hysteresis between the two ExpAv measures. It should be below 4.

A final, useful measure is the difference between the TI and BAR estimate. It should also be below 4.

Besst Regards,

Meiting

On cosmos:

# Script to run PyMbar using ambtombar on cosmos

# Meiting 24/07-23

\rm mbar-*

for c in free; do

for a in 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00; do

aa=$(echo $a | cut -c 1,3-4 )

/lunarc/nobackup/projects/teobio/Bio/Bin/ambtombar <<EOF

${c}-${a}-sander.out4

mbar-${c}.${aa}

11

0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00

0

EOF

done

/lunarc/nobackup/projects/teobio/Meiting/Program/miniconda3/envs/py_27/bin/python2.7 /lunarc/nobackup/projects/teobio/Bio/Bin/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 >mbar-results-$c

done

On Tetralith:

# Script to run PyMbar using ambtombar on Tetralith

# Meiting 14/03-24

\rm mbar-*

for c in free; do

for a in 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.700.80 0.09 1.00; do

aa=$(echo $a | cut -c 1,3-4 )

/proj/teobio/Bio/Bin/ambtombar <<EOF

${c}-${a}-sander.out4

mbar-${c}.${aa}

11

0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.09 1.00

0

EOF

done

/proj/teobio/Meiting/program/miniconda3/envs/py27/bin/python2.7 /proj/teobio/Bio/PYMBAR/pymbar/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.09 1.00 >mbar-results-$c

done

On Dardel:

# Script to run PyMbar using ambtombar on Dardel

# Meiitng 24/07-23

\rm mbar-*

for c in free; do

for a in 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00; do

aa=$(echo $a | cut -c 1,3-4 )

/cfs/klemming/projects/snic/teobio/Bio/Bin/ambtombar <<EOF

${c}-${a}-sander.out4

mbar-${c}.${aa}

11

0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00

0

EOF

done

/cfs/klemming/projects/snic/teobio/Meiting/program/miniconda3/envs/py_27/bin/python2.7 /cfs/klemming/projects/snic/teobio/Bio/PYMBAR/pymbar/pymbar_alchemical.py mbar-$c 300 0 n 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 >mbar-results-$c

done

Installation of PYMBAR

On COSMOS 8/3-24:
cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar module add GCC/12.2.0 OpenMPI/4.1.4 Python-bundle/3.10.8

Vilhelm writes 20/8-21:

I _pymbar.c står det början under rubriken INSTALLATION INSTRUCTIONS hur man kompilerar, vilket blir följande för linux:

gcc -O3 -lm -fPIC -shared -I(directory with Python.h) -I(directory with numpy/arrayobject.h) -o _pymbar.so _pymbar.c

Efter att kompileringen har lyckats så får man kanske lägga till PYTHONPATH i .bash_profile enligt:

export PYTHONPATH="/path/to/pymbar/:$PYTHONPATH"

on Aurora 18/11-21
cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbarmodule add icc/2016.1.150-GCC-4.9.3-2.25 impi/5.1.2.150 numpy/1.9.2-Python-2.7.10 python setup.py build

After that it worked

on Aurora 24/6-21

cd /lunarc/nobackup/projects/teobio/Bio/PYMBAR/pymbar
module add icc/2015.3.187-GNU-4.9.3-2.25 impi/5.0.3.048 numpy/1.8.2-Python-2.7.11
python setup.py build

[The newest version gave segmentation fault]

Tried to install the latest version of pymbar (3.0.5):
Downloaded it from https://pypi.org/project/pymbar/
But
conda install -c omnia pymbar
and
pip install pymbar
Both failed.

conda install -c conda-forge pymbar
conda install -c conda-forge pymbar -p /lunarc/nobackup/projects/teobio/Bio/PYMBAR/New
module load Anaconda3/2024.02-1

pip install alchemlyb

git clone https://github.com/alchemistry/alchemlyb.git
module add GCCcore/13.3.0 Python/3.12.3
cd Alchemlyb
pip install .