QualStat
QualStat is a simple program that calculates the errors (uncertainties)
of quality measurements, such as MAD, MADtr, r2 and PI. It
does so by a parametric bootstrap procedure, assuming that all the
input data are normally distributed. For data that are not normally
distributed, alternatives exist, but they cannot be handled by this
program.
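The idea of a parametric bootstrap can be illustrated with a short Python sketch. It is not the QualStat source: the function names, the fixed seed and the numbers are made up for illustration, and the sketch resamples only the calculated quantities, ignoring any uncertainty in the expected ones.

```python
import random
import statistics


def mad(calculated, expected):
    """Mean absolute deviation between calculated and expected values."""
    return sum(abs(c - e) for c, e in zip(calculated, expected)) / len(calculated)


def bootstrap_sd(calc, calc_sd, expected, measure=mad, n_samples=1000, seed=0):
    """Parametric bootstrap: redraw every calculated value from a normal
    distribution centred on it, recompute the measure, and return the
    standard deviation of the measure over all bootstrap samples."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    values = []
    for _ in range(n_samples):
        resampled = [rng.gauss(c, s) for c, s in zip(calc, calc_sd)]
        values.append(measure(resampled, expected))
    return statistics.stdev(values)


# Illustrative numbers only, not taken from any real data set.
calc = [-10.2, -8.1, -12.4, -7.9]
calc_sd = [0.5, 0.4, 0.6, 0.3]
expected = [-9.8, -8.5, -11.0, -8.3]

print("MAD =", round(mad(calc, expected), 3))
print("bootstrap SD of MAD =", round(bootstrap_sd(calc, calc_sd, expected), 3))
```

Any other quality measurement can be passed as the measure argument, as long as it takes the calculated and expected values and returns a number.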
The code is located in /away/bio/Qualstat and the executable is located
in /away/bio/Bin/Qualstat. The program is compiled with gfortran using
the command make all.
Samuel Genheden, 2011-2012
Input
The program requires, and will ask for, three inputs:
- A data file that has the following format:
  The first line specifies the number of records, n.
  The next n lines all have the same format and consist
  of three or four free-format columns:
  - The calculated quantity
  - The standard deviation (error) of the calculated quantity
  - The expected quantity
  - The standard deviation (error) of the expected quantity;
    this column is optional
- A factor, fac, by which to scale the errors. The
  errors are scaled by 1/sqrt(fac).
  This makes it possible to give either the standard deviation or
  the standard deviation of the mean of the calculated quantities.
  The standard deviation of the mean is the assumed standard
  deviation of the normal distribution that generated the data. So
  if one specifies the standard deviation in the data file, one
  must be sure to set fac to the number of data points over
  which the average was calculated. Otherwise, one can just press
  enter (and use the default of 1).
- The quality measurements to calculate.
  The quality measurements should be written inside double quotes
  ("). The default is "MAD MADtr r2 tau".
  A common choice is "MAD MADtr r2 tau taux",
  or for FEP "MAD r2 r22 PI RMSD MSD slope slope2 inter taur taurx".
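As a rough illustration of the data file format and of the effect of fac, the following Python sketch builds the text of a data file and applies the 1/sqrt(fac) scaling to one error. The helper names and the numbers are hypothetical. The fixed four-decimal formatting is only a choice made here, since the columns are free-format.

```python
import math


def format_data_file(records):
    """Return the text of a data file: the number of records on the first
    line, then one line of three or four columns per record."""
    lines = [str(len(records))]
    for record in records:
        lines.append(" ".join(f"{value:.4f}" for value in record))
    return "\n".join(lines) + "\n"


def scaled_error(sd, fac=1.0):
    """Scale an error by 1/sqrt(fac), as described for the factor fac."""
    return sd / math.sqrt(fac)


# Hypothetical records: calculated, its SD, expected, SD of expected.
records = [
    (-10.2, 1.5, -9.8, 0.2),
    (-8.1, 1.2, -8.5, 0.2),
    (-12.4, 1.8, -11.0, 0.2),
]
print(format_data_file(records), end="")

# If the SDs above were taken over 25 data points, fac = 25 converts
# them into standard deviations of the mean.
print(scaled_error(1.5, fac=25))
```

If the file already contains standard deviations of the mean, fac stays at its default of 1 and the errors are used unchanged.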
Output
The program has the possibility to calculate the following quality
measurements:
- Mean absolute deviation (MAD)
- Mean absolute deviation, when systematic error is removed
(MADtr)
- Squared Pearson's correlation coefficient, r^2 (r2)
- Predictive index (PI)
- Root mean square deviation (RMSD)
- Mean signed deviation (MSD)
- Median of deviation (Median)
- Mean quota (MQ)
- Q-value (Q)
- Slope of linear regression line with expected quantity as
x-variable (slope)
- Intercept of linear regression line with expected quantity as
x-variable (inter)
- Slope times mean expected (multi)
- Median absolute deviation (AbsMed)
- Kendall's tau (tau)
- MAD from regression line (regMAD)
- Area under ROC curve (ROCar)
- Kendall's tau without insignificant pairs (taux_90)
- Kendall's tau for relative energies (taur)
- Kendall's tau for significant relative energies (taurx_90)
- Squared Pearson's correlation coefficient, r^2,
with the negated data also included (r22)
- Slope of linear regression line with the negated data also included
(slope2)
Each of these measurements is selected by giving the abbreviation shown
in parentheses.
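For orientation, a few of the measurements listed above can be written out in Python using their common textbook definitions. This is only a sketch: the exact definitions used by QualStat are in its source and may differ, for example in the sign convention of MSD or in how Kendall's tau treats ties. In particular, taking the systematic error in MADtr to be the mean signed deviation is an assumption made here.

```python
import math


def mean(values):
    return sum(values) / len(values)


def msd(calc, expected):
    """Mean signed deviation (calculated minus expected)."""
    return mean([c - e for c, e in zip(calc, expected)])


def mad(calc, expected):
    """Mean absolute deviation."""
    return mean([abs(c - e) for c, e in zip(calc, expected)])


def madtr(calc, expected):
    """MAD after subtracting the mean signed deviation, which is
    assumed here to represent the systematic error."""
    shift = msd(calc, expected)
    return mean([abs(c - e - shift) for c, e in zip(calc, expected)])


def rmsd(calc, expected):
    """Root mean square deviation."""
    return math.sqrt(mean([(c - e) ** 2 for c, e in zip(calc, expected)]))


def r2(calc, expected):
    """Squared Pearson correlation coefficient."""
    mc, me = mean(calc), mean(expected)
    cov = sum((c - mc) * (e - me) for c, e in zip(calc, expected))
    var_c = sum((c - mc) ** 2 for c in calc)
    var_e = sum((e - me) ** 2 for e in expected)
    return cov * cov / (var_c * var_e)


def kendall_tau(calc, expected):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs."""
    n = len(calc)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (calc[i] - calc[j]) * (expected[i] - expected[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)
```

A prediction that is off by a constant shift has a non-zero MAD but a MADtr of zero, which is the reason for reporting both.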
For each of these quality measurements, the program will calculate
the following output:
- Biased estimate
- Standard deviation of biased estimate
Verbose input and output
If the program is started with the flag -v (QualStat -v), it
will also ask for:
- The number of bootstrap samples to generate. The default is
1000 samples.
- The experimental uncertainty, unless it is given in the input
data file.
The program will also write out the following additional
statistics:
- 95% confidence interval of biased estimate, based upon a
normal approximation
- Bias
- Unbiased estimate
- 95% confidence interval of unbiased estimate, based upon the
error distribution
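The relation between these statistics can be sketched in Python using the usual bootstrap conventions. The conventions are assumptions, not taken from the QualStat source: the bias is estimated as the bootstrap mean minus the original estimate, the unbiased estimate subtracts that bias, and the normal-approximation interval is the estimate plus or minus 1.96 standard deviations. The interval based on the error distribution is left out because its exact construction is not described here.

```python
import statistics


def bootstrap_summary(estimate, samples):
    """Summarise bootstrap samples of one quality measurement.

    estimate is the value computed from the original data (the biased
    estimate); samples are the values from the bootstrap samples.
    """
    sd = statistics.stdev(samples)
    bias = statistics.mean(samples) - estimate  # assumed convention
    return {
        "biased": estimate,
        "sd": sd,
        "ci95_normal": (estimate - 1.96 * sd, estimate + 1.96 * sd),
        "bias": bias,
        "unbiased": estimate - bias,
    }


# Illustrative numbers only.
summary = bootstrap_summary(1.0, [1.0, 1.2, 1.4])
for name, value in summary.items():
    print(name, value)
```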
The qualstat_series.py utility
qualstat_series.py is a Python script that calculates
the quality measurements for a series of predictions.
The program is run by typing:
qualstat_series.py MEASUREMENTS EXPECTED PREDICTIONS [ERRORS]
- MEASUREMENTS is a string specifying the quality measurements
to calculate, e.g. "MAD MADtr r2 PI"
- EXPECTED is a file with the expected quantities, with optional
uncertainties, one quantity per line.
- PREDICTIONS is a file with the predicted quantities. It should
have as many lines as the EXPECTED file.
The different predictions should be in separate columns.
The uncertainties of the predictions can be specified in this
file by alternating predictions with uncertainties.
Alternatively, they can be specified in an additional file.
- ERRORS is an optional file that contains the uncertainties of
the predictions if they are not specified in PREDICTIONS.
The script creates appropriate input to QualStat for each prediction
and assumes that the specified uncertainty is the standard deviation
of the mean. It prints the biased estimate and the
standard deviation of each quality measurement.
Note: the program contains some more advanced options as
well, but these are experimental and not well tested.
Type qualstat_series.py without any arguments to get a short help text.
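To make the file layout concrete, the following Python sketch shows how a PREDICTIONS file with alternating prediction and uncertainty columns could be split into one QualStat data file per prediction. It is not the code of qualstat_series.py. The function names and the numbers are hypothetical, and the expected quantities are assumed to come without uncertainties.

```python
def parse_columns(text):
    """Parse whitespace-separated numeric columns, one row per line."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def split_predictions(rows):
    """Split rows with alternating prediction/uncertainty columns into
    one (values, errors) pair per prediction series."""
    series = []
    for col in range(0, len(rows[0]), 2):
        values = [row[col] for row in rows]
        errors = [row[col + 1] for row in rows]
        series.append((values, errors))
    return series


def qualstat_input(values, errors, expected):
    """Build the text of a three-column QualStat data file."""
    lines = [str(len(values))]
    for v, s, e in zip(values, errors, expected):
        lines.append(f"{v} {s} {e}")
    return "\n".join(lines) + "\n"


# Illustrative contents of the EXPECTED and PREDICTIONS files.
expected = [row[0] for row in parse_columns("-9.8\n-8.5\n")]
rows = parse_columns("-10.2 0.5 -9.5 0.4\n-8.1 0.4 -8.9 0.3\n")

for values, errors in split_predictions(rows):
    print(qualstat_input(values, errors, expected), end="")
```

Each printed block starts with the number of records, matching the data file format described under Input.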
Technicalities
The program is written in Fortran90 and its compilation has been
tested only with gfortran. The code does not depend on any external
functions that are not part of standard Fortran90.
It is possible, and quite easy, to extend the program, so that it
can calculate other quality measurements. Look in the quality.f90
file for further instructions.
The program can be compiled with:
make clean
make all