QualStat

QualStat is a simple program that estimates the errors (uncertainties) of quality measurements, such as MAD, MADtr, r² and PI. It does so with a parametric bootstrap procedure, assuming that all input data are normally distributed. Alternatives exist for data that are not normally distributed, but this program cannot handle them.
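The parametric bootstrap idea can be illustrated with a minimal Python sketch. This is not the QualStat source code: the function names, the example numbers and the choice to report the MAD of the input data together with the standard deviation over the bootstrap samples are assumptions made for illustration only.

```python
import random
import statistics


def mad(calc, expt):
    """Mean absolute deviation between calculated and expected values."""
    return sum(abs(c - e) for c, e in zip(calc, expt)) / len(calc)


def bootstrap_mad(calc, calc_err, expt, expt_err, nboot=1000, seed=1):
    """Return (MAD of the input data, standard deviation over bootstrap samples).

    Each bootstrap sample redraws every value from a normal distribution
    centred on the reported value with the reported standard deviation.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(nboot):
        c = [rng.gauss(m, s) for m, s in zip(calc, calc_err)]
        e = [rng.gauss(m, s) for m, s in zip(expt, expt_err)]
        samples.append(mad(c, e))
    return mad(calc, expt), statistics.stdev(samples)


# Hypothetical example data (not taken from any real study)
calc = [-10.2, -8.1, -12.5, -7.3]
calc_err = [0.4, 0.5, 0.3, 0.6]
expt = [-9.5, -8.8, -11.9, -6.1]
expt_err = [0.2, 0.2, 0.2, 0.2]

estimate, sd = bootstrap_mad(calc, calc_err, expt, expt_err)
print(f"MAD = {estimate:.2f} +/- {sd:.2f}")
```

Because every value is redrawn from its own normal distribution, the spread of the resampled MAD values reflects only the stated uncertainties of the input data.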

The code is located in /away/bio/Qualstat and the executable is located in /away/bio/Bin/Qualstat

The program is compiled with gfortran using the command make all

Samuel Genheden, 2011-2012

Input

The program requires, and will ask for, three inputs:

A data file that has the following format:
The first line specifies the number of records, n
The next n lines all have the same format and consist of three or four free-format columns:
1. The calculated quantity
2. The standard deviation (error) of the calculated quantity
3. The expected quantity
4. The standard deviation (error) of the expected quantity (this column is optional)
A factor, fac, by which to scale the errors. The errors are scaled by 1/sqrt(fac).
This makes it possible to give either the standard deviation or the standard deviation of the mean of the calculated quantities. The standard deviation of the mean is the assumed standard deviation of the normal distribution that generated the data. So if the data file specifies the standard deviation, fac must be set to the number of data points over which the average was calculated. Otherwise, just press enter to use the default of 1.
The quality measurements to calculate
The quality measurements should be written inside double quotes ("). The default is "MAD MADtr r2 tau".
A common choice is "MAD MADtr r2 tau taux"
or for FEP "MAD r2 r22 PI RMSD MSD slope slope2 inter taur taurx"
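The data file layout and the effect of fac can be sketched in Python as follows. The function names, the number formatting and the example values are hypothetical. The sketch only builds the text of a data file and shows the 1/sqrt(fac) scaling. It does not call QualStat.

```python
import math


def format_qualstat_data(records):
    """Format records as text in the layout described above.

    Each record is (calculated, calculated_sd, expected) or
    (calculated, calculated_sd, expected, expected_sd).
    """
    lines = [str(len(records))]  # first line: number of records, n
    for rec in records:
        lines.append(" ".join(f"{x:.4f}" for x in rec))
    return "\n".join(lines) + "\n"


def scaled_error(err, fac=1.0):
    """Scale an error by 1/sqrt(fac), as described for the fac input."""
    return err / math.sqrt(fac)


# Hypothetical records with all four columns
records = [
    (-10.2, 1.2, -9.5, 0.2),
    (-8.1, 1.5, -8.8, 0.2),
    (-12.5, 0.9, -11.9, 0.2),
]
text = format_qualstat_data(records)
print(text, end="")

# A standard deviation of 1.2 obtained from 20 data points corresponds
# to a standard deviation of the mean of 1.2 / sqrt(20).
print(round(scaled_error(1.2, fac=20), 4))
```

With fac left at its default of 1, the errors in the file are used unchanged, which is the right choice when the file already contains standard deviations of the mean.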

Output

The program has the possibility to calculate the following quality measurements:

Mean absolute deviation (MAD)
Mean absolute deviation, when systematic error is removed (MADtr)
Squared Pearson's correlation coefficient r² (r2)
Predictive index (PI)
Root mean square deviation (RMSD)
Mean signed deviation (MSD)
Median of deviation (Median)
Mean quota (MQ)
Q-value (Q)
Slope of linear regression line with expected quantity as x-variable (slope)
Intercept of linear regression line with expected quantity as x-variable (inter)
Slope times mean expected (multi)
Median absolute deviation (AbsMed)
Kendall's tau (tau)
MAD from regression line (regMAD)
Area under ROC curve (ROCar)
Kendall's tau without insignificant pairs (taux_90)
Kendall's tau for relative energies (taur)
Kendall's tau for significant relative energies (taurx_90)
Squared Pearson's correlation coefficient r² with the negated data also included (r22)
Slope of linear regression line with the negated data also included (slope2)

Each of these measurements is selected by giving the abbreviation shown in parentheses.
For each of these quality measurements, the program will calculate the following output:

Biased estimate
Standard deviation of biased estimate
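
A few of the quality measurements listed above can be written down in a short Python sketch using their common textbook definitions. These are not taken from the QualStat source. In particular, the sign convention of MSD (calculated minus expected), the definition of MADtr as the MAD after subtracting the MSD, and the use of the tau-a variant of Kendall's tau are assumptions.

```python
import math


def msd(calc, expt):
    """Mean signed deviation (assumed sign: calculated minus expected)."""
    return sum(c - e for c, e in zip(calc, expt)) / len(calc)


def mad(calc, expt):
    """Mean absolute deviation."""
    return sum(abs(c - e) for c, e in zip(calc, expt)) / len(calc)


def madtr(calc, expt):
    """MAD after subtracting the mean signed deviation (assumed definition)."""
    shift = msd(calc, expt)
    return sum(abs(c - e - shift) for c, e in zip(calc, expt)) / len(calc)


def rmsd(calc, expt):
    """Root mean square deviation."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


def r2(calc, expt):
    """Squared Pearson correlation coefficient."""
    n = len(calc)
    mc, me = sum(calc) / n, sum(expt) / n
    sce = sum((c - mc) * (e - me) for c, e in zip(calc, expt))
    scc = sum((c - mc) ** 2 for c in calc)
    see = sum((e - me) ** 2 for e in expt)
    return sce * sce / (scc * see)


def kendall_tau(calc, expt):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(calc)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (calc[i] - calc[j]) * (expt[i] - expt[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical data: every calculated value is exactly 1 below the expected one
calc = [1.0, 2.0, 3.0, 4.0]
expt = [2.0, 3.0, 4.0, 5.0]
print(msd(calc, expt), mad(calc, expt), madtr(calc, expt))
print(rmsd(calc, expt), r2(calc, expt), kendall_tau(calc, expt))
```

In this example the whole error is systematic, so MAD is nonzero while MADtr vanishes, and both the correlation and the ranking are perfect.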

Verbose input and output

If the program is started with the flag -v (QualStat -v), it will ask for:

The number of bootstrap samples to generate. The default is 1000 samples.
The experimental uncertainty, unless it is given in the input data file.

The program will also write out the following additional statistics:

95% confidence interval of biased estimate, based upon a normal approximation
Bias
Unbiased estimate
95% confidence interval of unbiased estimate, based upon the error distribution
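
The relation between some of these statistics can be sketched in Python using conventional bootstrap definitions: the bias is the mean of the bootstrap values minus the estimate, the unbiased estimate is the estimate minus the bias, and the normal-approximation interval is the estimate plus or minus 1.96 standard deviations. Whether QualStat uses exactly these formulas is an assumption, and the function name and numbers are hypothetical. The interval based upon the error distribution is not reproduced here, since its construction is not described above.

```python
import statistics


def summarize(estimate, boot_values):
    """Summarise bootstrap values with conventional textbook definitions."""
    sd = statistics.stdev(boot_values)
    bias = statistics.mean(boot_values) - estimate
    return {
        "biased": estimate,
        "sd": sd,
        # 95% interval from a normal approximation
        "ci95_normal": (estimate - 1.96 * sd, estimate + 1.96 * sd),
        "bias": bias,
        "unbiased": estimate - bias,
    }


# Hypothetical estimate and a very small set of bootstrap values
result = summarize(0.80, [0.9, 1.1, 0.7, 1.0, 0.8])
for key, value in result.items():
    print(key, value)
```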

The qualstat_series.py utility

qualstat_series.py is a Python script that is useful to calculate the quality measurements for a series of predictions.

The program is run by typing:
qualstat_series.py MEASUREMENTS EXPECTED PREDICTIONS [ERRORS]

MEASUREMENTS is a string specifying the quality measurements to calculate, e.g. "MAD MADtr r2 PI"
EXPECTED is a file with the expected quantities, with optional uncertainties. It has one quantity per line.
PREDICTIONS is a file with the predicted quantities. It should have as many lines as the EXPECTED file.
The different predictions should be in separate columns.
The uncertainties of the predictions can be specified in this file by alternating predictions with uncertainties. Alternatively, they can be specified in an additional file.
ERRORS is an optional file that contains the uncertainties of the predictions if they are not specified in PREDICTIONS.

The script creates appropriate input to QualStat for each prediction and assumes that the specified uncertainty is the standard deviation of the mean. It will print out the biased estimate and standard deviation of each quality measurement.
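
How a series of predictions maps onto QualStat data files can be sketched in Python as follows. This is not the code of qualstat_series.py: the function name is hypothetical, and the sketch handles only the layout in which PREDICTIONS alternates predictions with uncertainties. It works on text rather than files to keep the example self-contained.

```python
def build_inputs(expected_text, predictions_text):
    """Return one QualStat-style data text per prediction.

    expected_text: one expected value per line, optionally followed by
    its uncertainty. predictions_text: one line per expected value, with
    columns alternating as value, uncertainty, value, uncertainty, ...
    """
    expected = [ln.split() for ln in expected_text.splitlines() if ln.strip()]
    predicted = [ln.split() for ln in predictions_text.splitlines() if ln.strip()]
    if len(expected) != len(predicted):
        raise ValueError("PREDICTIONS must have as many lines as EXPECTED")

    npred = len(predicted[0]) // 2  # number of (value, uncertainty) pairs
    inputs = []
    for k in range(npred):
        lines = [str(len(expected))]  # first line: number of records
        for exp, pred in zip(expected, predicted):
            value, error = pred[2 * k], pred[2 * k + 1]
            lines.append(" ".join([value, error] + exp))
        inputs.append("\n".join(lines) + "\n")
    return inputs


# Hypothetical file contents: two expected values, two predictions
expected_text = "-9.5 0.2\n-8.8 0.2\n"
predictions_text = "-10.2 0.4 -9.9 0.3\n-8.1 0.5 -8.6 0.3\n"

inputs = build_inputs(expected_text, predictions_text)
for text in inputs:
    print(text)
```

Each returned text follows the data file format described under Input, so it corresponds to one QualStat run per prediction.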

Note: the program also contains some more advanced options, but these are experimental and not well tested. Type qualstat_series.py without any arguments to get a short help text.

Technicalities

The program is written in Fortran90 and its compilation has been tested only with gfortran. The code does not depend on any external functions that are not part of standard Fortran90.

It is possible, and quite easy, to extend the program so that it can calculate other quality measurements. Look in the quality.f90 file for further instructions.

The program can be compiled by
make clean
make all