Monday, July 22, 2013

Performing Statistical Hypothesis Test in C++ using AlgLilb: Student's t-test and Wilcoxon

Sometimes when comparing the performance of two algorithms such as in control, optimization, machine learning, etc. the comparison is done by running a number of simulations run on a set of benchmark problems for these algorithms, the statistical performance metrics are then derived from these simulation runs to compare their performances. However, it is usually not sufficient to claim one algorithm/method is better simply based on the average values of the performance metrics. In other words, performance comparison should also consider statistical hypothesis tests such as Student's t-test and Wilcoxon. Details of these methods can be found here:

https://en.wikipedia.org/wiki/Student's_t-test
http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test

This post is about how to do statistical hypothesis test in C++ using alglib

Step 1: Download AlgLib

Download the AlgLib from the following link:

http://www.4shared.com/zip/ZWXFztx-/alglib-250cpp.html

Step 2: Add AlgLib to C++ project

In this case, I am using VS2008 C++ IDE, unzip the downloaded AlgLib to the project solution folder, and add it to the C++ project by the following properties configuration:

1) Properties->Configuration Properties->C++->General->Additional Include Directories->$(ProjectDir)alglib-2.5.0.cpp\out
2) Properties->Configuration Properties->Linker->General->Additional Library Directories->$(ProjectDir)alglib-2.5.0.cpp\out
3) Properties->Configuration Properties->Linker->Input->libalglib.lib

Step 3: Student's t-test in C++
Suppose you implement your code in a source file main.cpp, define the Student's t-test as shown below in

#include "studentttests.h"
//data1: vector containing simulation results of a performance metric (say MetricA) for algorithm 1
//data2: vector containing simulation results of a performance metric (say MetricA) for algorithm 2
//if left-tail is less than the confidence threshold, left-tail rejected, and we have MetricA (algorithm 1) > MetricA (algorithm 2)
//if right-tail is less than the confidence threshold, right-tail rejected, and we have MetricA (algorithm 1) < MetricA (algorithm 2)
void ComputeStudentT(const std::vector<double>& data1, const std::vector<double>& data2, double& bothtails, double& lefttail, double& righttail)
{
 if(data1.empty() || data2.empty())
 {
  return;
 }

 ap::real_1d_array x;
 ap::real_1d_array y;
 
 int n=static_cast<int>(data1.size());
 x.setlength(n);
 for(int i = 0; i != n; i++)
 {
  x(i) = data1[i];
 }

 int m=static_cast<int>(data2.size());
 y.setlength(m);
 for(int i=0; i != m; ++i)
 {
  y(i)=data2[i];
 }

 /*************************************************************************
 Two-sample unpooled test

 This test checks three hypotheses about the mean of the given samples. The
 following tests are performed:
  * two-tailed test (null hypothesis - the means are equal)
  * left-tailed test (null hypothesis - the mean of the first sample  is
    greater than or equal to the mean of the second sample)
  * right-tailed test (null hypothesis - the mean of the first sample is
    less than or equal to the mean of the second sample).

 Test is based on the following assumptions:
  * given samples have normal distributions
  * samples are independent.
 Dispersion equality is not required

 Input parameters:
  X - sample 1. Array whose index goes from 0 to N-1.
  N - size of the sample.
  Y - sample 2. Array whose index goes from 0 to M-1.
  M - size of the sample.

 Output parameters:
  BothTails   -   p-value for two-tailed test.
      If BothTails is less than the given significance level
      the null hypothesis is rejected.
  LeftTail    -   p-value for left-tailed test.
      If LeftTail is less than the given significance level,
      the null hypothesis is rejected.
  RightTail   -   p-value for right-tailed test.
      If RightTail is less than the given significance level
      the null hypothesis is rejected.

   -- ALGLIB --
   Copyright 18.09.2006 by Bochkanov Sergey
 *************************************************************************/
 
 unequalvariancettest(x, n, y, m, bothtails, lefttail, righttail);
}

For ComputeStudentT() method, the parameter data1 is a vector containing simulation results of a performance metric (say MetricA) for algorithm 1, which is obtained from simulation runs on a benchmark problem (suppose there are 30 simulation runs, then data1 is a vector of length 30), while data2 is a vector containing results of MetricA for algorithm 2, which is obtained from simulation runs on the same benchmark problem.

Below shows how one can use the CompareStudentT() in the coding

RunSimulationsToObtainMetricAForAlgorithm1();
RunSimulationsToObtainMetricAForAlgorithm2();

std::vector<double> data1;
LoadMetricAForAlgorithm1IntoVector(data1);

std::vector<double> data2;
LoadmetricAForAlgorithm2IntoVector(data2);

double bothtails=0, lefttail=0, righttail=0;

double p_threshold=0.05; //set p threshold to 0.05 for 95% confidence level

ComputeStudentT(data1, data2, bothtails, lefttail, righttail);

/*
* two-tailed test (null hypothesis - the means are equal)
* left-tailed test (null hypothesis - the mean of the first sample  is
  greater than or equal to the mean of the second sample)
* right-tailed test (null hypothesis - the mean of the first sample is
  less than or equal to the mean of the second sample).
*/
if(bothtails < p_threshold)
{
 //null hypothesis rejected, the mean of data1 is either greater or less than tat of data2
 if(lefttail < p_threshold && righttail > p_threshold)
 {
  std::cout << "The true mean of MetricA(algorithm1) is smaller than tat of MetricA(algorithm2)" << std::endl;
 }
 else if(lefttail > p_threshold && righttail < p_threshold)
 {
  std::cout << "The true mean of MetricA(algorithm1) is greater than tat of MetricA(algorithm2)" << std::endl;
 }
 else
 {
  std::cerr << "error: t stat failed" << std::endl;
  exit(0);
 }
}


Step 4: Wilcoxon test in C++ 

Below shows the Wilcoxon test method in C++, the interface and usage of ComputeWilcoxon() method is same as ComputeStudentT() method

#include "wsr.h"

//if left-tail is less than the confidence threshold, left-tail rejected, and we have data1 > data2
//if right-tail is less than the confidence threshold, right-tail rejected, and we have data2 < data1
void ComputeWilcoxon(const std::vector<double>& data1, const std::vector<double<& data2, double& bothtails, double& lefttail, double& righttail)
{
 if(data1.empty() || data2.empty())
 {
  return;
 }

 ap::real_1d_array x;
 ap::real_1d_array y;
 
 int n=static_cast<int>(data1.size());
 int m=static_cast<int>(data2.size());

 if(n > m)
 {
  n=m;
 }

 x.setlength(n);
 for(int i = 0; i != n; i++)
 {
  x(i) = (data1[i] - data2[i]);
 }

 double assumed_median=0; //the given value

 /*************************************************************************
 Wilcoxon signed-rank test

 This test checks three hypotheses about the median  of  the  given sample.
 The following tests are performed:
  * two-tailed test (null hypothesis - the median is equal to the  given
    value)
  * left-tailed test (null hypothesis - the median is  greater  than  or
    equal to the given value)
  * right-tailed test (null hypothesis  -  the  median  is  less than or
    equal to the given value)

 Requirements:
  * the scale of measurement should be ordinal, interval or  ratio (i.e.
    the test could not be applied to nominal variables).
  * the distribution should be continuous and symmetric relative to  its
    median.
  * number of distinct values in the X array should be greater than 4

 The test is non-parametric and doesn't require distribution X to be normal

 Input parameters:
  X       -   sample. Array whose index goes from 0 to N-1.
  N       -   size of the sample.
  Median  -   assumed median value.

 Output parameters:
  BothTails   -   p-value for two-tailed test.
      If BothTails is less than the given significance level
      the null hypothesis is rejected.
  LeftTail    -   p-value for left-tailed test.
      If LeftTail is less than the given significance level,
      the null hypothesis is rejected.
  RightTail   -   p-value for right-tailed test.
      If RightTail is less than the given significance level
      the null hypothesis is rejected.

 To calculate p-values, special approximation is used. This method lets  us
 calculate p-values with two decimal places in interval [0.0001, 1].

 "Two decimal places" does not sound very impressive, but in  practice  the
 relative error of less than 1% is enough to make a decision.

 There is no approximation outside the [0.0001, 1] interval. Therefore,  if
 the significance level outlies this interval, the test returns 0.0001.

   -- ALGLIB --
   Copyright 08.09.2006 by Bochkanov Sergey
 *************************************************************************/
 wilcoxonsignedranktest(x, n, assumed_median, bothtails, lefttail, righttail);
}


No comments:

Post a Comment