Independent Validation as a Validation Method for Classification

Authors

  • Tina Braun
  • Hannes Eckert
  • Timo von Oertzen

Abstract

Classifiers provide an alternative to conventional statistical methods. In this approach, the accuracy with which a classifier assigns data to the correct group serves as the basis for statistical tests that compare the performance of classifiers. Conventional validation methods for determining classifier accuracy have the disadvantage that the number of correct classifications does not follow any known distribution, which makes the application of statistical tests problematic. Independent validation circumvents this problem and allows the use of binomial tests to assess the performance of classifiers. However, the accuracy obtained with independent validation is biased for small training datasets. The present study shows that a hyperbolic function can be used to estimate the loss in classifier accuracy under independent validation. This function is used to develop three new methods that estimate the classifier accuracy for small training sets more precisely. These methods are compared with two existing methods in a simulation study. The results show that the errors in the estimated classifier accuracy are small overall and indicate that independent validation can be used with small samples. A least squares estimation approach seems best suited to estimate the classifier accuracy.
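
As an illustration of the kind of binomial test mentioned above, the following minimal Python sketch computes an exact one-sided p-value for the number of correct classifications. It assumes that each test classification is an independent trial with the same success probability. The function name, the chance level of 0.5, and the counts in the usage example are hypothetical and are not taken from the study.

```python
from math import comb


def binomial_tail_p_value(n_correct: int, n_total: int, chance_level: float = 0.5) -> float:
    """Exact one-sided binomial p-value.

    Returns P(X >= n_correct) for X ~ Binomial(n_total, chance_level),
    i.e. the probability of observing at least this many correct
    classifications if the true accuracy equalled the chance level.
    """
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    if not 0.0 <= chance_level <= 1.0:
        raise ValueError("chance_level must lie between 0 and 1")
    return sum(
        comb(n_total, k) * chance_level**k * (1.0 - chance_level) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical example: 70 of 100 test points classified correctly,
# tested against a chance level of 0.5 for two equally likely groups.
p_value = binomial_tail_p_value(n_correct=70, n_total=100, chance_level=0.5)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value suggests that the classifier performs better than chance. The test is only valid when the correct classifications can be treated as independent trials, which is the property that independent validation is meant to provide.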