Performance Evaluation in Machine Learning

Performance evaluation is an important but complex part of the machine learning process. It must be conducted carefully if the application of machine learning to radiation oncology or other domains is to be reliable. This chapter introduces the problem and discusses some of the most commonly used techniques for addressing it. The focus is on the three main subtasks of evaluation: measuring performance, resampling the data, and assessing the statistical significance of the results. For the first subtask, the chapter discusses several confusion matrix-based measures (accuracy, precision, recall or sensitivity, and false alarm rate) as well as receiver operating characteristic (ROC) analysis. For the second subtask, it covers several error estimation or resampling techniques from the cross-validation family, as well as bootstrapping. For the third subtask, it covers a number of nonparametric statistical tests, including McNemar’s test, Wilcoxon’s signed-rank test, and Friedman’s test. The chapter concludes with a discussion of the limitations of the evaluation process.
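As a rough illustration of the confusion matrix-based measures named above, the sketch below computes them from the four cells of a binary confusion matrix using their standard textbook definitions. The function name `confusion_matrix_measures` and the example counts are hypothetical and are not taken from the chapter.

```python
import math


def confusion_matrix_measures(tp, fp, tn, fn):
    """Compute common measures from binary confusion matrix counts.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives. A measure whose denominator
    is zero is reported as NaN (an assumption made for this sketch).
    """

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else math.nan

    return {
        # Fraction of all examples that were classified correctly.
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        # Fraction of predicted positives that are truly positive.
        "precision": ratio(tp, tp + fp),
        # Fraction of true positives that were found (sensitivity).
        "recall": ratio(tp, tp + fn),
        # Fraction of true negatives wrongly flagged as positive.
        "false_alarm_rate": ratio(fp, fp + tn),
    }


# Illustrative counts only; they do not come from the chapter.
measures = confusion_matrix_measures(tp=40, fp=10, tn=45, fn=5)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the classifier looks accurate overall, yet precision and recall differ, which is one reason a single measure is rarely sufficient on its own.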

Notes

Please note that there is an error in the textbook. We present, herein, the corrected solution.

Author information

Authors and Affiliations

  1. Nathalie Japkowicz PhD, School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada
  2. Mohak Shah PhD, Research and Technology Center - North America, Robert Bosch LLC, Palo Alto, CA, USA