1 However, the methods discussed here can be adapted to multi-class problems ... For a multi-class problem with Ncl classes, the confusion matrix will have Ncl² entries.
Graphical Methods for Classifier Performance Evaluation

Maria Carolina Monard and Gustavo E. A. P. A. Batista

University of São Paulo – USP
Institute of Mathematics and Computer Science – ICMC
Department of Computer Science and Statistics – SCE
Laboratory of Computational Intelligence – LABIC
P. O. Box 668, 13560-970, São Carlos, SP, Brazil
{gbatista, mcmonard}@icmc.usp.br
Abstract. Evaluating the performance of classifiers is not as trivial as it might seem at first glance. Even the most widely used methods, such as measuring accuracy or error rate on a test set, have severe limitations. Two of the most prominent limitations of these measures are that they do not consider misclassification costs and that they can be misleading when the classes have very different prior probabilities. In recent years, several researchers have proposed alternative methods to evaluate the performance of learning systems. Some of these methods are based on the graphical evaluation of classifiers. Usually, a graphical evaluation lets the user analyze the performance of a classifier under different scenarios, for instance with different misclassification costs, and select the classifier parameter settings that provide the best result. The objective of this paper is to survey some of the most widely used graphical methods for performance evaluation, which do not rely on precise class and cost distribution information.
1 Introduction
In supervised learning, a set of n training examples is given to an inducer. Each example Ei is a tuple (~xi, yi), where ~xi is a vector of m feature values and yi is the class value. The main objective in supervised learning is to induce a general mapping from the vectors ~x to the class values y. Thus, the inducer should build a model, y = f(~x), of an unknown function f, also known as the concept function, which predicts y values for previously unseen examples. However, in most cases, the number of examples used to induce a model is not sufficient to completely characterize the function f. In fact, inducers are usually only able to induce a function h that approximates f, i.e., h(~x) ≈ f(~x), where h is known as the hypothesis of the concept function f. For classification problems, the y values are drawn from a discrete set of classes C = {C1, C2, . . . , CNcl}, where Ncl is the number of classes. Given a set of training examples, the learning algorithm outputs a classifier such that, given a new unlabelled example, it accurately predicts the label y.
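The setting above can be made concrete with a short sketch. The following is only an illustration, assuming scikit-learn is available; the choice of DecisionTreeClassifier as the inducer, the synthetic data, and all variable names are illustrative assumptions, not part of the original paper.

```python
# Minimal sketch of the supervised learning setting: an inducer receives
# n labeled examples (x_i, y_i) and outputs a hypothesis h approximating
# the unknown concept function f. Assumes scikit-learn; data is synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, m = 200, 5
X = rng.normal(size=(n, m))                 # feature vectors ~x_i with m features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # class labels y_i drawn from C = {0, 1}

# Hold out part of the data to play the role of previously unseen examples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The inducer builds a hypothesis h from the training examples
inducer = DecisionTreeClassifier()
h = inducer.fit(X_train, y_train)

# h predicts class values for the unseen examples
y_pred = h.predict(X_test)
print("accuracy on unseen examples:", np.mean(y_pred == y_test))
```

As the paper argues, the single accuracy figure printed at the end is exactly the kind of summary measure whose limitations motivate the graphical methods surveyed here.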
Assuming the vectors ~x correspond to points in an m-dimensional space,