IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 28, NO. 11, NOVEMBER 2006, p. 1809
Assessing Classifiers from Two Independent Data Sets Using ROC Analysis: A Nonparametric Approach

Waleed A. Yousef, Member, IEEE, Robert F. Wagner, Fellow, IEEE, and Murray H. Loew, Fellow, IEEE

Abstract—This paper considers binary classification. We assess a classifier in terms of the Area Under the ROC Curve (AUC). We estimate three important parameters: the conditional AUC (conditional on a particular training set) and the mean and variance of this AUC. We also derive a closed-form expression for the variance of the estimator of the AUC. This expression exhibits several components of variance that facilitate an understanding of the sources of uncertainty in that estimate. In addition, we estimate this variance, i.e., the variance of the conditional AUC estimator. Our approach is nonparametric and based on general methods from U-statistics; it addresses the case where the data distribution is neither known nor modeled and where there are only two available data sets, the training and testing sets. Finally, we illustrate some simulation results for these estimators.

Index Terms—Classification, nonparametric statistics, ROC analysis.
1 INTRODUCTION

This paper addresses the problem of assessing binary classifiers in both the mean and the variance. We have previously reported on our investigations of this problem that were based entirely on resampling strategies; see [1] and [2]. The present work provides a more formal treatment of the problem, beginning with a concise review of the elements of classical decision theory.

Consider the binary classification problem where an observation $t_i = (x_i, y_i)$ has the $p$-dimensional feature vector $x_i$ (the predictor) and belongs to the class $y_i$ (the response). In binary classification, the response $y_i$ is either the class $\omega_1$ or the class $\omega_2$. Assume the availability of a training data set $tr = \{t_i : t_i = (x_i, y_i),\ i = 1, \ldots, n_{tr}\}$; this data set is used to learn the probabilistic structure of the problem and to design a classifier (classification rule), denoted here $\eta_{tr}$. For any future observation having a feature vector $x_0$ of unknown class, the classification rule $\eta_{tr}$ is designed to predict the class of the observation, i.e., $\eta_{tr}(x_0)$ equals $\omega_1$ or $\omega_2$. The subscript $tr$ indicates conditioning on the given training data. One type of classification rule produces an estimate $\hat{h}_{tr}(X)$ of the log-likelihood ratio $h(X)$, the ratio between the probability density functions $f_X(x \mid \omega_1)$ and $f_X(x \mid \omega_2)$, and compares it to a threshold $th$:

$$\hat{h}_{tr}(X) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} th.$$
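As a concrete illustration of this threshold rule, the following is a minimal sketch (not the authors' implementation) assuming hypothetical univariate Gaussian class-conditional data: a plug-in estimate of the log-likelihood ratio is fit on a training set, compared to a threshold to classify, and then scored on an independent test set with the nonparametric Mann-Whitney form of the AUC, the fraction of $(\omega_1, \omega_2)$ test pairs that the statistic ranks correctly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: two univariate Gaussian classes omega_1, omega_2.
n_tr = 200
x1 = rng.normal(loc=1.0, scale=1.0, size=n_tr)   # class omega_1
x2 = rng.normal(loc=-1.0, scale=1.0, size=n_tr)  # class omega_2

def gaussian_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Plug-in estimate h_hat_tr of the log-likelihood ratio, conditioned on the
# training set: fit each class-conditional density, then take the log-ratio.
mu1, s1 = x1.mean(), x1.std(ddof=1)
mu2, s2 = x2.mean(), x2.std(ddof=1)

def h_hat_tr(x):
    return gaussian_logpdf(x, mu1, s1) - gaussian_logpdf(x, mu2, s2)

def classify(x, th=0.0):
    """Decide omega_1 (label 1) when h_hat_tr(x) > th, else omega_2 (label 2)."""
    return np.where(h_hat_tr(x) > th, 1, 2)

# Independent test set, and the Mann-Whitney estimate of the AUC:
# the proportion of (omega_1, omega_2) pairs with h_hat_tr(x_1) > h_hat_tr(x_2).
t1 = rng.normal(1.0, 1.0, size=100)
t2 = rng.normal(-1.0, 1.0, size=100)
s1_scores, s2_scores = h_hat_tr(t1), h_hat_tr(t2)
auc = np.mean(s1_scores[:, None] > s2_scores[None, :])
```

Note that this `auc` is an estimate of the conditional AUC, i.e., conditional on the particular training set used to build `h_hat_tr`; repeating the experiment with fresh training sets would produce the variability that the paper's variance decomposition quantifies.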