IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 28, NO. 11, NOVEMBER 2006, p. 1809
Assessing Classifiers from Two Independent Data Sets Using ROC Analysis: A Nonparametric Approach

Waleed A. Yousef, Member, IEEE, Robert F. Wagner, Fellow, IEEE, and Murray H. Loew, Fellow, IEEE

Abstract—This paper considers binary classification. We assess a classifier in terms of the Area Under the ROC Curve (AUC). We estimate three important parameters: the conditional AUC (conditional on a particular training set) and the mean and variance of this AUC. We also derive a closed-form expression for the variance of the estimator of the AUC. This expression exhibits several components of variance that facilitate an understanding of the sources of uncertainty in that estimate. In addition, we estimate this variance, i.e., the variance of the conditional AUC estimator. Our approach is nonparametric and based on general methods from U-statistics; it addresses the case where the data distribution is neither known nor modeled and where there are only two available data sets, the training and testing sets. Finally, we illustrate some simulation results for these estimators.

Index Terms—Classification, nonparametric statistics, ROC analysis.
1 INTRODUCTION

This paper addresses the problem of assessing binary classifiers in both the mean and the variance. We have previously reported on our investigations of this problem that were based entirely on resampling strategies; see [1] and [2]. The present work provides a more formal treatment of the problem, beginning with a concise review of the elements of classical decision theory.

Consider the binary classification problem where an observation $t_i = (x_i, y_i)$ has the $p$-dimensional feature vector $x_i$ (the predictor) and belongs to the class $y_i$ (the response). In binary classification, the response $y_i$ is either the class $\omega_1$ or the class $\omega_2$. Assume the availability of a training data set $tr = \{t_i : t_i = (x_i, y_i),\ i = 1, \ldots, n_{tr}\}$; this data set is used to learn the probabilistic structure of the problem and to design a classifier (classification rule), denoted here $\eta_{tr}$. For any future observation having a feature vector $x_0$ of unknown class, the classification rule $\eta_{tr}$ is designed to predict the class of the observation, i.e., $\eta_{tr}(x_0)$ equals $\omega_1$ or $\omega_2$. The subscript $tr$ indicates conditioning on the given training data. One type of classification rule produces an estimate $\hat{h}_{tr}(X)$ of the log-likelihood ratio $h(X)$, the ratio between the probability density functions $f_X(x \mid \omega_1)$ and $f_X(x \mid \omega_2)$, and compares it to a threshold $th$:

$$\hat{h}_{tr}(X) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} th.$$
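As a concrete illustration of this threshold rule, the following is a minimal sketch (not the authors' implementation) assuming hypothetical univariate Gaussian class-conditional data: a plug-in estimate of the log-likelihood ratio is fit on a training set, compared to a threshold to classify, and then scored on an independent test set with the nonparametric Mann-Whitney form of the AUC, the fraction of $(\omega_1, \omega_2)$ test pairs that the statistic ranks correctly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: two univariate Gaussian classes omega_1, omega_2.
n_tr = 200
x1 = rng.normal(loc=1.0, scale=1.0, size=n_tr)   # class omega_1
x2 = rng.normal(loc=-1.0, scale=1.0, size=n_tr)  # class omega_2

def gaussian_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Plug-in estimate h_hat_tr of the log-likelihood ratio, conditioned on the
# training set: fit each class-conditional density, then take the log-ratio.
mu1, s1 = x1.mean(), x1.std(ddof=1)
mu2, s2 = x2.mean(), x2.std(ddof=1)

def h_hat_tr(x):
    return gaussian_logpdf(x, mu1, s1) - gaussian_logpdf(x, mu2, s2)

def classify(x, th=0.0):
    """Decide omega_1 (label 1) when h_hat_tr(x) > th, else omega_2 (label 2)."""
    return np.where(h_hat_tr(x) > th, 1, 2)

# Independent test set, and the Mann-Whitney estimate of the AUC:
# the proportion of (omega_1, omega_2) pairs with h_hat_tr(x_1) > h_hat_tr(x_2).
t1 = rng.normal(1.0, 1.0, size=100)
t2 = rng.normal(-1.0, 1.0, size=100)
s1_scores, s2_scores = h_hat_tr(t1), h_hat_tr(t2)
auc = np.mean(s1_scores[:, None] > s2_scores[None, :])
```

Note that this `auc` is an estimate of the conditional AUC, i.e., conditional on the particular training set used to build `h_hat_tr`; repeating the experiment with fresh training sets would produce the variability that the paper's variance decomposition quantifies.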