Proactive Learning with Multiple Class-Sensitive Labelers


Seungwhan (Shane) Moon, Jaime Carbonell School of Computer Science, Carnegie Mellon University DSAA 2014 Conference 10/30/2014

Unlabeled Data is Abundant

• Imagine building a Vehicle classifier
• Scarcity of labeled data

Active Learning

Query Strategies

• Uncertainty Sampling
• Query by Committee
• Entropy-Based Sampling
• Density-Weighted Methods
• and more …

Uncertainty Sampling

[Figure: Label 1, Label 2, and unlabeled points shown against the current decision boundary; the unlabeled point closest to the boundary is the most uncertain and is the one queried.]
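A minimal sketch of this query strategy in the least-confidence form, assuming a scikit-learn-style classifier that exposes predict_proba (the function name and interface are illustrative, not from the slides):

```python
import numpy as np

def least_confidence_query(model, X_pool):
    """Least-confidence uncertainty sampling: return the index of the
    unlabeled point whose top predicted class probability is lowest,
    i.e. the point closest to the current decision boundary."""
    proba = model.predict_proba(X_pool)  # shape: (n_samples, n_classes)
    confidence = proba.max(axis=1)       # top-class probability per point
    return int(np.argmin(confidence))    # index of the most uncertain point
```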

Assumptions in Traditional Active Learning

• Annotator(s) always give perfect answers (an oracle)
• There is no difference in cost for querying different annotators

Proactive Learning [Carbonell et al.]

• Relaxes the following assumptions:
  • Only a single annotator gives labels
  • Annotators always give perfect answers
  • Annotators are insensitive to costs → utility optimization under a budget constraint

Proactive Learning [Carbonell et al.]

• Multiple annotators, who have different labeling accuracies (expertise) and incur different costs

Proactive Learning [Carbonell et al.]

Key Component: Estimating Labeler Accuracy

• P(ans | x, k): the probability of getting a right answer for an unlabeled instance x from an expert k

Limitation in the previous proactive learning literature:
• Labeler accuracy is assumed to be independent of the label in multi-class problems

Proactive Learning with Multiple Domain Experts: Motivating Analogy

• Diagnosis of a patient with an unknown disease (uncertainty in the data)
• Given multiple physicians with different specializations (multiple class-sensitive experts)
• If we know the patient has seemingly cancer-like symptoms (posterior class probability)
• And that an oncologist treats cancer issues (estimated labeler accuracy given a specific class)
• → Better to delegate the task to its respective expert

Proactive Learning with Multiple Domain Experts

Problem Formulation (Objective):

max_S Σ_{(x,k) ∈ S} V(x) · P(ans | x, k)   s.t.   Σ_{(x,k) ∈ S} C_k ≤ B

where V(x) is the information value of instance x, P(ans | x, k) is the probability that expert k gives the right answer for x, and C_k is the cost of querying expert k under a total budget B. The combinatorial objective is solved via a greedy approximation.

Proactive Learning with Multiple Domain Experts

Utility Criteria for Greedy Approximation

Jointly optimize for an instance-expert pair which:
• has a high information value V(x) (instance)
• has a high probability of getting the right answer, P(ans | x, k) (both)
• has a low cost of annotation, C_k (expert)
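As a rough sketch of the greedy step, one way to combine these three criteria is a cost-normalized utility, utility(x, k) = V(x) · P(ans | x, k) / C_k; the exact functional form used in the paper may differ, and the precomputed-array interface below is an assumption of this sketch:

```python
import numpy as np

def greedy_select(V, p_ans, cost, budget_left):
    """Greedy step: pick the (instance, expert) pair maximizing
    utility = V(x) * P(ans | x, k) / C_k among affordable experts.

    V:      (n,)   information value of each unlabeled instance
    p_ans:  (n, m) estimated P(ans | x, k) per instance/expert pair
    cost:   (m,)   per-query cost of each expert
    """
    utility = V[:, None] * p_ans / cost[None, :]  # (n, m) utility matrix
    utility[:, cost > budget_left] = -np.inf      # mask unaffordable experts
    x_idx, k_idx = np.unravel_index(np.argmax(utility), utility.shape)
    return x_idx, k_idx
```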

Expert Estimation: Estimating Expertise of Labeling Sources

Per-Class Reduced Estimation:

P(ans | x, k) = Σ_{c ∈ C} P(y = c | x) · P(ans | c, k)

where P(y = c | x) is the class posterior probability of the label for sample x being c, the sum runs over the set of categories C, and P(ans | c, k) is the estimated probability of expert k answering correctly for label c.
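In code, the per-class reduction is a single matrix product between the model's class posteriors and the per-class expert accuracies (the array names here are illustrative):

```python
import numpy as np

def answer_probability(class_posterior, per_class_accuracy):
    """Per-class reduced estimation:
        P(ans | x, k) = sum_c P(y = c | x) * P(ans | c, k)

    class_posterior:    (n, C) model posteriors P(y = c | x)
    per_class_accuracy: (C, m) estimated P(ans | c, k) per class/expert
    Returns the (n, m) matrix of P(ans | x, k).
    """
    return class_posterior @ per_class_accuracy
```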

Density-Based Sampling for Multi-class Classification Tasks

[Figure: Label 1, Label 2, Label 3, and unlabeled points shown against the current decision boundaries.]

Density-Based Sampling for Multi-class Classification Tasks

Def: Multi-class Information Density (MCID), a final value function combining:
(1) Density
(2) Unknownness
(3) Conflictivity
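The slide names the three components without their formulas, so the concrete forms below — GMM likelihood for density, one minus the top posterior for unknownness, the inverted margin between the top two posteriors for conflictivity, and a plain product as the final value function — are assumptions of this sketch, not the paper's definitions:

```python
import numpy as np

def mcid_value(density, posterior):
    """Hypothetical MCID-style value function (assumed component forms,
    not the paper's exact definitions).

    density:   (n,)   density of each unlabeled point, e.g. from a GMM
    posterior: (n, C) current model's class posteriors P(y = c | x)
    """
    sorted_p = np.sort(posterior, axis=1)
    unknownness = 1.0 - sorted_p[:, -1]                        # low top-class confidence
    conflictivity = 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])  # top two labels compete
    return density * unknownness * conflictivity               # combine all three terms
```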

Density-Based Sampling for Multi-class Classification Tasks

Induce density using a Gaussian Mixture Model:
• Each mixture component shares the same variance
• Estimated via an EM procedure
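A minimal sketch of this density estimate using scikit-learn's GaussianMixture, where covariance_type="tied" makes every component share one covariance and fitting runs EM, matching the slide; the component count is illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_density(X_unlabeled, n_components=10):
    """Fit a GMM whose components share a single covariance matrix
    (fitted by EM under the hood) and score each point's density."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="tied",  # shared variance across components
                          random_state=0)
    gmm.fit(X_unlabeled)
    return np.exp(gmm.score_samples(X_unlabeled))  # log p(x) -> p(x)
```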

So Far

• A new proactive learning algorithm for multiple domain experts
• Multi-class Information Density (MCID) as a query strategy

Experiments: Datasets

• Simulated noisy labelers (except for the Diabetes dataset)
• Narrow experts: classifiers trained over a partially noised dataset (expertise in only a subset of classes)
• Meta expert: a classifier trained over the entire dataset

Baselines

• Best Avg: the learner always asks the narrow expert with the highest average P(ans | x, k)
• Meta: the learner always asks the meta-oracle (expensive)
• BestAvg+Meta: joint optimization under a uniform reliability assumption (Donmez et al., 2012)
• *Narrow: joint optimization using our algorithm
• *Narrow+Meta: our algorithm with a meta-oracle available as well

Classification Performance over Iterations

Cost ratio of narrow vs. meta experts: 1:6

Classification Performance for Different Cost Ratios

On Other Datasets

Classification Performance vs. Budget Allocated for Expertise Estimation

• Works both when ground-truth samples are available and when expertise is estimated via majority votes
• Estimates expertise well enough with ~10% of the budget
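A sketch of the majority-vote variant: spend a small slice of the budget querying every expert on the same probe instances, treat the majority label as pseudo ground truth, and tally per-class agreement. The probe-set mechanics and the 0.5 default for classes never seen in the probes are assumptions of this sketch:

```python
import numpy as np

def estimate_expertise_majority_vote(expert_labels, n_classes):
    """expert_labels: (n_probe, m) integer label given by each of m
    experts on each probe instance. Returns a (C, m) estimate of
    P(ans | c, k) using the majority vote as pseudo ground truth."""
    consensus = np.array([np.bincount(row, minlength=n_classes).argmax()
                          for row in expert_labels])
    acc = np.full((n_classes, expert_labels.shape[1]), 0.5)  # assumed prior
    for c in range(n_classes):
        mask = consensus == c
        if mask.any():
            acc[c] = (expert_labels[mask] == c).mean(axis=0)
    return acc
```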

Conclusions

• A new proactive learning algorithm with multiple class-sensitive labelers that performs better than the baselines
• Efficient estimation of experts' expertise via the reduced per-class method
• Multi-class Information Density (MCID) as a new active learning criterion for noisy multi-class active learning

Future Work

• Theoretical min-max bounds for the proposed algorithm under different expert reliabilities and costs
• Extending the framework to a crowdsourcing scenario with a larger pool of experts

Proactive Learning with Multiple Class-Sensitive Labelers

Seungwhan Moon, Jaime Carbonell
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
DSAA 2014 Conference, 10/30/2014

MCID Performance


Performance when expertise was estimated via Majority Vote

Proactive Learning Algorithm


Expertise Estimation

