CSE342/542 - Lecture 2, January 10, 2013
Slides from Bebis and other sources
Probability Review
• Prior probability – P(Face) = 0.1 means that "in the absence of any other information, there is a 10% chance that any image contains a face".
• Posterior probability – P(Face | Eyes) = 0.95 means that "there is a 95% chance that the image contains a face, given that it has eyes".
Probability Review
• Conditional probabilities can be defined in terms of unconditional probabilities:

  P(A | B) = P(A, B) / P(B),   P(B | A) = P(B, A) / P(A)
• Conditional probabilities lead to the chain rule:

  P(A, B) = P(A | B) P(B) = P(B | A) P(A)
Probability Review
• Law of Total Probability – If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then

  P(B) = P(B | A1) P(A1) + P(B | A2) P(A2) + … + P(B | An) P(An)
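As a quick sanity check of the identities on this and the previous slide, here is a minimal sketch (not from the slides) using a made-up joint distribution over two binary events A and B:

```python
# Toy joint probabilities P(A=a, B=b); the values are made up and must sum to 1.
joint = {(1, 1): 0.08, (1, 0): 0.02, (0, 1): 0.32, (0, 0): 0.58}
assert abs(sum(joint.values()) - 1.0) < 1e-12

P_B = joint[(1, 1)] + joint[(0, 1)]   # marginal P(B)
P_A_given_B = joint[(1, 1)] / P_B     # P(A | B) = P(A, B) / P(B)

# Chain rule: P(A, B) = P(A | B) P(B)
assert abs(P_A_given_B * P_B - joint[(1, 1)]) < 1e-12

# Law of total probability over the partition {A, not A}:
# P(B) = P(B | A) P(A) + P(B | not A) P(not A)
P_A = joint[(1, 1)] + joint[(1, 0)]
P_B_given_A = joint[(1, 1)] / P_A
P_B_given_notA = joint[(0, 1)] / (1 - P_A)
assert abs(P_B_given_A * P_A + P_B_given_notA * (1 - P_A) - P_B) < 1e-12
print(P_A_given_B)  # 0.2
```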
Bayes' Theorem
• Conditional probabilities lead to Bayes' rule:

  P(A | B) = P(B | A) P(A) / P(B)
• Example: consider the probability of Disease given Symptom:

  P(Disease | Symptom) = P(Symptom | Disease) P(Disease) / P(Symptom)

  where P(Symptom) = P(Symptom | Disease) P(Disease) + P(Symptom | NoDisease) P(NoDisease)
Bayes' Theorem
• Meningitis causes a stiff neck 50% of the time.
• A patient comes in with a stiff neck – what is the probability that he has meningitis?
• Need to know two things:
  – The prior probability of a patient having meningitis (1/50,000)
  – The prior probability of a patient having a stiff neck (1/20)

  P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002
Bayes' Theorem
• If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then Bayes' rule is given by:

  P(Ai | B) = P(B | Ai) P(Ai) / [ P(B | A1) P(A1) + … + P(B | An) P(An) ]
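A minimal sketch of the partition form, not from the slides: the function name `bayes_posterior` and the three-class numbers are illustrative assumptions.

```python
def bayes_posterior(priors, likelihoods):
    """P(Ai | B) = P(B | Ai) P(Ai) / sum_j P(B | Aj) P(Aj)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(B)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Toy 3-class example (made-up priors P(Ai) and likelihoods P(B | Ai)).
posterior = bayes_posterior(priors=[0.5, 0.3, 0.2],
                            likelihoods=[0.10, 0.60, 0.30])
print(posterior)       # ~[0.172, 0.621, 0.207]
print(sum(posterior))  # 1.0 -- posteriors over a partition sum to one
```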
Random Variables
• A discrete random variable X is one that can assume only a finite or countably infinite number of distinct values.
• A continuous random variable is one that can assume an uncountably infinite number of values (e.g., any value in an interval).
• The collection of probabilities associated with the different values of a random variable is called the probability distribution of the random variable.
Random Variables
• For a discrete random variable, the probability distribution is called the probability mass function (pmf).
• For a continuous random variable, it is called the probability density function (pdf).
Random Variables
• For n random variables, the joint pmf assigns a probability to each possible combination of values:

  p(x1, x2, …, xn) = P(X1 = x1, X2 = x2, …, Xn = xn)

• Specifying the joint pmf requires an enormous number of values – k^n for n random variables where each one can assume one of k discrete values.
Random Variables
• From a joint probability, we can compute the probability of any subset of the variables by marginalization:
  – Example – case of a joint pmf:  P(X = x) = Σ_y p(x, y)
  – Example – case of a joint pdf:  p(x) = ∫ p(x, y) dy
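A minimal marginalization sketch, not from the slides: a made-up 2 × 2 joint pmf stored as a NumPy array, with marginals obtained by summing over the other variable's axis (for a pdf the sum becomes an integral).

```python
import numpy as np

p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.45]])   # rows index x, columns index y
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1)            # P(X = x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)            # P(Y = y) = sum_x p(x, y)
print(p_x, p_y)                   # [0.3 0.7] [0.35 0.65]
```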
Expectation
• Expected value of a function g(X):

  E[g(X)] = Σ_x g(x) p(x)      (discrete case)
  E[g(X)] = ∫ g(x) p(x) dx     (continuous case)
Variance
• The variance of a random variable X is defined as

  Var(X) = E[(X − μ)^2] = E[X^2] − μ^2,  where μ = E[X]

• The sample variance for a random variable X is defined as

  s^2 = (1 / (N − 1)) Σ_{i=1}^{N} (xi − x̄)^2
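The sketch below, not from the slides, illustrates expectation and variance on a made-up three-point pmf, then checks the sample variance of data drawn from it:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # values of a discrete random variable
p = np.array([0.2, 0.5, 0.3])   # made-up pmf

E_x = np.sum(x * p)             # E[X]
E_x2 = np.sum((x ** 2) * p)     # E[g(X)] with g(x) = x^2
var_x = E_x2 - E_x ** 2         # Var(X) = E[X^2] - (E[X])^2
print(E_x, var_x)               # 1.1 0.49

rng = np.random.default_rng(0)
sample = rng.choice(x, size=10_000, p=p)
print(sample.var(ddof=1))       # sample variance (1/(N-1) form), close to 0.49
```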
Covariance and Correlation
• Covariance of two random variables:

  Cov(X, Y) = E[(X − μX)(Y − μY)] = E[XY] − μX μY

• Correlation coefficient between two random variables:

  ρ(X, Y) = Cov(X, Y) / (σX σY)

• The sample covariance matrix is given by

  S = (1 / (N − 1)) Σ_{i=1}^{N} (xi − x̄)(xi − x̄)^T
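A minimal sketch, not from the slides, estimating these quantities with NumPy on made-up data where y depends linearly on x:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)   # y correlated with x by construction

cov = np.cov(x, y)                    # 2x2 sample covariance matrix
rho = np.corrcoef(x, y)[0, 1]         # sample correlation coefficient
print(cov)                            # diagonal ~ [1, 5], off-diagonal ~ 2
print(rho)                            # close to 2/sqrt(5) ~ 0.89
```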
Covariance
• Covariance matrix of two random variables:

  Σ = [ Var(X)      Cov(X, Y) ]
      [ Cov(Y, X)   Var(Y)    ]

• Covariance matrix of n random variables: the n × n matrix whose (i, j) entry is Cov(Xi, Xj), so the diagonal holds the variances.
Correlation
• X and Y are uncorrelated if Cov(X, Y) = 0 (equivalently, ρ(X, Y) = 0).
• If n random variables are independent, then their joint pmf/pdf factorizes into the product of the marginals, and their covariance matrix is diagonal.
• Note that if X and Y are independent, then their correlation coefficient is zero, but not all uncorrelated variables are independent.
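The classic counterexample behind the last bullet, shown as a small sketch (not from the slides): with X symmetric about 0 and Y = X², Cov(X, Y) = E[X³] − E[X]E[X²] = 0, yet Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=100_000)  # symmetric about 0
y = x ** 2                                # a deterministic function of x

print(np.corrcoef(x, y)[0, 1])  # close to 0: uncorrelated...
# ...but clearly not independent, since knowing x fixes y exactly.
```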
Evaluation Metrics
Let the problem statement be: classifying between dogs and cats. Dogs are labeled as the positive class and cats are labeled as the negative class.

                     Predicted Negative     Predicted Positive
  Actual Negative    A (true negative)      C (false positive)
  Actual Positive    D (false negative)     B (true positive)

  Term             Meaning                  Example
  True positive    Correctly identified     Dog identified as dog
  False positive   Incorrectly identified   Cat identified as dog
  True negative    Correctly rejected       Cat identified as cat
  False negative   Incorrectly rejected     Dog identified as cat

http://www.cs.rpi.edu/~leen/misc-publications/SomeStatDefs.html
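A minimal sketch, not from the slides, that tallies the four cells for the dogs-vs-cats setup (dog = 1 for positive, cat = 0 for negative); the function name and label lists are illustrative assumptions.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for binary labels."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    return tn, fp, fn, tp  # A, C, D, B in the table above

actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 1, 1, 3)
```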
Evaluation Metrics
Using the confusion matrix above:

  Metric                                Formula
  Average classification accuracy       (TN + TP) / (TN + TP + FN + FP)
  Type I error (false positive rate)    FP / (TN + FP)
  Type II error (false negative rate)   FN / (FN + TP)
  True positive rate                    TP / (TP + FN)
  True negative rate                    TN / (TN + FP)

These metrics are prevalent in computer vision related classification problems.

http://www.cs.rpi.edu/~leen/misc-publications/SomeStatDefs.html
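A sketch computing the table's metrics from the four cell counts, not from the slides; `basic_metrics` is an illustrative name, and the counts reuse the toy example above.

```python
def basic_metrics(tn, fp, fn, tp):
    """Metrics from the table (A=TN, B=TP, C=FP, D=FN)."""
    return {
        "accuracy": (tn + tp) / (tn + tp + fn + fp),
        "false_positive_rate": fp / (tn + fp),  # Type I error
        "false_negative_rate": fn / (fn + tp),  # Type II error
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
    }

print(basic_metrics(tn=3, fp=1, fn=1, tp=3))
```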
Evaluation Metrics

  Metric        Formula
  Precision     TP / (TP + FP)
  Recall        TP / (TP + FN)
  Sensitivity   TP / (TP + FN)
  Specificity   TN / (TN + FP)

Precision: fraction of retrieved instances that are relevant.
Recall: fraction of relevant instances that are retrieved.
Precision and recall are more prevalent in the information retrieval domain.

What does a precision score of 1.0 mean? What does a recall score of 1.0 mean?
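A minimal sketch of precision and recall, not from the slides, again reusing the toy counts: precision = 1.0 would mean no false positives, and recall = 1.0 would mean no false negatives.

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # fraction of retrieved that are relevant
    recall = tp / (tp + fn)     # fraction of relevant that are retrieved
    return precision, recall

print(precision_recall(tp=3, fp=1, fn=1))  # (0.75, 0.75)
```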
Evaluation Metrics

  Metric                                         Formula
  Sensitivity                                    TP / (TP + FN)
  Specificity                                    TN / (TN + FP)
  Predictive value for a positive result (PV+)   TP / (TP + FP)
  Predictive value for a negative result (PV-)   TN / (TN + FN)

Sensitivity: proportion of actual positives which are correctly identified.
Specificity: proportion of actual negatives which are correctly identified.
Sensitivity and specificity are more prevalent in research related to medical sciences.

What does a sensitivity score of 1.0 mean? What does a specificity score of 1.0 mean?
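A sketch of the four diagnostic metrics from this table, not from the slides; `diagnostic_metrics` is an illustrative name and the counts are the same toy example as before.

```python
def diagnostic_metrics(tn, fp, fn, tp):
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pv_positive": tp / (tp + fp),  # P(disease | positive test)
        "pv_negative": tn / (tn + fn),  # P(no disease | negative test)
    }

print(diagnostic_metrics(tn=3, fp=1, fn=1, tp=3))
```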
Evaluation Metrics
• Type I error or false positive rate
  – The chance of incorrectly classifying a (randomly selected) sample as positive
• Type II error or false negative rate
  – The chance of incorrectly classifying a (randomly selected) sample as negative
• Precision
  – Probability that a (randomly selected) retrieved document is relevant
• Recall
  – Probability that a (randomly selected) relevant document is retrieved in a search
Evaluation Metrics
• Sensitivity
  – The chance of correctly identifying positive samples
  – A sensitive test helps rule out disease (when the result is negative)
• Specificity
  – The chance of correctly classifying negative samples
  – A very specific test rules in disease with a higher degree of confidence
• Predictive value of a positive result
  – If the test is positive, what is the probability that the patient actually has the disease?
• Predictive value of a negative result
  – If the test is negative, what is the probability that the patient does not have the disease?
Performance Evaluation
• Classification is of two types:
  – Authentication / verification (1:1 matching)
    • Is she Richa?
    • Is this an image of a cat?
  – Identification (1:n matching)
    • Whose photo is this?
    • Which class does this image belong to?
Performance Evaluation
• Receiver operating characteristic (ROC) curve
  – For authentication/verification
  – False positive rate vs. true positive rate
• Detection error tradeoff (DET) curve
  – False positive rate vs. false negative rate
• Cumulative match curve (CMC)
  – Rank vs. identification accuracy
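A minimal sketch of how an ROC curve is traced, not from the slides: sweep a decision threshold over classifier scores and record one (FPR, TPR) point per threshold. The function name, scores, and labels are illustrative assumptions.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs for each distinct score threshold."""
    points = []
    for t in sorted(set(scores), reverse=True):
        pred = scores >= t                     # predict positive above threshold
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        tpr = tp / np.sum(labels == 1)         # true positive rate
        fpr = fp / np.sum(labels == 0)         # false positive rate
        points.append((fpr, tpr))
    return points

scores = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
print(roc_points(scores, labels))  # traces from (0, 1/3) up to (1, 1)
```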
Quiz on Monday, January 14, 2013