CSCI 226: Advanced Database System
Performance Measurements Basics

By Dr. Yu Cao
Department of Computer Science
California State University, Fresno
Fresno, CA 93740, USA


True Negative, False Negative, False Positive, True Positive

                                   Ground Truth
Detected Results                   Negative (No Disease, Healthy)    Positive (Disease, Sick)
Negative (No Disease, Healthy)     True Negative                     False Negative (Type II error, Miss)
Positive (Disease, Sick)           False Positive (Type I error)     True Positive
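The four cells of the table can be counted directly from paired labels. The following is a minimal sketch, assuming boolean labels where True means positive (sick); the function name `confusion_counts` and the six-patient data are hypothetical and only serve as illustration.

```python
from collections import Counter

def confusion_counts(ground_truth, detected):
    """Count TN, FP, FN, TP for paired boolean labels (True = positive/sick)."""
    if len(ground_truth) != len(detected):
        raise ValueError("label sequences must have the same length")
    counts = Counter()
    for truth, result in zip(ground_truth, detected):
        if truth and result:
            counts["TP"] += 1      # sick and detected as sick
        elif truth and not result:
            counts["FN"] += 1      # Type II error, miss
        elif not truth and result:
            counts["FP"] += 1      # Type I error
        else:
            counts["TN"] += 1      # healthy and detected as healthy
    return {key: counts[key] for key in ("TN", "FP", "FN", "TP")}

# Hypothetical data: six patients, True means "sick".
truth    = [True, True, True, False, False, False]
detected = [True, True, False, True, False, False]
print(confusion_counts(truth, detected))
# {'TN': 2, 'FP': 1, 'FN': 1, 'TP': 2}
```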


Specificity


• In binary testing, e.g. a medical diagnostic test for a certain disease, Specificity is the proportion of all truly negative samples tested that the test correctly labels as negative (the true negatives), that is

\[
\text{Specificity} = \frac{\text{NumberOfTrueNegatives}}{\text{NumberOfTrueNegatives} + \text{NumberOfFalsePositives}}
\]


• For a test to determine who has a certain disease, a specificity of 100% means that all healthy people are labeled as healthy. Specificity alone does not fully characterize the test, because a 100% specificity can be trivially achieved by labeling all test cases negative. Therefore, we also need to know the Sensitivity of the test. A test with a high specificity has a low Type I error rate.
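The trivial case described above can be checked numerically. This is a small sketch under assumed numbers: a hypothetical population of 90 healthy and 10 sick people, and a degenerate test that labels everyone negative. The helper names `specificity` and `sensitivity` are illustrative.

```python
def specificity(tn, fp):
    """TN / (TN + FP): fraction of truly negative cases labeled negative."""
    return tn / (tn + fp)

def sensitivity(tp, fn):
    """TP / (TP + FN): fraction of truly positive cases labeled positive."""
    return tp / (tp + fn)

# Hypothetical population: 90 healthy, 10 sick.
# A degenerate test labels every case negative, so FP = 0 and TP = 0.
tn, fp = 90, 0
tp, fn = 0, 10

print(specificity(tn, fp))  # 1.0  (every healthy person labeled healthy)
print(sensitivity(tp, fn))  # 0.0  (every sick person missed)
```

The perfect specificity here says nothing about how well the sick are detected, which is why both numbers are reported together.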

Sensitivity


• The Sensitivity of a binary classification test or algorithm, such as a blood test to determine whether a person has a certain disease, is the proportion of all positive cases tested (e.g., people with the disease, faulty products) that receive a positive test result.

\[
\text{Sensitivity} = \frac{\text{NumberOfTruePositives}}{\text{NumberOfTruePositives} + \text{NumberOfFalseNegatives}}
\]


• A sensitivity of 100% means that all sick people or faulty products are recognized as such. Sensitivity alone does not fully characterize the test, because a 100% sensitivity can be trivially achieved by labeling all test cases positive. Therefore, we also need to know the specificity of the test.
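As a worked illustration with assumed numbers (not taken from the slides), suppose 10 sick and 90 healthy people are tested. A test that detects 8 of the 10 sick people has a sensitivity of 0.8, while a degenerate test that labels everyone positive reaches a sensitivity of 1 at the cost of a specificity of 0:

```latex
% Hypothetical test: 8 true positives, 2 false negatives
\[
\text{Sensitivity} = \frac{8}{8 + 2} = 0.8
\]
% Degenerate test labeling all 100 cases positive:
% 10 true positives, 0 false negatives, 0 true negatives, 90 false positives
\[
\text{Sensitivity} = \frac{10}{10 + 0} = 1,
\qquad
\text{Specificity} = \frac{0}{0 + 90} = 0
\]
```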

False Negative Rate


• The False Negative Rate is the proportion of positive instances that were erroneously reported as negative. It is equal to 1 minus the sensitivity of the test.

\[
\text{FalseNegativeRate} = \frac{\text{NumberOfFalseNegatives}}{\text{NumberOfPositives}} = 1 - \text{Sensitivity}
\]

Here NumberOfPositives = NumberOfTruePositives + NumberOfFalseNegatives.

False Positive Rate


• The False Positive Rate is the proportion of negative instances that were erroneously reported as positive. It is equal to 1 minus the specificity of the test.

\[
\text{FalsePositiveRate} = \frac{\text{NumberOfFalsePositives}}{\text{NumberOfNegatives}} = 1 - \text{Specificity}
\]

Here NumberOfNegatives = NumberOfTrueNegatives + NumberOfFalsePositives.
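The two error rates are the complements of sensitivity and specificity, which can be verified with exact arithmetic. The sketch below uses hypothetical counts and Python's `fractions.Fraction` so that the equalities hold without floating-point rounding.

```python
from fractions import Fraction

# Hypothetical counts for 100 tested cases (10 sick, 90 healthy).
tn, fp, fn, tp = 85, 5, 2, 8

sensitivity = Fraction(tp, tp + fn)
specificity = Fraction(tn, tn + fp)

# Error rates use the same denominators as their complements:
# all truly positive cases for FNR, all truly negative cases for FPR.
false_negative_rate = Fraction(fn, tp + fn)
false_positive_rate = Fraction(fp, tn + fp)

assert false_negative_rate == 1 - sensitivity
assert false_positive_rate == 1 - specificity
print(false_negative_rate, false_positive_rate)  # 1/5 1/18
```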

Positive Predictive Value (Precision)


• Positive Predictive Value (Precision) is defined as

\[
\text{PositivePredictiveValue} = \frac{\text{NumberOfTruePositives}}{\text{NumberOfTruePositives} + \text{NumberOfFalsePositives}}
\]
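Unlike sensitivity and specificity, the denominator here counts detected positives, so it is zero whenever the test labels nothing positive. A minimal sketch follows; the function name is hypothetical, and returning None for the undefined case is an assumption of this example rather than a convention from the slides.

```python
def positive_predictive_value(tp, fp):
    """TP / (TP + FP); returns None when no case was labeled positive."""
    detected_positives = tp + fp
    if detected_positives == 0:
        return None  # assumption: treat the undefined ratio as None
    return tp / detected_positives

print(positive_predictive_value(8, 2))  # 0.8
print(positive_predictive_value(0, 0))  # None
```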

Recall

                                   Ground Truth
Detected Results                   Negative (No Disease, Healthy)                Positive (Disease, Sick)
Negative (No Disease, Healthy)     True Negative                                 False Negative (Type II error, Miss): Missed
Positive (Disease, Sick)           False Positive (Type I error): NonRelevant    True Positive: Relevant

\[
\text{Sensitivity} = \text{Recall} = \frac{\text{Relevant}}{\text{Relevant} + \text{Missed}}
\]

\[
\text{PositivePredictiveValue} = \text{Precision} = \frac{\text{Relevant}}{\text{Relevant} + \text{NonRelevant}}
\]
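In retrieval terms, the three quantities in these formulas correspond to set operations on the retrieved and the truly relevant items. The sketch below assumes both sets are non-empty; the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items.

    Assumes both `retrieved` and `relevant` are non-empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant            # Relevant (true positives)
    non_relevant = retrieved - relevant    # NonRelevant (false positives)
    missed = relevant - retrieved          # Missed (false negatives)
    precision = len(hits) / (len(hits) + len(non_relevant))
    recall = len(hits) / (len(hits) + len(missed))
    return precision, recall

# Hypothetical query: four documents retrieved, three truly relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 0.5, since 2 of the 4 retrieved documents are relevant
print(recall)     # 2 of the 3 relevant documents were retrieved
```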