Ensemble of detectors

Event Labeling Combining Ensemble Detectors and Background Knowledge

Hadi Fanaee-T & Joao Gama
Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto

Outline

• Event Labeling Solutions
  – Manual
  – Automatic
    • Individual Detectors
    • Ensemble Detectors
    • Ensemble Detectors + Background Knowledge
• Research methodology
• Data Set
• Experiments
• Results
• Conclusion
• Future works

Event Labeling

Input
• Univariate time series
• Environment information

Semiauto

Output
• Assign an abnormality score to each instant
• Label each time instant as Event or Non-Event

Manual labeling

Semi-auto Individual Expert

Multiple Experts

Pro: Relatively accurate
Con:
1) Time-consuming and expensive
2) Infeasible for large volumes of data
3) Impossible when there is no access to human experts
4) Quality varies with the annotator

Daniel Kahneman (Nobel Prize in Economic Sciences, 2002): Humans are inconsistent in making summary judgments of complex information

Automatic Labeling: Individual detectors (Sequential)

Analysis of the time series based on its own history
• Control Chart
• Moving Average
• CuSUM
• Wavelets
• Exponential Smoothing
• ARIMA
• SSA
• Kalman Filter

Pro: Do not require the environment parameters (good for unknown environments)
Con:
1) Uncertainty
2) Low accuracy when a change occurs in the environment (e.g. sudden rain)
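To make the idea of a sequential detector concrete, here is a minimal sketch of a moving-average detector that scores each instant against a trailing window of its own history. The window length, the threshold, and the sample counts are illustrative assumptions, not the settings used in the study.

```python
from statistics import mean, stdev

def moving_average_detector(series, window=5, threshold=3.0):
    """Score each instant by its deviation from the trailing window.

    Returns (scores, labels). The first `window` instants get score 0.0
    because there is not enough history to judge them.
    """
    scores, labels = [], []
    for t, value in enumerate(series):
        history = series[max(0, t - window):t]
        if len(history) < window:
            scores.append(0.0)
            labels.append(False)
            continue
        mu, sigma = mean(history), stdev(history)
        # A flat window has zero spread; this sketch then scores the instant 0.
        score = abs(value - mu) / sigma if sigma > 0 else 0.0
        scores.append(score)
        labels.append(score > threshold)
    return scores, labels

# Assumed toy series: one sharp drop at index 7.
counts = [100, 102, 98, 101, 99, 100, 103, 20, 101, 100]
scores, labels = moving_average_detector(counts)
```

Because the detector only sees the series itself, a drop caused by a known environmental change (such as rain) looks the same as a genuine event.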

Automatic Labeling: Individual detectors (Model-based)

Generate a predictive model based on environmental parameters (e.g. regression)

Pro: Robust against environmental changes
Con:
1) Uncertainty
2) Low accuracy when there is insufficient environmental information
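A model-based detector can be sketched as a regression on an environmental variable followed by a check on the residuals. The single predictor (temperature), the threshold, and the data below are assumptions chosen for illustration. The predictive model actually selected in the study is not described in this text.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def regression_detector(env, counts, threshold=2.0):
    """Label instants whose residual is large relative to the residual spread."""
    a, b = fit_line(env, counts)
    residuals = [c - (a + b * e) for e, c in zip(env, counts)]
    sigma = stdev(residuals)  # sample standard deviation of the residuals
    scores = [abs(r) / sigma for r in residuals]
    return scores, [s > threshold for s in scores]

# Assumed toy data: rentals grow with temperature, except at index 6.
temperature = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
rentals = [50, 60, 70, 80, 90, 100, 30, 120, 130, 140]
scores, labels = regression_detector(temperature, rentals)
```

Here the count at index 6 is flagged because it is far below what the temperature would predict, regardless of what the series did just before.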

Automatic Labeling: Ensemble of detectors

An ensemble of detectors (model-based or sequential)

Pro: Less uncertainty compared with individual detectors
Con: Higher false alarm rate

Question: How can we reduce the false alarms?
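One simple way to combine detectors is to count votes per instant. The sketch below assumes each detector outputs one boolean label per instant. The `min_votes` parameter is a hypothetical knob, and the combination rule used in the study may differ.

```python
def ensemble_labels(detector_labels, min_votes=1):
    """Combine per-detector boolean labels, given as one list per detector.

    min_votes=1 flags an instant when any detector fires (a union), which
    catches more events but also passes along every detector's false alarms.
    """
    n_instants = len(detector_labels[0])
    if any(len(labels) != n_instants for labels in detector_labels):
        raise ValueError("all detectors must label the same instants")
    votes = [sum(column) for column in zip(*detector_labels)]
    return [v >= min_votes for v in votes]

# Assumed toy labels from three detectors over four instants.
d1 = [False, True, False, False]
d2 = [False, True, True, False]
d3 = [False, False, False, False]
union = ensemble_labels([d1, d2, d3])
majority = ensemble_labels([d1, d2, d3], min_votes=2)
```

With the union rule, the instant flagged by only one detector is kept, which illustrates where the extra false alarms can come from.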

Automatic Labeling: Our proposed solution

Combining ensemble detectors and background knowledge

Pro: Lower false alarms
Con: Background knowledge is not available for all applications
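The filtering step can be sketched as a lookup: candidates raised by the ensemble are kept only if a background knowledge source can explain them. In this sketch a plain dictionary stands in for the knowledge source. The function name, keys, and event descriptions are hypothetical and do not come from the study.

```python
def confirm_with_background(candidates, known_events):
    """Keep only candidate instants that the knowledge source can explain.

    candidates: iterable of instant keys flagged by the ensemble.
    known_events: mapping from instant key to a description of what happened.
    """
    confirmed, rejected = {}, []
    for instant in candidates:
        if instant in known_events:
            confirmed[instant] = known_events[instant]
        else:
            rejected.append(instant)
    return confirmed, rejected

# Hypothetical data, not taken from the study.
candidates = ["day-12", "day-40", "day-77"]
known_events = {"day-40": "storm", "day-77": "public holiday parade"}
confirmed, rejected = confirm_with_background(candidates, known_events)
```

Candidates without supporting evidence are treated as false alarms, which is how this step can lower the false alarm rate of the ensemble.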

Research methodology
• Research question: Is the proposed solution more accurate than ensemble detectors and individual detectors?
• We evaluate this hypothesis through experiments on an appropriate data set

Dataset Selection

Question: There are dozens of time series data sets that can be used to evaluate anomaly or event detection algorithms. Can we not use one of them?
Answer: Yes, if it comes with the following additional sources:
– Corresponding environmental information for each time instant
– An external background knowledge source for verification
– Human specialists for result confirmation

The Bike Sharing Data Set meets all of the above requirements

Why bike sharing dataset?

• Environment information is available (e.g. weather at daily and hourly scales, holidays, etc.)
• External knowledge source: World Wide Web (WWW) (text, videos, images)
• Human specialists are accessible to confirm the results

Washington D.C. Bike Sharing Dataset

Raw data set: log of 3,807,587 trips during 2011 and 2012

Aggregated into time series of counts
• Hourly: 17,379 hours
• Daily: 731 days
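Turning a trip log into count series amounts to bucketing each trip's start time by hour and by day. The sketch below assumes the log provides ISO-formatted start times. The real log's field names and timestamp format are not given here, so the sample entries are hypothetical.

```python
from collections import Counter
from datetime import datetime

def aggregate_counts(start_times):
    """Count trips per hour and per day from ISO-formatted start times."""
    hourly, daily = Counter(), Counter()
    for stamp in start_times:
        t = datetime.fromisoformat(stamp)
        # Truncate to the start of the hour to form the hourly bucket.
        hourly[t.replace(minute=0, second=0, microsecond=0)] += 1
        daily[t.date()] += 1
    return hourly, daily

# Hypothetical log entries for illustration only.
trips = ["2011-01-01 00:05:00", "2011-01-01 00:40:00",
         "2011-01-01 01:10:00", "2011-01-02 09:00:00"]
hourly, daily = aggregate_counts(trips)
```

Hours without any trip do not appear in the counter at all, which is one possible reason an hourly series can have fewer entries than 24 times the number of days (731 × 24 = 17,544).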

Experiment Setup


Ensemble Configuration

The ensemble architecture is designed so that the agreement among detectors satisfies: Fleiss' kappa ≈ 0
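Fleiss' kappa measures agreement among several raters beyond what chance would produce, so a value near 0 means the detectors agree no more than chance would predict. Below is a minimal sketch of the standard formula, treating each detector as a rater and each time instant as an item. The binary label encoding (0 = Non-Event, 1 = Event) is an assumption.

```python
def fleiss_kappa(detector_labels, categories=(0, 1)):
    """Fleiss' kappa for several detectors labeling the same instants.

    detector_labels: one equal-length list of labels per detector
    (at least two detectors are required).
    """
    n_raters = len(detector_labels)
    instants = list(zip(*detector_labels))
    n_items = len(instants)
    # counts[i][j] = number of detectors assigning instant i to category j
    counts = [[row.count(c) for c in categories] for row in instants]
    # Share of all assignments that fall in each category.
    p_cat = [sum(r[j] for r in counts) / (n_items * n_raters)
             for j in range(len(categories))]
    # Observed agreement per instant, averaged over instants.
    p_obs = sum((sum(c * c for c in r) - n_raters) / (n_raters * (n_raters - 1))
                for r in counts) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when every label is identical")
    return (p_obs - p_exp) / (1 - p_exp)

# Three detectors in full agreement, and two in full disagreement.
kappa_agree = fleiss_kappa([[1, 0, 1, 0]] * 3)
kappa_disagree = fleiss_kappa([[0, 1], [1, 0]])
```

Full agreement gives a kappa of 1 and systematic disagreement gives a negative value, so values near 0 sit between the two.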

Detectors Setting


Selection of Predictive Model


Results


Conclusion

• Ensemble detectors + background knowledge > Ensemble detectors > Individual detectors
• Computer-based knowledge sources like the WWW can be potential sources for aiding event and anomaly detection systems
• Bike Sharing data is interesting for research!

Future Works

• Using text processing and information extraction techniques for automatic extraction of information from knowledge sources
• Testing different ensemble architectures
• Using other knowledge sources and other knowledge interfaces

An example of other knowledge sources

Sensor with limited storage and memory: "I found an abnormal instant. Before sending an alarm, please check this query: A=128 and B=234 and C=183"

Data Warehouse (through the knowledge interface): "The z-score of the query is 3.56"

Sensor: "Oh! 3.56! I signal an alarm"
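The exchange can be sketched as a z-score lookup: the sensor sends a reading, the warehouse scores it against its stored history, and the sensor alarms only if the score is extreme. The threshold of 3, the single-value query, and the history values are assumptions. The text does not say how the warehouse computes the score for a multi-attribute query.

```python
from statistics import mean, stdev

def z_score(value, history):
    """Standard score of `value` against the warehouse's historical readings."""
    return (value - mean(history)) / stdev(history)

def sensor_decision(value, history, threshold=3.0):
    """Mimic the exchange: ask for a z-score, then decide whether to alarm."""
    z = z_score(value, history)
    return z, abs(z) > threshold

# Hypothetical history held by the warehouse for comparable conditions.
history = [100, 104, 98, 101, 97, 103, 99, 102, 96, 100]
z, alarm = sensor_decision(128, history)
z_normal, alarm_normal = sensor_decision(101, history)
```

The point of the design is that the sensor itself needs no history: it only has to send a query and compare one returned number against a threshold.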

[email protected]

Questions?

Data set: Google "UCI + Bike Sharing Dataset"

Article: Google "Event Labeling combining ensemble detectors and background knowledge"