Event Labeling Combining Ensemble Detectors and Background Knowledge
Hadi Fanaee-T & Joao Gama
Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto
Outline
• Event Labeling Solutions
  – Manual
  – Automatic
    • Individual Detectors
    • Ensemble Detectors
    • Ensemble Detectors + Background Knowledge
• Research Methodology
• Data Set
• Experiments
• Results
• Conclusion
• Future Works
Event Labeling
Input
• Univariate time series
• Environment information
Output
• An abnormality score assigned to each time instant
• A label for each time instant: Event or Non-Event
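A minimal sketch of this input/output contract. It assumes a robust z-score as the abnormality score: the distance from the median, measured in units of the scaled median absolute deviation. It also assumes a hypothetical cut-off of 3.5. For brevity it uses only the univariate series and ignores the environment information, so it illustrates the interface rather than the scoring used in the article.

```python
import statistics

def abnormality_scores(series):
    """Robust z-score per instant: |x - median| / (1.4826 * MAD)."""
    med = statistics.median(series)
    mad = statistics.median(abs(x - med) for x in series)
    # 1.4826 * MAD approximates the standard deviation for Gaussian data;
    # fall back to 1.0 so a perfectly flat series does not divide by zero.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [abs(x - med) / scale for x in series]

def label_instants(series, threshold=3.5):
    """Pair each instant's score with an 'Event' / 'Non-Event' label."""
    return [(s, "Event" if s > threshold else "Non-Event")
            for s in abnormality_scores(series)]

# Usage: a toy hourly count series with one spike.
for t, (score, label) in enumerate(label_instants([120, 130, 125, 128, 122, 400, 127])):
    print(t, round(score, 2), label)
```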
Manual Labeling
• Semi-auto
• Individual expert
• Multiple experts
Pro: Relatively accurate
Con: (1) Time-consuming and expensive; (2) infeasible for large volumes of data; (3) impossible when there is no access to human experts; (4) quality varies with the annotator
Daniel Kahneman (Nobel Prize in Economic Sciences, 2002): Humans are inconsistent in making summary judgments of complex information.
Automatic Labeling: Individual Detectors (Sequential)
Analysis of a time series based on its own history
• Control Chart
• Moving Average
• CUSUM
• Wavelets
• Exponential Smoothing
• ARIMA
• SSA
• Kalman Filter
Pro: Does not require environmental parameters (suitable for unknown environments)
Con: (1) Uncertainty; (2) low accuracy when the environment changes (e.g., sudden rain)
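The following is a minimal sketch of one sequential detector from the list, a two-sided tabular CUSUM. Three settings are common textbook defaults assumed here, not the settings used in the experiments:

• a warm-up window of 24 points for estimating the in-control mean and standard deviation;
• a slack value k = 0.5;
• a decision threshold h = 5, with both k and h in standard deviations.

```python
import statistics

def cusum_labels(series, warmup=24, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardized values.

    The in-control mean and standard deviation are estimated from the first
    `warmup` points; k (slack) and h (decision threshold) are in units of sigma.
    """
    base = series[:warmup]
    mu = statistics.fmean(base)
    sigma = statistics.pstdev(base) or 1.0  # guard against a flat warm-up window
    s_hi = s_lo = 0.0
    labels = []
    for x in series:
        z = (x - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)  # accumulates evidence of an upward shift
        s_lo = max(0.0, s_lo - z - k)  # accumulates evidence of a downward shift
        alarm = s_hi > h or s_lo > h
        labels.append(alarm)
        if alarm:
            s_hi = s_lo = 0.0  # restart the statistics after signalling
    return labels

# Usage: a stable hourly series followed by a sustained rise.
series = [98, 102] * 12 + [100] * 6 + [110] * 3
print(cusum_labels(series)[-3:])
```

The detector needs nothing but the series' own history. That is the "Pro" above, and it is also why a rise caused by, say, a weather change would be flagged just like a genuine event.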
Automatic Labeling: Individual Detectors (Model-based)
Generate a predictive model based on environmental parameters
• Regression
Pro: Robust against environmental changes
Con: (1) Uncertainty; (2) low accuracy when environmental information is insufficient
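A minimal sketch of a model-based detector. It assumes a single environmental feature (a hypothetical temperature column) and an ordinary least-squares line fitted on a training period. A test instant is flagged when its residual exceeds k = 3 standard deviations of the training residuals, an assumed cut-off. The article's actual predictive model and feature set are not reproduced here.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for count = a + b * feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def model_based_labels(train_x, train_y, test_x, test_y, k=3.0):
    """Flag test instants whose count deviates from the prediction by more than
    k standard deviations of the training residuals."""
    a, b = fit_line(train_x, train_y)
    train_res = [y - (a + b * x) for x, y in zip(train_x, train_y)]
    sd = statistics.pstdev(train_res) or 1.0
    return [abs(y - (a + b * x)) / sd > k for x, y in zip(test_x, test_y)]

# Usage with hypothetical temperature / count pairs.
temps = [10, 15, 20, 25, 30]
counts = [151, 174, 201, 224, 250]
print(model_based_labels(temps, counts, [22, 22], [210, 100]))  # → [False, True]
```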
Automatic Labeling: Ensemble of Detectors
An ensemble of detectors (model-based or sequential)
Pro: Less uncertainty compared to individual detectors
Con: Higher false alarm rate
Question: How can we reduce the false alarms?
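The slides do not state how detector outputs are combined, so the sketch below assumes a simple vote over boolean labels. With `min_votes=1` an instant is flagged whenever any detector fires. The ensemble then inherits every member's false alarms, which is one way the trade-off in the question above can arise.

```python
def ensemble_vote(detector_labels, min_votes):
    """Combine per-detector boolean labels: an instant is an event when at
    least `min_votes` detectors flag it."""
    n_instants = len(detector_labels[0])
    return [sum(labels[t] for labels in detector_labels) >= min_votes
            for t in range(n_instants)]

# Usage: three hypothetical detectors over four instants.
detectors = [
    [False, True, True, False],   # e.g. a sequential detector
    [False, False, True, False],  # e.g. a model-based detector
    [True, False, True, False],
]
print(ensemble_vote(detectors, min_votes=1))  # any detector: more alarms
print(ensemble_vote(detectors, min_votes=2))  # majority: fewer alarms
```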
Automatic Labeling: Our Proposed Solution
Combining ensemble detectors and background knowledge
Pro: Fewer false alarms
Con: Background knowledge is not available for all applications
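The slides do not specify how background knowledge is queried. The following minimal sketch assumes two things. First, the ensemble outputs candidate event days. Second, a hypothetical `knowledge_lookup` callable, standing in for a web search or a human specialist, returns a description of supporting evidence or `None`. Dropping candidates without evidence is one simple policy for cutting false alarms, not necessarily the article's.

```python
from datetime import date

def confirm_with_knowledge(candidates, knowledge_lookup):
    """Keep only ensemble candidates for which the knowledge source returns
    evidence; `knowledge_lookup` returns a description or None."""
    confirmed = {}
    for day in candidates:
        evidence = knowledge_lookup(day)
        if evidence:
            confirmed[day] = evidence
    return confirmed

# Hypothetical knowledge source: a dict standing in for a web search or an expert.
KNOWN_EVENTS = {date(2011, 1, 1): "New Year's Day (public holiday)"}

candidates = [date(2011, 1, 1), date(2011, 3, 15)]  # flagged by the ensemble
print(confirm_with_knowledge(candidates, KNOWN_EVENTS.get))
```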
Research Methodology
• Research question: Is the proposed solution more accurate than ensemble detectors and individual detectors?
• We evaluate this hypothesis through a sufficient set of experiments on an appropriate data set
Dataset Selection
• Question: There are dozens of time series data sets that can be used to evaluate anomaly or event detection algorithms. Can't we use one of them?
• Answer: Yes, if they come with the following additional sources:
  – Corresponding environmental information for each time instant
  – An external background knowledge source for verification
  – Human specialists for result confirmation
• The Bike Sharing data set meets all of the above requirements
Why the Bike Sharing Data Set?
• Environmental information is available (e.g., weather at daily and hourly scales, holidays, etc.)
• External knowledge source: the World Wide Web (WWW) (text, videos, images)
• Human specialists are accessible to confirm the results
Washington D.C. Bike Sharing Data Set
Raw data set
• Log of 3,807,587 trips during 2011 and 2012
Aggregated
• Time series of counts
  – Hourly: 17,379 hours
  – Daily: 731 days
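A minimal sketch of the aggregation step. It assumes each record in the raw trip log carries a start timestamp; the actual field layout of the raw log is not described here. Hours or days with no trips simply do not appear in the output and would need to be filled with zeros if a gap-free series is required.

```python
from collections import Counter
from datetime import datetime

def aggregate_counts(start_times):
    """Aggregate trip start timestamps into hourly and daily count series."""
    hourly = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in start_times)
    daily = Counter(ts.date() for ts in start_times)
    return dict(sorted(hourly.items())), dict(sorted(daily.items()))

# Usage with a few hypothetical trip start times.
trips = [
    datetime(2011, 1, 1, 0, 12),
    datetime(2011, 1, 1, 0, 47),
    datetime(2011, 1, 1, 1, 5),
    datetime(2011, 1, 2, 9, 30),
]
hourly, daily = aggregate_counts(trips)
print(daily)
```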
Experiment Setup
Ensemble Configuration
The ensemble architecture is designed so that it satisfies Fleiss' kappa ≈ 0 (i.e., the detectors agree no more than chance would predict).
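Fleiss' kappa measures agreement among a fixed number of raters beyond what chance would produce. The sketch below uses the standard definition and treats each detector as a rater and each time instant as a subject. Mapping detector outputs to Event / Non-Event ratings this way is an assumption about how the ensemble was configured. The toy input is built so that κ is 0 in exact arithmetic, up to floating-point rounding.

```python
def counts_from_labels(detector_labels):
    """Turn per-detector boolean labels into rows [n_event, n_non_event] per instant."""
    n_detectors = len(detector_labels)
    rows = []
    for t in range(len(detector_labels[0])):
        events = sum(bool(labels[t]) for labels in detector_labels)
        rows.append([events, n_detectors - events])
    return rows

def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters assigning subject i to category j
    (same number of raters for every subject)."""
    N = len(counts)
    n = sum(counts[0])
    k = len(counts[0])
    # Share of all ratings that fall in each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Observed agreement per subject, then averaged.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Agreement expected by chance.
    P_e = sum(pj * pj for pj in p)
    if P_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (P_bar - P_e) / (1 - P_e)

# Usage: three hypothetical detectors over four instants.
detectors = [
    [True, False, False, True],
    [False, True, False, True],
    [True, True, False, False],
]
print(fleiss_kappa(counts_from_labels(detectors)))
```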
Detector Settings
Selection of Predictive Model
Results
Conclusion
• Ensemble detectors + background knowledge > Ensemble detectors > Individual detectors
• Knowledge sources such as the WWW can serve as computer-based sources for aiding event and anomaly detection systems
• Bike Sharing data is interesting for research!
Future Works
• Using text processing and information extraction techniques to automatically extract information from knowledge sources
• Testing different ensemble architectures
• Using other knowledge sources and other knowledge interfaces
An Example of Other Knowledge Sources
Sensor (limited storage and memory) ↔ Knowledge interface ↔ Data Warehouse
• Sensor: I found an abnormal instant. Before sending an alarm, please check this query: A=128 and B=234 and C=183
• Data Warehouse: The z-score of the query is 3.56
• Sensor: Oh! 3.56! I signal an alarm.
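The slide does not define the attributes A, B and C, so the query and its z-score of 3.56 are not reproduced here. The following minimal sketch only shows the interaction pattern. A hypothetical `DataWarehouse` holds the history that the sensor cannot store, and the sensor raises an alarm only after the warehouse returns a z-score at or above an assumed threshold of 3.

```python
import statistics

class DataWarehouse:
    """Hypothetical knowledge source that keeps the history a small sensor cannot store."""

    def __init__(self, history):
        self.history = list(history)

    def zscore(self, value):
        """How many (sample) standard deviations `value` lies from the stored mean."""
        mu = statistics.fmean(self.history)
        sd = statistics.stdev(self.history)
        return (value - mu) / sd

def sensor_decide(observed, warehouse, threshold=3.0):
    """The sensor consults the warehouse before alarming; alarm only if |z| >= threshold."""
    z = warehouse.zscore(observed)
    return abs(z) >= threshold, z

# Usage with hypothetical numbers (not the values from the slide).
warehouse = DataWarehouse([10, 12, 14])
print(sensor_decide(20, warehouse))
print(sensor_decide(13, warehouse))
```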
Questions?
Data set: Google "UCI + Bike Sharing Dataset"
Article: Google "Event Labeling Combining Ensemble Detectors and Background Knowledge"