Distributed and parallel time series feature extraction for industrial big data applications
Maximilian Christ, Andreas W. Kempa-Liehr, Michael Feindt
Blue Yonder GmbH, Karlsruhe, Germany
[email protected] · @MaxBenChrist · maximilianchrist.com
ACML, 16.11.2016
iPRODICT Intelligent Process Prediction based on Big Data Analytics
Time Series Classification / Regression
[Figure: a time series from an ERP system with metainformation (Brand=SuperSteel, C=0.05%) is mapped to a target label: Good? Bad?]
Industrial applications pose several requirements:
- Inhomogeneous sources: several time series and metainformation have to be considered simultaneously
- Decentral processing: latency issues, memory capacities exceeded, so processing has to happen close to the source
- Big Data: has to scale with the number of time series, the number of devices, and the length of the time series
- Robustness: labeled samples are expensive, overfitting is bad
- Explainability: clients ask to justify results, traceability of results is mandatory
Two approaches to time series classification:
(A) directly: k-NN with Dynamic Time Warping distance (DTW)*
(B) feature-based: 9000 features with a linear discriminant analyzer and filtering**

                         (A) DTW    (B) feature-based
Inhomogeneous sources       ✕              √
Decentral processing        ?              √
Big Data                    ?              √
Robustness                  ?              √
Explainability              ✕              √

* Ratanamahatana, Chotirat Ann, and Eamonn Keogh. "Making time-series classification more accurate using learned constraints." SDM, 2004.
** Fulcher, Ben D., and Nick S. Jones. "Highly comparative feature-based time-series classification." IEEE Transactions on Knowledge and Data Engineering 26.12 (2014): 3026-3037.
Examples of features: maximum, minimum, median, mean, number of peaks. A feature maps a time series to a real number:

f: (t_1, ..., t_l) ↦ f(t_1, ..., t_l) ∈ ℝ
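The feature map above can be sketched in a few lines. This is an illustrative selection of calculators, not code from the talk; `number_peaks` is a hypothetical minimal peak counter:

```python
import numpy as np

def number_peaks(x):
    """Count local maxima: samples strictly larger than both neighbours."""
    x = np.asarray(x, dtype=float)
    return float(np.sum((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])))

# Each feature is a map f: (t_1, ..., t_l) -> R from a time series to a scalar.
features = {
    "max": np.max,
    "min": np.min,
    "median": np.median,
    "mean": np.mean,
    "number_peaks": number_peaks,
}

ts = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0]
row = {name: float(f(ts)) for name, f in features.items()}
```

Because every feature only needs the raw values of one time series, the calculators are trivially parallelizable across time series and across features.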
Feature Extraction
Each of the m time series is mapped to one row of the design matrix X, whose entry x_ij is feature j evaluated on time series i:

X = ( x_11  x_12  ...  x_1n )
    ( x_21  x_22  ...  x_2n )
    ( ...                   )
    ( x_m1  x_m2  ...  x_mn )
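Building the design matrix is a nested map over time series and feature calculators. A minimal sketch, assuming a hypothetical list of calculators (any map from a sequence to a scalar works):

```python
import numpy as np

# Hypothetical feature calculators; any sequence -> scalar map can be used.
calculators = [np.max, np.min, np.median, np.mean]

def design_matrix(time_series_list):
    """Map m time series to an m x n design matrix X whose entry x_ij
    is feature j evaluated on time series i."""
    return np.array([[f(ts) for f in calculators] for ts in time_series_list])

X = design_matrix([[0.0, 1.0, 2.0], [4.0, 4.0, 1.0]])
# one row per time series, one column per feature
```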
FeatuRe Extraction based on Scalable Hypothesis tests (FRESH)
Many time series: 53 time series × 250 features each gives more than 13k features in total. Not all of them are relevant.
Robustness: labeled samples are expensive, overfitting is bad.
FRESH is a three-step procedure: 1. feature extraction, 2. feature significance testing, 3. multiple testing.
Feature significance is addressed by an individual hypothesis test for each feature.
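The per-feature test can be sketched as follows. The slides do not fix a particular test here; as an illustration, a Mann-Whitney U test per feature column (a common choice for a real-valued feature against a binary target) is assumed:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def feature_p_values(X, y):
    """One hypothesis test per feature column: does the feature's
    distribution differ between the two classes?"""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return np.array([
        mannwhitneyu(col[y == 0], col[y == 1], alternative="two-sided").pvalue
        for col in X.T
    ])

rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([
    rng.normal(size=40),                                     # irrelevant feature
    np.where(y == 0, 0.0, 5.0) + rng.normal(0.0, 0.1, 40),   # relevant feature
])
pvals = feature_p_values(X, y)
```

Each test only touches one feature column and the label vector, so this step parallelizes over features just like the extraction step.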
FER = (number of wrongly extracted features) / (number of extracted features)

FRESH controls the false extraction rate asymptotically, E(FER) = q, for
1. any distribution
2. any dependency structure
by the Benjamini-Yekutieli procedure. The expected false extraction rate q is the only parameter of FRESH.

Example: with q = 7% and 100 extracted features, 7 irrelevant and 93 relevant features are expected.
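The Benjamini-Yekutieli step-up procedure itself is short. A minimal sketch, following the standard definition of the procedure (the thresholds and the harmonic correction are not code from the talk):

```python
import numpy as np

def benjamini_yekutieli(p_values, q=0.07):
    """Benjamini-Yekutieli step-up procedure: boolean mask of features to
    keep, controlling the expected false extraction rate at level q under
    any dependency structure between the features."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # harmonic correction that makes the control valid under
    # arbitrary dependence between the hypotheses
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    # keep the i smallest p-values where p_(i) <= q * i / (m * c_m)
    thresholds = q * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))
        keep[order[: k + 1]] = True
    return keep

# two clearly relevant features survive, two irrelevant ones are dropped
mask = benjamini_yekutieli([1e-8, 2e-7, 0.4, 0.9], q=0.07)
```

Only the sorted p-values are needed, so the filtering step runs in O(m log m) regardless of how the features were computed.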
Industrial applications + FRESH:
- Inhomogeneous sources: feature-based
- Decentral processing: highly parallel
- Robustness: individual feature testing
- Big Data: highly parallel, linear runtime
- Explainability: feature-based
http://github.com/blue-yonder/tsfresh