Distributed and parallel time series feature extraction for industrial big data applications
Maximilian Christ, Andreas W. Kempa-Liehr, Michael Feindt
Blue Yonder GmbH, Karlsruhe, Germany


Maximilian Christ | [email protected] | @MaxBenChrist | maximilianchrist.com
ACML, 16.11.2016

iPRODICT: Intelligent Process Prediction based on Big Data Analytics


Time Series Classification / Regression

[Figure: time series recorded alongside ERP metainformation (e.g. C=0.05%, Brand=SuperSteel), to be classified as Good or Bad]

Industrial applications pose several challenges:

Inhomogeneous sources: several time series and metainformation have to be processed simultaneously

Decentral processing: latency issues and exceeded memory capacities require processing close to the source

Big Data: the problem scales with the number of time series, the number of devices, and the length of the time series

Robustness: labeled samples are expensive, overfitting is bad

Explainability: clients ask to justify results; traceability of results is mandatory


Two approaches to time series classification:

1. Directly: k-NN with Dynamic Time Warping distance (DTW)*
2. Feature-based: 9000 features with filtering and a linear discriminant analyzer**

                        Directly (k-NN + DTW)    Feature-based
Inhomogeneous sources   ✕                        √
Decentral processing    ?                        √
Big Data                ?                        √
Robustness              ?                        √
Explainability          ✕                        √

* Ratanamahatana, Chotirat Ann, and Eamonn Keogh. "Making time-series classification more accurate using learned constraints." SDM, 2004.
** Fulcher, Ben D., and Nick S. Jones. "Highly comparative feature-based time-series classification." IEEE Transactions on Knowledge and Data Engineering 26.12 (2014): 3026-3037.
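The distance underlying the direct approach can be sketched as follows. This is a minimal, unconstrained DTW dynamic program, not the learned-constraint variant of reference *; the function name is illustrative:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance
    between two 1-D sequences (no warping-window constraint)."""
    n, m = len(a), len(b)
    # cost[i, j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignments
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(cost[n, m])
```

A k-NN classifier then labels a new series by the majority class among its k nearest training series under this distance; the quadratic per-pair cost is one reason the direct approach struggles at Big Data scale.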

A feature is a mapping from a time series to a real number:

f(t_1, ..., t_l) = f ∈ ℝ

Examples: Max, Min, Mean, Median, Number of Peaks
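The example features above can be sketched as plain functions of a series; the peak definition (a sample strictly larger than both neighbours) and the function name are illustrative choices, not the exact tsfresh definitions:

```python
import numpy as np

def extract_simple_features(ts):
    """Map a time series (t_1, ..., t_l) to scalar features f in R."""
    ts = np.asarray(ts, dtype=float)
    # a local peak: a sample strictly larger than both neighbours
    n_peaks = int(np.sum((ts[1:-1] > ts[:-2]) & (ts[1:-1] > ts[2:])))
    return {
        "max": float(ts.max()),
        "min": float(ts.min()),
        "mean": float(ts.mean()),
        "median": float(np.median(ts)),
        "number_peaks": n_peaks,
    }
```

Each value in the returned dictionary is one realisation of f(t_1, ..., t_l) ∈ ℝ.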


Feature Extraction

Each of the m time series is mapped to a row of n feature values, giving the design matrix

        ( x_11  x_12  ...  x_1n )
    X = ( x_21  x_22  ...  x_2n )
        (  ...   ...  ...   ... )
        ( x_m1  x_m2  ...  x_mn )
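Assembling such a design matrix from raw series can be sketched as below; the four-feature set and the function names are illustrative (tsfresh computes hundreds of such mappings):

```python
import numpy as np

# illustrative feature set; each entry maps a series to one scalar
feature_funcs = {
    "max": np.max,
    "min": np.min,
    "mean": np.mean,
    "median": np.median,
}

def design_matrix(series_list):
    """One row per time series, one column per feature:
    X[i, j] = j-th feature of the i-th series, shape (m, n)."""
    rows = [[func(np.asarray(ts, dtype=float)) for func in feature_funcs.values()]
            for ts in series_list]
    return np.array(rows), list(feature_funcs)
```

Note that every entry x_ij depends only on the i-th series and the j-th feature function, which is what makes the extraction embarrassingly parallel.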

Feature Extraction

FeatuRe Extraction based on Scalable Hypothesis tests (FRESH)


Feature Extraction

Many time series: 53 time series with 250 features each gives more than 13,000 features, and not all of them are relevant.

Robustness: labeled samples are expensive, overfitting is bad.

[Figure: design matrix X with the irrelevant feature columns filtered out]

FRESH is a three-step procedure:
1. feature extraction
2. feature significance testing
3. multiple testing correction
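Step 2 can be sketched as follows. FRESH picks the hypothesis test per feature/target type; this sketch assumes a real-valued feature and a binary target and uses a Mann-Whitney U test, with the helper name being illustrative:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def feature_p_values(X, y):
    """One p-value per feature column of the design matrix X:
    test whether the feature's distribution differs between the
    two target classes encoded in the binary vector y.
    Each column is tested on its own, so this step parallelises
    trivially over features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return np.array([
        mannwhitneyu(col[y == 0], col[y == 1], alternative="two-sided").pvalue
        for col in X.T
    ])
```

A small p-value indicates the feature is predictive of the target; the raw p-values are then fed into the multiple-testing step.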

Feature significance is addressed by an individual hypothesis test for each feature.

The false extraction rate is defined as

FER = (number of wrongly extracted features) / (number of extracted features),  with E(FER) = q

where q, the only parameter of FRESH, is the expected FER. For example, with q = 7% and 100 extracted features, on average 7 features are irrelevant and 93 are relevant.

FRESH controls the false extraction rate asymptotically for
1. every distribution
2. every dependency structure
by the Benjamini-Yekutieli procedure.
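The Benjamini-Yekutieli step can be sketched as below: sort the per-feature p-values, compare them against a harmonically corrected staircase of thresholds, and keep every feature up to the largest passing rank. Function and variable names are illustrative:

```python
import numpy as np

def benjamini_yekutieli(p_values, q=0.1):
    """Boolean mask of features to extract, controlling the expected
    false extraction rate at level q under arbitrary dependence
    between the tests (Benjamini-Yekutieli procedure)."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)
    c_n = np.sum(1.0 / np.arange(1, n + 1))          # harmonic correction for dependence
    thresholds = np.arange(1, n + 1) * q / (n * c_n)  # k * q / (n * c(n))
    below = p[order] <= thresholds
    keep = np.zeros(n, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))         # largest rank still passing
        keep[order[: k + 1]] = True
    return keep
```

The harmonic factor c(n) is what buys validity for every dependency structure; dropping it gives the less conservative Benjamini-Hochberg procedure, which assumes independence or positive dependence.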

Industrial applications + FRESH:

Inhomogeneous sources: feature-based

Decentral processing: highly parallel

Big Data: highly parallel, linear runtime

Robustness: individual feature testing

Explainability: feature-based
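The "highly parallel, linear runtime" claim rests on each series' features being computable independently of every other series. A minimal sketch using a thread pool (the worker count, helper names, and four-feature set are illustrative; tsfresh itself distributes the same map over processes or a cluster):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def features_for_one(ts):
    """All features of one series; depends on no other series, so the
    work parallelises over samples and total runtime stays linear
    in the number of series."""
    ts = np.asarray(ts, dtype=float)
    return [float(ts.max()), float(ts.min()), float(ts.mean()), float(np.median(ts))]

def extract_parallel(series_list, workers=4):
    """Map feature extraction over all series concurrently and
    stack the results into the (m, n) design matrix."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.array(list(pool.map(features_for_one, series_list)))
```

Because the map has no cross-series dependencies, the same pattern also supports decentral processing: each device can compute its own feature rows close to the source and ship only the small matrix rows, not the raw series.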

http://github.com/blue-yonder/tsfresh
