Learning Multivariate Shapelets with Multi-Layer Neural Networks
Roberto Medico, Joeri Ruyssinck, Dirk Deschrijver, Tom Dhaene
SUMOLab - IDLab, Ghent University - imec, Belgium

ACDL Summer School 21 July 2018

Interpretable features for time-series classification are useful in many applications. With time-series data, performance isn't everything: obtaining insights into the data is just as important. In ML, the most powerful classifiers are black-box models, and it is hard to explain their decisions. Classification based on shapelets, by contrast, relies on white-box features.

We propose a generic framework to learn multivariate shapelets using Neural Networks. The shapelets are learnt jointly with the NN weights, and deeper networks make it possible to learn shapelets in more complex scenarios. This lets the approach leverage all recent advances in deep learning.

Outline
An overview on shapelets
Learning the shapelets
Embedding the learning in NN architectures
Performance and Interpretability

An overview on shapelets

Shapelets are discriminative subsequences extracted from time-series data.

Example of a univariate shapelet for arrowheads (Ye & Keogh, 2009).

Shapelets can be found by enumeration and evaluation of every potential candidate. The search is performed in a decision-tree-like structure: each node selects a shapelet and a splitting point according to a criterion, e.g. information gain, and the leaves of the tree represent the classes (Ye & Keogh, 2009).

This brute-force search has high complexity: $O(N^2 L^3)$.

Shapelet discovery has been optimized with more efficient strategies. Euclidean-distance computations can be pruned, via online clustering (Grabocka, 2015) or early abandon (Ye, 2009). Other speed-ups derive from reducing the candidate space, e.g. by computing a similarity-distance threshold over the percentile p of the distribution of distances. FastShapelets (Rakthanmanon, 2013) uses SAX to project each time-series into a lower-dimensional space, where the shapelets are then found.
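The SAX projection used by FastShapelets can be sketched as follows: z-normalize the series, compress it with Piecewise Aggregate Approximation (PAA), and map each segment mean to a symbol using equiprobable Gaussian breakpoints. This is a minimal illustrative sketch (function and parameter names are ours, not from FastShapelets), shown for an alphabet of size 4.

```python
import numpy as np

# Equiprobable N(0,1) breakpoints for an alphabet of size 4
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])

def sax(ts, n_segments=8):
    """Symbolic Aggregate approXimation: z-normalize, reduce with PAA,
    then discretize each segment mean into a symbol index 0..3."""
    ts = np.asarray(ts, dtype=float)
    ts = (ts - ts.mean()) / (ts.std() + 1e-8)        # z-normalization
    paa = ts.reshape(n_segments, -1).mean(axis=1)    # Piecewise Aggregate Approximation
    return np.searchsorted(BREAKPOINTS, paa)         # one symbol per segment

word = sax(np.sin(np.linspace(0, 2 * np.pi, 64)))    # 64 points -> 8 symbols
```

Candidates that share the same SAX word land in the same low-dimensional bucket, which is what makes the pruning cheap.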

Shapelets can be used to project the input data into a new feature space (the Shapelet Transform). Discovery and classification are separated (Hills, 2014): first the shapelets are found, then they are used to transform the data.

The distances are computed by sliding each shapelet of length L over a time-series of length Q, computing the distance $D_i$ at each of the $Q - L + 1$ locations, and keeping the minimum:

$D = \min_{i=1,\dots,Q-L+1} D_i$

Stacking these minima for all N time-series and K shapelets yields the Minimum Distance Matrix (MDM) $\mathbf{M} \in \mathbb{R}^{N \times K}$, which forms the new feature space. M can then be used as input data for standard ML classification models. Each entry $M_{i,j}$ represents how well the $j^{th}$ shapelet fits the $i^{th}$ time-series.
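The Shapelet Transform above can be sketched in a few lines of numpy for the univariate case (names are illustrative, not the authors' code):

```python
import numpy as np

def min_distance_matrix(X, shapelets):
    """Shapelet Transform: project N time-series of length Q onto the
    minimum Euclidean distance to each of K shapelets of length L."""
    N, Q = X.shape
    K, L = shapelets.shape
    M = np.empty((N, K))
    for i in range(N):
        # All Q - L + 1 sliding windows of the i-th series
        windows = np.lib.stride_tricks.sliding_window_view(X[i], L)
        for k in range(K):
            dists = np.linalg.norm(windows - shapelets[k], axis=1)
            M[i, k] = dists.min()          # keep only the minimum distance
    return M

X = np.random.randn(5, 50)
S = X[:2, 10:18].copy()                    # shapelets cut out of the data itself
M = min_distance_matrix(X, S)
```

Since each shapelet here is an exact subsequence of a training series, the corresponding MDM entry is zero, which is the "perfect fit" case.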

Learning the shapelets

Instead of an exhaustive search among candidates, shapelets can be learnt from the data. The shapelets are trainable parameters of an optimization problem, so they are not bound to be subseries of the training data. The shapelets and the parameters of a linear classifier (logistic regression) are jointly learnt (Grabocka, 2014), using the values of the MDM as predictors.

$Y_i' = W_0 + \sum_{k=1}^{K} M_{i,k} W_k$

where $W_0$ is the bias, $W_k$ are the weights, and $M_{i,k}$ are the minimum distances.

$\underset{S,\,W}{\arg\min} \; \sum_{i=1}^{N} \mathcal{L}(Y_i, Y_i') + \lambda_W \lVert W \rVert^2$

The objective jointly learns S and W: a classification loss plus a weight-regularization term.
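The classifier half of this objective can be illustrated with a minimal sketch: gradient descent on a regularized logistic loss over MDM features. For brevity only W and the bias are updated here, over fixed synthetic features; in the full method the gradients also flow into the shapelets S through a differentiable soft-minimum. All names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 4
M = rng.normal(size=(N, K))                     # stand-in MDM features
Y = (M @ np.array([1.0, -2.0, 0.5, 0.0]) > 0)   # synthetic binary labels

W = np.zeros(K)
b = 0.0
lam, lr = 1e-3, 0.5                             # lambda_W and learning rate
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(M @ W + b)))      # sigmoid(Y')
    grad_W = M.T @ (p - Y) / N + 2 * lam * W    # logistic loss + L2 gradient
    grad_b = (p - Y).mean()
    W -= lr * grad_W
    b -= lr * grad_b

acc = ((1 / (1 + np.exp(-(M @ W + b))) > 0.5) == Y).mean()
```

Because the data is linearly separable by construction, the learnt weights recover the boundary almost perfectly.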

Two shapelets learnt on the UCR Coffee dataset (Grabocka et al., 2014).

The current approach has some limitations. It only considers univariate shapelets, while in practice most datasets are multivariate (e.g. sensor data). Moreover, the classifier only allows for a linear decision boundary, so performance will be low on non-linearly separable problems.

Embedding the learning in NN architectures

Our proposal: a flexible framework for (multi)variate shapelet learning that addresses the existing limitations. It can be applied to both uni- and multi-variate time-series; in multivariate scenarios, the shapelets are learnt across channels. The classifier allows for any decision boundary, and hidden layers can be added to increase its capacity. The focus is on learning meaningful, interpretable shapelets: the framework can be used both as a feature extractor and as a classifier.

The main idea is to embed the learning within a Neural Network architecture. Introducing a custom layer that computes the distance matrix allows a direct embedding. The learning can then benefit from all recent deep-learning advances, providing a more generic classifier that is also suitable for more complex problems.

Architecture: 2D input time-series → Input Layer → Distance Layer → Dense Layer → Softmax Layer.

1. A dataset of multidimensional time-series is fed as input to a Multi-Layer Neural Network. The shapelets are initialized either by sampling random subsequences from the input data, or using multivariate motifs appearing in each class.

2. The Distance Layer computes the minimum distances and projects the input data onto the new feature space.

3. The Softmax Layer classifies each input time-series and backpropagates the error, thus updating the shapelets and the weights.

Shapes: input $\mathbf{T} \in \mathbb{R}^{Q \times C}$, shapelets $\mathbf{S} \in \mathbb{R}^{K \times L \times C}$, dense weights $\mathbf{W}$, output weights $\mathbf{W}_{out}$.
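A single forward pass through this architecture can be sketched as follows: a distance layer that computes the minimum sliding distance of each multivariate shapelet, a dense hidden layer, and a softmax output. This is our illustrative sketch, not the authors' implementation; layer sizes and names are assumptions.

```python
import numpy as np

def forward(T, S, W, b, W_out, b_out):
    """Forward pass sketch: Distance Layer -> Dense Layer (ReLU) -> Softmax.
    T: (Q, C) multivariate time-series, S: (K, L, C) shapelets."""
    K, L, C = S.shape
    Q = T.shape[0]
    # Distance Layer: minimum sliding Euclidean distance per shapelet
    m = np.empty(K)
    for k in range(K):
        dists = [np.linalg.norm(T[i:i + L] - S[k]) for i in range(Q - L + 1)]
        m[k] = min(dists)
    h = np.maximum(0.0, m @ W + b)              # Dense Layer with ReLU
    z = h @ W_out + b_out
    e = np.exp(z - z.max())                     # numerically stable Softmax
    return e / e.sum()

rng = np.random.default_rng(1)
T = rng.normal(size=(50, 3))                    # Q=50 steps, C=3 channels
S = rng.normal(size=(4, 8, 3))                  # K=4 shapelets of length L=8
p = forward(T, S, rng.normal(size=(4, 16)), np.zeros(16),
            rng.normal(size=(16, 2)), np.zeros(2))
```

During training, the min operation simply routes the gradient to the best-matching location, so the shapelets are updated by backpropagation like any other parameter.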

The initialization of the shapelets has a big impact on the model performance and on the final learnt representation. Random sampling from N(0,1) often leads to meaningless shapelets, as the optimization focuses on updating "easier" parameters. Instead, shapelets can be initialized by sampling random multivariate subsequences: starting from a reasonable guess helps learning. Alternatively, meaningful multivariate motifs can be used as initial candidates; this guarantees that the shapelets are recurrent motifs, though not yet discriminative ones.
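The subsequence-sampling initialization can be sketched in a few lines (an illustrative sketch; function and argument names are ours):

```python
import numpy as np

def init_shapelets(X, K, L, rng):
    """Initialize K multivariate shapelets of length L by cutting random
    subsequences out of the training set X of shape (N, Q, C)."""
    N, Q, C = X.shape
    series = rng.integers(0, N, size=K)          # which time-series to cut from
    starts = rng.integers(0, Q - L + 1, size=K)  # where each cut begins
    return np.stack([X[n, s:s + L] for n, s in zip(series, starts)])

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 60, 3))                 # N=10 series, Q=60, C=3
S = init_shapelets(X, K=5, L=12, rng=rng)
```

Each initial shapelet is then a real pattern from the data, which gives the optimizer a far better starting point than Gaussian noise.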

The Matrix Profile (MP, Yeh 2016) is a useful tool for many time-series analysis tasks. The MP of a time-series is another time-series, obtained by retrieving the 1-Nearest-Neighbor distance of every subsequence of a fixed length m to every other subsequence (Mueen & Keogh, 2017).
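A naive quadratic computation of the MP makes the definition concrete (real implementations use fast exact algorithms such as STOMP; this sketch and its names are illustrative):

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive O(n^2) Matrix Profile: for every length-m subsequence, the
    Euclidean distance to its nearest non-trivial neighbor."""
    n = len(ts) - m + 1
    subs = np.lib.stride_tricks.sliding_window_view(ts, m)
    mp = np.full(n, np.inf)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        # exclusion zone: ignore trivial matches overlapping subsequence i
        lo, hi = max(0, i - m // 2), min(n, i + m // 2 + 1)
        d[lo:hi] = np.inf
        mp[i] = d.min()
    return mp

# Two sine periods followed by noise: the repeated cycles are motifs,
# the noise subsequences are discords
ts = np.concatenate([np.sin(np.linspace(0, 4 * np.pi, 80)),
                     np.random.default_rng(3).normal(size=40)])
mp = matrix_profile(ts, m=16)
```

Low MP values flag subsequences with a close repeat elsewhere (motifs); high values flag subsequences unlike anything else (discords).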

The minima of the MP represent the top motifs in the time-series, while the maxima are the discords (Mueen & Keogh, 2017).

The Matrix Profile can be extended to multivariate time-series (Yeh, 2017). Not every channel is always informative: one or more channels can be pure noise, so multivariate motifs will span one or more channels.

Shapelets are initialized by concatenating (a sample of) the time-series of each class and running the MP (with random dimensionality). A motif is discarded if it overlaps two subsequent time-series; an accepted motif is used to initialize a shapelet. For example, a motif found by a 2-D MP on 3-channel data initializes a 3-D shapelet in which only 2 out of the 3 channels are informative.

This strategy allows for a sensible initialization, but learning is still needed to make the shapelets discriminative: it extracts the most common patterns of each class rather than discriminative ones, and very similar motifs could be chosen for different classes. Still, the motifs extracted with the MP are meaningful: each represents a different level and depth of interaction between the channels, and each potentially captures a shapelet relevant for classification.

Performance and Interpretability

The learning process can be visualized to give insights into the evolution of each multivariate shapelet.

Learning process of a 3-D shapelet on the uWaveGesture dataset.

The final multi-dimensional shapelets can be visualized to gain insights into the interactions between channels.

3D visualization of a multi-dimensional shapelet learnt on the uWaveGesture dataset.

Preliminary results on benchmark datasets show that the approach achieves competitive performance. In many cases, the motif-based initialization leads to better performance.

Conclusions

Take-away messages
Shapelet extraction is a useful data-mining tool for (interpretable) time-series classification. When more data is available, learning the shapelets is an effective alternative to brute-force search approaches. The learning can be embedded into Neural Network-based architectures, yielding a generic framework for both uni- and multivariate shapelet learning. This allows building non-linear, deeper models that are better suited for learning on large amounts of multivariate data (e.g. sensor data).