Tensor Ensemble Learning

Authors: Ilia Kisil, Ahmad Moniri, Danilo P. Mandic
Electrical and Electronic Engineering Department
Imperial College London, SW7 2AZ, UK
E-mails: {i.kisil15, ahmad.moniri13, d.mandic}@imperial.ac.uk

November 28, 2018
Outline

- Wisdom of the crowd
- Ensemble learning and existing algorithms
- Multidimensional representation of data
- Basics of tensor decompositions
- Tensor Ensemble Learning (TEL)
- Simulations and results
Ensemble learning: The wisdom of the crowd
Ensemble learning: Motivation and limitations
- Every model has its own weaknesses → combining different models can yield a better hypothesis
- Every model explores its own hypothesis space → robustness to outliers
- Limitation: the strong assumption that the errors of the individual models are uncorrelated
Ensemble learning: Existing approaches

- Bagging
- Stacking
- Boosting
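Bagging, the first of these approaches, trains each base learner on a bootstrap resample of the data and aggregates predictions by majority vote. A minimal sketch follows; the nearest-centroid base learner and the toy Gaussian data are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

class NearestCentroid:
    """Toy base learner: classify a point by its nearest class mean."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def bagging_fit(X, y, n_estimators=12):
    """Train each base learner on a bootstrap resample of (X, y)."""
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        models.append(NearestCentroid().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate the ensemble's predictions by majority vote."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Two well-separated Gaussian blobs
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
models = bagging_fit(X, y)
print(bagging_predict(models, np.array([[0.0, 0.0], [5.0, 5.0]])))  # [0 1]
```

Because each model sees a different resample, their errors partially decorrelate, which is what the vote exploits.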
Tensors and basic sub-structures

[Figure: a third-order tensor A ∈ R^(I1×I2×I3), its slices A(i,:,:), A(:,j,:), A(:,:,k), and its three unfoldings.]

- Mode-1 unfolding: A_(1) ∈ R^(I1 × I2·I3)
- Mode-2 unfolding: A_(2) ∈ R^(I2 × I1·I3)
- Mode-3 unfolding: A_(3) ∈ R^(I3 × I1·I2)
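As a minimal sketch, a mode-n unfolding can be computed in NumPy by moving the chosen mode to the front and flattening the remaining modes:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring mode `mode` to the front, flatten the rest.
    For A of shape (I1, I2, I3), unfold(A, 0) has shape (I1, I2*I3)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

A = np.arange(24).reshape(2, 3, 4)   # I1=2, I2=3, I3=4
print(unfold(A, 0).shape)  # (2, 12)
print(unfold(A, 1).shape)  # (3, 8)
print(unfold(A, 2).shape)  # (4, 6)
```

Note that libraries differ in the column ordering of the unfolding; this sketch uses NumPy's default row-major flattening.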
Multidimensional data and tensor construction
Tucker decomposition → HOSVD

- Each vector of A is associated with every vector of B and C through the core tensor G:

  X ≈ Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} Σ_{r3=1}^{R3} g_{r1 r2 r3} · a_{r1} ∘ b_{r2} ∘ c_{r3}

- In general, the Tucker decomposition is not unique
- But the subspaces spanned by the vectors of A, B and C are unique
- Imposing orthogonality constraints on each factor matrix yields the natural generalisation of the matrix SVD: the higher-order SVD (HOSVD)
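The triple sum above is a multilinear product and can be written in one line with `einsum`. A small sketch with arbitrary random factors (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
R1, R2, R3 = 2, 3, 2
I1, I2, I3 = 4, 5, 6
G = rng.normal(size=(R1, R2, R3))   # core tensor
A = rng.normal(size=(I1, R1))       # factor matrices
B = rng.normal(size=(I2, R2))
C = rng.normal(size=(I3, R3))

# X[i,j,k] = sum_{r1,r2,r3} G[r1,r2,r3] * A[i,r1] * B[j,r2] * C[k,r3]
X = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
print(X.shape)  # (4, 5, 6)
```

The einsum subscripts mirror the outer products a_{r1} ∘ b_{r2} ∘ c_{r3} weighted by the core entries.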
Computation of the HOSVD

1. Compute the factor matrices from the SVD of each mode-n unfolding:

   X_(1) = A Σ^(1) (V^(1))^T,   X_(2) = B Σ^(2) (V^(2))^T,   X_(3) = C Σ^(3) (V^(3))^T

2. Compute the core tensor:

   G = X ×1 A^T ×2 B^T ×3 C^T,   where G = X ×1 A^T ⇔ G_(1) = A^T X_(1)
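The two steps above translate directly into NumPy. This is a minimal sketch (no library, helper names are my own); with full multilinear ranks the reconstruction is exact:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding via moveaxis + reshape."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated HOSVD: factor matrices are the leading left singular
    vectors of each mode-n unfolding; the core is the multilinear
    product of X with the transposed factors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    # G = X x1 A^T x2 B^T x3 C^T
    G = np.einsum('ijk,ip,jq,kr->pqr', X, *factors)
    return G, factors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5, 6))
G, (A, B, C) = hosvd(X, ranks=(4, 5, 6))     # full ranks -> exact
X_hat = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
print(np.allclose(X, X_hat))  # True
```

Truncating the ranks (e.g. `ranks=(2, 2, 2)`) gives the low-multilinear-rank approximation used later in the talk.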
Tensor ensemble learning (TEL): General concept

1. Apply a tensor decomposition to each multidimensional sample to extract hidden information
2. Reorganise the obtained latent components
3. Use them to train an ensemble of base learners
4. For a new sample, aggregate the knowledge about its extracted latent components using the models trained in stage 3
TEL: Formation of training sets

[Figure: each sample X^(m), m = 1, …, M, is decomposed into rank-one terms a_r ∘ b_r ∘ c_r; the factor vectors are regrouped across all M samples, so that each (mode, component) pair forms its own surrogate training set, labelled with the original sample labels.]

- Every factor vector (one per mode and component) contributes one training set
- Total number of surrogate training sets: N = Ra + Rb + Rc
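The regrouping can be sketched as follows. This is my own illustrative implementation of the idea (function names and data shapes are assumptions, not the authors' code): decompose every sample once, then stack the r-th factor vector of mode n across all samples into training set (n, r).

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_factors(X, ranks):
    """Leading left singular vectors of each mode-n unfolding."""
    return [np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

def form_training_sets(samples, labels, ranks):
    """One surrogate training set per (mode, component) pair:
    set (n, r) holds the r-th mode-n factor vector of every sample.
    Yields N = Ra + Rb + Rc sets, each paired with the original labels."""
    all_factors = [hosvd_factors(X, ranks) for X in samples]  # decompose once
    y = np.asarray(labels)
    training_sets = {}
    for mode, r_max in enumerate(ranks):
        for r in range(r_max):
            feats = np.stack([f[mode][:, r] for f in all_factors])
            training_sets[(mode, r)] = (feats, y)
    return training_sets

rng = np.random.default_rng(0)
samples = [rng.normal(size=(8, 8, 3)) for _ in range(10)]  # 10 toy samples
labels = [0, 1] * 5
sets = form_training_sets(samples, labels, ranks=(2, 2, 1))
print(len(sets))              # N = 2 + 2 + 1 = 5
print(sets[(0, 0)][0].shape)  # (10, 8): one 8-dim vector per sample
```

Each surrogate set sees a different latent view of the data, which is what decorrelates the base learners downstream.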
TEL: Training stage

- Each of the N surrogate training sets is used to train its own base learner
- Any ML algorithm can serve as the base learner, e.g. SVM, NN, KNN
- This yields N trained models, one per surrogate training set
TEL: Testing stage

[Figure: a new sample is decomposed into its factor vectors a^(new), b^(new), c^(new); each vector is passed to the corresponding trained base learner, which outputs class probabilities, e.g. (0.7, 0.2, 0.1), (0.1, 0.4, 0.5), (0.1, 0.8, 0.1).]

- The per-class probabilities of all N base learners are accumulated into a total vote (probability based)
- The class with the highest total score (here among Apple, Car, Tomato) wins and is returned as the predicted label
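The probability-based total vote can be sketched in a few lines; the three probability vectors are the illustrative base-learner outputs shown in the figure.

```python
import numpy as np

def total_vote(probabilities, class_names):
    """Probability-based total vote: sum the per-class probabilities
    across all base learners; the class with the highest total wins."""
    scores = np.sum(probabilities, axis=0)
    return class_names[int(scores.argmax())], scores

# Outputs of three base learners for the classes (Apple, Car, Tomato)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.4, 0.5],
                  [0.1, 0.8, 0.1]])
label, scores = total_vote(probs, ['Apple', 'Car', 'Tomato'])
print(label, scores)  # Car [0.9 1.4 0.7]
```

Summing probabilities rather than hard votes lets confident learners outweigh uncertain ones.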
ETH-80 dataset
- 8 different categories
- 10 objects per category
- Each object is captured under 41 different viewpoints
- Total: 3280 samples
Performance of base estimators

[Figure: accuracy scores of the individual base learners of the TELVI classifier (factor vectors a1–a5, b1–b5, c1, c2) compared against their majority vote.]

- We employed the HOSVD with multilinear rank (5, 5, 2)
- We utilised 12 base estimators (train/test split of 50%)
- None of the base classifiers exhibited strong performance on the training set
- Their combined behaviour is similar to that of classical ensemble learning
Comparison of the overall test performance

[Figure: accuracy score vs training size portion for four base estimators (TREE, KNN, SVM-RBF, SVM-POLY), comparing the Bagging, Base and TelVI classifiers.]

- Random splits into training and test data were varied in the range from 10% to 70%
- Hyperparameter tuning: grid search with 5-fold CV on the training data
- For a fair comparison, the Bagging classifier also utilised 12 base learners
Conclusions: Key points to take home

1. TEL is a novel framework for generating ensembles of base estimators for multidimensional data
2. TEL is highly parallelisable and suitable for large-scale problems
3. The enhanced performance is due to the ability to obtain uncorrelated surrogate datasets generated by the HOSVD

New software: Higher Order Tensors ToolBOX (HOTTBOX)

- Our Python package for multilinear algebra: github.com/hottbox/hottbox
- Documentation: hottbox.github.io
- Tutorials: github.com/hottbox/hottbox-tutorials
The End

Thank you for your attention. Questions?