Tensor Ensemble Learning

Authors: Ilia Kisil, Ahmad Moniri, Danilo P. Mandic
Electrical and Electronic Engineering Department, Imperial College London, SW7 2AZ, UK
E-mails: {i.kisil15, ahmad.moniri13, d.mandic}@imperial.ac.uk

November 28, 2018


Outline

- Wisdom of the crowd
- Ensemble learning and existing algorithms
- Multidimensional representation of data
- Basics of tensor decompositions
- Tensor Ensemble Learning (TEL)
- Simulations and results


Ensemble learning: The wisdom of the crowd


Ensemble learning: Motivation and limitations

- Every model has its own weaknesses → combining different models can yield a better hypothesis
- Every model explores its own hypothesis space → robustness to outliers
- Limitation: the strong assumption that the errors of the individual models are uncorrelated


Ensemble learning: Existing approaches

- Bagging
- Stacking
- Boosting
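These approaches differ in how the base learners are trained and combined. As a point of reference for the comparison later in the slides, a minimal bagging sketch with scikit-learn, on illustrative toy data rather than the ETH-80 set used in the experiments:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Illustrative toy data; the experiments later use the ETH-80 dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 12 base learners (decision trees by default), each trained on a
# bootstrap resample of the data; predictions are aggregated by voting
bagging = BaggingClassifier(n_estimators=12, random_state=0).fit(X, y)
print(bagging.score(X, y))
```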

Tensors and basic sub-structures

For a third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$:

- Slices: horizontal A(i,:,:), lateral A(:,j,:), frontal A(:,:,k)
- Mode-1 unfolding: $\mathbf{A}_{(1)} \in \mathbb{R}^{I_1 \times I_2 I_3}$
- Mode-2 unfolding: $\mathbf{A}_{(2)} \in \mathbb{R}^{I_2 \times I_1 I_3}$
- Mode-3 unfolding: $\mathbf{A}_{(3)} \in \mathbb{R}^{I_3 \times I_1 I_2}$

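A mode-n unfolding takes a couple of lines of NumPy. A minimal sketch (note that the column ordering of an unfolding is convention-dependent; all conventions span the same subspaces):

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the rest.

    NumPy's row-major layout fixes the column ordering; other conventions
    (e.g. Kolda & Bader) permute the columns but span the same subspace.
    """
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

A = np.arange(24).reshape(2, 3, 4)   # I1 = 2, I2 = 3, I3 = 4
print(unfold(A, 0).shape)            # (2, 12) -> A_(1) in R^{I1 x I2*I3}
print(unfold(A, 1).shape)            # (3, 8)  -> A_(2) in R^{I2 x I1*I3}
print(unfold(A, 2).shape)            # (4, 6)  -> A_(3) in R^{I3 x I1*I2}
```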

Multidimensional data and tensor construction

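As a concrete illustration of one such construction (a sketch assuming image data, as in the ETH-80 experiments later): a colour image is naturally a third-order tensor, and a collection of such samples stacks into a fourth-order data tensor.

```python
import numpy as np

# Hypothetical example: ten 128 x 128 RGB images
images = [np.random.rand(128, 128, 3) for _ in range(10)]

# Each sample is already a third-order tensor (height x width x channels);
# stacking along a new axis yields a fourth-order data tensor
data = np.stack(images)
print(data.shape)   # (10, 128, 128, 3)
```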

Tucker decomposition → HOSVD

$$\mathcal{X} \approx \sum_{r_1=1}^{R_1} \sum_{r_2=1}^{R_2} \sum_{r_3=1}^{R_3} g_{r_1 r_2 r_3} \; \mathbf{a}_{r_1} \circ \mathbf{b}_{r_2} \circ \mathbf{c}_{r_3}$$

- Each vector of A is associated with every vector of B and C through the core tensor $\mathcal{G}$
- In general, the Tucker decomposition is not unique
- But the subspaces spanned by the vectors of A, B and C are unique
- By imposing orthogonality constraints on each factor matrix, we arrive at the natural generalisation of the matrix SVD: the higher-order SVD (HOSVD)

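In tensor notation the same approximation reads $\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C}$. A minimal NumPy sketch of the mode-n product used to reconstruct the tensor from the core and the factor matrices:

```python
import numpy as np

def mode_dot(tensor, matrix, mode):
    """Mode-n product: contract `matrix` with `tensor` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = matrix @ moved.reshape(moved.shape[0], -1)
    out = out.reshape((matrix.shape[0],) + moved.shape[1:])
    return np.moveaxis(out, 0, mode)

# Reconstruction X ≈ G x1 A x2 B x3 C from a (2, 3, 4) core
G = np.random.rand(2, 3, 4)
A, B, C = np.random.rand(5, 2), np.random.rand(6, 3), np.random.rand(7, 4)
X_hat = mode_dot(mode_dot(mode_dot(G, A, 0), B, 1), C, 2)
print(X_hat.shape)   # (5, 6, 7)
```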

Computation of the HOSVD

1. Compute the factor matrices from the SVD of each mode-n unfolding:
   $$\mathbf{X}_{(1)} = \mathbf{A} \boldsymbol{\Sigma}^{(1)} (\mathbf{V}^{(1)})^T, \quad \mathbf{X}_{(2)} = \mathbf{B} \boldsymbol{\Sigma}^{(2)} (\mathbf{V}^{(2)})^T, \quad \mathbf{X}_{(3)} = \mathbf{C} \boldsymbol{\Sigma}^{(3)} (\mathbf{V}^{(3)})^T$$
2. Compute the core tensor:
   $$\mathcal{G} = \mathcal{X} \times_1 \mathbf{A}^T \times_2 \mathbf{B}^T \times_3 \mathbf{C}^T, \quad \text{where} \quad \mathcal{G} = \mathcal{X} \times_1 \mathbf{A}^T \;\Leftrightarrow\; \mathbf{G}_{(1)} = \mathbf{A}^T \mathbf{X}_{(1)}$$

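Putting the two steps together, a compact sketch of the truncated HOSVD, reusing unfold and mode_dot from the earlier snippets:

```python
import numpy as np

def hosvd(X, ranks):
    """Truncated HOSVD (a sketch): the leading left singular vectors of
    each unfolding give the factor matrices; the core is the projection
    G = X x1 A^T x2 B^T x3 C^T. Reuses unfold() and mode_dot() above."""
    factors = []
    for mode, rank in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :rank])
    core = X
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)
    return core, factors

X = np.random.rand(8, 8, 8)
core, (A, B, C) = hosvd(X, (5, 5, 2))   # the multilinear rank used later
print(core.shape)                       # (5, 5, 2)
```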

Tensor ensemble learning (TEL): General concept

1. Apply a tensor decomposition to each multidimensional sample to extract hidden information
2. Reorganise the obtained latent components
3. Use them to train an ensemble of base learners
4. For a new sample, aggregate the knowledge about its extracted latent components using the models trained in stage 3


TEL: Formation of training set

[Figure: each sample $\mathcal{X}^{(m)}$, $m = 1, \dots, M$, is decomposed by HOSVD into factor vectors $\mathbf{a}_1^{(m)}, \dots, \mathbf{a}_{R_a}^{(m)}$, $\mathbf{b}_1^{(m)}, \dots, \mathbf{b}_{R_b}^{(m)}$ and $\mathbf{c}_1^{(m)}, \dots, \mathbf{c}_{R_c}^{(m)}$]

- For each latent component, the corresponding factor vectors from all M samples are gathered into one surrogate training set, and the original sample labels are attached
- This yields $N = R_a + R_b + R_c$ training sets in total

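A minimal sketch of this regrouping, assuming the hosvd helper defined earlier; names and shapes are illustrative:

```python
import numpy as np

def tel_training_sets(samples, ranks):
    """Decompose every sample by HOSVD (helper above) and regroup the
    factor vectors: one surrogate dataset per latent component."""
    datasets = [[] for _ in range(sum(ranks))]   # N = Ra + Rb + Rc
    for X in samples:                            # X^(m), m = 1, ..., M
        _, factors = hosvd(X, ranks)
        n = 0
        for U in factors:                        # factor matrices A, B, C
            for r in range(U.shape[1]):          # r-th factor vector
                datasets[n].append(U[:, r])
                n += 1
    # each surrogate set is an M x I_mode matrix; all share the labels
    return [np.vstack(d) for d in datasets]
```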

TEL: Training stage

[Figure: each of the N surrogate training sets is fed to its own machine-learning algorithm (e.g. SVM, NN, KNN), producing N trained base-learner models]

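A sketch of the training stage under the same assumptions, with a scikit-learn classifier standing in for the base learner:

```python
from sklearn.svm import SVC

def tel_fit(samples, labels, ranks):
    """Train one base learner per surrogate training set."""
    models = []
    for dataset in tel_training_sets(samples, ranks):
        clf = SVC(probability=True)   # any probabilistic classifier works
        clf.fit(dataset, labels)      # every surrogate set shares the labels
        models.append(clf)
    return models                     # N trained base learners
```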

TEL: Testing stage

[Figure: a new sample $\mathcal{X}^{(new)}$ is decomposed by HOSVD; each factor vector $\mathbf{a}_r^{(new)}$, $\mathbf{b}_r^{(new)}$, $\mathbf{c}_r^{(new)}$ is passed to its corresponding base learner, which outputs a vector of class probabilities, e.g. (0.7, 0.2, 0.1) → class 1, (0.1, 0.4, 0.5) → class 3, (0.1, 0.8, 0.1) → class 2]

- Total vote (probability based): the class probabilities are summed across all N base learners and the class with the highest total score wins; in the example of the figure, "Car (2)" wins over "Apple (1)" and "Tomato (3)"

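A sketch of the probability-based total vote, assuming hosvd and the models returned by tel_fit above; for simplicity all learners are assumed to report classes in the same order via classes_:

```python
import numpy as np

def tel_predict(models, X_new, ranks):
    """Probability-based total vote over the N base learners."""
    _, factors = hosvd(X_new, ranks)
    vectors = [U[:, r] for U in factors for r in range(U.shape[1])]
    total_vote = sum(
        clf.predict_proba(v.reshape(1, -1))[0]   # class probabilities
        for clf, v in zip(models, vectors)
    )
    # assumes every learner reports classes in the same order
    return models[0].classes_[int(np.argmax(total_vote))]
```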

ETH-80 dataset

- 8 different categories
- 10 objects per category
- Each object is captured under 41 different viewpoints
- Total: 8 × 10 × 41 = 3280 samples


Performance of base estimators

[Figure: "Performance of Base Learners of the TELVI classifier" — accuracy score (0.0–1.0) of each base estimator, one per factor vector (a1–a5, b1–b5, c1, c2), together with the majority vote]

- We employed HOSVD with multilinear rank (5, 5, 2), i.e. $N = 5 + 5 + 2 = 12$ base estimators (train/test split of 50%)
- None of the base classifiers exhibited strong performance on the training set
- Their combined behaviour is similar to that of classic ensemble learning


Comparison of the overall test performance

[Figure: four panels, one per base estimator (TREE, KNN, SVM-RBF, SVM-POLY), each showing accuracy score (0.3–1.0) against training size portion (0.1–0.7) for the Bagging, Base and TelVI classifiers]

- The random split into training and test data was varied in the range from 10% to 70%
- Hyperparameter tuning: grid search with 5-fold CV on the training data
- For a fair comparison, the Bagging classifier also utilised 12 base learners

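The tuning step can be reproduced with scikit-learn's GridSearchCV; a minimal sketch on illustrative data (the slides do not list the actual parameter grids searched):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Illustrative data; the experiments use the ETH-80 surrogate datasets
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, random_state=0)

# Grid search with 5-fold CV on the training data only; this grid is
# an assumption, the slides do not list the values actually searched
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```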


Conclusions: Key points to take home

1. TEL is a novel framework for generating ensembles of base estimators for multidimensional data
2. TEL is highly parallelisable and suitable for large-scale problems
3. The enhanced performance is due to the ability of the HOSVD to generate uncorrelated surrogate datasets

New software: Higher Order Tensors ToolBOX (HOTTBOX)

- Our Python package for multilinear algebra: github.com/hottbox/hottbox
- Documentation: hottbox.github.io
- Tutorials: github.com/hottbox/hottbox-tutorials

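A hypothetical first-contact sketch; the class names and signatures below are assumptions based on the package documentation, so consult the tutorials above for the exact API:

```python
# Assumed API (check hottbox.github.io for the exact signatures)
import numpy as np
from hottbox.core import Tensor
from hottbox.algorithms.decomposition import HOSVD

tensor = Tensor(np.random.rand(8, 8, 8))          # wrap a NumPy array
tkd = HOSVD().decompose(tensor, rank=(5, 5, 2))   # Tucker representation
print(tkd.core.shape, [fmat.shape for fmat in tkd.fmat])
```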

The End

Thank you for your attention


Questions?
