Tensor Ensemble Learning

Authors: Ilia Kisil, Ahmad Moniri, Danilo P. Mandic
Electrical and Electronic Engineering Department
Imperial College London, SW7 2AZ, UK
E-mails: {i.kisil15, ahmad.moniri13, d.mandic}@imperial.ac.uk

November 28, 2018
Outline

- Wisdom of the crowd
- Ensemble learning and existing algorithms
- Multidimensional representation of data
- Basics of tensor decompositions
- Tensor Ensemble Learning (TEL)
- Simulations and results
Ensemble learning: The wisdom of the crowd
Ensemble learning: Motivation and limitations
- Every model has its own weaknesses → combining different models can yield a better hypothesis
- Every model explores its own hypothesis space → robustness to outliers
- Limitation: the strong assumption that the errors of the individual models are uncorrelated
Ensemble learning: Existing approaches

- Bagging
- Stacking
- Boosting
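Bagging, the first of these approaches, trains each base learner on a bootstrap resample of the data and aggregates predictions by majority vote. A minimal sketch follows; the nearest-centroid base learner and the toy Gaussian data are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

class NearestCentroid:
    """Toy base learner: classify a point by its nearest class mean."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def bagging_fit(X, y, n_estimators=12):
    """Train each base learner on a bootstrap resample of (X, y)."""
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        models.append(NearestCentroid().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate the ensemble's predictions by majority vote."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Two well-separated Gaussian blobs
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
models = bagging_fit(X, y)
print(bagging_predict(models, np.array([[0.0, 0.0], [5.0, 5.0]])))  # [0 1]
```

Because each model sees a different resample, their errors partially decorrelate, which is what the vote exploits.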
Tensors and basic sub-structures

[Figure: a third-order tensor A ∈ R^(I1×I2×I3), its slices A(i,:,:), A(:,j,:), A(:,:,k), and its three unfoldings.]

- Mode-1 unfolding: A_(1) ∈ R^(I1 × I2·I3)
- Mode-2 unfolding: A_(2) ∈ R^(I2 × I1·I3)
- Mode-3 unfolding: A_(3) ∈ R^(I3 × I1·I2)
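As a minimal sketch, a mode-n unfolding can be computed in NumPy by moving the chosen mode to the front and flattening the remaining modes:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring mode `mode` to the front, flatten the rest.
    For A of shape (I1, I2, I3), unfold(A, 0) has shape (I1, I2*I3)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

A = np.arange(24).reshape(2, 3, 4)   # I1=2, I2=3, I3=4
print(unfold(A, 0).shape)  # (2, 12)
print(unfold(A, 1).shape)  # (3, 8)
print(unfold(A, 2).shape)  # (4, 6)
```

Note that libraries differ in the column ordering of the unfolding; this sketch uses NumPy's default row-major flattening.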
Multidimensional data and tensor construction
Tucker decomposition → HOSVD

- Each vector of A is associated with every vector of B and C through the core tensor G:

  X ≈ Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} Σ_{r3=1}^{R3} g_{r1 r2 r3} · a_{r1} ∘ b_{r2} ∘ c_{r3}

- In general, the Tucker decomposition is not unique
- But the subspaces spanned by the vectors of A, B and C are unique
- Imposing orthogonality constraints on each factor matrix yields the natural generalisation of the matrix SVD: the higher-order SVD (HOSVD)
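The triple sum above is a multilinear product and can be written in one line with `einsum`. A small sketch with arbitrary random factors (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
R1, R2, R3 = 2, 3, 2
I1, I2, I3 = 4, 5, 6
G = rng.normal(size=(R1, R2, R3))   # core tensor
A = rng.normal(size=(I1, R1))       # factor matrices
B = rng.normal(size=(I2, R2))
C = rng.normal(size=(I3, R3))

# X[i,j,k] = sum_{r1,r2,r3} G[r1,r2,r3] * A[i,r1] * B[j,r2] * C[k,r3]
X = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
print(X.shape)  # (4, 5, 6)
```

The einsum subscripts mirror the outer products a_{r1} ∘ b_{r2} ∘ c_{r3} weighted by the core entries.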
Computation of the HOSVD

1. Compute the factor matrices from the SVD of each mode-n unfolding:

   X_(1) = A Σ^(1) (V^(1))^T,   X_(2) = B Σ^(2) (V^(2))^T,   X_(3) = C Σ^(3) (V^(3))^T

2. Compute the core tensor:

   G = X ×1 A^T ×2 B^T ×3 C^T,   where G = X ×1 A^T ⇔ G_(1) = A^T X_(1)
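The two steps above translate directly into NumPy. This is a minimal sketch (no library, helper names are my own); with full multilinear ranks the reconstruction is exact:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding via moveaxis + reshape."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated HOSVD: factor matrices are the leading left singular
    vectors of each mode-n unfolding; the core is the multilinear
    product of X with the transposed factors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    # G = X x1 A^T x2 B^T x3 C^T
    G = np.einsum('ijk,ip,jq,kr->pqr', X, *factors)
    return G, factors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5, 6))
G, (A, B, C) = hosvd(X, ranks=(4, 5, 6))     # full ranks -> exact
X_hat = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
print(np.allclose(X, X_hat))  # True
```

Truncating the ranks (e.g. `ranks=(2, 2, 2)`) gives the low-multilinear-rank approximation used later in the talk.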
Tensor ensemble learning (TEL): General concept

1. Apply a tensor decomposition to each multidimensional sample to extract hidden information
2. Reorganise the obtained latent components
3. Use them to train an ensemble of base learners
4. For a new sample, aggregate the knowledge about its extracted latent components using the models trained in stage 3
TEL: Formation of training sets

[Figure: each sample X^(m), m = 1, …, M, is decomposed into rank-one terms a_r ∘ b_r ∘ c_r; the factor vectors are regrouped across all M samples, so that each (mode, component) pair forms its own surrogate training set, labelled with the original sample labels.]

- Every factor vector (one per mode and component) contributes one training set
- Total number of surrogate training sets: N = Ra + Rb + Rc
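The regrouping can be sketched as follows. This is my own illustrative implementation of the idea (function names and data shapes are assumptions, not the authors' code): decompose every sample once, then stack the r-th factor vector of mode n across all samples into training set (n, r).

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_factors(X, ranks):
    """Leading left singular vectors of each mode-n unfolding."""
    return [np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

def form_training_sets(samples, labels, ranks):
    """One surrogate training set per (mode, component) pair:
    set (n, r) holds the r-th mode-n factor vector of every sample.
    Yields N = Ra + Rb + Rc sets, each paired with the original labels."""
    all_factors = [hosvd_factors(X, ranks) for X in samples]  # decompose once
    y = np.asarray(labels)
    training_sets = {}
    for mode, r_max in enumerate(ranks):
        for r in range(r_max):
            feats = np.stack([f[mode][:, r] for f in all_factors])
            training_sets[(mode, r)] = (feats, y)
    return training_sets

rng = np.random.default_rng(0)
samples = [rng.normal(size=(8, 8, 3)) for _ in range(10)]  # 10 toy samples
labels = [0, 1] * 5
sets = form_training_sets(samples, labels, ranks=(2, 2, 1))
print(len(sets))              # N = 2 + 2 + 1 = 5
print(sets[(0, 0)][0].shape)  # (10, 8): one 8-dim vector per sample
```

Each surrogate set sees a different latent view of the data, which is what decorrelates the base learners downstream.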
TEL: Training stage

- Each of the N surrogate training sets is used to train its own base learner
- Any ML algorithm can serve as the base learner, e.g. SVM, NN, KNN
- This yields N trained models, one per surrogate training set
TEL: Testing stage

[Figure: a new sample is decomposed into its factor vectors a^(new), b^(new), c^(new); each vector is passed to the corresponding trained base learner, which outputs class probabilities, e.g. (0.7, 0.2, 0.1), (0.1, 0.4, 0.5), (0.1, 0.8, 0.1).]

- The per-class probabilities of all N base learners are accumulated into a total vote (probability based)
- The class with the highest total score (here among Apple, Car, Tomato) wins and is returned as the predicted label
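The probability-based total vote can be sketched in a few lines; the three probability vectors are the illustrative base-learner outputs shown in the figure.

```python
import numpy as np

def total_vote(probabilities, class_names):
    """Probability-based total vote: sum the per-class probabilities
    across all base learners; the class with the highest total wins."""
    scores = np.sum(probabilities, axis=0)
    return class_names[int(scores.argmax())], scores

# Outputs of three base learners for the classes (Apple, Car, Tomato)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.4, 0.5],
                  [0.1, 0.8, 0.1]])
label, scores = total_vote(probs, ['Apple', 'Car', 'Tomato'])
print(label, scores)  # Car [0.9 1.4 0.7]
```

Summing probabilities rather than hard votes lets confident learners outweigh uncertain ones.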
ETH-80 dataset
- 8 different categories
- 10 objects per category
- Each object is captured under 41 different viewpoints
- Total: 3280 samples
Performance of base estimators

[Figure: accuracy scores of the individual base learners of the TELVI classifier (factor vectors a1–a5, b1–b5, c1, c2) compared against their majority vote.]

- We employed the HOSVD with multilinear rank (5, 5, 2)
- We utilised 12 base estimators (train/test split of 50%)
- None of the base classifiers exhibited strong performance on the training set
- Their combined behaviour is similar to that of classical ensemble learning
Comparison of the overall test performance

[Figure: accuracy score vs training size portion for four base estimators (TREE, KNN, SVM-RBF, SVM-POLY), comparing the Bagging, Base and TelVI classifiers.]

- Random splits into training and test data were varied in the range from 10% to 70%
- Hyperparameter tuning: grid search with 5-fold CV on the training data
- For a fair comparison, the Bagging classifier also utilised 12 base learners
Conclusions: Key points to take home

1. TEL is a novel framework for generating ensembles of base estimators for multidimensional data
2. TEL is highly parallelisable and suitable for large-scale problems
3. The enhanced performance is due to the ability to obtain uncorrelated surrogate datasets generated by the HOSVD

New software: Higher Order Tensors ToolBOX (HOTTBOX)

- Our Python package for multilinear algebra: github.com/hottbox/hottbox
- Documentation: hottbox.github.io
- Tutorials: github.com/hottbox/hottbox-tutorials
The End

Thank you for your attention. Questions?