Machine Learning for Adaptive Multi-Core Machines Noel Lopes Supervisor: Prof. Dr. Bernardete Ribeiro
University of Coimbra, Portugal
September 17, 2013
Outline
- Introduction
- Objectives
- Contributions
- High-performance Deep Learning
- Conclusions
Machine Learning: Need to Scale Up
- Large datasets
- High-dimensional inputs
- Cascade predictors
- High-throughput Machine Learning implementations
- Adequate model selection
- Inference time constraints
- Algorithm complexity
Machine Learning: Big Data
Data sources:
- Real data
- Computer simulation models
- Artificial data
Data arrives as data streams or accumulates in persistent repositories, producing large volumes of data.
Challenge: extract useful and relevant information from a volume of data that vastly exceeds our capacity to analyze it.
Machine Learning: Big Data (continued)
Data sources (real data, computer simulation models, artificial data) feed data streams and persistent repositories of accumulated data. ML algorithms process these large volumes of data to produce extracted information.
Scientific Contributions
- Machine Learning
  - Supervised Learning
  - Semi-supervised Learning
  - Unsupervised Learning
- GPUMLib – GPU ML Library
Scientific Contributions: Supervised Learning
- Machine Learning
  - Supervised Learning
    - Autonomous Training System (ATS)
    - Neural Selective Input Model (NSIM)
    - Incremental Hypersphere Classifier (IHC)
  - Semi-supervised Learning
  - Unsupervised Learning
- GPUMLib – GPU ML Library
[Figure: a main network with selective actuation neurons (multipliers, ×) and a space network; inputs x1, x2, x3 and bias; outputs y1, y2.]
Scientific Contributions: Supervised Learning, Neural Selective Input Model (NSIM)
[Figure: physical model with inputs x1, x2, x3 and outputs y1, y2. The selective input neuron takes x_i and κ_i, applies a multiplier (×) and outputs x̃_i; labels w_ij and b_j mark the weights and bias.]
Conceptual models:
- Model 1, when x3 is missing: κ3 = 0 (inputs x1, x2; outputs y1, y2)
- Model 2, when the value of x3 is known: κ3 = 1 (inputs x1, x2, x3; outputs y1, y2)
Scientific Contributions: Supervised Learning, Incremental Hypersphere Classifier (IHC)
[Figure: diagram labeled x1, x2, ..., xk.]
Scientific Contributions: Semi-supervised Learning
- Machine Learning
  - Supervised Learning
  - Semi-supervised Learning
    - Semi-supervised Non-Negative Matrix Factorization
  - Unsupervised Learning
- GPUMLib – GPU ML Library
[Figure: the factorization V ≈ W H_train, with the matrices split by class: V = [V1 V2 ··· VC], W = [W1 W2 ··· WC], and H_train composed of blocks H1, H2, ..., HC. Dimension labels on the figure: D, N, r, and r1, r2, ..., rC.]
Scientific Contributions: Unsupervised Learning
- Machine Learning
  - Supervised Learning
  - Semi-supervised Learning
  - Unsupervised Learning
    - Deep Belief Networks (Adaptive Step Size technique)
- GPUMLib – GPU ML Library
Scientific Contributions: Case Studies and Benchmarks
Benchmarks:
- Yale face database
- ORL face database
- MNIST hand-written digits
- HHreco multi-stroke images
Case studies:
- Biomedical
- Finance and business
- Bioinformatics
GPUMLib – GPU ML Library: Graphics Processing Unit (GPU)
[Figure: Historical single-/double-precision peak compute rates. Axes: date (2002 to 2012) versus GFLOPS (0 to 4000). Series are distinguished by precision (SP, DP) and by vendor: AMD (GPU), NVIDIA (GPU), Intel (CPU), Intel Xeon Phi.]
Scientific Contributions: GPUMLib – GPU ML Library
- Host (CPU) and device (GPU) memory access framework: HostArray, HostMatrix, CudaArray, DeviceArray, DeviceMatrix, ...
- Common host (CPU) classes
- Common CUDA kernels
- Common device (GPU) functions
- C++ classes (algorithms): BackPropagation, Radial Basis Functions, Deep Belief Networks, Restricted Boltzmann Machines, Multiple BackPropagation, Support Vector Machines, Non-Negative Matrix Factorization, ...
- CUDA (GPU) kernels: Multiple BackPropagation, Support Vector Machines, Non-Negative Matrix Factorization, Nonlinear Dimension Reduction, Radial Basis Functions, Restricted Boltzmann Machines, Self Organizing Maps, ...
http://gpumlib.sourceforge.net/
Deep Belief Networks Deep architecture
Restricted Boltzmann Machines (RBMs)
For the binary units $h_j \in \{0, 1\}$ and $v_i \in \{0, 1\}$, the energy function of the whole network is:
$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij} v_i h_j - \sum_{i} c_i v_i - \sum_{j} b_j h_j \qquad (1)$$
where $W$ is the matrix of weights, and $b$ and $c$ are the bias units with respect to the hidden and visible layers, respectively.
[Figure: RBM with visible units v1, v2, ..., vi, ..., vI and hidden units h1, h2, h3, ..., hj, ..., hJ, each layer with a bias unit fixed at 1. The encoder maps the visible units to the hidden units and the decoder maps the hidden units back to the visible units.]
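As an illustration of Eq. (1), the following minimal sketch evaluates the energy of one joint configuration of a toy RBM. The function name `rbm_energy`, the nested-list layout `W[i][j]` (visible unit i, hidden unit j) and all numeric values are assumptions made for this example; they are not taken from GPUMLib.

```python
def rbm_energy(v, h, W, b, c):
    """Energy of Eq. (1) for binary vectors v (visible) and h (hidden).

    W[i][j] is the weight between visible unit i and hidden unit j,
    c holds the visible biases and b the hidden biases (assumed layout).
    """
    interaction = sum(W[i][j] * v[i] * h[j]
                      for i in range(len(v))
                      for j in range(len(h)))
    visible_bias = sum(c[i] * v[i] for i in range(len(v)))
    hidden_bias = sum(b[j] * h[j] for j in range(len(h)))
    return -interaction - visible_bias - hidden_bias


# Toy network with 3 visible and 2 hidden units (arbitrary values).
W = [[0.5, -1.0],
     [2.0, 0.0],
     [0.0, 1.5]]
c = [0.1, 0.2, 0.3]   # visible biases
b = [-0.5, 0.5]       # hidden biases
v = [1, 0, 1]
h = [1, 1]

energy = rbm_energy(v, h, W, b, c)
print(energy)
```

Lower energy corresponds to a more probable configuration under the model, which is why training lowers the energy of configurations that resemble the training data.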
Restricted Boltzmann Machines (RBMs)
Given a random training vector $\mathbf{v}$, the state of a given hidden unit $j$ is set to 1 with probability:
$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i} v_i W_{ij}\Big) \qquad (2)$$
Similarly:
$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(c_i + \sum_{j} h_j W_{ij}\Big) \qquad (3)$$
where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid squashing function.
[Figure: RBM with visible units v1, ..., vI, hidden units h1, ..., hJ, bias units, encoder and decoder.]
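Eqs. (2) and (3) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (`p_hidden_given_visible`, `p_visible_given_hidden`, `sample_binary`), the `W[i][j]` layout and the toy values are assumptions for this example, not the GPUMLib implementation.

```python
import math
import random


def sigmoid(x):
    """Sigmoid squashing function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def p_hidden_given_visible(v, W, b):
    """Eq. (2): p(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def p_visible_given_hidden(h, W, c):
    """Eq. (3): p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(c[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(c))]


def sample_binary(probs, rng):
    """Set each unit to 1 with its given probability."""
    return [1 if rng.random() < p else 0 for p in probs]


# Toy RBM with 2 visible and 2 hidden units (arbitrary values).
W = [[0.0, 1.0],
     [0.0, -1.0]]
b = [0.0, 0.0]   # hidden biases
c = [0.0, 0.0]   # visible biases
rng = random.Random(0)

p_h = p_hidden_given_visible([1, 0], W, b)
h = sample_binary(p_h, rng)
p_v = p_visible_given_hidden(h, W, c)
print(p_h, h, p_v)
```

Because the units within one layer are conditionally independent given the other layer, every probability in a layer can be computed at the same time, which is the property a GPU implementation can exploit.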
Training an RBM: Alternating Gibbs Sampling
[Figure: Gibbs chain alternating between the visible and hidden layers, starting from the training vector: v(0) = x, h(0), v(1), h(1), v(2), h(2), ..., v(∞), h(∞). The correlations ⟨v_i h_j⟩_0 are measured at the start of the chain and ⟨v_i h_j⟩_∞ at its end.]
The hidden units are sampled from the visible units with
$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i=1}^{I} v_i W_{ji}\Big)$$
and the visible units are sampled from the hidden units with
$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(c_i + \sum_{j=1}^{J} h_j W_{ji}\Big)$$
Here $W_{ji}$ denotes the weight between visible unit $i$ and hidden unit $j$.
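Training an RBM: Contrastive Divergence (CD–k)
- Obtaining ⟨·⟩_∞ requires running the Gibbs chain until it reaches equilibrium, which is impractical. To solve this problem, Hinton proposed the Contrastive Divergence algorithm.
- CD–k replaces ⟨·⟩_∞ by ⟨·⟩_k for small values of k:
$$\Delta W_{ji} = \gamma \big( \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k \big) \qquad (4)$$
where $\gamma$ is the learning rate.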
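The weight update of Eq. (4) can be sketched for a single training vector as follows. This is a minimal illustration under stated assumptions, not the GPUMLib code: the names (`cd_k_weight_update`, `hidden_probs`, `visible_probs`), the `W[j][i]` layout and the toy values are hypothetical, the averages ⟨·⟩ are estimated from one sample, and hidden probabilities are used in place of sampled hidden states when forming the products, which is a common practical choice.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Set each binary unit to 1 with its given probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, b):
    # W[j][i] is the weight between hidden unit j and visible unit i (W_ji).
    return [sigmoid(b[j] + sum(W[j][i] * v[i] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, c):
    return [sigmoid(c[i] + sum(W[j][i] * h[j] for j in range(len(h))))
            for i in range(len(c))]


def cd_k_weight_update(x, W, b, c, gamma, k, rng):
    """Return Delta W (same shape as W) for one training vector x, as in Eq. (4)."""
    v0 = x
    ph0 = hidden_probs(v0, W, b)       # statistics at step 0
    h = sample(ph0, rng)
    vk, phk = v0, ph0
    for _ in range(k):                 # k alternating Gibbs steps
        vk = sample(visible_probs(h, W, c), rng)
        phk = hidden_probs(vk, W, b)
        h = sample(phk, rng)
    return [[gamma * (v0[i] * ph0[j] - vk[i] * phk[j])
             for i in range(len(x))]
            for j in range(len(b))]


# Toy RBM: 3 visible units, 2 hidden units, weights initialised to zero.
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [0.0, 0.0]
c = [0.0, 0.0, 0.0]
delta = cd_k_weight_update([1, 0, 1], W, b, c, gamma=0.1, k=1,
                           rng=random.Random(42))
print(delta)
```

In practice the update would be averaged over a batch of training vectors, and the biases would be updated with the analogous differences of unit activations.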
Deep Belief Networks (DBNs)
[Figure: a DBN built by stacking RBMs in three stages. Stage 1: an RBM between x and h1, with p(h1|x) and p(x|h1). Stage 2: a second layer h2 stacked on h1, with p(h2|h1) and p(h1|h2). Stage 3: a third layer h3 stacked on h2, with p(h3|h2) and p(h2|h3).]
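The upward direction of the stack, p(h1|x), then p(h2|h1), then p(h3|h2), can be sketched as a chain of sigmoid layers. The sketch below is an assumption-laden illustration: the function name `up_pass`, the `(W, b)` pair layout with `W[j][i]`, the layer sizes and the zero weights are all hypothetical, and probabilities are propagated deterministically instead of sampling each layer.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def up_pass(x, layers):
    """Propagate x upward through stacked layers.

    layers is a list of (W, b) pairs, one per RBM in the stack;
    W[j][i] connects unit i of the layer below to unit j of the layer above.
    Returns the activations of every layer, starting with x itself.
    """
    activations = [x]
    for W, b in layers:
        below = activations[-1]
        above = [sigmoid(b[j] + sum(W[j][i] * below[i]
                                    for i in range(len(below))))
                 for j in range(len(b))]
        activations.append(above)
    return activations


# Two stacked layers with zero weights: 4 inputs -> 3 units -> 2 units.
layers = [
    ([[0.0] * 4 for _ in range(3)], [0.0] * 3),
    ([[0.0] * 3 for _ in range(2)], [0.0] * 2),
]
acts = up_pass([1, 0, 1, 1], layers)
print(acts)
```

When the layers are trained one at a time, the activations of an already trained layer serve as the input data for the RBM stacked on top of it.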
Deep Belief Networks (DBNs): GPU Implementation Results
[Figure: MNIST average training time per epoch (N = 60,000). Axes: hidden units (0 to 900) versus time in seconds on a logarithmic scale (1 to 10,000). Series: GTX 460 (GPU) and dual-core i5 (CPU). Speedups annotated on the plot: 38.64×, 41.83×, 42.73×, 43.46×, 46.07×.]
Deep Belief Networks (DBNs): Adaptive Step Size
[Figure: three panels, for α = 0.1, α = 0.4 and α = 0.7. Each panel plots the reconstruction RMSE (0.10 to 0.45) against the epoch (0 to 1000) for the adaptive step size and for fixed γ = 0.1, γ = 0.4 and γ = 0.7. Caption: Average reconstruction error (RMSE).]
Restricted Boltzmann Machines: Receptive Fields
[Figures: receptive field images.]
Deep Belief Networks (DBNs)
Demonstration
Conclusions: Future Work
- Big Data problem:
  - Novel ML algorithms
  - Scale up existing algorithms
    - High-performance (GPU) ML implementations
- Size matters:
  - Enhancing GPUMLib algorithms with Big Data in mind
Publications (first author)
- 5 journal articles
- 15 conference articles
- 30+ citations
Publications: Journal Articles
- Noel Lopes and Bernardete Ribeiro. Towards adaptive learning with improved convergence of deep belief networks on graphics processing units. Pattern Recognition, 2013.
- Noel Lopes and Bernardete Ribeiro. Towards a hybrid NMF-based neural approach for face recognition on GPUs. International Journal of Data Mining, Modelling and Management (IJDMMM), 4(2):138–155, 2012.
- Noel Lopes and Bernardete Ribeiro. Handling missing values via a neural selective input model. Neural Network World, 22(4):357–370, 2012.
- Noel Lopes and Bernardete Ribeiro. GPUMLib: An efficient open-source GPU machine learning library. International Journal of Computer Information Systems and Industrial Management Applications, 3:355–362, 2011.
- Noel Lopes and Bernardete Ribeiro. An evaluation of multiple feed-forward networks on GPUs. International Journal of Neural Systems (IJNS), 21(1):31–47, 2011.
Publications: Proceedings Articles (page 1 of 4)
- Noel Lopes, Bernardete Ribeiro, and João Gonçalves. Restricted Boltzmann machines and deep belief networks on multi-core processors. In The 2012 International Joint Conference on Neural Networks (IJCNN), 2012.
- Noel Lopes and Bernardete Ribeiro. Improving convergence of restricted Boltzmann machines via a learning adaptive step size. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, LNCS 7441, pages 511–518. Springer Berlin / Heidelberg, 2012.
- Noel Lopes, Daniel Correia, Carlos Pereira, Bernardete Ribeiro, and António Dourado. An incremental hypersphere learning framework for protein membership prediction. In 7th International Conference on Hybrid Artificial Intelligent Systems, LNCS 7208, pages 429–439. Springer Berlin / Heidelberg, 2012.
- Noel Lopes and Bernardete Ribeiro. A robust learning model for dealing with missing values in many-core architectures. In 10th International Conference on Adaptive and Natural Computing Algorithms (ICANNGA 2011), Part II, LNCS 6594, pages 108–117. Springer Berlin, 2011.
Publications: Proceedings Articles (page 2 of 4)
- Noel Lopes and Bernardete Ribeiro. Incremental learning for non-stationary patterns. In 17th edition of the Portuguese Conference on Pattern Recognition (RECPAD 2011), 2011.
- Noel Lopes and Bernardete Ribeiro. An incremental class boundary preserving hypersphere classifier. In International Conference on Neural Information Processing (ICONIP 2011), Part II, LNCS 7063, pages 690–699. Springer Berlin Heidelberg, 2011.
- Noel Lopes and Bernardete Ribeiro. A fast optimized semi-supervised non-negative matrix factorization algorithm. In IEEE International Joint Conference on Neural Networks (IJCNN 2011), pages 2495–2500, 2011.
- Noel Lopes, Bernardete Ribeiro, and Ricardo Quintas. GPUMLib: A new library to combine machine learning algorithms with graphics processing units. In IEEE 10th International Conference on Hybrid Intelligent Systems (HIS 2010), pages 229–232, August 2010.
Publications: Proceedings Articles (page 3 of 4)
- Noel Lopes and Bernardete Ribeiro. A strategy for dealing with missing values by using selective activation neurons in a multi-topology framework. In IEEE World Congress on Computational Intelligence (WCCI 2010), 2010.
- Noel Lopes and Bernardete Ribeiro. Stochastic GPU-based multithread implementation of multiple back-propagation. In Second International Conference on Agents and Artificial Intelligence (ICAART 2010), pages 271–276, 2010.
- Noel Lopes and Bernardete Ribeiro. Non-negative matrix factorization implementation using graphic processing units. In 11th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2010), LNCS 6283, pages 275–283. Springer, 2010.
- Noel Lopes and Bernardete Ribeiro. A hybrid face recognition approach using GPUMLib. In 15th Iberoamerican Congress on Pattern Recognition (CIARP 2010), LNCS 6419, pages 96–103. Springer, 2010.
Publications: Proceedings Articles (page 4 of 4)
- Noel Lopes and Bernardete Ribeiro. Fast pattern classification of ventricular arrhythmias using graphics processing units. In 14th Iberoamerican Congress on Pattern Recognition (CIARP 2009), LNCS 5856, pages 603–610. Springer, 2009.
- Noel Lopes and Bernardete Ribeiro. GPU implementation of the multiple back-propagation algorithm. In 10th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2009), LNCS 5788, pages 449–456. Springer, 2009.
- Noel Lopes and Bernardete Ribeiro. MBPGPU: A supervised pattern classifier for graphical processing units. In 15th edition of the Portuguese Conference on Pattern Recognition (RECPAD 2009), 2009.
GPUMLib: Over 2000 downloads
[Figure: chart with the shares 5.6%, 19.6% and 11.7%.]