Machine Learning for Adaptive Multi-Core Machines - CISUC

Machine Learning for Adaptive Multi-Core Machines
Noel Lopes
Supervisor: Prof. Dr. Bernardete Ribeiro

University of Coimbra, Portugal

September 17, 2013

Outline

- Introduction
- Objectives
- Contributions
- High-performance Deep Learning
- Conclusions

Machine Learning: Need to Scale Up

- Large datasets
- High-dimensional inputs
- Cascade predictors
- Adequate model selection
- Inference time constraints
- Algorithm complexity

These factors call for high-throughput Machine Learning implementations.


Machine Learning: Big Data

- Data sources: real data, computer simulation models, artificial data
- The sources feed data streams and persistent repositories of (accumulated) data, producing large volumes of data
- Challenge: extract useful and relevant information from a volume of data that vastly exceeds our capacity to analyze it
- ML algorithms turn these large volumes of data into extracted information


Scientific Contributions

- Machine Learning
  - Supervised Learning
  - Semi-supervised Learning
  - Unsupervised Learning
- GPUMLib – GPU ML Library


Scientific Contributions: Supervised Learning

- Machine Learning
  - Supervised Learning
    - Autonomous Training System (ATS)
    - Neural Selective Input Model (NSIM)
    - Incremental Hypersphere Classifier (IHC)
  - Semi-supervised Learning
  - Unsupervised Learning

[Figure: a main network with selective actuation neurons (×), with inputs x1, x2, x3 and a bias and outputs y1, y2, together with a space network.]


[Figure: physical model with inputs x1, x2, x3 and outputs y1, y2, where x3 passes through a selective input neuron controlled by κ3.]

Conceptual models:

- Model 1, when x3 is missing: κ3 = 0 (inputs x1, x2; outputs y1, y2)
- Model 2, when the value of x3 is known: κ3 = 1 (inputs x1, x2, x3; outputs y1, y2)

[Figure: selective input neuron with inputs x_i and κ_i, weight w_ij, bias b_j, a multiplier (×) and output x̃_i.]



[Figure: diagram with inputs x1, x2, ..., xk.]


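Scientific Contributions: Semi-Supervised Learning

- Machine Learning
  - Supervised Learning
  - Semi-supervised Learning
    - Semi-supervised Non-Negative Matrix Factorization
  - Unsupervised Learning
- GPUMLib – GPU ML Library

[Figure: semi-supervised NMF. The D × N matrix V = [V1 V2 ··· VC] is approximated by the product W H_train, where W = [W1 W2 ··· WC] is D × r (block widths r1, r2, ..., rC) and H_train is r × N, built from the blocks H1, H2, ..., HC.]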


Scientific Contributions: Unsupervised Learning

- Machine Learning
  - Supervised Learning
  - Semi-supervised Learning
  - Unsupervised Learning
    - Deep Belief Networks (Adaptive Step Size technique)


Scientific Contributions: Case studies and Benchmarks

- Benchmarks
  - Yale face database
  - ORL face database
  - MNIST hand-written digits
  - HHreco multi-stroke images
- Case studies
  - biomedical
  - finance and business
  - bio-informatics


GPUMLib – GPU ML Library: Graphics Processing Unit (GPU)

[Figure: Historical Single-/Double-Precision Peak Compute Rates. GFLOPS (0 to 4000) versus date (2002 to 2012), in single precision (SP) and double precision (DP), for AMD (GPU), NVIDIA (GPU), Intel (CPU) and Intel Xeon Phi.]


Scientific Contributions: GPUMLib – GPU ML Library

- Host (CPU) and device (GPU) memory access framework: HostArray, HostMatrix, CudaArray, DeviceArray, DeviceMatrix, ···
- Common Host (CPU) Classes
- Common CUDA Kernels
- C++ classes (algorithms): BackPropagation, Radial Basis Functions, Deep Belief Networks, Restricted Boltzmann Machines, Multiple BackPropagation, Support Vector Machines, Non-Negative Matrix Factorization, ···
- CUDA (GPU) Kernels: Multiple BackPropagation, Support Vector Machines, Non-Negative Matrix Factorization, Nonlinear Dimension Reduction, Radial Basis Functions, Restricted Boltzmann Machines, Self Organizing Maps, ···
- Common Device (GPU) Functions

http://gpumlib.sourceforge.net/


Deep Belief Networks Deep architecture



Restricted Boltzmann Machines (RBMs)

For the binary units $h_j \in \{0, 1\}$ and $v_i \in \{0, 1\}$, the energy function of the whole network is:

$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij} v_i h_j - \sum_{i} c_i v_i - \sum_{j} b_j h_j \qquad (1)$$

where $W$ is the matrix of weights, and $b$ and $c$ are the biases of the hidden and visible layers, respectively.

[Figure: RBM with visible units v1, v2, ..., vI and hidden units h1, h2, ..., hJ, each layer with a bias unit fixed at 1; the visible-to-hidden direction is labelled "encoder" and the hidden-to-visible direction "decoder".]
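As a minimal sketch of equation (1), the energy can be evaluated directly from its three sums. The function name `rbm_energy`, the nested-list layout `W[i][j]` (visible index first, as in the equation) and the toy values are assumptions made for illustration; they are not taken from GPUMLib.

```python
from typing import Sequence


def rbm_energy(
    v: Sequence[int],
    h: Sequence[int],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    c: Sequence[float],
) -> float:
    """Energy of a joint configuration (v, h) of a binary RBM, equation (1).

    W[i][j] couples visible unit i to hidden unit j (assumed layout),
    b holds the hidden biases and c the visible biases.
    """
    interaction = sum(
        W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    visible_term = sum(c[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b[j] * h[j] for j in range(len(h)))
    return -interaction - visible_term - hidden_term


# Toy model with 2 visible and 2 hidden units (illustrative values only).
W = [[0.5, -1.0], [1.5, 0.25]]
b = [0.1, -0.2]
c = [0.3, 0.0]
energy = rbm_energy([1, 0], [1, 1], W, b, c)
```

Lower energy corresponds to a more probable joint configuration, which is why the learning rule later in the deck lowers the energy of the training data.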


Restricted Boltzmann Machines (RBMs)

Given a random training vector $\mathbf{v}$, the state of a given hidden unit $j$ is set to 1 with probability:

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i} v_i W_{ij}\Big) \qquad (2)$$

Similarly:

$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(c_i + \sum_{j} h_j W_{ij}\Big) \qquad (3)$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid squashing function.
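Equations (2) and (3) can be sketched in a few lines. The helper names `p_hidden_given_visible` and `p_visible_given_hidden`, the `W[i][j]` layout and the toy values are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Sigmoid squashing function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def p_hidden_given_visible(
    v: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]
) -> List[float]:
    """Equation (2): p(h_j = 1 | v) for every hidden unit j."""
    return [
        sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]


def p_visible_given_hidden(
    h: Sequence[float], W: Sequence[Sequence[float]], c: Sequence[float]
) -> List[float]:
    """Equation (3): p(v_i = 1 | h) for every visible unit i."""
    return [
        sigmoid(c[i] + sum(h[j] * W[i][j] for j in range(len(h))))
        for i in range(len(c))
    ]


# Toy model with 2 visible and 2 hidden units (illustrative values only).
W = [[0.5, -1.0], [1.5, 0.25]]
b = [0.1, -0.2]
c = [0.3, 0.0]
p_h = p_hidden_given_visible([1, 0], W, b)
p_v = p_visible_given_hidden([1, 1], W, c)
```

Because units within a layer are conditionally independent given the other layer, each list can be computed in one pass, which is the property that makes the model well suited to parallel hardware.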


Training an RBM: Alternating Gibbs Sampling

[Figure: alternating Gibbs sampling chain. Starting from the data, v(0) = x, the hidden and visible layers are sampled in turn: h(0), v(1), h(1), v(2), h(2), ..., v(∞), h(∞). The correlations ⟨v_i h_j⟩_0 are measured at the start of the chain and ⟨v_i h_j⟩_∞ at its end.]

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i=1}^{I} v_i W_{ji}\Big)$$

$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(c_i + \sum_{j=1}^{J} h_j W_{ji}\Big)$$


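Training an RBM: Contrastive Divergence (CD–k)

- Running the Gibbs chain until it reaches equilibrium, as needed for ⟨·⟩_∞, is too slow in practice. To solve this problem, Hinton proposed the Contrastive Divergence algorithm.
- CD–k replaces ⟨·⟩_∞ by ⟨·⟩_k for small values of k.

$$\Delta W_{ji} = \gamma \left( \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k \right) \qquad (4)$$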
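The update in equation (4) can be sketched for a single training vector as follows. This is an illustrative sketch, not the GPUMLib implementation: the function names, the `W[j][i]` layout (hidden index first, matching $W_{ji}$), the use of hidden probabilities rather than sampled states in the two correlation terms, and the single-sample estimate of the averages are all assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Draw a binary state for each unit from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, b) -> List[float]:
    """p(h_j = 1 | v) with W[j][i] linking hidden j and visible i."""
    return [
        sigmoid(b[j] + sum(W[j][i] * v[i] for i in range(len(v))))
        for j in range(len(W))
    ]


def visible_probs(h, W, c) -> List[float]:
    """p(v_i = 1 | h) with the same W[j][i] layout."""
    return [
        sigmoid(c[i] + sum(W[j][i] * h[j] for j in range(len(h))))
        for i in range(len(c))
    ]


def cd_k_weight_update(
    x: Sequence[int],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    c: Sequence[float],
    k: int = 1,
    gamma: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return Delta W (J x I) from equation (4), estimated from one sample x."""
    if rng is None:
        rng = random.Random(0)
    v0 = list(x)
    h0 = hidden_probs(v0, W, b)  # statistics at the start of the chain
    vk, hk = v0, h0
    for _ in range(k):  # k steps of alternating Gibbs sampling
        vk = sample(visible_probs(sample(hk, rng), W, c), rng)
        hk = hidden_probs(vk, W, b)
    return [
        [gamma * (v0[i] * h0[j] - vk[i] * hk[j]) for i in range(len(v0))]
        for j in range(len(W))
    ]


# Toy model: 3 visible units, 2 hidden units (illustrative values only).
W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5]]
b = [0.0, 0.1]
c = [0.1, -0.2, 0.0]
delta_W = cd_k_weight_update([1, 0, 1], W, b, c, k=1, gamma=0.1)
```

In practice the two correlation terms would be averaged over a batch of training vectors, and the biases updated with the analogous differences; the slide gives only the weight rule, so only that rule is shown.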


Deep Belief Networks (DBNs)

[Figure: a DBN built one layer at a time. First a single RBM over x and h1, with p(h1|x) and p(x|h1); then h2 is added, with p(h2|h1) and p(h1|h2); then h3, with p(h3|h2) and p(h2|h3).]
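The upward conditionals in the figure, p(h1|x), p(h2|h1) and p(h3|h2), can be read as the same RBM formula applied once per layer. The sketch below assumes each layer is described by a hypothetical `(W, b)` pair with `W[j][i]` layout, and that activation probabilities (rather than sampled states) are passed to the next layer; both are illustrative choices, not details stated in the slides.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (W, b), assumed layout


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def propagate_up(x: Sequence[float], layers: Sequence[Layer]) -> List[List[float]]:
    """Return [p(h1|x), p(h2|h1), ...] for a stack of RBM layers."""
    activations: List[List[float]] = []
    current = list(x)
    for W, b in layers:
        current = [
            sigmoid(b[j] + sum(W[j][i] * current[i] for i in range(len(current))))
            for j in range(len(W))
        ]
        activations.append(current)
    return activations


# Toy stack: 3 inputs -> 2 hidden units -> 1 hidden unit (illustrative values).
layers = [
    ([[0.2, -0.1, 0.4], [0.0, 0.3, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
h1, h2 = propagate_up([1, 0, 1], layers)
```

When the stack is trained greedily, each RBM is fitted on the activations produced by the layers below it, so a routine like this one supplies the training input for the next layer.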


Deep Belief Networks (DBNs): GPU Implementation Results

[Figure: MNIST average training time per epoch, N = 60,000. Time in seconds on a logarithmic axis (1 to 10000 s, that is, up to 3h46m40s) versus number of hidden units (0 to 900), for a GTX 460 (GPU) and a dual-core i5 (CPU). Speedups annotated on the plot: 38.64×, 41.83×, 42.73×, 43.46× and 46.07×.]


Deep Belief Networks (DBNs): Adaptive Step Size

[Figure: average reconstruction error (RMSE). Three panels, for α = 0.1, α = 0.4 and α = 0.7, each plotting RMSE (reconstruction, 0.10 to 0.45) against epoch (0 to 1000) for the adaptive step size and for fixed step sizes γ = 0.1, γ = 0.4 and γ = 0.7.]


Restricted Boltzmann Machines Receptive Fields


Deep Belief Networks (DBNs)

Demonstration



Conclusions: Future Work

- Big Data problem:
  - Novel ML algorithms
  - Scale up existing algorithms
    - High-performance (GPU) ML implementations
- Size matters:
  - Enhancing GPUMLib algorithms with Big Data in mind


Publications: First author

- 5 Journal Articles
- 15 Conference Articles
- 30+ Citations


Publications: Journal Articles

- Noel Lopes and Bernardete Ribeiro. Towards adaptive learning with improved convergence of deep belief networks on graphics processing units. Pattern Recognition, 2013.
- Noel Lopes and Bernardete Ribeiro. Towards a hybrid NMF-based neural approach for face recognition on GPUs. International Journal of Data Mining, Modelling and Management (IJDMMM), 4(2):138–155, 2012.
- Noel Lopes and Bernardete Ribeiro. Handling missing values via a neural selective input model. Neural Network World, 22(4):357–370, 2012.
- Noel Lopes and Bernardete Ribeiro. GPUMLib: An efficient open-source GPU machine learning library. International Journal of Computer Information Systems and Industrial Management Applications, 3:355–362, 2011.
- Noel Lopes and Bernardete Ribeiro. An evaluation of multiple feed-forward networks on GPUs. International Journal of Neural Systems (IJNS), 21(1):31–47, 2011.


Publications: Proceeding Articles (page 1 of 4)

- Noel Lopes, Bernardete Ribeiro, and João Gonçalves. Restricted Boltzmann machines and deep belief networks on multi-core processors. In The 2012 International Joint Conference on Neural Networks (IJCNN), 2012.
- Noel Lopes and Bernardete Ribeiro. Improving convergence of restricted Boltzmann machines via a learning adaptive step size. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, LNCS 7441, pages 511–518. Springer Berlin / Heidelberg, 2012.
- Noel Lopes, Daniel Correia, Carlos Pereira, Bernardete Ribeiro, and António Dourado. An incremental hypersphere learning framework for protein membership prediction. In 7th International Conference on Hybrid Artificial Intelligent Systems, LNCS 7208, pages 429–439. Springer Berlin / Heidelberg, 2012.
- Noel Lopes and Bernardete Ribeiro. A robust learning model for dealing with missing values in many-core architectures. In 10th International Conference on Adaptive and Natural Computing Algorithms (ICANNGA 2011), Part II, LNCS 6594, pages 108–117. Springer Berlin, 2011.


Publications: Proceeding Articles (page 2 of 4)

- Noel Lopes and Bernardete Ribeiro. Incremental learning for non-stationary patterns. In 17th edition of the Portuguese Conference on Pattern Recognition (RECPAD 2011), 2011.
- Noel Lopes and Bernardete Ribeiro. An incremental class boundary preserving hypersphere classifier. In International Conference on Neural Information Processing (ICONIP 2011), Part II, LNCS 7063, pages 690–699. Springer Berlin Heidelberg, 2011.
- Noel Lopes and Bernardete Ribeiro. A fast optimized semi-supervised non-negative matrix factorization algorithm. In IEEE International Joint Conference on Neural Networks (IJCNN 2011), pages 2495–2500, 2011.
- Noel Lopes, Bernardete Ribeiro, and Ricardo Quintas. GPUMLib: A new library to combine machine learning algorithms with graphics processing units. In IEEE 10th International Conference on Hybrid Intelligent Systems (HIS 2010), pages 229–232, August 2010.


Publications: Proceeding Articles (page 3 of 4)

- Noel Lopes and Bernardete Ribeiro. A strategy for dealing with missing values by using selective activation neurons in a multi-topology framework. In IEEE World Congress on Computational Intelligence (WCCI 2010), 2010.
- Noel Lopes and Bernardete Ribeiro. Stochastic GPU-based multithread implementation of multiple back-propagation. In Second International Conference on Agents and Artificial Intelligence (ICAART 2010), pages 271–276, 2010.
- Noel Lopes and Bernardete Ribeiro. Non-negative matrix factorization implementation using graphic processing units. In 11th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2010), LNCS 6283, pages 275–283. Springer, 2010.
- Noel Lopes and Bernardete Ribeiro. A hybrid face recognition approach using GPUMLib. In 15th Iberoamerican Congress on Pattern Recognition (CIARP 2010), LNCS 6419, pages 96–103. Springer, 2010.


Publications: Proceeding Articles (page 4 of 4)

- Noel Lopes and Bernardete Ribeiro. Fast pattern classification of ventricular arrhythmias using graphics processing units. In 14th Iberoamerican Congress on Pattern Recognition (CIARP 2009), LNCS 5856, pages 603–610. Springer, 2009.
- Noel Lopes and Bernardete Ribeiro. GPU implementation of the multiple back-propagation algorithm. In 10th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2009), LNCS 5788, pages 449–456. Springer, 2009.
- Noel Lopes and Bernardete Ribeiro. MBPGPU: A supervised pattern classifier for graphical processing units. In 15th edition of the Portuguese Conference on Pattern Recognition (RECPAD 2009), 2009.



GPUMLib: over 2000 downloads

[Figure: chart with shares 5.6%, 19.6% and 11.7%.]

