Machine Learning for Adaptive Multi-Core Machines - CISUC

Machine Learning for Adaptive Multi-Core Machines
Noel Lopes
Supervisor: Prof. Dr. Bernardete Ribeiro

University of Coimbra, Portugal

September 17, 2013

Outline
- Introduction
- Objectives
- Contributions
- High-performance Deep Learning
- Conclusions

Noel Lopes

Machine Learning for Adaptive Multi-Core Machines

Machine Learning: Need to Scale Up

[Diagram: the need to scale up, linking the following drivers to high-throughput Machine Learning implementations]
- Large datasets
- High-dimensional inputs
- Cascade predictors
- Adequate model selection
- Inference time constraints
- Algorithm complexity

Machine Learning: Big Data

[Diagram: data sources feeding data streams and persistent repositories]
- Data sources: real data, computer simulation models (artificial data)
- Large volumes of data: data streams and persistent repositories of (accumulated) data
- Challenge: extracting useful and relevant information from a volume of data that vastly exceeds our capacity to analyze it

Machine Learning: Big Data

[Diagram, second build: ML algorithms take the large volumes of data held in data streams and persistent repositories and produce the extracted information]

Scientific Contributions
- Machine Learning
  - Supervised Learning
  - Semi-supervised Learning
  - Unsupervised Learning
- GPUMLib – GPU ML Library

Scientific Contributions: Supervised Learning
- Autonomous Training System (ATS)
- Neural Selective Input Model (NSIM)
- Incremental Hypersphere Classifier (IHC)

[Figure: main network with selective actuation neurons (×) and a space network; inputs x1, x2, x3 and bias; outputs y1, y2]

Scientific Contributions: Supervised Learning
Neural Selective Input Model (NSIM)

[Figure: physical model with inputs x1, x2, x3 and outputs y1, y2, in which x3 passes through a selective input neuron (input x_i, weight w_ij, bias b_j, multiplier κ_i, output x̃_i)]

Conceptual models:
- Model 1, when x3 is missing: κ3 = 0
- Model 2, when the value of x3 is known: κ3 = 1
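The gating role of κ in the selective input neuron can be sketched as follows. This is only an illustration under stated assumptions: the slide shows a weight, a bias, and a multiplier κ, but not the activation function, so the `tanh` used here, the function name `selective_input`, and its parameters are hypothetical and not taken from the slides or from GPUMLib.

```python
import math

def selective_input(x, is_known, w, b):
    """Output of one selective input neuron (illustrative sketch).

    kappa is 1 when the input value is known and 0 when it is missing,
    so a missing value contributes nothing to the next layer.
    The tanh activation is an assumption of this sketch.
    """
    kappa = 1 if is_known else 0
    if kappa == 0:
        return 0.0  # x may be None here; it is never read
    return kappa * math.tanh(w * x + b)

# Model 2: x3 is known, so kappa3 = 1 and the neuron is active.
x3_known = selective_input(0.5, True, w=1.0, b=0.0)
# Model 1: x3 is missing, so kappa3 = 0 and the neuron outputs 0.
x3_missing = selective_input(None, False, w=1.0, b=0.0)
```

With κ3 = 0 the network behaves as if x3 were not connected at all, which is how a single physical model can stand in for both conceptual models.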

Scientific Contributions: Supervised Learning

[Figure not recoverable; surviving labels: x1, x2, xk]

Scientific Contributions: Semi-Supervised Learning
- Semi-supervised Non-Negative Matrix Factorization

[Figure: per-class factorization. The data matrix V is partitioned into class blocks V1, V2, ..., VC; each block Vc is factorized into Wc and Hc with rank rc; the blocks W1, W2, ..., WC are assembled into W, shown alongside Htrain. Dimension labels: D, N, r]

Scientific Contributions: Unsupervised Learning
- Deep Belief Networks (Adaptive Step Size technique)

Scientific Contributions: Case Studies and Benchmarks

Benchmarks:
- Yale face database
- ORL face database
- MNIST hand-written digits
- HHreco multi-stroke images

Case studies:
- Biomedical
- Finance and business
- Bio-informatics

GPUMLib – GPU ML Library: Graphics Processing Unit (GPU)

[Figure: historical single-precision (SP) and double-precision (DP) peak compute rates in GFLOPS (0 to 4000) by date (2002 to 2012), for AMD (GPU), NVIDIA (GPU), Intel (CPU), and Intel Xeon Phi]

Scientific Contributions: GPUMLib – GPU ML Library

[Diagram: GPUMLib architecture]
- Host (CPU) and device (GPU) memory access framework: HostArray, HostMatrix, CudaArray, DeviceArray, DeviceMatrix, ...
- Common host (CPU) classes
- Common CUDA kernels
- C++ classes (algorithms): BackPropagation, Radial Basis Functions, Deep Belief Networks, Restricted Boltzmann Machines, Multiple BackPropagation, Support Vector Machines, Non-Negative Matrix Factorization, ...
- CUDA (GPU) kernels: Multiple BackPropagation, Support Vector Machines, Non-Negative Matrix Factorization, Nonlinear Dimension Reduction, Radial Basis Functions, Restricted Boltzmann Machines, Self Organizing Maps, ...
- Common device (GPU) functions

http://gpumlib.sourceforge.net/

Deep Belief Networks Deep architecture


Restricted Boltzmann Machines (RBMs)

For binary units $h_j \in \{0, 1\}$ and $v_i \in \{0, 1\}$, the energy function of the whole network is:

$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij} v_i h_j - \sum_{i} c_i v_i - \sum_{j} b_j h_j \tag{1}$$

where $W$ is the matrix of weights, and $\mathbf{b}$ and $\mathbf{c}$ are the biases of the hidden and visible layers, respectively.

[Figure: RBM with visible units v1, v2, ..., vi, ..., vI and hidden units h1, h2, h3, ..., hj, ..., hJ, each layer with a bias unit fixed at 1; the connections are labeled encoder and decoder]
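Equation (1) can be evaluated directly for a small network. The following is a minimal sketch in plain Python, not GPUMLib code; the function name `rbm_energy`, the list-of-lists layout `W[i][j]` (visible unit i, hidden unit j), and the numeric values are assumptions chosen for illustration.

```python
def rbm_energy(v, h, W, b, c):
    """Energy of a joint configuration (v, h), following Eq. (1).

    W[i][j] links visible unit i to hidden unit j; b holds the hidden
    biases and c the visible biases.
    """
    interaction = sum(W[i][j] * v[i] * h[j]
                      for i in range(len(v))
                      for j in range(len(h)))
    visible_bias = sum(ci * vi for ci, vi in zip(c, v))
    hidden_bias = sum(bj * hj for bj, hj in zip(b, h))
    return -interaction - visible_bias - hidden_bias

# Illustrative values: 3 visible units, 2 hidden units.
W = [[1.0, -1.0],
     [0.5, 0.0],
     [0.0, 2.0]]
b = [0.1, -0.2]       # hidden biases
c = [0.0, 0.3, -0.1]  # visible biases
energy = rbm_energy([1, 0, 1], [1, 1], W, b, c)
```

Configurations in which active units are joined by large positive weights receive lower energy, which is what makes them more probable under the model.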

Restricted Boltzmann Machines (RBMs)

Given a random training vector $\mathbf{v}$, the state of a given hidden unit $j$ is set to 1 with probability:

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i} v_i W_{ij}\Big) \tag{2}$$

Similarly:

$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(c_i + \sum_{j} h_j W_{ij}\Big) \tag{3}$$

where $\sigma(x)$ is the sigmoid squashing function $\frac{1}{1 + e^{-x}}$.
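The two conditional probabilities in Equations (2) and (3) can be sketched as follows. This is an illustrative plain-Python version under assumptions: the function names and the `W[i][j]` list-of-lists layout (visible unit i, hidden unit j) are hypothetical and do not come from GPUMLib.

```python
import math

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def p_hidden_given_visible(v, W, b):
    # Eq. (2): p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W[i][j])
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def p_visible_given_hidden(h, W, c):
    # Eq. (3): p(v_i = 1 | h) = sigmoid(c_i + sum_j h_j W[i][j])
    return [sigmoid(c[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(c))]

# Illustrative values: 2 visible units, 2 hidden units.
W = [[2.0, -2.0],
     [0.0, 0.0]]
p_h = p_hidden_given_visible([1, 0], W, b=[0.0, 0.0])
p_v = p_visible_given_hidden([0, 0], W, c=[0.0, 0.0])
```

Because units within a layer are not connected to each other, every probability in a layer depends only on the other layer, so a whole layer can be computed in parallel. That independence is what makes the model a natural fit for a GPU.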

Training an RBM: Alternating Gibbs Sampling

[Figure: alternating Gibbs sampling chain. Starting from $\mathbf{v}^{(0)} = \mathbf{x}$, the hidden and visible layers are sampled in turn: $\mathbf{h}^{(0)}$, $\mathbf{v}^{(1)}$, $\mathbf{h}^{(1)}$, $\mathbf{v}^{(2)}$, $\mathbf{h}^{(2)}$, ..., $\mathbf{v}^{(\infty)}$, $\mathbf{h}^{(\infty)}$. The correlations $\langle v_i h_j \rangle_0$ and $\langle v_i h_j \rangle_\infty$ are taken at the start and at the end of the chain.]

Hidden units are sampled given the visible units:

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i=1}^{I} v_i W_{ji}\Big)$$

Visible units are sampled given the hidden units:

$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(c_i + \sum_{j=1}^{J} h_j W_{ji}\Big)$$
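The alternating chain can be sketched in a few lines. This is a minimal illustration under assumptions, not the GPUMLib implementation: the function names are hypothetical, weights are stored as `W[j][i]` (hidden unit j, visible unit i) to match the W_ji indexing on this slide, and a seeded `random.Random` stands in for whatever random number generator a real implementation would use.

```python
import math
import random

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sample_hidden(v, W, b, rng):
    # h_j ~ Bernoulli(sigmoid(b_j + sum_i v_i W[j][i]))
    return [1 if rng.random() < sigmoid(b[j] + sum(vi * wji for vi, wji in zip(v, W[j]))) else 0
            for j in range(len(b))]

def sample_visible(h, W, c, rng):
    # v_i ~ Bernoulli(sigmoid(c_i + sum_j h_j W[j][i]))
    return [1 if rng.random() < sigmoid(c[i] + sum(h[j] * W[j][i] for j in range(len(h)))) else 0
            for i in range(len(c))]

def gibbs_chain(x, W, b, c, steps, rng):
    """Return [(v0, h0), (v1, h1), ...] with v0 = x; `steps` + 1 pairs in total."""
    v = list(x)
    h = sample_hidden(v, W, b, rng)
    chain = [(v, h)]
    for _ in range(steps):
        v = sample_visible(h, W, c, rng)
        h = sample_hidden(v, W, b, rng)
        chain.append((v, h))
    return chain

rng = random.Random(0)
W = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]  # J = 2 hidden units, I = 3 visible units
chain = gibbs_chain([1, 0, 1], W, b=[0.0, 0.0], c=[0.0, 0.0, 0.0], steps=2, rng=rng)
```

Each pair in `chain` corresponds to one column of the figure: `chain[0]` holds v(0) = x and h(0), `chain[1]` holds v(1) and h(1), and so on.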

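Training an RBM: Contrastive Divergence (CD–k)
- Obtaining $\langle \cdot \rangle_\infty$ requires running the Gibbs chain until it reaches equilibrium, which is impractical. To solve this problem, Hinton proposed the Contrastive Divergence algorithm.
- CD–k replaces $\langle \cdot \rangle_\infty$ by $\langle \cdot \rangle_k$ for small values of $k$:

$$\Delta W_{ji} = \gamma \left( \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_k \right) \tag{4}$$

where $\gamma$ is the learning rate (step size).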
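A CD–k weight update for a single training vector might look like the sketch below. Several choices here are assumptions made for illustration rather than details from the slides: the expectations in Equation (4) are estimated from one sampled chain, only the weights are updated (bias updates are omitted), weights are stored as `W[j][i]`, and all function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sample_hidden(v, W, b, rng):
    # h_j ~ Bernoulli(sigmoid(b_j + sum_i v_i W[j][i]))
    return [1 if rng.random() < sigmoid(b[j] + sum(vi * wji for vi, wji in zip(v, W[j]))) else 0
            for j in range(len(b))]

def sample_visible(h, W, c, rng):
    # v_i ~ Bernoulli(sigmoid(c_i + sum_j h_j W[j][i]))
    return [1 if rng.random() < sigmoid(c[i] + sum(h[j] * W[j][i] for j in range(len(h)))) else 0
            for i in range(len(c))]

def cd_k_delta(x, W, b, c, k, gamma, rng):
    """Single-sample estimate of Eq. (4): delta[j][i] = gamma * (v0_i h0_j - vk_i hk_j)."""
    v0 = list(x)
    h0 = sample_hidden(v0, W, b, rng)
    vk, hk = v0, h0
    for _ in range(k):  # k steps of alternating Gibbs sampling
        vk = sample_visible(hk, W, c, rng)
        hk = sample_hidden(vk, W, b, rng)
    return [[gamma * (v0[i] * h0[j] - vk[i] * hk[j]) for i in range(len(v0))]
            for j in range(len(b))]

rng = random.Random(0)
W = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]  # J = 2 hidden units, I = 3 visible units
delta = cd_k_delta([1, 0, 1], W, b=[0.0, 0.0], c=[0.0, 0.0, 0.0], k=1, gamma=0.1, rng=rng)
# Apply the update: W[j][i] += delta[j][i]
W = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
```

When the reconstruction after k steps matches the data, the two correlation terms cancel and the update is zero. In practice the update is usually averaged over a batch of training vectors rather than applied per sample.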

Deep Belief Networks (DBNs)

[Figure: a DBN built one layer at a time from stacked RBMs. First x and h1, with p(h1|x) and p(x|h1); then h2 is added, with p(h2|h1) and p(h1|h2); then h3, with p(h3|h2) and p(h2|h3).]
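The upward pass through the stack, p(h1|x), then p(h2|h1), and so on, can be sketched as below. This is an illustration under assumptions: it propagates activation probabilities rather than sampled binary states, it covers only inference through already-trained layers and not their training, and the function name `propagate_up` and the `W[j][i]` layout are hypothetical.

```python
import math

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def propagate_up(x, layers):
    """Return the activations of every layer, input first.

    `layers` is a list of (W, b) pairs, bottom layer first, where W[j][i]
    links unit i of the layer below to unit j of the layer above.
    """
    activations = [list(x)]
    for W, b in layers:
        below = activations[-1]
        above = [sigmoid(b[j] + sum(u * wji for u, wji in zip(below, W[j])))
                 for j in range(len(b))]
        activations.append(above)
    return activations

# Illustrative untrained stack: x (3 units) -> h1 (2 units) -> h2 (1 unit).
layers = [
    ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]),
    ([[0.0, 0.0]], [0.0]),
]
x, h1, h2 = propagate_up([1, 0, 1], layers)
```

Each (W, b) pair plays the role of one RBM in the stack: the hidden layer of one RBM serves as the visible layer of the next.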

Deep Belief Networks (DBNs): GPU Implementation Results

[Figure: MNIST average training time per epoch (N = 60,000). Time in seconds (log scale, 1 to 10,000) against the number of hidden units (0 to 900), for a GTX 460 (GPU) and a dual-core i5 (CPU). Speedups annotated on the plot: 46.07×, 41.83×, 38.64×, 43.46×, and 42.73×.]

Deep Belief Networks (DBNs): Adaptive Step Size

[Figure: average reconstruction error (RMSE) against epoch (0 to 1000), in three panels for α = 0.1, α = 0.4, and α = 0.7. Each panel compares the adaptive step size with fixed γ = 0.1, γ = 0.4, and γ = 0.7.]

Restricted Boltzmann Machines Receptive Fields


Deep Belief Networks (DBNs)

Demonstration


Conclusions and Future Work
- Big Data problem:
  - Novel ML algorithms
  - Scale up existing algorithms
    - High-performance (GPU) ML implementations
- Size matters:
  - Enhancing GPUMLib algorithms with Big Data in mind

Publications (first author)
- 5 journal articles
- 15 conference articles
- 30+ citations

Publications: Journal Articles
- Noel Lopes and Bernardete Ribeiro. Towards adaptive learning with improved convergence of deep belief networks on graphics processing units. Pattern Recognition, 2013.
- Noel Lopes and Bernardete Ribeiro. Towards a hybrid NMF-based neural approach for face recognition on GPUs. International Journal of Data Mining, Modelling and Management (IJDMMM), 4(2):138–155, 2012.
- Noel Lopes and Bernardete Ribeiro. Handling missing values via a neural selective input model. Neural Network World, 22(4):357–370, 2012.
- Noel Lopes and Bernardete Ribeiro. GPUMLib: An efficient open-source GPU machine learning library. International Journal of Computer Information Systems and Industrial Management Applications, 3:355–362, 2011.
- Noel Lopes and Bernardete Ribeiro. An evaluation of multiple feed-forward networks on GPUs. International Journal of Neural Systems (IJNS), 21(1):31–47, 2011.

Publications: Proceedings Articles (page 1 of 4)
- Noel Lopes, Bernardete Ribeiro, and João Gonçalves. Restricted Boltzmann machines and deep belief networks on multi-core processors. In The 2012 International Joint Conference on Neural Networks (IJCNN), 2012.
- Noel Lopes and Bernardete Ribeiro. Improving convergence of restricted Boltzmann machines via a learning adaptive step size. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, LNCS 7441, pages 511–518. Springer Berlin / Heidelberg, 2012.
- Noel Lopes, Daniel Correia, Carlos Pereira, Bernardete Ribeiro, and António Dourado. An incremental hypersphere learning framework for protein membership prediction. In 7th International Conference on Hybrid Artificial Intelligent Systems, LNCS 7208, pages 429–439. Springer Berlin / Heidelberg, 2012.
- Noel Lopes and Bernardete Ribeiro. A robust learning model for dealing with missing values in many-core architectures. In 10th International Conference on Adaptive and Natural Computing Algorithms (ICANNGA 2011), Part II, LNCS 6594, pages 108–117. Springer Berlin, 2011.

Publications: Proceedings Articles (page 2 of 4)
- Noel Lopes and Bernardete Ribeiro. Incremental learning for non-stationary patterns. In 17th edition of the Portuguese Conference on Pattern Recognition (RECPAD 2011), 2011.
- Noel Lopes and Bernardete Ribeiro. An incremental class boundary preserving hypersphere classifier. In International Conference on Neural Information Processing (ICONIP 2011), Part II, LNCS 7063, pages 690–699. Springer Berlin Heidelberg, 2011.
- Noel Lopes and Bernardete Ribeiro. A fast optimized semi-supervised non-negative matrix factorization algorithm. In IEEE International Joint Conference on Neural Networks (IJCNN 2011), pages 2495–2500, 2011.
- Noel Lopes, Bernardete Ribeiro, and Ricardo Quintas. GPUMLib: A new library to combine machine learning algorithms with graphics processing units. In IEEE 10th International Conference on Hybrid Intelligent Systems (HIS 2010), pages 229–232, August 2010.

Publications: Proceedings Articles (page 3 of 4)
- Noel Lopes and Bernardete Ribeiro. A strategy for dealing with missing values by using selective activation neurons in a multi-topology framework. In IEEE World Congress on Computational Intelligence (WCCI 2010), 2010.
- Noel Lopes and Bernardete Ribeiro. Stochastic GPU-based multithread implementation of multiple back-propagation. In Second International Conference on Agents and Artificial Intelligence (ICAART 2010), pages 271–276, 2010.
- Noel Lopes and Bernardete Ribeiro. Non-negative matrix factorization implementation using graphic processing units. In 11th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2010), LNCS 6283, pages 275–283. Springer, 2010.
- Noel Lopes and Bernardete Ribeiro. A hybrid face recognition approach using GPUMLib. In 15th Iberoamerican Congress on Pattern Recognition (CIARP 2010), LNCS 6419, pages 96–103. Springer, 2010.

Publications: Proceedings Articles (page 4 of 4)
- Noel Lopes and Bernardete Ribeiro. Fast pattern classification of ventricular arrhythmias using graphics processing units. In 14th Iberoamerican Congress on Pattern Recognition (CIARP 2009), LNCS 5856, pages 603–610. Springer, 2009.
- Noel Lopes and Bernardete Ribeiro. GPU implementation of the multiple back-propagation algorithm. In 10th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2009), LNCS 5788, pages 449–456. Springer, 2009.
- Noel Lopes and Bernardete Ribeiro. MBPGPU: A supervised pattern classifier for graphical processing units. In 15th edition of the Portuguese Conference on Pattern Recognition (RECPAD 2009), 2009.


GPUMLib: over 2000 downloads

[Figure not recoverable; surviving labels: 5.6%, 19.6%, 11.7%]
