Parallelization and optimization of the neuromorphic simulation code
Application on the MNIST problem

Raphaël Couturier, Michel Salomon
FEMTO-ST - DISC Department - AND Team
November 2 & 3, 2015 / Besançon
Dynamical Systems and Brain-inspired Information Processing Workshop

Introduction

Background
• Emergence of hardware RC implementations
  → analogue electronic; optoelectronic; fully optical
  Larger et al. - Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Opt. Express 20, 3241-3249 (2012)
• Matlab simulation code
  • Study of the processing conditions
  • Tuning of the parameters
  • Pre- and post-processing by computer

Motivation
• Study the concept of Reservoir Computing
• Design a faster simulation code
• Apply it to new problems

FEMTO-ST Institute

Outline

1. Neuromorphic processing
2. Parallelization and optimization
3. Performances on the MNIST problem
4. Conclusion and perspectives


Delay Dynamics as a Reservoir

Spatio-temporal viewpoint of a DDE (Larger et al. - Opt. Express 20:3 2012)
• δτ → temporal spacing; τ_D → time delay
• f(x) → nonlinear transformation; h(t) → impulse response

Computer simulation with an Ikeda-type NLDDE

  τ dx(t)/dt = −x(t) + β sin²[α x(t − τ_D) + ρ u_in(t − τ_D) + Φ₀]

α → feedback scaling; β → gain; ρ → amplification; Φ₀ → offset

Spoken Digits Recognition

Input (pre-processing)
• Lyon ear model transformation of each speech sample
  → 60 samples × 86 frequency channels
• Connection of the channels to the reservoir (400 neurons)
  → sparse and random

Reservoir transient response
• Temporal series recorded for Read-Out processing

Spoken Digits Recognition

Output (post-processing)
• Training of the Read-Out
  → optimize the W^R matrix for the digits of the training set
• Regression problem for A × W^R ≈ B

  W^R_opt = (A^T A − λI)^(−1) A^T B

• A = concatenation of the reservoir transient responses for each digit
• B = concatenation of the target matrices

Testing
• Dataset of 500 speech samples → 5 female speakers
• 20-fold cross-validation → 20 × 25 test samples
• Performance evaluation → Word Error Rate (WER)

Matlab Simulation Code

Main steps
1. Pre-processing
   • Input data formatting (1D vector; sampling period → δτ)
   • W^I initialization (random; normalized)
2. Concatenation of the 1D vectors → batch processing
3. Nonlinear transient computation
   • Numerical integration using a Runge-Kutta C routine
   • Computation of the matrices A and B
4. Read-Out training → Moore-Penrose matrix inversion
5. Testing of the solution (cross-validation)

Computation time
• 12 min for 306 "neurons" on a quad-core i7 1.8 GHz (2013)

Parallelization Scheme

Guidelines
• The reservoir responses to different input data are mutually independent
  → the computation of the matrices A and B can be parallelized
• The different regression tests are also independent

In practice
• Simulation code rewritten in C++
• Eigen C++ library for the linear algebra operations
• Inter-process communication → Message Passing Interface (MPI)

Performance on the speech recognition problem
• Similar classification accuracy → same WER
• Reduced computation time
→ We can now study problems whose Matlab computation time was prohibitive

Finding Optimal Parameters

What parameters can be optimized?
• Currently
  • Pitch of the Read-Out
  • Amplitude parameters → δ; β; Φ₀
  • Regression parameter → λ
• Next
  • Number of nodes significantly improving the solution (threshold)
  • Input data filter (convolution filter for images)
→ Potentially, any parameter can be optimized

Optimization heuristics
• Currently → simulated annealing (probabilistic global search controlled by a cooling schedule)
• Next → other metaheuristics, such as evolutionary algorithms

Application on the MNIST Problem

Task: handwritten digit recognition
National Institute of Standards and Technology database
• Training dataset → American Census Bureau employees
• Test dataset → American high school students

The Mixed-NIST (MNIST) database, widely used in machine learning, mixes both datasets and improves the images
• Datasets
  • Training → 60K samples
  • Test → 10K samples
• Grayscale images
  • Normalized to fit into a 20 × 20 pixel bounding box
  • Centered and anti-aliased

Performances of the Parallel Code

Classification error for the 10K test images
• 1 reservoir of 2000 neurons → Digit Error Rate (DER): 7.14%
• 1000 reservoirs of 2 neurons → DER: 3.85%

Speedup
[Figure] Speedup vs. number of cores (up to 35 cores) for the "1000 reservoirs of 2 neurons" and "1 reservoir of 2000 neurons" configurations, compared with the ideal linear speedup.

Exploring Ways to Improve the Results

Using the parallel NTC code
• Many small reservoirs and one Read-Out
• Feature extraction using a simple 3 × 3 convolution filter
• Best error without convolution: around 3%

Using the Oger toolbox
• Increasing the dataset with transformed images
  → 15 × 15 pixel bounding box and rotated images
• Subsampling of the reservoir response
• Committee of reservoirs
• Lower errors with the complete reservoir response
  • 1 reservoir of 1200 neurons → 1.42%
  • Committee of 31 reservoirs of 1000 neurons → 1.25%

Comparison with Other Approaches

Convolutional Neural Networks (CNNs)
• Feedforward multilayer networks for visual information
• Different types of layers
  • Convolutional layer → feature extraction
  • Pooling layer → variance reduction
• Many parameters to train

Multilayer Reservoir Computing (Jalalvand et al. - CICSyN 2015)
• Stacking of reservoirs → each one "corrects" the previous one
  • Same outputs
  • Trained one after the other
• 3-layer system
  • 16K neurons per reservoir
  • 528K trainable parameters → 16K nodes × 11 readouts × 3 layers

Comparison with Other Approaches

Classification errors

Approach                                  Error rate (%)  Reference
LeNet-1 (CNN)                             1.7             LeCun et al. - 1998
A reservoir of 1200 neurons               1.42            Schaetti et al. - 2015
SVM with Gaussian kernel                  1.4             -
Committee of 31 reservoirs                1.25            Schaetti et al. - 2015
3-layer reservoir                         0.92            Jalalvand et al. - 2015
CNN of 551 neurons                        0.35            Ciresan et al. - 2011
Committee of 7 CNNs (221 neurons each)    0.23            Ciresan et al. - 2012

Remarks
• CNNs give the best results, but have long training times
• A reservoir of 1000 neurons is trained in 15 minutes
• Automatic feature extraction improves the results

Conclusion and Perspectives

Results
• A parallel code allowing fast simulations
• A first evaluation on the MNIST problem

Future works
• Further code improvements → parallel regression
• Use of several reservoirs
  • Committees
  • Correcting the errors of one reservoir with another
• Other applications
  • Simulation of lung motion
  • Airflow prediction
  • etc.

Thank you for your attention

Questions?
