Parallelization and optimization of the neuromorphic simulation code
Application to the MNIST problem

Raphaël Couturier, Michel Salomon
FEMTO-ST - DISC Department - AND Team
November 2 & 3, 2015 / Besançon
Dynamical Systems and Brain-inspired Information Processing Workshop
Introduction

Background
• Emergence of hardware RC (Reservoir Computing) implementations
  → analogue electronic; optoelectronic; fully optical
  Larger et al. - Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Opt. Express 20, 3241-3249 (2012)
• Matlab simulation code
  • Study of processing conditions
  • Tuning of parameters
  • Pre- and post-processing by computer
Motivation
• Study the concept of Reservoir Computing
• Design a faster simulation code
• Apply it to new problems
Outline
1. Neuromorphic processing
2. Parallelization and optimization
3. Performance on the MNIST problem
4. Conclusion and perspectives
Delay Dynamics as a Reservoir
Spatio-temporal viewpoint of a DDE (Larger et al. - Opt. Express 20:3, 2012)
• δτ → temporal spacing; τD → time delay
• f(x) → nonlinear transformation; h(t) → impulse response
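For context, the equations below give a hedged sketch of such dynamics, written in the low-pass Ikeda-type form associated with the cited optoelectronic implementation; the input-injection term ρ·u(t) and the exact filter are assumptions, and the model actually integrated by the simulation code may differ in its details.

```latex
% Assumed low-pass, Ikeda-type delay dynamics (sketch following Larger et al. 2012)
\[
  \tau\,\dot{x}(t) + x(t) \;=\; f\!\bigl[x(t-\tau_D) + \rho\,u(t)\bigr],
  \qquad f(x) \;=\; \beta \sin^{2}\!\bigl(x + \phi_{0}\bigr)
\]
% Equivalent convolution form, using the impulse response h(t) of the linear filter:
\[
  x(t) \;=\; \int_{-\infty}^{t} h(t-s)\, f\!\bigl[x(s-\tau_D) + \rho\,u(s)\bigr]\,\mathrm{d}s
\]
% The delay interval tau_D is sampled every delta-tau, which defines the "virtual nodes".
```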
Spoken Digits Recognition
Input (pre-processing)
• Lyon ear model transformation of each speech sample
  → 60 samples × 86 frequency channels
• Channel connections to the reservoir (400 neurons)
  → sparse and random (an input-mask sketch follows below)
• Reservoir transient response
  → temporal series recorded for Read-Out processing
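As an illustration of such a sparse, random input mask (a minimal sketch, not the authors' code; the matrix sizes, the 10% connection density and the normalization rule are assumptions), a W^I mapping 86 frequency channels onto 400 neurons could be built with Eigen as follows:

```cpp
#include <Eigen/Dense>
#include <random>

// Build a sparse, random input mask W_I of size reservoirSize x nbChannels.
// Sizes, density and normalization are illustrative placeholders.
Eigen::MatrixXd buildInputMask(int reservoirSize = 400, int nbChannels = 86,
                               double density = 0.1, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> weight(-1.0, 1.0);
    std::bernoulli_distribution keep(density);          // sparse connectivity

    Eigen::MatrixXd W_I = Eigen::MatrixXd::Zero(reservoirSize, nbChannels);
    for (int i = 0; i < reservoirSize; ++i)
        for (int j = 0; j < nbChannels; ++j)
            if (keep(gen))
                W_I(i, j) = weight(gen);                 // random non-zero weight

    // Normalization (one possible choice): scale to unit maximum amplitude.
    double maxAbs = W_I.cwiseAbs().maxCoeff();
    if (maxAbs > 0.0) W_I /= maxAbs;
    return W_I;
}
```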
Spoken Digits Recognition
Output (post-processing)
• Training of the Read-Out
  → optimize the W^R matrix for the digits of the training set
• Regression problem for A × W^R ≈ B

  W^R_opt = (A^T A − λI)^{-1} A^T B

• A = concatenation of the reservoir transient responses for each digit
• B = concatenation of the target matrices
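With the Eigen library used in the C++ rewrite described later, this regularized solution can be computed with a direct symmetric solver instead of forming an explicit inverse (a minimal sketch; the names are illustrative and the sign convention for λ follows the formula above):

```cpp
#include <Eigen/Dense>

// Read-Out training: solve (A^T A - lambda I) W = A^T B for W (sketch only).
Eigen::MatrixXd trainReadOut(const Eigen::MatrixXd& A,
                             const Eigen::MatrixXd& B,
                             double lambda) {
    const Eigen::Index n = A.cols();
    Eigen::MatrixXd G = A.transpose() * A
                      - lambda * Eigen::MatrixXd::Identity(n, n);
    // LDLT factorization of the symmetric matrix G: cheaper and numerically
    // safer than computing G.inverse() explicitly.
    return G.ldlt().solve(A.transpose() * B);
}
```

Solving the linear system directly avoids the explicit (and often badly conditioned) matrix inversion suggested by the closed-form expression.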
Testing
• Dataset of 500 speech samples → 5 female speakers
• 20-fold cross-validation → 20 × 25 test samples
• Performance evaluation → Word Error Rate (WER)
Matlab Simulation Code
Main steps
1. Pre-processing
   • Input data formatting (1D vector; sampling period → δτ)
   • W^I initialization (random; normalization)
2. Concatenation of 1D vectors → batch processing
3. Nonlinear transient computation
   • Numerical integration using a Runge-Kutta C routine (a single step is sketched below)
   • Computation of matrices A and B
4. Read-Out training → Moore-Penrose matrix inversion
5. Testing of the solution (cross-validation)

Computation time
12 min for 306 “neurons” on a quad-core i7 at 1.8 GHz (2013)
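For step 3, a single classical fourth-order Runge-Kutta step for one scalar node of such dynamics might look as follows (a sketch only: the right-hand side, its parameters and the freezing of the delayed drive over the step are assumptions, not the actual C routine):

```cpp
#include <cmath>

// Illustrative right-hand side of a low-pass, Ikeda-type delay equation:
//   tau * dx/dt = -x + beta * sin^2(drive + phi0),
// where `drive` combines the delayed state x(t - tau_D) and the injected input.
double rhs(double x, double drive, double tau, double beta, double phi0) {
    const double s = std::sin(drive + phi0);
    return (-x + beta * s * s) / tau;
}

// One classical RK4 step of size dt; the delayed drive is held constant over
// the step, a common simplification when dt is much smaller than tau_D.
double rk4Step(double x, double drive, double dt,
               double tau, double beta, double phi0) {
    const double k1 = rhs(x,                 drive, tau, beta, phi0);
    const double k2 = rhs(x + 0.5 * dt * k1, drive, tau, beta, phi0);
    const double k3 = rhs(x + 0.5 * dt * k2, drive, tau, beta, phi0);
    const double k4 = rhs(x + dt * k3,       drive, tau, beta, phi0);
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}
```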
Parallelization Scheme
Guidelines
• The reservoir response to each sample is independent of the other samples
  → the computation of matrices A and B can be parallelized
• The different regression tests are also independent

In practice
• Simulation code rewritten in C++
• Eigen C++ library for linear algebra operations
• Inter-process communication → Message Passing Interface (MPI); see the sketch below
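A minimal sketch of this data-parallel scheme with MPI and Eigen is given below; `simulateReservoir`, the matrix sizes and the static block distribution are illustrative placeholders rather than the actual implementation.

```cpp
#include <mpi.h>
#include <Eigen/Dense>

// Row-major storage so that gathered per-rank blocks line up as rows of A.
using RowMajorMatrix =
    Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

// Placeholder for the transient computation of one sample: in the real code this
// integrates the delay dynamics and records the virtual-node states.
Eigen::RowVectorXd simulateReservoir(int sampleIndex, int featureSize) {
    return Eigen::RowVectorXd::Constant(featureSize, static_cast<double>(sampleIndex));
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nbSamples = 500, featureSize = 400;   // illustrative sizes
    const int local = nbSamples / size;             // assumes divisibility for brevity

    // Each rank computes the reservoir responses of its own block of samples.
    RowMajorMatrix Alocal(local, featureSize);
    for (int i = 0; i < local; ++i)
        Alocal.row(i) = simulateReservoir(rank * local + i, featureSize);

    // Gather the blocks on rank 0, which then performs the Read-Out training.
    RowMajorMatrix A;
    if (rank == 0) A.resize(nbSamples, featureSize);
    MPI_Gather(Alocal.data(), local * featureSize, MPI_DOUBLE,
               rank == 0 ? A.data() : nullptr, local * featureSize, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

Each rank simulates its own block of samples; rank 0 then gathers the blocks of A and carries out the Read-Out training.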
Performance on the speech recognition problem
• Similar classification accuracy → same WER
• Reduced computation time
→ we can now study problems whose Matlab computation time would be prohibitive
Finding Optimal Parameters
What parameters can be optimized?
• Currently
  • Pitch of the Read-Out
  • Amplitude parameters → δ, β, φ0
  • Regression parameter → λ
• Next
  • Number of nodes significantly improving the solution (threshold)
  • Input data filter (convolution filter for images)
Potentially, any parameter can be optimized

Optimization heuristics
• Currently → simulated annealing (probabilistic global search controlled by a cooling schedule); a generic loop is sketched below
• Next → other metaheuristics such as evolutionary algorithms
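A generic simulated-annealing loop over such a parameter vector could be organized as follows (a sketch under simple assumptions: Gaussian moves, geometric cooling and a user-supplied cost function standing in for a full reservoir simulation and its error rate):

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Generic simulated annealing over a parameter vector (illustrative sketch).
// `cost` would run the reservoir simulation and return, e.g., the error rate.
std::vector<double> simulatedAnnealing(
        std::function<double(const std::vector<double>&)> cost,
        std::vector<double> params,
        double T = 1.0, double Tmin = 1e-3, double alpha = 0.95,
        double stepSize = 0.1, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> step(0.0, stepSize);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, params.size() - 1);

    std::vector<double> best = params;
    double bestCost = cost(params);
    double currentCost = bestCost;

    while (T > Tmin) {
        // Propose a neighbour by perturbing one randomly chosen parameter.
        std::vector<double> candidate = params;
        candidate[pick(gen)] += step(gen);

        const double c = cost(candidate);
        // Metropolis rule: always accept improvements, accept degradations
        // with probability exp(-(c - currentCost) / T).
        if (c < currentCost || unif(gen) < std::exp(-(c - currentCost) / T)) {
            params = candidate;
            currentCost = c;
            if (c < bestCost) { bestCost = c; best = candidate; }
        }
        T *= alpha;   // geometric cooling schedule
    }
    return best;
}
```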
Application to the MNIST problem
Task: handwritten digit recognition

National Institute of Standards and Technology (NIST) database
• Training dataset → American Census Bureau employees
• Test dataset → American high school students

The Mixed-NIST (MNIST) database is widely used in machine learning
Mixing of both datasets and improved images
• Datasets
  • Training → 60K samples
  • Test → 10K samples
• Grayscale images
  • Normalized to fit into a 20 × 20 pixel bounding box
  • Centered and anti-aliased
Performance of the parallel code
Classification error for 10K images
• 1 reservoir of 2000 neurons → Digit Error Rate (DER): 7.14%
• 1000 reservoirs of 2 neurons → DER: 3.85%

Exploring ways to improve the results
Using the parallel NTC code
• Many small reservoirs and one Read-Out
• Feature extraction using a simple 3 × 3 convolution filter (sketched below)
• Best error without convolution: around 3%
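A plain 3 × 3 convolution over a grayscale image, of the kind that could serve for this feature extraction, is sketched below (illustrative only; the kernel coefficients and the border handling used in the actual NTC code are not specified on the slide):

```cpp
#include <Eigen/Dense>

// "Valid" 3x3 convolution of a grayscale image with a given kernel (sketch only).
// The output has size (rows - 2) x (cols - 2); no padding is applied.
Eigen::MatrixXd convolve3x3(const Eigen::MatrixXd& image,
                            const Eigen::Matrix3d& kernel) {
    const Eigen::Index rows = image.rows() - 2;
    const Eigen::Index cols = image.cols() - 2;
    Eigen::MatrixXd out(rows, cols);
    for (Eigen::Index i = 0; i < rows; ++i)
        for (Eigen::Index j = 0; j < cols; ++j)
            // Element-wise product of the 3x3 patch with the kernel, then sum.
            out(i, j) = (image.block<3, 3>(i, j).array() * kernel.array()).sum();
    return out;
}
```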
Using the Oger toolbox
• Increasing the dataset with transformed images
  → 15 × 15 pixel bounding box and rotated images
• Subsampling of the reservoir response
• Committee of reservoirs (the voting scheme is sketched below)
• Lower errors with the complete reservoir response
  • 1 reservoir of 1200 neurons → 1.42%
  • Committee of 31 reservoirs of 1000 neurons → 1.25%
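The committee of reservoirs can be combined by averaging the class scores produced by the individual Read-Outs before taking the winning class; the sketch below illustrates this voting scheme only and is not the Oger-based implementation.

```cpp
#include <Eigen/Dense>
#include <vector>

// Combine the Read-Out outputs of several reservoirs (one score per class each)
// by averaging them and returning the index of the winning class.
int committeeDecision(const std::vector<Eigen::VectorXd>& memberScores) {
    Eigen::VectorXd sum = Eigen::VectorXd::Zero(memberScores.front().size());
    for (const auto& scores : memberScores)
        sum += scores;                     // accumulate class scores
    Eigen::Index best = 0;
    sum.maxCoeff(&best);                   // winning class = highest average score
    return static_cast<int>(best);
}
```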
Comparison with other approaches
Convolutional Neural Networks (CNNs)
• Feedforward multilayer networks for visual information
• Different types of layers
  • Convolutional layer → feature extraction
  • Pooling layer → variance reduction
• Many parameters to train

Multilayer Reservoir Computing (Jalalvand et al. - CICSyN 2015)
• Stacking of reservoirs → each one “corrects” the previous one
• Same outputs
• Trained one after the other
• 3-layer system
  • 16K neurons per reservoir
  • 528K trainable parameters → 16K nodes × 11 readouts × 3 layers
Comparison with other approaches
Classification errors

Approach                                  Error rate (%)   Reference
LeNet-1 (CNN)                             1.7              LeCun et al. - 1998
A reservoir of 1200 neurons               1.42             Schaetti et al. - 2015
SVM with Gaussian kernel                  1.4
Committee of 31 reservoirs                1.25             Schaetti et al. - 2015
3-layer reservoir                         0.92             Jalalvand et al. - 2015
CNN of 551 neurons                        0.35             Ciresan et al. - 2011
Committee of 7 CNNs (221 neurons each)    0.23             Ciresan et al. - 2012
Remarks
• CNNs give the best results, but have a long training time
• A reservoir of 1000 neurons is trained in 15 minutes
• Automatic feature extraction improves the results
Conclusion and perspectives
Results
• A parallel code allowing fast simulations
• A first evaluation on the MNIST problem

Future works
• Further code improvements → parallel regression
• Use of several reservoirs
  • Committees
  • Correcting the errors of one reservoir with another one
• Other applications
  • Simulation of lung motion
  • Airflow prediction
  • etc.