Parallelization and optimization of the neuromorphic simulation code
Application to the MNIST problem

Raphaël Couturier, Michel Salomon
FEMTO-ST - DISC Department - AND Team
November 2 & 3, 2015 / Besançon
Dynamical Systems and Brain-inspired Information Processing Workshop
Introduction

Background
• Emergence of hardware RC (Reservoir Computing) implementations
  → analogue electronic; optoelectronic; fully optical
  Larger et al. - Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Opt. Express 20, 3241-3249 (2012)
• Matlab simulation code
  • Study of processing conditions
  • Tuning of parameters
  • Pre- and post-processing by computer
Motivation
• Study the concept of Reservoir Computing
• Design a faster simulation code
• Apply it to new problems
Outline
1. Neuromorphic processing
2. Parallelization and optimization
3. Performance on the MNIST problem
4. Conclusion and perspectives
Delay Dynamics as a Reservoir
Spatio-temporal viewpoint of a DDE (Larger et al. - Opt. Express 20:3, 2012)
• δτ → temporal spacing; τD → time delay
• f(x) → nonlinear transformation; h(t) → impulse response
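For context, the equations below give a hedged sketch of such dynamics, written in the low-pass Ikeda-type form associated with the cited optoelectronic implementation; the input-injection term ρ·u(t) and the exact filter are assumptions, and the model actually integrated by the simulation code may differ in its details.

```latex
% Assumed low-pass, Ikeda-type delay dynamics (sketch following Larger et al. 2012)
\[
  \tau\,\dot{x}(t) + x(t) \;=\; f\!\bigl[x(t-\tau_D) + \rho\,u(t)\bigr],
  \qquad f(x) \;=\; \beta \sin^{2}\!\bigl(x + \phi_{0}\bigr)
\]
% Equivalent convolution form, using the impulse response h(t) of the linear filter:
\[
  x(t) \;=\; \int_{-\infty}^{t} h(t-s)\, f\!\bigl[x(s-\tau_D) + \rho\,u(s)\bigr]\,\mathrm{d}s
\]
% The delay interval tau_D is sampled every delta-tau, which defines the "virtual nodes".
```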
Spoken Digits Recognition
Input (pre-processing)
• Lyon ear model transformation of each speech sample
  → 60 samples × 86 frequency channels
• Channel connections to the reservoir (400 neurons)
  → sparse and random (an input-mask sketch follows below)
• Reservoir transient response
  → temporal series recorded for Read-Out processing
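As an illustration of such a sparse, random input mask (a minimal sketch, not the authors' code; the matrix sizes, the 10% connection density and the normalization rule are assumptions), a W^I mapping 86 frequency channels onto 400 neurons could be built with Eigen as follows:

```cpp
#include <Eigen/Dense>
#include <random>

// Build a sparse, random input mask W_I of size reservoirSize x nbChannels.
// Sizes, density and normalization are illustrative placeholders.
Eigen::MatrixXd buildInputMask(int reservoirSize = 400, int nbChannels = 86,
                               double density = 0.1, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> weight(-1.0, 1.0);
    std::bernoulli_distribution keep(density);          // sparse connectivity

    Eigen::MatrixXd W_I = Eigen::MatrixXd::Zero(reservoirSize, nbChannels);
    for (int i = 0; i < reservoirSize; ++i)
        for (int j = 0; j < nbChannels; ++j)
            if (keep(gen))
                W_I(i, j) = weight(gen);                 // random non-zero weight

    // Normalization (one possible choice): scale to unit maximum amplitude.
    double maxAbs = W_I.cwiseAbs().maxCoeff();
    if (maxAbs > 0.0) W_I /= maxAbs;
    return W_I;
}
```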
Spoken Digits Recognition
Output (post-processing)
• Training of the Read-Out
  → optimize the W^R matrix for the digits of the training set
• Regression problem for A × W^R ≈ B

  W^R_opt = (A^T A − λI)^{-1} A^T B

• A = concatenation of the reservoir transient responses for each digit
• B = concatenation of the target matrices
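With the Eigen library used in the C++ rewrite described later, this regularized solution can be computed with a direct symmetric solver instead of forming an explicit inverse (a minimal sketch; the names are illustrative and the sign convention for λ follows the formula above):

```cpp
#include <Eigen/Dense>

// Read-Out training: solve (A^T A - lambda I) W = A^T B for W (sketch only).
Eigen::MatrixXd trainReadOut(const Eigen::MatrixXd& A,
                             const Eigen::MatrixXd& B,
                             double lambda) {
    const Eigen::Index n = A.cols();
    Eigen::MatrixXd G = A.transpose() * A
                      - lambda * Eigen::MatrixXd::Identity(n, n);
    // LDLT factorization of the symmetric matrix G: cheaper and numerically
    // safer than computing G.inverse() explicitly.
    return G.ldlt().solve(A.transpose() * B);
}
```

Solving the linear system directly avoids the explicit (and often badly conditioned) matrix inversion suggested by the closed-form expression.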
Testing
• Dataset of 500 speech samples → 5 female speakers
• 20-fold cross-validation → 20 × 25 test samples
• Performance evaluation → Word Error Rate (WER)
Matlab Simulation Code
Main steps
1. Pre-processing
   • Input data formatting (1D vector; sampling period → δτ)
   • W^I initialization (random; normalization)
2. Concatenation of 1D vectors → batch processing
3. Nonlinear transient computation
   • Numerical integration using a Runge-Kutta C routine (a single step is sketched below)
   • Computation of matrices A and B
4. Read-Out training → Moore-Penrose matrix inversion
5. Testing of the solution (cross-validation)

Computation time
12 min for 306 “neurons” on a quad-core i7 at 1.8 GHz (2013)
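For step 3, a single classical fourth-order Runge-Kutta step for one scalar node of such dynamics might look as follows (a sketch only: the right-hand side, its parameters and the freezing of the delayed drive over the step are assumptions, not the actual C routine):

```cpp
#include <cmath>

// Illustrative right-hand side of a low-pass, Ikeda-type delay equation:
//   tau * dx/dt = -x + beta * sin^2(drive + phi0),
// where `drive` combines the delayed state x(t - tau_D) and the injected input.
double rhs(double x, double drive, double tau, double beta, double phi0) {
    const double s = std::sin(drive + phi0);
    return (-x + beta * s * s) / tau;
}

// One classical RK4 step of size dt; the delayed drive is held constant over
// the step, a common simplification when dt is much smaller than tau_D.
double rk4Step(double x, double drive, double dt,
               double tau, double beta, double phi0) {
    const double k1 = rhs(x,                 drive, tau, beta, phi0);
    const double k2 = rhs(x + 0.5 * dt * k1, drive, tau, beta, phi0);
    const double k3 = rhs(x + 0.5 * dt * k2, drive, tau, beta, phi0);
    const double k4 = rhs(x + dt * k3,       drive, tau, beta, phi0);
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}
```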
Parallelization Scheme
Guidelines
• The reservoir response to each sample is independent of the other samples
  → the computation of matrices A and B can be parallelized
• The different regression tests are also independent

In practice
• Simulation code rewritten in C++
• Eigen C++ library for linear algebra operations
• Inter-process communication → Message Passing Interface (MPI); see the sketch below
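A minimal sketch of this data-parallel scheme with MPI and Eigen is given below; `simulateReservoir`, the matrix sizes and the static block distribution are illustrative placeholders rather than the actual implementation.

```cpp
#include <mpi.h>
#include <Eigen/Dense>

// Row-major storage so that gathered per-rank blocks line up as rows of A.
using RowMajorMatrix =
    Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

// Placeholder for the transient computation of one sample: in the real code this
// integrates the delay dynamics and records the virtual-node states.
Eigen::RowVectorXd simulateReservoir(int sampleIndex, int featureSize) {
    return Eigen::RowVectorXd::Constant(featureSize, static_cast<double>(sampleIndex));
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nbSamples = 500, featureSize = 400;   // illustrative sizes
    const int local = nbSamples / size;             // assumes divisibility for brevity

    // Each rank computes the reservoir responses of its own block of samples.
    RowMajorMatrix Alocal(local, featureSize);
    for (int i = 0; i < local; ++i)
        Alocal.row(i) = simulateReservoir(rank * local + i, featureSize);

    // Gather the blocks on rank 0, which then performs the Read-Out training.
    RowMajorMatrix A;
    if (rank == 0) A.resize(nbSamples, featureSize);
    MPI_Gather(Alocal.data(), local * featureSize, MPI_DOUBLE,
               rank == 0 ? A.data() : nullptr, local * featureSize, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

Each rank simulates its own block of samples; rank 0 then gathers the blocks of A and carries out the Read-Out training.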
Performance on the speech recognition problem
• Similar classification accuracy → same WER
• Reduced computation time
→ we can now study problems whose Matlab computation time would be prohibitive
Finding Optimal Parameters
What parameters can be optimized?
• Currently
  • Pitch of the Read-Out
  • Amplitude parameters → δ, β, φ0
  • Regression parameter → λ
• Next
  • Number of nodes significantly improving the solution (threshold)
  • Input data filter (convolution filter for images)
Potentially, any parameter can be optimized

Optimization heuristics
• Currently → simulated annealing (probabilistic global search controlled by a cooling schedule); a generic loop is sketched below
• Next → other metaheuristics such as evolutionary algorithms
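A generic simulated-annealing loop over such a parameter vector could be organized as follows (a sketch under simple assumptions: Gaussian moves, geometric cooling and a user-supplied cost function standing in for a full reservoir simulation and its error rate):

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Generic simulated annealing over a parameter vector (illustrative sketch).
// `cost` would run the reservoir simulation and return, e.g., the error rate.
std::vector<double> simulatedAnnealing(
        std::function<double(const std::vector<double>&)> cost,
        std::vector<double> params,
        double T = 1.0, double Tmin = 1e-3, double alpha = 0.95,
        double stepSize = 0.1, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> step(0.0, stepSize);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, params.size() - 1);

    std::vector<double> best = params;
    double bestCost = cost(params);
    double currentCost = bestCost;

    while (T > Tmin) {
        // Propose a neighbour by perturbing one randomly chosen parameter.
        std::vector<double> candidate = params;
        candidate[pick(gen)] += step(gen);

        const double c = cost(candidate);
        // Metropolis rule: always accept improvements, accept degradations
        // with probability exp(-(c - currentCost) / T).
        if (c < currentCost || unif(gen) < std::exp(-(c - currentCost) / T)) {
            params = candidate;
            currentCost = c;
            if (c < bestCost) { bestCost = c; best = candidate; }
        }
        T *= alpha;   // geometric cooling schedule
    }
    return best;
}
```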
Application to the MNIST problem
Task: handwritten digit recognition

National Institute of Standards and Technology (NIST) database
• Training dataset → American Census Bureau employees
• Test dataset → American high school students

The Mixed-NIST (MNIST) database is widely used in machine learning
Mixing of both datasets and improved images
• Datasets
  • Training → 60K samples
  • Test → 10K samples
• Grayscale images
  • Normalized to fit into a 20 × 20 pixel bounding box
  • Centered and anti-aliased
Performance of the parallel code
Classification error for 10K images
• 1 reservoir of 2000 neurons → Digit Error Rate (DER): 7.14%
• 1000 reservoirs of 2 neurons → DER: 3.85%

Exploring ways to improve the results
Using the parallel NTC code
• Many small reservoirs and one Read-Out
• Feature extraction using a simple 3 × 3 convolution filter (sketched below)
• Best error without convolution: around 3%
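A plain 3 × 3 convolution over a grayscale image, of the kind that could serve for this feature extraction, is sketched below (illustrative only; the kernel coefficients and the border handling used in the actual NTC code are not specified on the slide):

```cpp
#include <Eigen/Dense>

// "Valid" 3x3 convolution of a grayscale image with a given kernel (sketch only).
// The output has size (rows - 2) x (cols - 2); no padding is applied.
Eigen::MatrixXd convolve3x3(const Eigen::MatrixXd& image,
                            const Eigen::Matrix3d& kernel) {
    const Eigen::Index rows = image.rows() - 2;
    const Eigen::Index cols = image.cols() - 2;
    Eigen::MatrixXd out(rows, cols);
    for (Eigen::Index i = 0; i < rows; ++i)
        for (Eigen::Index j = 0; j < cols; ++j)
            // Element-wise product of the 3x3 patch with the kernel, then sum.
            out(i, j) = (image.block<3, 3>(i, j).array() * kernel.array()).sum();
    return out;
}
```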
Using the Oger toolbox
• Increasing the dataset with transformed images
  → 15 × 15 pixel bounding box and rotated images
• Subsampling of the reservoir response
• Committee of reservoirs (the voting scheme is sketched below)
• Lower errors with the complete reservoir response
  • 1 reservoir of 1200 neurons → 1.42%
  • Committee of 31 reservoirs of 1000 neurons → 1.25%
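The committee of reservoirs can be combined by averaging the class scores produced by the individual Read-Outs before taking the winning class; the sketch below illustrates this voting scheme only and is not the Oger-based implementation.

```cpp
#include <Eigen/Dense>
#include <vector>

// Combine the Read-Out outputs of several reservoirs (one score per class each)
// by averaging them and returning the index of the winning class.
int committeeDecision(const std::vector<Eigen::VectorXd>& memberScores) {
    Eigen::VectorXd sum = Eigen::VectorXd::Zero(memberScores.front().size());
    for (const auto& scores : memberScores)
        sum += scores;                     // accumulate class scores
    Eigen::Index best = 0;
    sum.maxCoeff(&best);                   // winning class = highest average score
    return static_cast<int>(best);
}
```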
Comparison with other approaches
Convolutional Neural Networks (CNNs)
• Feedforward multilayer networks for visual information
• Different types of layers
  • Convolutional layer → feature extraction
  • Pooling layer → variance reduction
• Many parameters to train

Multilayer Reservoir Computing (Jalalvand et al. - CICSyN 2015)
• Stacking of reservoirs → each one “corrects” the previous one
• Same outputs
• Trained one after the other
• 3-layer system
  • 16K neurons per reservoir
  • 528K trainable parameters → 16K nodes × 11 readouts × 3 layers
Comparison with other approaches
Classification errors

Approach                                  Error rate (%)   Reference
LeNet-1 (CNN)                             1.7              LeCun et al. - 1998
A reservoir of 1200 neurons               1.42             Schaetti et al. - 2015
SVM with Gaussian kernel                  1.4
Committee of 31 reservoirs                1.25             Schaetti et al. - 2015
3-layer reservoir                         0.92             Jalalvand et al. - 2015
CNN of 551 neurons                        0.35             Ciresan et al. - 2011
Committee of 7 CNNs (221 neurons each)    0.23             Ciresan et al. - 2012
Remarks
• CNNs give the best results, but have a long training time
• A reservoir of 1000 neurons is trained in 15 minutes
• Automatic feature extraction improves the results
Conclusion and perspectives
Results
• A parallel code allowing fast simulations
• A first evaluation on the MNIST problem

Future works
• Further code improvements → parallel regression
• Use of several reservoirs
  • Committees
  • Correcting the errors of one reservoir with another one
• Other applications
  • Simulation of lung motion
  • Airflow prediction
  • etc.