Efficient Implementation of Self-Organizing Map for Sparse Input Data
Josué Melka and Jean-Jacques Mariage
1. Introduction: SOM Description; Standard vs. Batch Versions; Motivation for Sparse SOM
2. Contributions
3. Experiments
Presentation: The SOM Network
The Self-Organizing Map (Kohonen 1982) is an artificial neural network trained by unsupervised competitive learning; it produces a low-dimensional map of the input space. It has many applications and is commonly used for data projection, clustering, etc.
The SOM Map
The map consists of units (nodes/neurons) arranged on a lattice. Associated with each node: a weight vector and a position in the map.
[Figure: SOM lattice, source: http://www.lohninger.com]
Example: MNIST Handwritten Dataset
[Figures: a random sampling of the input data, and the trained SOM map (12x16 units)]
The SOM Training Algorithm
Within the main loop:
1. Compute the distance between the input and each weight vector:
   $d_k(t) = \lVert x(t) - w_k(t) \rVert^2$   (1)
2. Find the node whose weight is closest to the input (the BMU):
   $d_c(t) = \min_k d_k(t)$   (2)
3. Update the BMU and its neighbors to be closer to the input.
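As an illustration, here is a minimal dense NumPy sketch of steps 1 and 2 (the function and variable names are assumptions, not the authors' code):

```python
import numpy as np

def find_bmu(x, weights):
    """Steps 1-2: squared Euclidean distances (Eq. 1) and BMU index (Eq. 2).
    weights has shape (M, D), x has shape (D,)."""
    dists = np.sum((weights - x) ** 2, axis=1)
    return int(np.argmin(dists)), dists
```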
Standard Algorithm: The Learning Rule
Update the weight vectors at each time step, for a random sample x:
   $w_k(t+1) = w_k(t) + \alpha(t)\, h_{ck}(t)\, [x(t) - w_k(t)]$   (3)
where $\alpha(t)$ is the decreasing learning rate and $h_{ck}(t)$ is the neighborhood function, for example the Gaussian:
   $h_{ck}(t) = \exp\left( -\frac{\lVert r_k - r_c \rVert^2}{2\sigma(t)^2} \right)$
[Figure: Gaussian neighborhood]
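A minimal sketch of the online update (Eq. 3) with a Gaussian neighborhood; grid_pos (the lattice coordinates of the nodes) and the other names are assumptions, not the authors' code:

```python
import numpy as np

def online_update(x, weights, bmu, grid_pos, alpha, sigma):
    """Standard learning rule (Eq. 3): move every node toward x,
    weighted by the Gaussian neighborhood h_ck around the BMU."""
    h = np.exp(-np.sum((grid_pos - grid_pos[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
    weights += alpha * h[:, None] * (x - weights)
```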
The Batch Algorithm
The BMUs and their neighbors are updated once at the end of each epoch, with the average of all the samples that trigger them:
   $w_k(t_f) = \frac{\sum_{t'=t_0}^{t_f} h_{ck}(t')\, x(t')}{\sum_{t'=t_0}^{t_f} h_{ck}(t')}$   (4)
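A minimal sketch of one batch epoch (Eq. 4), accumulating neighborhood-weighted inputs and normalizing at the end; all names and the dense NumPy representation are assumptions:

```python
import numpy as np

def batch_epoch(X, weights, grid_pos, sigma):
    """Batch rule (Eq. 4): each node becomes the neighborhood-weighted
    average of the samples observed during the epoch."""
    num = np.zeros_like(weights)        # numerator:   sum over t of h_ck(t) x(t)
    den = np.zeros(weights.shape[0])    # denominator: sum over t of h_ck(t)
    for x in X:
        c = np.argmin(np.sum((weights - x) ** 2, axis=1))  # BMU of this sample
        h = np.exp(-np.sum((grid_pos - grid_pos[c]) ** 2, axis=1) / (2.0 * sigma ** 2))
        num += h[:, None] * x
        den += h
    return num / den[:, None]
```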
Motivation: Reduce the Computing Time
Computing time depends on:
1. the T training iterations
2. the M nodes in the network
3. the D dimensions of the vectors
The issue: large sparse datasets
Many real-world datasets are sparse and high-dimensional, but existing SOM implementations can't exploit sparseness to save time.
Dataset Examples: Dense vs. Sparse
[Figures: MNIST dataset (dim: 780, density: 19.22%) vs. News-20 dataset (dim: 62061, density: 0.13%)]
Overcoming the Dimensionality Problem
A popular option: reduce the model space, using space-reduction techniques that are less sensitive to dimensionality (such as SVD, Random Mapping, etc.). But is this the only way? Let f be the fraction of non-zero values, and d = D × f. Can we reduce the SOM complexity from O(TMD) to O(TMd)?
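As a rough worked example using the News-20 figures above: with D = 62061 and f ≈ 0.13%, d ≈ 81, so a distance computation that visits only the non-zero components would touch roughly 770 times fewer values.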
1. Introduction
2. Contributions: Sparse Algorithms; Parallelism
3. Experiments
Compressed Sparse Row (CSR) Format
[Figure: CSR layout, source: https://op2.github.io/PyOP2/linear_algebra.html]
Is this sufficient to reduce the computing time? CSR supports efficient linear algebra operations. But:
- node weights must be dense for training
- not all operations produce sparse output
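A small sketch of the CSR layout, using SciPy as an assumed stand-in (the slides do not name a specific library):

```python
import numpy as np
from scipy.sparse import csr_matrix

# CSR stores only the non-zero values of each row, together with their
# column indices and per-row offsets (data / indices / indptr).
X = csr_matrix(np.array([[0., 2., 0., 1.],
                         [0., 0., 0., 0.],
                         [3., 0., 0., 4.]]))
print(X.data)     # [2. 1. 3. 4.]
print(X.indices)  # [1 3 0 3]
print(X.indptr)   # [0 2 2 4]
# The node weights, however, stay dense: the update in Eq. (3) rescales every
# component of w_k, so a weight matrix would not remain sparse anyway.
```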
Speeding Up the BMU Search
The squared Euclidean distance (1) is equivalent to:
   $d_k(t) = \lVert x(t) \rVert^2 - 2\,(w_k(t) \cdot x(t)) + \lVert w_k(t) \rVert^2$   (5)
Batch version: this change suffices to make the sparse Batch SOM efficient, since $\lVert w_k \rVert^2$ can be computed once per epoch.
Standard version: by storing $\lVert w_k(t) \rVert^2$, we can compute $\lVert w_k(t+1) \rVert^2$ efficiently:
   $\lVert w_k(t+1) \rVert^2 = (1-\beta(t))^2 \lVert w_k(t) \rVert^2 + \beta(t)^2 \lVert x(t) \rVert^2 + 2\,\beta(t)(1-\beta(t))\,(w_k(t) \cdot x(t))$   (6)
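A minimal sketch of a BMU search based on Eq. (5) that visits only the non-zero components of x; the names and the precomputed squared norms illustrate one way this could look, not the authors' code:

```python
import numpy as np

def bmu_sparse(x_data, x_indices, x_sqnorm, weights, w_sqnorms):
    """Eq. (5): d_k = ||x||^2 - 2 (w_k . x) + ||w_k||^2.
    Only the non-zeros of x enter the dot product; ||w_k||^2 is precomputed
    once per epoch (Batch) or maintained incrementally via Eq. (6) (Standard)."""
    dots = weights[:, x_indices] @ x_data      # w_k . x restricted to non-zeros of x
    dists = x_sqnorm - 2.0 * dots + w_sqnorms
    return int(np.argmin(dists))
```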
Sparse-Som: Speeding Up the Update Phase
We express the learning rule (3) as (Natarajan 1997):
   $w_k(t+1) = (1-\beta(t)) \left[ w_k(t) + \frac{\beta(t)}{1-\beta(t)}\, x(t) \right]$   (7)
Don't update entire weight vectors: we keep the scalar coefficient separately, so we update only the values affected by x(t).
Numerical stability: to avoid numerical stability issues, we use double-precision floating point and rescale the weights when needed.
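A sketch of how the factored form (7) can avoid touching the whole weight vector: each weight is stored as a scalar scale times a raw vector, so the multiplication by (1 − β) hits only the scalar and x(t) hits only its non-zero positions. This is an illustrative simplification under assumed names, not the authors' implementation:

```python
import numpy as np

class ScaledWeight:
    """Represents one node's weight vector as w = scale * raw."""
    def __init__(self, dim, rng=np.random.default_rng()):
        self.raw = rng.random(dim)
        self.scale = 1.0

    def update(self, x_data, x_indices, beta):
        # Eq. (7): w <- (1 - beta) * [w + beta / (1 - beta) * x]
        self.scale *= (1.0 - beta)                          # touches one scalar only
        self.raw[x_indices] += beta * x_data / self.scale   # touches non-zeros of x only
        if self.scale < 1e-100:                             # rescale for numerical stability
            self.raw *= self.scale
            self.scale = 1.0

    def dense(self):
        return self.scale * self.raw
```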
Parallel Approaches for SOM
- Specialized hardware (historical)
- Massively parallel computing
- Shared-memory multiprocessing
Another issue specific to sparseness: memory-access latency, caused by the non-linear access pattern to the weight vectors. Mitigation: improve processor cache locality by accessing the weight vectors in the inner loop.
1. Introduction
2. Contributions
3. Experiments: Evaluation; Speed benchmark; Quality test
The Evaluation
We trained SOM networks on various datasets with the same parameters, running each test 5 times, then measured their performance (speed and quality). Our speed baseline, Somoclu (Wittek et al. 2017), is a massively parallel batch SOM implementation that uses the classical algorithm.
Each network:
- 30 × 40 unit grid, rectangular lattice
- $t_{max} = 10 \times N_{samples}$ / $K_{epochs} = 10$
- $\alpha(t) = 1 - t/t_{max}$
- Gaussian neighborhood, with a radius decreasing linearly from 15 to 0.5
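For concreteness, the learning-rate and radius schedules above could be written as follows (a sketch under the assumption that t counts iterations from 0 to t_max):

```python
def alpha(t, t_max):
    """Learning rate alpha(t) = 1 - t / t_max."""
    return 1.0 - t / t_max

def radius(t, t_max, r_start=15.0, r_end=0.5):
    """Neighborhood radius decreasing linearly from 15 to 0.5."""
    return r_start + (r_end - r_start) * t / t_max
```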
Speed Benchmark
4 datasets used (2 sparse and 2 dense), to test both:
- Serial mode (Sparse-Som vs. Sparse-BSom)
- Parallel mode (Sparse-BSom vs. Somoclu)
Hardware and system specifications:
- Intel Xeon E5-4610, 4 sockets of 6 cores each, clocked at 2.4 GHz, 2 threads/core
- Linux Ubuntu 16.04 (64-bit), GCC 5.4
Results: Serial Performance
[Bar chart: elapsed time in seconds for Sparse-Som vs. Sparse-BSom on usps, mnist, sector, news20 and rcv1]
Results: Parallel Performance
[Log-scale plot: elapsed time in seconds vs. number of CPU cores (1 to 32), for Somoclu and Sparse-BSom on sector, news20, mnist and usps]
Quality Evaluation: Methodology
Metrics used:
1. Average Quantization Error:
   $Q = \frac{\sum_{i=1}^{N} \lVert x_i - w_c \rVert}{N}$   (8)
2. Topological Error
3. Precision and Recall in classification
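A minimal sketch of the average quantization error (Eq. 8), with the same dense conventions as the earlier sketches (assumed, not the authors' evaluation code):

```python
import numpy as np

def avg_quantization_error(X, weights):
    """Eq. (8): mean distance between each sample and its BMU's weight vector."""
    errors = []
    for x in X:
        dists = np.sqrt(np.sum((weights - x) ** 2, axis=1))
        errors.append(dists.min())
    return float(np.mean(errors))
```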
Results: Average Quantization Error
[Table: average quantization error per dataset: rcv1, news20, sector, mnist, usps, protein, dna, satimage, letter]
Sparse-Som and Sparse-BSom run much faster than their classical "dense" counterparts with sparse data.
Advantages of each version:
- Sparse-Som: the maps seem to have a better organization
- Sparse-BSom: highly parallelizable; more memory efficient (single-precision)