Efficient Implementation of Self-Organizing Map for Sparse Input Data
Josué Melka and Jean-Jacques Mariage


1. Introduction
   - SOM Description
   - Standard vs. Batch Versions
   - Motivation for Sparse SOM
2. Contributions
3. Experiments


Presentation: The SOM Network

- Self-Organizing Map (Kohonen 1982): an artificial neural network trained by unsupervised competitive learning
- Produces a low-dimensional map of the input space
- Many applications: commonly used for data projection, clustering, etc.


The SOM Map

The map: units (= nodes/neurons) on a lattice.
Associated with each node:
- a weight vector
- a position in the map

http://www.lohninger.com


Example: MNIST Handwritten Dataset

[Figures: random sampling of MNIST digits; trained SOM map (12x16 units)]


The SOM Training Algorithm

Within the main loop:

1. Compute the distance between the input and each weight vector:

       d_k(t) = \|x(t) - w_k(t)\|^2    (1)

2. Find the node whose weight is closest to the input (the BMU):

       d_c(t) = \min_k d_k(t)    (2)

3. Update the BMU and its neighbors to be closer to the input.
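
As a minimal sketch of steps 1 and 2 in NumPy (illustrative only, not the authors' implementation; the array names are assumptions):

```python
import numpy as np

def find_bmu(W, x):
    """Steps 1-2: squared distances (1) to every unit, then the BMU index (2).

    W : (M, D) array, one weight vector per map unit
    x : (D,)   input vector
    """
    d = np.sum((W - x) ** 2, axis=1)   # d_k(t) = ||x(t) - w_k(t)||^2
    return int(np.argmin(d))           # c = argmin_k d_k(t)
```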


Standard Algorithm: The Learning Rule

Update the weight vectors at each time step, for a random sample x:

    w_k(t+1) = w_k(t) + \alpha(t) h_{ck}(t) [x(t) - w_k(t)]    (3)

- \alpha(t) is the decreasing learning rate
- h_{ck}(t) is the neighborhood function, for example the Gaussian:

    h_{ck}(t) = \exp\left( - \frac{\|r_k - r_c\|^2}{2\sigma(t)^2} \right)

[Figure: Gaussian neighborhood]
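
A minimal sketch of rule (3) with the Gaussian neighborhood, again in NumPy and purely illustrative; `grid` (the map coordinates r_k) and the function name are assumptions:

```python
import numpy as np

def online_update(W, grid, x, alpha, sigma):
    """One step of the standard rule (3): pull the BMU and its neighbors toward x.

    W     : (M, D) weight vectors, updated in place
    grid  : (M, 2) map coordinates r_k of the units
    alpha : learning rate alpha(t)
    sigma : neighborhood radius sigma(t)
    """
    c = np.argmin(np.sum((W - x) ** 2, axis=1))   # BMU, as in find_bmu above
    # Gaussian neighborhood h_ck(t) = exp(-||r_k - r_c||^2 / (2 sigma^2))
    h = np.exp(-np.sum((grid - grid[c]) ** 2, axis=1) / (2.0 * sigma ** 2))
    W += alpha * h[:, None] * (x - W)             # w_k += alpha * h_ck * (x - w_k)
```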


The Batch Algorithm

The BMUs and their neighbors are updated once at the end of each epoch, with the average of all the samples that trigger them:

    w_k(t_f) = \frac{\sum_{t'=t_0}^{t_f} h_{ck}(t') \, x(t')}{\sum_{t'=t_0}^{t_f} h_{ck}(t')}    (4)
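
A minimal sketch of one epoch of rule (4), assuming the same dense NumPy setup as the previous sketches (names are illustrative):

```python
import numpy as np

def batch_epoch(W, grid, X, sigma):
    """One epoch of the batch rule (4): each unit becomes the neighborhood-weighted
    average of the samples that activate it or its neighbors."""
    M, D = W.shape
    num = np.zeros((M, D))    # numerator:   sum_t' h_ck(t') x(t')
    den = np.zeros(M)         # denominator: sum_t' h_ck(t')
    for x in X:
        c = np.argmin(np.sum((W - x) ** 2, axis=1))   # BMU of this sample
        h = np.exp(-np.sum((grid - grid[c]) ** 2, axis=1) / (2.0 * sigma ** 2))
        num += h[:, None] * x
        den += h
    touched = den > 0
    W[touched] = num[touched] / den[touched][:, None]  # apply (4) once, at epoch end
    return W
```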


Motivation: Reduce the Computing Time

Computing time depends on:

1. the T training iterations
2. the M nodes in the network
3. the D dimensions of the vectors

The issue: large sparse datasets
Many real-world datasets are sparse and high-dimensional, but existing SOM implementations cannot exploit sparseness to save time.


Datasets Examples: Dense vs. Sparse

MNIST dataset (dim: 780, density: 19.22%)
News-20 dataset (dim: 62061, density: 0.13%)


Overcoming the Dimensionality Problem

A popular option: reduce the model space
Use space-reduction techniques that are less sensitive to dimensionality (such as SVD, Random Mapping, etc.).

But is this the only way?
Let f be the fraction of non-zero values, and d = D × f.
Can we reduce the SOM complexity from O(TMD) to O(TMd)?
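
As a rough worked example with the News-20 figures above (D = 62061, density f ≈ 0.13%):

    d = D \times f \approx 62061 \times 0.0013 \approx 81,
    \qquad
    \frac{O(TMD)}{O(TMd)} = \frac{D}{d} \approx 770

so exploiting sparsity could shrink the dimension-dependent work by nearly three orders of magnitude, provided every step of the algorithm can be made to run over the non-zeros only.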


1. Introduction
2. Contributions
   - Sparse Algorithms
   - Parallelism
3. Experiments


Compressed Sparse Rows Format

https://op2.github.io/PyOP2/linear_algebra.html

Sufficient to reduce the computing time?
CSR supports efficient linear algebra operations. But:
- node weights must be dense for training
- not all operations produce sparse output
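
A small illustration with scipy.sparse (assumed tooling for this sketch, not necessarily what the authors use): CSR stores only the non-zeros, and the sparse-dense product needed later for the BMU search is cheap, but the weight matrix itself stays dense.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy dataset in CSR format: only non-zero values are kept, with their column
# indices and the per-row extents (indptr).
X = csr_matrix(np.array([[0., 3., 0., 0., 1.],
                         [2., 0., 0., 0., 0.]]))
print(X.data)      # [3. 1. 2.]
print(X.indices)   # [1 4 0]
print(X.indptr)    # [0 2 3]

W = np.random.rand(6, 5)   # node weights: must stay dense during training

# Sparse-dense product used in the BMU search: cost scales with nnz(x), not D.
dots = X @ W.T             # dense (n_samples, n_units) result
print(dots.shape)          # (2, 6)
```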


Speeding Up the BMU Search

The squared Euclidean distance (1) is equivalent to:

    d_k(t) = \|x(t)\|^2 - 2 (w_k(t) \cdot x(t)) + \|w_k(t)\|^2    (5)

Batch version
This change suffices to make the sparse batch SOM efficient, since \|w_k\|^2 can be computed once per epoch.

Standard version
Writing the learning rule (3) as w_k(t+1) = (1 - \beta(t)) w_k(t) + \beta(t) x(t), with \beta(t) = \alpha(t) h_{ck}(t): by storing \|w_k(t)\|^2, we can compute \|w_k(t+1)\|^2 efficiently:

    \|w_k(t+1)\|^2 = (1 - \beta(t))^2 \|w_k(t)\|^2 + \beta(t)^2 \|x(t)\|^2 + 2 \beta(t) (1 - \beta(t)) (w_k(t) \cdot x(t))    (6)
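
A sketch of how (5) and (6) might be used with a sparse input, caching the squared weight norms; the function names and the (values, indices) representation of x are assumptions:

```python
import numpy as np

def bmu_sparse(W, w_sq, x_data, x_idx):
    """BMU search via decomposition (5), with ||w_k||^2 cached in w_sq.

    W      : (M, D) dense weight matrix
    w_sq   : (M,)   cached squared norms ||w_k||^2
    x_data : non-zero values of the sparse input x
    x_idx  : their column indices
    """
    dots = W[:, x_idx] @ x_data                        # w_k . x, over nnz(x) only
    d = float(x_data @ x_data) - 2.0 * dots + w_sq     # equation (5)
    return int(np.argmin(d)), dots

def updated_norm(w_sq_k, x_sq, dot_k, beta):
    """Equation (6): refresh ||w_k||^2 after the online update, beta = alpha * h_ck."""
    return ((1.0 - beta) ** 2 * w_sq_k
            + beta ** 2 * x_sq
            + 2.0 * beta * (1.0 - beta) * dot_k)
```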


Sparse-Som: Speeding Up the Update Phase

We express the learning rule (3) as (Natarajan 1997):

    w_k(t+1) = (1 - \beta(t)) \left[ w_k(t) + \frac{\beta(t)}{1 - \beta(t)} x(t) \right]    (7)

Don't update entire weight vectors
We keep the scalar coefficient separately, so we update only the components affected by x(t).

Numerical stability
To avoid numerical-stability issues, we use double-precision floating point and rescale the weights when needed.
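
A sketch of the factored update (7): storing each weight vector as scale × vec makes the decay a single scalar multiplication, so only the non-zero components of x are touched. The class name and rescaling threshold are illustrative, not taken from the authors' code:

```python
import numpy as np

class ScaledWeight:
    """Weight vector stored as scale * vec, so rule (7) costs O(nnz(x)) per update."""

    def __init__(self, dim, rng):
        self.scale = 1.0
        self.vec = rng.random(dim)   # double precision, as noted on the slide

    def update(self, x_data, x_idx, beta):
        assert 0.0 < beta < 1.0, "rule (7) requires beta in (0, 1)"
        # w <- (1 - beta) * [w + beta / (1 - beta) * x], applied lazily:
        self.scale *= (1.0 - beta)                        # decay: one scalar operation
        self.vec[x_idx] += (beta / self.scale) * x_data   # touch only nnz(x) entries
        if self.scale < 1e-100:                           # rescale to stay stable
            self.vec *= self.scale
            self.scale = 1.0

    def dense(self):
        return self.scale * self.vec
```

Rescaling keeps `scale` away from underflow; the slide notes that double precision is used for the same reason.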


Parallel Approaches for SOM

- Specialized hardware (historical)
- Massively parallel computing
- Shared-memory multiprocessing

http://www.nersc.gov/users/computational-systems/cori


How to Split the Computation

- Network partitioning divides the neurons
- Data partitioning dispatches the input data


What We Did

Sparse-Som: hard to parallelize
- not adapted to data partitioning
- too much latency with network partitioning

Sparse-BSom: much simpler
- adapted to both data and network partitioning
- less synchronization needed

Another issue specific to sparseness
Memory-access latency, due to the non-linear access pattern to the weight vectors.
Mitigation: improve processor cache locality by accessing the weight vectors in the inner loop.


1. Introduction
2. Contributions
3. Experiments
   - Evaluation
   - Speed benchmark
   - Quality test


The Evaluation

We trained SOM networks on various datasets with the same parameters, running each test 5 times, then measured their performance (speed and quality).

Our speed baseline
Somoclu (Wittek et al. 2017) is a massively parallel batch SOM implementation that uses the classical (dense) algorithm.


Datasets Characteristics

dataset    classes  features  samples (train / test)  density % (train / test)
rcv1          53      47236       15564 / 518571           0.14 /   0.14
news20        20      62061       15933 /   3993           0.13 /   0.13
sector       105      55197        6412 /   3207           0.29 /   0.30
mnist         10        780       60000 /  10000          19.22 /  19.37
usps          10        256        7291 /   2007         100.00 / 100.00
letter        26         16       15000 /   5000         100.00 / 100.00
protein        3        357       17766 /   6621          29.00 /  26.06
dna            3        180        2000 /   1186          25.34 /  25.14
satimage       6         36        4435 /   2000          98.99 /  98.96


Training Parameters

Each network:
- 30 × 40 units grid, rectangular lattice
- t_max = 10 × N_samples (equivalently, K_epochs = 10)
- α(t) = 1 − (t / t_max)
- Gaussian neighborhood, with a radius decreasing linearly from 15 to 0.5


Speed Benchmark

4 datasets used (2 sparse and 2 dense), to test both:
- Serial mode (Sparse-Som vs. Sparse-BSom)
- Parallel mode (Sparse-BSom vs. Somoclu)

Hardware and system specifications:
- Intel Xeon E5-4610: 4 sockets of 6 cores each, clocked at 2.4 GHz, 2 threads per core
- Linux Ubuntu 16.04 (64-bit), GCC 5.4


Results: Serial Performance

[Bar chart: elapsed time in seconds for Sparse-Som vs. Sparse-BSom on usps, mnist, sector, news20 and rcv1.]


Results: Parallel Performance

[Log-scale plots: elapsed time in seconds vs. number of CPU cores (1 to 32), for Somoclu and Sparse-BSom, on sector, news20, mnist and usps.]


Quality Evaluation: Methodology

Metrics used:

1. Average Quantization Error:

       Q = \frac{1}{N} \sum_{i=1}^{N} \|x_i - w_c\|    (8)

2. Topological Error

3. Precision and Recall in classification
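
A sketch of how metric (8) and the topological error could be computed; the topological-error definition used here (fraction of samples whose two closest units are not adjacent on the lattice) is the usual one and is assumed, not stated on the slide:

```python
import numpy as np

def quantization_error(W, X):
    """Average quantization error (8): mean distance from each sample to its BMU."""
    q = 0.0
    for x in X:
        q += np.sqrt(np.sum((W - x) ** 2, axis=1)).min()
    return q / len(X)

def topological_error(W, X, grid):
    """Fraction of samples whose two best-matching units are not map neighbors."""
    errors = 0
    for x in X:
        d = np.sum((W - x) ** 2, axis=1)
        c1, c2 = np.argsort(d)[:2]                    # first and second BMU
        if np.max(np.abs(grid[c1] - grid[c2])) > 1:   # not adjacent (8-neighborhood)
            errors += 1
    return errors / len(X)
```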


Results: Average Quantization Error

dataset     Sparse-Som       Sparse-BSom
rcv1        0.825 ± 0.001    0.816 ± 0.001
news20      0.905 ± 0.000    0.901 ± 0.001
sector      0.814 ± 0.001    0.772 ± 0.003
mnist       4.400 ± 0.001    4.500 ± 0.008
usps        3.333 ± 0.002    3.086 ± 0.006
protein     2.451 ± 0.000    2.450 ± 0.001
dna         4.452 ± 0.006    3.267 ± 0.042
satimage    0.439 ± 0.001    0.377 ± 0.001
letter      0.357 ± 0.001    0.345 ± 0.002


Results: Topological Error

dataset     Sparse-Som       Sparse-BSom
rcv1        0.248 ± 0.007    0.353 ± 0.010
news20      0.456 ± 0.020    0.604 ± 0.014
sector      0.212 ± 0.012    0.514 ± 0.017
mnist       0.369 ± 0.005    0.268 ± 0.003
usps        0.150 ± 0.006    0.281 ± 0.011
protein     0.505 ± 0.006    0.448 ± 0.007
dna         0.099 ± 0.004    0.278 ± 0.023
satimage    0.103 ± 0.005    0.239 ± 0.015
letter      0.160 ± 0.003    0.269 ± 0.008


Results: Prediction Evaluation

(two rows per dataset: training split, then test split)

                    Sparse-Som                  Sparse-BSom
                    precision    recall         precision    recall
rcv1      (train)   79.2 ± 0.5   79.3 ± 0.6     81.3 ± 0.4   82.1 ± 0.3
          (test)    73.7 ± 0.4   70.6 ± 0.5     76.6 ± 0.4   72.6 ± 0.5
news20    (train)   64.2 ± 0.5   62.8 ± 0.5     50.3 ± 0.9   49.6 ± 0.8
          (test)    60.0 ± 1.7   55.4 ± 1.3     47.8 ± 1.2   43.6 ± 1.2
sector    (train)   77.2 ± 0.9   73.2 ± 0.9     58.4 ± 0.5   56.0 ± 1.0
          (test)    73.3 ± 0.8   61.3 ± 1.8     60.9 ± 1.3   44.8 ± 1.0
mnist     (train)   93.5 ± 0.2   93.5 ± 0.2     91.5 ± 0.2   91.5 ± 0.2
          (test)    93.4 ± 0.2   93.4 ± 0.2     91.7 ± 0.2   91.7 ± 0.2
usps      (train)   95.9 ± 0.2   95.9 ± 0.2     95.6 ± 0.2   95.6 ± 0.2
          (test)    91.4 ± 0.3   90.7 ± 0.3     92.4 ± 0.5   91.5 ± 0.4
protein   (train)   56.7 ± 0.2   57.5 ± 0.2     56.7 ± 0.4   57.6 ± 0.3
          (test)    49.8 ± 0.7   51.2 ± 0.6     50.7 ± 0.7   52.1 ± 0.6
dna       (train)   90.9 ± 0.6   90.8 ± 0.5     88.5 ± 0.6   88.5 ± 0.5
          (test)    77.7 ± 1.5   69.6 ± 2.1     81.9 ± 2.9   30.3 ± 1.7
satimage  (train)   92.3 ± 0.4   92.4 ± 0.3     92.5 ± 0.4   92.6 ± 0.4
          (test)    87.6 ± 0.3   85.4 ± 0.4     88.7 ± 0.5   86.3 ± 0.5
letter    (train)   83.8 ± 0.3   83.7 ± 0.3     81.9 ± 0.3   81.7 ± 0.4
          (test)    81.5 ± 0.5   81.1 ± 0.5     80.2 ± 0.3   79.8 ± 0.5


Summary: Main Benefits

Sparse-Som and Sparse-BSom run much faster than their classical "dense" counterparts on sparse data.

Advantages of each version:
- Sparse-Som: the maps seem to have a better organization
- Sparse-BSom: highly parallelizable, more memory-efficient (single precision)


Thank you!
