Measuring Saturation in Neural Networks
Anna Rakitianskaia, Andries Engelbrecht
Computational Intelligence Research Group (CIRG)
Department of Computer Science, University of Pretoria, South Africa
http://cirg.cs.up.ac.za
SSCI 2015
Outline
1. FFNNs
2. Problem of saturation
3. Measuring saturation
4. Proposed saturation measure
5. Empirical Study
6. Conclusions
7. Future Work
Feed Forward Neural Network a.k.a. FFNN
[Diagram: FFNN with input units z_1..z_I, hidden units y_1..y_J, output units o_1..o_K, weight layers v and w, and bias units]
Figure: A real NN and a FFNN with a single hidden layer
Neuron Saturation
Why is saturation bad?

[Plot of activation functions f(n) over n ∈ [-4, 4]: Sigmoid, TanH, LeCun tanH, Elliott]
Figure: Artificial neuron and activation functions
Measuring saturation
Net input signal

The easiest way to measure saturation is to calculate the average of the absolute values of the incoming signals:

\varsigma_h = \frac{\sum_{i=1}^{P} \sum_{j=1}^{H} |n_{ij}|}{PH}

where P is the number of patterns, H is the number of hidden units, and n_{ij} is the net input of hidden unit j for pattern i.

Disadvantages
• Net input signal ς_h is unscaled: only NNs employing the same activation function can be compared
• Net input signal ς_h is unbounded
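A minimal sketch of this average-magnitude measure, assuming the net inputs have been collected into a P×H array (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def net_input_saturation(net_inputs):
    """Average absolute net input over all P patterns and H hidden units.

    net_inputs: array-like of shape (P, H), where entry (i, j) is the
    net input n_ij received by hidden unit j for pattern i.
    """
    net_inputs = np.asarray(net_inputs, dtype=float)
    return float(np.abs(net_inputs).mean())

# Example: two patterns, two hidden units -> (2 + 4 + 1 + 1) / 4 = 2.0
print(net_input_saturation([[2.0, -4.0], [1.0, -1.0]]))
```

Note that the result is unbounded and depends on the input scale, which is exactly the disadvantage the slide points out.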
Measuring saturation
Saturation in the output signal
• Saturation can be approximated by observing the neuron output frequencies
• Complete saturation: frequency of zero in all bins except the leftmost and the rightmost one
• No saturation: frequencies of similar magnitude across all the bins
• Higher frequency in the leftmost and the rightmost bin ⇒ higher saturation

[Histogram of g(net) output frequencies over [-1, 1]]
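The binning step can be sketched with `numpy.histogram`; the sample outputs, bin count, and output range below are assumptions for illustration:

```python
import numpy as np

# Hypothetical hidden-unit outputs in [-1, 1] (e.g. from tanh);
# large net inputs push most outputs towards the bounds
rng = np.random.default_rng(42)
outputs = np.tanh(rng.normal(0.0, 5.0, 10_000))

# Bin the outputs into B = 10 equal-width bins over the output range
freq, edges = np.histogram(outputs, bins=10, range=(-1.0, 1.0))

# Under saturation, the leftmost and rightmost bins dominate
print(freq[0] + freq[-1], freq[1:-1].sum())
```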
Proposed Saturation Measure
Derivation

• The average output signal value for each bin b is:

\bar{g}_b = \begin{cases} \sum_{k=1}^{f_b} g(net)_k / f_b & \text{if } f_b > 0 \\ 0 & \text{otherwise} \end{cases}   (1)

• For any g ∈ [g_L, g_U], \bar{g}_b can be scaled to the [−1, 1] range:

\bar{g}'_b = \frac{2(\bar{g}_b - g_L)}{g_U - g_L} - 1   (2)

• A weighted mean magnitude is then calculated as:

\varphi_B = \frac{\sum_{b=1}^{B} |\bar{g}'_b| f_b}{\sum_{b=1}^{B} f_b}   (3)
Proposed Saturation Measure

\varphi_B = \frac{\sum_{b=1}^{B} |\bar{g}'_b| f_b}{\sum_{b=1}^{B} f_b}

• If \bar{g}' is saturated, then ϕ_B > 0.5
• ϕ_B tends to 1 as the degree of saturation increases, and tends to zero otherwise
• If \bar{g}' is normally distributed, then ϕ_B < 0.5
• If \bar{g}' is uniformly distributed, then ϕ_B ≈ 0.5

[Histograms of g(net) illustrating the ϕ_B > 0.5, ϕ_B = 0.5, and ϕ_B < 0.5 cases]
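The measure of equations (1)–(3) can be sketched end to end; `phi` and the three sample distributions below are illustrative, not the authors' code:

```python
import numpy as np

def phi(outputs, bins=10, g_lo=-1.0, g_hi=1.0):
    """Saturation measure phi_B for outputs of a bounded activation in [g_lo, g_hi]."""
    outputs = np.asarray(outputs, dtype=float)
    edges = np.linspace(g_lo, g_hi, bins + 1)
    # Assign each output to a bin; clip so that g_hi lands in the last bin
    idx = np.clip(np.digitize(outputs, edges) - 1, 0, bins - 1)
    freq = np.bincount(idx, minlength=bins)                        # f_b
    sums = np.bincount(idx, weights=outputs, minlength=bins)
    g_bar = np.where(freq > 0, sums / np.maximum(freq, 1), 0.0)    # eq. (1)
    g_bar_scaled = 2.0 * (g_bar - g_lo) / (g_hi - g_lo) - 1.0      # eq. (2)
    # Empty bins have f_b = 0, so their scaled value drops out of the mean
    return float(np.sum(np.abs(g_bar_scaled) * freq) / np.sum(freq))  # eq. (3)

rng = np.random.default_rng(0)
saturated = np.tanh(rng.normal(0.0, 10.0, 10_000))  # mass piled at -1 and +1
centred = np.tanh(rng.normal(0.0, 0.3, 10_000))     # mass concentrated near 0
uniform = rng.uniform(-1.0, 1.0, 10_000)

print(phi(saturated), phi(uniform), phi(centred))  # high, ~0.5, low
```

For a sigmoid, one would pass `g_lo=0.0, g_hi=1.0`; the [−1, 1] rescaling in eq. (2) is what makes ϕ_B comparable across activation functions with different output ranges.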
Experimental Setup

• 4 benchmarks with known optimal NN architectures
• Iris, Glass Identification, Heart, Diabetes
• Feed-forward NNs with a single hidden layer employing four different activation functions:
  sigmoid, hyperbolic tangent (tanH), modified hyperbolic tangent (LeCun tanH), Elliott

[Plot of the four activation functions f(n) over n ∈ [-4, 4]]
Experimental Setup
Training algorithm

• The Particle Swarm Optimisation (PSO) algorithm was used
• Two neighbourhood topologies were used: global best (GBest) and Von Neumann (VN)
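For context, the two topologies differ only in which particles share information: GBest uses the whole swarm as one neighbourhood, while Von Neumann connects each particle to four grid neighbours. A sketch of the neighbour indexing on a wrap-around grid (the grid layout is an assumption; the slides do not specify it):

```python
def von_neumann_neighbours(index, rows, cols):
    """Indices of the four grid neighbours (up, down, left, right) of a
    particle laid out row-major on a rows x cols torus."""
    r, c = divmod(index, cols)
    return [((r - 1) % rows) * cols + c,   # up
            ((r + 1) % rows) * cols + c,   # down
            r * cols + (c - 1) % cols,     # left
            r * cols + (c + 1) % cols]     # right

# Example: particle 4 (centre of a 3x3 grid) has neighbours 1, 7, 3, 5
print(von_neumann_neighbours(4, 3, 3))
```

The smaller VN neighbourhoods slow information spread through the swarm, which is one plausible reading of why VN saturated less than GBest in the results that follow.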
Iris Data Set

Least saturation — according to ς_h: LeCun TanH, VN; according to ϕ: LeCun TanH, gBest
Most saturation — according to ς_h: Elliott; according to ϕ: Sigmoid
Table: Average EC, ς_h, and ϕ values for the Iris data set, with corresponding standard deviation in parenthesis

| g(net)     | Alg.  | EC              | ς_h              | ϕ5              | ϕ10             | ϕ20             | ϕ30             |
| Sigmoid    | gBest | 0.0422 (0.0315) | 9.8375 (4.1496)  | 0.88 (0.0887)   | 0.8825 (0.0864) | 0.8825 (0.0864) | 0.8825 (0.0864) |
| Sigmoid    | VN    | 0.0322 (0.0321) | 8.2117 (2.3794)  | 0.9065 (0.0472) | 0.9078 (0.0459) | 0.9078 (0.0459) | 0.9078 (0.0459) |
| TanH       | gBest | 0.0411 (0.0358) | 5.5728 (1.4583)  | 0.8945 (0.0595) | 0.8964 (0.0585) | 0.8964 (0.0585) | 0.8964 (0.0585) |
| TanH       | VN    | 0.0289 (0.0324) | 5.1136 (1.5274)  | 0.9016 (0.049)  | 0.9039 (0.0464) | 0.9039 (0.0464) | 0.9039 (0.0464) |
| LeCun TanH | gBest | 0.0467 (0.0416) | 4.0721 (1.797)   | 0.7367 (0.1004) | 0.7414 (0.0968) | 0.7414 (0.0968) | 0.7421 (0.0971) |
| LeCun TanH | VN    | 0.0367 (0.0354) | 3.6347 (1.2032)  | 0.7631 (0.0951) | 0.768 (0.0915)  | 0.768 (0.0915)  | 0.7681 (0.0916) |
| Elliott    | gBest | 0.0444 (0.0343) | 11.4278 (4.3423) | 0.8158 (0.0501) | 0.8174 (0.0491) | 0.8174 (0.0491) | 0.8174 (0.0491) |
| Elliott    | VN    | 0.04 (0.0355)   | 9.4702 (1.9462)  | 0.8138 (0.0337) | 0.8151 (0.0332) | 0.8151 (0.0332) | 0.8151 (0.0332) |
Iris Data Set

[Figure (a): histogram of g(net) output frequencies, Sigmoid, VN PSO]
[Figure (b): histogram of g(net) output frequencies, Elliott, GBest PSO]
[Figure (c): histogram of hidden layer outputs, LeCun TanH, GBest and VN]
[Figure (d): ϕ10 and ς_h profiles over training iterations for each activation function and topology]
Glass Data Set

[Histograms of g(net) output frequencies over [-2, 2] for the Glass data set]
Table: Average EC, ς_h, and ϕ values for the Glass data set

| g(net)     | Alg.  | EC              | ς_h             | ϕ5              | ϕ10             | ϕ20             | ϕ30             |
| Sigmoid    | gBest | 0.438 (0.0822)  | 3.4534 (0.7454) | 0.7205 (0.059)  | 0.729 (0.0568)  | 0.729 (0.0568)  | 0.729 (0.0568)  |
| Sigmoid    | VN    | 0.4884 (0.0728) | 2.8163 (0.4972) | 0.6841 (0.0644) | 0.693 (0.0613)  | 0.693 (0.0613)  | 0.693 (0.0613)  |
| TanH       | gBest | 0.4372 (0.0764) | 2.2161 (0.4478) | 0.741 (0.0543)  | 0.7483 (0.052)  | 0.7483 (0.052)  | 0.7483 (0.052)  |
| TanH       | VN    | 0.455 (0.0648)  | 1.9336 (0.4274) | 0.7009 (0.0728) | 0.71 (0.0694)   | 0.71 (0.0694)   | 0.71 (0.0694)   |
| LeCun TanH | gBest | 0.4744 (0.0657) | 1.3895 (0.2983) | 0.4901 (0.0548) | 0.508 (0.0507)  | 0.508 (0.0507)  | 0.5082 (0.0507) |
| LeCun TanH | VN    | 0.4891 (0.0653) | 1.3452 (0.351)  | 0.4769 (0.0581) | 0.4968 (0.0538) | 0.4968 (0.0538) | 0.497 (0.0539)  |
| Elliott    | gBest | 0.4357 (0.078)  | 3.8907 (0.8526) | 0.656 (0.0454)  | 0.6614 (0.0436) | 0.6614 (0.0436) | 0.6614 (0.0436) |
| Elliott    | VN    | 0.4535 (0.0709) | 3.3354 (0.4817) | 0.6357 (0.04)   | 0.6417 (0.0381) | 0.6417 (0.0381) | 0.6417 (0.0381) |
Overall Ranks

Table: Average Algorithm Ranks: ϕ10

| Algorithm | g(net)     | Iris | Glass | Heart | Diabetes | Average Rank |
| GBest PSO | Sigmoid    | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | TanH       | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | LeCun TanH | 1.5  | 1.5   | 3.5   | 3.5      | 2.5          |
|           | Elliott    | 3.5  | 5.5   | 2     | 3.5      | 3.625        |
| VN PSO    | Sigmoid    | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | TanH       | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | LeCun TanH | 1.5  | 1.5   | 3.5   | 1.5      | 2            |
|           | Elliott    | 3.5  | 5.5   | 1     | 1.5      | 2.875        |

Overall average rank: GBest PSO 4.65625; VN PSO 4.34375

• LeCun TanH: least saturation
• VN PSO: saturated less than GBest PSO
Conclusions

• A simple bounded single-valued saturation measure for NNs, based on activation function outputs, was proposed
• Applicable to all bounded activation functions
• Independent of the activation function output range
• Allows direct statistical comparisons between NNs employing different activation functions
• LeCun TanH saturated less than the other activation functions considered
• VN PSO neighbourhood saturated less than GBest PSO neighbourhood
Future Work
• Explore the relationship between saturation and training algorithm performance
• Use the saturation measure as a NN learning guide
• Explore means of controlling saturation
Thank You
Questions / Comments?