Measuring Saturation in Neural Networks
Anna Rakitianskaia, Andries Engelbrecht
Computational Intelligence Research Group (CIRG)
Department of Computer Science, University of Pretoria, South Africa
http://cirg.cs.up.ac.za
SSCI 2015
Outline
1. FFNNs
2. Problem of saturation
3. Measuring saturation
4. Proposed saturation measure
5. Empirical Study
6. Conclusions
7. Future Work
Feed Forward Neural Network a.k.a. FFNN
[Diagram: FFNN with input units z_1..z_I, hidden units y_1..y_J, output units o_1..o_K, weight layers v and w, and bias units]
Figure: A real NN and a FFNN with a single hidden layer
Neuron Saturation
Why is saturation bad?

[Plot of activation functions f(n) over n ∈ [-4, 4]: Sigmoid, TanH, LeCun tanH, Elliott]
Figure: Artificial neuron and activation functions
Measuring saturation
Net input signal

The easiest way to measure saturation is to calculate the average of the absolute values of the incoming signals:

\varsigma_h = \frac{\sum_{i=1}^{P} \sum_{j=1}^{H} |n_{ij}|}{PH}

where P is the number of patterns, H is the number of hidden units, and n_{ij} is the net input of hidden unit j for pattern i.

Disadvantages
• Net input signal ς_h is unscaled: only NNs employing the same activation function can be compared
• Net input signal ς_h is unbounded
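A minimal sketch of this average-magnitude measure, assuming the net inputs have been collected into a P×H array (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def net_input_saturation(net_inputs):
    """Average absolute net input over all P patterns and H hidden units.

    net_inputs: array-like of shape (P, H), where entry (i, j) is the
    net input n_ij received by hidden unit j for pattern i.
    """
    net_inputs = np.asarray(net_inputs, dtype=float)
    return float(np.abs(net_inputs).mean())

# Example: two patterns, two hidden units -> (2 + 4 + 1 + 1) / 4 = 2.0
print(net_input_saturation([[2.0, -4.0], [1.0, -1.0]]))
```

Note that the result is unbounded and depends on the input scale, which is exactly the disadvantage the slide points out.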
Measuring saturation
Saturation in the output signal
• Saturation can be approximated by observing the neuron output frequencies
• Complete saturation: frequency of zero in all bins except the leftmost and the rightmost one
• No saturation: frequencies of similar magnitude across all the bins
• Higher frequency in the leftmost and the rightmost bin ⇒ higher saturation

[Histogram of g(net) output frequencies over [-1, 1]]
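The binning step can be sketched with `numpy.histogram`; the sample outputs, bin count, and output range below are assumptions for illustration:

```python
import numpy as np

# Hypothetical hidden-unit outputs in [-1, 1] (e.g. from tanh);
# large net inputs push most outputs towards the bounds
rng = np.random.default_rng(42)
outputs = np.tanh(rng.normal(0.0, 5.0, 10_000))

# Bin the outputs into B = 10 equal-width bins over the output range
freq, edges = np.histogram(outputs, bins=10, range=(-1.0, 1.0))

# Under saturation, the leftmost and rightmost bins dominate
print(freq[0] + freq[-1], freq[1:-1].sum())
```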
Proposed Saturation Measure
Derivation

• The average output signal value for each bin b is:

\bar{g}_b = \begin{cases} \sum_{k=1}^{f_b} g(net)_k / f_b & \text{if } f_b > 0 \\ 0 & \text{otherwise} \end{cases}   (1)

• For any g ∈ [g_L, g_U], \bar{g}_b can be scaled to the [−1, 1] range:

\bar{g}'_b = \frac{2(\bar{g}_b - g_L)}{g_U - g_L} - 1   (2)

• A weighted mean magnitude is then calculated as:

\varphi_B = \frac{\sum_{b=1}^{B} |\bar{g}'_b| f_b}{\sum_{b=1}^{B} f_b}   (3)
Proposed Saturation Measure

\varphi_B = \frac{\sum_{b=1}^{B} |\bar{g}'_b| f_b}{\sum_{b=1}^{B} f_b}

• If \bar{g}' is saturated, then ϕ_B > 0.5
• ϕ_B tends to 1 as the degree of saturation increases, and tends to zero otherwise
• If \bar{g}' is normally distributed, then ϕ_B < 0.5
• If \bar{g}' is uniformly distributed, then ϕ_B ≈ 0.5

[Histograms of g(net) illustrating the ϕ_B > 0.5, ϕ_B = 0.5, and ϕ_B < 0.5 cases]
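The measure of equations (1)–(3) can be sketched end to end; `phi` and the three sample distributions below are illustrative, not the authors' code:

```python
import numpy as np

def phi(outputs, bins=10, g_lo=-1.0, g_hi=1.0):
    """Saturation measure phi_B for outputs of a bounded activation in [g_lo, g_hi]."""
    outputs = np.asarray(outputs, dtype=float)
    edges = np.linspace(g_lo, g_hi, bins + 1)
    # Assign each output to a bin; clip so that g_hi lands in the last bin
    idx = np.clip(np.digitize(outputs, edges) - 1, 0, bins - 1)
    freq = np.bincount(idx, minlength=bins)                        # f_b
    sums = np.bincount(idx, weights=outputs, minlength=bins)
    g_bar = np.where(freq > 0, sums / np.maximum(freq, 1), 0.0)    # eq. (1)
    g_bar_scaled = 2.0 * (g_bar - g_lo) / (g_hi - g_lo) - 1.0      # eq. (2)
    # Empty bins have f_b = 0, so their scaled value drops out of the mean
    return float(np.sum(np.abs(g_bar_scaled) * freq) / np.sum(freq))  # eq. (3)

rng = np.random.default_rng(0)
saturated = np.tanh(rng.normal(0.0, 10.0, 10_000))  # mass piled at -1 and +1
centred = np.tanh(rng.normal(0.0, 0.3, 10_000))     # mass concentrated near 0
uniform = rng.uniform(-1.0, 1.0, 10_000)

print(phi(saturated), phi(uniform), phi(centred))  # high, ~0.5, low
```

For a sigmoid, one would pass `g_lo=0.0, g_hi=1.0`; the [−1, 1] rescaling in eq. (2) is what makes ϕ_B comparable across activation functions with different output ranges.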
Experimental Setup

• 4 benchmarks with known optimal NN architectures
• Iris, Glass Identification, Heart, Diabetes
• Feed-forward NNs with a single hidden layer employing four different activation functions:
  sigmoid, hyperbolic tangent (tanH), modified hyperbolic tangent (LeCun tanH), Elliott

[Plot of the four activation functions f(n) over n ∈ [-4, 4]]
Experimental Setup
Training algorithm

• The Particle Swarm Optimisation (PSO) algorithm was used
• Two neighbourhood topologies were used: global best (GBest) and Von Neumann (VN)
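For context, the two topologies differ only in which particles share information: GBest uses the whole swarm as one neighbourhood, while Von Neumann connects each particle to four grid neighbours. A sketch of the neighbour indexing on a wrap-around grid (the grid layout is an assumption; the slides do not specify it):

```python
def von_neumann_neighbours(index, rows, cols):
    """Indices of the four grid neighbours (up, down, left, right) of a
    particle laid out row-major on a rows x cols torus."""
    r, c = divmod(index, cols)
    return [((r - 1) % rows) * cols + c,   # up
            ((r + 1) % rows) * cols + c,   # down
            r * cols + (c - 1) % cols,     # left
            r * cols + (c + 1) % cols]     # right

# Example: particle 4 (centre of a 3x3 grid) has neighbours 1, 7, 3, 5
print(von_neumann_neighbours(4, 3, 3))
```

The smaller VN neighbourhoods slow information spread through the swarm, which is one plausible reading of why VN saturated less than GBest in the results that follow.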
Iris Data Set

Least saturation — according to ς_h: LeCun TanH, VN; according to ϕ: LeCun TanH, gBest
Most saturation — according to ς_h: Elliott; according to ϕ: Sigmoid
Table: Average EC, ς_h, and ϕ values for the Iris data set, with corresponding standard deviation in parenthesis

| g(net)     | Alg.  | EC              | ς_h              | ϕ5              | ϕ10             | ϕ20             | ϕ30             |
| Sigmoid    | gBest | 0.0422 (0.0315) | 9.8375 (4.1496)  | 0.88 (0.0887)   | 0.8825 (0.0864) | 0.8825 (0.0864) | 0.8825 (0.0864) |
| Sigmoid    | VN    | 0.0322 (0.0321) | 8.2117 (2.3794)  | 0.9065 (0.0472) | 0.9078 (0.0459) | 0.9078 (0.0459) | 0.9078 (0.0459) |
| TanH       | gBest | 0.0411 (0.0358) | 5.5728 (1.4583)  | 0.8945 (0.0595) | 0.8964 (0.0585) | 0.8964 (0.0585) | 0.8964 (0.0585) |
| TanH       | VN    | 0.0289 (0.0324) | 5.1136 (1.5274)  | 0.9016 (0.049)  | 0.9039 (0.0464) | 0.9039 (0.0464) | 0.9039 (0.0464) |
| LeCun TanH | gBest | 0.0467 (0.0416) | 4.0721 (1.797)   | 0.7367 (0.1004) | 0.7414 (0.0968) | 0.7414 (0.0968) | 0.7421 (0.0971) |
| LeCun TanH | VN    | 0.0367 (0.0354) | 3.6347 (1.2032)  | 0.7631 (0.0951) | 0.768 (0.0915)  | 0.768 (0.0915)  | 0.7681 (0.0916) |
| Elliott    | gBest | 0.0444 (0.0343) | 11.4278 (4.3423) | 0.8158 (0.0501) | 0.8174 (0.0491) | 0.8174 (0.0491) | 0.8174 (0.0491) |
| Elliott    | VN    | 0.04 (0.0355)   | 9.4702 (1.9462)  | 0.8138 (0.0337) | 0.8151 (0.0332) | 0.8151 (0.0332) | 0.8151 (0.0332) |
Iris Data Set

[Figure (a): histogram of g(net) output frequencies, Sigmoid, VN PSO]
[Figure (b): histogram of g(net) output frequencies, Elliott, GBest PSO]
[Figure (c): histogram of hidden layer outputs, LeCun TanH, GBest and VN]
[Figure (d): ϕ10 and ς_h profiles over training iterations for each activation function and topology]
Glass Data Set

[Histograms of g(net) output frequencies over [-2, 2] for the Glass data set]
Table: Average EC, ς_h, and ϕ values for the Glass data set

| g(net)     | Alg.  | EC              | ς_h             | ϕ5              | ϕ10             | ϕ20             | ϕ30             |
| Sigmoid    | gBest | 0.438 (0.0822)  | 3.4534 (0.7454) | 0.7205 (0.059)  | 0.729 (0.0568)  | 0.729 (0.0568)  | 0.729 (0.0568)  |
| Sigmoid    | VN    | 0.4884 (0.0728) | 2.8163 (0.4972) | 0.6841 (0.0644) | 0.693 (0.0613)  | 0.693 (0.0613)  | 0.693 (0.0613)  |
| TanH       | gBest | 0.4372 (0.0764) | 2.2161 (0.4478) | 0.741 (0.0543)  | 0.7483 (0.052)  | 0.7483 (0.052)  | 0.7483 (0.052)  |
| TanH       | VN    | 0.455 (0.0648)  | 1.9336 (0.4274) | 0.7009 (0.0728) | 0.71 (0.0694)   | 0.71 (0.0694)   | 0.71 (0.0694)   |
| LeCun TanH | gBest | 0.4744 (0.0657) | 1.3895 (0.2983) | 0.4901 (0.0548) | 0.508 (0.0507)  | 0.508 (0.0507)  | 0.5082 (0.0507) |
| LeCun TanH | VN    | 0.4891 (0.0653) | 1.3452 (0.351)  | 0.4769 (0.0581) | 0.4968 (0.0538) | 0.4968 (0.0538) | 0.497 (0.0539)  |
| Elliott    | gBest | 0.4357 (0.078)  | 3.8907 (0.8526) | 0.656 (0.0454)  | 0.6614 (0.0436) | 0.6614 (0.0436) | 0.6614 (0.0436) |
| Elliott    | VN    | 0.4535 (0.0709) | 3.3354 (0.4817) | 0.6357 (0.04)   | 0.6417 (0.0381) | 0.6417 (0.0381) | 0.6417 (0.0381) |
Overall Ranks

Table: Average Algorithm Ranks: ϕ10

| Algorithm | g(net)     | Iris | Glass | Heart | Diabetes | Average Rank |
| GBest PSO | Sigmoid    | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | TanH       | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | LeCun TanH | 1.5  | 1.5   | 3.5   | 3.5      | 2.5          |
|           | Elliott    | 3.5  | 5.5   | 2     | 3.5      | 3.625        |
| VN PSO    | Sigmoid    | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | TanH       | 6.5  | 5.5   | 6.5   | 6.5      | 6.25         |
|           | LeCun TanH | 1.5  | 1.5   | 3.5   | 1.5      | 2            |
|           | Elliott    | 3.5  | 5.5   | 1     | 1.5      | 2.875        |

Overall average rank: GBest PSO 4.65625; VN PSO 4.34375

• LeCun TanH: least saturation
• VN PSO: saturated less than GBest PSO
Conclusions

• A simple bounded single-valued saturation measure for NNs, based on activation function outputs, was proposed
• Applicable to all bounded activation functions
• Independent of the activation function output range
• Allows direct statistical comparisons between NNs employing different activation functions
• LeCun TanH saturated less than the other activation functions considered
• VN PSO neighbourhood saturated less than GBest PSO neighbourhood
Future Work
• Explore the relationship between saturation and training algorithm performance
• Use the saturation measure as a NN learning guide
• Explore means of controlling saturation
Thank You
Questions / Comments?