Measuring Saturation in Neural Networks

Anna Rakitianskaia, Andries Engelbrecht
Computational Intelligence Research Group (CIRG)
Department of Computer Science, University of Pretoria, South Africa
http://cirg.cs.up.ac.za

SSCI 2015

Outline

1. FFNNs
2. Problem of saturation
3. Measuring saturation
4. Proposed saturation measure
5. Empirical study
6. Conclusions
7. Future work

Feed-Forward Neural Network (FFNN)

[Figure omitted: diagram of an FFNN with input units z_1, …, z_I, a single hidden layer y_1, …, y_J, output units o_1, …, o_K, and bias units; weights v connect the input layer to the hidden layer, and weights w connect the hidden layer to the output layer.]

Figure: A real NN and a FFNN with a single hidden layer

Neuron Saturation
Why is saturation bad?

Saturated neurons operate in the flat asymptotic regions of a bounded activation function, where the output is almost insensitive to changes in the net input; the network then loses the ability to discriminate between input patterns, and training tends to stagnate.

[Figure omitted: plots of the Sigmoid, TanH, LeCun tanH, and Elliott activation functions f(n) for n ∈ [−4, 4].]

Figure: Artificial neuron and activation functions
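The four bounded activation functions compared in the figure can be sketched in NumPy as follows; the LeCun tanH constants (1.7159 and 2/3) are LeCun's standard recommendation, and the Elliott function is n / (1 + |n|). A minimal sketch, not the paper's implementation:

```python
import numpy as np

def sigmoid(n):
    # Logistic sigmoid: output bounded to (0, 1)
    return 1.0 / (1.0 + np.exp(-n))

def tanh(n):
    # Hyperbolic tangent: output bounded to (-1, 1)
    return np.tanh(n)

def lecun_tanh(n):
    # LeCun's modified tanh: 1.7159 * tanh(2n/3),
    # output bounded to (-1.7159, 1.7159)
    return 1.7159 * np.tanh(2.0 * n / 3.0)

def elliott(n):
    # Elliott function: n / (1 + |n|), output bounded to (-1, 1)
    return n / (1.0 + np.abs(n))
```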

Measuring saturation
Net input signal

The easiest way to measure saturation is to calculate the average of the absolute values of the incoming signals, over all P training patterns and H hidden units:

\varsigma_h = \frac{\sum_{i=1}^{P} \sum_{j=1}^{H} |n_{ij}|}{PH}

Disadvantages
• The net input signal ς_h is unscaled: only NNs employing the same activation function can be compared
• The net input signal ς_h is unbounded

[Figure omitted: plots of the Sigmoid, TanH, LeCun tanH, and Elliott activation functions f(n) for n ∈ [−4, 4].]
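Under this definition, ς_h is simply the mean absolute net input over all P patterns and H hidden units. A minimal sketch, where the (P, H) array layout and the function name are illustrative assumptions:

```python
import numpy as np

def varsigma(net_inputs):
    """Net input saturation: mean of |n_ij| over P patterns and H hidden units.

    net_inputs: array of shape (P, H), where n_ij is the net input of hidden
    unit j for training pattern i.
    """
    return float(np.mean(np.abs(np.asarray(net_inputs, dtype=float))))
```

Because the measure is unbounded, scaling all net inputs up scales ς_h up with them, which is exactly the drawback listed above.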

Measuring saturation
Saturation in the output signal

• Saturation can be approximated by observing the neuron output frequencies, i.e. a histogram of the g(net) values
• Complete saturation: zero frequency in all bins except the leftmost and the rightmost one
• No saturation: frequencies of similar magnitude across all the bins
• Higher frequency in the leftmost and the rightmost bin ⇒ higher saturation

[Figure omitted: histogram of g(net) output frequencies over [−1, 1].]

Proposed Saturation Measure
Derivation

• The average output signal value for each bin b is:

\bar{g}_b = \begin{cases} \frac{1}{f_b} \sum_{k=1}^{f_b} g(\mathrm{net})_k & \text{if } f_b > 0 \\ 0 & \text{otherwise} \end{cases}  (1)

• For any g ∈ [g_L, g_U], \bar{g}_b can be scaled to the [−1, 1] range:

\bar{g}'_b = \frac{2(\bar{g}_b - g_L)}{g_U - g_L} - 1  (2)

• A weighted mean magnitude is then calculated as:

\varphi_B = \frac{\sum_{b=1}^{B} |\bar{g}'_b| f_b}{\sum_{b=1}^{B} f_b}  (3)
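Equations (1)–(3) translate directly into a histogram computation. A minimal NumPy sketch; the function name and the use of np.histogram for binning are assumptions, while B and the output range [g_L, g_U] are parameters:

```python
import numpy as np

def phi(outputs, g_lo, g_hi, B=10):
    """Saturation measure phi_B over B histogram bins, per eqs. (1)-(3).

    outputs: flat array of neuron output values g(net);
    g_lo, g_hi: the activation function's output bounds [g_L, g_U];
    B: number of bins.
    """
    outputs = np.asarray(outputs, dtype=float)
    # Bin frequencies f_b over the activation's output range.
    freqs, _ = np.histogram(outputs, bins=B, range=(g_lo, g_hi))
    # Average output per bin, eq. (1); empty bins contribute zero.
    sums = np.histogram(outputs, bins=B, range=(g_lo, g_hi), weights=outputs)[0]
    means = np.divide(sums, freqs, out=np.zeros(B), where=freqs > 0)
    # Scale each bin mean to [-1, 1], eq. (2).
    scaled = 2.0 * (means - g_lo) / (g_hi - g_lo) - 1.0
    # Frequency-weighted mean magnitude, eq. (3).
    return float(np.abs(scaled) @ freqs / freqs.sum())
```

Because the bin means are scaled to [−1, 1] before the weighted average is taken, the result is bounded to [0, 1] regardless of the activation function's actual output range.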

Proposed Saturation Measure
Properties

\varphi_B = \frac{\sum_{b=1}^{B} |\bar{g}'_b| f_b}{\sum_{b=1}^{B} f_b}

• If \bar{g}' is normally distributed, then \varphi_B < 0.5
• If \bar{g}' is uniformly distributed, then \varphi_B ≈ 0.5
• If \bar{g}' is saturated, then \varphi_B > 0.5
• \varphi_B tends to 1 as the degree of saturation increases, and tends to zero otherwise

[Figure omitted: g(net) histograms illustrating the \varphi > 0.5, \varphi = 0.5, and \varphi < 0.5 cases.]
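The three cases above can be checked numerically. The sketch below uses an inline ϕ implementation and three hypothetical sample distributions (a tight normal, a uniform, and one piled up at the asymptotes) to illustrate each regime:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi10(outputs, lo=-1.0, hi=1.0, B=10):
    # phi_B over B bins: frequency-weighted mean magnitude of scaled bin means
    freqs, _ = np.histogram(outputs, bins=B, range=(lo, hi))
    sums = np.histogram(outputs, bins=B, range=(lo, hi), weights=outputs)[0]
    means = np.divide(sums, freqs, out=np.zeros(B), where=freqs > 0)
    scaled = 2.0 * (means - lo) / (hi - lo) - 1.0
    return float(np.abs(scaled) @ freqs / freqs.sum())

# Tight normal around zero: outputs cluster mid-range, so phi_10 < 0.5
narrow = np.clip(rng.normal(0.0, 0.2, 10_000), -1.0, 1.0)
# Uniform spread over the whole range: phi_10 near 0.5
uniform = rng.uniform(-1.0, 1.0, 10_000)
# Outputs piled up at the asymptotes: phi_10 approaches 1
saturated = rng.choice(np.array([-0.999, 0.999]), 10_000)

print(phi10(narrow), phi10(uniform), phi10(saturated))
```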

Experimental Setup

• 4 benchmarks with known optimal NN architectures: Iris, Glass Identification, Heart, Diabetes
• Feed-forward NNs with a single hidden layer, employing four different activation functions: sigmoid, hyperbolic tangent (TanH), modified hyperbolic tangent (LeCun TanH), and Elliott

[Figure omitted: plots of the four activation functions f(n) for n ∈ [−4, 4].]

Experimental Setup
Training algorithm

• The Particle Swarm Optimisation (PSO) algorithm was used
• Two neighbourhood topologies were used: global best (GBest) and Von Neumann (VN)
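For reference, the GBest PSO variant can be sketched as a generic minimiser; the inertia and acceleration constants below are the commonly used defaults, and the swarm size, iteration count, and initialisation range are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def gbest_pso(f, dim, swarm=20, iters=200, w=0.729, c1=1.49445, c2=1.49445):
    """Minimal GBest PSO minimising f over R^dim.

    Every particle is attracted towards its own personal best and the
    swarm's global best position.
    """
    x = rng.uniform(-1.0, 1.0, (swarm, dim))   # positions, e.g. NN weight vectors
    v = np.zeros((swarm, dim))                 # velocities
    pbest = x.copy()                           # personal best positions
    pbest_f = np.array([f(p) for p in x])      # personal best fitnesses
    g = pbest[pbest_f.argmin()].copy()         # global best position
    for _ in range(iters):
        r1 = rng.random((swarm, dim))
        r2 = rng.random((swarm, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

# Sanity check on the 5-dimensional sphere function
best, val = gbest_pso(lambda z: float(np.sum(z * z)), dim=5)
```

A VN topology would differ only in restricting each particle's attractor to the best position in its grid neighbourhood rather than the whole swarm.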

Iris Data Set

Least saturation — according to ς_h: LeCun TanH, VN; according to ϕ: LeCun TanH, GBest
Most saturation — according to ς_h: Elliott; according to ϕ: Sigmoid

Table: Average E_C, ς_h, and ϕ values for the Iris data set, with corresponding standard deviations in parentheses

g(net)      Alg.   E_C              ς_h               ϕ5               ϕ10              ϕ20              ϕ30
Sigmoid     GBest  0.0422 (0.0315)  9.8375 (4.1496)   0.88 (0.0887)    0.8825 (0.0864)  0.8825 (0.0864)  0.8825 (0.0864)
Sigmoid     VN     0.0322 (0.0321)  8.2117 (2.3794)   0.9065 (0.0472)  0.9078 (0.0459)  0.9078 (0.0459)  0.9078 (0.0459)
TanH        GBest  0.0411 (0.0358)  5.5728 (1.4583)   0.8945 (0.0595)  0.8964 (0.0585)  0.8964 (0.0585)  0.8964 (0.0585)
TanH        VN     0.0289 (0.0324)  5.1136 (1.5274)   0.9016 (0.049)   0.9039 (0.0464)  0.9039 (0.0464)  0.9039 (0.0464)
LeCun TanH  GBest  0.0467 (0.0416)  4.0721 (1.797)    0.7367 (0.1004)  0.7414 (0.0968)  0.7414 (0.0968)  0.7421 (0.0971)
LeCun TanH  VN     0.0367 (0.0354)  3.6347 (1.2032)   0.7631 (0.0951)  0.768 (0.0915)   0.768 (0.0915)   0.7681 (0.0916)
Elliott     GBest  0.0444 (0.0343)  11.4278 (4.3423)  0.8158 (0.0501)  0.8174 (0.0491)  0.8174 (0.0491)  0.8174 (0.0491)
Elliott     VN     0.04 (0.0355)    9.4702 (1.9462)   0.8138 (0.0337)  0.8151 (0.0332)  0.8151 (0.0332)  0.8151 (0.0332)

Iris Data Set

[Figures omitted: hidden layer output histograms for (a) Sigmoid, VN PSO; (b) Elliott, GBest PSO; (c) LeCun TanH; and (d) ϕ10 and ς_h profiles over training iterations (log scale) for all four activation functions under GBest and VN PSO.]

Glass Data Set

[Figures omitted: hidden layer output histograms of g(net) over [−2, 2].]

Table: Average E_C, ς_h, and ϕ values for the Glass data set

g(net)      Alg.   E_C              ς_h              ϕ5               ϕ10              ϕ20              ϕ30
Sigmoid     GBest  0.438 (0.0822)   3.4534 (0.7454)  0.7205 (0.059)   0.729 (0.0568)   0.729 (0.0568)   0.729 (0.0568)
Sigmoid     VN     0.4884 (0.0728)  2.8163 (0.4972)  0.6841 (0.0644)  0.693 (0.0613)   0.693 (0.0613)   0.693 (0.0613)
TanH        GBest  0.4372 (0.0764)  2.2161 (0.4478)  0.741 (0.0543)   0.7483 (0.052)   0.7483 (0.052)   0.7483 (0.052)
TanH        VN     0.455 (0.0648)   1.9336 (0.4274)  0.7009 (0.0728)  0.71 (0.0694)    0.71 (0.0694)    0.71 (0.0694)
LeCun TanH  GBest  0.4744 (0.0657)  1.3895 (0.2983)  0.4901 (0.0548)  0.508 (0.0507)   0.508 (0.0507)   0.5082 (0.0507)
LeCun TanH  VN     0.4891 (0.0653)  1.3452 (0.351)   0.4769 (0.0581)  0.4968 (0.0538)  0.4968 (0.0538)  0.497 (0.0539)
Elliott     GBest  0.4357 (0.078)   3.8907 (0.8526)  0.656 (0.0454)   0.6614 (0.0436)  0.6614 (0.0436)  0.6614 (0.0436)
Elliott     VN     0.4535 (0.0709)  3.3354 (0.4817)  0.6357 (0.04)    0.6417 (0.0381)  0.6417 (0.0381)  0.6417 (0.0381)

Overall Ranks

Table: Average Algorithm Ranks: ϕ10

Algorithm   g(net)      Iris  Glass  Heart  Diabetes  Average Rank
GBest PSO   Sigmoid     6.5   5.5    6.5    6.5       6.25
GBest PSO   TanH        6.5   5.5    6.5    6.5       6.25
GBest PSO   LeCun TanH  1.5   1.5    3.5    3.5       2.5
GBest PSO   Elliott     3.5   5.5    2      3.5       3.625
VN PSO      Sigmoid     6.5   5.5    6.5    6.5       6.25
VN PSO      TanH        6.5   5.5    6.5    6.5       6.25
VN PSO      LeCun TanH  1.5   1.5    3.5    1.5       2
VN PSO      Elliott     3.5   5.5    1      1.5       2.875

Overall average rank: GBest PSO 4.65625, VN PSO 4.34375

• LeCun TanH: least saturation
• VN PSO: saturated less than GBest PSO

Conclusions

• A simple, bounded, single-valued saturation measure for NNs, based on activation function outputs, was proposed
• Applicable to all bounded activation functions
• Independent of the activation function output range
• Allows direct statistical comparisons between NNs employing different activation functions
• LeCun TanH saturated less than the other activation functions considered
• The VN PSO neighbourhood saturated less than the GBest PSO neighbourhood

Future Work

• Explore the relationship between saturation and training algorithm performance
• Use the saturation measure as an NN learning guide
• Explore means of controlling saturation

Thank You

Questions / Comments?