IJSRD - International Journal for Scientific Research & Development| Vol. 6, Issue 02, 2018 | ISSN (online): 2321-0613

A New Class of Activation Functions Based on the Correcting Amendments

Nikolay Kyurkchiev
Faculty of Mathematics & Informatics, University of Plovdiv Paisii Hilendarski, 24 Tzar Asen Str., 4000 Plovdiv, Bulgaria

Abstract— We explore the interesting methodological task of constructing new activation functions using "correcting amendments", for example from a combination of amendments of "Gompertz type" and "Hyperbolic Tangent type". We prove upper and lower estimates for the Hausdorff approximation of the sign function by means of this new class of parametric activation functions (PGHAF). Numerical examples illustrating our results are given.

Key words: Parametric activation function based on "amendments" of "Gompertz type" and "Hyperbolic Tangent type" (PGHAF), Sign function, Hausdorff distance, Upper and lower bounds

I. INTRODUCTION

Sigmoidal functions (also known as "activation functions") find multiple applications in neural networks [1]–[11]. In a series of papers we have explored the task of approximating the Heaviside function $h(t)$ and the sign function $\mathrm{sgn}(t)$ by well-known smooth functions such as the hyperbolic tangent, the logistic function and the Gompertz function (see, for instance, [12]–[13]). The task is important in the treatment of questions related to the study of "super saturation", an object of research in various fields: neural networks, nucleation theory, machine learning and others. A survey of neural transfer (activation) functions can be found in [14].

II. PRELIMINARIES

A. Definition 1. The sign function of a real number $t$ is defined as follows:

$$\mathrm{sgn}(t)=\begin{cases}-1, & \text{if } t<0,\\ \ \ \,0, & \text{if } t=0,\\ \ \ \,1, & \text{if } t>0.\end{cases}\qquad(1)$$

B. Definition 2. [15], [16] The Hausdorff distance (the H-distance) $\rho(f,g)$ between two interval functions $f,g$ on $\Omega\subseteq\mathbb{R}$ is the distance between their completed graphs $F(f)$ and $F(g)$ considered as closed subsets of $\Omega\times\mathbb{R}$. More precisely,

$$\rho(f,g)=\max\Big\{\sup_{A\in F(f)}\,\inf_{B\in F(g)}\|A-B\|,\ \sup_{B\in F(g)}\,\inf_{A\in F(f)}\|A-B\|\Big\},\qquad(2)$$

wherein $\|\cdot\|$ is any norm in $\mathbb{R}^{2}$, e.g. the maximum norm $\|(t,x)\|=\max\{|t|,|x|\}$; hence the distance between the points $A=(t_A,x_A)$ and $B=(t_B,x_B)$ in $\mathbb{R}^{2}$ is $\|A-B\|=\max(|t_A-t_B|,|x_A-x_B|)$. It is natural to define the following special class of activation functions.

C. Definition 3. The new parametric activation function based on "amendments" of "Gompertz type" and "Hyperbolic Tangent type" (PGHAF) is defined as follows:

$$\sigma(t)=\frac{e^{e^{e^{at}}}-e^{e^{e^{-at}}}}{e^{e^{e^{at}}}+e^{e^{e^{-at}}}}.\qquad(3)$$
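To make the definition concrete, here is a minimal numerical sketch of $\sigma(t)$ from (3) in Python. It assumes the triple-exponential form written above and uses the identity $(e^{u}-e^{v})/(e^{u}+e^{v})=\tanh\big((u-v)/2\big)$ so that the outermost exponential never overflows; the function name `pghaf` and the saturation guard are our own illustrative choices, not part of the paper.

```python
import math

def pghaf(t, a):
    # sigma(t) = (e^{e^{e^{at}}} - e^{e^{e^{-at}}}) / (e^{e^{e^{at}}} + e^{e^{e^{-at}}}),
    # evaluated as tanh((e^{e^{at}} - e^{e^{-at}}) / 2) to avoid overflow.
    if abs(a * t) > 6.5:                 # e^{e^{6.5}} already exceeds double precision;
        return math.copysign(1.0, t)     # the function is fully saturated there anyway
    w = math.exp(math.exp(a * t)) - math.exp(math.exp(-a * t))
    return math.tanh(w / 2.0)

print(pghaf(0.0, 4.0))    # 0.0
print(pghaf(0.5, 4.0))    # essentially 1: very fast saturation
print(pghaf(-0.5, 4.0))   # essentially -1
```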

III. MAIN RESULTS

In this Section we prove upper and lower estimates for the Hausdorff approximation of the sign function by means of $\sigma(t)$. The H-distance $d=\rho(\mathrm{sgn}(t),\sigma(t))$ between the sgn function and the function $\sigma$ satisfies the relation

$$\frac{e^{e^{e^{ad}}}-e^{e^{e^{-ad}}}}{e^{e^{e^{ad}}}+e^{e^{e^{-ad}}}}=1-d.\qquad(4)$$

The following Theorem gives upper and lower bounds for $d$.

A. Theorem 3.1. For the Hausdorff distance $d$ between the sgn function and the function $\sigma$ the following inequalities hold for $a>\frac{1}{e}\left(\frac{1}{2}e^{2}-1\right)\approx 0.991261$:

$$d_l=\frac{1}{2(1+ae)}<d<\frac{\ln\big(2(1+ae)\big)}{2(1+ae)}=d_r.\qquad(5)$$

Proof: Consider the function

$$F(d)=\sigma(d)-1+d,\qquad(6)$$

which is strictly increasing, and the function

$$G(d)=-1+(1+ae)\,d,\qquad(7)$$

which approximates $F(d)$ as $d\to 0$ with an error of order $O(d^{2})$ (see Fig. 1); note that $1+ae=F'(0)$, since $\sigma'(0)=ae$. In addition $G'(d)>0$. We look for two reals $d_l$ and $d_r$ such that $G(d_l)<0$ and $G(d_r)>0$ (leading to $G(d_l)<G(d)<G(d_r)$ and thus $d_l<d<d_r$).


Trying

$$d_l=\frac{1}{2(1+ae)}\quad\text{and}\quad d_r=\frac{\ln\big(2(1+ae)\big)}{2(1+ae)},$$

we obtain for $a>\frac{1}{e}\left(\frac{1}{2}e^{2}-1\right)$:

$$G(d_l)<0;\qquad G(d_r)>0.$$

This completes the proof of the inequalities (5).

Fig. 1: The functions $F(d)$ and $G(d)$ for $a=4$.

Approximations of $\mathrm{sgn}(t)$ by (PGHAF) functions for various $a$ are visualized in Fig. 2–Fig. 4.

Fig. 2: Approximation of $\mathrm{sgn}(t)$ by (PGHAF) for $a=3$; Hausdorff distance $d=0.137975$.

Fig. 3: Approximation of $\mathrm{sgn}(t)$ by (PGHAF) for $a=6$; Hausdorff distance $d=0.0800208$.

Fig. 4: Approximation of $\mathrm{sgn}(t)$ by (PGHAF) for $a=10$; Hausdorff distance $d=0.0524862$.

From the graphics it can be seen that the "saturation" becomes faster as $a$ grows. Some computational examples using the relations (5) are presented in Table 1. The last column of Table 1 contains the values of $d$ computed by solving the nonlinear equation (4).

a    | d_l        | d_r        | d from (4)
2    | 0.0776812  | 0.198487   | 0.186197
3    | 0.0546159  | 0.158792   | 0.137975
4    | 0.0421119  | 0.133386   | 0.110541
5    | 0.0342667  | 0.115602   | 0.0926649
6    | 0.0288856  | 0.102382   | 0.0800208
10   | 0.0177413  | 0.0715305  | 0.0524862

Table 1: Bounds for $d$ computed by (5) for various $a$.

From the above table it can be seen that the right-hand estimates for the value of the best Hausdorff distance (see (5)) are quite precise.
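As a cross-check of Table 1, the following Python sketch (again assuming the reconstructed form of $\sigma(t)$ in (3)) solves the nonlinear equation (4) by bisection on $[0,\,0.5]$ and evaluates the bounds (5); the bracket, the iteration count and the helper names are illustrative choices.

```python
import math

def pghaf(t, a):
    # sigma(t) from (3), written as tanh((e^{e^{at}} - e^{e^{-at}}) / 2) to avoid overflow
    if abs(a * t) > 6.5:
        return math.copysign(1.0, t)
    return math.tanh((math.exp(math.exp(a * t)) - math.exp(math.exp(-a * t))) / 2.0)

def hausdorff_d(a, lo=0.0, hi=0.5, iters=100):
    # Solve equation (4): sigma(d) - (1 - d) = 0; the left-hand side is increasing in d.
    f = lambda d: pghaf(d, a) - (1.0 - d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

for a in (2, 3, 4, 5, 6, 10):
    s = a * math.e                                    # slope sigma'(0) = a*e
    d_l = 1.0 / (2.0 * (1.0 + s))
    d_r = math.log(2.0 * (1.0 + s)) / (2.0 * (1.0 + s))
    print(f"a={a:2d}  d_l={d_l:.7f}  d={hausdorff_d(a):.7f}  d_r={d_r:.7f}")
```

The printed rows reproduce the columns of Table 1.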

IV. GENERAL CASE

The research conducted in Section III gives us reason to explore the following generalization of the activation function $\sigma(t)$:

$$\sigma(t;k)=\frac{e^{e^{\cdot^{\cdot^{\cdot^{e^{at}}}}}}-e^{e^{\cdot^{\cdot^{\cdot^{e^{-at}}}}}}}{e^{e^{\cdot^{\cdot^{\cdot^{e^{at}}}}}}+e^{e^{\cdot^{\cdot^{\cdot^{e^{-at}}}}}}},\qquad(8)$$

where $k$ is the number of recursive insertions of $\exp$ (indicated by the dots "$\cdot^{\cdot^{\cdot}}$" in (8)), so that each exponential tower in (8) contains $k+2$ exponentials.

As an example, when $k=1$ we obtain the function $\sigma(t)$ from (3); when $k=2$:

$$\sigma(t;2)=\frac{e^{e^{e^{e^{at}}}}-e^{e^{e^{e^{-at}}}}}{e^{e^{e^{e^{at}}}}+e^{e^{e^{e^{-at}}}}},\qquad(9)$$

etc. Let

$$A_k=1+e+e^{e}+e^{e^{e}}+\cdots\quad(k\ \text{summands}),$$

i.e. $A_1=1$; $A_2=1+e$; $A_3=1+e+e^{e}$; $A_4=1+e+e^{e}+e^{e^{e}}$, and so on.
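The recursion behind $A_k$ is easy to implement; the short Python sketch below lists $A_1$–$A_4$. Under the tower form of (8) assumed above, the slope of $\sigma(t;k)$ at the origin works out to $a\,e^{A_k}$, which is the combination that appears in Theorem 4.1 below.

```python
import math

def A(k):
    # A_1 = 1, A_2 = 1 + e, A_3 = 1 + e + e^e, ...: k summands, each summand
    # being the exponential of the previous one.
    term, total = 1.0, 0.0
    for _ in range(k):
        total += term
        term = math.exp(term)
    return total

for k in (1, 2, 3, 4):
    print(k, A(k))   # 1.0, about 3.72, about 18.87, about 3.8e6
```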


The following Theorem gives upper and lower bounds for $d_k$.

A. Theorem 4.1. For given $k$, the H-distance $d_k$ between the sgn function and the function $\sigma(t;k)$ satisfies the following inequalities for $a>\frac{1}{e^{A_k}}\left(\frac{1}{2}e^{2}-1\right)$:

$$d_l^{k}=\frac{1}{2\left(1+a e^{A_k}\right)}<d_k<\frac{\ln\big(2\left(1+a e^{A_k}\right)\big)}{2\left(1+a e^{A_k}\right)}=d_r^{k}.\qquad(10)$$

Proof: The argument repeats the proof of Theorem 3.1, with the slope $ae$ replaced by $\sigma'(0;k)=a e^{A_k}$: trying $d_l^{k}$ and $d_r^{k}$ as above gives $G(d_l^{k})<0$ and $G(d_r^{k})>0$ for the stated range of $a$. This completes the proof of the inequalities (10).
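For completeness, here is a Python sketch (again assuming the reconstructed tower form of (8)) that evaluates $\sigma(t;k)$, solves $\sigma(d;k)=1-d$ for $d_k$ by bisection and compares it with the bounds (10). With $k=2$ and $a=0.9$ it reproduces $d_2\approx 0.049175$, the value quoted for Fig. 6 below. The overflow cap and the function names are ours, and the sketch is intended for the small values of $k$ (1–3) used in the paper.

```python
import math

def tower(x, m, cap=700.0):
    # m-fold iterated exponential e^{e^{...^{e^x}}}; values beyond the double
    # range are reported as +inf (tanh below then saturates to +/-1).
    for _ in range(m):
        if x > cap:
            return math.inf
        x = math.exp(x)
    return x

def pghaf_k(t, a, k):
    # sigma(t;k) from (8): (k+2)-fold exponential towers, with the outermost
    # exponential absorbed into tanh via (e^u - e^v)/(e^u + e^v) = tanh((u-v)/2).
    if t == 0.0:
        return 0.0
    u = tower(a * t, k + 1)
    v = tower(-a * t, k + 1)
    return math.tanh((u - v) / 2.0)

def A(k):
    # A_1 = 1, A_2 = 1 + e, A_3 = 1 + e + e^e, ... (k summands)
    term, total = 1.0, 0.0
    for _ in range(k):
        total += term
        term = math.exp(term)
    return total

def hausdorff_dk(a, k, lo=0.0, hi=0.4, iters=100):
    # Solve sigma(d;k) = 1 - d by bisection; the difference is increasing in d,
    # negative at d = 0 and positive at d = 0.4 for the cases considered here.
    f = lambda d: pghaf_k(d, a, k) - (1.0 - d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

a, k = 0.9, 2                         # the example of Fig. 6
s = a * math.exp(A(k))                # slope of sigma(t;k) at the origin
print("d_l =", 1.0 / (2.0 * (1.0 + s)))
print("d_2 =", hausdorff_dk(a, k))    # approx. 0.049175
print("d_r =", math.log(2.0 * (1.0 + s)) / (2.0 * (1.0 + s)))
```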

Fig. 5: The activation function $\sigma(t;2)$ for $a=5$; Hausdorff distance $d_2=0.012156$.

Fig. 6: An example of the dynamical and graphical representation of the activation functions $\sigma(t;k)$ for given $a$ and $k$; for example $k=2$, $a=0.9$, with Hausdorff distance $d_2=0.049175$. The plots are prepared using CAS Mathematica.

Remark. We define the following "combined" 3-parametric activation functions $\sigma(t;k;a_1;a_2;\gamma)$ of the form

$$\sigma(t;k;a_1;a_2;\gamma)=(1-\gamma)\,\frac{e^{e^{\cdot^{\cdot^{\cdot^{e^{a_1 t}}}}}}-e^{e^{\cdot^{\cdot^{\cdot^{e^{-a_1 t}}}}}}}{e^{e^{\cdot^{\cdot^{\cdot^{e^{a_1 t}}}}}}+e^{e^{\cdot^{\cdot^{\cdot^{e^{-a_1 t}}}}}}}+\gamma\,\frac{e^{e^{\cdot^{\cdot^{\cdot^{e^{a_2 t}}}}}}-e^{e^{\cdot^{\cdot^{\cdot^{e^{-a_2 t}}}}}}}{e^{e^{\cdot^{\cdot^{\cdot^{e^{a_2 t}}}}}}+e^{e^{\cdot^{\cdot^{\cdot^{e^{-a_2 t}}}}}}},\qquad(11)$$

where $0\le\gamma\le1$. Based on the methodology proposed in the present note, the reader may formulate the corresponding approximation problems on his/her own.

V. CONCLUSION

A family of parametric activation functions (PGHAF) based on "correcting amendments" of "Gompertz type" and "Hyperbolic Tangent type" is introduced, finding application in neural network theory and practice. Theoretical and numerical results on the approximation in the Hausdorff sense of the sgn function by means of functions belonging to the family are reported in the paper.

We propose a software module within the programming environment CAS Mathematica for the analysis of the considered family of (PGHAF) functions. The module offers the following possibilities:
- generation of the activation functions under user-defined values of the parameters $k$ and $a$;
- calculation of the H-distance $d_k$, $k=1,2,\ldots$, between the sgn function and the activation functions $\sigma(t;k)$;
- software tools for animation and visualization.


In conclusion, we note that the newly constructed, recurrently generated families of sigmoidal and activation functions can be used with success in creating new higher-order recurrent neural networks.

ACKNOWLEDGMENTS

Special thanks go to Prof. Kamen Ivanov, DSc., for the valuable recommendations concerning the proofs of Theorems 3.1–4.1. This work has been supported by the project FP17FMI-008 of the Department for Scientific Research, Paisii Hilendarski University of Plovdiv.

REFERENCES

[1] Guliyev N., V. Ismailov (2016) A single hidden layer feedforward network with only one neuron in the hidden layer can approximate any univariate function, Neural Computation, 28, 1289–1304.
[2] Costarelli D., R. Spigler (2013) Approximation results for neural network operators activated by sigmoidal functions, Neural Networks, 44, 101–106.
[3] Costarelli D., G. Vinti (2016) Pointwise and uniform approximation by multivariate neural network operators of the max-product type, Neural Networks, 81, 81–90.
[4] Costarelli D., R. Spigler (2016) Solving numerically nonlinear systems of balance laws by multivariate sigmoidal functions approximation, Computational and Applied Mathematics, 1–31.
[5] Costarelli D., G. Vinti (2017) Convergence for a family of neural network operators in Orlicz spaces, Mathematische Nachrichten, 290 (2–3), 226–235.
[6] Dombi J., Z. Gera (2005) The Approximation of Piecewise Linear Membership Functions and Lukasiewicz Operators, Fuzzy Sets and Systems, 154 (2), 275–286.
[7] Basheer I. A., M. Hajmeer (2000) Artificial Neural Networks: Fundamentals, Computing, Design, and Application, Journal of Microbiological Methods, 43, 3–31.
[8] Chen Z., F. Cao (2009) The Approximation Operators with Sigmoidal Functions, Computers & Mathematics with Applications, 58, 758–765.
[9] Chen Z., F. Cao (2012) The Construction and Approximation of a Class of Neural Networks Operators with Ramp Functions, Journal of Computational Analysis and Applications, 14, 101–112.
[10] Chen Z., F. Cao, J. Hu (2015) Approximation by Network Operators with Logistic Activation Functions, Applied Mathematics and Computation, 256, 565–571.
[11] Costarelli D., R. Spigler (2013) Constructive Approximation by Superposition of Sigmoidal Functions, Anal. Theory Appl., 29, 169–196.
[12] Kyurkchiev N., A. Iliev, S. Markov (2017) Some techniques for recurrence generating of activation functions, LAP LAMBERT Academic Publishing, ISBN 978-3-330-33143-3.
[13] Iliev A., N. Kyurkchiev, S. Markov (2017) A Note on the New Activation Function of Gompertz Type, Biomath Communications, 4 (2).
[14] Duch W., N. Jankowski (1999) Survey of neural transfer functions, Neural Computing Surveys, 2, 163–212.
[15] Hausdorff F. (1962) Set Theory (2nd ed.), New York, Chelsea Publ.
[16] Sendov B. Hausdorff Approximations, Boston, Kluwer.
