British Journal of Mathematics & Computer Science 4(22): 3163-3170, 2014 ISSN: 2231-0851

SCIENCEDOMAIN international www.sciencedomain.org

Haraux Type Activation Functions in Neural Network Theory

Nasser-eddine Tatar∗1

1 Department of Mathematics and Statistics, King Fahd University of Petroleum and Minerals, Dhahran, 31261, Saudi Arabia.

Article Information
DOI: 10.9734/BJMCS/2014/12361
Editor(s):
(1) Qiankun Song, Department of Mathematics, Chongqing Jiaotong University, China.
Reviewers:
(1) Eleonora Catsigeras, Institute of Mathematics, Faculty of Engineering, University of the Republic, Uruguay.
(2) Mohammed H. AL-Smadi, Applied Science Department, Ajloun College, Al-Balqa Applied University, Ajloun 26816, Jordan.
(3) Anonymous, Sichuan University of Science and Engineering, Sichuan, China.
(4) Anonymous, Southeast University, China.
Peer review history: http://www.sciencedomain.org/review-history.php?iid=636&id=6&aid=5934

Original Research Article

Received: 28 June 2014
Accepted: 02 August 2014
Published: 04 September 2014

Abstract

A general system of ordinary differential equations that appears in neural network theory is studied. We consider nonlinear activation functions which have not been treated in the literature so far. Namely, we assume that the nonlinear activation functions are continuous, but not necessarily Lipschitz continuous.

Keywords: Neural network; activation function; Lipschitz continuous; exponential convergence; global existence.

1 Introduction

We consider the following system

$$x_i'(t) = -a_i(t)\,x_i(t) + \sum_{j=1}^{m} f_{ij}(t, x_j(t)) + c_i(t), \quad t > 0, \ i = 1, \ldots, m, \quad (1.1)$$

*Corresponding author: E-mail: [email protected]


where the data x_j(0) = x_{0j} are given. The coefficients a_i(t) ≥ 0 and c_i(t), i = 1, ..., m (called the input functions) are continuous functions. The functions f_{ij} are nonlinear continuous functions (called activation functions). We are concerned here with the long time behavior of solutions.

One of the main applications of our theoretical result is in Neural Network Theory: system (1.1) is a more general nonlinear version of a system that appears there [1]. Artificial intelligence is motivated by the functioning of the human brain, and (Artificial) Neural Networks are one of its products. A Neural Network is composed of many electronic models of neurons called units or processing elements. These units exchange a large number of signals among themselves: each unit transforms the input received from the previous layer and passes the resulting signal to the next layer along the connection. Among the many applications we may cite: quality control, pattern recognition, speech recognition, price forecasting, time series analysis, optimization and scheduling, data segmentation, marketing strategies, consumer choice prediction, detection of medical phenomena, hydraulic conductivity, soil compaction, stress management, database retrieval [2-14]. Neural Networks are capable of "learning" (by examples), so they do not need a program like traditional computers. They can also handle imprecise and noisy data as well as make decisions.

A similar problem has been studied extensively by many authors [2,3,5,10-12,15,16]. Most of these authors were concerned with the exponential stability of solutions. Historically, specific activation functions were treated first; then they were assumed to be bounded, monotone and differentiable. Later, these assumptions were weakened to the single condition of Lipschitz continuity. Since then this condition has not been improved considerably, although activation functions which are not Lipschitz continuous are appearing in applications [4,6]. We may cite, however, the contributions in [13,14], which use a partial Lipschitz condition or a small modification of the Lipschitz condition. For Hölder continuous activation functions we refer the reader to the works in [17-19]. The case of discontinuous activation functions has been discussed in [2,7-9,15,17,20]. The main assumptions used to prove global (or local) exponential stability of the equilibrium are: growth conditions on the coefficients, the M-matrix, H-matrix and LMI conditions, and the Lyapunov Diagonally Stable condition.

In the present work we contribute in this regard by considering activation functions which are continuous but not necessarily Lipschitz continuous. General forms of nonlinear activations of Haraux type are considered. We have named nonlinearities of the form ψ(log(b + |x(t)|)) after Haraux because of his work on the inequality

$$u(t) \le u_0 + a \int_0^t u(s) \log\left[a + u(s)\right] ds$$

that appeared in 1981 [21, p. 139]. One of the key ingredients in our proof is the use of his idea for bounding solutions of this inequality, although our functions here are much more general. We will prove that solutions converge to zero in an exponential manner when we start close enough to zero. The local existence of solutions may be derived easily by a standard argument; global existence is a direct consequence of our theorem (see Corollary 1).

The plan of the paper is as follows: in the next section we present some material needed to prove our result, and in Section 3 we state and prove our convergence theorem.
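To illustrate the setting, the following sketch integrates a small instance of system (1.1) by forward Euler, with a Haraux-type activation built from ψ(u) = √u. The particular activation, the coefficient values, and the function `simulate` are all illustrative assumptions chosen for the demo, not taken from the paper.

```python
import math

# Illustrative instance of system (1.1): m = 2, constant a_i = a, c_i = 0,
# and activation f_ij(t, x) = bij * x * sqrt(log(b + |x|)), which fits the
# class studied here with phi(|x|) = |x| and psi(u) = sqrt(u).
# All parameter values below are assumptions, not from the paper.

def simulate(x0, a=2.0, bij=0.1, b=math.e, T=10.0, h=1e-3):
    """Forward-Euler integration of x_i' = -a x_i + sum_j f_ij(t, x_j)."""
    x = list(x0)
    t = 0.0
    while t < T:
        s = sum(bij * xj * math.sqrt(math.log(b + abs(xj))) for xj in x)
        x = [xi + h * (-a * xi + s) for xi in x]
        t += h
    return x

xT = simulate([0.5, -0.3])
print(max(abs(v) for v in xT))   # small initial data: the state decays toward 0
```

With the decay rate a dominating the coupling strength bij, the trajectory shrinks to zero, which is the qualitative behavior established in Section 3.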

2 Preliminaries

The nonlinear activation functions f_{ij}(t, x_j(t)) are assumed to satisfy the condition

$$|f_{ij}(t, x_j(t))| \le b_{ij}(t)\,\varphi_{ij}(|x_j(t)|)\,\psi_{ij}(\log(b + |x_j(t)|)), \quad i, j = 1, \ldots, m, \ t > 0, \quad (2.1)$$



with

$$\varphi_{ij}(|x_j(t)|) \le L\,|x_j(t)|, \quad t > 0, \quad (2.2)$$

for some positive constants L and b ≥ 1, where b_{ij}, φ_{ij} are nonnegative continuous functions and ψ_{ij} are nonnegative non-decreasing continuous functions. Although the logarithm function is Lipschitz continuous (away from zero), our functions here are not necessarily Lipschitz continuous; simple examples are the square root function and the cubic root function.

Let I ⊂ R, and let g_1, g_2 : I → R\{0}. We write g_1 ∝ g_2 if g_2/g_1 is nondecreasing in I. This condition is used in the next lemma as well as in previous papers. We will use it in our proof and then show how to remove it (see Remark 2).

Lemma 1 [22]: Let a(t) be a positive continuous function in J := [α, β), let k_j(t), j = 1, ..., n, be nonnegative continuous functions, let g_j(u), j = 1, ..., n, be nondecreasing continuous functions in R^+ with g_j(u) > 0 for u > 0, and let u(t) be a nonnegative continuous function in J. If g_1 ∝ g_2 ∝ ... ∝ g_n in (0, ∞), then the inequality

$$u(t) \le a(t) + \sum_{j=1}^{n} \int_{\alpha}^{t} k_j(s)\,g_j(u(s))\,ds, \quad t \in J,$$

implies that u(t) ≤ ω_n(t), α ≤ t < β_0, where ω_0(t) := sup_{α ≤ s ≤ t} a(s),

$$\omega_j(t) := G_j^{-1}\left(G_j(\omega_{j-1}(t)) + \int_{\alpha}^{t} k_j(s)\,ds\right), \quad j = 1, \ldots, n,$$

$$G_j(u) := \int_{u_j}^{u} \frac{dx}{g_j(x)}, \quad u > 0 \ (u_j > 0,\ j = 1, \ldots, n),$$

and β_0 is chosen so that the functions ω_j(t), j = 1, ..., n, are defined for α ≤ t < β_0. The lemma is proved in [22].

In our case we will assume that

(H) ψ_{ij}(u) > 0 for u > 0, and the ψ_{ij} may be ordered as g_1 ∝ g_2 ∝ ... ∝ g_n, with their corresponding coefficients relabelled \tilde{b}_k(t), k = 1, ..., n.

The following notation will be useful:

$$a(t) := \min_{1 \le i \le m} a_i(t), \quad x(t) := \sum_{i=1}^{m} |x_i(t)|,$$

$$c(t) := \int_0^t \exp\left(\int_0^s a(\sigma)\,d\sigma\right) \sum_{i=1}^{m} |c_i(s)|\,ds, \quad t > 0, \quad (2.3)$$

$$\omega_j(t) := G_j^{-1}\left(G_j(\omega_{j-1}(t)) + L \int_0^t \tilde{b}_j(s)\,ds\right), \quad j = 1, \ldots, n, \quad (2.4)$$

$$G_j(u) := \int_{u_j}^{u} \frac{dx}{g_j(x)}, \quad u > 0 \ (u_j > 0,\ j = 1, \ldots, n), \quad (2.5)$$

where $\omega_0(t) = \log\left(b + c + \sum_{i=1}^{m} |x_{0i}|\right) = \mathrm{const}$ and $c = \sup_{t>0} c(t)$.
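To make the recursion (2.4)-(2.5) concrete, the sketch below evaluates one step numerically for the assumed data g(u) = u (the classical Gronwall case) with a constant coefficient k, where the closed form ω_1(t) = ω_0 e^{kt} is available for comparison. The helper functions `G` and `G_inv` and all parameter values are illustrative choices, not part of the paper.

```python
import math

# One numeric step of the recursion (2.4)-(2.5) under assumed data:
# g(u) = u and a constant coefficient k, so the exact answer is
# omega_1(t) = omega_0 * exp(k*t).  G and its inverse are computed
# numerically here to mirror the definitions.

def G(u, g, u0, steps=5000):
    """Trapezoidal approximation of the integral of dx/g(x) from u0 to u."""
    h = (u - u0) / steps
    total = 0.5 * (1.0 / g(u0) + 1.0 / g(u))
    for i in range(1, steps):
        total += 1.0 / g(u0 + i * h)
    return total * h

def G_inv(z, g, u0, lo=1e-6, hi=1e6):
    """Solve G(u) = z for u by bisection (G is increasing since g > 0)."""
    for _ in range(80):
        mid = math.sqrt(lo * hi)          # geometric bisection since u > 0
        if G(mid, g, u0) < z:
            lo = mid
        else:
            hi = mid
    return mid

g = lambda u: u
u0, omega0, k, t = 1.0, 2.0, 0.5, 1.0
omega1 = G_inv(G(omega0, g, u0) + k * t, g, u0)   # step (2.4) with L*btilde = k
print(omega1, omega0 * math.exp(k * t))           # the two values should agree
```

For nonlinearities g with only numerically known antiderivatives, this is essentially how the bound ω_n(t) would be evaluated in practice.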

3 Exponential Convergence

This section contains the statement and proof of our main result, which states that solutions converge to zero in an exponential manner provided that the initial data are small enough.



Theorem 1: Assume that the hypotheses (2.1), (2.2) and (H) hold and that c = sup_{t>0} c(t) < ∞. Then, there exists β_0 > 0 such that

$$x(t) \le \exp\left(\omega_n(t) - \int_0^t a(s)\,ds\right), \quad 0 \le t < \beta_0,$$

where ω_n(t) is as defined in (2.4).

Proof: It is not difficult to see from the equations in (1.1) that for t > 0 and i = 1, ..., m we have

$$D^+ |x_i(t)| \le -a_i(t)\,|x_i(t)| + \sum_{j=1}^{m} |f_{ij}(t, x_j(t))| + |c_i(t)|,$$

where D^+ denotes the right Dini derivative, and the assumptions (2.1) and (2.2) imply that

$$D^+ x(t) \le -\min_{1 \le i \le m} a_i(t)\,x(t) + L \sum_{i,j=1}^{m} b_{ij}(t)\,|x_j(t)|\,\psi_{ij}(\log(b + |x_j(t)|)) + \sum_{i=1}^{m} |c_i(t)|, \quad t > 0.$$

Hence,

$$D^+ x(t) \le -a(t)x(t) + L \sum_{i,j=1}^{m} b_{ij}(t)\,|x_j(t)|\,\psi_{ij}(\log(b + |x_j(t)|)) + \sum_{i=1}^{m} |c_i(t)|, \quad t > 0, \quad (3.1)$$

and consequently

$$D^+\left[x(t) \exp\left(\int_0^t a(s)\,ds\right)\right] \le L \exp\left(\int_0^t a(s)\,ds\right) \sum_{i,j=1}^{m} b_{ij}(t)\,x(t)\,\psi_{ij}(\log(b + x(t))) + \exp\left(\int_0^t a(s)\,ds\right) \sum_{i=1}^{m} |c_i(t)|, \quad t > 0. \quad (3.2)$$

Defining

$$\tilde{x}(t) := x(t) \exp\left(\int_0^t a(s)\,ds\right), \quad t > 0, \quad (3.3)$$

and using a comparison theorem (see [23]), we deduce from (3.2) that for t > 0

$$\tilde{x}(t) \le x(0) + c + L \int_0^t \sum_{i,j=1}^{m} b_{ij}(s) \exp\left(\int_0^s a(\sigma)\,d\sigma\right) x(s)\,\psi_{ij}(\log(b + x(s)))\,ds. \quad (3.4)$$

If y(t) denotes the right hand side of (3.4), then x̃(t) ≤ y(t), t > 0, and

$$Dy(t) = L \sum_{i,j=1}^{m} b_{ij}(t)\,\tilde{x}(t)\,\psi_{ij}(\log(b + x(t))) \le L\,y(t) \sum_{i,j=1}^{m} b_{ij}(t)\,\psi_{ij}(\log(b + y(t))), \quad t > 0. \quad (3.5)$$

The relation (3.5) implies that

$$\frac{Dy(t)}{b + y(t)} \le \frac{Dy(t)}{y(t)} \le L \sum_{i,j=1}^{m} b_{ij}(t)\,\psi_{ij}(\log(b + y(t))), \quad t > 0, \quad (3.6)$$

and an integration of (3.6) yields

$$\log\left[b + y(t)\right] - \log\left[b + y(0)\right] \le L \sum_{i,j=1}^{m} \int_0^t b_{ij}(s)\,\psi_{ij}(\log(b + y(s)))\,ds, \quad t > 0. \quad (3.7)$$



It is clear that (3.7) may be rewritten as

$$z(t) \le z(0) + L \sum_{i,j=1}^{m} \int_0^t b_{ij}(s)\,\psi_{ij}(z(s))\,ds, \quad t > 0,$$

where z(t) designates the expression log[b + y(t)]. Now we can apply Pinto's theorem (Lemma 1) to get

$$z(t) := \log\left[b + y(t)\right] \le \omega_n(t), \quad 0 \le t < \beta_0,$$

where ω_0(t) := z(0) and ω_n(t) is as in (H). Therefore, x̃(t) ≤ y(t) ≤ e^{ω_n(t)}, 0 ≤ t < β_0, and in view of (3.3)

$$x(t) \le \exp\left(\omega_n(t) - \int_0^t a(s)\,ds\right), \quad 0 \le t < \beta_0.$$
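As a numerical sanity check of the theorem (not part of the paper's argument), one can compare a simulated scalar solution with the bound: take m = 1, constant a, c ≡ 0, b = e, and ψ(u) = u, so that g_1(u) = u and (2.4) gives ω_1(t) = ω_0 e^{L b̃ t} with ω_0 = log(b + |x_0|). All numeric values below are illustrative assumptions.

```python
import math

# Scalar check of Theorem 1 under assumed data: m = 1, constant a,
# c = 0, b = e, psi(u) = u.  Then (2.4) gives
#   omega_1(t) = log(b + |x_0|) * exp(L*btilde*t),
# and the theorem predicts  x(t) <= exp(omega_1(t) - a*t).

a, Lb, b, x0 = 1.0, 0.2, math.e, 0.5   # Lb plays the role of L * btilde
h, T = 1e-3, 5.0

x, t = x0, 0.0
ok = True
while t < T:
    x += h * (-a * x + Lb * x * math.log(b + x))   # forward Euler on (1.1)
    t += h
    omega1 = math.log(b + x0) * math.exp(Lb * t)    # Pinto bound (2.4)
    ok = ok and (x <= math.exp(omega1 - a * t))     # Theorem 1 estimate
print(ok, x)
```

The bound is far from tight for these values, but the simulated trajectory stays below it on the whole time window, as the theorem predicts.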

Example: Consider the case ψ_{ij}(z) = z^{α_{ij}}, α_{ij} ≥ 1, i, j = 1, ..., m. To have the order g_1 ∝ g_2 ∝ ... ∝ g_n we must order the powers in a non-decreasing manner β_1 ≤ β_2 ≤ ... ≤ β_n. We obtain

$$G_j(x) = \frac{x^{1-\beta_j}}{1-\beta_j} - \frac{x_0^{1-\beta_j}}{1-\beta_j}, \quad G_j^{-1}(z) = \left[x_0^{1-\beta_j} - (\beta_j - 1)z\right]^{-\frac{1}{\beta_j - 1}},$$

and

$$\omega_j(t) = \left[\omega_{j-1}^{1-\beta_j}(t) - (\beta_j - 1)L \int_0^t \tilde{b}_j(s)\,ds\right]^{-\frac{1}{\beta_j - 1}}, \quad j = 1, \ldots, n, \ 0 \le t < \beta_0.$$

The value β_0 will be the largest value of t for which

$$\omega_{j-1}^{\beta_j - 1}(t)\,L \int_0^t \tilde{b}_j(s)\,ds < \frac{1}{\beta_j - 1}$$

for all j = 1, ..., n.

Corollary 1: If, in addition to the hypotheses of the theorem, we assume that

$$L \int_0^{\infty} \tilde{b}_k(s)\,ds \le \int_{\omega_{k-1}}^{\infty} \frac{d\sigma}{g_k(\sigma)}, \quad k = 1, \ldots, n,$$

then we have global existence of solutions. Indeed, under the conditions in this corollary, it can be seen from Lemma 1 that the functions ω_n(t) are defined everywhere, that is, β_0 = +∞. By the continuation principle, solutions that are bounded by continuous functions exist globally in time.

Corollary 2: If, in addition to the hypotheses of the theorem, we assume that $\omega_n(t) - \int_0^t a(s)\,ds \to -\infty$, then solutions decay to zero at an exponential rate.

Remark 1: The condition c = sup_{t>0} c(t) < ∞ is not necessary; it may be dropped. Indeed, we will have

$$\frac{D^+ y(t)}{b + y(t)} \le \frac{c'(t)}{b + y(t)} + \frac{L\,y(t)}{b + y(t)} \sum_{i,j=1}^{m} b_{ij}(t)\,\psi_{ij}(\log(b + y(t))), \quad t > 0,$$

or

$$\frac{D^+ y(t)}{b + y(t)} \le \frac{c'(t)}{c(t)} + L \sum_{i,j=1}^{m} b_{ij}(t)\,\psi_{ij}(\log(b + y(t))), \quad t > 0,$$

and therefore

$$z(t) \le z^*(t) + L \sum_{i,j=1}^{m} \int_0^t b_{ij}(s)\,\psi_{ij}(\log(b + y(s)))\,ds,$$

where z^*(t) = z(0) + \log c(t) - \log c(0)



(we may assume, without loss of generality, that c(t) ≥ 1). Then, we apply Theorem 3 in [22].

Remark 2: The monotonicity condition and the order assumed in our theorem may be dropped by passing to the new functions (after relabelling the ψ_{ij} with a single subscript)

$$\chi_1(t) := \max_{0 \le s \le t} \psi_1(s), \quad \chi_k(t) := \max_{0 \le s \le t} \left[\chi_{k-1}(t)\,\frac{\psi_k(s)}{\chi_{k-1}(s)}\right],$$

and φ(t) := χ_k(t)/χ_{k-1}(t).
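Returning to the power-law example ψ(z) = z^β above, the closed form of ω_j and the blow-up threshold β_0 can be evaluated directly. The sketch below does a single step (n = 1) with freely chosen, illustrative parameter values; the function name `omega1` is ours, not the paper's.

```python
# Closed-form omega_j for the power example psi(z) = z^beta (beta > 1)
# with a constant coefficient btilde.  Single-step (n = 1) illustration;
# parameters are assumptions.  The bound blows up at
#   beta_0 = 1 / ((beta - 1) * L * btilde * omega_0**(beta - 1)),
# matching the condition  omega_0^(beta-1) * L * int(btilde) < 1/(beta-1).

def omega1(t, omega0, beta, L, btilde):
    base = omega0 ** (1 - beta) - (beta - 1) * L * btilde * t
    if base <= 0:
        return float("inf")          # past beta_0: the bound has blown up
    return base ** (-1 / (beta - 1))

omega0, beta, L, btilde = 2.0, 2.0, 0.5, 0.1
beta0 = 1 / ((beta - 1) * L * btilde * omega0 ** (beta - 1))
print(beta0)                                   # -> 10.0 with these values
print(omega1(0.0, omega0, beta, L, btilde))    # equals omega_0 at t = 0
```

For beta = 2 this reduces to ω_1(t) = (1/ω_0 − L b̃ t)^{-1}, a hyperbola that diverges exactly at β_0, which is why Theorem 1 is stated on a finite window [0, β_0) unless the integrability condition of Corollary 1 holds.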

4 Conclusions

We proved exponential convergence to zero of solutions of a system of ordinary differential equations. This system is similar to systems that appear in Neural Network Theory, where researchers are primarily concerned with stabilizing the system exponentially to the equilibrium state. In most of the existing works in the literature, the activation functions (nonlinearities) are assumed to be Lipschitz continuous; discontinuous activation functions have also been treated in many papers. In the present work our nonlinearities are continuous but not necessarily Lipschitz continuous.

Acknowledgment

The author is grateful for the financial support and the facilities provided by King Fahd University of Petroleum and Minerals through grant No. IN111052.

Competing interests

The author declares that no competing interests exist.

References

[1] Hopfield JJ, Tank DW. Computing with neural circuits: a model. Science. 1986;233:625-633.

[2] Bao G, Zeng Z. Analysis and design of associative memories based on recurrent neural network with discontinuous activation functions. Neurocomput. 2012;77:101-107.

[3] Chua LO, Roska T. Stability of a class of nonreciprocal cellular neural networks. IEEE Trans Circuits Syst I. 1990;37:1520-1527.

[4] Gavaldà R, Siegelmann HT. Discontinuities in recurrent neural networks. Neural Comput. 1999;11:715-745.

[5] Kennedy MP, Chua LO. Neural networks for non-linear programming. IEEE Trans Circ Syst I, Fundam Theory Appl. 1988;35:554-562.



[6] Kosko B. Neural Networks and Fuzzy Systems - A Dynamical Systems Approach to Machine Intelligence. New Delhi: Prentice-Hall of India; 1991.

[7] Li L, Huang L. Dynamical behaviors of a class of recurrent neural networks with discontinuous neuron activations. Appl Math Model. 2009;33:4326-4336.

[8] Li W, Wu H. Global stability analysis for periodic solution in discontinuous neural networks with nonlinear growth activations. Adv Diff Eqs. 2009, ID 798685.

[9] Liu X, Cao J. Robust state estimation for neural networks with discontinuous activations. IEEE Trans Syst Man Cyb-Part B: Cybernetics. 2010;40(6):1425-1437.

[10] Mohamad S. Exponential stability in Hopfield-type neural networks with impulses. Chaos Solitons & Fractals. 2007;32:456-467.

[11] Qiao H, Peng JG, Xu Z. Nonlinear measures: A new approach to exponential stability analysis for Hopfield-type neural networks. IEEE Trans Neural Netw. 2001;12:360-370.

[12] Sudharsanan SI, Sundareshan MK. Exponential stability and a systematic synthesis of a neural network for quadratic minimization. Neural Networks. 1991;4:599-613.

[13] Wu H. Global exponential stability of Hopfield neural networks with delays and inverse Lipschitz neuron activations. Nonlinear Anal, Real World Appl. 2009;10:2297-2306.

[14] Wu H, Xue X. Stability analysis for neural networks with inverse Lipschitzian neuron activations and impulses. Appl Math Model. 2008;32:2347-2359.

[15] Forti M, Nistri P. Global convergence of neural networks with discontinuous neuron activations. IEEE Trans Circuits Syst-I: Fund Theory Appl. 2003;50(11):1421-1435.

[16] Forti M, Tesei A. New conditions for global stability of neural networks with applications to linear, quadratic programming problems. IEEE Trans Circ Syst I, Fundam Theory Appl. 1995;42(7):354-366.

[17] Forti M, Grazzini M, Nistri P, Pancioni L. Generalized Lyapunov approach for convergence of neural networks with discontinuous or non-Lipschitz activations. Physica D. 2006;214:88-99.

[18] Tatar N-e. Hopfield neural networks with unbounded monotone activation functions. Adv Artificial Neural Netw Syst. 2012;2012, ID 571358, 1-5.

[19] Tatar N-e. Control of systems with Hölder continuous functions in the distributed delays. Carpathian J Math. 2014;30(1):123-128.

[20] Wang J, Huang L, Guo Z. Global asymptotic stability of neural networks with discontinuous activations. Neural Networks. 2009;22:931-937.

[21] Haraux A. Nonlinear Evolution Equations - Global Behavior of Solutions. Lecture Notes in Mathematics, Vol. 841, Springer-Verlag, Berlin; 1981.

[22] Pinto M. Integral inequalities of Bihari-type and applications. Funkcialaj Ekvacioj. 1990;33:387-403.



[23] Lakshmikantham V, Leela S. Differential and Integral Inequalities: Theory and Applications, Vol. 55-I, Mathematics in Science and Engineering, Edited by Richard Bellman, Acad Press, New York-London; 1969.

© 2014 Tatar; This is an Open Access article distributed under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


