An LMS Algorithm for Training Single Layer Globally Recursive Neural Networks

Peter Stubberud and J.W. Bruce

The authors are with the Department of Electrical and Computer Engineering at the University of Nevada, Las Vegas, 4505 Maryland Pkwy., Las Vegas, NV 89154-4026 USA. E-mail: stubber@ee.unlv.edu, jwbruce@ee.unlv.edu

0-7803-4859-1/98 $10.00 © 1998 IEEE

Abstract: Unlike feedforward neural networks (FFNN), which can act as universal function approximators, recursive neural networks have the potential to act as both universal function approximators and universal system approximators. In this paper, a globally recursive neural network least mean square (GRNNLMS) gradient descent algorithm, or real time recursive backpropagation (RTRBP) algorithm, is developed for a single layer globally recursive neural network that has multiple delays in its feedback path.

Keywords: Least mean square, real-time recursive backpropagation, recurrent neural network, neural network training, globally recursive neural network.

I. INTRODUCTION

A globally recursive neural network (GRNN) is a neural network that has nonrecursive neurons which are interconnected recursively. Unlike feedforward neural networks (FFNN), which can act as universal function approximators, recursive neural networks have the potential to act as both universal function approximators and universal system approximators [1]. Also, recursive neural networks can yield smaller structures than nonrecursive feedforward neural networks in the same way that infinite impulse response (IIR) filters can replace longer finite impulse response (FIR) filters. In this paper, a globally recursive neural network least mean square (GRNNLMS) gradient descent algorithm, or real time recursive backpropagation (RTRBP) algorithm, is developed for the single layer GRNN shown in Fig. 1. Unlike the RTRBP algorithm that is used to train the single layer GRNN in [2], the RTRBP algorithm developed in this paper can train a single layer GRNN that has multiple delays in its feedback path.

II. SINGLE LAYER GLOBALLY RECURSIVE NEURAL NETWORK

Fig. 1. Single Layer Globally Recursive Neural Network.

In Fig. 1, the single layer GRNN's nonrecursive inputs, x_0(n), ..., x_L(n), can represent L+1 simultaneous inputs, or, by defining x_k(n) as x(n-k), the nonrecursive inputs can represent sequential samples from a single input. The network's MN recursive inputs, y_1(n-1), y_2(n-1), ..., y_N(n-M), are the network's N outputs that have been delayed and fed back to the input. The adjustable weights, a_{10}, ..., a_{NL}, are the weights that scale the nonrecursive inputs, and the adjustable weights, b_{11}, ..., b_{N[MN]}, are the weights that scale the recursive inputs. In particular, the weight, a_{rc}, multiplies the nonrecursive input, x_c(n), and the weight, b_{rc}, multiplies the recursive input, y_{[(c-1) mod(N)+1]}(n - ⌈c/N⌉), where ⌈x⌉ rounds x to the nearest integer towards infinity. These products are summed such that the output of the rth neuron is

y_r(n) = f[ Σ_{k=0}^{L} a_{rk} x_k(n) + Σ_{i=1}^{M} Σ_{k=1}^{N} b_{r[(i-1)N+k]} y_k(n-i) ]   (1)

where the function, f[·], is a nonrecursive activation function. An activation function can be defined as a continuous nondecreasing function that maps the input, (-∞, ∞), to [-a, b] where a and b are finite nonnegative constants. The error signals, e_1(n), ..., e_N(n), that are fed to the training algorithm are the difference between the training signals, d_1(n), ..., d_N(n), and the GRNN's output signals.
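For illustration, (1) can be evaluated directly with the short NumPy sketch below. The tanh activation, the array names and the zero-based index shifts are assumptions of this sketch, not notation from the paper.

```python
import numpy as np

def neuron_outputs(A, B, x, y_hist, f=np.tanh):
    """Evaluate (1) for every neuron r = 1, ..., N.

    A      : (N, L+1) nonrecursive weights a_rk
    B      : (N, M*N) recursive weights b_r[(i-1)N+k]
    x      : (L+1,)   nonrecursive inputs x_0(n), ..., x_L(n)
    y_hist : list of M arrays, y_hist[i-1] = Y(n-i), each of length N
    """
    N, M = A.shape[0], len(y_hist)
    y = np.empty(N)
    for r in range(N):                      # neuron index (0-based here)
        s = np.dot(A[r], x)                 # sum_{k=0}^{L} a_rk x_k(n)
        for i in range(1, M + 1):           # delay index
            for k in range(1, N + 1):       # delayed-output index
                s += B[r, (i - 1) * N + (k - 1)] * y_hist[i - 1][k - 1]
        y[r] = f(s)                         # y_r(n) = f[s_r(n)]
    return y
```

Column c = (i-1)N + k of B pairs the weight b_{rc} with output y_k delayed by i samples, which is exactly the mapping y_{[(c-1) mod(N)+1]}(n - ⌈c/N⌉) described above.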

The states of the GRNN in Fig. 1 can also be written in matrix form by defining the nonrecursive input vector, X(n), as

X(n) = [x_0(n)  x_1(n)  ...  x_L(n)]^T,

the output vector, Y(n), as

Y(n) = [y_1(n)  y_2(n)  ...  y_N(n)]^T,

and the recursive input vector, Γ(n), that is fed back as

Γ(n) = [Y^T(n-1)  Y^T(n-2)  ...  Y^T(n-M)]^T.

By combining the network's nonrecursive input vector, X(n), and the recursive input vector, Γ(n), the network's complete input vector, U(n), can be defined as

U(n) = [X^T(n)  Γ^T(n)]^T.

By defining a weight matrix, A(n), for the nonrecursive input vector, X(n), as

A(n) = [a_{10}(n) ... a_{1L}(n); ... ; a_{N0}(n) ... a_{NL}(n)],

and a weight matrix, B(n), for the recursive input vector as

B(n) = [B_1(n) | B_2(n) | ... | B_M(n)]

where

B_i(n) = [b_{1[(i-1)N+1]}(n) ... b_{1[iN]}(n); ... ; b_{N[(i-1)N+1]}(n) ... b_{N[iN]}(n)],

an adjustable weight matrix, W(n), can be written as

W(n) = [A(n) | B(n)] = [w_{11}(n) ... w_{1[L+1+MN]}(n); ... ; w_{N1}(n) ... w_{N[L+1+MN]}(n)].

By defining the vector, S(n), as

S(n) = [s_1(n)  s_2(n)  ...  s_N(n)]^T = W(n)U(n),

the output, Y(n), of the network can be written as

Y(n) = [f[s_1(n)]  ...  f[s_N(n)]]^T = f[S(n)] = f[W(n)U(n)].   (2)

To generate the error vector, E(n), the output vector, Y(n), is subtracted from the training vector, D(n), which is defined as

D(n) = [d_1(n)  d_2(n)  ...  d_N(n)]^T.

Thus, the error vector, E(n), is

E(n) = [e_1(n)  e_2(n)  ...  e_N(n)]^T = D(n) - Y(n).   (3)
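The matrix form collapses the loops of the previous sketch into a single product. A hedged sketch, again assuming a tanh activation and storing the delayed outputs most recent first:

```python
import numpy as np

def grnn_forward(W, x, y_hist, f=np.tanh):
    """Compute Y(n) = f[S(n)] = f[W(n)U(n)] as in (2).

    W      : (N, L+1+M*N) adjustable weight matrix [A | B]
    x      : (L+1,)       nonrecursive input vector X(n)
    y_hist : list of M arrays, y_hist[i-1] = Y(n-i)
    """
    gamma = np.concatenate(y_hist)      # recursive input vector Gamma(n)
    u = np.concatenate([x, gamma])      # complete input vector U(n)
    s = W @ u                           # S(n) = W(n)U(n)
    return f(s), s                      # Y(n) and S(n)
```

Returning S(n) alongside Y(n) is convenient because the training algorithm of Section III needs the derivatives ∂f[s_k(n-i)]/∂s_k(n-i).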

III. RTRBP FOR A SINGLE LAYER GRNN WITH MULTIPLE DELAYS IN THE FEEDBACK PATH

An adaptive algorithm trains a neural network by optimizing a cost function that defines the network's performance. In this paper, the mean square error cost function, J(n), where

J(n) = E[E^T(n)E(n)] = E[ Σ_{j=1}^{N} e_j^2(n) ] = E[ Σ_{j=1}^{N} [d_j(n) - y_j(n)]^2 ]   (4)

and E[·] is the ensemble average function, is used to determine an optimal set of weights for W(n). In general, an exact measurement of E[E^T(n)E(n)] is not practical; and thus, an estimate, Ĵ(n), of the cost function, J(n), is used to approximate the neural network's performance. Least mean square (LMS) or backpropagation algorithms estimate mean square cost functions by a single sample of J(n) [3]. Thus, the cost function for the GRNNLMS developed in this paper is

Ĵ(n) = E^T(n)E(n) = Σ_{j=1}^{N} e_j^2(n) = Σ_{j=1}^{N} [d_j(n) - y_j(n)]^2.   (5)

A training algorithm that uses the gradient of the cost function, or an estimate of the gradient of the cost function, to determine a set of optimal weights is called a gradient descent algorithm. In this paper, a gradient descent algorithm that minimizes the cost function in (5) is developed for the single layer GRNN shown in Fig. 1. To determine an optimal set of weights, gradient descent algorithms iteratively descend a cost function's surface. These algorithms use the gradient of the cost function with respect to the adjustable weights to indicate the direction of the surface's minimum. For this paper, the gradient descent algorithm used to train the GRNN in Fig. 1 is

W(n+1) = W(n) - μ ∂Ĵ(n)/∂W(n)   (6)

where μ is a positive constant. The gradient descent algorithm in (6) iteratively converges to an optimal set of weights when ∂Ĵ(n)/∂W(n) = 0. In practice, a neural network is considered to be trained when the cost function drops below a predetermined tolerance. To start the algorithm, the elements in the matrix, W(0) = [A(0) | B(0)], are initialized with small random numbers. The matrix, B(0), should be initialized with values very near zero to ensure that the neural network is stable.
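A minimal initialization sketch following this paragraph; the ranges ±0.1 and ±0.001 are arbitrary stand-ins for "small random numbers" and "values very near zero", not values taken from the paper.

```python
import numpy as np

def init_weights(N, L, M, rng=np.random.default_rng(0)):
    """Build W(0) = [A(0) | B(0)] as described above."""
    A0 = rng.uniform(-0.1, 0.1, size=(N, L + 1))        # small random numbers
    B0 = rng.uniform(-0.001, 0.001, size=(N, M * N))    # near zero, so the
                                                         # recursive path starts stable
    return np.hstack([A0, B0])                           # W(0)
```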


In (6), the gradient of Ĵ(n) with respect to W(n) can be written as

∂Ĵ(n)/∂W(n) = [ ∂Ĵ(n)/∂A(n) | ∂Ĵ(n)/∂B(n) ] = [ α(n) | β(n) ] ∂Ĵ(n)/∂S(n)   (7)

where

∂Ĵ(n)/∂a_{rc}(n) = Σ_{l=1}^{N} (∂s_l(n)/∂a_{rc}(n)) (∂Ĵ(n)/∂s_l(n))   for r = 1, 2, ..., N; c = 0, 1, ..., L

and

∂Ĵ(n)/∂b_{rc}(n) = Σ_{l=1}^{N} (∂s_l(n)/∂b_{rc}(n)) (∂Ĵ(n)/∂s_l(n))   for r = 1, 2, ..., N; c = 1, 2, ..., MN.

Substituting (7) into (6), the RTRBP algorithm for training the single layer GRNN in Fig. 1 can be written as

W(n+1) = W(n) - μ [ α(n) | β(n) ] ∂Ĵ(n)/∂S(n).   (8)

Substituting (2) and (3) into (5), Ĵ(n) can be written as

Ĵ(n) = D^T(n)D(n) - 2 D^T(n) f[S(n)] + f^T[S(n)] f[S(n)]   (9)

and the gradient of Ĵ(n) in (9) with respect to S(n) is

∂Ĵ(n)/∂S(n) = -2 (∂f^T[S(n)]/∂S(n)) E(n).   (10)

Substituting (10) into (8), the LMS gradient descent algorithm for training the single layer GRNN in Fig. 1 can be written as

W(n+1) = W(n) + 2μ [ α(n) | β(n) ] (∂f^T[S(n)]/∂S(n)) E(n).   (11)
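Reading ∂f^T[S(n)]/∂S(n) as the diagonal matrix of derivatives f'[s_x(n)], one application of (11) can be sketched as below. Storing α and β as 3-D arrays whose element [r, c, x] is ∂s_x(n)/∂a_{rc}(n) or ∂s_x(n)/∂b_{rc}(n) is a choice made here for clarity, and the tanh derivative is again an assumption.

```python
import numpy as np

def lms_update(W, alpha, beta, s, e, mu, L,
               fprime=lambda s: 1.0 - np.tanh(s) ** 2):
    """One step of (11): W(n+1) = W(n) + 2*mu*[alpha|beta] f'[S(n)] E(n).

    W     : (N, L+1+M*N) weights, partitioned as [A | B]
    alpha : (N, L+1, N)  alpha[r, c, x] = ds_x(n)/da_rc(n)
    beta  : (N, M*N, N)  beta[r, c, x]  = ds_x(n)/db_rc(n)
    s, e  : (N,)         S(n) and E(n) = D(n) - Y(n)
    """
    g = fprime(s) * e                                  # f'[S(n)] E(n), elementwise
    grad_A = np.tensordot(alpha, g, axes=([2], [0]))   # (N, L+1)
    grad_B = np.tensordot(beta, g, axes=([2], [0]))    # (N, M*N)
    W_next = W.copy()
    W_next[:, :L + 1] += 2.0 * mu * grad_A             # update A(n)
    W_next[:, L + 1:] += 2.0 * mu * grad_B             # update B(n)
    return W_next
```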

To calculate the matrix, [ α(n) | β(n) ], in (11), consider the terms α_{rcx}(n) = ∂s_x(n)/∂a_{rc}(n) and β_{rcx}(n) = ∂s_x(n)/∂b_{rc}(n), where

s_x(n) = Σ_{k=0}^{L} a_{xk} x_k(n) + Σ_{i=1}^{M} Σ_{k=1}^{N} b_{x[(i-1)N+k]} y_k(n-i)   (12)

or

s_x(n) = Σ_{k=0}^{L} a_{xk} x_k(n) + Σ_{k=1}^{MN} b_{xk} Γ_k(n).   (13)
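The two expressions differ only in how the recursive inputs are indexed; the sketch below evaluates both on random data to make the flattening b_{x[(i-1)N+k]} ↔ Γ_{(i-1)N+k}(n) explicit (all sizes and names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, M = 3, 2, 2
A = rng.normal(size=(N, L + 1))
B = rng.normal(size=(N, M * N))
x = rng.normal(size=L + 1)
y_hist = [rng.normal(size=N) for _ in range(M)]   # Y(n-1), ..., Y(n-M)
gamma = np.concatenate(y_hist)                    # Gamma(n)

for xx in range(N):
    # (12): double sum over delays i and delayed outputs k
    s12 = A[xx] @ x + sum(B[xx, (i - 1) * N + (k - 1)] * y_hist[i - 1][k - 1]
                          for i in range(1, M + 1) for k in range(1, N + 1))
    # (13): single sum over the MN recursive inputs Gamma_k(n)
    s13 = A[xx] @ x + B[xx] @ gamma
    assert np.isclose(s12, s13)                   # the two forms agree
```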

Using (12),

α_{rcx}(n) = x_c(n) δ(x-r) + Σ_{i=1}^{M} Σ_{k=1}^{N} b_{x[(i-1)N+k]} ∂y_k(n-i)/∂a_{rc}   (14)

for r = 1, 2, ..., N; c = 0, 1, ..., L; x = 1, 2, ..., N, and using (13),

β_{rcx}(n) = Γ_c(n) δ(x-r) + Σ_{i=1}^{M} Σ_{k=1}^{N} b_{x[(i-1)N+k]} ∂y_k(n-i)/∂b_{rc}   (15)

for r = 1, 2, ..., N; c = 1, 2, ..., MN; x = 1, 2, ..., N. In matrix form, (14) and (15) can be written as

α_{rc}(n) = [x_c(n) δ(x-r)] + Σ_{i=1}^{M} (∂Y^T(n-i)/∂a_{rc}) B_i^T(n)

for r = 1, 2, ..., N; c = 0, 1, ..., L and

β_{rc}(n) = [Γ_c(n) δ(x-r)] + Σ_{i=1}^{M} (∂Y^T(n-i)/∂b_{rc}) B_i^T(n)

for r = 1, 2, ..., N; c = 1, 2, ..., MN; x = 1, 2, ..., N, respectively.

Using the chain rule, the terms ∂y_k(n-i)/∂a_{rc} and ∂y_k(n-i)/∂b_{rc} in (14) and (15) can be written as

∂y_k(n)/∂a_{rc} = (∂f[s_k(n)]/∂s_k(n)) ∂s_k(n)/∂a_{rc} = (∂f[s_k(n)]/∂s_k(n)) α_{rck}(n)   (16)

and

∂y_k(n)/∂b_{rc} = (∂f[s_k(n)]/∂s_k(n)) ∂s_k(n)/∂b_{rc} = (∂f[s_k(n)]/∂s_k(n)) β_{rck}(n).   (17)

Substituting (16) and (17) into (14) and (15), respectively,

α_{rcx}(n) = x_c(n) δ(x-r) + Σ_{i=1}^{M} Σ_{k=1}^{N} b_{x[(i-1)N+k]} (∂f[s_k(n-i)]/∂s_k(n-i)) α_{rck}(n-i)   (18)

and

β_{rcx}(n) = Γ_c(n) δ(x-r) + Σ_{i=1}^{M} Σ_{k=1}^{N} b_{x[(i-1)N+k]} (∂f[s_k(n-i)]/∂s_k(n-i)) β_{rck}(n-i).   (19)

In matrix form,

α(n) = [x_c(n) δ(x-r)] + Σ_{i=1}^{M} α(n-i) diag(∂f[s_k(n-i)]/∂s_k(n-i)) B_i^T(n)   (20)

for r = 1, 2, ..., N; c = 0, 1, ..., L; x = 1, 2, ..., N and

β(n) = [Γ_c(n) δ(x-r)] + Σ_{i=1}^{M} β(n-i) diag(∂f[s_k(n-i)]/∂s_k(n-i)) B_i^T(n)   (21)

for r = 1, 2, ..., N; c = 1, 2, ..., MN; x = 1, 2, ..., N, respectively.
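One step of the sensitivity recursions (20) and (21) can be sketched as follows, reusing the 3-D storage of the earlier update sketch; alpha_hist[i-1], beta_hist[i-1] and s_hist[i-1] hold α(n-i), β(n-i) and S(n-i), and the tanh derivative is an assumption.

```python
import numpy as np

def sensitivity_step(x, gamma, B, alpha_hist, beta_hist, s_hist,
                     fprime=lambda s: 1.0 - np.tanh(s) ** 2):
    """Evaluate (20) and (21) at time n.

    x          : (L+1,)   X(n);  gamma : (M*N,)  Gamma(n)
    B          : (N, M*N) recursive weights, with B_i(n) = B[:, (i-1)*N : i*N]
    alpha_hist : list of M arrays (N, L+1, N), alpha_hist[i-1] = alpha(n-i)
    beta_hist  : list of M arrays (N, M*N, N), beta_hist[i-1]  = beta(n-i)
    s_hist     : list of M arrays (N,),        s_hist[i-1]     = S(n-i)
    """
    N, M = B.shape[0], len(alpha_hist)
    eye = np.eye(N)
    alpha = eye[:, None, :] * x[None, :, None]        # x_c(n) delta(x-r)
    beta = eye[:, None, :] * gamma[None, :, None]     # Gamma_c(n) delta(x-r)
    for i in range(1, M + 1):
        Bi = B[:, (i - 1) * N: i * N]                 # B_i(n)
        scale = fprime(s_hist[i - 1])                 # f'[S(n-i)]
        alpha += (alpha_hist[i - 1] * scale) @ Bi.T   # alpha(n-i) diag(f') B_i^T
        beta += (beta_hist[i - 1] * scale) @ Bi.T     # beta(n-i)  diag(f') B_i^T
    return alpha, beta
```

Together with the forward and update sketches given earlier, this step plus a shift of the history buffers gives one full iteration of the algorithm described by (11), (20) and (21).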

IV. EXAMPLE

A state machine oscillator was chosen to demonstrate the LMS training and capabilities of the GRNN. This task can be characterized as requiring the network to learn to use input and output information stored at an earlier time to assist in the determination of the output at later times. Thus, the GRNN distributes information for its operation in both space and time [4]. In the state machine oscillator, the network has a single input signal and two output signals. The input signal is a clock of {-1, +1, -1, +1, ...} and the output is interpreted as a two bit binary number. The network was trained to output the sequences {0, 1, 0, 2, 0, 1, 0, 2, 0, ...} and {0, 1, 0, 2, 0, 3, 0, 1, 0, 2, 0, 3, 0, ...}, where 0 = (-1, -1), 1 = (-1, +1), 2 = (+1, -1) and 3 = (+1, +1). These sequences require the use of a single memory element. To illustrate the usefulness of more than one delay in the feedback path, the network was trained to step through the output sequence {0, 1, 2, 0, 1, 3, 0, 1, 2, 0, 1, 3, 0, ...}.
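The training data described above can be generated as in the sketch below; the symbol-to-bit mapping follows the text, while the function name and the sequence length are arbitrary.

```python
import numpy as np

# two-bit symbols from the text: 0=(-1,-1), 1=(-1,+1), 2=(+1,-1), 3=(+1,+1)
SYMBOLS = {0: (-1, -1), 1: (-1, 1), 2: (1, -1), 3: (1, 1)}

def oscillator_data(pattern, n_samples):
    """Clock input {-1, +1, -1, +1, ...} and the desired two-bit output sequence."""
    clock = np.array([-1.0 if n % 2 == 0 else 1.0 for n in range(n_samples)])
    targets = np.array([SYMBOLS[pattern[n % len(pattern)]]
                        for n in range(n_samples)], dtype=float)
    return clock, targets                    # x(n) and D(n) = [d_1(n), d_2(n)]

# the three target patterns from the text
x1, d1 = oscillator_data([0, 1, 0, 2], 64)
x2, d2 = oscillator_data([0, 1, 0, 2, 0, 3], 64)
x3, d3 = oscillator_data([0, 1, 2, 0, 1, 3], 64)   # needs more than one delay
```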

V. SUMMARY

The GRNNLMS or RTRBP algorithm that this paper developed for training the single layer GRNN shown in Fig. 1 is described by (11), (20) and (21). Before starting the algorithm, initialize W(0) = [A(0) | B(0)] with small random numbers that ensure that the neural network is stable and set α(n) = β(n) = 0 for n ≤ 0. In (11), the constant μ can be replaced by the diagonal matrix

M = diag(μ_1, μ_2, ..., μ_N)

where μ_1, μ_2, ..., μ_N are positive constants. Using the matrix, M, instead of the scalar, μ, allows each of the neurons to have a different convergence factor.
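Replacing the scalar μ in (11) with this diagonal matrix gives each neuron, that is, each row of W(n), its own step size. A hedged sketch reusing the gradient arrays of the earlier update sketch:

```python
import numpy as np

def lms_update_per_neuron(W, grad_A, grad_B, mus, L):
    """Per-neuron convergence factors: mus[r] plays the role of mu for neuron r."""
    m = np.asarray(mus)[:, None]           # column of mu_1, ..., mu_N
    W_next = W.copy()
    W_next[:, :L + 1] += 2.0 * m * grad_A  # row r scaled by mu_r
    W_next[:, L + 1:] += 2.0 * m * grad_B
    return W_next
```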

and stoof Califor-

nia, Irvine, 1996. for continually R. Williams and D. Zipser, “A learning algorithm running fully recurrent neural networks,” Neural Computation, 2, pp. 270-280, 1989. Adaptive Signal Processing, EngleB. Widrow and S. Stearns, woods Cliffs, NJ: Prentice-Hall, 1985. “Efficient training of recurrent B. Cohen, D. Saad and E. Marom, neural network with time delays,” Neural Network;s, 10, pp. 5159, 1997.
