An LMS Algorithm for Training Single Layer Globally Recursive Neural Networks

Peter Stubberud and J. W. Bruce
Abstract—Unlike feedforward neural networks (FFNN), which can act as universal function approximators, recursive neural networks have the potential to act as both universal function approximators and universal system approximators. In this paper, a globally recursive neural network least mean square (GRNNLMS) gradient descent, or real time recursive backpropagation (RTRBP), algorithm is developed for a single layer globally recursive neural network that has multiple delays in its feedback path.
Keywords—Least mean square, real-time recursive backpropagation, recurrent neural network, neural network training, globally recursive neural network.

I. INTRODUCTION

A GLOBALLY recursive neural network (GRNN) is a neural network that has nonrecursive neurons which are interconnected recursively. Unlike feedforward neural networks (FFNN), which can act as universal function approximators, recursive neural networks have the potential to act as both universal function approximators and universal system approximators [1]. Also, recursive neural networks can yield smaller structures than nonrecursive feedforward neural networks in the same way that infinite impulse response (IIR) filters can replace longer finite impulse response (FIR) filters. In this paper, a globally recursive neural network least mean square (GRNNLMS) gradient descent algorithm, or a real time recursive backpropagation (RTRBP) algorithm, is developed for the single layer GRNN shown in Fig. 1. Unlike the RTRBP algorithm that is used to train the single layer GRNN in [2], the RTRBP algorithm developed in this paper can train a single layer GRNN that has multiple delays in its feedback path.

II. SINGLE LAYER GLOBALLY RECURSIVE NEURAL NETWORK
In Fig. 1, the single layer GRNN's nonrecursive inputs, x_0(n), ..., x_L(n), can represent L + 1 simultaneous inputs, or, by defining x_k(n) as x(n - k), the nonrecursive inputs can represent sequential samples from a single input. The network's MN recursive inputs, y_1(n-1), y_2(n-1), ..., y_N(n-M), are the network's N outputs that have been delayed and fed back to the input. The adjustable weights, a_10, ..., a_NL, are the weights that scale the nonrecursive inputs, and the adjustable weights, b_11, ..., b_N[MN], are the weights that scale the recursive inputs. In particular, the weight, a_rc, multiplies the nonrecursive input, x_c(n), and the weight, b_rc, multiplies the recursive input, y_[(c-1) mod N + 1](n - ⌈c/N⌉), where ⌈x⌉ rounds x to the nearest integer towards infinity. These products are summed such that the output of the rth neuron is

    y_r(n) = f\left[ \sum_{k=0}^{L} a_{rk} x_k(n) + \sum_{i=1}^{M} \sum_{k=1}^{N} b_{r[(i-1)N+k]}\, y_k(n-i) \right]    (1)

where the function, f[·], is a nonrecursive activation function. An activation function can be defined as a continuous nondecreasing function that maps the input, (-∞, ∞), to [-a, b], where a and b are finite nonnegative constants. The error signals, e_1(n), ..., e_N(n), that are fed to the training algorithm are the difference between the training signals, d_1(n), ..., d_N(n), and the GRNN's output signals.

The authors are with the Department of Electrical and Computer Engineering at the University of Nevada, Las Vegas, 4505 Maryland Pkwy., Las Vegas, NV 89154-4026 USA. E-mail: [email protected], [email protected].

0-7803-4859-1/98 $10.00 © 1998 IEEE

Fig. 1. Single Layer Globally Recursive Neural Network
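As a concrete illustration of (1), the following Python/NumPy sketch evaluates one time step of the single layer GRNN. The tanh activation, the array shapes, and the function and variable names are assumptions made for the sketch; they are not part of the network definition above.

```python
import numpy as np

def grnn_step(a, b, x, y_delayed, f=np.tanh):
    """One forward step of the single layer GRNN, following (1).

    a         : (N, L+1) nonrecursive weights; a[r, k] scales x_k(n)
    b         : (N, M*N) recursive weights; b[r, (i-1)*N + k] scales y_{k+1}(n - i)
    x         : (L+1,)   nonrecursive inputs x_0(n), ..., x_L(n)
    y_delayed : (M, N)   delayed outputs; y_delayed[i-1, k] holds y_{k+1}(n - i)
    f         : activation function (tanh is an assumption, not the paper's choice)
    """
    N = a.shape[0]
    M = y_delayed.shape[0]
    s = np.empty(N)
    for r in range(N):
        # sum_{k=0}^{L} a_rk x_k(n)
        s[r] = a[r, :] @ x
        # sum_{i=1}^{M} sum_{k=1}^{N} b_{r[(i-1)N+k]} y_k(n - i)
        for i in range(1, M + 1):
            s[r] += b[r, (i - 1) * N:i * N] @ y_delayed[i - 1, :]
    return f(s)  # y_r(n) = f[s_r(n)]
```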
The states of the GRNN in Fig. 1 can also be written in matrix form by defining the nonrecursive input vector, X(n), as

    X(n) = [ x_0(n) \; x_1(n) \; \cdots \; x_L(n) ]^T

and the output vector, Y(n), as

    Y(n) = [ y_1(n) \; y_2(n) \; \cdots \; y_N(n) ]^T

such that the output vector that is fed back as the recursive input vector, Γ(n), is

    \Gamma(n) = [ \gamma_1(n) \; \gamma_2(n) \; \cdots \; \gamma_{MN}(n) ]^T = [ Y^T(n-1) \; Y^T(n-2) \; \cdots \; Y^T(n-M) ]^T

By combining the network's nonrecursive input vector, X(n), and the recursive input vector, Γ(n), the network's complete input vector, U(n), can be defined as

    U(n) = [ X^T(n) \; \Gamma^T(n) ]^T

By defining the vector, S(n), as

    S(n) = [ s_1(n) \; s_2(n) \; \cdots \; s_N(n) ]^T = W(n) U(n)

where W(n) is the network's adjustable weight matrix, the output, Y(n), of the network can be written as

    Y(n) = [ f[s_1(n)] \; f[s_2(n)] \; \cdots \; f[s_N(n)] ]^T = f[S(n)] = f[W(n) U(n)]    (2)

To generate the error vector, E(n), the output vector, Y(n), is subtracted from the training vector, D(n), which is defined as

    D(n) = [ d_1(n) \; d_2(n) \; \cdots \; d_N(n) ]^T

Thus, the error vector, E(n), is

    E(n) = [ e_1(n) \; e_2(n) \; \cdots \; e_N(n) ]^T = D(n) - Y(n)

By defining a weight matrix, A(n), for the nonrecursive input vector, X(n), as

    A(n) = \begin{bmatrix} a_{10}(n) & \cdots & a_{1L}(n) \\ \vdots & \ddots & \vdots \\ a_{N0}(n) & \cdots & a_{NL}(n) \end{bmatrix}

and a weight matrix, B(n), for the recursive input vector as

    B(n) = [ B_1(n) \mid B_2(n) \mid \cdots \mid B_M(n) ]

where

    B_i(n) = \begin{bmatrix} b_{1[(i-1)N+1]}(n) & \cdots & b_{1[iN]}(n) \\ \vdots & \ddots & \vdots \\ b_{N[(i-1)N+1]}(n) & \cdots & b_{N[iN]}(n) \end{bmatrix}

the adjustable weight matrix, W(n), can be written as

    W(n) = [ A(n) \mid B(n) ] = \begin{bmatrix} w_{11}(n) & w_{12}(n) & \cdots & w_{1[L+1+MN]}(n) \\ \vdots & & \ddots & \vdots \\ w_{N1}(n) & w_{N2}(n) & \cdots & w_{N[L+1+MN]}(n) \end{bmatrix}    (3)
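The matrix formulation in (2) and (3) can be sketched in the same way as (1); the names, shapes, and the tanh activation below are illustrative assumptions rather than anything specified by the paper.

```python
import numpy as np

def grnn_step_matrix(A, B, x, y_delayed, f=np.tanh):
    """Matrix-form forward pass, Y(n) = f[W(n)U(n)], following (2) and (3).

    A         : (N, L+1)  weight matrix for the nonrecursive input vector X(n)
    B         : (N, M*N)  weight matrix [B_1(n) | ... | B_M(n)]
    x         : (L+1,)    X(n)
    y_delayed : (M, N)    rows are Y(n-1), ..., Y(n-M)
    """
    gamma = y_delayed.reshape(-1)    # Gamma(n) = [Y^T(n-1) ... Y^T(n-M)]^T
    U = np.concatenate([x, gamma])   # complete input vector U(n)
    W = np.hstack([A, B])            # W(n) = [A(n) | B(n)]
    S = W @ U                        # S(n) = W(n)U(n)
    return f(S)                      # Y(n) = f[S(n)]
```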
III. RTRBP FOR A SINGLE LAYER GRNN WITH MULTIPLE DELAYS IN THE FEEDBACK PATH

An adaptive algorithm trains a neural network by optimizing a cost function that defines the network's performance. In this paper, the mean square error cost function, J(n), where

    J(n) = E[ E^T(n) E(n) ] = E\left[ \sum_{j=1}^{N} [ d_j(n) - y_j(n) ]^2 \right]    (4)

and E[·] is the ensemble average function, is used to determine an optimal set of weights for W(n). In general, an exact measurement of E[E^T(n)E(n)] is not practical; and thus, an estimate, Ĵ(n), of the cost function, J(n), is used to approximate the neural network's performance. Least mean square (LMS) or backpropagation algorithms estimate mean square cost functions by a single sample of J(n) [3]. Thus, the cost function for the GRNNLMS developed in this paper is

    \hat{J}(n) = E^T(n) E(n) = \sum_{j=1}^{N} [ d_j(n) - y_j(n) ]^2 = \sum_{j=1}^{N} e_j^2(n)    (5)

A training algorithm that uses the gradient of the cost function, or an estimate of the gradient of the cost function, to determine a set of optimal weights is called a gradient descent algorithm. In this paper, a gradient descent algorithm that minimizes the cost function in (5) is developed for the single layer GRNN shown in Fig. 1. To determine an optimal set of weights, gradient descent algorithms iteratively descend a cost function's surface. These algorithms use the gradient of the cost function with respect to the adjustable weights to indicate the direction of the surface's minimum. For this paper, the gradient descent algorithm used to train the GRNN in Fig. 1 is

    W(n+1) = W(n) - \mu \frac{\partial \hat{J}(n)}{\partial W(n)}    (6)

where μ is a positive constant. The gradient descent algorithm in (6) iteratively converges to an optimal set of weights when ∂Ĵ(n)/∂W(n) = 0. In practice, a neural network is considered to be trained when the cost function drops below a predetermined tolerance. To start the algorithm, the elements in the matrix, W(0) = [ A(0) | B(0) ], are initialized with small random numbers. The matrix, B(0), should be initialized with values very near zero to ensure that the neural network is stable.
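A minimal initialization sketch consistent with the preceding paragraph is given below; the scale of the random values and the choice of random number generator are arbitrary assumptions.

```python
import numpy as np

def init_weights(N, L, M, scale=1e-2, rng=None):
    """W(0) = [A(0) | B(0)]: small random A(0), near-zero B(0) for stability."""
    if rng is None:
        rng = np.random.default_rng(0)
    A0 = scale * rng.standard_normal((N, L + 1))  # small random nonrecursive weights
    B0 = np.zeros((N, M * N))                     # recursive weights start at zero
    return A0, B0
```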
In (6), the gradient of Ĵ(n) with respect to W(n) can be written as

    \frac{\partial \hat{J}(n)}{\partial W(n)} = \left[ \frac{\partial \hat{J}(n)}{\partial A(n)} \;\middle|\; \frac{\partial \hat{J}(n)}{\partial B(n)} \right]

whose elements are, by the chain rule,

    \frac{\partial \hat{J}(n)}{\partial a_{rc}(n)} = \sum_{l=1}^{N} \frac{\partial \hat{J}(n)}{\partial s_l(n)} \frac{\partial s_l(n)}{\partial a_{rc}(n)}    for r = 1, 2, ..., N; c = 0, 1, ..., L

and

    \frac{\partial \hat{J}(n)}{\partial b_{rc}(n)} = \sum_{l=1}^{N} \frac{\partial \hat{J}(n)}{\partial s_l(n)} \frac{\partial s_l(n)}{\partial b_{rc}(n)}    for r = 1, 2, ..., N; c = 1, 2, ..., MN.

In matrix form, the gradient of Ĵ(n) with respect to W(n) can be written as

    \frac{\partial \hat{J}(n)}{\partial W(n)} = [ \alpha(n) \mid \beta(n) ] \frac{\partial \hat{J}(n)}{\partial S(n)}    (7)

where the rc-th elements of α(n) and β(n) are the 1 × N row vectors α_rc(n) and β_rc(n) whose x-th elements are α_rcx(n) = ∂s_x(n)/∂a_rc(n) and β_rcx(n) = ∂s_x(n)/∂b_rc(n). Substituting (7) into (6), the RTRBP algorithm for training the single layer GRNN in Fig. 1 can be written as

    W(n+1) = W(n) - \mu [ \alpha(n) \mid \beta(n) ] \frac{\partial \hat{J}(n)}{\partial S(n)}    (8)

Substituting (2) into (5), Ĵ(n) can be written as

    \hat{J}(n) = D^T(n) D(n) - 2 D^T(n) f[S(n)] + f^T[S(n)] f[S(n)]    (9)

and the gradient of Ĵ(n) in (9) with respect to S(n) is

    \frac{\partial \hat{J}(n)}{\partial S(n)} = -2 \frac{\partial f^T[S(n)]}{\partial S(n)} E(n)    (10)

Substituting (10) into (8), the LMS gradient descent algorithm for training the single layer GRNN in Fig. 1 can be written as

    W(n+1) = W(n) + 2 \mu [ \alpha(n) \mid \beta(n) ] \frac{\partial f^T[S(n)]}{\partial S(n)} E(n)    (11)
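Assuming a tanh activation, so that ∂f[s]/∂s = 1 - tanh^2(s), and assuming the sensitivities are stored as three-dimensional arrays indexed by (r, c, x), the update (11) can be sketched as follows; the function and variable names are assumptions made for the sketch.

```python
import numpy as np

def lms_update(A, B, alpha, beta, S, E, mu=0.01):
    """Apply (11): W(n+1) = W(n) + 2*mu*[alpha | beta] (df^T[S]/dS) E(n).

    alpha : (N, L+1, N) with alpha[r, c, x] = ds_x(n)/da_rc
    beta  : (N, M*N, N) with beta[r, c, x]  = ds_x(n)/db_rc
    S     : (N,) pre-activation vector S(n)
    E     : (N,) error vector E(n) = D(n) - Y(n)
    """
    fprime = 1.0 - np.tanh(S) ** 2      # df[s_x]/ds_x for a tanh activation
    g = fprime * E                      # (df^T[S]/dS) E(n), an N-vector
    A_new = A + 2.0 * mu * (alpha @ g)  # element (r, c): sum_x alpha[r, c, x] * g[x]
    B_new = B + 2.0 * mu * (beta @ g)
    return A_new, B_new
```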
To calculate the matrix, [ α(n) | β(n) ], in (11), consider the terms α_rcx(n) = ∂s_x(n)/∂a_rc(n) and β_rcx(n) = ∂s_x(n)/∂b_rc(n), where

    s_x(n) = \sum_{k=0}^{L} a_{xk}(n) x_k(n) + \sum_{i=1}^{M} \sum_{k=1}^{N} b_{x[(i-1)N+k]}(n)\, y_k(n-i)    (12)

or

    s_x(n) = \sum_{k=0}^{L} a_{xk}(n) x_k(n) + \sum_{k=1}^{MN} b_{xk}(n)\, \gamma_k(n)    (13)

Using (12),

    \alpha_{rcx}(n) = \frac{\partial s_x(n)}{\partial a_{rc}(n)} = x_c(n)\,\delta(x-r) + \sum_{i=1}^{M} \sum_{k=1}^{N} b_{x[(i-1)N+k]}(n) \frac{\partial y_k(n-i)}{\partial a_{rc}}    (14)

for r = 1, 2, ..., N; c = 0, 1, ..., L; x = 1, 2, ..., N, and using (13),

    \beta_{rcx}(n) = \frac{\partial s_x(n)}{\partial b_{rc}(n)} = \gamma_c(n)\,\delta(x-r) + \sum_{i=1}^{M} \sum_{k=1}^{N} b_{x[(i-1)N+k]}(n) \frac{\partial y_k(n-i)}{\partial b_{rc}}    (15)

for r = 1, 2, ..., N; c = 1, 2, ..., MN; x = 1, 2, ..., N, where δ(·) is the Kronecker delta. In matrix form, (14) and (15) can be written as

    \alpha_{rc}(n) = [ x_c(n)\,\delta(x-r) ] + \sum_{i=1}^{M} \frac{\partial Y^T(n-i)}{\partial a_{rc}} B_i^T(n)

and

    \beta_{rc}(n) = [ \gamma_c(n)\,\delta(x-r) ] + \sum_{i=1}^{M} \frac{\partial Y^T(n-i)}{\partial b_{rc}} B_i^T(n)

for r = 1, 2, ..., N; c = 0, 1, ..., L and r = 1, 2, ..., N; c = 1, 2, ..., MN, respectively, where [x_c(n)δ(x-r)] and [γ_c(n)δ(x-r)] denote the 1 × N row vectors indexed by x. Using the chain rule, the terms ∂y_k(n-i)/∂a_rc and ∂y_k(n-i)/∂b_rc in (14) and (15) can be written as

    \frac{\partial y_k(n-i)}{\partial a_{rc}} = \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)} \frac{\partial s_k(n-i)}{\partial a_{rc}} = \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)}\, \alpha_{rck}(n-i)    (16)

and

    \frac{\partial y_k(n-i)}{\partial b_{rc}} = \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)} \frac{\partial s_k(n-i)}{\partial b_{rc}} = \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)}\, \beta_{rck}(n-i)    (17)

Substituting (16) and (17) into (14) and (15), respectively,

    \alpha_{rcx}(n) = x_c(n)\,\delta(x-r) + \sum_{i=1}^{M} \sum_{k=1}^{N} b_{x[(i-1)N+k]}(n) \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)}\, \alpha_{rck}(n-i)    (18)

and

    \beta_{rcx}(n) = \gamma_c(n)\,\delta(x-r) + \sum_{i=1}^{M} \sum_{k=1}^{N} b_{x[(i-1)N+k]}(n) \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)}\, \beta_{rck}(n-i)    (19)

In matrix form,

    \alpha_{rc}(n) = [ x_c(n)\,\delta(x-r) ] + \sum_{i=1}^{M} \alpha_{rc}(n-i)\, \mathrm{diag}\!\left( \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)} \right) B_i^T(n)    (20)

for r = 1, 2, ..., N; c = 0, 1, ..., L; x = 1, 2, ..., N, and

    \beta_{rc}(n) = [ \gamma_c(n)\,\delta(x-r) ] + \sum_{i=1}^{M} \beta_{rc}(n-i)\, \mathrm{diag}\!\left( \frac{\partial f[s_k(n-i)]}{\partial s_k(n-i)} \right) B_i^T(n)    (21)

for r = 1, 2, ..., N; c = 1, 2, ..., MN; x = 1, 2, ..., N.
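The recursions (20) and (21) can be sketched as below, where alpha_hist[i-1], beta_hist[i-1], and S_hist[i-1] hold α(n-i), β(n-i), and S(n-i); the storage layout and the tanh activation are assumptions made for the sketch.

```python
import numpy as np

def update_sensitivities(B, x, gamma, alpha_hist, beta_hist, S_hist):
    """Compute alpha(n) and beta(n) from the recursions (20) and (21).

    B          : (N, M*N)  recursive weight matrix [B_1(n) | ... | B_M(n)]
    x          : (L+1,)    nonrecursive input vector X(n)
    gamma      : (M*N,)    recursive input vector Gamma(n)
    alpha_hist : list of M arrays, alpha_hist[i-1] = alpha(n-i), each (N, L+1, N)
    beta_hist  : list of M arrays, beta_hist[i-1]  = beta(n-i),  each (N, M*N, N)
    S_hist     : list of M arrays, S_hist[i-1]     = S(n-i),     each (N,)
    """
    N = B.shape[0]
    M = len(alpha_hist)

    # driving terms x_c(n)*delta(x - r) and gamma_c(n)*delta(x - r)
    eye = np.eye(N)
    alpha = x[None, :, None] * eye[:, None, :]      # (N, L+1, N)
    beta = gamma[None, :, None] * eye[:, None, :]   # (N, M*N, N)

    for i in range(1, M + 1):
        B_i = B[:, (i - 1) * N:i * N]               # element [x, k] is b_{x[(i-1)N+k]}
        fprime = 1.0 - np.tanh(S_hist[i - 1]) ** 2  # df[s_k(n-i)]/ds_k(n-i) for tanh
        alpha += (alpha_hist[i - 1] * fprime) @ B_i.T   # alpha(n-i) diag(f') B_i^T
        beta += (beta_hist[i - 1] * fprime) @ B_i.T     # beta(n-i)  diag(f') B_i^T
    return alpha, beta
```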
IV. EXAMPLE
A state machine oscillator was chosen to demonstrate the LMS training and capabilities of the GRNN. This task can be characterized as requiring the network to learn to use input and output information stored at an earlier time to assist in the determination of the output at later times. Thus, the GRNN distributes the information for its operation in both space and time [4]. In the state machine oscillator, the network has a single input signal and two output signals. The input signal is a clock of {-1, +1, -1, +1, ...} and the output is interpreted as a two bit binary number. The network was trained to output the sequences, {0, 1, 0, 2, 0, 1, 0, 2, 0, ...} and {0, 1, 0, 2, 0, 3, 0, 1, 0, 2, 0, 3, 0, ...}, where 0 = (-1, -1), 1 = (-1, +1), 2 = (+1, -1) and 3 = (+1, +1). These sequences require the use of a single memory element. To illustrate the usefulness of more than one delay in the feedback path, the network was trained to step through the output sequence {0, 1, 2, 0, 1, 3, 0, 1, 2, 0, 1, 3, 0, ...}.
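A sketch of how the training data described above could be generated is given below; the number of periods, the array layout, and the alignment between the clock input and the target sequence are assumptions, since they are not specified here.

```python
import numpy as np

# Two-bit symbol encoding from the text: 0=(-1,-1), 1=(-1,+1), 2=(+1,-1), 3=(+1,+1)
SYMBOLS = {0: (-1, -1), 1: (-1, +1), 2: (+1, -1), 3: (+1, +1)}

def oscillator_data(pattern, n_periods=50):
    """Clock input {-1, +1, -1, +1, ...} and two-bit targets for one training pattern.

    pattern : e.g. [0, 1, 0, 2] or [0, 1, 2, 0, 1, 3]
    Returns (x, d) with x of shape (T,) and d of shape (T, 2).
    """
    seq = list(pattern) * n_periods
    T = len(seq)
    x = np.array([-1.0 if n % 2 == 0 else 1.0 for n in range(T)])  # clock input
    d = np.array([SYMBOLS[s] for s in seq], dtype=float)           # desired outputs
    return x, d
```

For example, oscillator_data([0, 1, 0, 2]) produces the first training pattern described in the text.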
V. SUMMARY

The GRNNLMS or RTRBP algorithm that this paper developed for training the single layer GRNN shown in Fig. 1 is described by (11), (20) and (21). Before starting the algorithm, initialize W(0) = [ A(0) | B(0) ] with small random numbers that ensure that the neural network is stable, and set α(n) = β(n) = 0. In (11), the constant μ can be replaced by the diagonal matrix

    M = \mathrm{diag}( \mu_1, \mu_2, \ldots, \mu_N )

where μ_1, μ_2, ..., μ_N are positive constants. Using the matrix, M, instead of the scalar, μ, allows each of the neurons to have a different convergence factor.
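Putting (11), (20), and (21) together, one possible training loop is sketched below. It reuses the illustrative helpers sketched earlier (init_weights, update_sensitivities, and lms_update), assumes a tanh activation, and resets the delayed outputs and sensitivities at the start of each pass through the data; it is a sketch under those assumptions, not the authors' reference implementation.

```python
import numpy as np

def train_grnn(x_seq, d_seq, N, L, M, mu=0.01, epochs=100):
    """GRNNLMS / RTRBP training loop built from (11), (20) and (21).

    x_seq : (T, L+1) sequence of nonrecursive input vectors X(n)
    d_seq : (T, N)   sequence of training vectors D(n)
    """
    A, B = init_weights(N, L, M)   # W(0) = [A(0) | B(0)]
    for _ in range(epochs):
        # zero the delay lines and the sensitivity histories: alpha(n) = beta(n) = 0
        y_hist = [np.zeros(N) for _ in range(M)]
        S_hist = [np.zeros(N) for _ in range(M)]
        a_hist = [np.zeros((N, L + 1, N)) for _ in range(M)]
        b_hist = [np.zeros((N, M * N, N)) for _ in range(M)]
        for x, d in zip(x_seq, d_seq):
            gamma = np.concatenate(y_hist)                      # Gamma(n)
            S = np.hstack([A, B]) @ np.concatenate([x, gamma])  # S(n) = W(n)U(n)
            y = np.tanh(S)                                      # Y(n) = f[S(n)], eq. (2)
            e = d - y                                           # E(n) = D(n) - Y(n)
            alpha, beta = update_sensitivities(B, x, gamma, a_hist, b_hist, S_hist)
            A, B = lms_update(A, B, alpha, beta, S, e, mu)      # weight update, eq. (11)
            # shift the delay lines and the sensitivity histories
            y_hist = [y] + y_hist[:-1]
            S_hist = [S] + S_hist[:-1]
            a_hist = [alpha] + a_hist[:-1]
            b_hist = [beta] + b_hist[:-1]
    return A, B
```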
REFERENCES

[1] P. Perryman, "Approximation theory for deterministic and stochastic nonlinear systems," Ph.D. dissertation, Univ. of California, Irvine, 1996.
[2] R. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," Neural Computation, 2, pp. 270-280, 1989.
[3] B. Widrow and S. Stearns, Adaptive Signal Processing, Englewood Cliffs, NJ: Prentice-Hall, 1985.
[4] B. Cohen, D. Saad and E. Marom, "Efficient training of recurrent neural network with time delays," Neural Networks, 10, pp. 51-59, 1997.