Relating Real-Time Backpropagation and Backpropagation-Through-Time: An Application of Flow Graph Interreciprocity

Francoise Beaufays and Eric A. Wan

The authors are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305-4055. This work was sponsored by EPRI under contract RP8010-13.

Abstract

We show that signal flow graph theory provides a simple way to relate two popular algorithms used for adapting dynamic neural networks, real-time backpropagation and backpropagation-through-time. Starting with the flow graph for real-time backpropagation, we use a simple transposition to produce a second graph. The new graph is shown to be interreciprocal with the original and to correspond to the backpropagation-through-time algorithm. Interreciprocity provides a theoretical argument to verify that both flow graphs implement the same overall weight update.

Introduction

Two adaptive algorithms, real-time backpropagation (RTBP) and backpropagation-through-time (BPTT), are currently used to train multilayer neural networks with output feedback connections. RTBP was first introduced for single-layer fully recurrent networks by Williams and Zipser (1989). The algorithm has since been extended to include feedforward networks with output feedback (see, e.g., Narendra, 1990). The algorithm is sometimes referred to as real-time recurrent learning, on-line backpropagation, or dynamic backpropagation (Williams and Zipser, 1989; Narendra et al., 1990; Hertz et al., 1991). The name recurrent backpropagation is also occasionally used, although this should not be confused with recurrent backpropagation as developed by Pineda (1987) for learning fixed points in feedback networks. RTBP is well suited for on-line adaptation of dynamic networks where a desired response is specified at each time step. BPTT (Rumelhart et al., 1986; Nguyen and Widrow, 1990; Werbos, 1990), on the other hand, involves unfolding the network in time and applying standard backpropagation through the unraveled system. It does not allow for on-line adaptation as in RTBP, but has been shown to be computationally less expensive. Both algorithms attempt to minimize the same performance criterion, and are equivalent in terms of what they compute (assuming all weight changes are made off-line). However, they are generally derived independently and take on very different mathematical formulations.

In this paper, we use flow graph theory as a common support for relating the two algorithms. We begin by deriving a general flow graph diagram for the weight updates associated with RTBP. A second flow graph is obtained by transposing the original one, i.e., by reversing the arrows that link the graph nodes and by interchanging the source and sink nodes. Flow graph theory shows that transposed flow graphs are interreciprocal and, for single-input single-output (SISO) systems, have identical transfer functions. This basic property, which was first presented in the context of electrical circuit analysis (Penfield et al., 1970), finds applications in a wide variety of engineering disciplines, such as the reciprocity of emitting and receiving antennas in electromagnetism (Ramo et al., 1984), the relationship between controller and observer canonical forms in control theory (Kailath, 1980), and the duality between decimation-in-time and decimation-in-frequency formulations of the FFT algorithm in signal processing (Oppenheim and Schafer, 1989). The transposed flow graph is shown to correspond directly to the BPTT algorithm. The interreciprocity of the two flow graphs allows us to verify that RTBP and BPTT perform the same overall computations. These principles are then extended to a more elaborate control feedback structure.
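As a small numerical illustration of the transposition property (a sketch added here, not part of the original paper), a linear flow graph can be summarized by a branch-gain matrix A between internal nodes, a source vector b, and a sink vector c, so that the source-to-sink gain is c^T (I - A)^{-1} b. Transposing the graph corresponds to replacing A by A^T and interchanging b and c; for a SISO graph the two gains coincide. The particular matrices below are arbitrary example values.

```python
import numpy as np

# Hypothetical 3-node linear flow graph: A[i, j] is the branch gain from node j to node i.
A = np.array([[0.0, 0.5, 0.0],
              [0.2, 0.0, 0.3],
              [0.0, 0.4, 0.0]])
b = np.array([1.0, 0.0, 0.0])   # source feeds node 0
c = np.array([0.0, 0.0, 1.0])   # sink reads node 2

# Source-to-sink gain of the original graph: c^T (I - A)^(-1) b.
H = c @ np.linalg.solve(np.eye(3) - A, b)

# Transposed graph: reverse every branch (A -> A^T) and swap source and sink (b <-> c).
H_transposed = b @ np.linalg.solve(np.eye(3) - A.T, c)

print(H, H_transposed)   # identical values, illustrating interreciprocity for a SISO graph
```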

Network Equations

A neural network with output recurrence is shown in Figure 1. Let r(k-1) denote the vector of external reference inputs to the network and x(k-1) the recurrent inputs. The output vector x(k) is a function of the recurrent and external inputs, and of the adaptive weights w of the network:

x(k) = N(x(k-1), r(k-1), w).    (1)
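As an illustrative sketch (not taken from the paper), the recurrence of equation 1 amounts to a simple loop over time; here N is assumed to be a one-hidden-layer tanh network purely for concreteness, and the weight container w is a hypothetical dictionary of matrices.

```python
import numpy as np

def forward(x_prev, r_prev, w):
    """One step of equation 1: x(k) = N(x(k-1), r(k-1), w).
    N is assumed to be a single-hidden-layer tanh network; any feedforward
    architecture could be substituted."""
    z = np.concatenate([x_prev, r_prev, [1.0]])   # recurrent inputs + external inputs + bias
    h = np.tanh(w["W1"] @ z)                      # hidden layer
    return np.tanh(w["W2"] @ h)                   # new recurrent state x(k)

def run(x0, r_seq, w):
    """Iterate the network from an initial state x(0) over inputs r(0), ..., r(K-1)."""
    x, states = x0, []
    for r in r_seq:
        x = forward(x, r, w)
        states.append(x)
    return states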

[Figure 1: Recurrent neural network (q represents a unit delay operator).]

The neural network N is most generally a feedforward multilayer architecture (Rumelhart et al., 1986). If N has only a single layer of neurons, the structure of Figure 1 represents a completely recurrent network (Williams and Zipser, 1989; Pineda, 1987). Any connectionist architecture with feedback units can, in fact, be represented in this standard format (Piche, 1993). Adapting the neural network amounts to finding the set of weights w that minimizes the cost function

J = E\left[ \frac{1}{2} \sum_{k=1}^{K} e(k)^T e(k) \right] = \frac{1}{2} \sum_{k=1}^{K} E\left[ e(k)^T e(k) \right],    (2)

where the expectation E[·] is taken over the external reference inputs r(k) and over the initial values of the recurrent inputs x(0). The error e(k) is defined at each time step as the difference between the desired state d(k) and the recurrent state x(k) whenever the desired vector d(k) is defined, and is otherwise set to zero:

e(k) = \begin{cases} d(k) - x(k) & \text{if } d(k) \text{ is defined} \\ 0 & \text{otherwise.} \end{cases}    (3)

For such problems as terminal control (Bryson and Ho, 1969; Nguyen and Widrow, 1990), a desired response may be given only at the final time k = K, while for other problems such as system identification (Ljung, 1987; Narendra, 1990) it is more common to have a desired response vector for all k. In addition, only some of the recurrent states may represent actual outputs while others may be used solely for computational purposes.

In both RTBP and BPTT, a gradient descent approach is used to adapt the weights of the network. At each time step, the contribution to the weight update is given by

\Delta w(k) = -\frac{\eta}{2} \frac{d\left[ e(k)^T e(k) \right]}{dw} = \eta\, e(k)^T \frac{dx(k)}{dw},    (4)

where η is the learning rate. Here the derivative is used to represent the change in error due to a weight change over all time.[1] The accumulation of weight updates over k = 1, ..., K is given by Δw = Σ_{k=1}^{K} Δw(k). Typically, RTBP uses on-line adaptation in which the weights are updated at each time k, whereas BPTT performs an update based on the aggregate Δw. The differences due to on-line versus off-line adaptation will not be considered in this paper. For consistency, we assume that in both algorithms the weights are held constant during all gradient calculations.
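As a minimal sketch (illustrative assumptions, not the paper's code) of how equations 3 and 4 and the two update schedules fit together: the per-step contribution Δw(k) = η e(k)^T dx(k)/dw is either applied immediately (on-line, RTBP-style) or accumulated into Δw = Σ_k Δw(k) and applied once (off-line, BPTT-style). The function names are hypothetical, and the Jacobian dx(k)/dw is taken as given here; computing it is exactly where RTBP and BPTT differ.

```python
import numpy as np

def error(d_k, x_k):
    """Equation 3: e(k) = d(k) - x(k) when a desired vector d(k) exists, else zero."""
    return d_k - x_k if d_k is not None else np.zeros_like(x_k)

def delta_w_k(e_k, dx_dw, eta):
    """Equation 4: per-step contribution eta * e(k)^T * dx(k)/dw.
    dx_dw is the Jacobian of x(k) with respect to the weight vector w (n_x by n_w)."""
    return eta * (e_k @ dx_dw)

def adapt_online(w, steps, eta):
    """RTBP-style schedule: apply each contribution as soon as it is computed."""
    for d_k, x_k, dx_dw in steps:            # steps yields (d(k), x(k), dx(k)/dw)
        w = w + delta_w_k(error(d_k, x_k), dx_dw, eta)
    return w

def adapt_offline(w, steps, eta):
    """BPTT-style schedule: accumulate delta_w = sum_k delta_w(k), then apply once."""
    total = sum(delta_w_k(error(d_k, x_k), dx_dw, eta) for d_k, x_k, dx_dw in steps)
    return w + total
```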

Flow Graph Representation of the Adaptive Algorithms

RTBP was originally derived for fully recurrent single-layer networks.[2] A more general algorithm is obtained by using equation 1 to directly evaluate the state gradient dx(k)/dw in the above weight update formula. Applying the chain rule, we get

\frac{dx(k)}{dw} = \frac{\partial x(k)}{\partial x(k-1)} \frac{dx(k-1)}{dw} + \frac{\partial x(k)}{\partial r(k-1)} \frac{dr(k-1)}{dw} + \frac{\partial x(k)}{\partial w} \frac{dw}{dw},    (5)

in which dr(k-1)/dw = 0 since the external inputs do not depend on the network weights, and dw/dw = I, where I is the identity matrix. With these simplifications, equation 5 reduces to

\frac{dx(k)}{dw} = \frac{\partial x(k)}{\partial x(k-1)} \frac{dx(k-1)}{dw} + \frac{\partial x(k)}{\partial w}.    (6)

We define the derivative of a vector a ∈
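As an illustrative sketch of the recursion in equation 6 (assumed code, not from the paper), the sensitivity matrix dx(k)/dw can be propagated forward in time alongside the network state, starting from dx(0)/dw = 0. The Jacobian-returning functions partial_x and partial_w are placeholders for whatever architecture N actually is.

```python
import numpy as np

def rtbp_sensitivities(states, inputs, w, partial_x, partial_w, n_weights):
    """Forward propagation of equation 6.
    states    -- [x(0), x(1), ..., x(K)] from a forward pass of the network
    inputs    -- [r(0), ..., r(K-1)]
    partial_x -- partial_x(x_prev, r_prev, w): Jacobian dx(k)/dx(k-1), shape (n_x, n_x)
    partial_w -- partial_w(x_prev, r_prev, w): Jacobian dx(k)/dw,      shape (n_x, n_weights)
    Returns the list of Jacobians dx(k)/dw for k = 1, ..., K."""
    n_x = len(states[0])
    dx_dw = np.zeros((n_x, n_weights))   # dx(0)/dw = 0: the initial state does not depend on w
    sensitivities = []
    for k in range(1, len(states)):
        x_prev, r_prev = states[k - 1], inputs[k - 1]
        # Equation 6: new sensitivity = (dx(k)/dx(k-1)) * (dx(k-1)/dw) + explicit term dx(k)/dw.
        dx_dw = partial_x(x_prev, r_prev, w) @ dx_dw + partial_w(x_prev, r_prev, w)
        sensitivities.append(dx_dw)
    return sensitivities
```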
