ON APPLICATIONS OF APPROXIMATION THEORY TO IDENTIFICATION, CONTROL AND CLASSIFICATION by AJIT TRIMBAK DINGANKAR, B.Tech., M.S., M.S.
DISSERTATION Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of
DOCTOR OF PHILOSOPHY
Acknowledgements

It is a great pleasure to express gratitude to my adviser, Professor Irwin W. Sandberg, for his invaluable guidance during the research that culminated in this dissertation. This work took place in the Information and Systems Engineering Area within the Department of Electrical and Computer Engineering. From June 1989 through August 1994, I was employed at the Advanced Workstations Division of IBM Corporation, Austin. I would like to thank my colleagues and supervisors for continually asking me the question "When are you going to finish that PhD?", thus reminding me constantly what I was in (school) for. My parents provided a rationalistic development environment which shaped my life probably to a much greater extent than even they could have anticipated. My wife, Manjusha, patiently waited through my changes of departments, topics, employers and moods all these years. During the course of the doctorate we had the pleasure of welcoming a new addition to our family, our son Chaaru, who reintroduced me to learning in a fresh, fun, noncomputational way. A special note of thanks goes to my friends, too numerous to enumerate here. Most of all, I appreciate the true spirit of brotherhood from Satish Dingankar.
ON APPLICATIONS OF APPROXIMATION THEORY TO IDENTIFICATION, CONTROL AND CLASSIFICATION

Publication No.

Ajit Trimbak Dingankar, Ph.D.
The University of Texas at Austin, 1995

Supervisor: Irwin W. Sandberg

Applications of approximation theory to some problems in identification of dynamic systems, their control, and to problems in signal classification are studied. First, an algorithm is given for constructing approximations in a wide variety of settings, and a corresponding error bound is derived. Then weak sufficient conditions for perfect classification of signals are studied. Next, the problem of approximating linear functionals with certain sums of integrals is studied, along with its relation to the approximation of nonlinear functionals. Then an approximation-theoretic characterization of the continuity of nonlinear maps is given. As another application of function approximation, the problem of universally approximating controllers for discrete-time continuous plants is studied. Finally, error bounds for approximation of functions defined on finite-dimensional Hilbert spaces are given.
Table of Contents

Acknowledgements
Abstract
List of Tables
List of Figures

Chapter 1. Introduction
  1.1 Overview

Chapter 2. A Constructive Algorithm
  2.1 Introduction
  2.2 The algorithm

Chapter 3. Classifiers on Relatively Compact Sets
  3.1 Introduction
  3.2 Classification theorem
    3.2.1 Comments
    3.2.2 Example of C₀
    3.2.3 Examples of Y

Chapter 4. Approximation of Bounded Linear Functionals
  4.1 Introduction
  4.2 Approximation Theorem
    4.2.1 Notation and Definitions
    4.2.2 The Theorem
  4.3 Comments

Chapter 5. Approximation of Nonlinear Maps
  5.1 Introduction
  5.2 Preliminaries
    5.2.1 The inputs and outputs
    5.2.2 Localizable systems
    5.2.3 Motivation
    5.2.4 Approximating structures
  5.3 Characterization of Continuity
    5.3.1 The approximation theorem
  5.4 Applications to Time-invariant Systems
    5.4.1 Notation
    5.4.2 Approximation of discrete-time causal time-invariant systems
    5.4.3 A conjecture about nonlinear feedback systems
    5.4.4 Continuous-time systems on the half-line
    5.4.5 Approximation of stable nonlinear feedback systems
  5.5 An Algorithm and its Error Analysis
    5.5.1 Remark
  5.6 Applications
    5.6.1 Dynamic signal classification
    5.6.2 Image processing
  5.7 Conclusion

Chapter 6. Robust Universal Neural Controllers
  6.1 Introduction
  6.2 Simultaneous Approximation Theorem
  6.3 Robust Retrainable Neural Controllers
  6.4 Comments
  6.5 Conclusion

Chapter 7. Error Bounds for Function Approximation
  7.1 Introduction
  7.2 Notation
  7.3 Relation to previous work
  7.4 Error Bounds: Preliminaries
    7.4.1 The classes Γ and Γc
    7.4.2 A consequence of Γc-integrability
  7.5 Approximation of complex-valued functionals
  7.6 Approximation of real-valued functionals
  7.7 Applications
    7.7.1 Bandlimited Functions
    7.7.2 Schwartz class
  7.8 Approximation of functionals in L2
    7.8.1 Classification
  7.9 Discussion of results

Appendix A. Concerning Chapter 4
  A.1 Uniform Riemann-Stieltjes Integrability

Appendix B. Concerning Chapter 5
  B.1 Proof of the Completeness of Xw
    B.1.1 Weighted continuous functions
    B.1.2 Weighted Lp space
  B.2 Verification of the Hypotheses of Theorem 6
    B.2.1 Hypothesis (i)
    B.2.2 Hypothesis (ii)

Appendix C. Concerning Chapter 6
  C.1 Lemma on Equicontinuous Extensions
  C.2 Uniform Riemann Integrability

Bibliography

Vita

List of Tables

7.1 An analogy between analogies.

List of Figures

3.1 Architecture of the classifier.
3.2 Neural network implementation of the map h.
5.1 A tensor product neural network M.
5.2 Nonlinear control system.
6.1 Unity feedback configuration for tracking.
7.1 Architecture of the complex Hilbert space network.
7.2 Architecture of the real Hilbert space network.
Chapter 1

Introduction

The problems of identification and control of dynamic systems, and of signal classification, are of importance in many engineering settings. Several techniques for their solution have been proposed in the literature. Here we treat these problems from the viewpoint of applications of approximation theory.
1.1 Overview

The earliest proofs of the approximation of continuous functions on compact sets by neural networks are nonconstructive and do not address the question of how one determines the number of nonlinearities needed for a given degree of approximation. In an interesting recent paper, Barron [3] shows that a lemma attributed to B. Maurey can be used in some cases to obtain a bound on the number of nonlinearities. In Chapter 2 we make a simple but important observation concerning the proof of the lemma described above, and we give a corresponding simpler algorithm for achieving the error bound. The observation is that the proof of Lemma 1 given in [3] shows that each approximating function fₙ there can in fact be taken to be an arithmetic average of n points of the approximating set G. In particular, we describe an algorithm to compute such n-point averages. Chapter 2 is based on the paper A. T. Dingankar and I. W. Sandberg, "A Note on Error Bounds for Approximation in Inner Product Spaces", to appear in Circuits, Systems and Signal Processing, 1996. A conference version appears as "On Error Bounds for Neural Network Approximation", Proceedings of the International Symposium on Circuits and Systems (ISCAS-95), pp. 490-492, Seattle, Washington, April 30-May 3, 1995.

The problem of classifying signals is of interest in several engineering contexts, e.g., automatic target recognition and symbol detection in digital communication using analog channels. Typically we are given a finite number m of pairwise disjoint sets C₁, …, Cₘ of signals, and we would like to synthesize a system that maps the elements of each Cⱼ into a real number aⱼ, such that the numbers a₁, …, aₘ are distinct. In a recent paper [51] it is shown that this classification can be performed by certain simple structures involving linear functionals and memoryless nonlinear elements, assuming that the Cⱼ are compact subsets of a real normed linear space.
In Chapter 3 we give a similar solution to the problem under the considerably weaker assumption that the Cⱼ are relatively compact (i.e., have compact closure) and are of positive distance from each other. An example is given in which the Cⱼ are subsets of Lp(a, b), 1 ≤ p < ∞. This example shows that a very wide class of signals can be classified using the structures described. Chapter 3 is based on the paper I. W. Sandberg and A. T. Dingankar, "Classifiers on relatively compact sets", IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, 42(1):57, January 1995.

One of the earliest results in the area of neural networks is the proposition that any continuous real function defined on a compact subset of Rᵏ (k an arbitrary positive integer) can be approximated arbitrarily well using a single-hidden-layer network with sigmoidal nonlinearities (see, for example, [18]). Among other results in the literature concerning arbitrarily good approximation, which address more general types of "target" functionals, different network structures, other nonlinearities, and various measures of approximation error, is the proposition in [49], [51] that any continuous real nonlinear functional on a compact subset of a real normed linear space can be approximated arbitrarily well using a single-hidden-layer neural network with a linear functional input layer and exponential (or polynomial or sigmoidal) nonlinearities. In an interesting paper [11], similar results concerning single-hidden-layer neural networks with sigmoidal nonlinearities are proved for continuous real functionals on a compact subset of an Lp space (1 < p < ∞). One of the main results in [11] is the proposition that the linear functionals in the input layer may have the special form of a finite sum of integrals of a certain type. In Chapter 4 we connect these results by showing that every continuous linear functional can be approximated arbitrarily well on compact sets by finite sums of the type used in [11]. Besides relating the two works [11] and [51], the theorem given there is felt to be interesting in itself. Chapter 4 is based on the paper I. W. Sandberg and A. T.
Dingankar, "On Approximation of Linear Functionals on Lp Spaces", IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, 42(7):402, July 1995.

A modified version of Chapter 5 is under preparation as the paper Ajit T. Dingankar and Irwin W. Sandberg, "Tensor Product Neural Networks are Universal Approximators of Dynamical Systems". Related results, in a general sense, on approximation of dynamic systems are reported in the work of Chen and Chen [12], where the approximating structures are certain forms of radial basis function neural networks. Approximation results using Volterra series for nonlinear time-varying systems are obtained in [47], and in Boyd's dissertation [6] for the time-invariant case. One of the principal results of the work reported on in Chapter 5, as an application of Theorem 5 there, states that the property of arbitrarily good approximation by finite sums of the form

    Σᵢ₌₁ᵐ Σⱼ₌₁ᵏ⁽ⁱ⁾ cᵢⱼ uᵢⱼ[yᵢⱼ(·)] vᵢ

is equivalent to continuity of a certain nonlinear dynamic input-output map. Here the cᵢⱼ are real constants, the uᵢⱼ are certain continuous real-valued functions of the reals, the yᵢⱼ are continuous real functionals defined on the input space X, and the vᵢ belong to the output space E.

A modified version of Chapter 6 is under preparation as the paper
Ajit T. Dingankar and Irwin W. Sandberg, "Reconfigurable Universal Neural Controllers". Here the problem is to approximate all members of a class of equicontinuous controllers arbitrarily well using a fixed dilation (called the "width") of a basis function in conjunction with a fixed finite set of translates called the "centers". This property is important in practical applications because of the need for robustness with respect to plant variations or modeling uncertainties [33]. In the case of discrete-time continuous controllers that are time-invariant and possess finite memory, our problem reduces to the problem of simultaneously approximating a class of continuous functions defined on a finite-dimensional Euclidean space. This is the setting adopted in this work. We show that the problem can be solved by radial basis function neural controllers (RBFNCs). Approximations with radial basis functions have been studied in the literature by many authors. See [36] for an early theoretical result, and [53] for a control application. However, in most cases only a single function (or a single controller) is approximated. Simultaneous approximation of a class of functions using radial basis functions was obtained by Chen and Chen [12]. In contrast to [12], we are not restricted to using an activation function that is radially symmetric with respect to the Euclidean norm. This has important practical implications for using a form of elliptic basis functions [37], [9]. Also, in our work the same width is used on all nodes, so the number of free parameters of the network is reduced and the parameter estimation problem is simplified. Perhaps the most important consideration is the ease of retraining a network on a new target function: if a sufficiently accurate RBFNC has been obtained with our method, then it can be used with the same width and centers for any controller in the given equicontinuous class, with the same error bound. The only parameters that need to be adapted to a new controller are the weights from the hidden nodes to the output, and this problem is linear. Together, these results indicate that the RBFNCs are a powerful and efficient architecture: they are linearly reconfigurable universal controllers.

Chapter 7 deals with the problem of function approximation using neural networks, and the associated error bounds. This area of study received a great impetus with the work of Barron [3]. Here we obtain similar error bounds using a different neural network architecture and a different class of "target" functions (defined via a different integrability condition than the condition in [3]). Most importantly, we give a much simplified construction of the approximating structures. We show that, if the target function f is integrable in the Lebesgue sense (the condition most commonly assumed to guarantee the existence of an integral representation of the type crucially used in [3]), then the integrability condition of [3] implies ours. We also give an example showing that there are functions f which satisfy our integrability condition but not the condition in [3]. An abridged version of Chapter 7 is under preparation for future publication.
Chapter 2

A Constructive Algorithm

2.1 Introduction

One of the earliest interesting results in the area of neural networks is the proposition that any continuous real function defined on a compact subset of Rᵏ (k an arbitrary positive integer) can be approximated arbitrarily well using a single-hidden-layer network with sigmoidal nonlinearities [26]. There are several other results in the literature concerning arbitrarily good approximation that concern more general types of "target" functionals, different network structures, other nonlinearities, and various criteria by which the quality of approximation is gauged.¹ The earliest proofs of these results are nonconstructive and do not address the question of how one determines the number of nonlinearities needed for a given degree of approximation. In an interesting recent paper, Barron [3] shows that a lemma attributed to B. Maurey can be used in some cases to obtain a bound on the number of nonlinearities. To discuss that lemma, let G be a subset of a real or complex inner product space H with norm ‖·‖, such that the elements of G are bounded in norm by some positive constant b, and let co(G) denote the closure of the convex hull of G. Maurey's lemma, which concerns the error in approximating an element of co(G) with a convex combination of n points in G, is the following:
Lemma 1. Let f be an element of co(G) and let c > b² − ‖f‖². Then for each positive integer n there is an fₙ in the convex hull of n points of G such that

    ‖f − fₙ‖² ≤ c/n.    (2.1)
As mentioned in [3], this lemma is attributed in [39] to B. Maurey. In the main application given in [3], assumptions are made on an f so that f(x) can be written as an integral with respect to a probability measure. This yields a setting in which f is an element of the closure of the convex hull of functions generated by the integrand. For a related later study concerning approximations using translates of a given function see [23]. An algorithm for achieving the bounds in Lemma 1 is also given in [3]. (The algorithm is an improved version of a result due to Jones [27].) In this algorithm, for each n > 1 one chooses an fₙ of the form αₙfₙ₋₁ + (1 − αₙ)gₙ such that

    ‖fₙ − f‖² ≤ inf over 0 ≤ α ≤ 1 and g ∈ G of ‖αfₙ₋₁ + (1 − α)g − f‖² + εₙ

¹An example of another result is the proposition [51] that any continuous real functional on a compact subset of a real normed linear space can be approximated arbitrarily well using a single-hidden-layer neural network with a linear functional input layer and exponential (or polynomial or sigmoidal) nonlinearities.
in which εₙ is a positive number that satisfies εₙ = O(n⁻²). In particular, for large n one chooses g as well as α so that ‖αfₙ₋₁ + (1 − α)g − f‖ is nearly minimal. Here we direct attention to an observation concerning the proof of the lemma described above, and give a corresponding simpler algorithm for achieving bounds similar to (2.1). The observation is just that the proof of Lemma 1 given in [3] shows that each fₙ in Lemma 1 can in fact be taken to be an arithmetic average of n points in G.² In the remainder of this chapter, we show how one can compute such n-point averages.
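As a numerical illustration of the Jones-Barron iteration just described (not part of the original development), the following sketch carries it out in H = R² with a crude grid search over α; the set G, the target f, and all numerical values are hypothetical.

```python
import numpy as np

# Hypothetical setting: H = R^2, G a finite set bounded in norm by b = 1,
# and f an element of the convex hull of G.
G = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
     np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
f = np.array([0.25, 0.25])

def step(f_prev):
    # Choose g in G and alpha in [0, 1] so that
    # ||alpha * f_prev + (1 - alpha) * g - f|| is nearly minimal
    # (alpha is searched on a crude grid).
    best_err, best = np.inf, f_prev
    for g in G:
        for alpha in np.linspace(0.0, 1.0, 101):
            cand = alpha * f_prev + (1.0 - alpha) * g
            err = np.linalg.norm(cand - f)
            if err < best_err:
                best_err, best = err, cand
    return best

fn = min(G, key=lambda g: np.linalg.norm(g - f))  # one-point start
errs = [float(np.linalg.norm(fn - f) ** 2)]
for n in range(2, 40):
    fn = step(fn)
    errs.append(float(np.linalg.norm(fn - f) ** 2))
# errs is nonincreasing, consistent with the O(1/n) bound of Lemma 1.
```

Because the grid for α includes α = 1 (which keeps the previous iterate), the squared error can never increase from one step to the next.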
2.2 The algorithm

Let f be an element of co(G), and define

    γ = inf over v ∈ H of sup over g ∈ G of {‖g − v‖² − ‖f − v‖²}.

Choose δ > γ (it suffices to take δ > b² − ‖f‖², because γ ≤ b² − ‖f‖² since 0 ∈ H). Set εₖ = (δ − γ)/k² for k = 1, 2, …. Let f₁, f₂, … be any sequence generated as follows: Choose f₁ = g₁ ∈ G such that ‖f − g₁‖² ≤ (inf over g ∈ G of ‖f − g‖²) + ε₁. For k = 2, 3, … choose gₖ ∈ G such that

    ‖f − (1 − k⁻¹)fₖ₋₁ − k⁻¹gₖ‖² ≤ inf over g ∈ G of ‖f − (1 − k⁻¹)fₖ₋₁ − k⁻¹g‖² + εₖ    (2.2)

and set fₖ = (1 − k⁻¹)fₖ₋₁ + k⁻¹gₖ. Of course fₖ is the arithmetic average k⁻¹ Σⱼ₌₁ᵏ gⱼ of k elements of G. We will prove the following:
Theorem 1. Under the conditions indicated,

    ‖f − fₙ‖² ≤ δ/n    (2.3)

for all n.
Proof: We will use an induction argument based on the following lemma [14, Lemma 3 of Chapter 25] (which is related to a result in [3]).³

²As an aside, the proof is probabilistic and is reminiscent of one used by Shannon in his 1947 random coding argument, in that one observes that if an average is bounded from above by a certain number then at least one of the terms in the expression for the average must also be bounded in this way.
³The proof of Lemma 2 given in [14] and the proof of the related result in [3] assume that the inner product space is real, but obvious modifications show that both results hold also for complex spaces.
Lemma 2. Let G, f and γ be as above, and let h belong to the convex hull of G. Then

    inf over g ∈ G of ‖f − αh − (1 − α)g‖² ≤ α²‖f − h‖² + (1 − α)²γ

for α ∈ [0, 1].

By Lemma 2 and the choice of f₁,

    ‖f − f₁‖² ≤ (inf over g ∈ G of ‖f − g‖²) + ε₁ ≤ γ + (δ − γ) = δ,

showing that (2.3) holds for n = 1. Now suppose that ‖f − fₙ₋₁‖² ≤ δ/(n − 1) for some n ≥ 2. By (2.2) and Lemma 2,

    ‖f − fₙ‖² ≤ (inf over g ∈ G of ‖f − (1 − n⁻¹)fₙ₋₁ − n⁻¹g‖²) + εₙ
             ≤ ((n − 1)/n)² ‖f − fₙ₋₁‖² + n⁻²γ + εₙ
             ≤ (n − 1)δ/n² + γ/n² + (δ − γ)/n²
             = δ/n,
which completes the proof. Besides showing that our algorithm may be used to generate simpler approximants at somewhat less computational cost, the theorem also improves Lemma 1 in the sense that (as simple examples show) γ can be less than b² − ‖f‖².
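To make the averaging algorithm of Section 2.2 concrete, here is a minimal numerical sketch in H = R² with a finite set G; the particular G and f are hypothetical, and since G is finite the minimization in each step is carried out exactly (the εₖ slack in the text merely permits approximate minimization).

```python
import numpy as np

# Hypothetical setting: H = R^2, G a finite set with norms bounded by b = 1.
G = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
     np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
f = np.array([0.1, 0.3])  # an element of the convex hull of G

fk = np.zeros_like(f)  # placeholder; the k = 1 step reduces to f_1 = g_1
errors = []
for k in range(1, 51):
    # Choose g_k in G minimizing ||f - (1 - 1/k) f_{k-1} - g/k||; at k = 1
    # this is exactly the rule "choose g_1 nearly minimizing ||f - g||".
    gk = min(G, key=lambda g: np.linalg.norm(f - (1 - 1 / k) * fk - g / k))
    fk = (1 - 1 / k) * fk + gk / k  # f_k is the average of g_1, ..., g_k
    errors.append(float(np.linalg.norm(f - fk) ** 2))
# Theorem 1 predicts ||f - f_n||^2 <= delta / n, i.e. n * error stays bounded.
```

Note that no line search over a mixing parameter is needed: the weight on the new point is fixed at 1/k, which is what makes fₖ an arithmetic average.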
Chapter 3

Classifiers on Relatively Compact Sets

3.1 Introduction

The problem of classifying signals is of interest in several application areas. (E.g., it arises in connection with both automatic target recognition and symbol detection in digital communication using analog channels.) Typically we are given a finite number m of pairwise disjoint sets C₁, …, Cₘ of signals, and we would like to synthesize a system that maps the elements of each Cⱼ into a real number aⱼ, such that the numbers a₁, …, aₘ are distinct. In a recent paper [51] it is shown that this classification can be performed by certain simple structures involving linear functionals and memoryless nonlinear elements, assuming that the Cⱼ are compact subsets of a real normed linear space.¹ In Section 3.2 we give a similar solution to the problem under the considerably weaker assumption that the Cⱼ are relatively compact (i.e., have compact closure) and are of positive distance from each other. An example is given in which the Cⱼ are subsets of Lp(a, b), 1 ≤ p < ∞. This example shows that a very wide class of signals can be classified using the structures described. The signals need not be continuous, in contrast with those of the example given in [51].
3.2 Classification theorem

We need a few preliminaries: Let C₁, …, Cₘ be subsets of a real normed linear space X, with m > 1. Let us say that the collection {C₁, …, Cₘ} is strongly separated if each Cᵢ is nonempty and the distance between each pair of sets is positive, in the usual sense that

    0 < inf{‖a − b‖ : a ∈ Cᵢ, b ∈ Cⱼ}, i ≠ j.

We define a classifier on ∪ᵢCᵢ to be a functional that for each i assigns a real number aᵢ to every x ∈ Cᵢ, where a₁, …, aₘ are distinct. Let X* be the set of bounded linear functionals on X (i.e., the set of bounded linear maps from X to the reals R). Given a compact subset C of X, let Y be any set of continuous maps from X to R that is dense in X* on C, in the sense that for each ℓ ∈ X* and any ε > 0 there is a y ∈ Y such that |ℓ(x) − y(x)| < ε, x ∈ C. Examples of such sets Y are given in Section 3.2.3.

¹The main result in [51] expands on an earlier observation in [48] regarding approximation of continuous nonlinear functionals using feedforward neural networks with linear (or affine) functional input layer and one hidden (e.g., sigmoidal) layer.
Finally, let U be any set of continuous maps u : R → R such that given ε > 0 and any bounded interval (β₁, β₂) ⊂ R there exists a finite number of elements u₁, …, uₗ of U for which |exp(α) − Σⱼ uⱼ(α)| < ε for α ∈ (β₁, β₂).² Our classification result is as follows.
Theorem 2. Let C₀ = C₁ ∪ … ∪ Cₘ, where {C₁, …, Cₘ} is a strongly-separated collection of relatively compact subsets of X, and let f be a classifier on C₀ that takes the value aᵢ on Cᵢ, 1 ≤ i ≤ m. Also, let ε > 0 be given. Then, with C the closure of C₀, there exist a positive integer k, real constants c₁, …, cₖ, elements u₁, …, uₖ of U, and elements y₁, …, yₖ of Y such that

    aᵢ − ε < Σⱼ₌₁ᵏ cⱼ uⱼ[yⱼ(x)] < aᵢ + ε    (x ∈ Cᵢ, 1 ≤ i ≤ m).
Proof: We will use the following.
Lemma [51]: Let g be a real-valued continuous map defined on a compact subset K of X. Then given ε > 0 there are a positive integer k, real numbers c₁, …, cₖ, elements u₁, …, uₖ of U, and elements y₁, …, yₖ of Y with C = K such that

    |g(x) − Σⱼ cⱼ uⱼ[yⱼ(x)]| < ε

for x ∈ K.

Continuing with the proof and using the strong separation of the Cᵢ, we see that f satisfies

    x₁, x₂ ∈ C₀, ‖x₁ − x₂‖ < δ  ⟹  |f(x₁) − f(x₂)| = 0

for any sufficiently small positive δ, showing that f is uniformly continuous on C₀. Note that

    K := C̄₁ ∪ C̄₂ ∪ … ∪ C̄ₘ

(with C̄ᵢ the closure of Cᵢ), being the union of finitely many compact sets, is compact. Since C₀ is dense in K, there exists [42, p. 99, Problem 13] a unique continuous extension of the functional f from C₀ to K. Let g denote this extension. Using the lemma we obtain the desired result.
3.2.1 Comments

Theorem 2 shows that the simple structure illustrated in Figure 3.1, consisting of elements yᵢ of a class of functionals,³ a memoryless nonlinear map h, and a quantizer (here a map Q that for each j takes numbers in the interval (aⱼ − 0.25Δ, aⱼ + 0.25Δ) into aⱼ, where Δ = min over i ≠ j of |aᵢ − aⱼ|), is capable of implementing any classifier on strongly separated relatively compact sets. The theorem shows also that neural networks of a certain architecture with a single hidden layer may be used to implement the map h. This architecture is illustrated in Figure 3.2, where every uⱼ block represents the evaluation of a real-valued function uⱼ defined on the reals. As described above, the uⱼ are drawn from a set U of continuous functions whose finite sums can approximate the exponential function exp(·) on bounded intervals. In particular, uⱼ can be taken to be exp(·) for all j.

²Of course we can take U to be the set whose only element is exp(·), or the set {u : u(α) = αⁿ/n!, n ∈ {0, 1, …}}.
³The functionals can be taken to be linear. As mentioned earlier, examples are given in Section 3.2.3.

Figure 3.1: Architecture of the classifier.
Figure 3.2: Neural network implementation of the map h.
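The structure of Figures 3.1 and 3.2 can be sketched in a few lines of code; the class labels, the functionals yⱼ, and the coefficients cⱼ below are hypothetical placeholders, with uⱼ = exp(·) as the theorem permits.

```python
import numpy as np

# Sketch of the classifier of Figures 3.1 and 3.2. The labels in `a`, the
# functionals `ys`, and the weights `cs` are illustrative placeholders.
a = [0.0, 1.0, 2.0]               # distinct class labels a_1, ..., a_m

def h(x, cs, ys):
    # Single-hidden-layer map: sum_j c_j * exp(y_j(x))  (Figure 3.2).
    return sum(c * np.exp(y(x)) for c, y in zip(cs, ys))

def Q(t, labels):
    # Quantizer: send t to the nearest label (the text uses intervals of
    # half-width 0.25 * min_{i != j} |a_i - a_j| around each a_j).
    return min(labels, key=lambda ai: abs(t - ai))

# Toy "signal" in R^3 and two linear functionals standing in for the y_j.
ys = [lambda x: x[0] - x[1], lambda x: 0.5 * x[2]]
cs = [0.8, -0.3]
x = np.array([0.4, 0.1, 0.2])
label = Q(h(x, cs, ys), a)        # the classifier output f(x)
```

In an actual application the cⱼ, uⱼ, and yⱼ would be those guaranteed by Theorem 2, so that h(x) lies within ε of the correct label aᵢ for every x ∈ Cᵢ, and Q then recovers aᵢ exactly.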
3.2.2 Example of C₀

Here we prove a theorem which provides a class of examples of relatively compact sets of particular interest concerning the classification of signals. Given α, β, γ > 0, let A(α, β, γ) denote any set of real-valued functions defined on a fixed finite interval [a, b] such that for every x ∈ A(α, β, γ):

1) The number of discontinuities of x is at most α.
2) For every interval of continuity I of x,

    |x(t₁) − x(t₂)| ≤ β|t₁ − t₂|, for all t₁, t₂ ∈ I.

3) |x(t)| ≤ γ for all t ∈ [a, b].

Let Lp(a, b), with 1 ≤ p < ∞, denote the usual normed space of real-valued Lebesgue-measurable functions defined on [a, b] that are pth-power integrable.
Theorem 3: $A(\alpha, \beta, \gamma)$ is a relatively compact subset of $L_p(a, b)$.

Proof: We use the bound (see the proof of Lemma 3 of [50])
$$ \int_a^b |x(t) - x(t+\tau)|^p \, dt \le \left[ (2\gamma)^p (\alpha + 1) + (b - a)\beta^p \right] |\tau| $$
for $x \in A(\alpha, \beta, \gamma)$ and $\tau \in \mathbb{R}$ (by an abuse of notation, $x$ in the integrand is an extension of $x$ that vanishes outside $[a, b]$). This bound shows that given an $\epsilon > 0$ there exists a $\delta > 0$ that is independent of $x$ such that
$$ \int_a^b |x(t) - x(t+\tau)|^p \, dt < \epsilon^p $$
for $|\tau| < \delta$. Also, $A(\alpha, \beta, \gamma)$ is uniformly bounded because
$$ \int_a^b |x(t)|^p \, dt \le (b - a)\gamma^p. $$
Hence, by a theorem due to Riesz [31, p. 44], $A(\alpha, \beta, \gamma)$ is relatively compact.

Theorem 3 shows that with $X = L_p(a, b)$, the $C_i$ can be taken to be any strongly separated family in which each member is some $A(\alpha, \beta, \gamma)$. This example illustrates that a very wide class of signals can be classified using the structures described in the preceding section. The signals need not be continuous, in contrast with those of the example given in [51].
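The translation bound used in the proof can be checked numerically. The sketch below is purely illustrative: the test signal (piecewise linear with one jump on $[0,1]$) and the parameter values $\alpha = 1$, $\beta = 1$, $\gamma = 0.5$ are hypothetical, and the bound is the reconstructed expression above with $p = 1$ and $|\tau| \le 1$.

```python
import numpy as np

def x(t):
    """Piecewise-Lipschitz test signal on [0, 1], zero outside."""
    t = np.asarray(t, dtype=float)
    inside = (t >= 0.0) & (t <= 1.0)
    v = np.where(t < 0.5, t, t - 0.8)   # one jump at t = 0.5, slope 1 elsewhere
    return np.where(inside, v, 0.0)

a, b = 0.0, 1.0
alpha, beta, gamma = 1, 1.0, 0.5        # jumps, Lipschitz constant, sup bound
p, tau = 1, 0.01

m = 200000
dt = (b - a) / m
t = a + (np.arange(m) + 0.5) * dt       # midpoint grid on [a, b]
integral = float(np.sum(np.abs(x(t) - x(t + tau)) ** p)) * dt
bound = ((2 * gamma) ** p * (alpha + 1) + (b - a) * beta ** p) * abs(tau)
```

For this signal the left-hand side is roughly $0.02$ while the bound is $0.03$, consistent with the estimate.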
3.2.3 Examples of $Y$

Let $[a, b]$ and $L_p(a, b)$ be as described in the preceding section. For the case in which $X = L_p(a, b)$, each $y_i$ in Figure 3.1 can be taken to be represented by integration over $[a, b]$ after pointwise multiplication by a kernel in the conjugate space, since all linear functionals can be represented in that manner. The following example of an alternative, simpler representation is taken from Chapter 4. For each integrable $x \colon [a, b] \to \mathbb{R}$ and each $h > 0$, let the so-called Steklov function be defined by
$$ x_h(t) = \frac{1}{2h} \int_{t-h}^{t+h} x(\tau) \, d\tau, \qquad t \in [a, b], $$
where (abusing the notation) $x$ in the integrand is an extension of $x$ that vanishes outside $[a, b]$. Note that $x_h$ is continuous on $[a, b]$ and that, in particular, it belongs to $L_p(a, b)$. For a given $n \in \mathbb{N}$ and $h > 0$, we use $S_{n,h}$ to denote
$$ \left\{ s \colon L_p(a, b) \to \mathbb{R} \;\middle|\; s(x) = \sum_{j=1}^n c_j x_h(\tau_j),\ c_j \in \mathbb{R},\ 1 \le j \le n \right\} $$
in which the $\tau_j$ are the elements of the uniformly spaced grid of $n$ points $\tau_j = a + j(b - a)/n$, $1 \le j \le n$. The set $S_{n,h}$ is a family of what might be called Steklov functionals of order $n$ and width $h$. The class $S$ of all Steklov functionals is defined by
$$ S = \bigcup_{n \in \mathbb{N},\, h > 0} S_{n,h}. $$
The following theorem is proved in Chapter 4. It justifies taking $S$ as an example of a set of continuous maps from $L_p(a, b)$ to $\mathbb{R}$ that is dense in $L_p(a, b)^*$ over compacta. (That is, it justifies the statement that given any compact $C \subset L_p(a, b)$ we can take $Y = S$.)

Theorem 4: Let $f \in L_p(a, b)^*$ and let $C$ be a compact subset of $L_p(a, b)$. Then for each $\epsilon > 0$ there exists an $s \in S$ such that
$$ |f(x) - s(x)| < \epsilon, \qquad x \in C. $$

Note that the elements of $S$ are simpler in form than the integrals involving a kernel mentioned at the beginning of this section, in that a member of $S$ is specified by just a finite number of parameters.
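The Steklov function and the Steklov functionals defined above can be sketched directly; the code below is a straightforward numerical rendering (midpoint quadrature) of the definitions, with the interval, width, and coefficients chosen arbitrarily for illustration.

```python
import numpy as np

def steklov(x, h, a, b, m=4000):
    """Steklov function x_h(t) = (1/2h) * integral of x over [t-h, t+h],
    with x extended by zero outside [a, b] (midpoint rule, m subintervals)."""
    def xh(t):
        s = t - h + (np.arange(m) + 0.5) * (2.0 * h / m)
        vals = np.where((s >= a) & (s <= b), x(s), 0.0)
        return float(np.sum(vals)) / m   # = sum * (2h/m) / (2h)
    return xh

def steklov_functional(c, h, n, a, b):
    """Element of S_{n,h}: s(x) = sum_j c_j x_h(tau_j),
    tau_j = a + j(b - a)/n, 1 <= j <= n."""
    taus = [a + j * (b - a) / n for j in range(1, n + 1)]
    def s(x):
        xh = steklov(x, h, a, b)
        return sum(cj * xh(tj) for cj, tj in zip(c, taus))
    return s
```

For $x(t) = t$ on $[0, 1]$, $x_h(0.5) = 0.5$; note the zero extension halves $x_h$ at the endpoint $t = b$ for a constant input, as the definition requires.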
Chapter 4
Approximation of Bounded Linear Functionals

4.1 Introduction

One of the earliest results in the area of neural networks is the proposition that any continuous real function defined on a compact subset of $\mathbb{R}^k$ ($k$ an arbitrary positive integer) can be approximated arbitrarily well using a single-hidden-layer network with sigmoidal nonlinearities (see, for example, [18]). Among other results in the literature concerning arbitrarily good approximation that concern more general types of "target" functionals, different network structures, other nonlinearities, and various measures of approximation errors is the proposition in [49], [51] that any continuous real nonlinear functional on a compact subset of a real normed linear space can be approximated arbitrarily well using a single-hidden-layer neural network with a linear functional input layer and exponential (or polynomial or sigmoidal) nonlinearities. In an interesting paper [11] by Chen and Chen, similar results concerning single-hidden-layer neural networks with sigmoidal nonlinearities are proved for continuous real functionals on a compact subset of an $L_p$ space ($1 < p < \infty$). One of the main results in [11] is the proposition that the linear functionals in the input layer may have the special form of a finite sum of integrals of a certain type. Here we connect these results by showing that every continuous linear functional can be approximated arbitrarily well on compact sets by finite sums of the type used in [11]. More specifically, the main result in [51] shows that any real continuous functional defined over a compact subset of a real normed linear space can be approximated arbitrarily well with a neural network employing an input layer of functionals that can be taken to be linear, and a single hidden layer implementing nonlinearities that, for example, can be taken to be sigmoidal.
To describe the result proved there, let $X$ be a real normed linear space, and let $X^*$ be the set of bounded linear functionals on $X$ (i.e., the set of bounded linear maps from $X$ to the reals $\mathbb{R}$). Given a compact subset $C$ of $X$, let $Y$ be any set of continuous maps from $X$ to $\mathbb{R}$ that is dense in $X^*$ on $C$, in the sense that for each $\lambda \in X^*$ and any $\epsilon > 0$ there is a $y \in Y$ such that $|\lambda(x) - y(x)| < \epsilon$, $x \in C$. Also, let $U$ be any set of continuous maps $u \colon \mathbb{R} \to \mathbb{R}$ such that given $\epsilon > 0$ and any bounded interval $(\alpha_1, \alpha_2) \subset \mathbb{R}$ there exists a finite number of elements $u_1, \ldots, u_\ell$ of $U$ for which $|\exp(\alpha) - \sum_j u_j(\alpha)| < \epsilon$ for $\alpha \in (\alpha_1, \alpha_2)$.¹ The result in [51] is this: Let $g$ be a real-valued continuous map defined on $C$. Then given $\epsilon > 0$ there are a positive integer $k$, real numbers $c_1, \ldots, c_k$, elements $u_1, \ldots, u_k$ of $U$, and elements $y_1, \ldots, y_k$ of $Y$ such that
$$ \Big| g(x) - \sum_j c_j u_j[y_j(x)] \Big| < \epsilon $$

¹ Of course we can take $U$ to be the set whose only element is $\exp(\cdot)$, or the set $\{u \colon u(\alpha) = \alpha^n/n!,\ n \in \{0, 1, \ldots\}\}$.
for $x \in C$. The proof in [51] of the result described above shows that the result can be slightly sharpened, in that "$y_1, \ldots, y_k$ of $Y$" can be replaced with "$y_1, \ldots, y_k$ of $Y_\epsilon$ for some $\epsilon > 0$," where each $Y_\epsilon$ is any set of continuous maps from $X$ to $\mathbb{R}$ with the property that given $\lambda \in X^*$ with $\|\lambda\| \le 1$ there is a $y \in Y_\epsilon$ such that $|\lambda(x) - y(x)| < \epsilon$, $x \in C$, provided that we also replace "such that given $\epsilon > 0$ and any bounded interval $(\alpha_1, \alpha_2) \subset \mathbb{R}$ there exists a finite number of elements $u_1, \ldots, u_\ell$ of $U$ for which $|\exp(\alpha) - \sum_j u_j(\alpha)| < \epsilon$ for $\alpha \in (\alpha_1, \alpha_2)$" with "such that given $\epsilon_1, \epsilon > 0$ and any bounded interval $(\alpha_1, \alpha_2) \subset \mathbb{R}$ there exists a finite number of elements $u_1, \ldots, u_\ell$ of $U$ for which $|\exp(\alpha) - \sum_j u_j(\alpha)| < \epsilon$ for $\alpha \in (\alpha_1, \alpha_2)$". It can be verified that for the modified result the elements of $U$ need not be continuous, thus yielding an even stronger result.

Now suppose that $X = L_p$ with $1 < p < \infty$, and that $U = \{u \colon u(\alpha) = c\, s(w\alpha + \theta),\ c, w, \theta \in \mathbb{R}\}$, where $s$ is a sigmoidal function. In this setting we obtain one of the main results in [11] as a special case of the above slightly-sharpened result, since our theorem below and a simple observation in Section 4.3 about its proof permit us to take the $Y_\epsilon$ to be sets of finite linear combinations of integral functionals of the form considered in [11].² That is, in the $L_p$ setting, for each $\epsilon$ there is an $h > 0$ such that we can take the value at $x$ of the elements $y$ of $Y_\epsilon$ to be given by certain finite linear combinations of integrals of the form
$$ \frac{1}{2h} \int_{t-h}^{t+h} x(\tau) \, d\tau $$
with different values of $t$. Besides relating the two works [11] and [51], the theorem given in the next section is felt to be interesting in itself.
4.2 Approximation Theorem
4.2.1 Notation and Definitions

Let $L_p$, $1 \le p < \infty$, denote the space of real-valued Lebesgue-measurable functions defined on a finite interval $[a, b]$ that are $p$th power integrable, and let $\|\cdot\|$ denote the usual norm. For each integrable function $x \colon [a, b] \to \mathbb{R}$ and each $h > 0$, let the so-called Steklov function be defined by
$$ x_h(t) = \frac{1}{2h} \int_{t-h}^{t+h} x(\tau) \, d\tau, \qquad t \in [a, b], $$
where (abusing the notation) $x$ in the integral is an extension of $x$ that vanishes outside $[a, b]$. Note that $x_h$ is continuous on $[a, b]$ and that, in particular, it belongs to $L_p$. For a given $n \in \mathbb{N}$ and $h > 0$, we use $S_{n,h}$ to denote
$$ \left\{ s \colon L_p \to \mathbb{R} \;\middle|\; s(x) = \sum_{j=1}^n c_j x_h(\tau_j),\ c_j \in \mathbb{R},\ 1 \le j \le n \right\} $$
in which the $\tau_j$ are the elements of the uniformly spaced grid of $n$ points $\tau_j = a + j(b - a)/n$, $1 \le j \le n$. The set $S_{n,h}$ is a family of what might be called Steklov functionals of order $n$ and width $h$.

² We also extend the result in [11] to $p = 1$.
The class $S$ of all Steklov functionals is defined by
$$ S = \bigcup_{n \in \mathbb{N},\, h > 0} S_{n,h}. $$
4.2.2 The Theorem Our result is the following.
Theorem: Let $f$ be a real continuous linear functional defined on $L_p$ and let $C$ be a compact subset of $L_p$. Then for each $\epsilon > 0$ there exists an $s \in S$ such that $|f(x) - s(x)| < \epsilon$ for all $x \in C$.

Proof: Let $\epsilon > 0$ be given. By the characterization of relatively compact sets³ in $L_p$ due to Kolmogorov, as extended to $p = 1$ by A. N. Tulaikov (see [35], pp. 212–216), for any $\delta > 0$ there exists an $h > 0$ such that $\|x - x_h\| < \delta$ for all $x \in C$. Using the uniform continuity of $f$, choose $h$ so that
$$ |f(x) - f(x_h)| < \epsilon/3, \qquad x \in C. \tag{4.1} $$
Since $f$ is a continuous linear functional on $L_p$, we have
$$ f(x_h) = \int_a^b k(\tau) x_h(\tau) \, d\tau \tag{4.2} $$
for some $k \in L_q$, where $q$ is the conjugate index of $p$. Since the map from $x \in C$ to $x_h$ in the normed space of continuous functions is continuous, we see that $\{x_h \mid x \in C\}$ is compact, and so $M = \sup_{x \in C} \sup_{t \in [a,b]} |x_h(t)|$ is finite. Using the integrability of $k$, let $g$ be a step function on $[a, b]$ such that
$$ M \int_a^b |k(\tau) - g(\tau)| \, d\tau < \epsilon/3. \tag{4.3} $$
By (4.1), (4.2), (4.3) and the triangle inequality in $\mathbb{R}$,
$$ \Big| f(x) - \int_a^b g(\tau) x_h(\tau) \, d\tau \Big| < 2\epsilon/3, \qquad x \in C. \tag{4.4} $$
We will now show that there is an $n$ and an $s \in S_{n,h}$ such that
$$ \Big| \int_a^b g(\tau) x_h(\tau) \, d\tau - s(x) \Big| < \epsilon/3, \qquad x \in C, \tag{4.5} $$

³ In [35] the term "compact" means relatively compact.
and this will complete the proof. (Recall that $h$ has been chosen.) We use the following lemma.

Lemma ([51]): Let $A$ and $B$ be normed linear spaces, and let $K$ be a compact subset of $A$. Suppose that $G \colon K \to B$ is Lipschitz. Let $F$ be any family of Lipschitz maps $H \colon K \to B$ such that given any $\delta > 0$ and any finite set $\{x_1, \ldots, x_r\}$ of points of $K$ there is an $H \in F$ for which $\|Gx_i - Hx_i\| < \delta$, $1 \le i \le r$. Then given $\epsilon > 0$ there is an $H \in F$ such that $\|Gx - Hx\| < \epsilon$ for all $x \in K$.

It is easy to check that the map from $x \in C$ to $\int_a^b g(\tau) x_h(\tau) \, d\tau$, as well as the elements of $\bigcup_{n \in \mathbb{N}} S_{n,h}$, are Lipschitz. Referring to the lemma, take $A = L_p$, $K = C$, $B = \mathbb{R}$, $F = \bigcup_{n \in \mathbb{N}} S_{n,h}$, and let $G$ be defined by
$$ Gx = \int_a^b g(\tau) x_h(\tau) \, d\tau. \tag{4.6} $$
Now choose any $x \in C$ and any $\delta > 0$. Since $g x_h$ is bounded and sectionally continuous on $[a, b]$, there is an $n_0(x) \in \mathbb{N}$ with the following property. For $n \in \mathbb{N}$ with $n \ge n_0(x)$ the Riemann partition $\tau_j = a + j(b - a)/n$, $0 \le j \le n$, of $[a, b]$ is such that
$$ \Big| \int_a^b g(\tau) x_h(\tau) \, d\tau - \sum_{j=1}^n g(\tau_j) x_h(\tau_j)(b - a)/n \Big| < \delta. $$
Thus, given a set $\{x_1, \ldots, x_r\}$ of points of $C$ we have
$$ \Big| Gx_i - \sum_{j=1}^n c_j x^i_h(\tau_j) \Big| < \delta, \qquad 1 \le i \le r, $$
for $n = \max_i \{n_0(x_i)\}$, in which $c_j = g(\tau_j)(b - a)/n$. By the lemma there is an $s \in \bigcup_{n \in \mathbb{N}} S_{n,h}$ such that (4.5) holds. This completes the proof of the theorem.
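The construction in the proof can be illustrated concretely: approximate a particular bounded linear functional $f(x) = \int_a^b k(\tau)x(\tau)\,d\tau$ by a Steklov functional with coefficients $c_j = g(\tau_j)(b-a)/n$, here simply taking the step function $g$ equal to the (already continuous) kernel $k$. The kernel, the test input, and the parameter values below are hypothetical choices for the demonstration only.

```python
import numpy as np

a, b, n, h = 0.0, 1.0, 1000, 0.01
k = lambda t: t                          # kernel of f(x) = int k(tau) x(tau) dtau
x = lambda t: np.sin(2.0 * np.pi * t)    # a test input in L_p(0, 1)

def xh(t, m=400):
    # Steklov average over [t - h, t + h], zero-extending x outside [a, b]
    s = t - h + (np.arange(m) + 0.5) * (2.0 * h / m)
    vals = np.where((s >= a) & (s <= b), x(s), 0.0)
    return float(np.sum(vals)) / m       # = sum * (2h/m) / (2h)

taus = a + np.arange(1, n + 1) * (b - a) / n
c = k(taus) * (b - a) / n                # c_j = g(tau_j)(b - a)/n with g = k
s_of_x = float(sum(cj * xh(tj) for cj, tj in zip(c, taus)))

m2 = 200000                              # reference value of f(x) by quadrature
tg = a + (np.arange(m2) + 0.5) * (b - a) / m2
f_of_x = float(np.sum(k(tg) * x(tg))) * (b - a) / m2
```

For this choice $f(x) = -1/(2\pi)$, and the Steklov-functional value agrees to well within $10^{-2}$.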
4.3 Comments

A check of the proof shows that $h$ (which depends on $C$ and $\epsilon$) can be chosen to be independent of $f$ for $f$ of norm at most unity, and that the theorem holds also for the case in which $f$ is complex valued, $L_p$ is a complex space, and the coefficients $c_j$ are complex numbers. The last part of the proof can be generalized to yield an interesting result concerning what might be called the "uniform Riemann integrability" of certain integrands. This result, which might be useful in other approximation studies, is discussed in Appendix A.
Chapter 5
Approximation of Nonlinear Maps

Here we show that nonlinear dynamic systems defined on various spaces can be approximated arbitrarily well with a variety of neural network structures. We give two schemes for such approximation, and also provide error analysis for one scheme. We demonstrate the applicability of our results to a large class of stable nonlinear feedback systems in continuous time, and to causal time-invariant systems in the discrete-time case.
5.1 Introduction

One of the principal results of this work is that finite sums of the form
$$ \sum_{i=1}^m \sum_{j=1}^{k(i)} c_{ij} u_{ij}[y_{ij}(\cdot)] v_i $$
can approximate (arbitrarily well in an important sense) the elements of a large class of input-output maps of nonlinear dynamical systems taking a certain subset of the linear space of $\mathbb{R}^n$-valued continuous functions defined over $\mathbb{R}$ into the linear space of $\mathbb{R}$-valued functions defined over $\mathbb{R}$. Here the $c_{ij}$ are real constants, the $u_{ij}$ are certain continuous real-valued functions of the reals, the $y_{ij}$ are continuous real functionals, and the $v_i$ are elements of the output space. A similar result follows also for the case of functions defined over $\mathbb{R}^n$ or over discrete domains. Corresponding results for complex-valued inputs and/or outputs can also be derived, but that is not considered here. To the best of our knowledge, this is the first approximation result (using structures like neural networks) concerning input-output maps of systems that need not be time (or shift) invariant. Another fact that makes this result interesting is that the inputs are not restricted to be defined on finite intervals. Also, the set of inputs need not be compact in the usual topologies of the input spaces. Results related in a general sense to ours can be found in the work of Chen and Chen [12], where the approximating structures are certain forms of radial basis function neural networks. In contrast to [12], our results do not restrict the form of the basis function to be radial with respect to the Euclidean norm, thus permitting very flexible choices of approximating structures. Approximation results using Volterra series for nonlinear systems were obtained in [47] (in this connection see also Boyd's dissertation [6]). In contrast to [6], we treat input and output functions with multidimensional domains and ranges, as well as provide approximations in the $L_p$ norm. Unlike [47], we do not require Fréchet differentiability of the input-output map.
5.2 Preliminaries
5.2.1 The inputs and outputs

We consider nonlinear dynamical systems as input-output maps between appropriate input and output spaces as defined below. Throughout this work $V$ denotes $\mathbb{R}^d$, for an arbitrary positive integer $d$. We denote the norm in $V$ by $\|\cdot\|_V$. Let $n$ be a positive integer, and let $D$ denote the domain of the input functions, where $D = \mathbb{R}^n_+$ or $\mathbb{Z}^n_+$, or $\mathbb{R}^n$ or $\mathbb{Z}^n$. With $D$ we associate the "natural" measure $\mu$ (the Lebesgue measure for $\mathbb{R}^n_+$ or $\mathbb{R}^n$, and the counting measure for $\mathbb{Z}^n_+$ or $\mathbb{Z}^n$). The norm in $D$ is denoted by $|\cdot|$, taken to be the Euclidean norm for $\mathbb{R}^n_+$ or $\mathbb{R}^n$, and the inherited Euclidean norm for $\mathbb{Z}^n_+$ or $\mathbb{Z}^n$. Let $\mathcal{X}$ denote the set of all $V$-valued $\mu$-measurable functions defined on $D$. By a slight abuse of notation, let both the zero element of $\mathcal{X}$ and that of $V$ be denoted by $\theta$. We assume that the inputs are drawn from $\mathcal{X}$. We let $X$ denote a subset of $\mathcal{X}$ satisfying one of the following assumptions:
A1. $X = C_b(D)$, the set of all bounded continuous functions in $\mathcal{X}$, with the norm given by
$$ \|f\| = \sup_{t \in D} \|f(t)\|_V. $$

A2. $X = L_p(D, \mu)$, $1 \le p < \infty$, with the norm defined by
$$ \|f\| = \left\{ \int_D \|f(t)\|_V^p \, \mu(dt) \right\}^{1/p}. $$
Note that in each case $X$ is a Banach space. Let $E$ be a real Banach space with norm $\|\cdot\|_E$; e.g., $E$ can be taken to be the linear space of $\mathbb{R}$-valued bounded continuous functions defined over $D$ with $\|y\|_E = \sup_{t \in D} |y(t)|$. We assume that our outputs belong to $E$.
5.2.2 Localizable systems

We first define the weighted norm spaces used in the development of our results.¹

If $X = C_b(D)$, let $W$ be the set of continuous maps $w \colon D \to (0, 1]$ with $\lim_{|t| \to \infty} w(t) = 0$. For $w \in W$, let $X_w$ be the set of all continuous functions $f$ in $\mathcal{X}$ for which
$$ \|f\|_w = \sup_{t \in D} \|f(t) w(t)\|_V < \infty. $$

¹ Related ideas (in a general sense) involving weighted norms on function spaces can be found in, for example, proofs of the boundedness of solutions of integral equations [45], [46, pp. 880–1], and in a large number of other publications (see, for instance, [6], [17], [15], [16]).
If $X = L_p(D, \mu)$, let $W$ be the set of $\mu$-measurable maps $w \colon D \to (0, 1]$ with $\lim_{|t| \to \infty} w(t) = 0$. For $w \in W$, let $X_w$ be the set of all functions $f$ in $\mathcal{X}$ for which
$$ \|f\|_w = \left\{ \int_D \|f(t)\|_V^p \, w(t) \mu(dt) \right\}^{1/p} < \infty. $$

Later we will use the fact that in every case $X_w$ is a Banach space (see Appendix B). For every $a \in \mathbb{R}_+$, let $c_a$ denote $[-a, a]^n \cap D$. Let $\beta$ be a function from $\mathbb{R}_+$ to $\mathbb{R}_+$, and, assuming A1 holds, let $R \in X_w$ be a function that satisfies
$$ \lim_{|t| \to \infty} \|R(t) w(t)\|_V = 0. \tag{5.1} $$
Let $B_0$ be defined as follows:

1. If $D = \mathbb{Z}^n_+$ or $\mathbb{Z}^n$, let $B_0$ be any nonempty subset of $X$ satisfying the following condition: for every $f \in B_0$ and $t \in c_a$, $\|f(t)\|_V \le \|R(a)\|_V$.
2. If $D = \mathbb{R}^n_+$ or $\mathbb{R}^n$, let $B_0$ be any nonempty subset of $X$ such that for every $f \in B_0$ and $t \in c_a$, $\|f(t)\|_V \le \|R(a)\|_V$. We assume additionally that:

Under Assumption A1, $\|f(t) - f(\tau)\|_V \le \beta(a)|t - \tau|$ for all $t, \tau \in c_a$ and every $f \in B_0$. Under Assumption A2, let $\delta$ be a function from $\mathbb{R}_+$ to $\mathbb{R}_+$ such that the discontinuities in $c_a$ of any $f \in B_0$ lie on at most $\delta(a)$ hyperplanes perpendicular to the coordinate axes, and for every $n$-interval of continuity $I$ of $f$, $\|f(t) - f(\tau)\|_V \le \beta(a)|t - \tau|$ for all $t, \tau \in I \cap c_a$ and every $f \in B_0$.

Let $B = \bar{B}_0$ denote the closure of $B_0$ in the norm $\|\cdot\|_w$. We say that a map $M \colon B \to E$ is $w$-localizable on $B$ if $M$ is continuous in the norm $\|\cdot\|_w$.
5.2.3 Motivation
A characterization for time-varying kernels
Here we give a characterization of $w$-localizability in the case of maps defined by convolution with a time-varying kernel. Consider input-output maps $M$ under Assumption A1, for the case $D = \mathbb{R}_+$, $V = \mathbb{R}$. Here $E$ is the linear space of $\mathbb{R}$-valued bounded continuous functions defined over $D$ with $\|y\|_E = \sup_{t \in D} |y(t)|$. Let the kernel $k(\cdot, \cdot)$ be a jointly measurable, bounded, real-valued function on $D \times D$.
Proposition 1: Let $M$ be a linear causal time-varying map on $B$ defined by
$$ (Mx)(t) = \int_0^t k(t, \tau) x(\tau) \, d\tau \tag{5.2} $$
for all $t \in \mathbb{R}_+$ and $x \in B$. $M$ is $w$-localizable if and only if
$$ \sup_{t \in \mathbb{R}_+} \int_0^t \frac{|k(t, \tau)|}{w(\tau)} \, d\tau < \infty. \tag{5.3} $$
Proof: Note that $M$ may be reexpressed in terms of the new kernel $k_w(t, \tau) = k(t, \tau)/w(\tau)$, since $w$ is continuous and does not vanish. Hence,
$$ (Mx)(t) = \int_0^t k_w(t, \tau) x(\tau) w(\tau) \, d\tau. $$
We will use the following well-known result [4], [61] (see also [20] for a succinct proof). Let $L_\infty$ be the linear space of all essentially bounded measurable real-valued functions on $D$, and let $g(\cdot, \cdot)$ be a jointly measurable real-valued function on $D \times D$. Then $G$ defined by
$$ (Gx)(t) = \int_0^t g(t, \tau) x(\tau) \, d\tau $$
is a bounded map from $L_\infty$ into $L_\infty$ if and only if
$$ \sup_{t \in \mathbb{R}_+} \int_0^t |g(t, \tau)| \, d\tau < \infty. $$
Sufficiency: Assume that $k_w$ satisfies the condition (5.3); then by the above-mentioned result there exists a constant $A > 0$ such that
$$ y \in L_\infty \;\Longrightarrow\; \|My\|_\infty \le A \|y\|_\infty. $$
Let $\epsilon > 0$ be given and choose $\delta = \epsilon/A$. Consider $x \in X_w$ such that $\|x\|_w < \delta$. Since $xw \in L_\infty$ and $\|xw\|_\infty < \delta$, $\|Mx\|_\infty \le A\delta = \epsilon$. Hence $M$ is $w$-localizable.

Necessity: Assume that $M$ is $w$-localizable, i.e., it is continuous in the norm $\|\cdot\|_w$, hence bounded. Therefore, there exists a constant $A' > 0$ such that
$$ x \in X_w \;\Longrightarrow\; \|Mx\|_\infty \le A' \|x\|_w. $$
We will show that $M$ as defined in (5.2) over $L_\infty$ is bounded, hence (5.3) is satisfied. Let $y \in L_\infty$ be arbitrary. We will use the following expressions for the output $My$ at an arbitrary but fixed time $t \ge 0$:
$$ (My)(t) = \int_0^t k(t, \tau) y(\tau) \, d\tau = \int_0^t k_w(t, \tau) y(\tau) w(\tau) \, d\tau. $$
Using Luzin's Theorem [29, p. 293], we can find a continuous real-valued function $y_c$ defined on $\mathbb{R}_+$ such that $\mu(L_t) < 1/K_t$ and $y_c(\tau) = y_c(t)$ for all $\tau > t$, where $L_t = \{\tau \in [0, t] \colon y_c(\tau) \neq y(\tau)\}$ and $K_t = \max_{\tau \in [0, t]} |k(t, \tau)|$. Hence
$$ |(My)(t)| \le \int_{L_t} |k(t, \tau) y(\tau)| \, d\tau + \int_{[0,t] \setminus L_t} |k_w(t, \tau) y_c(\tau) w(\tau)| \, d\tau $$
$$ \le K_t \|y\|_\infty \mu(L_t) + A' \|y_c\|_w \le \|y\|_\infty + A' \|y_c\|_\infty \le (1 + A')\|y\|_\infty. $$
The last inequality follows from the construction of $y_c$ given in [21, Theorem 7.5.2, p. 190], since $y_c$ may be chosen to satisfy $|y_c(\tau)| \le \|y\|_\infty$ for all $\tau \in [0, t]$.

Relationship of $w$-localizability to usual continuity
Here we consider linear input-output maps $M$ under Assumption A1, for the case $D = \mathbb{R}_+$, $V = \mathbb{R}$, with $E$ being the linear space of $\mathbb{R}$-valued bounded continuous functions defined over $D$ with the usual sup norm. We show that $w$-localizability is stronger than continuity in the usual norm. We also give sufficient conditions on the set of inputs $B$ to guarantee $w$-localizability given continuity in the usual norm.

Proposition 2: Let $M$ be a $w$-localizable linear map $M \colon B \to E$, where the inputs are bounded in the usual sense, i.e., $\|x\|_\infty < \infty$ for any input $x \in B$. Then $M$ is continuous in the usual norm $\|\cdot\|_\infty$.

Proof: Since $M$ is $w$-localizable, it is continuous in the norm $\|\cdot\|_w$ by definition. Hence, given any positive $\epsilon$ there exists a positive $\delta$ such that $\|x\|_w < \delta \Rightarrow \|Mx\|_\infty < \epsilon$. Since $w \colon D \to (0, 1]$, we have $\|x\|_\infty < \delta \Rightarrow \|x\|_w < \delta$. Hence $M$ is continuous in the norm $\|\cdot\|_\infty$.

Now we give sufficient conditions on $B$ so that $w$-localizability follows from continuity in the usual norm. Let $B$ satisfy the following conditions:²

1. For any input $x \in B$, $\|x\|_\infty < \infty$.
2. There exists a positive $T$ with the following property: for every $x \in B$ there exists a $\lambda_x \in (0, 1)$ such that $\sup_{\tau > T} |x(\tau)| < \lambda_x \|x\|_\infty$.
Proposition 3: Let $M$ be a linear map $M \colon B \to E$, where $B$ satisfies the above conditions. Assume that $M$ is continuous in the usual norm $\|\cdot\|_\infty$. Then $M$ is $w$-localizable for any nonincreasing $w \in W$.

Proof: Let $T > 0$ be chosen according to the above conditions. Let $w \in W$ be any nonincreasing weight. Note that
$$ \|x\|_w = \max\Big\{ \max_{t \in [0,T]} |x(t) w(t)|,\ \sup_{t \in (T,\infty)} |x(t) w(t)| \Big\}. $$
Since $w$ is nonincreasing,
$$ \max_{t \in [0,T]} |x(t) w(t)| \ge w(T) \max_{t \in [0,T]} |x(t)|, $$
and
$$ \sup_{t \in (T,\infty)} |x(t) w(t)| \le w(T) \sup_{t \in (T,\infty)} |x(t)|. $$
Since $\sup_{t \in (T,\infty)} |x(t)| < \lambda_x \|x\|_\infty < \|x\|_\infty$, it follows that $\max_{t \in [0,T]} |x(t)| = \|x\|_\infty$. Hence
$$ \|x\|_w = \max_{t \in [0,T]} |x(t) w(t)| \ge w(T) \max_{t \in [0,T]} |x(t)| = w(T) \|x\|_\infty. $$
It follows that
$$ \|x\|_\infty \le w(T)^{-1} \|x\|_w, \qquad x \in B. $$
Since $M$ is continuous in the usual norm $\|\cdot\|_\infty$, given any positive $\epsilon$ there exists a positive $\delta$ such that $\|x\|_\infty < \delta \Rightarrow \|Mx\|_\infty < \epsilon$. Let $\delta' = \delta w(T)$. Hence $\|x\|_w < \delta' \Rightarrow \|x\|_\infty < \delta$. Combining the above implications, we have the desired result.

² These conditions are met by, for example, a set $B$ consisting of functions with support contained in a fixed finite interval.
5.2.4 Approximating structures

We denote by $X_w^*$ the set of bounded linear functionals on $X_w$ (i.e., the set of bounded linear maps from $X_w$ to the reals $\mathbb{R}$). Given a compact subset $C$ of $X_w$, let $Y$ be any set of continuous maps from $X_w$ to $\mathbb{R}$ that is dense in $X_w^*$ on $C$, in the sense that for each $\lambda \in X_w^*$ and any $\epsilon > 0$ there is a $y \in Y$ such that $|\lambda(x) - y(x)| < \epsilon$, $x \in C$. Unless otherwise mentioned, $w \in W$ will henceforth be fixed. Also, let $U$ be any set of continuous maps $u \colon \mathbb{R} \to \mathbb{R}$ such that given $\epsilon > 0$ and any bounded interval $(\alpha_1, \alpha_2) \subset \mathbb{R}$ there exists a finite number of elements $u_1, \ldots, u_\ell$ of $U$ for which $|\exp(\alpha) - \sum_j u_j(\alpha)| < \epsilon$ for $\alpha \in (\alpha_1, \alpha_2)$.³ Let $T$ denote the set of maps from $B$ to $E$ of the form
$$ \sum_{i=1}^m \sum_{j=1}^{k(i)} c_{ij} u_{ij}[y_{ij}(\cdot)] v_i \tag{5.4} $$
where $m \in \mathbb{N}$, $k(i) \in \mathbb{N}$ for $1 \le i \le m$, and $c_{ij} \in \mathbb{R}$, $u_{ij} \in U$, $y_{ij} \in Y$, $v_i \in E$ for $1 \le i \le m$, $1 \le j \le k(i)$. A typical element $M$ of $T$ can be realized by what may be called a tensor product neural network, since, by defining the continuous functionals $\phi_i(\cdot) = \sum_{j=1}^{k(i)} c_{ij} u_{ij}[y_{ij}(\cdot)]$, we see immediately that $M(\cdot) = \sum_{i=1}^m \phi_i(\cdot) v_i$ is an element of the so-called tensor product [13], and that $M$ can be implemented as a neural network as shown in Figure 5.1.
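The form (5.4) can be sketched directly in code. The small instance below ($m = 2$, $k(1) = k(2) = 1$, with an exponential and a linear nonlinearity, coordinate-evaluation functionals, and basis vectors of $\mathbb{R}^2$ standing in for elements of $E$) is hypothetical and serves only to show the structure.

```python
import numpy as np

def tensor_product_network(c, u, y, v):
    """Realize M(x) = sum_i ( sum_j c[i][j] * u[i][j]( y[i][j](x) ) ) * v[i],
    the form (5.4); v[i] are (finitely sampled) elements of the output space E."""
    def M(x):
        out = np.zeros_like(v[0], dtype=float)
        for ci, ui, yi, vi in zip(c, u, y, v):
            phi = sum(cij * uij(yij(x)) for cij, uij, yij in zip(ci, ui, yi))
            out = out + phi * vi
        return out
    return M

# Hypothetical instance of (5.4).
M = tensor_product_network(
    c=[[1.0], [2.0]],
    u=[[np.exp], [lambda t: t]],
    y=[[lambda x: float(x[0])], [lambda x: float(x[1])]],
    v=[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
)
```

Each inner sum is one functional $\phi_i$; the outer loop forms the tensor-product combination $\sum_i \phi_i(x)v_i$.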
5.3 Characterization of Continuity

Let $H$ be a compact metric space. The linear space $\mathcal{G}$ is the set of all maps $G \colon H \to E$ such that
$$ \|G\|_{\mathcal{G}} = \sup_{x \in H} \|Gx\|_E < \infty. $$

³ Of course we can take $U$ to be the set whose only element is $\exp(\cdot)$, or the set $\{u \colon u(\alpha) = \alpha^n/n!,\ n \in \{0, 1, \ldots\}\}$.
[Figure 5.1(a) shows the input $x \in B$ feeding the functionals $\phi_1, \ldots, \phi_m$, whose outputs multiply the elements $v_1, \ldots, v_m$ and are summed to produce $Mx$: the network representation of the tensor product $M(x) = \sum_{i=1}^m \phi_i(x) v_i$. Figure 5.1(b) shows $x$ feeding $y_{i1}, \ldots, y_{ik(i)}$, then $u_{i1}, \ldots, u_{ik(i)}$, combined with the weights $c_{i1}, \ldots, c_{ik(i)}$: the neural network implementation of one of the input functionals $\phi_i(x) = \sum_{j=1}^{k(i)} c_{ij} u_{ij}[y_{ij}(x)]$.]

Figure 5.1: A tensor product neural network $M$.
Note that the set of continuous maps, denoted by $C(H, E)$, is a subspace of $\mathcal{G}$. To give an example of a discontinuous map $G \in \mathcal{G}$, consider disjoint nonempty compact subsets $C_1, \ldots, C_m$ of $H$ with metric $\rho$ and the distance between $A, B \subset H$ given by $d(A, B) = \inf\{\rho(a, b) \colon a \in A,\ b \in B\}$. With $C_0 = H \setminus \bigcup_{i=1}^m C_i$, let $G$ be defined by
$$ Gx = i, \quad \text{if } x \in C_i,\ i \in \{0, 1, \ldots, m\}, $$
where it is assumed that
$$ d(C_i, C_0) = 0, \qquad i = 1, \ldots, m. $$
5.3.1 The approximation theorem

Theorem 5: Let $G \colon B \to E$ be given, and let $T$ be defined as in (5.4) with $C = B$. Then the following are equivalent.
1. $G$ is $w$-localizable on $B$.
2. For any $\epsilon > 0$, there exists $M \in T$ such that $\|G - M\|_{\mathcal{G}} < \epsilon$.

Proof: (1) $\Rightarrow$ (2). We first prove that $B_0$ is relatively compact in the norm $\|\cdot\|_w$. Define a sequence of operators $T_1, T_2, \ldots$ from $X_w$ into itself for the continuous and discrete input domains separately:

1. ($D = \mathbb{Z}^n_+$ or $D = \mathbb{Z}^n$)
$$ (T_k s)(t) = \begin{cases} s(t), & \text{if } t \in c_k \\ \theta, & \text{otherwise.} \end{cases} \tag{5.5} $$

2. ($D = \mathbb{R}^n_+$ or $D = \mathbb{R}^n$) Under assumption A1, we define $T_k$ by the following extension. Note that the norm $|\cdot|$ in $D$ is uniformly convex. Since $c_k$ is closed and convex, for any $x \in D$ there exists a unique $\hat{x} \in c_k$ such that $|x - \hat{x}| = \inf_{y \in c_k} |x - y|$. Let
$$ (T_k s)(x) = s(\hat{x}). \tag{5.6} $$
Under assumption A2, we let $T_k$ be defined as in (5.5). We use the following tool theorem.
Theorem 6 ([52]): Let $S$ be a subset of a complete metric space $A$ with metric $\rho$, and let $T_1, T_2, \ldots$ be maps of $A$ into itself such that
(i) $T_k(S)$ is a relatively compact subset of $A$ for each $k$, and
(ii) $\rho(s, T_k s) \to 0$ as $k \to \infty$ uniformly for $s \in S$.
Then $S$ is a relatively compact subset of $A$.

With $A = X_w$, $S = B_0$, and the metric induced by the norm $\|\cdot\|_w$, the theorem implies (by the verification of its hypotheses in Appendix B) that $B_0$ is relatively compact in the norm $\|\cdot\|_w$. Since $G$ is $w$-localizable, $G$ is continuous in the norm $\|\cdot\|_w$ by definition, i.e., $G \in C(B, E)$. The space of all continuous functionals on $B$, $C(B, \mathbb{R})$, is denoted by $C(B)$ as usual. Let $A_1$ be a subspace of $C(B)$. The so-called tensor product [13] of $A_1$ and $E$ (denoted by $A_1 \otimes E$) is the linear manifold of $C(B, E)$ consisting of finite sums of the form $\Phi(\cdot) = \sum_{j=1}^k a_j(\cdot) v_j$, where $a_j \in A_1$ and $v_j \in E$. We need the following technical proposition.

Proposition 4: $C(B) \otimes E$ is dense in $C(B, E)$.

Proof of the Proposition: Let $\mathcal{M} = C(B) \otimes E$, and let $\mathcal{M}(x) = \{f(x) \colon f \in \mathcal{M}\}$. Since $C(B)$ contains the constant function taking the value 1, $\mathcal{M}(x) = E$ for every $x \in B$. The proposition follows from [7, Corollary 1].

Continuing with the proof of the theorem, and using the above Proposition, there exist continuous functionals $a_i$ and elements $v_i$ of $E$ such that
$$ \Big\| Gu - \sum_{i=1}^m a_i(u) v_i \Big\|_E < \epsilon/2, \qquad u \in B. $$
By Theorem 1 of [51], there exist $k(i) \in \mathbb{N}$, $c_{ij} \in \mathbb{R}$, $u_{ij} \in U$, $y_{ij} \in Y$, $1 \le j \le k(i)$, such that
$$ \Big| a_i(u) - \sum_{j=1}^{k(i)} c_{ij} u_{ij}[y_{ij}(u)] \Big| < \epsilon/(2mv), \qquad u \in B,\ 1 \le i \le m, $$
where $v = \max_{1 \le i \le m} \|v_i\|_E$. Combining the above estimates using the triangle inequality, we get the desired result.
(2) $\Rightarrow$ (1). Given $\epsilon > 0$, let $\hat{G} \in T$ be chosen such that
$$ \|Gu - \hat{G}u\|_E < \epsilon/3, \qquad u \in B. \tag{5.7} $$
Note that $\hat{G}$ is continuous on $B$, which is compact; hence $\hat{G}$ is uniformly continuous, i.e., there exists $\delta > 0$ such that
$$ u, v \in B,\ \|u - v\|_w < \delta \;\Longrightarrow\; \|\hat{G}u - \hat{G}v\|_E < \epsilon/3. \tag{5.8} $$
By (5.7), (5.8) and the triangle inequality,
$$ \|Gu - Gv\|_E \le \|Gu - \hat{G}u\|_E + \|\hat{G}u - \hat{G}v\|_E + \|\hat{G}v - Gv\|_E < \epsilon. \tag{5.9} $$
From the above, $G$ is continuous in the norm $\|\cdot\|_w$; hence $G$ is $w$-localizable.

More generally, the following result is true. Again, with $H$ a compact metric space, let $P$ be the set of all maps of the form $x \mapsto \sum_{i=1}^m \phi_i(x) v_i$, where $\phi_i \in C(H)$ and $v_i \in E$.
Scholium 1: A map $G \colon H \to E$ is continuous if and only if for any $\epsilon > 0$ there exists $M \in P$ such that $\|Gx - Mx\|_E < \epsilon$, $x \in H$.

Proof: Note that the proof of sufficiency can be given exactly analogously to that of Theorem 5, since any $M \in P$ is (uniformly) continuous on $H$. Similarly for the necessity: note that we can stop after using Proposition 4 in the proof of Theorem 5.
5.4 Applications to Time-invariant Systems Here we exhibit a large class of applications of our last result in the area of approximation of causal time-invariant systems.
5.4.1 Notation

Let $D = \mathbb{Z}_+$, and let $X$ be the linear space of real-valued sequences with the metric
$$ \rho(x, y) = \sum_{k \in D} 2^{-k} \frac{|x(k) - y(k)|}{1 + |x(k) - y(k)|}. $$
With an arbitrary positive number $c$, let $S = \{x \in X \colon |x(t)| \le c,\ t \in \mathbb{Z}_+\}$. From the criteria of compactness in $X$ [35, Theorem 1, p. 202, vol. II], $S$ is compact. For every $\sigma \in D$, let the delay and advance operators on $X$ be defined by
$$ (T_\sigma x)(t) = \begin{cases} x(t - \sigma), & t \ge \sigma \\ 0, & t < \sigma \end{cases} \qquad\qquad (T^\sigma x)(t) = x(t + \sigma), \quad t \in D. $$
Note that $S$ is closed under $T_\sigma$ and $T^\sigma$. Let $U = S$. We say that $M \colon U \to X$ is time-invariant if $MT_\sigma = T_\sigma M$ for every $\sigma$. Let $s|_{c_a}$ stand for the restriction of $s \in X$ to $c_a$. As usual, we say that a map $M \colon U \to X$ is causal if for all $a \in D$ and $x, y \in S$, $x|_{c_a} = y|_{c_a}$ implies $(Mx)(a) = (My)(a)$. Let $W_{t,a} \colon X \to X$ for $t, a \in D$, $a > 0$, be a windowing operator defined by
$$ (W_{t,a} x)(\tau) = \begin{cases} x(\tau), & t - a \le \tau \le t \\ 0, & \text{otherwise.} \end{cases} $$
We say that $G \colon U \to X$ has approximately finite memory on $S$ if given any $\epsilon > 0$ there is an $a \in D$, $a > 0$, such that
$$ |(Gs)(t) - (GW_{t,a} s)(t)| < \epsilon, \qquad t \in D,\ s \in S. $$
Let $S_a = \{s|_{c_a} \colon s \in S\}$ be topologized by a metric $\rho_a$. For any $a \in D$, the associated functional $M_a$ for a causal operator $M \colon U \to X$ is defined on $S_a$ by $M_a x = (My)(a)$, $x \in S_a$, where $y \in S$ is chosen such that $y|_{c_a} = x$.
5.4.2 Approximation of discrete-time causal time-invariant systems

In this section, by "$G$ meets hypothesis H" we mean that $G \colon U \to X$ is a causal, time-invariant map with approximately finite memory on $S$, and $G_a$ is uniformly continuous on $S_a$ for every $a$. Here,
$$ \rho_a(x, y) = \sum_{k=1}^a 2^{-k} \frac{|x(k) - y(k)|}{1 + |x(k) - y(k)|}. $$

Lemma 3: If $G \colon U \to X$ meets hypothesis H, then for any $\epsilon > 0$ there exist $m \in \mathbb{N}$, $\phi_i \in C(S)$, and $v_i \in X$, $i = 1, \ldots, m$, such that
$$ \Big| (Gs)(t) - \sum_{i=1}^m \phi_i(s) v_i(t) \Big| < \epsilon, \qquad t \in D,\ s \in S. $$
Proof: We only need to show that $G \in C(S, X)$, since the result then follows from Scholium 1. Let $\epsilon$ be an arbitrary positive number. Using the approximately finite memory of $G$, choose $a \in D$ such that
$$ |(Gs)(t) - (GW_{t,a} s)(t)| < \epsilon/3, \qquad t \in D,\ s \in S. $$
Hence, for $s_1, s_2 \in S$,
$$ |(Gs_1)(t) - (Gs_2)(t)| \le |(Gs_1)(t) - (GW_{t,a} s_1)(t)| + |(GW_{t,a} s_1)(t) - (GW_{t,a} s_2)(t)| + |(Gs_2)(t) - (GW_{t,a} s_2)(t)| $$
$$ < 2\epsilon/3 + |(GW_{t,a} s_1)(t) - (GW_{t,a} s_2)(t)|. \tag{5.10} $$
Now we consider two cases:

1. If $t \ge a$, let $v \in S_a$ be defined by $v(\tau) = (T^{t-a} W_{t,a} s)(\tau)$, $\tau \in c_a$. Using the causality and time-invariance of $G$,
$$ (GW_{t,a} s)(t) = (GT_{t-a} T^{t-a} W_{t,a} s)(t) = (GT^{t-a} W_{t,a} s)(a) = G_a(v). $$
2. If $t < a$, let $v \in S_a$ be defined by $v(\tau) = (T_{a-t} s)(\tau)$, $\tau \in c_a$ (recall that $S$ is closed under $T_\sigma$). Using the causality and time-invariance of $G$ again,
$$ (GW_{t,a} s)(t) = (Gs)(t) = (GT_{a-t} s)(a) = G_a(v). $$
Since $G_a$ is uniformly continuous in the $\rho_a$ metric on $S_a$, we can choose $\delta > 0$ such that $|G_a(w) - G_a(w')| < \epsilon/3$ for $w, w' \in S_a$ satisfying $\rho_a(w, w') < \delta$. It is clear that $\rho(s_1, s_2) < \delta \Rightarrow \rho_a(s_1|_{c_a}, s_2|_{c_a}) < \delta$, hence
$$ |(Gs_1)(t) - (Gs_2)(t)| < 2\epsilon/3 + |G_a(v_1) - G_a(v_2)| < \epsilon. $$
Thus, $G \colon S \to X$ is continuous.
5.4.3 A conjecture about nonlinear feedback systems

Here the writer would like to conjecture that our result above can be used to approximate members of a large class of discrete-time stable nonlinear feedback systems using networks implementing the so-called tensor products. Consider the closed-loop input-output maps of feedback control systems (see Figure 5.2) containing a time-invariant sector nonlinearity represented by the operator $N$ and a linear time-invariant system $L$ for which the circle condition [45] is met. The conjecture is that, under these conditions, the map $G$ taking the input $r$ to the output $y$ meets hypothesis H (similar to [50, Theorem 4], but in a discrete-time setting) of Lemma 3, hence it can be approximated uniformly with tensor product networks.
[Figure 5.2 shows a feedback loop: the input $r$ and the fed-back output $y$ are combined into the error $e$, which drives the nonlinearity $N$; its output $w$ drives the linear system $L$, producing the output $y$.]

Figure 5.2: Nonlinear control system.

Results along similar lines can be proved for the case of continuous-time systems, as well as for more general domains of input functions, e.g., $D = \mathbb{R}^n_+$ or $\mathbb{Z}^n_+$, or $\mathbb{R}^n$ or $\mathbb{Z}^n$, but only $D = \mathbb{R}_+$ is considered next.
5.4.4 Continuous-time systems on the half-line

Here we treat the case $D = \mathbb{R}_+$. We use the same formal definitions for the delay, advance and windowing operators and the concepts of time-invariance, causality, approximately finite memory and associated functionals as in Section 5.4.1, keeping in mind that now $X = L_2(0, \infty)$. We also abbreviate the latter to just $L_2$. Let $U$ be the collection of equivalence classes of Lebesgue measurable real-valued functions on $D$. Note that $U$ is closed under the $T_t$ and $T^t$. We assume that $S$, a nonempty subset of $U$, satisfies the following conditions:

1. $S$ is closed under the $T^t$.
2. $S$ is a compact subset of $L_2$. (It follows that $S_a$ is a compact subset of $U_a$ in the metric
$$ \rho_a(x, y) = \left( \int_0^a |x(t) - y(t)|^2 \, dt \right)^{1/2} $$
for every $a > 0$, since the map $x \mapsto x|_{c_a}$ is continuous.)

To get a class of examples of $S$, we introduce a concept reminiscent of the Hilbert cube [29, Example 5, p. 98] in one respect, as follows. Let $\alpha$ be any nondecreasing nonnegative function defined on $\mathbb{R}_+$. Assume that $\psi \in L_2$ is a bounded function, and let $\beta > 0$ be given. $D(\alpha, \beta, \psi) \subset U$ is called a $\psi$-fundamental set if every $x \in D(\alpha, \beta, \psi)$ satisfies the following:

1) For any positive $T$, the number of discontinuities of $x$ in $[0, T]$ is at most $\alpha(T)$.
2) On every interval of continuity $I$ of $x$,
$$ |x(t_1) - x(t_2)| \le \beta |t_1 - t_2|, \qquad \forall\, t_1, t_2 \in I. $$
3) $|x(t)| \le |\psi(t)|,\ t \in \mathbb{R}_+$.

It is clear that a $\psi$-fundamental set is a subset of $L_2$. The proof of the following proposition can be given using some of the ideas from Section 3.2.2 and from [29, Theorem 3, p. 101].
Proposition 5 $D(\nu, \lambda, \psi)$ is relatively compact in $L_2(0, \infty)$.
It is clear that $S$ can be taken to be the closure of any $D(\nu, \lambda, \psi)$ that is closed under the $T^t$ (which holds trivially if $\psi$ is nonincreasing).
Lemma 4 If $G\colon$ U $\to X$ meets hypothesis H and $S$ satisfies the two conditions stated at the beginning of this subsection, then for any $\epsilon > 0$ there exist $m \in \mathbb{N}$, $\phi_i \in C(S)$, $v_i \in X$, $i = 1, \ldots, m$, and $a > 0$ such that
$$\left| (Gs)(t) - \sum_{i=1}^{m} \phi_i(s)\, v_i(t) \right| < \epsilon, \quad t \ge a,\ s \in S.$$
The proof follows along the same lines as that of Lemma 3, except that the case $t < a$ there is not needed, and the uniform continuity of $G_a$ on $S_a$ is now in the $\rho_a$ metric as defined above.
5.4.5 Approximation of stable nonlinear feedback systems
Here we show that our result above is applicable to the problem of approximating members of a large class of stable nonlinear feedback systems using tensor product networks. Consider a closed-loop input-output map $G$ of a feedback control system (see Figure 5.2) containing a time-invariant sector nonlinearity for which the circle condition [44] is met.⁴ In particular, assume that the signals $r$, $e$, $w$, and $y$ belong to $L_{\infty e}$, the space of real-valued Lebesgue measurable functions on $\mathbb{R}_+$ that are bounded on bounded subintervals. Also suppose that the initial conditions are zero, and that $w$ and $y$ are related by
$$y(t) = (Lw)(t) = \int_0^t k(t - \tau)\, w(\tau)\, d\tau, \quad t \in D.$$
Here the kernel $k\colon \mathbb{R}_+ \to \mathbb{R}$ is such that its Laplace transform $K$ is a strictly proper rational function without any pole-zero cancellation. The nonlinear part of the system is assumed to be given by the operator $(Nx)(t) = \eta[x(t)]$, $t \in D$, where $\eta\colon \mathbb{R} \to \mathbb{R}$ is a time-invariant nonlinearity such that $\eta(0) = 0$, and there are two real constants $\alpha, \beta$ with the following property:
$$\alpha \le \frac{\eta(b) - \eta(a)}{b - a} \le \beta, \quad a \ne b.$$
We assume that one of the following "circle conditions" holds:
1. $0 < \alpha < \beta$. The locus of $K(i\omega)$ for $-\infty < \omega < \infty$ is bounded away from the circle $C_1$ in the complex plane of radius $(\alpha^{-1} - \beta^{-1})/2$ centered at $((-\alpha^{-1} - \beta^{-1})/2,\, 0)$, and encircles $C_1$ in the counterclockwise direction $n_p$ times, where $n_p$ is the number of poles of $K$ with positive real parts.
2. $0 = \alpha < \beta$. $K$ has no poles in the open right-half plane, and $\mathrm{Re}[K(i\omega)] > -\beta^{-1}$ for all real $\omega$ for which $\mathrm{Re}[K(i\omega)]$ is finite.
3. $\alpha < 0 < \beta$. $K$ has no poles in the closed right-half plane, and the locus of $K(i\omega)$ for $-\infty < \omega < \infty$ is contained within the circle $C_2$ in the complex plane of radius $(\beta^{-1} - \alpha^{-1})/2$ centered at $((-\alpha^{-1} - \beta^{-1})/2,\, 0)$.
Let $V$ be the solution map (taking $u$ into $v$) of the integral equation
$$u(t) = v(t) + \int_0^t k(t - \tau)\, \eta[v(\tau)]\, d\tau, \quad t \in D,$$
and let the input-output map $G$ of the closed-loop feedback system in Figure 5.2 be given by $y = Gr = LNVr$.
Theorem 7 Assume that the inputs $r$ to the feedback system in Figure 5.2 belong to a set $S$ that satisfies the conditions stated at the beginning of Section 5.4.4. If a circle condition is met, then the input-output map $G$ can be approximated uniformly with tensor product networks.
⁴ For more information about approximation of causal time-invariant systems using a different network architecture, and about the relationship of the circle condition with approximately finite memory, see [50], on which the subsequent discussion is based.
Proof: Note that under these conditions $G$ meets hypothesis H [50, Theorem 4 and comments in that section]; hence the result follows by Lemma 4.
5.5 An Algorithm and its Error Analysis
In this section we give another scheme for the approximation of input-output maps. The result given here has the advantage over the one proved in the previous section that an error analysis is provided. Let $F$ be a Banach space with a basis denoted by $\{y_k\}_{k=1}^\infty$, i.e., every $v \in F$ has a unique representation $v = \sum_{k=1}^\infty a_k(v)\, y_k$, in the sense of [31]. The requirement of the existence of a basis is not very restrictive, since bases have been constructed for most of the useful separable Banach spaces; however, it rules out nonseparable spaces such as $L_\infty$. Without loss of generality, we assume that the set $\{y_k\}_{k=1}^\infty$ is linearly independent. We give error bounds for an algorithm to construct an approximation to a continuous operator $G\colon B \to F$. Let the remainder $R_m(v)$ after $m$ terms of the expansion of a $v \in F$ with respect to the basis be defined by
$$R_m(v) = \sum_{k=m+1}^{\infty} a_k(v)\, y_k.$$
Since $B$ is compact, $G(B)$ is a compact subset of $F$, which has a basis; hence given $\epsilon > 0$, there exists $m \in \mathbb{N}$ such that (see [31, p. 136])
$$\|R_m(Gu)\|_F < \epsilon, \quad u \in B.$$
Let $Y_m$ denote the span of $y_1, \ldots, y_m$. Define
$$S_m(v) = \sum_{k=1}^{m} a_k(v)\, y_k, \quad v \in F.$$
Note that $S_m$ is a projection of $F$ onto $Y_m$ which is continuous in $\|\cdot\|_F$, since the $a_k$ are continuous functionals [31]. Hence $S_m(G(B))$ is compact, thus bounded. Note that the coordinate isomorphism $\Phi$ given by $\Phi(v) = (a_1(v), \ldots, a_m(v))$ for $v = \sum_{k=1}^m a_k(v)\, y_k$ from $Y_m$ onto $\mathbb{R}^m$ is linear and one-to-one. Hence the function
$$\langle v, w \rangle_{Y_m} = \langle \Phi v, \Phi w \rangle, \quad v, w \in Y_m \tag{5.11}$$
defines an inner product on $Y_m$. The inner product induces a norm $\|\cdot\|_{Y_m}$ which, due to the finite-dimensionality of $Y_m$, is equivalent to $\|\cdot\|_F$ in $Y_m$, i.e., there exist positive constants $\alpha, \beta$ such that
$$\alpha \|v\|_{Y_m} \le \|v\|_F \le \beta \|v\|_{Y_m}, \quad v \in Y_m.$$
Let $r > 0$ be chosen such that $B_r = \{f \in Y_m \mid \|f\|_{Y_m} < r\}$ contains $S_m(G(B))$. Let $\partial B_r = \{f \in Y_m \mid \|f\|_{Y_m} = r\}$. For a subset $A$ of a real topological vector space, let $\mathrm{co}(A)$ denote the convex hull of $A$, and let $\overline{\mathrm{co}}(A)$ denote the closure of the convex hull of $A$. Note that $\overline{B_r} = \overline{\mathrm{co}}(\partial B_r)$; hence by the algorithm given in Chapter 2 there exists $c > 0$ such that, given $u \in B$, $S_m(Gu)$ can be approximated by $f_n(u) \in \mathrm{co}(\partial B_r)$ such that
$$\|S_m(Gu) - f_n(u)\|_{Y_m} \le c/\sqrt{n}.$$
Also note that for every $u \in B$, $f_n(u)$ has the form
$$f_n(u) = \sum_{k=1}^{m} \sum_{i=1}^{n} \lambda_i(u)\, a_k(z_i(Gu))\, y_k, \qquad \lambda_i(u) \ge 0, \qquad \sum_{i=1}^{n} \lambda_i(u) = 1,$$
where $z_i \in \partial B_r$. Now it is easy to derive the error bound for such an approximation scheme. For any $u \in B$,
$$\begin{aligned}
\|Gu - f_n(u)\|_F &= \|S_m Gu + R_m Gu - f_n(u)\|_F \\
&\le \|S_m Gu - f_n(u)\|_F + \|R_m Gu\|_F \\
&\le \beta\, \|S_m Gu - f_n(u)\|_{Y_m} + \epsilon \\
&\le \frac{\beta c}{\sqrt{n}} + \epsilon,
\end{aligned} \tag{5.12}$$
where $\beta$ is the constant in the equivalence of $\|\cdot\|_F$ and $\|\cdot\|_{Y_m}$ on $Y_m$.
We have shown, for every $\epsilon > 0$, the existence of a number $k$ such that $\|Gu - f_n(u)\|_F \le \epsilon + k n^{-1/2}$ uniformly for $u \in B$. Note that the approximating structures $f_n$ have the form
$$f_n(u) = \sum_{k=1}^{m} \mu_k(u)\, y_k, \qquad \text{where} \qquad \mu_k(u) = \sum_{i=1}^{n} \lambda_i(u)\, a_k(z_i(Gu)),$$
and that $f_n$ may be implemented as a neural network. This approximation is similar to that considered in the previous section, but differs in the detailed form of the weighting functionals and in the requirement of the existence of a basis for the output space.
5.5.1 Remark
If $F$ has an inner-product structure, then from the boundedness of $G(B)$ and the algorithm given in Chapter 2, we can find a constant $c > 0$ such that $\|Gu - f_n(u)\|_F \le c\, n^{-1/2}$ uniformly for $u \in B$.
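The $c/\sqrt{n}$ behavior of the convex-combination step can be illustrated numerically. The following sketch (an illustration under simplifying assumptions, not the algorithm of Chapter 2) approximates the center of the unit ball of $\mathbb{R}^5$, a point of $\overline{\mathrm{co}}(\partial B_1)$, by equally weighted averages of $n$ randomly chosen points of the sphere $\partial B_1$; the observed error decays roughly like $n^{-1/2}$.

```python
import math
import random

def sphere_point(dim, rng):
    """A uniformly random point on the unit sphere of R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    r = math.sqrt(sum(t * t for t in v))
    return [t / r for t in v]

def avg_error(n, dim, rng):
    """Distance from the target 0 (a point of the closed convex hull of the
    sphere) to the arithmetic average of n random sphere points; since the
    points are independent with mean 0 and norm 1, E||average||^2 = 1/n."""
    s = [0.0] * dim
    for _ in range(n):
        s = [a + b for a, b in zip(s, sphere_point(dim, rng))]
    return math.sqrt(sum((a / n) ** 2 for a in s))

rng = random.Random(0)
errs = {n: avg_error(n, 5, rng) for n in (100, 10000)}
# errs[n] decays roughly like n**-0.5 as n grows.
```

The equally weighted (arithmetic) average here is the same special convex combination exploited in Lemma 5 of Chapter 7.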
5.6 Applications
Theorem 5 is a result to the effect that under mild conditions finite sums of the form
$$\sum_{i=1}^{m} \sum_{j=1}^{k(i)} c_{ij}\, u_{ij}[y_{ij}(\cdot)]\, v_i$$
can approximate the input-output maps of a certain kind of nonlinear dynamical system taking a subset $B$ of $X$, the linear space of $\mathbb{R}^n$-valued functions defined over $\mathbb{R}$, into $E$, the linear space of $\mathbb{R}$-valued functions defined over $\mathbb{R}$. Here the $c_{ij}$ are real constants, the $u_{ij}$ are certain continuous real-valued functions of the reals, the $y_{ij}$ are continuous real functionals defined on $X$, and the $v_i$ are elements of $E$. A similar result for the case of discrete-domain systems can also be deduced.
5.6.1 Dynamic signal classification
Many authors have considered dynamic neural network classifiers of signals in the literature, e.g., [22]. In a typical classifier of this kind, the inputs are bounded $\mathbb{R}^d$-valued sequences of feature vectors and the outputs are real-valued sequences which indicate how closely a segment of the past input matches a class, where classes are some predetermined nonoverlapping sets of input sequences with finite support. Usually such classifiers have exactly finite memory and are time-invariant, but they have never been studied from the viewpoint of localizability or continuity of the associated functionals. Our result shows that a neural network of the type described above can approximate such a dynamic classifier arbitrarily well if it is w-localizable, or has continuous associated functionals.
5.6.2 Image processing
Shift-invariant adaptive filters [2] are often used in image processing applications. In this setting, D is usually taken to be $\mathbb{Z}^2$ (however, it could be $\mathbb{R}^2$), and assumption A1 holds. The most commonly used such filters have exactly finite memory, but their localizability (or the continuity of their associated functionals) has never been studied. It follows that they can be approximated arbitrarily well by neural network structures if they are w-localizable or have continuous associated functionals.
5.7 Conclusion
In the settings of both discrete and continuous domains for inputs and outputs, we have proved results concerning the approximation of certain nonlinear dynamic systems using the neural network architectures introduced here. We have characterized the time-varying systems for which our results hold, and demonstrated a large class of time-invariant systems which are of importance in engineering applications. The inputs are not restricted to be defined on finite intervals. Input and output functions with multidimensional domains and ranges, as well as $L_p$ norms, are treated. We have given two schemes for the approximation of such maps, and have provided an error analysis for one of the schemes using the algorithm described in Chapter 2.
Chapter 6 Robust Universal Neural Controllers
6.1 Introduction
Here we give a solution to the problem of approximating all members of a class of equicontinuous controllers arbitrarily well using a fixed basis function. Traditionally a common choice of the basis function has been the Gaussian; hence the translates of a basis function are usually called "centers", and the dilation is called the "width". We use a fixed finite set of centers and a fixed width to approximate any member of the given class. This property is important in practical applications because of the need for robustness with respect to plant variations or modeling uncertainties [33]. In the case of discrete-time continuous controllers that are time-invariant and possess finite memory, our problem reduces to the problem of simultaneously approximating a class of continuous functions defined on a finite-dimensional Euclidean space. This is the setting adopted in this work. The continuous functions to be approximated are called "target functions" or "target controllers", with a slight abuse of notation. We show that the problem can be solved by the so-called radial basis function neural controllers (RBFNCs). Approximations with radial basis functions have been studied in the literature by many authors. See [36] for an early theoretical result, and [53] for a control application. However, in most cases only a single function (or a single controller) is approximated. Simultaneous approximation of a class of functions using radial basis functions was obtained by Chen and Chen [12]. In contrast to [12], we are not restricted to using an activation function that is radially symmetric with respect to the Euclidean norm. This feature allows the use of elliptic basis functions, which are more suitable for certain applications in practice [37], [9]. Also, in our work the same width is used on all nodes, and hence the number of free parameters of the network is reduced.
This results in a simplification of the parameter estimation problem. A minor distinction is that in our case the class of target functions need not be compact; relative compactness is sufficient. A particularly important difference is our emphasis on the application of our simultaneous approximation result to the engineering problem of discrete-time control. In [53] adaptive control applications of radial basis function networks are considered. The principal difference between the present work and [53] is that we give a robust design of the width and centers (and a prescription for the weights) for a class of plants, as compared to an adaptive approach for a single plant. Also, the basis function in our case is not restricted to be Gaussian, nor is it required to be radial with respect to the Euclidean norm. An important difference is that our choice of centers for the radial basis function network is not restricted to a uniform grid. Hence our designs are not subject to an exponential growth in the number of nodes used in the network to achieve increasingly accurate approximations, a phenomenon usually described as "the curse of dimensionality". In fact, our work in conjunction with the error bounds for radial basis function network approximation in [23] constitutes an efficient approximation of controllers in terms of the network size. Our result also addresses the problem of retraining a network on a new target function (other than the original one used in initial training). In particular, if a sufficiently accurate RBFNC has been obtained with our method for one controller, then it can be used with the same width and centers to implement any other controller in the given equicontinuous class, with the same error bound. The only parameters that need to be changed for the new controller are the weights from the hidden nodes to the output, but the problem of determining these weights is linear. Together, these results indicate that the RBFNCs are a powerful and efficient architecture: they are linearly retrainable universal controllers.
6.2 Simultaneous Approximation Theorem
For $A \subset \mathbb{R}^n$, let $C(A)$ denote the linear space of all real-valued bounded continuous functions defined on $A$. Let $\phi \in C(\mathbb{R}^n)$ be an integrable function such that $\int_{\mathbb{R}^n} \phi(x)\, dx = 1$. For any $y > 0$, let $C_y$ denote the cube $[-y, y]^n$. For any $T > 0$ and a multiindex $\alpha = (m_1, \ldots, m_n) \in \mathbb{N}^n$, let $P = \{(I_\alpha, z_\alpha)\}$ be a partition of $C_T$ of size $\alpha$, with the $n$-intervals $I_\alpha$. $P$ is called a Perron partition [38] if the centers $z_\alpha$ are any points in $I_\alpha$. The norm of the partition $P$, denoted by $|P|$, is given by
$$|P| = \max_\alpha \operatorname{diam} I_\alpha,$$
where the diameter of the $n$-interval $I_\alpha$ is given by
$$\operatorname{diam} I_\alpha = \sup_{x, y \in I_\alpha} \|x - y\|$$
(here $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^n$). The volume $V_\alpha$ of $I_\alpha$ is given by $V_\alpha = \prod_{i=1}^n V_\alpha^i$, where $V_\alpha^i$ is the length of $I_\alpha$ along the $i$th axis.
Theorem 8 Let $K \subset C(C_1)$ be relatively compact, and let $\epsilon > 0$ be given. For any $T > 1$ there exist $\sigma, \delta > 0$ and a certain continuous extension $\tilde{f}$ of each $f \in K$ to $\mathbb{R}^n$ such that
$$\left| f(x) - \sum_\alpha w_\alpha(f)\, \phi\!\left( \frac{x - z_\alpha}{\sigma} \right) \right| < \epsilon, \quad x \in C_1,\ f \in K,$$
for any Perron partition $P = \{(I_\alpha, z_\alpha)\}$ of $C_T$ with $|P| < \delta$, where the weights $w_\alpha$ are given by $w_\alpha(f) = \sigma^{-n} V_\alpha \tilde{f}(z_\alpha)$.
Proof: Note that by the Ascoli-Arzelà theorem $K$ is uniformly bounded, i.e., there exists an $M > 0$ such that
$$\sup_{x \in C_1} |f(x)| < M, \quad f \in K.$$
Extend every $f \in K$ by the construction given in (C.1) of Appendix C to $\tilde{f} \in C(\mathbb{R}^n)$. Let $\tilde{K} = \{\tilde{f} : f \in K\}$. By the construction it is clear that
$$\sup_{x \in \mathbb{R}^n} |\tilde{f}(x)| < M, \quad \tilde{f} \in \tilde{K}. \tag{6.1}$$
Using the integrability of $\phi$, pick $T_0 > 0$ such that
$$\int_{\mathbb{R}^n \setminus C_{T_0}} |\phi(x)|\, dx < \frac{\epsilon}{8M}. \tag{6.2}$$
By its construction, $\tilde{K}$ is equicontinuous. Hence there exists $\delta_0 > 0$ such that for any $x, y \in \mathbb{R}^n$ and arbitrary $\tilde{f} \in \tilde{K}$,
$$\|x - y\| < \delta_0 \;\Rightarrow\; |\tilde{f}(x) - \tilde{f}(y)| < \frac{\epsilon}{4\|\phi\|_1}, \tag{6.3}$$
where $\|\phi\|_1 = \int_{\mathbb{R}^n} |\phi(x)|\, dx$. Choose $\sigma > 0$ such that
$$\|\sigma x\| < \delta_0 \quad \text{for all } x \in C_{T_0}. \tag{6.4}$$
Define $\phi_\sigma \in C(\mathbb{R}^n)$ by
$$\phi_\sigma(x) = \frac{1}{\sigma^n}\, \phi\!\left(\frac{x}{\sigma}\right), \quad x \in \mathbb{R}^n.$$
Let $\xi \in C_1$. From a change of variables,
$$(\phi_\sigma * \tilde{f})(\xi) = \int_{\mathbb{R}^n} \tilde{f}(\xi - x)\, \phi_\sigma(x)\, dx = \int_{\mathbb{R}^n} \tilde{f}(\xi - \sigma x)\, \phi(x)\, dx.$$
Hence, for arbitrary $\tilde{f} \in \tilde{K}$,
$$\begin{aligned}
|(\phi_\sigma * \tilde{f})(\xi) - \tilde{f}(\xi)|
&\le \int_{C_{T_0}} |\tilde{f}(\xi - \sigma x) - \tilde{f}(\xi)|\, |\phi(x)|\, dx + \int_{\mathbb{R}^n \setminus C_{T_0}} |\tilde{f}(\xi - \sigma x) - \tilde{f}(\xi)|\, |\phi(x)|\, dx \\
&< \frac{\epsilon}{4\|\phi\|_1} \int_{C_{T_0}} |\phi(x)|\, dx + 2M \int_{\mathbb{R}^n \setminus C_{T_0}} |\phi(x)|\, dx \\
&< \frac{\epsilon}{4} + 2M \frac{\epsilon}{8M} = \frac{\epsilon}{2},
\end{aligned}$$
where in the last two inequalities we have used (6.1), (6.3), (6.4), and (6.2). Since $\tilde{f}(x) = 0$ for all $x \in \mathbb{R}^n \setminus C_T$,
$$(\phi_\sigma * \tilde{f})(\xi) = \int_{C_T} \phi_\sigma(\xi - x)\, \tilde{f}(x)\, dx.$$
Define $f_{\xi, \sigma}\colon C_T \to \mathbb{R}$ by
$$f_{\xi, \sigma}(x) = \phi_\sigma(\xi - x)\, \tilde{f}(x), \tag{6.5}$$
so that
$$(\phi_\sigma * \tilde{f})(\xi) = \int_{C_T} f_{\xi, \sigma}(x)\, dx. \tag{6.6}$$
Let $\Gamma \triangleq \{f_{\xi, \sigma} : f \in K,\ \xi \in C_1\}$. Since the set of pointwise products of functions belonging to relatively compact sets is itself relatively compact¹, and $\tilde{K}$ as well as $\{\phi_\sigma(\xi - \cdot) : \xi \in C_1\}$ are relatively compact, it follows that $\Gamma$ is relatively compact. By the lemma on uniform Riemann integrability in Appendix C.2, there exists a $\delta > 0$ such that for any Perron partition $P = \{(I_\alpha, z_\alpha)\}$ of $C_T$ with $|P| < \delta$,
$$\left| \int_{C_T} f_{\xi, \sigma}(x)\, dx - \sum_\alpha f_{\xi, \sigma}(z_\alpha)\, V_\alpha \right| < \frac{\epsilon}{2} \tag{6.7}$$
for any $f_{\xi, \sigma} \in \Gamma$ and arbitrary centers $z_\alpha \in I_\alpha$. From (6.7), (6.5), and (6.6), we have, for arbitrary $\xi \in C_1$,
$$\left| \tilde{f}(\xi) - \sum_\alpha \phi_\sigma(\xi - z_\alpha)\, \tilde{f}(z_\alpha)\, V_\alpha \right| < \epsilon, \quad \tilde{f} \in \tilde{K}.$$
But $\tilde{f}(\xi) = f(\xi)$ for all $\xi \in C_1$. Hence, by defining
$$w_\alpha(f) = \frac{1}{\sigma^n}\, V_\alpha\, \tilde{f}(z_\alpha),$$
we have the desired result:
$$\left| f(x) - \sum_\alpha w_\alpha(f)\, \phi\!\left(\frac{x - z_\alpha}{\sigma}\right) \right| < \epsilon, \quad x \in C_1,\ f \in K.$$
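To make the construction concrete, the following one-dimensional sketch evaluates the sum $\sum_\alpha w_\alpha(f)\, \phi((x - z_\alpha)/\sigma)$ with $w_\alpha(f) = \sigma^{-n} V_\alpha \tilde{f}(z_\alpha)$ and checks that it tracks $f$ on $C_1$. All specific choices here are hypothetical: a Gaussian $\phi$, target $f(x) = \cos x$, $T = 2$, a linear-taper extension standing in for the construction (C.1), and a uniform midpoint partition.

```python
import math

def phi(x):
    """Gaussian kernel with integral 1 over R (an assumed choice of phi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f(x):
    """Hypothetical target in K, restricted to C1 = [-1, 1]."""
    return math.cos(x)

def f_tilde(x, T=2.0):
    """A continuous extension of f vanishing outside C_T, standing in for
    the construction (C.1): linear taper on 1 <= |x| <= T."""
    if abs(x) <= 1.0:
        return f(x)
    if abs(x) >= T:
        return 0.0
    return f(math.copysign(1.0, x)) * (T - abs(x))

def network(x, sigma=0.02, T=2.0, h=0.005):
    """sum over alpha of w_alpha(f) * phi((x - z_alpha)/sigma) on a uniform
    midpoint (Perron) partition of [-T, T], with V_alpha = h."""
    total = 0.0
    steps = int(round(2.0 * T / h))
    for j in range(steps):
        z = -T + (j + 0.5) * h          # midpoint center z_alpha
        total += (h / sigma) * f_tilde(z, T) * phi((x - z) / sigma)
    return total

# Maximum deviation from f on a grid covering C1.
max_err = max(abs(f(k / 100.0) - network(k / 100.0)) for k in range(-100, 101))
```

With the fine partition and small width chosen above, the observed maximum deviation on $C_1$ is of the order of a few thousandths, consistent with the $\epsilon/2 + \epsilon/2$ split in the proof (mollification error plus Riemann-sum error).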
6.3 Robust Retrainable Neural Controllers
The relevance of Theorem 8 to robust retrainable neural control is the following: if a sufficiently accurate RBFNC has been obtained with our method for one equicontinuous class of discrete-time continuous plants, then its width and centers can be used to construct a controller for any plant belonging to that class. The only parameters that need to be adapted during retraining for a new controller are the weights from the hidden nodes to the output. Note that this latter problem is
¹ Consider the set $H$ of functions of the form $h(x) = a(x)b(x)$, $x \in C_T$, with $a \in A$ and $b \in B$, where $A$ and $B$ are relatively compact. If $M_A$ and $M_B$ are uniform bounds for $A$ and $B$, then it is obvious that $M_A M_B$ is a uniform bound for $H$. Also, if $\epsilon > 0$ is arbitrary, by the equicontinuity of $B$ there exists $\delta_1 > 0$ such that for any $x, y \in C_T$ and arbitrary $b \in B$, $\|x - y\| < \delta_1 \Rightarrow |b(x) - b(y)| < \frac{\epsilon}{2 M_A}$. Similarly, by the equicontinuity of $A$ there exists $\delta_2 > 0$ such that for any $x, y \in C_T$ and $a \in A$, $\|x - y\| < \delta_2 \Rightarrow |a(x) - a(y)| < \frac{\epsilon}{2 M_B}$. Hence for any $a \in A$, $b \in B$, and $x, y \in C_T$ satisfying $\|x - y\| < \delta_0 \triangleq \min(\delta_1, \delta_2)$, we have
$$|h(x) - h(y)| = |a(x)b(x) - a(y)b(y)| = |[a(x) - a(y)]\, b(x) + a(y)\, [b(x) - b(y)]| \le \frac{\epsilon}{2 M_B}\, |b(x)| + |a(y)|\, \frac{\epsilon}{2 M_A} \le \epsilon.$$
linear, and hence frequently involves tractable computations. Also note that the error bound for the new controller remains the same. To state this result precisely, we need some preliminary material. Let $X = \ell_\infty$, the linear space of bounded real-valued sequences defined over $\mathbb{Z}_+$, with the norm $\|x\|_\infty = \sup_{t \in \mathbb{Z}_+} |x(t)|$. For a positive number $\rho$, let $X_\rho$ denote a compact subset of $X$ uniformly bounded by $\rho$. With $\rho, \mu > 0$, we assume that the plants to be controlled can be viewed as input-output maps with inputs drawn from $X_\rho$ and outputs belonging to $X_\mu$. Let $\mathcal{P}$ denote the set of such maps that are continuous, i.e.,
$$\mathcal{P} = \{P\colon X_\rho \to X_\mu \mid P \text{ is continuous}\}.$$
Let $\mathcal{C}$ denote the set of time-invariant continuous maps from $X_\mu$ to $X_\rho$ having exactly finite memory $n - 1$ for a fixed $n \in \mathbb{N}$, i.e.,
$$x'(t_0 - t) = y'(t_0 - t) \text{ for all } t = 0, \ldots, n - 1 \;\Rightarrow\; (Cx)(t_0) = (Cy)(t_0)$$
for all $t_0 \in \mathbb{Z}_+$ and all $C \in \mathcal{C}$, where $x'$ is the extension of $x$ to $\mathbb{Z}$ given by
$$x'(t) = \begin{cases} x(t), & t \in \mathbb{Z}_+ \\ 0 & \text{otherwise.} \end{cases}$$
Let the nominal plant $P^0 \in \mathcal{P}$ and the desired output $y_d$ be fixed henceforth. Let the perturbed family of plants $\mathcal{P}_\Delta$ be any set of plants in $\mathcal{P}$ containing $P^0$ such that $\mathcal{P}_\Delta$ is equicontinuous, i.e., for any positive $\epsilon$ there exists a positive $\delta$ with the property that
$$x, y \in X_\rho,\ \|x - y\|_\infty < \delta \;\Rightarrow\; \|Px - Py\|_\infty < \epsilon, \quad P \in \mathcal{P}_\Delta.$$
Such a $\mathcal{P}_\Delta$ always exists; a trivial example is $\{P^0\}$, since $P^0$ is continuous. More interesting examples are finite subsets of $\mathcal{P}$ containing $P^0$, since these are usually of importance in model matching problems [33]. A "design procedure" $D$ is an algorithm that accepts a plant $P \in \mathcal{P}_\Delta$ and produces a controller² $C \in \mathcal{C}$ for the desired output $y_d \in X_\mu$. We will always use such a controller in the unity feedback configuration to track $y_d$ (see Figure 6.1, where $C_e$ denotes the error controller used in conventional design), so that the output of the controlled plant at time $t$, $(P(DP)y)(t)$, is intended to be as close to $y_d(t)$ as possible for large $t$. Define a norm on $\mathcal{P}$ by $\|P\| = \sup_{x \in X_\rho} \|Px\|_\infty$, and similarly on $\mathcal{C}$. We say that $D$ is a "stable design procedure" if it is a continuous map on $\mathcal{P}_\Delta$. If $D$ has the additional property that the controller designed by it steers the plant to the desired output uniformly asymptotically, i.e., for any positive $\epsilon$ there exists an $m \in \mathbb{Z}_+$ (that may depend on $\epsilon$ but not on the plant $P$ in the class $\mathcal{P}_\Delta$) such that
$$|(P(DP)y)(t) - y_d(t)| < \epsilon, \quad t > m,$$
we say that $D$ is a "stable tracking design procedure". Define the maps $V_t\colon X_\mu \to \mathbb{R}^n$ by
$$V_t x = (x'(t),\, x'(t-1),\, \ldots,\, x'(t - n + 1)), \quad t \in \mathbb{Z}_+.$$
² See Section 6.4 for a comment on bounded controllability of nonlinear systems.
Figure 6.1: Unity feedback configuration for tracking.
Let $P_t\colon X_\mu \to X_\mu$ be defined by
$$(P_t x)(\tau) = \begin{cases} x(\tau), & \tau \le t \\ 0 & \text{otherwise.} \end{cases}$$
We hereafter assume that
(i) $V_{n-1} X_\mu = Z$ for a compact $Z \subset \mathbb{R}^n$; (ii) $X_\mu$ is closed under $P_{n-1}$.
Let $\mathcal{N}$ be an RBFNC generator that accepts a plant $P$ in $\mathcal{P}_\Delta$ and returns a map of the form
$$y \mapsto \sum_{i=1}^{N} w_i(P)\, \phi\!\left( \frac{V_t y - z_i}{\sigma} \right), \quad y \in X_\mu, \tag{6.8}$$
where $N \in \mathbb{N}$, the $z_i$ are real $n$-vectors, the $w_i(P)$ are real numbers, and $\sigma$ is a positive constant. We say that the outputs of $\mathcal{N}$ are radial basis function neural controllers (RBFNCs), because they may be implemented by networks of processors computing weighted combinations of translates of the so-called radial basis function $\phi$. We are now ready to state and prove our result.
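In implementation terms, a map of the form (6.8) is a sliding window $V_t y$ fed to a radial basis function layer with a single shared width. A minimal sketch, assuming a Gaussian $\phi$ (the theory allows other integrable choices) and hypothetical toy parameters:

```python
import math

def window(y, t, n):
    """V_t y = (y'(t), y'(t-1), ..., y'(t-n+1)), with y'(t) = 0 for t < 0."""
    return [y[t - k] if t - k >= 0 else 0.0 for k in range(n)]

def rbfnc_output(y, t, centers, weights, sigma, n):
    """Controller output of form (6.8), with a Gaussian basis function
    (an assumed choice of phi) and one shared width sigma for all nodes."""
    v = window(y, t, n)
    out = 0.0
    for z, w in zip(centers, weights):
        dist2 = sum((vi - zi) ** 2 for vi, zi in zip(v, z))
        out += w * math.exp(-0.5 * dist2 / sigma ** 2)
    return out

# Toy use: window length n = 2, two centers, weights +1 and -1.
y = [0.0, 1.0, 0.5, -0.25]
u = rbfnc_output(y, t=3, centers=[[0.0, 0.0], [1.0, 1.0]],
                 weights=[1.0, -1.0], sigma=1.0, n=2)
```

Retraining for a new plant in the same equicontinuous class changes only `weights`; `centers` and `sigma` are reused, which is the point of Theorem 9 below.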
Theorem 9 Let $\mathcal{P}_\Delta$ be a given equicontinuous perturbed family of plants containing the nominal plant $P^0 \in \mathcal{P}$. Let a stable tracking design procedure $D$ be given for $\mathcal{P}_\Delta$. Then there exists an RBFNC generator $\mathcal{N}$ with the following property: for any positive $\epsilon$ there exists an $m \in \mathbb{Z}_+$ for which
$$|(P(\mathcal{N}P)y)(t) - y_d(t)| < \epsilon, \quad t > m,\ P \in \mathcal{P}_\Delta.$$
Proof: Since $\mathcal{P}_\Delta$ is equicontinuous, it is relatively compact by the generalization of the Ascoli-Arzelà theorem to continuous maps between compact metric spaces [29, Problem 5, p. 107]. Since the design procedure $D$ is a continuous map, the set of controllers produced by $D$ for $\mathcal{P}_\Delta$,
$$\mathcal{C}_\Delta = \{DP \mid P \in \mathcal{P}_\Delta\},$$
is also relatively compact. By the generalization of the Ascoli-Arzelà theorem cited above, $\mathcal{C}_\Delta$ is equicontinuous. Since any controller $C \in \mathcal{C}_\Delta$ is time-invariant and has exactly finite memory $n - 1$, it determines the function $f_C\colon Z \to \mathbb{R}$ given by $f_C(z) = (Cx)(t)$, $z \in Z$, where $x \in X_\mu$ and $t \in \mathbb{Z}_+$ are chosen such that $z = V_t x$. Let $F$ be the set of such functions $f_C$ as $C$ varies over the designed controllers for the perturbed plants, i.e.,
$$F = \{f_C \mid C \in \mathcal{C}_\Delta\}.$$
Since $\mathcal{C}_\Delta$ is equicontinuous, for any positive $\epsilon$ there exists a positive $\delta$ with the property that
$$x, y \in X_\mu,\ \|x - y\|_\infty < \delta \;\Rightarrow\; \|Cx - Cy\|_\infty < \epsilon, \quad C \in \mathcal{C}_\Delta.$$
Let $z_1, z_2 \in Z$ be any points such that $\|z_1 - z_2\| < \delta$. Using assumptions (i) and (ii), choose $x_1, x_2 \in X_\mu$ such that $V_{n-1} x_i = z_i$ and $x_i(t) = 0$, $t \ge n$, for $i = 1, 2$. It is obvious that $\|x_1 - x_2\|_\infty < \delta$; hence $|(Cx_1)(n-1) - (Cx_2)(n-1)| < \epsilon$, from which it follows that $|f_C(z_1) - f_C(z_2)| < \epsilon$ for any $C \in \mathcal{C}_\Delta$. Hence $F$ is an equicontinuous subset of $C(Z)$. Since the outputs of any $C \in \mathcal{C}_\Delta$ belong to $X_\rho$, $F$ is obviously uniformly bounded. Let any positive $\epsilon$ be given. Since $\mathcal{P}_\Delta$ is equicontinuous, there exists a positive $\delta$ such that
$$x, y \in X_\rho,\ \|x - y\|_\infty < \delta \;\Rightarrow\; \|Px - Py\|_\infty < \epsilon/2, \quad P \in \mathcal{P}_\Delta. \tag{6.9}$$
By Theorem 8, there exists a radial basis function network such that
$$\left| f_C(z) - \sum_{i=1}^{N} w_i(f_C)\, \phi\!\left(\frac{z - z_i}{\sigma}\right) \right| < \delta, \quad z \in Z,\ C \in \mathcal{C}_\Delta.$$
Setting $z = V_t y$, we get $\mathcal{N}P$ such that
$$|((\mathcal{N}P)y)(t) - ((DP)y)(t)| < \delta, \quad t \in \mathbb{Z}_+,\ y \in X_\mu,\ P \in \mathcal{P}_\Delta. \tag{6.10}$$
Since $D$ is a stable tracking design procedure, there exists an $m \in \mathbb{Z}_+$ such that
$$|(P(DP)y)(t) - y_d(t)| < \epsilon/2, \quad t > m,\ P \in \mathcal{P}_\Delta. \tag{6.11}$$
Combining (6.11), (6.9), and (6.10), we have the desired result.
6.4 Comments
First we provide motivation for our definition of a stable tracking design procedure. In the case of linear time-invariant plants having finite-dimensional state-space descriptions, there is extensive literature on discrete-time control design. The most commonly used design procedure in this case is stabilization with PID controllers, which is stable in our sense because it relies only on the technique of pole placement, which is inherently robust [58]. Also, since we employ only unity-gain output feedback, the traditional methods of solving tracking problems with full or reduced-order observers can be used as stable tracking design procedures. Extension to the case of continuous-time systems is straightforward, with the corresponding change of the design procedure to generate fixed-order controllers. Again, in the case of linear time-invariant systems there is a plethora of results in the literature (in particular, see [8, Chapter 7] for the design of robust asymptotic tracking for a class of inputs; see also [32]) providing design procedures which are stable in our sense because of the robustness of pole placement designs. In some cases approximate pole placement can even be achieved with only first-order controllers using output feedback [30]. If the plants are nonlinear, then the theory of control design is not so simple. However, under certain mild assumptions a design procedure may be formulated. In this regard, see [54], where sufficient conditions are given for the existence of piecewise linear controllers which regulate the nonlinear plant in a fixed number of time steps. Also, in [55] it is shown that bounded control inputs may be used to control the plant under certain conditions on linear or nonlinear plants in continuous or discrete time. In [59] stabilizability of a certain nonlinear time-varying plant with output feedback is considered, and in [60] a similar problem is solved for the case in which the nonlinear system has a right coprime factorization. A model reference adaptive control scheme was suggested in [34]. For the case of perturbations to the nominal system arising out of parametric uncertainties, there are many recent results in the literature about the design and analysis of robust controllers. For example, a nice parametrization of robust stabilizing controllers is given in [40].
Continuous (indeed, analytic or rational or polynomial) dependence of controllers on plant parameters is shown to be possible in [19], [56] under the assumption of constant McMillan degree of the plants. In [10] compact sets of linear time-invariant single-input single-output systems are studied where the numerator and denominator polynomials of the system transfer function have uncertain coefficients belonging to a box around the nominal plant in the coefficient space. In this connection see also the pioneering work of Kharitonov [28], where Hurwitz stability of such interval polynomials was shown to be equivalent to that of just four extreme polynomials. This work gave a strong impetus to research in robust stability of interval systems (see [5] and references therein). In summary, results in the literature indicate that our assumption of the existence of a design procedure is justified in a variety of settings. The stability and tracking properties of such design procedures are topics of future research. In our result stated above, we assume that there exists an $m \in \mathbb{Z}_+$ independent of the plant $P \in \mathcal{P}_\Delta$ for asymptotic tracking. This requirement may be relaxed if it poses a strong restriction on the design procedure. For such cases we can obtain another result in which asymptotic tracking is still guaranteed with the RBFNCs, but the length of the transient period may vary with the perturbation.
6.5 Conclusion
Our results provide an analytical basis for separating the problem of parameter determination of the RBFNCs into a nonlinear part involving the determination of the width and the centers, and a linear part concerning the estimation of the output weights. In particular, our results show that the computationally expensive nonlinear part needs to be solved only once for the given class of equicontinuous controllers. The remaining part (which occurs frequently due to retraining) usually involves relatively easy computations, because the problem reduces to linear estimation of parameters.
Chapter 7 Error Bounds for Function Approximation
Here we prove results concerning the error bound, in the mean-square sense, of the approximation of functionals defined on finite-dimensional Hilbert spaces by neural network structures. The functionals belong to a certain general class introduced here, involving a condition on the integrability of the Fourier transform. The bound on the error of approximation depends on the size of the network but not on the dimension of the underlying space. It is valid for all functionals in the class mentioned above. This class, for which the theorems are applicable, is shown to include some important and interesting sets of functionals, e.g., the bandlimited signals on $\mathbb{R}^n$ and the Schwartz class.
7.1 Introduction
Most of the current literature on neural networks deals with approximation of functions defined on finite-dimensional spaces. Even though the existence of neural network structures for arbitrarily good approximation of functionals on compact subsets of these spaces has been known for some time [36, 26, 18], not much is known about the computational efficiency of given classes of approximation structures. Hence it is important to consider approximation structures (of the neural network type or otherwise) for which error bounds for significant classes of functionals are known.
7.2 Notation
This work is concerned with error bounds for the approximation of a certain kind of functional defined on a finite-dimensional Euclidean space. Let $K$ be $\mathbb{R}$ or $\mathbb{C}$, $d \in \mathbb{N}$, and let $H$ be the finite-dimensional Euclidean space $K^d$ over $K$. The Lebesgue measure on $H$ is denoted by $m_d$. The approximation error is specified in the $L_2(H, \mu)$ norm, where $\mu$ is a finite positive Borel measure on $H$ such that $\mu(H) = 1$, e.g., defined by
$$\mu(S) = m_d(S \cap B_r)/m_d(B_r), \quad S \in \mathcal{B}(H), \tag{7.1}$$
on the $\sigma$-algebra of Borel sets $\mathcal{B}(H)$, where $B_r$ is a ball of radius $r$ centered at the origin. We also assume that $\mu$ satisfies the following conditions.
Condition 1 $\mu$ is absolutely continuous with respect to $m_d$.
Condition 2 The Radon-Nikodym derivative $[d\mu/dm_d]$ of $\mu$ is bounded.¹
Note that important practical choices of $\mu$, e.g., the example given above in (7.1), satisfy these conditions. However, we rule out atomic measures by Condition 1. As usual, for a positive measure $\nu$ on $H$,
$$L_p(H, \nu) \triangleq \left\{ f\colon H \to K \;\middle|\; \int |f(x)|^p\, \nu(dx) < \infty \right\}, \quad 1 \le p < \infty,$$
and then
$$\|f\|_{p, \nu} \triangleq \left( \int |f(x)|^p\, \nu(dx) \right)^{1/p}.$$
If it is clear from the context, we leave out the measure with respect to which the norm is defined and write simply $\|f\|_p$. $\mathcal{F}f$ denotes the Fourier transform of a function $f \in L_1(H, m_d)$. For the approximation of complex-valued functionals we use the same architecture as in [49], where nodes belonging to the single hidden layer implement exponentiation and the input nodes compute the inner product of $H$, as shown in Figure 7.1. This architecture is remarkable in that it can be used for approximation both in the uniform norm for continuous functions on compact subsets [49] and in the $L_2$ norm for functions defined on the whole space. The class of such networks (with $m$ hidden nodes) is denoted by $\mathcal{C}(m)$, defined below.
Figure 7.1: Architecture of the complex Hilbert space network.
A network $N$ in the class $\mathcal{C}(m)$ implements a weighted finite sum of exponentials composed with inner products with fixed elements of $H$, i.e., the output of $N$ is a functional $f_m$ given by
Definition:
$$f_m(x) = \sum_{k=1}^{m} c_k \exp(i \langle x, p_k \rangle), \tag{7.2}$$
where the network parameters $c_k$ belong to $\mathbb{C}$ and the network parameters $p_k$ are drawn from $H$.
For the approximation of real-valued functionals we adopt the architecture defined below and shown in Figure 7.2.
¹ $[d\mu/dm_d]$ exists because Condition 1 is satisfied. Moreover, it is real-valued and integrable [21, p. 134, Theorem 5.5.4].
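A network in $\mathcal{C}(m)$, i.e., the form (7.2), is straightforward to evaluate. A minimal sketch (the parameter values below are hypothetical):

```python
import cmath

def complex_net(x, params):
    """f_m(x) = sum_k c_k exp(i <x, p_k>), the form (7.2); params is a
    list of pairs (c_k, p_k) with c_k complex and p_k a real d-vector."""
    total = 0j
    for c, p in params:
        inner = sum(xi * pi for xi, pi in zip(x, p))  # <x, p_k> in R^d
        total += c * cmath.exp(1j * inner)
    return total

# Sanity check: with m = 1, c_1 = 1, p_1 = 0 the network is identically 1,
# and each hidden node's output always has modulus 1.
val = complex_net([0.3, -0.7], [(1 + 0j, [0.0, 0.0])])
```

Note that the hidden nodes are bounded (each has modulus $|c_k|$), which is what makes the averaging argument of Lemma 5 applicable to this architecture.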
Figure 7.2: Architecture of the real Hilbert space network. A network N in the class R(m) implements a weighted nite sum of cosines composed with inner products with xed elements of H, i.e., the output of N is a functional fm given by De nition:
\[
f_m(x) = \sum_{k=1}^{m} a_k \cos(\langle x, b_k \rangle + c_k) \tag{7.3}
\]
where the network parameters $a_k, c_k$ belong to $\mathbb{R}$ and the network parameters $b_k$ are drawn from $H$. $\Box$
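A network in $\mathcal{R}(m)$ can be evaluated in the same way (hypothetical helper name; $H$ taken concretely as $\mathbb{R}^d$):

```python
import numpy as np

def real_network(x, a, b, c):
    """Output of a network in R(m): f_m(x) = sum_k a_k * cos(<x, b_k> + c_k).

    Here H = R^d; a and c are (m,) real parameter vectors, b is (m, d).
    """
    return np.sum(a * np.cos(b @ x + c))

# Example: m = 3 cosine nodes on H = R^2.
a = np.array([0.5, -1.0, 2.0])
b = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c = np.array([0.0, np.pi / 2, np.pi])
x = np.array([0.3, -0.7])
fx = real_network(x, a, b, c)
```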
7.3 Relation to previous work

It may be noted that previous results regarding approximation (in the uniform or the $L_2$ norm) of functionals on finite-dimensional spaces [36, 26, 18] are existence theorems. In practice it is useful to have some idea of how many terms suffice for a given accuracy of approximation. In particular, it is important to ensure that the error of approximation decreases with increasing network size with some acceptable upper bound. Sandberg [49] proved existence theorems regarding approximation of nonlinear functionals with the particular type of neural network architecture introduced in that paper. The relation between the applications of the Stone–Weierstrass Theorem [42, p. 122] and the Generalized Wiener Theorem [42, p. 228], in terms of the norms used in evaluating the error of approximation and the domains of definition of the functionals, is analogous to that between this work and [49]. These analogies are tabulated for easy reference in Table 7.1.

            Stone–Weierstrass            Generalized Wiener
Norm        $L_\infty$                   $L_1$
Domain      Compacta in $\mathbb{R}^n$   All of $\mathbb{R}^n$

            Sandberg [49]                Present Work
Norm        $L_\infty$                   $L_2$
Domain      Compacta in $H$              All of $H$

Table 7.1: An analogy between analogies.
Barron [3] considered error bounds for the class of neural networks that employ the usual "sigmoidal" functions in their hidden nodes and established error bounds of $O(1/\sqrt{m})$ for $L_2$ approximation, where $m$ is the number of hidden nodes. He used a lemma due to Jones [27] in an essential way, along with a certain integrability condition on the Fourier transform of the functional. Girosi and Anzellotti [23] mainly studied rates of convergence of approximation with translates of a function in $L_2$ and the Sobolev space, with emphasis on the Gaussian and the Bessel–MacDonald kernels. The present work was inspired by Barron's approach; here also an integrability condition on functionals on a Hilbert space $H$ is used. But we do not use Jones' lemma. Instead we develop and use a variant of Maurey's lemma [39]. Our Lemma 5 is not just a special case of Maurey's lemma, because we prove more than the existence of a convex combination; the combination is a special (equally weighted, i.e., arithmetic) average. Hence the computational load in applying our lemma is reduced, because only one combination of the chosen $m$ functionals needs to be tested for the desired error bound (instead of finding a suitable combination from the set of all convex combinations). Also, our proof is more straightforward and does not use any probabilistic terminology. It may be noted that the architecture considered in [3] is different from that used in the present work (pictured in Figure 7.1 for the case of complex-valued functionals and Figure 7.2 for the case of real-valued functionals).
7.4 Error Bounds: Preliminaries
7.4.1 The classes $\Gamma$ and $\Gamma_c$
We consider functionals which belong to the class $\Gamma_c$ defined below.

Definition: The class of permissible functionals is denoted by
\[
\Gamma \triangleq \{ f : H \to K \mid f \neq 0;\ f, \mathcal{F}f \in L_1(H, m_d) \cap L_2(H, m_d) \},
\]
where the condition $f \neq 0$ is taken in the $m_d$-almost-everywhere sense. $\Box$

Definition: The class of all $c$-integrable functionals is denoted by
\[
\Gamma_c \triangleq \{ g \in \Gamma \mid c_g \le c \}, \qquad
c_g \triangleq \frac{1}{(\sqrt{2\pi})^{d}} \int |(\mathcal{F}g)(\omega)|\, m_d(d\omega),
\]
for any $c > 0$. $\Box$

The most commonly assumed condition guaranteeing the existence of an integral representation of the type used crucially in [3] is that $f$ is integrable (in the Lebesgue sense). In this case it is easy to see that the integrability condition of [3] implies ours:
\[
\int |\omega|\,|(\mathcal{F}f)(\omega)|\, m_d(d\omega) \le c
\;\Longrightarrow\;
\int_{\omega \notin [-1,1]^d} |\omega|\,|(\mathcal{F}f)(\omega)|\, m_d(d\omega) \le c
\]
\[
\;\Longrightarrow\;
\int_{\omega \notin [-1,1]^d} |(\mathcal{F}f)(\omega)|\, m_d(d\omega) \le c
\;\Longrightarrow\;
\int |(\mathcal{F}f)(\omega)|\, m_d(d\omega) \le 2^d F_{\max} + c.
\]
Here the Fourier transform of the integrable function $f$ is bounded by some constant $F_{\max}$. However, the following example shows that there are functions $f$ which satisfy our integrability condition but not the condition in [3]. Consider $f : \mathbb{R} \to \mathbb{R}$ given by $f(t) = \exp(-|t|)$. In this case $(\mathcal{F}f)(\omega) = 2/(1 + \omega^2)$, and it is easy to verify that $c_f = \sqrt{2\pi}$, but the function $\omega \mapsto |\omega\,(\mathcal{F}f)(\omega)|$ is not integrable.
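The two claims about this example can be checked numerically. The sketch below (hypothetical helper names; simple truncated trapezoidal quadrature) shows $c_f$ approaching $\sqrt{2\pi} \approx 2.5066$ as the truncation radius $R$ grows, while the Barron-type integral $\int |\omega|\,|(\mathcal{F}f)(\omega)|\,d\omega$ grows without bound (it equals $2\log(1+R^2)$ on $[-R, R]$):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (written out to avoid relying on
    version-specific numpy names)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def cf_truncated(R, n=200_001):
    """(sqrt(2*pi))^(-1) * integral of |Ff| over [-R, R],
    with Ff(w) = 2 / (1 + w**2) for f(t) = exp(-|t|)."""
    w = np.linspace(-R, R, n)
    return trapezoid(2.0 / (1.0 + w**2), w) / np.sqrt(2.0 * np.pi)

def barron_truncated(R, n=200_001):
    """Integral of |w| * |Ff(w)| over [-R, R]; grows like 2*log(1+R^2)."""
    w = np.linspace(-R, R, n)
    return trapezoid(np.abs(w) * 2.0 / (1.0 + w**2), w)
```

For example, `cf_truncated(1e4)` is within about $10^{-3}$ of $\sqrt{2\pi}$, whereas `barron_truncated(R)` keeps increasing with $R$.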
7.4.2 A consequence of c-integrability
The $c$-integrability condition on the Fourier transform implies the boundedness of the equivalent continuous functional $\tilde f$, since
\[
|\tilde f(x)| = \left| \frac{1}{(\sqrt{2\pi})^{d}} \int \exp(i\langle \omega, x \rangle)(\mathcal{F}\tilde f)(\omega)\, m_d(d\omega) \right|
\le \frac{1}{(\sqrt{2\pi})^{d}} \int |(\mathcal{F}\tilde f)(\omega)|\, m_d(d\omega) \qquad \forall x \in H.
\]
Hence
\[
\sup_x |\tilde f(x)| \le \frac{1}{(\sqrt{2\pi})^{d}} \int |(\mathcal{F}\tilde f)(\omega)|\, m_d(d\omega)
= \frac{1}{(\sqrt{2\pi})^{d}} \|\mathcal{F}\tilde f\|_1 \triangleq c_{\tilde f} \le c. \tag{7.4}
\]
7.5 Approximation of complex-valued functionals

In this section let $K = \mathbb{C}$, in which case $H = \mathbb{C}^d$. The main result of this work is the following theorem, which says that with $2m$ network parameters we are guaranteed an $L_2$ (of measure $\mu$) error bound of $O(1/\sqrt{m})$.

Theorem 10 Let $f : H \to \mathbb{C}$ be a functional such that $f \in \Gamma_c$ for some $c > \|f\|_{2,\mu}$. Then, given $m \in \mathbb{N}$ there exists a network $N \in \mathcal{C}(m)$ implementing $f_m$ such that
\[
\int_H |f(x) - f_m(x)|^2\, \mu(dx) \le \frac{c^2 - \|f\|_{2,\mu}^2}{m}.
\]
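The $O(1/\sqrt{m})$ behaviour can be illustrated by a Monte-Carlo simulation on a toy one-dimensional case. Every concrete choice below ($\nu = \mu = N(0,1)$, nodes $\cos(x\omega)$, so that $f(x) = \mathbb{E}_\omega[\cos(x\omega)] = e^{-x^2/2}$ and the bound constant is $c = 1$) is an illustrative assumption, not the construction used in proving the theorem:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_sq_error(m, trials=200, n_x=2000):
    """Average squared L2(mu) error of f_m(x) = (1/m) sum_i cos(x*w_i),
    with w_i drawn i.i.d. from nu = N(0,1), against
    f(x) = E_w[cos(x*w)] = exp(-x**2 / 2); mu = N(0,1) is also sampled.
    """
    x = rng.standard_normal(n_x)           # Monte-Carlo sample of mu
    f = np.exp(-x**2 / 2.0)
    errs = np.empty(trials)
    for t in range(trials):
        w = rng.standard_normal(m)         # m nodes drawn from nu
        f_m = np.cos(np.outer(x, w)).mean(axis=1)
        errs[t] = np.mean((f - f_m) ** 2)  # ~ integral |f - f_m|^2 dmu
    return errs.mean()
```

In this toy case $|\cos(x\omega)| \le 1$, so the averaged squared error should stay below $c^2/m = 1/m$ and decrease as $m$ grows.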
First we prove a variant of Maurey's lemma [39] which guarantees the existence of a convex combination (in our case an equally weighted, i.e., arithmetic, average) of functionals belonging to a certain subset of $L_2(H, \mu)$ which satisfies the error bound claimed in the theorem. We note that this lemma may be proved for any inner-product space (even in the infinite-dimensional case), because its proof does not depend on the dimensionality of $H$ or even on its completeness.
Lemma 5 Let $H$ be an inner product space and $f : H \to \mathbb{C}$ be a functional such that
\[
f(x) = \int_H d(x, \omega)\, \nu(d\omega), \tag{7.5}
\]
where $\nu(d\omega)$ is a positive measure such that $\nu(H) = 1$, and $d(\cdot, \cdot)$ belongs to a subset $D$ of bounded functionals indexed by $\omega \in H$, i.e.,
\[
D \subseteq \{ d : H \times H \to \mathbb{C} \mid |d(x, \omega)| \le c \ \ \forall x, \omega \in H \}, \tag{7.6}
\]
such that $d(x, \cdot)$ is measurable with respect to the $\sigma$-algebra on which $\nu$ is defined and $d(\cdot, \omega)$ is measurable with respect to the $\sigma$-algebra on which $\mu$ is defined. Then, given $m \in \mathbb{N}$ there exists an arithmetic average $f_m$ of $m$ functionals in $D$ satisfying the inequality
\[
\int_H |f(x) - f_m(x)|^2\, \mu(dx) \le \frac{c^2 - \|f\|_{2,\mu}^2}{m}.
\]
Proof: Consider the arithmetic average
\[
\frac{1}{m} \sum_{i=1}^{m} d(x, \omega_i)
\]
of $m$ functionals from $D$, for any $m \in \mathbb{N}$. Let $e_m$ denote
\[
\int_{H^m} \int_H \left| f(x) - \frac{1}{m} \sum_{i=1}^{m} d(x, \omega_i) \right|^2 \mu(dx)\, \nu^m(d\omega_1 \cdots d\omega_m), \tag{7.7}
\]
where $\nu^m(d\omega_1 \cdots d\omega_m)$ is the product measure. Observe that $e_m$ is an averaged $L_2$ (of measure $\mu$) error of arithmetic averages of $m$ functionals from $D$ as an approximant for $f$. Since $f$ and $d(x, \omega_i)$ are Borel measurable and the integrand is non-negative, we may apply Fubini's theorem for the interchange of the order of integration, obtaining
\[
e_m = \int_H \int_{H^m} \left| f(x) - \frac{1}{m} \sum_{i=1}^{m} d(x, \omega_i) \right|^2 \nu^m(d\omega_1 \cdots d\omega_m)\, \mu(dx).
\]
By simple manipulations of the integrand,
\[
e_m = \frac{1}{m^2} \int_H \int_{H^m} \left| \sum_{i=1}^{m} d(x, \omega_i) - m f(x) \right|^2 \nu^m(d\omega_1 \cdots d\omega_m)\, \mu(dx)
\]
\[
= \frac{1}{m^2} \int_H \int_{H^m} \left| \sum_{i=1}^{m} [\, d(x, \omega_i) - f(x) \,] \right|^2 \nu^m(d\omega_1 \cdots d\omega_m)\, \mu(dx)
\]
\[
= \frac{1}{m^2} \int_H \int_{H^m} \left\{ \left( \Re \sum_{i=1}^{m} [\, d(x, \omega_i) - f(x) \,] \right)^{2} + \left( \Im \sum_{i=1}^{m} [\, d(x, \omega_i) - f(x) \,] \right)^{2} \right\} \nu^m(d\omega_1 \cdots d\omega_m)\, \mu(dx)
\]

[…]

… $M > 0$ such that
\[
\sup_{x \in C_T} |f(x)| < M, \qquad f \in \mathcal{F}.
\]
Define $\tilde\omega_F : \mathbb{R}_+ \to \mathbb{R}_+$ by
\[
\tilde\omega_F(\epsilon) = \min\left\{ \frac{\omega_F(\epsilon/4)}{2},\ \frac{\epsilon}{4M} \right\}.
\]
To verify the claim of equicontinuity, let an arbitrary $\epsilon > 0$ be given, and consider the six cases in which each of $x, y \in \mathbb{R}^n$ belongs to $C_1$, $C_T \setminus C_1$, or $\mathbb{R}^n \setminus C_T$, subject to the constraint $\|x - y\| < \tilde\omega_F(\epsilon)$.

1. $x, y \in C_1$. Since $\tilde\omega_F(\epsilon) < \omega_F(\epsilon)$, it follows trivially that $|\tilde f(x) - \tilde f(y)| = |f(x) - f(y)| < \epsilon$.

2. $x \in C_1$, $y \in C_T \setminus C_1$. By the best approximation property of $\hat y$, $\|y - \hat y\| \le \|y - x\|$. By the triangle inequality, $\|x - \hat y\| \le \|x - y\| + \|y - \hat y\| \le 2\|x - y\|$. Now,
\[
|\tilde f(x) - \tilde f(y)| = \big| f(x) - f(\hat y) + f(\hat y)\,\|y - \hat y\| \big| \le |f(x) - f(\hat y)| + |f(\hat y)|\,\|y - \hat y\|.
\]
Since $\tilde\omega_F(\epsilon) \le \min\{\omega_F(\epsilon/2)/2,\ \epsilon/(2M)\}$, we have $\|x - \hat y\| < \omega_F(\epsilon/2)$ and $\|y - \hat y\| < \epsilon/(2M)$, hence
\[
|\tilde f(x) - \tilde f(y)| < \frac{\epsilon}{2} + M\,\|y - \hat y\| < \epsilon.
\]

3. $x \in C_1$, $y \in \mathbb{R}^n \setminus C_T$. Impossible, since $\|x - y\| < \tilde\omega_F(\epsilon) < 1$, the distance from $C_1$ to $\mathbb{R}^n \setminus C_T$.

4. $x, y \in C_T \setminus C_1$. By the best approximation property,
\[
\|x - \hat x\| \le \|x - \hat y\| \quad \text{and} \quad \|y - \hat y\| \le \|y - \hat x\|.
\]
By the triangle inequality,
\[
\|x - \hat y\| \le \|x - y\| + \|y - \hat y\| \quad \text{and} \quad \|y - \hat x\| \le \|y - x\| + \|x - \hat x\|.
\]
Hence
\[
\|x - \hat x\| \le \|x - y\| + \|y - \hat y\| \quad \text{and} \quad \|y - \hat y\| \le \|x - y\| + \|x - \hat x\|,
\]
implying
\[
\big|\, \|y - \hat y\| - \|x - \hat x\| \,\big| \le \|x - y\|.
\]
Let $\eta \triangleq |\tilde f(x) - \tilde f(y)|$. We have
\[
\eta = \big|\, f(\hat x) - f(\hat y) + [\, f(\hat y)\,\|y - \hat y\| - f(\hat x)\,\|x - \hat x\| \,] \,\big|.
\]
Adding and subtracting $f(\hat x)\,\|y - \hat y\|$,
\[
\eta \le |f(\hat x) - f(\hat y)| + \big|\, f(\hat x)\,(\|y - \hat y\| - \|x - \hat x\|) - [f(\hat x) - f(\hat y)]\,\|y - \hat y\| \,\big|
\le 2\,|f(\hat x) - f(\hat y)| + |f(\hat x)|\,\big|\, \|y - \hat y\| - \|x - \hat x\| \,\big|.
\]
By a well-known theorem on the Lipschitz property of a projection on a closed convex subset of a Hilbert space [24, p. 100], we have $\|\hat x - \hat y\| \le \|x - y\|$. Now, $\tilde\omega_F(\epsilon) \le \min\{\omega_F(\epsilon/4),\ \epsilon/(2M)\}$ implies
\[
\eta < \frac{2\epsilon}{4} + M\,\|x - y\| < \epsilon.
\]

5. $x \in C_T \setminus C_1$, $y \in \mathbb{R}^n \setminus C_T$. Let $y^{\#}$ be the unique point of $C_T$ which is closest to $y$. By the best approximation property, $\|y - y^{\#}\| \le \|y - x\|$. By the triangle inequality, $\|x - y^{\#}\| \le \|x - y\| + \|y - y^{\#}\| \le 2\|x - y\|$. Consider
\[
\eta = |\tilde f(x) - \tilde f(y)| = |\tilde f(x) - \tilde f(y^{\#})|.
\]
Since $\tilde\omega_F(\epsilon) \le \min\{\omega_F(\epsilon/4)/2,\ \epsilon/(4M)\}$, we have $\|x - y^{\#}\| < \min\{\omega_F(\epsilon/4),\ \epsilon/(2M)\}$, and from the previous case $\eta \le \epsilon$.

6. $x, y \in \mathbb{R}^n \setminus C_T$. Trivial, since $\tilde f(x) = \tilde f(y) = 0$.
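The extension used in the case analysis above can be made concrete. In the sketch below, $C_1$ and $C_T$ are assumed to be concentric closed balls of radii $T - 1$ and $T$ (an illustrative choice, under which the projection $\hat x$ onto $C_1$ is radial clipping), with $\tilde f = f$ on $C_1$, $\tilde f(x) = f(\hat x)(1 - \|x - \hat x\|)$ on the shell, and $\tilde f = 0$ outside $C_T$; all of these concrete choices are assumptions for illustration:

```python
import numpy as np

def extend(f, x, T):
    """Continuous extension f~ of f: C_T -> R to all of R^n.

    Illustrative assumption: C_1, C_T are concentric closed balls of
    radii T - 1 and T, so the best approximation x_hat of x in C_1 is
    radial clipping.  f~ = f on C_1, f~(x) = f(x_hat)*(1 - ||x - x_hat||)
    on the shell C_T \\ C_1, and f~ = 0 outside C_T.
    """
    r = float(np.linalg.norm(x))
    if r <= T - 1:
        return f(x)
    x_hat = x * (T - 1) / r              # projection of x onto C_1
    if r <= T:
        return f(x_hat) * (1.0 - (r - (T - 1)))
    return 0.0
```

By construction $\tilde f$ interpolates between $f$ on $C_1$ and $0$ on the boundary of $C_T$, so the extension is continuous across both boundaries.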
C.2 Uniform Riemann Integrability

Here we present a lemma that may be of independent interest for other approximation studies. We assume that an arbitrary $T > 0$ is given. The Riemann integral of a function $f : C_T \to \mathbb{R}$ is denoted by $\int_{C_T} f$. For every $\epsilon > 0$, there exists a $\delta > 0$ such that for any Perron partition $P = \{(I_\alpha, z_\alpha)\}$ of $C_T$ with $|P| < \delta$,
\[
\sum_{\alpha} \cdots
\]