Università degli Studi di Brescia - Facoltà di Ingegneria - Dipartimento di Elettronica per l'Automazione

Adaptive linear quadratic Gaussian control: optimality and robust controller design

Maria Prandini

A thesis presented for the degree of Dottore di Ricerca in Ingegneria dell'Informazione
Supervisor: Prof. Marco Campi

Preface

This thesis is the result of three years of research activity during my Ph.D. studies at the Department of Electronics for Automation of the University of Brescia. My research activity in the field of adaptive control of stochastic systems started with the study of identifiability problems arising in self-tuning control. New identification methods able to enforce desired characteristics in the estimate while preserving fundamental closed-loop identification properties were introduced. This study helped me acquire a better understanding of the philosophy of the certainty equivalence approach. It was then a natural development of my research to work out a new methodology for the design and analysis of self-tuning control schemes. The results of this activity are given in the first part of the thesis, where certainty equivalence LQG adaptive control schemes ensuring the achievement of stability and optimality are introduced. During the last year, inspired by recent developments in the field of robust control, I studied new algorithms for the synthesis of adaptive controllers based on randomized methods. The objective was to incorporate robustness features with respect to parameter uncertainty. The second part of the thesis treats this topic. My current research activity is devoted to the extension of this new approach to adaptive control settings generalizing those dealt with in the thesis.

Acknowledgments

I would like to thank all the people who contributed directly or indirectly to the accomplishment of this work. First of all, I thank Prof. Marco Campi for his supervision and for the enlightening discussions, which proved to be an inspiring source of ideas. He spent a great deal of time reading my reports and helping me in my research activity. Next, I thank Prof. Sergio Bittanti, Prof. Patrizio Colaneri and Prof. Rogelio Lozano for the stimulating discussions on some of the subjects dealt with in this thesis. I thank all my friends, especially Daniela, Caterina and Luisa, who shared with me this important period of my life. Special thanks go to my family for the support they have always given me. Finally, to Michel: thank you for your encouragement and for the help you were always willing to provide.

Brescia, February 1998

Maria Prandini

Contents

1 Introduction
  1.1 General introduction
  1.2 Original contributions and results
  1.3 Description by chapters

2 LQG optimal control problem
  2.1 Introduction
  2.2 State space representation
  2.3 Input output representation
  2.4 Proofs

3 Least Squares parameter estimation
  3.1 Introduction
  3.2 Mathematical framework
  3.3 Prediction error identification methods
  3.4 The least squares identification method
  3.5 The Gaussian noise case

4 Certainty equivalence adaptive control: optimality and stabilizability issues
  4.1 Introduction
  4.2 The certainty equivalence approach and the identifiability problem
  4.3 The optimality issue
  4.4 The stabilizability issue

5 Singularity free adaptive LQG control: stability and optimality
  5.1 Introduction
  5.2 Statement of the problem
    5.2.1 The system and the control law
    5.2.2 Natural requirements arising in the adaptive control context
    5.2.3 Achievements
  5.3 Penalized least squares identification
    5.3.1 Introduction
    5.3.2 The case of no a-priori knowledge on the system parameter
    5.3.3 The case when a coarse a-priori knowledge on the system parameter is available
  5.4 Adaptive stability and performance analysis
  5.5 Optimality
  5.6 Proofs
    5.6.1 Proofs of Section 5.3
    5.6.2 Proofs of Section 5.4
    5.6.3 Proofs of Section 5.5

6 Robust adaptive LQG control
  6.1 Introduction: randomized algorithms in adaptive control
    6.1.1 Average cost criteria in adaptive control
    6.1.2 Randomized methods for controller selection
    6.1.3 Achievements
  6.2 Robustness and tuning notions
  6.3 Randomized controller selection
    6.3.1 An algorithm based on a sampling estimate of E_{P_t}[J(ϑ, Γ)]
    6.3.2 A fully randomized algorithm
  6.4 Application to LQG control
  6.5 Randomized algorithms for the synthesis of robust adaptive LQG controllers
    6.5.1 Ideal adaptive control setting
    6.5.2 Computing P_t
    6.5.3 A randomized algorithm for the synthesis of an adaptive controller
  6.6 Tuning properties
  6.7 Proofs
  6.8 A simulation example

7 Concluding remarks and future research

A Technical results

B Uniform convergence of empirical means and the Pollard-dimension

Bibliography

Chapter 1
Introduction

1.1 General introduction

Adaptive self-tuning control describes a body of approaches where a controller design method based on a system model is combined with an on-line estimator of the model parameter. The appealing feature of adaptive controllers is their ability to automatically adjust themselves so as to adapt to the true system. During the operation of the control system, the controller collects information on the system behavior, thereby reducing the level of uncertainty regarding the value of the unknown parameter. In turn, as the level of uncertainty is reduced, the controller is tuned more accurately to the true system so as to obtain a better control result. The key point is to find an effective and convenient way of tuning the adaptive controller on the basis of the information gathered on-line from the controlled system.
The most commonly adopted strategy for the design of adaptive control laws is the certainty equivalence approach. Its success is mainly due to its conceptual simplicity: it consists in estimating the unknown parameter via some identification method and then using the estimate to design the control law as if it were the true value of the unknown parameter. On the other hand, working out stability and optimality results for certainty equivalence adaptive control schemes is a difficult task even in the ideal case when the true system belongs to the model class. This is due to the intricate interaction between control and identification in closed loop, which can cause identifiability problems.
The objective of this thesis is twofold:
i) We aim at introducing new adaptive control schemes based on the certainty equivalence principle able to overcome the difficulties arising in standard certainty equivalence control systems. In particular, we are interested in designing adaptive controllers which

ensure the overall control system stability irrespectively of the excitation characteristics of the involved signals. A further target is then to precisely characterize the corresponding performance and to study a suitable modification of the adaptive control scheme so as to obtain both stability and optimality results;
ii) We want to devise a new strategy for the tuning of adaptive control laws so as to incorporate robustness features with respect to parameter uncertainty. The idea is that the adaptive controller should select at each time instant a cautious control law with the objective of obtaining an acceptable performance for most models, instead of completely relying on the currently most probable model as in the certainty equivalence approach. A conservative control law is thus applied when uncertainty is large but, as uncertainty is reduced by means of the data collected on-line from the system, the robust adaptive controller becomes better tailored to the true system.

Such objectives are pursued for linear, time-invariant, stochastic SISO systems affected by white noise, on the basis of the infinite-horizon LQG control design method.

1.2 Original contributions and results

The main original contributions and results of this thesis are described below. They are grouped according to the two objectives: i) the design of a certainty equivalence adaptive LQG control scheme achieving stability and optimality results, and ii) the introduction of robust adaptive LQG control algorithms.

i) Certainty equivalence adaptive LQG control - Chapter 5

A methodology for the analysis of a general adaptive control scheme based on the certainty equivalence approach is introduced. It is based on the representation of the real system (i.e. the closed-loop system where the true system is controlled by the adaptive control law) as a variation system with respect to the imaginary system (i.e. the closed-loop system consisting of the estimated system in combination with the adaptive controller). From such a representation it is easily seen that the stability of the real system can be ensured by stabilizing the imaginary system, provided that a suitable bound on the estimation error is simultaneously guaranteed.
New identification methods ensuring both uniform stabilizability of the estimated model and suitable closed-loop identification properties are introduced. This is obtained by adding an appropriate extra term to the least squares performance index which penalizes those parameterizations corresponding to uncontrollable models, while preserving the fundamental properties of the least squares estimate (penalized least squares, PLS).
We introduce a general adaptive control scheme where a control design technique able to


stabilize a known system is appropriately combined with a PLS identifier. By applying the proposed analysis methodology to this scheme we prove a general stability result. In particular, adaptive stability is obtained for infinite-horizon LQG control. On the other hand, optimality (i.e. minimality of the infinite-horizon LQG performance index) is not guaranteed, and in fact only a characterization of the achieved performance is given. We then set up a suitable modification of the PLS-based adaptive LQG controller so as to achieve optimality by using the so-called attenuating excitation technique, and prove consistency of the estimate and optimality of the control for the resulting control scheme.

ii) Robust adaptive LQG control - Chapter 6

Inspired by a recent contribution by Professor M. Vidyasagar in the field of robust control, we propose the use of randomized algorithms for the synthesis of adaptive control laws based on the minimization of average cost criteria. Specifically, robust adaptive control algorithms are introduced with two main features: i) the controller at time t is selected via randomized methods and therefore exhibits robustness characteristics, and ii) the probability distribution describing the different likelihood of the systems in the model set is updated through time on the basis of the observations, enabling the controller to tune to the true system and thus enhancing the control performance.
This theory is applied to adaptive infinite-horizon LQG control. Randomized algorithms for the synthesis of robust adaptive LQG controllers are introduced and their tuning properties are studied in an ideal setting, where the true system belongs to the model class. Also in this case, it is shown that the introduction of a suitable dither noise in the control input can ensure a self-tuning property of the adaptive control system. Simulation studies show the effectiveness of the proposed approach.

1.3 Description by chapters

We now describe the contents of the thesis chapter by chapter. The main original contributions and results of the thesis are contained in Chapters 5 and 6. Chapters 2 and 3 revise the main facts on infinite-horizon LQG control and least squares parameter estimation to which we shall refer in the thesis. Chapter 4 contains a discussion of the optimality and stabilizability issues arising in certainty equivalence adaptive control, which are addressed in Chapter 5.

Chapter 2 - LQG optimal control problem
Infinite-horizon LQG control for discrete time stochastic systems is introduced and some known facts on this control methodology are revised. The aim is to work out some LQG


control laws to be used later in the thesis and to point out the properties of LQG control which are relevant for the subsequent developments.

Chapter 3 - Least Squares parameter estimation
The properties of standard least squares identification which hold true independently of the excitation characteristics of the involved signals are revised. This is fundamental in adaptive control, where the signals may be only partially exciting. In this chapter, we entirely focus on the case we shall deal with throughout the thesis, that is, the case when the stochastic system to be identified is known to belong to an ARX model class. The final section is devoted to the particular case in which the noise process is Gaussian. According to the Bayesian embedding approach, under the Gaussianity assumption the recursive least squares algorithm can be regarded as a stochastic Kalman filter recursively computing the conditional distribution of the true parameter vector given the observations. Convergence can then be proven through martingale convergence theory.

Chapter 4 - Certainty equivalence adaptive control: optimality and stabilizability issues
A standard strategy for the design of adaptive control laws is the certainty equivalence approach. Unfortunately, this approach suffers from a general closed-loop identifiability problem. When a cost criterion other than the output variance is considered, this identifiability problem may hamper the fulfillment of the control objective, thus leading to a strictly sub-optimal performance. Moreover, a main issue in the certainty equivalence approach is that standard identification methods do not guarantee the stabilizability of the estimated model when parameter estimation is performed in closed loop. The occurrence of this event in fact causes a paralysis in the certainty equivalence control law selection. The principal classes of methods proposed in the literature to address the stabilizability issue are revised.

Chapter 5 - Singularity free adaptive LQG control: stability and optimality
Guaranteeing the stability of the control system is reportedly a major issue in the design of adaptive controllers. In this chapter, we focus on adaptive infinite-horizon LQG control of discrete time SISO stochastic systems in the input output representation. A new adaptive LQG control scheme based on the certainty equivalence principle is introduced, which succeeds in adaptively stabilizing possibly nonminimum-phase discrete time systems without requiring extra excitation signals. This is obtained by the introduction of an appropriate identification method, which simultaneously ensures the uniform stabilizability of the estimated model and suitable closed-loop identification properties. Moreover, a precise characterization of the performance achieved with the proposed LQG control scheme is given. Finally, the optimality issue is addressed. It is shown that optimality can be obtained by a suitable modification of the proposed LQG control law through the so-called attenuating excitation technique. An asymptotically vanishing dither noise is added to the input in such a way that both consistency of the parameter estimate and optimality of the control are achieved. This approach, however, is useful only in the case when noise injection is feasible.

Chapter 6 - Robust adaptive LQG control
The design of adaptive control laws based on the minimization of average cost criteria is studied. At each time instant, the controller is selected so as to optimize the average control performance with respect to a probability distribution describing which systems in the model set are more likely to be a fair description of the true system. The exact minimization of average cost criteria turns out to be computationally very expensive and, therefore, randomized algorithms for the minimization are introduced. The probability distribution is updated on-line on the basis of the data gathered from the controlled system. In this way, a progressive tuning of the controller to the system characteristics is obtained. The general theory is then applied to the infinite-horizon LQG control problem and the tuning properties of robust adaptive LQG controllers are studied in an ideal setting. Simulations are provided to show the effectiveness of the proposed approach.


Chapter 2
LQG optimal control problem

In this chapter, we summarize some facts regarding the infinite-horizon LQG control method for known discrete time stochastic systems which are relevant for the following developments. This is also useful in order to introduce the basic assumptions and notations used throughout the thesis.

2.1 Introduction

The optimal LQG control problem we deal with has two main features:
i) the dynamic system to be controlled is a discrete time stochastic linear system subject to white noise;
ii) the performance index to be minimized is a long-term average quadratic cost function.
Due to the linearity of the system equation and the quadratic nature of the cost function, the solution to the control problem, i.e. the control law minimizing the performance index under the constraint given by the system dynamics, has a closed-form expression. Moreover, the LQG control method is characterized by the following desirable properties (see e.g. [1]):
1. it is suitable for possibly nonminimum-phase systems;
2. the time delay does not enter explicitly into the calculation of the controller and thus it does not need to be known a-priori;


3. stochastic noise acting on the system is taken into account in the optimal controller design.
All these properties make its use attractive in a digital control context. In this chapter, we accurately formulate the infinite-horizon LQG control problem for a known system in the two cases when the system is in the state space form with the state directly available (Section 2.2) and when the system is in the input output form (Section 2.3). Assuming that the state is available is often unrealistic. For example, some state variables may be inaccessible, the sensors used for measuring them may be inaccurate, or the cost of obtaining the exact value of the state may be prohibitive. On the other hand, Section 2.2 is instrumental for the derivation of the LQG control law expression in the input output representation case. This is in fact the real case of interest in an adaptive control context, since most identification techniques are based on the input output system description. Our intent is to highlight those aspects of LQG control referred to in the subsequent developments. Moreover, we introduce the assumptions and notations used throughout the thesis.

2.2 State space representation

We preliminarily remind the reader that this section is instrumental for the derivations in Section 2.3. This is the reason why we explicitly deal with the case when the noise acting on the system is a scalar process. Consider the discrete time system described by the following equation:
\[
x_{t+1} = A x_t + B u_t + C n_{t+1},
\]
where, according to standard notations, x_t and u_t respectively denote the state vector and the control input, and n_t denotes the noise acting on the system. A, B and C are known matrices, whose dimensions are easily derived from the fact that u_t and n_t are real-valued scalars. The noise process {n_t}_{t≥1} is characterized as follows.

Assumption 2.1
{n_t}_{t≥1} is a martingale difference sequence with respect to a family {F_t}_{t≥0} of non-decreasing σ-algebras, i.e. E[n_t | F_{t−1}] = 0 for each t ≥ 1, such that
i) sup_{t≥1} E[|n_t|^{2+δ} | F_{t−1}] < ∞ almost surely, for some δ > 0;
ii) lim_{N→∞} (1/N) Σ_{t=1}^N n_t² = σ² > 0 almost surely. □

Remark 2.1
Observe that the noise process {n_t}_{t≥1} is not assumed to be Gaussian. The name "Gaussian" in LQG is actually slightly misleading: we assume full state observation and, in this case, the Gaussianity assumption is inessential. On the other hand, it is common to refer to linear quadratic stochastic control problems as LQG control ([4], [5]). □

As for the initial condition x_0, it is assumed for simplicity that x_0 = 0. However, due to the long-term average nature of the control problem, any initialization can be taken in place of x_0 = 0, as long as it is F_0-measurable. We are now in a position to precisely describe the infinite-horizon LQG design method, which we shall apply in an adaptive control context in the following chapters.

Infinite-horizon LQG regulation problem for state space systems
Consider the state space system
\[
x_{t+1} = A x_t + B u_t + C n_{t+1}, \tag{2.1}
\]
initialized with x_0 = 0 and where the noise process {n_t}_{t≥1} satisfies Assumption 2.1. Choose u := {u_t}_{t≥0} in the set of admissible control laws
\[
U = \Big\{\, u : \sum_{t=0}^{N-1}\big(u_t^2 + \|x_t\|^2\big) = O(N),\ \|x_N\|^2 = o(N) \text{ a.s.},\ u_t\ \{x_0,\dots,x_t\}\text{-measurable},\ t \ge 0 \,\Big\},
\]
so as to minimize the performance index
\[
J(u) = \limsup_{N\to\infty} \frac{1}{N}\sum_{t=0}^{N-1}\big[\, x_t^T Q x_t + r u_t^2 \,\big], \tag{2.2}
\]
where the matrix Q is positive semidefinite and the control weighting coefficient r is strictly positive.
This dynamic optimization problem aims at keeping the state of the system close to the origin without spending too much energy on the control action. Such an objective is formulated in terms of the minimization of a quadratic performance index since the quadratic form


gives a reasonable cost to the deviations from the origin by inducing a high penalty for large deviations but a relatively small penalty for small ones. Moreover, under standard assumptions, the problem has a closed-form solution u* = {u_t*}_{t≥0}, due to the quadratic nature of index (2.2) in conjunction with the linearity of the dynamic constraint (2.1). Theorem 2.1 below summarizes the main facts concerning the solution of the infinite-horizon LQG regulation problem.

Theorem 2.1 ([6], [3], [7])

Under the assumption that (A, B) is stabilizable and (A, H) is detectable, where H is any matrix satisfying Q = H^T H, we have that:
1. there exists a unique positive semidefinite solution P to the discrete time algebraic Riccati equation (DARE)
\[
P = A^T\big[P - P B (B^T P B + r)^{-1} B^T P\big] A + Q. \tag{2.3}
\]
In particular, P is strictly positive definite if (A, H) is observable.
2. the solution to the infinite-horizon LQG regulation problem is given by the time-invariant control law
\[
u^* := \{K x_t\}_{t\ge 0}, \tag{2.4}
\]
where
\[
K = -(B^T P B + r)^{-1} B^T P A. \tag{2.5}
\]
The so-obtained closed-loop matrix F = A + BK is stable. Moreover, K is the unique optimal feedback gain matrix within the class of matrices L for which A + BL is stable. □
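To make the statement concrete, the DARE (2.3) and the gain (2.5) can be computed numerically. The following is a minimal sketch relying on SciPy's solve_discrete_are; the helper name and the matrices A, B, Q and the weight r are our own placeholders, not taken from the thesis.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqg_gain(A, B, Q, r):
    """Solve the DARE (2.3) and return (P, K), with K as in (2.5) so that u_t = K x_t."""
    R = np.atleast_2d(r)                    # scalar control weight r > 0 as a 1x1 matrix
    P = solve_discrete_are(A, B, Q, R)      # unique positive semidefinite DARE solution
    K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    return P, K

# Placeholder two-state system (stabilizable and detectable by inspection).
A = np.array([[1.2, 0.5], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
P, K = lqg_gain(A, B, Q, r=1.0)
F = A + B @ K                                    # closed-loop matrix
assert np.all(np.abs(np.linalg.eigvals(F)) < 1)  # F = A + BK is stable
```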

Theorem 2.2 ([8], [3])
1. The solution P to the discrete time algebraic Riccati equation (2.3) is analytic as a function of (A, B, H) in the set {(A, B, H) : (A, B) stabilizable and (A, H) detectable}.
2. P is the solution to the Lyapunov equation
\[
P = (A + BK)^T P (A + BK) + r K^T K + Q,
\]
where K is given by (2.5).


3. The performance index value obtained by applying the optimal LQG control law (2.4) is given by the following expression:
\[
J(u^*) = \min_{u\in U} J(u) = \operatorname{trace}(P C C^T \sigma^2). \tag{2.6}
\]
□

The LQG control design method can be applied to the more general case when one aims at keeping the state of the system close to a certain trajectory rather than close to the origin (tracking problem). The interested reader is referred to [3] for the formulation and solution of the LQG control problem in such a case. We next derive a closed-form expression for the value of the LQG performance index obtained by applying to system (2.1) the time-invariant control law u_L := {L x_t}_{t≥0}, with A + BL stable.

Proposition 2.1
The value of the LQG performance index
\[
J(u) = \limsup_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\big[\, x_t^T Q x_t + r u_t^2 \,\big]
\]
obtained by applying the stabilizing control law u_L = {L x_t}_{t≥0} is given by
\[
J(u_L) = \operatorname{trace}(P_L C C^T \sigma^2),
\]
where the matrix P_L is the unique solution to the Lyapunov equation
\[
P_L = (A + BL)^T P_L (A + BL) + r L^T L + Q. \tag{2.7}
\]
Proof. See Section 2.4. □
This result is in perfect agreement with point 2 of Theorem 2.2. As a matter of fact, the optimal LQG control law (2.4) belongs to the family of time-invariant control laws U_L = {u : u_t = L x_t, t ≥ 0, with A + BL stable} for which Proposition 2.1 holds.
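Proposition 2.1 gives a direct recipe for evaluating the cost of any stabilizing gain: solve the Lyapunov equation (2.7) and take a trace. A minimal sketch using SciPy's discrete Lyapunov solver (the helper name and its arguments are our own):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqg_cost(A, B, C, L, Q, r, sigma2):
    """Evaluate J(u_L) = trace(P_L C C^T sigma^2) via the Lyapunov equation (2.7)."""
    F = A + B @ L                                           # closed-loop matrix (stable)
    P_L = solve_discrete_lyapunov(F.T, r * (L.T @ L) + Q)   # solves (2.7)
    return float(np.trace(P_L @ C @ C.T) * sigma2)
```

With L equal to the optimal gain K of Theorem 2.1, P_L coincides with the DARE solution P (point 2 of Theorem 2.2), so the same helper returns the optimal value (2.6).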


So far, we have dealt with the general case when the noise acting on the system is described as a martingale difference sequence. In order to get more insight into the LQG design method, we now consider the particular case when the noise is a stationary process.

The infinite-horizon LQG design method in the stationary case: the steady-state LQG control
Assume that:
- the white noise process {n_t}_{t≥1} is stationary with variance E[n_t²] = σ²;
- y_t := H x_t is a scalar signal and the infinite-horizon LQG performance index is expressed in terms of y_t as
\[
J(u) = \limsup_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\big[\, y_t^2 + r u_t^2 \,\big], \quad r > 0.
\]
In this case the infinite-horizon LQG design strategy has a simple interpretation: the optimal control law minimizes the weighted sum of the variances of the asymptotically stationary processes y_t and u_t, namely
\[
\lim_{t\to\infty}\big(E[y_t^2] + r\,E[u_t^2]\big),
\]
thus justifying the fact that the infinite-horizon LQG control problem is also known as the steady-state LQG control problem. This is precisely explained in the following. Consider the state space system described by
\[
x_{t+1} = A x_t + B u_t + C n_{t+1}, \qquad y_t = H x_t. \tag{2.8}
\]
Suppose that the stabilizing time-invariant control law u_L = {L x_t}_{t≥0} is applied to such a system. Then the resulting stochastic processes u_t and y_t are asymptotically stationary, due to the stationarity of the noise process. In particular, the variances of the scalar processes u_t and y_t can be expressed as functions of the variance of x_t: E[u_t²] = L E[x_t x_t^T] L^T and E[y_t²] = H E[x_t x_t^T] H^T. Asymptotically, the variance of the process x_t is given by the solution X to the Lyapunov equation
\[
X = (A + BL) X (A + BL)^T + C C^T \sigma^2, \tag{2.9}
\]
and therefore
\[
\lim_{t\to\infty}\big(E[y_t^2] + r\,E[u_t^2]\big) = H X H^T + r L X L^T.
\]
On the other hand, from Proposition 2.1 we have that J(u_L) = trace(P_L C C^T σ²), where the matrix P_L is the solution to the Lyapunov equation
\[
P_L = (A + BL)^T P_L (A + BL) + r L^T L + H^T H. \tag{2.10}
\]


Observe now that from equations (2.9) and (2.10) the following chain of equalities is easily derived:
\[
\operatorname{trace}(P_L C C^T \sigma^2) = \operatorname{trace}(P_L X) - \operatorname{trace}\big(P_L (A+BL) X (A+BL)^T\big) \quad \text{(from (2.9))}
\]
\[
= \operatorname{trace}(X P_L) - \operatorname{trace}\big(X (A+BL)^T P_L (A+BL)\big) = \operatorname{trace}\big(X (r L^T L + H^T H)\big) \quad \text{(from (2.10))},
\]
and thus
\[
J(u_L) = \operatorname{trace}(P_L C C^T \sigma^2) = \operatorname{trace}\big(X (r L^T L + H^T H)\big) = H X H^T + r L X L^T = \lim_{t\to\infty}\big(E[y_t^2] + r\,E[u_t^2]\big).
\]
This means that the optimal infinite-horizon control law u* = {K x_t}_{t≥0} obtained by applying Theorem 2.1 minimizes the weighted sum of the variances of the asymptotically stationary processes u_t and y_t. In the following section, when treating the input output representation case, we exploit the results presented so far for the state space case. The so-obtained solution then holds under the martingale difference assumption on the noise process. For the sake of completeness, it is worth noticing that, under the stationarity assumption on the noise process, a polynomial approach suitable for the LQG control problem in the input output representation case has been proposed in the literature, which does not make direct use of a state space representation of the system ([9], [4], [5]).
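The identity J(u_L) = H X H^T + r L X L^T is easy to check numerically. A short sketch, reusing the placeholder matrices A, B and the hypothetical lqg_cost helper from the previous snippets:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

sigma2 = 1.0
C = np.array([[1.0], [0.0]])
H = np.array([[1.0, 0.0]])
L = np.array([[-0.72, -1.0]])                              # a stabilizing (not optimal) gain
X = solve_discrete_lyapunov(A + B @ L, C @ C.T * sigma2)   # state variance, (2.9)
lhs = lqg_cost(A, B, C, L, H.T @ H, r=1.0, sigma2=sigma2)  # trace(P_L C C^T sigma^2)
rhs = float(H @ X @ H.T + 1.0 * (L @ X @ L.T))
assert np.isclose(lhs, rhs)
```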

2.3 Input output representation In standard applications, the system to be controlled is often described through its input output representation. In this section, we deal with discrete time SISO systems described by the ARX model yt = a1yt?1 + : : : + anyt?n + b1ut?1 + : : : + bm ut?m + nt;

(2.11)

which may represent a possibly unstable, nonminimum-phase system, including a time delay greater than one. Regarding this last aspect, it is important to note that i) it is not assumed that b1 6= 0; ii) the procedure for determining the expression of the optimal LQG control law is not a ected by the fact that b1 = b2 = : : : = bd?1 = 0 when the time delay is equal to

d > 1.


This property of the infinite-horizon LQG control method turns out to be particularly useful in adaptive control applications: when the time delay is unknown, one can refer to model (2.11), where some of the coefficients of the exogenous part will possibly be equal to zero. We assume that the stochastic noise process {n_t}_{t≥1} acting on the system satisfies Assumption 2.1 and that system (2.11) is initialized at time t = 0 with y_t = u_{t−1} = 0, t ≤ 0. Moreover, we make the standard 'controllability' assumption on system (2.11).

Assumption 2.2
q^s A(ϑ, q^{−1}) and q^{s−1} B(ϑ, q^{−1}), with s := max(n, m), are coprime, where
\[
A(\vartheta, q^{-1}) = 1 - \sum_{i=1}^{n} a_i q^{-i}
\]
and
\[
B(\vartheta, q^{-1}) = \sum_{i=1}^{m} b_i q^{-(i-1)}
\]
are polynomials in the unit-delay operator q^{−1}, and ϑ = [a_1 … a_n b_1 … b_m]^T is the system parameter vector. □

The infinite-horizon LQG regulation problem for the input output representation case can then be formulated as follows:

Infinite-horizon LQG regulation problem for input output systems
Consider the system
\[
y_t = a_1 y_{t-1} + \dots + a_n y_{t-n} + b_1 u_{t-1} + \dots + b_m u_{t-m} + n_t, \tag{2.12}
\]
initialized with y_t = u_{t−1} = 0, t ≤ 0, and where the noise process {n_t}_{t≥1} satisfies Assumption 2.1. Choose u := {u_t}_{t≥0} in the set of admissible control laws
\[
U = \Big\{\, u : \sum_{t=0}^{N-1}\big[u_t^2 + y_t^2\big] = O(N),\ y_N^2 = o(N) \text{ a.s.},\ u_t\ \{y_0,\dots,y_t\}\text{-measurable},\ t \ge 0 \,\Big\},
\]
so as to minimize the performance index
\[
J(u) = \limsup_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\big[\, y_t^2 + r u_t^2 \,\big], \tag{2.13}
\]
where the control weighting coefficient r is strictly positive.


In order to solve this control problem by exploiting the results presented in Section 2.2 for the state space case, we represent system (2.12) in a state space form with the state accessible. Defining
\[
x_t := [\, y_t\ y_{t-1} \dots y_{t-(\bar n-1)}\ u_{t-1}\ u_{t-2} \dots u_{t-(m-1)} \,]^T,
\]
where n̄ = max{n, 1}, system (2.12) can be given the following state space representation of order q := n̄ + m − 1:
\[
x_{t+1} = A(\vartheta) x_t + B(\vartheta) u_t + C n_{t+1}, \quad x_0 = [\,0\ 0 \dots 0\,]^T, \qquad y_t = H x_t, \tag{2.14}
\]
with matrices
\[
A(\vartheta) = \begin{bmatrix}
a_1 & \cdots & a_{\bar n-1} & a_{\bar n} & b_2 & \cdots & b_{m-1} & b_m\\
1 & & & & & & &\\
& \ddots & & & & & &\\
& & 1 & 0 & & & &\\
0 & \cdots & \cdots & 0 & 0 & \cdots & \cdots & 0\\
& & & & 1 & & &\\
& & & & & \ddots & &\\
& & & & & & 1 & 0
\end{bmatrix}, \quad
B(\vartheta) = \begin{bmatrix} b_1\\ 0\\ \vdots\\ 0\\ 1\\ 0\\ \vdots\\ 0 \end{bmatrix}, \quad
C = \begin{bmatrix} 1\\ 0\\ \vdots\\ 0 \end{bmatrix},
\]
\[
H = [\, 1\ 0\ 0 \cdots 0 \,],
\]
where a_1 = 0 if n = 0, the entry 1 in B(ϑ) is in position n̄ + 1, and the row of zeros in A(ϑ) is the (n̄ + 1)-th one. The LQG regulation problem for the system in input output representation (2.12) is now reformulated in terms of the introduced state space representation (2.14) as follows:


given
\[
x_{t+1} = A(\vartheta) x_t + B(\vartheta) u_t + C n_{t+1},
\]
choose the input sequence u = {u_t}_{t≥0} ∈ U so as to minimize the performance index
\[
J(u) = \limsup_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\big[\, x_t^T Q x_t + r u_t^2 \,\big],
\]
where Q = H^T H ≥ 0 and r > 0.
In the case when n > 1 and m > 1, the state space representation (2.14) is non minimal (the order of system (2.12) is s = max{n, m}, whereas the dimension of matrix A(ϑ) is q = n̄ + m − 1 = n + m − 1). On the other hand, in the following proposition it is shown that representation (2.14) is completely reachable - which means that the state space part added with respect to a minimal state space realization is in fact unobservable - and that the unobservable part is stable. Based on Proposition 2.2, we can then use Theorem 2.1 to determine the control law minimizing (2.13) when applied to system (2.12) by referring to the state space representation (2.14).
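The construction of the realization (2.14) from the ARX coefficients is mechanical. A minimal sketch (our own helper, assuming n ≥ 1 and m ≥ 2 so that both shift blocks are nonempty):

```python
import numpy as np

def arx_state_space(a, b):
    """Build A(theta), B(theta), C, H of the realization (2.14) from the ARX
    parameters a = [a_1, ..., a_n], b = [b_1, ..., b_m] (assumes n >= 1, m >= 2)."""
    n, m = len(a), len(b)
    nbar = max(n, 1)
    q = nbar + m - 1                          # order of the realization
    A = np.zeros((q, q))
    A[0, :nbar] = np.pad(a, (0, nbar - n))    # first row: a_1 ... a_nbar ...
    A[0, nbar:] = b[1:]                       # ... followed by b_2 ... b_m
    A[1:nbar, :nbar - 1] += np.eye(nbar - 1)  # shift of the past outputs
    A[nbar + 1:, nbar:-1] += np.eye(m - 2)    # shift of the past inputs
    B = np.zeros((q, 1)); B[0, 0] = b[0]; B[nbar, 0] = 1.0
    C = np.zeros((q, 1)); C[0, 0] = 1.0
    H = np.zeros((1, q)); H[0, 0] = 1.0
    return A, B, C, H
```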

Proposition 2.2
Under the controllability Assumption 2.2, (A(ϑ), B(ϑ)) is reachable and (A(ϑ), H) is detectable.
Proof. See Section 2.4. □

According to Theorem 2.1, the solution to the original LQG control problem has the following expression:
\[
u_t = \gamma_0(\vartheta) y_t + \gamma_1(\vartheta) y_{t-1} + \dots + \gamma_{\bar n-1}(\vartheta) y_{t-(\bar n-1)} + \delta_1(\vartheta) u_{t-1} + \dots + \delta_{m-1}(\vartheta) u_{t-(m-1)}, \tag{2.15}
\]
where the q-dimensional coefficient vector
\[
\Gamma(\vartheta) := [\,\gamma_0(\vartheta)\ \gamma_1(\vartheta) \dots \gamma_{\bar n-1}(\vartheta)\ \delta_1(\vartheta) \dots \delta_{m-1}(\vartheta)\,]
\]
is given by
\[
\Gamma(\vartheta) = -\big(B(\vartheta)^T P(\vartheta) B(\vartheta) + r\big)^{-1} B(\vartheta)^T P(\vartheta) A(\vartheta), \tag{2.16}
\]
P(ϑ) being the unique positive semidefinite solution to the discrete time algebraic Riccati equation
\[
P = A(\vartheta)^T\big[P - P B(\vartheta)(B(\vartheta)^T P B(\vartheta) + r)^{-1} B(\vartheta)^T P\big] A(\vartheta) + H^T H. \tag{2.17}
\]


Remark 2.2
It is worth noticing that the introduction of the state space realization (2.14) is just instrumental for the derivation of the optimal LQG control expression. The solution to the LQG control problem in the input output representation case is in fact obtained by determining the control law coefficients in expression (2.15) through equation (2.16) and the discrete time algebraic Riccati equation (2.17). In this regard, when the pair (A(ϑ), H) is not completely observable, i.e. when s < q, the computational effort needed for the calculation of the optimal control law coefficients can be reduced by tracing the solution of the q × q DARE (2.17) back to the solution of an s × s DARE. This is just a computational matter which, however, turns out to be significant in adaptive control, where all the computations are performed on-line. We now explain the details of the simplification.
Consider the case when s < q. (A(ϑ), H) being not completely observable, there exists a matrix T(ϑ) realizing the so-called canonical decomposition
\[
\tilde A(\vartheta) = T(\vartheta) A(\vartheta) T(\vartheta)^{-1} = \begin{bmatrix} \tilde A_{oo}(\vartheta) & 0 \\ \tilde A_{no}(\vartheta) & \tilde A_{nn}(\vartheta) \end{bmatrix}, \qquad
\tilde H(\vartheta) = H\, T(\vartheta)^{-1} = [\,\tilde H_o(\vartheta)\ \ 0\,], \tag{2.18}
\]
where the subscripts o and n respectively denote the observable and unobservable parts. Correspondingly, we get
\[
\tilde B(\vartheta) = T(\vartheta) B(\vartheta) = \begin{bmatrix} \tilde B_o(\vartheta) \\ \tilde B_n(\vartheta) \end{bmatrix}. \tag{2.19}
\]
The feedback gain Γ(ϑ) in (2.16) can then be determined by referring to the matrices (2.18) and (2.19) as follows:
\[
\Gamma(\vartheta) = \tilde\Gamma(\vartheta)\, T(\vartheta), \tag{2.20}
\]
where
\[
\tilde\Gamma(\vartheta) = -\big(\tilde B(\vartheta)^T \tilde P(\vartheta) \tilde B(\vartheta) + r\big)^{-1} \tilde B(\vartheta)^T \tilde P(\vartheta) \tilde A(\vartheta), \tag{2.21}
\]
P̃(ϑ) being the unique positive semidefinite solution to the DARE
\[
\tilde P = \tilde A(\vartheta)^T\big[\tilde P - \tilde P \tilde B(\vartheta)(\tilde B(\vartheta)^T \tilde P \tilde B(\vartheta) + r)^{-1}\tilde B(\vartheta)^T \tilde P\big]\tilde A(\vartheta) + \tilde H(\vartheta)^T \tilde H(\vartheta). \tag{2.22}
\]
Set
\[
\tilde P = \begin{bmatrix} \tilde P_{oo} & \tilde P_{on} \\ \tilde P_{no} & \tilde P_{nn} \end{bmatrix}.
\]
By direct substitution of the expressions in (2.18) and (2.19) for Ã(ϑ), B̃(ϑ) and H̃(ϑ) into equation (2.22), it is easily seen that the submatrices P̃_nn(ϑ), P̃_on(ϑ) and P̃_no(ϑ) are identically equal to zero, while the submatrix P̃_oo(ϑ) is the unique positive definite solution ((Ã_oo(ϑ), B̃_o(ϑ)) is stabilizable and (Ã_oo(ϑ), H̃_o(ϑ)) is observable) to the DARE
\[
P_{oo} = \tilde A_{oo}(\vartheta)^T\big[P_{oo} - P_{oo}\tilde B_o(\vartheta)(\tilde B_o(\vartheta)^T P_{oo}\tilde B_o(\vartheta) + r)^{-1}\tilde B_o(\vartheta)^T P_{oo}\big]\tilde A_{oo}(\vartheta) + \tilde H_o(\vartheta)^T \tilde H_o(\vartheta). \tag{2.23}
\]
Therefore, since P̃_nn(ϑ), P̃_on(ϑ) and P̃_no(ϑ) are identically equal to zero, from expression (2.21) we get Γ̃(ϑ) = [Γ̃_o(ϑ) 0], where Γ̃_o(ϑ) is given by
\[
\tilde\Gamma_o(\vartheta) = -\big(\tilde B_o(\vartheta)^T \tilde P_{oo}(\vartheta)\tilde B_o(\vartheta) + r\big)^{-1}\tilde B_o(\vartheta)^T \tilde P_{oo}(\vartheta)\tilde A_{oo}(\vartheta). \tag{2.24}
\]
This finally implies that, by equation (2.20), the feedback gain Γ(ϑ) can be determined as
\[
\Gamma(\vartheta) = [\,\tilde\Gamma_o(\vartheta)\ \ 0\,]\, T(\vartheta),
\]
where Γ̃_o(ϑ) is obtained by solving equation (2.24) together with the DARE (2.23). The described computational procedure for the determination of the control law coefficients has a simple interpretation: it corresponds to deriving the control law expression by referring to a minimal state space realization of system (2.12), for which a state observer reconstructing the state from the output and input signals is then needed. □

We also note that from the technical Theorem 2.2 it follows that Γ(ϑ) is an analytic function of ϑ on the set {ϑ : (A(ϑ), B(ϑ)) stabilizable and (A(ϑ), H) detectable}, and that the value of the performance index (2.13) obtained by applying u* = {u_t*}_{t≥0}, with u_t* given by (2.15), is equal to
\[
J(u^*) = \operatorname{trace}(P(\vartheta) C C^T \sigma^2). \tag{2.25}
\]
By exploiting Proposition 2.1, we finally show that there exists a closed-form expression for the LQG performance index obtained by applying to system (2.12) a stabilizing control law of the form u_t = Γ [y_t … y_{t−(n̄−1)} u_{t−1} … u_{t−(m−1)}]^T, with n̄ = max{n, 1}. This result will be exploited in Chapter 6 when dealing with adaptive LQG control.

Proposition 2.3
The value of the LQG performance index
\[
J(u) = \limsup_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1}\big[\, y_t^2 + r u_t^2 \,\big]
\]
obtained by applying the stabilizing time-invariant control law
\[
u^\Gamma = \big\{\,\Gamma\,[\, y_t \dots y_{t-(\bar n-1)}\ u_{t-1} \dots u_{t-(m-1)} \,]^T\,\big\}_{t\ge 0}
\]
to the system
\[
y_t = a_1 y_{t-1} + \dots + a_n y_{t-n} + b_1 u_{t-1} + \dots + b_m u_{t-m} + n_t,
\]
with {n_t}_{t≥1} satisfying Assumption 2.1, is given by
\[
J(u^\Gamma) = J(\vartheta, \Gamma) = \operatorname{trace}(P(\vartheta, \Gamma)\, C C^T \sigma^2),
\]
where the matrix P(ϑ, Γ) is the solution to the Lyapunov equation
\[
P = (A(\vartheta) + B(\vartheta)\Gamma)^T P (A(\vartheta) + B(\vartheta)\Gamma) + r\,\Gamma^T \Gamma + H^T H.
\]
Proof. It immediately follows from Proposition 2.1 by referring to the state space representation (2.14). □
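Propositions 2.2 and 2.3 turn the input output problem into plain matrix computations. The sketch below strings together the hypothetical helpers arx_state_space, lqg_gain and lqg_cost introduced earlier, and checks that the cost of the optimal gain matches (2.25); the ARX coefficients are placeholders.

```python
a, b = [0.9, -0.2], [1.0, 0.5]                  # placeholder ARX parameters
A, B, C, H = arx_state_space(a, b)              # realization (2.14)
P, Gamma = lqg_gain(A, B, H.T @ H, r=1.0)       # optimal gain (2.16) via the DARE (2.17)
J_Gamma = lqg_cost(A, B, C, Gamma, H.T @ H, r=1.0, sigma2=1.0)   # Proposition 2.3
assert np.isclose(J_Gamma, np.trace(P @ C @ C.T))                # equals (2.25)
```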

2.4 Proofs

The proofs given in this section are all obtained by extension of the results presented in [7], [6] and [3].

Proof of Proposition 2.1
Observe first that, by (2.7),
\[
x_N^T P_L x_N - x_0^T P_L x_0 = \sum_{t=0}^{N-1}\big(x_{t+1}^T P_L x_{t+1} - x_t^T P_L x_t\big)
= \sum_{t=0}^{N-1}\Big\{[(A+BL)x_t + C n_{t+1}]^T P_L [(A+BL)x_t + C n_{t+1}] - x_t^T\big[(A+BL)^T P_L (A+BL) + r L^T L + Q\big]x_t\Big\},
\]
and hence
\[
\sum_{t=0}^{N-1} x_t^T Q x_t = x_0^T P_L x_0 - x_N^T P_L x_N - \sum_{t=0}^{N-1}(L x_t)^T r\, L x_t + 2\sum_{t=0}^{N-1}[(A+BL)x_t]^T P_L C n_{t+1} + \sum_{t=0}^{N-1} C^T P_L C\, n_{t+1}^2.
\]


We then obtain the following expression for J_N(u_L) = Σ_{t=0}^{N−1} [x_t^T Q x_t + (L x_t)^T r L x_t]:
\[
J_N(u_L) = x_0^T P_L x_0 - x_N^T P_L x_N + 2\sum_{t=0}^{N-1}[(A+BL)x_t]^T P_L C n_{t+1} + \sum_{t=0}^{N-1} C^T P_L C\, n_{t+1}^2,
\]
from which it finally follows that
\[
J(u_L) = \limsup_{N\to\infty}\frac{1}{N} J_N(u_L) = \limsup_{N\to\infty}\frac{1}{N}\Big\{x_0^T P_L x_0 - x_N^T P_L x_N + 2\sum_{t=0}^{N-1}[(A+BL)x_t]^T P_L C n_{t+1} + \operatorname{trace}(P_L C C^T)\sum_{t=1}^{N} n_t^2\Big\}. \tag{2.26}
\]
We now show that u_L belongs to
\[
U = \Big\{\, u : \sum_{t=0}^{N-1}\big(u_t^2 + \|x_t\|^2\big) = O(N),\ \|x_N\|^2 = o(N) \text{ a.s.},\ u_t\ \{x_0,\dots,x_t\}\text{-measurable},\ t \ge 0 \,\Big\}.
\]
Due to the fact that u_t = L x_t, we only need to prove that:
1. ‖x_N‖² = o(N);
2. Σ_{t=0}^{N−1} ‖x_t‖² = O(N).
This is shown next.
1. Since the time evolution of x_t is governed by the equation x_{t+1} = (A + BL) x_t + C n_{t+1}, with A + BL stable, the norm of the state x_t can be bounded as follows:
\[
\|x_t\|^2 \le k_1 \sum_{i=1}^{t} \lambda^{t-i} n_i^2, \tag{2.27}
\]
k_1 and 0 < λ < 1 being suitable constants. Observe now that from the equation
\[
\frac{1}{t}\, n_t^2 = \frac{1}{t}\sum_{i=1}^{t} n_i^2 - \frac{t-1}{t}\cdot\frac{1}{t-1}\sum_{i=1}^{t-1} n_i^2
\]
and Assumption 2.1 (point ii)), we have that n_t² = o(t) almost surely. Hence,
\[
\frac{1}{t}\sum_{i=1}^{t} \lambda^{t-i} n_i^2 \le \sum_{i=1}^{t} \lambda^{t-i}\,\frac{n_i^2}{i} = o(1), \quad \text{a.s.},
\]
from which, by inequality (2.27), we get ‖x_t‖² = o(t) almost surely.


2. From inequality (2.27), we have that
\[
\frac{1}{N}\sum_{t=0}^{N-1}\|x_t\|^2 \le k_2\,\frac{1}{N}\sum_{t=1}^{N-1} n_t^2,
\]
where k_2 is an appropriate constant. Since (1/N) Σ_{t=1}^{N−1} n_t² is almost surely bounded, we obtain that (1/N) Σ_{t=0}^{N−1} ‖x_t‖² = O(1) almost surely.
Since we have proven that u_L belongs to the set of control laws U, by Theorem A.1 in Appendix A we see that, for some η > 0,
\[
\frac{1}{N}\sum_{t=0}^{N-1}[(A+BL)x_t]^T P_L C n_{t+1} = O\Big(\frac{1}{N}\Big(\sum_{t=0}^{N-1}\|(A+BL)x_t\|^2\Big)^{1/2}\Big(\ln \sum_{t=0}^{N-1}\|(A+BL)x_t\|^2\Big)^{\frac{1}{2}+\eta}\Big) = O\Big(\frac{1}{N}\, N^{1/2}(\ln N)^{\frac{1}{2}+\eta}\Big) \xrightarrow[N\to\infty]{} 0.
\]
Finally, taking into account this last equation, Assumption 2.1 and the condition ‖x_N‖² = o(N), the thesis follows from equation (2.26). □

Proof of Proposition 2.2
Observe first that the thesis is trivially satisfied in the case when the realization (2.14) is minimal, due to Assumption 2.2. In the following we therefore refer to the case when such a realization is non minimal, i.e. n > 1 and m > 1.
Let us first introduce the (n + m − 1) × (n + m − 1) Sylvester matrix associated with the polynomials q^n A(ϑ, q^{−1}) = q^n − Σ_{i=1}^n a_i q^{n−i} and q^{m−1} B(ϑ, q^{−1}) = Σ_{i=1}^m b_i q^{m−i}, i.e.
\[
S(\vartheta) = \begin{bmatrix}
b_1 & b_2 & b_3 & \cdots & b_m & & & \\
 & b_1 & b_2 & b_3 & \cdots & b_m & & \\
 & & \ddots & & & & \ddots & \\
 & & & b_1 & b_2 & b_3 & \cdots & b_m \\
1 & -a_1 & -a_2 & \cdots & -a_n & & & \\
 & 1 & -a_1 & -a_2 & \cdots & -a_n & & \\
 & & \ddots & & & & \ddots & \\
 & & & 1 & -a_1 & -a_2 & \cdots & -a_n
\end{bmatrix},
\]
where the b-rows are repeated n times and the a-rows m − 1 times. By direct inspection, it is easily seen that the following equations hold for any value of ϑ:
\[
A(\vartheta)\, S(\vartheta) = S(\vartheta)\,\bar A(\vartheta) \quad \text{and} \quad B(\vartheta) = S(\vartheta)\,\bar B,
\]
where the matrices Ā(ϑ) and B̄ are given by
\[
\bar A(\vartheta) = \begin{bmatrix}
a_1 & \cdots & a_{n-1} & a_n & 0 & \cdots & 0 & 0\\
1 & & & & & & &\\
 & \ddots & & & & & &\\
 & & & \ddots & & & &\\
 & & & & & & 1 & 0
\end{bmatrix}, \qquad
\bar B = \begin{bmatrix} 1\\ 0\\ \vdots\\ 0 \end{bmatrix},
\]
i.e. Ā(ϑ) is the (n + m − 1) × (n + m − 1) companion matrix with first row [a_1 … a_n 0 … 0] and ones on the subdiagonal. By exploiting these equations, we have that the reachability matrix K_r(ϑ) associated with the pair (A(ϑ), B(ϑ)) can be expressed as follows:
\[
K_r(\vartheta) = [\, B(\vartheta)\ A(\vartheta)B(\vartheta) \dots A(\vartheta)^{n+m-2}B(\vartheta) \,] = S(\vartheta)\,[\, \bar B\ \bar A(\vartheta)\bar B \dots \bar A(\vartheta)^{n+m-2}\bar B \,] = S(\vartheta)\,\bar K_r(\vartheta),
\]
where K̄_r(ϑ) is the reachability matrix associated with (Ā(ϑ), B̄). Therefore the rank of K_r(ϑ) is given by the rank of S(ϑ), since the pair (Ā(ϑ), B̄) is reachable for any ϑ. From the coprimeness Assumption 2.2, we have that the matrix S(ϑ) is nonsingular (see e.g. [10]), from which it follows that the rank of K_r(ϑ) is maximal, that is, the pair (A(ϑ), B(ϑ)) is reachable.
We now turn to the detectability issue. Observe first that, under the coprimeness Assumption 2.2, A(ϑ) = S(ϑ) Ā(ϑ) S(ϑ)^{−1}, so that the eigenvalues of A(ϑ) coincide with those of Ā(ϑ), given by the roots of the polynomial
\[
\chi(q) := q^{n+m-1} - a_1 q^{n+m-2} - a_2 q^{n+m-3} - \dots - a_n q^{m-1}.
\]
Since χ(q) can be rewritten as
\[
\chi(q) = q^{n+m-1-s}\,\big[q^s A(\vartheta, q^{-1})\big],
\]
where q^s A(ϑ, q^{−1}) is the polynomial introduced in Assumption 2.2, whose roots are the poles of system (2.12), it is then easily seen that the matrix A(ϑ) has s = max{n, m} eigenvalues given by the poles of the system and n + m − 1 − s eigenvalues equal to zero. This means that the n + m − 1 − s eigenvalues added to the system when referring to the non minimal state space representation (2.14) are all stable (precisely, identically equal to zero). Since any minimal state space realization of system (2.12) is reachable and observable, from this fact it follows that the pair (A(ϑ), H) is detectable. □

Chapter 3
Least Squares parameter estimation

In this chapter, we revise those aspects of standard least squares identification which are relevant in an adaptive control context. More precisely, we describe the properties of least squares estimation which hold true irrespectively of the excitation characteristics of the involved signals. This is fundamental in a closed-loop setup, where the signals may be only partially exciting. This chapter is entirely focused on the case we deal with throughout the thesis, that is, the case where the stochastic system to be identified is known to belong to an ARX model class.

3.1 Introduction

In this chapter we highlight the properties of the standard least squares identification algorithm which hold true independently of the excitation characteristics of the involved signals. As a matter of fact, in adaptive control the parameter vector is estimated on-line, and the controller coefficients are simultaneously adjusted as a function of the estimate according to a given controller design criterion. This complex identification-control interplay makes it generally impossible to guarantee the satisfaction of suitable excitation conditions implying consistency of the parameter estimates. The chapter is organized as follows. We first describe our stochastic framework, where the system to be identified is known to be an ARX model subject to additive white noise (Section 3.2). This is preliminary to the explanation of the least squares identification method as a particular prediction error identification method (Section 3.3) and the derivation of its properties (Section 3.4). Finally, in Section 3.5 we deal with the Gaussian noise case and recall a fundamental and well known convergence result that has been proven via a


Bayesian approach.

3.2 Mathematical framework

Consider the SISO system
\[
y_t = a_1 y_{t-1} + a_2 y_{t-2} + \dots + a_n y_{t-n} + b_1 u_{t-1} + b_2 u_{t-2} + \dots + b_m u_{t-m} + n_t, \tag{3.1}
\]
which is known under the acronym ARX since it is composed of two regressions, one on the past values of the output signal y_t (the AutoRegressive part) and one on the past values of the input signal u_t (the eXogenous part). Assume that system (3.1) is initialized at time t = 0 with deterministic values y_0, y_{−1}, …, y_{−(n−1)}, u_0, u_{−1}, …, u_{−(m−1)}. The signal n_t represents the effect of disturbances on the given plant. We assume that {n_t}_{t≥1} is a stochastic process characterized as indicated in the following assumption.

Assumption 3.1
{n_t}_{t≥1} is a martingale difference sequence, i.e. n_1, n_2, … are random variables defined on the probability space (Ω, F, P) such that, for each t ≥ 1, n_t is F_t-measurable and E[n_t | F_{t−1}] = 0, {F_t}_{t≥0} being a family of non-decreasing σ-algebras F_0 ⊆ F_1 ⊆ … ⊆ F. □

The true system parameters are grouped in the vector
\[
\vartheta^\circ = [\, a_1\ a_2 \dots a_n\ b_1\ b_2 \dots b_m \,]^T.
\]
Letting φ_t = [y_t y_{t−1} … y_{t−(n−1)} u_t u_{t−1} … u_{t−(m−1)}]^T be the observation vector, system (3.1) can then be given the usual regression-like form
\[
y_t = \varphi_{t-1}^T \vartheta^\circ + n_t, \tag{3.2}
\]
to which we shall refer in the following developments. As for the model class, we conform to the usual paradigm that the structure of the ARX equation governing the dynamics of the system is known, i.e. the integers n and m are a-priori given, but the true parameter vector ϑ° is not available. We in fact consider the family of predictors parameterized by the vector ϑ = [a_1 a_2 … a_n b_1 b_2 … b_m]^T,
\[
\hat M(\vartheta):\ \hat y_{t|t-1}(\vartheta) = a_1 y_{t-1} + a_2 y_{t-2} + \dots + a_n y_{t-n} + b_1 u_{t-1} + b_2 u_{t-2} + \dots + b_m u_{t-m},
\]
associated with the ARX models whose equation mimics that of the true system (3.1).
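For concreteness, data obeying (3.2) can be generated as follows. A minimal sketch (our own helper; the white-noise probing input and Gaussian n_t are arbitrary choices made here only for illustration):

```python
import numpy as np

def simulate_arx(theta0, n, m, T, seed=0):
    """Simulate y_t = varphi_{t-1}^T theta0 + n_t, theta0 = [a_1..a_n b_1..b_m],
    with zero initial conditions; returns the output and input sequences."""
    rng = np.random.default_rng(seed)
    pad = max(n, m)                          # zero prehistory of length pad
    y = np.zeros(T + pad)
    u = np.zeros(T + pad)
    u[pad:] = rng.normal(size=T)             # probing input (arbitrary choice)
    for t in range(pad, T + pad):
        phi = np.concatenate([y[t-n:t][::-1], u[t-m:t][::-1]])   # varphi_{t-1}
        y[t] = phi @ theta0 + rng.normal()
    return y[pad:], u[pad:]
```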


3.3 Prediction error identi cation methods Given the set of candidate models fM(#); # 2 g parameterized as a model structure using the parameter vector #, the search of the best model within the set becomes a problem of estimating #. In this section, we shall give a very brief introduction to prediction error estimation, the aim being mainly to set the stage for the presentation of the least squares estimation method and its properties. We refer the reader to complete treatments (such as [11]) for an extensive presentation of this topic. Fix a time instant t and denote with Z t the data collected from the system up to time t, i.e. Z t = fy1 ; u1; y2 ; u2 ; : : :; yt ; ut g A parameter estimation method is a mapping from the data Z t to the set  of admissible parameterizations. The ability of a certain model, say M(#), to described the observed data can be evaluated by referring to the prediction errors given by i(#) = yi ? y^i=i?1 (#); i = 1; 2; : : :; t where y^i=i?1 (#) denotes the one-step-ahead prediction at time i for the model with parameterization #. A `good' model is one that is good at predicting, that is, one that produces small prediction errors when applied to the observed data. Perhaps the most common way to identify a `best' model in the model set is to x a scalar-valued positive function l() and to select the parameter vector # that minimizes

Vt (#; ) =

t X i=1

l(fi (#; ))

(3.3)

where fi (#; ) denotes the prediction errors ltered through a stable linear lter with transfer function L(z; ): fi (#; ) = L(z; )i (#; ): The parameter vector  indicates that the lter may depend on another set of parameters that are tuned in order to achieve a desired objective. The parameter estimate is then de ned as #^t = arg min Vt (#; ): (3.4) #2

The methods exploiting this way of estimating the parameter vector are known as prediction error identi cation methods (PEM). In general, the solution to (3.4) cannot be stated explicitely as a closed-form expression. Depending on the scalar function l(), iterative algorithms exist that can be used to compute an o -line solution to this minimization problem (see e.g. [11]). The least squares method is obtained as a particular case of (3.4) when l() in (3.3) is a quadratic function and the predictor y^i=i?1 (#) is linear in the parameter #. In this case, the


minimizer in (3.4) can be easily derived in closed form. Moreover, such a solution can be evaluated recursively, thus making least squares attractive for adaptive control applications.

3.4 The least squares identification method

Let us consider the ARX predictor with parameterization ϑ ∈ ℝ^{n+m},
\[
\hat y_{t|t-1}(\vartheta) = \varphi_{t-1}^T \vartheta,
\]
and the quadratic cost function
\[
V_t(\vartheta) = \sum_{i=1}^{t}\big[y_i - \varphi_{i-1}^T\vartheta\big]^2. \tag{3.5}
\]
Setting the gradient of V_t(ϑ) to zero yields the normal equations
\[
W_t\,\vartheta = \sum_{i=1}^{t}\varphi_{i-1}\, y_i, \qquad W_t := \sum_{i=1}^{t}\varphi_{i-1}\varphi_{i-1}^T.
\]
When W_t is nonsingular, the normal equations admit a unique solution:
\[
\hat\vartheta_t^{LS} = \arg\min_{\vartheta\in\mathbb{R}^{n+m}} V_t(\vartheta) = W_t^{-1}\sum_{i=1}^{t}\varphi_{i-1}\, y_i. \tag{3.6}
\]
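Numerically, (3.6) is a single linear solve. A minimal sketch, with a usage example on data from the hypothetical simulate_arx helper above:

```python
import numpy as np

def ls_estimate(y, Phi):
    """Batch least squares (3.6). y: (t,) targets y_1..y_t;
    Phi: (t, n+m) with row i equal to varphi_{i-1}^T. Assumes W_t nonsingular."""
    W = Phi.T @ Phi                           # W_t = sum of varphi varphi^T
    return np.linalg.solve(W, Phi.T @ y)      # solves the normal equations

theta0 = np.array([0.7, -0.1, 1.0, 0.5])      # placeholder [a1 a2 b1 b2]
y, u = simulate_arx(theta0, n=2, m=2, T=5000)
Phi = np.array([[y[t-1], y[t-2], u[t-1], u[t-2]] for t in range(2, len(y))])
theta_hat = ls_estimate(y[2:], Phi)           # should be close to theta0
```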

The convergence and consistency analysis of parameter estimation methods has, of course, a long history in the statistical literature. In the case of the least squares method, it is well known that ϑ̂^LS_t is consistent under suitable excitation conditions which, however, call for stationarity or quasi-stationarity of the involved signals (see [11], [12]). One of the most general consistency results under the general Assumption 3.1 on the noise process, and without assuming any stationarity condition on the signals, is supplied by Lai and Wei ([13], Theorem 1, reported as Theorem 3.1 below). The fact that this result does not call


for any stationarity assumption makes it applicable to an adaptively controlled stochastic system. Moreover, Lai and Wei showed by means of an example that, in the general case when the regression vector φ_t is measurable with respect to F_t, their sufficient conditions for consistency of the parameter estimate cannot be weakened.

Theorem 3.1 (properties of the LS estimate - [13], Theorem 1)
Assume that the following conditions are fulfilled:
i) {n_t, F_t} is a martingale difference sequence with sup_{t≥1} E[|n_t|^{2+δ} | F_{t−1}] < ∞ for some δ > 0, almost surely;
ii) u_t is F_t-measurable.
Then,
\[
(\vartheta^\circ - \hat\vartheta^{LS}_t)^T\Big(\sum_{i=1}^{t}\varphi_{i-1}\varphi_{i-1}^T\Big)(\vartheta^\circ - \hat\vartheta^{LS}_t) = O\Big(\ln\lambda_{\max}\Big(\sum_{i=1}^{t}\varphi_{i-1}\varphi_{i-1}^T\Big)\Big), \tag{3.7}
\]
almost surely, which entails
\[
\|\vartheta^\circ - \hat\vartheta^{LS}_t\|^2 = O\Bigg(\frac{\ln\lambda_{\max}\big(\sum_{i=1}^{t}\varphi_{i-1}\varphi_{i-1}^T\big)}{\lambda_{\min}\big(\sum_{i=1}^{t}\varphi_{i-1}\varphi_{i-1}^T\big)}\Bigg),
\]
almost surely. In particular, this implies that under the conditions
1. λ_min(Σ_{i=1}^t φ_{i−1} φ_{i−1}^T) → ∞,
2. ln λ_max(Σ_{i=1}^t φ_{i−1} φ_{i−1}^T) = o(λ_min(Σ_{i=1}^t φ_{i−1} φ_{i−1}^T)),
the least squares estimate is consistent. □
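The quantities appearing in Theorem 3.1 are easy to monitor on data. A sketch continuing the previous snippet (Phi, theta_hat and theta0 as defined there): condition 2 asks that the ratio ln λ_max / λ_min vanish, and (3.7) bounds the W_t-weighted squared error by a multiple of ln λ_max.

```python
W = Phi.T @ Phi                             # sum of varphi_{i-1} varphi_{i-1}^T
lam = np.linalg.eigvalsh(W)                 # eigenvalues, ascending order
print("lambda_min:", lam[0], " ln(lambda_max)/lambda_min:", np.log(lam[-1]) / lam[0])
err = theta_hat - theta0
print("(3.7) LHS:", err @ W @ err, " ln(lambda_max):", np.log(lam[-1]))
```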

It is important to note that, besides giving conditions for consistency and the corresponding rate of convergence, Theorem 3.1 provides an upper bound on the least squares estimation error (see equation (3.7)), which can be useful for proving stability and optimality results in adaptive control (see e.g. [14], [15] and [16]). Observe now that the least squares estimate can be given a recursive form (recursive least squares - RLS), enabling the computation of ϑ̂^LS_t from ϑ̂^LS_{t−1} (see e.g. [17] and [12]), which makes its use attractive in adaptive control, where parameter estimation is performed on-line. Precisely, the recursive version of (3.6) is given by


\[
\hat\vartheta^{LS}_t = \hat\vartheta^{LS}_{t-1} + \frac{V_{t-1}\varphi_{t-1}}{\varphi_{t-1}^T V_{t-1}\varphi_{t-1} + 1}\,\big(y_t - \varphi_{t-1}^T \hat\vartheta^{LS}_{t-1}\big), \tag{3.8.1}
\]
\[
V_t = V_{t-1} - \frac{V_{t-1}\varphi_{t-1}\varphi_{t-1}^T V_{t-1}}{\varphi_{t-1}^T V_{t-1}\varphi_{t-1} + 1}. \tag{3.8.2}
\]
Note that if one sets V_{t̄} = W_{t̄}^{−1} for some t̄ such that the inversion can be performed, and initiates the above algorithm at t̄ with V_{t̄} and ϑ̂^LS_{t̄} = V_{t̄} Σ_{i=1}^{t̄} φ_{i−1} y_i, then the estimate provided by the recursive equations coincides with the unique solution of the normal equations. More frequently, however, the equations are initialized at t = 0 with arbitrary initial conditions ϑ_0 and V_0 > 0. It goes without saying that in this case V_t does not coincide with W_t^{−1} any more. Precisely, considering that the inverse equation of (3.8.2) is
\[
V_t^{-1} = V_{t-1}^{-1} + \varphi_{t-1}\varphi_{t-1}^T,
\]
we have that V_t^{−1} = V_0^{−1} + W_t. Therefore, in general, V_t^{−1} ≠ W_t. It is possible to see that the recursive least squares estimate ϑ̂^LS_t is in fact the minimizer of the regularized least squares cost function

\[
V_t(\vartheta) = \sum_{i=1}^{t}\big[y_i - \varphi_{i-1}^T\vartheta\big]^2 + (\vartheta - \vartheta_0)^T V_0^{-1}(\vartheta - \vartheta_0), \tag{3.9}
\]
where (ϑ − ϑ_0)^T V_0^{−1} (ϑ − ϑ_0) is the regularization term. The Lai and Wei result remains valid for such a recursively computed estimate (see [3], Theorem 4.1). In the case when conditions 1 and 2 for consistency stated in Theorem 3.1 are not satisfied, there is no guarantee that the least squares estimate converges, or even that it remains bounded ([18]).
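The recursions (3.8.1)-(3.8.2) translate directly into code. A minimal sketch (our own class name):

```python
import numpy as np

class RLS:
    """Recursive least squares (3.8.1)-(3.8.2). Initialized at t = 0 with arbitrary
    theta0 and V0 > 0, it minimizes the regularized cost (3.9)."""
    def __init__(self, theta0, V0):
        self.theta = np.array(theta0, dtype=float)
        self.V = np.array(V0, dtype=float)

    def update(self, phi, y):
        """One step with regressor phi = varphi_{t-1} and observation y = y_t."""
        Vphi = self.V @ phi
        denom = phi @ Vphi + 1.0
        self.theta = self.theta + Vphi * (y - phi @ self.theta) / denom  # (3.8.1)
        self.V = self.V - np.outer(Vphi, Vphi) / denom                   # (3.8.2)
        return self.theta
```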

3.5 The Gaussian noise case

In this section we deal with the case when the white noise acting on the system is Gaussian, and we show that in such a case the least squares estimate converges for every true parameter vector except for a zero-measure set and that, in addition, if the data covariance matrix Σ_{i=1}^t φ_{i−1}φ_{i−1}^T grows to infinity, the convergence is to the true parameter. This weakens, to a great extent, the Lai and Wei conditions for consistency, which in fact also depend on the condition number of the data covariance matrix.
The convergence analysis of the RLS algorithm, in the case when the process noise {n_t}_{t≥1} described in Assumption 3.1 satisfies

Assumption 3.2
{n_t}_{t≥1} is a sequence of i.i.d. normally distributed random variables with mean E[n_t] = 0 and variance E[n_t²] = σ², □


is thoroughly presented in [19], [20], [21], [22] and [23]. Following the lead originally given in [19], the convergence analysis can be performed in a Bayesian embedding context. This amounts to viewing the true parameter ϑ° as a random vector rather than an unknown constant. In this way, there are two sources of stochasticity in the analysis, namely randomness in the noise and randomness in the true parameter. This leads to an enlarged probability space, given by the product of the probability space of the noise times the auxiliary probability space of the parameter vector. This line of reasoning is just instrumental for the derivation of the results: the final theorem (Theorem 3.2 below) holds for a deterministic ϑ°. The main advantage of the Bayesian embedding approach is that, if ϑ° is assumed to be Gaussian, namely ϑ° ~ N(ϑ̄, V̄), and independent of the noise process {n_t}, the RLS equations (3.8.1) and (3.8.2) can be regarded as a Kalman filter applied to the dynamic system
\[
\vartheta_{t+1} = \vartheta_t, \qquad y_t = \varphi_{t-1}^T \vartheta_t + n_t.
\]
As a matter of fact, by comparing the Kalman filter equations
\[
M_t = M_{t-1} + \frac{\bar V_{t-1}\varphi_{t-1}}{\varphi_{t-1}^T \bar V_{t-1}\varphi_{t-1} + \sigma^2}\,\big(y_t - \varphi_{t-1}^T M_{t-1}\big), \qquad
\bar V_t = \bar V_{t-1} - \frac{\bar V_{t-1}\varphi_{t-1}\varphi_{t-1}^T \bar V_{t-1}}{\varphi_{t-1}^T \bar V_{t-1}\varphi_{t-1} + \sigma^2},
\]
where M_0 = ϑ̄ and V̄_0 = V̄, with the RLS equations (3.8.1) and (3.8.2) initialized with ϑ̂^LS_0 = ϑ̄ and V_0 = V̄/σ², it is easily seen that ϑ̂^LS_t = M_t and V_t = V̄_t/σ² for any t.
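The correspondence between RLS and the Kalman filter can be checked numerically. A sketch reusing the RLS class from the previous snippet (all names and numbers are placeholders):

```python
rng = np.random.default_rng(0)
sigma2 = 0.5
theta_bar, V_bar = np.zeros(2), np.eye(2)     # prior: theta ~ N(theta_bar, V_bar)
rls = RLS(theta_bar, V_bar / sigma2)          # RLS initialized with V_0 = V_bar / sigma^2
M, S = theta_bar.copy(), V_bar.copy()         # Kalman filter mean and covariance
for _ in range(100):
    phi = rng.normal(size=2)
    y = phi @ np.array([0.9, 0.3]) + np.sqrt(sigma2) * rng.normal()
    rls.update(phi, y)
    gain = S @ phi / (phi @ S @ phi + sigma2)
    M = M + gain * (y - phi @ M)
    S = S - np.outer(gain, phi @ S)
    assert np.allclose(rls.theta, M) and np.allclose(rls.V, S / sigma2)
```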

Since the Kalman filter recursively computes the conditional expectation and variance of ϑ° given the observations, it is then possible to study the convergence of the RLS algorithm via the powerful mathematical tool provided by martingale convergence theory. Using this approach, it was proven in [19] and [20] that the RLS estimate generally converges. Further refinements of the stochastic filter interpretation, necessary for the rigorous use of the theory in control problems, were given in [21], [22].
Before precisely stating the RLS convergence result, we define the notions of excitation and unexcitation subspaces, originally introduced in [24], in order to get more insight into the asymptotic properties of ϑ̂^LS_t. Since the information contained in the data is conveyed to the identification algorithm via the observation vector φ_t, the matrix Σ_{i=1}^t φ_{i−1}φ_{i−1}^T can be interpreted as the overall information collected over the time interval [0, t−1]. In general, this information may be non-uniformly distributed in the parameter space.

[…] by induction it is easily shown that at each time instant t the identifier will erroneously select ϑ_t = ϑ¹ and the control u¹ will be applied indefinitely. □

Example 2 ([6])

Consider the scalar system

xt+1 = a xt + b ut + nt+1 ; where it is known only that # = [ a b ]T belongs to the nite set  = f[ 0 ? 1 ]T ; [ 1 1 ]T g. Let the noise process fntg be a sequence of i.i.d. normally distributed variables with zero mean and unitary variance. If the control objective is to minimize the in nite-horizon LQG performance index NX ?1 1 lim sup N [ x2t + 2u2t ]; N !1 t=0 then the optimal control law is given by ut =



0;

? 21 xt;

if # = [ 0 ? 1 ]T : if # = [ 1 1 ]T

If at each time t the adaptive controller applies the control law which is optimal for the least squares estimate #t obtained by minimizing the cost function

Vt (#) =

t X i=1

(xi ? axi?1 ? bui?1)2;

then, the control applied at time t is ut =



0;

? 21 xt;

if Vt ([ 0 ? 1 ]T )  Vt ([ 1 1 ]T ) : if Vt ([ 0 ? 1 ]T ) > Vt ([ 1 1 ]T )

Suppose that # = [ 0 ?1 ]T . Then the true system evolves according to xt+1 = ?ut +nt+1 . If at a certain time k the controller estimates incorrectly the true parameter to be #k = [ 1 1 ]T because Vk ([ 0 ? 1 ]T ) > Vk ([ 1 1 ]T ), the control applied will be uk = ?xk =2 and the real system will evolve according to xt+1 = x2t + nt+1:

40 Chapter 4. Certainty equivalence adaptive control: optimality and stabilizability issues As for the imaginary system, it will evolve identically, since xt+1 = xt + ut + nt+1 = x2t + nt+1 : Then, Vk+1 ([ 0 ? 1 ]T ) > Vk+1 ([ 1 1 ]T ), from which it is easily seen that the parameter estimate remains unchanged, i.e. #t = #k = [ 1 1 ]T , for all t  k. This is clearly undesiderable, not only because the true value # is [ 0 ? 1 ]T , but mostly because the control law ut = ? x2t applied for all t  k is sub-optimal (cost 2 versus cost 1 obtained with the optimal control law). To see that the condition Vk+1 ([ 0 ? 1 ]T ) > Vk+1 ([ 1 1 ]T ) can in fact happen with positive probability, observe that for k = 1, V1 ([ 0 ? 1 ]T ) = (x1 + u0)2 = n21 and V1 ([ 1 1 ]T ) = (x1 ? x0 ? u0)2 = (n1 ? x0 ? 2u0)2 : Therefore, the inequality V1 ([ 0 ? 1 ]T ) > V1 ([ 1 1 ]T ) is equivalent to 2n1 (x0 + 2u0) > (x0 + 2u0)2 ; which will occur with positive probability since n1 is Gaussian. 2 Example 3 ([30])

Consider the scalar system

xt+1 = a xt + b ut + nt+1 ; where the noise process fntg is a sequence of i.i.d. normally distributed variables with zero mean and unitary variance. The parameter vector # = [ a b ]T is unknown but we know that it belongs to the compact set  = f[ a b ]T : b = 8a=5 ? 3=5; a 2 [0; 1]g. Our control objective is to minimize the in nite-horizon LQG performance index NX ?1 25 1 lim sup N [ 24 x2t + u2t ]: N !1 t=0

The parameter estimate #t is obtained through the least squares algorithm, i.e. #t = P arg min#2 ti=1 (xi ? axi?1 ? bui?1)2 . Once #t has been determined, according to the certainty equivalence principle one applies the control input which is optimal for the system with parameterization #t. Suppose that at a certain instant point k #k = [ 1 1 ]T . Since the corresponding optimal control law is uk = ?5=8xt, the squared error at time k + 1 turns out to be (xk+1 ? axk ? buk )2 = (xk+1 ? axk ? (8a=5 ? 3=5)(?5=8xk))2 = (xk+1 ? 3=8xk)2 ;

4.3. The optimality issue

41

for all # 2 . The important feature of this expression is that it does not depend on the parameter # 2 . Hence, the term added at time k + 1 to the least squares index does not in uence the location of its minimizer, which therefore remains unchanged: #k+1 = #k = [ 1 1 ]T . As the same rationale can be repeated in the subsequent time instants, we have that the estimate sticks at [ 1 1 ]T , i.e. #t = [ 1 1 ]T , for all t  k. We now show that the least squares estimate can take value [ 1 1 ]T with positive probability even when the true parameter # is di erent from [ 1 1 ]T . Moreover, in such a case the optimal cost for the true parameter may be strictly lower than the incurred cost obtained by applying the control law optimal for the system with parameter [ 1 1 ]T . Assume that # = [ 0 ? 3=5 ]T . The least squares estimate at time k = 1 minimizes the cost (x1 ? ax0 ? bu0)2 = (?3=5u0 + n1 ? ax0 ? (8a=5 ? 3=5)u0)2 = (n1 ? a(x0 + 8=5u0))2 : Thus, #1 = [ 1 1 ]T whenever n1  (x0 + 8=5u0), which happens with positive probability since n1 is Gaussian. In addition, it is easily seen that the value of the LQG performance index for the real system is 5=3 whereas the optimal cost for the ideal system is 25=24. 2 A careful analysis of these examples reveals that the trouble comes from a straightforward use of the certainty equivalence principle. The observations obtained by applying the control law optimal for the current estimated system are in perfect agreement with the ones which would have been obtained if such an estimated system were the true system. Therefore, the certainty equivalence controller has no doubts about the correctness of the estimate and thus keeps it unchanged. In all the presented cases, convergence to a wrong value of the parameter leads to a sub-optimal performance, since in the long run the imaginary and the real systems behave identically, but di erently from the ideal system. In particular, Examples 2-3 show that the identi ability problem is signi cant in in nitehorizon LQG control and, in fact, in [28] it is proven that for a state space system subject to Gaussian noise the set of the parameterizations leading to optimality of LQG control is strictly contained in the set of the potential convergence points, a stronger result holding for linear quadratic control of deterministic systems ([31]). It is perhaps worth noticing that in minimum variance adaptive control this problem disappears since, despite of the fact that the asymptotically estimated parameter is normally di erent from the true parameter, the behavior of the real, imaginary and ideal systems turns out to be equal in the limit. Example 4 ([7])

Consider the SISO system governed by the equation yt+1 = a yt + b ut + nt+1 ; where the noise process fntg is a sequence of i.i.d. variables with zero mean and unitary variance. When # = [ a b ]T is known, the control law minimizing the cost function E[yt2 ]

42 Chapter 4. Certainty equivalence adaptive control: optimality and stabilizability issues is given by

 ut = ? ab yt : Suppose that # is unknown and that it is identi ed on-line. The control applied is then given by ut = ? ab t yt : t The diculties with identi ability occurs when #t ! #1 , where #1 is such that the asymptotic real system given by yt+1 = a yt ? b ab 1 ut + nt+1; 1 coincides with the imaginary system

yt+1 = nt+1 : For this two systems to be identical, the following condition has to be satis ed a = a1 ; b b1 which means that the adaptive control law becomes optimal in the long run even if #1 6= # . 2 Optimality of a minimum variance adaptive scheme based on stochastic approximation and least squares identi cation algorithms is in fact proven in [32] and [22] on the basis of the sole parameter estimate convergence, not necessarily to the true parameter. On the other hand, this result is mainly due to the particular form of the minimum variance performance index and cannot be extended to general control laws. Examples 1-3 above describe three cases where the parameter estimate converges, the real system is indistinguishable from the imaginary system, but the adaptive control scheme does not attain optimality. The performance index value obtained by controlling the estimated system with the corresponding optimal control law is in fact strictly greater than the performance index value obtained by controlling the true system with its optimal control law. So far, we have concentrated on the case when the parameter estimate converges, and we have shown that convergence in the estimate does not imply the achievement of optimality in control. On the other hand, convergence of the parameter estimate may not even be achieved, in general, and moreover the estimated model stabilizability, i.e. the absence of unstable pole-zero cancellations in the estimated model transfer function, is not guaranteed (see e.g. [33] and [34]). As for the case when the system to be controlled is minimum phase, one can compute stabilizing control laws regardless of the stabilizability property of the estimated system (see [35], [36], [32], [22], [24], [14]). The property that boundedness of the output

4.4. The stabilizability issue

43

implies boundedness of the input and the fact that prediction error identi cation methods are focused on the system output (they provide parameter estimates that adequately predict the system output) result in the optimality of minimum variance adaptive control schemes. On the other hand, it is well known ([33], [37], [38], [39], [40],[41], [42], [43], [44], [23], [45], [34]) that the possible occurrence of unstable pole-zero cancellations in the estimated model hampers the development of adaptive control algorithms for possibly nonminimum-phase systems. As a matter of fact, many well-established stability and performance results exist which are applicable under a uniform controllability assumption of the estimated model (see e.g. [46], [47], [23]). In particular, in [46] it is shown that once this assumption is taken for granted, stability can be proven for a linear quadratic adaptive control scheme by exploiting the closed-loop identi cation properties satis ed by standard identi cation algorithms. Unfortunately, standard identi cation methods do not guarantee the estimated model controllability property in the absence of suitable persistence of excitation conditions. Many contributions have appeared in the literature over the last decade to address this stabilizability problem in adaptive control. In the following section we revise the main classes of approaches proposed for solving this problem.

4.4 The stabilizability issue Standard parameter estimation techniques do not guarantee the estimated plant to be stabilizable, thus leading to a paralysis in the certainty equivalence control law selection. Guaranteeing that the estimated plant is stabilizable even in the long run is one of the main diculties encountered in the design of many indirect adaptive control algorithms. Moreover, even the nearly non stabilizable models must be avoided for the control law coecients to be bounded. Without any claim of completeness, we present below the main approaches proposed in the literature to address the estimated model stabilizability issue. i) Introduction of an appropriate identi cation algorithm

In this approach, the stabilizability problem is solved by using an appropriately designed identi cation algorithm such that the model associated with the parameter estimate is ensured to be stabilizable, at least asymptotically. The methods belonging to this class generally lead to easily implementable algorithms, but, unfortunately, they require a restrictive a-priori knowledge on the system parameter vector, which con nes their use to the cases in which the parameter uncertainty is highly structured. In [37] and [38], suitable identi cation algorithms are introduced which respectively forces the parameter estimate to converge to a preassigned convex region to which the true parameter belongs and such that all the models in that region are stabilizable. In [40], this strategy is extended to include any nite union

44 Chapter 4. Certainty equivalence adaptive control: optimality and stabilizability issues of convex sets such that the true parameter vector belongs to their union and all the parameterizations belonging to these sets are stabilizable. The identi cation procedure, however, becomes more complex. One has in fact to run a parameter estimator for each convex region, and select the estimate according to a suitable parameter estimator performance index which ensures that swapping between di erent regions does not occur in nitely often and that the key properties of the parameter estimator are retained. All these methods are only suitable for deterministic systems, subject to bounded noise. It is worth noticing that, in absence of noise, standard estimation algorithms ensure that the norm of the parameter estimate error is decreasing (see e.g. [26]). Therefore, the stabilizability condition can be ensured by an appropriate initialization of the estimation algorithm such that the parameterizations belonging to the region in the parameter space centred in the true parameterization and with radius given by the norm of the initial parameter estimate error are all stabilizable. On the basis of such a property and assuming a suitable a-priori knowledge on the system parameter vector, in [48] convergence of a pole-placement adaptive control algorithm scheme is in fact proven. In the case when a bounded noise is acting on the system, it is instead necessary to resort to ad-hoc methods as those proposed in [37] and [40]. 2 ii) A-posteriori modi cation of the parameter estimate

This approach ([33], [41], [42], [34]) secures the estimated model controllability by redressing the parameter estimate before using it to compute the control law. Precisely, an extra term is added to the least squares estimate which depends on the least squares covariance matrix in such a way that the closed-loop identi cation properties of the least squares algorithm (i.e. the ability of identifying the system dynamics excited in closed-loop) are preserved while securing the uniform controllability of the identi ed system. In this approach, no apriori knowledge on the region to which the true parameter belongs is assumed. An explicit method for the estimate modi cation based on a deterministic procedure is given in [42] for the bounded noise case, but it involves a computational e ort which highly increases with the order of the system. Therefore, an on-line implementation of these methods may be dicult. In [15], inspired by [42], a random-search method is proposed for modifying the parameter estimate so as to ensure uniform controllability. Such a method is easily implementable, but it is suitable for the case when the parameter estimate converges. 2 iii) Parameter estimate projection

This approach circumvent the region in the parameter space corresponding to uncontrollable models by projecting the estimate onto an a-priori known closed convex region containing the true parameter and such that all the models in this region are controllable. The scheme for projecting the estimate has to be appropriately designed so that the modi ed parameter

4.4. The stabilizability issue

45

estimate inherits some useful properties of the original estimate. In [26], a constrained parameter estimation algorithm is introduced which suitably incorporates a projection operator in the least squares algorithm so as to preserve all the properties of the standard least squares method. In [49], a constrained gradient parameter estimation is considered and a stability result is worked out for an adaptive pole placement scheme. This class of methods is applicable under the same conditions as approach i), since it requires a restrictive a-priori knowledge on the system parameter vector and is suitable for deterministic systems subject to bounded noise. Moreover, the actual complexity in implementing these methods depends on the constrained region, being in general marginal only when the constrained region is simple and regular (e.g. bounded by hyperplanes). 2 iv) Introduction of external probing signals

An alternative way of avoiding the pole-zero cancellation problem is to ensure that the parameter estimate converges to the true parameter value. This can be obtained by the introduction of suitable external probing signals. A rigorous analysis of an adaptive control scheme subject to an external probing signal has proven to be surprisingly tricky. The key problem is that it is dicult to prove that suitable excitation is enforced by the probing signals without proving before a stability result and that stability is obtained without proving before parameter consistency. Therefore, it is not easy to simultaneously establish consistency and stability without invoking a circular argument. The problem has been solved for deterministic systems in [50], [51] and [52], where parameter convergence subject to a persistency of excitation condition without requiring signal boundedness has been proven. A common feature of these methods is that the feedback law is held constant over certain time period. In [50], a particular probing signal is assumed which is a sum of sinusoids. In [51], the probing signal is not restricted to a special form, but a measure of the controllability of the unknown system is required. In [52], no special assumptions on the probing signal or system parameter are required. As for the case when the system to be controlled is stochastic, ensuring parameter consistency turns out to be more dicult. As a matter of fact, the results presented in the literature show that parameter consistency can be guaranteed once a certain stability result is ensured. In [53], [54] and [3], adaptive LQG control schemes are proposed, where stability is obtained through a certain control switching method aimed at keeping the growing rate of the input and output data bounded. Consistency is then achieved under the minimumphase ([3]) or the stability ([53], [54]) assumption on the true system. Also in [55], [56], [57] and [14], a dither noise is added to the system input to obtain consistency, but stability is actually ensured independently of the consistency in the parameter estimate by exploiting the minimum phase assumption on the system to be controlled. Consistency then turns out to be a consequence of the stability result and the presence of the dither noise. 2 For the sake of completeness, we mention a further class of approaches to take care of the

46 Chapter 4. Certainty equivalence adaptive control: optimality and stabilizability issues stabilizability issue, which does not require appropriate excitation conditions and is characterized by the design of the controller according to a method di erent from the standard certainty equivalence strategy. In [58], an overparameterized representation of the plant and the controller is used to de ne a control scheme taking the form of an adaptively tuned linear controller with an additional nonlinear time varying feedback signal. In [44], instead, the system is reparameterized in the form of an approximate model which is controllable regardless of the numerical value of the parameters. Finally, it is worth mentioning the interesting strategy introduced in [43] and [59] which adopts a logic-based switching controller to solve the stabilizability problem. All these alternatives to the standard certainty equivalence strategy, however, deal with the case when the system is deterministic, i.e. when there is no stochastic noise acting on the system.

Chapter 5 Singularity free adaptive LQG control: stability and optimality Guaranteeing the stability of adaptive control systems is reportedly a major issue in the design of adaptive controllers. In this chapter, a new adaptive LQG control scheme based on the certainty equivalence principle is introduced, which succeeds in adaptively stabilizing possibly nonminimum-phase discrete time systems without requiring extra excitation signals. This is obtained by introducing an appropriate identi cation method, which presents useful closed-loop identi cation properties while ensuring the uniform stabilizability of the estimated model. Moreover, a precise characterization of the performance achieved with the proposed LQG control scheme is given. Finally, the optimality issue is addressed. It is shown that optimality can be obtained by adding to the system input an asymptotically vanishing dither noise according to the so-called attenuating excitation technique. This approach, however, is useful only in the case when noise injection is feasible.

5.1 Introduction Since the appearance of the original contribution of Astrom and Wittenmark ([60]), the analysis of self-tuning control systems has constituted a challenging topic for theorists working in the area of adaptive control. The rst signi cant convergence results were obtained in the late 70s for minimum-variance control schemes. In particular, a global convergence result for an adaptive control system based on the stochastic gradient algorithm has been established in [35]. Extensions to the least squares algorithm are dealt with in [36] by introducing a

48

Chapter 5. Singularity free adaptive LQG control: stability and optimality

suitable modi cation to the standard RLS algorithm. More recently, Kumar has pointed out that such a modi cation is in fact not necessary in order to obtain optimality ([22]), i.e. the same performance as the one achievable under complete knowledge of the true plant. Optimality has also been proven for extended least squares-based minimum variance control schemes in [14]. The common result of all the above mentioned contributions is that a minimum-variance adaptive control system achieves optimality under various operating conditions. It is important, however, to emphasize that the minimum-variance control law calls for the restrictive - and often unrealistic - assumption that the plant is minimum-phase. Extending these results to more general control techniques suitable for nonminimum-phase plants has attracted much attention in the last decade. The corresponding analysis, however, is far more complex due to the possible occurrence of unstable pole-zero cancellations in the estimated model ([33], [61], [37], [39], [38], [42], [23] and [34]). In the general case of possibly nonminimum-phase systems, it is widely recognized (see e.g. [23]) that a drastic simpli cation in the analysis is achieved if the following two conditions are met: i) the estimated model is stabilizable (i.e. it does not present pole-zero cancellations in

the instability region) even asymptotically; ii) the identi cation algorithm exhibits suitable closed-loop identi cation properties.

In particular, guaranteeing the estimated model stabilizability constitutes the real stumbling block, closed-loop identi cation properties being ful lled under general conditions. In [47], Ren has shown that, under the assumption of the estimated model controllability (absence of pole-zero cancellations), the adaptive pole-placement controller self-tunes, i.e. it converges to the controller designed with complete knowledge of the plant parameters. On the other hand, while tuning holds for pole placement and minimum-variance control, it is also well known (see e.g. [62], [63], [6], [64], [31] and [28]) that such results are not extendible to general control laws based on the minimization of multistep performance indexes. As a matter of fact, the interplay between identi cation and control in a certainty equivalence adaptive control scheme may result in the convergence of the parameter estimate to a parameterization di erent from the true one in absence of suitable excitation conditions (see e.g. [60], [29], [6], [32] and [34]). When a cost criterion other than the output variance is considered, this identi ability problem results in a strictly suboptimal performance. On the other hand, a precise characterization of the achievable performance seems to be generally dicult to derive. In this chapter, we focus on adaptive in nite-horizon LQG control of discrete time SISO stochastic systems in the input output representation. Our main results are summarized as follows:

5.1. Introduction

49

 New identi cation methods simultaneously ensuring uniform stabilizability of the es-

timated model and suitable closed-loop identi cation properties (conditions i) and ii) above) are introduced. Starting from the observation that the standard least squares estimate satis es fundamental closed-loop identi cation properties, the estimated model stabilizability is ensured by the introduction of an appropriate extra term in the least squares performance index which penalizes those parameterizations corresponding to uncontrollable models without destroying the properties of the least squares estimate. Penalized least squares (PLS) identi cation methods are designed for the two cases when no a-priori knowledge on the system parameter is available (Subsection 5.3.2), and when a coarse estimate of the system parameter is a-priori available (Subsection 5.3.3).  Based on these properties, it is possible to prove a stability result for PLS-based adaptive control schemes where a control design technique able to stabilize a known controllable system is appropriately combined with one of the proposed PLS identi cation algorithms in a certainty equivalent fashion. Adaptive stabilization is therefore obtained for in nite-horizon LQG adaptive control. On the other hand, optimality, i.e. minimality of the in nite-horizon LQG performance index, is not guaranteed, and only a characterization for the achieved performance can be given (Section 5.4).

Convergence of the parameter estimate to the true parameter vector is a comfortable starting point in order to prove optimality, since it leads the certainty equivalence control law to selftune. Unfortunately, consistency in the estimate is not ensured, in general, due to the possible lack of excitation in closed-loop operating conditions. Introducing in the control scheme an asymptotically vanishing noise, namely the dither noise, seems to be a wise strategy for achieving consistency without upsetting the LQG performance index value (see [53], [54] and [3]).

 A suitable modi cation of the introduced adaptive LQG control law through the at-

tenuating excitation technique is proposed in Section 5.5. In the cases when applying dither noise is feasible, parameter estimate consistency is then ensured. This is instrumental for proving the optimality of the closed-loop system, which in fact is derived on the basis of the so-obtained self-tuning property in conjunction with the asymptotic vanishing characteristic of the added noise.

It is important to note that a suitable bound on the growing rate of the input has to be ensured for the dither noise to e ectively introduce the excitation needed for the parameter estimate consistency. In [3], an adaptive LQG control scheme is proposed, where this is obtained through a switching method. At certain time instants - adaptively selected on the basis the growing rates of the input and output data - the control law is switched over the LQG expression, a minimum variance expression or is set equal to zero. In this way,

50

Chapter 5. Singularity free adaptive LQG control: stability and optimality

consistency and optimality are both achieved. The only drawback is that the restrictive minimum-phase assumption is required. In [53] and [54], by replacing the minimum-phase condition with the stability assumption, similar results are obtained by an analogous procedure. On the other hand, none of these two conditions seems to be natural in the LQG control problem, as it is stated in [3]. In the solution proposed in Section 5.5, the required bound on the growing rate of the input is achieved by applying the PLS identi cation methods. Therefore, the obtained consistency and optimality results do not call for either the minimum-phase or the stability assumption on the system to be controlled. For the sake of completeness, we note that an alternative to address the optimality issue in LQG adaptive control ([6], [65] and [66]) has been presented in the literature. A cost-biased least squares parameter estimator incorporating a term which appropriately favors parameter estimates with smaller optimal costs is introduced. Stability and optimality are then proven without requiring the parameter estimate consistency. On the other hand, these results hold for a state space system subject to Gaussian noise whose parameter vector belongs to a known nite ([6]) or in nite but compact set ([65] and [66]) such that each parameterization in the set corresponds to a controllable model. For the sake of self-containedness, the control problem we deal with and the basic requirements for an identi cation algorithm to be successfully applied in a certainty equivalence adaptive control scheme are reviewed in the following section. The interested reader is referred to Chapters 2 and 4 for a more extensive treatment of such points. The presentation of the general setting we refer to is also useful in order to introduce the assumptions and notations used throughout the chapter, thus improving its readability.

5.2 Statement of the problem 5.2.1 The system and the control law We deal with the in nite-horizon LQG control of an unknown discrete time SISO system governed by the equation A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1; (5.1) where n X A(# ; q?1) = 1 ? ai q?i and

i=1

B(# ; q?1) =

m X i=1

bi q?(i?1)

5.2. Statement of the problem

51

are polynomials in the unit-delay operator q?1 and # = [ a1 : : :an b1 : : :bm ]T is the system parameter vector. System (5.1) is assumed to satisfy the following

Assumption 5.1 qsA(#; q?1 ) and qs?1B(# ; q?1) are coprime, where s = maxfm; ng, 2 which is known as `controllability property' (see e.g. [42] and [34]). Moreover, system (5.1) is initialized with yt = ut?1 = 0, for t  0. As for the stochastic noise process fntgt1 acting on the system, it is described as a martingale di erence sequence with respect to an increasing sequence of - elds fFtgt0, satisfying the following conditions

Assumption 5.2 i) sup E[jntj2+ =Ft?1] < 1, for some > 0, almost surely; t1 t X

1 2 2 ii) tlim !1 t i=1 ni =  > 0, almost surely.

2

As it is explained in Section 2.3 of Chapter 2, if the system parameter # were known, the control law u = futgt0 minimizing the in nite-horizon LQG performance index NX ?1 1 J(u) = lim sup N [ yt2 + ru2t ]; r > 0; N !1 t=0

would be given by where and

(5.2)

(# ; q?1) ut = (# ; q?1) yt ; (#; q?1) = 0(#) + (#; q?1) = 1 ?

nX ?1 i=1

mX ?1 i=1

i(#)q?i

i (#)q?i

are polynomials in the unit-delay operator q?1 whose coecients fi (#)gi=0;:::;n?1 and fi (#)gi=1;:::;m?1 depend on the parameter vector #. In particular, they are analytic functions of # in the set f# : qs A(#; q?1) and qs?1B(#; q?1) are coprimeg.

52

Chapter 5. Singularity free adaptive LQG control: stability and optimality

On the other hand, # is unknown and the control law is simply tuned to the estimated model according to the certainty equivalence approach. Denoting with #t = [ a1;t : : :an;t b1;t : : :bm;t ]T the estimate at time t of the parameter vector # , the control law to be applied is given by (#t; q?1) ut = (#t ; q?1) yt : (5.3) The conditions i) and ii) introduced in Section 5.1 should then be satis ed for the certainty equivalence approach to be successful in stabilizing the so-obtained closed-loop system. This is explained in the following subsection.

5.2.2 Natural requirements arising in the adaptive control context In the synthesis of a certainty equivalence adaptive controller, the feedback control to the process is tuned to the current value of the estimate with the understanding that the estimated model is viewed as if it were the true system, even though it might not be. On the other hand, the in nite-horizon LQG control law expression is derived under the controllability assumption. In particular, it would be no longer valid if the estimated model presents pole-zero cancellations in the instability region. The occurrence of this event would then lead to a paralysis in the control selection. The probability that the estimated system is non stabilizable in nite time is in fact zero under general conditions (see [2]), but such property is not ensured, in general, for the asymptotically estimated model. This is a well known problem in adaptive control which represents the main stumbling block to the development of adaptive control schemes suitable for possibly nonminimum-phase system. The reader is referred to Section 4.4 in Chapter 4 for a discussion on the stabilizability issue and the approaches proposed in the literature for its solution. From these considerations the importance of preserving the true system controllability in the asymptotically estimated model (condition i) in Section 5.1) is easily understood. On the other hand, it is intuitive that the controllability of the estimated model cannot be the only requirement for achieving stability and good performance of the adaptive control system. As a matter of fact, the estimated model must also exhibit good closed-loop identi cation properties. This assertion can be easily understood once the real system (i.e. the closed-loop system where the true system is controlled by the adaptive control law):  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 ; (5.4) (RS ) (# t; q?1 ) ut = (#t ; q?1) yt is represented as a variation system with respect to the imaginary system (i.e. the closed-loop system consisting of the estimated model in combination with the LQG adaptive controller):  A(#t ; q?1) yI ;t+1 = B(#t ; q?1) uI;t + nt+1 : (5.5) (IS ) (# t ; q?1) uI;t = (#t; q?1) yI ;t

5.2. Statement of the problem

53

By letting the estimation error be given by et = 'Tt [# ? #t]; where 't = [ yt : : :yt?(n?1) ut : : :ut?(m?1) ]T is the observation vector, the time evolution of vector wt+1 = [ yt+1 ut ]T generated by RS is in fact governed by the equation 



wt+1 = M(#t; q?1) wt + et +0nt+1 ; where matrix



[1 ? A(#t; q?1 )]q B(#t ; q?1)q (#t; q?1) [1 ? (#t; q?1)]q describes the dynamics of the imaginary system (see Figure 5.1). M(#t; q?1) =



θt et

perturbation system

nt+1

+ C(θt)

ut

yt+1

S(θt) ut

Figure 5.1: Real system versus imaginary system. Then, from this representation it is easily seen that the stability of the real system can be ensured by stabilizing the imaginary system provided that a suitable bound on the estimation error is simultaneously guaranteed (closed-loop identi cation condition ii) in Section 5.1). In Section 5.3, we in fact propose new identi cation methods which guarantee that both requirements naturally arising in an adaptive control context (conditions i) and ii) in Section 5.1) are satis ed. These identi cation algorithms are applicable under di erent a-priori knowledge on the system parameter vector (see Subsections 5.3.2 and 5.3.3).

5.2.3 Achievements The fundamental requirement to be satis ed when designing an adaptive control scheme is guaranteeing that the combination of the chosen control and identi cation methods is

54

Chapter 5. Singularity free adaptive LQG control: stability and optimality

successful in stabilizing the unknown system. In this connection, in Section 5.4 it is shown that the adaptive LQG control scheme obtained by suitably coupling one of the identi cation algorithms proposed in Section 5.3 and the in nite-horizon LQG design method behaves appropriately since i) the true system controlled by the adaptive LQG control law is pathwise L2 -stable. Moreover, the analysis of the proposed LQG adaptive control scheme shows that ii) the value of the LQG performance index (5.2) obtained by controlling in an adaptive

fashion system (5.1) equals the value of the performance index for the imaginary system (5.5), namely NX ?1 NX ?1 [ yt2 + r u2t ] = lim sup N1 [ yI2;t + r u2I;t ]: lim sup N1 N !1

t=0

N !1

t=0

This result can be rephrased as follows. If we consider the system governed by the estimated parameter #t and apply the corresponding optimal control law to it, we obtain the imaginary system (5.5). According to the very philosophy of the certainty equivalence principle, the actual control law is selected so as to impose a desired behavior to the estimated model. Therefore, result ii) says that the actual adaptive control system shares the desired behavior with the imaginary system. Result ii), however, means in no way that the adaptive control system achieves optimality, i.e. the value of the performance index which would be obtained if the true parameterization # were known. As a matter of fact, it is a known fact that this result is generally false for general certainty equivalent adaptive control schemes (see Section 4.3 in Chapter 4). On the other hand, the optimality objective can be achieved by the introduction of a probing signal in the stabilizing control law (5.3) in such a way that sucient excitation for parameter consistency is given without upsetting the performance index. In Section 5.5, we show the additional result: iii) the LQG adaptive control scheme achieves optimality when an appropriately vanishing dither noise is added to the control input (5.3).

5.3 Penalized least squares identi cation 5.3.1 Introduction In this section, we propose new identi cation methods which, di erently from standard identi cation algorithms, guarantee the estimated model stabilizability without requiring

5.3. Penalized least squares identi cation

55

any particular excitation condition. As for the least squares method, it does not ensure the stabilizability property of the estimated system, but presents interesting closed-loop identi cation properties. Di erently from [33], [42] and [34], where the estimated model uniform stabilizability (conditions i) introduced in Section 5.1) is secured by redressing the least squares parameter estimate before using it to compute the control law, our solution to the stabilizability issue is based on a direct modi cation of the standard least squares identi cation index. This modi cation is accurately designed in such a way that the stabilizability condition is ensured while preserving the closed-loop identi cation properties of the least squares method (condition ii) in Section 5.1). The new index is obtained by adding to the standard least squares index a suitable term which penalizes the parameterizations corresponding to uncontrollable models. The major point is that the minimization of the penalized least squares identi cation index cannot be accompanied by the obnoxious side e ect of stabilizability violation, since uniform controllability is guaranteed. Even more so, the valuable closed-loop properties of the least squares algorithm stated in Theorem 5.1 below are preserved. This is of crucial importance in adaptive control applications for proving stability and optimality results (see e.g. [14], [15], [16] and [67]). A bottom line of our study is that using penalized techniques - which are well known in the eld of operation research for the solution of constrained optimization problems (see e.g. [68]) - turns out to be attractive in the area of adaptive control. In Subsection 5.3.2 we introduce a penalized least squares identi cation method which explicitely penalizes the uncontrollable - or nearly uncontrollable - parameterizations by exploiting an appropriate measure of controllability ([69], [67] and [70]). This method is suitable for adaptive control applications where no a-priori knowledge on the system parameter is available. On the other hand, it should be noted that the corresponding penalized identi cation index has, in general, multiple local minima and its minimization is not straightforward. Therefore, it should be minimized by a global optimization algorithm (see e.g. [71], [72] and [73]). One can for instance resort to the multistart technique. In this method, a certain number of points # are rst selected (usually by means of a random procedure). Then, a standard local search method (such as a conjugate gradient method or a quasi-Newton method) initialized at the di erent points # is run. The output of this procedure is a bunch of local minima. Among these, one nally selects the minimum corresponding to the lower value for the function. Alternatively, one can resort to randomized algorithms such as simulated annealing, which seem to be more ecient than the multistart approach in the case of large dimension problems. When a coarse knowledge on the uncertainty region for the system parameter is available, one can adopt the penalized least squares identi cation method introduced in Subsection 5.3.3. This method turns out to be easily implementable, since it is recursive, and still it is shown to guarantee both conditions i) and ii) in Section 5.1 ([16] and [74]). Such a solution to the stabilizability issue belongs to the stream of methods which forces the estimates to belong to an a-priori known region containing the true parameter and such that all the models in

56

Chapter 5. Singularity free adaptive LQG control: stability and optimality

that region are stabilizable (see e.g. [37], [38] and [40]). The required a-priori knowledge is certainly a restrictive assumption - which may or may not be satis ed depending on the application at hand - but in the case such a knowledge is in fact available, the introduced identi cation algorithm represents an ecient and easily implementable way to circumvent the stabilizability problem. The interested reader is referred to Section 4.4 in Chapter 4 for a review of the main approaches presented in the literature to address the stabilizability issue. The results stated in Subsections 5.3.2 and 5.3.3 are worked out under the assumption that system (5.1) has non trivial autoregressive and exogenous parts, i.e. n > 0 and m > 1, since if n = 0 or m = 1 the stabilizability issue automatically disappears and standard least squares identi cation can then be used. Moreover, they are based on the properties satis ed by the least squares estimate under Assumption 5.2 on the noise process, which we recall hereafter in order to improve the readability of the present section (see Chapter 3 for more details). Letting 't = [ yt : : :yt?(n?1) ut : : :ut?(m?1) ]T be the observation vector, model A(#; q?1) yt+1 = B(#; q?1) ut (5.6) can be given the usual regression-like form yt+1 = 'Tt # and the cost of the regularized least squares algorithm for the estimate of parameter # can be expressed as follows

Vt (#) =

t X i=1

(yi ? 'Ti?1 #)2 + (# ? #0)T V0?1 (# ? #0);

(5.7)

where V0 = V0T > 0. Denote with #^LS t the minimizer of Vt (#). Then, under the general Assumption 5.2 on the noise process, #^LS t satis es the properties stated in Theorem 5.1 below, which holds true independently of the excitation characteristics of the involved signals. Theorem 5.1 (properties of the LS estimate - [13], [3]) Suppose that ut is Ft-measurable. Then, t

t

i=1

i=1

X ?1  LS ?1 T T TX (# ? #^LS t ) [ 'i?1'i?1 +V0 ](# ? #^t ) = O(lnmax ( 'i?1'i?1 +V0 )); a.s.; (5.8)

which entails

Pt ?1 ! ' ln ( i ? 1 'Ti?1 + V0 ) max i =1  LS 2 ^ ; k# ? #t k = O P min ( ti=1 'i?1 'Ti?1 + V0?1 )

almost surely. In particular, this implies that under the conditions

(5.9)

5.3. Penalized least squares identi cation 1. min (

t X i=1

'i?1'Ti?1 ) ! 1

t X

2. lnmax (

57

i=1

t X

'i?1'Ti?1 ) = o (min (

i=1

'i?1 'Ti?1)),

the least squares estimate is consistent. 2 We are now in the position to introduce the penalized least squares identi cation methods and to prove their properties.

5.3.2 The case of no a-priori knowledge on the system parameter In this subsection, a penalized least squares identi cation method for the case of no apriori knowledge on the system parameter is introduced and the related properties (uniform controllability of the estimated model through time, boundedness of the estimate and closedloop identi cation properties) are shown. A similar but simpler identi cation method is also introduced for the case when the noise acting on the system is Gaussian.

The penalized least squares identi cation index and the properties of the parameter estimate We now introduce a new performance index whose minimizer preserves the closed-loop identi cation properties of the least squares estimate #^LS t stated in equation (5.8) of Theorem 5.1, LS ^ but, in contrast with #t , results in an asymptotic controllable model. For a given parameterization # = [ a1 : : :an b1 : : :bm ]T , a standard measure of the controllability of model A(#; q?1) yt+1 = B(#; q?1 ) ut is expressed by the absolute value of the s A(#; q?1) = qs 1 ? Pn ai q?i and Sylvester resultant associated with the polynomials q i =1 P qs?1 B(#; q?1) = qs?1 mi=1 bi q?(i?1) with s = maxfn; mg, given by 2

Sylv(#) = det

6 6 6 6 6 6 6 6 6 6 6 4

b1

b2 b1

1 ?a1 1



bs

b2    b s ... b1 b2    bs    ?as ?a1    ?as ... 1 ?a1    ?as

9 3> > > = 7 7 7> 7> > 7; 7 79 7> 7> = 7> 7 5> > > ;

s (5.10) s?1

with ai = 0 if i > n and bi = 0 if i > m (jSylv(#)j is zero if and only if the model

58

Chapter 5. Singularity free adaptive LQG control: stability and optimality

A(#; q?1)yt+1 = B(#; q?1 )ut is controllable and jSylv(#)j  c > 0 means that the model has a certain degree of controllability - see e.g. [10]). Such a controllability measure can be exploited so as to modify the least squares performance index in such a way as to penalize uncontrollable models. Speci cally, the modi ed performance index is given by D t(#) = Vt (#) + tP (#); (5.11) where

1 P (#) = jSylv(#) j

is the penalization term and Vt (#) is the LS index (5.7). In the performance index (5.11) a major role is played by the scalar function t in front of the penalization term. In principle, this function should grow rapidly enough such that the penalization term P (#) asserts itself. On the other hand, the penalization term tP (#) should be mild enough to avoid destroying the valuable properties of the least squares performance index stated in Theorem 5.1. The heart of the penalized least squares method lies on a suitable selection of t in such a way that the two contrasting objectives described above are met simultaneously. Denote by #^t the minimizer of the performance index D t(#): #^t := arg min Dt (#) (5.12) #2 0 is a suitable random constant. ii) (# ? #^t)T

t X i=1

t

X 'i?1 'Ti?1(# ? #^t) = O(ln max ( 'i?1 'Ti?1)):

i=1

5.3. Penalized least squares identi cation

59

Proof. See Subsection 5.6.1.

2

Theorem 5.2 suggests that the function t must be adaptively selected in the light of the value taken by the observation vectors f'i?1 gi=1;:::;t and therefore is time varying. The fact that t is adaptively selected is not surprising. In fact, the least squares part Vt (#) of the performance index Dt (#) depends on the observation vectors generated by the system. On the other hand, the penalization part tP (#) must be well-scaled with respect to Vt (#) such that the minimizer of D t(#) still preserves some good properties of the minimizer of Vt (#) and, at the same time, tP (#) is not negligible with respect to Vt (#). From this, we see that t being dependent of the observation vectors is quite a natural result. In adaptive control applications, in addition to controllability, it is useful to secure that the estimate cannot escape to in nity. As a matter of fact, this property is not ful lled by the standard least squares algorithm (see e.g. [18]). In the penalized least squares algorithm, the boundedness of the estimate can be forced by adding an extra term which penalizes parameterizations with large norm. This leads to considering the performance index

Dt(#) = Vt (#) + tP (#) where

(5.14)

1 + #T Q#; (Q = QT > 0) P (#) = jSylv(#) j

is the new penalization term.

Theorem 5.3

Under the same assumptions as in Theorem 5.2 the parameter estimate #^t = arg min Dt(#); #2 0 for the parameter estimate #^t, uniformly in t, means that the estimated system presents a certain degree of controllability, even asymptotically. Theorem 5.7 (properties of the PLS estimate) i) There exists a (random) constant c > 0 such that jSylv(#^t )j  c, 8 t, almost surely. ii) Assume that ut is Ft-measurable. Then, t

t

i=1

i=1

X X (# ? #^t)T 'i?1'Ti?1 (# ? #^t ) = O((ln max ( 'i?1 'Ti?1))1+ ); a.s.;

which entails

almost surely.

!

t T 1+ ; k# ? #^tk2 = O (ln max ( Pit=1 'i?1 'iT?1)) min ( i=1 'i?1'i?1 ) P

(5.23)

(5.24)

66

Chapter 5. Singularity free adaptive LQG control: stability and optimality

Proof. See Subsection 5.6.1.

2

By comparing the bounds (5.9) and (5.24) for the parameter estimation error, it is easily seen that the convergence rate of the standard least squares estimate is slightly better than the one of the penalized least squares estimate due to the exponent 1+. On the other hand, the least squares algorithm does not guarantee the estimated model controllability nor even the parameter estimate to be bounded (see [18]) in absence of the appropriate excitation conditions (points 1. and 2. in Theorem 5.1). The fact that the penalized least squares estimate obtained by minimizing the cost function (5.14) or (5.20) simultaneously satis es a closed-loop property similar to the least squares estimate and the boundedness property turns out to be useful for its application to adaptive control. As a matter of fact, on the basis of such properties it is possible to show that the estimation error is `small' in a sense precisely stated in the following section (see Lemma 5.2). As it has been explained in Subsection 5.2.2, this is fundamental in order to adaptively stabilize the unknown true system.

5.4 Adaptive stability and performance analysis The objective of the present section is to study the stability and the performance achieved by coupling the in nite-horizon LQG control law with one of the identi cation algorithms presented in the previous section. First, we prove that stability is obtained under general conditions. Then, we turn to the performance evaluation. The result we prove here is that the performance of the adaptive LQG control system are the same of that of the so-called imaginary system (see point ii) in Subsection 5.2.3). This, however, does not imply that optimality is achieved. The optimality issue is postponed to the next Section 5.5. We start with a standard observation in adaptive control concerning the time variability of the estimated system. Since the parameter estimate is time-varying and the control law is tuned to such an estimate, an adaptive control system is always a time-varying system. On the other hand, it is well known that, in the case of time-varying systems, guaranteeing a stability property at each time point for the \frozen dynamics" does not imply that the overall time-varying system has a stable dynamics. This basic problem can be circumvented by updating the estimate at a slower rate than the updating of the system variables. This is a simple way to robustify the adaptive scheme so as to avoid the fact that the possible high-frequency uctuations of the estimates might hamper the overall stability of the closedloop control system. Such a strategy, known as estimate with freezing e ect, is for instance exploited in [75] and [76].

5.4. Adaptive stability and performance analysis

67

Following this idea, we de ne #t =



#^t; if t = ti #t?1; otherwise,

(5.25)

where the update time instants fti gi0 are obtained by the recursive equation ti+1 = ti +Ti initialized with t0 = 0 (recall that #^t is the minimizer of Dt(#) in equation (5.14) or (5.20), depending on the penalized identi cation method used). The time interval Ti is chosen so as to stabilize the time-varying estimated system. This is explained next. Consider the time-varying estimated system (AIS )



A(#t; q?1 ) yA;t+1 = B(#t ; q?1) uA;t ; (#t; q?1) uA;t = (#t ; q?1) yA;t

(5.26)

which in the following we refer to as the autonomous imaginary system (AIS ), accordingly to the notations introduced in Chapter 4 and recalled in Subsection 5.2.2. By letting xt := [yA;t : : :yA;t?(n?1) uA;t?1 : : :uA;t?(m?1) ]T , system (5.26) can be given the state space representation xt+1 = F(#t) xt ; where 3 2 a1 + b1 0(#) : : : an + b1n?1(#) b2 + b1 1 (#) : : : bm + b1m?1 (#) 7 6 1 7 6

F(#) =

6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4

... 0(#)

1 :::

0 n?1(#)

1 (#) 1

:::

m?1 (#)

... 1

0

7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5

:

(5.27) Choose now a constant  < 1 (contraction constant). The time interval Ti is then de ned as (5.28) Ti := inf f 2 Z+ : kF(#ti ) k  g (note that such a Ti exists since by de nition (5.25) #ti = #^ti corresponds to a controllable model - Theorems 5.3 and 5.7). In this way, the time-varying system (5.26) is kept constant

68

Chapter 5. Singularity free adaptive LQG control: stability and optimality

until its transition matrix is contracted by a factor , whence guaranteeing its stability. The fact that Ti is selected so as to stabilize the estimated system can be intuitively motivated as follows. In adaptive control, the true system is not known. Consequently, in an attempt to stabilize the true system, one stabilizes the estimated model. This will eventually result in the stabilization of the true system, provided that the estimated model accurately describes the behavior of the true system, at least in the long run. In Theorem 5.8 below we shall prove that the control law tuned to the estimated parameter with freezing (5.25) is in fact able to stabilize the unknown true system. The proof of Theorem 5.8 is based on the following technical Lemmas.

Lemma 5.1

The autonomous system xt+1 = F(#t) xt is almost surely exponentially stable, uniformly in time: kxtk  M t?t kxt k; for all t; t ; t  t; where M > 0 and 0 <  < 1 are suitable random constants. Proof. See Section 5.6. 2

Remark 5.1 Note that the qualifying feature of the above statement is that the stability is exponential, uniformly in time, the stability of xt+1 = F(#t)xt being already secured by the selection of Ti as stated in (5.28). 2

Lemma 5.2

The estimation error et = 'Tt (# ? #t) satis es the following bound almost surely N X t=0; t62BN

jet jp = o (

N X t=0

k'tkp ); 8p  2;

where BN is a set of instant points which depends on N, whose cardinality is upper bounded by a constant CB for any N. Proof. See Section 5.6. 2 We are now in a position to precisely state the L2 -stability result for the unknown true system (5.1) controlled by tuning the control law (5.3) with the estimator (5.25).

Theorem 5.8 (L2 -stability)

The LQG adaptive control scheme  A(#; q?1) yt+1 = B(# ; q?1) ut + nt+1 (#t; q?1) ut = (#t; q?1) yt

(5.29)

5.4. Adaptive stability and performance analysis

69

is almost surely L2 -stable: NX ?1   1 yt2 + u2t < 1: lim sup N N !1 t=0

Proof. See Subsection 5.6.2.

2

Remark 5.2

It is perhaps worth noticing that such a stability result still holds when a control design method di erent from LQG is used. By direct inspection of the proof of Theorem 5.8, it is in fact easily seen that the only required condition is that the control law is able to stabilize a known possibly nonminimum-phase controllable system, which is not a restrictive assumption. 2 We now turn to study the performance of our adaptive control scheme. By running the adaptive control system given by the plant (5.1) controlled with the adaptive P control law (5.3), we obtain a cost for the control performance index given by lim supN !1 N1 Nt=0?1 [ yt2 + r u2t ]. If we consider the system governed by parameter #t and apply to it the corresponding optimal control law, we obtain the imaginary system (5.5). In order to avoid confusion, it is perhaps worth noticing that #t in the equation (5.5) governing its evolution is by de nition the parameter estimate (5.25) obtained on-line by processing the data generated by the true system (5.1) controlled through equation (5.3). Therefore, it is not based on the input uI;t and the output yI ;t of the imaginary system. Our result then states that the cost associated 1 PN ?1 [ y2 + r u2 ], equals the actually with this imaginary system, namely lim sup I ;t N !1 N t=0 I;t P incurred cost lim supN !1 N1 Nt=0?1[ yt2 + r u2t ]. We also note that this result means in no way that the adaptive control system achieves optimality, i.e. the value of the performance index which would be obtained if the true parameterization # were known. On the other hand, the statement of Theorem 5.9 is in perfect accordance with the certainty equivalence philosophy, since one imagines that the estimated model is the true system and in fact the performance achieved for the real system are the same as that for the imaginary system.

Theorem 5.9 (performance)

The value of the LQG performance index for the real system  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 (#t; q?1) ut = (#t ; q?1) yt equals the value of the performance index for the imaginary system  A(#t; q?1) yI ;t+1 = B(#t ; q?1) uI ;t + nt+1 (#t; q?1) uI ;t = (#t; q?1) yI ;t

70

Chapter 5. Singularity free adaptive LQG control: stability and optimality

almost surely: NX ?1  NX ?1    yt2 + r u2t = lim sup N1 yI2;t + r u2I;t : lim sup N1 N !1

Proof. See Subsection 5.6.2.

t=0

N !1

t=0

2

5.5 Optimality In Section 5.4, it has been shown that the proposed LQG adaptive control scheme is stable and secures certain performance properties. On the other hand, its optimality is not guaranteed. This is not a surprising result for a certainty equivalent control scheme (see Section 4.3 in Chapter 4). As a matter of fact, the control law to be applied is selected so as to impose a desired behavior to the estimated system and therefore, according to such a philosophy, the best expected result is that the real system in fact behaves as the imaginary system. Therefore, a convenient way to achieve optimality - i.e. the behavior that would be obtained under complete knowledge of the true system - is to somehow ensure that the parameter estimate be consistent. The simultaneous achievement of consistency in the estimates and optimality in the control performance is however a delicate problem in general. In order to obtain consistency, one could resort to the introduction of probing signals in the control system. The introduced signals should be suitably selected so as to avoid that the LQG performance index is di erent from its minimal value, otherwise the optimality objective is failed. In [53], [54] and [3], the upsetting of the performance index value is in fact avoided by using an appropriately diminishing dither noise. On the other hand, achieving consistency is not straightforward, the key problem being that it is dicult to prove that suitable excitation is enforced by the injected noise without before proving a stability result. For the attenuating excitation technique to be e ective a suitable bound on the growing rate of the input has in fact to be ensured (Theorem 6.2 in [3]). Our solution to the optimality issue then consists in combining the use of the penalized identi cation methods introduced in Section 5.3 with the attenuating excitation technique. As a matter of fact, we show that the adaptive LQG control scheme proposed in Section 5.4 which is based on the penalized identi cation methods - attains optimality when an asymptotically vanishing dither noise is added to the control input. We rst prove that the stability result stated in Theorem 5.8 still holds true for the new adaptive LQG control scheme. On the basis of this property, consistency in the parameter estimate is obtained by applying the above mentioned Theorem 6.2 in [3]. Optimality nally follows from the consistency and stability properties. According to the attenuating excitation approach, we apply an adaptive control law taking

5.5. Optimality the following form where

71 ut = uet + vt ;

(5.30)

uet = (#t ; q?1) yt + [1 ? (#t; q?1)] ut = 0 (#t)yt + : : : + n?1(#t)yt?(n?1) + 1 (#t) ut?1 + : : : + m?1 ut?(m?1) is the standard certainty equivalent control input given in equation (5.3) tuned to the parameter estimator (5.25), and vt is the dither noise precisely described hereafter. Letting fdtgt0 be a sequence of i.i.d. random variables with continuous distribution, independent of fntgt1 and satisfying E[dt] = 0; E[d2t ] = 1; jdtj  K; K > 0; the dither noise fvt gt0 is given by 1 (5.31) vt = (t +dt1) ;  2 (0; 4(maxfm; ng + n) ): It can be shown that fvtg satis es the following condition t?1 X 1 ? 2 v2 = 1; lim t!1 t1?2 i=0 i

(5.32)

which is fundamental for proving optimality (this result is originally shown in [3] and the proof is reported in Proposition A.1, Appendix A, for the sake of completeness). In order to prove that the pathwise L2 -stability result stated in Theorem 5.8 for system (5.1) controlled by (5.3) still remains valid for the closed loop system  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 ; (#t; q?1) ut = (#t; q?1) yt + vt where system (5.1) is controlled by the control law (5.30), we preliminarily need to show that the properties of the penalized least squares estimates (5.15) and (5.17.5), derived in Theorems 5.3 and 5.7, are preserved. This is explained next. Without loss of generality, we may assume that

Assumption 5.5 the family of -algebra fFtgt0 introduced in Assumption 5.2 is rich enough such that both nt and vt are Ft-measurable. 2 The control law (5.30) turns out to be Ft-measurable, thus implying that the properties of

the least squares estimate stated in Theorem 5.1 still hold. From this it immediately follows

72

Chapter 5. Singularity free adaptive LQG control: stability and optimality

that the penalized least squares estimate properties stated in Theorems 5.3 and 5.7 are preserved, since they are only based on Theorem 5.1 (see their proofs in Subsection 5.6.1). By exploiting the penalized least squares estimates properties in conjuction with condition (5.32) satis ed by the dither noise vt , the following stability result can be proven for the unknown true system (5.1) controlled by control law (5.30). Theorem 5.10 (L2 -stability) The LQG adaptive control scheme  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 ; (#t ; q?1) ut = (#t ; q?1) yt + vt where #t is given by the estimator (5.25) and fvtgt0 is de ned by (5.31), is almost surely L2 -stable: NX ?1   yt2 + u2t < 1: lim sup N1 N !1

t=0

2 We are now in the position to prove the estimate consistency and the optimality of the LQG adaptive control scheme. Such a proof is based on Theorem 6.2 in [3], which is reformulated below according to our notations the sake of clarity. Proof. See Subsection 5.6.3.

Theorem 5.11 ([3], Theorem 6.2)

Suppose that for system (5.1) the following conditions are satis ed: 1. fnt; Ftg is a martingale di erence sequence with t 1X 2 2 sup E[jntj2+ =Ft?1] < 1, for some > 0 and tlim !1 t ni =  > 0 almost surely. t1 i=1 qsA(# ; q?1) and qs?1B(# ; q?1) are coprime, where s = maxfm; ng.

2. Moreover, assume that the diminishing excited control (5.30) is applied, where uet is Ft0 = fni ; 1  i  t; vj ; 0  j  t ? 1g-measurable and satis es t?1 X 3. 1t (uei )2 = O(1) almost surely, i=0

and where fvtgt0 is given by

vt = (t +dt1) ;  2 (0; 4(s 1+ n) );

5.5. Optimality

73

fdtgt0 being a sequence of i.i.d. random variables with continuous distribution, independent of fntgt1 and satisfying E[dt] = 0; E[d2t ] = 1; jdtj  K; K > 0:

Then there exists a constant c > 0 such that t X

min ( for t large enough, almost surely.

i=1

'i?1'Ti?1 )  c t1?(s+n)2 ;

2

Theorem 5.12 (consistency and optimality)

Consider the LQG adaptive control scheme  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 ; (#t; q?1) ut = (#t; q?1) yt + vt where #t is the estimator (5.25) with #^t given by (5.15) or (5.17.5) according to the available a-priori knowledge and fvtgt0 is de ned by (5.31). Then, the following properties are satis ed i) the parameter estimate #t is consistent lim # = # ; t!1 t almost surely, with convergence rate 8   ln t ) ; > if #^t is given by (5.15) O ( > ? s n  > t < k#t ? #k = >    > (ln t)  > ; if #^t is given by (5.17.5) : O t?s n  1 ( + )2

1+

1 2

1 2

1 ( + )2

ii) the adaptive control scheme attains optimality: NX ?1  NX ?1    lim sup N1 yt2 + ru2t = lim sup N1 (yt )2 + r(ut )2 ; N !1 N !1 t=0 t=0

almost surely, where



A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1

(# ; q?1) ut = (# ; q?1) yt is the closed-loop system where the true system is controlled by the optimal LQG control law.

74

Chapter 5. Singularity free adaptive LQG control: stability and optimality

2

Proof. See Subsection 5.6.3.

Remark 5.3

Recall that in the Gaussian noise case, a speci c identi cation criterion has been proposed in Subsection 5.3.2 for the case when no a-priori knowledge on the system parameter is available. Therefore, the properties of the corresponding parameter estimate stated in Theorem 5.5 should be veri ed for the results in Theorems 5.10 and 5.12 to remain valid. The proof of Theorem 5.5 is based on Theorems 5.1 and 5.4. As for Theorem 5.1, it has already been observed that it still holds, since the control law (5.30) turns out to be Ft -measurable. On the other hand, the measurability condition of the observation vector 't with respect to (y1 ; : : :; yt) required in Theorem 5.4 is no longer satis ed. It is just a technical issue, however, to extend the results in Theorem 5.4 to the case when a dither noise is acting on the system. 2

5.6 Proofs 5.6.1 Proofs of Section 5.3

Proof of Theorem 5.2

Observe rst that equation t X i=1

'i?1 yi = [

is easily derived from the de nition

t X i=1

?1 'i?1'Ti?1 + V0?1]#^LS t ? V0 #0

(5.33)

t

X (yi ? 'Ti?1#)2 + (# ? #0)T V0?1 (# ? #0)g: #^LS min f t = arg #2< n m +

i=1

By using expression (5.33), the following equality can then be proven to hold for all # 2 0 such that 1 ? V (#^LS )  k; 8 t: Dt(#^t ) ? Vt (#^LS (5.39) t ) = Vt (#^t) + ^t ) t t Sylv(# 1 Since Vt (#^t) ? Vt (#^LS t )  0, we then have jSylv(#^t )j  k; 8 t: If we set c = k1 , part i) immediately follows. T X 'i?1'T + V ?1](# ? #^LS ) + (#U ? #^LS i?1 U;t ) [ U U ;t 0

78

Chapter 5. Singularity free adaptive LQG control: stability and optimality

Part ii)

Taking into account that

1

jSylv(#^t )j

 0, equation (5.39) gives: t X

T ?1 LS LS T Vt (#^t) ? Vt (#^LS t ) = (#^t ? #^t ) [ 'i?1 'i?1 + V0 ] (#^t ? #^t )  k; 8 t: i=1

Equation (5.40) implies that

(5.40)

s

k k#^t ? #^LS t k   (V ?1) ; 8 t; min 0

(5.41)

min (V0?1 ) being the minimum eigenvalue of the positive de nite matrix V0?1. LS Since #^LS t ! #^1 (properties i) and ii) in Theorem 5.4), it follows that fk#^tkg is uniformly bounded. Part iii)

Consider the equation t

X (# ? #^t )T [ 'i?1 'Ti?1 + V0?1 ](# ? #^t) (

i=1

t

?1  LS T TX  2 (# ? #^LS t ) [ 'i?1 'i?1 + V0 ](# ? #^t ) T + (#^LS t ? #^t) [

i=1

t X i=1

)

'i?1'Ti?1 + V0?1](#^LS t ? #^t)

:

Since in view of equationP(5.40) and equation (5.8) both terms in the right-hand-side are almost surely O(ln max ( ti=1 'i?1 'Ti?1)), we get t

t

i=1

i=1

X X (# ? #^t)T [ 'i?1'Ti?1 + V0?1 ](# ? #^t) = O(ln max ( 'i?1 'Ti?1))

almost surely, i.e. point iii). Part iv)

We prove that #^E;t ! #E by contradiction. Suppose that #^E;t does not converge to #E . Then,  considering that #^LS E;t ! #E (point ii) of Theorem 5.4), there would exist a positive constant M and a sequence fk ; k = 0; 1; : : :g of time instants such that k#^E;k ? #^LS E;k k  M; 8 k. LS ^ ^ Project now vector #k ? #k onto the excitation and unexcitation subspaces, and, by using Schwarz inequality, rewrite equation (5.40) at time instants k 's as follows

5.6. Proofs

79 k

?1 T LS TX k  (#^k ? #^LS k ) [ 't?1't?1 + V0 ](#^k ? #^k ) t=1

T  (#^E;k ? #^LS E;k )

k X

k

^LS T X 't?1'Tt?1 (#^U;k ? #^LS ^ 't?1'Tt?1(#^E;k ? #^LS U;k ) E;k ) + (#U ;k ? #U ;k )

t=1 t=1 k k T X 'i?1 'T (#^E; ?#^LS )] 21 [(#^U; ?#^LS )T X 'i?1 'T (#^U ; ?#^LS )] 12 : ) ?2[(#^E;k ?#^LS k E;k k U ;k k U ;k i?1 E;k i?1 i=1 i=1

(5.42) k  M, the rst term in the right-hand-side is bounded from below by Since k#^E;k ? #^LS E;k (

M2

) T k (#^E;k ? #^LS (#^E;k ? #^LS E;k ) E;k ) X T : 'i?1'i?1 ^ k#^E;k ? #^LS k#E;k ? #^LS E;k k i=1 E;k k

^LS

k ?#E;k ) belongs to the As for the term within the brackets, by observing that xk := k(##^E; E; ?#^LS k

^

k T X 'i?1 'T x lim x i?1 k k!1 k i=1

excitation subspace E , we have intuitive result see the Lemma in [24]). Thus, k

k

E;k

= 1 (for a rigorous proof of this

T X 'i?1'T (#^E; ? #^LS ) = 1: ^E;k ? #^LS ) lim ( # k E;k E ; i?1 k k!1 i=1

(5.43)

As far as the second term on the right-hand side of equation (5.42) is concerned, by De nition 3.1 of unexcitation subspace (Section 3.5 of Chapter 3) and inequality (5.41), it follows that k T X 'i?1'T (#^U; ? #^LS ) ) (#^U ;k ? #^LS k U ;k i?1 U;k i=1

keeps bounded. This last result, together with equation (5.43), implies that the right-hand side of equation (5.42) tends to in nity, which contradicts the fact that it is upper bounded by constant c. Therefore, a subsequence f#^k ; k = 0; 1; : : :g such that k#^E;k ? #^LS E;k k  M; 8 k  ^ does not exist and #E;t ! #E . This completes the proof. 2

Proof of Theorem 5.6 Part i)

It is not hard to see that Dt(#) de ned in equation (5.20) is a quadratic function of #, whose

80

Chapter 5. Singularity free adaptive LQG control: stability and optimality

minimizer #t is given by t

t

i=1

i=1

X X #t = [ 'i?1 'Ti?1 + V0?1 + tI]?1 [ yi 'i?1 + (V0?1 + t I)#]:

In order to determine a recursive expression for #^t, we de ne the matrix Qt as Qt := [ so that

t X i=1

'i?1 'Tk?1 + V0?1 + t I]?1;

t

X #t = Qt [ yi 'i?1 + (V0?1 + tI)#]:

(5.44)

i=1 Pt It is easy to show that the term i=1 yi 'i?1 in the right-hand-side of this last equation can

be written as t X i=1

yi 'i?1 = yt 't?1 +

t?1 X i=1

yi 'i?1 = yt 't?1 + Q?t?11#t?1 ? (V0?1 + t?1I)#:

By substituting this last expression and the recursive expression for Q?t 1 given by Q?t 1 = Qt??11 + 't?1'Tt?1 + ( t ? t?1)I (5.45) in equation (5.44), we conclude that #t can be determined as a function of the previous estimate #t?1 in the following way #t = Qtfyt 't?1 + [Q?t 1? 't?1'Tt?1 ? ( t ? t?1)I]#t?1 ? (V0?1+ t?1I)# + (V0?1+ tI)#g = #t?1 + Qt 't?1(yt ? 'Tt?1#t?1) + Qt ( t ? t?1)(# ? #t?1); which is just the recursive expression of #t in equation (5.17.2). The fact that t given by (5.22) can be recursively computed through (5.17.3) and (5.17.4) with the initialization r0 = trace(V0?1 ) given in (5.19) is a matter of a simple veri cation. Finally, the fact that step 1 in the algorithm actually computes the inverse of matrix Q?t 1 given in (5.45) is a simple application of the matrix inversion lemma. This completes the proof of part i). Part ii)

Denote by #^LS t the minimizer of the least squares performance index Vt (#) and set Wt :=

t X i=1

'i?1'Ti?1 + V0?1 :

(5.46)

5.6. Proofs

81

It is then easy to show that #t = arg #2min D (#) can be expressed as a function of #^LS t as Rn m t follows ?1 #t = (Wt + t I)?1 Wt #^LS t + t(Wt + tI) #: By subtracting #, we get  #t ? # = (Wt + tI)?1 Wt (# ? #) + (Wt + tI)?1 Wt (#^LS t ? # ): +

Thus, the norm of #t ? # can be upper bounded as follows

 k#t ? #k  k# ? #k + k(Wt + t I)?1 Wt kkWt (#^LS t ? # )k: 1 2

1 2

(5.47)

 We apply now Theorem 5.1 so as to upper bound the term kWt (#^LS t ? # )k. Since ut is assumed to be Ft-measurable, and also considering Assumption 5.2, by this theorem we obtain the following upper bound: 1 2

 2 kWt (#^LS t ? # )k = O(ln trace(Wt ));

(5.48)

1 2

almost surely. The term k(Wt + t I)?1 Wt k can instead be handled as follows. Denote by f1;t; : : :; n+m;t g the eigenvalues of the positive de nite matrix Wt . Since Wt is symmetric and positive de nite, there exists an Tt such that Wt =  orthonormal matrix  ? 1 ? 1 Tt diag(1;t; : : :; n+m;t )Tt and Wt = Tt diag 1;t; : : :; n+m;t Tt . Then, 1 2

1 2

(Wt + tI)?1 Wt

1 2

1 2

1 2

= Tt(Tt?1 (Wt + tI)Tt )?1Tt?1 Wt Tt Tt?1 1 2

0

1

  A T ?1 : = Tt diag @  1+;t ; : : :;  n+m;t t + 1;t t n+m;t t This implies that

1 2

1 2

0

1

i;t A @ k(Wt + tI)?1 Wt k = i=1max : ;:::;n+m i;t + t 1 2

1 2

(5.49)

x ; x  0: Such a function has an absolute maximum Consider now the function: f(x) = x + t value 21 t? in x = t. It then obviously follows from equation (5.49) that 1 2

1 2

k(Wt + tI)?1 Wt k  12 ?t : 1 2

1 2

(5.50)

82

Chapter 5. Singularity free adaptive LQG control: stability and optimality

Substituting the estimates (5.48) and (5.50) in equation (5.47), we obtain   t) ; k#^t ? #k  k# ? #k + h1 ln trace(W t 1 2

h1 being a suitable constant. Observe now that from the regression-like form yt = 'Tt?1# + nt of system (5.1) it follows that n2i  2 maxfk#k2; 1g[ yi2 + k'i?1k2 ]. Taking into account that the autoregressive part of system (5.1) is not trivial (n > 0), this in turn implies that n2i  2 maxfk#k2 ; 1g[ k'ik2 + k'i?1k2 ], from which it is easily shown that t X i=1

n2i  h2

t+1 X i=1

k'i?1k2 ;

where h2 is a suitable constant. From Assumption 5.2 (point ii)) and de nition (5.46) of Wt , we then get lim trace(Wt ) = 1: t!1 Since by de nition (5.22) t = (lntrace(Wt ))1+ , we then obtain that 8  > 0 there exists a time instant  such that k#^t ? #k  k# ? #k + , 8 t  . By Assumption 5.4, this implies that there exists a nite time instant t such that #t 2 S (#; ), 8t  t. This proves point ii). 2

Proof of Theorem 5.7 Part i)

Since the absolute value of the Sylvester matrix determinant is a continuous function of the system parameter # and it is strictly positive for any # 2 S(#; r) (see Assumption 5.4), we can take c := min jSylv(#)j > 0: #2S (#;)

Point i) then immediately follows from the de nition of #^t in equation (5.17.5). Part ii)

Let us rewrite the performance index Dt(#) as a function of the least squares estimate #^LS t : T Dt(#) = (# ? #^LS t ) [

t X i=1

2 LS 'i?1'Ti?1 + V0?1 ](# ? #^LS t ) + tk# ? #k + Vt (#^t ):

From the de nition of #t, it follows that t

T ?1 LS 2 TX (#t ? #^LS t ) [ 'i?1'i?1 + V0 ](#t ? #^t ) + tk#t ? #k i=1

5.6. Proofs

83 t X

?1  LS T  2 T  (# ? #^LS t ) [ 'i?1'i?1 + V0 ](# ? #^t ) + tk# ? #k i=1

= O( t ); (5.51) almost surely, where the last equality is a consequence of equation (5.8) and of the boundedness of # . Consider now the equation (# ? #t)T [ (

t X i=1

'i?1'Ti?1 + V0?1](# ? #t)

T  2 (# ? #^LS t ) [ T + (#^LS t ? #t ) [

t X i=1

t X i=1

'i?1'Ti?1 + V0?1 ](# ? #^LS t ) )

'i?1 'Ti?1 + V0?1 ](#^LS t ? #t) :

Since in view of equation (5.51) both terms in the right-hand-side are almost surely O( t), we get t X (# ? #t)T [ 'i?1'Ti?1 + V0?1 ](# ? #t) = O( t) i=1

almost surely. Since #^t = #t, 8t  t (point ii) in Theorem 5.6) and also recalling de nition (5.22) of t, point ii) immediately follows. 2

5.6.2 Proofs of Section 5.4

Proof of Lemma 5.1

We start by proving that T(#) := inf f 2 Z+ : kF(#) k  g is uniformly bounded in the compact set Ap := f# 2 0 such that #t belongs to the compact set Ap = f# 2 0, the following inequality holds for the state vector xt = [yt : : :yt?(n?1) ut?1 : : :ut?(m?1)]T 8 t 0 such that max k't k2 > ci ; 8i  0: 0ti Letting tk = arg max0ti k't k2, we get a sequence ftk gk0, such that k'tk k2 > ci  ctk , which contradicts condition k'tk = o(t) in (5.67). 2

5.6.3 Proofs of Section 5.5 Proof of Theorem 5.10

Observe rst that the technical Lemmas 5.1 and 5.2 still remain valid in the presence of the dither noise vt , since they are based on the properties of the penalized least squares estimate. Then this proof is completely similar to that of Theorem 5.8. The only di erence is given by the term vt in the control law, which can be treated as the noise term nt, but, di erently from nt, has a vanishing pathwise square average (see equation (5.32)). As a matter of fact, the time evolution of the state vector xt = [yt : : :yt?(n?1) ut?1 : : :ut?(m?1)]T can be described through the equation xt+1 = F (#t) xt + Cnt+1 + B(#t )vt (5.79) = F(#t) xt + C[et + nt+1 ] + B(#t )vt ; where B(#) = [b1 0 : : :0 1 0 : : :0]T is uniformly bounded in time. Therefore, by following the same steps as in Theorem 5.8, we get ?1 N NX ?1 NX ?1 X 1 NX 2 1 2  O(1) + o( 1 2) + k 1 2 k ' k k x k k ' k t t t N t=0 N t=0 N t=0 N t=0 vt ;

5.6. Proofs

95

k being P a suitable constant. Taking into account equation (5.32), this last inequality implies 2 that N1 Nt=0?1 k't k2 remains bounded. Then, the thesis immediately follows.

Proof of Theorem 5.12

Part i) From Theorem 5.10 and property (5.32), it follows that ?1 N ?1 N ?1 1 NX e )2  2[ 1 X (ut)2 + 1 X v2 ] = O(1): (u t N t=0 N t=0 N t=0 t

Therefore, Condition 3. in Theorem 5.11 is satis ed. Being the other two conditions 1. and 2. veri ed (see Assumptions 5.1 and 5.2), we have that min (

t X i=1

'i?1'Ti?1 )  c t1?(s+n)2 ; 8t  t;

where  2 (0; 4(s1+n) ) is the constant given in (5.31). P As for the growing rate of max ( ti=1 'i 'Ti ), it is easily obtained by applying Theorem 5.10 that t t X X max ( 'i?1 'Ti?1) = O( k'i?1k2) = O(t); i=1

i=1

almost surely. Then, the consistency and the convergence rate of #t de ned in (5.25) are consequences of the properties of #^t given either by (5.15) or (5.17.5) and respectively stated in Theorems 5.3 and 5.7. Part ii)

We preliminarily introduce the ideal system (i.e. the closed-loop system where the true system is controlled by the optimal LQG control law):  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 : (5.80) (# ; q?1) ut = (# ; q?1) yt Then, by setting

(#) = [0(#) : : :n?1(#) 1 (#) : : :m?1 (#)]T (5.81) and xt = [yt : : :yt?(n?1) ut?1 : : :ut?(m?1)]T ; the real system  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 (5.82) (#t ; q?1) ut = (#t; q?1) yt + vt

96

Chapter 5. Singularity free adaptive LQG control: stability and optimality

can be represented as a variation system with respect to the ideal system as follows  A(# ; q?1) yt+1 = B(# ; q?1) ut + nt+1 (5.83) (# ; q?1) ut = (# ; q?1) yt + xTt ( (#t ) ? (# )) + vt : p p Let wt := [ yt r ut?1 ]T and wt := [ yt r ut?1 ]T . We now prove that N 1X (5.84) (kwtk2 ? kwt k2 ) = 0: lim N !1 N t=1

Indeed, equation (5.84) implies the optimality result ii). In fact, we have ?1 NX ?1 1 NX 2 + ru2 ] ? 1 2 2 lim f [ y t N !1 N t=0 t N t=0 [ (yt ) + r(ut ) ]g N 1X 2 ? kw k2 ) ? 1 (y2 ? (y )2 )g = Nlim f w k ( k t t N !1 N t=1 N N where N1 (yN2 ? (yN )2 ) tends to zero in view of the following argument. Set et = 'Tt (# ? #t). For any time instant t > 0, the following inequality can be derived by referring to equation (5.79) and by following the same procedure as that used in Theorem 5.8 for proving equation (5.66): 8 t g = q(M; ); (6.10) =1

2?

6.3. Randomized controller selection

113

which clearly implies

P=tM f : supN jc^t ( ) ? ct ( )j > g  q(M; ):

2f j gj=1

(6.11)

Putting together these two results we have that the following holds with a product probability QNt  P=tM not less than 1 ? [(1 ? )N + q(M; )]

Qtf : ct( ) < ct(arg 2fmin c^t ( )) ? g

j gNj ct( ))g  Qtf : ct ( ) < ct (arg 2fmin

j gNj  : =1

=1

(using (6:11)) (using (6:9))

Thus, minimizing E^P=t ;M [J(; j )] over f j gNj=1 leads to an approximate minimizer of EP=t ;M [J(; j )] to accuracy  and level with con dence  = (1 ? )N + q(M; ):

(6.12)

If we now use estimate (6.7) in (6.12) we can easily conclude that using &

ln(  ) N(; ) = ln(1 ?2 ) controller parameters and



'

(6.13) 

16 16e 16e M(; ) = 32 (6.14) 2 [ln  + d ln  + d lnln  ] model parameters suces to approximately minimize ct( ) to accuracy  and level with con dence . One important feature of the expression for M(; ) is that it does not depend on N. Therefore, if the number of controllers N grows unbounded, the number M of model parameters remains bounded. Obviously, this is a consequence of the UCEM assumption. It is interesting to note that in the case we want to resort to a fully randomized algorithm a di erent estimate for M(; ) can be obtained by a direct use of Hoe ding's inequality (see Appendix B). Note rst that in the above derivations we have made use only of equation (6.11), whereas equation (6.10) was just instrumental in the derivation of (6.11). Since in (6.11) supremum is taken over the nite set f j gNj=1, the UCEM property is in fact not really necessary to upper bound the left-hand-side of (6.11). In the case one does not want to use the UCEM property (or, elsewhere, the UCEM property does not hold) he can still resort to more standard inequalities in probability theory. This leads to estimates for M(; ) which

114

Chapter 6. Robust adaptive LQG control

depend on N and tend to in nity as N ! 1. Yet, in real situations where N is nite these estimates may be even tighter than the estimate (6.14), worked out through the UCEM theory. This consideration justi es the study of other bounds for M(; ) as reported below. One way to compute an estimate for M(; ) is to resort to Hoe ding's inequality (see Appendix B). In particular, a straightforward use of Theorem B.1 in the appendix leads to the following estimate:   ) : (6.15) M(; ) = 212 ln 2N(;  As already mentioned, M(; ) in equation (6.15) tends to in nity as the number of controllers N increases unbounded. On the other hand, the rate of growth is very slow as N appears under the sign of logarithm. We are now in a position to state the fully random algorithm for the controller selection. Algorithm 2 Suppose that fJ(; ); 2 ?g has nite P-dimension, say d. Given  > 0, > 0 and  > 0, do the following: l

m



ln( ) 1. extract at random N(; ) = ln(1 ? ) independent controller parameters 1 ; 2; : : :;

N (;) according to probability Qt ; 2

 l

n

2N (;) 16 16e 16e 1 2. extract at random M(; ) = min 32  [ln  + d ln  + d lnln  ] ; 2 ln  independent model parameters #1; #2; : : :; #M (;) according to probability P=t ; 2

mo

2

MX (;) 1 ^ J(#i; j ); 3. for j = 1; 2; : : :; N(; ) compute EP=t ;M ; [J(; j )] := M(; ) (

4. choose t = arg

min

2f j gjN=1(;)

)

i=1

E^P=t ;M ; [J(; j )]. (

)

In the case in which fJ( l ; ); 2 ?gmdoes not have the UCEM property, the algorithm still works with M(; ) = 21 ln 2N (;) . 2 2

The following theorem immediately follows from the discussion preceeding the algorithm

Theorem 6.2

t computed via Algorithm 2 is an approximate minimizer of EP=t [J(; )] to accuracy  and level with con dence . 2

6.4. Application to LQG control

115

6.4 Application to LQG control Consider the following model

A(#; q?1)yt+1 = B(#; q?1 )ut + nt+1 ;

(6.16)

where the polynomials

A(#; q?1) = 1 ?

n X i=1

ai q?i and B(#; q?1) =

m X i=1

bi q?(i?1);

depend on the parameter vector # = [a1 : : :an b1 : : :bm ]T 2  = 0: J 0 = lim sup N1 N !1

(6.17)

t=0

Given a model in class (6.16) (i.e. given a parameterization # 2 ), the optimal LQG control law takes the following form: (  ; z ?1) ut = (  ; z ?1) yt where the polynomials (  ; q?1) = 1 ?

mX ?1 i=1

  iq?i and (  ; q?1) =

nX ?1 i=0

 iq?i ;

with n = maxfn; 1g, depend on vector  = [ 0  1 : : : n ?1   1 : : :  m?1 ], which is a function of #. The expression of  =  (#) is derived in Section 2.3 of Chapter 2. For the sake of self-containedness we brie y recall it. Set xt := [yt yt?1 : : :yt?(n?1) ut?1 : : :ut?(m?1)]T . Then, model (6.16) can be given the following state space representation of order q := n + m ? 1: 

xt+1 = A(#)xt + B(#)ut + Cnt+1 yt = Hxt

(6.18)

116

Chapter 6. Robust adaptive LQG control

where

2

3

a1 : : : an?1 an b2 : : : bm?1 bm 6 1 7 0 ::: 6 7 6 7 . . .. .. 6 7 6 7 6 7 1 0 6 A(#) = 6 0 : : : : : : : : : 0 : : : : : : 0 77 ; 6 7 6 7 1 0 6 7 6 7 . . .. .. 4 5 1 0 3 2 2 1 b1 3 6 0 7 6 0 7 7 6 6 7 6 .. 7 6 .. 7 7 6 . 6 . 7 7 6 6 7 6 0 7 6 0 7   6 B(#) = 66 1 77 ; C = 66 0 777 ; H = 1 0 0       0 ; 6 7 6 . 7 6 0 7 6 .. 7 6 7 7 6 6 . 7 6 . 7 4 .. 5 4 .. 5 0 0 with a1 := 0 if n = 0. If system (6.16) is controllable then model (6.18) is reachable and detectable, and therefore there exists a unique positive semide nite solution P(#) to the discrete time algebraic Riccati equation 



P = A(#)T P ? PB(#)(B(#)T PB(#) + r)?1 B(#)T P A(#) + H T H: The optimal LQG control law minimizing (6.17) is then given by ut =  (#) [ yt yt?1 : : :yt?(n?1) ut?1 : : :ut?(m?1) ]T ; where

 (#) = ?(B(#)T P(#)B(#) + r)?1B(#)T P(#)A(#) is a continuous function of # 2 f# : A(#; q?1)yt+1 = B(#; q?1)ut is controllableg. In the case in which the true system is unknown, it is natural to consider as controller class the set of systems ( ; q?1 ) ut = ( ; q?1 ) yt ; (6.19) T q parameterized in = [0 1 : : :n?11 : : :m?1 ] 2 ? = < . Denote with J 0(#; ) the LQG control performance achieved by applying controller ( ; q?1 ) ut =

6.5. Randomized algorithms for the synthesis of robust adaptive LQG controllers

117

( ; q?1 ) yt to model A(#; q?1)yt+1 = B(#; q?1)ut + nt+1. Under the assumption that A(#) + B(#) is stable, J 0 (#; ) is given by J 0 (#; ) = trace(P(#; )CC T 2 ) where P(#; ) is the solution to the Lyapunov equation P = (A(#) + B(#) )T P(A(#) + B(#) ) + r T + H T H (6.20) (see Proposition 2.3 in Section 2.3 of Chapter 2). Then, the normalized cost criterion used in the robust approach 0 J(#; ) = 1 + J J(#;0 (#; ) ) has the following closed-form expression 8 trace(P(#; )CC T 2 ) ; if A(#) + B(#) is stable > > < T 2 J(#; ) = > 1 + trace(P(#; )CC  ) (6.21) > : 1; otherwise: In the following proposition, we show that J(; ) : 0 (?1)q f(#; ) (?1) > 0

124

Chapter 6. Robust adaptive LQG control

det(Xi (#; ) + Yi (#; )) > 0, det(Xi (#; ) ? Yi (#; )) > 0, i = 1; 2; : : :; q ? 1, where 2

Xi (#; ) :=

6 6 6 6 6 4

3

1 f1 (#; ) : : : fi?1 (#; ) 1 : : : fi?2 (#; ) 77 7 .. 7; . 7 1 f1 (#; ) 5 1 3

2

::: fq (#; ) 6 f (#;

) f q q ?1 (#; ) 7 7 6 7 6 .. Yi (#; ) := 66 7: . 7 4 fq?i+2 (#; ) 5 fq (#; ) fq?1 (#; ) : : : fq?i+1 (#; ) Such 2q conditions can be written in the form i (#; ) > 0; i = 1; 2; : : :; 2q; where the largest degree of polynomials i(#; ) > 0, i = 1; 2; : : :; 2q, as a function of is 2(q ? 1) (the degree of in coecients fi (#; ), i = 1; : : :; q is at most 2). Therefore 2q \ fA(#) + B(#) stableg = Ai ; i=1

with Ai = fi(#; ) > 0g such that degree(i (#; ))  2(q ? 1), i = 1; 2; : : :; 2q. Consider now the second set ftrace(P(#; )CC T 2 )  (1 ?c c) g: Matrix P(#; ) is the unique solution to the Lyapunov equation P = (A(#) + B(#) )T P(A(#) + B(#) ) + r T + H T H;

(6.25)

which can be reformulated as a system of linear equations in the components of P by means of the Kronecker product, [94]. This is explained next. Let V be an h  k matrix and W an k  h matrix. Then, the Kronecker product V W is an hk  hk matrix de ned in block form by 2

V W = 64

v11 W : : : v1k W .. .. . . vh1 W : : : vhk W

3 7 5

:

(6.26)

6.7. Proofs

125

Let vi denote the ith column of matrix V : V = [v1v2 : : :vk ]. Then the hk  1 column vector vec(V ) is de ned as 3 2 v1 6 v2 7 vec(V ) = 664 .. 775 : . vk Moreover, given matrices V , W and Z of appropriate dimensions we have that vec(V WZ) = (Z T V )vec(W): On the basis of such properties, equation (6.25) can be rewritten as follows ?  vec(P) = vec (A(#) + B(#) )T P(A(#) + B(#) ) + vec(r T + H T H) ?



= (A(#) + B(#) )T (A(#) + B(#) )T vec(P) + vec(r T + H T H); from which it follows that ?  vec(P(#; ))T = vec(r T + H T H)T Iq ? (A(#) + B(#) ) (A(#) + B(#) ) ?1 : Letting W(#; ) := Iq ? (A(#) + B(#) ) (A(#) + B(#) ), condition trace(P(#; )CC T 2 ) = p11(#; )2  (1 ?c c) can then be rewritten as follows: c det(W(#; )) ? (1 ? c) 2 vec(r T + H T H)T )[adj(W(#; ))]1  0 where [adj(W)]1 denotes the rst column of the adjugate matrix of W. By recalling that F(#; ) := A(#) + B(#) has the following form 3 2 a1 + b10 : : : an + b1n ?1 b2 + b11 : : : bm + b1m?1 7 6 1 7 6 2

2

F(#; ) =

6 6 6 6 6 6 6 6 6 6 6 6 6 6 4

... 0

:::

0 n?1

1 1

::: ...

m?1 0

7 7 7 7 7 7 7 7 7 7 7 7 7 7 5

126

Chapter 6. Robust adaptive LQG control

and the de nition (6.26) of Kronecker product, one can see that the degree of polynomial det(W(#; )) as a function of = [0 1 : : :n?1 1 : : :m?1 ] is less than or equal to 4q. As a matter of fact, matrix W(#; ) = F(#; ) F(#; ) has 4 rows whose terms have all degree 2 and 4(q ? 2) rows whose terms have at most degree 1. As for polynomial vec(r T + H T H)T [adj(W(#; ))]1 , it has the same degree 4q since all the terms of vector [adj(W(#; ))]1 have at most degree 4q ? 2. Therefore, by setting 2q+1 ((#; c); ) = c det(W(#; )) ? (1 ? c) 2 vec(r T + H T H)T [adj(W(#; ))]1; we have that ftrace(P(#; )CC T 2 )  (1 ?c c) g = A2q+1 with A2q+1 = f2q+1 ((#; c); ) > 0g and degree(2q+1 ((#; c); )) = 4q Finally, the searched set A is given by q A A = \2i=1 i

[

A2q+1

where

Ai = fi ((#; c); ) > 0g  0, one can determine a positive constant k such that Prfsup t kVt k > kg   t

(6.28)

On the other hand, Theorem 6.4 implies the existence of an instant point t1 such that Prf sup kMt ? # k > =2g  : tt1

(6.29)

We shall now compute the probability of the event in which at least one of the parameters #01 ; #02; : : :; #0N ; extracted at point 2. of Algorithm 4 at the generic instant t  t1 is outside ball B(# ; ), conditioned to event (

)

A := fsup t kVt k  kg \ f sup kMt ? #k  =2g: tt1

t

Then, the thesis will be drawn by using this result in conjunction with estimates (6.28) and (6.29). Let Bt := f#01 2= B(# ; ) or : : : or #0N ; 2= B(# ; ) at time tg. Fix a positive de nite matrix V 2 =2 at time t =Mt = M; Vt = V g(: 6.30) Probability Prfk#01 ? Mt k > =2 at time t =Mt = M; Vt = V g can be easily estimated by observing that parameter #01 is extracted at random from a Gaussian probability distribution with mean Mt and variance Vt. Letting i2 , i = 1; 2; : : :; n + m be the eigenvalues of matrix V , we have: Prfk#01 ? Mt k > =2 at time t =Mt = M; Vt = V g Z nY +m 1 exp(? zi2 )dz = 2i2 kzk>=2 i=1 (2)1=2 i Z nY +m 1 exp(? zi2 )dz  2i2 z:9i2f1;:::;n+mg s:t: jzi j>=[2(n+m) = ] i=1 (2)1=2 i nX +m Z 1 exp(?y2 )21=2dy (y := z =(21=2 ))  i i i i i 1=2 i=1 jyi j>=f[8(n+m)] = i g (2) 1 2

1 2

128

Chapter 6. Robust adaptive LQG control



nX +m i=1

2

Z

2(n + m)1=2 i exp(?y2 )dy2 i i yi > =f8(n+m)i g (2)1=2  nX +m 4(n + m)1=2 i exp(? 2 = 8(n + m)i2 ) (2)1=2 i=1 2

2

2

(6.31)

Plugging (6.31) into (6.30), we nally obtain PrfBt=Mt = M; Vt = V g 

nX +m 4(n + m)1=2i exp(? 2 1 = 2 (2)  8(n + m)i2 ): i=1

(6.32)

We are now in a position to bound the probability of event Bt given A. Since in set A we have kVtk  k=t - and, thereby, all eigenvalues of matrix Vt are bounded by k=t - from (6.32) we can conclude that there exists a deterministic constant c (depending on ) such that (6.33) PrfBt=Ag  tc2 ; for all t  t1 . The thesis can now be easily proven by using (6.33) along with (6.28) and (6.29). Since, 1 X t=0

Pr(Bt \ A) 

1 X

Pr(Bt =A) X  t1 + tc2 tt < 1; t=0

1

by Borel-Cantelly lemma, [77], we have (i.o.=in nitely often) Pr(Bt \ A i:o:) = 0: On the other hand, by (6.28) and (6.29), Pr(A) > 1 ? 2 and so, owing to the arbitrariness of , Pr(Bt i:o:) = 0: This proves that with probability 1 one can determine a t such that all model parameters #01; : : :; #0N ; selected at point 2. of Algorithm 4 are in the ball B(# ; ), 8t  t. In view of the discussion at the beginning of this proof, this proves the theorem. 2 (

)

6.8. A simulation example

129

6.8 A simulation example In this section, we show by a simulation example the performance achievable when applying a standard certainty equivalence and a robust LQG adaptive controller in an ideal control setting, i.e. in the case when the true system belongs to the model class. This is in order to point out the e ectiveness of the proposed approach to adaptive control and to get insight into such a strategy. Consider the system yt+1 = 0:8yt + ut ? 0:9ut?1 + nt+1 (6.34) where the noise process fntgt1 is a sequence of i.i.d. normally distributed random variables with E[nt] = 0 and E[n2t ] = 1. Suppose that the control objective is to minimize the in nite-horizon LQG performance index NX ?1 [yt2 + u2t ]: (6.35) lim sup N1 N !1

t=0

If the system were known, the solution to the LQG control problem would be given by the time-invariant control law ut = ?0:34yt + 0:38ut?1 (6.36) and the optimal performance index value would be equal to 2; 13. The controlled system P? 1[y2 + u2]) when behavior (input, output and \sample LQG performance index", i.e. 1t it=0 i i system (6.34) is controlled by the optimal control law (6.36) is represented in Figures 6.16.3. Suppose now that the value of the system parameter vector # = [0:8 1 ? 0:9]T is not available, and system (6.34) is known to belong to the model class 



yt+1 = a1yt + b1 ut + b2ut?1 + nt+1; # = [a1 b1 b2]T 2 > < t t t t?1 t?1 0 : J(#; ) = > 1 + J (#; ) > : 1; otherwise In the sequel, we set = 0:25. Certainty equivalent LQG adaptive control algorithm For t = 0; : : :; 500, do the following: 1. compute the parameter estimate #t = Mt through the Kalman lter; 2. determine the controller parameter t =  (#t); 3. apply control ut = t [yt ut?1]T .

2

Robust LQG adaptive control algorithm (fully randomized algorithm - Subsection 6.3.2) Set  = =  = 0; 1. For t = 0; : : :; 500, do the following: 1. compute Mt and Vt through the Kalman lter; 2. extract at random N(; ) = 29 independent model parameters #01; #02; : : :; #029 according to probability P=t  N (Mt ; Vt). Then set j =  (#j ), j = 1; 2; : : :; 29; 3. extract at random M(; ) = 319 independent model parameters #1; #2; : : :; #319 according to probability P=t  N (Mt ; Vt ); 319 1 X J(#i; j ); 4. for j = 1; 2; : : :; 29 compute c^t ( j ) := 319

5. choose t = arg min ^ct( i );

2f j g29 j=1

i=1

6.8. A simulation example

131

6. apply control ut = t [yt ut?1]T . 2 We also introduce a robust LQG control algorithm with dither noise. Robust LQG adaptive control algorithm with dither noise Set  = =  = 0; 1. For t = 0; : : :; 500, do steps 1.-5. of the robust adaptive control algorithm described above and then 6. apply control ut = t [yt ut?1]T + (t+1)1 = dt, 1 15

p

p

where fdtg is a sequence of i.i.d. random variables uniformly distributed in [? 3; + 3] (this choice implies E[d2t ] = 1), independent of fntg. 2 Figures 6.4-6.6 represent the rst 100 instant points of the control system behavior when the certainty equivalent LQG control algorithm is applied. These pictures show that the input and output variables are highly uctuating in the transient phase. Such an undesiderable phenomenon is due to the fact that the certainty equivalent controller puts an in nite trust in the currently most probable model. When the parameter uncertainty is large, i.e. in the transient phase, such a system may not adequately describe the true system, and thus applying the control input optimal for it may cause a large deterioration in the true system behavior. In order to better clarify this fact, in Figure 6.7 we have represented the optimal LQG performance index as a function of the estimate #t. We note that around t = 8, the estimated system has a large optimal LQG performance index. As a matter of fact, #8 = [1:11 1:04 ? 1:19]T corresponds to a model dicult to control (its transfer function presents a zero in 1.14 and a pole in 1.11) for which a high value of the controller parameter

 (#8 ) = [6:83 ? 7:34] is needed to achieve optimality. Figures 6.8-6.10 show that the robust adaptive controller is able to overcome the above described diculty and it signi cantly enhances the transient behavior with respect to the certainty equivalence controller (compares Figures 6.6 and 6.10). This is due to the fact that, by the very philosophy of minimizing the average cost function ct ( ), one selects a cautious controller with the objective of obtaining an acceptable performance for most of the models. This is in contrast with what happens in the certainty equivalent approach where, at time t = 8, the controller parameters are selected to be large despite the fact that this leads to poor performance for the majority of models in the P=t distribution (including the true system!). This observation is con rmed by analyzing the behavior of J 0 (Mt ;  (Mt )) for the robust control system (Figure 6.11). At time t = 10, the system corresponding to M10 turns out to be dicult to control and the associated optimal controller (  (M10 ) = [?15:06 14:59]) has large parameters. Nevertheless, the selected robust controller is 10 = [0:40 ? 0:59], so that the controller avoids to overexcite the system. An important feature of the robust controller is that the system parameter are not consistently estimated (see Figure 6.12). As a consequence the control system performance are

132

Chapter 6. Robust adaptive LQG control

strictly suboptimal even in the long run (compare Figure 6.10 with Figure 6.3). The reason why this happens is that the robust controller has no probing features and, in fact, it tends not to excite the unknown dynamics of the true system. Thus, the controller parameter gets stuck to a value di erent from the optimal one (compare Figure 6.13 with equation (6.36)). In order to overcome this diculty a robust controller with dither noise can be used. If the dither noise is suitably selected, then according to Theorem 6.6 one should enforce optimality. The control system behavior obtained by applying the robust LQG control algorithm with dither noise is displayed in Figures 6.14-6.18. As can be seen, the control system behavior does not deteriorate in the transient phase (as it happens with the certainty equivalent controller) and recover to the optimal performance in the long run. As a nal comment to our robust controller with dither noise, we note that in general in an ideal context even the certainty equivalent controller is able to provide an asymptotically optimal performance (and, in fact, by running in the present example the certainty equivalent controller up to instant t = 500 one can see that it tends to the optimal one). However, this is achieved at the price of exciting the system in the transient phase with rash control action, which, in certain situations, can even destabilize the control system. A much wiser control policy is to robustly secure certain properties (such as stability) of the closed-loop system by applying a cautious controller. Then, excitation is recovered by injecting into the stable loop a suitable dither noise. In this way, excitation is achieved in a controlled way thanks to external signals, rather than from internal signals generated by a self-exciting loop.

6.8. A simulation example

133

Optimal control 2 1.5 1 0.5 0 −0.5 −1 −1.5 −2 −2.5 0

50

100

150

200

250

300

350

400

450

500

400

450

500

Figure 6.1: System input 5 4 3 2 1 0 −1 −2 −3 −4 0

50

100

150

200

250

300

350

Figure 6.2: System output

134

Chapter 6. Robust adaptive LQG control

7

6

5

4

3

2

1

0 0

50

100

150

200

250

300

350

400

450

500

t?1 X Figure 6.3: Sample LQG performance index: 1t [yi2 + u2i ] i=0

6.8. A simulation example

135

Certainty equivalent adaptive control 25

20

15

10

5

0

−5 0

10

20

30

40

50

60

70

80

90

100

80

90

100

Figure 6.4: System input 30

25

20

15

10

5

0

−5

−10 0

10

20

30

40

50

60

70

Figure 6.5: System output

136

Chapter 6. Robust adaptive LQG control 160

140

120

100

80

60

40

20

0 0

10

20

30

40

50

60

70

80

90

100

t?1 X Figure 6.6: Sample LQG performance index: 1t [yi2 + u2i ] i=0 600

500

400

300

200

100

0 0

10

20

30

40

50

60

70

Figure 6.7: J 0 (#t;  (#t))

80

90

100

6.8. A simulation example

137

Robust adaptive control 4

3

2

1

0

−1

−2

−3 0

50

100

150

200

250

300

350

400

450

500

400

450

500

Figure 6.8: System input 8

6

4

2

0

−2

−4

−6 0

50

100

150

200

250

300

350

Figure 6.9: System output

138

Chapter 6. Robust adaptive LQG control 16

14

12

10

8

6

4

2

0 0

50

100

150

200

250

300

350

400

450

500

t?1 X Figure 6.10: Sample LQG performance index: 1t [yi2 + u2i ] i=0 1800 1600 1400 1200 1000 800 600 400 200 0 0

50

100

150

200

250

300

350

Figure 6.11: J 0 (Mt ;  (Mt ))

400

450

500

6.8. A simulation example

139

1.5

1

0.5

0

−0.5

−1

−1.5 0

50

100

150

200

Figure 6.12: Mt = [a1;t b1;t b2;t]T (a1

;t

250

300

350

: solid -, b1;t : dotted

400

450

 and b1

;t

500

: dashed

1

0.5

0

−0.5

−1

−1.5 0

50

100

150

200

250

Figure 6.13: t = [0;t 1;t] (0

;t

300

350

400

450

: solid - and 1;t : dotted

 )

500

??)

140

Chapter 6. Robust adaptive LQG control

Robust adaptive control with dither noise 6

4

2

0

−2

−4

−6 0

50

100

150

200

250

300

350

400

450

500

400

450

500

Figure 6.14: System input 10 8 6 4 2 0 −2 −4 −6 −8 0

50

100

150

200

250

300

350

Figure 6.15: System output

6.8. A simulation example

141

25

20

15

10

5

0 0

50

100

150

200

250

300

350

400

450

500

t?1 X Figure 6.16: Sample LQG performance index: 1t [yi2 + u2i ] i=0

142

Chapter 6. Robust adaptive LQG control 1.5

1

0.5

0

−0.5

−1

−1.5 0

50

100

150

200

Figure 6.17: #t = [a1;t b1;t b2;t]T (a1

;t

250

300

350

: solid -, b1;t : dotted

400

450

 and b2

;t

500

: dashed

1.5

1

0.5

0

−0.5

−1

−1.5

−2 0

50

100

150

200

250

Figure 6.18: t = [0;t 1;t] (0

;t

300

350

400

450

: solid - and 1;t : dotted

)

500

?? )

Chapter 7 Concluding remarks and future research The rst part of the thesis is devoted to the standard certainty equivalence approach to adaptive control. Adaptive LQG control schemes suitable for nonminimum-phase systems are proposed, which ensure stability and optimality of the control system. These results are easily extendible to more general control laws, the only required condition being that the adopted control method is able to stabilize a known controllable system. An additional assumption made throughout this thesis is that the true system is described as an ARX system subject to white noise. This hypothesis is necessary mainly for the applicability of the proposed penalized least squares identi cation methods, whose properties are in fact derived on the basis of the least squares estimate properties. As a consequence of this fact, the extension to the ARMAX system case is not straightforward. On the other hand, inspired by the result obtained for the white noise case, one can conceive to introduce appropriate identi cation algorithms for the colored noise case. In this regard, much work has to be done, but an encouraging starting point is represented by the fact that the extended least squares algorithm satis es closed-loop properties similar to those valid for the least squares algorithm (see e.g. [3]). In the second part of the thesis, we have introduced a new general methodology for adaptive control that combines randomized methods for the minimization of average cost criteria with the updating of the a-posteriori parameter distribution. After introducing the main ideas in a general control set-up, we have mostly concentrated on an ideal setting where the true system is assumed to belong to the model class. Speci c tuning results have been worked out in this context and a simulation example has illustrated the ecacy of the method. We now would like to point out that the proposed approach looks very promising for a more general set-up with respect to the ideal one. In particular, it potentially enables us 143

144

Chapter 7. Concluding remarks and future research

to address all situations in which unmodeled dynamics are allowed. In this connection, the present work should be considered as inspirational and a stimulus for further research. In the case of on-line adaptive control with unmodeled dynamics, one can still use randomized methods for the controller design provided that some extra care is paid in the updating of the P=t distribution. To x ideas, consider a typical situation in real control applications, namely that in which the model class is a set of transfer functions of reduced complexity able to describe at most the low frequency behavior of the true system. In this case, the updating law of the controller coecients should meet the following two di erent objectives: i) incorporating the new information on the system low frequency behavior as it becomes available; and ii) preserving a certain degree of dispersion in those directions describing the high frequency behavior. Many methods can be conceived to obtain these objectives. A simple (and very coarse) one consists in ltering input-output observations with a low-pass lter before using them in the Kalman lter algorithm. The output of the Kalman lter will be a transfer function (GLOW ) describing the low frequency behavior of the system. One can then attach to GLOW an extra transfer function (GHIGH ) whose parameters are not updated so as to describe uncertainty in the high frequency region. The controller selection will nally be based on G = GLOW GHIGH . This line of proceeding is presently under study. Obviously, di erent approaches can be conceived. In any case, it is our personal belief that, in presence of unmodeled dynamics, the proposed approach can gain an advantage over existing adaptive control methods even more signi cantly than in the ideal setting where the simplicity of the control setting makes standard techniques work acceptably well in almost all the cases. An additional potential eld of applicability of randomized methods is the currently emerging area of iterative methods for identi cation and control design, [95], [96], [97], [98], [99]. In these methods, the control performance is progressively improved through a sequence of o -line control-relevant identi cation and controller design stages. Applying randomized methods in this context makes it possible to explicitly take into account model uncertainty in the controller design, whenever uncertainty is described in a probabilistic way, such as in [100]. Moreover, by using randomized methods it is possible to decouple the controller complexity from the model complexity in a very simple way. Thus, the design of controllers of given complexity can be easily accomplished.

Appendix A Technical results Lemma A.1

Consider a sequence of l-dimensional vectors fvtgt1 such that the following assumptions are satis ed: i) fvtgt0 is bounded: kvtk  v, for all t; ii) fvtgt0 is piecewise constant: vt = vti , t 2 [ti; ti+1 ), where fti gi0 is an sequence of increasing intergers such that T := sup(ti+1 ? ti ) < 1. i0

Given a second l-dimensional vector sequence fzt gt1 such that iii)

tX i ?1 t=0

jztT vti jk = o(

it follows that

tX i ?1 t=0

kzt kk ) + O(1); k  2, N X t=0; t62BN

jztT vt jk = o(

N X t=0

kzt kk + N);

where BN is a set of instant points which depends on N, whose cardinality, however, is upper bounded by Tl for any N: jBN j  Tl, 8N. Proof.

Fix a real number  > 0 and a time instant N. Consider the set of instant points in the interval [0; N] where vt changes: t0 ; t1; : : :; ti N , (N ) where i(N) := maxfi : ti  N g. In these instant points we de ne a set of subspaces fSti gii=0 through the following backward recursive procedure: (

145

)

146

Appendix A. Technical results

for i = i(N) + 1, set Si = ; for i = i(N); i(N) ? 1; : : :; 0, set (here and throughout the symbol vt;S stands for the projection of vector vt onto the subspace S) (

Sti =

if kvti;St?i k   Sti ; Sti  spanfvti g; otherwise: +1

+1

(A.1)

+1

For each t 2 [0; N], with the notation i(t) := maxfi : ti  tg , we have k T ? v ? jk + c1 jz T jztT vt jk  c1 jzt;S t;Sti t t;Sti t vt;Sti t j ; t i(t)

( )

( )

( )

(A.2)

where c1 is a suitable constant depending on k. By de nition (A.1), the rst term in the right-hand-side can be upper bounded as follows T ? v ? jk  k kzt kk jzt;S (A.3) t;Sti t t i(t)

( )

To handle the second term, we rst work out a basis in Sti t . For this purpose, consider (St ) (N ) such that subspace S enlarges: S  S , of instant points ftigii=0 the subset fj gjdim ti j ti =1  dim(St ) ti > j . The searched basis is vj j =dim(St )?dim(Sti t )+1 . In view of the boundedness assumption i) and also considering the very de nition of subspaces Sti (equation (A.1)), it is easy to see that vectors fvj g are spread in subspace Sti t in such a way that the angle between each two of them tends to zero only when  ! 0. Consequently, there exists a constant c(), depending on , but independent of N, such that k T term jzt;S ti t vt;Sti t j in the right-hand-side of inequality (A.2) can be bounded as follows ( )

0

0

0

( )

( )

( )

( )

k k k k T jzt;S ti(t) vt;Sti(t) j  v kzt;Sti(t) k  v c()

dimX (St0 ) j =dim(St0 )?dim(Sti(t) )+1

kzt;spanfvj g kk :

(A.4)

By plugging estimates (A.3) and (A.4) in equation (A.2), we obtain

jztT vt jk  c1 k kzt kk + c1 vk c()

dimX (St0 ) j =dim(St0 )?dim(Sti(t) )+1

kzt;spanfvj g kk :

Summing up these relations from time t = 0 to t = N, we nally have dimX (St0 ) N N X X T k 2 k 2 jzt vtj  2 kztk + 2 v c() kzt;spanfvj g kk : t=0 t=0 t=0 j =dim(St0 )?dim(Sti(t) )+1

N X

(A.5)

147 (St ) fj ; j +1; : : :; j +T ? Introduce now the time-varying set of instant points BN := [dim j =1 1g. Since dim(St )  l, we obviously have jBN j  Tl. Then, 0

0

dimX (St0 )

N X

kzt;spanfvj g kk 

t=0; t62BN j =dim(St0 )?dim(Sti(t) )+1 dimX (St0 ) X j ?1

 1k

t=0

j =1

dimX (St0 ) X j ?1 j =1

t=0

kzt;spanfvj g kk

N X (ztT vj )2  lk [ o( kzt kk ) + O(1)];

(A.6)

t=0

where the last inequality is a consequence of hypothesis iii) and the fact that dim(St )  l,

8N.

0

By using inequality (A.5) and inequality (A.6), we obtain: N X t=0; t62BN

jztT vt jk  c1 k

N X t=0

 c1 k O(

kzt kk + c1vk c() lk [ o(

N X t=0

N X t=0

kzt kk ) + O(1)]

kzt kk + N) + c1vk c() lk [o(

N X t=0

kzt kk + N)];

which nally implies that N X

jztT vt jk

lim sup t=0N; t62BN  c 1 k : N !1 X kzt kk + N t=0

Since  is arbitrarily chosen, the thesis follows.

Proposition A.1 ([3]) Let fdtgt0 be a sequence of i.i.d. random variables such that E[dt] = 0; E[d2t ] = 1; jdtj  K; K > 0: The process fvt gt0 given by vt = (t +dt1) ;  2 (0; 41 )

2

148

Appendix A. Technical results

satis es the following condition t?1 X 1 ? 2 v2 = 1: lim t!1 t1?2 i=0 i

Proof.

Observe that 1 X

"

#

jvi2 ? (i+1)1  j2

1 X

"

jvi2 ? (i+1)1  j2 #

E (i + 1)(1?2 )2 =fvj ; j  i ? 1g = E (i + 1)(1?2 )2 < 1: i=0 i=0 Then, by Chow's Theorem (Theorem 2.7, [3]) we nd that 1 vi2 ? 1  X (i+1) 1?2 (i + 1) i=0 2

2

2

1 v2? ( +1) 2

converges to a nite random variable, since f (ii+1)i ?  ; fvj ; j  igg is a martingale difference sequence (recall that vi is independent of fvj ; j  i ? 1g). Applying Kronecker Lemma (Lemma 2.4, [3]) to this last equation, we see that 1 2

t?1 X 1 1 2 tlim !1 t1?2 i=0 (vi ? (i + 1)2 ) = 0:

(A.7)

On the other hand, from the following chain of equalities and inequalities t Z i+1 dx X t?1 t Z i dx X 1 t1?2 ? 1 ; (t + 1)1?2 ? 1 = Z t+1 dx = X   1+ = 1+ 2 1 ? 2 x2 i=1 1 x2 i=0 (i + 1)2 1 ? 2 1 i=2 i?1 x we get t?1 1 ? 2 X 1 = 1: lim 1 ? 2  t!1 t (i + 1)2 i=0

Then the thesis follows from equation (A.7).

2

Theorem A.1 ([3], Theorem 2.8) Let fXt; Ftg be a matrix martingale di erence sequence and fMt ; Ftg an adapted sequence of random matrices kMt k < 1 almost surely, 8t  0. If sup E[kXt+1 k =Ft] < 1 t

149 almost surely for some 2 (0; 2], then as t ! 1 t X i=0



Mi Xi+1 = O st ( ) ln + (s t ( ) + e)

almost surely 8 > 0, where st ( ) =

1

t X i=0

kMi k

! 1



:

2

150

Appendix A. Technical results

Appendix B Uniform convergence of empirical means and the Pollard-dimension In this appendix, we give a formal presentation of the notion of uniform convergence of empirical means and discuss a condition of general applicability for this property to hold. All the presented material is by now classical in the eld of statistical learning theory, [101], except that Theorem B.5 which has been recently proven in [87] and [88]. Moreover, an account of all these results can also be found in [91]. Consider a set X with a -algebra X and a probability P on (X; X ). Moreover, let f be a measurable function from X to [0; 1]. Given a sample x := (x1 ; x2; : : :; P xM ) of points ^ independently extracted from X in accordance with P , let EP;M [f] := 1=M M i=1 f(xi ) be the corresponding sampling estimate of EP [f] based on the multisample x. The application of the law of large numbers permits one to conclude that

P M fx : jE^P;M [f] ? EP [f]j > g ! 0; 8 > 0; as M ! 1 (convergence of the empirical mean to its true value). More importantly for our purposes, it is possible to determine a value of the sample size M such that the estimate E^P;M [f] is within an error of  from the true value EP [f] with a probability 1 ? . Set q(M; ) := P M fx : jE^P;M [f] ? EP [f]j > g: Then we have the following result: Hoe ding's inequality q(M; )  2e?2M

(B.1)

2

2

151

152

Appendix B. Uniform convergence of empirical means and the Pollard-dimension

By making this inequality explicit with respect to M we immediately obtain the following estimate for M(; ) := minimum M such that P M fx : jE^P;M [f] ? EP [f]j > g  :   2 1 M(; )  22 ln  ; where the symbol dz e denotes the smallest integer greater than or equal to z. Consider now a family F of functions from X to [0; 1]. In such a case, we introduce the following fundamental notion of uniform convergence of the empirical means to their true values.

De nition B.1 F has the property of uniform convergence of empirical means (UCEM) to their true values if P M fx : sup jE^P;M [f] ? EP [f]j > g ! 0; 8 > 0; f 2F as M ! 1. 2 In the case when F has nite cardinality, the UCEM property immediately follows from Hoe ding's inequality. In fact: P M fx : sup jE^P;M [f] ? EP [f]j > g  jFjP M fx : jE^P;M [f] ? EP [f]j > g  jFj2e?2M ; f 2F (B.2) and the last expression tends to 0; as M ! 1. Moreover, using (B.2), one readily obtains the sample estimate given in the following theorem. 2

Theorem B.1

Given a family F of functions from X to [0; 1] of nite cardinality, we have   M(; )  212 ln 2jFj  ; where M(; ) := minimum M such that P M fx : supf 2F jE^P;M [f] ? EP [f]j > g  . 2 In the case in which jFj = 1, this last theorem is useless and in fact the UCEM property may or may not be satis ed. Moreover, verifying whether or not it holds is, in general, a non trivial task. One sucient (and rather mild) condition for the UCEM property to hold is that family F has a nite Pollard-dimension. This is brie y explained below (the interested reader is referred to [91] for a comprehensive presentation of the subject). Let  if z  0 H(z) := 1; 0; if z < 0 be the Heaviside function.

153

De nition B.2 1. A set S = fx1; x2; : : :; xN g  X is P-shattered by F if there exists a real vector c = [c1 ; c2; : : :; cN ]T 2 [0; 1]N such that, for every binary vector e = [e1 ; e2; : : :; eN ]T 2 f0; 1gN there exists a function fe 2 F such that H(fe (xi ) ? ci) = ei ; i = 1; 2; : : :; N: 2. The P-dimension of F is the largest N such that there exists a set of cardinality N that is P-shattered by F . If for any integer N there exists a set of cardinality N that is P-shattered by F , the P-dimension of F is set to in nity. 2 Similarly to (B.1), let

q(M; ) := P M fx : sup jE^P;M [f] ? EP [f]j > g: f 2F

Theorem B.2 If F has nite P-dimension, say d, then it has the UCEM property. Moreover, for every real number  < e=(2 log2 e) and integer M, the estimate

  16e d e?M =32 ln q(M; )  8 16e  

(B.3)

2

2 Interestingly enough, the de nition of the P-dimension is purely geometrical and does not depend on the probability measure P . As a consequence, estimate (B.3) holds uniformly over the set of all probability measures on (X; X ). By making expression (B.3) explicit with respect to M, we have the following result. holds true.

Theorem B.3 If F has nite P-dimension, say d, we have 



8 16e 16e M(; )  32 2 [ln  + d ln  + d lnln  ] where M(; ) := minimum M such that P M fx : supf 2F jE^P;M [f] ? EP [f]j > g  .

2 For the applicability of Theorem B.3, a basic problem arises in connection with the evaluation of the P-dimension d of family F . As a matter of fact, a direct computation of the Pdimension through De nition B.2 is far from being easy even in simple cases. Fortunately,

154

Appendix B. Uniform convergence of empirical means and the Pollard-dimension

some recent results of general applicability ([87], [88]) helps in this task. These results are described next. Consider a family G of indicator functions (this is just a special case of the situation as far considered of a family F of functions from X to [0; 1]). In such a case, the P-dimension of F de ned in De nition B.2 is also called Vapnik Chervonenkis (VC)-dimension of G . We state now two theorems in cascade. The rst one relates the P-dimension of a family F of functions from X to [0; 1] to the VC-dimension of a certain family G of indicator functions that can be determined from F . The second one provides a way to compute the VC-dimension of family G . The joint use of the two theorems permits one to compute the P-dimension of family F . Given a family F of functions from X to [0; 1], let G := fg(x; c) = H(f(x) ? c); x 2 X; c 2 [0; 1]; f 2 Fg be an associated set of indicator functions from X  [0; 1] to f0; 1g. The following simple result is proven in [91], Lemma 10.1.

Theorem B.4 P-dimension(F )=VC-dimension(G ).

2 Some premises and additional assumptions are needed before we state the second theorem. Consider a subset W of