
Linear Quadratic Regulator Problem with Positive Controls

W.P.M.H. Heemels, S.J.L. van Eijndhoven and A.A. Stoorvogel

Eindhoven University of Technology, Department of Electrical Engineering, Measurement and Control Systems, Section: Measurement and Control, P.O. Box 513, 5600 MB Eindhoven, The Netherlands, tel: +31 40 2473587, email: [email protected], fax: +31 40 2434582
Eindhoven University of Technology, Department of Mathematics and Computing Science, P.O. Box 513, 5600 MB Eindhoven, tel: +31 40 2472808, email: [email protected], fax: +31 40 2465995
Eindhoven University of Technology, Department of Mathematics and Computing Science, P.O. Box 513, 5600 MB Eindhoven, tel: +31 40 2472378, email: [email protected], fax: +31 40 2465995

February 14, 1998

Abstract

In this paper, the Linear Quadratic Regulator Problem with a positivity constraint on the admissible control set is addressed. Necessary and sufficient conditions for optimality are presented in terms of inner products, projections on closed convex sets, Pontryagin's maximum principle and dynamic programming. The main results are concerned with smoothness of the optimal control and the value function. The maximum principle will be extended to the infinite horizon case. Based on these analytical methods, we propose a numerical algorithm for the computation of the optimal controls for the finite and infinite horizon problem. The numerical methods will be justified by convergence properties between the finite and infinite horizon case on one side and discretised optimal controls and the true optimal controls on the other.

1 Introduction

In the literature, the Linear Quadratic Regulator Problem has been solved by the use of Riccati equations [1]. In this paper the same problem will be treated with the additional constraint that the control function can only take values in a closed convex polyhedral cone, like the nonnegative orthant in a Euclidean space. More precisely, we will consider the regular case, where the weight on the control in the integrand of the cost function is positive definite. In many real-life problems, the influence we have on the system can be used only in one direction. One could for instance think of regulating the temperature of a room: the heating element can only put energy into the room, but it cannot extract energy. In process industry or mechanical systems, the flow of a certain fluid or gas is regulated by one-way valves. Other examples arise in the control of electrical networks with diode elements and in economic systems, where quantities like investments and taxes are always nonnegative.

A basic issue to be solved before computing or analysing optimal controls is related to existence and uniqueness of optimal controls. The existence part leads to a verification of whether the set of admissible controls has been chosen properly. In section 2 we shortly discuss this problem. Existence of solutions to the optimal control problem is studied in [16] based on "regular synthesis," a method based on strengthening Pontryagin's necessary conditions for optimality into sufficient conditions. The form of the conditions in [16] is only explicit in the case where the integrand in the cost functional is independent of the trajectory $x$ (i.e. $Q = 0$ in terms of [16] and $C^\top D = 0$, $C^\top C = 0$ in terms of section 2 below). This special case is hardly studied in the literature and also in our paper these results do not apply. After a suitable transformation of our optimal control problem we can rely on a result from functional analysis (cf. Theorem 5 below) to prove existence and uniqueness of optimal controls. Moreover, a characterisation of optimal controls in terms of inner products is obtained as a side product.


For the infinite horizon case the existence of an admissible control is not even clear. By an admissible control we mean a nonnegative control function that keeps the cost functional finite and the state square integrable. This problem is treated in [8], where necessary and sufficient conditions are stated that are easily verified. Under the assumption of positive stabilizability of the system, existence and uniqueness of the optimal controls for the infinite horizon case are established. In addition to papers like [16] where only existence of optimal controls is studied, we characterise the optimal controls in various equivalent forms. Moreover, smoothness properties of the optimal control and value function and convergence results between finite and infinite horizon are stated. In [16] no attention is paid to the infinite horizon case or to the problem of how to compute or approximate the optimal control in both the finite and infinite horizon problem. All these issues will be considered in the current paper.

Mainly, there are two classical approaches in optimal control theory. The first approach is the maximum principle, initiated by Pontryagin et al. [17]. The original maximum principle has been used and extended by many others [13, 4, 15, 11]. To give a complete treatise of dealing with constrained finite horizon optimal control problems, the application of the maximum principle is incorporated. However, for the infinite horizon case a similar result is nontrivial. The convergence results between the finite and infinite horizon problem that will be established can be exploited to derive, under rather mild conditions, a maximum principle on infinite horizon for the "positive Linear Quadratic Regulator problem."

The second approach, dynamic programming, was originally conceived by Bellman [3] as a fruitful numerical method to compute optimal controls in discrete time processes. Later, people realized that the same ideas can be used for optimal control problems in continuous time. For continuous time problems, dynamic programming leads to a partial differential equation, the so-called Hamilton-Jacobi-Bellman equation, which has the value function among its solutions [18]. Traditionally, one needed assumptions on the smoothness of the value function to apply this theory. However, in many problems the value function does not behave smoothly, which causes both analytical and numerical problems. Recently, the notion of viscosity solutions has been introduced, which generalizes the concept of a solution of the HJB equation [5]. In this paper, it will be shown that the value function in our problem is continuously differentiable and satisfies the HJB equation in the classical sense. This simplifies dynamic programming in both analytical and numerical aspects and allows us to obtain a direct relationship with the results obtained via the maximum principle. It will become clear that both dynamic programming and the maximum principle state necessary and sufficient conditions for optimality.

The maximum principle results in a two-point boundary value problem. When this problem is solved, the optimal open-loop control for one specific initial state has been found. By implementing such an open-loop control, we only use a priori knowledge of the initial condition and no adaptation is possible for disturbances acting on the system, like measurement noise, unmodelled dynamics of the plant, etc.
In contrast to the maximum principle, the solution of the HJB equation gives the optimal state-feedback controller for all states. If disturbances are active, deviations from the optimal path occur. The feedback controller uses the current state to determine its best current control value and hence can adapt to such deviations. This is the main advantage of feedback control over open-loop control and explains why dynamic programming is used more often for control than the maximum principle. However, the computation of the optimal controls with the techniques mentioned above is not explicit in the sense that it does not lead to a form suitable for simple numerical implementation. Hence, the need arises for an easily implementable approximation method. Our aim is not to give a complete treatise of solving the finite horizon problem numerically, but to briefly state a possible approximation method based on discrete dynamic programming as initiated by Bellman. However, the analysis of convergence of the approximations to the exact optimal control is crucial and justifies the proposed algorithm. For more details on implementation aspects, we refer to more specialised books like [12]. We discretise our optimization problem in both time and state. Similar techniques can be found, for instance, in [11, 12] with some illustrative examples. In later sections, we will see how this method can be used to approximate also the infinite horizon optimal

feedback. This method is justified by the convergence results between the finite and infinite horizon problems.

The organization of the paper is as follows. In section 2 the problem is formulated for the finite horizon case. Section 3 contains a motivating example of a pendulum to show that there is no obvious connection between the unconstrained and constrained Linear Quadratic Regulator problem. In section 4, we recall the concept of projections on closed convex sets in Hilbert spaces. The next section, section 5, contains, under a regularity condition, the existence and uniqueness of optimal controls with various characterizations of the optimal control. In section 6, a recursive scheme to approximate the optimal control will be described with convergence results of the approximations to the optimal control. The infinite horizon case and its connection with the finite horizon case will be the object of study in section 7. At the end of the paper, we will conclude.

2 Problem Formulation

We consider a linear system with input or control $u : [t,T] \to \mathbb{R}^m$, state $x : [t,T] \to \mathbb{R}^n$ and output $z : [t,T] \to \mathbb{R}^p$, given by

\[ \dot{x}(s) = A x(s) + B u(s), \qquad (1) \]
\[ z(s) = C x(s) + D u(s), \qquad (2) \]
where $s$ denotes the time, $A, B, C$ and $D$ are matrices of appropriate dimensions and $T$ is a fixed

end time. We consider inputs in the Lebesgue space of square integrable, measurable functions on $[t,T]$ taking values in $\mathbb{R}^m$, denoted by $L_2[t,T]^m$. For every input function $u \in L_2[t,T]^m$ and initial condition $(t,x_0)$, i.e. $x(t) = x_0$, the solution of (1) is an absolutely continuous state trajectory, denoted by $x_{t,x_0,u}$. The corresponding output can be written as

\[ z_{t,x_0,u}(s) = \underbrace{C e^{A(s-t)} x_0}_{(M_{t,T} x_0)(s)} + \underbrace{\int_t^s C e^{A(s-\tau)} B u(\tau)\, d\tau + D u(s)}_{(L_{t,T} u)(s)} \qquad (3) \]

for $s \in [t,T]$. $M_{t,T}$ is a bounded linear operator from $\mathbb{R}^n$ to $L_2[t,T]^p$ and $L_{t,T}$ is a bounded linear operator from $L_2[t,T]^m$ to $L_2[t,T]^p$. The closed convex cone of positive functions in $L_2[t,T]^m$ is defined by
\[ P[t,T] := \{ u \in L_2[t,T]^m \mid u(s) \in \Omega \text{ almost everywhere on } [t,T] \}, \]
where the control restraint set equals $\Omega := \mathbb{R}^m_+ = \{ \omega \in \mathbb{R}^m \mid \omega_i \geq 0 \}$, the nonnegative orthant in $\mathbb{R}^m$.¹ First, we consider the finite horizon case ($T < \infty$). The infinite horizon version ($T = \infty$) will be postponed to section 7.

Problem 1 The objective is to determine for every initial condition $(t,x_0) \in [0,T] \times \mathbb{R}^n$ a control input $u \in P[t,T]$, an optimal control, such that
\[ J(t,x_0,u) := \| z_{t,x_0,u} \|_2^2 = \| M_{t,T} x_0 + L_{t,T} u \|_2^2 \]

\[
\begin{aligned}
J(t,x_0,u) &= \int_t^T z_{t,x_0,u}^\top(s)\, z_{t,x_0,u}(s)\, ds \\
&= \int_t^T \big[ x_{t,x_0,u}^\top(s) C^\top C\, x_{t,x_0,u}(s) + 2\, x_{t,x_0,u}^\top(s) C^\top D\, u(s) + u^\top(s) D^\top D\, u(s) \big]\, ds \qquad (4)
\end{aligned}
\]
is minimal.

¹ In fact, we can take the control restraint set $\Omega$ to be any arbitrary convex polyhedral cone $\tilde{\Omega} \subseteq \mathbb{R}^{\tilde{m}}$ containing 0. Parametrising $\tilde{\Omega}$ as $\tilde{\Omega} = F\Omega$, where $F$ is an $\tilde{m} \times m$ matrix, and taking $(\tilde{A}, \tilde{B}, \tilde{C}, \tilde{D}) = (A, BF, C, DF)$ translates the problem into an optimization problem with positive controls (as in [16]).

Along with this problem formulation, there are several questions to be answered. Does there exist an optimal control? And moreover, if it exists, is it unique and how is it characterised? And last but not least: can the optimal control be explicitly computed or is there a numerical method that approximates the exact optimal control? In the latter case, the method is acceptable if there are corresponding convergence results that justify the use of the method. Answering these questions is the main goal of this paper. Note that the only question considered in [16] is the first one. Of course, this question is treated there for a more general class of optimal control problems. In fact, this is a minimum-norm problem over a closed convex cone. The control functions minimizing the criterion in (4) for given initial condition and horizon are called optimal controls. The optimal value of $J$ for all considered initial conditions is described by the value function.

Definition 2 (Value function) The value function $V$ is a function from $[0,T] \times \mathbb{R}^n$ to $\mathbb{R}$ and is defined for every $(t,x_0) \in [0,T] \times \mathbb{R}^n$ by
\[ V(t,x_0) := \inf_{u \in P[t,T]} J(t,x_0,u). \qquad (5) \]

As announced in the introduction, we will restrict ourselves to the regular problem. By this we mean that the matrix D is injective, i.e. D has full column rank. To motivate the regularity assumption, we consider a couple of alternatives. In the singular case, the optimal controls may not exist in an L2 -setting. Look at the following simple example.

Example 3 Consider the system $\dot{x}(s) = u(s)$ with criterion $\int_0^T x^2(s)\, ds$ and initial condition $x(0) = -1$. The optimal control would be the Dirac pulse $\delta(t)$, which results in an initial jump to 0 at time instant 0. The optimal costs are 0 in this case. However, the Dirac pulse is not an $L_2$-function. In case we put a weight on the control, the Dirac pulse leads to an infinite criterion.

To circumvent the above problem without imposing weights on the control in the cost function, traditionally one restricted the controls to take values in compact sets. The resulting optimal controllers are nonsmooth "bang-bang" controllers attaining only the saturation borders of the restraint set.

Example 4 Looking at the optimal control problem above with restraint set $\Omega := [0,1]$, it is obvious that $u(s) = 1$ for $s \in [0,1)$ and $u(s) = 0$ for $s \geq 1$ is the optimal control. Note that the optimal control is discontinuous.

We use the weighting of the control to guarantee boundedness and, moreover, to obtain smoothness: the optimal control and the value function behave rather smoothly, as we will see. Physical implementation of bounded smooth controllers is preferred in most cases. Saturation characteristics and rate limiters often obscure the physical implementation of impulsive, bang-bang or nonsmooth (discontinuous) controls.

3 Motivating Example

A first thought may be that there exists a simple connection between the constrained problem and the optimal static feedback corresponding to the unconstrained optimal control problem ($\Omega = \mathbb{R}^m$). It is well-known that the optimal feedback in the unconstrained case arises from the Algebraic Riccati Equation [1]. To show that such a simple connection is not easily established, we consider the pendulum as in figure 1. On the pendulum the gravitation $f$ and the control force $u$ act as a vertical force and a horizontal force, respectively. The angle the pendulum makes with the vertical dashed line is called $\theta$. Since it is only allowed to push against the pendulum from one side, the control is restricted to be nonnegative.

Figure 1: Pendulum.

After proper scaling of the time and linearising around the equilibrium $\theta = \dot{\theta} = u = 0$, we obtain the equations
\[ \dot{x}_1 = x_2, \qquad \dot{x}_2 = -x_1 - u, \]
where $x_1$ is the first order approximation of $\theta$ and $x_2$ the first order approximation of $\dot{\theta}$. Our objective is to minimize

\[ \int_0^\infty \{ x_1^2(s) + x_2^2(s) + u^2(s) \}\, ds \qquad (6) \]

with initial state $(1,0)^\top$ at time instant $t = 0$. Without the positivity constraint the optimal solution would be given by a static state-feedback $u(x) = Kx$. In the case of this single input system, a good guess of the optimal positive control could be $u_g(x) = g \max(0, Kx)$ for some suitably chosen gain $g \geq 0$. In particular, $u_1$ seems a good candidate. For different values of $g$, we computed the expression (6) by simulating the system with the corresponding feedback $u_g$ for initial state $(1,0)^\top$. The results are plotted in figure 2. Indeed, we observe that $u_1$ is not optimal, because $u_{1.13}$ performs better. In section 7 an approximation of the optimal state feedback is computed. The cost function for this controller equals 5.108, the level indicated by the dashed line in the picture. Our observation is that this controller is even better and indicates that for this example no trivial connection exists between the constrained and unconstrained problem.
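The gain sweep described above is easy to reproduce numerically. The following sketch (not part of the original paper) simulates the clipped feedback $u_g(x) = g\max(0,Kx)$ for the linearised pendulum and integrates the cost (6) over a truncated horizon; the horizon length, solver tolerances and the use of scipy are assumptions made here, so the numbers may differ slightly from those in figure 2.

import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.integrate import solve_ivp

# Linearised pendulum from Section 3: x1' = x2, x2' = -x1 - u,
# running cost x1^2 + x2^2 + u^2 as in (6).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0], [-1.0]])
Q, R = np.eye(2), np.array([[1.0]])

# Unconstrained LQR feedback u(x) = K x with K = -R^{-1} B^T P.
P = solve_continuous_are(A, B, Q, R)
K = -np.linalg.solve(R, B.T @ P)

def cost_for_gain(g, x0=(1.0, 0.0), horizon=40.0):
    """Simulate the clipped feedback u_g(x) = g*max(0, K x) and integrate the cost (6)."""
    def rhs(t, y):
        x = y[:2]
        u = g * max(0.0, float(K @ x))
        dx = A @ x + (B * u).ravel()
        return np.hstack([dx, x @ x + u * u])
    sol = solve_ivp(rhs, (0.0, horizon), np.hstack([x0, 0.0]), rtol=1e-8, atol=1e-10)
    return sol.y[2, -1]          # accumulated cost

for g in (1.0, 1.13, 1.25):
    print(f"g = {g:.2f}: cost ~ {cost_for_gain(g):.4f}")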

4 Projections on Closed Convex Sets

As mentioned before, the problem we consider is a minimum norm problem over a closed convex set. This motivates the introduction of a generalization of orthogonal projections on closed subspaces in Hilbert spaces. Consider the following fundamental theorem in Hilbert space theory concerning the minimum distance to a closed convex set. For a proof, see chapter 3 in [14].

Theorem 5 (Minimum Distance to a Convex Set) Let $x$ be a vector in a Hilbert space $H$ with inner product $(\cdot \mid \cdot)$ and let $K$ be a closed convex subset of $H$. Then there is exactly one $k_0 \in K$ such that $\|x - k_0\| \leq \|x - k\|$ for all $k \in K$. Furthermore, a necessary and sufficient condition that $k_0$ is the unique minimizing vector is that $(x - k_0 \mid k - k_0) \leq 0$ for all $k \in K$.

Figure 2: Cost function for various values of the gain $g$.

Definition 6 Let $K$ be a closed convex set of the Hilbert space $H$. We introduce the projection $P_K$ onto $K$ as the operator that assigns to each vector $x$ in $H$ the vector contained in $K$ that is closest to $x$ in the norm induced by the inner product. Formally, for $x \in H$,
\[ P_K x = k_0 \in K \iff \|x - k_0\| \leq \|x - k\| \quad \forall k \in K. \qquad (7) \]

This concept can be found for instance in [9]. Theorem 5 justifies this definition and gives an equivalent characterisation of $P_K$:

\[ P_K x = k_0 \in K \iff (x - k_0 \mid k - k_0) \leq 0 \quad \forall k \in K. \qquad (8) \]
A set $K$ in a linear vector space is a cone if $x \in K$ implies $\alpha x \in K$ for all $\alpha \geq 0$. In case $K$ is such a closed convex cone, we can take $k = \tfrac{1}{2} k_0$ and $k = 2 k_0$ in the above equation to observe that $(x - k_0 \mid k_0) = 0$ and hence
\[ (x - k_0 \mid k) \leq 0 \quad \forall k \in K. \qquad (9) \]
To make the paper self-contained, the following two lemmas, describing further properties that can also be found in [9], are included.

Lemma 7 (Continuity of $P_K$) $P_K$ is globally Lipschitz continuous. In particular, for all $x, y \in H$ it holds that
\[ \| P_K x - P_K y \| \leq \| x - y \|. \qquad (10) \]
Proof. Using the characterisation (8) for $x$ with $k = P_K y$ we arrive at $(x - P_K x \mid P_K y - P_K x) \leq 0$. Changing the roles of $x$ and $y$ we get $(y - P_K y \mid P_K x - P_K y) \leq 0$. Adding both inequalities gives $(x - y + P_K y - P_K x \mid P_K y - P_K x) \leq 0$, which leads to
\[ \| P_K x - P_K y \|^2 \leq (x - y \mid P_K x - P_K y). \]
Applying the Cauchy-Schwarz inequality to the right side and dividing by $\| P_K x - P_K y \|$ completes the proof. □

Lemma 8 (Positive-homogeneity of $P_K$) Besides $K$ being closed and convex, assume it is a cone. Then $P_K(\alpha x) = \alpha P_K x$ for all $\alpha \geq 0$ and $x \in H$.
Proof. This follows immediately from (8). □

5 Optimal Controls

In this section, we answer the first two questions raised in Section 2, just after the problem formulation.

5.1 Existence and Uniqueness

As mentioned before, to establish existence of optimal controls the results from [16] cannot be used, because the conditions formulated there are only explicit in the case $C^\top C = 0$, $C^\top D = 0$. As discussed in Section 2, a standing assumption in the remainder of the paper will be the full column rank (or injectivity) of $D$. In this regular case, $L_{t,T}$, as defined in section 2, has a bounded left inverse from $L_2[t,T]^p$ to $L_2[t,T]^m$. An operator $\tilde{L}_{t,T}$ is a bounded left inverse of $L_{t,T}$ if $L_{t,T} u = z$ implies $\tilde{L}_{t,T} z = u$. By manipulating (1)-(2) it follows that one of the bounded left inverses of $L_{t,T}$ is described by the following state-space representation.

\[
\begin{aligned}
\dot{x}(s) &= \big( A - B (D^\top D)^{-1} D^\top C \big) x(s) + B (D^\top D)^{-1} D^\top z(s), \qquad x(t) = 0, \\
u(s) &= (D^\top D)^{-1} D^\top \big( z(s) - C x(s) \big).
\end{aligned}
\]

We denote this particular left inverse by $\tilde{L}_{t,T}$. Using this bounded left inverse, we can reformulate our problem as the minimization of $\| M_{t,T} x_0 + v \|$ over $v \in L_{t,T}(P[t,T])$. However, this is exactly the setting of Theorem 5 with $x = -M_{t,T} x_0$, $K = L_{t,T}(P[t,T])$ and $k_0 = L_{t,T}(u^\star)$. The linearity of $L_{t,T}$ shows that $L_{t,T}(P[t,T])$ is a convex cone. Moreover, since $K$ is the inverse image of a closed set under $\tilde{L}_{t,T}$, it is closed as well. Hence, we proved

Theorem 9 (Existence and Uniqueness of the Optimal Control) Let $t, T \in \mathbb{R}$ with $t < T$ and $x_0 \in \mathbb{R}^n$. There is a unique control $u_{t,T,x_0} \in P[t,T]$ such that $\| M_{t,T} x_0 + L_{t,T} u_{t,T,x_0} \|_2 \leq \| M_{t,T} x_0 + L_{t,T} u \|_2$ for all $u \in P[t,T]$. A necessary and sufficient condition for $u^\star \in P[t,T]$ to be the unique minimizing control is that
\[ ( M_{t,T} x_0 + L_{t,T} u^\star \mid L_{t,T} u - L_{t,T} u^\star ) \geq 0 \qquad (11) \]
for all $u \in P[t,T]$.

The optimal control, the optimal trajectory and the optimal output with initial conditions $(t,x_0)$ and final time $T$ will be denoted by $u_{t,T,x_0}$, $x_{t,T,x_0}$ and $z_{t,T,x_0}$, respectively. Considering the proof above, it is not hard to see that in terms of projections we can write
\[ u_{t,T,x_0} = \tilde{L}_{t,T}\, P(-M_{t,T} x_0), \qquad (12) \]
where $P$ is the projection on the closed convex cone $L_{t,T}(P[t,T])$ in the Hilbert space $L_2[t,T]^p$. From (12) and the positive-homogeneity of $P$, it is clear that for $\alpha \geq 0$
\[ u_{t,T,\alpha x_0} = \alpha\, u_{t,T,x_0} \qquad (13) \]
and
\[ V(t, \alpha x_0) = \alpha^2\, V(t, x_0). \qquad (14) \]
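As an illustration (not from the paper), the state-space matrices of the particular left inverse $\tilde{L}_{t,T}$ above can be formed directly from $(A,B,C,D)$; the quick numerical check below verifies the left-inverse property on the transfer function level at one test frequency. The random test data and the test point $s = 1.7j$ are assumptions of this sketch.

import numpy as np

def left_inverse_realization(A, B, C, D):
    """State-space matrices of the left inverse given above, valid when D has full column rank."""
    Dli = np.linalg.solve(D.T @ D, D.T)        # (D^T D)^{-1} D^T
    return A - B @ Dli @ C, B @ Dli, -Dli @ C, Dli

# quick sanity check on random data: H_inv(s) H(s) = I away from the poles
rng = np.random.default_rng(0)
n, m, p = 3, 2, 4
A = rng.standard_normal((n, n)); B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n)); D = rng.standard_normal((p, m))   # full column rank a.s.
Ai, Bi, Ci, Di = left_inverse_realization(A, B, C, D)

s = 1.7j
H  = C  @ np.linalg.solve(s * np.eye(n) - A,  B)  + D
Hi = Ci @ np.linalg.solve(s * np.eye(n) - Ai, Bi) + Di
print(np.allclose(Hi @ H, np.eye(m)))          # expect True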

5.2 Maximum Principle

In spite of the fact that the results in this subsection are classical (see e.g. [17]), they are stated here because in section 7 this theory is exploited to derive a maximum principle for the infinite horizon case and convergence results between finite and infinite horizon optimal controls. Moreover, in a complete overview of available techniques to tackle constrained optimal control problems, the maximum principle cannot be absent. Consider the Hamiltonian
\[ H(x, \omega, \eta) = x^\top A^\top \eta + \omega^\top B^\top \eta + x^\top C^\top C x + 2\, x^\top C^\top D \omega + \omega^\top D^\top D \omega \]
for $(x, \omega, \eta) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^n$. The adjoint or costate equation and the corresponding terminal condition read
\[ \dot{\eta} = -\frac{\partial H}{\partial x}(x, u, \eta) = -A^\top \eta - 2 C^\top C x - 2 C^\top D u = -A^\top \eta - 2 C^\top z, \qquad \eta(T) = 0, \qquad (15) \]

where $z = Cx + Du$ and the column vector valued function $\frac{\partial H}{\partial x}$ denotes the partial derivative of $H$ with respect to $x$. Pontryagin's maximum principle states that the optimal control $u_{t,T,x_0}$ satisfies for all $s \in [t,T]$
\[ u_{t,T,x_0}(s) \in \arg\min_{\omega \in \mathbb{R}^m_+} H\big( x_{t,T,x_0}(s), \omega, \eta_{t,T,x_0}(s) \big), \qquad (16) \]

where $\eta_{t,T,x_0}$ is the solution to the adjoint equation (15) with $z$ equal to the optimal output $z_{t,T,x_0}$. We introduce $\|x\|_{D^\top D} := \sqrt{x^\top D^\top D x}$ as the norm induced by the inner product $(x \mid y)_{D^\top D} = x^\top D^\top D y$ for $x, y \in \mathbb{R}^m$ in the Hilbert space $\mathbb{R}^m$. Furthermore, in this Hilbert space $P_\Omega$ denotes the projection on $\Omega := \mathbb{R}^m_+$.

Lemma 10 Let the initial time $t$, the final time $T > 0$ and the initial state $x_0 \in \mathbb{R}^n$ be fixed. The optimal control $u_{t,T,x_0}$ satisfies
\[ u_{t,T,x_0}(s) = P_\Omega\Big( -\tfrac{1}{2} (D^\top D)^{-1} \big\{ B^\top \eta_{t,T,x_0}(s) + 2 D^\top C x_{t,T,x_0}(s) \big\} \Big) \qquad (17) \]
for all $s \in [t,T]$, where the functions $x_{t,T,x_0}$ and $\eta_{t,T,x_0}$ are given by

\[
\begin{aligned}
\dot{x}_{t,T,x_0} &= A x_{t,T,x_0} + B u_{t,T,x_0}, & x_{t,T,x_0}(t) &= x_0, \\
\dot{\eta}_{t,T,x_0} &= -A^\top \eta_{t,T,x_0} - 2 C^\top C x_{t,T,x_0} - 2 C^\top D u_{t,T,x_0}, & \eta_{t,T,x_0}(T) &= 0. \qquad (18)
\end{aligned}
\]

Moreover, the optimal control is continuous in time.

Proof. This lemma is a straightforward application of the maximum principle. By (16), $u_{t,T,x_0}(s)$ is the unique pointwise minimizer of the cost
\[ \omega^\top D^\top D \omega + \omega^\top g(s) = \big\| \omega + \tfrac{1}{2} (D^\top D)^{-1} g(s) \big\|_{D^\top D}^2 - \tfrac{1}{4} g^\top(s) (D^\top D)^{-1} g(s), \]
taken over all $\omega \in \Omega$, where $g(s) := B^\top \eta_{t,T,x_0}(s) + 2 D^\top C x_{t,T,x_0}(s)$. According to Lemma 7 the projection $P_\Omega$ is continuous. Since $x_{t,T,x_0}$ and $\eta_{t,T,x_0}$ are (absolutely) continuous in time, the continuity of $u_{t,T,x_0}$ in time follows.

□

Substituting (17) in (18) gives a two-point boundary value problem. Its solutions provide us with a set of candidates containing the optimal control, because the maximum principle is a necessary condition for optimality. In case $D^\top D = \tfrac{1}{2} I$ with $I$ the identity matrix, the expression $P_\Omega(-\tfrac{1}{2}(D^\top D)^{-1}\{ B^\top \eta(s) + 2 D^\top C x(s) \})$ simplifies to
\[ \max\{ 0,\; -B^\top \eta(s) - 2 D^\top C x(s) \}, \]
where "max" for vectors means taking the maximum componentwise.
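For the pendulum example of Section 3, the two-point boundary value problem (17)-(18) can be handed to a generic BVP solver. In that example $D^\top D = 1$ and $D^\top C = 0$, so (17) reduces to $u(s) = \max(0, \tfrac{1}{2}\eta_2(s))$. The sketch below (not from the paper) uses scipy's collocation solver; the horizon $T = 15$, the mesh and the zero initial guess are assumptions, and whether the solver converges may depend on them.

import numpy as np
from scipy.integrate import solve_bvp

# Pendulum data: x1' = x2, x2' = -x1 - u; costate eq. (15): eta' = -A^T eta - 2 C^T z.
# Here C^T C = I (2x2), C^T D = 0 and D^T D = 1, so (17) gives u = max(0, eta2 / 2).
T, x0 = 15.0, np.array([1.0, 0.0])

def rhs(t, y):
    x1, x2, e1, e2 = y
    u = np.maximum(0.0, 0.5 * e2)
    return np.vstack([x2, -x1 - u, e2 - 2.0 * x1, -e1 - 2.0 * x2])

def bc(ya, yb):
    # x(0) = x0 and eta(T) = 0, cf. (18)
    return np.hstack([ya[0] - x0[0], ya[1] - x0[1], yb[2], yb[3]])

t = np.linspace(0.0, T, 200)
sol = solve_bvp(rhs, bc, t, np.zeros((4, t.size)), max_nodes=20000)
u_opt = np.maximum(0.0, 0.5 * sol.sol(t)[3])   # candidate open-loop optimal control
print(sol.status)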

5.3 Dynamic Programming

In dynamic programming the value function satisfies the Hamilton-Jacobi-Bellman equation. The difficulty in using this partial differential equation is often that the value function is not smooth and classical results do not apply. Recently, an extended solution concept has been used, the "viscosity solution" of the HJB equation [5]. We show in this subsection that this complicated solution concept is not required for the LQ-problem with positive controls: the value function is continuously differentiable and a classical solution to the HJB equation. This simplifies both analytical and numerical approaches of using dynamic programming for solving the problem at hand. We start with a short overview of the technique of dynamic programming. To do so, we introduce the function $L$ (the "Lagrangian") for $(x, \omega) \in \mathbb{R}^n \times \Omega$ by
\[ L(x, \omega) = z^\top z = x^\top C^\top C x + 2\, x^\top C^\top D \omega + \omega^\top D^\top D \omega. \]
The HJB equation is given for $x \in \mathbb{R}^n$ by
\[ W_t(t,x) + \tilde{H}(x, W_x(t,x)) = 0, \qquad (19) \]
where $\tilde{H}$ is given by
\[ \tilde{H}(x,p) = \inf_{\omega \in \Omega} \{ p^\top A x + p^\top B \omega + L(x, \omega) \} \qquad (20) \]

for $(x,p) \in \mathbb{R}^n \times \mathbb{R}^n$. $W$ is a function with domain $[0,T] \times \mathbb{R}^n$ taking values in $\mathbb{R}$, where $W_t$ and $W_x$ (considered to be a column vector valued function) denote its partial derivatives with respect to time and state $x$, respectively (provided they exist). Before continuing we introduce the function space $L_\infty[t,T]^m$ as the normed space of all bounded Lebesgue measurable functions on $[t,T]$. The so-called "verification theorem" [5] states: if $W$ is a continuously differentiable solution of the HJB equation, satisfying the boundary condition $W(T,x) = 0$, $x \in \mathbb{R}^n$, then $W(t,x) \leq V(t,x)$, where $V$ denotes the value function as introduced in section 2. Moreover, if there exists $u^\star \in L_\infty[t,T]^m$ such that
\[ u^\star(s) = \arg\min_{\omega \in \Omega} \{ W_x(s, x^\star(s))^\top A x^\star(s) + W_x(s, x^\star(s))^\top B \omega + L(x^\star(s), \omega) \} \qquad (21) \]

for almost all $s \in [t,T]$, then $u^\star$ is optimal for initial data $(t,x_0)$. In the above expression, $x^\star$ is the state trajectory corresponding to initial condition $(t,x_0)$ and input $u^\star$. If such a function $W$ is found, (21) is a sufficient condition for optimality. By manipulating (20), we get (analogously as in the previous section)
\[ \tilde{H}(x,p) = x^\top C^\top C x + p^\top A x - \tfrac{1}{4} g^\top(x,p) (D^\top D)^{-1} g(x,p) + \inf_{\omega \in \Omega} \big\| \omega + \tfrac{1}{2} (D^\top D)^{-1} g(x,p) \big\|_{D^\top D}^2 \qquad (22) \]

with $g(x,p) = B^\top p + 2 D^\top C x$. The minimizing $\omega$ equals $P_\Omega(-\tfrac{1}{2} (D^\top D)^{-1} g(x,p))$. We shall now prove the continuous differentiability of the value function and show that it satisfies (19). Note that for the usage of the verification theorem as stated above with $W = V$, the continuous differentiability of $V$ is required.

Theorem 11 (Differentiability of $V$ w.r.t. $x$) The value function $V$ is differentiable with respect to $x$ and its directional derivative in the point $(t,x_0) \in [t,T) \times \mathbb{R}^n$ with increment $h \in \mathbb{R}^n$ is given by
\[ ( V_x(t,x_0) \mid h ) = 2\, ( z_{t,T,x_0} \mid M_{t,T} h )_2 \qquad (23) \]
or, explicitly,
\[ V_x(t,x_0) = 2 \int_t^T e^{A^\top (s-t)} C^\top z_{t,T,x_0}(s)\, ds. \qquad (24) \]
$z_{t,T,x_0}$ is the output corresponding to initial condition $(t,x_0)$ with optimal control $u_{t,T,x_0}$. Notice that in (23) the left inner product is the Euclidean inner product in $\mathbb{R}^n$ and the right inner product is the $L_2$-inner product. Furthermore, the resulting partial derivative is continuous with respect to both arguments.

Proof. The proof is separated into two parts, giving upper and lower bounds for the expression $V(t, x_0 + h) - V(t,x_0) - 2 (z_{t,T,x_0} \mid M_{t,T} h)_2$, respectively.

(i) We have $V(t, x_0 + h) \leq J(t, x_0 + h, u_{t,T,x_0}) = (z \mid z)_2$ with $z = M_{t,T}(x_0 + h) + L_{t,T}(u_{t,T,x_0}) = M_{t,T} h + z_{t,T,x_0}$. Writing out the above inequality:
\[ V(t, x_0 + h) - \underbrace{( z_{t,T,x_0} \mid z_{t,T,x_0} )_2}_{= V(t,x_0)} - 2\, ( z_{t,T,x_0} \mid M_{t,T} h )_2 \leq \| M_{t,T} h \|_2^2 \leq \| M_{t,T} \|^2 \| h \|^2. \qquad (25) \]

(ii) Since $J$ is quadratic in $u$, it can be expanded as
\[ J(t,x,u) = V(t,x) + 2\, \big( L_{t,T} u_{t,T,x} + M_{t,T} x \mid L_{t,T} u - L_{t,T} u_{t,T,x} \big)_2 + \| L_{t,T} u - L_{t,T} u_{t,T,x} \|_2^2. \]
For ease of exposition, we omit the subscripts $t$ and $T$. For example, we write $u_{x_0}$ instead of $u_{t,T,x_0}$ for the optimal controls and $z_{x_0}$ instead of $z_{t,T,x_0}$ for the corresponding optimal output. It is easy to derive that
\[ V(t, x_0 + h) = J(t, x_0 + h, u_{x_0} + u_h) - 2\, \big( L u_{x_0+h} + M x_0 + M h \mid L u_h + L u_{x_0} - L u_{x_0+h} \big)_2 - \| L u_h + L u_{x_0} - L u_{x_0+h} \|_2^2 \qquad (26) \]
and
\[ J(t, x_0 + h, u_{x_0} + u_h) = V(t, x_0) + 2\, ( z_{x_0} \mid M h + L u_h )_2 + V(t, h). \]
Substituting this in (26) gives
\[ V(t, x_0 + h) = V(t, x_0) + 2\, ( z_{x_0} \mid M h )_2 - 2\, \big( L u_{x_0} + M x_0 \mid L u_{x_0} - L u_{x_0+h} \big)_2 + R(t, x_0, h), \qquad (27) \]
where the function $R$ is given by
\[ R(t, x_0, h) := -\| L u_h + L u_{x_0} - L u_{x_0+h} \|_2^2 + V(t, h) - 2\, \big( L u_{x_0+h} - L u_{x_0} + M h \mid L u_{x_0} + L u_h - L u_{x_0+h} \big)_2. \qquad (28) \]

Because of (11), the last inner product in (27) is nonpositive. So, we get
\[ V(t, x_0 + h) - V(t, x_0) - 2\, ( z_{x_0} \mid M h )_2 \geq R(t, x_0, h). \qquad (29) \]
We claim that the function $R$ satisfies
\[ | R(t, x_0, h) | \leq E \| h \|^2 \qquad (30) \]
for a certain positive constant $E$. This fact will be verified in a lemma in the appendix. Combining (25) and (29) then completes the proof. Note that continuity of the partial derivative follows by some simple calculations. For a formal proof, see [7]. □

Comparing (24) with the adjoint equation of the Maximum Principle yields a connection between the adjoint variable and the gradient of $V$:
\[ \eta_{t,T,x_0}(s) = V_x(s, x_{t,T,x_0}(s)), \qquad (31) \]
where $\eta_{t,T,x_0}$ is the solution to the adjoint equation corresponding to $u_{t,T,x_0}$. For proving the differentiability of $V$ with respect to $t$ we need an auxiliary result, which we formulate in the next lemma. It is an extension of the mean value theorem for functions of a real variable (see [19]).

Lemma 12 Let $f$ be a function from $\mathbb{R}^n$ to $\mathbb{R}$, which is differentiable on $\mathbb{R}^n$ with gradient $f_x$. Let $x_0, x_1 \in \mathbb{R}^n$. Then there exists an $\alpha \in (0,1)$ such that for $\xi = \alpha x_1 + (1-\alpha) x_0$ it holds that
\[ f(x_1) - f(x_0) = ( f_x(\xi) \mid x_1 - x_0 ). \qquad (32) \]
Proof. Consider the function $h$ with domain $[0,1]$ defined by $h : \alpha \mapsto f(x_0 + \alpha [x_1 - x_0])$. This function is continuous on $[0,1]$ and differentiable on $(0,1)$ with derivative
\[ \frac{dh}{d\alpha}(\alpha) = \big( f_x(x_0 + \alpha [x_1 - x_0]) \mid x_1 - x_0 \big). \]
Using the mean value theorem for functions of one real variable [19], we arrive at the existence of an $\alpha \in (0,1)$ such that $h(1) - h(0) = \frac{dh}{d\alpha}(\alpha)$. Translating this result to $f$ proves the lemma. □

Theorem 13 (Differentiability of $V$ w.r.t. $t$) The value function $V$ is differentiable with respect to $t$ and the partial derivative in the point $(t,x_0) \in [t,T) \times \mathbb{R}^n$ equals
\[ V_t(t,x_0) = -\big( V_x(t,x_0) \mid A x_0 + B u_{t,T,x_0}(t) \big) - z_{t,T,x_0}^\top(t)\, z_{t,T,x_0}(t). \qquad (33) \]
Continuity of the partial derivatives is concluded after the proof.

Proof. Let $\delta$ be such that $0 \leq t < t + \delta \leq T$ and write $x^\star = x_{t,T,x_0}$ for shortness. In the second step of the derivation below, we use Lemma 12, where $\xi(\delta) = \alpha x_0 + (1-\alpha) x^\star(t+\delta)$ for a certain $\alpha \in (0,1)$.
\[
\begin{aligned}
\frac{V(t+\delta, x_0) - V(t, x_0)}{\delta}
&= \frac{V(t+\delta, x_0) - V(t+\delta, x^\star(t+\delta))}{\delta} + \frac{V(t+\delta, x^\star(t+\delta)) - V(t, x_0)}{\delta} \\
&= -\Big( V_x(t+\delta, \xi(\delta)) \;\Big|\; \frac{x^\star(t+\delta) - x_0}{\delta} \Big) - \frac{1}{\delta} \int_t^{t+\delta} z_{t,T,x_0}^\top(s)\, z_{t,T,x_0}(s)\, ds.
\end{aligned}
\]
Because $x^\star(t+\delta) \to x_0$ ($\delta \downarrow 0$), it follows that $\xi(\delta) \to x_0$ ($\delta \downarrow 0$). The continuity of $V_x$ now implies that
\[ \frac{V(t+\delta, x_0) - V(t, x_0)}{\delta} \longrightarrow -\big( V_x(t,x_0) \mid \dot{x}^\star(t) \big) - z_{t,T,x_0}^\top(t)\, z_{t,T,x_0}(t) \]
if $\delta \downarrow 0$. Since $u_{t,T,x_0}$ is continuous, $x^\star$ is continuously differentiable in time and $\dot{x}^\star(t)$ equals $A x_0 + B u_{t,T,x_0}(t)$. We proved the right differentiability of $V$ w.r.t. $t$. The left differentiability can be deduced almost analogously and gives the same derivative. With this last remark the proof is completed. □

By using (31) and the maximum principle, which states that $u_{t,T,x_0}$ minimizes the Hamiltonian, one can recognize the Hamilton-Jacobi-Bellman equation in (33). In other words, the value function satisfies the HJB equation in the classical sense. Since $V_x$ and $\tilde{H}$ are continuous in their arguments, the right-hand side of (19) is continuous and thus also the left-hand side. Hence, $V_t$ is continuous in its arguments. Since it is clear that $V(T,x) = 0$, $x \in \mathbb{R}^n$, the verification theorem states that
\[ u_{t,T,x_0}(s) = P_\Omega\Big( -\tfrac{1}{2} (D^\top D)^{-1} \big[ B^\top V_x(s, x_{t,T,x_0}(s)) + 2 D^\top C x_{t,T,x_0}(s) \big] \Big). \qquad (34) \]

This time-varying optimal feedback can also be obtained by combining (31) and (17). Actually, this yields (34) without using the verification theorem. Moreover, we can even conclude that both dynamic programming and the maximum principle provide necessary and sufficient conditions for optimality. This follows from the link (31) between the two methods. Necessity of the maximum principle in terms of Lemma 10 translates via (31) into necessity of dynamic programming in terms of (34). Vice versa, the sufficiency of dynamic programming leads to sufficiency of the maximum principle along the same lines.
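Evaluating the feedback (34) (or the pointwise law (17)) requires the projection $P_\Omega$ in the $D^\top D$ inner product. Since $\|\omega - v\|_{D^\top D} = \|D\omega - Dv\|_2$, this projection is a small nonnegative least-squares problem, as the following sketch (not from the paper) shows; the helper name and the test data are ours.

import numpy as np
from scipy.optimize import nnls

def project_onto_orthant(v, D):
    """P_Omega(v) in the D^T D inner product:
    argmin_{w >= 0} ||w - v||_{D^T D} = argmin_{w >= 0} ||D w - D v||_2 (an NNLS problem)."""
    w, _ = nnls(D, D @ v)
    return w

# When D^T D = (1/2) I the projection is componentwise clipping, as remarked in Section 5.2.
D = np.sqrt(0.5) * np.eye(3)
v = np.array([0.4, -1.2, 0.0])
print(project_onto_orthant(v, D), np.maximum(v, 0.0))   # the two results agree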

Remark 14 The results obtained so far also hold for time-varying systems of the form
\[ \dot{x}(s) = A(s) x(s) + B(s) u(s), \qquad z(s) = C(s) x(s) + D(s) u(s), \]
with the control restraint set $\Omega$ being an arbitrary closed convex set containing the origin. However, we have to impose smoothness assumptions such as: $A(\cdot), B(\cdot), C(\cdot), D(\cdot)$ are continuous and bounded, $D(\cdot)$ is injective and $(D^\top(\cdot) D(\cdot))^{-1}$ is bounded and continuous.

6 Approximative Aspects

Both the maximum principle and dynamic programming lead to equations, a two-point boundary value problem and a partial differential equation, respectively, which provide methods to extract (candidates for) the optimal control when solved. A drawback of solving a two-point boundary value problem arising from the maximum principle is that it specifies only an open-loop optimal control function for one particular initial condition. For practical implementation of a controller, a feedback law specifying the optimal control value for all states and times is much more effective, in particular when disturbances are active. Moreover, solving a two-point boundary value problem is not trivial at all. From this point of view a feedback like (34) is preferred. Since solving a partial differential equation is in general very complicated and time consuming, an alternative approach resulting in a feedback law will be investigated. We would like to stress that our aim is not to give a complete treatise of the numerical solution of the problem at hand, but to specify a possible framework and justify its effectiveness by convergence results.

Closer inspection of the expression $u_{t,T,x_0} = \tilde{L}_{t,T} P(-M_{t,T} x_0)$ reveals that only $P$, the projection on $L_{t,T}(P[t,T])$, cannot be computed explicitly. The other two operators are given by a state space description. Hence, to get an approximation using this description of the optimal control function, we must focus on approximating projections $P_K$ with a complex convex set $K$. One way of doing this is to consider a sequence of closed convex subsets $\{K_n\}_{n \in \mathbb{N}}$ of $K$. This sequence has to fill up the set $K$ from the inside and $P_{K_n}$ should, of course, be easier to compute. To make this more specific, we write $K_n \uparrow K$ ($n \to \infty$) if $K_n \subseteq K$, $n \in \mathbb{N}$, and $\mathrm{dist}(k, K_n) := \inf\{ \|k - l\| \mid l \in K_n \} \to 0$ ($n \to \infty$) for all $k \in K$. Since $K_n$ is a subset of $K$, it is obvious that $\|x - P_K x\| \leq \|x - P_{K_n} x\|$, and in general it is even strictly smaller. However, for increasing $n$, $K_n$ becomes closer and closer to $K$, so one may guess that the difference between $\|x - P_K x\|$ and $\|x - P_{K_n} x\|$ vanishes. By using this result and the parallelogram law, it can be shown that $P_{K_n} x$ converges to $P_K x$ for increasing $n$.

Theorem 15 (Approximation of Projections) Let $K$ be a closed convex set in a Hilbert space $H$. Let $\{K_n\}_{n \in \mathbb{N}}$ be a sequence of closed convex subsets of $K$ with $K_n \uparrow K$ ($n \to \infty$). Then for all $x \in H$ we have $P_{K_n} x \to P_K x$ ($n \to \infty$).

Proof. This proof resembles the proof of Theorem 1 in [14]. Let $\{k_n\}_{n \in \mathbb{N}}$ be a sequence that converges to $P_K x \in K$ with the property that $k_n \in K_n$ for all $n \in \mathbb{N}$. Since $K_n \subseteq K$, we get for all $n \in \mathbb{N}$ the inequality
\[ \delta := \| x - P_K x \| \leq \| x - P_{K_n} x \| \leq \| x - k_n \| \longrightarrow \| x - P_K x \| = \delta \quad (n \to \infty). \]
Hence, $\lim_{n \to \infty} \| x - P_{K_n} x \| = \| x - P_K x \|$.

Next, we show that $\{P_{K_n} x\}_{n \in \mathbb{N}}$ is a Cauchy sequence. By the parallelogram law,
\[ \| P_{K_n} x - P_{K_m} x \|^2 = 2 \| P_{K_n} x - x \|^2 + 2 \| P_{K_m} x - x \|^2 - 4 \Big\| x - \frac{P_{K_n} x + P_{K_m} x}{2} \Big\|^2. \]
The convexity of $K$ yields $\| x - \tfrac{1}{2}(P_{K_n} x + P_{K_m} x) \| \geq \delta$. Hence,
\[ \| P_{K_n} x - P_{K_m} x \|^2 \leq 2 \| P_{K_n} x - x \|^2 + 2 \| P_{K_m} x - x \|^2 - 4 \delta^2 \to 0 \quad (n, m \to \infty). \]
The sequence $\{P_{K_n} x\}_{n \in \mathbb{N}}$ is Cauchy; let $k_0 \in K$ be its limit. By continuity, $\| x - k_0 \| = \| x - P_K x \|$. The uniqueness of the minimizing vector implies that $k_0 = P_K x$. □

To apply the above procedure, a sequence of closed convex subsets has to be constructed which fills up the admissible control set as described. One method that is often used in practice is discretisation by requiring additionally that the input $u$ is piecewise constant. Formally, let $h := \frac{T-t}{N}$ for some $N \in \mathbb{N}$ and $t_i = t + ih$, $i = 0, \ldots, N$. Define $P_h[t,T]$ as

\[ P_h[t,T] := \{ u \in P[t,T] \mid u|_{[t_i, t_{i+1})} \text{ is constant}, \; i = 0, \ldots, N-1 \}. \]

Analogously to subsection 5.1, it can be shown that there exists a unique control function in $P_h[t,T]$ that minimizes the cost function for initial condition $(t,x_0)$ over the set $P_h[t,T]$. This optimal discrete control with time-step $h$ will be denoted by $u^h_{t,T,x_0}$. Since the collection of all step functions with different time-steps forms a dense subset of

$L_2[t,T]^m$ and $L_{t,T}$ is a continuous, linear transformation, we see that
\[ L_{t,T}(P_h[t,T]) \uparrow L_{t,T}(P[t,T]) \quad (h \downarrow 0). \]
Theorem 15 shows that $P_{L_{t,T}(P_h[t,T])}$ converges pointwise to $P := P_{L_{t,T}(P[t,T])}$. Using (12) and realizing that $u^h_{t,T,x_0} = \tilde{L}_{t,T}\, P_{L_{t,T}(P_h[t,T])}(-M_{t,T} x_0)$ yields

Theorem 16 The optimal discrete controls converge in the norm of $L_2$ to the exact optimal control when the time-step converges to zero. Mathematically,
\[ u^h_{t,T,x_0} \xrightarrow{\;L_2\;} u_{t,T,x_0} \quad (h \downarrow 0). \]

The last question of this section to be answered is how to compute the optimal discrete control for a fixed time-step $h$. In fact, by the discretisation the problem is transformed into an optimal control problem for a discrete time system, in such a way that the original techniques of Bellman's dynamic programming for discrete time systems can be exploited, see e.g. [3] and [11]. Every control $u^h \in P_h[t,T]$ can be parametrised as
\[ u^h = \sum_{i=1}^N u^h_i\, 1_{[t_{i-1}, t_i)} \qquad (35) \]
with $u^h_i \in \Omega$. For such a discrete control $u^h$, we denote the corresponding solution of (1) with initial condition $(t,x_0)$ by $x$ and introduce $x^h(i) := x(t_i)$, $i = 0, \ldots, N$. For $x^h$ we can write

\[ x^h(i+1) = e^{Ah} x^h(i) + \int_{t_i}^{t_{i+1}} e^{A(t_{i+1}-\tau)} B u^h(\tau)\, d\tau = e^{Ah} x^h(i) + \Big( \int_0^h e^{A(h-\tau)} B\, d\tau \Big) u^h_{i+1} =: A_h x^h(i) + B_h u^h_{i+1}, \qquad (36) \]
with initial condition $x^h(0) = x_0$. Furthermore, we introduce a discrete version of the value function.

Definition 17 Fix times $t, T$ such that $t < T$, a time-step $h = \frac{T-t}{N}$ and grid points $t_i = t + ih$. For $i \in \{0, \ldots, N\}$ and $x \in \mathbb{R}^n$, the discrete value function $V^h(i,x)$ is the minimal cost over all controls in $P_h[t_i, T]$ for initial condition $(t_i, x)$.

The discrete value function satisfies the backward (Bellman) recursion
\[ V^h(i-1, x) = \min_{v \in \Omega} \Big\{ V^h(i, A_h x + B_h v) + \int_{t_{i-1}}^{t_i} z_{i,v}^\top(s)\, z_{i,v}(s)\, ds \Big\}, \qquad (39) \]
where $z_{i,v}$ is the output of (1), (2) with initial condition $(t_{i-1}, x)$ and control identically equal to $v \in \mathbb{R}^m$ on the interval $[t_{i-1}, t_i)$. It is clear that $V^h(N, x) = 0$ for all $x \in \mathbb{R}^n$. So, we can now recursively determine the value function $V^h$ and store the optimal control values for every point $(i, x)$. The integral in (39) can be

expressed explicitly in terms of $x$ and the chosen value of $v$ [2], thereby facilitating the calculations. The above approximation avoids the problem of solving the Hamilton-Jacobi-Bellman partial differential equation. However, we obviously cannot store the value function in a computer without discretisation (also called "gridding") of the state space. The explicit expressions for the derivatives of the value function can be helpful in choosing how dense the gridding of the state space should be in order to get accurate estimates for the states that are not stored. Using (13) and (14), it is possible to store one dimension less, because knowledge of the optimal control values on the unit circle of the state space is sufficient to characterise the optimal control for all states. Note that finishing the complete recursive scheme gives the optimal controls for all stored initial states. Since the optimal control values depend continuously on the initial conditions (as is proven in [7]), interpolation between stored values gives good approximations of the optimal control values for the non-stored states. For similar techniques and more details, we refer to monographs like [12].

A problem is of course how small to choose the time-step $h$. The time constants of the system $(A,B)$ provide us with some information about the size needed to make efficient control possible. Explicit formulas specifying upper bounds on the time-step corresponding to a certain performance are not available. Rules of thumb based on engineering insight indicate suitable choices of $h$, where the explicit expressions of the time derivative of the value function are quite valuable. Finally, we would like to remark that physical implementation of continuous updating of control values, as is required by a feedback law like (34), is often impossible in practice due to complex computations. So it could be possible that a lower bound is put on the size of the time-step in order to provide enough time for the controller to update its control values. A drawback of the above method is the storage of the optimal control values in some kind of look-up table. The time-varying behaviour of the feedback law is the main cause of the large amount of stored data. To get rid of the time-variance, the infinite horizon case will be investigated in the next section.
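A minimal sketch (not from the paper) of the recursion (39) for the pendulum example: the matrices $A_h$ and $B_h$ of (36) are obtained from a block matrix exponential, the state space is gridded, the control is restricted to a finite set of nonnegative values, and the integral in (39) is approximated crudely by a left-endpoint rule. The grid ranges, grid sizes and this stage-cost approximation are all assumptions of the sketch.

import numpy as np
from scipy.linalg import expm
from scipy.interpolate import RegularGridInterpolator

# Pendulum data; discretisation (36): expm([[A, B], [0, 0]] h) = [[A_h, B_h], [0, 1]].
A = np.array([[0.0, 1.0], [-1.0, 0.0]]); B = np.array([[0.0], [-1.0]])
h, N = 0.125, 120                               # time-step and number of steps (T = N h)
M = expm(np.block([[A, B], [np.zeros((1, 3))]]) * h)
Ah, Bh = M[:2, :2], M[:2, 2:]

g1 = np.linspace(-2.0, 2.0, 41)                 # gridding of the state space
g2 = np.linspace(-2.0, 2.0, 41)
controls = np.linspace(0.0, 2.0, 21)            # candidate nonnegative control values
X1, X2 = np.meshgrid(g1, g2, indexing="ij")

V = np.zeros_like(X1)                           # V^h(N, x) = 0
for i in range(N, 0, -1):                       # backward Bellman recursion (39)
    interp = RegularGridInterpolator((g1, g2), V, bounds_error=False, fill_value=None)
    Vnew, U = np.full_like(V, np.inf), np.zeros_like(V)
    for v in controls:
        stage = (X1**2 + X2**2 + v**2) * h      # left-endpoint approximation of the integral
        nxt = np.stack([Ah[0, 0]*X1 + Ah[0, 1]*X2 + Bh[0, 0]*v,
                        Ah[1, 0]*X1 + Ah[1, 1]*X2 + Bh[1, 0]*v], axis=-1)
        total = stage + interp(nxt)
        better = total < Vnew
        Vnew[better], U[better] = total[better], v
    V = Vnew
# U now holds approximate optimal control values at the first time step, on the stored grid.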

7 Infinite Horizon

7.1 Problem Formulation

In this section, the object of study is the infinite horizon case ($T = \infty$). The time-invariance of the problem guarantees that we can take $t = 0$ without loss of generality. On infinite horizon it seems natural to require some boundedness properties of the state. In this paper the boundedness is expressed by a finite $L_2$-norm of the state trajectory. The infinite horizon problem is then defined as follows.

Problem 18 The objective is to find for all initial states $x_0$ a control function $u \in P[0,\infty)$ minimizing
\[ J_\infty(x_0, u) := \int_0^\infty z_{x_0,u}^\top(s)\, z_{x_0,u}(s)\, ds \qquad (40) \]
subject to the relations (1), (2), $x(0) = x_0$ and the additional constraint that the corresponding state trajectory $x$ is contained in $L_2[0,\infty)^n$.

Notice that $x \in L_2[0,\infty)^n$ implies $\dot{x} \in L_2[0,\infty)^n$ and hence $x(s) \to 0$ ($s \to \infty$). Naturally, this leads to the following concept.

Definition 19 (Stabilizability and Positive Stabilizability) A control is said to be stabilizing for initial state $x_0$ if the corresponding state trajectory $x$ satisfies $x \in L_2[0,\infty)^n$. $(A,B)$ is said to be positively stabilizable if for every $x_0$ there exists a stabilizing control $u \in P[0,\infty)$.

Notice that $u \in L_2[0,\infty)^m$ and $x \in L_2[0,\infty)^n$ imply that $z \in L_2[0,\infty)^p$ and hence the finiteness of $J_\infty$. The positive stabilizability of $(A,B)$ guarantees the existence of positive controls that keep the cost function finite. In [8] necessary and sufficient conditions are stated for positive stabilizability of a system $(A,B)$. In fact, these conditions are easily verified.

7.2 Existence and Uniqueness

In the remainder of this paper it will be assumed that

1. $(A,B)$ is positively stabilizable;
2. $D$ has full column rank;
3. $(A,B,C,D)$ is minimum phase.

To introduce the concept of minimum phase, we first define what we mean by the transmission zeros of a system $(A,B,C,D)$. A transmission zero is a complex number $\lambda$ such that the system matrix
\[ \begin{pmatrix} \lambda I - A & -B \\ C & D \end{pmatrix} \qquad (41) \]
loses rank. If all zeros have real parts smaller than zero, the system $(A,B,C,D)$ is called minimum phase. A system $(C,A)$ is detectable if the matrix $( A^\top - \lambda I \;\; C^\top )$ does not lose rank for any complex number $\lambda$ with real part larger than or equal to zero. A well-known result in systems theory is that an equivalent characterisation of detectability of the pair $(C,A)$ is the existence of a matrix $G$ such that $A + GC$ is stable [6]. Since $D$ has full column rank, $(A,B,C,D)$ is minimum phase if and only if $(C + DF_2, A + BF_2)$ is detectable, where $F_2$ is uniquely determined by $(C + DF_2)^\top D = 0$. This fact is observed if one postmultiplies the matrix in (41) by the invertible matrix
\[ \begin{pmatrix} I & 0 \\ F_2 & I \end{pmatrix}, \]
resulting in
\[ \begin{pmatrix} \lambda I - A - BF_2 & -B \\ C + DF_2 & D \end{pmatrix}. \qquad (42) \]
The left and right blocks in the resulting matrix are orthogonal due to $(C + DF_2)^\top D = 0$. The equivalence to the detectability of $(C + DF_2, A + BF_2)$ is obtained by noting that $D$ has full column rank. The assumption of minimum phase is needed to get convergence between the finite and infinite horizon problem. Under the assumptions stated above, the optimal controls on infinite horizon exist and are unique. These results will be proven automatically in establishing the convergence results.
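A small sketch (not from the paper) of checking the minimum-phase assumption via the detectability characterisation above: take $F_2 = -(D^\top D)^{-1} D^\top C$, which satisfies $(C + DF_2)^\top D = 0$, and apply a PBH rank test to $(C + DF_2, A + BF_2)$ at the unstable eigenvalues. The tolerances and the example data (the pendulum of Section 3) are assumptions.

import numpy as np

def is_minimum_phase(A, B, C, D, tol=1e-9):
    """Minimum phase iff (C + D F2, A + B F2) is detectable, with F2 = -(D^T D)^{-1} D^T C."""
    F2 = -np.linalg.solve(D.T @ D, D.T @ C)
    Ad, Cd = A + B @ F2, C + D @ F2
    n = A.shape[0]
    for lam in np.linalg.eigvals(Ad):
        if lam.real >= -tol:                                   # PBH test at "unstable" eigenvalues
            pbh = np.vstack([lam * np.eye(n) - Ad, Cd])
            if np.linalg.matrix_rank(pbh, tol=1e-8) < n:
                return False
    return True

# pendulum example of Section 3: C picks out x1, x2 and D weights u
A = np.array([[0.0, 1.0], [-1.0, 0.0]]); B = np.array([[0.0], [-1.0]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]); D = np.array([[0.0], [0.0], [1.0]])
print(is_minimum_phase(A, B, C, D))                            # True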

7.3 Connection between Finite and Infinite Horizon

In the following two subsections, connections between the finite and infinite horizon problem will be investigated. We will prove the convergence of the optimal costs for finite horizon to the optimal costs for infinite horizon when the horizon approaches infinity. Also, the optimal control, state trajectory and output for finite horizon converge to the optimal control, state trajectory and output for infinite horizon, respectively, in the norm of $L_2$. The convergence of the optimal controls in the sense of pointwise convergence can be shown by using the maximum principle for finite horizon and extending it to infinite horizon. We start with an auxiliary result that states the existence of a bounded causal operator from $L_2$-output to corresponding control. An operator $L$ from $L_2[0,\infty)^p$ to $L_2[0,\infty)^m$ is called causal if for all $z_1, z_2 \in L_2[0,\infty)^p$ with $z_1(s) = z_2(s)$ for almost all $s \leq s_0$ we have $(L z_1)(s) = (L z_2)(s)$ for almost all $s \leq s_0$. We introduce the function space $L_1^{loc}[0,\infty)^m$ as the collection of locally integrable functions with domain $[0,\infty)$ taking values in $\mathbb{R}^m$. More specifically, $u \in L_1^{loc}[0,\infty)^m$ if for all $a, b \in [0,\infty)$, $a < b$, the integral $\int_a^b u(s)\, ds$ exists in the Lebesgue sense. For locally integrable inputs the system given by (1) and (2) results in well-defined state and output trajectories.

Lemma 20 Consider the system given by (1) and (2). If for input $u \in L_1^{loc}[0,\infty)^m$ and initial condition $(0,x_0)$ we have that the output $z_{x_0,u} \in L_2[0,\infty)^p$, then it must hold that $u \in L_2[0,\infty)^m$ and, moreover, the corresponding state trajectory $x$ is contained in $L_2[0,\infty)^n$. In fact, there exists a causal, bounded operator $G_{x_0}$ from $L_2[0,\infty)^p$ to $L_2[0,\infty)^m$, which maps an output of the system $(A,B,C,D)$ to the corresponding control input $u$ for given initial state $x_0$. This operator can be written as
\[ u = G_{x_0} z = \tilde{G} x_0 + G_0 z \]
for $z \in L_2[0,\infty)^p$. $\tilde{G}$ is a linear bounded operator from $\mathbb{R}^n$ to $L_2[0,\infty)^m$ and $G_0$ is a linear, bounded operator from $L_2[0,\infty)^p$ to $L_2[0,\infty)^m$.

Proof. Let $F_2$ be such that $(C + DF_2)^\top D = 0$. Apply the preliminary feedback $u(s) = F_2 x(s) + w(s)$ to get $z(s) = (C + DF_2) x(s) + D w(s)$ and premultiply by $D^\top$ to get
\[ D^\top D w(s) = D^\top [ z(s) - (C + DF_2) x(s) ] = D^\top z(s). \]
So, there holds
\[ w(s) = (D^\top D)^{-1} D^\top z(s), \qquad (43) \]
from which we conclude $w \in L_2[0,\infty)^m$. Since $(C + DF_2, A + BF_2)$ is detectable, there exists $G$ such that $A + BF_2 + G(C + DF_2)$ is stable. But then
\[ \dot{x} = [A + BF_2 + G(C + DF_2)]\, x - G z + (B + GD) w. \]
Since $z, w$ are $L_2$-functions, we conclude $x \in L_2[0,\infty)^n$. Combining (43) with the above equation results in
\[
\begin{aligned}
\dot{x} &= \underbrace{[A + BF_2 + G(C + DF_2)]}_{=:\, \tilde{A}}\, x + \underbrace{\big[ (B + GD)(D^\top D)^{-1} D^\top - G \big]}_{=:\, \tilde{B}}\, z, \\
u &= F_2 x + w = F_2 x + (D^\top D)^{-1} D^\top z,
\end{aligned} \qquad (44)
\]
which is a description of the transformation $G_{x_0}$. More explicitly,
\[ u(s) = (G_{x_0} z)(s) = \underbrace{F_2\, e^{\tilde{A} s} x_0}_{(\tilde{G} x_0)(s)} + \underbrace{F_2 \int_0^s e^{\tilde{A}(s-\tau)} \tilde{B} z(\tau)\, d\tau + (D^\top D)^{-1} D^\top z(s)}_{(G_0 z)(s)}. \qquad (45) \]
□

We now prove the uniqueness of optimal controls. Suppose $u_0$ and $u_1$ are both optimal with corresponding outputs $z_0$ and $z_1$, respectively. Consider the admissible controls $u_\alpha$ given by $u_\alpha = (1-\alpha) u_0 + \alpha u_1$ with output $z_\alpha$ for $0 \leq \alpha \leq 1$. Hence,
\[ \| z_\alpha \|^2 = \| z_0 + \alpha (z_1 - z_0) \|^2 = \alpha^2 \| z_1 - z_0 \|^2 + 2 \alpha ( z_0 \mid z_1 - z_0 ) + \| z_0 \|^2. \]
Here $\| z_1 - z_0 \| = 0$, because otherwise $\| z_\alpha \| < \| z_0 \|$ for $\alpha \in (0,1)$, contradicting optimality of $z_0$. Hence, $z_1 = z_0$. The previous lemma now implies that $u_0 = u_1$.

Definition 21 A sequence $\{x_n\}_{n \in \mathbb{N}}$ in a Hilbert space $H$ is said to converge weakly to $x \in H$ if for every $h \in H$ we have $( x_n \mid h ) \to ( x \mid h )$ ($n \to \infty$). Notation: $x_n \xrightarrow{\;w\;} x$ ($n \to \infty$).

The following properties of weak limits are classical (see e.g. [20]).

Lemma 22
1. A weakly convergent sequence is bounded.
2. Each bounded sequence in a Hilbert space has a weakly convergent subsequence.
3. In a Hilbert space, $x_n \to x$ if and only if $x_n \xrightarrow{\;w\;} x$ and $\| x_n \| \to \| x \|$.
4. In a Hilbert space, $x_n \xrightarrow{\;w\;} x$ implies $\| x \| \leq \liminf_{n \to \infty} \| x_n \|$.
5. If a bounded sequence in a Hilbert space has the property that all weakly convergent subsequences have the same limit, then the sequence itself converges weakly to this limit.

We introduce the operators $\pi_T : L_2[0,\infty)^m \to L_2[0,T]^m$, defined for $s \in [0,T]$ by $(\pi_T u)(s) = u(s)$, and $\tau_T : L_2[0,T]^m \to L_2[0,\infty)^m$, defined for all $s \in [0,\infty)$ by
\[ (\tau_T u)(s) = \begin{cases} u(s), & s \leq T, \\ 0, & s > T. \end{cases} \]

In what follows, we denote by $u_T$, $x_T$ and $z_T$ the optimal control, the optimal state trajectory and the optimal output on $[0,T]$ with fixed initial condition $(0,x_0)$, for $T > 0$ or $T = \infty$. Furthermore, to distinguish between the value functions for different horizons, $V^T(t,x_0)$ denotes the minimal cost over the time interval $[t,T]$ with initial state $x_0$.

Theorem 23 For all $x_0 \in \mathbb{R}^n$ it holds that $V^T(x_0) \to V^\infty(x_0)$ ($T \to \infty$), where $V^T(x_0) = V^T(0,x_0)$. For $T \to \infty$ we have $\tau_T u_T \to u_\infty$, $\tau_T z_T \to z_\infty$ and $\tau_T x_T \to x_\infty$ in the $L_2$-norm.

The proof of the next auxiliary lemma can be found in the appendix.

Lemma 24 Let $\{T_j\}_{j \in \mathbb{N}}$ be a sequence of positive numbers with $T_j \to \infty$ ($j \to \infty$). If the sequence $\{\tau_{T_j} z_{T_j}\}_{j \in \mathbb{N}}$ is weakly convergent with limit $z$, then there exists a control $u \in P[0,\infty)$ such that $z_{x_0,u} = z$ and
\[ \tau_{T_j} u_{T_j} \xrightarrow{\;w\;} u \quad (j \to \infty). \]

Proof of Theorem 23. We know that $V^T(x_0)$ is increasing in $T$ and bounded by $V^\infty(x_0)$ for fixed initial state $x_0$. Hence it has a limit, say $\gamma$. Obviously, $\{\tau_T z_T\}_{T \in \mathbb{R}_+}$ is bounded in the $L_2$-sense. Hence there exists a weakly convergent sequence $\{\tau_{T_j} z_{T_j}\}_{j \in \mathbb{N}}$ with limit $z$. According to the previous lemma, there exists an admissible control $u$ with $z_{x_0,u} = z$. Finally,
\[ V^\infty(x_0) \leq \| z \|^2 \leq \liminf_{j \to \infty} \| \tau_{T_j} z_{T_j} \|^2 = \lim_{j \to \infty} V^{T_j}(x_0) = \gamma \leq V^\infty(x_0). \]

The second part of the theorem is proven as follows. Let $\{T_j\}_{j \in \mathbb{N}}$ be any sequence of positive numbers with $T_j \to \infty$ ($j \to \infty$). Since $V^T(x_0)$ is bounded by $V^\infty(x_0)$, $\{\tau_{T_j} z_{T_j}\}_j$ is a bounded sequence. Take any two weakly convergent subsequences $\{\tau_{n_j} z_{n_j}\}_j$ and $\{\tau_{m_j} z_{m_j}\}_j$ of $\{\tau_{T_j} z_{T_j}\}_j$ with limits $z^n$ and $z^m$, respectively. We will prove $z^n = z^m$. On account of Lemma 24, we conclude the existence of admissible controls $u^n$ and $u^m$ with $z_{x_0,u^n} = z^n$ and $z_{x_0,u^m} = z^m$. Then
\[ \| z^n \|_{2,[0,\infty)}^2 \leq \liminf_{j \to \infty} \| \tau_{n_j} z_{n_j} \|_{2,[0,\infty)}^2 = \liminf_{j \to \infty} \| z_{n_j} \|_{2,[0,n_j)}^2 = V^\infty(x_0), \]
and similarly for $z^m$. We find that $u^n$ and $u^m$ are both optimal inputs. By uniqueness of optimal controls, we see that $u^n = u^m$ and hence $z^n = z^m = z_\infty$. Using Lemma 22 we conclude that $\{\tau_{T_j} z_{T_j}\}_j$ is a weakly convergent sequence with limit $z_\infty$. Lemma 24 now states the existence of optimal controls. Since $\| \tau_{T_j} z_{T_j} \|^2 = V^{T_j}(x_0) \to V^\infty(x_0) = \| z_\infty \|^2$, we get from Lemma 22 that $\{\tau_{T_j} z_{T_j}\}_j$ actually converges to $z_\infty$ in the sense of $L_2$. Since $\tau_{T_j} z_{T_j} \xrightarrow{L_2} z_\infty$, it follows from the continuity of $G_0$ that $G_0 \tau_{T_j} z_{T_j} \xrightarrow{L_2} G_0 z_\infty$. Hence, $G_{x_0} \tau_{T_j} z_{T_j} \xrightarrow{L_2} G_{x_0} z_\infty = u_\infty$, which implies by causality of $G_{x_0}$
\[ \| u_{T_j} - \pi_{T_j} u_\infty \|_{2,[0,T_j]} = \| \pi_{T_j} G_{x_0} \tau_{T_j} z_{T_j} - \pi_{T_j} u_\infty \|_2 \longrightarrow 0 \quad (j \to \infty). \]
Since $u_\infty \in L_2[0,\infty)^m$, $\| u_\infty \|_{2,[T_j,\infty)} \to 0$ ($j \to \infty$). Since
\[ \| \tau_{T_j} u_{T_j} - u_\infty \|_2 \leq \| u_{T_j} - \pi_{T_j} u_\infty \|_{2,[0,T_j]} + \| u_\infty \|_{2,(T_j,\infty)}, \]
these last two remarks combined lead to $\tau_{T_j} u_{T_j} \xrightarrow{L_2} u_\infty$ if $j \to \infty$. The result for the state trajectories is proved analogously by replacing $G_{x_0}$ by the operator from the output $z$ to the trajectory $x$, which can be derived from (44). □

Notice that in the above proof, the existence of optimal controls has been shown.

7.4 Maximum Principle with Infinite Horizon

In this subsection the maximum principle for finite horizon together with the convergence results are exploited to arrive at a maximum principle for infinite horizon. This maximum principle states smoothness properties of the infinite horizon optimal control function and will result in an additional convergence relation between finite and infinite horizon optimal control functions: the optimal controls also converge pointwise. The precise mathematical formulation will be stated below. This extended convergence lemma justifies the numerical approximation of the stationary infinite horizon optimal feedback in Subsection 7.5.

For a fixed initial state $x_0$ the corresponding optimal controls $u_T$ for finite horizon $T$ satisfy
\[ u_T(s) = P_\Omega\Big( -\tfrac{1}{2} (D^\top D)^{-1} \{ B^\top \eta_T(s) + 2 D^\top C x_T(s) \} \Big) \qquad (46) \]
for all $s \in [0,T]$, where the functions $x_T$ and $\eta_T$ are given by
\[ \dot{x}_T = A x_T + B u_T, \quad x_T(0) = x_0, \qquad \dot{\eta}_T = -A^\top \eta_T - 2 C^\top z_T, \quad \eta_T(T) = 0. \qquad (47) \]

Lemma 25 For all $T > 0$ it holds that $\| \eta_T(0) \| \leq M_0 \| x_0 \|$ for a certain constant $M_0$.

Proof. Define $\bar{x}_T$ by $\dot{\bar{x}}_T = A \bar{x}_T + B \bar{u}_T$, $\bar{x}_T(0) = \eta_T(0)$, where $\bar{u}_T$ is the optimal control achieving $V^T(\eta_T(0)) = V^T(0, \eta_T(0))$, i.e. $\bar{u}_T = u_{0,T,\eta_T(0)}$. We get
\[
\begin{aligned}
\frac{d}{dt} ( \eta_T \mid \bar{x}_T )
&= ( B^\top \eta_T \mid \bar{u}_T ) - 2 ( C \bar{x}_T \mid C x_T + D u_T ) \\
&= ( B^\top \eta_T + 2 D^\top C x_T + 2 D^\top D u_T \mid \bar{u}_T ) - 2 ( C \bar{x}_T \mid C x_T + D u_T ) - ( 2 D^\top C x_T + 2 D^\top D u_T \mid \bar{u}_T ) \\
&\geq -2 ( C \bar{x}_T + D \bar{u}_T \mid C x_T + D u_T ).
\end{aligned}
\]
We used that, according to (9) and (46),
\[ ( B^\top \eta_T + 2 D^\top C x_T + 2 D^\top D u_T \mid \bar{u}_T ) = -2 \big( -\tfrac{1}{2} (D^\top D)^{-1} [ B^\top \eta_T + 2 D^\top C x_T ] - u_T \;\big|\; \bar{u}_T \big)_{D^\top D} \geq 0. \]
Since $\| \eta_T(0) \|^2 = ( \eta_T(0) \mid \bar{x}_T(0) ) = \int_0^T -\frac{d}{dt} ( \eta_T(s) \mid \bar{x}_T(s) )\, ds$, we have
\[ \| \eta_T(0) \|^2 \leq 2 ( C \bar{x}_T + D \bar{u}_T \mid C x_T + D u_T )_{2,[0,T]} \leq 2 \| C \bar{x}_T + D \bar{u}_T \|_{2,[0,T]} \| C x_T + D u_T \|_{2,[0,T]} = 2 \sqrt{ V^T(\eta_T(0))\, V^T(x_0) } \leq 2 \sqrt{ V^\infty(\eta_T(0))\, V^\infty(x_0) }. \]
Here $( \cdot \mid \cdot )_{2,[0,T]}$ denotes the standard inner product in $L_2[0,T]^p$. Clearly, there exists a constant $M$ such that $V^\infty(x) \leq M \| x \|^2$ for all $x \in \mathbb{R}^n$. But then $\| \eta_T(0) \|^2 \leq 2 M \| \eta_T(0) \|\, \| x_0 \|$. □

Theorem 26 The optimal control $u_\infty$ corresponding to initial condition $(0,x_0)$ satisfies
\[ u_\infty(s) = P_\Omega\Big( -\tfrac{1}{2} (D^\top D)^{-1} \{ B^\top \eta_\infty(s) + 2 D^\top C x_\infty(s) \} \Big), \qquad (48) \]
where the continuous function $\eta_\infty \in L_2[0,\infty)^n$ is given by
\[ \dot{\eta}_\infty = -A^\top \eta_\infty - 2 C^\top z_\infty \qquad (49) \]
for some initial condition $\eta_\infty(0)$. Moreover, $u_\infty$ is a continuous function.

Proof. Take an arbitrary $\bar{T}$. Lemma 25 assures the existence of a convergent sequence $\{\eta_{T_i}(0)\}_i$, where $T_i \to \infty$ ($i \to \infty$). Since $\{\pi_{\bar{T}} z_T\}$ converges to $\pi_{\bar{T}} z_\infty$ in $L_2[0,\bar{T}]^p$ when $T \to \infty$, we see by comparing (47) and (49) that $\pi_{\bar{T}} \eta_{T_i} \to \pi_{\bar{T}} \eta_\infty$ in $L_\infty[0,\bar{T}]^n$ when $i \to \infty$. Analogously, $L_\infty[0,\bar{T}]^n$-convergence of $\{\pi_{\bar{T}} x_{T_i}\}_i$ to $\pi_{\bar{T}} x_\infty$ can be deduced by using the $L_2$-convergence of $\{\pi_{\bar{T}} u_T\}$. Equation (46) now implies that $\{\pi_{\bar{T}} u_{T_i}\}_i$ converges uniformly on $[0,\bar{T}]$ to
\[ P_\Omega\Big( -\tfrac{1}{2} (D^\top D)^{-1} \{ B^\top \eta_\infty(s) + 2 D^\top C x_\infty(s) \} \Big), \]
because the projection $P_\Omega$ is continuous. The $L_2$-convergence of $\{\pi_{\bar{T}} u_{T_i}\}_i$ to $\pi_{\bar{T}} u_\infty$ implies the existence of a subsequence which converges a.e. on $[0,\bar{T}]$ to $\pi_{\bar{T}} u_\infty$. This implies that $\pi_{\bar{T}} u_\infty$ equals the expression above almost everywhere, because of uniqueness of limits. Letting $\bar{T}$ go to infinity completes the proof. □

From this we conclude that $u_\infty$ is continuous. From the proof, we see that $\eta_\infty(0)$ equals the limit of a convergent sequence $\{\eta_{T_i}(0)\}_i$ in the collection $\{\eta_T(0)\}_{T \geq 0}$, where $T_i \to \infty$ ($i \to \infty$). We use the above maximum principle to strengthen the convergence properties between finite and infinite horizon optimal controls.

Theorem 27 For all $\bar{T} > 0$, $\pi_{\bar{T}} u_T \to \pi_{\bar{T}} u_\infty$ ($T \to \infty$) in $L_\infty[0,\bar{T}]^m$.

In the proof of this theorem, the following lemma is used.

Lemma 28 We consider a fixed initial condition $(0,x_0)$. For every $\bar{T} > 0$, there exists a constant $C_{\bar{T}}$, depending on $\bar{T}$, such that the optimal controls for finite and infinite horizon satisfy for all $s, t \in [0,\bar{T}]$
\[ \| u_T(s) - u_T(t) \| \leq C_{\bar{T}}\, | s - t |^{\frac{1}{2}}. \]
This lemma only makes sense for $T \geq \bar{T}$ or $T = \infty$.
Proof. See appendix. □

Proof of Theorem 27. Define the continuous difference function $w_T \in L_2[0,\bar{T}]^m$ for $T \geq \bar{T}$ by $w_T := \pi_{\bar{T}}(u_T - u_\infty)$. Suppose
\[ \exists_{\varepsilon > 0}\; \forall_{T_0 \geq \bar{T}}\; \exists_{T > T_0}\; \exists_{s \in [0,\bar{T}]} \quad \| w_T(s) \| \geq \varepsilon. \qquad (50) \]
Then there exists a subsequence $\{w_{T_i}\}_{i \in \mathbb{N}}$ with $T_i \to \infty$ ($i \to \infty$) and a sequence of points $\{s_i\}$ in $[0,\bar{T}]$ such that $\| w_{T_i}(s_i) \| \geq \varepsilon$ for a certain $\varepsilon$. Since for $s, t \in [0,\bar{T}]$
\[ \| w_T(s) - w_T(t) \| \leq \| u_T(s) - u_T(t) \| + \| u_\infty(s) - u_\infty(t) \| \leq 2 C_{\bar{T}}\, | s - t |^{\frac{1}{2}}, \]
we can construct for each $i$ an interval around $s_i$ with measure larger than a constant $\mu > 0$ on which $\| w_{T_i}(s) \| \geq \tfrac{1}{2} \varepsilon$. This contradicts the $L_2$-convergence of $w_{T_i}$ to the zero function, because then $\| w_{T_i} \|_{2,[0,\bar{T}]} \geq \tfrac{1}{2} \varepsilon \sqrt{\mu}$. □

7.5 Approximation

In the finite horizon problem, we saw that the optimal control could be given by a time-varying feedback of the form (cf. (34))
\[ u(s,x) = P_\Omega\big( -\tfrac{1}{2} (D^\top D)^{-1} [ B^\top V_x(s,x) + 2 D^\top C x ] \big). \qquad (51) \]
As mentioned in Section 6, the time-varying behaviour of the feedback is not convenient for implementation purposes due to the large storage requirements of the control system. Especially when the time-step $h$ is chosen small, giant storage capacities are needed to store the optimal control values for all discrete time instants. In contrast, in the infinite horizon case the time-dependence vanishes. It is obvious that for all $s \geq 0$, $u_{t,x_0}(s+t) = u_{x_0}(s)$. From this it follows that there exists a time-invariant optimal feedback $u_{\mathrm{fdb}}$ defined by
\[ u_{\mathrm{fdb}}(x_0) := u_{t,x_0}(t) = u_{0,x_0}(0) = \lim_{T \to \infty} u_{0,T,x_0}(0). \qquad (52) \]

The limit in (52) is the result of the technical and analytical work we performed in the previous (sub)sections. It provides us with a method to approximate the optimal feedback. We proposed a method to compute the finite horizon optimal controller in Section 6. If $T$ is large enough, $u_{0,T,x_0}$ can be used as an approximation for $u_{\mathrm{fdb}}(x_0)$. An unsolved problem at the moment is how to choose $T$ such that it is large enough. Of course, the time constants of the system under study give an indication. It is also clear that the choice of the weightings $C$ and $D$ influences the choice of $T$. If $D^\top D$ increases, the control intensity reduces, resulting in a slower closed-loop system. Consequently, a larger value of $T$ is required. Since the numerical algorithm proceeds recursively

back in time, one could also compare two successive time steps and use a criterion on the difference between the computed feedbacks at $t_i$ and $t_{i-1}$: if this difference is small compared to a specified tolerance, the algorithm is stopped; otherwise one continues. The explicit expression for the time derivative of the value function can also be used: $T$ has to be chosen so large that $V^T_t(0,\cdot)$ becomes close to zero, indicating that the control hardly changes anymore. An alternative is to simulate the closed-loop system and verify whether the chosen $T$ is large enough (a sketch of such a check is given below). In the example below, $T = 15$ is sufficiently large to approximate the feedback, because the system is stabilised within this time span. In our investigation of the connections between the finite and infinite horizon problems, the proposed approximation of the stationary feedback is justified by the convergence results. In [12], many more approximation techniques are considered, in particular for value functions. Numerical robustness of the method is obtained by choosing a sufficiently large $T$ and a sufficiently small $h$. The explicit expressions for the derivatives of the value function given in subsection 5.3 are helpful in choosing good values for $T$ and $h$; they can also be used to estimate the error introduced by gridding the state space. A point to be mentioned is that the indicated approximation suffers from the "curse of dimensionality": increasing the state dimension of the system results in exponentially more computation time and storage. However, the ongoing evolution of fast processors makes the method feasible for ever larger systems. We illustrate our approximation by the example of the pendulum. The dotted line in figure 2 is the value of the cost function if we apply the control of figure 3. The optimal control is computed by the algorithm sketched above with $T = 25$ and time-step $h = 0.125$; the resulting state trajectory is also depicted.
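A minimal sketch of the simulation check mentioned above, assuming a feedback function such as the approximation sketched earlier is available; the tolerance and the integration step size are arbitrary placeholders:

```python
import numpy as np
from scipy.integrate import solve_ivp

def horizon_long_enough(A, B, feedback, x0, T, tol=1e-2):
    """Simulate the closed loop x' = A x + B u(x) on [0, T] and test whether
    the state has essentially settled, i.e. whether the chosen horizon covers
    the transient of the controlled system."""
    rhs = lambda t, x: A @ x + B @ feedback(x)
    sol = solve_ivp(rhs, (0.0, T), x0, max_step=0.05)
    return bool(np.linalg.norm(sol.y[:, -1]) < tol)
```

If the check fails, $T$ (and possibly $h$) is increased and the feedback recomputed.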


Figure 3: Optimal control and optimal state-trajectory.

We would like to stress the advantage of an optimal feedback formulation over open-loop control. The open-loop control is determined a priori from the initial condition only: it is fixed for the complete time interval under consideration, so adaptation to disturbances acting on the system is no longer possible. Feedback, however, determines the control value from the actually measured state and can therefore adapt to all kinds of disturbances. To illustrate the advantage of feedback over open-loop controllers, we implement both control strategies on the pendulum (initial state $(1, 0)^\top$) in case a constant wind is blowing, modelled by adding $0.5$ to the input $u$. In figure 4, we see how the feedback control adapts to this disturbance. We observe

that the oscillations of the state die out more quickly in the feedback case, resulting in a better performance.


Figure 4: Comparing feedback and open-loop formulations.
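The experiment of figure 4 can be reproduced in outline as follows. This is a sketch under the assumption that `feedback` and the open-loop signal `u_openloop` have been computed for the undisturbed system (e.g. with the approximation sketched earlier); the pendulum matrices are those used earlier in the paper and are not repeated here.

```python
import numpy as np
from scipy.integrate import solve_ivp

def compare_feedback_openloop(A, B, feedback, u_openloop, x0, T, d=0.5):
    """Both controllers are designed without knowledge of the disturbance;
    the plant input is corrupted by a constant offset d (the 'wind').
    The open-loop signal is applied as a fixed function of time, whereas the
    feedback law reacts to the actually measured state."""
    rhs_fb = lambda t, x: A @ x + B @ (feedback(x) + d)
    rhs_ol = lambda t, x: A @ x + B @ (u_openloop(t) + d)
    x_fb = solve_ivp(rhs_fb, (0.0, T), x0, max_step=0.05).y
    x_ol = solve_ivp(rhs_ol, (0.0, T), x0, max_step=0.05).y
    return np.linalg.norm(x_fb[:, -1]), np.linalg.norm(x_ol[:, -1])
```

Comparing the two terminal state norms reflects the observation above that the feedback controller copes better with the constant disturbance.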

8 Conclusions

In this paper, we addressed three approaches to the Linear Quadratic Regulator problem with a positivity constraint on the admissible control set. The optimal controls were characterised by necessary and sufficient conditions in terms of inner products, the maximum principle and dynamic programming. The connection between dynamic programming and the maximum principle was exploited to show both that the maximum principle is sufficient and that dynamic programming is necessary as well. The maximum principle revealed the smoothness properties of the optimal control, in particular its continuity. The maximum principle is stated as a two-point boundary value problem, dynamic programming as a partial differential equation; if either of these is solved, the solution leads to the optimal controller. Although the value function at hand turned out to be continuously differentiable and a classical solution of the Hamilton-Jacobi-Bellman equation, the partial differential equation is not easily solved. The same holds for the two-point boundary value problem, with the additional drawback that, when solved, it yields an open-loop control function for only one particular initial condition. For that reason another approach was taken, and a recursive scheme was proposed for approximating the optimal control by piecewise constant control functions. Convergence results between the optimal discrete controls and the exact optimal control justified the effectiveness of the method. However, it was argued that the storage requirements of the controller could become quite large if a small time step is implemented. To overcome this problem, the infinite horizon problem was studied to arrive at a stationary feedback, which requires less data storage. The investigation of the connections between the finite and infinite horizon problem resulted in convergence results between the optimal controls. To strengthen the $L^2$-convergence to pointwise convergence, the maximum principle for the finite horizon was extended to the infinite horizon under a rather mild "minimum phase" condition on the system. This also guaranteed the continuity of the optimal control on the infinite horizon. These analytical results indicated a method to approximate the optimal positive feedback for the problem considered.

The only remaining problem is that explicit bounds for the time-step $h$ and the horizon $T$ guaranteeing a good approximation of the stationary infinite horizon optimal feedback are currently not available. However, some rules of thumb were specified to obtain a rough estimate of how to choose these quantities, and we briefly touched upon how the explicit expressions for the derivatives of the value function could be used to support these choices. The specification of such bounds is a topic for further research. We also briefly discussed the advantage of feedback: it performs better than open-loop controllers, because it can adapt to disturbances such as measurement noise and unmodelled dynamics. To conclude, we would like to stress that the questions raised in Section 2 are natural in any optimal control problem and that we answered them all in the setting of the optimal control problem at hand. In particular, the proposed approximation method based on discretisation is often used in practice, but a justification of such a method by means of convergence results is hardly encountered in the literature.

9 Acknowledgement

The authors would like to thank the anonymous reviewers for their useful comments and constructive criticism, which improved the overall quality of this paper.

A Appendix

Lemma 29 For the function $R$ as defined in (28), there exists a constant $E$ such that for all $h \in \mathbb{R}^n$
$$| R(t, x_0, h) | \leq E \|h\|^2. \qquad (53)$$
Proof. The lemma follows from the following inequalities in combination with the Cauchy-Schwarz inequality:
$$V(t,h) \leq J(t,h,0) = \|M_{t,T} h\|_2^2 \leq \|M_{t,T}\|^2 \|h\|^2.$$
Lemma 7 and (12) imply that for all $t \in [0,T]$ and $x_0, x_1 \in \mathbb{R}^n$
$$\|u_{t,T,x_0} - u_{t,T,x_1}\|_2 \leq \|M_{t,T}\| \|\tilde{L}_{t,T}\| \|x_0 - x_1\|. \qquad \Box$$

Proof of Lemma 24. There holds
$$G_{x_0}(\pi_{T_j} z_{T_j}) \xrightarrow{\;w\;} G_{x_0}(z) \quad (j \to \infty).$$
We define $\hat u_{T_j} := G_{x_0}(\pi_{T_j} z_{T_j})$ and $u := G_{x_0}(z)$. Thus $\{\hat u_{T_j}\}_{j \in \mathbb{N}}$ is weakly convergent with limit $u$. Since the operator $G_{x_0}$ is causal, $\pi_{T_j}\hat u_{T_j} = \pi_{T_j} u_{T_j}$. We see that for every $h \in L^2[0,\infty)^m$
$$(h \mid \hat u_{T_j})_{[0,\infty)} = (h \mid \pi_{T_j} u_{T_j})_{[0,\infty)} + (h \mid \hat u_{T_j})_{[T_j,\infty)}. \qquad (54)$$
The Cauchy-Schwarz inequality implies
$$|(h \mid \hat u_{T_j})_{[T_j,\infty)}| \leq \|h\|_{[T_j,\infty)} \, \|\hat u_{T_j}\|_{[0,\infty)}.$$
Since weak convergence of the sequence $\{\hat u_{T_j}\}_j$ implies $L^2$-boundedness, and $\|h\|_{[T_j,\infty)} \to 0$, the left-hand side of the above inequality tends to zero as $j \to \infty$. Hence, from (54) we conclude that $\{\pi_{T_j} u_{T_j}\}_j$ is a weakly convergent sequence with limit $u$. Notice that $u \in P[0,\infty)$, because $P[0,\infty)$ is weakly closed in $L^2[0,\infty)^m$.
Left to prove is $z_{x_0,u} = z$. Consider an arbitrary scalar $\tau > 0$. From $\pi_{T_j} u_{T_j} \xrightarrow{\;w\;} u$ ($j \to \infty$) and the boundedness and linearity of $\pi_\tau$ we get, for $T_j > \tau$, that $\pi_\tau u_{T_j} \xrightarrow{\;w\;} \pi_\tau u$. Hence $M_{0,\tau} x_0 + L_{0,\tau}\pi_\tau u_{T_j} \xrightarrow{\;w\;} M_{0,\tau} x_0 + L_{0,\tau}\pi_\tau u$. On the other hand, $M_{0,\tau} x_0 + L_{0,\tau}\pi_\tau u_{T_j} = \pi_\tau z_{T_j} \xrightarrow{\;w\;} \pi_\tau z$.
Uniqueness of weak limits gives $M_{0,\tau} x_0 + L_{0,\tau}\pi_\tau u = \pi_\tau z$. Since $\tau$ was arbitrary, we conclude $z_{x_0,u} = z$. $\Box$

Proof of Lemma 28. The inequality $\|P(a) - P(b)\|_{D^\top D} \leq \|a - b\|_{D^\top D}$ can be translated into $\|P(a) - P(b)\| \leq \gamma \|a - b\|$, where $\|\cdot\|$ is the usual Euclidean norm and $\gamma$ is a certain positive constant. Since the optimal controls $u_T$ satisfy (46) and (48), we have for $\bar T \geq s > t \geq 0$
$$\|u_T(s) - u_T(t)\| \leq \tfrac{\gamma}{2}\,\|(D^\top D)^{-1} B^\top\|\,\|\psi_T(s) - \psi_T(t)\| + \gamma\,\|(D^\top D)^{-1} D^\top C\|\,\|x_T(s) - x_T(t)\|.$$
First, we note that the collection $\{\pi_{\bar T} z_T \mid T > \bar T\}$ is bounded in $L^2[0,\bar T]^p$, because $\{V^T\}_T$ is bounded. Since we can write $u_T = \tilde G x_0 + G^0 z_T$, the collection $\{\pi_{\bar T} u_T \mid T > \bar T\}$ is also bounded in $L^2[0,\bar T]^m$. In turn, this implies that $\{\pi_{\bar T} x_T \mid T > \bar T\}$ is bounded in $L^\infty[0,\bar T]^n$. Let $s > t$. Then
$$\begin{aligned}
\|x_T(s) - x_T(t)\| &\leq \|[e^{A(s-t)} - I]\, x_T(t)\| + \Big\| \int_t^s e^{A(s-\tau)} B u_T(\tau)\, d\tau \Big\| \\
&\leq [e^{\|A\|(s-t)} - 1]\, \|x_T(t)\| + e^{\|A\|\bar T} \|B\|\, |s-t|^{\frac{1}{2}}\, \|u_T\|_{2,[0,\bar T]} \\
&\leq \frac{e^{\|A\|\bar T} - 1}{\bar T}\, |s-t|\, \|x_T\|_{\infty,[0,\bar T]} + e^{\|A\|\bar T} \|B\|\, |s-t|^{\frac{1}{2}}\, \|u_T\|_{2,[0,\bar T]} \\
&\leq \frac{e^{\|A\|\bar T} - 1}{\sqrt{\bar T}}\, |s-t|^{\frac{1}{2}}\, \|x_T\|_{\infty,[0,\bar T]} + e^{\|A\|\bar T} \|B\|\, |s-t|^{\frac{1}{2}}\, \|u_T\|_{2,[0,\bar T]}.
\end{aligned}$$
The second step is a consequence of the Cauchy-Schwarz inequality. The last upper bound can be dominated by a constant that depends on $\bar T$ only, because $\|u_T\|_{2,[0,\bar T]}$ and $\|x_T\|_{\infty,[0,\bar T]}$ are uniformly bounded for all $T \geq \bar T$ and $T = \infty$. The term $\|\psi_T(s) - \psi_T(t)\|$ is handled in an almost identical way. $\Box$

References

[1] Anderson, B.D.O., and Moore, J.B., 1990, Optimal Control - Linear Quadratic Methods. Prentice-Hall.
[2] Åström, K.J., and Wittenmark, B., 1984, Computer-controlled Systems - Theory and Design. Prentice-Hall.
[3] Bellman, R., 1967, Introduction to the Mathematical Theory of Control Processes. Academic Press.
[4] Feichtinger, G., and Hartl, R.F., 1986, Optimale Kontrolle ökonomischer Prozesse: Anwendungen des Maximumprinzips in den Wirtschaftswissenschaften. De Gruyter, Berlin.
[5] Fleming, W.H., and Soner, H.M., 1993, Controlled Markov Processes and Viscosity Solutions. Springer-Verlag.
[6] Hautus, M.L.J., 1969, Stabilization, controllability and observability of linear autonomous systems. Nederl. Akad. Wetensch., Proc., Ser. A73, pp. 448-455.
[7] Heemels, W.P.M.H., 1995, Optimal Positive Control & Optimal Control with Positive State Entries. Master's thesis, Eindhoven University of Technology.
[8] Heemels, W.P.M.H., 1998, Necessary and sufficient conditions for positive stabilizability of a system (A, B). Internal Report of Eindhoven University of Technology, Dept. of Elect. Eng., Measurement and Control Systems, Nr. 98 I/01.
[9] Hiriart-Urruty, J.-B., and Lemaréchal, C., 1993, Convex Analysis and Minimization Algorithms I. Springer-Verlag.
[10] Kailath, T., 1980, Linear Systems. Prentice-Hall.
[11] Kirk, D.E., 1970, Optimal Control Theory, An Introduction. Prentice-Hall.
[12] Kushner, H.J., and Dupuis, P.G., 1992, Numerical Methods for Stochastic Control Problems in Continuous Time. Springer-Verlag.
[13] Lee, E.B., and Markus, L., 1967, Foundations of Optimal Control Theory. John Wiley & Sons.
[14] Luenberger, D.G., 1969, Optimization by Vector Space Methods. John Wiley & Sons.
[15] Macki, J., and Strauss, A., 1982, Introduction to Optimal Control Theory. Springer-Verlag.
[16] Pachter, M., 1980, The Linear-Quadratic Optimal Control Problem with Positive Controllers. Int. Journal of Control, Vol. 32, Iss. 4, pp. 589-608.
[17] Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., and Mishchenko, E.F., 1962, The Mathematical Theory of Optimal Processes. Interscience Publishers, John Wiley & Sons.
[18] Fleming, W.H., and Rishel, R.W., 1975, Deterministic and Stochastic Optimal Control. Springer-Verlag.
[19] Thomas, G.B., and Finney, R.L., 1996, Calculus and Analytic Geometry. Addison-Wesley Publishing Company.
[20] Yosida, K., 1980, Functional Analysis. Springer.

