Separation Theorem for Linearly Constrained LQG Optimal Control

Separation Theorem for Linearly Constrained LQG Optimal Control - Continuous Time Case¹

Andrew E.B. Lim and John B. Moore
Department of Systems Engineering, Research School of Information Sciences and Engineering, Australian National University, Canberra ACT 0200, Australia. Telephone: +61 6 249 4581. Email: [email protected].

Leonid Faybusovich Department of Mathematics University of Notre Dame Mail Distribution Center, Notre Dame, IN 46556 USA. February 1996

Abstract: We solve the linearly constrained linear-quadratic (LQ) and linear-quadratic-Gaussian (LQG) optimal control problems. Closed form expressions for the optimal controls are derived, and the Separation Theorem is generalized.

Key words: Optimal control; linear constraints; LQ control; LQG control; Separation Theorem.

¹The authors wish to acknowledge the funding of the activities of the Cooperative Research Centre for Robust and Adaptive Systems by the Australian Commonwealth Government under the Cooperative Research Centre Program, and NSF grant DMS94-23279.

1 Introduction.

The Separation Theorem for the LQG problem is a classic result in stochastic optimal control. Though proven in the late 1960s [9], it continues to capture the attention of researchers, and alternative proofs are still being published [4]. In [8], the Separation Theorem for the linearly constrained LQ and LQG problems is proven for the discrete time version of the problem. In this paper, we prove that the Separation Theorem also holds in the continuous time case. We show that the optimal control can be determined by solving an unconstrained LQ or LQG problem together with a finite-dimensional, linearly constrained quadratic programming problem. A consequence of the Separation Theorem is that new techniques (derived from interior point methods) for solving the linearly constrained LQ problem [6, 8] can now be used to solve the linearly constrained LQG problem. This brings with it the favourable convergence results associated with interior-point methods [6], as well as the added advantage that feasible sub-optimal controls for the constrained LQ problem can be used to construct feasible sub-optimal controls for the constrained LQG problem.

In this paper, we consider linear integral constraints of the form (3.3) for the deterministic problem and (4.3) for the stochastic problem. We show that a key step in the proof of the Separation Theorem for linearly constrained LQ/LQG control is formulating the problem on the appropriate functional space. Once this has been done, we use results from optimization theory to prove our result.

2 Mathematical Preliminary.

Our proof of the Separation Theorem has two important steps. First, we formulate the LQ/LQG problems as optimization problems on appropriate functional spaces; second, we solve these newly formulated optimization problems using results from optimization theory.

The following results from optimization theory will be needed [2]. Let X be an arbitrary vector space and f(x) a linear-quadratic convex functional defined on X. Let f_i(x) (i = 1, ..., m) be linear functionals defined on X and let C be the convex subset of X defined by the inequality constraints f_i(x) ≤ c_i (i = 1, ..., m), where the c_i ∈ R are given a priori; that is,

    C = {x ∈ X : f_i(x) ≤ c_i, i = 1, ..., m}

Let V be an affine subspace of X. We shall consider the problem

    (CP)    min f(x)   subject to: x ∈ V ∩ C

For λ ≥ 0 (i.e. λ_i ≥ 0 for i = 1, ..., m) and x ∈ V, define

    g(λ, x) = f(x) + Σ_{i=1}^{m} λ_i f_i(x)                                  (2.1)

The following theorem is a consequence of the Kuhn-Tucker theorem [2]. In general, for this theorem to hold, a so-called Slater assumption must be made. It is shown in [5] that for the case considered above, this is not required.

Theorem 2.1 Under the assumptions above, let x* be the optimal solution of the optimization problem (CP) and let x(λ) be defined by

    x(λ) = arg min_{x ∈ V} g(λ, x)                                           (2.2)

Then the Lagrange multiplier λ* defined by

    λ* = arg max_{λ ≥ 0} min_{x ∈ V} {g(λ, x) − λ'c} = arg max_{λ ≥ 0} [g(λ, x(λ)) − λ'c]      (2.3)

exists, and x* = x(λ*).
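As a concrete finite-dimensional illustration of Theorem 2.1 (the data below are hypothetical, chosen only to keep the computations transparent), one can take X = V = R², f(x) = ½‖x‖², and a single constraint f₁(x) = a'x ≤ c. The sketch maximizes the dual function in (2.3) over a grid of λ ≥ 0 and recovers x* = x(λ*):

```python
import numpy as np

# Toy instance of Theorem 2.1 (hypothetical data): minimize
# f(x) = 0.5*||x||^2 over x in R^2 subject to f1(x) = a'x <= c,
# with V = R^2 (no equality constraints).
a = np.array([1.0, 1.0])
c = -1.0

# For lam >= 0, the unconstrained minimizer of
# g(lam, x) = f(x) + lam * a'x  is  x(lam) = -lam * a.
def x_of_lam(lam):
    return -lam * a

# Dual function (2.3): g(lam, x(lam)) - lam*c = -0.5*lam^2*||a||^2 - lam*c.
lam_grid = np.linspace(0.0, 2.0, 2001)
dual = -0.5 * lam_grid**2 * (a @ a) - lam_grid * c
lam_star = lam_grid[np.argmax(dual)]
x_star = x_of_lam(lam_star)

# Theorem 2.1: x* = x(lam*).
print(lam_star, x_star, a @ x_star)
```

Here the constraint is active at the optimum, so λ* > 0 and a'x(λ*) = c, as complementary slackness requires.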

3 Deterministic case.

For each t ∈ [0, T], let A(t) ∈ R^{n×n} and B(t) ∈ R^{n×m}. Assume that A(t) and B(t) depend continuously on t ∈ [0, T]. Consider the deterministic linear system

    ẋ_t = A(t)x_t + B(t)u_t,   x_0 = ξ                                       (3.1)

with solution (x_t, u_t) such that x_t ∈ L₂ⁿ[0, T], u_t ∈ L₂ᵐ[0, T]. Define X = L₂ⁿ[0, T] × L₂ᵐ[0, T]. Clearly, X is a vector space, and the set V = {(x_t, u_t) ∈ X : (x_t, u_t) satisfy (3.1)} is an affine subset of X. Let the cost functional f : X → R be given by

    f(x, u) = (1/2) ∫₀ᵀ [x_t'Q(t)x_t + u_t'R(t)u_t] dt + (1/2) x_T'F x_T      (3.2)

where for each t ∈ [0, T], Q(t) ∈ R^{n×n} is a symmetric positive semi-definite matrix and R(t) ∈ R^{m×m} is a symmetric positive definite matrix. Furthermore, we shall assume that Q(t) and R(t) are continuous functions of t ∈ [0, T]. Let the constraint functionals f_i : X → R (i = 1, ..., n_c) be given by

    f_i(x, u) = ∫₀ᵀ [a_i'(t)x_t + b_i'(t)u_t] dt + c_i'x_T                    (3.3)

where for each t ∈ [0, T], a_i(t) ∈ Rⁿ and b_i(t) ∈ Rᵐ. Once again, we shall assume that a_i(t) and b_i(t) are continuous functions of t ∈ [0, T]. The class of feasible controls U is the set L₂ᵐ[0, T]. It should be noted that for deterministic systems, open-loop and closed-loop control are equivalent. The deterministic linearly constrained LQ (DLCLQ) optimal control problem can be described as follows.

Problem 1 (DLCLQ): Find the control function u_t ∈ U which minimizes the performance index (3.2) and satisfies the linear system equations (3.1) and the constraints f_i(x, u) ≤ c_i for i = 1, ..., n_c, where the c_i ∈ R are assigned a priori.

It is clear from the discussion above that (DLCLQ) is a linearly constrained (convex) quadratic optimization problem on the affine subspace V and can be written in the following way:

    (P1)    min f(x, u)   subject to: (x, u) ∈ V ∩ C

where

    C = {(x, u) ∈ X : f_i(x, u) ≤ c_i, i = 1, ..., n_c}

Hence, we can use Theorem 2.1 to determine the optimal solution. As in (2.1), we define for λ ≥ 0 and (x_t, u_t) ∈ V the functional

    g(λ, (x_t, u_t)) = f(x_t, u_t) + Σ_{i=1}^{n_c} λ_i f_i(x_t, u_t)

For λ ≥ 0, define

    a(λ, t) = Σ_{i=1}^{n_c} λ_i a_i(t),   b(λ, t) = Σ_{i=1}^{n_c} λ_i b_i(t)

with c(λ) defined similarly. The following is easily derived using dynamic programming.

Proposition 3.1 Let λ_i ≥ 0 (i = 1, ..., n_c) be given and consider the problem of finding the optimal (x_t(λ), u_t(λ)) ∈ V for the problem

    min g(λ, (x_t, u_t))   subject to: (x_t, u_t) ∈ V

The optimal control u_t(λ) is given by

    u_t(λ) = −R⁻¹(t)[B'(t)K(t)x_t(λ) + B'(t)d(λ, t) + b(λ, t)]               (3.4)

where K(t), d(λ, t) and p(λ, t) are the solutions of

    K̇ = −KA − A'K + KBR⁻¹B'K − Q,   K(T) = F                                 (3.5)
    ḋ(λ) = −[A − BR⁻¹B'K]'d − a(λ) + KBR⁻¹b(λ),   d(T, λ) = c(λ)             (3.6)
    ṗ(λ) = [B'd + b(λ)]'R⁻¹[B'd + b(λ)],   p(T, λ) = 0                       (3.7)

and x_t(λ) is the state process which results when u_t(λ) is used in (3.1). The resulting optimal cost is

    g(λ, (x_t(λ), u_t(λ))) = (1/2) ξ'K(0)ξ + d'(λ, 0)ξ + (1/2) p(λ, 0)       (3.8)

As a result, we have the following theorem.

Theorem 3.1 Let λ* be the optimal solution of

    max_{λ ≥ 0}  d'(λ, 0)ξ + (1/2) p(λ, 0) − λ'c   subject to (3.6)-(3.7)     (3.9)

Then the optimal control of (DLCLQ) is

    u_t(λ*) = −R⁻¹(t)[B'(t)K(t)x_t(λ*) + B'(t)d(λ*, t) + b(λ*, t)]

Proof: As shown above, (DLCLQ) is equivalent to (P1). From Theorem 2.1, the optimal solution of (P1) is (x_t*, u_t*) = (x_t(λ*), u_t(λ*)), where (x_t(λ), u_t(λ)) is the solution of (2.2) and λ* is the solution of (2.3). With f(x_t, u_t) and f_i(x_t, u_t) given by (3.2)-(3.3) respectively, it follows from Proposition 3.1 that u_t(λ) is given by (3.4), x_t(λ) by (3.1) with u_t(λ) as the control, and (by noting that the term (1/2)ξ'K(0)ξ in (3.8) is independent of λ) λ* is the optimal solution of (3.9). The existence of λ* is guaranteed by Theorem 2.1. □

We can write the solution d(λ, t) of the vector differential equation (3.6) in the form

    d(λ, t) = [d₁(t) ⋯ d_{n_c}(t)] λ

where for each t ∈ [0, T], d_i(t) ∈ Rⁿ is the solution of the λ-independent linear vector differential equation

    ḋ_i = −[A − BR⁻¹B'K]'d_i − a_i + KBR⁻¹b_i,   d_i(T) = c_i

Similarly, we can write the solution p(λ, t) of the scalar differential equation (3.7) as p(λ, t) = λ'P(t)λ, where P(t) is the solution of the λ-independent matrix differential equation

    Ṗ = Z'R⁻¹Z,   P(T) = 0

where for each t ∈ [0, T], Z(t) ∈ R^{m×n_c} is given by

    Z(t) = [B'(t)d₁(t) + b₁(t) ⋯ B'(t)d_{n_c}(t) + b_{n_c}(t)]

It is now clear that the dual optimization problem (3.6)-(3.7), (3.9) is a finite-dimensional linearly constrained quadratic optimization problem that can be solved by a number of numerical optimization packages. Alternatively, by generalizing an interior-point method for finite-dimensional linearly constrained quadratic optimization problems to infinite dimensions, new techniques for solving the linearly constrained LQ optimal control problem are derived in [6]. In [8], it is shown how these interior-point techniques can be used to calculate the optimal value of the Lagrange multipliers λ* for the discrete time problem. A similar procedure can be used to derive an interior point method for determining the optimal values of the Lagrange multipliers for the continuous time problem. Apart from being highly efficient, the interior-point approach has the added advantage that each (sub-optimal) iterate results in a feasible control, a property which is necessary in many applications.
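The construction above reduces the dual (3.9) to a finite-dimensional concave quadratic program. The following sketch (hypothetical constant system data; crude backward-Euler integration rather than the interior-point method of [6, 8]) integrates K(t), the columns d_i(t) and the matrix P(t) backwards from t = T, then maximizes the dual by projected gradient ascent:

```python
import numpy as np

n, m, nc = 2, 1, 2
T, N = 1.0, 2000
dt = T / N

# Hypothetical problem data (constant in t for simplicity).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(n)
R = np.eye(m)
F = np.eye(n)
xi = np.array([1.0, 0.0])

a_cols = np.eye(n)              # columns a_i(t); here n = nc = 2
b_cols = np.zeros((m, nc))      # columns b_i(t)
c_term = np.zeros((n, nc))      # terminal weights c_i in (3.3)
c_bnd = np.array([0.2, 0.2])    # right-hand sides c_i

Rinv = np.linalg.inv(R)
K = F.copy()                    # K(T) = F
D = c_term.copy()               # columns d_i(t), d_i(T) = c_i
P = np.zeros((nc, nc))          # P(T) = 0

# Backward Euler from t = T down to t = 0.
for _ in range(N):
    Z = B.T @ D + b_cols
    Kdot = -K @ A - A.T @ K + K @ B @ Rinv @ B.T @ K - Q
    Ddot = -(A - B @ Rinv @ B.T @ K).T @ D - a_cols + K @ B @ Rinv @ b_cols
    Pdot = Z.T @ Rinv @ Z
    K -= dt * Kdot
    D -= dt * Ddot
    P -= dt * Pdot

# Dual (3.9): maximize q'lam + 0.5*lam'P(0)lam - c'lam over lam >= 0,
# where q_i = d_i(0)'xi. P(0) is negative semi-definite by construction,
# so the dual is concave and projected gradient ascent suffices here.
q = D.T @ xi
lam = np.zeros(nc)
for _ in range(5000):
    lam = np.maximum(0.0, lam + 0.05 * (q + P @ lam - c_bnd))

print(K)
print(P)
print(lam)
```

Since P(0) is negative semi-definite, any standard QP solver could replace the gradient loop; the loop is used only to keep the sketch self-contained.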

4 Full information stochastic case.

Let A(t), B(t) be defined as in the deterministic case, and for each t ∈ [0, T], let C(t) ∈ R^{n×k}. We shall also assume that C(t) depends continuously on t ∈ [0, T]. Let (Ω, F, P) be a probability space. Denote by X̄ the space of all stochastic processes {x_t : t ∈ [0, T]} such that x_t ∈ L₂ⁿ(Ω, F, P) for each t ∈ [0, T]. Similarly, let Ū be the space of all stochastic processes {u_t : t ∈ [0, T]} such that u_t ∈ L₂ᵐ(Ω, F, P) for each t ∈ [0, T]. Let {W_t : t ∈ [0, T]} be a standard Brownian motion [7] and let F_t = σ{W_s : s ∈ [0, t]} be the sub-sigma algebra on Ω generated by {W_s : s ∈ [0, t]}. Define the sets

    X = {{x_t} ∈ X̄ : x_t measurable with respect to F_t}
    U = {{u_t} ∈ Ū : u_t measurable with respect to F_t}

That is, X is the set of all processes {x_t} ∈ X̄ which are non-anticipative with respect to {F_t}, and U is the set of all {u_t} ∈ Ū which are non-anticipative with respect to {F_t}. Clearly, X and U are vector spaces. Denote X̃ = X × U. Then X̃ is a vector space. Consider now the stochastic differential equation

    dx_t = A(t)x_t dt + B(t)u_t dt + C(t)dW_t,   x₀ ~ N(ξ, P₀)               (4.1)

It can easily be shown that the set Ṽ of all stochastic processes ({x_t}, {u_t}) ∈ X̃ which satisfy (4.1) is an affine subset of X̃. Let the cost functional f : X̃ → R be defined as

    f({x_t}, {u_t}) = E[(1/2) ∫₀ᵀ (x_t'Q(t)x_t + u_t'R(t)u_t) dt + (1/2) x_T'F x_T]      (4.2)

where Q(t) and R(t) satisfy the same assumptions as in the deterministic case (3.2). Under these assumptions, f({x_t}, {u_t}) is a convex quadratic functional on X̃. Similarly, let the n_c linear constraint functionals f_i : X̃ → R (i = 1, ..., n_c) be given by

    f_i({x_t}, {u_t}) = E[∫₀ᵀ [a_i'(t)x_t + b_i'(t)u_t] dt + c_i'x_T]         (4.3)

The stochastic full information linearly constrained LQG optimal control problem (SFLLQG) can be described as follows.

Problem 2 (SFLLQG): Find the optimal control policy {u_t*} ∈ U which minimizes f({x_t}, {u_t}) such that ({x_t}, {u_t}) ∈ X̃ satisfies the linear system (4.1) and the constraints f_i({x_t}, {u_t}) ≤ c_i (i = 1, ..., n_c), where the c_i ∈ R (i = 1, ..., n_c) are given a priori.

Once again, (SFLLQG) as stated above is a linearly constrained convex quadratic optimization problem on an affine subset of a vector space and may be stated in the following way:

    (P2)    min f(x, u)   subject to: (x, u) ∈ Ṽ ∩ C̃

where

    C̃ = {(x, u) ∈ X̃ : f_i(x, u) ≤ c_i, i = 1, ..., n_c}

The optimal solution of (P2) can be determined using Theorem 2.1. Once again, for λ ≥ 0 and ({x_t}, {u_t}) ∈ Ṽ we define as in (2.1) the functional

    g(λ, ({x_t}, {u_t})) = f({x_t}, {u_t}) + Σ_{i=1}^{n_c} λ_i f_i({x_t}, {u_t})

We have the following result.

Proposition 4.1 Let λ_i ≥ 0 (i = 1, ..., n_c) be given and consider the problem of finding the optimal ({x_t(λ)}, {u_t(λ)}) ∈ Ṽ for the problem

    min g(λ, ({x_t}, {u_t}))   subject to: ({x_t}, {u_t}) ∈ Ṽ

The optimal control u_t(λ) is given by

    u_t(λ) = −R⁻¹(t)[B'(t)K(t)x_t(λ) + B'(t)d(λ, t) + b(λ, t)]               (4.4)

where K(t), d(λ, t) and p(λ, t) are the solutions of (3.5)-(3.7). The resulting optimal cost is

    g(λ, ({x_t(λ)}, {u_t(λ)})) = (1/2) ξ'K(0)ξ + d'(λ, 0)ξ + (1/2) p(λ, 0) + (1/2) ∫₀ᵀ tr{C'(t)K(t)C(t)} dt

Proof: When λ ≥ 0 and the class of feasible controls is the set of feedback controls, it is a classical result of stochastic control that the optimal control of the unconstrained full information LQG problem

    min g(λ, ({x_t}, {u_t}))   subject to: ({x_t}, {u_t}) satisfy (4.1), {x_t} ∈ X, {u_t} ∈ U

is given by (4.4). It is proven in Corollary 4.1 on page 163 of [7] that (4.4) is also optimal over the class Ṽ, and our result follows immediately. □

The Separation Theorem for the full information LQG problem follows immediately from Theorem 2.1, Proposition 4.1 and Theorem 3.1.

Theorem 4.1 If the optimal control for the linearly constrained LQ optimal control problem (DLCLQ) is

    u_t(λ*) = −R⁻¹(t)[B'(t)K(t)x_t + B'(t)d(λ*, t) + b(λ*, t)]

then the optimal control for the linearly constrained, full information LQG optimal control problem (SFLLQG) is

    u_t(λ*) = −R⁻¹(t)[B'(t)K(t)x_t + B'(t)d(λ*, t) + b(λ*, t)]

with the same optimal Lagrange multiplier λ*.

Proof: It is clear from the discussion above that the full information LQG problem (SFLLQG) is equivalent to the linearly constrained convex quadratic optimization problem (P2). From Theorem 2.1, the optimal solution of (P2) is (x*, u*) = (x(λ*), u(λ*)), where (x(λ), u(λ)) is given by (4.1) and (4.4). Furthermore, λ* exists and is given by (2.3). From Proposition 4.1, it follows that λ* is given by

    λ* = arg max_{λ ≥ 0} {(1/2) ξ'K(0)ξ + d'(λ, 0)ξ + (1/2) p(λ, 0) + (1/2) ∫₀ᵀ tr{C'(t)K(t)C(t)} dt − λ'c}
       = arg max_{λ ≥ 0} {(1/2) ξ'K(0)ξ + d'(λ, 0)ξ + (1/2) p(λ, 0) − λ'c}

since

    (1/2) ∫₀ᵀ tr{C'(t)K(t)C(t)} dt

is independent of λ. This expression for λ* is the same as (3.9) (the λ-independent term (1/2)ξ'K(0)ξ does not affect the maximizer), and our result follows immediately. □

It is easy to show that if λ results in a feasible (sub-optimal) control of the form (3.4) for the constrained LQ control problem (3.1)-(3.3), then the control (4.4) which results from this λ is also feasible for the full information LQG problem (4.1)-(4.3). Therefore, if the techniques discussed in [6, 8] are used to calculate the optimal Lagrange multipliers for the constrained LQ problem (3.1)-(3.3), then each sub-optimal value of the Lagrange multipliers can be used in (4.4) to calculate a feasible sub-optimal control for the full information constrained LQG problem.

5 Partial information stochastic case.

In addition to the assumptions for the full information stochastic linear system (4.1), we shall make the following additional ones for the partial information stochastic linear system. Let H(t) ∈ R^{p×n} and G(t) ∈ R^{p×l} for each t ∈ [0, T]. We assume that H(t) and G(t) depend continuously on t ∈ [0, T]. Let {V_t : t ∈ [0, T]} be a standard Brownian motion and consider the linear system with observation equation

    dx_t = A(t)x_t dt + B(t)u_t dt + C(t)dW_t,   x₀ ~ N(ξ, Σ₀)               (5.1)
    dy_t = H(t)x_t dt + G(t)dV_t                                             (5.2)

where in this case {u_t} ∈ Ū is constrained to belong to the set U = {{u_t} ∈ Ū : u_t is measurable with respect to G_t}, where G_t = σ{y_s : s ∈ [0, t]} is the sub-sigma algebra on Ω generated by {y_s : s ∈ [0, t]}. That is, {u_t} must be non-anticipative with respect to {G_t}. Let the cost functional be given by (4.2) and the n_c constraint functionals by (4.3). The partial information linearly constrained LQG problem (PILLQG) can be defined as follows.

Problem 3 (PILLQG): Find the optimal control policy {u_t*} ∈ U which minimizes f({x_t}, {u_t}) such that ({x_t}, {u_t}) satisfies the linear system (5.1) and the constraints f_i({x_t}, {u_t}) ≤ c_i (i = 1, ..., n_c), where the c_i ∈ R (i = 1, ..., n_c) are given a priori.

Denoting x̂_t = E[x_t | y_s, s ∈ [0, t]] = E[x_t | G_t], it is well known from estimation theory [3] that if x_t is given by (5.1) and the output measurements y_t by (5.2), then x̂_t is given by the Kalman filter

    dx̂_t = A(t)x̂_t dt + B(t)u_t dt + Σ(t)H'(t)(G(t)G'(t))⁻¹ dν_t,   x̂₀ = ξ    (5.3)

where the mean square error Σ(t) = E[(x_t − x̂_t)(x_t − x̂_t)'] is given by the filtering Riccati equation

    Σ̇ = AΣ + ΣA' − ΣH'(GG')⁻¹HΣ + CC',   Σ(0) = Σ₀

and the innovations process {ν_t : t ∈ [0, T]} by

    dν_t = dy_t − H(t)x̂_t dt
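Since Σ(t) does not depend on the control or the observations, the filtering Riccati equation can be integrated offline. A minimal forward-Euler sketch (with hypothetical system matrices) is:

```python
import numpy as np

# Forward-Euler integration of the filtering Riccati equation
#   Sigma' = A Sigma + Sigma A' - Sigma H'(G G')^{-1} H Sigma + C C'
# with Sigma(0) = Sigma0.  All system data below are hypothetical.
n, p = 2, 1
T, N = 1.0, 2000
dt = T / N

A = np.array([[0.0, 1.0], [-2.0, -0.3]])
C = 0.1 * np.eye(n)
H = np.array([[1.0, 0.0]])
G = np.array([[0.5]])
Sigma0 = 0.2 * np.eye(n)

S = Sigma0.copy()
GGinv = np.linalg.inv(G @ G.T)
for _ in range(N):
    Sdot = A @ S + S @ A.T - S @ H.T @ GGinv @ H @ S + C @ C.T
    S += dt * Sdot

# Filter gain at the final time, Sigma(T) H'(T) (G G')^{-1}.
gain_T = S @ H.T @ GGinv
print(S)
print(gain_T)
```

The resulting Σ(t) yields the gain Σ(t)H'(t)(GG')⁻¹ that multiplies the innovations in (5.3).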

We require the following results, which we now state without verification; the interested reader may consult [3]. First, the optimal state estimate and the state estimation error are orthogonal; that is,

    E[(x_t − x̂_t)x̂_t'] = 0

Second, the innovations process {ν_t} is Gaussian and zero mean, with

    E[ν_t] = 0,   E[ν_t x̂_t'] = 0

It is easily shown that under the assumptions above

    E[x_t'Q(t)x_t] = E[x̂_t'Q(t)x̂_t] + tr{Q(t)Σ(t)}

and hence (4.2) becomes

    f({x_t}, {u_t}) = (1/2) E[∫₀ᵀ (x̂_t'Q(t)x̂_t + u_t'R(t)u_t) dt + x̂_T'F x̂_T] + (1/2) ∫₀ᵀ tr{Q(t)Σ(t)} dt + (1/2) tr{FΣ(T)}

Note that Σ(t) is deterministic and independent of the input {u_t}. To deal with the constraints, note that when (5.3) is subtracted from (5.1), we have

    d(x_t − x̂_t) = A(t)(x_t − x̂_t)dt + C(t)dW_t − Σ(t)H'(t)(G(t)G'(t))⁻¹ dν_t

with E[x₀] = ξ and x̂₀ = ξ. Since {W_t} and {ν_t} are Brownian motions, they are both zero mean and it follows that

    E[x_t − x̂_t] = ∫₀ᵗ A(s)E[x_s − x̂_s] ds,   E[x₀ − x̂₀] = 0

and hence E[x_t] = E[x̂_t] for all t ∈ [0, T]. Therefore, the constraint functionals (4.3) become

    f_i({x_t}, {u_t}) = ∫₀ᵀ (a_i'(t)E[x_t] + b_i'(t)E[u_t]) dt + c_i'E[x_T]
                      = ∫₀ᵀ (a_i'(t)E[x̂_t] + b_i'(t)E[u_t]) dt + c_i'E[x̂_T]
                      = E[∫₀ᵀ (a_i'(t)x̂_t + b_i'(t)u_t) dt + c_i'x̂_T]

Therefore, the partial information problem (PILLQG) is equivalent to the full information problem

    (FIP)   min (1/2) E[∫₀ᵀ (x̂_t'Q(t)x̂_t + u_t'R(t)u_t) dt + x̂_T'F x̂_T] + (1/2) ∫₀ᵀ tr{Q(t)Σ(t)} dt + (1/2) tr{FΣ(T)}
            subject to: E[∫₀ᵀ (a_i'(t)x̂_t + b_i'(t)u_t) dt + c_i'x̂_T] ≤ c_i,   i = 1, ..., n_c

over the class of controls {u_t} ∈ U, where x̂_t and u_t are related by the Kalman filter (5.3). The solution of (FIP) is given by Theorem 4.1. The Separation Theorem for the partial information linearly constrained LQG problem follows immediately.
Theorem 5.1 If the optimal control for the deterministic LQ optimal control problem (DLCLQ) is given by

    u_t(λ*) = −R⁻¹(t)[B'(t)K(t)x_t + B'(t)d(λ*, t) + b(λ*, t)]               (5.4)

then the optimal control for the partial observation linearly constrained LQG optimal control problem (PILLQG) is given by

    u_t(λ*) = −R⁻¹(t)[B'(t)K(t)x̂_t + B'(t)d(λ*, t) + b(λ*, t)]              (5.5)

where {x̂_t} is the output of the Kalman filter (5.3).

Note once again that if a multiplier λ results in a feasible control of the form (5.4) for the deterministic constrained LQ problem, then the corresponding control (5.5) is also feasible for the partial information LQG problem. Therefore, if interior-point methods [6, 8] are used to solve the deterministic LQ problem, each sub-optimal Lagrange multiplier yields a feasible control for the partial information LQG problem. In addition, fast convergence to the optimal Lagrange multipliers (and hence the optimal control) is also guaranteed.

6 Conclusion.

In this paper we have solved the continuous time linearly constrained LQ and LQG optimal control problems, and generalized the Separation Theorem to this case. We have shown that the optimal control is determined by solving an unconstrained LQ or LQG problem together with a finite-dimensional, linearly constrained quadratic programming problem. We have also shown that the constrained LQG problem may be solved via the related constrained LQ problem using new techniques based on interior point methods, with the desirable properties associated with interior point methods carrying over to the constrained LQG problem (namely, fast convergence and feasibility of sub-optimal iterates).

References

[1] B.D.O. Anderson and J.B. Moore. Optimal Control: Linear Quadratic Methods (Prentice-Hall, Englewood Cliffs, NJ, 1989).
[2] A.V. Balakrishnan. Applied Functional Analysis (Springer-Verlag, New York, 1976).
[3] M.H.A. Davis. Linear Estimation and Stochastic Control (Chapman and Hall, London, 1977).
[4] M.H.A. Davis and M. Zervos. A new proof of the discrete-time LQG optimal control theorems. IEEE Trans. Automat. Contr., vol 40, no 8, 1995, pp 1450-1453.
[5] L. Faybusovich. Coordinate-free simplex method. Cybernetics, vol 20, no 6, 1984, pp 124-130.
[6] L. Faybusovich and J.B. Moore. A long-step path-following algorithm for the convex quadratic programming problem in a Hilbert space. Proc. 34th IEEE Conference on Decision and Control.
[7] W.H. Fleming and R.W. Rishel. Deterministic and Stochastic Optimal Control (Springer-Verlag, New York, 1975).
[8] A.E.B. Lim, J.B. Moore and L. Faybusovich. Linearly constrained LQ and LQG optimal control. Proc. 13th IFAC World Congress (to appear).
[9] W.M. Wonham. On the separation theorem of stochastic control. SIAM J. Control, vol 6, 1968, pp 312-326.