The LQGP Problem: Theory and Computation 1

8 downloads 0 Views 177KB Size Report
Mar 27, 1996 - ... Computer Science. University of Illinois at Chicago. 851 Morgan St.; Room 322 SEO; M/C 249. Chicago, IL 60607-7045 ... dWi(t), for i = 1 to r.
The LQGP Problem: Theory and Computation F. B. Hanson and J. J. Westman Laboratory for Advanced Computing Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago 851 Morgan St.; Room 322 SEO; M/C 249 Chicago, IL 60607-7045 e-mail: [email protected] [email protected] 35th CDC March 27, 1996

Abstract. The Linear Quadratic Gaussian Poisson (LQGP) problem denotes an optimal control problem with linear dynamics and quadratic costs with both Gaussian and Poisson noise disturbances. The LQGP problem provides a benchmark model with sucient complexity while permitting formal solutions for testing both theoretical and computational methods. The problem is examined in this paper.

1. Introduction

The linear dynamics, quadratic performance, Gaussian noise and Poisson noise or LQGP problem, has its dynamics governed by the stochastic di erential equation (SDE)

dX(t) = A(t)X(t)dt + B (t)U(t)dt + G(t)dW(t) (1) + [H1 (t)X(t)]d 1 (t) + [H2 (t)U(t)]d 2 (t) + H3 (t)d3 (t); for general Markov processes in continuous time, with m  1 state vector X(t), n  1 control vector U(t), r  1 Gaussian noise vector dW(t), and q`  1 space-time Poisson noise vector d `(t), for ` = 1 to 3. The dimensions of the respective coecient matrices are: A(t)Pis m  m, B (t) is m  n, G(t)Pis m  r, while the H`(t) are dimensioned, so that [H1 (t)x] = [ k H1ijk (t)xk ]mq1 , [H2 (t)u] = [ k H2ijk (t)uk ]mq2 and H3 (t) = [H3ij (t)]mq3 . Note that the space-time Poisson terms are formulated to maintain the linear nature of the dynamics, but the rst two are actually bilinear in either X or U and d` for ` = 1 or 2, respectively. Also, the brackets around the coecients, [H1 (t)x] and [H2 (t)u] indicate nonstandard linear algebra or matrix forms. Analogous terms can be used to generalize the Gaussian coecient, G(t). This is necessary so that a modi cation of the LQG analysis (see Bryson and Ho [2] or Lewis [7]) will work from the dynamic programming point of view. The LQGP problem with linear dynamics in the form of (1) has its origins in an early set of lecture notes by Wonham [11] on random di erential equations. A similar formulation,but including  Work supported in part by the National Science Foundation Grants DMS-9102343 and DMS-9301107.

1

jumps in the system parameters along with Poisson disturbances such as in (1), is discussed by Mariton [8] in his monograph on optimal control of linear systems with random jumps in system parameters. The latter problem with quadratic costs is more often is referred to as the Jump LQG or JLQG problem. The LQGP problem could also be considered as a JLQG problem, since Poisson term introduces a discrete jump in the state, superimposed on the continuous, yet nonsmooth Gaussian noise contribution. However, the JLQG problem covers noise other than Poisson and there is quite a di erence between the Markov chain noise introduced through system parameters and disturbances introduced as a term in the dynamics as in (1). The more general and current name for combinations of discrete (e.g., Markov chains or Poisson noise) and continuous systems is hybrid systems. The machinery for these generalizations is found in Hanson and Ryan [5] and related papers, where a concrete, constructive proof is given for the equivalency between the state mean and the quasi-deterministic mean derived form the in nitesimal mean and variance for general continuous time Markov noise with linear amplitudes. In Section 2 the rst and second moments of the SDE (1) are derived. In Section 3 the quadratic performance index and stochastic dynamic programming formulation are described. In Section 4 the formal optimal solution along with the Riccati equation are derived. In Sections 5 and 6, a computational example for a k-stage manufacturing model is considered.

2. In nitesimal Conditional Means and Variances

The Gaussian white noise term, dW(t), consists of r independent, standard Wiener processes dWi (t), for i = 1 to r. These Gaussian components have zero in nitesimal mean and diagonal covariance, Mean[dW(t)] = 0r1 and Covar[dW(t); dWT (t)] = Ir dt:

(2)

For cross-correlated noise, see Stengel [10]. We have left the Gaussian noise coecient independent of both state and control vectors for simplicity, but G(t) could well be generalized like those of the Poisson noise coecients, [H1 (t)x], [H2 (t)x] and H3 (t). The space-time Poisson noise terms, d ` (t) = [d`;i (t)]q 1 , consist of q` independent di erentials of space-time Poisson processes each. They are related to Poisson random measure, P`;i (dt; dz), formulation (see Gihman and Skorohod [4]): `

d`;i (t) =

Z

Z

`;i

zP`;i(dt; dz);

(3)

where z is the Poisson vector amplitude random variable of the d ` (t) vector processes, for ` = 1 to 3. These have mean Mean[d ` (t)] = ` (t)dt

Z

Z

`

z`(z; t)dz  `(t)Z`(t)dt;

(4)

where ` (t) is the Poisson rate and ` (z; t) is the density of the vector amplitude marks, and have variance Var[d ` (t)] = ` (t)dt

Z

Z

`

(z ? Z` )(z ? Z` )T ` (z; t)dz  ` (t)` (t)dt: 2

(5)

If the mark vector density does not exist even in terms of generalized functions, than ` (z; t)dz could be easily replaced by the corresponding distribution `(dz; z; t). Also, note that the mark distributions have not been assumed to be zero mean distributions, i.e., Z` 6= 0 for each `, so that there will be some added complexity of the formal solution as compared to the prior solutions of Wonham [11] or Mariton [8]. It is further assumed that all of the individual component terms of the Gaussian noise are independent of all of the Poisson processes, Covar[dW(t); d T` (t)] = 0rq ; (6) for all `. The j th jump of the `th space-time Poisson process at time t`;j causes the following jump in the state: 9 8 ? > = < [H1 (t`;j )X(t`;j )]z`;j ; ` = 1 > (7) X(t+`;j ) ? X(t?`;j ) = >: [H2 (t`;j )U(t?`;j )]z`;j ; ` = 2 >; : H3(t`;j )z`;j ; `=3 From the above statistical properties of the stochastic processes, dW and d ` , it follows that the conditional in nitesimal expectation of the state is (8) Mean[dX(t) j X(t) = x; U(t) = u] = [A(t)x + B (t)u + 1 (t)[H1 (t)x]Z1 + 2 (t)[H2 (t)u]Z2 + 3 (t)H3 (t)Z3 ]dt; and the conditional in nitesimal variance, `

h

Covar[dX(t) j X(t) = x; U(t) = u] = (GGT )(t) + 1 (t)[H1 (t)x]1 (t)[H1 (t)x]T + 2 (t)[H2 (t)u]2 (t)[H2 (it)u]T + 3 (t)H3 (t)3 (t)H3T (t) dt:

(9)

The conditional in nitesimal mean (8) and covariance (9) are the fundamental moments needed for application modeling.

3. Quadratic Performance Index and Dynamic Programming

A performance index is a measure of how optimal the performance of a system is, e.g., a queueing control model may use of the fraction of time that the servers are busy as a performance index. The performance index is an essential component in modeling a physical reality since it is used to determine the optimal control law or policy and the form of the optimal solution. The quadratic performance index is given in the quadratic, cost-to-go form Zt (10) V (X; U; t) = 21 XT (tf )S (tf )X(tf ) + 12 d [XT Q( )X( ) + UT ( )R( )U( )]; t where the time horizon is tf  t. The cost matrices are Q which is a symmetric positive semide nite m  m matrix, R which is a symmetric positive de nite n  n matrix, and S which is a symmetric positive semi-de nite m  m. A nontrivial terminal cost matrix S is essential for a nontrivial solution for this problem. The nal condition, (11) V (X(tf ); U(tf ); tf ) = 21 XT (tf )S (tf )X(tf ) f

3

yields the salvage cost. For simplicity, the quadratic performance index (10) was selected to extend the results of the LQG problem. A more general form of the instantaneous quadratic cost function can be found in Naimipour and Hanson [9]. The optimal, expected performance, v(x; t), is de ned as:

v(x; t)  Min [ Mean [V (X; U; t) j X(t) = x; U(t) = u]]: u(t;t ] P;W(t;t f

f]

(12)

The only other restrictions on the state and control are that they belong to the admissible classes for the state and control, respectively. Upon applying the principle of optimality to the optimal, expected performance index, (12,10), and the chain rule for Markov stochastic processes in continuous time, dv(X(t); t) = v(X(t + dt); t + dt) ? v(X(t); t) T = @v @t (X(t); t) + (A(t)X(t) + B(t)U(t)) rx v(X(t); t) + 12 (GGT )(t) : rx rTx v(X(t); t) dt + (GdW)T (t)rx v(X(t); t) (13) Z + [v(X(t) + [H1 (t)X(t)]z; t) ? v(X(t); t)] P1 (dt; dz) + +

ZZ1

ZZ2

Z3

[v(X(t) + [H2 (t)U(t)]z; t) ? v(X(t); t)] P2 (dt; dz) [v(X(t) + H3 (t)z; t) ? v(X(t); t)] P3 (dt; dz);

yields the partial di erential equation of stochastic dynamic programming, where

A:B=

XX

i

j

Ai;j Bi;j = Trace[AB ];

assuming symmetric matrices. The last three terms of (13) represents the total contributions of the jumps in the state from the three types of Poisson processes. Hence, the partial di erential equation of stochastic dynamic programming is h Min ( x ; t ) + (A(t)x + B (t)u)T rx v(x; t) 0 = @v @t u (14) + 21 (GGT )(t) : rxrTx v(x; t) + 21 xT Q(t)x + 12 uT R(t)u Z + 1 (t) [v(x + [H1 (t)x]z; t) ? v(x; t)] 1 (z; t)dz + 2 (t) + 3 (t)

ZZ1

Z Z 2 Z3

[v(x + [H2 (t)u]z; t) ? v(x; t)] 2 (z; t))dz 

[v(x + H3 (t)z; t) ? v(x; t)] 3 (z; t)dz ;

The backward partial di erential equation (PDE) (14) is known as the Hamilton-Jacobi-Bellman equation. It is subject to the nal condition: (15) v(x; tf ) = 21 xT S (tf )x: 4

The argument of the minimum function is called the Hamiltonian of the system.

4. Formal Solution

To solve (14), assume a modi cation solution of the form for a LQG problem (for the usual LQG, see Bryson and Ho [2] or Lewis [7]): Zt 1 1 T T v(x; t) = 2 x S (t)x + D (t)x + E (t) + 2 d (GGT )( ) : S ( ): (16) t The modi cation of the assumed solution form takes into the account that the Poisson mark distribution are not centered, i.e., not zero mean amplitudes, so a linear term with time dependent coecient DT (t) and a state-control independent term E (t) have been included along with an inhomogeneous type integral term to take care of the Gaussian contribution. The nal condition (15) is satis ed, provided that D(tf ) = 0 and E (tf ) = 0, since quadratic form coecient S (t) has been chosen as a backward extension of the nal value S (tf ) for t < tf . Substituting the ansatz (16) and its derivatives into the PDE of stochastic dynamic programming (14) and simplifying yields:   1 1 1 T T _ 0 = x 2 S (t) + A (t)S (t) + 2 Q(t) + 2 1 (t)C1 (t) x iT h + D_ (t) + 1 (t)ZH1T (t)D(t) + 3 (t)S (t)H3 (t)Z3 (t) x    (17) + E_ (t) + 21 3 (t) H3T (t)S (t)H3 (t) : 3 (t)    T   T + 12 3 (t) H3 (t)Z3 (t) S (t) H3 (t)Z3 (t) + 3 (t) H3 (t)Z3 (t) D(t)  + Min 21 uT (R(t) + 2 (t)C2 (t)) u u f

+ where

uT







B T (t) + 2 (t)ZH2T (t) S (t) x + 2 (t)ZH2T (t)D(t)

h

T 

;

i

C1(t)  [H1T (t)]i S (t)[H1 (t)]j : 1 (t) mm + ZH1T (t)S (t)ZH1 (t) + 2S (t)ZH1 (t); h

i

C2 (t)  [H2T (t)]i S (t)[H2 (t)]j : 2 (t) mm + ZH2T (t)S (t)ZH2 (t);

(18) (19)

and

# q i "X h T T ZH`(t)  Z` (t)H` (t) = Z `;k H`;i;k;j k=1 mm `

; for ` = 1 to 2; m1 = m; m2 = n: `

fCaution: If H` = [H`;i;j;k]mq m , then H`T = [H`;j;i;k]q mm . The usual transpose array

algebra, (AT B T )T = BA, does not apply to [ZT` H`T ]T for ` = 1 to 2, since the triple subscript bracket array construct in not standard with transposition causing only a change in the rst two subscripts.g Note that the state dependent solution form unknowns, S (t) and D(t) appear in the `

`

`

5

`

above coecient de nitions. The added complexity of the space-time Poisson distribution means added realism for the model. The unconstrained, optimal control or regular control, u ,is found by determining the critical point of the argument of the minimum in (17): h

i

u (t) = ? R(t) +  (t)Ce (t) ? 2

1

h





i

B T (t) + 2 (t)ZH2T (t) S (t) x + 2 (t)ZH2T (t)D(t) ; (20)

2

where Ce` = (C` + C`T ) is the symmetrized quadratic form coecient of C` , for ` = 1 to 2. Since the matrix R is positive de nite, R?1 exists, presumably extendible to [R + 2 Ce2 ], and u will be a minimum location. Note that the optimal control in (20) is also feedback control. However, this feedback control is not purely linear in the state x as it would be without the nonzero mean Poisson amplitude, but is an ane function of the state, i.e., a linear function plus state independent part. Using the optimal control (20) in (17) as well as the fact that x is an arbitrary member in the admissible class yields the backward, quasi-Riccati matrix equation for S (t): 0mm = S_ (t) + AT (t)S (t) + A(t)S (t) + Q(t) + 1 (t)Ce1 (t) (21) h



? B T (t) +  (t)ZH T (t) S (t)  S_ (t) ? S (S (t); t); 2

iT h

R(t) + 2 (t)Ce2 (t)

2

i?1 h



B T (t) + 2 (t)ZH2T (t) S (t)

i

where S (tf ) = Sf is speci ed. This is a not a Riccati matrix equation since the unknown quadratic form coecient S (t) appears in a highly nonlinear way in the inverse term [R + 2 Ce2 ]?1 , though linearly in the coecient C2 (t) of (19). Hence the Riccati matrix equation is replaced by a more general, nonlinear matrix di erential equation. In addition, there are the backward equations for the auxiliary coecients:   (22) 0m1 = D_ (t) + 1(t)ZH1T (t)D(t) + 3(t) SH3Z3 (t) h



ih

?  (t) B T (t) +  (t)ZH T (t) S (t) R(t) +  (t)Ce (t)  D_ (t) ? D(D(t); S (t); t); 2

and

2

2

2

2

i?1

ZH2T (t)D(t)

 T      (23) 0 = E_ (t) + 12 3 (t) (H3T SH3 : 3 (t) + 12 3 (t) H3 Z3 S H3 Z3 (t)  h i?1 T  + 3 (t) H3 Z3 D (t): ? 21 22 (t)DT (t)ZH2 (t) R(t) + 2 (t)Ce2 (t) ZH2T (t)D(t):  E_ (t) ? E (E (t); D(t); S (t); t);

Since the nal condition for the optimal, expected value v(x; t) is a pure quadratic form, the auxiliary function satisfy trivial nal conditions: D(tf ) = 0m1 and E (tf ) = 0. It is assumed that the nonlinear matrix di erential equation (21) for S (t) is solved rst and the resulting approximation for S (t) substituted into equation (22) for D(t), which is then solved, and then both results for S (t) and D(t) are substituted into equation (23) for the state-control independent term E (t). Equations (21, 22, 23) are uni-directionally coupled. Since S (t) is a symmetric matrix by being de ned with a quadratic form, only a triangle part of S (t) need be solved, or n  (n + 1)=2 component equations. Thus, for the whole coecient set fS (t); D(t); E (t)g, only n  (n +1)=2+ n +1 component equations need to be solve, so that for large n the count is O(n2 =2), asymptotically, which is the same order of e ort in getting the triangular part of S (t). 6

Notice that the assumed form of the optimal value (16) depends quite explicitly on the covariance of the Gaussian noise, but that the dependence on the Poisson noise is through the Riccati equation (21) and the associate auxiliary coecients in (22) and (23). The main di erence is due only to the di erence in the noise representation, in that, we only assumed state-control independent Gaussian noise.

5. Computational Example: k-stage Manufacturing Model

Consider a system of k stages for manufacturing an item during one planning period. The number of machines at each stage is n` , for ` = 1 to k. For every stage, each machine is considered identical and is subject to failure. The mean time until any machine fails is exponentially distributed. Once a machine fails, it is assumed that repairs will not complete during the planning period. Each stage is assumed to be independent, except for the restriction that the aggregate production at stage `, a` (t), is used as the initial material at stage ` + 1. The a` (t) represents the number of pieces that have gone through ` stages of the manufacturing process which are not defective and are ready to go on to stage ` + 1. Gaussian noise is used to model the number of defective pieces at each stage. See Boukas and Haurie [1], for instance, for a similar model, but with age dependence and multiple planning stages. The stage production rates are x` (t) = [x`;i (t)]n 1 , where n` is the number of machines, and the state aggregate production level vector is a(t) = [ai (t)]k1 , where k is the number of stages. The global, augmented state vector is X(t) = [x1 (t); : : : ; xk (t); a(t)]T . The demand at each stage, d` , is assumed to be known. The stage control vector, u` (t) = [u`;i (t)]n 1 , regulates the production levels of all n` machines for ` = 1 to k stages. The x`;i(t) represents the fraction of time that machine i in stage ` is busy, where u`;i (t) is the control for that particular machine. For every stage, ` = 1 to k, the de ning SDE for x` (t) is: `

`

dx` (t) = B`(t)u` (t)dt + [H`(t)x` (t)] d `(t);

(24)

where B` (t) is n`  n` diagonal matrix with diagonal elements B`;i;i(t) = b`;i(t), [H` (t)x` (t)] is n`  n` diagonal matrix with diagonal elements [H`(t)x` (t)]i;i = ?b`;i(t)(x`;i (t) + u`;i(t)), and d `(t) = [d`;i(t)]n 1 is a Poisson noise vector which assumes the value 1 if machine i fails, otherwise is 0. The b`;i (t) is 0 if machine i has failed prior to time t, and is 1 otherwise. The terms on the left hand side of (24) represent the change due to the control and the random failure of a machine. The form of the coecients of the Poisson noise is selected so that, if a machine fails, the fraction of time that machine is busy is identically 0. For every stage ` = 1 to k, the de ning SDE for a` (t) is: `

da` (t) = (a` (t) ? d` )dt + P` dt

? P`

n X `

j =1

n X `

j =1

b`;j (t) (x`;j (t) + u`;j (t)) + g(t)dW` (t)

(25)

b`;j (t)(x`;j (t) + u`;j (t))d`;j (t);

where P` is the maximum production capacity per unit time of any machine in stage ` and dW` (t) is the Gaussian noise associated with stage ` with amplitude g(t). The terms on the left hand side of (25) represent the change due to consumption of the aggregate production at stage ` + 1, the 7

amount produced, the random number of defective pieces produced, and the amount not produced due to machine failure, respectively.

6. Numerical Results for the Manufacturing Model

By the time of the conference, we plan to have a computational results concerning the manufacturing application.

7. Conclusions

These results should be of great interest because the LQGP problem is a theoretical model and a canonical computational model having several exact analytical results, except for solving the LQGP Riccati equation (21). Also, the problem will be useful for severe testing of physical devices in computer experiments as well as investigating large uctuations in nancial markets, because the Poisson random component is useful for modeling large random uctuations, whereas Gaussian noise is useful for relatively milder background uctuations.

References [1] E.-K. Boukas and A. Haurie, \Manufacturing ow control and preventive maintenance: A stochastic control approach," IEEE Trans. Autom. Control, vol. 35, pp. 1024-1031, Sept. 1990. [2] A. E. Bryson and Y. Ho, Applied Optimal Control, Ginn, Waltham, 1969. [3] J. J. Florentin, \Optimal control of systems with generalized Poisson inputs," ASME Trans. 85D (J. Basic Engr. 2), pp. 217-221, 1963. [4] I. I. Gihman and A. V. Skorohod, Stochastic Di erential Equations, Springer-Verlag, New York, 1972. [5] F. B. Hanson and D. Ryan, \Mean and quasi-deterministic equivalence for linear stochastic dynamics," Math. Biosci., vol. 93, pp. 1-14, 1989. [6] D. E. Kirk, Optimal Control Theory, Prentice-Hall, Englewood Cli s, New Jersey, 1970. [7] F. L. Lewis, Optimal Estimation with an Introduction to Stochastic Control Theory, Wiley, New York, 1986. [8] M. Mariton, Jump Linear Systems in Automatic Control, Marcel Dekker, New York, 1990. [9] K. Naimipour and F. B. Hanson, \Numerical Convergence for the Bellman Equation of Stochastic Optimal Control with Quadratic Costs and Constraints," Dynamics and Control 3, pp. 237-259, 1993. [10] R. F. Stengel, Stochastic Optimal Control, Wiley, New York, 1986. [11] W. M. Wonham, \Random Di erential Equations in Control Theory," in Probabilistic Methods in Applied Mathematics, vol. 2, edited by A. T. Bharucha-Reid, Academic Press, New York, pp. 131-212, 1970. 8

Suggest Documents