A Maximum Principle for Optimal Control of Discrete ... - IEEE Xplore

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 4, APRIL 2015

1121

A Maximum Principle for Optimal Control of Discrete-Time Stochastic Systems With Multiplicative Noise Xiangyun Lin and Weihai Zhang, Member, IEEE Abstract—The maximum principle (MP) for the discrete-time stochastic optimal control problems is proved. It is shown that the adjoint equations of the MP are a pair of backward stochastic difference equations. Index Terms—Backward stochastic difference equations, discrete-time stochastic systems, maximum principle. I. I NTRODUCTION In the past decades, the stochastic optimal control problems have been extensively studied, in particular, various results on stochastic maximum principle (MP) have appeared; see [1], [2], [4]–[8], [10]– [12] and the references therein. However, these results are mostly concentrated on the continuous-time stochastic Itô systems. As far as discrete-time stochastic systems with multiplicative noise, most results for MP are analogous to the deterministic systems, which are based on the Lagrange multiplier method [2], [14]. However, as said in [15], from Peng’s MP on optimal control of Itô systems [7], we know that the stochastic MP is different from the deterministic systems, which reflects the stochastic nature of this problem. It can be found that, up to now, few results appear on discrete-time version of Peng’s continuoustime MP. On the other hand, theoretically, the discrete-time backward stochastic difference equations (BSDEs), which are the counterparts of continuous-time BSDEs applied in [7], do not necessarily have adapted solutions (see Remark 3.3) in general. This is our main motivation to study the stochastic MP for discrete-time systems. In this paper, we will discuss the discrete-time stochastic MP associated with the following optimal control problem: minimize the cost functional

N −1

J(v) = E

l(xk , vk ) + Eh(xN )

(1)

N −1 where, v = {vk }k=0 is the admissible control variable with vk ∈ Uk , T N x = {xk }k=0 is the state variable and {ωk = (ωk1 , · · · , ωkd ) }N k=1 are N −1 the given random sequence. If the optimal control u = {uk }k=0 is given, in order to obtain the necessary condition for the optimization problem (1), (2), we extend the spike variation method to the discretetime stochastic systems, and let the perturbed admissible control u ¯k = (1 − δkm )uk + δkm v, v ∈ Um . Since, for the discrete-time systems, the time difference is constant and cannot be infinitesimal. So, Peng’s method in [7], where the variation of J(v) is based on the time difference, cannot be applied to the discrete-time case directly. In order to overcome this difficulty, we assume that v has the form v = um + Δv, where is a positive constant number and Δv is a bounded random variable. Based on this idea, we obtain the variation of the cost functional J(v) and the MP for the optimal control problem (1), (2). Moreover, we derive the adjoint equations of the discrete-time stochastic MP as a pair of BSDEs. For convenience, we adopt the following notations: Rn : the set of all real n-dimensional vectors; Rn×d : the set of n × d real matrices; AT , xT : transpose of a matrix A or vector x; tr(A): trace of a square matrix A; ·, ·: inner product of two vectors or matrices with x, y = tr(xT y) and the corresponding norm |x| = x, x; A ⊗ B: the Kronecker product of two matrices A and B; δij : The Kronecker delta, i.e., δij = 1 when i = j while δij = 0 when i = j; Id : d × d identity matrix.

II. P RELIMINARIES In a given complete probability space (Ω, F, P), let {ωk }k=0,1,···,N be a sequence of F-measurable, Rd -valued random variables and satisfy the following conditions (i, j, k = 0, 1, · · · , N ): (i) ω0 , ω1 , · · · , ωN are independent, and for every ωk =

k=0

subject to

ωk1 , · · · , ωkd

T

, ωk1 , · · · , ωkd are independent R-valued

random variables. xk+1 = g(xk , vk ) + σ(xk , vk )ωk ,

(2)

x0 ∈ Rn , k = 1, 2, · · · , N − 1

(ii)

Eωk = 0d×1 , E ωi ωjT = δij Id .

(3)

Let Fk ⊂ F be the σ-field generated by ω0 , ω1 , · · · , ωk−1 , i.e. Manuscript received June 3, 2013; revised December 3, 2013 and May 21, 2014; accepted July 26, 2014. Date of publication July 31, 2014; date of current version March 20, 2015. This work was supported by the NSF of China (No. 61174078), by the Research Fund for the Taishan Scholar Project of Shandong Province of China, SDUST Research Fund (No. 2011KYTD105), and State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources(Grant LAPS13018). Recommended by Associate Editor Z. Wang. X. Lin is with the College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao 266590, China (e-mail: [email protected]). W. Zhang is with the College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TAC.2014.2345243

Fk = σ{ω0 , ω1 , · · · , ωk }, k = 0, 1, · · · , N. and, F−1 = {∅, Ω}(∅ is the empty set). Set F = {Fk }N k=0 . A random sequence y = {yk }N k=0 is called F-predictable, if for every k = 0, 1, · · · , N , the random variable yk is Fk−1 -measurable. Denote L2 (Ω, Fk ; Rn ) the set of all Rn -valued Fk -measurable random variables X with E|X|2 < ∞. Lemma 2.1: (Riesz representation theorem) If f : H → R is a continuous linear functional on the Hilbert space H, then there exists a unique yf ∈ H such that f (x) = yf , xH ,

∀x ∈ H

where ·, ·H denotes the inner product.

0018-9286 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1122


Let

For the given data as φ

pk = E Fk−1 , i=k+1 N T Aik+1 li ωkT Fk−1 , Qk = E

N −1 N −1 = {φk }k=0 , ψ = {ψk }k=0 , φk ∈ L2 (Ω, Fk−1 ; Rn ), 1 2 d ψk = ψk , ψk , · · · , ψk ∈ L2 (Ω, Fk−1 ; Rn×d )

suppose z(φ, ψ) = {zk }N k=0 satisfies the following discrete-time difference equation:

⎧ ⎨ ⎩

zk+1 = (gk zk + φk ) +

d

k = 0, · · · , N − 1.

(4)

j=1

Then, for Qk = (Q1k , Q2k , · · · , Qdk ), we have

where, gk , σkj ∈ L2 (Ω, Fk−1 ; Rn×n ), j = 1, 2, · · · , d, are bounded Rn×n -valued random variables. For each given lk ∈ L2 (Ω, Fk−1 ; Rn ), we consider the following linear functional: I(φ, ψ) = E

lk , zk .

Qjk

N

=E

T Aik+1 li ωkj

i=k+1

HN = L2 (Ω, Fk ; Rn ) × L2 (Ω, Fk ; Rn×d )

⎧

d ⎪ jT j T ⎪ p = E l + g p + σ Q F ⎪ k k+1 k+1 k+1 k+1 k+1 k−1 ⎪ ⎪ j=1 ⎪

⎨ d jT j T T Qk = E lk+1 +gk+1 pk+1 + σk+1 Qk+1 ωk Fk−1 , (6) ⎪ ⎪ j=1 ⎪ ⎪ ⎪ ⎪ pN −1 = E[lN |FN −2 ], QN −1 = E[lN ωN −1 |FN −2 ], ⎩ k = 0, 1, · · · , N − 2.

It is easy to see that BSDE (6) has a unique F−predictable solution N −1 and (p, Q) = {(pk , Qk )}k=0 (pk , Qk ) ∈ L2 (Ω, Fk−1 ; Rn ) × L2 (Ω, Fk−1 ; Rn×d ). Lemma 2.2: Suppose (p, Q) is the solution of (6), then the functional I(·, ·) can be uniquely represented as N −1

[pk , φk + Qk , ψk ] .

(7)

k=0

d

T

Proof: Denote Ak = gk + j=1 σkj ωkj , k = 0, 1, · · · , N − 1, then the solution of difference (4) can be solved by zk =

k−1

Aki+1

φi +

i=0

d

ψij ωij

,k ≥ 1

j=1

where, Aki , 1 ≤ i ≤ k ≤ N is defined as Aki = In when i = k, and Aki = Ak−1 Ak−2 · · · Ai when 1 ≤ i < k. Substituting the above zk into (5), we have

N −1

I(φ, ψ) = E

k=0

N

Aik+1

T

l i , φk

i=k+1

+

N

I(φ, ψ) = E

E

N

k=0

T Aik+1 li |Fk−1

i=k+1

+

E

N

i=k+1

k=0

Aik+1

T

αk1 , βk1 ∈ L2 (Ω, Fk ; Rn ), αk2 , βk2 ∈ L2 (Ω, Fk ; Rn×d ), α=

αk1 , αk2

N −1 k=0

,β =

βk1 , βk2

N −1 k=0

∈ HN .

It is easy to verify that I(φ, ψ) is a continuous linear functional on HN , (φ, ψ) ∈ HN and (pk , Qk ), k = 0, 1, · · · , N − 1 satisfy the BSDE (6). By the Riesz representation theorem (Lemma 2.1), there N −1 ∈ HN satisfying (7). This proves the exists a unique {(pk , Qk )}k=0 uniqueness of the representation of (7). III. M AXIMUM P RINCIPLE FOR THE S TOCHASTIC D ISCRETE -T IMES S YSTEMS In this section, we suppose that, in the optimal control problem (1), N −1 , Uk = L2 (Ω, Fk ; Rnv ) (2), the admissible control set U = {Uk }k=0 1 d T N and {ωk = (ωk , · · · , ωk ) }k=1 are the random sequence and satisfy (3). We also assume that Assumption 1: φ is twice continuously differentiable w.r.t. x and v, φx , φxx , φv , φvv , lxx are bounded, where φ = g, σ, h. ωk has the fourth moment, E|ωk |4 < ∞. N −1 is the optimal control of problem Suppose that u = {uk }k=0 (1), (2). For fixed integer 0 ≤ m ≤ N − 1, construct the perturbed admissible control u ¯k = (1 − δkm )uk + δkm v, v ∈ Um . In the rest of this paper, we let v have the form of v = um + Δv, > 0, where Δv is an arbitrary random variable with values in Um ⊂ L2 (Ω, Fm ; Rnv ), such that (8)

ω∈Ω

Then u ¯k = uk + δkm Δv. Let x ¯ = {¯ xk }N k=0 be the solution of (2) corresponding to the control u ¯ with the initial value x ¯0 = x0 . Then, we have the following lemmas. Lemma 3.1: Under Assumption 1, the following hold:

, φk

li ωkT |Fk−1 , φk

where

sup |Δv(ω)| < ∞.

αk1 , βk1 + αk2 , βk2

k=0

.

By the smoothing property of conditional expectation, we have

N −1

α, βHN = E

T Aik+1 li ωkT , ψk

i=k+1

N −1

N −1

with the inner product as

Construct a pair of discrete-time BSDEs as follows:

Fk−1 , j = 1, · · · , d.

Denote the Hilbert space

(5)

k=0

I(φ, ψ) = E

T Aik+1 li

i=k+1

(σkj zk + ψkj )ωkj ,

z0 = 0, k = 0, 1, · · · , N − 1,

N

N

xk − xk |2 ≤ C1 2 E|Δv|2 sup E|¯

(9)

0≤k≤N

.

xk − xk |4 ≤ C1 4 E|Δv|4 . sup E|¯ 0≤k≤N

(10)


Let y = {yk }N k=0 be the solution of the following difference equation

1123

Since

(11) we have

E|Zk+1 |2 ≤ C3 E|Zk |2 + C3 4 E|Δv|4 ,

xk − xk − yk |2 ≤ M 4 E|Δv|4 , sup E|¯

E|Zm+1 |2 ≤ C3 4 E|Δv|4 , k = m + 1, · · · , N − 1.

(12)

By iteration, the inequality (12) can be shown. By Lemma 3.2, we see that yk is the first-order variation of xk . So, (11) is called the first-order variational equation of (2). Similarly, we can verify that the variation of J(v) at u is

0≤k≤N

with M > 0 being a constant. ¯k − xk , then x ˆ0 = 0 and Proof: Denote x ˆk = x

x ˆk+1 = gx (xk , uk )ˆ xk + φ1k + δkm gv (xm , um )Δv + φ2m +

d 2 Eσxj (xk , uk )Zk +ψk1,j j=1

Then, we obtain the following result. Lemma 3.2: Under Assumption 1, we have

d

2

E|Zk+1 |2 = Egx (xk , uk )Zk +φ1k +

yk+1 = [gx (xk , uk )yk + δkm gv (xm , um )Δv] +[σx (xk , uk )yk + δkm σv (xm , um )Δv]ωk+1 y0 = 0.

N −1

2,j

σxj (xk , uk )ˆ xk + ψk1,j + δkm σvj (xm , um )Δv + ψm

jk

LJ(u) = E

[lx (xk , uk ), yk + δkm lv (xm , um ), Δv]

k=0

j=1

+ E hx (xN ), yN .

where

⎡ ϕ (u, v) x1 1 ⎢ ϕ2x1 (u, v) ϕx (u, v) = ⎢ .. ⎣

. ϕxn1 (u, v)

ϕ1x2 (u, v) ϕx2 (u, v) 2 .. . ϕx2 (u, v)n

··· ··· ···

We take the Hamiltonian function as

⎤

ϕ1xn (u, v) ϕx2n (u, v) ⎥ ⎥ .. ⎦ . n ϕxn (u, v)

H(x, v, p, Q) = l(x, v) + p, g(x, v) + x ∈ R ,v ∈ R

0 .. .

···

n ˆ k , uk ) ϕn xx (xk + θk x

⎦

0 .. .

···

˜n ˆm ) ϕn vv (xk , uk + θk u

⎦.

In above notations, 1n = (1, 1, · · · , 1)T ∈ Rn , ϕ = g, σ j , j = 1, · · · , d and ϕ = (ϕ1 , ϕ2 , · · · , ϕn )T , and all θ, such as θki , θ˜ki , satisfy 0 < θ < 1, and u ˆm = Δv. By the boundness of ϕxx and Lemma 3.1, we have the following: E|φ1k |2 E|φ2m |2 E|ψk1,j |2 2,j 2 E|ψm |

4

Hv (xm , um , pm , Qm ) = 0

Zk+1 = gx (xk , uk )Zk + φ1k +

d

a.s.

(15)

2,j j ψm m , k = m + 1, · · · , N − 1.

N −1

LJ(u) = E

σxj (xk , uk )Zk + ψk1,j ωk ,

j=1

j=1

(14)

Proof: By Lemma 2.2, we have

≤ C2 4 E|Δv|4 , k ≥ m + 1, j = 1, · · · , d.

d

k=0

4

≤ C2 4 E|Δv|4 ,

N −1

to BSDE (13) satisfy the following equality:

Denote Zk = x ¯k − xk − yk , then Zk = 0 when k = 0, 1, · · · , m. For k ≥ m + 1, we have

Zm+1 = φ2m +

(pk , Qk ) ∈ L2 (Ω, Fk ; Rn ) × L2 (Ω, Fk ; Rn )

4

≤ C2 E|Δv| , 4

(13)

Now, we can obtain our main result, which is a necessary condition, called the MP of the optimal control problem (1), (2). N −1 Theorem 3.1: Under Assumption 1, if ({xk }N k=0 , {uk }k=0 ) are solutions of the optimal control problem (1), (2), then the solutions

≤ C1 E|ˆ xk | ≤ C2 E|Δv| , 4

Q = (Q , · · · , Qd ) ∈ Rn×d . 1

k = 0, 1, · · · , N − 2.

⎤

···

,p ∈ R , n

⎧ pk = E lx (xk+1 , uk+1 ) + gxT (xk+1 , uk+1 )pk+1 ⎪ ⎪ ⎪ ⎪ d ⎪ ⎪ j jT ⎪ + σx (xk+1 , uk+1 )Qk+1 |Fk−1 ⎪ ⎪ ⎪ j=1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ Qk = E lx (xk+1 , uk+1 ) + gxT (xk+1 , uk+1 )pk+1 $ d j jT T ⎪ ⎪ + σ (x , u )Q ω |F k+1 k+1 k−1 x k k+1 ⎪ ⎪ j=1 ⎪ ⎪ ⎪ ⎪ ⎪ pN −1 = E[hx (xN )|FN −2 ] ⎪ ⎪ ⎪ ⎪ Q = E[hx (xN )ωN −1 |FN −2 ] ⎪ ⎪ ⎩ N −1

⎤

···

nv

Construct the discrete-time BSDE as

and ϕvv (xk , uk + θk u ˆ ) ⎡ 1m ϕvv (xk , uk + θ˜k1 u ˆm ) .. ⎣ = . 0

Qj , σ j (x, v) ,

j=1 n

and ϕv has the similar meaning. Moreover, φ1k = (ˆ xTk ⊗ In )gxx × j ˆk , uk )(ˆ xk ⊗ 1n ), ψk1,j = (ˆ xTk ⊗ In )σxx (xk + θkj x ˆ k , uk ) × (xk +θk x (ˆ xk ⊗ 1n ) when k ≥ m + 1, and φ1k = 0 when k ≤ m. φ2m = 2,j = 2 (Δv T ⊗ 2 (Δv T ⊗ In )gvv (xm , um + θ˜m Δv)(Δv ⊗ 1n ), ψm j j (xm , um + θ˜m Δv)(Δv ⊗ 1n ), where In )σvv ϕxx (xk + θk x ˆ k , uk ) ⎡ 1 ϕxx (xk + θk1 x ˆ k , uk ) .. ⎣ = . 0

d

[pk , δkm gv (xm , um )Δv

k=0

+ Qk , δkm σv (xm , um )Δv + δkm lv (xm , um ), Δv] . So LJ(u) = E Hv (xm , um , pm , Qm ), Δv . Since u is the optimal control, it follows LJ(u) = 0.

1124


This implies E Hv (xm , um , pm , Qm ), Δv = 0,

∀Δv ∈ U.

(16)

generated by {ωk }. Considering the one-step BSDE of (23). Let N = 1, f1 (y, z) = y, y, z ∈ R, and ω0 takes values in {− 3/2, 0, 3/2} with probability

For every integer K > 0, let Δv = Hv (xm , um , pm , Qm )1{|Hv (xm ,um ,pm ,Qm )|

A Maximum Principle for Optimal Control of Discrete ... - IEEE Xplore

A Maximum Principle for Optimal Control of Discrete ... - IEEE Xplore

Suggest Documents

Maximum principle for optimal control of sterilization of ... - IEEE Xplore

A MAXIMUM PRINCIPLE FOR OPTIMAL CONTROL PROBLEMS

THE STOCHASTIC MAXIMUM PRINCIPLE IN OPTIMAL CONTROL ...

A Maximum Principle for Hybrid Optimal Control Problems with ...

Optimal Stochastic Control of Discrete-Time Systems ... - IEEE Xplore

Research Article Maximum Principle for Optimal Control Problems of

Maximum principle for optimal control of ... - Semantic Scholar

Stochastic maximum principle for optimal control of SPDEs

Maximum principle for optimal control of forward-backward doubly ...

ON THE DISCRETE MAXIMUM PRINCIPLE FOR THE

Discrete-Time Nonlinear Systems Inverse Optimal Control - IEEE Xplore

Discrete-Time Optimal Reset Control Design and ... - IEEE Xplore

DISCRETE MAXIMUM PRINCIPLE FOR PRISMATIC FINITE

A MAXIMUM PRINCIPLE FOR STOCHASTIC CONTROL ... - CiteSeerX

Maximum principle for the general optimal control ... - Semantic Scholar

The Maximum Principle for Optimal Control Problems ... - Springer Link

A Sensorless Scalar-Control Strategy for Maximum ... - IEEE Xplore

Discrete maximum principle and a Delaunay-type mesh condition for

DISCRETE MAXIMUM PRINCIPLE AND ADEQUATE ... - CiteSeerX

Optimal State Estimation for Discrete-Time Markovian ... - IEEE Xplore

Analysis of Maximum Traffic Intensity and Optimal ... - IEEE Xplore

A maximum principle for a stochastic control problem with multiple ...

Vector Control Strategy for Maximum Wind-Power ... - IEEE Xplore

Optimal Control Approaches for Analysis of Energy Use ... - IEEE Xplore