IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 4, APRIL 2015
A Maximum Principle for Optimal Control of Discrete-Time Stochastic Systems With Multiplicative Noise

Xiangyun Lin and Weihai Zhang, Member, IEEE

Abstract: The maximum principle (MP) for discrete-time stochastic optimal control problems is proved. It is shown that the adjoint equations of the MP are a pair of backward stochastic difference equations.

Index Terms: Backward stochastic difference equations, discrete-time stochastic systems, maximum principle.

I. INTRODUCTION

In the past decades, stochastic optimal control problems have been extensively studied; in particular, various results on the stochastic maximum principle (MP) have appeared; see [1], [2], [4]–[8], [10]–[12] and the references therein. However, these results mostly concentrate on continuous-time stochastic Itô systems. For discrete-time stochastic systems with multiplicative noise, most results on the MP are analogous to those for deterministic systems and are based on the Lagrange multiplier method [2], [14]. However, as noted in [15], Peng's MP for optimal control of Itô systems [7] shows that the stochastic MP differs from the deterministic one, which reflects the stochastic nature of the problem. Up to now, few results have appeared on a discrete-time version of Peng's continuous-time MP. On the other hand, the discrete-time backward stochastic difference equations (BSDEs), which are the counterparts of the continuous-time BSDEs applied in [7], do not in general have adapted solutions (see Remark 3.3). This is our main motivation for studying the stochastic MP for discrete-time systems.

In this paper, we discuss the discrete-time stochastic MP associated with the following optimal control problem: minimize the cost functional
$$J(v) = E\sum_{k=0}^{N-1} l(x_k, v_k) + E\,h(x_N) \tag{1}$$

subject to

$$x_{k+1} = g(x_k, v_k) + \sigma(x_k, v_k)\,\omega_k, \quad x_0 \in \mathbb{R}^n, \quad k = 1, 2, \cdots, N-1 \tag{2}$$

where $v = \{v_k\}_{k=0}^{N-1}$ is the admissible control with $v_k \in U_k$, $x = \{x_k\}_{k=0}^{N}$ is the state, and $\{\omega_k = (\omega_k^1, \cdots, \omega_k^d)^T\}_{k=1}^{N}$ is a given random sequence. If the optimal control $u = \{u_k\}_{k=0}^{N-1}$ is given, then, in order to obtain the necessary condition for the optimization problem (1), (2), we extend the spike variation method to discrete-time stochastic systems and take the perturbed admissible control $\bar{u}_k = (1 - \delta_{km})u_k + \delta_{km}v$, $v \in U_m$. For discrete-time systems, the time difference is constant and cannot be infinitesimal, so Peng's method in [7], where the variation of $J(v)$ is based on the time difference, cannot be applied to the discrete-time case directly. To overcome this difficulty, we assume that $v$ has the form $v = u_m + \varepsilon\Delta v$, where $\varepsilon$ is a positive constant and $\Delta v$ is a bounded random variable. Based on this idea, we obtain the variation of the cost functional $J(v)$ and the MP for the optimal control problem (1), (2). Moreover, we derive the adjoint equations of the discrete-time stochastic MP as a pair of BSDEs.

For convenience, we adopt the following notations: $\mathbb{R}^n$: the set of all real $n$-dimensional vectors; $\mathbb{R}^{n\times d}$: the set of $n \times d$ real matrices; $A^T$, $x^T$: transpose of a matrix $A$ or vector $x$; $\mathrm{tr}(A)$: trace of a square matrix $A$; $\langle\cdot,\cdot\rangle$: inner product of two vectors or matrices with $\langle x, y\rangle = \mathrm{tr}(x^T y)$ and the corresponding norm $|x| = \sqrt{\langle x, x\rangle}$; $A \otimes B$: the Kronecker product of two matrices $A$ and $B$; $\delta_{ij}$: the Kronecker delta, i.e., $\delta_{ij} = 1$ when $i = j$ while $\delta_{ij} = 0$ when $i \neq j$; $I_d$: the $d \times d$ identity matrix.

II. PRELIMINARIES

In a given complete probability space $(\Omega, \mathcal{F}, P)$, let $\{\omega_k\}_{k=0,1,\cdots,N}$ be a sequence of $\mathcal{F}$-measurable, $\mathbb{R}^d$-valued random variables satisfying the following conditions ($i, j, k = 0, 1, \cdots, N$):

(i) $\omega_0, \omega_1, \cdots, \omega_N$ are independent, and for every $\omega_k = (\omega_k^1, \cdots, \omega_k^d)^T$, the components $\omega_k^1, \cdots, \omega_k^d$ are independent $\mathbb{R}$-valued random variables.

(ii)

$$E\omega_k = 0_{d\times 1}, \quad E\left[\omega_i\omega_j^T\right] = \delta_{ij}I_d. \tag{3}$$
Let $\mathcal{F}_k \subset \mathcal{F}$ be the $\sigma$-field generated by $\omega_0, \omega_1, \cdots, \omega_k$, i.e.

$$\mathcal{F}_k = \sigma\{\omega_0, \omega_1, \cdots, \omega_k\}, \quad k = 0, 1, \cdots, N$$

and $\mathcal{F}_{-1} = \{\emptyset, \Omega\}$ ($\emptyset$ is the empty set). Set $\mathbb{F} = \{\mathcal{F}_k\}_{k=0}^{N}$. A random sequence $y = \{y_k\}_{k=0}^{N}$ is called $\mathbb{F}$-predictable if, for every $k = 0, 1, \cdots, N$, the random variable $y_k$ is $\mathcal{F}_{k-1}$-measurable. Denote by $L^2(\Omega, \mathcal{F}_k; \mathbb{R}^n)$ the set of all $\mathbb{R}^n$-valued $\mathcal{F}_k$-measurable random variables $X$ with $E|X|^2 < \infty$.

Lemma 2.1: (Riesz representation theorem) If $f : H \to \mathbb{R}$ is a continuous linear functional on the Hilbert space $H$, then there exists a unique $y_f \in H$ such that

$$f(x) = \langle y_f, x\rangle_H, \quad \forall x \in H$$

where $\langle\cdot,\cdot\rangle_H$ denotes the inner product.

Manuscript received June 3, 2013; revised December 3, 2013 and May 21, 2014; accepted July 26, 2014. Date of publication July 31, 2014; date of current version March 20, 2015. This work was supported by the NSF of China (No. 61174078), by the Research Fund for the Taishan Scholar Project of Shandong Province of China, SDUST Research Fund (No. 2011KYTD105), and State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources (Grant LAPS13018). Recommended by Associate Editor Z. Wang.
X. Lin is with the College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao 266590, China (e-mail: [email protected]).
W. Zhang is with the College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAC.2014.2345243
0018-9286 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
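To make the problem (1), (2) concrete, the following is a minimal Monte Carlo sketch for estimating the cost functional $J(v)$. It is an illustration only, not part of the paper: the function name `simulate_cost`, the use of a state-feedback rule `policy(k, x)` as the control, and the choice of standard Gaussian noise (which has zero mean and identity covariance as required by (3)) are all assumptions made here.

```python
import numpy as np

def simulate_cost(g, sigma, l, h, policy, x0, N, d, n_paths=10000, seed=0):
    """Monte Carlo estimate of J(v) = E sum_{k<N} l(x_k, v_k) + E h(x_N).

    g(x, v) returns an n-vector, sigma(x, v) an (n, d) matrix;
    policy(k, x) is a hypothetical feedback rule producing v_k.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_paths):
        x = np.array(x0, dtype=float)
        cost = 0.0
        for k in range(N):
            v = policy(k, x)
            cost += l(x, v)
            # Assumed noise law: standard Gaussian, so E w = 0 and E w w^T = I_d.
            omega = rng.standard_normal(d)
            x = g(x, v) + sigma(x, v) @ omega
        total += cost + h(x)
    return total / n_paths
```

With `sigma` returning a zero matrix the system is deterministic and the estimate reduces to the exact running-plus-terminal cost, which gives a simple way to check the bookkeeping before adding noise.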
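For the given data $\phi = \{\phi_k\}_{k=0}^{N-1}$, $\psi = \{\psi_k\}_{k=0}^{N-1}$, $\phi_k \in L^2(\Omega, \mathcal{F}_{k-1}; \mathbb{R}^n)$, $\psi_k = (\psi_k^1, \psi_k^2, \cdots, \psi_k^d) \in L^2(\Omega, \mathcal{F}_{k-1}; \mathbb{R}^{n\times d})$, suppose $z(\phi, \psi) = \{z_k\}_{k=0}^{N}$ satisfies the following discrete-time difference equation:

$$\begin{cases} z_{k+1} = (g_k z_k + \phi_k) + \sum\limits_{j=1}^{d} (\sigma_k^j z_k + \psi_k^j)\,\omega_k^j, \\ z_0 = 0, \quad k = 0, 1, \cdots, N-1, \end{cases} \tag{4}$$

where $g_k, \sigma_k^j \in L^2(\Omega, \mathcal{F}_{k-1}; \mathbb{R}^{n\times n})$, $j = 1, 2, \cdots, d$, are bounded $\mathbb{R}^{n\times n}$-valued random variables. For each given $l_k \in L^2(\Omega, \mathcal{F}_{k-1}; \mathbb{R}^n)$, we consider the following linear functional:

$$I(\phi, \psi) = E\sum_{k=0}^{N} \langle l_k, z_k\rangle. \tag{5}$$

Construct a pair of discrete-time BSDEs as follows:

$$\begin{cases} p_k = E\Big[\, l_{k+1} + g_{k+1}^T p_{k+1} + \sum\limits_{j=1}^{d} (\sigma_{k+1}^{j})^T Q_{k+1}^j \,\Big|\, \mathcal{F}_{k-1}\Big], \\ Q_k = E\Big[\Big( l_{k+1} + g_{k+1}^T p_{k+1} + \sum\limits_{j=1}^{d} (\sigma_{k+1}^{j})^T Q_{k+1}^j \Big)\omega_k^T \,\Big|\, \mathcal{F}_{k-1}\Big], \\ p_{N-1} = E[\,l_N \mid \mathcal{F}_{N-2}], \quad Q_{N-1} = E[\,l_N\,\omega_{N-1}^T \mid \mathcal{F}_{N-2}], \\ k = 0, 1, \cdots, N-2. \end{cases} \tag{6}$$

It is easy to see that BSDE (6) has a unique $\mathbb{F}$-predictable solution $(p, Q) = \{(p_k, Q_k)\}_{k=0}^{N-1}$ and $(p_k, Q_k) \in L^2(\Omega, \mathcal{F}_{k-1}; \mathbb{R}^n) \times L^2(\Omega, \mathcal{F}_{k-1}; \mathbb{R}^{n\times d})$.

Lemma 2.2: Suppose $(p, Q)$ is the solution of (6); then the functional $I(\cdot,\cdot)$ can be uniquely represented as

$$I(\phi, \psi) = E\sum_{k=0}^{N-1} \big[\langle p_k, \phi_k\rangle + \langle Q_k, \psi_k\rangle\big]. \tag{7}$$

Proof: Denote $A_k = g_k + \sum_{j=1}^{d} \sigma_k^j\omega_k^j$, $k = 0, 1, \cdots, N-1$; then the solution of the difference equation (4) is

$$z_k = \sum_{i=0}^{k-1} A_{i+1}^{k}\Big(\phi_i + \sum_{j=1}^{d} \psi_i^j\omega_i^j\Big), \quad k \ge 1$$

where $A_i^k$, $1 \le i \le k \le N$, is defined as $A_i^k = I_n$ when $i = k$, and $A_i^k = A_{k-1}A_{k-2}\cdots A_i$ when $1 \le i < k$. Substituting the above $z_k$ into (5), we have

$$I(\phi, \psi) = E\sum_{k=0}^{N-1}\Big[\Big\langle \sum_{i=k+1}^{N} (A_{k+1}^{i})^T l_i,\ \phi_k\Big\rangle + \Big\langle \sum_{i=k+1}^{N} (A_{k+1}^{i})^T l_i\,\omega_k^T,\ \psi_k\Big\rangle\Big].$$

By the smoothing property of conditional expectation, we have

$$I(\phi, \psi) = E\sum_{k=0}^{N-1}\Big[\Big\langle E\Big[\sum_{i=k+1}^{N} (A_{k+1}^{i})^T l_i \,\Big|\, \mathcal{F}_{k-1}\Big],\ \phi_k\Big\rangle + \Big\langle E\Big[\sum_{i=k+1}^{N} (A_{k+1}^{i})^T l_i\,\omega_k^T \,\Big|\, \mathcal{F}_{k-1}\Big],\ \psi_k\Big\rangle\Big].$$

Let

$$p_k = E\Big[\sum_{i=k+1}^{N} (A_{k+1}^{i})^T l_i \,\Big|\, \mathcal{F}_{k-1}\Big], \quad Q_k = E\Big[\sum_{i=k+1}^{N} (A_{k+1}^{i})^T l_i\,\omega_k^T \,\Big|\, \mathcal{F}_{k-1}\Big], \quad k = 0, \cdots, N-1.$$

Then, for $Q_k = (Q_k^1, Q_k^2, \cdots, Q_k^d)$, we have

$$Q_k^j = E\Big[\sum_{i=k+1}^{N} (A_{k+1}^{i})^T l_i\,\omega_k^j \,\Big|\, \mathcal{F}_{k-1}\Big], \quad j = 1, \cdots, d.$$

Denote the Hilbert space

$$H_N = L^2(\Omega, \mathcal{F}_k; \mathbb{R}^n) \times L^2(\Omega, \mathcal{F}_k; \mathbb{R}^{n\times d})$$

with the inner product as

$$\langle\alpha, \beta\rangle_{H_N} = E\sum_{k=0}^{N-1}\big[\langle\alpha_k^1, \beta_k^1\rangle + \langle\alpha_k^2, \beta_k^2\rangle\big]$$

where $\alpha_k^1, \beta_k^1 \in L^2(\Omega, \mathcal{F}_k; \mathbb{R}^n)$, $\alpha_k^2, \beta_k^2 \in L^2(\Omega, \mathcal{F}_k; \mathbb{R}^{n\times d})$, $\alpha = \{(\alpha_k^1, \alpha_k^2)\}_{k=0}^{N-1}$, $\beta = \{(\beta_k^1, \beta_k^2)\}_{k=0}^{N-1} \in H_N$.

It is easy to verify that $I(\phi, \psi)$ is a continuous linear functional on $H_N$, $(\phi, \psi) \in H_N$, and that $(p_k, Q_k)$, $k = 0, 1, \cdots, N-1$, satisfy the BSDE (6). By the Riesz representation theorem (Lemma 2.1), there exists a unique $\{(p_k, Q_k)\}_{k=0}^{N-1} \in H_N$ satisfying (7). This proves the uniqueness of the representation (7).

III. MAXIMUM PRINCIPLE FOR THE STOCHASTIC DISCRETE-TIME SYSTEMS

In this section, we suppose that, in the optimal control problem (1), (2), the admissible control set is $U = \{U_k\}_{k=0}^{N-1}$, $U_k = L^2(\Omega, \mathcal{F}_k; \mathbb{R}^{n_v})$, and that $\{\omega_k = (\omega_k^1, \cdots, \omega_k^d)^T\}_{k=1}^{N}$ is the random sequence satisfying (3). We also assume the following.

Assumption 1: $\phi$ is twice continuously differentiable w.r.t. $x$ and $v$; $\phi_x$, $\phi_{xx}$, $\phi_v$, $\phi_{vv}$, $l_{xx}$ are bounded, where $\phi = g, \sigma, h$. $\omega_k$ has a finite fourth moment, $E|\omega_k|^4 < \infty$.

Suppose that $u = \{u_k\}_{k=0}^{N-1}$ is the optimal control of problem (1), (2). For a fixed integer $0 \le m \le N-1$, construct the perturbed admissible control $\bar{u}_k = (1 - \delta_{km})u_k + \delta_{km}v$, $v \in U_m$. In the rest of this paper, we let $v$ have the form $v = u_m + \varepsilon\Delta v$, $\varepsilon > 0$, where $\Delta v$ is an arbitrary random variable with values in $U_m \subset L^2(\Omega, \mathcal{F}_m; \mathbb{R}^{n_v})$, such that

$$\sup_{\omega\in\Omega} |\Delta v(\omega)| < \infty. \tag{8}$$

Then $\bar{u}_k = u_k + \delta_{km}\varepsilon\Delta v$. Let $\bar{x} = \{\bar{x}_k\}_{k=0}^{N}$ be the solution of (2) corresponding to the control $\bar{u}$ with the initial value $\bar{x}_0 = x_0$. Then, we have the following lemmas.

Lemma 3.1: Under Assumption 1, the following hold:

$$\sup_{0\le k\le N} E|\bar{x}_k - x_k|^2 \le C_1\varepsilon^2 E|\Delta v|^2 \tag{9}$$

$$\sup_{0\le k\le N} E|\bar{x}_k - x_k|^4 \le C_1\varepsilon^4 E|\Delta v|^4. \tag{10}$$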
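The $\varepsilon^2$ scaling in estimate (9) of Lemma 3.1 can be observed numerically. The sketch below is a hedged illustration rather than the authors' construction: the scalar linear system, its coefficients, the Gaussian noise, the reference control $u_k = -0.1\,x_k$, and the function name `perturbed_gap` are all hypothetical choices made here. The reference control need not be optimal for this estimate.

```python
import numpy as np

def perturbed_gap(eps, m=1, N=5, n_paths=2000, seed=1):
    """Estimate sup_k E|xbar_k - x_k|^2 when u_m is replaced by u_m + eps * dv."""
    rng = np.random.default_rng(seed)
    a, b, c, d = 0.9, 0.5, 0.3, 0.2            # hypothetical coefficients
    omega = rng.standard_normal((n_paths, N))  # zero mean, unit variance
    dv = np.ones(n_paths)                      # bounded direction Delta v
    x = np.ones(n_paths)                       # reference state, x_0 = 1
    xbar = x.copy()                            # perturbed state, same x_0
    gaps = [0.0]
    for k in range(N):
        u = -0.1 * x                           # reference control u_k (hypothetical)
        ubar = u + (eps * dv if k == m else 0.0)  # spike perturbation at time m only
        w = omega[:, k]                        # same noise drives both trajectories
        x_next = a * x + b * u + (c * x + d * u) * w
        xbar = a * xbar + b * ubar + (c * xbar + d * ubar) * w
        x = x_next
        gaps.append(float(np.mean((xbar - x) ** 2)))
    return max(gaps)

for eps in (0.1, 0.01):
    print(eps, perturbed_gap(eps) / eps**2)
```

Because this toy system is linear, $\bar{x}_k - x_k$ is exactly proportional to $\varepsilon$, so the printed ratio does not depend on $\varepsilon$ up to rounding; for nonlinear $g$ and $\sigma$ satisfying Assumption 1 the ratio would only be expected to stay bounded.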
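Let $y = \{y_k\}_{k=0}^{N}$ be the solution of the following difference equation

$$\begin{cases} y_{k+1} = [\,g_x(x_k, u_k)y_k + \delta_{km}\varepsilon\,g_v(x_m, u_m)\Delta v\,] + [\,\sigma_x(x_k, u_k)y_k + \delta_{km}\varepsilon\,\sigma_v(x_m, u_m)\Delta v\,]\,\omega_k \\ y_0 = 0. \end{cases} \tag{11}$$

Then, we obtain the following result.

Lemma 3.2: Under Assumption 1, we have

$$\sup_{0\le k\le N} E|\bar{x}_k - x_k - y_k|^2 \le M\varepsilon^4 E|\Delta v|^4, \tag{12}$$

with $M > 0$ being a constant.

Proof: Denote $\hat{x}_k = \bar{x}_k - x_k$; then $\hat{x}_0 = 0$ and

$$\hat{x}_{k+1} = g_x(x_k, u_k)\hat{x}_k + \phi_k^1 + \delta_{km}\big[\varepsilon\,g_v(x_m, u_m)\Delta v + \phi_m^2\big] + \sum_{j=1}^{d}\Big[\sigma_x^j(x_k, u_k)\hat{x}_k + \psi_k^{1,j} + \delta_{km}\big(\varepsilon\,\sigma_v^j(x_m, u_m)\Delta v + \psi_m^{2,j}\big)\Big]\omega_k^j$$

where

$$\varphi_x(u, v) = \begin{bmatrix} \varphi_{x_1}^1(u, v) & \varphi_{x_2}^1(u, v) & \cdots & \varphi_{x_n}^1(u, v) \\ \varphi_{x_1}^2(u, v) & \varphi_{x_2}^2(u, v) & \cdots & \varphi_{x_n}^2(u, v) \\ \vdots & \vdots & & \vdots \\ \varphi_{x_1}^n(u, v) & \varphi_{x_2}^n(u, v) & \cdots & \varphi_{x_n}^n(u, v) \end{bmatrix}$$

and $\varphi_v$ has the similar meaning. Moreover, $\phi_k^1 = (\hat{x}_k^T \otimes I_n)\,g_{xx}(x_k + \theta_k\hat{x}_k, u_k)\,(\hat{x}_k \otimes 1_n)$ and $\psi_k^{1,j} = (\hat{x}_k^T \otimes I_n)\,\sigma_{xx}^j(x_k + \theta_k^j\hat{x}_k, u_k)\,(\hat{x}_k \otimes 1_n)$ when $k \ge m+1$, and $\phi_k^1 = 0$ when $k \le m$. Further, $\phi_m^2 = \varepsilon^2(\Delta v^T \otimes I_n)\,g_{vv}(x_m, u_m + \tilde{\theta}_m\varepsilon\Delta v)\,(\Delta v \otimes 1_n)$ and $\psi_m^{2,j} = \varepsilon^2(\Delta v^T \otimes I_n)\,\sigma_{vv}^j(x_m, u_m + \tilde{\theta}_m^j\varepsilon\Delta v)\,(\Delta v \otimes 1_n)$, where

$$\varphi_{xx}(x_k + \theta_k\hat{x}_k, u_k) = \begin{bmatrix} \varphi_{xx}^1(x_k + \theta_k^1\hat{x}_k, u_k) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \varphi_{xx}^n(x_k + \theta_k^n\hat{x}_k, u_k) \end{bmatrix}$$

and

$$\varphi_{vv}(x_k, u_k + \tilde{\theta}_k\hat{u}_m) = \begin{bmatrix} \varphi_{vv}^1(x_k, u_k + \tilde{\theta}_k^1\hat{u}_m) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \varphi_{vv}^n(x_k, u_k + \tilde{\theta}_k^n\hat{u}_m) \end{bmatrix}.$$

In the above notations, $1_n = (1, 1, \cdots, 1)^T \in \mathbb{R}^n$, $\varphi = g, \sigma^j$, $j = 1, \cdots, d$, and $\varphi = (\varphi^1, \varphi^2, \cdots, \varphi^n)^T$; all $\theta$, such as $\theta_k^i$, $\tilde{\theta}_k^i$, satisfy $0 < \theta < 1$, and $\hat{u}_m = \varepsilon\Delta v$. By the boundedness of $\varphi_{xx}$ and Lemma 3.1, we have the following:

$$\begin{aligned} E|\phi_k^1|^2 &\le C_1 E|\hat{x}_k|^4 \le C_2\varepsilon^4 E|\Delta v|^4, \\ E|\phi_m^2|^2 &\le C_2\varepsilon^4 E|\Delta v|^4, \\ E|\psi_k^{1,j}|^2 &\le C_2\varepsilon^4 E|\Delta v|^4, \\ E|\psi_m^{2,j}|^2 &\le C_2\varepsilon^4 E|\Delta v|^4, \quad k \ge m+1,\ j = 1, \cdots, d. \end{aligned}$$

Denote $Z_k = \bar{x}_k - x_k - y_k$; then $Z_k = 0$ when $k = 0, 1, \cdots, m$. For $k \ge m+1$, we have

$$\begin{aligned} Z_{m+1} &= \phi_m^2 + \sum_{j=1}^{d} \psi_m^{2,j}\omega_m^j, \\ Z_{k+1} &= g_x(x_k, u_k)Z_k + \phi_k^1 + \sum_{j=1}^{d}\big[\sigma_x^j(x_k, u_k)Z_k + \psi_k^{1,j}\big]\omega_k^j, \quad k = m+1, \cdots, N-1. \end{aligned}$$

Since

$$E|Z_{k+1}|^2 = E\big|g_x(x_k, u_k)Z_k + \phi_k^1\big|^2 + \sum_{j=1}^{d} E\big|\sigma_x^j(x_k, u_k)Z_k + \psi_k^{1,j}\big|^2$$

we have

$$\begin{aligned} E|Z_{m+1}|^2 &\le C_3\varepsilon^4 E|\Delta v|^4, \\ E|Z_{k+1}|^2 &\le C_3 E|Z_k|^2 + C_3\varepsilon^4 E|\Delta v|^4, \quad k = m+1, \cdots, N-1. \end{aligned}$$

By iteration, the inequality (12) can be shown.

By Lemma 3.2, we see that $y_k$ is the first-order variation of $x_k$. So, (11) is called the first-order variational equation of (2). Similarly, we can verify that the variation of $J(v)$ at $u$ is

$$LJ(u) = E\sum_{k=0}^{N-1}\big[\langle l_x(x_k, u_k), y_k\rangle + \delta_{km}\varepsilon\langle l_v(x_m, u_m), \Delta v\rangle\big] + E\langle h_x(x_N), y_N\rangle.$$

We take the Hamiltonian function as

$$H(x, v, p, Q) = l(x, v) + \langle p, g(x, v)\rangle + \sum_{j=1}^{d}\langle Q^j, \sigma^j(x, v)\rangle,$$

$$x \in \mathbb{R}^n,\ v \in \mathbb{R}^{n_v},\ p \in \mathbb{R}^n,\ Q = (Q^1, \cdots, Q^d) \in \mathbb{R}^{n\times d}.$$

Construct the discrete-time BSDE as

$$\begin{cases} p_k = E\Big[\, l_x(x_{k+1}, u_{k+1}) + g_x^T(x_{k+1}, u_{k+1})p_{k+1} + \sum\limits_{j=1}^{d} \sigma_x^{jT}(x_{k+1}, u_{k+1})Q_{k+1}^j \,\Big|\, \mathcal{F}_{k-1}\Big] \\ Q_k = E\Big[\Big( l_x(x_{k+1}, u_{k+1}) + g_x^T(x_{k+1}, u_{k+1})p_{k+1} + \sum\limits_{j=1}^{d} \sigma_x^{jT}(x_{k+1}, u_{k+1})Q_{k+1}^j \Big)\omega_k^T \,\Big|\, \mathcal{F}_{k-1}\Big] \\ p_{N-1} = E[\,h_x(x_N) \mid \mathcal{F}_{N-2}] \\ Q_{N-1} = E[\,h_x(x_N)\,\omega_{N-1}^T \mid \mathcal{F}_{N-2}] \\ k = 0, 1, \cdots, N-2. \end{cases} \tag{13}$$

Now, we can obtain our main result, which is a necessary condition, called the MP of the optimal control problem (1), (2).

Theorem 3.1: Under Assumption 1, if $(\{x_k\}_{k=0}^{N}, \{u_k\}_{k=0}^{N-1})$ are solutions of the optimal control problem (1), (2), then the solutions

$$(p_k, Q_k) \in L^2(\Omega, \mathcal{F}_k; \mathbb{R}^n) \times L^2(\Omega, \mathcal{F}_k; \mathbb{R}^{n\times d})$$

to BSDE (13) satisfy the following equality:

$$H_v(x_m, u_m, p_m, Q_m) = 0 \quad \text{a.s.} \tag{14}$$

Proof: By Lemma 2.2, we have

$$LJ(u) = E\sum_{k=0}^{N-1}\big[\langle p_k, \delta_{km}\varepsilon\,g_v(x_m, u_m)\Delta v\rangle + \langle Q_k, \delta_{km}\varepsilon\,\sigma_v(x_m, u_m)\Delta v\rangle + \delta_{km}\varepsilon\langle l_v(x_m, u_m), \Delta v\rangle\big]. \tag{15}$$

So $LJ(u) = \varepsilon E\langle H_v(x_m, u_m, p_m, Q_m), \Delta v\rangle$. Since $u$ is the optimal control, it follows that $LJ(u) = 0$.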
This implies

$$E\langle H_v(x_m, u_m, p_m, Q_m), \Delta v\rangle = 0, \quad \forall \Delta v \in U. \tag{16}$$
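As a sanity check of the stationarity condition $H_v = 0$ in Theorem 3.1, consider a scalar linear-quadratic instance, which is not taken from the paper: $g(x, v) = ax + bv$, $\sigma(x, v) = cx + dv$ with $d = 1$ noise component, $l(x, v) = (qx^2 + rv^2)/2$, $h(x) = sx^2/2$. The sketch assumes the standard dynamic-programming facts for this instance: the value function is $P_k x^2/2$ with $P_k$ given by a Riccati recursion, the optimal control is the feedback $u_k = -K_k x_k$, and the adjoint pair evaluated along the optimal path is $p_m = P_{m+1}E[x_{m+1} \mid x_m]$, $Q_m = P_{m+1}E[x_{m+1}\omega_m \mid x_m]$. The function names and numerical values are hypothetical.

```python
import numpy as np

def riccati_gains(a, b, c, d, q, r, s, N):
    """Backward Riccati recursion for the scalar LQ problem with multiplicative noise."""
    P = np.empty(N + 1)
    K = np.empty(N)
    P[N] = s                                   # terminal weight from h(x) = s x^2 / 2
    for k in range(N - 1, -1, -1):
        denom = r + (b**2 + d**2) * P[k + 1]
        K[k] = (a * b + c * d) * P[k + 1] / denom
        P[k] = q + (a**2 + c**2) * P[k + 1] - (a * b + c * d) * P[k + 1] * K[k]
    return P, K

def hamiltonian_gradient(x, m, a, b, c, d, r, P, K):
    """H_v(x_m, u_m, p_m, Q_m) = l_v + g_v p_m + sigma_v Q_m at the feedback control."""
    u = -K[m] * x
    p = P[m + 1] * (a * x + b * u)   # assumed adjoint: P_{m+1} E[x_{m+1} | x_m]
    Q = P[m + 1] * (c * x + d * u)   # assumed adjoint: P_{m+1} E[x_{m+1} w_m | x_m]
    return r * u + b * p + d * Q

a, b, c, d, q, r, s, N = 0.9, 0.5, 0.3, 0.2, 1.0, 0.5, 2.0, 6
P, K = riccati_gains(a, b, c, d, q, r, s, N)
for m in range(N):
    print(m, hamiltonian_gradient(1.7, m, a, b, c, d, r, P, K))
```

The printed values vanish up to rounding for every $m$ and every state value, which is what the stationarity condition predicts when the control set is the whole space; the check relies on the assumed form of the adjoint pair and is not a substitute for solving BSDE (13).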
generated by {ωk }. Considering the one-step BSDE of (23). Let N = 1, f1 (y, z) = y, y, z ∈ R, and ω0 takes values in {− 3/2, 0, 3/2} with probability
For every integer K > 0, let Δv = Hv (xm , um , pm , Qm )1{|Hv (xm ,um ,pm ,Qm )|