Optimal Control of Uncertain Stochastic Systems Subject to Total Variation Distance Uncertainty

Farzad Rezaei1 , Charalambos D. Charalambous2 and N.U. Ahmed3

Abstract

This paper is concerned with optimization of uncertain stochastic systems, in which uncertainty is described by a total variation distance constraint between the measures induced by the uncertain systems and the measure induced by the nominal system, while the pay-off is a linear functional of the uncertain measure. Robustness in the abstract setting is formulated as a minimax game, in which the control seeks to minimize the pay-off over the admissible controls while the uncertainty aims at maximizing it over the total variation distance constraint. By invoking the Hahn-Banach theorem, it is shown that the maximizing measure in the total variation distance constraint exists, while the resulting pay-off is a linear combination of L1 and L∞ norms. Further, the maximizing measure is characterized by a linear combination of a tilted measure and the nominal measure, giving rise to a pay-off which is a non-linear functional on the space of measures to be minimized over the admissible controls. The abstract formulation and results are subsequently applied to continuous-time uncertain stochastic controlled systems, in which the control seeks to minimize the pay-off while the uncertainty aims to maximize it over the total variation distance constraint. The minimization over the admissible controls of the non-linear functional pay-off is addressed by developing a generalized principle of optimality or dynamic programming equation satisfied by the value function. Subsequently, it is proved that the value function satisfies a generalized Hamilton-Jacobi-Bellman (HJB) equation. Finally, it is shown that the value function is also a viscosity solution of the generalized HJB equation. Throughout the paper the formulation and conclusions are related to previous work found in the literature.

1

Introduction

Minimax strategies are often employed to address robustness issues associated with parametric and non-parametric uncertain stochastic systems. Uncertainties in the system may cause the distribution of the system to move away from the nominal one, and therefore proper

1 E-mail: [email protected]
2 Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Avenue, Nicosia, CYPRUS. E-mail: [email protected].
3 E-mail: [email protected]. Also with the Department of Mathematics, University of Ottawa.

models of uncertainty must be employed to capture the difference between the nominal distribution and the true distributions. In an abstract measure-theoretic formulation in which systems are described by measures, it is often desirable to characterize uncertainty using a distance metric between the nominal and the true measures. This paper is concerned with uncertain systems described by measures induced by the underlying processes, pay-offs described by linear functionals on the space of measures (e.g., error or energy criteria), and uncertainty described by the total variation distance between measures. The objectives of this paper are twofold:

• To formulate uncertain stochastic systems on abstract spaces by introducing a general quantitative definition of uncertainty via the total variation distance between measures, and derive fundamental results associated with this formulation.

• To apply the total variation distance uncertainty model to formulate and solve uncertain stochastic minimax control of systems governed by stochastic differential equations.

Thus, the theory and contributions of this paper are developed at two levels of generality: the abstract level and the application level. At the abstract level, a general framework of uncertainty description via the total variation distance is put forward, while fundamental results are derived. At this level, systems are represented by measures on abstract spaces, pay-offs by linear functionals on the space of measures, and uncertainty by a set described by the total variation distance (an uncertainty ball with respect to the total variation norm) centered at the nominal measure and having a pre-specified radius. At the abstract level, robustness is formulated via the minimax value of the pay-off, where the maximization is over the measures which satisfy the total variation distance constraint, while the minimization is over the admissible controls.
The main objective of this part is to show existence of the maximizing measure in the total variation uncertainty set, and to characterize the maximizing measure. Specifically, it is shown that the maximization of the pay-off over the total variation distance uncertainty is equivalent to a pay-off which is a linear combination of L1 and L∞ norms. Further, the maximizing measure is constructed as a convex combination of a tilted measure and the nominal measure. The theory and results developed in the abstract setting are applied to formulate and solve continuous-time uncertain stochastic controlled systems subject to a total variation distance uncertainty constraint, in which the control seeks to minimize the pay-off while the uncertainty seeks to maximize it over the constraint set. While stochastic optimal control theory is well developed when the pay-off is a linear functional of the measure induced by the system, such as integral or exponential-of-integral pay-off functionals [1], [2], [3], further work is required to develop the theory for the non-linear functional emerging from the maximization problem over the total variation distance constraint.

The main contributions of this part are the derivation of a generalized dynamic programming or principle of optimality associated with the value function, and a generalized Hamilton-Jacobi-Bellman (HJB) equation, which have not appeared in the literature. In addition, due to the lack of smoothness associated with classical solutions, it is shown that the value function is a viscosity solution of the generalized HJB equation. The rest of the paper is organized as follows. In Section 2 various models of uncertainty are introduced and their relation to total variation distance is described. In Section 3 the abstract formulation is introduced, while in Section 4 the solution of the maximization problem over the variational norm constraint set is presented. In Section 5, the abstract setup is applied to uncertain stochastic fully observable continuous-time controlled systems. In Section 5.2, the new type of principle of optimality associated with the non-linear pay-off is derived. In Section 5.3 a generalized HJB equation is obtained, which is shown, in the absence of uncertainty, to give as a special case the classical HJB equation. Finally, in Section 6, it is shown that the value function is a viscosity solution of the new generalized HJB equation.

2

Motivation of Total Variation Uncertainty Class

Throughout the paper the mathematical formulation as well as the results obtained are related to other existing models of uncertainty and to other optimization problems found in the literature. Below, a summary of the uncertainty models and their relations to the total variation distance model is discussed. Let (Σ, d_Σ) denote a complete separable metric space (a Polish space), and (Σ, B(Σ)) the corresponding measurable space, in which B(Σ) is the σ-algebra generated by open sets in Σ. Let M₁(Σ) denote the space of countably additive probability measures on (Σ, B(Σ)). The relative entropy of ν ∈ M₁(Σ) with respect to μ ∈ M₁(Σ) is a mapping H(·|·) : M₁(Σ) × M₁(Σ) → [0, ∞] defined by

$$H(\nu|\mu) = \begin{cases} \int_\Sigma \log\Big(\dfrac{d\nu}{d\mu}\Big)\,d\nu, & \text{if } \nu \ll \mu \\ +\infty, & \text{otherwise} \end{cases}$$

Taking the limit s → +∞, inequality (6) can be written as follows.
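For intuition, both the relative entropy and the total variation (variational) norm are easy to evaluate side by side on a finite alphabet. The following is a minimal sketch; the discrete distributions are hypothetical and serve only to illustrate the two definitions.

```python
import math

def relative_entropy(nu, mu):
    # H(nu|mu) = sum_i nu_i * log(nu_i / mu_i); +infinity unless nu << mu
    h = 0.0
    for n, m in zip(nu, mu):
        if n > 0.0:
            if m == 0.0:
                return math.inf  # nu is not absolutely continuous w.r.t. mu
            h += n * math.log(n / m)
    return h

def total_variation(nu, mu):
    # variational norm ||nu - mu|| = sum_i |nu_i - mu_i|, with values in [0, 2]
    return sum(abs(n - m) for n, m in zip(nu, mu))

mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]
print(total_variation(nu, mu))                    # 0.2
print(relative_entropy(mu, mu))                   # 0.0
print(relative_entropy([1.0, 0.0], [0.0, 1.0]))   # inf: mutually singular measures
```

Note that the variational norm of a difference of probability measures lies in [0, 2], which is why the uncertainty radius R later in the paper is restricted to [0, 2].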

$$\lim_{s\to+\infty}\frac{\int_\Sigma \ell^{N_0}(x)\,e^{s\ell^{N_0}(x)}\,\eta(dx)}{\int_\Sigma e^{s\ell^{N_0}(x)}\,\eta(dx)} \ \ge\ \big(\|\ell^{N_0}\|_\infty-\epsilon\big)\,\frac{\eta(E_\epsilon\cap E_{max})}{\eta(E_\epsilon)} \qquad (7)$$

Note that as ε ↓ 0, we have E_ε ↓ E_max, and since η is a finite measure, lim_{ε↓0} η(E_ε) = η(∩_{ε>0} E_ε) = η(E_max). From this and (7), we finally get

$$\lim_{s\to+\infty}\frac{\int_\Sigma \ell^{N_0}(x)\,e^{s\ell^{N_0}(x)}\,\eta(dx)}{\int_\Sigma e^{s\ell^{N_0}(x)}\,\eta(dx)} = \|\ell^{N_0}\|_\infty$$

Combine the above inequality with (5) to get the desired result. •

Remark 4.4 Since

$$\mathrm{Var}_{\eta^{nor}}(\ell) = \frac{d}{ds}\left(\frac{\int_\Sigma \ell(x)\,e^{s\ell(x)}\,\eta(dx)}{\int_\Sigma e^{s\ell(x)}\,\eta(dx)}\right)$$

is non-negative, $\frac{\int_\Sigma \ell(x)e^{s\ell(x)}\eta(dx)}{\int_\Sigma e^{s\ell(x)}\eta(dx)}$ is a monotone non-decreasing function of s (see also [10]). Hence, we also have

$$\lim_{s\to+\infty}\frac{\int_\Sigma \ell(x)\,e^{s\ell(x)}\,\eta(dx)}{\int_\Sigma e^{s\ell(x)}\,\eta(dx)} = \sup_{s>0}\frac{\int_\Sigma \ell(x)\,e^{s\ell(x)}\,\eta(dx)}{\int_\Sigma e^{s\ell(x)}\,\eta(dx)} = \|\ell\|_\infty$$
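The monotonicity and the limit in Remark 4.4 can be checked numerically on a finite space. In this sketch the weights η and the bounded function ℓ are hypothetical; the tilted mean increases with s and approaches the sup-norm of ℓ.

```python
import math

# eta: finite positive weights on a finite set; ell: a bounded function on that set
eta = [0.2, 0.5, 0.3]
ell = [1.0, 2.0, 3.0]   # ||ell||_inf = 3

def tilted_mean(s):
    # int ell e^{s ell} d eta / int e^{s ell} d eta
    w = [e * math.exp(s * l) for e, l in zip(eta, ell)]
    return sum(l * wi for l, wi in zip(ell, w)) / sum(w)

means = [tilted_mean(s) for s in (0.0, 1.0, 5.0, 20.0)]
assert all(a <= b for a, b in zip(means, means[1:]))  # monotone non-decreasing in s
print(means[0])   # plain eta-average of ell: 2.1
print(means[-1])  # close to max(ell) = ||ell||_inf = 3
```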

Next, we state the main theorem, which characterizes the maximizing measure by a linear combination of two probability measures.

Theorem 4.5 Under the assumptions of Lemma 4.3, there exists a family of probability measures which attain the supremum in (2), given by

$$\nu^*(E) = \frac{\beta}{\beta+1}\,\frac{\int_E e^{s_0\ell(x)}\,\eta(dx)}{\int_\Sigma e^{s_0\ell(x)}\,\eta(dx)} + \frac{1}{1+\beta}\,\mu(E), \qquad E\in B(\Sigma) \qquad (8)$$

where β ∈ (2, ∞), s₀ ∈ (0, ∞) and η is an arbitrary finite non-negative finitely additive measure defined on (Σ, B(Σ)). Moreover, β and η should be chosen such that ||ν* − μ|| = R.

Proof. Let β > 2. Then the right-hand side of (4) can be rewritten as follows.

$$R\,\|\ell\|_\infty + E_\mu(\ell) = (1+\beta)\left[\frac{\beta}{1+\beta}\cdot\frac{R}{\beta}\,\|\ell\|_\infty + \frac{1}{1+\beta}\,E_\mu(\ell)\right]$$

Since β > 2 and R ∈ [0, 2], then 0 ≤ R/β < 1.
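A finite-space sketch of the measure ν* in (8) follows; all concrete values (μ, η, ℓ, R, β) are hypothetical illustrations, and s₀ is located by bisection using the monotonicity of the tilted mean from Remark 4.4. The checks confirm only that ν* is a probability measure and that it increases the pay-off relative to the nominal μ.

```python
import math

# nu*(E) = (beta/(1+beta)) * tilted(eta)(E) + (1/(1+beta)) * mu(E), as in (8)
mu  = [0.25, 0.25, 0.25, 0.25]   # nominal measure (hypothetical)
eta = [0.25, 0.25, 0.25, 0.25]   # auxiliary measure (hypothetical choice)
ell = [0.0, 1.0, 2.0, 4.0]       # non-negative pay-off, ||ell||_inf = 4
R, beta = 1.5, 2.5               # beta > 2, R in [0, 2]
target = (R / beta) * max(ell)   # tilted mean to aim for: 2.4

def tilted(s0):
    w = [e * math.exp(s0 * l) for e, l in zip(eta, ell)]
    return [wi / sum(w) for wi in w]

def tilted_mean(s0):
    return sum(l * t for l, t in zip(ell, tilted(s0)))

# tilted_mean is non-decreasing in s0, so bisect for tilted_mean(s0) = target
lo, hi = 0.0, 50.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if tilted_mean(mid) < target else (lo, mid)
s0 = 0.5 * (lo + hi)

gamma = beta / (1.0 + beta)
nu_star = [gamma * t + (1.0 - gamma) * m for t, m in zip(tilted(s0), mu)]
print(sum(nu_star))                              # 1.0: a probability measure
print(sum(l * n for l, n in zip(ell, nu_star)))  # larger than E_mu(ell) = 1.75
```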
Assumptions 5.3 There exists a constant L > 0 such that each of φ(t,x,u) ∈ {b(t,x,u), f(t,x,u), h(x), σ(t,x,u)} satisfies

$$|\varphi(t,x,u)-\varphi(t,\hat x,u)| \le L|x-\hat x|, \quad \forall t\in[0,T],\ x,\hat x\in\mathbb{R}^n,\ u\in U$$
$$|\varphi(t,0,u)| \le L, \quad \forall (t,u)\in[0,T]\times U$$

Under the above two assumptions, for any (s,y) ∈ [0,T) × ℝⁿ and u(·) ∈ U^w[s,T], the SDE in (11) admits a unique weak solution x(·) ≡ x(·; s, y, u(·)). Using Lemma 4.3, the above problem can be re-written in an exponential form as follows.



$$J(u^*, Q^{u^*,*}) = \inf_{u(\cdot)\in \mathcal{U}^w[s,T]}\Bigg\{ R\,\Big\|\int_s^T f(t,x(t),u(t))\,dt + h(x(T))\Big\|_\infty + E_P\Big[\int_s^T f(t,x(t),u(t))\,dt + h(x(T))\Big]\Bigg\}$$
$$= (1+\beta)\inf_{u(\cdot)\in \mathcal{U}^w[s,T]} E_{Q^{u,*}}\Big[\int_s^T f(t,x(t),u(t))\,dt + h(x(T))\Big] \qquad (14)$$

where ‖·‖∞ is defined over the space Ω ≡ C([0,T]; ℝ^m) (this is the space of continuous sample paths generated by the Brownian motion; according to Definition 5.1 both state and control processes are measurable functions of these paths) and Q^{u,*} is given by the following measure.

$$Q^{u,*}(E) = \gamma\,\frac{\int_E e^{\alpha\big(\int_s^T f(t,x(t),u(t))\,dt + h(x(T))\big)}\,d\eta}{\int_\Omega e^{\alpha\big(\int_s^T f(t,x(t),u(t))\,dt + h(x(T))\big)}\,d\eta} + (1-\gamma)\,P(E), \qquad E\in\mathcal{F} \qquad (15)$$

where γ = β/(1+β). Notice that α ≡ α(u,s) depends on the choice of the control u(·) and the initial time s, and that η ∈ M₁(Ω) is an arbitrary measure which is not unique. Clearly, the resulting minimization problem (14) (pay-off) is not standard, because it depends non-linearly on the measure η ∈ M₁(Ω). Hence, one cannot apply the standard derivation of the principle of optimality and dynamic programming to derive a Hamilton-Jacobi-Bellman type equation; in the work found in the literature [17] the derivation of the principle of optimality necessitates linearity of the pay-off with respect to the measure induced by the stochastic system.
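Because (14)-(15) involve only path functionals of the weak solution, they can be explored by plain Monte Carlo. The sketch below is illustrative only: the drift, diffusion, control, cost and all constants are hypothetical, and an Euler-Maruyama scheme merely approximates a weak solution of an equation of the form (11). The mixture pay-off uses the structure of (15) with η = P, so that the tilted term becomes a reweighted sample average.

```python
import math, random

# Euler-Maruyama simulation of a scalar controlled SDE (a toy stand-in for (11)):
# dx = b(t, x, u) dt + sigma(t, x, u) dW, x(s) = y
def simulate(y, T, n_steps, b, sigma, u, rng):
    dt = T / n_steps
    x, t, path = y, 0.0, [y]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + b(t, x, u(t, x)) * dt + sigma(t, x, u(t, x)) * dw
        t += dt
        path.append(x)
    return path

# Monte Carlo version of the mixture pay-off suggested by (15) with eta = P:
# gamma * E[L e^{alpha L}]/E[e^{alpha L}] + (1 - gamma) * E[L],
# where L is the pathwise cost (integral of f plus terminal cost h).
def mixture_payoff(samples_L, alpha, gamma):
    m = max(samples_L)  # subtract the max for numerical stability of exp
    w = [math.exp(alpha * (L - m)) for L in samples_L]
    tilted = sum(L * wi for L, wi in zip(samples_L, w)) / sum(w)
    mean = sum(samples_L) / len(samples_L)
    return gamma * tilted + (1.0 - gamma) * mean

rng = random.Random(0)
b = lambda t, x, u: -x + u        # hypothetical drift
sigma = lambda t, x, u: 0.5       # hypothetical bounded diffusion
u = lambda t, x: 0.0              # a fixed admissible control
Ls, dt = [], 1.0 / 50
for _ in range(200):
    path = simulate(1.0, 1.0, 50, b, sigma, u, rng)
    Ls.append(sum(xi * xi * dt for xi in path[:-1]) + path[-1] ** 2)  # f = x^2, h = x^2

# The tilt only inflates the pay-off (monotonicity of the tilted mean, Remark 4.4):
print(mixture_payoff(Ls, 2.0, 0.75) >= mixture_payoff(Ls, 0.0, 0.75))  # True
```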

Remark 5.4 The measure η ∈ M₁(Ω) is not unique; hence it can be induced by various strategies as follows. 1) The measure η can be taken to be η = P ∈ M₁(Ω), and hence Q^{u,*} ∈ M₁(Ω) is uniquely determined by the weak solution of system (11). Clearly, the total variation distance uncertainty set B_R(P) = {Q ∈ M₁(Ω) : ||Q − P|| ≤ R} does not require that Q ≪ P.

Lemma 5.7 Let f, g : [0, ∞) → ℝ satisfy f(σ) ≤ g(σ) for all σ > 0, with f(0) = g(0) = 0. Assume also that the derivatives $\frac{d}{d\sigma}f(\sigma)$ and $\frac{d}{d\sigma}g(\sigma)$ are both non-decreasing functions. Then

$$\frac{d}{d\sigma}f(\sigma) \le \frac{d}{d\sigma}g(\sigma), \qquad \forall \sigma > 0 \qquad (21)$$

Proof. See Appendix.

Lemma 5.8 Let

$$J(\hat s,\theta,x(\hat s);u(\cdot)) = \gamma\,\frac{E\big(\ell_T^{\hat s}(x,u)\,e^{\theta\ell_T^{\hat s}(x,u)}\,|\,F_{\hat s}^s\big)}{E\big(e^{\theta\ell_T^{\hat s}(x,u)}\,|\,F_{\hat s}^s\big)} + (1-\gamma)\,E\big(\ell_T^{\hat s}(x,u)\,|\,F_{\hat s}^s\big)$$

where θ > 0. Then given ε > 0, there exists α₀ > 0 such that the following inequalities hold.⁸

$$\alpha_s J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot)) - \gamma\epsilon\alpha_0 \ \le\ \int_0^{\alpha_s} J(\hat s,\theta,x(\hat s);u(\cdot))\,d\theta \ \le\ \alpha_s J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot)) + \gamma\Big(1-\frac{R}{\beta}\Big)\alpha_s\big(L_f(T-\hat s)+L_h\big)$$

where L_f and L_h are the upper bounds for f and h.

⁸ The cost J(s, α_s, x(s); u(·)) is said to be directly related to $\|\ell_T^s(x,u)\|_\infty$ if there exists a β > R such that
$$J(s,\alpha_s,x(s);u(\cdot)) = \gamma\,\frac{E\big(\ell_T^s(x,u)\,e^{\alpha_s\ell_T^s(x,u)}\big)}{E\big(e^{\alpha_s\ell_T^s(x,u)}\big)} + (1-\gamma)E\big(\ell_T^s(x,u)\big) = \gamma\,\frac{R}{\beta}\,\|\ell_T^s(x,u)\|_\infty + (1-\gamma)E\big(\ell_T^s(x,u)\big)$$

Proof. Notice that

$$J(\hat s,\theta,x(\hat s);u(\cdot)) \le \gamma\,\|\ell_T^{\hat s}(x,u)\|_\infty + (1-\gamma)E\big(\ell_T^{\hat s}(x,u)\,|\,F_{\hat s}^s\big)$$
$$= \gamma\,\frac{R}{\beta}\,\|\ell_T^{\hat s}(x,u)\|_\infty + (1-\gamma)E\big(\ell_T^{\hat s}(x,u)\,|\,F_{\hat s}^s\big) + \gamma\Big(1-\frac{R}{\beta}\Big)\|\ell_T^{\hat s}(x,u)\|_\infty$$
$$\le J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot)) + \gamma\Big(1-\frac{R}{\beta}\Big)\|\ell_T^{\hat s}(x,u)\|_\infty \qquad (22)$$

Also we have the following upper bound.

$$\|\ell_T^{\hat s}(x,u)\|_\infty = \Big\|\int_{\hat s}^T f(t,x(t),u(t))\,dt + h(x(T))\Big\|_\infty \le L_f(T-\hat s)+L_h \qquad (23)$$

Applying the above to (22) leads to the following inequality.

$$\int_0^{\alpha_s} J(\hat s,\theta,x(\hat s);u(\cdot))\,d\theta \le \alpha_s J(\hat s,\alpha_{\hat s}) + \gamma\Big(1-\frac{R}{\beta}\Big)\alpha_s\big(L_f(T-\hat s)+L_h\big) \qquad (24)$$

For the lower bound notice that $\xi\log E\big(e^{\ell/\xi}\big)$ is convex in ξ, and hence

$$\xi\log E\big(e^{\ell_T^{\hat s}(x,u)/\xi}\,|\,F_{\hat s}^s\big) \ \ge\ \xi_0\log E\big(e^{\ell_T^{\hat s}(x,u)/\xi_0}\,|\,F_{\hat s}^s\big) + (\xi-\xi_0)\Bigg[\log E\big(e^{\ell_T^{\hat s}(x,u)/\xi_0}\,|\,F_{\hat s}^s\big) - \frac{1}{\xi_0}\,\frac{E\big(\ell_T^{\hat s}(x,u)\,e^{\ell_T^{\hat s}(x,u)/\xi_0}\,|\,F_{\hat s}^s\big)}{E\big(e^{\ell_T^{\hat s}(x,u)/\xi_0}\,|\,F_{\hat s}^s\big)}\Bigg] \qquad (25)$$

Let ξ = 1/α_s and ξ₀ = 1/α₀, where α₀ is such that given ε > 0,

$$\frac{1}{\alpha_0}\log E\big(e^{\alpha_0\ell_T^{\hat s}(x,u)}\,|\,F_{\hat s}^s\big) \ \ge\ \|\ell_T^{\hat s}(x,u)\|_\infty - \epsilon, \qquad P|F_{\hat s}^s\text{-a.s.}$$

Then using the previous equation and (25), $\int_0^{\alpha_s}J(\hat s,\theta,x(\hat s);u(\cdot))\,d\theta$ can be bounded as follows.⁹

$$\int_0^{\alpha_s} J(\hat s,\theta,x(\hat s);u(\cdot))\,d\theta = \gamma\log E\big(e^{\alpha_s\ell_T^{\hat s}(x,u)}\,|\,F_{\hat s}^s\big) + (1-\gamma)\,\alpha_s\,E\big(\ell_T^{\hat s}(x,u)\,|\,F_{\hat s}^s\big)$$
$$\ge \gamma\log E\big(e^{\alpha_0\ell_T^{\hat s}(x,u)}\,|\,F_{\hat s}^s\big) - \gamma\alpha_s\alpha_0\Big(\frac{1}{\alpha_s}-\frac{1}{\alpha_0}\Big)\,\frac{E\big(\ell_T^{\hat s}(x,u)\,e^{\alpha_0\ell_T^{\hat s}(x,u)}\,|\,F_{\hat s}^s\big)}{E\big(e^{\alpha_0\ell_T^{\hat s}(x,u)}\,|\,F_{\hat s}^s\big)} + (1-\gamma)\,\alpha_s\,E\big(\ell_T^{\hat s}(x,u)\,|\,F_{\hat s}^s\big)$$
$$\ge \gamma\alpha_0\big(\|\ell_T^{\hat s}(x,u)\|_\infty-\epsilon\big) - \gamma\alpha_s\alpha_0\Big(\frac{1}{\alpha_s}-\frac{1}{\alpha_0}\Big)\|\ell_T^{\hat s}(x,u)\|_\infty + (1-\gamma)\,\alpha_s\,E\big(\ell_T^{\hat s}(x,u)\,|\,F_{\hat s}^s\big)$$
$$\ge \gamma\alpha_s\frac{R}{\beta}\|\ell_T^{\hat s}(x,u)\|_\infty + (1-\gamma)\,\alpha_s\,E\big(\ell_T^{\hat s}(x,u)\,|\,F_{\hat s}^s\big) - \gamma\epsilon\alpha_0$$
$$\ge \alpha_s J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot)) - \gamma\epsilon\alpha_0$$

•

⁹ Note that α₀ > α_s.

Lemma 5.9 Let x₁(t; ŝ, x₁(ŝ); u(·)) and x₂(t; ŝ, x₂(ŝ); u(·)) be the solutions of the nominal state equation corresponding to initial conditions x₁(ŝ) and x₂(ŝ), respectively, at time ŝ. Then given ε > 0, there exists C > 0 such that

$$\big|J(\hat s,\alpha_{\hat s},x_1(\hat s);u(\cdot)) - J(\hat s,\alpha_{\hat s},x_2(\hat s);u(\cdot))\big| \le C\,|x_1(\hat s)-x_2(\hat s)| + \frac{\gamma R\epsilon}{\beta}$$

Proof. By definition we have the following.

$$J(\hat s,\alpha_{\hat s},x_1(\hat s);u(\cdot)) - J(\hat s,\alpha_{\hat s},x_2(\hat s);u(\cdot)) = \gamma\frac{R}{\beta}\|\ell_T^{\hat s}(x_1,u)\|_\infty + (1-\gamma)E\big(\ell_T^{\hat s}(x_1,u)\,|\,F_{\hat s}^s\big) - \gamma\frac{R}{\beta}\|\ell_T^{\hat s}(x_2,u)\|_\infty - (1-\gamma)E\big(\ell_T^{\hat s}(x_2,u)\,|\,F_{\hat s}^s\big)$$

Given ε > 0, there exist α₁ > 0 and α₂ > 0 such that

$$\|\ell_T^{\hat s}(x_1,u)\|_\infty \le \frac{1}{\alpha_1}\log E\big(e^{\alpha_1\ell_T^{\hat s}(x_1,u)}\,|\,F_{\hat s}^s\big)+\epsilon \quad\text{and}\quad \|\ell_T^{\hat s}(x_2,u)\|_\infty \le \frac{1}{\alpha_2}\log E\big(e^{\alpha_2\ell_T^{\hat s}(x_2,u)}\,|\,F_{\hat s}^s\big)+\epsilon$$

The following inequalities hold.

$$J(\hat s,\alpha_{\hat s},x_1(\hat s);u(\cdot)) - J(\hat s,\alpha_{\hat s},x_2(\hat s);u(\cdot)) \le \gamma\frac{R}{\beta}\frac{1}{\alpha_1}\Big[\log E\big(e^{\alpha_1\ell_T^{\hat s}(x_1,u)}\,|\,F_{\hat s}^s\big)-\log E\big(e^{\alpha_1\ell_T^{\hat s}(x_2,u)}\,|\,F_{\hat s}^s\big)\Big] + (1-\gamma)E\big(|\ell_T^{\hat s}(x_1,u)-\ell_T^{\hat s}(x_2,u)|\,\big|\,F_{\hat s}^s\big) + \frac{\gamma R\epsilon}{\beta} \qquad (26)$$

$$J(\hat s,\alpha_{\hat s},x_1(\hat s);u(\cdot)) - J(\hat s,\alpha_{\hat s},x_2(\hat s);u(\cdot)) \ge \gamma\frac{R}{\beta}\frac{1}{\alpha_2}\Big[\log E\big(e^{\alpha_2\ell_T^{\hat s}(x_1,u)}\,|\,F_{\hat s}^s\big)-\log E\big(e^{\alpha_2\ell_T^{\hat s}(x_2,u)}\,|\,F_{\hat s}^s\big)\Big] - (1-\gamma)E\big(|\ell_T^{\hat s}(x_1,u)-\ell_T^{\hat s}(x_2,u)|\,\big|\,F_{\hat s}^s\big) - \frac{\gamma R\epsilon}{\beta} \qquad (27)$$

Using the inequality log z ≤ z − 1 it follows that

$$\frac{1}{\alpha_1}\Big[\log E\big(e^{\alpha_1\ell_T^{\hat s}(x_1,u)}\,|\,F_{\hat s}^s\big)-\log E\big(e^{\alpha_1\ell_T^{\hat s}(x_2,u)}\,|\,F_{\hat s}^s\big)\Big] \le \frac{E\big(e^{\alpha_1\ell_T^{\hat s}(x_1,u)}-e^{\alpha_1\ell_T^{\hat s}(x_2,u)}\,\big|\,F_{\hat s}^s\big)}{\alpha_1 E\big(e^{\alpha_1\ell_T^{\hat s}(x_2,u)}\,|\,F_{\hat s}^s\big)} \le E\big(e^{\alpha_1(\ell_T^{\hat s}(x_1,u)\vee\ell_T^{\hat s}(x_2,u))}\,|\ell_T^{\hat s}(x_1,u)-\ell_T^{\hat s}(x_2,u)|\,\big|\,F_{\hat s}^s\big) \qquad (28)$$

An application of the Gronwall and Burkholder inequalities leads to the following upper bound.

$$E\big(|\ell_T^{\hat s}(x_1,u)-\ell_T^{\hat s}(x_2,u)|^p\,\big|\,F_{\hat s}^s\big) \le C\,|x_1(\hat s)-x_2(\hat s)|^p \qquad (29)$$

where p > 1. Also

$$E\big(e^{\alpha_1 q(\ell_T^{\hat s}(x_1,u)\vee\ell_T^{\hat s}(x_2,u))}\,\big|\,F_{\hat s}^s\big) \le E\big(e^{\alpha_1 qL(T-\hat s+1)(2+\sup_{\hat s\le t\le T}|x_1(t)|+\sup_{\hat s\le t\le T}|x_2(t)|)}\big) \qquad (30)$$

From the nominal state equation we have

$$x_1(t) = x_1(\hat s) + \int_{\hat s}^t b(\tau,x_1(\tau),u(\tau))\,d\tau + \int_{\hat s}^t \sigma(\tau,x_1(\tau),u(\tau))\,dW(\tau)$$

Now by Gronwall's inequality, an upper bound for the supremum of |x₁(t)| can be found:

$$\sup_{\hat s\le t\le T}|x_1(t)| \le \big(L(T-\hat s)e^{L(T-\hat s)}+1\big)\Big(|x_1(\hat s)| + L(T-\hat s) + \sup_{\hat s\le t\le T}\Big|\int_{\hat s}^t\sigma(\tau,x_1(\tau),u(\tau))\,dW(\tau)\Big|\Big) \qquad (31)$$

From the above we also obtain

$$E\big(e^{\theta\sup_{\hat s\le t\le T}|x_1(t)|}\,\big|\,F_{\hat s}^s\big) \le e^{D_1\theta(|x_1(\hat s)|+L(T-\hat s))}\,E\big(e^{D_1\theta\sup_{\hat s\le t\le T}|\int_{\hat s}^t\sigma(\tau,x_1(\tau),u(\tau))dW(\tau)|}\,\big|\,F_{\hat s}^s\big) \qquad (32)$$

where D₁ = (L(T−ŝ)e^{L(T−ŝ)} + 1) and θ > 0. By the change of time property for martingales [19] (Chapter II, Theorem 41), it follows that

$$\int_{\hat s}^t\sigma(\tau,x_1(\tau),u(\tau))\,dW(\tau) = B(\hat t)$$

where $\hat t(t) = \int_{\hat s}^t|\sigma(\tau,x_1(\tau),u(\tau))|^2\,d\tau$ and {B(t)}_{t≥ŝ} is an ({F_t^ŝ}_{t≥ŝ}, P) standard Brownian motion. Since by Assumptions 5.3 σ is bounded, the right-hand side of (32) can be bounded using the result in [20] to obtain

$$E\big(e^{\theta\sup_{\hat s\le t\le T}|x_1(t)|}\,\big|\,F_{\hat s}^s\big) \le 80\,e^{\frac{(18\theta)^2(T-\hat s)L_\sigma^2}{2}}\,e^{D_1\theta(|x_1(\hat s)|+L(T-\hat s))} \qquad (33)$$

where L_σ is the upper bound on σ. Now by using (29), (30) and (33) the following upper bound for (28) is obtained.

$$\frac{1}{\alpha_1}\Big[\log E\big(e^{\alpha_1\ell_T^{\hat s}(x_1,u)}\,|\,F_{\hat s}^s\big)-\log E\big(e^{\alpha_1\ell_T^{\hat s}(x_2,u)}\,|\,F_{\hat s}^s\big)\Big] \le C_1\,|x_1(\hat s)-x_2(\hat s)|\,e^{D_1\alpha_1(|x_1(\hat s)|+|x_2(\hat s)|)+E_1\alpha_1+F_1\alpha_1^2}$$

where C₁, D₁, E₁ and F₁ are constants. The following lower bound is obtained in a similar fashion.

$$\frac{1}{\alpha_2}\Big[\log E\big(e^{\alpha_2\ell_T^{\hat s}(x_1,u)}\,|\,F_{\hat s}^s\big)-\log E\big(e^{\alpha_2\ell_T^{\hat s}(x_2,u)}\,|\,F_{\hat s}^s\big)\Big] \ge -C_1\,|x_1(\hat s)-x_2(\hat s)|\,e^{D_1\alpha_2(|x_1(\hat s)|+|x_2(\hat s)|)+E_1\alpha_2+F_1\alpha_2^2}$$

Applying the above two bounds to (26) and (27), the following bounds are also obtained.

$$\big|J(\hat s,\alpha_{\hat s},x_1(\hat s);u(\cdot)) - J(\hat s,\alpha_{\hat s},x_2(\hat s);u(\cdot))\big| \le C(\alpha_1,\alpha_2)\,|x_1(\hat s)-x_2(\hat s)| + \frac{\gamma R\epsilon}{\beta}$$

where C(·,·) is a constant. Notice that the above inequality holds for value functions as well:

$$|V(\hat s,x_1(\hat s)) - V(\hat s,x_2(\hat s))| \le C(\alpha_1,\alpha_2)\,|x_1(\hat s)-x_2(\hat s)| + \frac{\gamma R\epsilon}{\beta}$$

establishing the results. •

The following theorem gives the non-linear stochastic version of Bellman's principle of optimality, also known as the dynamic programming equation. The derivation employs the results of the previous lemmas.

Theorem 5.10 Let Assumptions 5.3 hold. Then for any (s,y) ∈ [0,T) × ℝⁿ we have

$$V(s,y) = \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\Bigg\{\gamma\,\frac{E\Big[\Big(\int_s^{\hat s}f(t,x(t;s,y,u(\cdot)),u(t))\,dt+\frac{1}{\gamma}V(\hat s,x(\hat s;s,y,u(\cdot)))\Big)\,e^{\alpha_s\int_s^{\hat s}f(t,x(t;s,y,u(\cdot)),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s;s,y,u(\cdot)))}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f(t,x(t;s,y,u(\cdot)),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s;s,y,u(\cdot)))}\Big]} + (1-\gamma)\,E\Big[\int_s^{\hat s}f(t,x(t;s,y,u(\cdot)),u(t))\,dt\Big]\Bigg\}, \qquad 0\le s\le\hat s\le T$$

Proof. Note that

$$V(s,y) = \inf_{u(\cdot)\in\mathcal{U}^w[s,T]} J(s,\alpha_s,y;u(\cdot))$$
$$= \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\frac{d}{d\alpha}\Big\{\gamma\log E\big(e^{\alpha\int_s^T f(t,x(t;s,y,u(\cdot)),u(t))\,dt+\alpha h(x(T;s,y,u(\cdot)))}\big) + (1-\gamma)\,\alpha\,E\Big(\int_s^T f(t,x(t;s,y,u(\cdot)),u(t))\,dt + h(x(T;s,y,u(\cdot)))\Big)\Big\}\Big|_{\alpha=\alpha_s}$$

Alternatively, writing $\tilde f_T^s(x,u) \equiv \int_s^T f(t,x(t),u(t))\,dt$ and $h_T^s(x) \equiv h(x(T))$, the above expression is written as

$$V(s,y) = \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\frac{d}{d\alpha}\Big\{\gamma\log E\Big(e^{\alpha\tilde f_T^s(x,u)+\alpha h_T^s(x)+\frac{(1-\gamma)}{\gamma}\alpha E(\tilde f_T^s(x,u)+h_T^s(x))}\Big)\Big\}\Big|_{\alpha=\alpha_s}$$
$$= \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\frac{d}{d\alpha}\Big\{\gamma\log E\Big(e^{\alpha\tilde f_{\hat s}^s(x,u)}\,E\Big(\exp\Big(\alpha\tilde f_T^{\hat s}(x,u)+\alpha h_T^{\hat s}(x)+\frac{(1-\gamma)}{\gamma}\alpha E\big(\tilde f_{\hat s}^s(x,u)+E(\tilde f_T^{\hat s}(x,u)+h_T^{\hat s}(x)\,|\,F_{\hat s}^s)\big)\Big)\,\Big|\,F_{\hat s}^s\Big)\Big)\Big\}\Big|_{\alpha=\alpha_s}$$

Also,

$$V(s,y) = \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\frac{d}{d\alpha}\Big\{\gamma\log E\Big(e^{\alpha\tilde f_{\hat s}^s(x,u)+\frac{(1-\gamma)}{\gamma}\alpha E(\tilde f_{\hat s}^s(x,u))}\,E\Big(e^{\alpha\tilde f_T^{\hat s}(x,u)+\alpha h_T^{\hat s}(x)+\frac{(1-\gamma)}{\gamma}\alpha E(\tilde f_T^{\hat s}(x,u)+h_T^{\hat s}(x)\,|\,F_{\hat s}^s)}\,\Big|\,F_{\hat s}^s\Big)\Big)\Big\}\Big|_{\alpha=\alpha_s}$$

Using Lemma 5.6, the above is written as follows.

$$V(s,y) = \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\frac{d}{d\alpha}\Big\{\gamma\log E\Big(e^{\alpha\tilde f_{\hat s}^s(x,u)+\frac{(1-\gamma)}{\gamma}\alpha E(\tilde f_{\hat s}^s(x,u))}\,e^{\frac{1}{\gamma}\int_0^\alpha J(\hat s,\beta,x(\hat s),u(\cdot))\,d\beta}\Big)\Big\}\Big|_{\alpha=\alpha_s}$$

By Lemmas 5.7 and 5.8, we get the following.

$$V(s,y) = \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\Bigg\{\frac{E\Big[\big(\gamma\tilde f_{\hat s}^s(x,u)+J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot))\big)\,e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot))}\Big]}{E\Big[e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot))}\Big]} + (1-\gamma)\,E\big(\tilde f_{\hat s}^s(x,u)\big)\Bigg\} \qquad (34)$$

Now notice that for any ε > 0, there exists a 5-tuple (Ω, F, P, W(·), u(·)) ∈ U^w[s,T] such that V(s,y) + ε > the right-hand side of (34). Using the fact that J(ŝ, α_ŝ, x(ŝ); u(·)) ≥ V(ŝ, x(ŝ)) and Lemma 5.7¹⁰, the following is obtained.

$$V(s,y)+\epsilon > \gamma\,\frac{E\Big[\big(\tilde f_{\hat s}^s(x,u)+\frac{1}{\gamma}V(\hat s,x(\hat s))\big)\,e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}{E\Big[e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} + (1-\gamma)\,E\big(\tilde f_{\hat s}^s(x,u)\big)$$

By taking the infimum over the class of admissible controls and letting ε → 0, we get

$$V(s,y) \ge \inf_{u(\cdot)\in\mathcal{U}^w[s,T]}\Bigg\{\gamma\,\frac{E\Big[\big(\tilde f_{\hat s}^s(x,u)+\frac{1}{\gamma}V(\hat s,x(\hat s))\big)\,e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}{E\Big[e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} + (1-\gamma)\,E\big(\tilde f_{\hat s}^s(x,u)\big)\Bigg\} \qquad (35)$$

Next, we derive a reverse inequality. To this end, consider the method used in [17]. Let {D_j}_{j≥1} be a Borel partition¹¹ of ℝⁿ with diameter diam D_j < δ. For each j, there is a 5-tuple (Ω_j, F_j, P_j, W_j(·), u_j(·)) ∈ U^w[ŝ,T] such that

$$J(\hat s,\alpha_{\hat s},x_j;u_j(\cdot)) \le V(\hat s,x_j)+\epsilon, \qquad x_j\in D_j$$

Choose δ = ε(1 − γR/β)/C, where C is the same constant that appears in Lemma 5.9. Hence for any x ∈ D_j the following holds.

$$J(\hat s,\alpha_{\hat s},x;u_j(\cdot)) \le J(\hat s,\alpha_{\hat s},x_j;u_j(\cdot))+\epsilon \le V(\hat s,x_j)+2\epsilon \le V(\hat s,x)+3\epsilon, \qquad \forall x\in D_j \qquad (36)$$

¹⁰ Let $g(\alpha_s)=\log E\big(e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\alpha_s J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot))}\big)$ and $f(\alpha_s)=\log E\big(e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\alpha_s V(\hat s,x(\hat s))}\big)$.
¹¹ i.e., $D_j\in B(\mathbb{R}^n)$, $D_i\cap D_j=\emptyset$ for $i\ne j$, and $\cup_{j\ge1}D_j=\mathbb{R}^n$.

Then by the definition of the weak-admissible 5-tuple (Ω, F, P, W(·), u(·)), there is a function Ψ_j ∈ A_T^m(U)¹² such that u_j(t,ω) = Ψ_j(t, W_j(·∧t, ω)), P_j-a.s., ω ∈ Ω_j, ∀t ∈ [ŝ,T]. Now for any (Ω, F, P, W(·), u(·)) ∈ U^w[s,T], let x(·) ≡ x(·; s, y, u(·)), and define a new control

$$\tilde u(t,\omega) = \begin{cases} u(t,\omega), & \text{if } t\in[s,\hat s) \\ \Psi_j\big(t,\,W(\cdot\wedge t,\omega)-W(\hat s,\omega)\big), & \text{if } t\in[\hat s,T] \text{ and } x(\hat s,\omega)\in D_j \end{cases}$$

Clearly (Ω, F, P, W(·), ũ(·)) ∈ U^w[s,T]. From (34) we have

$$V(s,y) \le \gamma\,\frac{E\Big[\big(\tilde f_{\hat s}^s(x,u)+\frac{1}{\gamma}J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot))\big)\,e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot))}\Big]}{E\Big[e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}J(\hat s,\alpha_{\hat s},x(\hat s);u(\cdot))}\Big]} + (1-\gamma)\,E\big(\tilde f_{\hat s}^s(x,u)\big)$$

Applying inequality (36) and Lemma 5.7 in the above expression leads to the following upper bound.

$$V(s,y) \le \gamma\,\frac{E\Big[\big(\tilde f_{\hat s}^s(x,u)+\frac{1}{\gamma}V(\hat s,x_{\hat s})+\frac{3\epsilon}{\gamma}\big)\,e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}V(\hat s,x_{\hat s})+\frac{3\epsilon\alpha_s}{\gamma}}\Big]}{E\Big[e^{\alpha_s\tilde f_{\hat s}^s(x,u)+\frac{\alpha_s}{\gamma}V(\hat s,x_{\hat s})+\frac{3\epsilon\alpha_s}{\gamma}}\Big]} + (1-\gamma)\,E\big(\tilde f_{\hat s}^s(x,u)\big) \qquad (37)$$

By combining (35) and (37), and letting ε → 0, the desired result is derived. •

5.3

Generalized Hamilton-Jacobi-Bellman Equation

Next we use the dynamic programming equation of the previous theorem to derive a generalized partial differential equation satisfied by the value function V(·,·), known as the HJB equation. The derivation assumes that V ∈ C^{1,2}([0,T] × ℝⁿ) (the set of all continuous functions V : [0,T] × ℝⁿ → ℝ such that the partial derivatives V_t, V_x, V_xx are continuous in (t,x)).

Theorem 5.11 Suppose Assumptions 5.3 hold and the value function V ∈ C^{1,2}([0,T] × ℝⁿ). Define

$$G(t,x,u,p,q,r) = \Big(1-\frac{\alpha_t}{\gamma}r\Big)\Big[\langle p,\,b(t,x,u)\rangle + \frac{1}{2}\mathrm{tr}\big(q\,\sigma\sigma^T\big)\Big] + \frac{\alpha_t^2}{2\gamma^2}\,r\,\mathrm{tr}\big(p^2\,\sigma\sigma^T\big) - f(t,x,u),$$
$$\forall(t,x,u,p,q,r)\in[0,T]\times\mathbb{R}^n\times U\times\mathbb{R}^n\times S^n\times\mathbb{R}$$

where Sⁿ is the space of n by n symmetric matrices. Then V is the solution of the following second-order partial differential equation of generalized Hamilton-Jacobi-Bellman type.

$$-V_s(s,y) + \sup_{u\in U} G\big(s,y,u,-V_x(s,y),-V_{xx}(s,y),-V(s,y)\big) = 0, \quad (s,y)\in[0,T)\times\mathbb{R}^n \qquad (38)$$
$$V(s,x)\big|_{s=T} = h(x), \quad x\in\mathbb{R}^n$$

¹² $A_T^m(U)$ is the set of all $\{B_{t+}(W_T^n)\}_{t\ge0}$-progressively measurable processes $\hat\eta : [0,T]\times W_T^n\to U$, where $W_T^n = C([0,T];\mathbb{R}^n)$.

Proof. Using the result of Theorem 5.10, for every u ∈ U^w[s,T] we have

$$V(s,y) \le \frac{E\Big[\Big(\gamma\int_s^{\hat s}f(t,x(t),u(t))\,dt+V(\hat s,x(\hat s))\Big)\,e^{\alpha_s\int_s^{\hat s}f(t,x(t),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f(t,x(t),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} + (1-\gamma)\,E\Big[\int_s^{\hat s}f(t,x(t),u(t))\,dt\Big]$$

Then

$$\frac{E\{V(s,y)-V(\hat s,x(\hat s))\}}{\hat s-s} \le \frac{1}{\hat s-s}\Bigg\{\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))\Big)\,e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} - E\{V(\hat s,x(\hat s))\} + (1-\gamma)\,E\Big[\int_s^{\hat s}f\,dt\Big]\Bigg\} \qquad (39)$$

Now the numerator of the first term on the right-hand side of the above inequality can be decomposed as follows.

$$E\Big[\Big(\gamma\int_s^{\hat s}f(t,x(t),u(t))\,dt+V(\hat s,x(\hat s))\Big)\,e^{\alpha_s\int_s^{\hat s}f(t,x(t),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]$$
$$= E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))\Big)\Big(e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}-e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big)\Big]$$
$$+\ E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))-V(s,y)\Big)\,e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big] + V(s,y)\,e^{\frac{\alpha_s}{\gamma}V(s,y)} \qquad (40)$$

Using the above, the first two terms on the right-hand side of (39) can be written as follows.

$$\frac{1}{\hat s-s}\Bigg\{\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))\Big)\,e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} - E\{V(\hat s,x(\hat s))\}\Bigg\}$$
$$= \frac{1}{\hat s-s}\,\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))\Big)\Big(e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}-e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big)\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} + \frac{1}{\hat s-s}\,\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))-V(s,y)\Big)\,e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}$$
$$+\ \frac{1}{\hat s-s}\Bigg[\frac{V(s,y)\,e^{\frac{\alpha_s}{\gamma}V(s,y)}}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}-V(s,y)\Bigg] - \frac{1}{\hat s-s}\,E\{V(\hat s,x(\hat s))-V(s,y)\} \qquad (41)$$

Notice that

$$\lim_{\hat s\to s}\frac{V(s,y)}{\hat s-s}\Bigg[\frac{e^{\frac{\alpha_s}{\gamma}V(s,y)}}{E\Big(e^{\alpha_s\int_s^{\hat s}f(t,x(t),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big)}-1\Bigg] = V(s,y)\,e^{\frac{\alpha_s}{\gamma}V(s,y)}\,\frac{d}{d\xi}\Bigg[\frac{1}{E\Big(e^{\alpha_s\int_s^{\xi}f(t,x(t),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\xi,x(\xi))}\Big)}\Bigg]\Bigg|_{\xi=s}$$
$$= -V(s,y)\Big[\alpha_s f(s,y) + \frac{\alpha_s}{\gamma}\frac{\partial}{\partial t}V(s,y)\Big] \qquad (42)$$

In order to find the limit of the first term in (41), we apply the Itô rule to the function $F(\tau)=e^{\alpha_s\int_s^\tau f\,dt+\frac{\alpha_s}{\gamma}V(\tau,x(\tau))}$.

$$dF(\tau) = \frac{\partial F}{\partial\tau}\,d\tau + \Big\langle\frac{\partial F}{\partial x},\,dx\Big\rangle + \frac{1}{2}\mathrm{tr}\Big(\frac{\partial^2F}{\partial x^2}\,d\langle x,x\rangle\Big)$$
$$= \Big[\alpha_s f(\tau,x(\tau))+\frac{\alpha_s}{\gamma}\frac{\partial}{\partial\tau}V(\tau,x(\tau))\Big]\,e^{\alpha_s\int_s^\tau f\,dt+\frac{\alpha_s}{\gamma}V(\tau,x(\tau))}\,d\tau$$
$$+\ \frac{\alpha_s}{\gamma}\Big\langle V_x(\tau,x(\tau))\,e^{\alpha_s\int_s^\tau f\,dt+\frac{\alpha_s}{\gamma}V(\tau,x(\tau))},\ b(\tau,x(\tau),u(\tau))\,d\tau+\sigma(\tau,x(\tau),u(\tau))\,dW(\tau)\Big\rangle$$
$$+\ \frac{1}{2}\mathrm{tr}\Big[\Big(\frac{\alpha_s}{\gamma}V_{xx}(\tau,x(\tau))+\frac{\alpha_s^2}{\gamma^2}V_x^2(\tau,x(\tau))\Big)\,e^{\alpha_s\int_s^\tau f\,dt+\frac{\alpha_s}{\gamma}V(\tau,x(\tau))}\,\sigma(\tau,x(\tau),u(\tau))\sigma^T(\tau,x(\tau),u(\tau))\Big]\,d\tau$$

Using the above we can compute the following limit.

$$\frac{1}{\hat s-s}\,E\big\{F(\hat s)-F(s)\big\} = \frac{1}{\hat s-s}\,E\int_s^{\hat s}dF(\tau)$$
$$= \frac{1}{\hat s-s}\,E\int_s^{\hat s}\Big[\alpha_s f(\tau,x(\tau))+\frac{\alpha_s}{\gamma}\frac{\partial}{\partial\tau}V(\tau,x(\tau))+\frac{\alpha_s}{\gamma}\big\langle V_x(\tau,x(\tau)),b(\tau,x(\tau),u)\big\rangle + \frac{1}{2}\mathrm{tr}\Big(\Big(\frac{\alpha_s}{\gamma}V_{xx}(\tau,x(\tau))+\frac{\alpha_s^2}{\gamma^2}V_x^2(\tau,x(\tau))\Big)\sigma(\tau,x(\tau),u(\tau))\sigma^T(\tau,x(\tau),u(\tau))\Big)\Big]\,e^{\alpha_s\int_s^\tau f\,dt+\frac{\alpha_s}{\gamma}V(\tau,x(\tau))}\,d\tau$$

After taking the limit in the above, and substituting the result in the first term of (41), we get the following.

$$\lim_{\hat s\to s}\frac{1}{\hat s-s}\,\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))\Big)\Big(e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}-e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big)\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}$$
$$= V(s,y)\Big[\alpha_s f(s,y,u)+\frac{\alpha_s}{\gamma}\frac{\partial V}{\partial s}(s,y)+\frac{\alpha_s}{\gamma}V_x^T(s,y)\,b(s,y,u)+\frac{1}{2}\mathrm{tr}\Big(\Big(\frac{\alpha_s}{\gamma}V_{xx}(s,y)+\frac{\alpha_s^2}{\gamma^2}V_x^2(s,y)\Big)\sigma(s,y,u)\sigma^T(s,y,u)\Big)\Big] \qquad (43)$$

Also performing the limit of the second term in (41) we get¹³

$$\lim_{\hat s\to s}\frac{1}{\hat s-s}\,\frac{E\Big[\Big(\gamma\int_s^{\hat s}f(t,x(t),u(t))\,dt+V(\hat s,x(\hat s))-V(s,y)\Big)\,e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} = \gamma f(s,x,u) + \lim_{\hat s\to s}\frac{1}{\hat s-s}\,E\Big[\int_s^{\hat s}dV(t,x(t))\Big]$$
$$= \gamma f(s,x,u) + V_s(s,y) + V_x(s,y)^T b(s,y,u) + \frac{1}{2}\mathrm{tr}\big(V_{xx}(s,y)\sigma(s,y,u)\sigma^T(s,y,u)\big) \qquad (44)$$

By substituting (44), (43) and (42) in (41) and (39), we arrive at the following inequality.

$$-\lim_{\hat s\to s}\frac{1}{\hat s-s}\,E\Big\{\int_s^{\hat s}dV(t,x(t))\Big\} \le \frac{\alpha_s}{\gamma}V(s,y)\,\big\langle V_x(s,y),b(s,y,u)\big\rangle + \frac{1}{2}V(s,y)\,\mathrm{tr}\Big(\Big(\frac{\alpha_s}{\gamma}V_{xx}(s,y)+\frac{\alpha_s^2}{\gamma^2}V_x^2(s,y)\Big)\sigma(s,y,u)\sigma^T(s,y,u)\Big) + \gamma f(s,y,u) + (1-\gamma)\lim_{\hat s\to s}\frac{1}{\hat s-s}\,E\Big[\int_s^{\hat s}f(t,x(t),u(t))\,dt\Big]$$

After simplification we get

$$-V_s(s,y) - \Big[\big\langle V_x(s,y),b(s,y,u)\big\rangle+\frac{1}{2}\mathrm{tr}\big(V_{xx}(s,y)\sigma(s,y,u)\sigma^T(s,y,u)\big)\Big]\Big(1+\frac{\alpha_s}{\gamma}V(s,y)\Big) - \frac{\alpha_s^2}{2\gamma^2}V(s,y)\,\mathrm{tr}\big(V_x^2(s,y)\sigma(s,y,u)\sigma^T(s,y,u)\big) - f(s,y,u) \le 0 \qquad (45)$$

Let

$$G(t,x,u,p,q,r) = \Big(1-\frac{\alpha_t}{\gamma}r\Big)\Big[p^T b(t,x,u)+\frac{1}{2}\mathrm{tr}\big(q\,\sigma(t,x,u)\sigma^T(t,x,u)\big)\Big] + \frac{\alpha_t^2}{2\gamma^2}\,r\,\mathrm{tr}\big(p^2\,\sigma(t,x,u)\sigma^T(t,x,u)\big) - f(t,x,u),$$
$$\forall(t,x,u,p,q,r)\in[0,T]\times\mathbb{R}^n\times U\times\mathbb{R}^n\times S^n\times\mathbb{R}$$

Using this definition in (45) leads to the following.

$$-V_s(s,y)+G\big(s,y,u,-V_x(s,y),-V_{xx}(s,y),-V(s,y)\big) \le 0, \qquad \forall u\in U$$

Taking the supremum over u ∈ U leads to

$$-V_s(s,y)+\sup_{u\in U}G\big(s,y,u,-V_x(s,y),-V_{xx}(s,y),-V(s,y)\big) \le 0 \qquad (46)$$

Now for the reverse inequality, notice that for any ε > 0 and 0 ≤ s < ŝ ≤ T with (ŝ − s) small enough, there exists a control u(·) ≡ u_{ε,ŝ}(·) ∈ U^w[s,T] such that

$$V(s,y)+\epsilon(\hat s-s) \ge \frac{E\Big[\Big(\gamma\int_s^{\hat s}f(t,x(t),u(t))\,dt+V(\hat s,x(\hat s))\Big)\,e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} + (1-\gamma)\,E\Big[\int_s^{\hat s}f(t,x(t),u(t))\,dt\Big]$$

Then

$$\frac{E\{V(s,y)-V(\hat s,x(\hat s))\}}{\hat s-s} - \frac{1}{\hat s-s}\Bigg\{\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))\Big)\,e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} - E\{V(\hat s,x(\hat s))\}\Bigg\} - \frac{1-\gamma}{\hat s-s}\,E\Big[\int_s^{\hat s}f(t,x(t),u(t))\,dt\Big] \ge -\epsilon$$

By taking the supremum over u ∈ U, we arrive at the following.

$$\sup_{u\in U}\Bigg\{-\frac{1}{\hat s-s}\,\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))\Big)\Big(e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}-e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big)\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]} - \frac{1}{\hat s-s}\,\frac{E\Big[\Big(\gamma\int_s^{\hat s}f\,dt+V(\hat s,x(\hat s))-V(s,y)\Big)\,e^{\frac{\alpha_s}{\gamma}V(s,y)}\Big]}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}$$
$$-\ \frac{1}{\hat s-s}\Bigg[\frac{V(s,y)\,e^{\frac{\alpha_s}{\gamma}V(s,y)}}{E\Big[e^{\alpha_s\int_s^{\hat s}f\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\Big]}-V(s,y)\Bigg] - \frac{1-\gamma}{\hat s-s}\,E\Big[\int_s^{\hat s}f(t,x(t),u(t))\,dt\Big]\Bigg\} \ge -\epsilon$$

Since by Assumptions 5.3 f, b and σ are uniformly continuous, when taking the limit ŝ ↓ s we can exchange the supremum and the limit. Hence

$$-V_s(s,y)+\sup_{u\in U}G\big(s,y,u,-V_x(s,y),-V_{xx}(s,y),-V(s,y)\big) \ge -\epsilon \qquad (47)$$

Now let ε → 0 and combine (47) with (46) to get the final result.

$$-V_s(s,y)+\sup_{u\in U}G\big(s,y,u,-V_x(s,y),-V_{xx}(s,y),-V(s,y)\big) = 0 \qquad (48)$$

¹³ Notice that for any ε > 0, there exists a sample path x⁰ such that $V(\hat s,x(\hat s)) \le J(\hat s,x(\hat s)) \le \int_{\hat s}^T f(x^0(\tau),u(\tau))\,d\tau + h(x^0(T)) + \epsilon$. Using the uniform continuity of f one can easily show that $\lim_{\hat s\to s}E\big[e^{\alpha_s\int_s^{\hat s}f(t,x(t),u(t))\,dt+\frac{\alpha_s}{\gamma}V(\hat s,x(\hat s))}\big] = e^{\frac{\alpha_s}{\gamma}V(s,y)}$.

•

Remark 5.12 It is important to determine whether the generalized HJB equation given in Theorem 5.11 reduces to the classical one under no uncertainty. Recall that in Problem 5.5, if γ = 0 then the problem reduces to the classical integral pay-off with expectation taken with respect to the nominal measure. However, since γ = β/(β+1), γ = 0 implies β = 0, which contradicts the assumption that β > 2 in Theorem 4.5. Hence, the only possibility is to take the limit as α_s → 0, in which case

$$G(t,x,u,p,q,r) = \langle p,\,b(t,x,u)\rangle + \frac{1}{2}\mathrm{tr}\big(q\,\sigma\sigma^T\big) - f(t,x,u)$$

and from the generalized HJB equation of Theorem 5.11, we deduce the classical HJB equation associated with the integral pay-off. Notice also that when η = P, as considered in this section, then from (15) in the limit as α_s → 0 we have Q^{u,*}(E) = γP(E) + (1−γ)P(E) = P(E), E ∈ F, as expected.
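The classical limit described in Remark 5.12 can be illustrated numerically: with η = P on a finite sample space, the mixture Q^{u,*} of (15) collapses to P as α → 0, while a genuine tilt (α > 0) moves it away from P. The distributions and cost values below are hypothetical.

```python
import math

# With eta = P on a finite sample space, (15) reads
# Q*(E) = gamma * tilted_P(E) + (1 - gamma) * P(E); as alpha -> 0, Q* -> P.
P = [0.1, 0.2, 0.3, 0.4]
ell = [1.0, 3.0, 0.5, 2.0]   # hypothetical pathwise costs
gamma = 0.75

def Q_star(alpha):
    w = [p * math.exp(alpha * l) for p, l in zip(P, ell)]
    tilted = [wi / sum(w) for wi in w]
    return [gamma * t + (1.0 - gamma) * p for t, p in zip(tilted, P)]

tv = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
print(tv(Q_star(1e-8), P))   # ~0: the classical case is recovered
print(tv(Q_star(2.0), P))    # strictly positive for a genuine tilt
```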

Remark 5.13 The generalized HJB equation derived in Theorem 5.11 depends on two extra parameters, specifically, γ and α_s. These should be chosen so that the following two conditions are satisfied.

1) The optimal measure occurs on the boundary of the variational norm constraint, i.e., ||Q^{u,*} − P|| = R.

2) The parameter γ = β/(1+β) should satisfy the following equation.

$$\frac{R}{\beta}\,\Big\|\int_s^T f(t,x(t),u(t))\,dt + h(x(T))\Big\|_\infty = \frac{E\Big[\Big(\int_s^T f(t,x(t),u(t))\,dt + h(x(T))\Big)\,e^{\alpha_s\int_s^T f(t,x(t),u(t))\,dt+\alpha_s h(x(T))}\Big]}{E\Big[e^{\alpha_s\int_s^T f(t,x(t),u(t))\,dt+\alpha_s h(x(T))}\Big]} \qquad (49)$$

Notice that by considering only a terminal pay-off, that is, letting f = 0, condition (49) is simplified.
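Condition (49) pins down α_s through the tilted mean, which is non-decreasing in α (Remark 4.4), so a bisection gives a natural numerical sketch. The sample cost values, R and β below are hypothetical, chosen only so that the target lies between the plain mean and the sup-norm.

```python
import math

# Sketch of solving (49) in the terminal-cost-only case with finitely many samples:
# find alpha such that E[L e^{alpha L}] / E[e^{alpha L}] = (R/beta) * ||L||_inf.
Ls = [0.5, 1.0, 1.5, 2.5, 4.0]   # hypothetical samples of h(x(T))
R, beta = 1.5, 2.5               # beta > 2, R in [0, 2]
target = (R / beta) * max(Ls)    # 2.4, between mean(Ls) = 1.9 and max(Ls) = 4

def tilted_mean(alpha):
    m = max(Ls)  # shift exponents for numerical stability
    w = [math.exp(alpha * (L - m)) for L in Ls]
    return sum(L * wi for L, wi in zip(Ls, w)) / sum(w)

lo, hi = 0.0, 100.0              # tilted_mean is non-decreasing in alpha
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if tilted_mean(mid) < target else (lo, mid)
alpha_s = 0.5 * (lo + hi)
print(abs(tilted_mean(alpha_s) - target))  # ~0
```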


Remark 5.14 It is possible to consider α fixed. Then bacause of (49), parameters β and γ will become functions of initial time s. In this case, in all the derivations of principle of γ = β1 and the parameter αγs of G optimality and HJB equation, one has to replace γ by 1−γ in (38) should be replaced by βαs . In the next theorem, the continuity property of αs as a function of s is shown. Theorem 5.15 Suppose Assumptions 5.3 holds, then αs in Problem 5.5 is a continuous function of s. 



Proof. We first show right continuity. Let Δ > 0, define ℓ_s = ∫_s^T f(t, x(t), u(t))dt + h(x(T)), and suppose α is not continuous at the point s. We consider two cases.

1) Assume that the right limit is lim_{ŝ→s+} α_ŝ = α_s + Δ. Since ŝ > s, by the non-negativity of ℓ_s we have

‖ℓ_ŝ‖_∞ ≤ ‖ℓ_s‖_∞   (50)

Then for ξ < 1 there exists a function α such that¹⁴

ξ‖ℓ_ŝ‖_∞ = E(ℓ_ŝ e^{α_ŝ ℓ_ŝ}) / E(e^{α_ŝ ℓ_ŝ})   and   ξ‖ℓ_s‖_∞ = E(ℓ_s e^{α_s ℓ_s}) / E(e^{α_s ℓ_s})

By substituting the above in inequality (50) and taking the right limit ŝ → s+, we arrive at the following:

E(ℓ_s e^{(α_s+Δ)ℓ_s}) / E(e^{(α_s+Δ)ℓ_s}) ≤ E(ℓ_s e^{α_s ℓ_s}) / E(e^{α_s ℓ_s})

But this is a contradiction, since E(ℓ_s e^{zℓ_s}) / E(e^{zℓ_s}) is monotone non-decreasing in z. Therefore the right limit satisfies lim_{ŝ→s+} α_ŝ ≤ α_s.

2) Now assume that the right limit is lim_{ŝ→s+} α_ŝ = α_s − Δ, Δ > 0. By definition of ℓ_ŝ, for any sample path x(·) we have

∫_ŝ^T f(t, x(t), u(t))dt + h(x(T)) ≤ ‖ℓ_ŝ‖_∞ = (1/ξ) E(ℓ_ŝ e^{α_ŝ ℓ_ŝ}) / E(e^{α_ŝ ℓ_ŝ})   (51)

Now let ŝ → s+ in (51) and apply the non-decreasing property (in z) of E(ℓ_s e^{zℓ_s}) / E(e^{zℓ_s}) to get the following:

∫_s^T f(t, x(t), u(t))dt + h(x(T)) ≤ (1/ξ) E(ℓ_s e^{(α_s−Δ)ℓ_s}) / E(e^{(α_s−Δ)ℓ_s}) < (1/ξ) E(ℓ_s e^{α_s ℓ_s}) / E(e^{α_s ℓ_s}),   P−a.s.

The above holds for any sample path x(·). By taking the supremum over all possible sample paths, we arrive at the following:

‖ℓ_s‖_∞ ≤ (1/ξ) E(ℓ_s e^{(α_s−Δ)ℓ_s}) / E(e^{(α_s−Δ)ℓ_s}) < (1/ξ) E(ℓ_s e^{α_s ℓ_s}) / E(e^{α_s ℓ_s}) = ‖ℓ_s‖_∞

which is a contradiction. Hence, again Δ = 0. Finally, by combining the results of 1) and 2), right continuity of α_s is established.

¹⁴ Here ξ = R/β.



For left continuity, we follow a similar method. Let Δ > 0, define ℓ_s = ∫_s^T f(t, x(t), u(t))dt + h(x(T)), and suppose α is not continuous at the point s. We consider two cases.

1) Assume that the left limit is lim_{ŝ→s−} α_ŝ = α_s − Δ, Δ > 0. Since ŝ < s, we have (by the non-negativity of ℓ_s)

‖ℓ_ŝ‖_∞ ≥ ‖ℓ_s‖_∞   (52)

Then for ξ < 1 there exists a function α such that

ξ‖ℓ_ŝ‖_∞ = E(ℓ_ŝ e^{α_ŝ ℓ_ŝ}) / E(e^{α_ŝ ℓ_ŝ})   and   ξ‖ℓ_s‖_∞ = E(ℓ_s e^{α_s ℓ_s}) / E(e^{α_s ℓ_s})

By substituting the above in inequality (52) and taking the left limit ŝ → s−, we arrive at the following:

E(ℓ_s e^{(α_s−Δ)ℓ_s}) / E(e^{(α_s−Δ)ℓ_s}) ≥ E(ℓ_s e^{α_s ℓ_s}) / E(e^{α_s ℓ_s})

But this is a contradiction, since E(ℓ_s e^{zℓ_s}) / E(e^{zℓ_s}) is monotone non-decreasing in z. Therefore the left limit satisfies lim_{ŝ→s−} α_ŝ ≥ α_s.

2) Now assume that the left limit is lim_{ŝ→s−} α_ŝ = α_s + Δ. By definition of ℓ_ŝ, for any ε > 0 there exists a sample path x_o(·) such that

∫_ŝ^T f(t, x_o(t), u(t))dt + h(x_o(T)) + ε > ‖ℓ_ŝ‖_∞ = (1/ξ) E(ℓ_ŝ e^{α_ŝ ℓ_ŝ}) / E(e^{α_ŝ ℓ_ŝ})   (53)

Now let ŝ → s− in (53) to get the following:

‖ℓ_s‖_∞ + ε ≥ ∫_s^T f(t, x_o(t), u(t))dt + h(x_o(T)) + ε > (1/ξ) E(ℓ_s e^{(α_s+Δ)ℓ_s}) / E(e^{(α_s+Δ)ℓ_s})

Since ε is arbitrary, letting ε → 0,

‖ℓ_s‖_∞ > (1/ξ) E(ℓ_s e^{(α_s+Δ)ℓ_s}) / E(e^{(α_s+Δ)ℓ_s})

By the non-decreasing property (in z) of E(ℓ_s e^{zℓ_s}) / E(e^{zℓ_s}),

‖ℓ_s‖_∞ > (1/ξ) E(ℓ_s e^{α_s ℓ_s}) / E(e^{α_s ℓ_s}) = ‖ℓ_s‖_∞

which is a contradiction. Hence, again Δ = 0. Finally, by combining the results of 1) and 2), left continuity of α_s is established. •
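Both parts of the continuity argument above rest on the monotonicity of z ↦ E(ℓ e^{zℓ})/E(e^{zℓ}); its derivative in z is the variance of ℓ under the exponentially tilted measure, hence non-negative. The sketch below is a hypothetical numerical sanity check of this fact (the sample data are arbitrary, not from the paper):

```python
import numpy as np

def tilted_mean(ell, z):
    """E[ell e^{z ell}] / E[e^{z ell}] for an empirical sample."""
    w = np.exp(z * (ell - ell.max()))      # shift exponent for numerical stability
    w /= w.sum()
    return float(np.dot(ell, w))

def tilted_var(ell, z):
    """Variance of ell under the tilted weights; it equals d/dz tilted_mean,
    which is why the tilted mean is non-decreasing in z."""
    w = np.exp(z * (ell - ell.max()))
    w /= w.sum()
    m = float(np.dot(ell, w))
    return float(np.dot((ell - m) ** 2, w))

rng = np.random.default_rng(2)
ell = rng.exponential(1.0, size=20_000)    # arbitrary non-negative sample pay-offs

zs = np.linspace(0.0, 0.5, 11)
means = [tilted_mean(ell, z) for z in zs]
assert all(a <= b + 1e-12 for a, b in zip(means, means[1:]))   # non-decreasing in z

# finite-difference check that d/dz tilted_mean agrees with tilted_var
h = 1e-5
fd = (tilted_mean(ell, 0.2 + h) - tilted_mean(ell, 0.2 - h)) / (2 * h)
```

Since a variance is non-negative, the tilted mean can never decrease as the tilting parameter grows, which is exactly what drives the contradictions in both halves of the proof.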

6

Viscosity Solutions

In this section, we discuss the case in which the value function is not smooth enough, so that V ∈ C^{1,2}([0, T] × ℝⁿ) may fail. Hence, we characterize the value function as a viscosity solution of the generalized HJB equation (38). First, the definition of viscosity solutions is introduced (for an extensive elaboration see [17]).

Definition 6.1 A function V ∈ C([0, T] × ℝⁿ) is called a viscosity subsolution of the generalized HJB equation (38) if

V(T, x) ≤ h(x),   ∀x ∈ ℝⁿ   (54)

and for any φ ∈ C^{1,2}([0, T] × ℝⁿ)¹⁵, whenever V − φ attains a local maximum at (s, y) ∈ [0, T) × ℝⁿ we have

−φ_t(s, y) + sup_{u∈U} G(s, y, u, −φ_x(s, y), −φ_xx(s, y), −φ(s, y)) ≤ 0   (55)

where G is defined in Theorem 5.11. A function V ∈ C([0, T] × ℝⁿ) is called a viscosity supersolution of the generalized HJB equation (38) if in (54) and (55) the inequality ≤ is changed to ≥ and local maximum is changed to local minimum. A function V is called a viscosity solution if it is both a viscosity subsolution and a viscosity supersolution.

Theorem 6.2 Under Assumption 5.3, the value function V is a viscosity solution of the generalized HJB equation (38).



Proof. For any φ ∈ C^{1,2}([0, T] × ℝⁿ) such that E{e^{θ|φ(t,x(t))|}} < ∞ for θ > 0, let V − φ attain a local maximum at (s, y) ∈ [0, T) × ℝⁿ. For a u ∈ U, let x(·) = x(·; s, y, u) be the state trajectory with the control u(t) ≡ u. Then by the stochastic principle of optimality we have

V(s, y) ≤ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + V(ŝ, x(ŝ))) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt}   (56)

Also, from the local maximum assumption we have

V(ŝ, x(ŝ)) ≤ V(s, y) + φ(ŝ, x(ŝ)) − φ(s, y)   (57)

By (57) and Lemma 5.7, the above can be written as follows:

V(s, y) ≤ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + V(s, y) + φ(ŝ, x(ŝ)) − φ(s, y)) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt}

¹⁵ C^{1,2}([0, T] × ℝⁿ) is the space of all continuous functions g : [0, T] × ℝⁿ → ℝ such that g_t, g_x and g_xx are all continuous in (t, x).



The above expression can be written as follows:

φ(s, y) ≤ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + φ(ŝ, x(ŝ))) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt}

Subsequently,

E{φ(s, y) − φ(ŝ, x(ŝ))}/(ŝ − s) ≤ (1/(ŝ − s)) [ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + φ(ŝ, x(ŝ))) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) φ(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) φ(ŝ,x(ŝ))}} − E{φ(ŝ, x(ŝ))} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt} ]

The limit of the above inequality as ŝ → s can be computed following the exact same approach as in the derivation of Theorem 5.11 (with V replaced by φ), to show that

−φ_t(s, y) + G(s, y, u, −φ_x(s, y), −φ_xx(s, y), −φ(s, y)) ≤ 0,   ∀u ∈ U   (58)

and thus

−φ_t(s, y) + sup_{u∈U} G(s, y, u, −φ_x(s, y), −φ_xx(s, y), −φ(s, y)) ≤ 0

Hence, it is shown that V is a viscosity subsolution of the generalized HJB equation (38). Now let V − φ attain a local minimum at (s, y) ∈ [0, T) × ℝⁿ. Then by definition

V(ŝ, x(ŝ)) ≥ V(s, y) + φ(ŝ, x(ŝ)) − φ(s, y)   (59)

By the principle of optimality, for any ε > 0 and any ŝ > s, there exists a control u(·) ≡ u^{ε,ŝ}(·) ∈ U_w[s, T] such that

V(s, y) + ε(ŝ − s) ≥ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + V(ŝ, x(ŝ))) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt}

Applying (59) and Lemma 5.7 to the previous inequality yields

V(s, y) + ε(ŝ − s) ≥ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + V(s, y) + φ(ŝ, x(ŝ)) − φ(s, y)) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt}

After simple manipulations we obtain

φ(s, y) + ε(ŝ − s) ≥ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + φ(ŝ, x(ŝ))) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) V(ŝ,x(ŝ))}} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt}

Then

E{φ(s, y) − φ(ŝ, x(ŝ))}/(ŝ − s) + ε ≥ (1/(ŝ − s)) [ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + φ(ŝ, x(ŝ))) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) φ(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) φ(ŝ,x(ŝ))}} − E{φ(ŝ, x(ŝ))} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt} ]

Taking the supremum over u ∈ U, we obtain

sup_{u∈U} { E{φ(s, y) − φ(ŝ, x(ŝ))}/(ŝ − s) − (e^{−(α_s/γ) O(ŝ−s)}/(ŝ − s)) [ E{(γ ∫_s^ŝ f(t, x(t), u(t))dt + φ(ŝ, x(ŝ))) e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) φ(ŝ,x(ŝ))}} / E{e^{α_s ∫_s^ŝ f(t,x(t),u(t))dt + (α_s/γ) φ(ŝ,x(ŝ))}} − E{φ(ŝ, x(ŝ))} + (1 − γ) E{∫_s^ŝ f(t, x(t), u(t))dt} ] } ≥ −ε

By Assumption 5.3, letting ŝ → s and ε → 0, the above expression leads to the following inequality:

−φ_t(s, y) + sup_{u∈U} G(s, y, u, −φ_x(s, y), −φ_xx(s, y), −φ(s, y)) ≥ 0   (60)

Hence, V is a viscosity supersolution of the generalized HJB equation (38). Finally, by combining (58) and (60), we conclude that V is a viscosity solution of the generalized HJB equation (38). •

7

Conclusion

In this paper, a new class of uncertain stochastic systems is considered, in which the uncertainty class is described by the variational distance between the unknown and the nominal distribution. The problem is formulated using minimax strategies, in which the minimizing player (the control) seeks to minimize the pay-off while the maximizing player (the distribution) seeks to maximize it over the uncertainty class. The theory is developed on abstract spaces (Polish spaces), while the results are applied to stochastic optimal control problems in which the nominal system is governed by a stochastic differential equation. In view of the non-linearity of the pay-off as a functional of the measure, a new dynamic programming or principle of optimality associated with the value function is derived. It is then employed to derive a generalized HJB equation satisfied by the value function using smooth solutions, while the results are also generalized by showing that the value function is a viscosity solution of the generalized HJB equation.

8

Appendix

Proof of Lemma 4.2. Suppose there exists a set E ∈ B(Σ) such that ν∗(E) < 0. Then E contains a negative subset¹⁶ E₁ (with respect to the measure ν∗). Now define the measure ν₁∗ as follows:

ν₁∗(F) = ν∗(F), if F ⊆ E₁ᶜ;   ν₁∗(F) = −ν∗(F) + 2μ(F), if F ⊆ E₁

The measure ν₁∗ belongs to the constraint set, as shown below. Given ε > 0, there exists a partition of Σ, denoted P_ε, such that

‖ν₁∗ − μ‖ ≤ Σ_{F ∈ P_ε} |ν₁∗(F) − μ(F)| + ε

The above inequality can be re-written as follows:

‖ν₁∗ − μ‖ ≤ Σ_{F ∈ P_ε, F∩E₁=∅} |ν₁∗(F) − μ(F)| + Σ_{F ∈ P_ε, F⊆E₁} |ν₁∗(F) − μ(F)|
  + Σ_{F ∈ P_ε, F∩E₁≠∅, F⊄E₁} ( |ν₁∗(F ∩ E₁) − μ(F ∩ E₁)| + |ν₁∗(F ∩ E₁ᶜ) − μ(F ∩ E₁ᶜ)| ) + ε

Using the definition of ν₁∗ (note that for F ⊆ E₁, |ν₁∗(F) − μ(F)| = |μ(F) − ν∗(F)| = |ν∗(F) − μ(F)|), we have

‖ν₁∗ − μ‖ ≤ Σ_{F ∈ P_ε, F∩E₁=∅} |ν∗(F) − μ(F)| + Σ_{F ∈ P_ε, F⊆E₁} |ν∗(F) − μ(F)|
  + Σ_{F ∈ P_ε, F∩E₁≠∅, F⊄E₁} ( |ν∗(F ∩ E₁) − μ(F ∩ E₁)| + |ν∗(F ∩ E₁ᶜ) − μ(F ∩ E₁ᶜ)| ) + ε   (61)

Define a new partition of Σ by dividing every set F in P_ε into the two disjoint sets F ∩ E₁ and F ∩ E₁ᶜ. This gives a new partition, denoted P_{1,ε}. Then (61) can be re-written as follows:

‖ν₁∗ − μ‖ ≤ Σ_{F ∈ P_{1,ε}} |ν∗(F) − μ(F)| + ε

Subsequently,

‖ν₁∗ − μ‖ ≤ ‖ν∗ − μ‖ + ε

Since ε > 0 is arbitrary and ν∗ ∈ B_R(μ), we conclude that ν₁∗ ∈ B_R(μ) as well. On the other hand, we have

∫_Σ ℓ(x) ν₁∗(dx) = ∫_{E₁} ℓ(x) ν₁∗(dx) + ∫_{E₁ᶜ} ℓ(x) ν₁∗(dx)
  = ∫_{E₁} ℓ(x) ν₁∗(dx) + ∫_{E₁ᶜ} ℓ(x) ν∗(dx)
  > ∫_{E₁} ℓ(x) ν∗(dx) + ∫_{E₁ᶜ} ℓ(x) ν∗(dx) = ∫_Σ ℓ(x) ν∗(dx)   (62)

where the inequality follows from the assumption that ℓ is non-negative and the fact that over E₁ the measure ν₁∗ is non-negative, whereas ν∗ assigns a negative value to every subset of E₁. It is evident that inequality (62) contradicts the optimality of ν∗. Hence ν∗ must be a non-negative measure. •

¹⁶ A set is called negative with respect to a measure if all of its measurable subsets have negative measure.

Proof of Lemma 5.7. Suppose the inequality (21) is violated on all points of an interval of the form [0, σ₀], i.e., (d/dσ)f(σ) > (d/dσ)g(σ) for all σ ∈ [0, σ₀]. Integrating both sides from 0 to σ₀ leads to f(σ₀) > g(σ₀), which is a contradiction. Now assume that (21) is violated on all points of an interval [σ₁, σ₂] with σ₁ > 0, but that it holds outside that interval. Then, given ε > 0, we have

g(σ₂) − g(σ₁) ≥ (d/dσ)g(σ₁)·(σ₂ − σ₁)   (63)
  ≥ (d/dσ)g(σ₁ − ε)·(σ₂ − σ₁)   (64)
  ≥ (d/dσ)f(σ₁ − ε)·(σ₂ − σ₁)   (65)

where (63) is due to the convexity of g and (64) follows from the non-decreasing property of g′(σ). From the above we have

(g(σ₂) − g(σ₁))/(σ₂ − σ₁) ≥ (d/dσ)f(σ₁ − ε)

Now let σ₂ → σ₁ and ε → 0, which leads to g′(σ₁) ≥ f′(σ₁) and contradicts the assumption that (21) is violated on [σ₁, σ₂]. Hence, we conclude that the inequality in (21) cannot be violated on all points of any interval. The only possibility left is its violation at individual points, but this is not possible since the derivatives are continuous. •

References

[1] P.R. Kumar and J.H. van Schuppen, On the Optimal Control of Stochastic Systems with an Exponential-of-Integral Performance Index, Journal of Mathematical Analysis and Applications, Vol. 80, pp. 312-332, 1981.

[2] W.M. McEneaney, Connections between Risk-Sensitive Stochastic Control, Differential Games, and H∞ Control: The Nonlinear Case, PhD Dissertation, Brown University, 1993.

[3] C.D. Charalambous, Partially Observable Nonlinear Risk-Sensitive Control Problems: Dynamic Programming and Verification Theorems, IEEE Transactions on Automatic Control, Vol. 42, No. 8, pp. 1130-1138, August 1997.

[4] P. Dai Pra, L. Meneghini and W.J. Runggaldier, Connections between Stochastic Control and Dynamic Games, Mathematics of Control, Signals and Systems, Vol. 9, No. 4, pp. 303-326, 1996.

[5] V.A. Ugrinovskii and I.R. Petersen, Finite Horizon Minimax Optimal Control of Stochastic Partially Observed Time Varying Uncertain Systems, Mathematics of Control, Signals and Systems, Vol. 12, pp. 1-23, 1999.

[6] I.R. Petersen, M.R. James and P. Dupuis, Minimax Optimal Control of Stochastic Uncertain Systems with Relative Entropy Constraints, IEEE Transactions on Automatic Control, Vol. 45, No. 3, pp. 398-412, March 2000.

[7] L.P. Hansen and T.J. Sargent, Robust Control and Model Uncertainty, The American Economic Review, Vol. 91, No. 2, pp. 60-66, 2001.

[8] F. Rezaei, C.D. Charalambous and A. Kyprianou, Optimization of Fully Observable Nonlinear Stochastic Uncertain Controlled Diffusion, Proceedings of the 43rd IEEE Conference on Decision and Control, pp. 2555-2560, December 2004.

[9] N.U. Ahmed and C.D. Charalambous, Relative Entropy Applied to Optimal Control of Stochastic Uncertain Systems on Hilbert Space, Proceedings of the 44th IEEE Conference on Decision and Control, pp. 1776-1781, December 2005.

[10] C.D. Charalambous and F. Rezaei, Stochastic Uncertain Systems Subject to Relative Entropy Constraints: Induced Norms and Monotonicity Properties of Minimax Games, IEEE Transactions on Automatic Control, Vol. 52, No. 4, pp. 647-663, April 2007.

[11] F. Rezaei, C.D. Charalambous and N.U. Ahmed, Optimization of Stochastic Uncertain Systems with Variational Norm Constraints, Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans, U.S.A., December 12-14, 2007.

[12] M.S. Pinsker, Information and Information Stability of Random Variables and Processes, Holden-Day, San Francisco, 1964.

[13] H.V. Poor, On Robust Wiener Filtering, IEEE Transactions on Automatic Control, Vol. 25, No. 3, pp. 531-536, June 1980.

[14] K.S. Vastola and H.V. Poor, On Robust Wiener-Kolmogorov Theory, IEEE Transactions on Information Theory, Vol. 30, No. 2, pp. 315-327, March 1984.

[15] W. Rudin, Functional Analysis, McGraw-Hill, New York, 1991.

[16] E. Yang and Z. Zhang, On the Redundancy of Lossy Source Coding with Abstract Alphabets, IEEE Transactions on Information Theory, Vol. 45, pp. 1092-1110, 1999.

[17] J. Yong and X.Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer-Verlag, New York, 1999.

[18] N. Dunford and J.T. Schwartz, Linear Operators, Part I: General Theory, Interscience Publishers, New York, 1957.

[19] P. Protter, Stochastic Integration and Differential Equations: A New Approach, Second Edition, Springer-Verlag, New York, 1992.

[20] V.H. de la Peña and N. Eisenbaum, Exponential Burkholder-Davis-Gundy Inequalities, Bulletin of the London Mathematical Society, Vol. 29, pp. 239-242, 1997.
