TA09-1 Controlled Markov Chains with Risk

0 downloads 0 Views 651KB Size Report
t the system is in state Xt = i e If, and the control. At = a E A(i) is chosen, then. (i) a cost c(i, a) is in- curred, and (ii) the system moves to a new state Xt+l according.
Proceedings of the 37th IEEE Conference on Decision & Control • Tampa, Florida USA • December 1998

CONTROLLED RISK-SENSITIVE

MARKOV CRITERIA:

TA09-1 10:00

CHAINS WITH SOME (COUNTER)

EXAMPLES1 Agustin Department

of Mathematics,

Rolando Departamento

de Estadistica

Brau-Rojas2

y Czilculo,

Emmanuel Systems and Industrial

Universidad

First,

we study

policy,

for finite

the risk-sensitive

to a fixed

Aut6noma

Agraria

Antonio

Narro,

M&ico,

Department,

The University

of Arizona

in these criteria by means of an exponential disutility function UT(z) = sgn(-y)e~’, where -y # O is the con-

This paper is concerned with risk-sensitive versions of the standard average coat and discounted coat criteria for controlled Markov chains. Risk-sensitivity is modelled by means of the family of exponential disutility functions.

of Arizona,

Fern&ndez-Gaucherand

Engineering

Abstract

cost corresponding

The University

Cavazos-Cadena3

stationmy

state space models.

average

deterministic

We examine

some

(counter) examples which illustrate how the behavior of the risk-sensitive model departs from that of the risk-

stant risk-sensitivity coefficient, so that we will refer to them as the exponential average cost (EAC) and the exponential discounted see [16, 18, 20]. The paper Section

2, the model

terminology the EAC bitrary)

cost (EDC), respectivel~ is organized as follows. In

and some generzd notation

are introduced. corresponding

stationary

In Section

to a fixed

deterministic

(but

policy

and

3, we discuss otherwise ~m.

ar-

First,

we

state a result, Theorem in terms of the spectral

1, which characterizes the EAC radii of certain matrices associ-

an example of a controlled Markov chain with infinite state space which, as opposed to Jaquette’s result for

ated to the irreducible recurrent and transient

communicating classes of both states. Theorem 1 generalizes

finite timal

Howard-Matheson’s characterization, which treats only the particular case in which the probability transition Then, we employ matrix P induced by f is primitive.

null one in the non-irreducible

case. Finally,

we present

models, does not have ultimately stationary oppolicies with respect to the risk-sensitive (expo-

nential)

discounted

cost. The controlled

in that

example

dition,

an assumption

count

approach

ample

provides

the mentioned

satisfies

a simultaneous

that

enables

in the risk-null

might be even harder already indicated.

con-

the vanishing

dis-

to think

Thus, that

for the risk-sensitive

than

what

chain

Doeblin

models.

more evidence approach

Markov

Jaquette’s

the ex-

obtaining models result

had

Theorem

1 to illustrate,

by means

two peculiar

features

is transient:

1) The EAC

abilities

of entering

of two examples,

of the EAC when the initial is not affected

the irreducible

state

by the prob-

closed classes the

initial state leads to, as long as those probabilities are kept positive, but only by the EAC on those classes, and 2) The EAC may depend on the cost structure at the transient st atea. In Section 4, we provide an example

in connection

with

the question

of the feasi-

bility of a vanishing discount approach for exponential risk-sensitive models; see [1] and references therein for

1 Introduction

the vanishing

discount

approach

in the risk-null

model.

In this paper we investigate the risk-sensitive versions of the standaxd average cost and the discounted cost

Jaquette [15] proved that the optimal policies for the control problem with exponential discounted cost and

criteria for controlled Markov chains (CMC’S) introduced by Howard and Matheson [14] and Jaquette [15], respectively; see [4, 7, 6, 5, 9, 10, 11, 16] for other recent results on theee criteria. Risk-sensitivity is modelled

finite state space are not stationary, in general, He showed, however, that for the mentioned control problem, there must exist optimal policies that are uMmateiy stationary, i.e., that apply the same decision

1Research partially supported by the National Science Foundation under grant NSF-INT 9602939 zThe work of this author was partially supported by Univer-

function after a determined stage. Our example consists of a CMC with injinite state space for which the EDC control problem does not have ultimately station-

sidad

de Sonora,

3Resoarch (No.

and

partially

CONACyT, supported

M&ico, by

a grant

from

CONACyT

ary optimal policies, thus showing that for infinite state space models, the mentioned control problem is in gen-

E 120.3336).

0-7803-4394-8/98 $10.00 (c) 1998 IEEE

1853

Proceedings of the 37th IEEE Conference on Decision & Control • Tampa, Florida USA • December 1998

eral

even

models,

“more

non-stationary”

Moreover,

a simultaneous

the CMC

Doeblin

than

condition,

discount

context,

the mentioned

Therefore, applying

in the risk-sensitive what

Jsquette’s

for some recent

might

that

results

in that

indicated;

direction.

em appendix

expectation

write

The

P:

(risk-null)

and initial

than

and denote

we

cost due to a policy

i G X will

limsup .n+cc

~E~ n

~Ct t=o

Considers

a stochastic

(2&A, {A(i)

[1

The certainty

dynamic

: i c X}, P,c),

Model.

equivalent

8(7, Z) :=

system,

specified

by

where:

Heu.risticaJly,

of a random

X= {1,2,..., in the infinite

N} when X is finite case.

b) A, the action

or control

set.

We will

3 The

set.

as K := {(i)a) : i G &a ~ A(i)}. d) P = {P( [ i, a) : (i, a) c K}, is a transition probability on K x 2X, where 2X is the family of all subsets For brevity,

or Ptj (a) instead

sometimes of P({j}

we will

write

P(j

I i, a)

I z, a).

e) c : K + R is the one-stage cost function. We will assume that c(-,.) > 0 and that K := sup{c(i, a) : (i,a)

c K}

0 for some n z 1 (observe that with this definition it may happen that i * i, in which case we say that i is an irrelevant state.) Also, i + j, which is read “i and j commu-

Lemma

2 The EAC

satisfies

J(T, i) = max{sgn(~)

the equation

J(cY, j) : i -+ j},

i =X,

nicate”, means that i ~ j and j“ ~ i. Similarly, for C C X, i + C (’the class C is accesible from i’) means i ~ j for some j 6 C. A class of states C c X is

that

called self-communicating

if

a) i ++ j for every i, j c C,

and b) there does not exist a subclam

C’ c X and con-

o

0

...

0

Remark

0

P2

o

...

0

risk-sensitive

(2),

the

matrices

P,,

-Pmo . . . Wm r

=

P is expressed in

then for

each i G X we

{

Iogp(>r)

(3)

: i ~ C,}

2 Two

remarkable

differences

and the risk-neutral

between

the

model become appar-

ent ~rom Theorem 2: (1) The EAC when beginning at a transient state is not a typical average of the EA C7S over the closed classes

(2)

ducible transition probability to the closed and irreducible

matrix

form,

J(T, i) = ~ max

PI

oo-. WI W2 [)

canonical

probability matrix form (cf. [13]):

p=;:.,,;.

In

1 If the probability

the irreducible have

taining C such that (a) holds for C’. We will denote the spectral radius of a square matrix A by p(A). We will assume that the transition P is put in its irreducible canonicai

Theorem

T

that are accesible from

the initial

state; and

(2) The EAC may depend on the cost structure transient states.

1,,, , , m are the irre-

matrices corresponding classes Cr, r = 1, ... , m.

at the

The entries of the matrices Wi, T = 1,... , m, are the one step transition probabilities from transient to re-

Remark 3 Notice that the proof of the theorem is still valid if we substitute lim sup by lim inf wherever the

current states. We also assume that T, the matrix whose entries are the one-step transition probabilities

first one appears. We deduce from that observation that, for the finite state space model, the limit exists in the definition of the EA C.

among

transient

states, is in turn

Pm+l

00.

G21

Pm+2

in the form 0

It can be proved

0

;l 1

the risk-neutral average cost, i.e., J(O, i) := @Vi c X, then J(c, i) turns out to be continuous at y = O for every i c X (cf. [2, 3]). Our first example demonstratees how characteristic (1) in Remark 2 may cause the mentioned property of the EAC not to hold in the

is irreducible

non-irreducible case. The example considers a model wit h a transient state x which leads to more than one

. o

.

.

? Gi-I–m,I

G1–I-m,2 G1-m,2

[’ G-rn,~

where each matrix or equal





Pi-l Gl_m,t_qn_I

P~, r = m + 1,...,1,

to the one by one zero matrix.

forsomer=m+

If P. #

(0)

closed class.

l,...,

i, then the entries of P. # (0) are the transition probabilities among states in a selfcommunicating class Cr of transient states. If F?r = (Pjj)

that if we define the EAC for ~ = O w

= (0), we say that G = {j}

is an irrelevant

Consider the cost process with Example 1. space X = {1,2, 3}, transition probability y matrix

state

class. 100

Before stating the announced representation of the EAC, we will state two lemmas which provide addl-

0-7803-4394-8/98 $10.00 (c) 1998 IEEE

P=o ()p

lo, l–p

o

1855

Proceedings of the 37th IEEE Conference on Decision & Control • Tampa, Florida USA • December 1998

and cost vector

c = (1,3, 5).

From

Theorem

1 we see

that

J(v, 3) = { so that

hl+

max{c(l),

c(2)}

= 3

if -y >0,

min{c(l),

c(2)}

= 1

if -y 0 such that for every transient state i and

J(~, i) = ~ max

cost:

stationary

if-f0. Let )ti. real random

ables Z and Y~, n = 1,2, . . . with

distributions

stationary

policy

as i I co.

as above, a similar (but more technically inexample can be constructed where ultimately

stationary initial

The first step is to take independent

no ultimately

since tiT w

Remark 6 Note that the optimal policy r’ is ultimately stationary for each initial state i jlxed. Using the same method volved)

2 ( (,

=) A..

a survey.

SIAM

J. Control

and Optimization,

31(2):282-344,

March

[2]

and E. Fern4ndez-Gaucherand.

A. Brau

1993. Con-

trolled Markov chains with risk-sensitive exponential average cost criterion. In Proceedings o,f the 86th IEEE Conference on Decision and Control, pages 2260-2264, San Diego,

CA,

1997.

Thus,

if we define the subsequence {’)’k} := {h,} of : & < fi} aud ~k = {2.} recursively by VI = m~{~n for k > 2. then, it msx{~~ : An < @,& < Vk-1}

[3] A. Brau-Rojaa and E. Fern&ndez-Gaucherand. Controlled Markov chains with risk-sensitive average cost criterion Foundations and (counter) examples. In

is easy to verify that Z and the sequences W~ := JLi y

preparation,

ti = max{t

[4] R. Cavazos-Cadena and E. Fern6ndezControlled Markov chains with riskGaucherand. A counter-example sensitive average cost criterion: and necessary conditions for optimal solutions un-

: @t > -y~} satisfy (5). We are now prepared the announced example.

to construct

Example

4. Consider

the CMC

defined

der

by

strong

publication, X;=

N,

A:=

{al,

a2},

P(1 I I,al) P(i C(i,

CIl)

w

Z,

I i,az)

A(l) = = P(I

=P(l

C(l, aI)

N

Z

{01},

I i,al)

and

40

‘f%

= 1,

Ii,a2)= ;, and

C(i,

0-7803-4394-8/98 $10.00 (c) 1998 IEEE

aa)

N

Wi,

,

1998.

recurrence

assumptions.

Ca-os-Cadena [5] R. Controlled Gaucherand. sensitive criteria: Average and optimal solutions. Methods

Submitted

for

1998.

in Operations

emd Markov

Ferw4ndezE. chains with risk-

cost, optimalit y equations, To appear, Mathematical

Research,

March

1998.

1857

Proceedings of the 37th IEEE Conference on Decision & Control • Tampa, Florida USA • December 1998

[6]

R.

Cavazos-Cadena

risk-sensitive stochastic

and

Markov

Gaucherand,

average

E,

decision

cost criterion:

games approach,

The

Submitted

with

discounted

for publication,

[7] R. Cavazos-Cadena and E. FerntidezThe vanishing discount approach in Gaucherand. Markov chains with risk-sensitive criteria. Submitted for publication, 1998. [8] A. Dembo Techniques and Boston,

MA,

W.

and O. Zeitouni. Large Deviations Applications. Jones and Bartlett,

1993.

H. Flemiug

and

D.

control

of Markov

and S. Marcus.

processes in countable

space. Systems and Control

Letters,

[11] D. Hern6ndez-Hern6ndez tence of risk sensitive optimal controlled

Markov

R.

sensitive

Control

A.

(preprint),

and J. Laserre.

Markov

and

decision

J.

E.

Markov

D, Gilliam,

Control

in Systems

in

the

Matriz

decision

.“.

;

‘..

; Aml

[

A7nz

where the diagonal

[18]

J. W. Pratt.

Arn,m-l

Amrn ‘1

:1
Am._l,m-l

The following

1982.

[17] J. Neveu. Mathematical Foundations of the Calculus of Probability. Holden-Day,Inc., San Ikancisco, Cal.,

0 o

.-.

P(A) = max {p(Aii)

1997.

the large.

Am_l,z

lim sup

criterion for Markov Science, 23(1):43-49,

Twenty-First

and Control,

...

O

Analysis.

E. Fern&ndez-Gaucherand, S. Coraluppi, and P. Fard.

Marcus, [16] S. L D. HernAndez-Hernfidez,

B. Data,

New York,

1972.

[15] S. C. Jaquette. A utility decision processes. Management September 1976.

Risk sensitive

Discrete

Springer,

processes.

March

0

state

1998,

Press, Cambridge,

Howard

o A2z

and S, Marcus. Exisstationary policies for

Processes.

University

ence, 18(7):356-369,

and

in

A is

square matrix

A21

Risk

1997. (to appear).

R. A. Horn and C. R. Johnson.

Cambridge [14]

processes,

O. Hern&ndez-Lerma

Time Marlcov 1996, [13]

can be found

All

A=

Hern6ndez-Hern6ndez.

D. Hern4ndez-Hern6ndez

sensitive

[12]

proposition

Proposition 1 If a non-negative in the block form

Am_l,l

Risk sensitive control of finite state machines on an infinite horizon I. SIAM Journal on Control and Optimization, 35(5): 1970-1810, September 1997. [10]

The proof of the following ([13]).

1998.

[9]

Appendix

Fern&ndez-

processes

TA09-1 10:00

=0

if A=

>0

if~

numbers

{Am :

An,

2.

1858