Proceedings of the 37th IEEE Conference on Decision & Control • Tampa, Florida USA • December 1998

CONTROLLED MARKOV CHAINS WITH RISK-SENSITIVE CRITERIA: SOME (COUNTER) EXAMPLES¹

Agustin Brau-Rojas², Department of Mathematics, The University of Arizona
Rolando Cavazos-Cadena³, Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, México
Emmanuel Fernández-Gaucherand, Systems and Industrial Engineering Department, The University of Arizona

¹ Research partially supported by the National Science Foundation under grant NSF-INT 9602939.
² The work of this author was partially supported by Universidad de Sonora and CONACyT, México.
³ Research partially supported by a grant from CONACyT, México (No. E 120.3336).
Abstract

This paper is concerned with risk-sensitive versions of the standard average cost and discounted cost criteria for controlled Markov chains. Risk-sensitivity is modelled by means of the family of exponential disutility functions. First, we study the risk-sensitive average cost corresponding to a fixed stationary deterministic policy, for finite state space models. We examine some (counter) examples which illustrate how the behavior of the risk-sensitive model departs from that of the risk-null one in the non-irreducible case. Finally, we present an example of a controlled Markov chain with infinite state space which, as opposed to Jaquette's result for finite models, does not have ultimately stationary optimal policies with respect to the risk-sensitive (exponential) discounted cost. The controlled Markov chain in that example satisfies a simultaneous Doeblin condition, an assumption that enables the vanishing discount approach in the risk-null models. Thus, the example provides more evidence that obtaining the mentioned vanishing discount approach results for risk-sensitive Markov models might be even harder than what Jaquette's result had already indicated.

1 Introduction

In this paper we investigate the risk-sensitive versions of the standard average cost and the discounted cost criteria for controlled Markov chains (CMCs) introduced by Howard and Matheson [14] and Jaquette [15], respectively; see [4, 7, 6, 5, 9, 10, 11, 16] for other recent results on these criteria. Risk-sensitivity is modelled in these criteria by means of an exponential disutility function U_γ(x) = sgn(γ) e^{γx}, where γ ≠ 0 is the constant risk-sensitivity coefficient, so that we will refer to them as the exponential average cost (EAC) and the exponential discounted cost (EDC), respectively; see [16, 18, 20].

The paper is organized as follows. In Section 2, the model, terminology and some general notation are introduced. In Section 3, we discuss the EAC corresponding to a fixed stationary deterministic (but otherwise arbitrary) policy f. First, we state a result, Theorem 1, which characterizes the EAC in terms of the spectral radii of certain matrices associated to the irreducible communicating classes of both recurrent and transient states. Theorem 1 generalizes Howard-Matheson's characterization, which treats only the particular case in which the probability transition matrix P induced by f is primitive. Then, we employ Theorem 1 to illustrate, by means of two examples, two peculiar features of the EAC when the initial state is transient: 1) the EAC is not affected by the probabilities of entering the irreducible closed classes the initial state leads to, as long as those probabilities are kept positive, but only by the EAC on those classes, and 2) the EAC may depend on the cost structure at the transient states. In Section 4, we provide an example in connection with the question of the feasibility of a vanishing discount approach for exponential risk-sensitive models; see [1] and references therein for the vanishing discount approach in the risk-null model.

Jaquette [15] proved that the optimal policies for the control problem with exponential discounted cost and finite state space are not stationary, in general. He showed, however, that for the mentioned control problem, there must exist optimal policies that are ultimately stationary, i.e., that apply the same decision function after a determined stage. Our example consists of a CMC with infinite state space for which the EDC control problem does not have ultimately stationary optimal policies, thus showing that for infinite state space models, the mentioned control problem is in general
even "more non-stationary" than in the risk-null models. Moreover, the CMC in the mentioned example satisfies a simultaneous Doeblin condition; therefore, applying the vanishing discount approach in the risk-sensitive context might be even harder than what Jaquette's result indicated; see the references above for some recent results in that direction. Auxiliary results are collected in an appendix.
2 The model

Consider a stochastic dynamic system, specified by (X, A, {A(i) : i ∈ X}, P, c), where:

a) X, the state space. We will take X = {1, 2, ..., N} when X is finite, and X = {1, 2, ...} in the infinite case.

b) A, the action or control set.

c) A(i) ⊆ A, the set of admissible actions at state i ∈ X; the set of admissible state-action pairs is defined as K := {(i, a) : i ∈ X, a ∈ A(i)}.

d) P = {P(· | i, a) : (i, a) ∈ K}, a transition probability on K × 2^X, where 2^X is the family of all subsets of X. For brevity, we will sometimes write P(j | i, a) or P_ij(a) instead of P({j} | i, a).

e) c : K → R, the one-stage cost function. We will assume that c(·, ·) ≥ 0 and that K̄ := sup{c(i, a) : (i, a) ∈ K} < ∞.

Heuristically, if at time t the system is in state X_t = i ∈ X and the control A_t = a ∈ A(i) is chosen, then (i) a cost c(i, a) is incurred, and (ii) the system moves to a new state X_{t+1} according to the distribution P(· | i, a).

The probability measure and expectation determined by a policy π and the initial state X_0 = i ∈ X will be denoted by P_i^π and E_i^π, respectively. The (risk-null) average cost due to a policy π and initial state i ∈ X is

  J(0, π, i) := limsup_{n→∞} (1/n) E_i^π[ Σ_{t=0}^{n−1} c(X_t, A_t) ].

The certainty equivalent of a random cost Z with respect to the disutility function U_γ is

  ε(γ, Z) := (1/γ) log E[e^{γZ}],

and the exponential average cost (EAC) is defined accordingly by

  J(γ, π, i) := limsup_{n→∞} (1/n) ε_i^π( γ, Σ_{t=0}^{n−1} c(X_t, A_t) ),        (1)

where ε_i^π denotes the certainty equivalent computed under P_i^π.
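As a concrete illustration of these definitions (not part of the original paper), the short sketch below evaluates the certainty equivalent of a two-point random cost; the function name and the numerical values are illustrative choices only.

```python
import numpy as np

def certainty_equivalent(gamma, values, probs):
    """(1/gamma) * log E[exp(gamma * Z)] for a discrete random cost Z."""
    return np.log(np.dot(probs, np.exp(gamma * np.asarray(values)))) / gamma

# A cost Z taking the values 1 and 3 with equal probability (mean 2).
values, probs = [1.0, 3.0], [0.5, 0.5]
print(certainty_equivalent(+1.0, values, probs))  # about 2.43: with gamma > 0 the cost is valued above its mean
print(certainty_equivalent(-1.0, values, probs))  # about 1.57: with gamma < 0 it is valued below its mean
```

As γ → 0 the certainty equivalent tends to the ordinary expectation E[Z], which is the sense in which the EAC extends the risk-null average cost.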
3 The EAC corresponding to a fixed stationary policy

Throughout this section a stationary deterministic policy f is fixed, P denotes the transition matrix induced by f, and we write J(γ, i) for the corresponding EAC. For i, j ∈ X, i → j ("j is accessible from i") means that P^(n)(j | i) > 0 for some n ≥ 1 (observe that with this definition it may happen that i ↛ i, in which case we say that i is an irrelevant state). Also, i ↔ j, which is read "i and j communicate", means that i → j and j → i. Similarly, for C ⊆ X, i → C ("the class C is accessible from i") means i → j for some j ∈ C. A class of states C ⊆ X is called self-communicating if a) i ↔ j for every i, j ∈ C, and b) there does not exist a subclass C' ⊆ X properly containing C such that (a) holds for C'.

We will denote the spectral radius of a square matrix A by ρ(A). We will assume that the transition probability matrix P is put in its irreducible canonical form (cf. [13]):

  P =
    [ P_1  0    ...  0    0 ]
    [ 0    P_2  ...  0    0 ]
    [ ...  ...  ...  ...  ...]
    [ 0    0    ...  P_m  0 ]
    [ W_1  W_2  ...  W_m  T ],        (2)

where P_r, r = 1, ..., m, are the irreducible transition probability matrices corresponding to the closed and irreducible classes C_r, r = 1, ..., m. The entries of the matrices W_r, r = 1, ..., m, are the one-step transition probabilities from transient to recurrent states. We also assume that T, the matrix whose entries are the one-step transition probabilities among transient states, is in turn in the form

  T =
    [ P_{m+1}    0          ...  0   ]
    [ G_{21}     P_{m+2}    ...  0   ]
    [ ...        ...        ...  ... ]
    [ G_{l-m,1}  G_{l-m,2}  ...  P_l ],

where each matrix P_r, r = m+1, ..., l, is irreducible or equal to the one-by-one zero matrix. If P_r ≠ (0) for some r = m+1, ..., l, then the entries of P_r are the transition probabilities among states in a self-communicating class C_r of transient states. If P_r = (0), we say that C_r = {j} is an irrelevant state class.

Before stating the announced representation of the EAC, we state two lemmas which provide auxiliary results.

Lemma 2 The EAC satisfies the equation

  sgn(γ) J(γ, i) = max{ sgn(γ) J(γ, j) : i → j },    i ∈ X.

Theorem 1 If the probability transition matrix P is expressed in the irreducible canonical form (2), then for each i ∈ X we have

  J(γ, i) = (1/γ) max{ log ρ(P̃_r) : i → C_r },        (3)

where P̃_r denotes the risk-sensitive matrix associated with the class C_r.

Remark 2 Two remarkable differences between the risk-sensitive and the risk-neutral model become apparent from Theorem 1: (1) the EAC when beginning at a transient state is not a typical average of the EACs over the closed classes that are accessible from the initial state; and (2) the EAC may depend on the cost structure at the transient states.

Remark 3 Notice that the proof of the theorem is still valid if we substitute lim sup by lim inf wherever the first one appears. We deduce from that observation that, for the finite state space model, the limit exists in the definition of the EAC.
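A minimal computational sketch of the characterization in Theorem 1 is given below; it is not from the paper, and it assumes the common convention that the risk-sensitive matrix of a class has entries e^{γ c(i)} P(j | i), with the class structure supplied by hand. The chain used is that of Example 1 below, with the arbitrary choice p = 0.25.

```python
import numpy as np

def spectral_radius(M):
    return np.abs(np.linalg.eigvals(M)).max()

def eac_theorem1(gamma, P, c, classes, accessible):
    """EAC from a given state via (3): (1/gamma) * max, over the accessible
    self-communicating classes, of log of the spectral radius of the
    risk-sensitive matrix of that class."""
    logs = []
    for r in accessible:
        idx = classes[r]
        # Assumed form of the risk-sensitive matrix: rows scaled by exp(gamma * cost).
        M = np.exp(gamma * c[idx])[:, None] * P[np.ix_(idx, idx)]
        logs.append(np.log(spectral_radius(M)))
    return max(logs) / gamma

# Closed classes {1} and {2}, both accessible from the transient state 3
# (0-based indices are used below).
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.25, 0.75, 0.0]])
c = np.array([1.0, 3.0, 5.0])
classes = {0: [0], 1: [1]}
print(eac_theorem1(+0.5, P, c, classes, accessible=[0, 1]))  # 3.0
print(eac_theorem1(-0.5, P, c, classes, accessible=[0, 1]))  # 1.0
```

Note that the printed values do not change if p is replaced by any other value in (0, 1), since the formula involves only the classes themselves; this is exactly feature (1) of Remark 2.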
It can be proved that if we define the EAC for γ = 0 as the risk-neutral average cost, i.e., J(0, i) := J(0, f, i) for all i ∈ X, then J(·, i) turns out to be continuous at γ = 0 for every i ∈ X whenever P is irreducible (cf. [2, 3]). Our first example demonstrates how characteristic (1) in Remark 2 may cause the mentioned property of the EAC not to hold in the non-irreducible case. The example considers a model with a transient state x which leads to more than one closed class.

Example 1. Consider the cost process with state space X = {1, 2, 3}, transition probability matrix

  P =
    [ 1  0    0 ]
    [ 0  1    0 ]
    [ p  1-p  0 ],    0 < p < 1,
and cost vector c = (1, 3, 5). From Theorem 1 we see that

  J(γ, 3) = max{c(1), c(2)} = 3   if γ > 0,
  J(γ, 3) = min{c(1), c(2)} = 1   if γ < 0,

so that lim_{γ→0+} J(γ, 3) = 3 ≠ 1 = lim_{γ→0−} J(γ, 3); in particular, J(·, 3) cannot be continuous at γ = 0, however the EAC is defined there.
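The discontinuity can also be checked directly from the definition of the EAC, without Theorem 1. The sketch below (not from the paper) runs the finite-horizon exponential recursion for the chain of Example 1 with the arbitrary choice p = 0.25 and compares it with the risk-null average cost from state 3.

```python
import numpy as np

p = 0.25                      # arbitrary illustrative value in (0, 1)
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [p, 1.0 - p, 0.0]])
c = np.array([1.0, 3.0, 5.0])

def eac(gamma, start=2, n=4000):
    """n-stage exponential average cost (1/(gamma*n)) * log E_i[exp(gamma * sum of costs)]."""
    # Backward recursion v <- exp(gamma*c) * (P v), renormalised each step to avoid
    # overflow; log_scale accumulates the factored-out magnitude.
    v, log_scale = np.ones(3), 0.0
    for _ in range(n):
        v = np.exp(gamma * c) * (P @ v)
        m = v.max()
        v, log_scale = v / m, log_scale + np.log(m)
    return (np.log(v[start]) + log_scale) / (gamma * n)

def risk_null(start=2, n=4000):
    """n-stage risk-neutral average cost from the same initial state."""
    total, dist = 0.0, np.eye(3)[start]
    for _ in range(n):
        total += dist @ c
        dist = dist @ P
    return total / n

print(eac(+0.1))    # close to 3 = max{c(1), c(2)}
print(eac(-0.1))    # close to 1 = min{c(1), c(2)}
print(risk_null())  # close to 3 - 2p, which does depend on p
```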
The first step of the construction is to take independent real random variables Z and Y_n, n = 1, 2, ..., with suitable distributions. Thus, if we define the subsequence {γ_k} of {λ_n} recursively by ν_1 = max{λ_n : λ_n < β} and ν_k = max{λ_n : λ_n < β, λ_n < ν_{k−1}} for k ≥ 2, then it is easy to verify that Z and the sequences W_i and t_i = max{t : β_t > γ_i} satisfy (5). We are now prepared to construct the announced example.

Example 4. Consider the CMC defined by X := N, A := {a1, a2}, A(1) = {a1}, with transition probabilities P(1 | 1, a1) = 1, P(1 | i, a1) = 1 and P(1 | i, a2) = P(i | i, a2) = 1/2 for i ≥ 2, and one-stage costs c(1, a1) = 0, c(i, a1) ~ Z and c(i, a2) ~ W_i for i ≥ 2.

Remark 6 Note that the optimal policy π* is ultimately stationary for each fixed initial state i. Using the same method as above, a similar (but more technically involved) example can be constructed where no ultimately stationary optimal policy exists regardless of the initial state, since t_i ↑ ∞ as i ↑ ∞.

References

[1] A. Arapostathis, V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, and S. I. Marcus. Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control and Optimization, 31(2):282–344, March 1993.

[2] A. Brau and E. Fernández-Gaucherand. Controlled Markov chains with risk-sensitive exponential average cost criterion. In Proceedings of the 36th IEEE Conference on Decision and Control, pages 2260–2264, San Diego, CA, 1997.

[3] A. Brau-Rojas and E. Fernández-Gaucherand. Controlled Markov chains with risk-sensitive average cost criterion: Foundations and (counter) examples. In preparation, 1998.

[4] R. Cavazos-Cadena and E. Fernández-Gaucherand. Controlled Markov chains with risk-sensitive average cost criterion: A counter-example and necessary conditions for optimal solutions under strong recurrence assumptions. Submitted for publication, 1998.

[5] R. Cavazos-Cadena and E. Fernández-Gaucherand. Controlled Markov chains with risk-sensitive criteria: Average cost, optimality equations, and optimal solutions. To appear, Mathematical Methods in Operations Research, 1998.
[6] R. Cavazos-Cadena and E. Fernández-Gaucherand. Markov decision processes with risk-sensitive average cost criterion: The discounted stochastic games approach. Submitted for publication, 1998.

[7] R. Cavazos-Cadena and E. Fernández-Gaucherand. The vanishing discount approach in Markov chains with risk-sensitive criteria. Submitted for publication, 1998.

[8] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Jones and Bartlett, Boston, MA, 1993.

[9] W. H. Fleming and D. Hernández-Hernández. Risk sensitive control of finite state machines on an infinite horizon I. SIAM Journal on Control and Optimization, 35(5):1790–1810, September 1997.

[10] D. Hernández-Hernández and S. Marcus. Risk sensitive control of Markov processes in countable state space. Systems and Control Letters, 29:147–155, 1996.

[11] D. Hernández-Hernández and S. Marcus. Existence of risk sensitive optimal stationary policies for controlled Markov processes. Preprint, 1997 (to appear).

[12] O. Hernández-Lerma and J. B. Lasserre. Discrete-Time Markov Control Processes. Springer, New York, 1996.

[13] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.

[14] R. A. Howard and J. E. Matheson. Risk-sensitive Markov decision processes. Management Science, 18(7):356–369, March 1972.

[15] S. C. Jaquette. A utility criterion for Markov decision processes. Management Science, 23(1):43–49, September 1976.

[16] S. I. Marcus, E. Fernández-Gaucherand, D. Hernández-Hernández, S. Coraluppi, and P. Fard. Risk sensitive Markov decision processes. In Systems and Control in the Twenty-First Century, Birkhäuser, 1997.

[17] J. Neveu. Mathematical Foundations of the Calculus of Probability. Holden-Day, Inc., San Francisco, Cal., 1965.

[18] J. W. Pratt. Risk aversion in the small and in the large. Econometrica, 32:122–136, 1964.

Appendix

The proof of the following proposition can be found in ([13]).

Proposition 1 If a non-negative square matrix A is in the block lower-triangular form

  A =
    [ A_{11}  0       ...  0      ]
    [ A_{21}  A_{22}  ...  0      ]
    [ ...     ...     ...  ...    ]
    [ A_{m1}  A_{m2}  ...  A_{mm} ],

where the diagonal blocks A_{ii} are square, then ρ(A) = max{ ρ(A_{ii}) : i = 1, ..., m }.
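A quick numerical check of Proposition 1 (ours, not the paper's) for a randomly generated non-negative block lower-triangular matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
blocks = [rng.random((k, k)) for k in (2, 3, 2)]      # non-negative diagonal blocks A_11, A_22, A_33
offsets = np.cumsum([0] + [b.shape[0] for b in blocks])
n = offsets[-1]

A = np.zeros((n, n))
for i, b in enumerate(blocks):
    A[offsets[i]:offsets[i + 1], offsets[i]:offsets[i + 1]] = b
    A[offsets[i]:offsets[i + 1], :offsets[i]] = rng.random((b.shape[0], offsets[i]))  # blocks below the diagonal

rho = lambda M: np.abs(np.linalg.eigvals(M)).max()    # spectral radius
print(rho(A))                                         # agrees (up to rounding) with ...
print(max(rho(b) for b in blocks))                    # ... the largest spectral radius of a diagonal block
```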