
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 48, NO. 2, FEBRUARY 2003

Drift and Monotonicity Conditions for Continuous-Time Controlled Markov Chains With an Average Criterion

Xianping Guo and Onésimo Hernández-Lerma, Senior Member, IEEE

Abstract—In this paper, we give conditions for the existence of average optimal policies for continuous-time controlled Markov chains with a denumerable state space and Borel action sets. The transition rates are allowed to be unbounded, and the reward/cost rates may have neither upper nor lower bounds. In the spirit of the "drift and monotonicity" conditions for continuous-time Markov processes, we propose a new set of conditions on the controlled process' primitive data under which the existence of optimal (deterministic) stationary policies in the class of randomized Markov policies is proved using the extended generator approach instead of Kolmogorov's forward equation used in the previous literature, and under which the convergence of a policy iteration method is also shown. Moreover, we use a controlled queueing system to show that all of our conditions are satisfied, whereas those in the previous literature fail to hold.

Index Terms—Average (or ergodic) reward/cost criterion, continuous-time controlled Markov chains (or continuous-time Markov decision processes), drift and monotonicity conditions, optimal stationary policy, unbounded transition and reward/cost rates.

I. INTRODUCTION

A continuous-time controlled Markov chain (also known as a Markov decision process) is specified by four primitive data: a countable state space S; control constraint sets A(i); transition rates q(j|i,a), which may depend on the current state and action; and reward (or cost) rates r(i,a). In this paper, we consider continuous-time controlled Markov chains (CMCs) with the average reward (or cost) criterion, in which the transition rates may be unbounded and the reward (or cost) rates may have neither upper nor lower bounds. We give a new set of conditions that ensure the existence of optimal (deterministic) stationary policies in the class of randomized Markov policies. Moreover, a key feature of these conditions is that they are based on the primitive data, in contrast to all of the previous literature in which, for instance, some hypotheses are imposed on the relative difference of discounted optimal values [see (4.1)].

Manuscript received September 18, 2001; revised April 6, 2002. Recommended by Associate Editor X. Y. Zhou. This work was supported in part by CONACYT under Grant 37355-E. The work of X. Guo was supported by JIRA'2001/15 of CINVESTAV-IPN, by the Natural Science Foundation of China, and by the Foundation of Hong Kong and Zhongshan University Advanced Research Centre, China.

X. Guo is with the Department of Mathematics, CINVESTAV-I.P.N., México City 07000, Mexico, and also with the School of Mathematics and Computational Science, Zhongshan University, Guangzhou 510275, P.R. China (e-mail: [email protected]).

O. Hernández-Lerma is with the Department of Mathematics, CINVESTAV-I.P.N., México City 07000, Mexico (e-mail: [email protected]).

Digital Object Identifier 10.1109/TAC.2002.808469

The average reward (or cost) criterion for stochastic control problems is one of the most popular performance criteria because of its applications in telecommunication and queueing systems, manufacturing processes, and many other real-world situations. The existence of optimal policies for this criterion has been studied for continuous-time CMCs with bounded reward/cost rates (e.g., [2], [11], [14]–[16], [21], and [22]) and/or uniformly bounded transition rates (e.g., [2], [15], [16], [19], [20], [22], and [23]). The case of unbounded transition and reward/cost rates has also been considered, for instance, in [9], [10], [13], and [14], but these references, and also [11], [15], and [21], require an assumption on the relative difference of discounted optimal values, which is of course not easy to verify because, to begin with, it requires knowing the CMC's discounted optimal value functions. By the same token, [23] imposes conditions on both the mean first entrance time into a finite set and the total expected reward obtained until the first entrance time for the CMCs associated with stationary policies. The latter is another example of hypotheses imposed on nonprimitive data. On the other hand, to prove the existence of optimal policies, the common approach in [9]–[11], [14]–[16], and [21]–[23] is via Kolmogorov's forward equation, which requires additional assumptions on the interchange of certain integrals and summations.

From the viewpoint of applications, however, it is desirable to give conditions on the CMC's primitive data for the existence of optimal policies because such conditions are usually easier to verify; this was the main motivation for this paper. In the spirit of the "drift and monotonicity" conditions for continuous-time Markov processes in [1] and [17], here we propose reasonably mild conditions on the primitive data. Under these conditions and the usual continuity-compactness assumptions, but nothing else, using the extended generator approach instead of Kolmogorov's forward equation we not only prove the existence of optimal stationary policies, but also give a policy iteration method which converges to an optimal stationary policy. Finally, to further emphasize the difference between our conditions and those in previous papers, we give an example on a controlled queueing system in which all of our conditions are satisfied, whereas the hypotheses in, e.g., [9]–[11], [14]–[16], and [21]–[23] fail to hold (see Remark 6.1).

The rest of this paper is organized as follows. Section II introduces the control model and the optimal control problem we are concerned with. The results on the existence of optimal policies are given in Section IV, after some technical preliminaries in Section III. A policy iteration method and its convergence are shown in Section V. Our results and approach are illustrated with an example in Section VI.


II. OPTIMAL CONTROL PROBLEM

In this section, we introduce the optimal control model we are concerned with:

(2.1)

where the state space S is a denumerable set and, for each i ∈ S, A(i) is the set of admissible control actions in the state i, which is assumed to be a Borel space endowed with its Borel σ-algebra. Without loss of generality, we shall assume that S is the set of nonnegative integers, i.e., S = {0, 1, 2, ...}.

The numbers q(j|i,a) in (2.1) denote the system's transition rates, and they satisfy q(j|i,a) ≥ 0 for all i, j ∈ S with j ≠ i and all a ∈ A(i). Moreover, the matrix of transition rates is assumed to be conservative, that is,

Σ_{j∈S} q(j|i,a) = 0 for all i ∈ S and a ∈ A(i)

and stable, that is, q_i := sup_{a∈A(i)} q_i(a) < ∞ for each i ∈ S, where q_i(a) := −q(i|i,a) ≥ 0. Further, q(j|i,a) is a measurable function in a ∈ A(i) for each fixed i, j ∈ S. Finally, the reward/cost rate r(i,a) is assumed to be a measurable function in a ∈ A(i) for each fixed i ∈ S.

To introduce the optimal control problem we are interested in, we first introduce the class of admissible policies.

Definition 2.1: A function f on S such that f(i) ∈ A(i) for all i ∈ S is called a decision function. Let F be the family of all decision functions. We consider families π = (π_t, t ≥ 0) of stochastic kernels such that
1) for each i ∈ S and t ≥ 0, π_t(·|i) is a probability measure on A(i); and
2) for each i ∈ S and each Borel subset of A(i), t ↦ π_t(·|i) is a Lebesgue measurable function on [0, ∞).

Such a family π is called a randomized Markov policy. If, in addition, there is a decision function f such that π_t(·|i) is concentrated at f(i) for all i ∈ S and t ≥ 0, then π is said to be a (deterministic) stationary policy. In the latter case π will be identified with f, and so F will be regarded as the family of stationary policies.

For each randomized Markov policy π, the associated transition and reward/cost rates are defined, respectively, by averaging the primitive rates over the action distribution π_t(·|i):

(2.2)

(2.3)

In particular, when π is a stationary policy f, we write the rates in (2.2) and (2.3) as q(j|i,f) and r(i,f), respectively, and regard r(·,f) as a function on S.

Any (possibly substochastic and nonhomogeneous) transition function with the transition rates in (2.2) is called a Q-process. To guarantee the existence of such processes, we will restrict ourselves to control policies in the class Π of randomized Markov policies for which the rates in (2.2) are continuous in t for each fixed i, j ∈ S. Observe that Π contains F and, on the other hand, the rates in (2.2) are also conservative and stable, i.e.,

(2.4)
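To make the averaging in (2.2) and (2.3) concrete, the following small sketch (in Python) averages hypothetical primitive rates over a randomized policy's action distribution and checks that the averaged rate matrix is still conservative and stable, in the spirit of (2.4). All names, the finite truncation, and the random model data are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hypothetical finite model: states 0..N-1, finite action set of size A per state.
N, A = 6, 3
rng = np.random.default_rng(0)

# q[i, a, j]: primitive transition rates q(j|i,a); off-diagonal >= 0,
# diagonal chosen so that each row sums to zero (conservative).
q = rng.uniform(0.0, 1.0, size=(N, A, N))
for i in range(N):
    for a in range(A):
        q[i, a, i] = 0.0
        q[i, a, i] = -q[i, a].sum()

r = rng.normal(size=(N, A))              # primitive reward rates r(i,a)

# A randomized (time-homogeneous, for simplicity) policy: pi[i, a] = prob. of action a in state i.
pi = rng.uniform(size=(N, A))
pi /= pi.sum(axis=1, keepdims=True)

# (2.2)-(2.3)-style averaging over the action distribution.
q_pi = np.einsum('iaj,ia->ij', q, pi)    # averaged transition rates q(j|i,pi)
r_pi = np.einsum('ia,ia->i', r, pi)      # averaged reward rates r(i,pi)

# Conservative and stable, as in (2.4): rows sum to zero, exit rates are finite.
assert np.allclose(q_pi.sum(axis=1), 0.0)
assert np.all(np.isfinite(-np.diag(q_pi)))
print("averaged exit rates:", -np.diag(q_pi))
```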

Hence, for each π ∈ Π, the existence of a Q-process (such as the minimum Q-process) is indeed guaranteed but, as is well known [1], [3], [6], [14], it is not necessarily regular; that is, the transition function might be strictly substochastic for some states and times. To ensure the regularity of a Q-process and the finiteness of the criterion (2.5), we give the following "drift" conditions.

Assumption A: There exist a sequence {S_m} of subsets of S, a nondecreasing function w ≥ 1 on S, and constants c > 0, b ≥ 0, and L > 0 such that:
1) S_m ↑ S and sup_{i∈S_m} q_i < ∞ for each m;
2) inf_{j∉S_m} w(j) → ∞ as m → ∞;
3) Σ_{j∈S} q(j|i,a) w(j) ≤ −c w(i) + b I_C(i) for all i ∈ S and a ∈ A(i), where I_C is the indicator function of a distinguished set C ⊂ S;
4) q_i(a) ≤ L w(i) for all i ∈ S and a ∈ A(i).

Remark 2.1:
a) For the case of uniformly bounded transition rates (i.e., sup_i q_i < ∞; see, for instance, [2], [15], [16], [19], [20], [22], and [23]), Assumptions A1) and A2) are not required because they are only used to guarantee the regularity of a Q-process. For the case of unbounded transition rates (e.g., [9]–[11] and [21]), the conditions for a Q-process to be regular are usually imposed on both the possibly nonhomogeneous minimum Q-processes and the transition rates. Hence, our Assumption A is quite different from those in [9]–[11], [14], and [21], for instance.
b) Assumption A is a variant of both the "drift condition" (2.4) in [17] and the hypotheses of [1, Cor. 2.2.16] for a homogeneous Q-process to be regular.
c) Under Assumption A, by [8, Th. 3.1] we have that a Q-process with the transition rates in (2.2) is regular; that is, the transition probabilities sum to one over the state space for all states and times. Thus, under Assumption A we will denote by x(·) the associated Markov process and write the regular transition function simply as the Q-process. Furthermore, for each policy and each initial state at a given initial time, we denote by P and E (with the corresponding sub- and superscripts) the probability measure determined by the transition function and the corresponding expectation operator, respectively; when the initial time is zero, the time subscript is omitted. Moreover, we will see that under Assumption A the following criterion (2.5) is well defined and finite; see Lemma 3.2(b).
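For intuition on the drift inequality in Assumption A3), the sketch below numerically checks a Lyapunov-type inequality of the form Σ_j q(j|i) w(j) ≤ −c w(i) + b I_C(i) on a truncated, uncontrolled birth–death model. The rate values, the geometric weight function, the set C = {0}, and the constants are all illustrative assumptions, not the paper's example.

```python
import numpy as np

N = 200                        # truncation level for the sketch
lam, mu = 1.0, 2.0             # illustrative birth and death rates (mu > lam)
rho = 1.3                      # geometric weight base, 1 < rho < mu / lam

w = rho ** np.arange(N)        # candidate weight function w(i) = rho^i, nondecreasing, w >= 1
c = (rho - 1.0) * (mu / rho - lam)   # candidate drift constant, > 0 since rho < mu/lam
b = lam * (rho - 1.0) + c            # candidate constant handling the boundary state
C = {0}                              # distinguished set

def drift(i):
    """Sum_j q(j|i) w(j) for the uncontrolled birth-death chain."""
    total = 0.0
    if i + 1 < N:
        total += lam * (w[i + 1] - w[i])   # birth at rate lam
    if i > 0:
        total += mu * (w[i - 1] - w[i])    # death at rate mu
    return total

ok = all(drift(i) <= -c * w[i] + (b if i in C else 0.0) + 1e-9 for i in range(N))
print("A3)-style drift inequality holds on the truncated range:", ok)
```

The design choice of a geometric weight is what makes the drift strictly negative proportionally to w(i) away from the boundary; a linear weight would only give a constant negative drift.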


Under Assumption A, the average reward/cost criterion is defined as follows: for each π ∈ Π and i ∈ S,

(2.5)

The corresponding optimal value function is the pointwise supremum of (2.5) over all policies in Π. A policy π* ∈ Π is said to be average optimal if it attains this supremum for all i ∈ S. The main aim of this paper is to give conditions on the primitive data for the existence of optimal stationary policies.
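Criterion (2.5) is a long-run expected time average of the reward rate along the controlled process. As a rough illustration only, the following sketch simulates a continuous-time chain under one fixed stationary policy and estimates its time-average reward; the small controlled birth–death model, the policy, and all names are hypothetical, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, mu = 1.0, 2.0

def rates(i, a):
    """Hypothetical controlled birth-death rates: action a scales the service rate."""
    out = {i + 1: lam}                     # arrival
    if i > 0:
        out[i - 1] = mu * a                # controlled service
    return out

def reward(i, a):
    return 5.0 - 1.0 * i - 0.5 * a         # hypothetical reward rate r(i,a)

def policy(i):
    return 2.0 if i >= 3 else 1.0          # a fixed stationary policy f(i)

def average_reward(T=50_000.0, i0=0):
    t, i, total = 0.0, i0, 0.0
    while t < T:
        a = policy(i)
        r_out = rates(i, a)
        q_i = sum(r_out.values())                       # total exit rate q_i(a)
        dt = min(rng.exponential(1.0 / q_i), T - t)     # exponential holding time
        total += reward(i, a) * dt                      # accumulate reward at rate r(i,a)
        t += dt
        if t < T:                                       # jump to the next state
            states, probs = zip(*r_out.items())
            i = rng.choice(states, p=np.array(probs) / q_i)
    return total / T

print("estimated long-run average reward:", average_reward())
```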

III. PRELIMINARIES

In this section, we give some preliminary facts that are needed to prove our main results, Theorems 4.1 and 5.4. We shall use the following notation. For the function w in Assumption A, we define the weighted supremum norm ‖u‖_w := sup_{i∈S} |u(i)|/w(i) for real-valued functions u on S, and the Banach space B_w(S) := {u : ‖u‖_w < ∞}.

Note that Lemmas 3.1 and 3.2(a) do not require Assumption A.

Lemma 3.1: Let u be an arbitrary nonnegative function on S. Then, for each π ∈ Π, i ∈ S, and initial time, the expectation of u under the minimum Q-process is the minimal nonnegative solution of the following "backward" inequality

(3.1)

and, moreover, it satisfies (3.1) with equality.

Proof: It is well known that the minimum Q-process can be constructed as follows (see, for instance, [1], [3], [6], and [14]). For each i, j ∈ S and 0 ≤ s ≤ t, let δ_{ij} be the Kronecker delta, and define recursively

(3.2)

(3.3)

Then, the minimum Q-process is

(3.4)

From (3.2)–(3.4), we have that for each i ∈ S and 0 ≤ s ≤ t

(3.5)

Multiplying both sides of (3.5) by u(j) and then summing over j ∈ S, we have that the resulting functions satisfy (3.1) with equality. Suppose now that an arbitrary nonnegative solution of (3.1) is given. Then, from (3.1) and (3.2) we obtain the base case of an induction. Arguing by induction, suppose that the corresponding comparison holds at some stage of the construction. Then, from (3.1)–(3.3) and the induction hypothesis, the comparison holds at the next stage, and so it holds at every stage. The latter inequality, together with (3.4), gives that the given solution dominates the expectation under the minimum Q-process, and so Lemma 3.1 follows.

Lemma 3.2: Let u be an arbitrary nonnegative function on S, and let c and b be two constants. Then the following hold.
a) For each fixed π ∈ Π, a drift inequality on u of the type in Assumption A3), with constants c and b, is equivalent to a corresponding bound, for all i ∈ S and all times, on the expectation of u under the minimum Q-process.
b) If Assumption A holds, then for each π ∈ Π the bound in a) holds with u = w, where the constants c and b are as in Assumption A.


Proof: a) Suppose first that the drift inequality holds. For each i ∈ S and all times, applying it along the construction (3.2)–(3.4) and passing to the limit, the Fatou–Lebesgue Lemma yields the desired bound; note that the right-hand side of the bound is nonnegative. Hence, (3.6) holds:

(3.6)

Conversely, by Lemma 3.1 it suffices to show that the function given by the right-hand side of the bound satisfies the inequality (3.1); for each i ∈ S and all times, this follows from (2.4) and the condition on u.

b) Under Assumption A, we have already mentioned in Remark 2.1c) that for each π ∈ Π the Q-process is regular, and so it is the same as the minimum Q-process. Thus, b) follows from Assumption A and part a).

Remark 3.1:
a) It should be noted that in Lemmas 3.1 and 3.2a) Assumption A is not required, and that the process involved is the minimum Q-process.
b) Lemma 3.2b) gives conditions for the average criterion (2.5) to be finite in the subclass of randomized Markov policies, and it improves [8, Th. 3.1]. Moreover, it is an extension of parts (i) and (ii) of [18, Th. 2.1].

To prove Theorem 4.1, in addition to the previous results, we also need some facts on the discounted reward criterion, defined as follows. For any given discount factor α > 0, the α-discounted reward criterion and the corresponding optimal value function are given by

(3.7)

respectively. A policy π* ∈ Π is called α-discount optimal if it attains the optimal α-discounted value for all i ∈ S. To ensure the existence of α-discount optimal stationary policies, we use the following standard continuity-compactness assumption (see, for instance, [9], [10], [13], [19], [20], and their references).

Assumption B:
1) For each i ∈ S, A(i) is compact.
2) The rates q(j|i,a) and r(i,a) are continuous in a ∈ A(i) for each fixed i, j ∈ S.
3) The function Σ_{j∈S} q(j|i,a) w(j) is continuous in a ∈ A(i) for each fixed i ∈ S.
4) There exist a nonnegative function on S and constants such that, for all i ∈ S and a ∈ A(i), the reward/cost rate r(i,a) is dominated by a constant multiple of w(i) and a drift inequality of the type in Assumption A3) holds for this second function, with w as in Assumption A.

Lemma 3.3: Suppose that Assumptions A and B hold. Then, the following hold.
a) For each α > 0, π ∈ Π, and i ∈ S, the α-discounted rewards and the optimal value function belong to B_w(S) and satisfy a bound in terms of w(i), where the constants involved are as in Assumption A.
b) The optimal value function is the unique solution in B_w(S) of the optimality equation:


Moreover, for each α > 0 there exists an α-discount optimal stationary policy, i.e., a stationary policy attaining the optimal α-discounted value for all i ∈ S.

Proof: Under Assumption A, we have already seen that for each π ∈ Π the Q-process is the same as the minimum Q-process. Thus, part a) follows from Lemma 3.2a), Assumption A, and (3.7), whereas part b) follows from [8, Th. 3.2].
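The discounted optimality equation behind Lemma 3.3b) can be solved numerically on a truncated state space by a simple fixed-point iteration: rearranging the standard form αV(i) = max_a { r(i,a) + Σ_j q(j|i,a)V(j) } gives V(i) = max_a [r(i,a) + Σ_{j≠i} q(j|i,a)V(j)] / (α + q_i(a)). The sketch below implements that iteration for a hypothetical truncated birth–death model; the truncation, the model data, and the precise form of the equation are assumptions for illustration, not a verbatim restatement of the paper's display.

```python
import numpy as np

N, A, alpha = 30, 3, 0.1       # truncated state space, action count, discount factor
lam = 1.0

def mu_of(a):
    return 1.0 + a             # hypothetical controlled service rate

def q(i, a):
    """Off-diagonal rates q(j|i,a) as a dict, for a truncated birth-death model."""
    out = {}
    if i + 1 < N:
        out[i + 1] = lam
    if i > 0:
        out[i - 1] = mu_of(a)
    return out

def r(i, a):
    return 5.0 - 0.2 * i - 0.3 * a     # hypothetical reward rate

V = np.zeros(N)
for _ in range(2000):                  # fixed-point iteration for the discounted equation
    V_new = np.empty(N)
    for i in range(N):
        vals = []
        for a in range(A):
            rates = q(i, a)
            qi = sum(rates.values())                           # total exit rate q_i(a)
            num = r(i, a) + sum(rate * V[j] for j, rate in rates.items())
            vals.append(num / (alpha + qi))                    # resolvent-type update
        V_new[i] = max(vals)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print("alpha-discounted value at state 0 (truncated sketch):", V[0])
```

On a truncated model with bounded rates this map is a contraction with modulus max_i,a q_i(a)/(α + q_i(a)) < 1, which is why the plain iteration converges.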

IV. EXISTENCE OF AVERAGE-OPTIMAL POLICIES

In this section, we prove the existence of average-optimal stationary policies. To this end, in addition to the "drift conditions" we also propose the following monotonicity conditions.

Assumption C: For each fixed action, the following hold.
1) The tail sums of the transition rates are monotone in the current state, in the sense of the monotonicity conditions in [1, p. 249].
2) For any two states i and j, either the transition rate from i to j is positive, or there are an integer n, which may depend on i and j, and distinct states i_1, ..., i_n such that each transition rate along the chain from i through i_1, ..., i_n to j is positive.
3) For each state i, either the process can move directly from i into a distinguished set of states, or there are an integer n, which may depend on i, and distinct states i_1, ..., i_n outside that set through which the process can reach the set from i with positive transition rates.

Remark 4.1:
a) Assumption C1) is a variant of the "monotonicity conditions" in [1, p. 249]. On the other hand, Assumption C2) implies that for any given stationary policy the corresponding process is irreducible (see [1, Prop. 5.3.1]), whereas Assumption C3) gives that for any two states the process can travel with positive probability from one state to the distinguished set without passing through the other state.
b) Obviously, Assumption C is easily verified and rather different from the conditions in [9]–[11], [14], [15], [21], and [23] because it is imposed on the primitive data in the model (2.1) instead of on the relative difference defined in (4.1), on which some conditions for the existence of average-optimal policies are imposed in the aforementioned references:

(4.1)

where i_0 ∈ S is an arbitrary but fixed state and the α-discounted optimal value function comes from Lemma 3.3.

Lemma 4.1: Suppose that Assumptions A and C hold. Then the relative difference in (4.1) satisfies the bound obtained from (4.2) below, in which the constants are as in Assumption A.

Proof: By Assumptions A1)–A3), we know [1], [17] that for each stationary policy the corresponding Markov process has a unique invariant probability measure, and by Assumption A3) and [8, Lemma 6.1], we see that the "drift" condition [17, eq. (2.4)] is satisfied with the drift function w as in Assumption A; moreover, Assumption A4) holds. On the other hand, as Assumption C1) is equivalent to the conditions [1, eq. (3.8), p. 249], by [1, Th. 7.3.4] we have that the process is stochastically ordered (or monotone). Furthermore, by Assumptions C2), C3) and (3.2)–(3.4), this process is irreducible and satisfies the condition [17, eq. (2.1)]. Thus, under Assumptions A and C, from [17, Th. 2.2], we obtain

(4.2)

where the bound holds for any function on S in the appropriate class. As a consequence, by (3.7), (4.1), and (4.2) we get, for each α > 0 and each i ∈ S, the stated bound, and so the lemma follows.

Theorem 4.1: Suppose that Assumptions A, B, and C hold. Then, the following hold.
a) There exist a constant g*, a function h ∈ B_w(S), and a stationary policy f* ∈ F satisfying the average reward optimality equation

(4.3)


b) The constant g* is the average reward optimal value; that is, g* equals the optimal average reward for all i ∈ S and, in addition, any stationary policy realizing the maximum in (4.3) is average-optimal.

Proof: a) Choose an arbitrary sequence {α_n} of discount factors such that α_n ↓ 0. By Assumption A and Lemma 3.3a), the corresponding sequence of scaled discounted values at the fixed state i_0 is bounded. Therefore, there exists a subsequence along which it converges to some constant g*. By Lemma 4.1, the corresponding relative differences form a sequence in a compact metric space determined by the bound of Lemma 4.1. Then, the Tychonoff Theorem gives the existence of both a further subsequence and a function h on S such that

(4.4)

From Lemma 4.1 and (4.4), we have that h inherits the bound of Lemma 4.1 for all i ∈ S, which implies that h belongs to B_w(S). On the other hand, for each i ∈ S and each n, by Lemma 3.3b) and the conservative property of the transition rates we have

(4.5)

Also note that, by Lemma 4.1, the relative differences are dominated by a multiple of w for all i ∈ S. Thus, under Assumption B, the "extension of Fatou's Lemma" [12, 8.3.7], together with (4.4) and (4.5), gives that

(4.6)

Moreover, for each i ∈ S and arbitrary a ∈ A(i), from Lemma 3.3 there exists, for each n, an action a_n ∈ A(i) such that

(4.7)

By Assumption B(1), w.l.o.g. we may suppose that a_n converges to some action in A(i). Applying again the "extension of Fatou's Lemma" [12, 8.3.7] and letting n → ∞ in (4.7), we get an inequality which can be rewritten as

(4.8)

Thus, combining (4.6) and (4.8), we get the first equality in (4.3). Finally, the existence of a stationary policy attaining the second equality in (4.3) follows from Assumption B1) and well-known "measurable selection theorems"; see, e.g., [12, Lemma 8.3.8(a)]. This completes the proof of part a).

b) For each stationary policy f and the function h in part a), from (4.3), (2.2) and (2.3), and Assumption B3), we get

(4.9)

and

(4.10)

where the operator appearing in (4.10) is the so-called extended generator of the nonhomogeneous Q-process (see, for instance, [4], [8], and [13]). On the other hand, as h is in B_w(S), by [8, Lemma 6.1] we have that for each i ∈ S the expectation of h along the process is finite for all times. Moreover, by the definition of h in (4.4) and Lemma 3.2b), we get a growth bound on this expectation, which yields

(4.11)

Thus, from (4.9)–(4.11) and [13, Prop. 3.5], we get

(4.12)

Similarly, from (4.3), we have

(4.13)

Combining (4.12) with (4.13) gives the desired conclusion for all i ∈ S, and so b) follows.

Remark 4.2: The optimality equation (4.3) for our average optimality criterion (2.5) is obviously equivalent to the following:


which, because of the denominators, is different from the optimality equation for jump Markov decision processes; see, for instance, [19, pp. 555–557 and 568] or [20, pp. 243–246].

V. POLICY IMPROVEMENT AND CONVERGENCE

In this section, we show that the policy iteration algorithm to obtain an average optimal policy converges. Throughout this section, we suppose that Assumptions A, B, and C hold.

For a given stationary policy f ∈ F, recall that μ_f denotes the unique invariant probability measure associated with f and that, by [1, Prop. 5.4.1], it can be uniquely determined by the equation

(5.1)

Lemma 5.1: For a given f ∈ F and each i ∈ S, let the average reward g(f) and the function h_f be defined by

(5.2)

Then, the function h_f is in B_w(S) and satisfies that

(5.3)

Proof: To begin, from (5.2) and (4.2), we have

(5.4)

which, together with (5.2), yields that h_f(i) is finite for all i ∈ S, and so h_f is in B_w(S). Moreover (noting that (5.4) allows the interchange of integrals), by (5.2) we have that, for each i ∈ S and t ≥ 0,

(5.5)

Then, by [8, Lemma 6.1], passing to the limit in (5.5) we obtain (5.3).

We now describe the policy iteration algorithm.

Step I. Take an arbitrary f_0 ∈ F.
Step II. Solve (5.1) for the invariant probability measure of the current policy f_n and then calculate g(f_n) and h_{f_n} as in (5.2).
Step III. Define a new stationary policy f_{n+1} in the following way: set f_{n+1}(i) := f_n(i) for all i for which

(5.6)

holds; otherwise, i.e., when (5.6) does not hold, choose f_{n+1}(i) ∈ A(i) such that

(5.7)

Step IV. If the current policy satisfies (5.6) for all i ∈ S, then stop: it is an optimal policy [by Theorem 4.1(b)]; otherwise, replace f_n with f_{n+1} and go back to Step II.

Let f_0 be an arbitrary "initial" stationary policy, and let {f_n} be the sequence of stationary policies obtained by the above policy iteration method. If f_{n+1} = f_n for some n, then f_n is average optimal (by Theorem 4.1). Thus, from now on we suppose that f_{n+1} ≠ f_n for all n ≥ 0. On the other hand, for each n, from (4.2) and (5.2), we can derive that

(5.8)

We next show that {g(f_n)} is a strictly monotonic sequence, for which we need the following notation and results. For given f_n, f_{n+1} and each i ∈ S, let

(5.9)

Obviously, the quantity in (5.9) vanishes for all those i for which (5.6) holds, and it is positive for the remaining i, with f_{n+1}(i) as in (5.7). Moreover, by (5.3) and (5.8), we obtain a comparison between consecutive policies that is used in the proofs below.

Theorem 5.2: If f_n is not optimal, then g(f_{n+1}) > g(f_n).
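A compact numerical sketch of a policy iteration scheme of this kind, on a truncated state space: policy evaluation computes the gain from the invariant measure and a bias vector from a Poisson-type equation, and policy improvement maximizes the evaluated right-hand side pointwise, keeping the old action unless a strictly better one exists. The model, the truncation, and the linear-algebra shortcuts are illustrative assumptions, not the paper's construction.

```python
import numpy as np

N, A = 30, 3
lam = 1.0

def q_row(i, a):
    """Generator row q(.|i,a) of a hypothetical controlled birth-death model, truncated to N states."""
    row = np.zeros(N)
    if i + 1 < N:
        row[i + 1] = lam
    if i > 0:
        row[i - 1] = 1.0 + a                  # controlled service rate
    row[i] = -row.sum()
    return row

def r(i, a):
    return 5.0 - 0.2 * i - 0.3 * a

def evaluate(f):
    """Gain g(f) and a bias vector h for the stationary policy f (an array of actions)."""
    Q = np.vstack([q_row(i, f[i]) for i in range(N)])
    rf = np.array([r(i, f[i]) for i in range(N)])
    # Invariant measure: mu Q = 0, mu 1 = 1 (consistent overdetermined system).
    Aeq = np.vstack([Q.T, np.ones(N)])
    mu = np.linalg.lstsq(Aeq, np.concatenate([np.zeros(N), [1.0]]), rcond=None)[0]
    g = mu @ rf
    # Poisson-type equation Q h = g - r, with the bias pinned down by h[0] = 0:
    M = Q.copy()
    M[:, 0] = 1.0                             # replacing column 0 by ones yields a nonsingular system
    h = np.linalg.solve(M, g - rf)
    h[0] = 0.0                                # clean numerical noise; bias is defined up to a constant
    return g, h

def improve(f, h):
    f_new = f.copy()
    for i in range(N):
        vals = [r(i, a) + q_row(i, a) @ h for a in range(A)]
        best = int(np.argmax(vals))
        if vals[best] > vals[f[i]] + 1e-9:    # keep the old action unless strictly better
            f_new[i] = best
    return f_new

f = np.zeros(N, dtype=int)
for _ in range(50):
    g, h = evaluate(f)
    f_new = improve(f, h)
    if np.array_equal(f_new, f):
        break
    f = f_new
print("policy iteration gain (truncated sketch):", g)
```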


Proof: By (5.9), we get

(5.10)

Moreover, by (5.9) again and [8, Lemma 6.1], the function involved is in B_w(S). Thus, as in the proof of (4.13), by (5.2), (5.8), and (5.10), we get

that is

(5.11)

This yields the theorem because the quantities in (5.9) are nonnegative for all i ∈ S and strictly positive for at least one i (otherwise, f_n would be optimal).

Theorem 5.2 shows that {g(f_n)} is a strictly monotonically increasing sequence. Moreover, by Lemma 3.2b) and (5.8), it is bounded. Thus, {g(f_n)} converges to a finite number, and so the differences g(f_{n+1}) − g(f_n) tend to zero. The improvement at each iteration is given by (5.9). In the following lemma, we show that the improvement tends to zero as n → ∞.

Lemma 5.3: For each i ∈ S, the improvement in (5.9) tends to zero as n → ∞.

Proof: Since the process associated with each f_n is ergodic, from the result in [7] or [16, eq. (2.11)] we get the convergence of the transition probabilities to the invariant probabilities for all i ∈ S. Hence, noting that the quantities in (5.9) are nonnegative for all i, by (5.11) we get that they are controlled by the vanishing differences g(f_{n+1}) − g(f_n). On the other hand, by (5.3), (5.7), and (5.9), this control, together with the ergodicity above, gives the claim.

Using Lemma 5.3, we may now establish the convergence of the policy iteration algorithm.

Theorem 5.4: Starting from any stationary policy f_0, the sequence {f_n} obtained by the policy iteration algorithm converges to an average optimal stationary policy.

Proof: Let {f_n} be the sequence of stationary policies obtained by the aforementioned policy iteration algorithm for some given stationary policy f_0. Since the class F is compact, as in the proof of Theorem 4.1 there exists a subsequence {f_{n_k}} of {f_n} such that

(5.12)

and

(5.13)

As in the proof of Theorem 4.1a), letting k → ∞ in (5.13), by Lemma 5.3 and (5.12), we have that the limit policy and the limit of the gains satisfy the optimality relation, which, together with Theorem 4.1b), gives that the limit policy is average optimal and that its gain is the optimal value for all i ∈ S. Therefore, since {g(f_n)} has been proved to be monotonically increasing, by (5.12) we get the convergence of the corresponding average rewards to the optimal value for all i ∈ S, and so the theorem follows.

VI. EXAMPLE

In this section, we illustrate our assumptions and results with an example.

A controlled queueing system: Consider a queueing system in which the state variable denotes the total number of jobs (in service and waiting in the queue) at any time t ≥ 0. There are "natural" arrival and service rates, say λ and μ, respectively, and service parameters and arrival parameters controlled by a decision-maker. When the state of the system is i, the decision-maker takes an action from a given set of admissible actions, which may increase or decrease the service parameter, and admit or reject arriving jobs. This action results in a reward (or cost) rate. In addition, the decision-maker gets a reward (or incurs a cost) for each unit of time during which the system remains in the state i, proportional to i, where the proportionality constant is a fixed reward (or cost) fee per customer.

We now formulate this queueing system as a continuous-time controlled Markov chain. The corresponding transition rates and reward/cost rates are given as follows: for each state i and each admissible action,

(6.1)

(6.2)

The aim now is to find conditions under which there exists an optimal average stationary policy in the set of randomized Markov policies. To do so, we consider the following conditions.
a) The natural arrival and service rates λ and μ are positive.
b) The controlled arrival and service parameters are nonnegative for all states and admissible actions, with the controlled arrival parameters suitably bounded.
c) For each i, A(i) is a compact metric space.


d) The functions describing the controlled arrival and service parameters and the associated reward/cost rates are bounded in the supremum norm and continuous.
e) Either the natural service rate dominates the natural arrival rate, or it does so once the controlled parameters are taken into account for all sufficiently large states.

Under these conditions, we obtain the following.

Proposition 6.1: Under the above conditions, the aforementioned queueing system satisfies Assumptions A, B, and C. Therefore (by Theorems 4.1 and 5.4), there exists an average optimal stationary policy, which can be obtained by the policy iteration algorithm.

Proof: We shall first verify Assumption A. Let S_m := {0, 1, ..., m} for each m, and define a nondecreasing weight function w on S piecewise, according to whether the state is small or large. Then Assumptions A1) and A2) are obviously true, and for each state and admissible action, from the conditions above and (6.1) we have

(6.3)

and, for large states,

(6.4)

From (6.3) and (6.4), we see that Assumption A3) holds, with the constants determined by the two regimes of small and large states. Moreover, Assumption A4) follows from the boundedness of the controlled rates relative to w. By the compactness and continuity conditions together with (6.1), we see that Assumption B is satisfied. Finally, from the remaining condition together with (6.1), we see that Assumption C holds.
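To connect the example with the earlier developments, the following sketch builds a truncated controlled queue of this type (birth–death transition rates in the spirit of (6.1), a fee-per-customer reward rate in the spirit of (6.2)) and computes the long-run average reward of one fixed stationary policy from the invariant distribution of its generator. The parameter values, the truncation, and the chosen policy are assumptions for illustration only.

```python
import numpy as np

N = 60                        # truncation level (the actual model has a countable state space)
lam0, mu0 = 1.0, 2.0          # "natural" arrival and service rates (illustrative values)
p_fee = -1.0                  # illustrative fee per customer per unit time (negative = holding cost)

def policy(i):
    """A fixed stationary policy: (arrival control h, service control k), hypothetical actions."""
    h = 0.0 if i >= 40 else 0.5          # admit less traffic when congested
    k = 1.0 if i >= 5 else 0.0           # extra service effort when the queue builds up
    return h, k

def generator_and_reward():
    Q = np.zeros((N, N))
    r = np.zeros(N)
    for i in range(N):
        h, k = policy(i)
        lam_i = lam0 + h                  # controlled arrival rate, a (6.1)-type rule
        mu_i = (mu0 + k) if i > 0 else 0.0
        if i + 1 < N:
            Q[i, i + 1] = lam_i
        if i > 0:
            Q[i, i - 1] = mu_i
        Q[i, i] = -Q[i].sum()
        r[i] = p_fee * i - 0.2 * h - 0.3 * k   # (6.2)-type reward/cost rate
    return Q, r

Q, r = generator_and_reward()
# Invariant distribution: mu Q = 0, mu 1 = 1.
Aeq = np.vstack([Q.T, np.ones(N)])
mu = np.linalg.lstsq(Aeq, np.concatenate([np.zeros(N), [1.0]]), rcond=None)[0]
print("long-run average reward of the fixed policy (truncated sketch):", mu @ r)
```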

Remark 6.1:
a) It should be noted that for this example both the reward/cost and transition rates are unbounded, and that the relative difference in (4.1) may have neither upper nor lower bounds.
b) Proposition 6.1 shows that for the above queueing system our Assumptions A, B, and C are satisfied, whereas the hypotheses in [9]–[11], [14]–[16], and [21]–[23] fail to hold because the reward/cost rates in [11], [14]–[16], [21], and [22], and the transition rates in [15], [16], [22], and [23], are all uniformly bounded, whereas the relative difference in [9] and [10] is uniformly bounded below in the state and the discount factor.

To conclude, it is worth noting that our queueing system can be easily modified to analyze many classes of controlled birth–death processes. For instance, in a population control problem the parameters λ and μ would represent the population's natural birth and death rates, respectively, while the controlled parameters could be used by the controller to, say, either prevent or encourage immigration.

REFERENCES

[1] W. J. Anderson, Continuous-Time Markov Chains. New York: Springer-Verlag, 1991.
[2] X. R. Cao, "The relations among potentials, perturbation analysis, and Markov decision processes," Discrete Event Dyna. Syst.: Theory Appl., vol. 8, pp. 71–87, 1998.
[3] K. L. Chung, Markov Chains With Stationary Transition Probabilities. Berlin, Germany: Springer-Verlag, 1960.
[4] M. H. A. Davis, Markov Models and Optimization. London, U.K.: Chapman and Hall, 1993.
[5] E. B. Dynkin and A. A. Yushkevich, Controlled Markov Processes. New York: Springer-Verlag, 1979.
[6] W. Feller, "On the integro-differential equations of purely discontinuous Markoff processes," Trans. Amer. Math. Soc., vol. 48, pp. 488–515, 1940.
[7] L. Fisher, "On the recurrent denumerable decision process," Ann. Math. Statist., vol. 39, pp. 424–434, 1968.
[8] X. P. Guo and O. Hernández-Lerma, "Continuous-time controlled Markov chains with discounted rewards," submitted for publication.
[9] X. P. Guo and W. P. Zhu, "Denumerable state continuous-time Markov decision processes with unbounded cost and transition rates under average criterion," ANZIAM J., pp. 541–557, 2002.
[10] X. P. Guo and W. P. Zhu, "Optimality conditions for continuous-time Markov decision processes with average cost criterion," in Markov Processes and Controlled Markov Chains, Z. T. Hou, J. A. Filar, and A. Y. Chan, Eds. Dordrecht, The Netherlands: Kluwer, 2001.
[11] X. P. Guo and K. Liu, "A note on optimality conditions for continuous-time Markov decision processes with average cost criterion," IEEE Trans. Automat. Contr., vol. 46, pp. 1984–1988, Dec. 2001.
[12] O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1999.
[13] O. Hernández-Lerma, Lectures on Continuous-Time Markov Control Processes. Mexico City, Mexico: Sociedad Matemática Mexicana, 1994, vol. 3.
[14] Z. T. Hou and X. P. Guo, Markov Decision Processes (in Chinese). Changsha, China: Science and Technology Press, 1998.
[15] P. Kakumanu, "Nondiscounted continuous-time Markov decision processes with countable state space," SIAM J. Control, vol. 10, pp. 210–220, 1972.
[16] P. Kakumanu, "Continuous time Markov decision processes with average return criterion," J. Math. Anal. Appl., vol. 52, pp. 173–188, 1975.
[17] R. B. Lund, S. P. Meyn, and R. L. Tweedie, "Computable exponential convergence rates for stochastically ordered Markov processes," Ann. Appl. Prob., vol. 6, pp. 218–237, 1996.
[18] S. P. Meyn and R. L. Tweedie, "Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes," Adv. Appl. Prob., vol. 25, pp. 518–548, 1993.
[19] M. L. Puterman, Markov Decision Processes. New York: Wiley, 1994.
[20] L. I. Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems. New York: Wiley, 1999.
[21] J. S. Song, "Continuous-time Markov decision programming with nonuniformly bounded transition rates," Sci. Sin., vol. 12, pp. 1258–1267, 1987.
[22] A. A. Yushkevich and E. A. Feinberg, "On homogeneous Markov model with continuous time and finite or countable state space," Theory Prob. Appl., vol. 24, pp. 156–161, 1979.
[23] S. H. Zheng, "Continuous-time Markov decision programming with average reward criterion and unbounded reward rate," Acta Math. Appl. Sinica, vol. 7, pp. 6–16, 1991.


Xianping Guo received the B.S. and M.S. degrees from the Department of Mathematics of Hunan Normal University, P.R. China, and the Ph.D. degree in probability and statistics from Changsha Railway University, P.R. China, in 1987, 1990, and 1996, respectively. From September 1996 to August 1998, he held a Postdoctoral Appointment at Zhongshan University, P.R. China. From August 2000 to July 2002, he was a Visiting Professor at CINVESTAV-IPN, Mexico City, Mexico. Now he is with Zhongshan University, P.R. China. His research interests include Markov control processes, stochastic games, and stochastic processes.


Onésimo Hernández-Lerma (S’76–M’78–SM’92) received the “Licenciatura” degree from the National Polytechnic Institute (I.P.N.), Mexico City, Mexico, and the M.Sc. and Ph.D. degrees in applied mathematics from Brown University, Providence, RI, in 1971, 1976, and 1978, respectively. Since August 1978, he has been with the Mathematics Department of CINVESTAV-IPN, Mexico City, Mexico. His research interests include Markov control processes, stochastic games, and infinite-dimensional linear programming. Dr. Hernández-Lerma received the Sciences and Arts National Award from the Government of Mexico in 2001.
