RECURSIVE UTILITIES WITH A VARIABLE BOUND ON IMPATIENCE

Anna Jaśkiewicz,¹ Janusz Matkowski,² Andrzej S. Nowak²,∗

(November, 2008)

¹ Institute of Mathematics and Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland, e-mail: [email protected]
² Faculty of Mathematics, Computer Science and Econometrics, University of Zielona Góra, Podgórna 50, 65-246 Zielona Góra, Poland, e-mails: [email protected], [email protected] (corresponding author)
∗ The authors thank Darrell Duffie for encouragement and pointing out related papers by Ozaki, Streufert and Weil.

Abstract: In this paper we study a new class of recursive utilities in dynamic choice processes in a stochastic environment. The basic idea is to introduce a variable measure of impatience for an economic agent. In the literature, this kind of measure is represented by a Lipschitz constant (often a contraction coefficient) for the aggregator. In this paper, on the other hand, the contraction property is described by some increasing real-valued function. When this function is linear, our theory coincides with the well-known case. We make use of an extension of the Banach contraction principle given by Matkowski to derive recursive utilities and solve the associated dynamic programming problem. We present two approaches in order to take into account randomness of future outcomes. Our first approach is in the spirit of the von Neumann-Morgenstern concept and is based on the notion of expectation. We construct a recursive utility on the space of trajectories of the process and then take its expected value. It turns out that the associated optimization problem leads to non-stationary dynamic programming and an infinite system of Bellman equations, which yield persistently optimal policies. In our second approach, we construct recursive utilities on the space of policies of an agent; these have a natural interpretation. The associated optimization problem leads to a solution of a single Bellman equation and to a stationary optimal policy for the agent. Our theory is enriched by various applications, e.g., to many stochastic growth models.

Keywords: Koopmans’ equation, Recursive utility, Nonlinear contraction mapping theorem, Dynamic programming, Bellman equation

JEL Classification Numbers: C61, D90.

1. Introduction

This paper is devoted to recursive utilities with a new measure of impatience given exogenously by a real-valued function. Our theory is developed for a vast class of decision processes under uncertainty and includes various stochastic growth models as specific examples. To explain the basic idea behind it, we first refer to a simple case. Let us consider a standard deterministic growth model with some fixed production technology and a bounded instantaneous utility (in other words, subutility) function u. If c := {c_t} is a feasible consumption sequence, then the standard time-additive and separable utility is given by

U(c) := Σ_{t=1}^{∞} β^{t−1} u(c_t), (1)

where β ∈ (0, 1) is a fixed discount factor. It is well known that the preferences over the space of consumption sequences represented by the utility (1) must satisfy rather restrictive conditions; see for example Debreu (1959) and Theorem 2 in Epstein (1983). Therefore, Koopmans (1960) and Koopmans et al. (1964) proposed a more general approach to the construction of recursive utilities via the so-called aggregator. Such an aggregator, roughly speaking, is a function G(a, r) of two real variables. Then, a recursive utility U∗ is a unique solution to the following equation:

U∗(c_t, c_{t+1}, ...) = G(u(c_t), U∗(c_{t+1}, c_{t+2}, ...)) (2)

for any t and c = {c_τ}. This equation indicates that the utility enjoyed from period t on depends on current consumption c_t and the aggregate utility U∗(c_{t+1}, c_{t+2}, ...) from period t + 1 on. Clearly, U in (1) can be obtained by applying the aggregator G(a, r) = a + βr. In the literature, equation (2) is referred to as Koopmans' equation. One of the commonly used conditions imposed on G is the following:

|G(a, r_1) − G(a, r_2)| ≤ β|r_1 − r_2| (3)

for each a, r_1, r_2 and some β ∈ (0, 1) (see Boyer (1975); Lucas and Stokey (1984); Stokey et al. (1989)³). It ensures (by the Banach contraction principle) the existence of a unique bounded solution to equation (2). Recursive utilities derived in this way need not possess the time-additivity and separability properties; see Beals and Koopmans (1969); Boyer (1975); Epstein and Hynes (1983); Iwai (1972); Lucas and Stokey (1984); Uzawa (1968) and Le Van and Vailakis (2005). For instance, the utilities introduced by Uzawa (1968), and further developed by Epstein (1983, 1987) and Epstein and Hynes (1983), inherit neither of these features and, moreover, can be obtained under a condition similar to (3); see Le Van and Vailakis (2005); Rincón-Zapatero and Rodrigues-Palmero (2007). Following Boyd III (2006), one can call β in (3) the time perspective factor, since it gives us our first measure of impatience. It coincides with a discount factor when the recursive utility is time additive and separable, e.g., as in (1). Generally, it is not a discount factor. As discussed in Boyer (1975), Koopmans et al. (1964), or Backus et al. (2008), discounting can be either implicit or given endogenously and, in contrast to the time-additive case, may depend on the consumption path. For a broad discussion of recursive utilities (including the unbounded case) the reader is referred to Boyd III (1990); Becker and Boyd III (1997); Le Van and Vailakis (2005) and Rincón-Zapatero and Rodrigues-Palmero (2007). Moreover, a detailed review of time preferences, recursive utilities and endogenous (variable) discounting, together with some implications for economics, can be found in Becker and Mulligan (1997); Gong (2006) and references therein.

The key idea of this paper is to introduce a variable impatience factor. Namely, we propose to replace the constant β in (3) by an increasing real-valued function δ satisfying some natural conditions such as δ(0) = 0 and δ(r) < r for all r > 0. Moreover, we shall assume that

|G(a, r_1) − G(a, r_2)| ≤ δ(|r_1 − r_2|) (4)

holds for all a and r_1, r_2. Obviously, our study is devoted to more involved cases, and therefore condition (4) can take a slightly more complicated form. Note that if δ(r) = βr, β ∈ (0, 1), inequality (4) reduces to (3). An extension of the Banach contraction principle given by Matkowski (1975) (see also Dugundji and Granas (2003)) enables us to construct a much larger class of recursive utilities. Even in the simple case with aggregator G(a, r) = a + δ(r), we obtain a new utility

U_δ∗(c) = u(c_1) + δ(u(c_2) + δ(u(c_3) + · · · )), c = {c_t}. (5)

³ A similar contraction assumption for stochastic dynamic programming was used by Denardo (1967); Bertsekas (1977) and Porteus (1982).

Clearly, (5) is not separable. It reduces to (1) when δ(r) = βr for all r. But if δ is merely piecewise linear, that is, δ(r) = β_1 r for r ≥ 0 and δ(r) = β_2 r for r < 0 for some constants β_1, β_2 ∈ (0, 1) with β_1 ≠ β_2, then we deal with a non-separable case. In this paper, we are concerned with bounded aggregators, but we allow for uncertainty. The lack of separability, even in the simplest example, leads us to non-stationary stochastic dynamic programming; see Hinderer (1970); Kreps (1977a,b); Kertz and Nachman (1979); Schäl (1975, 1981). This happens when we construct recursive utilities on the space of sample paths of the decision process and then consider their expected values (the von Neumann-Morgenstern concept). We would like to emphasize that our approach can be applied to a number of stochastic growth models, and therefore also embraces those studied by Brock and Mirman (1972) and Epstein (1983). Unfortunately, within such a framework we cannot expect to obtain a stationary or Markov optimal policy. However, in Section 4, we prove the existence of persistently optimal policies and characterize them by a system of Bellman optimality equations. Related results were obtained by Dubins and Savage (1976) (in gambling theory), Kertz and Nachman (1979); Schäl (1975, 1981) (in stochastic dynamic programming), and by Nowak (1986) (in dynamic games).

On the other hand, Kreps and Porteus (1978) suggested a new approach to uncertainty in multiperiod decision processes. Their ideas were further developed and applied to study various issues in macroeconomics by Epstein and Zin (1989) and Weil (1990), where aggregators were assumed to be functions of current utility and some measure (called a certainty equivalent) of random future utilities. A concise description of these ideas can be found in Backus et al. (2008). Further results in this direction, some generalizations and applications to stochastic optimization can be found in Ozaki and Streufert (1996) and Streufert (1996). Our Section 5 resembles to some extent the approach taken in Epstein and Zin (1989), but it is mainly inspired by the works of Denardo (1967); Porteus (1982) and Ozaki and Streufert (1996). Namely, in Section 5, we study preferences on the space of general policies of an agent. Since we are interested in constructing general (history-dependent) policies, our construction of recursive utilities is based on an infinite system of Koopmans' equations. In general, these recursive utilities are "non-expected" as in Epstein and Zin (1989). However, if the impatience is represented by a linear function δ, our approach coincides with that of von Neumann-Morgenstern; see Denardo (1967). Furthermore, we would like to point out that in the simple cases (with the aggregators introduced by Uzawa (1968) and Epstein and Hynes (1983) or with the aggregator a + δ(r)⁴), we are able to obtain a generalized form of the Bellman equation, which can be used to construct stationary optimal policies. For linear functions δ, our results reduce to those of Blackwell (1965); Bertsekas (1977); Bertsekas and Shreve (1978) (in discounted dynamic programming) and Epstein (1983) (in the theory of optimal economic growth).

⁴ In Section 5, we modify the definition of an aggregator to take into account randomness of future utilities and non-stationarity.
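To make the role of δ concrete, the following minimal numerical sketch evaluates a truncated version of (5) by backward recursion. The subutility u and the consumption stream are hypothetical, chosen only for illustration; the truncation at zero mimics the fixed-point iteration used later in Theorem 1.

```python
# A minimal numerical sketch of the utility (5): u(c1) + delta(u(c2) + delta(u(c3) + ...)).
# The subutility u and the consumption stream are hypothetical, for illustration only.
import math

def u(c):
    return math.log(c)                          # negative for c < 1, so the kink of delta matters

def delta_linear(r, beta=0.9):
    return beta * r                             # with a linear delta, (5) collapses to (1)

def delta_piecewise(r, beta1=0.9, beta2=0.8):
    return beta1 * r if r >= 0 else beta2 * r   # the piecewise-linear case; beta2 <= beta1

def utility(cs, delta):
    value = 0.0                                 # truncate the tail at 0
    for c in reversed(cs):                      # backward recursion through a + delta(r)
        value = u(c) + delta(value)
    return value

stream = [1.5, 0.5, 2.0] * 30                   # a long, truncated consumption path
print(utility(stream, delta_linear))            # time-additive and separable benchmark
print(utility(stream, delta_piecewise))         # non-separable: negative tails discounted more
```

The two numbers differ precisely because the piecewise-linear δ treats positive and negative continuation utilities asymmetrically, which is the non-separability discussed above.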

2. Preliminaries

We start with some preliminaries. By R we denote the set of all real numbers. Let S, A be non-empty Borel (subsets of complete separable metric) spaces. Assume that ∆ is a Borel subset of S × A such that

A(s) := {c ∈ A : (s, c) ∈ ∆}

is non-empty and compact. Then, it is well known that there exists a Borel mapping ϕ : S → A such that ϕ(s) ∈ A(s) for each s ∈ S; see Corollary 1 in Brown and Purves (1973). A set-valued mapping s → A(s) (induced by the set ∆) is called upper semicontinuous if {s ∈ S : A(s) ∩ K ≠ ∅} is closed for each closed set K ⊂ A.

Lemma 1: (a) Let g : ∆ → R be a Borel measurable function such that c → g(s, c) is upper semicontinuous on A(s) for each s ∈ S. Then,

g∗(s) := max_{c∈A(s)} g(s, c)

is measurable and there exists a measurable mapping f∗ : S → A such that

f∗(s) ∈ arg max_{c∈A(s)} g(s, c)

for all s ∈ S. (b) If, in addition, we assume that s → A(s) is upper semicontinuous and g is upper semicontinuous on ∆, then g∗ is also upper semicontinuous.

Part (a) follows from Corollary 1 in Brown and Purves (1973), and (b) is a corollary to Berge's maximum theorem; see pages 115-116 in Berge (1963).

Let δ : R → R be a fixed function. By δ^(n) we denote the nth composition of δ with itself. In the sequel, we shall use the following assumptions.

Conditions (D):
(D1) δ is continuous, non-decreasing, and for any r > 0, δ^(n)(r) → 0 as n → ∞.
(D2) |δ(s) − δ(r)| ≤ δ(|s − r|) for any r ∈ R and s < 0.

The main results in this paper are based on the following extension of the Contraction Mapping Principle given by Matkowski (1975), which is also stated as Theorem 5.2 in Dugundji and Granas (2003).

Proposition 1: Let (Y, d) be a complete metric space and δ a function satisfying (D1). If an operator T : Y → Y satisfies the inequality

d(T y_1, T y_2) ≤ δ(d(y_1, y_2)) for any y_1, y_2 ∈ Y,

then T has a unique fixed point y∗ ∈ Y, and

lim_{k→∞} T^(k) y = y∗

for each y ∈ Y, where T^(k) is the kth composition of T with itself.

As in Matkowski (1975), we note that (D1) implies that δ(r) < r for each r > 0. Indeed, suppose on the contrary that δ(r_0) ≥ r_0 for some r_0 > 0. Then, by the monotonicity of δ, we have δ^(n)(r_0) ≥ r_0 for each n. Hence, δ^(n)(r_0) does not converge to zero, which contradicts (D1). Moreover, (D1) implies that δ(0) = 0.

Example 1: Consider a piecewise linear function δ defined as

δ(r) := β_1 r for r ≥ 0, and δ(r) := β_2 r for r < 0.

Observe that δ satisfies (D2) if and only if β_2 ≤ β_1. When δ is a discounting function, this case means that the discount rate is higher for negative utilities. If we restrict our attention to non-negative utilities, then condition (D2) can be dropped. Another example of a non-linear δ satisfying (D1) is

δ(r) := βr + (1 − β) log(1 + r), r ≥ 0,

where β is some constant from the interval (0, 1).
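Condition (D1) can be probed numerically. The sketch below (with illustrative parameter values) iterates both functions from Example 1 and checks that δ(r) < r and that the iterates δ^(n)(r) tend to zero.

```python
# A numerical probe (not a proof) of Condition (D1) for the two examples above:
# delta(r) < r for r > 0, and the self-compositions delta^(n)(r) tend to zero.
import math

def delta_pw(r, b1=0.9, b2=0.8):
    return b1 * r if r >= 0 else b2 * r              # piecewise-linear delta of Example 1

def delta_log(r, beta=0.9):
    return beta * r + (1 - beta) * math.log(1 + r)   # the non-linear example, r >= 0

for delta in (delta_pw, delta_log):
    r = 10.0
    assert delta(r) < r                              # the property implied by (D1)
    for _ in range(2000):                            # compute delta^(2000)(10)
        r = delta(r)
    print(delta.__name__, r)

# delta_pw contracts geometrically; delta_log has slope 1 at the origin (so it is not a
# Banach contraction modulus), and its iterates decay only polynomially, yet (D1) holds.
```

The logarithmic example illustrates why Proposition 1 is a genuine extension of the Banach principle: there is no uniform contraction coefficient β < 1 near the origin, yet the fixed-point iteration still converges.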

3. The dynamic programming model

In this section, we examine a dynamic programming model with a non-separable utility. Our approach is inspired by the works on non-stationary dynamic programming (Hinderer (1970); Kertz and Nachman (1979); Schäl (1975, 1981)), and therefore also includes the models studied by Blackwell (1965) and Brock and Mirman (1972). It is worth emphasizing that non-separability of utilities naturally arises in the economic study of various dynamic choice problems. Let us mention here the papers of Hicks (1965); Wan (1971) that deal with certain issues on consumption and production over time, or the papers of Grandmont (1977); Jordan (1976, 1977); Stigum (1972) that concern resource allocation models or equilibrium problems in market economies.

Let Y be a metric space, and let B(Y) stand for the space of all bounded real-valued Borel measurable functions on Y. By ‖ · ‖ we shall denote the supremum norm on B(Y). A discrete-time decision process is specified by the following objects:

(i) X is the state space and is assumed to be a Borel space.
(ii) A is the action space and is assumed to be a Borel space.
(iii) D is a non-empty Borel subset of X × A. We assume that for each x ∈ X, the non-empty x-section A(x) := {c ∈ A : (x, c) ∈ D} of D represents the set of actions available in state x. (In the context of growth theory, c ∈ A(x) will often be referred to as a feasible consumption when the stock is in state x ∈ X.) We presume that A(x) is compact for each x ∈ X.
(iv) q is a transition probability from D to X.

Let Ĥ_1 = X and

Ĥ_{n+1} = (X × A) × · · · × (X × A) × X (with n factors of X × A), Ĥ = (X × A) × (X × A) × · · · .

Then, Ĥ_n is the set of histories up to the nth period, and Ĥ is the set of infinite histories. We assume that Ĥ_n and Ĥ are equipped with the product Borel σ-algebras. Let x_k and c_k describe the state and action at period k. By h_n = (x_1, c_1, . . . , x_n) and h = (x_1, c_1, . . .) we denote the elements of Ĥ_n and Ĥ, respectively. Let H_n and H be the sets of feasible histories h_n and h, respectively, where each c_k ∈ A(x_k). Clearly, H_n (H) is a Borel subset of Ĥ_n (Ĥ).

(v) U ∈ B(H) is a utility function.

A policy⁵ π = {p_n} is defined as a sequence of Borel measurable mappings p_n : H_n → A such that p_n(h_n) ∈ A(x_n), n ≥ 1, h_n ∈ H_n. We write Π to denote the set of all policies. Let F be the set of all Borel measurable functions f : X → A such that f(x) ∈ A(x), x ∈ X. A Markov policy is a sequence {f_n}, where f_n ∈ F, n ≥ 1. A stationary policy is a Markov policy {f_n} with f_n = f for each n ≥ 1. The set of all Markov (stationary) policies is denoted by Π_M (Π_S). Since A(x) is compact, we note that F is non-empty (see Corollary 1 in Brown and Purves (1973)). Consequently, Π_S ≠ ∅, Π_M ≠ ∅ and Π ≠ ∅.

⁵ For convenience, we restrict our attention to non-randomized policies. We wish to emphasize that no improvements of the results can be obtained by allowing for randomized ones.

For n ≥ 1 we define H^n as the space of future feasible histories of the process from period n onwards. An element of H^n is denoted by h^n = (c_n, x_{n+1}, c_{n+1}, x_{n+2}, ...). According to the Ionescu-Tulcea theorem (Proposition V.1.1 in Neveu (1965)), for any policy π ∈ Π, there exists a unique conditional probability measure P^π(·|x) on H^1 given an initial state x_1 = x. The expectation operator corresponding to this measure is E^π_x. For any initial state x_1 = x and π ∈ Π, we define the expected utility as

V(x, π) := E^π_x U = ∫_{H^1} U(x, h^1) P^π(dh^1|x). (6)

Let H^0_m = D × · · · × D (m factors) for m ≥ 1. An element of H^0_m is denoted by h^0_m = (x_1, c_1, · · · , x_m, c_m). Clearly, every function w ∈ B(H^0_m) can be regarded as a function from the space B(H). In this paper, we make the following convergence assumption.

Condition (C): There is a bounded sequence of functions U_n ∈ B(H^0_n) such that ‖U_n − U‖ → 0 as n → ∞.

In the classical example of an optimal economic growth model, with X = [0, x̄] being the set of possible capital stocks and A(x) = [0, x], the function U_n is of the form

U_n(h^0_n) = Σ_{t=1}^{n} β^{t−1} u(c_t). (7)

Here, u is a bounded subutility function, β ∈ (0, 1) and c_t ∈ A(x_t) is the consumption in state x_t. Clearly, the sequence {U_n} is convergent. Condition (C) also holds for the utilities introduced by Uzawa (1968) and further studied by Epstein (1983). In this case,

U_n(h^0_n) = Σ_{t=1}^{n} u(c_t) exp(−Σ_{τ=1}^{t} v(c_τ)), (8)

where u < 0 and v > 0 are bounded continuous functions.

Let π = {p_n} ∈ Π be any policy. If w is a function of a ∈ A, then p_k w(h_k) := w(p_k(h_k)), h_k ∈ H_k, for any k ≥ 1. If w ∈ B(H_{k+1}), then we define

qw(h^0_k) := ∫_X w(h^0_k, x_{k+1}) q(dx_{k+1}|x_k, c_k)

for k ≥ 1. Note that if w′ ∈ B(H^0_{k+1}), then

qp_{k+1} w′(h^0_k) := ∫_X w′(h^0_k, x_{k+1}, p_{k+1}(h_{k+1})) q(dx_{k+1}|x_k, c_k).

According to the Ionescu-Tulcea theorem (Proposition V.1.1 in Neveu (1965)), for any policy π ∈ Π and m ≥ 1, there exists a unique conditional probability measure P^π(·|h_m) on H^m given h_m ∈ H_m. By E^π_{h_m} we denote the expectation operator corresponding to the measure P^π(·|h_m). If n > m and w ∈ B(H^0_n), then

E^π_{h_m} w := ∫_{H^m} w(h_m, h^m) P^π(dh^m|h_m) = p_m q p_{m+1} · · · q p_n w(h_m).

For a given history h_m ∈ H_m and π ∈ Π, we define the expected utility from period m onwards as

V^m(h_m, π) := E^π_{h_m} U. (9)

We note that V^1 = V in (6). By Condition (C) and the dominated convergence theorem, we infer that

E^π_{h_m} U = lim_{n→∞} E^π_{h_m} U_n = lim_{n→∞} p_m q p_{m+1} . . . q p_n U_n(h_m). (10)

For each m ≥ 1 we define

P(h_m) := {ν : ν = P^π(·|h_m) for some π ∈ Π}

as the set of attainable probability measures on the future given a history h_m ∈ H_m.

Definition 1: A policy π∗ ∈ Π is called persistently optimal if

E^{π∗}_{h_m} U = V_m(h_m) := sup_{ν∈P(h_m)} ∫_{H^m} U(h_m, h^m) ν(dh^m)

for h_m ∈ H_m and each m ≥ 1.

We shall need the following two sets of assumptions, which will be used alternatively.

Conditions (S):
(S1) For each x ∈ X and every Borel set X̃ ⊂ X, the function q(X̃|x, ·) is continuous on A(x).
(S2) Every function U_n from Condition (C) is upper semicontinuous in actions for any fixed sequence of states.

Conditions (W):
(W0) The set-valued mapping x → A(x) is upper semicontinuous.
(W1) The transition law q is weakly continuous, that is,

∫_X φ(y) q(dy|x, c)

is a continuous function of (x, c) ∈ D for each bounded continuous function φ.
(W2) Every function U_n from Condition (C) is upper semicontinuous on H^0_n.

Note that Conditions (W) also allow us to consider deterministic optimal growth models with a continuous production function, as in Stokey et al. (1989).

Proposition 2: Assume (C) and either (S) or (W). Then, there exists a persistently optimal policy.

Remark 1: A persistently optimal policy π∗ = {π∗_k} is (as in Kertz and Nachman (1979); Schäl (1981)) shown to satisfy the following system of Bellman equations:

V_m(h_m) = ∫_X V_{m+1}(h_{m+1}) q(dx_{m+1}|x_m, π∗_m(h_m)) = max_{c∈A(x_m)} ∫_X V_{m+1}(h_{m+1}) q(dx_{m+1}|x_m, c)

for all m and h_m ∈ H_m. The proof under Conditions (C) and (S) is provided in the Appendix. It can also easily be adapted to the models with Conditions (C) and (W). It is worth emphasizing that the existence of persistently optimal policies under (W) was already shown by Kertz and Nachman (1979); see Theorem 5.2. However, their proof proceeds along different lines than ours, since they do not assume (C). In consequence, their route to existence is more involved. On the other hand, the result under Conditions (C) and (S) has not been clearly presented so far, and only some remarks were given by Schäl (1981).

Persistently optimal policies for non-stationary decision processes are counterparts of stationary optimal policies in stationary dynamic programming. This concept was extensively used in gambling theory; see Dubins and Savage (1976). The dynamic programming models considered in this section were also studied by Hinderer (1970); Kertz and Nachman (1979); Schäl (1975, 1981) and extended to stochastic games by Nowak (1986). In addition, such models with general utilities were examined by Kreps (1977a,b), but from a different point of view. Finally, we would like to mention that Furukawa and Iwamoto (1973, 1976) dealt with similar issues as well, but their works are devoted to other types of recursive utilities with no economic interpretation.

4. Recursive utility on the space of sample paths and its expected value

In this section, we first construct a recursive utility on the space of all trajectories of the decision process with the aid of some aggregator G. From this point of view, our approach is similar to the methods used for deterministic models. The second part is devoted solely to the von Neumann-Morgenstern expected utilities. As we have already mentioned in the introduction, if δ is not linear, then even the simplest aggregator leads to non-separable utilities. Within such a framework, persistent optimality arises as a natural concept. We accept the following assumptions:

Condition (A0): G : D × R → R is a Borel measurable function such that G(x, c, r) is bounded for every (x, c) ∈ D and r belonging to any bounded set of real numbers.

Condition (A1): The function c → G(x, c, r) is upper semicontinuous on A(x) for x ∈ X, r ∈ R.

Condition (A2): The function (x, c) → G(x, c, r) is upper semicontinuous on D for r ∈ R.

Condition (L): For any r_1, r_2 ∈ R and (x, c) ∈ D, it holds that

|G(x, c, r_1) − G(x, c, r_2)| ≤ δ(|r_1 − r_2|),

where δ is a function satisfying Condition (D1).

Definition 2: A function U∗ ∈ B(H) is called a recursive utility with aggregator G if, for every sequence {(x_k, c_k)} of pairs in D and any n ≥ 1, it holds that

U∗(x_n, c_n, x_{n+1}, c_{n+1}, . . .) = G(x_n, c_n, U∗(x_{n+1}, c_{n+1}, . . .)).

Let h = (y_1, b_1, y_2, b_2, ...) ∈ H. The shift operator s : H → H is s(h) := (y_2, b_2, y_3, b_3, ...). Let T_G : B(H) → B(H) be defined as

(T_G u)(h) := G(y_1, b_1, u(s(h))). (11)

By Condition (L), for every u_1, u_2 ∈ B(H), we have

|G(y_1, b_1, u_1(s(h))) − G(y_1, b_1, u_2(s(h)))| ≤ δ(‖u_1 − u_2‖).

Hence, ‖T_G u_1 − T_G u_2‖ ≤ δ(‖u_1 − u_2‖). Since B(H) (endowed with the supremum norm) is a Banach space, we infer from Proposition 1 that there is a unique U∗ ∈ B(H) such that

U∗(h) = G(y_1, b_1, U∗(s(h))) (12)

for each h = (y_1, b_1, y_2, b_2, ...) ∈ H. Now, defining y_k = x_{n+k−1}, b_k = c_{n+k−1} for k ≥ 1, we conclude from (12) the following result.

Theorem 1: Under Conditions (A0) and (L), there exists a unique recursive utility U∗ ∈ B(H) associated with the aggregator G. Moreover, U∗ = lim_{k→∞} T_G^(k) u_0 in the supremum metric, where u_0 ≡ 0.

Remark 2: We notice that (T_G^(k) u_0)(h) depends only on the first 2k coordinates of h = (x_1, c_1, x_2, c_2, ...) ∈ H. For example, (T_G^(3) u_0)(h) = G(x_1, c_1, G(x_2, c_2, G(x_3, c_3, 0))). Thus, U∗ satisfies Condition (C).

Let C(H) be the space of all bounded upper semicontinuous functions on H, and C_A(H) the space of all bounded functions on H which are upper semicontinuous in actions for a fixed sequence of states. Clearly, both C(H) and C_A(H) are closed subsets of the Banach space B(H).

Theorem 2: Assume (A0) and (L). If G is non-decreasing in r ∈ R, and (A1) [(A2)] is satisfied, then there exists a unique recursive utility U∗ ∈ C_A(H) [U∗ ∈ C(H)] that satisfies Condition (C).

Proof: Assume first (A2). Let (x_n, c_n) → (x_0, c_0) in D and r_n → r_0 as n → ∞. Note that

G(x_n, c_n, r_n) ≤ G(x_n, c_n, r_0) + |G(x_n, c_n, r_n) − G(x_n, c_n, r_0)| ≤ G(x_n, c_n, r_0) + δ(|r_n − r_0|).

Hence, G is upper semicontinuous. Using this fact and our assumption that G is non-decreasing in the last variable r ∈ R, it is easy to see that T_G u ∈ C(H) for every u ∈ C(H). By Proposition 1, T_G has a unique fixed point U∗ ∈ C(H) that satisfies Condition (C); see Remark 2. The proof under (A1) is analogous. □

The von Neumann-Morgenstern utility and persistently optimal policies

If U∗ is the unique recursive utility (constructed above) in B(H), C(H) or C_A(H), then one can consider its expected value

V∗(x, π) := E^π_x U∗ = ∫_{H^1} U∗(x, h^1) P^π(dh^1|x) (13)

for any policy π ∈ Π.

Theorem 3: Assume (A0) and (L). Let G be non-decreasing in r ∈ R. If additionally (A1) and (S1) [(A2) and (W0, W1)] are satisfied, then there exists a persistently optimal policy.

Proof: Assume first (A1) and (S1). From Theorem 2 it follows that there exists a unique recursive utility U∗ ∈ C_A(H) that satisfies Condition (C) (see Remark 2). Furthermore, we notice that the assumptions of Proposition 2 also hold. Hence, in this model there exists a persistently optimal policy. Clearly, similar arguments yield the conclusion for the model satisfying Conditions (A2) and (W0, W1). □

We now give two examples of aggregators, which are certain modifications of the ones widely used in the literature. We shall assume throughout that δ is a function satisfying Conditions (D1) and (D2).

Example 2: Assume that u ∈ B(D) is upper semicontinuous. Define

G(x, c, r) := u(x, c) + δ(r)

and note that by (D2), Condition (L) holds. Moreover, (A0) is also satisfied and G is non-decreasing in r ∈ R. From Theorem 2, it follows that there exists a unique recursive utility U∗ ∈ C(H). By Theorem 1, U∗ can be obtained as the limit (in the supremum metric) of the sequence

U∗_k(h) := (T_G^(k) u_0)(h) = u(x_1, c_1) + δ[u(x_2, c_2) + δ[u(x_3, c_3) + ... + δ[u(x_k, c_k)]...]]

as k → ∞, where u_0 ≡ 0. Observe that this case also contains models with the utility given in (7). If in addition we assume (W0, W1), then Theorem 3 guarantees the existence of a persistently optimal policy.

Example 3: In this example we consider a modified Uzawa aggregator (see Uzawa (1968); Epstein and Hynes (1983)). Assume that u ∈ B(D) is upper semicontinuous and negative and that b ∈ B(D) is continuous and non-negative. Put

G(x, c, r) := e^{−b(x,c)} (u(x, c) + δ(r)).

Clearly, since (D2) holds and 0 < e^{−b(x,c)} ≤ 1 for (x, c) ∈ D, one can easily see that Condition (L) is satisfied. Observe also that G is non-decreasing in r ∈ R. By Theorem 2, there exists a unique recursive utility U∗ ∈ C(H). Moreover, U∗ = lim_{k→∞} T_G^(k) u_0, u_0 ≡ 0. It is difficult to write a general formula for T_G^(k) u_0. Therefore, we only give an exact form for T_G^(3) u_0:

(T_G^(3) u_0)(h) = e^{−b(x_1,c_1)} (u(x_1, c_1) + δ[(T_G^(2) u_0)(s(h))])
= e^{−b(x_1,c_1)} (u(x_1, c_1) + δ[e^{−b(x_2,c_2)} (u(x_2, c_2) + δ[e^{−b(x_3,c_3)} u(x_3, c_3)])]).

If we additionally assume (W0, W1), then in this model there exists a persistently optimal policy. The exact formulae for the functions U∗_k that correspond to the utilities studied by Uzawa (1968) and Epstein (1983) are given in (8).
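As a numerical complement, the sketch below computes the iterates U∗_k = T_G^(k) u_0 from Remark 2 for both aggregators along one fixed path. All primitives (u, b and the path) are hypothetical, chosen only so that u < 0 and b ≥ 0 as Example 3 requires; by Theorem 1 the iterates settle down as k grows.

```python
# A sketch of the iterates U*_k = T_G^(k) u0 from Examples 2 and 3 along one fixed
# feasible path h = ((x1, c1), (x2, c2), ...). All primitives here are hypothetical.
import math

def delta(r):
    return 0.9 * r if r >= 0 else 0.8 * r       # satisfies (D1) and (D2)

def u(x, c):
    return math.log(c / x)                      # negative whenever c < x (illustrative)

def b(x, c):
    return 0.1 * c                              # non-negative, as required in Example 3

def iterate(path, k, G):
    value = 0.0                                 # u0 = 0
    for (x, c) in reversed(path[:k]):           # T_G^(k) u0 uses only the first k pairs
        value = G(x, c, value)
    return value

G2 = lambda x, c, r: u(x, c) + delta(r)                          # Example 2
G3 = lambda x, c, r: math.exp(-b(x, c)) * (u(x, c) + delta(r))   # Example 3

path = [(1.0, 0.5), (2.0, 1.0), (1.5, 0.7)] * 40
for k in (5, 20, 120):
    print(k, iterate(path, k, G2), iterate(path, k, G3))         # iterates stabilize in k
```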

5. Recursive utility on the space of policies

Kreps and Porteus (1978) developed an axiomatic theory of utilities in decision making under uncertainty which is different from the von Neumann-Morgenstern approach. Their ideas were applied by Epstein and Zin (1989) and Weil (1990), who extended the notion of an aggregator function for recursive models in economics by taking into account "certainty equivalents" of the future random utilities. The recursive utilities which they obtain are "non-expected" ones. In this section, we present a construction of recursive utilities on the space of policies of an agent. We use aggregators of a general form that include non-stationary policies and sequences of future random utilities. To some extent our approach leads to similar "non-expected" recursive utilities as in Epstein and Zin (1989) and Ozaki and Streufert (1996). However, there are essential differences. Using non-stationary policies, we have to take into account partial histories of the decision process as well as certain sequences of random utilities. Therefore, the Koopmans equation is replaced by a system of equations. On the other hand, such an approach is also advantageous. Namely, within a fairly simple framework, we are able to derive a single Bellman equation and obtain a stationary optimal policy. In the analogous case studied in Section 4, we were able to construct only persistently optimal policies via a system of infinitely many Bellman equations.

General case

Let us fix a policy π = {p_n} ∈ Π. By B we denote the space of sequences v := {v_n} with v_n ∈ B(H_n) which are bounded in the norm ‖ · ‖_*, that is,

‖v‖_* := sup_{n≥1} sup_{h_n∈H_n} |v_n(h_n)| < ∞.

B is a Banach space. Let Π_n be the space of all Borel measurable mappings p_n : H_n → A. Define

P_n := {p_n ∈ Π_n : p_n(h_n) ∈ A(x_n)} for n ≥ 1,

and put

K := ∪_{n=1}^{∞} H_n × P_n × B(H_{n+1}).

The supremum norm in each B(H_n) is denoted by ‖ · ‖. The idea of aggregation⁶ is represented by a function W : K → R, called the aggregator, that satisfies

Conditions (Z):
(Z1) For any elements v̂ = {v̂_n}, ṽ = {ṽ_n} of B, p_n ∈ P_n, and h_n ∈ H_n, it holds that

|W(h_n, p_n, v̂_{n+1}) − W(h_n, p_n, ṽ_{n+1})| ≤ δ(‖v̂_{n+1} − ṽ_{n+1}‖),

where δ is a function satisfying Condition (D1).
(Z2) {W(h_n, p_n, v^0_{n+1})} ∈ B when v^0_{n+1} ≡ 0, for each n ≥ 1, p_n ∈ P_n, and h_n ∈ H_n.

⁶ A similar general idea of aggregation is also studied in Ozaki and Streufert (1996) and Streufert (1996). The difference that should be pointed out is that we are interested in solving Koopmans' equations using a generalized version of the Banach contraction principle. This leads us to non-separable utilities even in the case of very simple aggregators such as a + δ(r). However, these utilities have a nice recursive structure. Ozaki and Streufert (1996) and Streufert (1996) prefer to study recursive preferences under some convergence assumptions, as in positive or negative dynamic programming; see Bertsekas and Shreve (1978).

Using a policy π, the agent generates a history h_n of the process up to state x_n. The expression W(h_n, p_n, v_{n+1}) describes the utility enjoyed by the agent when a "decision function" p_n (the nth component of π) is used at h_n and the future utility from period n + 1 onwards is represented by a random variable v_{n+1} ∈ B(H_{n+1}). Let us consider, for instance, the classical discounted stochastic model with a constant discount factor β ∈ (0, 1) and subutility u ∈ B(D). Then the aggregator W is defined as

W(h_n, p_n, v_{n+1}) = u(x_n, p_n(h_n)) + β ∫_X v_{n+1}(h_n, p_n(h_n), x_{n+1}) q(dx_{n+1}|x_n, p_n(h_n)),

for all n ≥ 1.

Definition 3: We say that a sequence {v∗_n} ∈ B represents recursive utilities with aggregator W if it verifies the following system of Koopmans equations:

v∗_n(h_n) = W(h_n, p_n, v∗_{n+1}).

For an aggregator W, define an operator T_W on B by setting T_W v := {T_1 v, T_2 v, . . .}, where (T_n v)(h_n) := W(h_n, p_n, v_{n+1}) for every n ≥ 1, h_n ∈ H_n. Note that recursive utilities will be represented by a fixed point v∗ of T_W.

Theorem 4: Assume (Z1) and (Z2). Then, there exists a unique sequence v∗ = {v∗_n} ∈ B representing recursive utilities with aggregator W, and lim_{k→∞} ‖v∗ − T_W^(k) v_0‖_* = 0 for any v_0 = {v_01, v_02, ...} ∈ B.

Proof: We observe that for any v ∈ B,

‖T_W v‖_* = sup_{n≥1} sup_{h_n∈H_n} |(T_n v)(h_n)| < ∞.

Indeed, by (Z1) and (D1) we obtain for every n ≥ 1 that

|W(h_n, p_n, v_{n+1}) − W(h_n, p_n, v^0_{n+1})| ≤ δ(‖v_{n+1}‖) ≤ δ(‖v‖_*).

Thus, from (Z2) we easily get that {W(·, p_n, v_{n+1})} ∈ B. Hence, T_W maps the space B into itself. Moreover, T_W is contractive. To see this, let v̂ = {v̂_n}, ṽ = {ṽ_n} be any elements of B. Then, by (Z1) and (D1) we infer that

‖T_W v̂ − T_W ṽ‖_* = sup_{n≥1} sup_{h_n∈H_n} |(T_n v̂)(h_n) − (T_n ṽ)(h_n)|
= sup_{n≥1} sup_{h_n∈H_n} |W(h_n, p_n, v̂_{n+1}) − W(h_n, p_n, ṽ_{n+1})|
≤ sup_{n≥1} sup_{h_n∈H_n} δ(‖v̂_{n+1} − ṽ_{n+1}‖) ≤ δ(‖v̂ − ṽ‖_*).

Now the conclusion easily follows from Proposition 1. □

The Markovian case

When we are concerned with the construction of a recursive utility for π ∈ Π_M, the situation becomes simpler. An aggregator may then be defined with the aid of a measurable function Γ : D × R → R, a decision rule f ∈ F, and the transition probability q in the following way:

W∗(x, f, v) := ∫_X Γ(x, f(x), v(y)) q(dy|x, f(x)), v ∈ B(X). (14)

Conditions (Z1) and (Z2) can be replaced by:

Conditions (Z*):
(Z1*) For any (x, c) ∈ D and r_1, r_2 ∈ R,

|Γ(x, c, r_1) − Γ(x, c, r_2)| ≤ δ(|r_1 − r_2|),

with δ satisfying Conditions (D).
(Z2*) There exists k_0 > 0 such that for any (x, c) ∈ D, v ∈ B(X) and y ∈ X, it holds that |Γ(x, c, v(y))| ≤ k_0 + ‖v‖.

Let B_0 be the space of sequences v = {v_k} ∈ B such that each v_k is a function of x_k ∈ X. Thus, v_k ∈ B(X) ⊂ B(H_k). Clearly, B_0 is a complete subspace of the Banach space B endowed with the norm ‖ · ‖_*.

Definition 4: Let π = {f_m} ∈ Π_M. A sequence v∗ = {v∗_k} ∈ B_0 represents recursive utilities associated with aggregator W∗ if

v∗_m(x_m) = W∗(x_m, f_m, v∗_{m+1})

for all x_m ∈ X, m ≥ 1.

The operator T_{W∗} is then defined on B_0 by the formula T_{W∗} v = {T_n v}, where (T_n v)(x_n) = W∗(x_n, f_n, v_{n+1}) and v = {v_k}. From Theorem 4 we deduce the following result.

Theorem 5: Let (Z1*) and (Z2*) be satisfied. Then there exists a unique sequence v∗ = {v∗_k} representing recursive utilities associated with aggregator W∗. Moreover,

v∗_1(x_1) = lim_{l→∞} (T_1^(l) v_0)(x_1) = lim_{l→∞} W∗(x_1, f_1, W∗(·, f_2, ..., W∗(·, f_l, φ)...)),

where v_0 = {φ, φ, . . .} ∈ B_0.

Two simple aggregators

Example 4: Let u ∈ B(D) and let δ be a function satisfying Conditions (D1) and (D2). The aggregator W is defined as follows:

W(h_n, p_n, v_{n+1}) = u(x_n, p_n(h_n)) + ∫_X δ[v_{n+1}(h_n, p_n(h_n), x_{n+1})] q(dx_{n+1}|x_n, p_n(h_n)). (15)

From Condition (D2), it follows that W satisfies (Z1), and since u ∈ B(D) we conclude that (Z2) also holds. Hence, by Theorem 4 there exists a unique sequence v∗ = {v∗_n} ∈ B representing recursive utilities. By Proposition 1, we know that

v∗ = lim_{k→∞} T_W^(k) v_0 = lim_{k→∞} T_W^(k) v^0 (16)

(in the norm ‖ · ‖_*) for any v_0 ∈ B and v^0 ≡ {0, 0, ...}. Keeping in mind that x_k is the last coordinate of h_k for each k, we can write the first three iterations of T_W for the nth coordinate:

(T_n v^0)(h_n) = u(x_n, p_n(h_n)),

(T_n^(2) v^0)(h_n) = u(x_n, p_n(h_n)) + ∫_X δ[(T_{n+1} v^0)(h_n, p_n(h_n), x_{n+1})] q(dx_{n+1}|x_n, p_n(h_n))
= u(x_n, p_n(h_n)) + ∫_X δ[u(x_{n+1}, p_{n+1}(h_{n+1}))] q(dx_{n+1}|x_n, p_n(h_n)),

and

(T_n^(3) v^0)(h_n) = u(x_n, p_n(h_n)) + ∫_X δ[(T_{n+1}^(2) v^0)(h_n, p_n(h_n), x_{n+1})] q(dx_{n+1}|x_n, p_n(h_n))
= u(x_n, p_n(h_n)) + ∫_X δ[u(x_{n+1}, p_{n+1}(h_{n+1})) + ∫_X δ[u(x_{n+2}, p_{n+2}(h_{n+2}))] q(dx_{n+2}|x_{n+1}, p_{n+1}(h_{n+1}))] q(dx_{n+1}|x_n, p_n(h_n)).

Generally, it is difficult to write a formula for T_n^(l) v^0 for an arbitrarily large positive integer l. The situation becomes simpler if we are concerned with Markov policies. For any f ∈ F and a function φ ∈ B(X), put

(Q_f φ)(x) := ∫_X φ(y) q(dy|x, f(x)). (17)

Fix a Markov policy π = {f_k}. For k ≥ 1, define u(f_k) as a function of x_k ∈ X in the following way:

u(f_k)(x_k) := u(x_k, f_k(x_k)). (18)

For any h_1 = x_1 and v_0 = {φ, φ, ...}, φ ∈ B(X), we set (T_1^(l) φ)(x_1) := (T_1^(l) v_0)(x_1). Then, we obtain the following formula:

(T_1^(l) φ)(x_1) = u(f_1)(x_1) + Q_{f_1} δ[u(f_2) + Q_{f_2} δ[u(f_3) + . . . + Q_{f_{l−1}} δ[u(f_l) + Q_{f_l} δ[φ]] . . .]](x_1). (19)

For instance, for l = 3 we have

(T_1^(3) φ)(x_1) = u(f_1)(x_1) + Q_{f_1} δ[u(f_2) + Q_{f_2} δ[u(f_3) + Q_{f_3} δ[φ]]](x_1).

By Theorem 4, the operator T_W associated with the function W defined in (15) and the Markov policy π = {f_k} has a unique fixed point v∗ = {v∗_1, v∗_2, ...}. Clearly, v∗_1 is the recursive utility when the process starts at state x_1 ∈ X and the policy π = {f_k} is employed. As a corollary to (16), we have that

v∗_1(x_1) = lim_{l→∞} (T_1^(l) φ)(x_1) (20)

for any φ ∈ B(X).

Example 5: Assume that u, b ∈ B(D) with b ≥ 0 and u < 0 (compare Example 3). Let δ be a function satisfying Conditions (D1) and (D2), and let π = {p_k} ∈ Π. Define W as follows:

W(h_n, p_n, v_{n+1}) := e^{−b(x_n,p_n(h_n))} [u(x_n, p_n(h_n)) + ∫_X δ[v_{n+1}(h_n, p_n(h_n), x_{n+1})] q(dx_{n+1}|x_n, p_n(h_n))].

Clearly, the imposed assumptions imply that W satisfies Conditions (Z). By Theorem 4, there exists a unique sequence v_* = {v_{*n}} ∈ B representing recursive utilities. Moreover, v_* is the limit of T_W^(l) v_0 as l → ∞, for any v_0 ∈ B. It is difficult to give a formula for T_W^(l) v_0 when a general policy is employed. Therefore, we confine ourselves to the set of Markov policies. Let π = {f_k} ∈ Π_M and v_0 = {φ, φ, ...}, φ ∈ B(X). Making use of the notation introduced in (17) and (18), we can express (T_1^(l) φ)(x_1) := (T_1^(l) v_0)(x_1) as follows:

(T_1^(l) φ)(x_1) = e^{−b(f_1)(x_1)} (u(f_1)(x_1) + Q_{f_1} δ[e^{−b(f_2)} (u(f_2) + Q_{f_2} δ[e^{−b(f_3)} (u(f_3) + . . . + δ[e^{−b(f_l)} (u(f_l) + Q_{f_l} δ[φ])] . . .)])])(x_1). (21)

Similarly to Example 4, we conclude that

v_{*1}(x_1) = lim_{l→∞} (T_1^(l) φ)(x_1) (22)

for any φ ∈ B(X).
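On a finite state space the recursion behind (19) is easy to carry out. The sketch below, with a hypothetical two-state transition matrix and rewards, iterates φ ← u(f) + Q_f δ[φ] for a stationary decision rule f and recovers the recursive utility v∗_1 of (20).

```python
# A finite-chain sketch of (19)-(20) under a stationary Markov policy f.
# The two-state transition matrix and the rewards are hypothetical.
import numpy as np

def delta(r):
    return np.where(r >= 0, 0.9 * r, 0.8 * r)   # coordinatewise; satisfies (D1)-(D2)

Q = np.array([[0.7, 0.3],                       # Q[x, y] = q(y | x, f(x))
              [0.4, 0.6]])
u_f = np.array([1.0, -0.5])                     # u(f)(x) = u(x, f(x)) as in (18)

phi = np.zeros(2)                               # any phi in B(X) may start the iteration
for _ in range(500):
    phi = u_f + Q @ delta(phi)                  # phi <- u(f) + Q_f delta[phi]
print(phi)                                      # approximates v*_1 at each initial state
```

The same loop with the bracket multiplied by a factor exp(−b_f), for a non-negative vector b_f, reproduces the iteration (21) of Example 5.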

The Bellman equation and stationary optimal policies

The fixed point v∗ ∈ B of T_W was obtained in Theorem 4 for a given policy π ∈ Π. Therefore, we can write v∗_n(h_n) = v∗_n(π)(h_n) (h_n ∈ H_n) to emphasize this dependence. The sequence {v∗_n(π)} captures the recursive structure of utilities depending on a general policy π. However, if we wish to deal with an optimization problem, we are only interested in v∗_1(π)(h_1), with h_1 = x_1. Define

Φ(x_1, π) := v∗_1(π)(x_1).

We shall show (under some regularity conditions) that in both models considered in Examples 4 and 5, the agent has a stationary optimal policy. Related results for linear functions δ were obtained by Blackwell (1965); Brock and Mirman (1972); Epstein (1983).

Let us consider the model from Example 4. Assume that u ∈ B(D) and δ is a function satisfying Conditions (D1) and (D2). Define the dynamic programming operator M by

(M w)(x) := sup_{c∈A(x)} [u(x, c) + ∫_X δ[w(y)] q(dy|x, c)], w ∈ B(X).

Our aim is to show the existence of a solution w∗ to the Bellman equation

w∗(x) = (M w∗)(x) = sup_{c∈A(x)} [u(x, c) + ∫_X δ[w∗(y)] q(dy|x, c)] (23)

for all x ∈ X. We shall impose either of the following two sets of assumptions.

Conditions (S*): (S*1) = (S1), and
(S*2) The function u(x, ·) is upper semicontinuous for each x ∈ X.

Conditions (W*): (W*0) = (W0), (W*1) = (W1), and
(W*2) The function u is upper semicontinuous on D.

Let C(X) stand for the set of bounded upper semicontinuous functions on X, a closed subset of the Banach space B(X).

Theorem 6: Consider the model from Example 4 and assume (S*) [(W*)].
(a) There exists a unique solution w∗ ∈ B(X) [w∗ ∈ C(X)] to equation (23), and w∗ = lim_{l→∞} M^(l) w^0 in the supremum metric, with w^0 ≡ 0.
(b) The supremum in (23) can be replaced by the maximum. Moreover, there exists f∗ ∈ F such that

w∗(x) = u(x, f∗(x)) + ∫_X δ[w∗(y)] q(dy|x, f∗(x)) =: (M_{f∗} w∗)(x) (24)

for each x ∈ X. The stationary policy π∗ := {f∗, f∗, . . .} ∈ Π_S is optimal within the class of all policies, and w∗ = lim_{l→∞} M_{f∗}^(l) w^0. For a proof see the Appendix.
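For a finite model, the value iteration w ← M w of Theorem 6(a) and the maximizer selection in (24) can be sketched as follows; the state and action spaces, rewards and transition law below are randomly generated toys, not part of the theory.

```python
# A finite-model sketch of value iteration for the Bellman equation (23) and of the
# stationary optimal policy in (24). All primitives are randomly generated toys.
import numpy as np

nX, nA = 3, 2
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=(nX, nA))        # u(x, c)
q = rng.dirichlet(np.ones(nX), size=(nX, nA))    # q[x, c, :] = q(. | x, c)

def delta(r):
    return np.where(r >= 0, 0.9 * r, 0.8 * r)    # piecewise-linear, (D1)-(D2) hold

w = np.zeros(nX)                                 # w0 = 0, as in Theorem 6(a)
for _ in range(500):
    w = np.max(u + q @ delta(w), axis=1)         # one application of the operator M
f_star = np.argmax(u + q @ delta(w), axis=1)     # a maximizer, i.e. f* in (24)
print(w, f_star)                                 # approximate fixed point w* and policy
```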

Let us now turn to the model from Example 5. For the recursive utility v_* ∈ B obtained in Example 5, we define

Φ̄(x_1, π) := v_{*1}(π)(x_1)

for any policy π ∈ Π and an initial state x_1 ∈ X. We shall assume that b, u ∈ B(D), b ≥ 0, u < 0, and δ is a function satisfying Conditions (D1) and (D2). Define the dynamic programming operator M̄ by

(M̄ w)(x) := sup_{c∈A(x)} [e^{−b(x,c)} (u(x, c) + ∫_X δ[w(y)] q(dy|x, c))], w ∈ B(X).

Our objective is to prove the existence of a solution w_* to the Bellman equation

w_*(x) = (M̄ w_*)(x) = sup_{c∈A(x)} [e^{−b(x,c)} (u(x, c) + ∫_X δ[w_*(y)] q(dy|x, c))] (25)

for all x ∈ X. We shall impose either of the following two sets of assumptions.

Conditions (S̄*): (S̄*1) = (S1), and
(S̄*2) The function u(x, ·) is upper semicontinuous, whilst the function b(x, ·) is continuous, for each x ∈ X.

Conditions (W̄*): (W̄*0) = (W0), (W̄*1) = (W1), and
(W̄*2) The function u is upper semicontinuous, whilst b is continuous, on D.

The proof of our next result is given in the Appendix.

Theorem 7: Consider the model from Example 5 and assume (S̄*) [(W̄*)].
(a) There exists a unique solution w_* ∈ B(X) [w_* ∈ C(X)] to equation (25), and w_* = lim_{l→∞} M̄^(l) w^0 in the supremum metric, with w^0 ≡ 0.
(b) The supremum in (25) can be replaced by the maximum. Moreover, there exists f_* ∈ F such that

w_*(x) = e^{−b(x,f_*(x))} (u(x, f_*(x)) + ∫_X δ[w_*(y)] q(dy|x, f_*(x))) =: (M̄_{f_*} w_*)(x) (26)

for each x ∈ X. The stationary policy π_* := {f_*, f_*, . . .} ∈ Π_S is optimal within the class of all policies, and w_* = lim_{l→∞} M̄_{f_*}^(l) w^0.

The results given in Theorems 6 and 7 can be generalized to decision processes involving more general aggregators. To show the main idea of this extension, we restrict attention to the Markovian case with the aggregator W∗ in (14). Then, the recursive utilities corresponding to Markov policies are described by Theorem 5. The dynamic programming operator is defined as follows:

(M̂ w)(x) := sup_{c∈A(x)} ∫_X Γ(x, c, w(y)) q(dy|x, c), w ∈ B(X).

A function ŵ is a solution to the Bellman equation if

ŵ(x) = (M̂ ŵ)(x) = sup_{c∈A(x)} ∫_X Γ(x, c, ŵ(y)) q(dy|x, c) (27)

for all x ∈ X. We accept the following assumptions:

Conditions (Ŝ*): (Ŝ*1) = (S1), and
(Ŝ*2) The function Γ(x, ·, r) is upper semicontinuous for each x ∈ X, r ∈ R, and Γ(x, c, ·) is non-decreasing.

Conditions (Ŵ*): (Ŵ*0) = (W0), (Ŵ*1) = (W1), and
(Ŵ*2) The function Γ is upper semicontinuous on D and Γ(x, c, ·) is non-decreasing.

We then have the following result.

Theorem 8: Assume (Ŝ*) [(Ŵ*)] and (Z*).
(a) There exists a unique solution ŵ ∈ B(X) [ŵ ∈ C(X)] to equation (27), and ŵ = lim_{l→∞} M̂^(l) w^0 in the supremum metric, with w^0 ≡ 0.
(b) The supremum in (27) can be replaced by the maximum. Moreover, there exists f̂ ∈ F such that

ŵ(x) = W∗(x, f̂, ŵ) = ∫_X Γ(x, f̂(x), ŵ(y)) q(dy|x, f̂(x)) =: (M̂_{f̂} ŵ)(x) (28)

for each x ∈ X. The stationary policy π̂ := {f̂, f̂, . . .} ∈ Π_S is optimal within the class of all Markov policies, and ŵ = lim_{l→∞} M̂_{f̂}^(l) w^0.

The proof is given in Appendix. An extension of this result to the class of general policies is quite obvious, although the notation is more complex. Remark 3: We believe that aggregators W ∗ given in (14) (where non-linear function δ is under the intergral sign) can be useful for studying dynamic stochastic games. On the hand, one can replace W ∗ by W ∗∗ of the form Z W ∗∗ (x, f, v) := Γ(x, c, v(y)q(dy|x, f (x))),

v ∈ B(X).

(29)

X

For instance, Γ can be defined as follows Z Z Γ(x, c, v(y)q(dy|x, f (x))) := u(x, c) + δ[ v(y)q(dy|x, f (x))], X

u ∈ B(D),

v ∈ B(X).

X

This approach is more related to Epstein and Zin (1989) (see also Backus et al. (2008)) or Ozaki and Streufert (1996) in the sense that a “certainty equivalent” of the random variable v (expected value of v) is a substitute for r in Γ. However, we would like to point out that no discussion on any contraction function δ is included in Epstein and Zin (1989) or Ozaki and Streufert (1996), which is a new component in our approach. All results in this section concerning a solution to the Bellman equation can be conveyed to the case with the aggregator W ∗∗ .
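The order of aggregation matters whenever δ is non-linear. The toy computation below, with a hypothetical two-point distribution, contrasts the W∗-style expression u + E[δ(v)] of (14) with the W∗∗-style expression u + δ(E[v]) of (29); the two coincide exactly when δ is linear.

```python
# A toy contrast between the two aggregation orders discussed in Remark 3:
# delta inside the expectation (as in (14)) versus delta of the expectation (as in (29)).
# The numbers are hypothetical; the gap is a Jensen-type effect for non-linear delta.
import math

def delta(r):
    return 0.9 * r + 0.1 * math.log(1 + r)   # strictly concave on r >= 0

v = [0.0, 10.0]                               # future utility v(y) on two states
p = [0.5, 0.5]                                # q(. | x, f(x))
u_xc = 1.0                                    # current subutility u(x, c)

inside  = u_xc + sum(pi * delta(vi) for pi, vi in zip(p, v))   # W*-style
outside = u_xc + delta(sum(pi * vi for pi, vi in zip(p, v)))   # W**-style
print(inside, outside)                        # inside < outside, since delta is concave
```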


Appendix

Proof of Proposition 2: The proof makes use of some arguments given in Nowak (1986). Assume first (S). Let h_m ∈ H_m. For any n ≥ m, consider a finite-horizon decision problem from period m till period n with the utility function U_n introduced in Condition (C). A policy in this model is a sequence {p_m, . . . , p_n}, with p_k defined in Section 3. By backward induction, making use of Lemma 1(a) and Condition (S), we conclude that there exists an optimal policy {p^o_m, . . . , p^o_n}, i.e.,

v^n_m(h_m) := sup_{ν∈P(h_m)} ∫_{H^m} U_n(h_m, h^m) ν(dh^m) = p^o_m q p^o_{m+1} . . . q p^o_n U_n(h_m).

This implies that v^n_m is a Borel measurable function of h_m. We claim that {v^n_m(h_m)} converges (as n → ∞) to

V_m(h_m) = sup_{ν∈P(h_m)} ∫_{H^m} U(h_m, h^m) ν(dh^m) (30)

uniformly in h_m ∈ H_m. This fact is a consequence of Condition (C), since

| sup_{ν∈P(h_m)} ∫_{H^m} U_n(h_m, h^m) ν(dh^m) − sup_{ν∈P(h_m)} ∫_{H^m} U(h_m, h^m) ν(dh^m) |
≤ sup_{ν∈P(h_m)} ∫_{H^m} |U_n(h_m, h^m) − U(h_m, h^m)| ν(dh^m)
≤ sup_{ν∈P(h_m)} ∫_{H^m} ‖U_n − U‖ ν(dh^m) = ‖U_n − U‖ → 0

as n → ∞. Hence, V_m is Borel measurable. Using the uniform convergence shown above, one can easily see that

V_m(h_m) = max_{c∈A(x_m)} ∫_X V_{m+1}(h_{m+1}) q(dx_{m+1}|x_m, c) (31)

for any m. (Recall that x_k is the last component of h_k, k = m, m + 1.) By Lemma 1(a), for any m there exists a Borel measurable function π∗_m of h_m ∈ H_m such that the maximum in (31) is attained at the point π∗_m(h_m). Thus, we have

V_m(h_m) = π∗_m q V_{m+1}(h_m) = ∫_X V_{m+1}(h_{m+1}) q(dx_{m+1}|x_m, π∗_m(h_m)), h_m ∈ H_m. (32)

Iterating (32), we obtain for any m and k > m that

V_m(h_m) = π∗_m q π∗_{m+1} q · · · π∗_k q V_{k+1}(h_m). (33)

Let π∗ := {π∗_m} with π∗_m defined by (32) for each m. We show that

lim_{k→∞} π∗_m q π∗_{m+1} q · · · π∗_k q V_{k+1}(h_m) = E^{π∗}_{h_m} U. (34)

By the triangle inequality, we have that

|π∗_m q π∗_{m+1} q · · · π∗_k q V_{k+1}(h_m) − E^{π∗}_{h_m} U|
≤ |π∗_m q π∗_{m+1} q · · · π∗_k q V_{k+1}(h_m) − π∗_m q π∗_{m+1} q · · · π∗_k U_k(h_m)|
+ |π∗_m q π∗_{m+1} q · · · π∗_k U_k(h_m) − E^{π∗}_{h_m} U|. (35)

By Condition (C), the second term in (35) tends to 0 uniformly on H_m. Now let us consider the first term in (35) and note that

|π∗_m q π∗_{m+1} q · · · π∗_k q V_{k+1}(h_m) − π∗_m q π∗_{m+1} q · · · π∗_k U_k(h_m)|
≤ π∗_m q π∗_{m+1} q · · · π∗_{k−1} q |π∗_k q V_{k+1} − π∗_k U_k|(h_m). (36)

Using (30) and (36), we get

|π∗_k q V_{k+1}(h_k) − π∗_k U_k(h_k)|
= |∫_X sup_{ν∈P(h_{k+1})} ∫_{H^{k+1}} U(h_{k+1}, h^{k+1}) ν(dh^{k+1}) q(dx_{k+1}|x_k, π∗_k(h_k)) − U_k(h_k, π∗_k(h_k))|
≤ ∫_X sup_{ν∈P(h_{k+1})} ∫_{H^{k+1}} |U(h_{k+1}, h^{k+1}) − U_k(h_k, π∗_k(h_k))| ν(dh^{k+1}) q(dx_{k+1}|x_k, π∗_k(h_k))
≤ ‖U − U_k‖ → 0.

The former inequality in the above expression is due to the fact that U_k depends neither on x_{k+1} nor on h^{k+1}. From (30), (33) and (34), it follows that π∗ is persistently optimal. The proof under Conditions (W) makes use of Lemma 1(b) and proceeds analogously to the case of Conditions (S). In fact, the existence of a persistently optimal policy under (W) was established by Kertz and Nachman (1979) (see Theorem 3.4). However, their proof is more complicated than ours, since Kertz and Nachman (1979) do not assume Condition (C). □

Proof of Theorem 6: If we assume (S*), then from Lemma 1 it follows that M : B(X) → B(X). Similarly, Conditions (W*) and Lemma 1 imply that M maps the complete metric space C(X) into itself. Moreover, from (D1) and (D2) we conclude that M is contractive. These facts together with Proposition 1 imply part (a). The replacement of 'sup' with 'max' is justified by Lemma 1. Moreover, Lemma 1 yields the existence of f∗ ∈ F such that (24) holds. Now we turn to proving that π∗ := {f∗, f∗, . . .} ∈ Π_S is an optimal policy. By iteration of (24), we get that w∗(x_1) = (M_{f∗}^(l) w∗)(x_1), that is,

w∗(x_1) = u(x_1, f∗(x_1)) + ∫_X δ[u(x_2, f∗(x_2)) + ∫_X δ[u(x_3, f∗(x_3)) + . . . + ∫_X δ[w∗(x_{l+1})] q(dx_{l+1}|x_l, f∗(x_l)) . . .] q(dx_3|x_2, f∗(x_2))] q(dx_2|x_1, f∗(x_1)). (37)

The right-hand side of (37) equals the right-hand side of formula (19) with f_k = f∗ for k ≥ 1 and φ = w∗. This observation together with (20) implies that the right-hand side of (37) converges to Φ(x_1, π∗) as l → ∞, where π∗ = {f∗, f∗, ...}. Hence, w∗(x_1) = Φ(x_1, π∗). On the other hand, note that

w∗(x) ≥ u(x, c) + ∫_X δ[w∗(y)] q(dy|x, c)

for any (x, c) ∈ D. Let π = {f_k} ∈ Π_M. Clearly,

w∗(x_n) ≥ u(x_n, f_n(x_n)) + ∫_X δ[w∗(x_{n+1})] q(dx_{n+1}|x_n, f_n(x_n))

for all x_n ∈ X, f_n, n ≥ 1. Iterating this inequality and making use of the fact that δ is non-decreasing, we get

w∗(x_1) ≥ u(x_1, f_1(x_1)) + ∫_X δ[u(x_2, f_2(x_2)) + ∫_X δ[u(x_3, f_3(x_3)) + . . . + ∫_X δ[w∗(x_{l+1})] q(dx_{l+1}|x_l, f_l(x_l)) . . .] q(dx_3|x_2, f_2(x_2))] q(dx_2|x_1, f_1(x_1)). (38)

Note that the right-hand side of (38) equals the right-hand side of (19) when the Markov policy π = {f_k} is used and φ = w∗. This fact and (20) imply that the right-hand side of (38) converges to Φ(x_1, π) as l → ∞. Thus, we have shown that Φ(x_1, π∗) ≥ Φ(x_1, π) for any π ∈ Π_M. The proof that π∗ = {f∗, f∗, ...} is optimal within the class of all policies proceeds along similar lines, but the notation within such a framework is more complex. Therefore, we only show the basic idea employing the set of Markov policies. The fact that w∗ = lim_{l→∞} M_{f∗}^(l) w∗ = lim_{l→∞} M_{f∗}^(l) w^0 follows from the proof of Proposition 1. □

¯ : B(X) 7→ B(X). On is upper semicontinuous for each x ∈ X. Hence, by Lemma 1 it follows that M ¯ the other hand, assuming Conditions (W*), taking w ∈ C(X) and arguing in a similar way, we can ¯ : C(X) 7→ C(X). Moreover, from (D1), (D2) and assumptions imposed on the functions b, u, prove M ¯ is contractive. These facts together with Proposition 1 imply part (a). we conclude that M The replacement of ’sup’ with ’max’ is justified by Lemma 1. Moreover, Lemma 1 yields the existence of f∗ ∈ F such that (26) holds. Now we show that the {f∗ } ∈ ΠS is an optimal policy. By ¯ (l) w∗ )(x1 ), that is, iteration of (26), we get that w∗ (x1 ) = (M Z −b(x1 ,f∗ (x1 ))

w∗ (x1 ) = e Z

ZX X

(u(x1 , f∗ (x1 )) +

X

δ[e−b(x2 ,f∗ (x2 ) )(u(x2 , f∗ (x2 )) +

(39)

δ[e−b(x3 ,f∗ (x3 )) (u(x3 , f∗ (x3 )) + . . . + δ[w∗ (xl+1 )]q(dxl+1 |xl , f∗ (xl )) . . .)]q(dx3 |x2 , f∗ (x2 )))]q(dx2 |x1 , f∗ (x1 ))).

The right-hand side of (39) equals to the right-hand side of formula (21) with fk = f∗ for k ≥ 1, and φ = w∗ . This observation together with (22) imply that the right-hand side of (39) converges to ¯ 1 , π∗ ) as l → ∞ and π∗ = {f∗ , f∗ , ...}. Hence, w∗ (x1 ) = Φ(x ¯ 1 , π∗ ). On the other hand, for any Φ(x Markov policy π = {fk } we have that Z w∗ (xn ) ≥ e−b(xn ,fn (xn )) (u(xn , fn (xn )) + δ[w∗ (xn+1 )]q(dxn+1 |xn , fn (xn ))) X

for all n ≥ 1, xn ∈ X. Iterating this inequality and making use of the fact that δ is non-decreasing, we obtain that

20

Z −b(x1 ,f1 (x1 ))

w∗ (x1 ) = e Z

X

(u(x1 , f1 (x1 )) +

X

δ[e−b(x2 ,f2 (x2 ) )(u(x2 , f2 (x2 )) +

(40)

δ[e−b(x3 ,f3 (x3 )) (u(x3 , f3 (x3 )) + . . . +

Z

X

δ[w∗ (xl+1 )]q(dxl+1 |xl , fl (xl )) . . .)]q(dx3 |x2 , f2 (x2 )))]q(dx2 |x1 , f1 (x1 ))).

We again observe that the right-hand side of (40) equals to the right-hand side of (21), when the Markov policy π = {fk } is used and φ = w∗ . This fact and (22) imply that the right-hand side of (40) ¯ 1 , π), as l → ∞. Hence, goes to Φ(x ¯ 1 , π∗ ) ≥ Φ(x ¯ 1 , π) Φ(x for any π ∈ ΠM . The proof that π∗ = {f∗ , f∗ , ...} is optimal within the class of all policies proceeds along ¯ (l) w∗ = liml→∞ M ¯ (l) w0 follows from Proposition similar lines. Obviously, the equality w∗ = liml→∞ M 1. ¤ ˆ ∗ ). Since Γ is non-decreasing in the last variable, it is easy Proof of Theorem 8: Assume (W to prove that Γ is upper semicontinuous (see the proof of Theorem 2). Then by Proposition 7.31 in ˆ w ∈ C(X) for any w ∈ C(X). Moreover, from Bertsekas and Shreve (1978) and Lemma 1(b), M (Z*) we conclude that M is contractive. These facts together with Proposition 1 imply part (a). The replacement of ’sup’ with ’max’ is justified by Lemma 1(b). Moreover, Lemma 1 yields the existence of fˆ ∈ F such that (28) holds. The fact that π ˆ = {fˆ, fˆ, ...} is optimal within the class of all Markov policies follows Theorem 5 and iteration procedure. We wish to emphasize that iteration pocedure of inequalities is strongly based on our assumption that Γ is nondecreasing in the last argument. The ˆ∗ ) and (Z*) is similar and makes use of Lemma 1(a). ¤ proof under Conditions (S

References

Backus, D.K., Routledge, B.R., Zin, S.E. (2008) Recursive preferences. In: The New Palgrave Dictionary of Economics Online (2nd edition, Durlauf, S.N., Blume, L.E., eds.), Palgrave Macmillan.
Beals, R., Koopmans, T.C. (1969) Maximizing stationary utility in a constant technology. SIAM Journal on Applied Mathematics 17: 1001-1014.
Becker, G.S., Mulligan, C.B. (1997) The endogenous determination of time preference. Quarterly Journal of Economics 112: 729-753.
Becker, R.A., Boyd III, J.H. (1997) Capital Theory, Equilibrium Analysis and Recursive Utility. Blackwell Publishers, New York.
Berge, C. (1963) Topological Spaces. MacMillan, New York.
Bertsekas, D.P. (1977) Monotone mappings with application in dynamic programming. SIAM Journal on Control and Optimization 15: 438-464.
Bertsekas, D.P., Shreve, S.E. (1978) Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.
Blackwell, D. (1965) Discounted dynamic programming. Annals of Mathematical Statistics 36: 226-235.
Boyd III, J.H. (1990) Recursive utility and the Ramsey problem. Journal of Economic Theory 50: 326-345.
Boyd III, J.H. (2006) Discrete-time recursive utility. In: Handbook of Optimal Growth 1, Discrete Time (Dana, R.A., Le Van, C., Mitra, T., Nishimura, K., eds.), Chapter 9, Springer, New York.
Boyer, M. (1975) An optimal growth model with stationary non-additive utilities. Canadian Journal of Economics 8: 213-237.
Brock, W.A., Mirman, L.J. (1972) Optimal economic growth and uncertainty: the discounted case. Journal of Economic Theory 4: 479-513.
Brown, L.D., Purves, R. (1973) Measurable selections of extrema. Annals of Statistics 1: 902-912.
Debreu, G. (1959) Topological methods in cardinal utility theory. In: Mathematical Methods in the Social Sciences (Arrow, K., Karlin, S., Suppes, P., eds.), Stanford University Press, Stanford.
Denardo, E.V. (1967) Contraction mappings in the theory underlying dynamic programming. SIAM Review 9: 165-177.
Dubins, L.E., Savage, L.J. (1976) Inequalities for Stochastic Processes (How to Gamble if You Must). Dover, New York.
Dugundji, J., Granas, A. (2003) Fixed Point Theory. Springer-Verlag, New York.
Epstein, L. (1983) Stationary cardinal utility and optimal growth under uncertainty. Journal of Economic Theory 31: 133-152.
Epstein, L. (1987) A simple dynamic general equilibrium model. Journal of Economic Theory 41: 68-95.
Epstein, L., Hynes, J.A. (1983) The rate of time preference and dynamic economic analysis. Journal of Political Economy 91: 611-635.
Epstein, L., Zin, S.E. (1989) Substitution, risk aversion, and the temporal behavior of consumption and asset returns: a theoretical framework. Econometrica 57: 937-969.
Furukawa, N., Iwamoto, S. (1973) Markov decision processes with recursive reward functions. Bulletin of Mathematical Statistics 15: 79-91.
Furukawa, N., Iwamoto, S. (1976) Dynamic programming on recursive reward systems. Bulletin of Mathematical Statistics 17: 103-126.
Gong, L. (2006) Endogenous time preference, inflation, and capital accumulation. Journal of Economics 87: 241-255.
Grandmont, J.M. (1977) Temporary general equilibrium theory. Econometrica 45: 535-572.
Hicks, J.R. (1965) Capital and Growth. Oxford University Press, Oxford.
Hinderer, K. (1970) Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. Lecture Notes in Operations Research 33, Springer-Verlag, New York.
Iwai, K. (1972) Optimal economic growth and stationary ordinal utility - a Fisherian approach. Journal of Economic Theory 5: 121-151.
Jordan, J.S. (1976) Temporary competitive equilibria and the existence of self-fulfilling expectations. Journal of Economic Theory 12: 455-471.
Jordan, J.S. (1977) The continuity of optimal dynamic decision rules. Econometrica 45: 1365-1376.
Kertz, R.P., Nachman, D.C. (1979) Persistently optimal plans for nonstationary dynamic programming: the topology of weak convergence case. Annals of Probability 7: 811-826.
Koopmans, T.C. (1960) Stationary ordinal utility and impatience. Econometrica 28: 287-309.
Koopmans, T.C., Diamond, P.A., Williamson, R.E. (1964) Stationary utility and time perspective. Econometrica 32: 82-100.
Kreps, D.M. (1977a) Decision problems with optimality criteria, I: upper and lower convergent utility. Mathematics of Operations Research 2: 45-53.
Kreps, D.M. (1977b) Decision problems with optimality criteria, II: stationarity. Mathematics of Operations Research 2: 266-274.
Kreps, D.M., Porteus, E.L. (1978) Temporal resolution of uncertainty and dynamic choice theory. Econometrica 46: 185-200.
Le Van, C., Vailakis, Y. (2005) Recursive utility and optimal growth with bounded or unbounded returns. Journal of Economic Theory 123: 187-209.
Lucas Jr., R.E., Stokey, N. (1984) Optimal growth with many consumers. Journal of Economic Theory 32: 139-171.
Matkowski, J. (1975) Integral solutions of functional equations. Dissertationes Mathematicae 127: 1-68.
Neveu, J. (1965) Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.
Nowak, A.S. (1986) Semicontinuous nonstationary stochastic games. Journal of Mathematical Analysis and Applications 117: 84-99.
Ozaki, H., Streufert, P.A. (1996) Dynamic programming for non-additive stochastic objectives. Journal of Mathematical Economics 25: 391-442.
Porteus, E. (1982) Conditions for characterizing the structure of optimal strategies in infinite-horizon dynamic programs. Journal of Optimization Theory and Applications 36: 419-431.
Rincón-Zapatero, J.P., Rodrigues-Palmero, C. (2007) Recursive utility with unbounded aggregators. Economic Theory 33: 381-391.
Schäl, M. (1975) On dynamic programming: compactness of the space of policies. Stochastic Processes and their Applications 3: 345-364.
Schäl, M. (1981) Utility functions and optimal policies in sequential decision problems. In: Game Theory and Mathematical Economics (Moeschlin, O., Pallaschke, D., eds.), North-Holland, Amsterdam, 357-365.
Stigum, B.P. (1972) Resource allocation under uncertainty. International Economic Review 13: 431-459.
Stokey, N.L., Lucas Jr., R.E., Prescott, E. (1989) Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, MA.
Streufert, P.A. (1996) Biconvergent stochastic dynamic programming, asymptotic impatience, and 'average' growth. Journal of Economic Dynamics and Control 20: 385-413.
Uzawa, H. (1968) Time preference, the consumption function, and optimum asset holding. In: Value, Capital, and Growth: Papers in Honour of Sir John Hicks (Wolfe, J.N., ed.), Edinburgh University Press, Edinburgh.
Wan, H.Y., Jr. (1971) Economic Growth. Harcourt Brace Jovanovich, New York.
Weil, P. (1990) Nonexpected utility in macroeconomics. Quarterly Journal of Economics 105: 29-42.
