Computing dynamic optimal mechanisms when hidden types are Markov and controlled by hidden actions

Kenichi Fukushima (University of Wisconsin, Madison)
Yuichiro Waki (University of Queensland)

September 25, 2011
This note documents how the main theoretical results in Fukushima and Waki (2011) extend to a richer setting where the agent can influence the evolution of his hidden type $\theta_t$ through a hidden action $y_t$. An example of such a setting is one where $\theta_t$ represents a hidden stock of wealth or human capital and $y_t$ is a hidden investment. The notation follows Fukushima and Waki (2011) unless otherwise indicated.
In each period the agent draws a type $\theta_t \in \Theta$ and sends a report $r_t \in \Theta$ to the planner. The planner chooses an outcome $x_t \in X$ and recommends an action $y_t \in Y$ given the agent's history of reports. The agent then chooses an action $y_t' \in Y$ which may or may not equal $y_t$. We assume $Y$ is a finite set with cardinality $M$. If the agent's current type is $\theta_t$ and he chooses action $y_t$, his next period type $\theta_{t+1}$ is drawn from the density $\pi(\cdot|\theta_t, y_t) > 0$. The initial distribution is $\pi(\cdot|\theta_{-1}, y_{-1})$, where $(\theta_{-1}, y_{-1})$ is publicly known. We let $\mathcal{Y}$ denote the set of function sequences $y = \{y_t\}_{t=0}^{\infty}$, $y_t : \Theta^{t+1} \to Y$ for each $t$, and write
$$\Pr(\theta^t|\theta_{-1}, y_{-1}, y) = \pi(\theta_t|\theta_{t-1}, y_{t-1}(\theta^{t-1})) \times \cdots \times \pi(\theta_1|\theta_0, y_0(\theta_0)) \times \pi(\theta_0|\theta_{-1}, y_{-1}).$$
We also let $y|\theta^{t-1} = \{y_{t+s}(\theta^{t-1}, \cdot)\}_{s=0}^{\infty}$ denote the continuation of $y$ after $\theta^{t-1}$.

An allocation is then a sequence $(x, y) = \{x_t, y_t\}_{t=0}^{\infty}$, where $x_t : \Theta^{t+1} \to X$ and $y_t : \Theta^{t+1} \to Y$ for each $t$. We do not introduce randomizations to keep the notation simple, although doing so is quite straightforward and useful for computations (as it helps obtain convexity).

If allocation $(x, y)$ takes place, that is, if after each shock history $\theta^t$ the outcome $x_t(\theta^t)$ occurs and the agent chooses $y_t(\theta^t)$, the agent obtains lifetime utility:
$$U(x, y; \theta_{-1}, y_{-1}) = \sum_{t=0}^{\infty} \sum_{\theta^t} \beta^t u(x_t(\theta^t), y_t(\theta^t); \theta_t) \Pr(\theta^t|\theta_{-1}, y_{-1}, y)$$
and the planner incurs cost:
$$C(x, y; \theta_{-1}, y_{-1}) = \sum_{t=0}^{\infty} \sum_{\theta^t} q^t c(x_t(\theta^t)) \Pr(\theta^t|\theta_{-1}, y_{-1}, y).$$
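For illustration, the following Python sketch evaluates finite-horizon truncations of $U$ and $C$ for a simple allocation that depends only on the current type, by enumerating histories $\theta^t$ and chaining the one-step densities as in the display above. All primitives here (the sizes, $\pi$, $u$, $c$, and the policies) are hypothetical placeholders, not objects from the paper.

```python
import numpy as np
from itertools import product

# Minimal sketch (all primitives hypothetical): truncated U and C for a
# current-type-only allocation, with Pr(theta^t | theta_{-1}, y_{-1}, y)
# built by chaining the one-step densities pi.
N, M, beta, q, T = 3, 2, 0.9, 0.9, 6
rng = np.random.default_rng(0)
Pi = rng.random((N, M, N))
Pi /= Pi.sum(axis=2, keepdims=True)            # pi(theta' | theta, y) > 0
u = lambda x, y, th: np.sqrt(x) - 0.1 * y + 0.2 * th   # placeholder utility
c = lambda x: x                                        # placeholder cost
x_pol = np.linspace(0.1, 1.0, N)               # x_t(theta^t) = x_pol[theta_t]
y_pol = np.zeros(N, dtype=int)                 # y_t(theta^t) = y_pol[theta_t]
th_m1, y_m1 = 0, 0                             # publicly known (theta_{-1}, y_{-1})

U, C = 0.0, 0.0
for t in range(T):
    for hist in product(range(N), repeat=t + 1):
        pr, (th_prev, y_prev) = 1.0, (th_m1, y_m1)
        for th in hist:                        # chain the one-step densities
            pr *= Pi[th_prev, y_prev, th]
            th_prev, y_prev = th, y_pol[th]
        th_t = hist[-1]
        U += beta**t * u(x_pol[th_t], y_pol[th_t], th_t) * pr
        C += q**t * c(x_pol[th_t]) * pr
```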
An allocation $(x, y)$ is incentive compatible if
$$U(x, y; \theta_{-1}, y_{-1}) \geq U(x \circ r, y'; \theta_{-1}, y_{-1}), \quad \forall (r, y') \in R \times \mathcal{Y} \tag{1}$$
and satisfies promise keeping if
$$U(x, y; \theta_{-1}, y_{-1}) \geq U_0. \tag{2}$$
The planning problem starting from $(\theta_{-1}, y_{-1}, U_0)$ is to minimize $C(x, y; \theta_{-1}, y_{-1})$ by choice of $(x, y)$ subject to incentive compatibility and promise keeping.
We have the following analog of Lemma 1:

Lemma A1. An allocation $(x, y)$ is incentive compatible if and only if
$$u(x_t(\theta^t), y_t(\theta^t); \theta_t) + \beta U_{t+1}(\theta^t; \theta_t, y_t(\theta^t)) \geq u(x_t(\theta^{t-1}, \theta_t'), y_t'; \theta_t) + \beta U_{t+1}(\theta^{t-1}, \theta_t'; \theta_t, y_t') \tag{3}$$
for all $t$, $\theta^{t-1}$, $\theta_t$, $\theta_t'$, and $y_t'$, where
$$U_t(\theta^{t-1}; \theta_-, y_-) = \sum_{s=t}^{\infty} \sum_{\theta_t^s} \beta^{s-t} u(x_s(\theta^{t-1}, \theta_t^s), y_s(\theta^{t-1}, \theta_t^s); \theta_s) \Pr(\theta_t^s|\theta_-, y_-, y|\theta^{t-1}).$$
Proof. The only if part is clear. So let $(x, y)$ satisfy (3) and fix $(r, y') \in R \times \mathcal{Y}$. For each $t$, define $r|^t$ and $y'|^t$ by $(r|_s^t(\theta^s), y'|_s^t(\theta^s)) = (r_s(\theta^s), y_s'(\theta^s))$ for all $s \leq t$ and $\theta^s$, and $(r|_s^t(\theta^s), y'|_s^t(\theta^s)) = (\theta_s, y_s(r^t(\theta^t), \theta_{t+1}^s))$ for all $s \geq t+1$ and $\theta^s$. That is, $(r|^t, y'|^t)$ follows $(r, y')$ until period $t$ and then reverts back to truth-telling and obedience from $t+1$. Applying (3) inductively we have $U(x, y; \theta_{-1}, y_{-1}) \geq U(x \circ r|^0, y'|^0; \theta_{-1}, y_{-1}) \geq U(x \circ r|^1, y'|^1; \theta_{-1}, y_{-1}) \geq \cdots \geq U(x \circ r|^t, y'|^t; \theta_{-1}, y_{-1})$ for any $t$. Since $u$ is bounded and $\beta \in (0, 1)$, this implies:
$$U(x, y; \theta_{-1}, y_{-1}) \geq \lim_{t \to \infty} U(x \circ r|^t, y'|^t; \theta_{-1}, y_{-1}) = U(x \circ r, y'; \theta_{-1}, y_{-1}).$$
Hence $(x, y)$ is incentive compatible. ∎
Notice here that the continuation utility profile $U_t(\theta^{t-1}; \cdot, \cdot)$ is a function of $(\theta_-, y_-) \in \Theta \times Y$, so a recursive formulation in the spirit of Fernandes and Phelan (2000) has $N \times M$ continuous state variables.

We say that $\pi$ has an order $K$ mixture representation if we can write:
$$\pi(\theta|\theta_-, y_-) = \sum_{k=1}^{K} p_k(\theta) w_k(\theta_-, y_-),$$
where $p : \Theta \to \mathbb{R}_+^K$ and $w : \Theta \times Y \to \mathbb{R}_+^K$ satisfy $\sum_{\theta \in \Theta} p_k(\theta) = 1$ for each $k$ and $\sum_{k=1}^{K} w_k(\theta_-, y_-) = 1$ for each $(\theta_-, y_-)$. Under this representation, we can define for each $t$ and $\theta^{t-1}$:
$$a_t(\theta^{t-1}) = \sum_{\theta_t} \Bigl( u(x_t(\theta^t), y_t(\theta^t); \theta_t) + \beta \sum_{s=t+1}^{\infty} \sum_{\theta_{t+1}^s} \beta^{s-t-1} u(x_s(\theta^s), y_s(\theta^s); \theta_s) \Pr(\theta_{t+1}^s|\theta_t, y_t(\theta^t), y|\theta^t) \Bigr) p(\theta_t) \tag{4}$$
and write
$$U_t(\theta^{t-1}; \cdot, \cdot) = \sum_{k=1}^{K} a_t^k(\theta^{t-1}) w_k(\cdot, \cdot).$$
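As a quick numerical sanity check of the representation, one can encode $p$ and $w$ as arrays and verify that the induced $\pi$ is a strictly positive transition density. The sizes and arrays below are hypothetical:

```python
import numpy as np

# Sketch (hypothetical sizes): encode an order-K mixture representation and
# verify it yields a valid, strictly positive transition density pi.
N, M, K = 4, 2, 3
rng = np.random.default_rng(0)

p = rng.random((K, N)) + 0.1           # p[k, theta]; rows are densities on Theta
p /= p.sum(axis=1, keepdims=True)      # sum_theta p_k(theta) = 1 for each k

w = rng.random((N, M, K)) + 0.1        # w[theta_, y_, k]; mixture weights
w /= w.sum(axis=2, keepdims=True)      # sum_k w_k(theta_, y_) = 1

# pi(theta | theta_, y_) = sum_k p_k(theta) w_k(theta_, y_), as in the text
Pi = np.einsum('abk,kt->abt', w, p)    # Pi[theta_, y_, theta]

assert np.allclose(Pi.sum(axis=2), 1.0)   # each pi(.|theta_, y_) is a density
assert Pi.min() > 0                       # positivity, as assumed above
```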
This suggests that, by using $a_t$ instead of $U_t$ as an endogenous state variable, it should be possible to reduce the dimensionality from $N \times M$ to $K$.

Let us now write $a_t(\theta^{t-1}; x, y)$ to describe the mapping from $(x, y)$ to $a_t(\theta^{t-1})$ defined by (4). Let us also write $a_0(x, y) = a_0(\theta^{-1}; x, y)$, as this is independent of $\theta^{-1}$. We then define the auxiliary planning problem starting from $(\theta_{-1}, y_{-1}, a_0)$ as the problem of choosing $(x, y)$ to minimize $C(x, y; \theta_{-1}, y_{-1})$ subject to incentive compatibility (1) and
$$a_0(x, y) = a_0. \tag{5}$$
We let $A^* \subset V^K$ denote the set of $a_0$'s for which the constraint set of this problem is non-empty (which is independent of $(\theta_{-1}, y_{-1})$) and let $J^* : \Theta \times Y \times A^* \to \mathbb{R}$ denote the optimal value function. If
$$a_0^* \in \arg\min_{a_0 \in A^*} J^*(\theta_{-1}, y_{-1}, a_0) \quad \text{s.t.} \quad a_0 \cdot w(\theta_{-1}, y_{-1}) \geq U_0,$$
then a solution to the auxiliary planning problem starting from $(\theta_{-1}, y_{-1}, a_0^*)$ is a solution to the planning problem starting from $(\theta_{-1}, y_{-1}, U_0)$.
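On a discretized version of the problem, the minimization over $a_0$ subject to promise keeping $a_0 \cdot w(\theta_{-1}, y_{-1}) \geq U_0$ is a direct search. A minimal sketch, assuming a finite grid `Astar` standing in for $A^*$ and an array `Jstar` approximating $J^*$ (for instance from iterating the $T$ operator defined below); both names are hypothetical:

```python
import numpy as np

# Sketch: pick a0* on a finite grid Astar standing in for A*, given an array
# Jstar[theta_, y_, i] approximating J* (e.g., from iterating T, defined below).
def pick_a0(Jstar, Astar, w, th_m1, y_m1, U0):
    best_cost, a0_star = np.inf, None
    for i, a0 in enumerate(Astar):
        # promise keeping: a0 . w(theta_{-1}, y_{-1}) >= U0
        if np.dot(a0, w[th_m1, y_m1]) >= U0 and Jstar[th_m1, y_m1, i] < best_cost:
            best_cost, a0_star = Jstar[th_m1, y_m1, i], a0
    return a0_star
```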
The analog of Lemma 2 is:

Lemma A2. An allocation $(x, y)$ satisfies the constraints of the auxiliary planning problem (1) and (5) if and only if there exists $a = \{a_t\}_{t=0}^{\infty}$, $a_t : \Theta^t \to A^*$, such that $(x, y, a)$ satisfies
$$u(x_t(\theta^t), y_t(\theta^t); \theta_t) + \beta a_{t+1}(\theta^t) \cdot w(\theta_t, y_t(\theta^t)) \geq u(x_t(\theta^{t-1}, \theta_t'), y_t'; \theta_t) + \beta a_{t+1}(\theta^{t-1}, \theta_t') \cdot w(\theta_t, y_t') \tag{6}$$
$$a_t(\theta^{t-1}) = \sum_{\theta_t} \bigl( u(x_t(\theta^t), y_t(\theta^t); \theta_t) + \beta a_{t+1}(\theta^t) \cdot w(\theta_t, y_t(\theta^t)) \bigr) p(\theta_t) \tag{7}$$
for all $t$, $\theta^t$, $\theta_t'$, $y_t'$, and $a_0(\theta^{-1}) = a_0$.

Proof. Virtually identical to that of Lemma 2. ∎
The analog of the $B$ operator therefore maps $A \subset V^K$ into
$$B(A) = \{a \in V^K \mid \exists (x, y, a^+) \in F(a; A)\}$$
where $F(a; A)$ is the set of function triples $(x, y, a^+) : \Theta \to X \times Y \times A$ satisfying:
$$u(x(\theta), y(\theta); \theta) + \beta a^+(\theta) \cdot w(\theta, y(\theta)) \geq u(x(\theta'), y'; \theta) + \beta a^+(\theta') \cdot w(\theta, y'), \quad \forall \theta, \theta', y'$$
$$a = \sum_{\theta} \bigl( u(x(\theta), y(\theta); \theta) + \beta a^+(\theta) \cdot w(\theta, y(\theta)) \bigr) p(\theta).$$
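To make the operator concrete, the following brute-force sketch enumerates all candidate triples $(x, y, a^+)$ on tiny, entirely hypothetical grids (a finite list of $K$-vectors stands in for $A$, a finite grid for $X$, and $u$, $p$, $w$, $\beta$ are placeholders) and collects the generated $a$'s:

```python
import numpy as np
from itertools import product

# Brute-force sketch of one application of B on tiny hypothetical grids.
# A is a finite list of K-vectors; X is a finite outcome grid.
N, M, K, beta = 2, 2, 2, 0.9
X = [0.0, 0.5, 1.0]
rng = np.random.default_rng(1)
p = rng.random((K, N)) + 0.1; p /= p.sum(axis=1, keepdims=True)    # p[k, theta]
w = rng.random((N, M, K)) + 0.1; w /= w.sum(axis=2, keepdims=True) # w[theta, y, k]
u = lambda x, y, th: np.sqrt(x) - 0.1 * y + 0.2 * th               # placeholder u

def B_step(A):
    """One application of B: collect every a generated by some (x, y, a+) in F(a; A)."""
    def cont(ia, th, iy):                 # discounted continuation: beta * a+ . w(theta, y)
        return beta * np.dot(A[ia], w[th, iy])
    menu = list(product(range(len(X)), range(M), range(len(A))))
    out = []
    for plan in product(menu, repeat=N):  # a function Theta -> X x Y x A
        val = [u(X[ix], iy, th) + cont(ia, th, iy)
               for th, (ix, iy, ia) in enumerate(plan)]
        # incentive compatibility: truth-telling and obedience beat any (theta', y')
        ic = all(val[th] >= u(X[plan[tp][0]], yp, th) + cont(plan[tp][2], th, yp)
                 for th in range(N) for tp in range(N) for yp in range(M))
        if ic:
            # a = sum_theta [u(...) + beta a+(theta) . w(theta, y(theta))] p(theta)
            out.append(sum(v * p[:, th] for th, v in enumerate(val)))
    return out
```

In practice one would replace the raw enumeration with a grid or polytope representation of $A$ and prune between iterations, since the number of candidate plans grows quickly with $|A|$.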
At this point it is useful to construct a particular incentive compatible allocation as follows. First pick any $\bar{x} \in X$ and let $W : \Theta \to \mathbb{R}$ solve the Bellman equation:
$$W(\theta) = \max_{y \in Y} \Bigl( u(\bar{x}, y; \theta) + \beta \sum_{\theta_+} W(\theta_+) \pi(\theta_+|\theta, y) \Bigr).$$
For each $\theta$ let $\bar{y}(\theta)$ solve the right hand side problem. Then set $\bar{x}_t(\theta^t) = \bar{x}$ and $\bar{y}_t(\theta^t) = \bar{y}(\theta_t)$ for each $t$ and $\theta^t$. We then have for any $t$ and $\theta^{t-1}$:
$$U_t(\theta^{t-1}; \theta_-, y_-) = \sum_{s=t}^{\infty} \sum_{\theta_t^s} \beta^{s-t} u(\bar{x}, \bar{y}(\theta_s); \theta_s) \Pr(\theta_t^s|\theta_-, y_-, \bar{y}|\theta^{t-1}) = \sum_{\theta} W(\theta) \pi(\theta|\theta_-, y_-).$$
So for each $t$, $\theta^{t-1}$, $\theta_t$, $\theta_t'$, $y_t'$:
$$u(\bar{x}, \bar{y}(\theta_t); \theta_t) + \beta U_{t+1}(\theta^t; \theta_t, \bar{y}(\theta_t)) \geq u(\bar{x}, y_t'; \theta_t) + \beta U_{t+1}(\theta^{t-1}, \theta_t'; \theta_t, y_t').$$
It follows from Lemma A1 that $(\bar{x}, \bar{y})$ is incentive compatible.
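Computing $(\bar{x}, \bar{y})$ is a standard finite-state dynamic program. The sketch below runs value iteration on the same hypothetical primitives as the `B_step` sketch above (same sizes and seed, with $\pi$ built from $(p, w)$):

```python
import numpy as np

# Value-iteration sketch for W(theta) = max_y { u(x_bar, y; theta)
#   + beta * sum_{theta+} W(theta+) pi(theta+ | theta, y) }.
# Same hypothetical primitives (sizes, seed) as the B_step sketch above.
N, M, K, beta = 2, 2, 2, 0.9
rng = np.random.default_rng(1)
p = rng.random((K, N)) + 0.1; p /= p.sum(axis=1, keepdims=True)
w = rng.random((N, M, K)) + 0.1; w /= w.sum(axis=2, keepdims=True)
Pi = np.einsum('abk,kt->abt', w, p)          # pi(theta+ | theta, y) > 0
x_bar = 1.0                                  # a point on the outcome grid
u_bar = np.array([[np.sqrt(x_bar) - 0.1 * y + 0.2 * th for y in range(M)]
                  for th in range(N)])       # u(x_bar, y; theta), placeholder u

W = np.zeros(N)
while True:
    Q = u_bar + beta * (Pi @ W)              # Q[theta, y]
    W_new = Q.max(axis=1)
    if np.max(np.abs(W_new - W)) < 1e-12:
        break
    W = W_new
y_bar = Q.argmax(axis=1)                     # obedient policy y_bar(theta)
```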
We have the following analog of Proposition 3:

Proposition A3. $A^*$ is a non-empty and compact set, and is the largest fixed point of $B$. If $A_0 \subset V^K$ is a compact set satisfying $A_0 \supset B(A_0) \supset A^*$ (one example being $A_0 = V^K$), then $B^n(A_0)$ is decreasing in $n$ and $\cap_{n=0}^{\infty} B^n(A_0) = A^*$. If $A_0 \subset V^K$ satisfies $A_0 \subset B(A_0) \subset A^*$ (one example being $A_0 = \{a_0(\bar{x}, \bar{y})\}$), then $B^n(A_0)$ is increasing in $n$ and $\mathrm{cl}(\cup_{n=0}^{\infty} B^n(A_0)) = A^*$.

Proof. The analogs of Lemmas 5–8 follow from virtually identical arguments. Thus: (i) $A \subset V^K$, $A \subset B(A) \implies B(A) \subset A^*$, (ii) $B(A^*) = A^*$, (iii) $A \subset A' \subset V^K \implies B(A) \subset B(A')$, and (iv) $A$ is compact $\implies B(A)$ is compact. Similarly for the first two parts of the proposition.
To prove the final part of the proposition, suppose $A_0 \subset B(A_0) \subset A^*$. Then from (ii), (iii), and the compactness of $A^*$, we know that $B^n(A_0)$ is increasing and $\mathrm{cl}(\cup_{n=0}^{\infty} B^n(A_0)) \subset A^*$. To prove $A^* \subset \mathrm{cl}(\cup_{n=0}^{\infty} B^n(A_0))$, pick any $a \in A^*$. We construct a sequence in $\cup_{n=0}^{\infty} B^n(A_0)$ that converges to $a$. For this, first pick another $a' \in A_0 (\subset A^*)$. By the definition of $A^*$ there exist incentive compatible allocations $(x, y)$ and $(x', y')$ such that $a = a_0(x, y)$ and $a' = a_0(x', y')$. Next for each $n \geq 1$, do the following. Define $x^n = \{x_t^n\}_{t=0}^{\infty}$ by truncating $x$ after $n$ periods and appending $x'$. Thus for $t > n$:
$$(x_0^n(\theta^0), \ldots, x_t^n(\theta^t)) = (x_0(\theta^0), \ldots, x_n(\theta^n), x_0'(\theta_{n+1}), \ldots, x_{t-n-1}'(\theta_{n+1}^t)).$$
And let
$$(r^n, y^n) \in \arg\max_{(\check{r}, \check{y}) \in R \times \mathcal{Y}} U(x^n \circ \check{r}, \check{y}).$$
Here, since $(x', y')$ is incentive compatible, we can assume without loss that for $t > n$, $r_t^n(\theta^t) = \theta_t$ and $y_t^n(\theta^t) = y_{t-n-1}'(\theta_{n+1}^t)$. Finally, let $(\hat{x}^n, \hat{y}^n) = (x^n \circ r^n, y^n)$. By construction, $(\hat{x}^n, \hat{y}^n)$ is incentive compatible, $a_0(\hat{x}^n, \hat{y}^n) \geq a_0(x^n, y)$, and $a_{n+1}(\theta^n; \hat{x}^n, \hat{y}^n) = a'$ for all $\theta^n$.

We next show $a_0(\hat{x}^n, \hat{y}^n) \in \cup_{n=0}^{\infty} B^n(A_0)$ for all $n$. From the incentive compatibility of $(\hat{x}^n, \hat{y}^n)$ and $a_{n+1}(\theta^n; \hat{x}^n, \hat{y}^n) = a'$ we obtain by induction $a_0(\hat{x}^n, \hat{y}^n) \in B^{n+1}(\{a'\})$. This, (iii), and the fact that $B^n(A_0)$ is increasing in $n$ then imply the result.

To verify $a_0(\hat{x}^n, \hat{y}^n) \to a$ as $n \to \infty$, we pick an arbitrary subsequence $\{a_0(\hat{x}^{n'}, \hat{y}^{n'})\}_{n'=1}^{\infty}$ and show that it has a further subsequence $\{a_0(\hat{x}^{n''}, \hat{y}^{n''})\}_{n''=1}^{\infty}$ that converges to $a$. Applying to $(r^n, y^n)$ the argument we applied to $r^n$ in the proof of Proposition 3, we obtain a subindex $n''$ along which $(r^{n''}, y^{n''})$ converges to some $(\tilde{r}, \tilde{y})$. Also for each $t$ we have $x_t^{n''} = x_t$ for $n'' \geq t$. This together with the boundedness of $u$ implies $a_0(\hat{x}^{n''}, \hat{y}^{n''}) = a_0(x^{n''} \circ r^{n''}, y^{n''}) \to a_0(x \circ \tilde{r}, \tilde{y})$. Combining this with $a_0(\hat{x}^{n''}, \hat{y}^{n''}) \geq a_0(x^{n''}, y)$ and $a_0(x^{n''}, y) \to a$, we obtain $a_0(x \circ \tilde{r}, \tilde{y}) \geq a$. But the incentive compatibility of $(x, y)$ implies $a_0(x \circ \tilde{r}, \tilde{y}) \leq a_0(x, y) = a$, so $a_0(x \circ \tilde{r}, \tilde{y}) = a$.

Now let $A_0 = \{a_0(\bar{x}, \bar{y})\}$. From the incentive compatibility of $(\bar{x}, \bar{y})$, (ii), and (iii), we have $B(A_0) \subset A^*$. To see $A_0 \subset B(A_0)$, observe that if we set $(x(\theta), y(\theta), a^+(\theta)) = (\bar{x}, \bar{y}(\theta), a_0(\bar{x}, \bar{y}))$ for each $\theta$ we have $(x, y, a^+) \in F(a_0(\bar{x}, \bar{y}); A_0)$. ∎
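As a usage note, the increasing iteration in the last part of the proposition can be run directly by combining the two sketches above. Here `B_step`, `p`, and `W` refer to those earlier hypothetical sketches, and for the stationary allocation $(\bar{x}, \bar{y})$ equation (4) gives $a_0(\bar{x}, \bar{y}) = \sum_{\theta} W(\theta) p(\theta)$:

```python
# Usage sketch: inner approximation of A* per Proposition A3, reusing
# B_step, p, and W from the hypothetical sketches above.
a0_bar = p @ W                      # a_0(x_bar, y_bar) = sum_theta W(theta) p(theta)
A = [a0_bar]
for n in range(2):                  # B^n({a_0(x_bar, y_bar)}) increases toward A*
    A_next = B_step(A)
    # crude pruning keeps the enumeration tractable; a serious implementation
    # would maintain a grid or polytope representation instead
    A = [np.array(v) for v in {tuple(np.round(a, 1)) for a in A_next}]
```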
The analog of the $T$ operator maps $J : \Theta \times Y \times A^* \to \mathbb{R}$ into $TJ : \Theta \times Y \times A^* \to \mathbb{R}$, defined as:
$$TJ(\theta_-, y_-, a) = \inf_{(x, y, a^+) \in F(a; A^*)} \sum_{\theta} \bigl( c(x(\theta)) + q J(\theta, y(\theta), a^+(\theta)) \bigr) \pi(\theta|\theta_-, y_-). \tag{8}$$
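A discretized value-iteration sketch for $T$ follows, again on tiny hypothetical grids. The promised-value recursion (7) is enforced only up to a grid tolerance, so this is an approximation device rather than the exact operator; states whose constraint set is empty on the grid retain the value $+\infty$.

```python
import numpy as np
from itertools import product

# Value-iteration sketch for the T operator in (8) on tiny hypothetical grids.
# Astar is a finite list of K-vectors standing in for (a grid on) A*.
N, M, K, beta, q = 2, 2, 2, 0.9, 0.9
X = [0.0, 0.5, 1.0]
rng = np.random.default_rng(1)
p = rng.random((K, N)) + 0.1; p /= p.sum(axis=1, keepdims=True)
w = rng.random((N, M, K)) + 0.1; w /= w.sum(axis=2, keepdims=True)
Pi = np.einsum('abk,kt->abt', w, p)                  # pi(theta | theta_, y_)
u = lambda x, y, th: np.sqrt(x) - 0.1 * y + 0.2 * th # placeholder utility
c = lambda x: x                                      # placeholder cost
Astar = [np.full(K, v) for v in np.linspace(4.0, 8.0, 5)]  # crude stand-in grid
menu = list(product(range(len(X)), range(M), range(len(Astar))))

def T(J, tol=0.5):
    TJ = np.full_like(J, np.inf)
    for plan in product(menu, repeat=N):    # (x, y, a+) : Theta -> X x Y x Astar
        val = [u(X[ix], iy, th) + beta * Astar[ia] @ w[th, iy]
               for th, (ix, iy, ia) in enumerate(plan)]
        ic = all(val[th] >= u(X[plan[tp][0]], yp, th)
                            + beta * Astar[plan[tp][2]] @ w[th, yp]
                 for th in range(N) for tp in range(N) for yp in range(M))
        if not ic:
            continue
        a_gen = sum(v * p[:, th] for th, v in enumerate(val))
        for ia, a in enumerate(Astar):
            if np.max(np.abs(a_gen - a)) > tol:      # (7), up to grid tolerance
                continue
            for th_, y_ in product(range(N), range(M)):
                cost = sum(Pi[th_, y_, th] *
                           (c(X[plan[th][0]]) + q * J[th, plan[th][1], plan[th][2]])
                           for th in range(N))
                TJ[th_, y_, ia] = min(TJ[th_, y_, ia], cost)
    return TJ

J = np.zeros((N, M, len(Astar)))
for _ in range(30):                                  # iterate toward (an approx of) J*
    J = T(J)
```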
The analog of Proposition 4 is therefore:
Proposition A4. $J^*$ is a bounded lower semicontinuous function, and $\|T^n J - J^*\| \to 0$ as $n \to \infty$ for any bounded $J : \Theta \times Y \times A^* \to \mathbb{R}$. There exists a function $g^* : \Theta \times Y \times A^* \to (X \times Y \times A^*)^{\Theta}$ which attains the infimum on the right hand side of (8) when $J = J^*$, and for any such $g^*$ the allocation $(x^*, y^*)$ defined recursively by $(x_t^*(\theta^t), y_t^*(\theta^t), a_{t+1}^*(\theta^t)) = g^*(\theta_{t-1}, y_{t-1}^*(\theta^{t-1}), a_t^*(\theta^{t-1}))(\theta_t)$ solves the auxiliary planning problem starting from $(\theta_{-1}, y_{-1}, a_0^*(\theta^{-1}))$.

Proof. Virtually identical to that of Proposition 4. ∎
References

Fernandes, A., and C. Phelan (2000): "A Recursive Formulation for Repeated Agency with History Dependence," Journal of Economic Theory, 91(2), 223–247.

Fukushima, K., and Y. Waki (2011): "Computing Dynamic Optimal Mechanisms When Hidden Types Are Markov," Working paper.