Economics Working Papers
Department of Economics, The University of Auckland

A Simple Introduction to Dynamic Programming in Macroeconomic Models

Ian King*
Department of Economics
University of Auckland
Auckland, New Zealand
[email protected]

April 2002 (October 1987)

This paper is posted at ResearchSpace@Auckland: http://researchspace.auckland.ac.nz/ecwp/230
Abstract

This is intended as a very basic introduction to the mathematical methods used in Thomas Sargent's book Dynamic Macroeconomic Theory. It assumes that readers have no mathematical background beyond an undergraduate "Mathematics for Economists" course. It contains sections on deterministic finite horizon models, deterministic infinite horizon models, and stochastic infinite horizon models. Fully worked out examples are also provided.
* Financial support of the Social Sciences and Humanities Research Council of Canada is gratefully acknowledged. I am indebted to David Backus and Masahiro Igarashi for comments. I would like to thank Tom McCurdy for his comments on earlier drafts, and for suggesting the last example presented here.
FOREWORD (2002)

I wrote this guide originally in 1987, while I was a graduate student at Queen's University at Kingston, in Canada, to help other students learn dynamic programming as painlessly as possible. The guide was never published, but was passed on through different generations of graduate students as time progressed. Over the years, I have been informed that several instructors at several different universities all over the world have used the guide as an informal supplement to the material in graduate macro courses. The quality of the photocopies has been deteriorating, and I have received many requests for new originals. Unfortunately, I only had a hard copy of the original, and this has also been deteriorating. Because the material in this guide is not original at all (it simply summarizes material available elsewhere), the usual outlets for publication seem inappropriate. I decided, therefore, to simply reproduce the handout as a pdf file that anyone can have access to. This required retyping the entire document. I am extremely grateful to Malliga Rassu, at the University of Auckland, for patiently doing most of this work. Rather than totally reorganize the notes in light of what I've learned since they were originally written, I decided to leave them pretty much as they were, with some very minor changes (mainly references). I hope people continue to find them useful.
INTRODUCTION (1987)

This note grew out of a handout that I prepared while tutoring a graduate macroeconomics course at Queen's University. The main text for the course was Thomas Sargent's Dynamic Macroeconomic Theory. It had been my experience that some first year graduate students without strong mathematical backgrounds found the text heavy going, even though the text itself contains an introduction to dynamic programming. This note could be seen as an introduction to Sargent's introduction to these methods. It is not intended as a substitute for his chapter, but rather to make his book more accessible to students whose mathematical background does not extend beyond, say, A.C. Chiang's Fundamental Methods of Mathematical Economics. The paper is divided into three sections: (i) Deterministic Finite Horizon Models, (ii) Deterministic Infinite Horizon Models, and (iii) Stochastic Infinite Horizon Models. It also provides five fully worked out examples.
1. DETERMINISTIC, FINITE HORIZON MODELS
Let us first define the variables and set up the most general problem (which is usually unsolvable), then introduce some assumptions which make the problem tractable.

1.1 The General Problem:
  Max_{v_t}  U(x_0, x_1, ..., x_T; v_0, v_1, ..., v_{T-1})

  subject to
  i)   G(x_0, x_1, ..., x_T; v_0, v_1, ..., v_{T-1}) ≥ 0
  ii)  v_t ∈ Ω for all t = 0, ..., T-1
  iii) x_0 = x̄_0 given
  iv)  x_T ≥ 0

where:

x_t is a vector of state variables that describe the state of the system at any point in time. For example, x_t^i could be the amount of capital good i at time t.

v_t is a vector of control variables which can be chosen in every period by the decision-maker. For example, v_t^j could be the consumption of good j at time t.

U(·) is the objective function, which is, in general, a function of all the state and control variables for each time period.

G(·) is a system of intertemporal constraints connecting the state and control variables.

Ω is the feasible set for the control variables, assumed to be closed and bounded.
In principle, we could simply treat this as a standard constrained optimisation problem. That is, we could set up a Lagrangian function, and (under the usual smoothness and concavity assumptions) grind out the Kuhn-Tucker conditions.
In general, though, the first-order conditions will be non-linear functions of all the state and control variables. These would have to be solved simultaneously to get any results, and this could be extremely difficult to do if T is large. We need to introduce some strong assumptions to make the problem tractable.
1.2 Time Separable (Recursive) Problem:
Here it is assumed that both the U(·) and the G(·) functions are time-separable. That is:

  U(x_0, ..., x_T; v_0, ..., v_{T-1}) ≡ U_0(x_0, v_0) + U_1(x_1, v_1) + ... + U_{T-1}(x_{T-1}, v_{T-1}) + S(x_T)

where S(x_T) is a "scrap" value function at the end of the program (where no further decisions are made). Also, the G(·) functions follow the Markov structure:
  x_1 = G_0(x_0, v_0)
  x_2 = G_1(x_1, v_1)
  ...
  x_T = G_{T-1}(x_{T-1}, v_{T-1})      ("transition equations")
Note: Recall that each x_t is a vector of variables x_t^i, where i indexes different kinds of state variables. Similarly with v_t. Time separability still allows interactions of different state and control variables, but only within periods.
The problem becomes:

  Max_{v_t ∈ Ω; t = 0, 1, ..., T-1}  Σ_{t=0}^{T-1} U_t(x_t, v_t) + S(x_T)

  subject to
  i)  x_{t+1}^i = G_t^i(x_t, v_t)    ∀ i = 1, ..., n and t = 0, ..., T-1
  ii) x_0^i = x̄_0^i given            ∀ i = 1, ..., n
Once again, in principle, this problem could be solved using the standard constrained optimisation techniques. The Lagrangian is:
  L = Σ_{t=0}^{T-1} U_t(x_t, v_t) + S(x_T) + Σ_{t=0}^{T-1} Σ_{i=1}^{n} λ_t^i [G_t^i(x_t, v_t) − x_{t+1}^i]
This problem is often solvable using these methods, due to the temporal recursive structure of the model. However, doing this can be quite messy. (For an example, see Sargent (1987) section 1.2). Bellman's "Principle of Optimality" is often more convenient to use.
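To see where the messiness comes from, here is a sketch of the first-order conditions implied by this Lagrangian (derived here purely for illustration, assuming differentiability and an interior solution; this derivation is not part of the original notes):

```latex
% First-order conditions of the Lagrangian L above (illustrative sketch).
% For each control vector v_t, t = 0, ..., T-1:
\frac{\partial U_t}{\partial v_t} + \sum_{i=1}^{n} \lambda_t^i \, \frac{\partial G_t^i}{\partial v_t} = 0
% For each state variable x_t^j, j = 1, ..., n and t = 1, ..., T-1:
\frac{\partial U_t}{\partial x_t^j} + \sum_{i=1}^{n} \lambda_t^i \, \frac{\partial G_t^i}{\partial x_t^j} - \lambda_{t-1}^j = 0
% For the terminal states x_T^j, j = 1, ..., n:
\frac{\partial S}{\partial x_T^j} - \lambda_{T-1}^j = 0
% together with the transition constraints x_{t+1}^i = G_t^i(x_t, v_t).
```

Because each condition links multipliers, states and controls in adjacent periods, all of these nonlinear equations must be solved jointly for every x, v and λ; this is exactly the simultaneous system that the recursive approach below avoids.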
1.3 Bellman's Method (Dynamic Programming):
Consider the time-separable problem of section 1.2 above, at time t = 0.

Problem A:

  Max_{v_t ∈ Ω; t = 0, 1, ..., T-1}  Σ_{t=0}^{T-1} U_t(x_t, v_t) + S(x_T)

  subject to
  i)  x_{t+1}^i = G_t^i(x_t, v_t)    ∀ i = 1, ..., n and t = 0, ..., T-1
  ii) x_0^i = x̄_0^i given            ∀ i = 1, ..., n
Now consider the same problem, starting at some time t_0 > 0.

Problem B:

  Max_{v_t ∈ Ω; t = t_0, ..., T-1}  Σ_{t=t_0}^{T-1} U_t(x_t, v_t) + S(x_T)

  subject to
  i)  x_{t+1}^i = G_t^i(x_t, v_t)     ∀ i = 1, ..., n and t = t_0, ..., T-1
  ii) x_{t_0}^i = x̄_{t_0}^i given     ∀ i = 1, ..., n
Let the solution to Problem B be defined as a value function V(x̄_{t_0}, T − t_0). Now, Bellman's "Principle of Optimality" asserts:

  Any solution to Problem A (i.e., on the range t = 0, ..., T) which yields x_{t_0} ≡ x̄_{t_0} must also solve Problem B (i.e., on the range t = t_0, ..., T).
(Note: This result depends on additive time separability, since otherwise we couldn't "break" the solution at t_0. Additive separability is sufficient for Bellman's principle of optimality.)
Interpretation: If the rules for the control variables chosen for the t_0 problem are optimal for any given x̄_{t_0}, then they must be optimal for the x*_{t_0} of the larger problem.

Bellman's Principle of Optimality allows us to use the trick of solving the large Problem A by solving the smaller Problem B sequentially. Also, since t_0 is arbitrary, we can choose to solve the problem with t_0 = T − 1 first, which is a simple two-period problem, and then work backwards:
Step 1:
Set t_0 = T − 1, so that Problem B is simply:

  Max_{v_{T-1}}  U_{T-1}(x_{T-1}, v_{T-1}) + S(x_T)

  subject to:
  i)  x_T = G_{T-1}(x_{T-1}, v_{T-1})
  ii) x_{T-1} = x̄_{T-1} given

One can easily substitute the first constraint into the objective function, and use straightforward calculus to derive:

  v_{T-1} = h_{T-1}(x_{T-1})      ... control rule for v_{T-1}

This can then be substituted back into the objective function to characterize the solution as a value function:

  V(x_{T-1}, 1) ≡ U_{T-1}(x_{T-1}, h_{T-1}(x_{T-1})) + S(G_{T-1}(x_{T-1}, h_{T-1}(x_{T-1})))
Step 2:
Set t_0 = T − 2, so that Problem B is:

  Max_{v_{T-2}, v_{T-1}}  { U_{T-2}(x_{T-2}, v_{T-2}) + U_{T-1}(x_{T-1}, v_{T-1}) + S(x_T) }

  subject to:
  i)   x_T = G_{T-1}(x_{T-1}, v_{T-1})
  ii)  x_{T-1} = G_{T-2}(x_{T-2}, v_{T-2})
  iii) x_{T-2} = x̄_{T-2} given

Bellman's Principle of Optimality implies that we can rewrite this as:

  Max_{v_{T-2}}  { U_{T-2}(x_{T-2}, v_{T-2}) + Max_{v_{T-1}} { U_{T-1}(x_{T-1}, v_{T-1}) + S(x_T) } }

subject to (i), (ii) and (iii). Recall that Step 1 has already given us the solution to the inside maximization problem, so that we can rewrite Step 2 as:

  Max_{v_{T-2}}  { U_{T-2}(x_{T-2}, v_{T-2}) + V(x_{T-1}, 1) }

  subject to:
  i)  x_{T-1} = G_{T-2}(x_{T-2}, v_{T-2})
  ii) x_{T-2} = x̄_{T-2} given

Once again, we can easily substitute the first constraint into the objective function, and use straightforward calculus to derive:

  v_{T-2} = h_{T-2}(x_{T-2})      ... control rule for v_{T-2}

This can be substituted back into the objective function to get a value function:

  V(x_{T-2}, 2) ≡ Max_{v_{T-2}} { U_{T-2}(x_{T-2}, v_{T-2}) + V(x_{T-1}, 1) }
               = U_{T-2}(x_{T-2}, h_{T-2}(x_{T-2})) + V(G_{T-2}(x_{T-2}, h_{T-2}(x_{T-2})), 1)
Step 3:
Using an argument analogous to that used in Step 2, we know that, in general, the problem in period T − k can be written as:

  V(x_{T-k}, k) = Max_{v_{T-k}} { U_{T-k}(x_{T-k}, v_{T-k}) + V(x_{T-k+1}, k − 1) }      ("Bellman's Equation")

  subject to:
  i)  x_{T-k+1} = G_{T-k}(x_{T-k}, v_{T-k})
  ii) x_{T-k} = x̄_{T-k} given

This maximization problem, given the form of the value function from the previous round, will yield a control rule:

  v_{T-k} = h_{T-k}(x_{T-k})

Step 4:
After going through the successive rounds of single period maximization problems, eventually one reaches the problem in time zero:
  V(x_0, T) = Max_{v_0} { U_0(x_0, v_0) + V(x_1, T − 1) }

  subject to:
  i)  x_1 = G_0(x_0, v_0)
  ii) x_0 = x̄_0 given

This will yield a control rule:

  v_0 = h_0(x_0)

Now, recall that x_0 is given a value at the outset of the overall dynamic problem. This means that we have now solved for v_0 as a number, independent of the x's (except the given x̄_0).
Step 5:
Using the known x_0 and v_0 and the transition equation

  x_1 = G_0(x_0, v_0),

it is simple to work out x_1, and hence v_1 from the control rule of that period. This process can be repeated until all the x_t and v_t values are known. The overall Problem A will then be solved.
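The five steps above amount to a backward recursion over value functions followed by a forward pass through the transition equations. The following Python sketch (not part of the original notes) illustrates the idea on a discretized version of the two-period quadratic problem solved in section 1.4 below; the grids, the sign convention (maximizing the negative of the cost), and the interpolation scheme are all choices made here purely for illustration.

```python
import numpy as np

# Illustrative backward induction (Steps 1-4) and forward pass (Step 5).
# Assumed specification, matching the example of section 1.4 below:
#   per-period payoff  U_t(x, v) = -(x**2 + v**2)   (maximize the negative of the cost)
#   scrap value        S(x)      = -x**2
#   transition         x_{t+1}   = 2*x + v,   with x_0 = 1 and T = 2
T = 2
x_grid = np.linspace(-3.0, 3.0, 241)     # discretized state space
v_grid = np.linspace(-6.0, 6.0, 481)     # discretized control space

def U(x, v):
    return -(x**2 + v**2)

def S(x):
    return -x**2

def G(x, v):
    return 2*x + v

V = [None] * (T + 1)                     # V[t] will hold the value function on x_grid
h = [None] * T                           # h[t] will hold the control rule on x_grid
V[T] = S(x_grid)
for t in range(T - 1, -1, -1):           # Steps 1-4: work backwards from T-1 to 0
    x_next = G(x_grid[:, None], v_grid[None, :])                   # all (x, v) pairs
    Q = U(x_grid[:, None], v_grid[None, :]) + np.interp(x_next, x_grid, V[t + 1])
    best = Q.argmax(axis=1)
    V[t] = Q[np.arange(x_grid.size), best]
    h[t] = v_grid[best]                  # control rule v_t = h_t(x_t), one value per grid point

x = 1.0                                  # Step 5: roll forward from the given x_0
for t in range(T):
    v = np.interp(x, x_grid, h[t])
    print(f"t={t}: x={x:.2f}, v={v:.2f}")
    x = G(x, v)
print(f"t={T}: x={x:.2f}")
```

With these grids the printed values should coincide, up to the grid resolution, with the hand-derived solution of section 1.4.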
1.4 An Example:

This is a simple two-period minimization problem, which can be solved using this algorithm.
  Min_{v_t}  Σ_{t=0}^{1} [x_t² + v_t²] + x_2²                  (1)

  subject to:
  i)  x_{t+1} = 2x_t + v_t                                     (2)
  ii) x_0 = 1                                                  (3)
In this problem, T = 2. To solve this, consider first the problem in period T − 1 (i.e., in period 1):

Step 1:
  Min_{v_1}  { x_1² + v_1² + x_2² }                            (4)

  subject to:
  i)  x_2 = 2x_1 + v_1                                         (5)
  ii) x_1 = x̄_1 given                                          (6)

Substituting (5) and (6) into (4) yields:

  Min_{v_1}  { x_1² + v_1² + [2x_1 + v_1]² }

FOC: 2v_1 + 2[2x_1 + v_1] = 0

  ⇒  v_1 = −x_1      Control rule in period 1                  (7)
Now substituting (7) back into (4) yields (using (5)):

  V(x_1, 1) = x_1² + x_1² + (2x_1 − x_1)²

  ⇒  V(x_1, 1) = 3x_1²                                         (8)

Step 2:
In period T − 2 (i.e., period 0), Bellman's equation tells us that the problem is:
  Min_{v_0}  { x_0² + v_0² + V(x_1, 1) }                       (9)

  subject to:
  i)  x_1 = 2x_0 + v_0                                         (10)
  ii) x_0 = 1                                                  (11)

Substituting (8) into (9), and then (10) and (11) into (9), yields:

  Min_{v_0}  { 1 + v_0² + 3[2 + v_0]² }

FOC: 2v_0 + 6[2 + v_0] = 0
  ⇒  v_0 = −3/2      Control value in period 0                 (12)

Step 5:

Substituting (11) and (12) into (10) gives:

  x_1 = 2 + (−3/2)  ⇒  x_1 = 1/2                               (13)
Now substitute (13) into (7) to get:

  v_1 = −1/2      Control value in period 1                    (14)
Finally, substitute (13) and (14) into (5) to get:

  x_2 = 1 − 1/2  ⇒  x_2 = 1/2                                  (15)
Equations (11)-(15) characterize the full solution to the problem.
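As a quick cross-check (not in the original notes), the solution can also be verified by searching directly over pairs (v_0, v_1) in the original, non-recursive problem; the grid below is an arbitrary choice made for illustration.

```python
import numpy as np

# Brute-force check of the hand-derived solution to the example of section 1.4.
v_grid = np.linspace(-3.0, 3.0, 1201)          # candidate controls (spacing 0.005)
V0, V1 = np.meshgrid(v_grid, v_grid, indexing="ij")
x0 = 1.0                                       # initial condition (3)
x1 = 2*x0 + V0                                 # transition (2)
x2 = 2*x1 + V1                                 # transition (2) again
cost = x0**2 + V0**2 + x1**2 + V1**2 + x2**2   # objective (1)
i, j = np.unravel_index(cost.argmin(), cost.shape)
print(v_grid[i], v_grid[j])                    # ≈ -1.5 and ≈ -0.5, matching (12) and (14)
```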
2. DETERMINISTIC, INFINITE HORIZON MODELS
2.1 Introduction:

One feature of the finite horizon models is that, in general, the functional form of the control rules varies over time:

  v_t = h_t(x_t)

That is, the h function is different for each t. This is a consequence of two features of the problem:

i)  The fact that T is finite.
ii) The fact that U_t(x_t, v_t) and G_t(x_t, v_t) have been permitted to depend on time in arbitrary ways.
In infinite horizon problems, assumptions are usually made to ensure that the control rules have the same form in every period. Consider the infinite horizon problem (with time-separability):
  Max_{v_t ∈ Ω; t = 0, 1, ...}  Σ_{t=0}^{∞} U_t(x_t, v_t)

  subject to:
  i)  x_{t+1} = G_t(x_t, v_t)
  ii) x_0 = x̄_0 given
For the problem to be well defined, the objective function should be bounded away from infinity. One trick that facilitates this bounding is to introduce a discount factor β_t, where 0 ≤ β_t < 1. A convenient simplifying assumption that is commonly used in infinite horizon models is stationarity:

Assume:
i)   β_t = β   ∀ t
ii)  U_t(x_t, v_t) = β^t U(x_t, v_t)
iii) G_t(x_t, v_t) = G(x_t, v_t)
A further assumption, which is sufficient for boundedness of the objective function, is boundedness of the payoff function in each period:

Assume: 0 ≤ U(x_t, v_t) < M < ∞ for some finite number M.

(Boundedness of the objective then follows from the geometric series: 0 ≤ Σ_{t=0}^{∞} β^t U(x_t, v_t) < M/(1 − β) < ∞.) This assumption, however, is not necessary, and there are many problems where it is not used. The infinite horizon problem becomes:
  Max_{v_t ∈ Ω; t = 0, 1, ...}  Σ_{t=0}^{∞} β^t U(x_t, v_t)

  subject to:
  i)  x_{t+1} = G(x_t, v_t)
  ii) x_0 = x̄_0 given
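Because the problem is now stationary, its solution can be characterized by a single, time-invariant control rule v = h(x). As a purely illustrative aside (not part of the original notes), the following Python sketch computes such a rule numerically by iterating on the value function for an assumed "cake-eating" specification; the payoff, transition, grid and discount factor are all choices made here for the example.

```python
import numpy as np

# Illustrative value-function iteration for a stationary, discounted problem.
# Assumed specification (for this sketch only):
#   payoff      U(x, v) = sqrt(v)  with feasible set 0 <= v <= x  (so U is bounded on the grid)
#   transition  G(x, v) = x - v    ("cake-eating")
#   discount    beta = 0.95
beta = 0.95
x_grid = np.linspace(0.0, 1.0, 201)              # discretized state space

V = np.zeros_like(x_grid)                        # initial guess for the value function
for _ in range(2000):
    # For each state x and each candidate next state x' <= x, the control is v = x - x'
    # and the candidate value is sqrt(x - x') + beta * V(x').
    v = x_grid[:, None] - x_grid[None, :]
    Q = np.where(v >= 0, np.sqrt(np.clip(v, 0.0, None)) + beta * V[None, :], -np.inf)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:         # stop once the iteration has converged
        V = V_new
        break
    V = V_new

# The maximizer gives one time-invariant control rule v = h(x), used in every period.
h = x_grid - x_grid[Q.argmax(axis=1)]
print(h[::50])                                   # consumption rule at a few grid points
```

In a finite horizon problem the analogous computation would produce a different h_t for each t; here the iteration converges to a single rule, which is exactly the stationarity property discussed above.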