May 15, 1999 - Lars Peter Hansen. University of Chicago ... Whittle (1990) as modified discounting by Hansen and Sargent (1995). 1 See Hansen and Sargent ...
Five Games and Two Objective Functions that Promote Robustness
Lars Peter Hansen University of Chicago
Thomas J. Sargent Stanford University and Hoover Institution May 15, 1999
We thank Kenneth Kasa, Larry Jones, Rodolfo Manuelli, and Aaron Tornell for valuable discussions.
ABSTRACT Robust decision rules are designed to work well despite a set of possible model misspecifications. For discounted infinite horizon linear quadratic control problems, we describe two frequency domain criteria for designing robust decision rules: the H∞ and the entropy criteria. We compare them with the standard H2 criterion. We associate these criteria with five versions of a zero sum two player game in which a malevolent player chooses a misspecified model out of a set of models near a reference model. A Lagrange multiplier associated with the constraint on the set of misspecified models occurs as a parameter indexing the preference for robustness in the entropy criterion. The H∞ and the entropy criteria arise from different assumptions about initial conditions. We also describe a dual state reconstruction (filtering) problem associated with the entropy criterion. We give four examples.
Keywords: Lucas critique, robustness, frequency domain, linear prediction theory, entropy, H∞ .
1. Introduction A hallmark of rational expectations models is a mapping from serial correlations of shocks to decision rules. That mapping is implied by the assumption that agents use ordinary or so-called modern control theory. That control theory assumes that decision makers know laws of motion. The theory associates a distinct decision rule with each specification of shock serial correlation properties. Many aspects of rational expectations models flow from this. Thus, the cornerstone of the Lucas Critique is the finding that, under rational expectations, decision rules are functions of the serial correlations of shocks. Rational expectations econometrics achieves parameter identification by exploiting the structure of the function mapping shock serial correlation properties to decision rules.1 This paper describes a robust decision theory that loosens the mapping from shock temporal properties to decision rules. Robust control theory treats laws of motion as approximations and seeks a single rule to use across a set of shock serial correlations. Our robust decision maker represents model misspecifications as errors that operate through the same endogenous dynamics as the model’s shocks. By feeding back on endogenous variables, shock processes can capture misspecified endogenous dynamics. Concern for these errors induces a preference for rules to apply across a range of specifications. For a class of experiments that alter the shock serial correlations, a robust decision maker violates the Lucas critique because he believes his model is an approximation.2 In the context of a linear quadratic model, this paper displays alternative ways to transform a decision maker’s doubts about model specification from the transition law into an altered objective function. We study two sets of ways to express these altered objective functions: (1) in terms of a two-player, zero-sum game where one of the players is ‘nature’, an agent that chooses from a class of model specifications in a way to make the other player, the decision maker, value robust decision rules; and (2) in terms of the value function or indirect utility function from one of those two-player games. This value function incorporates nature’s best choice of model specification in response to the decision maker’s decision rule. In category (1), we present a detailed account of five related but distinct two-person zero-sum games that induce robust decision rules. For category (2), we present three preference specifications for inducing robust rules. Two of them are expressed in the frequency domain. They are called H∞ and entropy criteria, respectively. The entropy objective function summarizes the extent of model specification doubts with a single parameter. We describe how that parameter is linked to the various two-player games, and also to the risk-sensitivity parameter in a third criterion function that can be used to induce robustness, the risk sensitive control specification of Jacobson (1973) and Whittle (1990) as modified discounting by Hansen and Sargent (1995). 1 2
See Hansen and Sargent (1980). Mark Salmon (1997) has written about the relationship of robust control theory to the Lucas critique.
1
Overview We modify the true objective function in a way that induces robust decision rules. The modified objective functions are shorthand devices for summarizing a decision maker’s concern about model misspecification. The decision maker can find decision rules that are less sensitive to model misspecification by replacing his true objective function (the one he really cares about) with another one that embeds his doubts about the constraints (his model). We construct these substitute loss functions as value functions of alternative two-person zero sum games. It will be useful for the reader recurrently to remind himself that our criteria for promoting robustness are the indirect utility functions for these games. For technical reasons that will emerge as we proceed, we analyze several distinct games with identical physical settings (preferences and transition laws), but different timing protocols. (For ‘timing protocol’ you can read ‘nature of commitment’.) Although their distinct timing protocols make them different games, equilibria of these games have identical outcomes, affording us a variety of useful computational and conceptual connections. Reader’s Guide to five games We shall eventually refer to five distinct two-player zero-sum games for inducing robustness. We use ‘Stackelberg’ as an adjective to denote that one of the players precommits to a strategy. (1) Game 1. A Stackelberg game in the time domain. (2) Game 2. A Stackelberg game in the frequency domain. (3) Game 3. A Stackelberg multiplier game in the frequency domain. (4) Game 4. A Nash multiplier game in sequences. (5) Game 5. A Markov perfect multiplier game in state-feedback rules. To explain these games, we refer to the law of motion for a linear time invariant system. An n × 1 state vector evolves according to xt+1 = Ao xt + But + Cwt+1 where x0 is given, ut is a vector of controls, and wt+1 is a vector of disturbances with wt = 0 ∀t < 0. The shock wt+1 represents model misspecification as well as other disturbances. 2
Stackelberg games (1) and (2) are representations of the same game. In this game, a decision maker chooses a state-feedback rule ut = −F xt while nature chooses a sequence of shocks {wt+1 }. Here ‘nature’ commits to an entire shock sequence, earning the game the adjective ‘Stackelberg’. Game (3) is a reformulation of game (2) in terms of a Lagrange multiplier. Game (4) has the decision maker choose a sequence of decisions {ut } while simultaneously nature chooses a sequence of shocks {wt+1 }. Here there is two-sided commitment to sequences. Game (5) has the decision maker choose a feedback rule ut = −F xt while nature chooses a feedback rule wt+1 = Kxt . All five games share a common equilibrium concept, Nash equilibrium. The games differ in their timing protocols. In games (1), (2), and (3), the w-player takes the state feedback rule as given, while the u-player takes the {wt+1 } sequence as given. In game (4), each player takes the sequence of u or w chosen by the other player as given. In game (5) each player takes the feedback rule chosen by the other player as given. We establish the following results and relationships among these games: (i) The value function for game 3 produces the entropy criterion for inducing robustness. (ii) A special case of game 3 in which the initial condition x0 is restricted to be zero leads to the H∞ criterion for inducing robustness. (iii) In general, all of the games with non zero initial conditions have identical outcomes that can be represented recursively. (iv) The equilibrium of game 4 is easy to compute, and is a good source of guesses about solutions for the other games. (v) Game 5 leads directly to the risk sensitivity criterion for inducing robustness and supplies alternative algorithms for computing the optimal rule under risk-sensitivity. Game 5 supports Hansen and Sargent’s (1995) method of computing the optimal risk-sensitive decision rule by iterating to convergence on the T ◦ D operator. Game 3 supports a ‘Howard policy improvement’ algorithm for computing the optimal risk-sensitive control law. The following subsection briefly describes an example that helps motivate the frequency domain as a tool for displaying the effects of misspecifying the intertemporal properties of shocks. Motivation, example, and plan 3
Figure 1 shows frequency domain decompositions of a government’s objective function for three alterative policy rules labelled θ = ∞, θ = 10, θ = 5. The parameter θ measures a preference for robustness, with θ = +∞ corresponding to the special case of rational expectations, and lower values of θ representing a concern for misspecification. The model is Lawrence Ball’s (1998) and the criterion function to be maximized is C = −E πt2 + Eyt2 , where π, y are the deviations of inflation and output from their means.3 That C is minus a weighted sum of variances means that it can be represented as C = H2 ≡ −
Z
′
(1) [H2crit]
trace G (ζ) G (ζ) dλ (ζ) , Γ
1 where dλ(ζ) = 2πiζ dζ, Γ is the unit circle in the complex plane, the prime denotes complex conjugation and matrix transposition, and G(ζ) is the transfer function from the shocks in Ball’s model to the targets, the inflation rate and output. The transfer function G depends on the government’s choice of a feedback rule Fθ . Ball computed F∞ . Of the three rules whose transfer functions are depicted in figure 1, Ball’s rule (θ = ∞) is the best because the area under the curve is smallest.
10
9
8
7
6
5 θ=5
4
3
2
θ = 10 θ=∞
1
0
0
0.5
1
1.5
2
2.5
3
3.5
Figure 1: Frequency decompositions of [trace G(ζ)′ G(ζ)] as functions of ω (ζ = exp(iω)) for objective function of Ball’s model under three decision rules.
3
See Ball (1998) and Sargent (1998).
4
The transfer function G gives a frequency domain representation of how targets respond to serially uncorrelated shocks. The frequency domain decomposition C depicted by the θ = ∞ curve in Figure 1 exposes the frequencies that are most vulnerable to small misspecifications of the temporal and feedback properties of the shocks. Low frequency misspecifications are most troublesome under Ball’s optimal feedback rule because for those frequencies, traceG(ζ)′ G(ζ) is highest. We can obtain more robust rules by optimizing a criterion that penalizes fluctuations across frequencies of G(ζ)′ G(ζ). We derived the rules with θ < ∞ in Figure 1 by choosing Fθ to maximize Z ′ (2) entropy = log det θI − G (ζ) G (ζ) dλ (ζ) . [entropy] Γ
(One of the goals of this chapter is to justify this criterion.) Lowering θ causes the decision maker to choose Fθ to make (trace G(ζ)′ G(ζ)) flatter as a function of frequency ω, where ζ = exp(iω), and to lower its larger values at the cost of raising smaller ones. Flattening (trace G(ζ)′ G(ζ)) makes the realized value of the criterion function less sensitive to departures of the shocks from the benchmark specification of no serial correlation. The justifications for (2) in the literature assume no discounting, as did Ball (1998). For many economic problems, we want to discount the future. This paper formulates discounted versions of our five games and discounted versions of the corresponding indirect utility functions, which for Game 3, leads to a version of (2) under discounting. The control literature describes undiscounted versions of our frequency domain criteria and their relationship to an undiscounted risk-sensitivity criterion.4 The relationships among these criteria under discounting match those without discounting. In particular, the entropy and risk-sensitivity criteria generate equivalent preference orderings; and the one-parameter family of entropy criteria generates a sequence of decision rules that approximate the H∞ decision rule. For most of the paper, we adopt deterministic formulations where the shocks are deterministic sequences. However, each of our criteria applies directly to a corresponding stochastic control problem, where the shocks are stationary random sequences. After describing robust control, we reinterpret the mathematics in terms of a robust filtering problem. We end the paper by displaying four examples: (i) Ball’s model with discounting, (ii) a permanent income model with adjustment costs, (iii) a pure prediction example, and (iv) a robust version of Muth’s (1960) filtering problem. The status of our robustness-inducing objective functions to the two-player zero sum games allows uss to associate with each example a worst case shock process. That process is in effect determined jointly with the optimal decision rule. Properties of the worst case shock process usually 4 See Zhou, Doyle and Glover (1996, section 15.5), Mustafa and Glover (1988), and Whittle (1990, chapter 17).
5
help to explain features of the robust decision rule. For example, the nature of the worst case errors in example (iii) make the representative investor display overconfidence about dividends. Technical details are reported in five Appendixes.
3. Three versions of a Stackelberg game in the time domain
The system An n × 1 state vector evolves according to xt+1 = Ao xt + But + Cwt+1
(3) [state]
where x0 is given, ut is a vector of controls, √ and wt+1 is a vector 5of disturbances with wt = 0 ∀t < 0. We assume that the pair ( βAo , B) is stabilizable, where β ∈ (0, 1] is a discount factor. We evaluate a discounted infinite-horizon criterion under alternative time-invariant decision rules (4) ut = −F xt , [decis] where the control law F is restricted to be in an admissible set: 1 F = {F : A − BF has eigenvalues with modulii strictly less than √ }. β Substituting (4) into (3) gives the closed-loop law of motion for the state: xt+1 = Axt + Cwt+1 , where A = Ao − BF.
(5) [Ao]
We define the objective function in terms of a target vector: zt = H0 xt + Jut .
(6) [target]
Under ut = −F xt the target becomes zt = Hxt 5
The pair (A, B) is stabilizable if y ′ B = 0 and y ′ Ao = λy ′ for some complex number y and some complex vector y implies that |λ| < 1 or y = 0. Stabilizability is equivalent to the existence of a timeinvariant control law F that makes (A − BF ) a stable matrix.
6
where H = H0 − JF . We have the state-space representation xt+1 = Axt + Cwt+1 zt = Hxt .
(7) [orig]
The decision maker wants to maximize the objective function −
∞ X
β t zt′ zt .
(8) [obj]
t=0
A time domain game We compute robust decision rules F via the following two-person game. Game 1: Find (F, {w}) that attain max inf − F
{wt }
subject to (7) and ∞ X t=0
∞ X
β t zt′ zt
(9) [game1]
t=0
β t wt′ wt ≤ η 2 + w0′ w0
(10a) [rob1;a]
x0 = Cw0 .
(10b) [rob1;b]
The game is indexed by two parameters (w0 , η). We consider three versions: 1. Set η = 0, with arbitrary w0 . 2. Set w0 = 0, but let η > 0 be arbitrary. 3. Set arbitrary w0 6= 0 and arbitrary η > 0. The first version makes the inf part trivial and causes the game to become a standard single-person linear-quadratic optimum problem. The second and third versions induce robust decision rules. To enable using the frequency domain, (10b) restricts the initial condition. The solution of the game under this restriction can be represented recursively as a pair of feedback rules wt+1 = Kxt , ut = −F xt . The solution also solves a multiplier game to be described below. The time domain representation of the solution of this multiplier game is valid for an arbitrary initial x0 . 7
4. Three versions of a Stackelberg game in the frequency domain
Fourier transforms Define one-sided Fourier transforms: X (ζ) ≡ W (ζ) ≡ Z (ζ) ≡
∞ X
xt ζ t ,
t=0
∞ X
(11) [f ourier]
wt ζ t ,
t=0
∞ X
zt ζ t ,
t=0
where ζ is a complex variable. Then (7) and (11) imply that ζ −1 [X(ζ) − x0 ] = AX(ζ) + ζ −1 C [W (ζ) − w0 ]. Using (10b) and solving for X(ζ) gives X(ζ) = (I − ζA)−1 CW (ζ), and hence (12) Z (ζ) = G (ζ) W (ζ) [sys1] where
−1
G (ζ) ≡ H (I − ζA)
C
is the transfer function from shocks to targets. Applying Parseval’s equality to (12) gives the following representation: ∞ X t=0
t
′
β zt zt =
Z
′
′
W (ζ) G (ζ) G (ζ) W (ζ) dλ (ζ) , Γ
(13) [integral]
where the operation ′ denotes both matrix transposition and complex conjugation, where the measure λ has a density given by dλ (ζ) ≡
1 √ dζ, 2πi βζ
and where the region of integration is the following circle in the complex plane p Γ ≡ {ζ : |ζ| = β}. √ The region Γ can be parameterized conveniently in terms of ζ = β exp(iω) for ω in the interval (−π, π]. Here the measure λ satisfies dλ (ζ) = 8
1 dω. 2π
Thus the contour integral on the right side of (13) can be expressed as: Z
′
′
W (ζ) G (ζ) G (ζ) W (ζ) dλ (ζ) Γ Z π i p p ′ hp i′ hp 1 β exp (iω) W W β exp (iω) G β exp (iω) G β exp (iω) dω. = 2π −π
(14) [simple]
We use the contour integral on the left of (14) to simplify notation. Parseval’s equality also implies ∞ X
β
t
wt′ wt
t=0
=
Z
′
W (ζ) W (ζ) dλ (ζ) .
Γ
(15) [integralW ]
The game restated To represent the game in the frequency domain, we define the following two sets of admissible W (ζ)’s: W a ={W (ζ) : W (ζ) is analytic on the interior of Γ with coefficients wt that are vectors of real numbers and W (0) = w0 } ∞ X a W ={W (ζ) ∈ W : β t wt′ wt < ∞} t=0
We use (13) and (15) to represent game 1 as: Game 2: Find (F, W (ζ)) that attain max inf − F
W
Z
′
′
W (ζ) G (ζ) G (ζ) W (ζ) dλ (ζ)
(16) [f game1]
′
(17) [f game2]
Γ
subject to Z
Γ
W (ζ) W (ζ) dλ (ζ) ≤ η 2 + w0′ w0 .
Three versions of game 2 correspond to the three versions of game 1: 1. Set η = 0, with W (0) = w0 arbitrary. 9
2. Set arbitrary η > 0 but W (0) = w0 = 0. 3. Set W (0) = w0 6= 0 and arbitrary η > 0. Under version 1, wt = 0 ∀t > 0, implying W (ζ) = w0 . This makes (14) become Z
′
′
W (ζ) G (ζ) G (ζ) W (ζ) dλ (ζ) =
Γ
w0′
Z
Γ
G (ζ) G (ζ) dλ (ζ) w0 . ′
(18) [H20]
The best feedback rule F is independent of the initial condition x0 = Cw0 . This lets us replace (18) by the frequency domain H2 criterion for evaluating decision rules: H2 ≡ −
Z
Γ
′ trace G (ζ) G (ζ) dλ (ζ) .
(19) [H2crit]
Version 2 has the side condition that W (0) = 0, but otherwise leaves W (ζ) free. Version 3 requires W (0) = w0 6= 0 and also restricts W (ζ) to keep the associated {wt } sequence zero for t < 0. Version 2: H∞ Criterion Let ρ(ζ) denote the eigenvalues of G(ζ)′ G(ζ). The following proposition tells how version 2 of the game leads to the H∞ criterion defined as: H∞ ≡ − sup [ρ (ζ)]
1/2
.
ζ∈Γ
Proposition 1. For any F ∈ F , Z ′ ′ 2 2 η inf − W (ζ) G (ζ) G (ζ) W (ζ) dλ (ζ) = −H∞ W
Γ
(20) [Hcrit5]
(21) [prop1]
where the infimization is subject to (17). Proof. Given G(ζ), for each ζ =
√
β exp(iω) solve the eigenvalue problem6 ′
G (ζ) G (ζ) v = ρ (ζ) v 6
It may be useful to remind the reader of the ‘principal components’ problem. Let a be an (n × 1) random vector with covariance matrix V . The first principal component of a is a scalar b = p′ a where p is an (n × 1) vector with unit norm (i.e., p′ p = 1), for which the variance of b is maximal. Thus, the first principal component solves the problem: max p′ V p p
10
for the largest eigenvalue ρ(ζ). This problem has a well defined solution with eigenvalue √ ρ(ω) for each ζ = β exp(iω). Then Z
Γ
′
′
W (ζ) G (ζ) G (ζ) W (ζ) dλ (ζ) ≤
Z
′
ρ (ζ) W (ζ) W (ζ) dλ (ζ) Γ Z ′ ≤ sup ρ (ζ) W (ζ) W (ζ) dλ (ζ) ζ∈Γ
Γ 2
≤ sup ρ (ζ)η . ζ∈Γ
The bound on the right side is attained by the limit of a sequence of approximating {wt } sequences described in appendix A. The square of the value of the optimized H∞ criterion plays an important role below. We denote it by 2 (24) θ˜ = inf H∞ (F ) . [thetatilde] F
For technical reasons indicated in appendix A, the infimum in (21) is not necessarily attained by an analytic function W ∈ W.
If version 2 of game 2 has a maximizer F , that same F maximizes (20). We can drop η from the performance criterion (20) because it becomes a positive scale factor that is independent of the control law F . This feature emerges because this problem sets w0 = 0.
subject to p′ p = 1. Putting a Lagrange multiplier λ on the constraint, the first order conditions for this problem are (V − λI) p = 0,
(22) [princ1]
with the value of the variance of p′ b evidently from (22) being p′ V p = λp′ p = λ.
(23) [princ2]
Thus (22) and (23) indicate that p is the eigenvector of V associated with the largest eigenvalue; and that the variance of b equals the largest eigenvalue λ.
11
5. Game 3: A multiplier game and the entropy criterion
The H2 control problem comes from eliminating the role of model misspecification by setting η = 0. Under discounting, the H∞ control problem comes from initializing w0 to zero, but allowing for model misspecification. Alternatively, we can view the H∞ problem as one in which the infimizer is not committed to respect the initial condition, and therefore is free to set w0 . That is, the H∞ criterion ranks control laws F when the malevolent agent can choose the initial w0 along with the sequence of future wt ’s. We now consider an intermediate case where misspecification is permitted but the malevolent agent is committed to respect the initial condition w0 . To analyze this case, we reformulate Game 2 in terms of a Lagrange multiplier θ on the constraint to obtain: Game 3: Find (θ, F, W (ζ)) that attain Z ′ ′ 2 ′ W (θI − G G) W d λ − θ η + w0 w0 . sup sup inf θ
F
W
Γ
(25) [mgame]
Here η > 0 and w0 6= 0.7
In appendix C, we establish the following things about (25). i. Let θ ∗ be the optimal multiplier for (25). It satisfies: ˜ θ ∗ ≥ θ.
(26) [multineq]
If (26) does not hold, the inner inf W ∈W in (25) is −∞ independently of the control law F . ˜ we are led to study the inner ii. When the optimal multiplier θ ∗ satisfies θ ∗ > θ, two-player Stackelberg multiplier game: Z (27) sup inf W ′ (θ ∗ I − G′ G) W dλ [mgame2] F
W
Γ
This game connects to the following single agent decision problem: Z ′ sup log det θ ∗ I − G (ζ) G (ζ) dλ (ζ) . F
Γ
(28) [entropy22]
The F ∗ that attains (28) is the F component of the solution of the two-player multiplier game (27). 7
The η = 0 (H2 ) and w0 = 0 (H∞ ) cases have already been studied.
12
6. The multiplier problem and the entropy criterion To study the inf W part of game (27), we take θ ∗ , F and therefore G as given. We refer to the resulting optimization problem as the multiplier problem, which we state as: The multiplier problem:
inf
W,W (0)=w0
Z
Γ
′ ′ W (ζ) θ ∗ I − G (ζ) G (ζ) W (ζ) dλ (ζ) .
(29) [newlabel]
For this problem to have an optimized value that exceeds −∞, we require that θ ∗ I − G′ G be positive semidefinite. As a consequence, 2
θ ∗ ≥ [H∞ (F )] , which is a sharper restriction than h i2 θ ∗ ≥ θ˜ = inf H∞ (F ) . F
In what follows we strengthen the positive semidefinite restriction by requiring entropy to be finite: Z log det (θ ∗ I − G′ G) dλ > −∞. Γ
Of course, it is only necessary to check this condition at 2
θ ∗ = [H∞ (F )] . For larger values of θ ∗ , it is satisfied automatically. As we will see, for any value of θ ∗ that exceeds the threshold [H∞ (F )]2 , the entropy measure is closely related to the minimized value of the multiplier problem. Provided that the entropy criterion is met, we may associate choices of θ ∗ with restrictions on the specification errors. That is, consider the constrained worst case minimization problem: Constrained worst case problem:
min − W
Z
W ′ G′ GW dλ
Γ
13
subject to Z
Γ
W ′ W dλ ≤ w0′ w0 + η 2 .
Proposition 2. For any θ ∗ > [H∞ (F )]2 , there exists an η such that the multiplier problem and the constrained worst case problem have the same solution. Proof. See Appendix C. If the infimum of the multiplier problem is attained for θ ∗ = [H∞ (F )]2 , then there is a finite η such that the two problems continue to have the same solution. If the infimum is not attained, then any finite η is associated with a multiplier θ ∗ that exceeds [H∞ (F )]2 . Thus we can think of the θ ∗ ’s in the multiplier problem as indexing the magnitude θ of the allowable specification errors. A robustness bound The multiplier problem interesting for another reason. Let J denote the minimized value of the objective for the multiplier problem. Then −
Z
′
Γ
′
W G GW dλ ≥ J − θ
∗
Z
W ′ W dλ. Γ
(30) [deteriorate]
R Inequality (30) shows how θ ∗ governs the rate at which the function − Γ W ′ G′ GW dλ R objective deteriorates with model misspecification measured by W ′ W d λ. Note how lowering θ ∗ gives more robustness, i.e., less sensitivity of the objective function to misspecifications W. In the remainder of this section, we study the existence of a solution to the multiplier problem and its relation to the entropy criterion. Then in the next section, we return to the Stackelberg multiplier game. Entropy is the indirect utility function of the multiplier problem In establishing our next result, it is convenient to rewrite the multiplier problem as inf
W (ζ)
Z
Γ
′ ′ W (ζ) θ ∗ I − G (ζ) G (ζ) W (ζ) dλ (ζ)
(31) [object]
subject to Z
Γ
W (ζ) dλ (ζ) = w0 6= 0, 14
(32) [constr1]
and Z
W (ζ) ζ j dλ (ζ) = 0,
Γ
(33) [constr2]
for j = 1, 2, .... Constraint (32) can be restated as: W (0) = w0 . Constraint (33) states that wj = 0 for j < 0. The infimum in (31) is over W (ζ) that have coefficients such that P ∞ t ′ t=−∞ β wt wt < ∞. R Proposition 3. Assume that F and θ ∗ are such that Γ log det(θ ∗ I − G′ G)dλ > −∞.8 Then multiplier problem (29) has an optimized value function w0′ D(0)′ D(0)w0 , where D(0) is nonsingular and independent of w0 . The minimized value is attained if θ ∗ I − G′ G is nonsingular on Γ. Proof. The solution to the multiplier problem can be found using techniques from linear prediction theory.9 We must factor a spectral density like matrix: ∗ ′ ′ θ I − G (ζ) G (ζ) = D (ζ) D (ζ)
(34) [f actoriz]
where D is rational in ζ, has no poles inside or on the circle Γ, is invertible inside Γ, and the matrix coefficients of its power series expansion inside Γ can be chosen to be real. The matrix analytic function D is unique only up to premultiplication by an orthogonal matrix but can be chosen to be independent of w0 . The existence of this factorization follows from results known about the linear extrapolation of covariance stationary stochastic processes. In particular, it is known from Theorems 4.2, 6.2 and 6.3 of Rozanov (1967) that the infimum of the objective is given by: ′
w0′ D (0) D (0) w0 .
(35) [Hentropy]
When θ ∗ I − G′ G is nonsingular on Γ, the infimum is attained. To verify this, write the first-order conditions for maximizing (31) subject to (32) and (33) as
′ ′ θ ∗ I − G (ζ) G (ζ) W (ζ) = L (ζ) .
(36) [f oc2]
Then the matrix D in the factorization (34) is nonsingular with an inverse that is rational and well defined inside and on the circle Γ. It follows from (34) that D (ζ) W (ζ) = D (0) w0 . 8 9
(37) [wf ormula1]
Under case (ii) or case (iii), such an F exists. Appendix B displays a linear prediction problem that leads to the spectral factorization problem
here.
15
In addition, the infimum is attained by:10 −1
W ∗ (ζ) = D (ζ)
D (0) w0 .
(40) [solutionW ]
Substituting into (31), we confirm that the minimized solution is (35). As is evident from the proof, the infimum in (31) may not be attained when θ ∗ I − G G is singular somewhere on Γ. But this problem can be remedied by enlarging the space from W to W a . R Corollary 1. Assume that F is such that Γ log det(θI − G′ G)dλ > −∞. Then the problem Z ′ ∗ ′ W (ζ) θ I − G (ζ) G (ζ) W (ζ) dλ (ζ) min a ′
W
Γ
has a solution and the minimized value is w0′ D(0)′ D(0)w0 .
Proof. Solution (40) is in W a even when θ ∗ I − G′ G is singular somewhere on Γ. From Corollary 1, we see that a solution always exists for the multiplier problem, provided that the entropy restriction Z log det (θ ∗ I − G′ G) dλ > −∞ Γ
10
The factorization is also the key for calculating (see Whittle (1983, pp. 99-100)) the projection of yt on the semi-infinite history xs , s ≤ t where {yt , xt } is a covariance stationary process. Thus, substituting the factorization (34) into (36) gives (38) [f oc3]
D (ζ)′ D (ζ) W (ζ) = L (ζ)′ ,
where D(ζ), W (ζ), being analytic inside Γ, have expansions in nonnegative powers of ζ, and D(ζ)′ and L(ζ)′ have expansions in nonpositive powers of ζ in the interior of Γ. If D(ζ)′ is invertible, then following Whittle (1983, p. 100), W (ζ) satisfies h i , D (ζ) W (ζ) = D (ζ)′−1 L (ζ)′ +
where [·]+ is the annihilation operator that sets negative powers of ζ to zero. Because D(ζ)′−1 and L(ζ)′ are both one-sided in nonpositive powers of ζ, [D(ζ)′−1 L(ζ)′ ]+ = D(0)′−1 L(0)′ . Therefore, the solution is D (ζ) W (ζ) = D (0)′−1 L (0)′ .
(39) [new]
Then from (38), L(0)′ = D(0)′ D(0)W (0). Substituting into (39) gives the outcome asserted in the text, D(ζ)W (ζ) = D(0)W (0). The condition (38) corresponds to the solution of Whittle’s projection problem where D(ζ)′ D(ζ) is interpreted as the spectral density of x and L(ζ) is interpreted as the cross-spectral density between y and x.
16
is satisfied. But unless the matrix (θ ∗ I − G′ G) is nonsingular at all frequencies, the minimizing shock sequence may not be stable and may not stabilize the state vector sequence. Problems occur when W ∗ (ζ) = D(ζ)−1 D(0)w0 has a pole on Γ, or equivalently when D(ζ)−1 has a pole on Γ that is not annihilated by D(0)w0 . Nevertheless, even these destabilizing solutions for W ∗ can be approximated by a sequence of W ’s, each of which is in W and hence each of which stabilizes the state vector sequence. The multiplier problem depends on the choice of initialization W (0) = w0 . In what follows we seek to replace this multiplier problem by an entropy criterion, which does not depend on the initialization. To justify this, we will eventually have to show that for a given θ ∗ , the control law that solves the multiplier game does not depend on the initialization w0 and is the same control law that solves the entropy control problem. The criterion for the entropy control problem is motivated by the following representation result: R Proposition 4. Assume that θ ∗ and F are such that Γ log det(θ ∗ I − G′ G)dλ > −∞. The criterion log det[D(0)′ D(0)] can be represented Z ′ ′ (41) log det D (0) D (0) = log det θ ∗ I − G (ζ) G (ζ) dλ (ζ) . [entropy] Γ
Proof. That D(0)′ D(0) corresponds to a ‘one-step’ prediction error covariance matrix, and the origin of D(ζ) from the spectral factorization (34) allows us to use a result from linear prediction theory to verify the representation (41). See Theorem 6.2 of Rozanov (1967, page 76). R Proposition 3 and Proposition 4 both require that Γ log det(θ ∗ I − G′ G)dλ > −∞ but allow for θ ∗ I − G′ G to be singular at isolated points in Γ. Evaluating the right-hand side of (41) requires no spectral factorization, just integration over frequencies. The contour integral on the right side of (41) is our criterion. In the undiscounted case, it coincides with the measure of entropy used by Mustafa and Glover (1988).11 When β = 1, the F that maximizes (41) is often motivated as an approximator of the F that maximizes the H∞ criterion, one that maintains analyticity of W. Finally, we show that when W ∗ stabilizes the state vector sequence, the solution has a Markov representation (i.e, the solution wt+1 can be represented as a function of the time t state xt ). Proposition 5. 11
Assume θ ∗ and F such that θ ∗ I − G′ G is nonsingular on Γ. Then the
It coincides with their measure of entropy at s0 = ∞.
17
solution to the multiplier problem can be represented recursively as: (42) [W f orm2]
wt+1 = Kxt where K = (θ ∗ I − C ′ ΣC)
−1
(43) [Ksoln1]
C ′ ΣC,
and Σ is the positive semidefinite solution to the Ricatti equation Σ = H ′ H + βA′ ΣA + βA′ ΣC (θ ∗ I − C ′ ΣC)
−1
C ′ ΣA
for which A + CK has eigenvalues that are inside the circle Γ. Moreover, Z log det [θ ∗ I − G′ G] dλ = log det (θ ∗ I − C ′ ΣC) . Γ
(44) [ricatti2]
(45) [key2]
Proof. See Appendix D. The Stackelberg Multiplier Game and an Entropy Control Problem We ultimately want to study the Stackelberg multiplier game (27). We call it ‘Stackelberg’ because of the structure of commitment: the agent chooses a decision rule F that feeds back on the state, assuming that ‘nature’ then commits to a sequence {wt } that can depend on F but that does not feed back on the state. Here ‘nature’ is the Stackelberg leader. We begin by studying a different game, which we call a Nash multiplier game in sequences (NMGS). In this game, the agent and nature, respectively, choose sequences {ut } and {wt+1 } given w0 and given the other player’s choice of sequence. It turns out that the equilibrium to the NMGS can be represented recursively. That equilibrium then supplies guesses that can be used to construct the equilibrium of the Stackelberg multiplier game (27). The equilibrium of the Stackelberg game can be represented in terms of a pair of decision rules F, K and yields identical outcomes to yet another game, a Markov perfect multiplier game that can be cast in terms of a value function common to both players. Game 4: The Nash multiplier game in sequences We shall study the Nash multiplier game in sequences (NMGS) by forming a Lagrangian in the time domain. We take the time t common objective to be: C=
∞ X t=0
′ β t −zt′ zt + βθwt+1 wt+1
(46) [objtime]
subject to the state-evolution equation (3) and the target vector relation (6). The initial state vector x0 is given and is not restricted to satisfy x0 = Cw0 . One player chooses {ut } 18
to maximize the objective and the other player chooses {wt+1 } to minimize the objective. The control sequences for the two players are restricted to be stabilizing ∞ X t=0
∞ X
β t u′t ut < ∞
(47) [stabilizing]
′ β t wt+1 wt+1 < ∞,
t=0
and the matrix Ao must have all of its eigenvalues inside Γ. Then stable sequences of {ut } and {wt+1 } are guaranteed to imply a stable state vector sequence. We shall use the following Definition 1. An equilibrium of the Nash multiplier game in sequences (NMGS) is defined in terms of the following components. The agent’s problem is to maximize (46) subject to (3) by choosing a sequence {ut }, taking the {wt+1 } sequence as given. Nature’s problem is to minimize (46) subject to (3) by choosing a sequence {wt+1 }, taking {ut } as ∗ given. An equilibrium of the NMGS is a pair of sequences {u∗t }, {wt+1 } that solve both the agent’s problem and nature’s problem. To find an equilibrium of the NMGS, we begin by substituting from (6) for zt in the objective function. We can then form a Lagrangian for each player. These Lagrangians have the special feature that the first-order conditions for the two players’ problems make their co-state variables have the same laws of motion. This reflects the zero-sum nature of the game.12 This outcome allows us to analyze the game by forming a single Lagrangian:13 L=−
∞ X
β t {[x′t H0′ H0 xt + u′t J ′ Jut + 2u′t J ′ H0 xt ]
t=0 ′ 2βµt+1 (A0 xt
+ But + Cwt+1 − xt+1 ) − βθwt+1 wt+1 },
where we have equated the costate sequences of the two players. First-order conditions with respect to ut , wt+1 , xt+1 , respectively, are: J ′ Jut + J ′ H0 xt + βB ′ µt+1 = 0 −θwt+1 + C ′ µt+1 = 0 βA0 ′ µt+1 + H0 ′ H0 xt + H0 ′ Jut − µt = 0
Assume that J ′ J is nonsingular, and solve for ut and wt+1 : −1
ut = − (J ′ J) wt+1 = 12 13
1 ′ C µt+1 . θ
−1
J ′ H0 xt − β (J ′ J)
B ′ µt+1
(48) [f oncu] (49) [f oncw]
The co-states are derivatives of a common value function. We do not impose (10b).
19
Substituting from these expressions for ut and wt+1 makes the state equation become xt+1
h i 1 −1 ′ −1 ′ ′ ′ ′ = A0 − B (J J) J H0 xt − βB (J J) B − CC µt+1 , θ
and the co-state equation become i h i h −1 ′ −1 ′ ′ ′ ′ ′ ′ ′ β A0 − H0 J (J J) B µt+1 + H0 H0 − H0 J (J J) J H0 xt − µt = 0. Write the system as
x L t+1 µt+1 where
and
L=
N =
xt , =N µt
(50) [hamiltonian]
i h −1 βB (J ′ J) B ′ − θ1 CC ′ h i, −1 ′ ′ ′ ′ β A0 − H0 J (J J) B
I 0
h i −1 A0 − B (J ′ J) J ′ H0
h i −1 ′ ′ ′ ′ − H0 H0 − H0 J (J J) J H0
0 I
.
It can be verified that the matrix pencil ( √λβ L − N ) is symplectic. (See Anderson, Hansen, McGrattan, and Sargent (1996) for the definition and properties of symplectic pencils.) It √ follows that the generalized eigenvalues of (L, N ) come in β-symmetric pairs: for every eigenvalue λi there is another eigenvalue λ−i such that λi λ−i = β −1 . To guarantee that an equilibrium of the NMGS exists, we must rule out generalized eigenvalues of (L, N ) in Γ. As a consequence, half of the generalized eigenvalues are inside the circle Γ and the other half are outside this circle. The generalized eigenvectors √ associated with the eigenvalues inside Γ generate what we refer to as the ( β) stable deflating subspace. The dimension of this space equals the number of entries in the state vector xt . We assume that there exists a positive semidefinite matrix P such that the I stable deflating subspace can be depicted as x. P Under these restrictions, we can construct an equilibrium of the NMGS that takes the form µt = P xt , and can find an equilibrium state vector sequence that satisfies L
I P
xt+1 = N
Thus we can state: 20
I P
xt .
(51) [symp1]
Proposition 6. Suppose (i) J ′ J is nonsingular and J + H0 ζ(I − ζA)−1 B has full column rank on Γ;14 (ii) (L, N ) has no generalized eigenvalues on Γ; √
(iii) Any element of the ( β) deflating subspace of (L, N ) can be represented as for some vector x where P is a symmetric matrix;
I P
x
(iv) θI − G0 ′ G0 is nonsingular on Γ where G0 = H0 (I − ζA0 )−1 C. Then there exists an equilibrium of the NMGS of the form ut = −F ∗ (A∗ )t x0 and wt+1 = K ∗ (A∗ )t x0 for some K ∗ and F ∗ , where A∗ = A0 − BF ∗ + K ∗ C has eigenvalues that are inside Γ. Proof. The first player maximizes (46) by choice of {ut } subject to the state evolution: xt+1 = A0 xt + But + CK ∗ x ˆt x ˆt+1 = A∗ x ˆt ; The second player minimizes (46) by choice of {wt+1 } subject to the state evolution: xt+1 = A0 xt + Cwt+1 − BF ∗ x ˆt x ˆt+1 = A∗ x ˆt Both problems are special cases of what Anderson et al (1996) call an augmented regulator problem. Condition (i) guarantees that the objective is strictly concave in the decision sequence of the first player and condition (ii) guarantees that the objective is strictly convex in the decision sequence of the second player. We compute the equilibrium by stacking the state-costate equations of the two players and solving the resulting difference equation. We impose the equilibrium conditions that x ˆt = xt and seek a solution in which the costate sequences coincide for both players. This leads to the linear system (51) containing the stacked first-order conditions √ and incorporating x ˆt = xt for all t. Thus we find an equilibrium by constructing a β stable sequence of state vectors that satisfies (51). From the first partition of (51), we see that h i −1 (I + GP ) xt+1 = A0 − B (J ′ J) J ′ H0 xt 14 This condition is weaker than the corresponding detectability condition for the regulator problem √ without robustness. It guarantees the existence of a solutoin to this problem under β stability.
21
where G = βB (J ′ J)
−1
1 B ′ − CC ′ . θ
It follows from Theorem 21.7 of Zhou, Doyle and Glover (1996) that the matrix P is symmetric and the matrix I + GP is nonsingular. Hence we have the state evolution: xt+1 = A∗ xt where A∗ = (I + GP )
−1
h i −1 A0 − B (J ′ J) J ′ H0 .
By (iii), the matrix A∗ has eigenvalues that are inside Γ. From the first-order conditions, we have that 1 wt+1 = C ′ P A∗ xt θ = K ∗ xt ut = − (J ′ J) = −F ∗ xt .
−1
(J ′ H0 + βB ′ P A∗ ) xt
Thus the equilibrium of the NMGS is t
∗ wt+1 = K ∗ (A∗ ) x0 , t
u∗t = −F ∗ (A∗ ) x0 .
Relationship of NMGS to Stackelberg multiplier game We are really interested in the Stackelberg multiplier game (27), not the NMGS. But the NMGS is useful as a way of discovering components of the equilibrium of (27). As we have seen, the equilibrium conditions of the NMGS give rise to a linear difference equation system. Relative to the Stackelberg multiplier game (27), the NMGS imposes too much commitment about the choice of the control sequence {ut }: compare the commitment to a sequence for ut in the NMGS with the commitment to a feedback rule ut = F xt in (27).15 We now turn to the Stackelberg multiplier game (27), in which the controls are chosen through control laws of the form ut = −F xt . The NMGS provides us with a good guess for the solution to the Stackelberg game. Our next proposition supports this guess. 15
In effect, the optimization over {ut } in the Stackelberg multiplier game (27) allows the u-decision maker to contemplate that the disturbances will differ from the {wt+1 } computed from the NMGS equilibrium. Formally, this connects to issues of perfection in game theory.
22
Proposition 7. Suppose (i) J ′ J is nonsingular and J + H0 ζ(I − ζA)−1 B has full column rank on Γ; (ii) (L, N ) has no generalized eigenvalues on Γ; √
(iii) Any element of the ( β) deflating subspace of (L, N ) can be depicted as some vector x where P is a symmetric, positive semidefinite matrix;
I P
x for
(iv) θI − C ′ P C is positive definite. Then there exists an equilibrium of the Stackelberg multiplier game (27) in which F = F ∗ and K = K(F ) from (43) of Proposition 5 satisfies K(F ∗ ) = K ∗ ; consequently wt+1 (F ∗ ) = −F ∗ (A∗ )t x0 , where F ∗ , K ∗ and A∗ are the same matrices that represent the equilibrium of the NMGS. Proof. The regularity conditions imposed in this Proposition differ from those in Proposition 6. As we will see, this difference is due to the change in the vantage point of the player choosing {wt+1 } when we move from the NMGS to the Stackelberg multiplier game. Nevertheless, the formulas for the equilibrium of the NMGS can still be used to study this equilibrium of the Stackelberg multiplier game, and the appropriate notion of stability remains in force. For the Stackelberg multiplier game (27), one player submits a decision rule of the form: ut = −F xt . The other player chooses a sequence {wt+1 (F )} to minimize (46). For some F ’s the infimum may not be attained. Nevertheless, we can form the criterion C(F, x0 ), although it may be −∞ for some choices of F . We wish to show that C (F ∗ , x0 ) ≥ C (F, x0 ) for any F ∈ F . To demonstrate this inequality, we first show that {wt+1 (F ∗ )} coincides with the of the equilibrium NMGS. Thus, we study the problem of minimizing (46) by choice of {wt+1 } subject to xt+1 = (Ao − BF ∗ ) xt + Cwt+1 .
∗ {wt+1 }
This differs from the optimum problem of the malevolent agent (over w) within a NMGS. In the present problem, the malevolent agent no longer takes the control sequence {ut } as exogenous, and we do not enter x ˆt into the perceived state evolution equation. Now the malevolent agent knows that xt = x ˆt when solving his optimization problem. 23
∗ To show that {wt+1 (F ∗ )} coincides with the {wt+1 } from the NMGS equilibrium, we form the discrete-time Hamiltonian system for the choice of {wt+1 } as a function of F ∗ , as required in the Stackelberg multiplier equilibrium. From (50), the first-order conditions for wt+1 collapse to: −θwt+1 + C ′ µt+1 = 0
βA0 ′ µt+1 + H0 ′ Hxt − µt = 0.
Next impose that ut = −F ∗ xt . Then from (48), BF ∗ xt = −B (J ′ J)
−1
−1
J ′ H0 xt − βB (J ′ J)
B ′ µt+1 .
Thus the state equation becomes: xt+1 = Axt +
1 ′ (C C) µt+1 θ
for A = A0 − BF ∗ . In addition, note that H0 xt + Jut = Hxt for H = H0 − JF ∗ . Note also from (48) that F ∗ ′ J ′ Jut = F ∗ ′ J ′ H0 xt + βF ∗ ′ B ′ µt+1 . Thus the modified co-state equation is βA′ µt+1 + H ′ Hxt − µt = 0. Thus
I 0
− θ1 CC ′ βA′
I P
xt+1 =
A −H ′ H
0 I
I P
xt .
It follows that P = Σ satisfies Ricatti equation√(91)and hence (44). It is the unique solution that implies that the state vector sequence is β stable. From the proof of Proposition 6 it follows that K = K ∗ , and the positive definiteness of θ ∗ I −G′ G follows from our restriction that θ ∗ I − C ′ P C is positive definite. From this result, we can compute C(F ∗ , x0 ) by simply evaluating the objective of the NMGS. Consider now the evaluation of C(F, x0 ) for some other choice of F in F . We can ∗ bound this criterion as follows. First, recursively generate the NMGS equilibrium {wt+1 } sequence as x ˆt+1 = A∗ x ˆt ∗ wt+1 = K ∗x ˆt
24
where x ˆ0 = x0 . Then form the state equation: xt+1 = (A0 − BF ) xt + CK x ˆt zt = (H0 − JF ) xt .
ˆ Using these recursions to evaluate (46), we obtain an upper bound C(F, x0 ) on C(F, x0 ). ˆ A convenient feature of this upper bound is that we can dominate C(F, x0 )√by solving the following augmented regulator problem. Maximize (46) by choice of a ( β) stabilizing control sequence {ut } for the state evolution xt+1 = A0 xt + But + CK ∗ x ˆt x ˆt+1 = A∗ x ˆt ∗ with wt+1 = K ∗x ˆt . But this is just the problem of the player who sets ut in the NMGS. Following Anderson et. al., we solve this problem by stacking a state-costate system with the composite state (xt , x ˆt ) and the costate corresponding to xt . The costate for x ˆt can be omitted because the x ˆt is an uncontrollable state vector. Thus we form a system xt+1 xt La µt+1 = N a µt x ˆt+1 x ˆt
where:
−1 ′ ′ I βB (J J) B 0 h i La = 0 β A′0 − H0 ′ J (J ′ J)−1 B ′ 0 0 0 I −1 ′ ′ ∗ A − B (J J) J H 0 CK 0 0 i h N a = − H0 ′ H0 − H0 ′ J (J ′ J)−1 JH0 I 0 . 0 0 A∗ √ To solve the problem we now look for the β deflating subspace of (La , N a ) parameterized as I x I P2 x + Pˆ x ˆ = P2 (x − x ˆ) + P2 + Pˆ x ˆ. x ˆ 0 I
We √ will show that we can reduce the problem to that of locating two smaller dimensional β deflating subspaces. The first is for the pair (L2 , N2 ) with ! −1 ′ ′ I βB (J J) B h i L2 = −1 0 β A′0 − H0 ′ J (J ′ J) B ′ 25
N2 =
−1
′ ′ h A0 − B (J J) J H0 i −1 ′ ′ ′ − H0 H0 − H0 J (J J) JH0
0 I
!
.
This is the subspace associated with the component of xt − x ˆ0 that must be set to zero to solve the control problem. Notice that (L2 , N2 ) gives the state-costate system for the H2 √ control problem. Thus we can restrict xt − x ˆt to reside in the β stable deflating subspace of (L2 , N2 ) using the matrix P for the H2 problem. To study the second subspace, we next seek to sustain a solution to: I I La P2 + Pˆ x ˆt+1 = N a P2 + Pˆ x ˆt . I I
Alternatively, and more conveniently, we pose this problem as being (i) to find a matrix Pˆ √ ˆ N ˆ ) as parameterized by: such that we can represent the β deflating subspace of (L, I P2 + Pˆ x ˆ I where
ˆ= L ˆ = N
I 0
−1
′ ′ h βB (J J) B i −1 ′ ′ ′ ′ β A0 − H0 J (J J) B
!
−1
Ah0 − B (J ′ J) J ′ H0 + CK ∗ i −1 − H0 ′ H0 − H0 ′ J (J ′ J) JH0
0 I
!
,
and (ii) to show that the implied law of motion for xˆt+1 agrees with x ˆt+1 = A∗ x ˆt .
(52) [evolveh]
In constructing the deflating subspace in (i), we will show that P2 + Pˆ = P. ∗ This can be done by simply imitating the argument that {wt+1 (F ∗ )} = {wt+1 } but re∗ versing the roles of {wt+1 } and {u √ t }. Thus we impose wt+1 = K xt . It follows that P can indeed be used to depict the β deflating subspace and that the implied evolution for {ˆ xt+1 } is given by (52) as required by (ii). √ Thus we have shown that the β deflating subspace can indeed be uncoupled. By initializing xˆ0 = x0 , it follows that x ˆt = xt . Moreover, the optimized objective coincides ∗ with the C(F , x0 ). Thus
C (F, x0 ) ≤ Cˆ (F, x0 ) ≤ C (F ∗ , x0 ) . 26
Conditions (ii) and (iii) are guaranteed to be satisfied when βB(J ′ J)−1 B ′ − θ1 CC ′ is positive semidefinite. However, this condition is too strong for many applications (unless of course θ = ∞). Relation of Stackelberg multiplier game to entropy criterion One step remains to show that the Stackelberg multiplier game justifies the entropy criterion (28). The extra step is needed because criterion (35) depends on w0 while (28) does not. But Proposition 7 showed that the F that solves (27) is independent of w0 . Therefore, we will attain the same decision rule F by maximizing a criterion defined in ˆ ′ D(0)w ˆ terms of D(0)′ D(0) alone, ignoring w0 . Thus, let w0′ D(0) 0 denote criterion (35) for another control law, say Fˆ . If ′ ˆ (0)′ D ˆ (0) w0 w0′ D (0) D (0) w0 ≥ w0′ D
for all w0 , then ′ ˆ (0)′ D ˆ (0) D (0) D (0) ≥ D
where ‘≥’ is the standard partial ordering of positive semidefinite matrices. As a consequence, h i ′ ˆ (0)′ D ˆ (0) , trace D (0) D (0) ≥ trace D
or alternatively:
h i ′ ˆ ˆ log det D (0) D (0) ≥ log det D (0) D (0) .
′
Because we want the criterion to apply for all initial conditions w0 , we take our criterion to be ′ log det D (0) D (0) . Proposition 4 shows that this is the entropy criterion used to define (28).
27
Game 5: Markov perfect equilibrium
The equilibrium outcome of the Stackelberg game matches the outcome from our next game, a Markov perfect equilibrium to the multiplier game. This game is interesting because it connects directly to the discounted risk-sensitivity criterion of Hansen and Sargent (1995). Definition 2. A Markov perfect equilibrium (MPE) of the multiplier game is a pair of strategies ut = −F ∗ xt , wt+1 = K ∗ xt such that (a) Given K ∗ , ut = −F ∗ xt maximizes (46), subject to xt+1 = Ao X + But + Cwt+1 .
(53) [transmark]
(b) Given F ∗ , wt+1 = −K ∗ xt minimizes (46) subject to (53). Associated with a MPE is the following pair of Bellman equations ′ x′ P x = max − (H0 x + Ju) (H0 + Ju) − βθw∗′ w∗ − βy ′ P y
(54a) [bell1;a]
′ x′ P x = min − (H0 x + Ju∗ ) (H0 + Ju∗ ) − βθw′ w − βy ′ P y
(55a) [bell2;a]
u
w∗ = K ∗ x
w
u∗ = −F ∗ x,
(54b) [bell1;b]
(55b) [bell2;b]
where both extremizations are subject to the common law of motion y = Ao X + Bu + Cw. The (P, K ∗ , F ∗ ) that form a MPE also solve the following closely related zero-sum game: ′ x′ P x = max min − (H0 x + Ju) (H0 + Ju) − βθw′ w − βy ′ P y u
w
where the maximization is again subject to
y = A0 x + Bu + Cw. The outcome is u = −F ∗ x, 28
(56) [game55]
and w = K ∗ x. Though it yields the same equilibrium strategies and outcomes, notice that the game (56) has a slightly different timing protocol from game (54)–(55). In (56), within each period, the w-player moves after the u-player, while (54)-(55) incorporates simultaneous moves within periods. The (P, K ∗ , F ∗ ) associated with the equilibrium of the Stackelberg multiplier game solves (54) and (55), and thereby determines a MPE. We summarize the connections between a MPE and an equilibrium of the Stackelberg multiplier game in Proposition 8. The (P, K ∗ , F ∗ ) that solve the Stackelberg multiplier game also support a Markov perfect equilibrium of the multiplier game. Proof. The required marginal conditions match.
The functional equation (56) leads directly to computing the equilibrium by iterating to convergence on Hansen and Sargent’s (1995) composite operator T ◦ D. The T piece represents the maximization over u and the D piece the minimization over w in (56). Formulation (54)-(55) would lead to a single Bellman like operator formed as B ≡ T ◦ D, and a single associated Ricatti equation. Match to risk-sensitivity In appendix D, we remark how iterating to convergence on the composite operator T ◦ D produces the optimal risk-sensitive decision rule. Appendix D also describes another algorithm for computing the optimal risk-sensitive control that is more directly related to entropy and the Stackelberg multiplier game. The algorithm iterates to convergence on the S operator of Hansen and Sargent (1995) in order to form the entropy criterion for a fixed F ; then it chooses an improved F and iterates to convergence on S again. Appendix D iterprets this algorithm as a Howard policy improvement algorithm.
29
‘Time consistency’ and θ
Maximization of (41) is over a time-invariant F . Suppose that we were to ask the decision maker to reoptimize after several periods have elapsed, say at the beginning of period t1 > 0? What change has to be made to game 2 to induce the decision maker to choose the original F ? The answer is that the magnitude of the right side of the constraint (17), call it ηt21 , would have to be adjusted to keep θ at its original value. Keeping θ fixed in this way amounts to causing the ‘great deceiver’ to commit to a feedback rule making wt+1 a function of xt . The recursive formula (42) shows how w feeds back on the state.16 4. Digression: Re-interpretation of H2 This section can be skipped without interrupting the flow of the presentation. Its purpose is to display another game producing H2 where the shocks wt are permitted to be nonzero for t > 0. Recall that wt is m × 1, where m is the number of shocks. We continue to assume that wt = 0 for all t < 0. We state Game 4: Choose (F, {wt }) to attain max inf − F
{wt }
∞ X
(57) [game5]
β t zt′ zt
t=0
subject to (58a) [game6;a]
x0 = Cw0 ∞ X
(58b) [game6;b]
β t wt wt′ = σ 2 I
t=0 ∞ X
t
β 2 wt
t=0
β
t−j 2
wt−j
σ2 ≤ η2
′
= 0 ∀j 6= 0
(58c) [game6;c] (58d) [game6;d]
Equations (58b), (58c) imply that ′
W (ζ) W (ζ) = σ 2 I, 16
|ζ| =
p
β,
(59) [game7]
Hansen, Sargent, and Tallarini (1999) use a version of this formula to compute ηt21 for a permanent income model.
30
Further, (59) implies (58b), (58c). Game 4 has the following counterpart in the frequency domain: Game 5: Find F, σ 2 that attain max inf2 −σ F
σ
2
Z
Γ
′ trace G (ζ) G (ζ) dλ (ζ) ,
(60) [game8]
subject to (61) [game9]
σ2 ≤ η2 .
We have substituted (59) into (16) to obtain (60). The solution of game 5 sets σ 2 at its uppper bound η 2 , and sets F to maximize the H2 criterion (19). Criterion (19) emerges when the shock process {wt }∞ t=1 is taken to be a martingale difference sequence adapted to Jt , the sigma algebra generated by x0 and the history of ′ w, where Ewt+1 wt+1 |Jt = I. The martingale difference specification implies ∞ t−j ′ 2 X −1 t E β 2 wt β 2 wt−j = σ (1 − β) I 0 t=0
if j = 0; otherwise.
(62) [wcondn2]
2 Equation (62) is equivalent with EW (ζ)W (ζ)′ = σP (1 − β)−1 I for ζ ∈ Γ. With this ∞ representation, (19) is proportional to −(1 − β)−1 E t=0 β t zt zt′ .17
17
See Whiteman (1985b).
31
5. Filtering In this section, we reinterpret the entropy criterion in the context of a filtering problem dual to our control problem. We formulate the filtering problem as follows. We write the state-space model as: x−t =Ax−t−1 + Cw−t
(63a) [f state1;a]
y−t =Gx−t−1 + Dw−t .
(63b) [f state1;b]
−t−1 ˆ Here t ≥ 0. Let y −t denote the history of y up to −t. Let E[(·)|y ] denote a predictor conditioned on the history of y up to time −t − 1. We want to construct filtered values ˆ −t |y −t ], yˆ−t−1 = E[y ˆ −t−1 |y −t−1 ], w ˆ −t−1 |y −t−1 ], where E(·) ˆ x ˆ−t = E[x ˆ−t−1 = E[w is the counterpart of an expectations operator. We restrict the filter to be time-invariant.
We construct a robust filter by solving a zero sum two player game in which an evil agent chooses the sequence of shocks {w−t } to maximize the prediction error criterion. The solution of the game has the evil agent set the shock according to w−t = Ox−t−1 . Knowing O but not x−t−1 , the filter will take w ˆ−t = Oˆ x−t−1 . We formulate the game by considering the robust analogs of an observer system (see Kwakernaak and Sivan (1972)). Emulating the measurement equation, we require that the estimator of y take the form: yˆ−t = Gˆ x−t−1 + Dw ˆ−t . It follows that the prediction error for y−t is: y−t − yˆ−t = G (x−t−1 − x ˆ−t−1 ) + D (w−t − w ˆ−t ) . We consider updating schemes for x ˆ parameterized by a fixed gain matrix K and taking the form: (64) x ˆ−t = Aˆ x−t−1 + K (y−t − yˆ−t ) + C w ˆ−t [f ilter] or x ˆ−t = Aˆ x−t−1 + K (y−t − Gˆ xt−1 ) + (C − KD) w ˆ−t . Subtracting (64) from (63a) gives x−t − x ˆ−t = (A − KG) (x−t − x ˆ−t ) + (C − KD) (wt − w ˆ−t ) .
(65) [f orecerror]
Define the forecast errors: e−t =x−t − x ˆ−t v−t =w−t − w ˆ−t . 32
(66a) [f ore1;a] (66b) [f ore1;b]
Then, (65) can be expressed e−t = (A − KG) e−t−1 + (C − KD) v−t .
(67) [f ores]
The filter is designed to minimize the following linear combination of the forecast errors in the state:

    z_{-t} = H e_{-t}.        (68)

We take the criterion of the multiplier form of the two-person game for the filtering problem to be

    Σ_{t=0}^∞ β^t (1/2)(z_{-t}' z_{-t} - θ v_{-t}' v_{-t}).        (69)

The person composing the filter aims to minimize this criterion by choosing K, while the evil agent aims to maximize it by choosing the v_{-t}'s. Here θ is a penalty on the v_{-t} sequence.

Paralleling our construction above for the control problem, we now focus on the problem of the evil agent for the filtering problem. We take K as fixed and study the problem of maximizing (69) by choice of {v}. We proceed to form the conjugate problem associated with choosing the v_{-t}'s to maximize (69). Thus, let λ_t denote the multiplier on (67) and φ_t the multiplier on (68), and form a Lagrangian. The first order conditions for the problem of minimizing the Lagrangian with respect to {v_{-t}, e_{-t}}_{t=0}^∞ and maximizing it with respect to {λ_t, φ_t}_{t=0}^∞ are:

    v_{-t}:    v_{-t} = -(1/θ)(C - KD)' λ_t                    (70a)
    z_{-t}:    z_{-t} = -φ_t                                    (70b)
    e_{-t-1}:  β λ_{t+1} = (A - KG)' λ_t + β H' φ_{t+1}         (70c)
    e_0:       λ_0 = H' φ_0.                                    (70d)
Use (70a) and (70b) to write

    v_{-t}' v_{-t} = (1/θ²) λ_t' (C - KD)(C - KD)' λ_t
    z_{-t}' z_{-t} = φ_t' φ_t.
Substituting these into (69) gives the dual criterion

    (1/θ) Σ_{t=0}^∞ β^t [-λ_t' (C - KD)(C - KD)' λ_t + θ φ_t' φ_t].        (71)
The dual problem is to minimize (71) by choice of {φ_t}_{t=0}^∞, subject to (70c) and (70d). For convenience, rewrite (70c) and (70d) as

    λ_{t+1} = β^{-1}(A - KG)' λ_t + H' φ_{t+1}        (72a)
    λ_0 = H' φ_0.                                      (72b)
This is a discounted linear regulator problem with state λ_t and control φ_{t+1}. Note that the optimized values of the objective functions of the original problem and the dual problem are equal.

Now form the time-domain version of the multiplier version of the problem associated with (9):

    Σ_{t=0}^∞ β^t [-x_t' (H_0 - DF)' (H_0 - DF) x_t + θ w_t' w_t]        (73)

subject to

    x_{t+1} = (A_o - BF) x_t + C w_{t+1}        (74a)
    x_0 = C w_0.                                 (74b)
Notice that the conjugate filtering problem (71), (72a), (72b) corresponds to (73), (74a), (74b) with the following settings: [φ_t, λ_t, H', C, K, D, β^{-1}A, β^{-1}G] for the filtering problem match [w_t, x_t, C, H_0', F', G', A_o', G'] for the control problem. Because the optimized values of the original and the dual filtering problems are equal, it follows that the optimized criterion function for the control problem (9) or (16), (17) has an interpretation as a criterion function for the optimal filtering problem given by the match-ups in the preceding paragraph. An F that solves the robust control problem thus corresponds to a K' for the filtering problem. From a solution of the dual problem, we can use (70a) to compute the worst case shocks for the filtering problem. The formula for the worst case shock from the control problem is the worst case z_t in the filtering problem.
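To make the observer structure concrete, here is a minimal simulation sketch of ours (in Python, not part of the original argument). It propagates the state equation (63) and the filter update (64) forward, with the evil agent's shock set to w = Ox and the filter's guess set to ŵ = Ox̂. Time indices run forward here; the paper's reversed-time indexing is immaterial for the recursions. The matrices A, G, C, D, the gain K, and the worst-case feedback O are hypothetical placeholders, since computing the robust (K, O) pair is exactly the subject of the game above.

    import numpy as np

    def simulate_robust_observer(A, G, C, D, K, O, T, rng):
        # State equation (63): x' = A x + C w, y = G x + D w, with w = O x (evil agent).
        # Filter update (64):  xhat' = A xhat + K (y - yhat) + C what, with what = O xhat.
        n = A.shape[0]
        x = rng.standard_normal(n)
        xhat = np.zeros(n)
        errors = []
        for _ in range(T):
            w = O @ x                        # worst-case shock feeds back on the true state
            y = G @ x + D @ w                # measurement, equation (63b)
            what = O @ xhat                  # filter's guess of the shock
            yhat = G @ xhat + D @ what       # predicted measurement
            xhat = A @ xhat + K @ (y - yhat) + C @ what   # filter update, equation (64)
            x = A @ x + C @ w                # state update, equation (63a)
            errors.append(x - xhat)          # forecast error e, which obeys recursion (67)
        return np.array(errors)

    # Hypothetical scalar-state example with a 2-dimensional shock:
    rng = np.random.default_rng(0)
    A = np.array([[0.9]]); G = np.array([[1.0]])
    C = np.array([[1.0, 0.0]]); D = np.array([[0.0, 1.0]])
    K = np.array([[0.5]]); O = np.zeros((2, 1))   # O = 0 recovers the ordinary observer
    e = simulate_robust_observer(A, G, C, D, K, O, T=50, rng=rng)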
Figure 2: Frequency decompositions of trace[G(ζ)'G(ζ)] as functions of ω (ζ = √β exp(iω)) for the objective function of Ball's model under three decision rules (θ = 5, θ = 10, θ = ∞); discount factor β = 1 in the left panel, β = .9 in the right panel.
6. Examples

This section briefly displays four examples: (1) a version of Ball's model with discounting; (2) a permanent income model of consumption; (3) a pure prediction problem to be used for a risk neutral asset pricing model; and (4) a robust version of a signal extraction problem of Muth (1960).

1. Ball's model with discounting

We discount the objective in Ball's model, altering it to be -E Σ_{t=0}^∞ β^t (π_t² + y_t²), and derive the associated three values of θ. Figure 2 displays frequency decompositions of trace[G(√β e^{-iω})' G(√β e^{-iω})] for robust rules with β = 1 and β = .9.
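A minimal sketch of how such frequency decompositions can be computed, assuming the closed-loop system has been put in the form used in Appendix D, G(ζ) = H(I - ζA)^{-1}C with A the closed-loop transition matrix; the matrices below are hypothetical stand-ins, not Ball's calibration. The sketch also reports the maximal eigenvalue at each frequency, the object on which the H∞ criterion focuses.

    import numpy as np

    def freq_decomposition(A, C, H, beta, n_freq=400):
        # Evaluate trace[G(z)'G(z)] and the maximal eigenvalue of G(z)'G(z)
        # on the circle z = sqrt(beta) exp(i w), with G(z) = H (I - zA)^{-1} C.
        n = A.shape[0]
        omegas = np.linspace(0.0, np.pi, n_freq)
        traces, max_eigs = [], []
        for om in omegas:
            z = np.sqrt(beta) * np.exp(1j * om)
            G = H @ np.linalg.solve(np.eye(n) - z * A, C)
            M = G.conj().T @ G               # the Hermitian 'spectral density' matrix
            traces.append(np.trace(M).real)
            max_eigs.append(np.linalg.eigvalsh(M).max())
        return omegas, np.array(traces), np.array(max_eigs)

    # Hypothetical closed-loop matrices standing in for A_o - B F, C, and H:
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    C = np.eye(2)
    H = np.array([[1.0, 0.0]])
    omegas, tr, lam = freq_decomposition(A, C, H, beta=0.9)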
2. A permanent income, adjustment cost economy

A planner chooses an allocation for an adjustment cost economy. Let c_t, i_t, k_{t-1}, d_t denote consumption, investment, initial capital stock, and an exogenous endowment process. Given the state x_0 ≡ [k_{-1} d_0 d_{-1}]', the planner chooses a stochastic process for {c_t, i_t, k_t}_{t=0}^∞ to maximize v(x_0), where

    v(x_t) = -[(c_t - b)² + φ i_t²] + β R_t v(x_{t+1})

subject to

    c_t + i_t = γ k_{t-1} + d_t
    k_t = δ k_{t-1} + i_t
    d_t = (1 - ρ_1 - ρ_2) μ_d + ρ_1 d_{t-1} + ρ_2 d_{t-2} + w_t.

We set δ = .95, β = 1.05^{-1}, γ = .1, φ = .02, ρ_1 = 1.5, ρ_2 = -.8, μ_d = 5, b = 30, E w_t = 0, E w_t² = 1.¹⁸ We let α ≡ -θ^{-1} and consider the three values α = 0, -.0075, -.015.¹⁹ The left panel of figure 3 plots the frequency domain decomposition of trace[G(√β e^{-iω})' G(√β e^{-iω})] across frequencies for the α = 0 rule. The right panel shows the ratios of trace[G'G] for the α = -.0075 and α = -.015 rules, respectively, to trace[G'G] for the α = 0 rule.

Two features of the model's dynamics show up in the plots. The endogenous dynamics associated with the capital accumulation decision determine the zero frequency peak in the values of trace[G(ζ)'G(ζ)]. The exogenous dynamics associated with the second-order autoregressive endowment shock, chosen to have a spectral peak at ω = .5666 radians, determine an associated peak near that frequency. Increasing the decision maker's preference for robustness by raising the absolute value of α lowers the maximum value of trace[G'G] at low frequencies by flattening it across frequencies.
¹⁸ When φ = 0, this is a permanent income economy.
¹⁹ The parameter α = -θ^{-1} is the risk-sensitivity parameter of Jacobson and Whittle.
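As a check on the endowment dynamics, the spectral peak of the AR(2) endowment process can be located numerically; a small sketch of ours, using the parameter values above:

    import numpy as np

    # The spectral density of d_t = rho1 d_{t-1} + rho2 d_{t-2} + w_t is proportional
    # to 1 / |1 - rho1 exp(-i w) - rho2 exp(-2i w)|^2; locate its peak on a fine grid.
    rho1, rho2 = 1.5, -0.8
    omegas = np.linspace(0.0, np.pi, 100_000)
    ar_poly = 1.0 - rho1 * np.exp(-1j * omegas) - rho2 * np.exp(-2j * omegas)
    spec = 1.0 / np.abs(ar_poly) ** 2
    print(omegas[np.argmax(spec)])   # approximately .5666 radians, as in the text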
Figure 3: Left panel: frequency decomposition of trace[G(ζ)'G(ζ)] as a function of ω (ζ = √β exp(iω)) for the permanent income model, α = 0 rule. Right panel: ratios of trace[G(ζ)'G(ζ)] for the robust rules (α = -.0075 and α = -.015) to that for the optimal (α = 0) rule.
3. Pure prediction

In the pure prediction version of the control problem (see Whittle (1996, page 302)), F = 0 and we are interested in adjusting the standard j-step-ahead prediction formula A^j x_t for the worst case shocks. That is, we solve only the inf part of game (9) or (16), (17). Let the worst case shocks be ŵ_{t+1} = O x_t, where we give a formula for O as a function of θ in (75) below. Under the worst case shocks, the dynamics are x_{t+1} = (A + CO) x_t, leading to the modified prediction formula

    Ê_t x_{t+j} = Â^j x_t,

where Â = A + CO. The formula for O is found by applying the formulas in Appendix D for the special case F = 0. First, compute a matrix V by iterating to convergence on

    S(V) = -H'H + β A' D(V) A,

where D, S are defined in (96), (98) in Appendix D. Then compute O from

    O = α (I - α C'VC)^{-1} C'V,        (75)
where α = -θ^{-1}.

In the spirit (but not to the letter) of Aaron Tornell (1998), let z_t = H x_t be the dividend of a stock and consider the risk-neutral cum-dividend price

    p_t = Ê_t Σ_{j=0}^∞ β^j z_{t+j}

or

    p_t = H (I - β Â)^{-1} x_t.
As an example, take β = .96, A = .98, C = 1, H = 1 with α ≡ -θ^{-1} = -.001. Then O = .0161 and Â = .9961. It follows that for α = 0, p_t = 16.89 z_t, while for α = -.001, p_t = 22.85 z_t. The coefficients on z_t in these two formulas are price-dividend ratios. Notice how the worst case analysis in the robust forecast leads to 'over optimism' in the form of a higher price-dividend ratio than is justified by taking the reference model to be true. This example shows again how the worst case shocks accentuate the low frequency components of the x_t process. In the pure prediction problem, the state is known, but the 'agent' is passive because he has no controls. All he can do is forecast the process in a robust way. In the next example, we describe a case where the state is not known.
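The numbers in this example can be reproduced in a few lines; the following sketch of ours iterates on S(V) from Appendix D with F = 0 and then applies (75):

    import numpy as np

    # Scalar version of the pure prediction example: iterate V = S(V) with
    # S(V) = -H'H + beta A' D(V) A and D(V) = V + alpha V C (1 - alpha C'VC)^{-1} C'V,
    # then compute O from (75) and the price-dividend ratios.
    beta, A, C, H, alpha = 0.96, 0.98, 1.0, 1.0, -0.001

    V = 0.0
    for _ in range(10_000):
        D = V + alpha * V * C / (1.0 - alpha * C * V * C) * C * V
        V = -H * H + beta * A * D * A

    O = alpha / (1.0 - alpha * C * V * C) * C * V          # equation (75)
    A_hat = A + C * O
    print(O, A_hat)                                        # approx .0161 and .9961
    print(H / (1.0 - beta * A), H / (1.0 - beta * A_hat))  # p/d ratios 16.89 and 22.85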
4. Muth's filtering example

We have described a filtering problem dual to our control problem. We can use an undiscounted stochastic counterpart to get a robust version of Muth's (1960) signal extraction interpretation of permanent income.²⁰ A recursive version of Muth's model for income y is

    x_{t+1} = x_t + w_{1,t+1}        (76a)
    y_t = x_t + w_{2,t+1},           (76b)

where x_t, y_t are scalars and w_t is a 2 × 1 Gaussian martingale difference sequence with E_t w_{t+1} w_{t+1}' = I.
²⁰ See Whittle (1996) for deterministic formulations of equivalent stochastic filtering problems.
The robust filter has the representation

    x̂_{t+1} = A x̂_t + K (y_t - G x̂_t) + (C - KD) O x̂_t,

where A = 1, G = 1, C = [1 0], D = [0 1], and O is a (2 × 1) vector expressing the dependence of the worst case shocks on the state. For Muth's special case, the robust filter takes the form

    x̂_{t+1} = (1 - K(1 + O_2) + O_1) x̂_t + K y_t.        (77)

In the special case that θ = +∞, O is a vector of zeros and K coincides with the ordinary Kalman gain, making (77) coincide with the 'adaptive expectations' formulation found by Muth. When θ < +∞, O deviates from zero and K deviates from the value delivered by the usual Kalman filter, causing adjustments to the coefficients on both x̂_t and y_t. For example, for α ≡ -θ^{-1} = -.02, we calculate for this example that K = .6271 (as opposed to the golden ratio value .618 for α = 0) and that -K O_2 + O_1 = .0125. Thus, turning on a preference for robustness changes both the discount factor on past observations and the weight on the most recent observation in (77).
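The α = 0 benchmark in this example is easy to verify: for Muth's model with unit variances, the steady-state Kalman filter solves a scalar Riccati equation whose fixed point is the golden ratio. A sketch of ours, not the paper's:

    import numpy as np

    # Steady-state Kalman filter for x_{t+1} = x_t + w1 (var 1), y_t = x_t + noise (var 1):
    # the predictive variance P solves P = P/(P+1) + 1, i.e. P^2 = P + 1, whose positive
    # root is the golden ratio (1 + sqrt(5))/2; the gain is K = P/(P+1).
    P = 1.0
    for _ in range(200):
        K = P / (P + 1.0)          # Kalman gain
        P = P - K * P + 1.0        # Riccati update: P/(P+1) + 1
    print(K)                           # approx .618, the alpha = 0 value in the text
    print((np.sqrt(5.0) - 1.0) / 2.0)  # closed form: .618...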
Conclusions

This paper is part of a research program to use risk-sensitivity to model a preference for robustness in a tractable way (see Hansen, Sargent, and Tallarini (1997) and Anderson, Hansen, and Sargent (1997)). For economic problems, discounting is important. That is why we have extended the results in Mustafa and Glover (1990) to allow discounting. The frequency domain is a natural setting because misspecification is being modeled in terms of the temporal properties of the shocks impinging on a fixed system.

Robustness emerges from the following thought process. A decision maker considers a class of perturbations to the temporal properties of the shocks and wants decisions that will work well even against the worst pattern of shocks. But what is worst depends on his decision rule. For a given decision rule, the discounted value attained can be represented in the frequency domain as minus the integral across frequencies of the trace of a 'spectral density' matrix (a Hermitian matrix for each frequency). Given his decision rule, and when initial conditions can be ignored (x_0 = 0), the worst serial correlation pattern focuses spectral power at the frequency that receives the highest weight in the frequency domain representation of the variance. The contribution of that frequency to discounted costs is measured by the maximal eigenvalue of the 'spectral density' matrix. The decision maker optimizes against this worst serial correlation pattern by selecting the feedback rule that minimizes the maximum eigenvalue across all frequencies.

The entropy or risk-sensitivity criterion puts initial conditions back into the picture. It produces a one-parameter family of decision rules that approximate the H∞ rule. This family emerges from restraining the 'worst case' shock sequence by requiring it to respect some initial conditions disregarded under the H∞ formulation.

For our robust decision theories, the Lucas critique holds for experiments that impose a change in the reference model, i.e., a change in (A_o, B, C) (the environment) or in H (preferences). The Lucas critique rests on the observation that the optimal choice of F depends on (A_o, B, C, H), and that remains true here. But under our robust decision theories, the Lucas critique is suspended for experiments that fix the reference model (A_o, B, C, H) and impose alternative admissible alterations of the intertemporal patterns of the {w_t} process.
Appendix A

To verify that we have found the infimum of version 2 of (16)-(17), let ω* be the frequency associated with the maximum value of ρ, and let v(ω*) denote the corresponding eigenvector. This eigenvector can be complex. We can find a W*(ζ) with all real coefficients, with an initial coefficient of zero, that coincides with v(ω*) for ζ = √β exp(iω*). We accomplish this while setting all values of w_t to zero except possibly those for w_1 and w_2. In particular, that the coefficients of W*(ζ) be real requires the symmetry W*(√β exp(iω))' = W*(√β exp(-iω))⊤, where ⊤ denotes transposition. This leads to two equations of the form W*(ζ*) = w_1 ζ* + w_2 ζ*², W*(ζ*)' = w_1 ζ*' + w_2 (ζ*')², where here ' denotes the complex conjugate and ζ* = √β exp(iω*). These two equations determine the real valued vectors w_1, w_2.

To form the infimizing W(ζ), we shall construct an approximating sequence of 'distributed lags' of W*(ζ) that converge to it. To get distributed lags of the desired form, create a sequence of continuous positive scalar functions {g_n} such that:

(i) g_n(ω) = g_n(-ω);
(ii) (1/2π) ∫_{-π}^{π} g_n(ω) dω = 1;
(iii) {g_n(ω*)} diverges;
(iv) {g_n} converges uniformly to zero outside any open interval containing ω*;
(v) ∫_{-π}^{π} log g_n(ω) dω > 0.

Then associated with each g_n is a real scalar (one-sided) sequence with transform b_n(ζ) such that b_n(ζ)* b_n(ζ) = g_n(ω) for ζ = √β exp(iω). Construct W_n(ζ) ∝ b_n(ζ) W*(ζ), where the constant of proportionality makes the resulting W_n satisfy constraint (17). We have designed the sequence {W_n} to approximate the direction v(ω*). The sequence of transforms {g_n} converges to a generalized function, namely a Dirac delta function with mass concentrated at frequency ω*. It is straightforward to show that

    lim_{n→∞} ∫_Γ W_n(ζ)' G(ζ)' G(ζ) W_n(ζ) dλ(ζ) = η² (H∞)².
Appendix B: A Dual Prediction Problem

A prediction problem is dual to maximizing (31) subject to (32)-(33). Let [θI - G(ζ)'G(ζ)] for ζ = √β exp(iω) denote a spectral density matrix for a covariance stationary process {y_t}. The purpose is to predict (w_0)' y_t linearly from past values of y_t. A candidate forecast rule of the form

    - Σ_{j=1}^∞ (w_j)' y_{t-j}        (78)

has forecast error

    Σ_{j=0}^∞ (w_j)' y_{t-j}.

Then criterion (31) is interpretable as the forecast-error variance associated with this prediction problem. The constraints (33) prevent the forecast from depending on y_{t+j} for j ≥ 1.
Appendix C: Duality

Evaluating a Given Control Law

For a given control law F, form the corresponding G and define

    θ_F = H∞(F)².

It follows that for all W(ζ)

    θ_F ∫_Γ W'W dλ ≥ ∫_Γ W'G'GW dλ.

Therefore, ∫_Γ W'[θI - G'G]W dλ is well defined for all θ ≥ θ_F but not for θ < θ_F.

For fixed F, consider the inf part of Game 2 (16):

Original (Worst Case) Minimization Problem (Problem 1):

    min_W - ∫_Γ W'G'GW dλ

subject to

    ∫_Γ W'W dλ ≤ w_0'w_0 + η².

This problem minimizes a concave function subject to a convex constraint set, so standard duality theory does not apply. In the interests of applying duality theory, we study the following alternative problem:

A Related Constrained Problem (Problem 2):

    min_W ∫_Γ W'(θ_F I - G'G)W dλ

subject to

    ∫_Γ W'W dλ ≤ η² + w_0'w_0.

This problem is to minimize a convex function subject to a convex constraint set, so duality theory applies to it. We shall first show that a solution of Problem 2 with binding constraint also solves Problem 1. Then we shall apply standard duality theory to Problem 2.
Proposition 9. A solution to Problem 2 with binding constraint solves Problem 1.

Proof. Let W* solve Problem 2 with the magnitude constraint binding:

    ∫_Γ W*'W* dλ = η² + w_0'w_0

and W*(0) = w_0. Consider any other W such that

    ∫_Γ W'W dλ ≤ η² + w_0'w_0

and W(0) = w_0. Then

    ∫_Γ W'(θ_F I - G'G)W dλ ≥ ∫_Γ W*'(θ_F I - G'G)W* dλ

and

    θ_F ∫_Γ W'W dλ ≤ θ_F ∫_Γ W*'W* dλ.

Therefore

    - ∫_Γ W'G'GW dλ ≥ - ∫_Γ W*'G'GW* dλ,

which implies that W* also solves Problem 1.

Thus a way to solve Problem 1 is to solve Problem 2 and verify that the solution satisfies the magnitude constraint with equality. We now apply duality theory to Problem 2 by forming the

Saddle Point Version of Problem 2:

    inf_W sup_{θ≥θ_F} ∫_Γ W'(θI - G'G)W dλ - (θ - θ_F)(η² + w_0'w_0).

We interpret θ - θ_F as the Lagrange multiplier for Problem 2 and θ as the Lagrange multiplier for Problem 1. Because Problem 2 entails minimizing a convex function subject to a convex constraint set, standard duality theory applies to it. The conjugate problem is obtained by switching the order of the inf and sup operations:

    sup_{θ≥θ_F} inf_W ∫_Γ W'(θI - G'G)W dλ - (θ - θ_F)(η² + w_0'w_0).        (79)
We can use this problem to construct the Lagrange multiplier θ for each η > 0. By construction, the saddle-point value for the conjugate problem coincides with the optimized value for Problem 2. When the specification-error constraint is binding for Problem 2, we can obtain the optimized value for Problem 1 by subtracting the constant θ_F(η² + w_0'w_0) from (79). The resulting conjugate problem is

    sup_{θ≥θ_F} inf_W ∫_Γ W'(θI - G'G)W dλ - θ(η² + w_0'w_0).        (80)
Thus we have eliminated the influence of θ_F on the objective of the saddle-point problem. But θ_F still affects the constraint set limiting the choice of θ (through the appearance of θ_F under the sup operator). This dependence can also be removed by virtue of the following proposition.

Proposition 10. If the value of (80) is finite, then θ ≥ θ_F.

Proof. Suppose that θ < θ_F, and consider the inner infimum part of the saddle-point problem (80):

    inf_W ∫_Γ W'(θI - G'G)W dλ.        (81)

Given the construction of θ_F, (θI - G'G) has negative eigenvalues for some |ζ*| = √β. Parameterize Γ by forming ζ = √β exp(iω), and let ω* be the frequency associated with ζ*. Thus there exists a complex vector v such that v'(θI - G'G)v < 0 on a nondegenerate interval of ω's containing ω*. Imitating the argument in Appendix A, we can form a W*(ζ) = w_1 ζ + w_2 ζ² such that W*(ζ*) = v. We can then use the Appendix A construction to form W_n(ζ) ∼ b_n(ζ)W*(ζ). Then it is straightforward to show that

    lim_{n→∞} ∫_Γ W_n'(θI - G'G)W_n dλ = v'(θI - G(ζ*)'G(ζ*))v < 0.

By construction W_n(0) = 0, and hence W_n fails to satisfy the constraint for problem (81). Also, problem (81) does not constrain the magnitude of W. We now form the sequence

    W̃_n = n W_n + w_0,

which by construction satisfies W̃_n(0) = w_0. Given our multiplication of W_n by n, it clearly follows that

    lim_{n→∞} ∫_Γ W̃_n'(θI - G'G)W̃_n dλ = -∞.

Therefore, the optimized value of problem (81) is -∞ whenever θ < θ_F.

Given what the proposition establishes about the behavior of the inner infimum part of saddle-point problem (80) when θ < θ_F, we can state that (80) equals (82), defined as the

Conjugate Saddle Point Version of Problem 1:

    sup_θ inf_W ∫_Γ W'(θI - G'G)W dλ - θ(η² + w_0'w_0).        (82)
Whenever this problem has a solution for W that satisfies the specification-error constraint with equality, the resulting W also solves Problem 1, and the value of the conjugate saddle-point problem coincides with that of Problem 1. This conjugate problem provides the Lagrange multiplier θ ≥ θ_F associated with Problem 1. Armed with this multiplier, consider the inner infimum problem, which we call the multiplier problem:

Problem 3:

    inf_W ∫_Γ W'(θI - G'G)W dλ.

The solution of Problem 3 coincides with that of the prediction problem described in Appendix B and analyzed in the text.

Given any η, we have just shown how to find the multiplier θ. We now suppose that the multiplier θ ≥ θ_F is given and want to deduce the corresponding value of η. Thus, suppose that we have a solution of the multiplier problem (Problem 3). For this problem to have a solution, it is sufficient that θ > θ_F. (Later we shall discuss the case in which θ = θ_F.) We assume that

    ∫ log det(θ_F I - G'G) dλ > -∞.        (83)

Later we will describe what happens when this condition is violated.
Proposition 11. Suppose that θ > θ_F and that W(ζ) solves the multiplier Problem 3. Then there exists η > 0 such that W(ζ) solves Problem 1.

Proof. From the dual prediction problem of Appendix B, we know that when θ > θ_F, the solution to the multiplier problem is

    W(ζ) = D(ζ)^{-1} D(0) w_0        (84)

where

    D'D = (θI - G'G)

and D is continuous and nonsingular on the region |ζ| ≤ √β. Notice that D depends implicitly on θ. The resulting objective function is w_0'D(0)'D(0)w_0. The η corresponding to this choice of θ satisfies

    η² = ∫ w_0' D(0)' (θI - G'G)^{-1} D(0) w_0 dλ - w_0'w_0.        (85)
When θ = θ_F

Next consider the possibilities when θ equals the lower threshold value θ_F. Condition (83) implies that we can still obtain the factorization

    D'D = θ_F I - G'G,

where D is nonsingular on the region |ζ| < √β, but is now singular at some points |ζ| = √β. Thus the candidate solution for W given by (84) may not be well defined, and the infimum in the multiplier Problem 3 may not be attained. Nevertheless, the infimum is still given by the quadratic form w_0'D(0)'D(0)w_0, and the implied η_F satisfies (85) and will typically be infinite. When η_F = ∞, we can find a θ > θ_F that yields any positive η. Sometimes η_F is finite for a small (Lebesgue measure zero) set of initializations w_0. When this happens, we may only find θ ≥ θ_F for values of η ≤ η_F.

Failure of (83)

Finally, we consider what happens when

    ∫ log det(θ_F I - G'G) dλ = -∞.
Since G is a rational function of ζ with no poles in the region |ζ| ≤ √β, θ_F I - G'G is singular for all |ζ| = √β. Factorizations still exist but now take the form

    D'D = θ_F I - G'G,

where D has fewer rows than columns and has full rank on the region |ζ| < √β (see Rozanov (1967), pages 43-50). This makes it possible to have a variety of solutions to Problem 2, including solutions for which the specification-error constraint is slack.

To understand the multiplicity better, note that it is now possible to find a W̃ such that

    D W̃ = 0        (86)

and for which W̃(0) = 0. Given any solution W* to Problem 2, we may form W* + r W̃ for any real number r without altering the objective of Problem 2. The value of r is restrained by the specification-error constraint, but it is possible for this range to be nondegenerate. When the specification-error constraint for Problem 2 can be slack at an optimum, the Lagrange multiplier, θ - θ_F, is zero, or equivalently θ = θ_F. Problem 2 will then have solutions in which the specification-error constraint is binding (but with a zero multiplier), and it is only these solutions that also solve Problem 1. As a consequence, solving the multiplier problem (Problem 3) for choices of θ greater than θ_F may not correspond to fixing an η for Problem 1. We illustrate this possibility in the following example.
Exceptional Example

In this example, we construct a W̃ satisfying (86) with W̃ > 0 for all ζ ∈ Γ. Suppose that A - BF = 0 and hence G = HC, which is constant across frequencies. Then θ_F is the largest eigenvalue of the symmetric matrix C'H'HC, and det[θ_F I - G'G] = 0 for all ζ ∈ Γ. Let μ be an eigenvector associated with θ_F with norm one. Solutions W* to Problem 2 are given by

    w_0* = w_0,    w_t* = α_t μ for t > 0,

with the real numbers α_t chosen so that the magnitude constraint is satisfied. The resulting objective for Problem 2 is w_0'(θ_F I - C'H'HC)w_0. Provided that η² > 0, the magnitude constraint can be made slack (say, by letting the α_t be zero).
A solution to Problem 1 is obtained by setting the α_t to make the magnitude constraint be satisfied with equality. Then the objective for Problem 1 is -θ_F η² - w_0'C'H'HCw_0. Finally, the Lagrange multiplier obtained from the conjugate problem is given by its lower threshold θ_F.

Optimizing the Control Law

We next study what happens when the control law is chosen among a family of admissible laws. The choice of F alters the transfer function G, and we are led to study the game

    max_F inf_W - ∫_Γ W'G'GW dλ

subject to

    ∫_Γ W'W dλ ≤ η² + w_0'w_0.

Again it is fruitful to analyze a conjugate formulation. With this in mind, first solve

    C(θ, F) = inf_W ∫_Γ W'(θI - G'G)W dλ - θ(η² + w_0'w_0)

for a given (θ, F) pair. Then solve the conjugate game

    max_{F,θ} C(θ, F) = max_F sup_θ C(θ, F) = max_θ sup_F C(θ, F).

Therefore, given a solution F* to the original game, we can find a corresponding θ* such that (F*, θ*) solves the conjugate game. Moreover, if F* is optimal for all nonzero initializations w_0, then F* solves the entropy criterion associated with this θ*. We want to show the converse.

Proposition 12. Fix a θ*. Find the F* that solves the entropy problem for θ*. Compute θ̂ = H∞(F*)² and verify that the control law F* satisfies

    ∫_Γ log det(θ̂I - G*'G*) dλ > -∞        (87)

where G* is the transfer function associated with the control law F*. Then there exist W* and η* > 0 such that (F*, W*) solves Game 2.

Proof. If inequality (87) is satisfied, factor θ*I - G*'G*:

    θ*I - G*'G* = D*'D*,

and construct W*:

    W*(ζ) = D*(ζ)^{-1} D*(0) w_0.

Finally, find the η* that solves

    η*² = ∫_Γ W*'W* dλ - w_0'w_0.
Appendix D: Risk Sensitivity and a Representation Result
In this appendix we use a recursive formulation and solution of the spectral factorization problem (34) to prove Proposition 5.

Proof. To compute D in the spectral factorization θ*I - G'G = D'D, we apply the factorization result given by Zhou, Doyle and Glover (1996). Recall that G = H(I - ζA)^{-1}C. The spectral density matrix to be factored is

    θ*I - G'G = θ*I - C'[I - √β exp(-iω)A']^{-1} H'H [I - √β exp(iω)A]^{-1} C
              = θ*I - C'[exp(iω)I - √βA']^{-1} H'H [exp(-iω)I - √βA]^{-1} C,

where we have used the parameterization ζ = √β exp(iω). From Theorem 21.26 of Zhou, Doyle and Glover (1996, pages 557 and 558), we obtain the factorization

    θ*I - C'[exp(iω)I - √βA']^{-1} H'H [exp(-iω)I - √βA]^{-1} C
        = (I - C'[exp(iω)I - √βA']^{-1} √βK') R (I - √βK[exp(-iω)I - √βA]^{-1} C)
        = (I - ζ'C'[I - ζ'A']^{-1} K') R (I - ζK[I - ζA]^{-1} C)        (88)

where

    R = θ*I - C'ΣC,        (89)
    K = R^{-1} C'ΣA,        (90)

and Σ ≥ 0 is the stabilizing solution of the Riccati equation

    βA'Σ(I - (1/θ*)CC'Σ)^{-1}A - Σ + H'H = 0.        (91)
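A quick numerical check of this factorization is possible in the scalar case; the sketch below (ours, with arbitrary illustrative parameter values) iterates on (91), forms R and K from (89)-(90), and verifies that θ*I - G'G = D*'D* on a grid of frequencies, with D* as given in (92) below.

    import numpy as np

    # Scalar illustration: A = .5, C = H = 1, beta = .9, theta = 10.
    A, C, H, beta, theta = 0.5, 1.0, 1.0, 0.9, 10.0

    # Iterate the Riccati equation (91): Sigma = H'H + beta A' Sigma (I - CC'Sigma/theta)^{-1} A.
    Sigma = H * H
    for _ in range(500):
        Sigma = H * H + beta * A * Sigma / (1.0 - C * C * Sigma / theta) * A
    R = theta - C * Sigma * C              # equation (89)
    K = C * Sigma * A / R                  # equation (90)

    # Check theta - |G(z)|^2 = |D*(z)|^2 on the circle z = sqrt(beta) e^{i w}, where
    # G(z) = H (1 - zA)^{-1} C and D*(z) = R^{1/2} (1 - zK (1 - zA)^{-1} C).
    for w in np.linspace(0.0, np.pi, 5):
        z = np.sqrt(beta) * np.exp(1j * w)
        G = H * C / (1.0 - z * A)
        Dstar = np.sqrt(R) * (1.0 - z * K * C / (1.0 - z * A))
        print(theta - abs(G) ** 2, abs(Dstar) ** 2)   # the two columns agree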
We establish that formula (91) is equivalent with (44) by showing that

    (I - (1/θ*)CC'Σ)^{-1} = I + C(θ*I - C'ΣC)^{-1} C'Σ.

We verify this result by multiplying the matrix I - (1/θ*)CC'Σ by the matrix I + C(θ*I - C'ΣC)^{-1}C'Σ:

    [I - (1/θ*)CC'Σ][I + C(θ*I - C'ΣC)^{-1} C'Σ]
        = I - (1/θ*)CC'Σ + C[I - (1/θ*)C'ΣC](θ*I - C'ΣC)^{-1} C'Σ
        = I - (1/θ*)CC'Σ + (1/θ*)CC'Σ
        = I.

For the stabilizing solution, K from (90) is such that I - ζ√βK[I - √βζA]^{-1}C has zeroes outside the unit circle of the complex plane (Zhou, Doyle and Glover (1996)). As a consequence, I - ζK[I - ζA]^{-1}C has zeroes outside of the circle Γ. Therefore, (88) and (34) imply that

    D*(ζ) = R^{1/2}(I - ζK[I - ζA]^{-1}C)        (92)

has zeroes outside Γ, and

    θ*I - G'G = D*'D*.

Furthermore,

    D*(0)'D*(0) = R = θ*I - C'ΣC.
The entropy criterion (28) can thus be represented as log det(θ*I - C'ΣC). From formula (37), the solution for W(ζ) can be represented as D*(ζ)W(ζ) = D*(0)w_0. Using (92) gives

    ζ^{-1}[W(ζ) - w_0] = K(I - ζA)^{-1} C W(ζ),

and using X(ζ) = (I - ζA)^{-1} C W(ζ) gives the recursive formula

    w_{t+1} = K x_t.

A key step in the following section is to recognize that the right side of (44) defines an operator S(Σ) closely associated with the discounted risk-sensitivity criterion. In the next sections, we describe a pair of recursive algorithms for minimizing (41) with respect to F that, as a byproduct, solve the min-max problem associated with (16)-(17). Both of these algorithms solve Riccati equations associated with (41).
Risk-sensitivity and Entropy

We now leave the frequency domain and take up the discounted risk-sensitivity criterion, which is defined via a recursion using a distorted expectation operator R_t defined as

    R_t V(x_{t+1}) = (1/α) log E_t exp[αV(x_{t+1})],

where α ≤ 0, and where more negative values of α indicate more sensitivity to risk. For a one-period return function U(x_t) and a fixed Markov process x_t, the discounted risk-sensitive criterion is defined recursively according to

    V^α(x_t) = U(x_t) + β R_t V^α(x_{t+1}),        (93)
which implements expected utility when α = 0. Following Jacobson (1973), Whittle (1990), and Hansen and Sargent (1995), consider the special case of (93) in which U is quadratic, the transition law is linear, and the shocks are Gaussian. In particular, we assume that U(x_t) = x_t'R*x_t and that, given F, the transition law of x is given by the first line of (7) with Gaussian martingale difference w_t. Hansen and Sargent (1995) show that the limit of recursions on (93) is

    V^α(x_t) = x_t'V x_t - α^{-1} (β/(1-β)) log det(I - αC'VC)        (94)

where V is the limit of the recursion

    V_{j+1} = R* + βA'D(V_j)A,        (95)

where V_j is a square matrix determining the quadratic form for the part of the value function depending on the initial state, and D is the operator

    D(V) = V + αVC(I - αC'VC)^{-1} C'V.        (96)
The operator D comes from computing R_t under the assumption that w_t is Gaussian. Reinterpreting the recursion in (95) in terms of another operator S(V) yields

Proposition 13. The discounted risk sensitive criterion can be represented as

    x_0'V x_0 - α^{-1} (β/(1-β)) log det(I - αC'VC),        (97)

where V = S(V) and where the operator S is defined by

    S(V) = R + F'QF + βA'D(V)A.        (98)
Proof. Sketched above, with remaining details in Hansen and Sargent (1995).

Here x_0'V x_0 is the part of the value function corresponding to the initial condition, and the log det term corresponds to the 'stationary steady state'. The log det(I - αC'VC) term is the log determinant of a one-step-ahead prediction error covariance matrix and can be expressed as the right hand side of (41) for some stationary process with the proper associated spectral density. We claim that the appropriate spectral density is identical with that used in defining (41).

Proposition 14. Setting θ = -α^{-1} makes the term log det D(0)'D(0) = log det(θI - C'ΣC) in (45) equal to n log(-α^{-1}) + log det(I + αC'ΣC), which equals the log det term in (97).

Proof. Let V = -Σ and H'H = -(R + F'QF). Then iterations on S(V) = S(-Σ) are identical to (44). Setting θ = -α^{-1}, from (45) the term log det D(0)'D(0) can be written log det[-α^{-1}(I + αC'ΣC)], or n log(-α^{-1}) + log det(I + αC'ΣC).
Iterating to convergence on S gives the value of forever adhering to the fixed F embedded in A = A_o - BF.²¹ For a given F, let V_F satisfy V_F = S(V_F). Then the optimal risk-sensitive rule F solves

    max_F log det(I - αC'V_F C).        (99)
A policy improvement algorithm

The following recursive algorithm can be used to find the F that maximizes (41). It is like a Howard improvement algorithm.

Step 1. Fix an arbitrary F such that A is √β-stable, and iterate to convergence on S(V).

Step 2. Given the V from step 1, solve the two-period problem

    max_{u_t} x_t'Rx_t + u_t'Qu_t + β(Ax_t + Bu_t)'V(Ax_t + Bu_t).        (100)

Express the answer in the feedback form u_t = -Fx_t.

Step 3. Return to step 1 and iterate to convergence.

Step 1 is the 'evaluation' part of a Howard policy improvement algorithm ('evaluate the consequences of using this fixed policy forever'). Step 2 is the two-period 'optimization with arbitrary terminal value function' part of the algorithm. The algorithm differs from the application of the Howard algorithm to the usual LQ problem in how S(V) deviates from the ordinary evaluation of the future through the appearance of D(V), rather than just V, on the right side of (98). We summarize the implications of this section in

Proposition 15. If α = -θ^{-1}, then the risk-sensitivity criterion (93) puts the same preference ordering over decision rules F as the Hentropy(θ) criterion (41).

Proposition 15 is the counterpart for discounted problems of a result stated for undiscounted problems by Glover and Doyle (1988).

²¹ For a fixed F, the log det part of the optimized value of the 'malevolent agent's' criterion function for the Hentropy(-α^{-1}) problem has the same form as (97), with V = S(V).
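A minimal Python sketch of this algorithm (ours), under assumptions worth flagging: R and Q are taken to be negative semidefinite cost matrices so that (100) is a maximization; the A in (100) is read as the uncontrolled A_o; fixed iteration counts stand in for convergence tests; and for α too negative the inner iteration can diverge, reflecting the breakdown of robustness rather than a coding error. The matrices in the example are hypothetical.

    import numpy as np

    def D_op(V, C, alpha):
        # The operator D of equation (96): D(V) = V + alpha V C (I - alpha C'VC)^{-1} C'V.
        m = C.shape[1]
        return V + alpha * V @ C @ np.linalg.solve(np.eye(m) - alpha * C.T @ V @ C, C.T @ V)

    def policy_improvement(Ao, B, C, R, Q, beta, alpha, n_outer=50, n_inner=500):
        # Step 1: evaluate a fixed F by iterating V = S(V) = R + F'QF + beta A' D(V) A
        #         with A = Ao - B F, as in equation (98).
        # Step 2: reoptimize in the two-period problem (100).
        n = Ao.shape[0]
        F = np.zeros((B.shape[1], n))
        for _ in range(n_outer):
            A = Ao - B @ F
            V = np.zeros((n, n))
            for _ in range(n_inner):
                V = R + F.T @ Q @ F + beta * A.T @ D_op(V, C, alpha) @ A
            # First order condition of (100): u = -(Q + beta B'VB)^{-1} beta B'V Ao x = -F x.
            F = np.linalg.solve(Q + beta * B.T @ V @ B, beta * B.T @ V @ Ao)
        return F, V

    # Hypothetical small example (R, Q negative semidefinite cost matrices):
    Ao = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    C = np.eye(2)
    R = -np.eye(2)
    Q = -np.array([[1.0]])
    F, V = policy_improvement(Ao, B, C, R, Q, beta=0.95, alpha=-0.01)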
A recursive algorithm from risk-sensitivity

Hansen and Sargent (1995) construct the optimal risk-sensitive decision rule not via (99) but recursively. The construction is sequential and so incorporates backward induction by both the decision maker and a 'great deceiver'. Thus, consider the standard operator associated with the ordinary linear quadratic Bellman equation, namely

    T(W) = R + βA_o'WA_o - β²A_o'WB(Q + βB'WB)^{-1}B'WA_o;        (101)

T(W) takes one step on the Bellman equation (or the usual Riccati equation from the LQG setting). Hansen and Sargent (1996) characterize the value function associated with the optimal rule F under risk-sensitivity in terms of recursions on the composite operator T ∘ D. Iterating to convergence on the composite operator T ∘ D builds in backward induction. The T part represents the decision maker's one-step optimization; the D part represents one step of the 'deceiver's' choice of a 'worst shock'.²²

Proposition 4 can be extended to allow for isolated singularities. Thus, in Appendix E we show that the entropy formula (45) of Proposition 5 continues to hold provided that θ*I - G'G is positive semidefinite and nonsingular at either √β or -√β.
²² Hansen and Sargent (1995) show that

    x'[R + F'QF + βA'D(V)A]x = min_{w,y} {-(β/α) w'w + x'(R + F'QF)x + βy'Vy},

where the minimization is subject to y = Ax + Cw and the shock w is treated as a control. The minimizing w is

    w = α(I - αC'VC)^{-1} C'V x.        (102)
(102) [wf ormula]
Appendix E
This appendix restates a version of Proposition 5 under weaker assumptions about the nonsingularity of [θI - G(ζ)'G(ζ)].

Proposition 16. Suppose that

(i) A has eigenvalues that are inside the circle Γ;
(ii) θI - G'G ≥ 0 on Γ;
(iii) either θI - G(-√β)'G(-√β) or θI - G(√β)'G(√β) is nonsingular.

Then the Hentropy(θ) criterion can be represented as

    log det D(0)'D(0) = log det(θI - C'ΣC),

where Σ is defined implicitly by equation (105) below.

Proof. We prove this proposition by referring to results from Zhou, Doyle and Glover (1996). We outline the proof in four steps.
Step One: Transform the discounted discrete-time formulation into a continuous-time undiscounted formulation. Suppose that θI - G(-√β)'G(-√β) is nonsingular. Define the linear fractional transformation

    ζ = -√β (s + √β)/(s - √β).        (103)

This transformation maps s = -√β into ζ = 0, s = 0 into ζ = √β, and s = ∞ into ζ = -√β. The transformation maps the imaginary axis into the circle Γ and points on the left side of the complex plane into points inside the circle. Note also that

    βζ^{-1} = -√β (-s + √β)/(-s - √β).

In the case that θI - G(√β)'G(√β) is singular, we replace the linear fractional transformation (103) with

    ζ = √β (s + √β)/(s - √β).        (104)

In what follows we will use (103), but the argument for (104) is entirely similar.
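A short numerical sanity check of this mapping (ours): points on the imaginary axis should land on the circle of radius √β.

    import numpy as np

    beta = 0.9
    s = 1j * np.linspace(-50.0, 50.0, 1001)            # points on the imaginary axis
    zeta = -np.sqrt(beta) * (s + np.sqrt(beta)) / (s - np.sqrt(beta))   # mapping (103)
    print(np.allclose(np.abs(zeta), np.sqrt(beta)))    # True: the axis maps onto Gamma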
Step Two: Use parameterization (103) to write

    G(ζ) = (s - √β) H[(s - √β)I + (s + √β)√βA]^{-1} C
         = (s - √β) H[s(I + √βA) - √β(I - √βA)]^{-1} C
         = (s - √β) H(sI - Â)^{-1} Ĉ
         = Ĝ(s),

where

    Â = √β (I + √βA)^{-1} (I - √βA)
    Ĉ = (I + √βA)^{-1} C.

Rewrite Ĝ as

    Ĝ(s) = sH(sI - Â)^{-1}Ĉ - √βH(sI - Â)^{-1}Ĉ
         = H(sI - Â)(sI - Â)^{-1}Ĉ + HÂ(sI - Â)^{-1}Ĉ - √βH(sI - Â)^{-1}Ĉ
         = HĈ - Ĥ(sI - Â)^{-1}Ĉ,

where

    Ĥ = H(√βI - Â).

Notice that

    HĈ = H(I + √βA)^{-1}C = Ĝ(∞) = G(-√β).
Step Three: For s imaginary, write

    θI - Ĝ'Ĝ = [Ĉ'(-sI - Â')^{-1}   I] [ -Ĥ'Ĥ      Ĥ'HĈ
                                          Ĉ'H'Ĥ    θI - Ĉ'H'HĈ ] [ (sI - Â)^{-1}Ĉ
                                                                     I ].

Notice that

    θI - Ĉ'H'HĈ = θI - G(-√β)'G(-√β)

is nonsingular and in fact positive definite.
Step Four: Apply Corollary 13.20 of Zhou, Doyle and Glover (1996) to conclude that there exists a matrix F such that

    θI - Ĝ'Ĝ = [I - Ĉ'(-sI - Â')^{-1}F'] (θI - Ĉ'H'HĈ) [I - F(sI - Â)^{-1}Ĉ].

Now inverse transform from s to ζ. The following are useful formulas for carrying out this transformation. From

    Â = √β (I + √βA)^{-1} (I - √βA),

we invert this relation to find that

    (I + √βA)Â = √β(I - √βA)

or

    √β A(Â + √βI) = -(Â - √βI)

or

    A = (1/√β)(√βI + Â)^{-1}(√βI - Â).

Similarly, from

    (s - √β)ζ = -√β(s + √β)

or

    (ζ + √β)s = √β(ζ - √β),

we obtain

    s = √β (ζ - √β)/(ζ + √β).

Write

    I - F(sI - Â)^{-1}Ĉ = I - (ζ + √β) F [√β(ζ - √β)I - (ζ + √β)Â]^{-1} Ĉ
        = I - (ζ + √β) F [ζ(√βI - Â) - √β(√βI + Â)]^{-1} Ĉ
        = I + (ζ + √β) F (I - ζA)^{-1} (1/√β)(√βI + Â)^{-1} Ĉ
        = I + ((ζ + √β)/(2√β)) F (I - ζA)^{-1} C
        ≡ G̃(ζ).

Note that

    I + ((ζ + √β)/(2√β)) F (I - ζA)^{-1} C = I + (1/2) F C + (ζ/(2√β)) F (I + √βA)(I - ζA)^{-1} C.

Define Σ implicitly by

    θI - C'ΣC = (I + (1/2) C'F') [θI - C'(I + √βA')^{-1} H'H (I + √βA)^{-1} C] (I + (1/2) FC).        (105)
References

Anderson, Evan, and L. P. Hansen. "Robustness and Risk-Sensitivity in General Equilibrium." 1997, forthcoming.
Anderson, Brian D. O., and John B. Moore. Optimal Filtering. Englewood Cliffs, N.J.: Prentice Hall, 1979.
Başar, Tamer, and Pierre Bernhard. H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach. Boston-Basel-Berlin: Birkhäuser, 1995.
Dupuis, P., and R. S. Ellis. A Weak Convergence Approach to the Theory of Large Deviations. New York: John Wiley and Sons, 1997.
Glover, K., and J. C. Doyle. "State-space formulae for all stabilizing controllers that satisfy an H∞-norm bound and relations to risk-sensitivity." Systems & Control Letters 11 (1, 1988): 167-172.
Hansen, L. P., and T. J. Sargent. "Discounted Linear Exponential Quadratic Gaussian Control." IEEE Transactions on Automatic Control 40 (May 1995): 968-971.
Hansen, L. P., T. J. Sargent, and T. D. Tallarini, Jr. "Robust Permanent Income and Pricing." University of Chicago: Working Paper, 1997.
Kwakernaak, Huibert, and Raphael Sivan. Linear Optimal Control Systems. New York: Wiley, 1972.
Muth, John F. "Optimal Properties of Exponentially Weighted Forecasts." Journal of the American Statistical Association 55 (290, 1960): 000-000.
Mustafa, Denis, and Keith Glover. Minimum Entropy H∞ Control. Berlin: Springer-Verlag, 1990.
Rozanov, Yu. A. Stationary Random Processes. San Francisco: Holden-Day, 1967.
Salmon, Mark. "On the Theoretical Irrelevance of the Lucas Critique – Twenty Years On." European University Institute: Working Paper, 1996.
Tornell, Aaron. "Excess volatility of asset prices with H∞ forecasts." Manuscript. Cambridge, Massachusetts: Harvard University, 1998.
Whiteman, Charles. "Analytical Policy Design Under Rational Expectations." Econometrica 54 (6, 1985a): 1387-1405.
Whiteman, Charles. "Spectral Utility, Wiener-Hopf Techniques, and Rational Expectations." Journal of Economic Dynamics and Control 9 (2, 1985b): 225-240.
Whittle, Peter. Prediction and Regulation by Linear Least-Square Methods. Second edition, revised. Minneapolis, Mn.: University of Minnesota Press, 1983.
Whittle, Peter. Risk-Sensitive Optimal Control. New York: Wiley, 1990.
Whittle, Peter. Optimal Control: Basics and Beyond. New York: Wiley, 1996.
Zhou, Kemin, with John C. Doyle and Keith Glover. Robust and Optimal Control. London: Prentice Hall, 1996.