Model Predictive Control for Portfolio Selection - IEEE Xplore

Proceedings of the 2006 American Control Conference Minneapolis, Minnesota, USA, June 14-16, 2006

WeB16.2

Model Predictive Control for Portfolio Selection

Florian Herzog†‡, Simon Keel†‡, Gabriel Dondi†‡, Lorenz M. Schumann‡, and Hans P. Geering†

Abstract: In this paper, we explain the application of Model Predictive Control (MPC) to problems of dynamic portfolio optimization. First, we prove that MPC is a suboptimal control strategy for stochastic systems which uses new information advantageously and thus is better than pure optimal open-loop control. For a linear Gaussian factor model, we derive the wealth dynamics and the conditional mean and variance. We state the portfolio optimization problem, in which an investor maximizes a mean-variance objective while keeping the portfolio Value-at-Risk under a given limit. The portfolio optimization is applied in a case study to US asset market data.

I. INTRODUCTION

A. Concept of Model Predictive Control

Model Predictive Control (MPC) is a technique developed for solving constrained optimal control problems for deterministic applications. Instead of directly solving the optimal control problem through the Hamilton-Jacobi-Bellman (HJB) equation or dynamic programming (DP) techniques, MPC solves the problem in a receding horizon manner, where a series of open-loop problems are solved consecutively. The current control decision is obtained by solving, at each sampling time, a finite horizon open-loop optimal control problem, using the actual state of the system as the initial state. The optimization procedure yields an optimal control sequence, and only the first control decision of the sequence is applied to the system. This procedure is repeated at each decision (sampling) time with a receding horizon. In deterministic problems, the MPC method yields a closed-loop optimal control decision, since the closed-loop decision can be computed by solving the open-loop problem for every value of the state variables and every time step. An important advantage of this solution technique is its capability to deal with constraints on the control and state variables. For a detailed review of constrained MPC for deterministic control applications, refer to [1] or [2] and the references therein.

B. Dynamic Portfolio Optimization and MPC

We show that MPC for stochastic systems is a suboptimal control strategy. However, MPC uses new information advantageously and is therefore better than pure open-loop control. Since many portfolio selection problems in real-world applications have constraints, the HJB equation often cannot be solved and we have to use numerical procedures or approximations. MPC is a technique that is well suited for control problems with constraints and thus, we investigate the properties of MPC for stochastic systems and describe how to solve portfolio optimization problems with MPC.

†Swiss Federal Institute of Technology (ETH) Zurich, Measurement and Control Laboratory, ETH Zentrum, CH-8092 Zurich, Switzerland
‡swissQuant Group AG, c/o Measurement and Control Laboratory, ETH Zentrum, CH-8092 Zurich, Switzerland

1-4244-0210-7/06/$20.00 ©2006 IEEE

II. DISCRETE-TIME PORTFOLIO OPTIMIZATION AND DISCRETE-TIME OPTIMAL CONTROL

We discuss the basic modelling framework, the discrete-time wealth dynamics, and the conditions of optimality, i.e., the dynamic programming (DP) method.

A. Asset price models

The returns of assets (or asset classes) in which we are able to invest are described by

$$r(t+1) = \mu(t, x(t)) + \varepsilon^r(t), \tag{1}$$

where $r(t) = (r_1(t), r_2(t), \ldots, r_n(t))^T \in \mathbb{R}^n$ is the vector of asset returns, $\varepsilon^r(t) \in \mathbb{R}^n$ is a white noise process with $E[\varepsilon^r(t)] = 0$ and $E_t[\varepsilon^r(t)\varepsilon^{rT}(t)] = \Sigma(t) \in \mathbb{R}^{n \times n}$ is the conditional covariance matrix of $r(t)$, $\mu(t, x(t)) \in \mathbb{R}^n$ is the expected return of $r(t)$, and $x(t) \in \mathbb{R}^m$ is the vector of factors. We assume that the conditional expectation and the conditional covariance can be time-varying and stochastic. The white noise process $\varepsilon^r(t)$ is assumed to be strictly covariance stable, i.e., $E[\varepsilon^r(t)\varepsilon^{rT}(t)] < \infty$. The prices of the risky assets evolve according to

$$P_i(t+1) = P_i(t)\big(1 + r_i(t)\big), \quad P_i(0) = p_{i0} > 0, \tag{2}$$

where $P(t) = (P_1(t), P_2(t), \ldots, P_n(t))$ denotes the prices of the risky assets. A locally risk-free bank account with interest rate $r_0(t, x(t))$ is given as

$$P_0(t+1) = P_0(t)\big(1 + r_0(t)\big), \quad P_0(0) = p_{00} > 0, \tag{3}$$

where $P_0(t)$ denotes the value of the bank account. The factor process affecting the expected return of the risky assets and the interest rate of the bank account is described by

$$x(t+1) = \Theta(t, x(t)) + \Psi(t, x(t))\,\varepsilon^x(t), \tag{4}$$

where $\Theta(t, x(t)) \in \mathbb{R}^m$, $\Psi(t, x(t)) \in \mathbb{R}^{m \times k}$, and $\varepsilon^x(t) \in \mathbb{R}^k$ is a strictly covariance stable white noise process ($E[\varepsilon^x(t)\varepsilon^{xT}(t)] < \infty$) with unit covariance.

B. Wealth dynamics in discrete-time

We introduce the wealth dynamics without transaction costs. The investment into the bank account, denoted by $u_0(t)$, and the investments into the risky assets, denoted by $u(t)$, must sum to 1, i.e., $u_0(t) + u^T(t)\mathbf{1} = 1$, where $\mathbf{1} = (1, 1, \ldots, 1)^T$, $u(t) = (u_1(t), \ldots, u_n(t))^T$, and $u_j(t)$ denotes the fraction of portfolio value (wealth) invested into the $j$-th risky investment. The portfolio return at time $t$ is given by

$$R(t) = u_0(t) r_0(t) + u(t)^T r(t) = r_0(t) + u(t)^T \big(r(t) - \mathbf{1} r_0(t)\big), \tag{5}$$

where $u_0(t) = 1 - \mathbf{1}^T u(t)$. The wealth dynamics are given by

$$W(t+1) = [1 + R(t)]\,W(t) + q(t) = \big[1 + r_0(t) + u(t)^T (r(t) - \mathbf{1} r_0(t))\big] W(t) + q(t), \tag{6}$$

where $W(t) \in \mathbb{R}$ denotes the wealth (portfolio value) at time $t$ and $q(t)$ denotes the in- or outflow from the portfolio from non-capital gains. Using (1) and (4) we obtain the discrete-time wealth dynamics as

$$\begin{aligned} W(t+1) &= W(t)\big[1 + r_0(t, x(t)) + u^T(t)\big(\mu(t, x(t)) - \mathbf{1} r_0(t, x(t))\big)\big] + q(t) + W(t)\,u^T(t)\,\varepsilon^r(t) \\ x(t+1) &= \Theta(t, x(t)) + \Psi(t, x(t))\,\varepsilon^x(t). \end{aligned} \tag{7}$$

Here, the dynamics of the factors and the dynamics of the wealth equation are coupled, but we only control the wealth equation. Often we work with the logarithm of the wealth (in the case that $q(t) = 0$), and the corresponding wealth dynamics are given by

$$w(t+1) = \ln\big(1 + R(t)\big) + w(t), \tag{8}$$

where $w(t) = \ln W(t)$. The difference $w(t+1) - w(t)$ is interpreted as the return of the portfolio, similar to the return definition based on the logarithmic difference of prices.

C. Portfolio optimization and dynamic programming

We illustrate the dynamic programming (DP) technique, which is the condition of optimality for multi-period portfolio optimization problems. Mathematically, we state the problem of portfolio optimization as

$$\begin{aligned} \max_{u(t),\, q(t) \le 0} \; & E\Big[\sum_{t=0}^{T-1} U_1(q(t)) + U_2(W(T))\Big] \\ \text{s.t.} \quad & W(t+1) = W(t)\big[1 + r_0(t, x(t)) + u^T(t)\big(\mu(t, x(t)) - \mathbf{1} r_0(t, x(t))\big)\big] + q(t) + W(t)\,u^T(t)\,\varepsilon^r(t) \\ & x(t+1) = \Theta(t, x(t)) + \Psi(t, x(t))\,\varepsilon^x(t), \end{aligned} \tag{9}$$

where $U_1$ and $U_2$ are two strictly concave and monotone utility functions as defined in [3]. The states of the system are the wealth $W(t)$ and the current value of the factors $x(t)$. In the case that the returns depend on economic factors and volatilities, the factor dynamics as well as the covariance of the white noise process can be written to accommodate this. We state a general dynamic optimization problem (DOP) in discrete time and show how to write the portfolio problem in this general form. Moreover, we state the dynamic programming (DP) algorithm to solve such a DOP and discuss the difficulties in finding a solution.

In general, a dynamic optimization problem can be stated as

$$\begin{aligned} J(t, y(t)) = \max_{u \in U} \; & E\Big[\sum_{\tau=t}^{T-1} L(\tau, y, u) + M(T, y(T))\Big] \\ \text{s.t.} \quad & y(\tau+1) = D(\tau, y, u) + S(\tau, y, u)\,\varepsilon(\tau), \end{aligned} \tag{10}$$

where $\tau = t, t+1, \ldots, T-1$, $L(\cdot)$ and $M(\cdot)$ are the concave value functionals, $D(\cdot)$ and $S(\cdot)$ define the state dynamics and are assumed to be continuously differentiable functions, $u$ is the control vector, and $\varepsilon(t)$ is a strictly covariance stable white noise process, i.e., $E[\varepsilon(t)\varepsilon^T(t)] < \infty$. The DP algorithm to obtain a feedback solution to the DOP (10) is given by

$$\begin{aligned} J(T, y(T)) &= M(T, y(T)) \\ J(\tau, y(\tau)) &= \max_{u \in U} E\big[L(\tau, y(\tau), u) + J\big(\tau+1,\, D(\tau, y, u) + S(\tau, y, u)\,\varepsilon(\tau)\big)\big]. \end{aligned} \tag{11}$$

This condition for optimality can be found in [4, Chapter 1]. We also assume that the expectation given in (11) exists. By setting $y(t) = (W(t), x(t))^T$, $u = (u(t), q(t))^T$, $\varepsilon(t) = (\varepsilon^r(t), \varepsilon^x(t))^T$, $L(t, y, u) = U_1(q(t))$, $M(T, y(T)) = U_2(W(T))$, and

$$D(t, y, u) = \begin{bmatrix} W(t)\big[1 + r_0(t, x(t)) + u^T(t)\big(\mu(t, x(t)) - \mathbf{1} r_0(t, x(t))\big)\big] + q(t) \\ \Theta(t, x(t)) \end{bmatrix}, \qquad S(t, y, u) = \begin{bmatrix} W(t)\,u^T(t) & 0 \\ 0 & \Psi(t, x(t)) \end{bmatrix},$$

we convert (9) into the standard form of a DOP as given by (10) and may solve the problem by applying (11). Ideally, we would like to use the DP algorithm to obtain a closed-form solution for $J(\cdot)$ or an optimal control strategy. In many cases, an analytical solution to the problem of portfolio optimization can be derived. Examples are given in [4, Chapter 4.3]. As in the case of continuous-time methods (see [5, Chapters 4, 5]), we are only capable of deriving analytical solutions under restrictive assumptions, such as normal distribution of returns or no constraints on the asset allocation.

III. STOCHASTIC MODEL PREDICTIVE CONTROL

Most of the literature on MPC is concerned with solving control problems arising in technical applications. However, in recent years, the MPC approach has been extended to stochastic optimal control problems. The first line of research is the disturbance rejection problem with constraints in technical control applications, where the disturbances are modeled as stochastic processes. Examples of this line of research are found in [6], [7], and [8]. A second line of research in stochastic MPC is the application of MPC to inherently stochastic dynamic problems. Applications include stochastic resource allocation [9], multi-echelon supply chain networks [10], and option hedging [11].

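The receding-horizon procedure summarized in Table I can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the names `solve_open_loop` and `run_mpc`, the single risky asset with constant excess mean `mu` and variance `s2`, the parameter values, and the grid search standing in for a constrained optimizer are all hypothetical.

```python
import random


def solve_open_loop(horizon, mu=0.005, s2=0.0025, lam=-2.0):
    """Toy open-loop problem (illustrative assumption, not the paper's solver).

    One risky asset, weights restricted to [0, 1]. With constant parameters,
    each period adds u*mu - 0.5*u^2*s2 to the mean and u^2*s2 to the variance
    of log-wealth, so the same weight is optimal in every period and a grid
    search over a single weight is enough.
    """
    grid = [i / 100 for i in range(101)]
    best = max(grid, key=lambda u: u * mu - 0.5 * (1.0 - lam) * u * u * s2)
    return [best] * horizon


def run_mpc(w0, n_steps, horizon, rng, shrinking=False,
            r0=0.002, mu=0.005, s2=0.0025):
    """Measure, re-optimize, apply the first decision, move one step ahead."""
    w, applied = w0, []
    for t in range(n_steps):
        # Step 2: receding horizon, or a horizon that shrinks toward a fixed date.
        h = max(1, n_steps - t) if shrinking else horizon
        plan = solve_open_loop(h, mu=mu, s2=s2)
        # Step 3: only the first element of the planned sequence is applied.
        u = plan[0]
        shock = rng.gauss(0.0, s2 ** 0.5)
        # Toy log-wealth update with Gaussian noise (the "measurement" of step 1).
        w += r0 + u * mu - 0.5 * u * u * s2 + u * shock
        applied.append(u)
    return w, applied


final_w, controls = run_mpc(0.0, 24, 12, random.Random(0))
print(len(controls), controls[0])
```

Because the toy parameters are constant, every re-optimization returns the same weight; with time-varying factor estimates the plan would change at each step, which is where the feedback enters.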

The model predictive control algorithm proceeds as given in Table I. By re-solving the open-loop optimization at every time step we introduce feedback into our system. The MPC method requires the solution of $T - t + 1$ optimal control problems. We compute the control decisions along the trajectory of the system and thus avoid the curse of dimensionality, since the control decisions are not computed for states which we do not reach.

TABLE I
MODEL PREDICTIVE CONTROL ALGORITHM

1. Based on the information at time t, determine (measure) y(t).
2. Compute the open-loop optimization problem given in (14) with information y(t). We either solve the open-loop optimization with the same horizon (t + T), a so-called receding horizon policy, or we telescope towards a fixed finite date (T). In the latter case, the horizon shrinks by one at every time step.
3. Apply only the first control decision, i.e., u(t), of the sequence u(t), u(t+1), ..., u(T−1), and move one time step ahead.
4. Return to step 1 until the final time of the portfolio optimization is reached.

Theorem 1: In the case of stochastic systems, Model Predictive Control is a suboptimal control policy. However, MPC yields better results than pure open-loop control policies. In general, the following inequalities hold:

$$J(t, y(t)) \ge J^*(t, y(t)) \ge \bar{J}(t, y(t)), \tag{12}$$

where $J(t, y(t))$ is the value function of the DP control policy, $J^*(t, y(t))$ is the value function of the MPC control policy, and $\bar{J}(t, y(t))$ is the value function of a pure open-loop control policy.

Proof: First we prove that open-loop control is a suboptimal control policy. In the second part we prove that MPC possesses a higher value function than open-loop control. The solution of (10) is computed by the DP algorithm (11), which can alternatively be written as

$$\begin{aligned} J(t, y(t)) = \max_{u(t) \in U} E\Big[ & L(t, y(t), u(t)) + \max_{u(t+1) \in U} E\Big[ L(t+1, y(t+1), u(t+1)) + \ldots \\ & + \max_{u(T-1) \in U} E\big[ L(T-1, y(T-1), u(T-1)) + M(T, y(T)) \,\big|\, \mathcal{F}(T-1) \big] \ldots \,\Big|\, \mathcal{F}(t+1) \Big] \,\Big|\, \mathcal{F}(t) \Big] \\ \text{s.t.} \quad & y(\tau+1) = D(\tau, y, u) + S(\tau, y, u)\,\varepsilon(\tau), \end{aligned} \tag{13}$$

where $\mathcal{F}(t)$ denotes the information (measurements) at time $t$. The DP algorithm yields the feedback control decisions $u(t, y(t))$. We iteratively interchange the expectation and the maximization and use the tower property for conditional expectations, $E[E[\varepsilon(t) \,|\, \mathcal{F}(t)] \,|\, \mathcal{F}(t-1)] = E[\varepsilon(t) \,|\, \mathcal{F}(t-1)]$, to obtain

$$\begin{aligned} \bar{J}(t, y(t)) = \max_{u(t), \ldots, u(T-1) \in U} E\Big[ & L(t, y(t), u(t)) + L(t+1, y(t+1), u(t+1)) + \ldots \\ & + L(T-1, y(T-1), u(T-1)) + M(T, y(T)) \,\Big|\, \mathcal{F}(t) \Big] \\ \text{s.t.} \quad & y(\tau+1) = D(\tau, y, u) + S(\tau, y, u)\,\varepsilon(\tau), \end{aligned} \tag{14}$$

where $\bar{J}(t, y(t))$ is the objective value of the open-loop optimization. The open-loop optimization yields the control decisions as a function of the current information but not as a function of future measurements. By Jensen's inequality ($E[\max\{f(\cdot)\}] \ge \max\{E[f(\cdot)]\}$) it follows that

$$\bar{J}(t, y(t)) \le J(t, y(t)), \tag{15}$$

which shows that open-loop optimization is a suboptimal control method. In deterministic applications, however, both control methods have the same value of the objective function.

The second part of the proof is as follows. As in any control policy, we want to be sure that the new information is used to our advantage. We assume that we have solved the open-loop optimization problem given by (14), which yields the control decisions $\bar{u}(t \,|\, \mathcal{F}(t)), \bar{u}(t+1 \,|\, \mathcal{F}(t)), \ldots, \bar{u}(T-1 \,|\, \mathcal{F}(t))$. The value of the objective function in this case is written as

$$\begin{aligned} \bar{J}(t, y(t)) = E\Big[ & L(t, y(t), \bar{u}(t)) + \underbrace{E\big[ L(t+1, \bar{y}(t+1), \bar{u}(t+1)) + \cdots + M(T, \bar{y}(T)) \big]}_{\bar{J}(t+1,\, \bar{y}(t+1))} \Big] \\ \text{s.t.} \quad & \bar{y}(\tau+1) = D(\tau, \bar{y}, \bar{u}) + S(\tau, \bar{y}, \bar{u})\,\varepsilon(\tau). \end{aligned} \tag{16}$$

At time $t+1$, the MPC method solves the open-loop optimization problem again under the information $\mathcal{F}(t+1)$. Note that $y(t+1) = \bar{y}(t+1)$, since the first control decisions are the same for the MPC policy and the open-loop policy. For the MPC algorithm, we solve at time $t+1$ the open-loop problem again:

$$\begin{aligned} \tilde{J}(t+1, y(t+1)) = \max_{u(t+1), \ldots, u(T-1) \in U} E\Big[ & L(t+1, y(t+1), u(t+1)) + \ldots \\ & + L(T-1, y(T-1), u(T-1)) + M(T, y(T)) \,\Big|\, \mathcal{F}(t+1) \Big] \\ \text{s.t.} \quad & y(\tau+1) = D(\tau, y, u) + S(\tau, y, u)\,\varepsilon(\tau), \end{aligned} \tag{17}$$

which yields the new control sequence $\tilde{u}(t+1 \,|\, \mathcal{F}(t+1)), \ldots, \tilde{u}(T-1 \,|\, \mathcal{F}(t+1))$. Comparing (16) and (17), it is clear that $\tilde{J}(t+1, y(t+1)) \ge \bar{J}(t+1, \bar{y}(t+1))$. By re-solving the open-loop optimal control problem at every time $\tau > t$, we improve the remaining value of the objective function compared to the remaining value obtained at time $\tau - 1$. Therefore, it follows that $J^*(t, y(t)) \ge \bar{J}(t, y(t))$.


Furthermore, from the first part of the proof, i.e., $J(t, y(t)) \ge J^*(t, y(t))$, it follows that $J(t, y(t)) \ge J^*(t, y(t)) \ge \bar{J}(t, y(t))$, where $\bar{J}$ denotes the value function of the pure open-loop policy. The open-loop optimization without any further measurements provides a lower performance bound for the MPC method. The MPC method performs at least as well as the open-loop method without any further information. The MPC method solves a suboptimal control problem at each time, whereas the DP solution provides the true optimal control sequence. In the case of deterministic systems the inequalities become equalities. Similar observations to the MPC results presented above can be found in [12] and [4, Chapter 6.1].

IV. PORTFOLIO OPTIMIZATIONS USING THE MPC APPROACH

We present the MPC method applied to a multi-period problem of portfolio optimization. First, we specify the asset model used for this method. We discuss the case of constraints for $u(t)$ and probabilistic constraints for the wealth. Furthermore, we discuss only a terminal wealth objective with no in- or outflows from the portfolio.

A. Asset Model and Portfolio Model

1) Asset Model: We use the basic modelling structure presented in (1) and (4). We assume that the conditional expectation is time-varying and stochastic. The expected returns of the assets $\mu(t)$ are thus modelled by

$$\mu(t) = G x(t) + g, \tag{18}$$

where $G \in \mathbb{R}^{n \times m}$ is the factor loading matrix, $g \in \mathbb{R}^n$ is a constant vector, and $x(t) \in \mathbb{R}^m$ is the vector of factor levels. The interest rate of the bank account, described by (3), is modelled by

$$r_0(t) = F_0 x(t) + f_0, \tag{19}$$

where $F_0 \in \mathbb{R}^{1 \times m}$ and $f_0 \in \mathbb{R}$. We assume that the factors are driven by a linear stochastic process where

$$\Theta(t, x(t)) = A x(t) + a, \qquad \Psi(t, x(t)) = \nu, \tag{20}$$

where $A \in \mathbb{R}^{m \times m}$, $a \in \mathbb{R}^m$, and $\nu \in \mathbb{R}^{m \times m}$. Moreover, we assume that both white noise processes possess a Gaussian distribution.

2) Portfolio model: For the portfolio optimization without transaction costs, we use the nonlinear wealth dynamics given in (8). We express the portfolio return as

$$R(t) = r_0(t) + u^T(t)\big(r(t) - \mathbf{1} r_0(t)\big) = F_0 x(t) + f_0 + u^T(t)\big(F x(t) + f\big) + u^T(t)\,\varepsilon^r(t), \tag{21}$$

where $F = G - \mathbf{1} F_0$ and $f = g - \mathbf{1} f_0$. To simplify calculations, we replace $\ln(1 + R(t))$ by the following Taylor series approximation (around the mean of $R(t)$):

$$\ln\big(1 + R(t)\big) \approx R(t) - \tfrac{1}{2} R(t)^2. \tag{22}$$

Also, we replace $R(t)^2$ by its conditional expectation, i.e., $\mathrm{Var}[R(t)]$. The same approximation has also been used in [13], where the authors prove that in the case of Gaussian white noise terms ($\varepsilon^r(t)$), the wealth dynamics are exact. The following wealth dynamics are obtained:

$$w(t+1) = w(t) + F_0 x(t) + f_0 + u^T(t)\big(F x(t) + f\big) - \tfrac{1}{2} u^T(t)\Sigma(t)u(t) + u^T(t)\,\varepsilon^r(t), \tag{23}$$

where $\mathrm{Var}[R(t)] = u^T(t)\Sigma(t)u(t)$. The portfolio dynamics are fully described by

$$\begin{aligned} w(t+1) &= w(t) + F_0 x(t) + f_0 + u^T(t)\big(F x(t) + f\big) - \tfrac{1}{2} u^T(t)\Sigma(t)u(t) + u^T(t)\,\varepsilon^r(t), \\ x(t+1) &= A x(t) + a + \varepsilon^x(t). \end{aligned} \tag{24}$$

In addition, we assume that the white noise processes for the asset returns and the factor dynamics are not independent. For this reason, we normalize the two processes and write $\varepsilon^x(t) = \nu\,\xi^x(t)$, with $\nu \in \mathbb{R}^{m \times m}$, where the standard residuals are characterized by $E[\xi^x(t)] = 0$ and $E[\xi^x(t)\xi^{xT}(t)] = I \in \mathbb{R}^{m \times m}$. The structure of the random process for the asset returns is $\varepsilon^r(t) = \sigma(t)\,\xi^r(t)$, where we assume that $E[\xi^r(t)] = 0$ and $E[\xi^r(t)\xi^{rT}(t)] = I \in \mathbb{R}^{n \times n}$. In order to introduce dependence between the two white noise processes, we assume that the standard residuals of the two white noise processes are correlated, i.e., $E[\xi^r(t)\xi^{xT}(t)] = \rho$.

B. Portfolio mean and variance

For the system of equations that describe the portfolio dynamics given in (24) we calculate the mean and the variance. In the case that the two white noise variables $\varepsilon^x(t)$ and $\varepsilon^r(t)$ are Gaussian, we know that the portfolio distribution conditioned on the future asset allocation $u(\tau)$ is Gaussian. Then the first two moments completely describe the portfolio distribution. The portfolio equations (24) can be written as

$$\underbrace{\begin{bmatrix} w(t+1) \\ x(t+1) \end{bmatrix}}_{y(t+1)} = \underbrace{\begin{bmatrix} 1 & F_0 + u^T(t)F \\ 0 & A \end{bmatrix}}_{A_y} \underbrace{\begin{bmatrix} w(t) \\ x(t) \end{bmatrix}}_{y(t)} + \underbrace{\begin{bmatrix} f_0 + u^T(t) f - \tfrac{1}{2} u^T(t)\Sigma(t)u(t) \\ a \end{bmatrix}}_{a_y} + \underbrace{\begin{bmatrix} u^T(t)\,\varepsilon^r(t) \\ \varepsilon^x(t) \end{bmatrix}}_{\varepsilon(t+1)}, \tag{25}$$

or even more concisely as

$$y(t+1) = A_y\,y(t) + a_y + \varepsilon(t+1).$$

Note that


$$E_t\big[\varepsilon(t+1)\varepsilon^T(t+1)\big] = \Omega(t) = \begin{bmatrix} u^T(t)\Sigma(t)u(t) & u^T(t)\sigma(t)\rho\nu \\ \nu^T\rho^T\sigma^T(t)u(t) & \nu\nu^T \end{bmatrix}. \tag{26}$$

1) Portfolio mean: The conditional mean, $m(\tau) = E_t[y(\tau)]$, for a time $\tau$ can be computed as $m(\tau+1|t) = A_y\,m(\tau|t) + a_y$ and is obtained by

$$\begin{aligned} m^{(w)}(\tau+1|t) &= m^{(w)}(\tau|t) + F_0\,m^{(x)}(\tau|t) + f_0 + u^T(\tau)\big(F\,m^{(x)}(\tau|t) + f\big) - \tfrac{1}{2} u^T(\tau)E_t[\Sigma(\tau)]u(\tau) \\ m^{(x)}(\tau+1|t) &= A\,m^{(x)}(\tau|t) + a, \end{aligned} \tag{27}$$

where $m^{(w)}(\tau) = E_t[w(\tau)]$ and $m^{(x)}(\tau) = E_t[x(\tau)]$. We compute the mean at time $t+T$, where $T$ denotes the horizon. By iterating (27) $(T-1)$ times we obtain

$$m^{(w)}(t+T) = w(t) + f_0 T + \sum_{i=0}^{T-1} \Big[ \big(F_0 + u^T(t+i)F\big)\Big(A^i x(t) + \sum_{j=0}^{i-1} A^{i-1-j} a\Big) + u^T(t+i) f - \tfrac{1}{2} u^T(t+i)E_t[\Sigma(t+i)]u(t+i) \Big], \tag{28}$$

where the fact that $m^{(x)}(t|t) = x(t)$ and $m^{(w)}(t|t) = w(t)$ is already used.

2) Portfolio variance: The dynamic evolution of the conditional covariance matrix is obtained by

$$V(\tau+1|t) = A_y V(\tau|t) A_y^T + \Omega(\tau), \tag{29}$$

where $\tau > t$ is a future time and $V(\tau|t) = E_t\big[(y(\tau) - E_t[y(\tau)])(y(\tau) - E_t[y(\tau)])^T\big]$. The dynamics of the covariance matrix are computed conditioned on the information at time $t$. The variance of the portfolio system can be decomposed into

$$V(\tau|t) = \begin{bmatrix} V^{(w)}(\tau|t) & V^{(wx)}(\tau|t) \\ V^{(wx)T}(\tau|t) & V^{(x)}(\tau|t) \end{bmatrix},$$

where the superscripted indices in parentheses indicate the variance of $w$, the covariance between $w$ and $x$, and the covariance matrix of $x$. We use (29), calculate the solutions of the difference equation by induction, and get

$$\begin{aligned} V^{(x)}(t+T|t) &= \sum_{i=0}^{T-1} A^i \nu\nu^T (A^T)^i \\ V^{(wx)}(t+T|t) &= \sum_{i=0}^{T-1} \big(F_0 + u^T(t+i)F\big) V^{(x)}(t+i|t)\,A^{T-i} + \sum_{i=0}^{T-1} u^T(t+i)\sigma(t+i)\rho\nu\,A^{T-1-i} \\ V^{(w)}(t+T|t) &= \sum_{i=0}^{T-1} \Big[ 2\big(F_0 + u^T(t+i)F\big)\big(V^{(wx)}(t+i|t)\big)^T + \big(F_0 + u^T(t+i)F\big) V^{(x)}(t+i|t)\big(F_0^T + F^T u(t+i)\big) + u^T(t+i)\Sigma(t+i)u(t+i) \Big], \end{aligned}$$

where $T$ denotes the investment horizon and we already used the initial value $V(t|t) = 0$. In order to compute the variance of the wealth equation, we insert $V^{(x)}(t+T|t)$ and $V^{(wx)}(t+T|t)$ into $V^{(w)}(t+T|t)$, and write

$$\begin{aligned} V^{(w)}(t+T|t) = \sum_{i=0}^{T-1} \Big[ & 2\big(F_0 + u^T(t+i)F\big) \sum_{j=0}^{i-1} \Big( A^{i-1-j} \nu^T\rho^T\sigma^T(t+j)u(t+j) + A^{i-j} \sum_{l=0}^{j-1} \big(A^l \nu\nu^T (A^T)^l\big)\big(F_0^T + F^T u(t+j)\big) \Big) \\ & + \big(F_0 + u^T(t+i)F\big) \sum_{j=0}^{i-1} \big(A^j \nu\nu^T (A^T)^j\big) \cdot \big(F_0^T + F^T u(t+i)\big) + u^T(t+i)\Sigma(t+i)u(t+i) \Big]. \end{aligned} \tag{30}$$
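As a numerical cross-check of the moment recursions (27) and (29), the following Python sketch propagates the conditional mean and covariance for the simplest case of one risky asset and one factor, so that all matrices reduce to scalars. It is an illustrative sketch only: the function name `propagate_moments` and the parameter values in the usage lines are assumptions, and the asset volatility is taken as constant.

```python
def propagate_moments(w, x, controls, A, a, nu, F0, f0, F, f, sigma, rho):
    """Scalar version of (27) and (29): one risky asset, one factor.

    `controls` holds the planned weights u(t), ..., u(t+T-1). Returns the
    conditional means of log-wealth and factor, then V_w, V_wx and V_x.
    """
    m_w, m_x = w, x
    v_w = v_wx = v_x = 0.0  # V(t|t) = 0
    for u in controls:
        c = F0 + u * F  # upper-right entry of A_y
        # Mean recursion (27).
        new_m_w = m_w + c * m_x + f0 + u * f - 0.5 * u * u * sigma ** 2
        # Covariance recursion (29), written out entry by entry.
        new_v_w = v_w + 2.0 * c * v_wx + c * c * v_x + u * u * sigma ** 2
        new_v_wx = (v_wx + c * v_x) * A + u * sigma * rho * nu
        new_v_x = A * A * v_x + nu * nu
        m_w, m_x = new_m_w, A * m_x + a
        v_w, v_wx, v_x = new_v_w, new_v_wx, new_v_x
    return m_w, m_x, v_w, v_wx, v_x


# Hypothetical monthly parameters, chosen only to exercise the recursion.
moments = propagate_moments(
    w=0.0, x=0.03, controls=[0.5, 0.5, 0.5],
    A=0.9, a=0.01, nu=0.02, F0=0.1, f0=0.001, F=0.2, f=0.002,
    sigma=0.05, rho=-0.3,
)
print(moments)
```

In the scalar case the factor variance after three steps should equal the closed-form sum of `A**(2*i) * nu**2` for i = 0, 1, 2, which gives a quick way to validate an implementation before moving to the matrix case.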

One may notice that the portfolio variance depends on products of $u(t+j)$ and $u(t+i)$. This fact links the decision variables from one period with the decision variables from another period and thus makes this problem a true multi-period decision problem.

3) Portfolio distribution: The conditional density for the wealth equations is

$$w(t+T) \sim \mathcal{N}\big(m^{(w)}(t+T),\, V^{(w)}(t+T)\big) \tag{31}$$

$$x(t+T) \sim \mathcal{N}\big(m^{(x)}(t+T),\, V^{(x)}(t+T)\big), \tag{32}$$

where the mean and variance of $w(t+T)$ are computed by (28) and (30). The density is conditioned on the current value of the factors $x(t)$ and the asset allocation decisions $u(\tau)$, $\tau = t, t+1, \ldots, T-1$. Only for this special case are we able to derive the conditional distribution.

C. Constraints and Objectives

The MPC method is well suited for dealing with control as well as state constraints. Therefore, we propose the following state constraints for the log-wealth values:

$$P\big(w(t+i) > L(t+i)\big) \ge p_{t+i}, \qquad i = 1, \ldots, T, \tag{33}$$

where $L(t+i)$ denotes the constraint level at time $t+i$ and $p_{t+i}$ the minimum probability with which the constraint must be satisfied. Given the mean and the variance of the wealth equation, we know the distribution of the wealth. Therefore, we are capable of computing the probability above. In a finance-related context, the probabilistic constraint (33) is known as a Value-at-Risk (VaR) constraint. Mathematically, VaR is the $p_{t+i}$-quantile of the portfolio at time $t+i$. For a given confidence level $p_{t+i}$, we specify the minimum amount of the log-wealth $L(t+i)$ which we want to attain.

Proposition 1: Given that the wealth is normally distributed, (33) can be computed as

$$m^{(w)}(t+i) + \sqrt{V^{(w)}(t+i)}\;\Phi^{-1}\big(1 - p_{t+i}\big) \ge L(t+i), \tag{34}$$

where $\Phi^{-1}(\cdot)$ denotes the inverse cumulative distribution function of a normal distribution with zero mean and unit variance. The constraint given by (33) is equivalent to (34). Furthermore, the constraint (34) is a convex constraint.
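The equivalence between the probabilistic form (33) and the deterministic form (34) can be checked numerically for given moments. The sketch below uses only the Python standard library (`statistics.NormalDist`); the function names and the numerical values are illustrative assumptions, not taken from the case study.

```python
from statistics import NormalDist


def var_constraint_holds(mean_w, var_w, level, prob):
    """Deterministic form (34): mean + sqrt(variance) * Phi^{-1}(1 - p) >= L."""
    quantile = NormalDist().inv_cdf(1.0 - prob)
    return mean_w + var_w ** 0.5 * quantile >= level


def prob_above(mean_w, var_w, level):
    """Left-hand side of (33) for a Gaussian log-wealth."""
    return 1.0 - NormalDist(mean_w, var_w ** 0.5).cdf(level)


# Assumed moments: mean log-wealth 0.05 with standard deviation 0.02.
print(var_constraint_holds(0.05, 0.0004, 0.0, 0.99), prob_above(0.05, 0.0004, 0.0))
print(var_constraint_holds(0.05, 0.0004, 0.01, 0.99), prob_above(0.05, 0.0004, 0.01))
```

With these assumed numbers the first level is attainable at 99% confidence and the second is not, and in both cases the two formulations agree.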


The proof of this proposition is given in [14, Chapter 4] and follows standard arguments from convex analysis. In real-world portfolio selection problems, constraints on the asset allocation are imposed, e.g., no short-selling, maximum investments in individual assets, or leverage limits. Often the investments in certain investment classes, such as stocks or international investments, are restricted. Therefore, we assume that linear constraints are imposed such as

$$C\,u(t+i) \le c, \tag{35}$$

where $C$ is a matrix and $c$ a vector. The objective is to maximize a risk-averse objective function, which balances the expected return and the possible risks. In the case of a normal distribution, the density is uniquely described by its conditional mean and variance. This corresponds to maximizing a classical mean-variance objective. Mathematically, we obtain

$$\max_{u(t+i)} \; m^{(w)}(t+T) + \tfrac{1}{2}\lambda\,V^{(w)}(t+T), \tag{36}$$

where $\lambda \le 1$ denotes the level of risk aversion. When we use this objective function, we do not need the probabilistic constraint to describe a meaningful optimization problem, where future gains (returns) and losses are balanced. The objective function is linear-quadratic in $u(t+i)$. In the case $\lambda = 1$, the objective function may only be used in connection with the state constraint, since otherwise we would not balance expected returns and possible risks.

D. Portfolio optimization problem

We discuss two portfolio optimization problems. Based on the description of objective and constraint, we combine them in order to obtain suitable portfolio optimization problems. Note that for the first objective function, the control decisions are independent of the current value of the wealth. For this reason, future asset allocation decisions do not depend on the trajectory of the portfolio, but solely on the current tradeoff between satisfying the constraints and maximizing the objective. When we define the constraints relative to the current wealth, e.g., $L(t+i) - w(t+i)$, the control decisions are again independent of the current value of the log-wealth.

P1 The first optimization problem is mathematically described as

$$\begin{aligned} \max_{u(t+i)} \; & m^{(w)}(t+T) + \tfrac{1}{2}\lambda\,V^{(w)}(t+T) \\ \text{s.t.} \quad & C\,u(t+i) \le c, \end{aligned} \tag{37}$$

where $C$ and $c$ can be used to impose linear constraints on the asset allocation variable $u(t+i)$. This optimization problem is often encountered in the literature and has been solved in the case of no constraints on $u(t+i)$. It is often called the "Strategic Asset Allocation" problem, an expression describing a portfolio optimization problem with time-varying returns and objectives typical for long-term investments, see [15].

P2 For the second optimization problem, we combine the objective of maximizing the return ($\lambda = 0$) with VaR constraints. The portfolio optimization problem is mathematically given as

$$\begin{aligned} \max_{u(t+i)} \; & m^{(w)}(t+T) \\ \text{s.t.} \quad & -m^{(w)}(t+i) - \sqrt{V^{(w)}(t+i)}\;\Phi^{-1}\big(1 - p_{t+i}\big) + L(t+i) \le 0, \\ & C\,u(t+i) \le c. \end{aligned} \tag{38}$$

This optimization problem can be applied to classical asset allocation for mutual funds, where the investor tries to maximize the future return while keeping the VaR under a certain limit ($L(t+i)$) for all future periods. When we apply the MPC algorithm in this manner, we always optimize the portfolio with a horizon of $T$ steps ahead. Alternatively, we could optimize the portfolio distribution for a given fixed calendar time, e.g., January 2034. Then the horizon of the optimization would shrink by one step at every cycle of the algorithm. The disadvantage of the fixed calendar time horizon is that we may initially have to compute a prohibitively large number of future asset allocation decisions.

V. CASE STUDY WITH US ASSET MARKET DATA

For the case study based on US asset market data, we want to simulate the situation of a balanced fund (strategy fund). Balanced funds invest among stocks, bonds, and cash. Sometimes, to a small degree, alternative investments are allowed, as well as investments outside the native market. In this case study, we assume that the fund only invests in domestic US assets.

A. Data set and data analysis

The data set consists of 6 indices and starts on 1/1/1982 and ends on 31/12/2004. The indices are the S&P 500 total return index (S&P 500 corrected for dividend payments), the 5- and 10-year constant maturity US Treasury bond indices (total return), the Goldman Sachs Commodity index (total return), and the Moody's BAA 10-year corporate bond index (total return). The case study starts on 1/1/1990 and ends on 31/12/2004 with a monthly frequency; however, the total time series data starts on 1/1/1982. All asset data is tested for normality using the Jarque-Bera test (J-B) and the Lilliefors test (LF), see [16, Chapter 10.1], with the usual confidence level of 5%. The results are reported in [14, Chapter 5], and we do not reject the assumption of normality for most of the indices and time periods.

The factors used to explain the expected returns of the five risky assets are given in Table II. The results of the normality tests for the standard residuals of (4) are also given for the time period between 1982 and 1989. Only the Federal Fund rate has not been tested for normality, since it is not a stochastic process.

B. Implementation

The implementation of the out-of-sample test for the portfolio allocation method consists of two main steps: the parameter estimation and the computation of the asset allocation.


TABLE II
FACTORS FOR CASE STUDY 2 WITH US DATA.

No.  Factor                                   J-B (5%)  LF (5%)
1    1-m TB int. rates                        1         1
2    10-year TB int. rates                    0         0
3    spr. 2-year 10-year TB int. rates        1         0
4    spr. 5-year 10-year TB int. rates        1         0
5    spr. Moody BAA - 10-year TB int. rates   1         0
6    Federal Fund rate                        -         -
7    US $-JPY FX                              0         1
8    US $-Euro/DEM FX                         0         1
9    Oil (Brent) prices                       1         0
10   Gold prices                              1         0
11   S&P 500 EP-ratio                         0         0
12   S&P 500 DDY                              0         0
13   3-m momentum. S&P 500                    0         0
14   3-m momentum. GSCI                       0         0
15   3-m momentum. 5 year TB index            0         0
16   3-m momentum. 10year TB index            0         0
17   3-m momentum. Moody BAA index            0         0
18   12-m momentum. S&P 500                   0         0
19   12-m momentum. GSCI                      0         0
20   12-m momentum. 5 year TB index           0         0
21   12-m momentum. 10year TB index           0         0
22   12-m momentum. Moody BAA index           0         0

1) Parameter estimation and dependence: Once we have chosen all of the factors that we need to predict the expected returns of the risky assets, we still need to estimate the parameters of (4) with (20). When we use a large set of factors, it is important that we do not introduce questionable statistical dependencies. For this reason, we estimate $A$ and $\nu$ with a maximum likelihood (ML) method, starting with an unrestricted model. We then use an iterative procedure, where one insignificant factor, usually the one with the highest p-value¹, is removed and the ML estimates for the remaining parameters are recalculated, until all insignificant parameters at the 5% (or 1%) significance level have been removed. The parameter estimation for the returns and the factor dynamics is computed in every step of the out-of-sample test with 8 years of past data. The factor selection determines which factors are used and estimated.

¹ The p-value is the probability of rejection of the estimated value.

TABLE III
SUMMARY STATISTICS OF THE 5 INDICES FROM 1/1/1990 TO 31/12/2004

Time series              r (%)   σ (%)   SR     krt    skw
S&P 500                  10.8    14.3    0.45   0.52   -0.36
Goldman Sachs            7.8     18.3    0.19   0.5    0.12
5-year Treasury Bond     7.5     8.4     0.39   0.22   -0.02
10-year Treasury Bond    7.9     7.4     0.48   0.66   -0.36
10-year BAA Moody        8.87    4.9     0.93   1.27   -0.43
1-month Treasury notes   4.1     -       -      -      -

2) Asset allocation strategy: Given the factor selection, we first estimate all relevant parameters and then compute the asset allocation. The asset allocation decisions are then used in an out-of-sample test and we record the portfolio performances. In this way, the algorithm moves one step forward until we have to select the factors again.

C. Results of out-of-sample test

In the out-of-sample test from 1/1/1990 until 1/12/2004 with a monthly frequency, we use the MPC method. The portfolio optimization problem is to maximize the expected return while limiting the VaR of the portfolio at a confidence level of 99%. This corresponds to the portfolio optimization problem P2. We compute the VaR limit based on the previous value of the portfolio. In this manner, the optimization does not depend on the current value of the portfolio, since the risk limit is defined relative to the current portfolio value. We define three relative limits, namely -2.0%, -5.0%, and -10% of portfolio loss with respect to the current portfolio value. Furthermore, we assume that the MPC strategy is computed with a two-year horizon. The VaR limit is defined not only at the end of the investment horizon but also for intermediate periods. In this way, we try to limit the risk exposure not only at the terminal time but also in between. Moreover, we impose the following maximum limits for investments (in percentage of the portfolio value): 60% stock market, 20% commodity index, 100% for both Treasury Bond indices, and 80% for the Moody’s index. We do not allow any short selling or leveraging.

In order to compare the portfolios, we calculate two benchmarks. The compositions of the two benchmarks are given in Table IV; they differ mainly in their stock market weights and their bond index weights.

TABLE IV
INVESTMENT WEIGHTS OF THE BENCHMARK COMPOSITION.

              Benchmark 1   Benchmark 2
S&P 500       45%           30%
GS Comm.      5%            10%
5y TB         5%            10%
10yr TB       30%           30%
10y BAA       5%            10%
Cash          10%           10%

The performance results of the portfolio test and the benchmarks are given in Table V, where r denotes the average return, σ denotes the volatility, SR denotes the Sharpe ratio, m.ls. denotes the maximum loss, EVaR denotes the empirical one-month VaR with 99% confidence level, and V.v. denotes how often the theoretical VaR limit is violated. In order to compare the results of the portfolios, we show the summary statistics of the six asset classes in Table III.

TABLE V
PORTFOLIO RESULTS AND STATISTICS OF THE OUT-OF-SAMPLE TEST.

Time series          r      σ     SR     m.ls.    EVaR     V.v.
Portfolio 1 (-2%)    8.17   5.0   0.77   -4.4%    -3.1%    3.5%
Portfolio 2 (-5%)    10.6   7.5   0.84   -6.8%    -4.6%    2.3%
Portfolio 3 (-10%)   10.7   8.0   0.79   -6.8%    -5.1%    1.1%
Benchmark 1          8.5    7.1   0.61   -4.6%    -4.1%    -
Benchmark 2          8.0    5.7   0.66   -3.4%    -3.0%    -
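As an illustration of how the statistics reported in Table V can be obtained from a series of monthly portfolio returns, consider the following minimal sketch. The function name `summary_statistics`, the annualization by a factor of 12 (and its square root for the volatility), the default risk-free rate of zero, and the use of a linearly interpolated empirical quantile are all assumptions made for this example; the text here does not state which conventions were used in the case study.

```python
import numpy as np


def summary_statistics(returns, var_limit=None, risk_free=0.0,
                       periods_per_year=12, confidence=0.99):
    """Summary statistics of a return series (assumed conventions).

    returns   : per-period (e.g. monthly) simple returns
    var_limit : scalar or per-period array of relative VaR limits (e.g. -0.05)
    risk_free : annualized risk-free rate used in the Sharpe ratio
    """
    returns = np.asarray(returns, dtype=float)
    mean_ann = returns.mean() * periods_per_year
    vol_ann = returns.std(ddof=1) * np.sqrt(periods_per_year)
    stats = {
        "r": mean_ann,
        "sigma": vol_ann,
        "SR": (mean_ann - risk_free) / vol_ann,
        # Worst single-period return.
        "max_loss": returns.min(),
        # Empirical VaR: the (1 - confidence) quantile of the returns.
        "EVaR": np.quantile(returns, 1.0 - confidence),
    }
    if var_limit is not None:
        # Fraction of periods in which the loss exceeded the VaR limit.
        stats["violations"] = float(np.mean(returns < np.asarray(var_limit)))
    return stats
```

Under a correctly specified model, the violation frequency should be close to one minus the confidence level, that is, close to 1% for a 99% VaR limit.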

The results are quite promising, since the portfolios have higher Sharpe ratios than both of the benchmarks. The returns of Portfolio 3 are a little lower than the average return of the S&P 500 in this period. However, the S&P 500 index outperforms the three portfolios considerably until its peak in 2000, as Figure 1 shows. Portfolio 2 is omitted from the figure since its evolution resembles that of Portfolio 3. The Sharpe ratios of the three portfolios all beat the benchmarks and most of the indices used for this historical simulation. Portfolios 2 and 3 both outperform the two benchmarks with respect to average return as well as risk-adjusted return. The time-varying asset allocation is shown in Figure 2.

[Figure 1: index value versus time (Jun 1990 to Feb 2004) for Portfolio 1, Portfolio 3, the S&P 500, Benchmark 1, and Benchmark 2.]

Fig. 1. Results of the out-of-sample test and comparison to other assets and benchmarks

[Figure 2: investments in % versus time (Jun 1990 to Feb 2004) in three panels: "Investment Stock Market"; "Investment Commodities and Money Market"; "Investment Bonds" (5-year Treasury Bond, 10-year Treasury Bond, Moody’s 10-year BAA index).]

Fig. 2. Asset allocation of portfolio 3

Very often, it is the investment constraints, and not the VaR constraints of -5% and -10%, that limit the allocation to the two riskiest assets (the S&P 500 and GSCI indices). Thus, Portfolios 2 and 3 often have similar investment decisions and performance results. However, the theoretical rate of 1% violations of the VaR limit, which follows from the 99% confidence level, does not hold in the out-of-sample test. All three portfolios have higher empirical violation rates, which indicates that the distributions of returns and factors diverge somewhat from the normality assumption.

VI. CONCLUSION

In this paper, we have shown the application of Model Predictive Control to problems of dynamic portfolio optimization. At first we proved that MPC is a suboptimal control strategy which uses the new information advantageously. For a linear Gaussian factor model, we derived the wealth dynamics and the conditional mean and variance. We stated the portfolio optimization, where an investor maximizes the mean-variance objective while keeping the portfolio Value-at-Risk under a given limit. The portfolio optimization is applied in a case study to real-world asset market data, where the MPC method showed promising results. The out-of-sample test showed that the portfolios generated a very good Sharpe ratio and sufficient returns.

REFERENCES

[1] D. Mayne, J. B. Rawlings, C. Rao, and P. Scokaert, “Constrained Model Predictive Control: Stability and optimality,” Automatica, vol. 36, pp. 789–814, 2000.
[2] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The Explicit Linear Quadratic Regulator for Constrained Systems,” Automatica, vol. 38, pp. 3–20, 2002.
[3] D. G. Luenberger, Investment Science. Oxford University Press, 1998.
[4] D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, Massachusetts: Athena Scientific, 1995, vol. I.
[5] R. C. Merton, Continuous-Time Finance, 2nd ed. Blackwell Publishers Inc., Oxford UK, 1992.
[6] D. van Hessem and O. Bosgra, “Closed-loop stochastic dynamic process optimization under input and state constraints,” in Proceedings of the American Control Conference, 20, 2002, pp. 2023–2028.
[7] P. Li, M. Wendt, and G. Wozny, “Robust Model Predictive Control Under Chance Constraints,” Computers and Chemical Engineering, vol. 24, pp. 829–834, 2000.
[8] A. J. Felt, “Stochastic Linear Model Predictive Control Using Nested Decomposition,” in Proceedings of the American Control Conference, 20, 2003, pp. 3602–3607.
[9] D. A. Castanon and J. M. Wohletz, “Model Predictive Control for Dynamic Unreliable Resource Allocation,” in Proceedings IEEE Conference Decision and Control, 41, 2002, pp. 3754–3759.
[10] P. Seferlis and N. F. Giannelos, “A Two-Layered Optimisation-Based Control Strategy for Multi-Echelon Supply Chain Networks,” Computers and Chemical Engineering, vol. 28, pp. 799–809, 2004.
[11] P. J. Meindl and J. A. Primbs, “Dynamic Hedging with Transaction Costs using Receding Horizon Control,” in Proceedings of the 2nd IASTED International Conference on Financial Engineering and Applications - FEA 2004, Cambridge, M. Hamza, Ed. Calgary, Canada: IASTED, 2004, pp. 142–147.
[12] C. C. White and D. P. Harrington, “Applications of Jensen’s Inequality to Adaptive Suboptimal Design,” Journal of Optimization Theory and Applications, vol. 30, pp. 89–99, 1980.
[13] J. Y. Campbell and L. M. Viceira, Strategic Asset Allocation: Portfolio Choice for Long-Term Investors. Oxford University Press, 2002.
[14] F. Herzog, “Strategic Portfolio Management for Long-Term Investments: An Optimal Control Approach,” Ph.D. dissertation, ETH Zürich, ETH Diss. No. 16137, 2005.
[15] M. J. Brennan, E. S. Schwartz, and R. Lagnado, “Strategic asset allocation,” Journal of Economic Dynamics and Control, vol. 21, pp. 1377–1403, 1997.
[16] C. Alexander, Market Models. Chichester, UK: John Wiley and Sons, 2001.
