A Linear Programming Model for Selecting Sparse High-Dimensional ...

High-Dimensional Static and Dynamic Portfolio Selection Problems via ℓ1 Minimization

Chi Seng Pun^a, Hoi Ying Wong^{a,∗}

^a Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong

Abstract

This paper studies the mean-variance (MV) portfolio problems under static and dynamic settings, particularly for the case in which the number of assets (p) is larger than the number of observation times (n). We prove that the classical plug-in estimation seriously distorts the optimal MV portfolio, in the sense that the probability that the plug-in portfolio outperforms the bank deposit tends to 50% for p ≫ n and a large n. We investigate a constrained ℓ1 minimization approach for directly estimating the effective parameters appearing in the optimal portfolio solution. Similar to the Dantzig Selector, the estimator is efficiently implemented with linear programming, and the resulting portfolio is called the linear programming optimal (LPO) portfolio. We derive the consistency and rate of convergence for the LPO portfolios. The LPO procedure essentially filters out unfavorable assets based on the MV criterion, resulting in a sparse portfolio. The advantages of the LPO portfolio include its computational efficiency and its applicability to dynamic portfolio problems and to non-Gaussian distributions of asset returns. Simulations validate the theory and its finite-sample properties. Empirical studies show that the LPO-based portfolios outperform the equally weighted portfolio, the MV portfolios using shrinkage estimators and other competitive estimators.

Keywords: High-Dimensional Portfolio Selection, Dynamic Mean-Variance Portfolio, ℓ1 Minimization, Dantzig Selector, Sparsity

∗ Corresponding author. Email addresses: [email protected] (Chi Seng Pun), [email protected] (Hoi Ying Wong)

1. Introduction

Portfolio theory is central to asset pricing in financial economics and is the building block of the capital asset pricing model. The theory assumes that the parameters of the underlying stochastic model are known and contain no estimation error. For example, the optimal mean-variance (MV) portfolio assumes that the mean vector and the covariance matrix of the asset returns are known (see Markowitz (1989)). Extensions of the MV problem to dynamic settings (multi-period and continuous-time models) maintain the same assumption, as in Li and Ng (2000), Zhou and Li (2000) and Basak and Chabakauri (2010). An alternative portfolio selection approach maximizes an expected utility; there, too, the parameters are assumed to be known (see Merton (1990) for further details).

When a trading strategy is applied to real data, point estimators are often plugged into the theoretical solution. For instance, empirical studies of MV portfolios often substitute the sample mean vector and the sample variance-covariance matrix into the optimal trading strategy. Michaud (1989) and Chopra and Ziemba (1993) find that the estimation error is usually so large that the optimized strategy is no longer optimal. For portfolio management, parameter uncertainty and aversion to ambiguity, such as those used in Garlappi et al. (2007), are plausible approaches to address the issue of estimation error in the mean. Unfortunately, the empirical performance of optimized strategies is disappointing even after incorporating a Bayesian framework. DeMiguel et al. (2009a) empirically examine various types of theoretical strategies and, surprisingly, find that the equally weighted (EW) strategy of investing equal amounts of money in each available asset outperforms the sample-based (plug-in) strategies across many datasets. Moreover, neither the Bayesian approach nor the portfolio constraints improve the performance significantly.
The minimum-variance strategy, which does not involve the mean estimates, generally performs the best among the various types of theoretical strategies, but it is still unable to beat the EW strategy, which is free of estimation error. The estimation error problem is more pronounced for a large portfolio because it requires a huge number of observations. Although more observations can be drawn from a longer estimation window, Broadie (1993) points out that this increases the possibility of nonstationarity

in the estimated parameters. In other words, a fundamental challenge to obtaining reliable estimates is the curse of dimensionality, as the number of securities (p) is greater than the number of observation times (n). For a market with p securities, the number of parameters for an MV portfolio equals p(p + 1)/2 + p. According to World Bank data for 2012, the total number of listed domestic companies in the world was 47,520, of which 4,102 were in the USA. Given 260 trading days a year, 15 years of daily data is the minimal requirement for estimating the covariance matrix for the US stock market, and 150 years of daily data is the minimal requirement for the world. Nevertheless, practitioners tend to use datasets from a few recent years. Therefore, the estimation issue for MV portfolios is typically a high-dimensional statistical problem, as p ≫ n. The effect of estimation in high-dimensional portfolios on the expected utility of terminal wealth is also studied in Gandy and Veraart (2013) and Dubois and Veraart (2015). This paper addresses this high-dimensional portfolio selection problem.

Imposing a sparsity constraint on the covariance matrix is a possible remedy to high-dimensional portfolio problems. Ledoit and Wolf (2004) shrink the sample covariance matrix before substituting it into the optimal trading strategy. Although the shrinkage covariance estimate converges to the true covariance matrix under certain sparsity constraints, it is unclear whether the resulting trading strategy tends to the true trading strategy asymptotically. In addition, the shrinkage approach suggests investing in all assets available in the market. However, Boyle et al. (2012) find that ambiguity-averse investors tend to invest in a few risky assets when correlations are high. Fouque et al. (2014) show mathematically that the robust optimal strategy with respect to assets' correlations may suggest investing in a few assets instead of all assets available in the market.
Therefore, correlation uncertainty may lead to a sparse optimal portfolio. An alternative approach imposes portfolio constraints. Jagannathan and Ma (2003) find that imposing the no-short-selling constraint reduces the risk in estimated optimal portfolios. DeMiguel et al. (2009b) generalize this result to norm-constrained portfolios. Fan et al. (2012) study portfolio optimization with ℓ1 norm constraints and its statistical properties. Gandy and Veraart (2013) and Dubois and Veraart (2015) propose utility-maximizing portfolios with ℓ1 norm constraints to reduce the effect of estimation, but these are still unable to beat the EW portfolio consistently in their empirical studies. Generally speaking, the existing literature solves the problem due to high

dimensionality by reformulating it in the framework of the least absolute shrinkage and selection operator (LASSO) method. As these results are specific to static or constant rebalancing strategies, our goal is to devise a general approach to both static and dynamic strategies.

The parameter estimation for p > n belongs to the field of high-dimensional statistics, which has recently generated rich innovative results. In particular, the high-dimensional sample covariance matrix is known to be singular and unstable. Consistent estimators for the covariance matrix (Σ) often require some special structures on the data, such as banding (Bickel and Levina (2008a)), thresholding (Bickel and Levina (2008b), El Karoui (2008), and Shao et al. (2011)) and tapering (Cai et al. (2010)). The implementation of MV portfolios is even more complicated because it essentially needs the precision matrix (Ω := Σ^{−1}). Penalized likelihood methods are used to estimate sparse precision matrices in d'Aspremont et al. (2008), Friedman et al. (2008), and Rothman et al. (2008). Cai et al. (2011) and Cai and Liu (2011) propose a constrained ℓ1 minimization to estimate the precision matrix and the effective parameters in Fisher's linear discriminant rule, respectively.

Inspired by these statistical advances, we propose and investigate a novel constrained ℓ1 minimization approach for high-dimensional portfolios that directly estimates the effective parameters appearing in the optimal trading strategies. The estimator is formulated similarly to the Dantzig Selector introduced in Candès and Tao (2007), and can be efficiently implemented using linear programming. We call the new estimator the linear programming optimal (LPO) estimator and the corresponding estimated optimal portfolio the LPO portfolio. The proposed approach imposes no constraints on the original problem of interest and can be applied to fat-tailed asset returns.
Therefore, its applicability is not limited to portfolio problems that can be reformulated as LASSO-type problems, and it particularly includes the optimal dynamic trading strategies. The LPO procedure essentially looks for the sparsest solution that best approximates the theoretical optimal portfolio. To highlight the advantages of the LPO portfolios, we first show the pitfalls associated with the traditional plug-in approach, which directly substitutes sample estimates into the optimal portfolio strategy. Although the literature has already shown the problem of the estimation error in the sample mean, to the best of our knowledge, the effect of estimation error associated with the

sample covariance matrix has yet to be investigated. The estimation error in the covariance matrix is important in its own right because the global minimum-variance (GMV) portfolio does not involve the mean vector. We prove that the plug-in high-dimensional MV portfolio degenerates to the random stock picking strategy, such that the probability of beating the bank deposit tends to 50%.

This paper's contributions include proposing the LPO estimator for optimal portfolios and proving the oracle property that the LPO portfolio asymptotically recovers the true optimal portfolio, in both the static and dynamic settings, under standard regularity conditions commonly used in high-dimensional statistical analysis. Our simulations verify the derived theoretical results and demonstrate the superiority of the LPO portfolios. This paper also contributes to the literature by showing the advantages of the LPO portfolio and addressing the open problem posed by DeMiguel et al. (2009a). Our empirical study uses datasets from the S&P 500 and Russell 2000 component stocks to show that the LPO portfolio consistently outperforms the EW portfolio. DeMiguel et al. (2009a) find that the EW portfolio is so good that optimal portfolios hardly beat it empirically. However, how the number of stocks and the selection of stocks for the EW portfolio should be determined remains unclear. They conjecture that the larger the portfolio size, the better the performance of the EW portfolio. Our empirical results show that the EW portfolio on stocks selected by the LPO procedure performs much better, in terms of the Sharpe ratio and certainty equivalent rate, than the EW portfolio on all available assets. Moreover, portfolio performance can be further enhanced by additional statistical treatments on stocks selected by the LPO procedure. Therefore, the LPO approach has advantages not only in managing (ultra-) high-dimensional portfolios, but also in filtering out unfavorable stocks.
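The linear-programming formulation behind the LPO estimator can be illustrated with a small sketch. In Dantzig-selector form, the estimator minimizes |η|_1 subject to |Σ̂η − β̂|_∞ ≤ λ, which becomes a linear program once η is split into its positive and negative parts. The function name `lpo_estimate`, the toy inputs, and the use of SciPy's `linprog` are assumptions made for illustration; the paper does not prescribe a particular solver.

```python
import numpy as np
from scipy.optimize import linprog


def lpo_estimate(sigma_hat, beta_hat, lam):
    """Solve min |eta|_1 s.t. |sigma_hat @ eta - beta_hat|_inf <= lam as an LP.

    Hypothetical helper: eta is written as u - v with u, v >= 0, so that
    |eta|_1 equals sum(u) + sum(v) at the optimum.
    """
    p = beta_hat.shape[0]
    cost = np.ones(2 * p)                         # objective: sum(u) + sum(v)
    # Two-sided constraint -lam <= S(u - v) - b <= lam written as A_ub x <= b_ub.
    a_ub = np.block([[sigma_hat, -sigma_hat],
                     [-sigma_hat, sigma_hat]])
    b_ub = np.concatenate([lam + beta_hat, lam - beta_hat])
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError("LP infeasible or not solved: " + res.message)
    return res.x[:p] - res.x[p:]


# Toy check with an identity covariance (made-up numbers).
sigma_toy = np.eye(3)
beta_toy = np.array([1.0, 0.2, -0.5])
eta_toy = lpo_estimate(sigma_toy, beta_toy, lam=0.3)
print(np.round(eta_toy, 6))
```

With an identity covariance the constraint decouples by coordinate, so each entry is shrunk toward zero by λ and sufficiently small entries are set exactly to zero, which is the sparsity effect described above.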
The rest of this paper is organized as follows. Section 2 reviews the classical results of MV portfolio theory and highlights key observations underlying the proposed approach. Section 3 mathematically proves the pitfalls of the traditional plug-in MV portfolio. Section 4 introduces the LPO approach and proves that it is asymptotically optimal. Section 5 then presents the numerical and empirical studies that verify the aforementioned assertions. Section 6 concludes the paper. All proofs of the theorems are collected in the appendix.

2. Preliminary results

Suppose that an MV investor wants to invest his/her initial wealth $x_0$ with the target reward level $z$ over a fixed investment horizon $T > 0$. In this paper, all random variables/processes are defined on a complete (physical) probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Consider a financial market with $p$ risky assets and one risk-free asset (bank deposit). Denote $S_i(t)$ as the $i$th stock price at time $t$ for $i = 1, \ldots, p$, and $S_0(t)$ as the risk-free asset price at time $t$. Define the asset return as
\[
R_i(s,t) = \frac{S_i(t) - S_i(s)}{S_i(s)}, \quad s < t, \quad i = 0, 1, \ldots, p.
\]
Let $\{t_j\}_{j=0}^{N-1}$, where $0 = t_0 < t_1 < \ldots < t_{N-1} < t_N = T$, be the partition of $[0, T]$. We consider an investment strategy $\{u_i(t)\}_{i=0}^{p}$ whose total wealth at time $t$ is denoted by $X(t)$, where $u_i(t)$ is the cash amount invested in the $i$th asset at time $t$. Obviously, $u(t) = (u_1(t), \ldots, u_p(t))'$ characterizes the portfolio of the investor as $u_0(t) = X(t) - \sum_{i=1}^{p} u_i(t)$. In this paper, we focus on two investment strategies: static (single-period) and discrete-time dynamic (multi-period) MV strategies.

The static MV strategy determines the initial investment, $u(0)$, and keeps it unchanged until the end of the investment period. The objective is to minimize the portfolio variance, subject to a target expected portfolio return. Specifically, the asset returns satisfy the following equations:
\[
\begin{cases}
R_0(0,T) = r_{s,f}\,T, \\
R_i(0,T) = \mu_{s,i}\,T + \sigma_{s,i}\sqrt{T}\,\xi_s, \quad \text{for } i = 1, \ldots, p,
\end{cases}
\tag{1}
\]

where $r_{s,f}$ is the risk-free interest rate, $\mu_{s,i}$ is the appreciation rate of asset $i$, $\sigma_{s,i} = (\sigma_{s,i1}, \ldots, \sigma_{s,ip}) \in \mathbb{R}^p$ is the volatility of asset $i$ for $i = 1, \ldots, p$, and $\xi_s$ is a $p$-dimensional random vector with zero mean and identity covariance matrix that does not necessarily follow a Gaussian distribution. We denote the static strategy as $u_s$, which is a constant vector throughout time. The wealth after adopting the strategy $u_s$ becomes
\[
X^{u_s}(T) = u_s'(1_p + R(0,T)) + (x_0 - u_s'1_p)(1 + r_{s,f}T) = x_0(1 + r_{s,f}T) + u_s'(R(0,T) - r_{s,f}T\,1_p),
\tag{2}
\]
where $1_p = (1, \ldots, 1)' \in \mathbb{R}^p$ and $R(0,T) = (R_1(0,T), \ldots, R_p(0,T))'$ is the return vector of the risky assets with mean vector $\mu_s T$ and covariance matrix $\Sigma_s T$, in which $\mu_s = (\mu_{s,1}, \ldots, \mu_{s,p})'$ and $\Sigma_s := \sigma_s\sigma_s'$ with $\sigma_s = (\sigma_{s,1}', \ldots, \sigma_{s,p}')' \in \mathbb{R}^{p \times p}$. Hereafter, we use the subscripts $s$ and $m$ to indicate objects from the single-period and multi-period models, respectively.

The multi-period MV model allows investors to rebalance the portfolio at time $t_j$, $j = 0, 1, \ldots, N-1$, and $t_N = T$. For $j = 0, 1, \ldots, N-1$,
\[
\begin{cases}
R_0(t_j, t_{j+1}) = r_{m,f}\,\Delta_j, \\
R_i(t_j, t_{j+1}) = \mu_{m,i}\,\Delta_j + \sigma_{m,i}\sqrt{\Delta_j}\,\xi_{m,j}, \quad \text{for } i = 1, \ldots, p,
\end{cases}
\tag{3}
\]

where $\Delta_j = t_{j+1} - t_j$; $r_{m,f}$, $\mu_{m,i}$, and $\sigma_{m,i}$ are defined similarly as in model (1); and $\{\xi_{m,j}\}_{j=0}^{N-1}$ is a sequence of independent $p$-dimensional random vectors with zero mean and identity covariance matrix. The multi-period strategy is generally a set of feedback controls, $u_m = \{u_m(t_j)\}_{j=0}^{N-1}$, that depend on the state variables and moments of returns. After adopting the strategy $u_m$, the portfolio wealth is given by
\[
\begin{aligned}
X^{u_m}(t_{j+1}) &= u_m(t_j)'(1_p + R(t_j, t_{j+1})) + (X^{u_m}(t_j) - u_m(t_j)'1_p)(1 + r_{m,f}\Delta_j) \\
&= X^{u_m}(t_j)(1 + r_{m,f}\Delta_j) + u_m(t_j)'(R(t_j, t_{j+1}) - r_{m,f}\Delta_j 1_p),
\end{aligned}
\tag{4}
\]
where the risky asset return vector for each period, $R(t_j, t_{j+1}) = (R_1(t_j, t_{j+1}), \ldots, R_p(t_j, t_{j+1}))'$, has mean $\mu_m\Delta_j$ and covariance matrix $\Sigma_m\Delta_j$, such that $\mu_m = (\mu_{m,1}, \ldots, \mu_{m,p})'$ and $\Sigma_m := \sigma_m\sigma_m'$. Here for simplicity, we assume the return vector in each period has the same first two

moments. However, our proposed approach also applies to time-varying parameters, although tedious adjustments are needed. We remark here that this assumption does not affect the nature of dynamic programming in the multi-period setting, and that the multi-period optimal portfolio is theoretically superior to the myopic portfolio, which repeats the static portfolio in each period. Therefore, it is interesting to investigate the dynamic multi-period strategy. Note that these two asset return dynamics do not guarantee nonnegative stock prices, but they are considered a very efficient approximation of the stock price process (see Luenberger (1998)) and are probably the most popular and convenient market practice. Mathematically, the MV problem is stated as follows.
\[
\min_{u} \ \operatorname{Var}(X^{u}(T)) \quad \text{subject to} \quad
\begin{cases}
E[X^{u}(T)] = z, \\
(X^{u}(\cdot), u(\cdot)) \text{ satisfies (2) or (4).}
\end{cases}
\tag{5}
\]
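For the single-period case, problem (5) is a quadratic program with one linear constraint, and a Lagrange multiplier argument indicates the form of its solution. The following is a sketch under the dynamics (1) and the wealth equation (2); the multiplier $\nu$ and the shorthand $\beta_s = \mu_s - r_{s,f}1_p$ are introduced here for illustration, and $\Sigma_s$ is assumed to be invertible with $\beta_s \neq 0$.

```latex
\begin{align*}
E[X^{u_s}(T)] &= x_0(1 + r_{s,f}T) + T\,u_s'\beta_s, \qquad
\operatorname{Var}(X^{u_s}(T)) = T\,u_s'\Sigma_s u_s, \\
\mathcal{L}(u_s,\nu) &= T\,u_s'\Sigma_s u_s
  - 2\nu\Bigl(T\,u_s'\beta_s - \bigl(z - x_0(1 + r_{s,f}T)\bigr)\Bigr), \\
\nabla_{u_s}\mathcal{L} = 0 &\;\Longrightarrow\; \Sigma_s u_s = \nu\,\beta_s
  \;\Longrightarrow\; u_s = \nu\,\Sigma_s^{-1}\beta_s, \\
T\,u_s'\beta_s = z - x_0(1 + r_{s,f}T) &\;\Longrightarrow\;
  \nu = \frac{z - x_0(1 + r_{s,f}T)}{T\,\beta_s'\Sigma_s^{-1}\beta_s}.
\end{align*}
```

Under these assumptions the minimizer is proportional to $\Sigma_s^{-1}\beta_s$, so the covariance matrix enters the strategy only through this single vector.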

Throughout this paper, we assume that $z > x_0(1 + r_{s,f}T) = x_0\prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k)$. The optimal $u$ for the above problem, if it exists, is called the single-period or multi-period MV strategy. Notice that the multi-period strategy $u_m(t_j)$ often depends on $X(t_j)$, and hence is a feedback control. The single-period and multi-period MV solutions are obtained by Markowitz (1989) and Li and Ng (2000), respectively. We summarize their results in the following proposition.

Proposition 2.1. The MV portfolios $u_s^*$ and $u_m^*$ of the single-period and multi-period MV problems in (5) are given by
\[
u_s^* = \frac{z - x_0(1 + r_{s,f}T)}{T\,\beta_s'\Sigma_s^{-1}\beta_s}\,\Sigma_s^{-1}\beta_s,
\tag{6}
\]
\[
u_m^*(t_j) = -\left[ X^{u_m^*}(t_j)(1 + r_{m,f}\Delta_j) - \gamma_m \prod_{k=j+1}^{N-1}(1 + r_{m,f}\Delta_k)^{-1} \right] \frac{\Sigma_m^{-1}\beta_m}{1 + \Delta_j\,\beta_m'\Sigma_m^{-1}\beta_m},
\tag{7}
\]
for $j = 0, 1, \ldots, N-1$, where $\beta_s = \mu_s - r_{s,f}1_p$ and $\beta_m = \mu_m - r_{m,f}1_p$ are the excess returns and
\[
\gamma_m = \frac{z - x_0\prod_{j=0}^{N-1}(1 + r_{m,f}\Delta_j)(1 + \Delta_j\,\beta_m'\Sigma_m^{-1}\beta_m)^{-1}}{1 - \prod_{j=0}^{N-1}(1 + \Delta_j\,\beta_m'\Sigma_m^{-1}\beta_m)^{-1}}.
\]

Remark: we have used the convention $\prod_{k=N}^{N-1}(\cdot) = 1$.
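As an illustration of (6), the static strategy can be computed by solving one linear system rather than inverting the covariance matrix explicitly. The sketch below uses made-up parameter values and a hypothetical helper `static_mv_strategy`; it then checks that the expected terminal wealth implied by (2) equals the target z.

```python
import numpy as np


def static_mv_strategy(mu, sigma_cov, r_f, x0, z, horizon):
    """Cash amounts u_s* from (6); all inputs are illustrative assumptions."""
    beta = mu - r_f                               # excess returns beta_s
    eta = np.linalg.solve(sigma_cov, beta)        # Sigma_s^{-1} beta_s
    scale = (z - x0 * (1.0 + r_f * horizon)) / (horizon * (beta @ eta))
    return scale * eta


# Toy market with three risky assets (numbers are made up).
mu = np.array([0.08, 0.10, 0.12])
vol = np.array([[0.20, 0.00, 0.00],
                [0.05, 0.25, 0.00],
                [0.02, 0.04, 0.30]])
cov = vol @ vol.T                                 # Sigma = sigma sigma'
r_f, x0, z, horizon = 0.02, 100.0, 101.0, 1.0 / 12.0

u_star = static_mv_strategy(mu, cov, r_f, x0, z, horizon)
# Expected terminal wealth from (2): x0 (1 + r T) + T u'(mu - r 1_p).
expected_wealth = x0 * (1.0 + r_f * horizon) + horizon * (u_star @ (mu - r_f))
print(u_star, expected_wealth)
```

Solving the linear system directly is a design choice of this sketch; it is only possible when the covariance matrix is non-singular, which fails for the sample covariance matrix when p > n.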

To implement strategies (6) and (7), we have to estimate the parameter sets $(\mu_s, \Sigma_s)$ and $(\mu_m, \Sigma_m)$. These two sets are not the same in theory due to the compounding effect. However, market practice usually estimates $(\mu_s, \Sigma_s)$ and $(\mu_m, \Sigma_m)$ from the (projected) mean and covariance matrix estimated from daily returns. To simplify matters, our presentation uses $(\mu_s, \Sigma_s) = (\mu_m, \Sigma_m) =: (\mu, \Sigma = \sigma\sigma')$, although our proposed approach is generally applicable to any estimation convention. We assume that the historical return vectors with constant interval $\delta t$ are independently distributed with mean $\mu\,\delta t$ and covariance matrix $\Sigma\,\delta t$, but do not necessarily follow the Gaussian distribution. The analysis in this paper can be extended to time series data of correlated asset returns, but this is beyond the scope of this paper. The approximation error in the projection from daily return data is minimal for a short-term investment such as one month ($T = 1/12$). We also assume that $r_{s,f}$ and $r_{m,f}$ are constant. The MV strategies depend on the precision matrix $\Omega := \Sigma^{-1}$ and the excess return vector. Traditional implementation directly substitutes the sample covariance matrix (Gram matrix) and

the sample mean estimated from daily returns into the optimal strategies in Proposition 2.1. The procedure involves estimating the precision matrix $\Omega$ or taking the (generalized) inverse of the Gram matrix. However, we will show that this leads to an unexpectedly poor performance for a high-dimensional portfolio. A key observation is that the strategies in (6) and (7) are proportional to $\Omega\beta$, with $\beta$ representing either $\beta_s$ or $\beta_m$. We then let $\eta_s := \Omega\beta_s$ and $\eta_m := \Omega\beta_m$ be the effective parameters for implementing the single-period and multi-period MV portfolios, respectively, and establish a statistical estimation for them. This considerably reduces the number of parameters from $p(p+1)/2 + p$ (in separate estimation of $\Omega$ and $\beta$) to $2p$ (in $\eta$ and $\beta$). To fix the notation, $\hat\eta_s$ and $\hat\beta_s$ denote estimates of $\eta_s$ and $\beta_s$, respectively, while $\hat\eta_m$ and $\hat\beta_m$ are estimates of $\eta_m$ and $\beta_m$. Consequently, $\Theta_{s,p} := \beta_s'\Omega\beta_s$ is estimated as $\hat\beta_s'\hat\eta_s$ and $\Theta_{m,p} := \beta_m'\Omega\beta_m$ as $\hat\beta_m'\hat\eta_m$. Hence, we define the following notations.
\[
\hat u_s = u_s^*\big|_{\eta_s = \hat\eta_s,\ \beta_s = \hat\beta_s} = \frac{z - x_0(1 + r_{s,f}T)}{T\,\hat\beta_s'\hat\eta_s}\,\hat\eta_s,
\tag{8}
\]
\[
\hat u_m(t_j) = u_m^*(t_j)\big|_{\eta_m = \hat\eta_m,\ \beta_m = \hat\beta_m} = \left[ X^{\hat u_m}(t_j)(1 + r_{m,f}\Delta_j) - \frac{\hat\gamma_m}{\prod_{k=j+1}^{N-1}(1 + r_{m,f}\Delta_k)} \right] \frac{-\hat\eta_m}{1 + \Delta_j\,\hat\beta_m'\hat\eta_m},
\tag{9}
\]
for $j = 0, 1, \ldots, N-1$, where
\[
\hat\gamma_m = \frac{z - x_0\prod_{j=0}^{N-1}(1 + r_{m,f}\Delta_j)(1 + \Delta_j\,\hat\beta_m'\hat\eta_m)^{-1}}{1 - \prod_{j=0}^{N-1}(1 + \Delta_j\,\hat\beta_m'\hat\eta_m)^{-1}}.
\tag{10}
\]
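The feedback rule (9) with the constant (10) can be sketched as a short loop over rebalancing dates. The helper names, the equal period lengths, and the numerical inputs below are assumptions for illustration. As a sanity check under these assumptions, when the estimates equal the true parameters and the noise term is switched off (each period's risky return equals its mean), the wealth recursion (4) ends at the target z.

```python
import numpy as np


def gamma_hat(z, x0, r_f, deltas, beta_hat, eta_hat):
    """Constant (10) for given period lengths and estimates."""
    growth = 1.0 + deltas * (beta_hat @ eta_hat)    # 1 + Delta_j beta'eta
    num = z - x0 * np.prod((1.0 + r_f * deltas) / growth)
    return num / (1.0 - np.prod(1.0 / growth))


def multi_period_control(j, wealth, gam, r_f, deltas, beta_hat, eta_hat):
    """Cash amounts (9) invested in the risky assets at time t_j."""
    discount = np.prod(1.0 + r_f * deltas[j + 1:])  # empty product equals 1
    bracket = wealth * (1.0 + r_f * deltas[j]) - gam / discount
    return -bracket * eta_hat / (1.0 + deltas[j] * (beta_hat @ eta_hat))


# Toy inputs (made up): 2 assets, 3 equal periods over one quarter.
mu = np.array([0.08, 0.12])
cov = np.array([[0.04, 0.01], [0.01, 0.09]])
r_f, x0, z = 0.02, 100.0, 101.0
deltas = np.full(3, 1.0 / 12.0)
beta = mu - r_f
eta = np.linalg.solve(cov, beta)                    # true eta = Omega beta
gam = gamma_hat(z, x0, r_f, deltas, beta, eta)

wealth = x0
for j in range(len(deltas)):
    u = multi_period_control(j, wealth, gam, r_f, deltas, beta, eta)
    excess = beta * deltas[j]                       # noise-free excess return
    wealth = wealth * (1.0 + r_f * deltas[j]) + u @ excess   # recursion (4)
print(wealth)
```

Replacing `beta` and `eta` in the calls with estimated values gives the plug-in rule; the noise-free terminal wealth then generally differs from z, which is one way to see the effect of estimation error in this toy setting.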

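For any estimators $(\hat\eta_s, \hat\beta_s)$ and $(\hat\eta_m, \hat\beta_m)$, we can solve for $X^{\hat u_s}(T)$ and $X^{\hat u_m}(T)$ and their variances.

Proposition 2.2. Given the estimators $(\hat\eta_s, \hat\beta_s)$ and $(\hat\eta_m, \hat\beta_m)$, the terminal wealths become
\[
X^{\hat u_s}(T) = z\,\frac{\beta_s'\hat\eta_s}{\hat\beta_s'\hat\eta_s} + x_0(1 + r_{s,f}T)\left[1 - \frac{\beta_s'\hat\eta_s}{\hat\beta_s'\hat\eta_s}\right] + \frac{z - x_0(1 + r_{s,f}T)}{\sqrt{T}\,\hat\beta_s'\hat\eta_s}\,\hat\eta_s'\sigma\xi_s,
\tag{11}
\]
\[
X^{\hat u_m}(T) = \hat\gamma_m + \left( x_0\prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k) - \hat\gamma_m \right) \prod_{j=0}^{N-1} \left( \frac{1 - \Delta_j(\beta_m - \hat\beta_m)'\hat\eta_m - \sqrt{\Delta_j}\,\hat\eta_m'\sigma\xi_{m,j}}{1 + \Delta_j\,\hat\beta_m'\hat\eta_m} \right),
\tag{12}
\]
where $\hat\gamma_m$ is defined in (10). Moreover, the portfolio variances read
\[
(\sigma_T^{\hat u_s})^2 = (z - x_0(1 + r_{s,f}T))^2\,\frac{\hat\eta_s'\Sigma\hat\eta_s}{T(\hat\beta_s'\hat\eta_s)^2},
\tag{13}
\]
\[
(\sigma_T^{\hat u_m})^2 = \left[ \frac{z - x_0\prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k)}{\prod_{j=0}^{N-1}(1 + \Delta_j\,\hat\beta_m'\hat\eta_m) - 1} \right]^2 \left[ \prod_{j=0}^{N-1}\left[ (1 - \Delta_j(\beta_m - \hat\beta_m)'\hat\eta_m)^2 + \Delta_j\,\hat\eta_m'\Sigma\hat\eta_m \right] - \prod_{j=0}^{N-1}(1 - \Delta_j(\beta_m - \hat\beta_m)'\hat\eta_m)^2 \right].
\tag{14}
\]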
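As a consistency check on (11) and (13), consider the oracle case in which the estimates coincide with the true parameters, $\hat\eta_s = \eta_s = \Omega\beta_s$ and $\hat\beta_s = \beta_s$. A short calculation, sketched here for illustration, recovers the target mean and the classical minimum variance:

```latex
\begin{align*}
\hat\beta_s'\hat\eta_s &= \beta_s'\Omega\beta_s = \Theta_{s,p}, \qquad
\hat\eta_s'\Sigma\hat\eta_s = \beta_s'\Omega\Sigma\Omega\beta_s = \Theta_{s,p}, \\
X^{\hat u_s}(T) &= z + \frac{z - x_0(1 + r_{s,f}T)}{\sqrt{T}\,\Theta_{s,p}}\,\eta_s'\sigma\xi_s
  \quad\Longrightarrow\quad E[X^{\hat u_s}(T)] = z, \\
(\sigma_T^{\hat u_s})^2 &= \frac{\bigl(z - x_0(1 + r_{s,f}T)\bigr)^2}{T\,\Theta_{s,p}}.
\end{align*}
```

Deviations of (11) and (13) from these oracle values are therefore attributable to the estimation error in $(\hat\eta_s, \hat\beta_s)$.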

3. Pitfalls of the sample covariance matrix in optimal portfolios

This section investigates the problems raised by using sample estimates. Empirical investigations of the related issue can be found in Garlappi et al. (2007) and DeMiguel et al. (2009a). It is natural to ascribe the unsatisfactory performance of the optimal portfolio to the estimation errors in both the mean and the covariance matrix. However, we focus on the estimation error of the precision matrix obtained by inverting the Gram matrix. Our aim is to explain the poor empirical performance of the minimum-variance portfolio, which does not depend on the mean estimate. In other words, our message is that no matter how much effort is put into reducing the effect of the estimation error in the mean, the high-dimensional portfolio will still perform badly because of the difficulty of precisely estimating the precision matrix. Let $\{S^{(l)} = (S_1(t_l), \ldots, S_p(t_l)),\ l = 0, 1, \ldots, n\}$ be the collection of $n + 1$ observations of the asset price vectors of size $p$, which can be converted into $n$ observations of the return vectors. We assume a fixed sampling interval of $\delta t$. The traditional approach estimates $\Sigma$ using the projected Gram matrix:

\[
\hat\Sigma_n = \frac{1}{n\,\delta t}\sum_{l=1}^{n}(r^{(l)} - \bar r)(r^{(l)} - \bar r)',
\tag{15}
\]
where $r^{(l)} = (r_1^{(l)}, \ldots, r_p^{(l)})$ is the sample return vector of the $p$ assets, $r_i^{(l)} = \frac{S_i(t_l) - S_i(t_{l-1})}{S_i(t_{l-1})}$ and $\bar r = \frac{1}{n}\sum_{l=1}^{n} r^{(l)}$. We make the usual assumption that the return vectors $r^{(l)}$ are independent and identically distributed with mean $\mu\,\delta t$ and covariance matrix $\Sigma\,\delta t$ for $l = 1, \ldots, n$. When $p > n$, $\hat\Sigma_n$ is singular and an unstable estimate of $\Sigma$ in general. The problem becomes more serious for estimating the precision matrix, which is taken to be the Moore-Penrose (generalized) inverse ($\hat\Sigma_n^-$) for a singular matrix. Therefore, the plug-in approach uses $\hat\eta_{s,n} := \hat\Sigma_n^-\beta_s$ and $\hat\eta_{m,n} := \hat\Sigma_n^-\beta_m$ for single-period and multi-period MV portfolios, respectively. When $\mu$ is known, replacing $\bar r$ by $\mu\,\delta t$ in (15) does not affect our analysis.

We denote by $E_D$ and $P_D$ the conditional expectation and probability given the historical data (σ-field $D = \sigma(\{S^{(l)}\}_{l=0}^{n})$). Our analyses of the estimated strategies are based on the conditional expectation with respect to the historical data, which takes the variation of the data into account. We then analyze the asymptotic properties of the data-driven strategies when the number of assets grows with the sample size and both tend to infinity. The following two lemmas are useful in establishing our proof for the pitfalls associated with the plug-in approach.

Lemma 3.1. For $\epsilon_{n,p} > 0$ (possibly depending on $n$, $p$),
\[
P_D\left( \left| \prod_{j=0}^{N-1}\left[1 - \hat\eta_{m,n}'\sigma\sqrt{\Delta_j}\,\xi_{m,j}\right] - \left(1 - \hat\eta_{m,n}'\sigma\sum_{j=0}^{N-1}\sqrt{\Delta_j}\,\xi_{m,j}\right) \right| > \epsilon_{n,p} \right) \le O\left( \left( \frac{\hat\eta_{m,n}'\Sigma\hat\eta_{m,n}}{\epsilon_{n,p}} \right)^2 \right).
\]

Lemma 3.2. Let $\hat\eta_n = \hat\Sigma_n^-\beta$, where $\beta$ can be either $\beta_s$ or $\beta_m$. Then,

1. $\hat\eta_n'\Sigma\hat\eta_n / \beta'\hat\eta_n \xrightarrow{P} 0$ as $p/n \to \infty$;
2. If $\beta'\Omega\beta = o(p/n)$, then $\beta'\hat\eta_n / \sqrt{\hat\eta_n'\Sigma\hat\eta_n} \xrightarrow{P} 0$ (and in $L^2$) as $p/n \to \infty$; and
3. If $\beta'\Omega\beta = o(p/n)$, then $\hat\eta_n'\Sigma\hat\eta_n \xrightarrow{P} 0$ and $\beta'\hat\eta_n \xrightarrow{P} 0$ as $p/n \to \infty$.

The condition $\beta'\Omega\beta = o(p/n)$ in Lemma 3.2 has a financial interpretation. Note that $\beta'\Omega\beta = |\Sigma^{-1/2}\beta|^2$ is the sum of the squared Sharpe ratios, which indicates the inflation rate that is contributed by the stock market (see the deflator process defined in Bielecki et al. (2005)). Therefore, the condition requires that the inflation rate cannot grow faster than the growth rate of the number of stocks ($p/n$). This is economically reasonable. Otherwise, an economy could simply boost its GDP growth by allowing more firms to go public.

We make the following distributional assumptions, which are limited to this section only. We intend to show that the portfolio performance is still poor even under the most favorable normality assumption.

(A1) $\xi_s$ is normally distributed.
(A2) $\{\xi_{m,j}\}_{j=0}^{N-1}$ are normally distributed.
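The singularity of the Gram matrix (15) for p > n and the quantity in item 2 of Lemma 3.2 can be examined numerically. The sketch below simulates Gaussian returns with Σ = I (an assumption made only for this illustration), forms the plug-in estimate with NumPy's Moore-Penrose pseudo-inverse, and prints the ratio β′η̂_n / √(η̂_n′Ση̂_n) next to √(β′Ωβ). By the Cauchy-Schwarz inequality the ratio never exceeds √(β′Ωβ); how far below it falls depends on p/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, dt = 20, 200, 1.0 / 260.0            # made-up sizes with p >> n

# Simulated i.i.d. returns with mean mu*dt and covariance Sigma*dt, Sigma = I.
mu = np.full(p, 0.05)
beta = mu - 0.02                            # excess returns, r_f = 2% assumed
returns = mu * dt + np.sqrt(dt) * rng.standard_normal((n, p))

# Projected Gram matrix (15).
centered = returns - returns.mean(axis=0)
gram = centered.T @ centered / (n * dt)

rank = np.linalg.matrix_rank(gram)          # at most n - 1 after centering
# Explicit cutoff so numerically zero eigenvalues are not inverted.
gram_pinv = np.linalg.pinv(gram, rcond=1e-10, hermitian=True)
eta_plugin = gram_pinv @ beta               # plug-in eta with generalized inverse

sigma_true = np.eye(p)
ratio = (beta @ eta_plugin) / np.sqrt(eta_plugin @ sigma_true @ eta_plugin)
oracle = np.sqrt(beta @ np.linalg.solve(sigma_true, beta))  # sqrt(beta' Omega beta)
print(rank, ratio, oracle)
```

The explicit `rcond` is a choice of this sketch: the Gram matrix has many eigenvalues that are zero only up to rounding, and a looser default cutoff could invert them.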

3.1. Probability of beating the bank deposit

Why are investors interested in an optimal portfolio? Investors believe that an optimal portfolio outperforms the bank deposit with a reasonably high probability. Under the normality assumption (A1), it is easy to show that $P(X^{u_s}(T) > x_0(1 + r_{s,f}T)) > 1/2$ if the parameters $\beta_s$ and $\Sigma$ are known. In layman's terms, it is not by chance that the optimal MV portfolio beats the bank deposit. The same can be proven for the multi-period MV problem. In fact, the multi-period strategy is theoretically better than the single-period strategy as it has a larger opportunity set. Classical finance theory advises us to invest in many assets for the purpose of diversification. Surprisingly, both single- and multi-period high-dimensional MV portfolios implemented with the plug-in approach turn out to perform very badly in an asymptotic manner.

Theorem 3.1. As $p/n \to \infty$ and $n \to \infty$,

1. if (A1) holds true and $\Theta_{s,p} = o(p/n)$, then $P_D\left(X^{\hat u_s}(T) > x_0(1 + r_{s,f}T)\right) \xrightarrow{P} \frac{1}{2}$;
2. if (A2) holds true and $\Theta_{m,p} = o(p/n)$, then $P_D\left(X^{\hat u_m}(T) > x_0\prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k)\right) \le U_{n,p} \xrightarrow{P} \frac{1}{2}$,

where we recall that $\Theta_{s,p} = \beta_s'\Omega\beta_s$ and $\Theta_{m,p} = \beta_m'\Omega\beta_m$.
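The link between item 2 of Lemma 3.2 and the first statement of Theorem 3.1 can be made explicit with a short calculation. This is only a heuristic sketch, not the proof given in the appendix. It assumes (A1), $\hat\beta_s'\hat\eta_s \neq 0$ and $\hat\eta_s'\Sigma\hat\eta_s > 0$, uses the wealth equation (2) with the estimated strategy (8), and writes $\Phi$ for the standard normal distribution function, a symbol introduced here.

```latex
\begin{align*}
X^{\hat u_s}(T) - x_0(1 + r_{s,f}T)
&= \hat u_s'\bigl(\beta_s T + \sigma\sqrt{T}\,\xi_s\bigr)
 = \frac{z - x_0(1 + r_{s,f}T)}{T\,\hat\beta_s'\hat\eta_s}
   \bigl(T\,\beta_s'\hat\eta_s + \sqrt{T}\,\hat\eta_s'\sigma\xi_s\bigr), \\
P_D\bigl(X^{\hat u_s}(T) > x_0(1 + r_{s,f}T)\bigr)
&= \Phi\!\left(\operatorname{sgn}(\hat\beta_s'\hat\eta_s)\,
   \frac{\sqrt{T}\,\beta_s'\hat\eta_s}{\sqrt{\hat\eta_s'\Sigma\hat\eta_s}}\right).
\end{align*}
```

When the argument of $\Phi$ tends to zero in probability, as item 2 of Lemma 3.2 states for the plug-in estimate $\hat\eta_{s,n} = \hat\Sigma_n^-\beta_s$, the conditional probability tends to $\Phi(0) = 1/2$.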

Prior to presenting the interpretation of Theorem 3.1, we would like to highlight the difficulty of the proof. It may seem that the single-period result is a direct consequence of the linear discriminant analysis of Bickel and Levina (2004), as they prove that the probability of discriminating two different classes of data tends to 1/2 under the worst-case scenario. However, the first probability in Theorem 3.1 tends to 1/2 for all scenarios. The techniques involved are different. Our result indicates how serious the problem caused by the plug-in approach is and why it rarely gives a satisfactory empirical performance for the sample-based optimal MV portfolio. Theorem 3.1 reveals that the plug-in MV portfolio beats the bank deposit purely by luck for p ≫ n. The probability of offering a positive excess return tends to 50% for a large portfolio. This performance is similar to the random stock picking or blindfolded monkey picking strategy. The multi-period MV portfolio can have even worse results, as its probability of beating the bank deposit is bounded above by a quantity that tends to 50%. Therefore, the more optimization we do, the poorer the sample-based optimal portfolio performs: although the optimization takes full advantage of the underlying

stochastic model including the model parameters, it also amplifies the effect of the estimation error. We believe that, without additional constraints, constructing efficient and accurate estimates for the effective parameters appearing in the optimal control will achieve the desirable optimality posited by the theory. For MV portfolios, the effective parameters are $(\beta_s, \eta_s)$ for the single-period and $(\beta_m, \eta_m)$ for the multi-period.

4. A constrained ℓ1 minimization approach

This section proposes a constrained ℓ1 minimization approach to directly estimate $\eta$, which represents either $\eta_s$ or $\eta_m$. This newly proposed approach leads to a direct estimation of the optimal control. We call it the linear programming optimal (LPO) approach because it can be easily implemented using linear programming. To facilitate presentation, we suppress the subscripts such that $\beta = \beta_s$ or $\beta_m$, $\eta = \eta_s$ or $\eta_m$, and $\Theta_p = \Theta_{s,p}$ or $\Theta_{m,p}$. It can be recognized from (6) and (7) that the MV strategies depend on $\Omega$ only through the term $\eta = \Omega\beta$. Inspired by Cai et al. (2011), we estimate $\eta$ as
\[
\tilde\eta \in \arg\min_{\eta \in \mathbb{R}^p} \left\{ |\eta|_1 \ \text{subject to} \ |\hat\Sigma_n\eta - \hat\beta_n|_\infty \le \lambda_n \right\},
\tag{16}
\]
where $\hat\Sigma_n$ is given in (15), $\hat\beta_n = \frac{1}{n\,\delta t}\sum_{l=1}^{n} r^{(l)} - r_f 1_p$, $\lambda_n$ is a tuning parameter, $|a|_1 = \sum_{i=1}^{p}|a_i|$, and $|a|_\infty = \sup_i |a_i|$ for $a \in \mathbb{R}^p$. We refer readers to Candès and Tao (2007), James et al. (2009) and Cai and Liu (2011) for more details on the ℓ1 minimization method. The LPO approach is also applicable for estimating $\Omega 1_p$, the effective parameters in the minimum-variance strategy, by setting $\beta = 1_p$. Although the sample estimates of $\Sigma$ and $\beta$ are used in (16) for a simple implementation, we stress that other improved estimators are also possible candidates for (16). From this viewpoint, we can actually assume the historical data are correlated in time and make use of time series estimation. We denote the estimated strategies (8) and (9) with the LPO estimate (16) as $\tilde u_s$ and $\tilde u_m$, respectively.

Due to the nature of this approach, when $\lambda_n > |\hat\beta_n|_\infty$, $\tilde\eta = 0$, which suggests not investing in risky assets. We are thus interested in $\lambda_n \in [0, |\hat\beta_n|_\infty]$. If $n > p$ and $\hat\Sigma_n$ is non-singular, $\hat\Sigma_n^{-1}\hat\beta_n$ is a candidate in the feasible set for any non-negative $\lambda_n$, and $\tilde\eta = \hat\Sigma_n^{-1}\hat\beta_n$ when $\lambda_n = 0$. Hence, with the appropriate choice of $\lambda_n$, the LPO estimator is superior or equal to the plug-in sample

estimator. In the high dimensional case of p > n, the feasible set for the minimization problem (16) may be empty for a small λn . The admissible set can be determined by testing the feasibility of (16) by gradually decreasing the value of λn from |βˆn |∞ to the smallest positive value. Details can be found in Section 5. The idea behind this approach is to view η as the solution to the equation Ση − β = 0. When

ˆ n and βˆn (or other improved Σ and β are unknown, we replace them with the sample estimates Σ ˆ n and βˆn are so large that may make the true η estimates). However, the estimation errors of Σ ˆ n η − βˆn = 0}. Therefore, we consider the feasible set {η : |Σ ˆ n η − βˆn |∞ ≤ outside the set {η : Σ

ˆ n and βˆn , and seek for the most sparse solution within this λn } to account for the variability in Σ set that uses the least cost to recover the true η. We show in the appendix (Lemma Appendix A.1) that with the appropriate choice of λn , the true η belongs to the feasible set with high probability. The sparsity reduces the variability of the solution and hence enhances the stability. Notice that the MV strategies (6) and (7) are proportional to η. The sparsity of η is exactly the cardinality of the MV portfolios. The sparse portfolio is particularly welcomed by ambiguity-averse investors who would like to invest in only a few assets in the market instead of all available assets. The ℓ1 minimization is also mathematically attractive as it enables us to prove the desired oracle property. The LPO approach has many noteworthy advantages. 1. Classic theoretical solutions are not ignored as they are part of the implementation procedure and the corresponding interpretations remain. Therefore, this approach particularly works for dynamic strategies problems. ˆ n and for any other improved estimator of Σ. 2. This approach works for a singular Σ 3. This approach also works for fat-tailed distributions of asset returns. 4. The implementation is simple, as (16) can be implemented using linear programming. 5. The number of parameters is reduced from p(p + 3)/2 (in Ω and β) to 2p (in η and β). This significantly lowers the computational burden compared to the separate estimation approach. 6. The sparsity on η is a weaker condition than the conditions required for the consistent estimation of Ω in Shao et al. (2011), which considers the separate estimation approach. 7. As the resulting LPO portfolio is sparse, the LPO procedure automatically filters out unfavorable stocks based on the MV criterion and stimulates low-dimensional portfolios on 14

LPO-based assets.

The rest of this section discusses the connections with the ℓ1 norm constrained portfolio and shows the oracle properties of the LPO portfolios. Under the (approximate) sparsity condition of η, we show that the LPO approach preserves the optimality of the MV strategy for p/n → ∞.

4.1. Connections with the ℓ1 norm constrained portfolio

In this subsection, we investigate the relationship between the LPO portfolio and the ℓ1 norm constrained portfolio. These two portfolios are constructed based on the ℓ1 minimization (Dantzig Selector) approach and the least absolute shrinkage and selection operator (LASSO) approach, whose relationship is widely discussed in Bickel et al. (2009) and James et al. (2009). The minimization problem in (16) shares many similarities with the LASSO optimization problem:

$$\min |\pi|_1 \quad \text{subject to} \quad \|\hat{\sigma}_n' \pi - \hat{\sigma}_n^{-1} \hat{\beta}_n\|_2^2 \le \delta_n \tag{17}$$

for some non-negative δn, where $\hat{\sigma}_n$ is the empirical volatility matrix such that $\hat{\sigma}_n \hat{\sigma}_n' = \hat{\Sigma}_n$. We call the solution to the LASSO-type problem (17) the LASSO solution. As pointed out by James et al. (2009), when p = 2, these two solutions are identical. However, the equivalence of these two solutions does not hold in general in p ≥ 3 dimensions. When the tuning parameters δn and λn are the same, the LASSO solution is always a feasible solution of (16). Hence, when these two solutions are not identical, the LPO estimator is sparser than the LASSO solution in terms of the ℓ1 norm. The conditions for their equivalence are provided in James et al. (2009). Under mild conditions, the LASSO solution to (17) is equal to the solution to

$$\min \|\hat{\sigma}_n' \pi - \hat{\sigma}_n^{-1} \hat{\beta}_n\|_2^2 \quad \text{subject to} \quad |\pi|_1 \le c \tag{18}$$

for some non-negative c. The solution to (18) is the ℓ1 norm constrained portfolio considered in Gandy and Veraart (2013) and Dubois and Veraart (2015). By the analyses in Bickel et al. (2009) and James et al. (2009), the LPO portfolio and the ℓ1 norm constrained portfolio often behave the same when n > p. However, (18) requires $\hat{\sigma}_n^{-1}$ as input, which essentially assumes the Gram matrix $\hat{\Sigma}_n$ to be invertible. In the high-dimensional case of p > n, $\hat{\Sigma}_n$ is singular and substitution of the generalized inverse of $\hat{\sigma}_n$ into (18) would lead to highly biased results. Compared to the

LASSO formulation (ℓ1 minimization with quadratic constraints), our proposed LPO approach has the distinctive advantage of applicability for a singular $\hat{\Sigma}_n$ and any other improved estimator of Σ.

Another competitive portfolio is the ℓ1 norm constrained minimum-variance portfolio proposed by Fan et al. (2012), which is the solution to the risk minimization problem

$$\min\; w' \hat{\Sigma}_n w \quad \text{subject to} \quad w' 1 = 1, \; |w|_1 \le c. \tag{19}$$

Its implementation relies on the least-angle regression (LARS)-LASSO algorithm to approximately find the solution path, and it depends on the choice of the tracking portfolio. Although such an approximation has proven useful in static portfolio selection, it is limited to single-period (quadratic objective) problems. By recognizing the connection between the LASSO and the Dantzig Selector, the proposed LPO approach aims to extend the high-dimensional portfolio concept to dynamic portfolio management. Specifically, the LPO minimum-variance portfolio can be easily implemented with the following two steps:

1. find $\tilde{\eta} \in \arg\min_{\eta \in \mathbb{R}^p} \{ |\eta|_1 \text{ subject to } |\hat{\Sigma}_n \eta - 1|_\infty \le \lambda_n \}$;
2. normalize it: $\tilde{w} := \tilde{\eta} / 1' \tilde{\eta}$.

Moreover, it can be proved that the LPO minimum-variance portfolios have the desired oracle properties.

4.2. Oracle properties

We write the return vector in each observation period as

$$r\delta t = \mu \delta t + \sigma \sqrt{\delta t}\, \xi =: \mu \delta t + \sqrt{\delta t}\, \Psi,$$

where Ψ = (Ψ1, . . . , Ψp)′ is a random vector with zero mean and covariance matrix Σ. The LPO approach can be applied beyond the normality assumption once we relax the distributional condition of Ψ: for any p-dimensional non-random vector l with |l|2 = 1 and any t ∈ R,

$$P(l' \Omega^{1/2} \Psi \le t) =: \Upsilon(t)$$

is a continuous distribution function that is symmetric about 0 and does not depend on l. Such a relaxation is also considered in Shao et al. (2011) and Cai and Liu (2011). For example, all elliptical distributions and the multivariate scale mixture of normals satisfy this condition.

Let $\bar{\Psi} = \Psi' \Omega \beta / \sqrt{\beta' \Omega \beta}$ be a standardized random variable with zero mean and unit variance. Analogous to Cai and Liu (2011), we consider two moment conditions for the return vector:

(M1) (Sub-Gaussian-type tails) Suppose that log p ≤ n and there exist some constants c1 > 0 and K1 > 0 such that $E \exp(c_1 \bar{\Psi}^2) \le K_1$ and $E \exp(c_1 \Psi_i^2 / \sigma_{ii}) \le K_1$ for all i.

(M2) (Polynomial-type tails) Suppose that for some c1, c2 > 0, $p \le c_1 n^{c_2}$, and for some ǫ > 0, $E|\bar{\Psi}|^{4c_2+4+\epsilon} \le K_1$ and $E|\Psi_i / \sqrt{\sigma_{ii}}|^{4c_2+4+\epsilon} \le K_1$ for all i.

We now show that the LPO portfolios in (11) and (12), with Ωβ estimated by $\tilde{\eta}$ as defined in (16), tend in probability to the corresponding oracle strategies, in which the parameters Σ and β are perfectly estimated. To this end, we introduce a condition shared by many high-dimensional statistical research problems, such as that in Cai and Liu (2011):

(C1) log p ≤ n, $\max_{1 \le i \le p} \sigma_{ii} \le M$ for some constant M > 0, where we read $\Sigma = (\sigma_{ij})_{p \times p}$, and Θp ≥ K for some K > 0.

Theorem 4.1. Let $\lambda_n = C\sqrt{\Theta_p \log p / n}$ with C > 0 being a sufficiently large constant, $a_n = |\hat{\Sigma}_n - \Sigma|_\infty$ and $b_n = |\hat{\beta}_n - \beta|_\infty$. Suppose that the historical daily return data satisfy (M1) [or (M2)], condition (C1) holds, and

$$d_n := (\lambda_n + b_n)|\eta|_1 + a_n |\eta|_1^2 = o(1). \tag{20}$$

Then we have, as n, p/n → ∞,

1. $E_D (X^{\tilde{u}_s}(T) - X^{u_s}(T))^2 \xrightarrow{P} 0$, $(\sigma_T^{\tilde{u}_s})^2 - (\sigma_T^{u_s})^2 \xrightarrow{P} 0$.
2. $E_D (X^{\tilde{u}_m}(T) - X^{u_m}(T))^2 \xrightarrow{P} 0$, $(\sigma_T^{\tilde{u}_m})^2 - (\sigma_T^{u_m})^2 \xrightarrow{P} 0$.

To the best of our knowledge, we are the first to provide an estimated dynamic optimal strategy with the oracle properties under the high-dimensional setting. Fan et al. (2012) prove the oracle properties of the static strategy implied by their approach in terms of the variance measure only. We, however, prove that both the terminal wealth and its variance tend to the oracle benchmarks for both static and dynamic trading strategies.

We further establish the consistency result for the probability of the LPO portfolio beating the bank deposit. To illustrate these ideas, we only present the single-period case. Consider

$$F_n := P(X^{\tilde{u}_s}(T) > x_0(1 + r_f T)) = \Upsilon\left( \sqrt{T}\, \frac{\beta' \tilde{\eta}}{\sqrt{\tilde{\eta}' \Sigma \tilde{\eta}}} \right),$$

where $\tilde{\eta}$ is defined in (16), Υ is the elliptical distribution function of the single-period return vector, and the subscript "s" is omitted. The oracle benchmark is the one with the perfectly estimated Σ and β:

$$F := P(X^{u_s}(T) > x_0(1 + r_f T)) = \Upsilon\left( \sqrt{T \Theta_p} \right).$$

To explicitly derive the convergence rate for Fn, we impose an additional condition on Υ:

(D) For any x > 0 and |δ| ≤ 1,

$$\left| \frac{\Upsilon(x + \delta)}{\Upsilon(x)} - 1 \right| \le c_1 |\delta| (|x| + 1) e^{c_2 |x\delta|}$$

for some positive constants c1, c2, which are independent of x, δ.

Theorem 4.2. If the conditions in Theorem 4.1 and condition (D) hold, then

$$\frac{F_n}{F} - 1 = O_P\left( d_n e^{C d_n} \right)$$

for some positive constant C, where dn is defined in Theorem 4.1.

Regarding the convergence rate of the Gram matrix, Bickel and Levina (2004) and Cai et al. (2011) show that

$$a_n := |\hat{\Sigma}_n - \Sigma|_\infty \le C \sqrt{\log(p/n)}$$

for some large constant C > 0 with an overwhelmingly high probability that is greater than $1 - O(p^{-1})$ and $1 - O(p^{-1} + n^{-\epsilon/8})$ under (M1) and (M2), respectively. Therefore, the condition (20) has an overwhelmingly high probability of holding once the portfolio has sufficient sparsity that $|\eta|_1 = o\left( \sqrt{\log(p/n)} \right)$, if the estimation error in the mean can be ignored, such as in the case of the minimum-variance portfolio.

We further investigate the condition (20) and attempt to show that it is not restrictive. Consider a condition that is stronger than (C1):

(C2) log p ≤ n, $M^{-1} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M$ for some constant M > 0, and Θp ≥ K for some K > 0,

where λmin(Σ) and λmax(Σ) are the smallest and largest eigenvalues of Σ, respectively. By the Cauchy-Schwarz inequality, we have

$$|\eta|_1^2 \le |\eta|_0 |\eta|_2^2 \le M^2 |\eta|_0 \|\beta\|^2 \le M^3 |\eta|_0 \Theta_p.$$

Therefore, the above three theorems still hold true under (C2) and $|\eta|_0 \left( \Theta_p \sqrt{\log p / n} + b_n \sqrt{\Theta_p} \right) = o(1)$, instead of (C1) and (20). Although (C2) is a widely adopted condition for high-dimensional covariance matrix estimation, the preceding inequality implies that the larger the Θp, the slower the convergence rate of the LPO portfolio. This is reasonable because the oracle strategy with a large Θp, or equivalently a highly inflated market, has a probability close to 1 of beating the risk-free asset. It is difficult for any data-driven method to achieve such a high standard of performance. In (20), bn reflects the estimation error in the mean vector. If this estimation error is large, additional sparsity of the MV portfolio is required to ensure the convergence. This is a reasonable trade-off.

5. Simulation and empirical studies

In this section, we verify the derived theoretical results by simulations, and conduct an empirical study to show the practical performance of the LPO portfolios. All the simulations and empirical studies are out-of-sample studies. Recall that our LPO estimator $\tilde{\eta}$ is obtained by solving the minimization problem:

$$\min |\eta|_1 \quad \text{subject to:} \quad |\hat{\Sigma}_n \eta - \hat{\beta}_n|_\infty \le \lambda.$$

This type of convex optimization problem also appears in Cai et al. (2011) and Cai and Liu (2011). The use of the ℓ1 norm allows us to apply linear programming. We solve this problem by the parametric simplex method, which is efficient and guarantees machine precision by a primal-dual gap. The details of this approach can be found in Vanderbei (2008). We transform our problem to fit the setup in Vanderbei (2008). By setting $\eta = \eta^+ - \eta^-$, where $\eta^\pm = \max(\pm\eta, 0)$ and one of $\eta^+$ or $\eta^-$ is 0, the above minimization problem can be rewritten as

$$\max \; \begin{pmatrix} -1_p' & -1_p' \end{pmatrix} \begin{pmatrix} \eta^+ \\ \eta^- \end{pmatrix} \quad \text{subject to:} \quad \begin{pmatrix} \hat{\Sigma}_n & -\hat{\Sigma}_n \\ -\hat{\Sigma}_n & \hat{\Sigma}_n \end{pmatrix} \begin{pmatrix} \eta^+ \\ \eta^- \end{pmatrix} \le \begin{pmatrix} \lambda 1_p + \hat{\beta}_n \\ \lambda 1_p - \hat{\beta}_n \end{pmatrix}.$$
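As a rough illustration of this reformulation, the following Python sketch solves the constrained ℓ1 minimization as a linear program. It is a sketch under stated assumptions, not the implementation used in this paper: it calls SciPy's general-purpose `linprog` routine (HiGHS backend) rather than the parametric simplex method of Vanderbei (2008), and the helper name `lpo_estimate`, the grid of λ values and the simulated inputs are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def lpo_estimate(sigma_hat, beta_hat, lam):
    """Minimize |eta|_1 subject to |sigma_hat @ eta - beta_hat|_inf <= lam.

    Hypothetical helper: returns the minimizer as a 1-D array, or None
    when the solver reports that no feasible point exists.
    """
    p = sigma_hat.shape[0]
    # Stack the variables as x = (eta_plus, eta_minus), both nonnegative.
    cost = np.ones(2 * p)  # min 1'x is the same as max -(1', 1')x
    a_ub = np.block([[sigma_hat, -sigma_hat], [-sigma_hat, sigma_hat]])
    b_ub = np.concatenate([lam + beta_hat, lam - beta_hat])
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        return None
    return res.x[:p] - res.x[p:]


# Toy check with an identity "covariance": each entry of beta_hat is pulled
# toward zero by lam, and entries no larger than lam become exactly zero.
eta_toy = lpo_estimate(np.eye(3), np.array([1.0, 0.2, 0.0]), lam=0.3)

# Minimum-variance variant with p > n (singular sample covariance):
# use a vector of ones in place of beta_hat and normalize the result.
rng = np.random.default_rng(0)
sample = rng.normal(size=(20, 40))  # n = 20 observations, p = 40 assets
sigma_hat = np.cov(sample, rowvar=False)
ones = np.ones(sigma_hat.shape[0])
weights = None
for lam in (0.8, 0.6, 0.4):  # decrease lam until the feasible set is empty
    eta = lpo_estimate(sigma_hat, ones, lam)
    if eta is None:
        break
    if abs(eta.sum()) > 1e-12:
        weights = eta / eta.sum()  # w = eta / (1' eta)
```

The stacked variable has length 2p, so the dense constraint matrix built here has 2p × 2p entries; for the larger values of p considered in this paper, a solver that exploits the block structure is likely preferable.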

For a given λ, the parametric simplex method efficiently computes the optimal $(\eta^+, \eta^-)$, and we recover the optimal $\tilde{\eta}$ by $\tilde{\eta} = \eta^+ - \eta^-$. The possible values of λ are chosen such that there exists a feasible solution. We start with λ equal to the maximum of the absolute values of the entries of $\hat{\beta}_n$. In this case, a feasible (optimal) solution is the zero vector. Then we decrease λ gradually until it reaches zero or a level for which the corresponding feasible set is empty. As the sample covariance matrix is possibly singular, it is likely that the feasible set is empty for a small λ. We denote the set of feasible values of λ as Λ.

We select the optimal tuning parameter λ among Λ using cross-validation (CV), a standard machine learning procedure. Specifically, because the data are ordered in time, we divide the data $\{r^{(1)}, \ldots, r^{(n)}\}$ in order into f groups, where f is the number of folds of cross-validation ($f = \lfloor n / \text{n.test} \rfloor$ in this paper, where n.test is the sample size of test data). For each iteration, the groups with sizes n/f and n − n/f act as test data and training data, respectively. For each λ, we run our program using the training data and then examine it using the test data. To match the mean-variance criterion, we use the squared error with the target mean (z) to evaluate the LPO portfolio in the kth iteration, which is defined as

$$\widehat{SE}_k^u = (X_k^{\hat{u}}(T) - z)^2,$$

where $X_k^{\hat{u}}(T)$ is the terminal wealth obtained by adopting the (single-period or multi-period) LPO portfolio $\hat{u}$, computed using the test data. Then we define the cross-validation estimate as the sum of squared errors:

$$CV^u(\lambda) = \sum_{k=1}^{f} \widehat{SE}_k^u.$$

Then, the final choice of λ is $\lambda_{CV}^u = \arg\min_\lambda CV^u(\lambda)$. The function CV approximates $E[(X^{\hat{u}}(T) - z)^2] = \mathrm{Var}(X^{\hat{u}}(T)) + (E[X^{\hat{u}}(T)] - z)^2$. If the minimum is attained at several λs, we pick the smallest one. The method of evaluating performance (the function CV) can vary according to the user's objective, such as a (negative) Sharpe ratio. We use the sum of squared errors because it is a common statistical measurement and we expect it to better capture the properties of MV portfolios.
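The selection rule above can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions: the names `ordered_folds` and `cv_select` are hypothetical, and the callables `fit` (which builds a strategy from training data for a given λ) and `terminal_wealth` (which evaluates that strategy on test data) are placeholders that the user would supply.

```python
import numpy as np


def ordered_folds(n, n_test):
    """Split indices 0..n-1 into f = n // n_test consecutive blocks."""
    f = n // n_test
    return np.array_split(np.arange(n), f)


def cv_select(returns, lambdas, fit, terminal_wealth, z, n_test):
    """Pick the lambda with the smallest summed squared error to target z.

    fit(train_returns, lam)             -> strategy, or None if infeasible
    terminal_wealth(strategy, test_ret) -> float
    Returns (best_lambda, scores) where scores maps lambda to CV(lambda).
    """
    folds = ordered_folds(len(returns), n_test)
    scores = {}
    for lam in lambdas:
        total = 0.0
        for test_idx in folds:
            train_mask = np.ones(len(returns), dtype=bool)
            train_mask[test_idx] = False  # all other blocks form the training set
            strategy = fit(returns[train_mask], lam)
            if strategy is None:  # treat an empty feasible set as unusable
                total = np.inf
                break
            wealth = terminal_wealth(strategy, returns[test_idx])
            total += (wealth - z) ** 2
        scores[lam] = total
    best_score = min(scores.values())
    # Ties are broken in favor of the smallest lambda, as in the text.
    best_lam = min(lam for lam, s in scores.items() if s == best_score)
    return best_lam, scores
```

With n = 126 and n.test = 21, `ordered_folds` produces six consecutive blocks of 21 observations each, so every block serves once as test data while the remaining 105 observations are used for training.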

5.1. Simulation study

Theorem 3.1, Theorem 4.1 and Theorem 4.2 provide the asymptotic results when n, p/n → ∞. It is interesting to numerically verify the derived theoretical results and examine the finite-sample performance of the estimated strategies. We consider two commonly used artificial background covariance matrices that exhibit different sparsity conditions. We then compare the LPO portfolio with the traditional plug-in strategy and the oracle strategy for both single- and multi-period settings. We set $\hat{\beta}_n = \beta$ in our simulation to highlight the effect of an estimation error on the covariance matrix. This allows us to verify the claims made in Section 3.

The simulation is designed as follows. We simulate six-month daily prices (n = 126) of p stocks as training data for parameter estimation. We want to invest the initial wealth (x0 = 1000) into these stocks with the investment horizon T = 1/12 and hence also simulate one-month daily prices (n.test = 21) as test data. Our target terminal wealth is set as z = 1020. The risk-free interest rate is set as $r_{s,f} = r_{m,f} = 0.01$. All p assets have the common initial price $s_i = 40$. The excess mean returns over the risk-free asset are set as β = (0.3, . . . , 0.3, 0, . . . , 0)′, where the number of non-zeros is fixed at c0 = 10 while p varies across simulations. Consider the covariance matrix of the following form: $\Sigma = 0.2^2\, \Gamma$. Two models for Γ are studied.

1. Model 1. $\Gamma^{-1} = (B + \delta I)/(1 + \delta)$, where $B = (b_{ij})_{p \times p}$ with independent $b_{ij} = b_{ji} = 0.5 \times \mathrm{Ber}(1, 0.2)$ for 1 ≤ i ≤ c0, i < j ≤ p; $b_{ij} = b_{ji} = 0.5$ for c0 + 1 ≤ i < j ≤ p; $b_{ii} = 1$ for 1 ≤ i ≤ p. Here Ber(1, 0.2) is a Bernoulli random variable that takes a value of 1 with a probability of 0.2 and 0 with a probability of 0.8; and $\delta = \max(-\lambda_{\min}(B), 0) + 0.05$ to ensure that $\Gamma^{-1}$ is a positive definite matrix. Finally, the matrix is standardized to have unit diagonals.
2. Model 2. $\Gamma = (\gamma_{ij})_{p \times p}$ with $\gamma_{ij} = 0.8^{|i-j|}$ for 1 ≤ i, j ≤ p.

In Model 1, only the first c0 rows and columns of Ω are sparse; the rest of the matrix is not sparse and the covariance matrix Σ itself is not sparse. In Model 2, Σ is approximately sparse and Ω is a 3-sparse matrix. For each model and each p, we run 100 simulations to obtain the out-of-sample performance statistics. To verify our theory, we use the empirical frequency as an estimate of the probability.
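The two covariance designs can be generated along the following lines. This is a minimal NumPy sketch, not the simulation code behind the reported results: the function names are hypothetical, and it assumes that "standardized to have unit diagonals" in Model 1 refers to rescaling Γ into a correlation matrix.

```python
import numpy as np


def model2_gamma(p, rho=0.8):
    """Model 2: gamma_ij = rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def model1_gamma(p, c0=10, rng=None):
    """Model 1: Gamma^{-1} = (B + delta * I) / (1 + delta)."""
    rng = np.random.default_rng() if rng is None else rng
    upper = np.full((p, p), 0.5)
    # First c0 rows: entries are 0.5 * Bernoulli(0.2) draws.
    upper[:c0, :] = 0.5 * rng.binomial(1, 0.2, size=(c0, p))
    upper = np.triu(upper, k=1)  # keep only the entries with i < j
    b = upper + upper.T  # enforce b_ij = b_ji
    np.fill_diagonal(b, 1.0)
    delta = max(-np.linalg.eigvalsh(b).min(), 0.0) + 0.05
    gamma = np.linalg.inv((b + delta * np.eye(p)) / (1.0 + delta))
    # Assumption: "standardized" means rescaling Gamma to unit diagonal.
    scale = np.sqrt(np.diag(gamma))
    gamma = gamma / np.outer(scale, scale)
    return (gamma + gamma.T) / 2.0  # remove round-off asymmetry


p = 100
sigma_model1 = 0.2 ** 2 * model1_gamma(p, rng=np.random.default_rng(1))
sigma_model2 = 0.2 ** 2 * model2_gamma(p)
beta = np.r_[np.full(10, 0.3), np.zeros(p - 10)]  # c0 = 10 non-zero excess means
```

Inverting `model2_gamma(p)` numerically gives a tridiagonal matrix up to round-off, which is the 3-sparse structure of Ω mentioned for Model 2.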

We are interested in the following events:

1. each strategy beats the risk-free asset under single- and multi-period settings;
2. the multi-period strategy beats the single-period strategy for each implementation method; and
3. the LPO strategy outperforms the plug-in strategy.

The multi-period strategy uses three periods that equally partition the whole investment horizon. We also report the portfolio performance in terms of the Sharpe ratio. The numerical results are stated in Table 1.

Table 1: Performance of MV portfolios with different estimators in terms of empirical frequency (%).

Model 1
       Beating the bank deposit                              Multi-period outperforms   LPO outperforms
       Single-period strategy     Multi-period strategy      single-period              Plug-in
p      Oracle   LPO      Plug-in  Oracle   LPO      Plug-in  Oracle   LPO      Plug-in  Single   Multi
100    95       89       83       99       99       66       93       80       21       62       94
200    91       77       70       97       84       84       94       61       37       57       72
400    88       78       67       96       82       72       82       66       64       61       67
800    83       75       68       91       82       64       73       65       57       64       66
1200   81       68       60       93       78       63       82       70       57       66       79

Model 2
       Beating the bank deposit                              Multi-period outperforms   LPO outperforms
       Single-period strategy     Multi-period strategy      single-period              Plug-in
p      Oracle   LPO      Plug-in  Oracle   LPO      Plug-in  Oracle   LPO      Plug-in  Single   Multi
100    77       69       59       86       83       74       75       58       44       54       76
200    76       70       70       83       79       78       61       61       41       54       70
400    83       70       63       88       74       71       74       63       62       61       65
800    83       81       65       89       89       70       86       67       71       62       68
1200   80       73       61       83       79       65       70       68       44       57       63

Table 1 shows that, in the simulations that implement the plug-in approach, the empirical frequency for beating the bank deposit decreases in both the single-period and multi-period cases as p increases for a large p. Although the frequency does not reach the 50% predicted by Theorem 3.1, the trend approaches that number. We stress that the simulation is also subject to simulation error and that the convergence rate for the results in Theorem 3.1 is of the order $\sqrt{p/n}$, as shown in the proof. We attempted to further increase p to realize the full converging trend, but the computational error for the generalized inverse is so large that an error message usually occurs when p > 1200. In fact, 4 out of 100 simulation runs produce error messages for p = 1200. We regenerate the four simulations for the plug-in approach to produce numbers. When p goes beyond 1200, error messages become common. We think that the numerical results are distorted and do not report them here.

In contrast, the LPO estimate always exists and converges rapidly. Table 1 shows that the performance of the LPO strategy approaches that of the oracle strategy for a large p. The convergence trend may have small fluctuations because the convergence is in the probability sense. Although the LPO strategy is robust to the increase in p/n, it is still slightly affected by the estimation error in the sample estimates, which are inputs of the constrained ℓ1 minimization. However, the LPO approach is consistently better than the plug-in approach.

Table 1 also demonstrates the application of the LPO approach to dynamic strategies. It is clear that the multi-period strategy outperforms the single-period strategy under the LPO approach, whereas the plug-in approach makes the supposedly better multi-period strategy worse than the single-period strategy in Model 2.

To further examine the out-of-sample performance, we also compute the Sharpe ratios for all implementation methods using the 100 simulated terminal wealths. The numerical results are summarized in Table 2. It is clear that the LPO strategy consistently outperforms the plug-in strategy in terms of the Sharpe ratio.

Table 3 shows the averaged squared errors of the terminal wealths of the estimated and oracle strategies based on the 100 simulation paths and scaled by the squared initial wealth $x_0^2$ (= $10^6$):

$$\frac{1}{100 x_0^2} \sum_{k=1}^{100} (X_k^{\hat{u}}(T) - X_k^{u}(T))^2,$$

where $X_k^{\hat{u}}(T)$ and $X_k^{u}(T)$ are the terminal wealths of the estimated and oracle strategies in the kth simulation, respectively. The LPO strategy is close to the oracle strategy in our simulation. This result numerically verifies the oracle properties shown in Section

Table 2: Sharpe ratios of the simulation.

Model 1
       Single-period strategy     Multi-period strategy
p      Oracle   LPO      Plug-in  Oracle   LPO      Plug-in
100    1.606    1.329    0.940    2.799    2.347    0.288
200    1.268    0.771    0.548    2.587    0.940    0.426
400    1.094    0.825    0.527    1.816    1.033    0.596
800    0.937    0.666    0.384    1.434    0.797    0.393
1200   0.885    0.524    0.250    1.606    0.752    0.277

Model 2
       Single-period strategy     Multi-period strategy
p      Oracle   LPO      Plug-in  Oracle   LPO      Plug-in
100    2.189    1.650    0.789    3.258    2.139    0.287
200    1.723    1.166    0.894    2.532    1.468    0.368
400    1.782    1.240    0.932    3.507    1.837    1.167
800    2.018    1.107    0.732    3.044    1.501    0.754
1200   1.807    1.293    0.617    4.317    1.547    0.626

4.2. Although our simulation verifies the developed theory, we must test it with real data. An empirical analysis offers important insights into the advantages and limitations of the LPO approach.

Table 3: Averaged squared errors of the terminal wealths of the estimated and oracle strategies (%).

       Model 1                             Model 2
       Single-period     Multi-period      Single-period     Multi-period
p      LPO      Plug-in  LPO      Plug-in  LPO      Plug-in  LPO      Plug-in
100    0.0163   0.0530   0.0048   1.8299   0.0212   0.2049   0.0448   9.3845
200    0.0478   0.1378   0.0326   0.4048   0.0175   0.1286   0.0133   0.6309
400    0.0310   0.1228   0.0321   0.1257   0.0185   0.0723   0.0243   0.1000
800    0.0529   0.2125   0.0532   0.2182   0.0239   0.1450   0.0193   0.1107
1200   0.0855   0.3630   0.0753   0.3759   0.0293   0.1400   0.0364   0.1628

5.2. Empirical study

The empirical study uses historical data from the S&P 500 and Russell 2000 components. We examine the out-of-sample performance of the minimum-variance portfolios with different implementation methods, including the plug-in, LPO, shrinkage covariance estimate, and adaptive LPO methods, and compare them to the equally weighted (EW) and the LPO-EW portfolios. The adaptive LPO approach re-estimates the covariance matrix on the LPO-selected stocks, and LPO-EW is the equally weighted portfolio with stocks selected by the LPO approach. We follow the methodology given by Ledoit and Wolf (2004) to shrink the covariance matrix. Our empirical study excludes the constrained portfolio formulation of Fan et al. (2012) because Dubois and Veraart (2015) have empirically examined it with S&P 500 component stocks and documented that it fails to beat the EW portfolio. Thus, we do not repeat the same exercise here.

We follow the methodology of DeMiguel et al. (2009a) to evaluate the out-of-sample performance of the sample-based strategies. We only consider the allocation on the risky assets, as this is the empirical setup in DeMiguel et al. (2009a), especially for the equally weighted (EW or 1/p) strategy. This enables us to address the open problem posed by them and to compare our results with theirs. Let $w \in \mathbb{R}^p$ be the relative weights of the portfolio with p risky assets, with $w' 1_p = 1$.

The empirical analysis conducted in DeMiguel et al. (2009a) is actually for single-period strategies. We therefore focus on the single-period strategies as well. In accordance with the observations in DeMiguel et al. (2009a), we examine the minimum-variance portfolio rather than the mean-variance portfolio to obtain a better out-of-sample performance and to eliminate the subjectivity in choosing the target mean level. Estimating the mean returns is much more demanding than

estimating the covariance matrix. We avoid running computationally intensive Bayesian estimations for each of the p assets. To preserve the stationarity of the asset return series, we use a short time horizon to estimate the parameters, as suggested by Broadie (1993). The issue of estimating the mean has been reported in Michaud (1989) and Chopra and Ziemba (1993). It motivates the robust optimization approach with respect to the drift terms. The minimum-variance strategy is considered in Jagannathan and Ma (2003) and Garlappi et al. (2007).

The EW strategy invests equal amounts of money in the p risky assets, i.e., $w_i^{ew} = 1/p$ or $w^{ew} \propto 1_p$. This strategy is free of estimation error and outperforms all existing sample-based strategies in terms of the various out-of-sample performance measures used in DeMiguel et al. (2009a). The optimal minimum-variance strategy invests in assets that have relative weights of $w^{min} \propto \Sigma^{-1} 1_p$. The corresponding implementation of the LPO approach in (16) only requires the replacement of $\hat{\beta}_n$ with $1_p$.

DeMiguel et al. (2009a) focus on datasets of portfolios in which assets are selected in advance. They also pose an open question on selecting assets for the EW portfolio. The LPO approach has the ability to select assets, as only favorable assets are assigned nonzero weights. Hence, we introduce two adaptive strategies based on assets selected by the LPO approach. The first is the LPO-EW portfolio, which applies the EW strategy to those selected assets. The second is the adaptive LPO portfolio, which re-estimates the covariance matrix of the selected stocks for the minimum-variance strategy. The motivation for the adaptive LPO approach is that the number of selected stocks is typically less than the number of observations (n). In such a situation, the Gram matrix is a consistent estimator for many distributions.

5.2.1. Data description and empirical design.

The index component lists are current as of January 7, 2015. The daily financial data are downloaded from Yahoo!Finance for the period from January 1, 2013 to December 31, 2014 (504 daily observations). For simplicity, we eliminate the assets with incomplete data. For the Russell 2000, we also discard stocks with abnormal volatility (i.e., the standard deviation of the asset returns is greater than 0.4). This threshold for the volatility is chosen as the maximum value of realized volatilities of the S&P 500 components. The two datasets ultimately contain 486 (S&P 500) and 1420 (Russell 2000) assets. We use daily data as they are consistent with our model setup and improve the estimation of the covariance matrix compared to monthly or yearly data. In practice, market practitioners analyze daily, or even higher-frequency, data. However, the extension of our approach to high-frequency data is left for future research.

Our analysis relies on rolling sampling windows. Specifically, we choose an estimation window length n.train = 252 days. At each time $t_i$, i = 1, . . . , n.test = 252, we use the data of the previous n.train days (including the test data for the days before $t_i$) to estimate the parameters needed to implement a particular strategy. Then we compute the return at time $t_i$ using the estimated strategy implemented in $[t_{i-1}, t_i]$ and the data at time $t_i$. This process is continued until all n.test out-of-sample returns are computed.

The out-of-sample performance is reflected by several performance measures adopted in DeMiguel et al. (2009a). The Sharpe ratio is arguably the most common statistic. It is the ratio between the sample mean $\hat{\mu}$ and the standard deviation $\hat{\sigma}$ of the out-of-sample returns. Specifically,

$$\widehat{SR} = \hat{\mu} / \hat{\sigma}.$$

The second adopted performance measure is the certainty-equivalent (CEQ) return, defined as

$$\widehat{CEQ} = \hat{\mu} - \frac{\gamma}{2} \hat{\sigma}^2,$$

where γ is the risk aversion, which is fixed as 1 in our study. The CEQ return is the equivalent of a "risk-free" rate for a particular trading strategy. The last measure is the portfolio turnover (TO):

$$\widehat{TO} = \frac{1}{\text{n.test}} \sum_{i=1}^{\text{n.test}} \sum_{j=1}^{p} \left| \hat{w}_{j,t_{i+1}} - \hat{w}_{j,t_i^+} \right|,$$

where $\hat{w}_{j,t_{i+1}}$ is the portfolio weight in asset j at time $t_{i+1}$, and $\hat{w}_{j,t_i^+}$ is the portfolio weight before rebalancing at time $t_{i+1}$. The portfolio turnover measures the average percentage of wealth traded for a particular strategy. Therefore, the portfolio turnover is mainly used to determine the possible effect associated with transaction costs. The smaller the $\widehat{TO}$, the lower the costs.
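The three performance measures can be computed from a series of out-of-sample returns and portfolio weights as sketched below. The function names are hypothetical, and two details are assumptions rather than statements from this paper: the sample standard deviation uses the n − 1 divisor, and the weights before rebalancing are obtained by letting the previous weights drift with the realized simple asset returns.

```python
import numpy as np


def sharpe_ratio(returns):
    """Sample mean divided by sample standard deviation (ddof=1 assumed)."""
    return returns.mean() / returns.std(ddof=1)


def ceq_return(returns, gamma=1.0):
    """Certainty-equivalent return: mean - (gamma / 2) * variance."""
    return returns.mean() - 0.5 * gamma * returns.var(ddof=1)


def turnover(target_weights, asset_returns):
    """Average absolute change in weights per rebalancing date.

    target_weights[i] : weights chosen at date t_i, shape (m, p)
    asset_returns[i]  : simple asset returns over [t_i, t_{i+1}], shape (m, p)
    The average is taken over the m - 1 rebalancing dates available.
    """
    total = 0.0
    for i in range(len(target_weights) - 1):
        grown = target_weights[i] * (1.0 + asset_returns[i])
        drifted = grown / grown.sum()  # weights just before rebalancing
        total += np.abs(target_weights[i + 1] - drifted).sum()
    return total / (len(target_weights) - 1)


# Small usage example with two assets held at equal weights.
weights = np.array([[0.5, 0.5], [0.5, 0.5]])
asset_ret = np.array([[1.0, 0.0], [0.0, 0.0]])
portfolio_ret = (weights * asset_ret).sum(axis=1)
to_example = turnover(weights, asset_ret)
```

In the usage example the first asset doubles in value, so the drifted weights are (2/3, 1/3) and restoring equal weights requires trading one third of the portfolio value in total.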

5.2.2. Results and interpretations.

Table 4 gives the empirical results for the S&P 500 components data, where EW is the equally weighted strategy and LPO, plug-in and shrinkage are the minimum-variance strategies that use the LPO estimates, plug-in sample estimates and shrinkage estimate, respectively. It is clear that the LPO strategy outperforms the EW and plug-in strategies in terms of SR and CEQ. The TO rate of LPO is, however, larger than that of the EW. We address this point later. The plug-in strategy is so poor that its TO statistic is the highest among all available strategies.

DeMiguel et al. (2009a) conjecture that the larger the portfolio size, the better the performance of the EW strategy. However, the plug-in approach outperforms the EW in terms of SR but underperforms in terms of CEQ. More importantly, the LPO-EW, which consists of fewer stocks, significantly outperforms the EW. In other words, appropriately reducing the portfolio size can improve the EW performance. The key is to select the right stocks. In DeMiguel et al. (2009a), most of the selected assets are financial indices that have increased over the past decades. The LPO approach is a potential scheme for selecting stocks and determining the portfolio size based on the minimum-variance criterion for improving the EW strategy. We speculate that the minimum-variance criterion, together with the sparsity in LPO, will help practitioners to select stocks with the maximal hedging effect. The EW strategy can be a robust strategy. This idea is inspired by DeMiguel et al. (2009a), who explore a linear combination of the minimum-variance portfolio and the EW portfolio. Our combination is, however, a highly nonlinear one.

The shrinkage approach also performs quite well. It slightly outperforms the LPO, plug-in and EW in terms of SR, but is significantly worse than LPO in terms of CEQ. However, the shrinkage approach still suggests investing in all available assets, and lacks the ability to filter out bad stocks. The LPO selection procedure stimulates improved trading strategies. For example, both LPO-EW and adaptive LPO significantly outperform the shrinkage approach in terms of both SR and CEQ. The adaptive LPO, however, suffers from a relatively high TO.

Table 4: Empirical results for S&P 500 components (p = 486).

Strategy       EW       LPO      Plug-in  Shrinkage  LPO-EW   Adaptive LPO
SR (e-02)      7.813    9.605    8.540    10.301     10.939   10.386
CEQ (e-04)     5.555    8.651    4.991    5.125      9.314    7.231
TO             0.004    0.135    0.735    0.223      0.167    0.327

Drawing conclusions from a single dataset is not convincing. We thus extend our analysis to a different, larger dataset. Table 5 displays the empirical results for the Russell 2000 components. When we implement the plug-in approach, the generalized inverses are often computationally infeasible for a large p, such as the value in this dataset. We therefore discard the plug-in result to avoid misleading results. If we consider the investment of all assets in an economy without any prior knowledge of favorable assets, the traditional plug-in approach usually halts and malfunctions during implementation. The LPO approach still works efficiently.

Table 5: Empirical study for Russell 2000 components.

Russell 2000 (p = 1420)

Strategy       EW       LPO      Shrinkage  LPO-EW   Adaptive LPO
SR (e-02)      2.695    5.074    12.194     9.028    16.781
CEQ (e-04)     2.263    7.391    4.912      9.788    14.593
TO             0.002    0.079    0.040      0.009    0.003

The implications of the analysis of the Russell 2000 components data are similar to those discussed for the S&P 500 components data. The EW strategy further deteriorates to produce low SR and CEQ values for a large p. The LPO approach significantly outperforms the EW. Again, the shrinkage approach is better than the LPO in terms of SR but worse in terms of CEQ. The LPO-EW significantly improves over LPO, and the adaptive LPO is the best of the strategies for this dataset. The TOs of LPO-EW and adaptive LPO are particularly small and very close to that of the EW approach.

Given the market constraints, the implementation of the EW strategy could be very expensive for a high-dimensional portfolio. Financial securities are purchased in board lot sizes, so that there is a minimal charge for each stock in the market. The minimal charge for an expensive stock could be high. The EW strategy requires investing equal amounts of money in all available stocks. With a large p, the minimum entry ticket for the EW strategy could be unaffordable, even for financial institutions. As the LPO-EW only involves a few stocks, the implementation is less costly and probably more profitable.

5.2.3. Limitations of LPO and future research.

The LPO approach is not without limitations. All LPO-based approaches, including the LPO, LPO-EW, and adaptive LPO, suffer from a high turnover rate relative to the EW approach. Although the LPO approach significantly improves the profit-based performance measures of high-dimensional portfolios, the estimation stability is still a concern. The optimized portfolio weights change quite frequently and eventually increase the transaction costs in practice. In other words, the unstable estimates of Σ continue to have some effects on the estimated cardinality of η, so that the set of favorable stocks changes with the sampling window and the LPO-EW still generates a high turnover rate. Statistically, this problem is known as the false positive identification problem. Future research may consider additional statistical treatments to lower the false positive rate.

We stress that the aim of the LPO strategy is to construct an estimation for the optimal strategy or for effective parameters in the optimal strategy. We use the constrained ℓ1 minimization to realize that concept. However, it could be interesting to explore other possibilities. The high-dimensional Bayesian approach is probably an alternative. Although we demonstrate the use of LPO in single- and multi-period MV portfolios, the concept should be generally applicable to continuous-time models and other objective functions, including robust optimization. We have put this on our research agenda.

6. Conclusion

For those who wish to invest in a few favorable assets rather than in all available assets in the market, a sparse optimal portfolio is an attractive approach. With this practical purpose in mind, this paper proposes the LPO approach using the constrained ℓ1 minimization.

Investing in an optimized portfolio is not necessarily optimal, due to the difficulty in controlling estimation errors. We mathematically prove that the estimation error in the covariance matrix alone makes an optimized high-dimensional portfolio tend to perform like a random stock-picking strategy. The portfolio optimized with the LPO estimator, however, asymptotically converges to the true optimal portfolio in probability. The LPO approach not only improves the performance of the static MV portfolios but is also useful for the dynamic MV portfolios. The derived theory is verified with several simulation studies. Empirical analysis contrasting the LPO approach with other competitive implementations further confirms the advantages of the LPO approach and its generalizations to LPO-EW and adaptive LPO portfolios.

References

Basak, S., G. Chabakauri. 2010. Dynamic Mean-Variance Asset Allocation. Review of Financial Studies. 23 2970-3016.
Bickel, P.J., E. Levina. 2004. Some Theory for Fisher's Linear Discriminant Function, 'Naive Bayes', and Some Alternatives When There Are Many More Variables Than Observations. Bernoulli. 10 989-1010.
Bickel, P.J., E. Levina. 2008a. Regularized Estimation of Large Covariance Matrices. The Annals of Statistics. 36 199-227.
Bickel, P.J., E. Levina. 2008b. Covariance Regularization by Thresholding. The Annals of Statistics. 36 2577-2604.
Bickel, P.J., Y. Ritov, A.B. Tsybakov. 2009. Simultaneous Analysis of LASSO and Dantzig Selector. The Annals of Statistics. 37 1705-1732.
Bielecki, T., H. Jin, S. Pliska, X. Zhou. 2005. Continuous-time Mean-variance Portfolio Selection with Bankruptcy Prohibition. Mathematical Finance. 15 213-244.
Boyle, P., L. Garlappi, R. Uppal, T. Wang. 2012. Keynes Meets Markowitz: The Trade-off Between Familiarity and Diversification. Management Science. 58 253-272.
Broadie, M. 1993. Computing Efficient Frontiers using Estimated Parameters. Annals of Operations Research. 45 21-58.
Cai, T.T., C.H. Zhang, H.H. Zhou. 2010. Optimal Rates of Convergence for Covariance Matrix Estimation. The Annals of Statistics. 38 2118-2144.
Cai, T.T., W.D. Liu, X. Luo. 2011. A Constrained ℓ1 Minimization Approach to Sparse Precision Matrix Estimation. Journal of the American Statistical Association. 106 594-607.
Cai, T.T., W.D. Liu. 2011. A Direct Estimation Approach to Sparse Linear Discriminant Analysis. Journal of the American Statistical Association. 106 1566-1577.


Candès, E., T. Tao. 2007. The Dantzig Selector: Statistical Estimation When p Is Much Larger Than n. The Annals of Statistics. 35 2313-2351.
Chopra, V.K., W.T. Ziemba. 1993. The Effect of Errors in Means, Variances, and Covariances on Optimal Portfolio Choice. Journal of Portfolio Management. 19 6-11.
d'Aspremont, A., O. Banerjee, L. El Ghaoui. 2008. First-Order Methods for Sparse Covariance Selection. SIAM Journal on Matrix Analysis and Applications. 30 56-66.
DeMiguel, V., L. Garlappi, R. Uppal. 2009a. Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy? The Review of Financial Studies. 22 1915-1953.
DeMiguel, V., L. Garlappi, F.J. Nogales, R. Uppal. 2009b. A Generalized Approach to Portfolio Optimization: Improving Performance by Constraining Portfolio Norms. Management Science. 55 798-812.
Dubois, M.S., L.A.M. Veraart. 2015. Optimal Diversification in the Presence of Parameter Uncertainty for a Risk Averse Investor. SIAM Journal on Financial Mathematics. 6 201-241.
El Karoui, N. 2008. Operator Norm Consistent Estimation of Large-Dimensional Sparse Covariance Matrices. The Annals of Statistics. 36 2717-2756.
Fan, J., J. Zhang, K. Yu. 2012. Vast Portfolio Selection with Gross-Exposure Constraints. Journal of the American Statistical Association. 107 592-606.
Fouque, J.-P., C.S. Pun, H.Y. Wong. 2014. Portfolio Optimization with Ambiguous Correlation and Stochastic Volatilities. Working paper of UC Santa Barbara and The Chinese University of Hong Kong.
Friedman, J., T. Hastie, R. Tibshirani. 2008. Sparse Inverse Covariance Estimation with the Graphical Lasso. Biostatistics. 9 432-441.
Gandy, A., L.A.M. Veraart. 2013. The Effect of Estimation in High-Dimensional Portfolios. Mathematical Finance. 23 531-559.
Garlappi, L., R. Uppal, T. Wang. 2007. Portfolio Selection with Parameter and Model Uncertainty: A Multi-Prior Approach. The Review of Financial Studies. 20 41-81.
Jagannathan, R., T. Ma. 2003. Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps. The Journal of Finance. 58 1651-1684.
James, G.M., P. Radchenko, J. Lv. 2009. DASSO: Connections between the Dantzig Selector and LASSO. Journal of the Royal Statistical Society, Series B. 71 127-142.
Lee, S., F. Zou, F.A. Wright. 2014. Convergence of Sample Eigenvalues, Eigenvectors, and Principal Component Scores for Ultra-High Dimensional Data. Biometrika. 101 484-490.
Ledoit, O., M. Wolf. 2004. Honey, I Shrunk the Sample Covariance Matrix. Journal of Portfolio Management. 30 110-119.
Li, D., W.L. Ng. 2000. Optimal Dynamic Portfolio Selection: Multiperiod Mean-Variance Formulation. Mathematical Finance. 10 387-406.


Luenberger, D.G. 1998. Investment Science. Oxford University Press, New York.
Markowitz, H.M. 1989. Mean-Variance Analysis in Portfolio Choice and Capital Markets. Cambridge, MA: Basil Blackwell.
Merton, R.C. 1990. Continuous-Time Finance. Blackwell.
Michaud, R.O. 1989. The Markowitz Optimization Enigma: Is 'Optimized' Optimal? Financial Analysts Journal. 45 31-42.
Rothman, A., P. Bickel, E. Levina, J. Zhu. 2008. Sparse Permutation Invariant Covariance Estimation. Electronic Journal of Statistics. 2 494-515.
Shao, J., Y.Z. Wang, X.W. Deng, S.J. Wang. 2011. Sparse Linear Discriminant Analysis by Thresholding for High Dimensional Data. The Annals of Statistics. 39 1241-1265.
Vanderbei, R. 2008. Linear Programming: Foundations and Extensions. Springer.
Zhou, X.Y., D. Li. 2000. Continuous-Time Mean-Variance Portfolio Selection: A Stochastic LQ Framework. Applied Mathematics & Optimization. 42 19-33.

Appendix A. Proofs

Proof of Proposition 1. Both (6) and (7) are classic equations in the literature. The proof of (6) is contained in Markowitz (1989). We restate (7) from Li and Ng (2000) in an alternative but equivalent form to facilitate the use of the notation in this paper. Let $P_j = R(t_j, t_{j+1}) - r_{m,f}\Delta_j 1_p$ and read (4) as

\[
X^{u_m}(t_{j+1}) = X^{u_m}(t_j)(1 + r_{m,f}\Delta_j) + u_m(t_j)' P_j .
\]

Note that

\[
\mathrm{E}[P_j] = \Delta_j \beta_m, \qquad
\mathrm{E}[P_j P_j'] = \Delta_j \Sigma_m + \Delta_j^2 \beta_m \beta_m', \qquad
(\mathrm{E}[P_j P_j'])^{-1}\mathrm{E}[P_j] = \frac{\Sigma_m^{-1}\beta_m}{1 + \Delta_j \beta_m' \Sigma_m^{-1}\beta_m}.
\]

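Given the results stated in Li and Ng (2000), our result follows by a simple calculation.

Proof of Proposition 2. The single-period result is trivial. For the multi-period case, we use the recursive relation (4):

\[
\begin{aligned}
X^{\hat u_m}(t_{j+1})
&= X^{\hat u_m}(t_j)(1 + r_{m,f}\Delta_j) + \hat u_m(t_j)' P_j \\
&= (1 + r_{m,f}\Delta_j)\left[1 - \frac{\hat\eta_m' P_j}{1 + \Delta_j \hat\beta_m'\hat\eta_m}\right] X^{\hat u_m}(t_j)
 + \frac{\hat\gamma_m \prod_{k=j+1}^{N-1}(1 + r_{m,f}\Delta_k)^{-1}}{1 + \Delta_j \hat\beta_m'\hat\eta_m}\,\hat\eta_m' P_j \\
&=: a_j X^{\hat u_m}(t_j) + b_j .
\end{aligned}
\]

Then

\[
X^{\hat u_m}(T) = \prod_{j=0}^{N-1} a_j \, x_0 + b_{N-1} + \sum_{j=0}^{N-2}\Big(b_j \prod_{k=j+1}^{N-1} a_k\Big).
\]

Some simple but tedious calculations bring us the desired expression of $X^{\hat u_m}(T)$. We then compute the variance through its first and second moments:

\[
\mathrm{E}[X^{\hat u_m}(T)] = \hat\gamma_m + \Big(x_0 \prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k) - \hat\gamma_m\Big) \prod_{j=0}^{N-1} \frac{1 - \Delta_j(\beta_m - \hat\beta_m)'\hat\eta_m}{1 + \Delta_j \hat\beta_m'\hat\eta_m},
\]

\[
\begin{aligned}
\mathrm{E}[(X^{\hat u_m}(T))^2] ={}& \hat\gamma_m^2 + 2\hat\gamma_m \Big(x_0 \prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k) - \hat\gamma_m\Big) \prod_{j=0}^{N-1} \frac{1 - \Delta_j(\beta_m - \hat\beta_m)'\hat\eta_m}{1 + \Delta_j \hat\beta_m'\hat\eta_m} \\
&+ \Big(x_0 \prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k) - \hat\gamma_m\Big)^2 \prod_{j=0}^{N-1} \frac{(1 - \Delta_j(\beta_m - \hat\beta_m)'\hat\eta_m)^2 + \Delta_j \hat\eta_m'\Sigma_m\hat\eta_m}{(1 + \Delta_j \hat\beta_m'\hat\eta_m)^2}.
\end{aligned}
\]

Then the results follow.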
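The step from the one-period recursion to the terminal wealth in the proof of Proposition 2 is the usual unrolling of an affine recursion $X_{j+1} = a_j X_j + b_j$. The following is a minimal numerical sketch of that step only; the function names and the random coefficients are hypothetical placeholders, not the $a_j$, $b_j$ of the portfolio model.

```python
import numpy as np

def iterate_affine(a, b, x0):
    """Step through X_{j+1} = a_j * X_j + b_j and return the terminal value."""
    x = x0
    for a_j, b_j in zip(a, b):
        x = a_j * x + b_j
    return x

def unroll_affine(a, b, x0):
    """Closed form: prod_j a_j * x0 + b_{N-1} + sum_{j<=N-2} b_j * prod_{k>j} a_k."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    total = np.prod(a) * x0 + b[n - 1]
    for j in range(n - 1):
        total += b[j] * np.prod(a[j + 1:])
    return total

# Hypothetical coefficients, used only to compare the two computations.
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.5, size=6)
b = rng.normal(size=6)
stepped = iterate_affine(a, b, 1.0)
closed = unroll_affine(a, b, 1.0)
print(stepped, closed)
```

Both computations should agree up to floating-point rounding, which is all that the closed-form expression for the terminal wealth relies on.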

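Proof of Lemma 3.1. This Markov-type inequality is proved by noting that

\[
\mathrm{E}_D\left[\prod_{j=0}^{N-1}\big[1 - \hat\eta_{m,n}'\sigma\sqrt{\Delta_j}\,\xi_{m,j}\big]^2\right] = \prod_{j=0}^{N-1}\big(1 + \Delta_j \hat\eta_{m,n}'\Sigma\hat\eta_{m,n}\big),
\]

\[
\mathrm{E}_D\left[\Big(1 - \hat\eta_{m,n}'\sigma\sum_{j=0}^{N-1}\sqrt{\Delta_j}\,\xi_{m,j}\Big)^2\right] = 1 + T\,\hat\eta_{m,n}'\Sigma\hat\eta_{m,n},
\]

\[
\mathrm{E}_D\left[\prod_{j=0}^{N-1}\big[1 - \hat\eta_{m,n}'\sigma\sqrt{\Delta_j}\,\xi_{m,j}\big]\big(-\hat\eta_{m,n}'\sigma\sqrt{\Delta_l}\,\xi_{m,l}\big)\right] = \Delta_l\,\hat\eta_{m,n}'\Sigma\hat\eta_{m,n},
\]

for $l = 0, 1, \dots, N-1$, and thus

\[
\begin{aligned}
&\mathrm{E}_D\left[\Big(\prod_{j=0}^{N-1}\big[1 - \hat\eta_{m,n}'\sigma\sqrt{\Delta_j}\,\xi_{m,j}\big] - \Big(1 - \hat\eta_{m,n}'\sigma\sum_{j=0}^{N-1}\sqrt{\Delta_j}\,\xi_{m,j}\Big)\Big)^2\right] \\
&\qquad = \prod_{j=0}^{N-1}\big(1 + \Delta_j \hat\eta_{m,n}'\Sigma\hat\eta_{m,n}\big) - \big(1 + T\,\hat\eta_{m,n}'\Sigma\hat\eta_{m,n}\big) = O\big([\hat\eta_{m,n}'\Sigma\hat\eta_{m,n}]^2\big).
\end{aligned}
\]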
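The quadratic rate in the last display of the proof of Lemma 3.1 can be illustrated numerically. Writing $v$ for $\hat\eta_{m,n}'\Sigma\hat\eta_{m,n}$, the gap $\prod_j (1 + \Delta_j v) - (1 + Tv)$ has leading term $v^2 \sum_{i<j}\Delta_i\Delta_j$. The sketch below is only a check of this expansion; the equally spaced grid and the values of $v$ are assumptions made for illustration.

```python
import numpy as np

def second_moment_gap(deltas, v):
    """Return prod_j (1 + delta_j * v) - (1 + T * v), where T = sum(deltas)."""
    deltas = np.asarray(deltas, dtype=float)
    return np.prod(1.0 + deltas * v) - (1.0 + deltas.sum() * v)

# Hypothetical time grid: 12 equal periods with T = 1.
deltas = np.full(12, 1.0 / 12)

# Leading coefficient of the gap: sum over i < j of delta_i * delta_j.
pair_sum = (deltas.sum() ** 2 - np.sum(deltas ** 2)) / 2.0

# gap / v^2 should approach pair_sum as v shrinks.
ratios = [second_moment_gap(deltas, v) / v ** 2 for v in (1e-1, 1e-2, 1e-3)]
print(pair_sum, ratios)
```

Because every higher-order coefficient is positive, the ratio decreases toward the pair sum as $v$ shrinks, consistent with the gap being of order $v^2$.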

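Proof of Lemma 3.2. The third assertion is a direct consequence of the first and second assertions by Slutsky's theorem. Hence, we only prove the first and second assertions.

Consider the identity covariance matrix $\Sigma = I$. The eigenvectors of $\hat\Sigma_n$, denoted as $\hat\xi_1, \dots, \hat\xi_n$, which correspond to the non-zero eigenvalues $\hat\lambda_1 \ge \dots \ge \hat\lambda_n$ of $\hat\Sigma_n$, span a subspace of $\mathbb{R}^p$. Moreover, $(\hat\lambda_1, \dots, \hat\lambda_n)$ and $(\hat\xi_1, \dots, \hat\xi_n)$ are independent, while the $\hat\xi_j$ are identically distributed uniformly on the unit p-sphere. The Moore-Penrose inverse of $\hat\Sigma_n$ can then be expressed as

\[
\hat\Sigma_n^{-} = \sum_{i=1}^{n} \frac{1}{\hat\lambda_i}\,\hat\xi_i\hat\xi_i' .
\]

1. As proven in Theorem 1(ii) of Lee et al. (2014), $\hat\lambda_i = O_P(p/n)$ for all $i$ as $p/n \to \infty$. We recognize that

\[
\frac{\hat\eta_n'\hat\eta_n}{\beta'\hat\eta_n}
= \frac{\sum_{i=1}^{n} \hat\lambda_i^{-2}(\beta'\hat\xi_i)^2}{\sum_{i=1}^{n} \hat\lambda_i^{-1}(\beta'\hat\xi_i)^2}
\le \frac{\hat\lambda_1}{\hat\lambda_n^2} = O_P\Big(\frac{n}{p}\Big).
\]

Hence, $\hat\eta_n'\hat\eta_n / \beta'\hat\eta_n$ converges to 0 in probability as $p/n \to \infty$.

2. According to the Cauchy-Schwarz inequality,

\[
\left[\frac{\beta'\hat\eta_n}{\sqrt{\hat\eta_n'\hat\eta_n}}\right]^2
= \frac{\big[\sum_{i=1}^{n} \hat\lambda_i^{-1}(\beta'\hat\xi_i)^2\big]^2}{\sum_{i=1}^{n} \hat\lambda_i^{-2}(\beta'\hat\xi_i)^2}
\le \sum_{i=1}^{n} (\beta'\hat\xi_i)^2 .
\]

The expectation of the right-hand side of the above inequality is equal to $(n/p)\|\beta\|^2$. Hence, $\beta'\hat\eta_n / \sqrt{\hat\eta_n'\hat\eta_n}$ converges to 0 in $L^2$ once $\|\beta\|^2 = o(p/n)$.

Next, consider a general positive definite $\Sigma$. We take $\beta^* = \Sigma^{-1/2}\beta$, $\hat\Sigma_n^* = \Sigma^{-1/2}\hat\Sigma_n\Sigma^{-1/2}$ and $\hat\eta_n^* = \hat\Sigma_n^{*-}\beta^* = \Sigma^{1/2}\hat\eta_n$, where $\hat\Sigma_n^*$ is a Gram matrix of the pseudo-samples $\{r^{*(l)}\}_{l=1}^{n}$ and $r^{*(l)} = (\delta t\,\Sigma)^{-1/2} r^{(l)}$ has the identity covariance matrix. Notice that

\[
\frac{\hat\eta_n'\Sigma\hat\eta_n}{\beta'\hat\eta_n} = \frac{\hat\eta_n^{*\prime}\hat\eta_n^*}{\beta^{*\prime}\hat\eta_n^*}
\quad\text{and}\quad
\frac{\beta'\hat\eta_n}{\sqrt{\hat\eta_n'\Sigma\hat\eta_n}} = \frac{\beta^{*\prime}\hat\eta_n^*}{\sqrt{\hat\eta_n^{*\prime}\hat\eta_n^*}} .
\]

According to the results for the case of the identity covariance matrix, if the condition on $\|\beta^*\|^2 = \Theta_p$ is satisfied, then $\hat\eta_n'\Sigma\hat\eta_n / \beta'\hat\eta_n$ and $\beta'\hat\eta_n / \sqrt{\hat\eta_n'\Sigma\hat\eta_n}$ converge to 0 in probability as $p/n \to \infty$.

Proof of Theorem 3.1. Using (11) with $\hat\beta_s = \beta$, it is easy to have

\[
\mathrm{P}_D\big(X^{\hat u_s}(T) > x_0(1 + r_{s,f}T)\big) = \Phi\left(\sqrt{T}\,\frac{\beta'\hat\eta_n}{\sqrt{\hat\eta_n'\Sigma\hat\eta_n}}\right) \xrightarrow{P} \frac{1}{2},
\]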

where Lemma 3.2 and the continuous mapping theorem are used. For the multi-period case, we have

\[
\begin{aligned}
&\mathrm{P}_D\Big(X^{\hat u_m}(T) > x_0\prod_{k=0}^{N-1}(1 + r_{m,f}\Delta_k)\Big) \\
&\quad= \mathrm{P}\Big(1 - \prod_{j=0}^{N-1}(1 + \Delta_j\beta'\hat\eta_n)^{-1}\prod_{j=0}^{N-1}\big[1 - \sqrt{\Delta_j}\,\hat\eta_n'\sigma\xi_{m,j}\big] > 0\Big) \\
&\quad\le \mathrm{P}\Big(1 - \hat\eta_n'\sigma\sum_{j=0}^{N-1}\sqrt{\Delta_j}\,\xi_{m,j} - \epsilon_{n,p} < 1 + \beta'\hat\eta_n T + O((\beta'\hat\eta_n)^2)\Big) + O\left(\Big(\frac{\hat\eta_n'\Sigma\hat\eta_n}{\epsilon_{n,p}}\Big)^2\right) \\
&\quad= \Phi\left(\frac{\beta'\hat\eta_n T + O((\beta'\hat\eta_n)^2) + \epsilon_{n,p}}{\sqrt{\hat\eta_n'\Sigma\hat\eta_n\,T}}\right) + O\left(\Big(\frac{\hat\eta_n'\Sigma\hat\eta_n}{\epsilon_{n,p}}\Big)^2\right).
\end{aligned}
\]

By taking $\epsilon_{n,p} = (\hat\eta_n'\Sigma\hat\eta_n)^{3/4}$ and using Lemma 3.2, the upper bound of $\mathrm{P}(X^{\hat u_m}(T) > x_0(1 + r_f T))$ converges to 1/2 in probability.

To prove Theorems 4.1 and 4.2 in Section 4.2, we first need Lemma 2 in Cai and Liu (2011), which shows that the true η belongs to the feasible set of (16) with overwhelmingly high probability. That lemma is restated using our notation as follows.

Lemma A.1. Under (M1) and (C1), with probability greater than $1 - O(p^{-1})$,

\[
|\hat\Sigma_n\eta - \hat\beta_n|_\infty \le \lambda_n . \tag{A.1}
\]

Under (M2) and (C1), (A.1) holds with probability greater than $1 - O(p^{-1} + n^{-\epsilon/8})$.

Remark: By the definition of $\tilde\eta$, we have $|\tilde\eta|_1 \le |\eta|_1$ with high probability.

Proof of Lemma A.1. In Lemma 2 in Cai and Liu (2011), we let $X = \beta\,\delta t + \sqrt{\delta t}\,\Psi$ and $Y = 0$. Then we have $\delta t\,|\hat\Sigma_n\eta - \hat\beta_n|_\infty \le C\sqrt{\Theta_p \log p / n}$ and thus the result follows.
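Lemma A.1 states that the true η satisfies the sup-norm constraint (A.1) with high probability, which is what makes a constrained ℓ1 minimization a natural estimator. As a minimal sketch, assume that problem (16) takes the Dantzig-selector-type form: minimize $|\eta|_1$ subject to $|\hat\Sigma_n\eta - \hat\beta_n|_\infty \le \lambda_n$. This form is suggested by (A.1) and the Remark but is not restated in this appendix. Under that assumption the problem becomes a linear program after splitting $\eta = u - v$ with $u, v \ge 0$. The function name `l1_min_estimator` and the toy inputs are hypothetical, and SciPy's `linprog` is used only as one convenient solver.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_estimator(sigma_hat, beta_hat, lam):
    """Minimize |eta|_1 subject to |sigma_hat @ eta - beta_hat|_inf <= lam.

    Assumed form of the constrained l1 problem; eta is split as u - v, u, v >= 0.
    """
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    p = beta_hat.size
    cost = np.ones(2 * p)  # sum(u) + sum(v) equals |eta|_1 at an optimum
    a_ub = np.vstack([
        np.hstack([sigma_hat, -sigma_hat]),   #  (Sigma eta - beta) <= lam
        np.hstack([-sigma_hat, sigma_hat]),   # -(Sigma eta - beta) <= lam
    ])
    b_ub = np.concatenate([lam + beta_hat, lam - beta_hat])
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]

# Toy inputs (hypothetical): with an identity matrix the solution reduces to
# soft-thresholding of beta_hat at level lam, so small entries are set to zero.
sigma_hat = np.eye(4)
beta_hat = np.array([1.0, 0.05, -0.5, 0.0])
lam = 0.1
eta = l1_min_estimator(sigma_hat, beta_hat, lam)
print(eta)
```

In this toy case only the entries of `beta_hat` that exceed `lam` in absolute value survive, which mirrors the filtering of unfavorable assets described for the LPO procedure. With a singular sample covariance matrix the feasible set may be empty if `lam` is chosen too small, in which case the sketch raises an error.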

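Proof of Theorem 4.1. By the definition of $\tilde\eta$, we have

\[
|\eta'\hat\Sigma_n\tilde\eta - \eta'\beta| \le (\lambda_n + |\hat\beta_n - \beta|_\infty)|\eta|_1 \le (\lambda_n + b_n)|\eta|_1 .
\]

Using (A.1) (with a different moment condition), we also have

\[
|\eta'\hat\Sigma_n\tilde\eta - \beta'\tilde\eta| \le (\lambda_n + |\hat\beta_n - \beta|_\infty)|\tilde\eta|_1 \le (\lambda_n + b_n)|\eta|_1 .
\]

Combining the above two inequalities yields

\[
|\beta'\tilde\eta - \Theta_p| \le 2(\lambda_n + b_n)|\eta|_1 \le C(\lambda_n + b_n)|\eta|_1, \tag{A.2}
\]

and

\[
|\hat\beta_n'\tilde\eta - \Theta_p| \le (2\lambda_n + 3b_n)|\eta|_1 \le C(\lambda_n + b_n)|\eta|_1 . \tag{A.3}
\]

Next, we notice that

\[
|\Sigma\tilde\eta - \hat\beta_n|_\infty \le |\Sigma - \hat\Sigma_n|_\infty|\tilde\eta|_1 + \lambda_n \le a_n|\eta|_1 + \lambda_n,
\]

and thus

\[
|\tilde\eta'\Sigma\tilde\eta - \hat\beta_n'\tilde\eta| \le a_n|\eta|_1^2 + \lambda_n|\eta|_1 \le d_n, \tag{A.4}
\]

\[
|\tilde\eta'\Sigma\tilde\eta - \beta'\tilde\eta| \le a_n|\eta|_1^2 + (\lambda_n + b_n)|\eta|_1 = d_n . \tag{A.5}
\]

Now we investigate the oracle properties of the LPO strategies under the single-period and multi-period cases. The subscripts "s" and "m" are omitted as they are distinct in each discussion.

1. For the single-period strategy given by (11), we have

\[
X^{\tilde u_s}(T) - X^{u_s}(T) = (z - x_0(1 + r_f T))\left[\frac{\beta'\tilde\eta}{\hat\beta_n'\tilde\eta} - 1 + \frac{1}{\sqrt{T}}\left(\frac{\tilde\eta'\sigma\xi}{\hat\beta_n'\tilde\eta} - \frac{\eta'\sigma\xi}{\Theta_p}\right)\right].
\]

Notice that (A.3) implies

\[
K \le \Theta_p \le \hat\beta_n'\tilde\eta + |\hat\beta_n'\tilde\eta - \Theta_p| \le \hat\beta_n'\tilde\eta + C(\lambda_n + b_n)|\eta|_1
\;\Rightarrow\; \hat\beta_n'\tilde\eta \ge K - o_P(1). \tag{A.6}
\]

Taking conditional expectation with respect to the historical data, we have

\[
\begin{aligned}
\mathrm{E}_D\big(X^{\tilde u_s}(T) - X^{u_s}(T)\big)^2
&= (z - x_0(1 + r_f T))^2\left[\left(\frac{\beta'\tilde\eta}{\hat\beta_n'\tilde\eta} - 1\right)^2 + \frac{1}{T}\left(\frac{\tilde\eta'\Sigma\tilde\eta}{(\hat\beta_n'\tilde\eta)^2} + \frac{1}{\Theta_p} - \frac{2}{\hat\beta_n'\tilde\eta}\right)\right] \\
&= (z - x_0(1 + r_f T))^2\left[\left(\frac{\beta'\tilde\eta}{\hat\beta_n'\tilde\eta} - 1\right)^2 + \frac{1}{T}\left(\frac{\tilde\eta'\Sigma\tilde\eta - \hat\beta_n'\tilde\eta}{(\hat\beta_n'\tilde\eta)^2} + \frac{\hat\beta_n'\tilde\eta - \Theta_p}{\hat\beta_n'\tilde\eta\,\Theta_p}\right)\right] \\
&\le \frac{C}{(K - o_P(1))^2}\Big[(\beta'\tilde\eta - \hat\beta_n'\tilde\eta)^2 + |\tilde\eta'\Sigma\tilde\eta - \hat\beta_n'\tilde\eta| + |\hat\beta_n'\tilde\eta - \Theta_p|\Big] \\
&\le \frac{C}{(K - o_P(1))^2}\,d_n \to 0 \text{ in probability,}
\end{aligned}
\]

where (A.3) and (A.4) are used. Similarly, using (13), we have

\[
\begin{aligned}
|(\sigma_T^{\tilde u_s})^2 - (\sigma_T^{u_s})^2|
&= \frac{(z - x_0(1 + r_f T))^2}{T}\left|\frac{\tilde\eta'\Sigma\tilde\eta}{(\hat\beta_n'\tilde\eta)^2} - \frac{1}{\Theta_p}\right| \\
&\le \frac{C}{(K - o_P(1))^2}\Big[|\tilde\eta'\Sigma\tilde\eta - \hat\beta_n'\tilde\eta| + |\hat\beta_n'\tilde\eta - \Theta_p|\Big] \\
&\le \frac{C}{(K - o_P(1))^2}\,d_n \to 0 \text{ in probability.}
\end{aligned}
\]

2. For the multi-period strategy given in (12), we have $X^{\tilde u_m}(T) - X^{u_m}(T) =$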

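\[
\begin{aligned}
\Big(z - x_0\prod_{j=0}^{N-1}(1 + r_f\Delta_j)\Big)\Bigg[&\Big(1 - \prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1}\Big)^{-1} - \Big(1 - \prod_{j=0}^{N-1}(1 + \Delta_j\Theta_p)^{-1}\Big)^{-1} \\
&- \prod_{j=0}^{N-1}\frac{1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta - \sqrt{\Delta_j}\,\tilde\eta'\sigma\xi_j}{1 + \Delta_j\hat\beta_n'\tilde\eta} + \prod_{j=0}^{N-1}\frac{1 - \sqrt{\Delta_j}\,\eta'\sigma\xi_j}{1 + \Delta_j\Theta_p}\Bigg].
\end{aligned}
\]

Taking conditional expectation with respect to the historical data, we have

\[
\begin{aligned}
\sqrt{\mathrm{E}_D\big(X^{\tilde u_m}(T) - X^{u_m}(T)\big)^2}
\le C\Bigg\{&\left|\Big(1 - \prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1}\Big)^{-1} - \Big(1 - \prod_{j=0}^{N-1}(1 + \Delta_j\Theta_p)^{-1}\Big)^{-1}\right| \\
&+ \left[\mathrm{E}_D\left(\prod_{j=0}^{N-1}\frac{1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta - \sqrt{\Delta_j}\,\tilde\eta'\sigma\xi_j}{1 + \Delta_j\hat\beta_n'\tilde\eta} - \prod_{j=0}^{N-1}\frac{1 - \sqrt{\Delta_j}\,\eta'\sigma\xi_j}{1 + \Delta_j\Theta_p}\right)^2\right]^{1/2}\Bigg\} \\
=: C\big(&|I_1| + (I_2)^{1/2}\big).
\end{aligned}
\]

Using (A.6), we have

\[
\begin{aligned}
|I_1| &\le \Big(1 - \prod_{j=0}^{N-1}(1 + \Delta_j(K - o_P(1)))^{-1}\Big)^{-2}\left|\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1} - \prod_{j=0}^{N-1}(1 + \Delta_j\Theta_p)^{-1}\right| \\
&\le C\,O_P(1)\sum_{j=0}^{N-1}\left|(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1} - (1 + \Delta_j\Theta_p)^{-1}\right| \\
&\le C\,O_P(1)\sum_{j=0}^{N-1}(1 + \Delta_j(K - o_P(1)))^{-2}\Delta_j|\hat\beta_n'\tilde\eta - \Theta_p| \\
&\le C\,O_P(1)(\lambda_n + b_n)|\eta|_1 \to 0 \text{ in probability,}
\end{aligned}
\]

where we have used a simple lemma: if the complex numbers $z_1, \dots, z_n$ and $w_1, \dots, w_n$ are of modulus at most $\theta$, then $|\prod_{i=1}^{n} z_i - \prod_{i=1}^{n} w_i| \le \theta^{n-1}\sum_{i=1}^{n}|z_i - w_i|$, in which $\theta = 1$ for our application. Moreover, a simple calculation yields

\[
I_2 = \prod_{j=0}^{N-1}\frac{(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)^2 + \Delta_j\tilde\eta'\Sigma\tilde\eta}{(1 + \Delta_j\hat\beta_n'\tilde\eta)^2} - \prod_{j=0}^{N-1}\frac{1}{1 + \Delta_j\Theta_p} .
\]

Notice that

\[
\frac{(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)^2 + \Delta_j\tilde\eta'\Sigma\tilde\eta}{(1 + \Delta_j\hat\beta_n'\tilde\eta)^2}
\le \frac{1 + \Delta_j\hat\beta_n'\tilde\eta + o_P(1)}{(1 + \Delta_j\hat\beta_n'\tilde\eta)^2}
\le \frac{1}{1 + \Delta_j\hat\beta_n'\tilde\eta} + o_P(1) \le 1 .
\]

Hence, we have

\[
\begin{aligned}
I_2 &\le \sum_{j=0}^{N-1}\left|\frac{(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)^2 + \Delta_j\tilde\eta'\Sigma\tilde\eta}{(1 + \Delta_j\hat\beta_n'\tilde\eta)^2} - \frac{1}{1 + \Delta_j\Theta_p}\right| \\
&\le \sum_{j=0}^{N-1}\left[\left|\frac{(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)^2 - 1 + \Delta_j(\tilde\eta'\Sigma\tilde\eta - \hat\beta_n'\tilde\eta)}{(1 + \Delta_j\hat\beta_n'\tilde\eta)^2}\right| + \left|\frac{1}{1 + \Delta_j\hat\beta_n'\tilde\eta} - \frac{1}{1 + \Delta_j\Theta_p}\right|\right] \\
&\le C\,O_P(1)\,d_n \to 0 \text{ in probability.}
\end{aligned}
\]

Therefore, $\mathrm{E}_D\big(X^{\tilde u_m}(T) - X^{u_m}(T)\big)^2 \xrightarrow{P} 0$. Similarly, using (12), we have

\[
\begin{aligned}
|(\sigma_T^{\tilde u_m})^2 - (\sigma_T^{u_m})^2|
\le C\Bigg[&\left|\frac{1}{\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta) - 1} - \frac{1}{\prod_{j=0}^{N-1}(1 + \Delta_j\Theta_p) - 1}\right| \\
&+ \left|\frac{\prod_{j=0}^{N-1}\big[(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)^2 + \Delta_j\tilde\eta'\Sigma\tilde\eta\big] - \prod_{j=0}^{N-1}(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)}{\big(\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta) - 1\big)^2} - \frac{1}{\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta) - 1}\right|\Bigg] \\
=: C\big[&|I_3| + |I_4|\big].
\end{aligned}
\]

Notice that

\[
\begin{aligned}
|I_3| &\le \left|\frac{\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1}}{1 - \prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1}} - \frac{\prod_{j=0}^{N-1}(1 + \Delta_j\Theta_p)^{-1}}{1 - \prod_{j=0}^{N-1}(1 + \Delta_j\Theta_p)^{-1}}\right| \\
&\le C\,O_P(1)\left|\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1} - \prod_{j=0}^{N-1}(1 + \Delta_j\Theta_p)^{-1}\right| \\
&\le C\,O_P(1)\sum_{j=0}^{N-1}\left|(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1} - (1 + \Delta_j\Theta_p)^{-1}\right| \\
&\le C\,O_P(1)(\lambda_n + b_n)|\eta|_1 \to 0 \text{ in probability.}
\end{aligned}
\]

Moreover,

\[
\begin{aligned}
|I_4| &= \left|\frac{\prod_{j=0}^{N-1}\big[(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)^2 + \Delta_j\tilde\eta'\Sigma\tilde\eta\big] - \prod_{j=0}^{N-1}(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta) - \big(\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta) - 1\big)}{\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^2\,\big(1 - \prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^{-1}\big)^2}\right| \\
&\le C\,O_P(1)\left|\prod_{j=0}^{N-1}\frac{(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)^2 + \Delta_j\tilde\eta'\Sigma\tilde\eta}{(1 + \Delta_j\hat\beta_n'\tilde\eta)^2} - \prod_{j=0}^{N-1}\frac{1}{1 + \Delta_j\hat\beta_n'\tilde\eta} + \frac{1 - \prod_{j=0}^{N-1}(1 - \Delta_j(\beta - \hat\beta_n)'\tilde\eta)}{\prod_{j=0}^{N-1}(1 + \Delta_j\hat\beta_n'\tilde\eta)^2}\right| \\
&\le C\,O_P(1)\big(d_n + O_P(1)\,b_n|\eta|_1\big) \to 0 \text{ in probability.}
\end{aligned}
\]

Therefore, $(\sigma_T^{\tilde u_m})^2 - (\sigma_T^{u_m})^2 \xrightarrow{P} 0$.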
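The product inequality used for $I_1$ and $I_3$ in the proof of Theorem 4.1 can be checked numerically. The sketch below evaluates both sides for factors of the form $(1 + \Delta_j x)^{-1}$, which have modulus at most 1 when $x > 0$. The function name and the two values standing in for $\hat\beta_n'\tilde\eta$ and $\Theta_p$ are hypothetical.

```python
import numpy as np

def product_difference_bound(z, w, theta=1.0):
    """Return (|prod z - prod w|, theta**(n-1) * sum |z_i - w_i|)."""
    z = np.asarray(z, dtype=complex)
    w = np.asarray(w, dtype=complex)
    if max(np.abs(z).max(), np.abs(w).max()) > theta:
        raise ValueError("all moduli must be at most theta")
    lhs = abs(np.prod(z) - np.prod(w))
    rhs = theta ** (z.size - 1) * np.abs(z - w).sum()
    return lhs, rhs

# Hypothetical setting: 12 equal periods, estimated value 0.8 and true value 1.0.
deltas = np.full(12, 1.0 / 12)
estimated, true_value = 0.8, 1.0
lhs, rhs = product_difference_bound(1.0 / (1.0 + deltas * estimated),
                                    1.0 / (1.0 + deltas * true_value))
print(lhs, rhs)
```

The left-hand value should not exceed the right-hand value, so a difference of products is controlled by the sum of the factor-wise differences, each of which is proportional to $\Delta_j|\hat\beta_n'\tilde\eta - \Theta_p|$.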

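Proof of Theorem 4.2. We can similarly deduce that $|\tilde\eta'\Sigma\tilde\eta - \Theta_p| \le C d_n$ and $\tilde\eta'\Sigma\tilde\eta \ge \Theta_p + o_P(1)$ by (A.2) and (A.5). Then we have

\[
\begin{aligned}
\left|\frac{\beta'\tilde\eta}{\sqrt{\tilde\eta'\Sigma\tilde\eta}} - \sqrt{\Theta_p}\right|
&\le \left|\frac{\beta'\tilde\eta - \tilde\eta'\Sigma\tilde\eta}{\sqrt{\tilde\eta'\Sigma\tilde\eta}}\right| + \left|\sqrt{\tilde\eta'\Sigma\tilde\eta} - \sqrt{\Theta_p}\right| \\
&\le \frac{d_n}{\sqrt{\Theta_p + o_P(1)}} + \frac{d_n}{\sqrt{\Theta_p + o_P(1)} + \sqrt{\Theta_p}}
\le C\,O_P(1)\,\frac{d_n}{\sqrt{\Theta_p}} .
\end{aligned}
\]

With the condition (D), we have

\[
\left|\frac{F_n}{F} - 1\right| \le C\,\frac{d_n}{\sqrt{\Theta_p}}\big(\sqrt{\Theta_p} + 1\big)e^{C d_n} \le C\,d_n e^{C d_n} .
\]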
