Sparse Bayesian Time-Varying Covariance Estimation in Many Dimensions

Gregor Kastner∗
WU Vienna University of Economics and Business, Austria

November 22, 2017

∗ Department of Finance, Accounting and Statistics, WU Vienna University of Economics and Business, Welthandelsplatz 1/D4/4, 1020 Vienna, Austria, +43 1 31336-5593, [email protected]

Abstract

We address the curse of dimensionality in dynamic covariance estimation by modeling the underlying co-volatility dynamics of a time series vector through latent time-varying stochastic factors. The use of a global-local shrinkage prior for the elements of the factor loadings matrix pulls loadings on superfluous factors towards zero. To demonstrate the merits of the proposed framework, the model is applied to simulated data as well as to daily log-returns of 300 S&P 500 members. Our approach yields precise correlation estimates, strong implied minimum variance portfolio performance, and superior forecasting accuracy in terms of log predictive scores when compared to typical benchmarks.

JEL classification: C32; C38; C53; C58; G11
Keywords: dynamic correlation, factor stochastic volatility, curse of dimensionality, shrinkage, minimum variance portfolio

1 Introduction

The joint analysis of hundreds or even thousands of time series exhibiting a potentially time-varying variance-covariance structure has been on numerous research agendas for well over a decade. In the present paper, we aim to strike the indispensable balance between the necessary flexibility and parameter parsimony by using a factor stochastic volatility (SV) model in combination with a global-local shrinkage prior. Our contribution is threefold. First, the proposed



approach offers a hybrid cure to the curse of dimensionality by combining parsimony (through imposing a factor structure) with sparsity (through employing computationally efficient, absolutely continuous shrinkage priors on the factor loadings). Second, the efficient construction of posterior simulators allows for conducting Bayesian inference and prediction in very high dimensions via carefully crafted Markov chain Monte Carlo (MCMC) methods made available to end-users through the R (R Core Team 2017) package factorstochvol (Kastner 2017). Third, we show that the proposed method is capable of accurately predicting covariance and precision matrices, which we assess via statistical and economic forecast evaluation in several simulation studies and an extensive real-world example.

Concerning factor SV modeling, early key references include Harvey et al. (1994), Pitt and Shephard (1999), and Aguilar and West (2000), which were later picked up and extended by e.g. Philipov and Glickman (2006), Chib et al. (2006), Han (2006), Lopes and Carvalho (2007), Nakajima and West (2013), Zhou et al. (2014), and Ishihara and Omori (2017). While reducing the dimensionality of the problem at hand, models with many factors are still rather rich in parameters. Thus, we further shrink unimportant elements of the factor loadings matrix to zero in an automatic way within a Bayesian framework. This approach is inspired by high-dimensional regression problems where the number of parameters frequently exceeds the size of the data. In particular, we adopt the approach brought forward by Caron and Doucet (2008) and Griffin and Brown (2010) who suggest using a special continuous prior structure (the Normal-Gamma prior) on the regression parameters, in our case the factor loadings matrix. This shrinkage prior is a generalization of the Bayesian Lasso (Park and Casella 2008) and has recently received attention in the econometrics literature (Bitto and Frühwirth-Schnatter 2016; Huber and Feldkircher 2017).

Another major issue for such high-dimensional problems is the computational burden that goes along with statistical inference, in particular when joint modeling is attempted instead of multistep approaches or rolling-window-like estimates. Suggested solutions include Engle and Kelly (2012) who propose an estimator assuming that pairwise correlations are equal at every point in time, Pakel et al. (2014) who consider composite likelihood estimation, Gruber and West (2016) who use a decoupling-recoupling strategy to parallelize estimation (executed on graphical processors), Lopes et al. (2016) who treat the Cholesky-decomposed covariance matrix within the framework of Bayesian time-varying parameter models, and Oh and Patton (2017) who choose a copula-based approach to link separately estimated univariate models. We propose to use a Gibbs-type sampler which allows us to jointly take into account both parameter as well as


sampling uncertainty in a finite-sample setup through fully Bayesian inference, thereby enabling inherent uncertainty quantification. Additionally, this approach allows for fully probabilistic in- and out-of-sample density predictions.

For related work on sparse Bayesian prior distributions in high dimensions, see e.g. Kaufmann and Schumacher (2013) who use a point mass prior specification for factor loadings in dynamic factor models, or Ahelegbey et al. (2016) who use a graphical representation of vector autoregressive models to select sparse graphs. From a mathematical point of view, Pati et al. (2014) investigate posterior contraction rates for a related class of continuous shrinkage priors for static factor models and show excellent performance in terms of posterior rates of convergence with respect to the minimax rate. All of these works, however, assume homoskedasticity and are thus potentially misspecified when applied to financial or economic data. For related methods that take into account heteroskedasticity, see e.g. Nakajima and West (2013) and Nakajima and West (2017) who employ a latent thresholding process to enforce time-varying sparsity. Moreover, Zhao et al. (2016) approach this issue via dependence networks, Loddo et al. (2011) use stochastic search for model selection, and Basturk et al. (2016) use time-varying combinations of dynamic models and equity momentum strategies. These methods are typically very flexible in terms of the dynamics they can capture but are applied to moderate-dimensional data only.

We illustrate the merits of our approach through extensive simulation studies and an in-depth financial application using 300 S&P 500 members. In simulations, we find considerable evidence that the Normal-Gamma shrinkage prior leads to substantially sparser factor loadings matrices which in turn translate into more precise correlation estimates when compared to the usual Gaussian prior on the loadings.1 In the real-world application, we evaluate our model against a wide range of alternative specifications via log predictive scores and minimum variance portfolio returns. Factor SV models with sufficiently many factors turn out to imply extremely competitive portfolios in relation to well-established methods which typically have been specifically tailored for such applications. Concerning density forecasts, we find that our approach outperforms all included competitors by a large margin.

1 Note that in contrast to e.g. Frühwirth-Schnatter and Tüchler (2008), we do not attempt to identify exact zeros in a covariance matrix. Rather, we aim to find a parsimonious factor representation of the underlying heteroskedastic data which may (or may not) imply covariances that are close to zero at times.

The remainder of this paper is structured as follows. In Section 2, the factor SV model is specified and the choice of prior distributions is discussed. Section 3 treats statistical inference via MCMC methods and sheds light on computational aspects concerning out-of-sample density predictions for this model class. Extensive simulation studies are presented in Section 4, where



the effect of the Normal-Gamma prior on correlation estimates is investigated in detail. In Section 5, the model is applied to 300 S&P 500 members. Section 6 wraps up and points out possible directions for future research.

2 Model Specification

Consider an m-variate zero-mean return vector yt = (y1t, . . . , ymt)′ for time t = 1, . . . , T whose conditional distribution is Gaussian, i.e. yt | Σt ∼ Nm(0, Σt).

2.1 Factor SV Model

To reduce dimensionality, factor SV models utilize a decomposition of the m × m covariance matrix Σt with m(m + 1)/2 free elements into a factor loadings matrix Λ of size m × r, an r-dimensional diagonal matrix Vt and an m-dimensional diagonal matrix Ut in the following fashion:

Σt = Λ Vt Λ′ + Ut.    (1)

This reduces the number of free elements to mr + m + r. Because r is typically chosen to be much smaller than m, this specification constrains the parameter space substantially, thereby inducing parameter parsimony. For the paper at hand, Λ is considered to be time invariant whereas the elements of both Vt and Ut are allowed to evolve over time through parametric stochastic volatility models, i.e. Ut = diag(exp(h1t), . . . , exp(hmt)) and Vt = diag(exp(hm+1,t), . . . , exp(hm+r,t)) with

hit ∼ N(µi + φi (hi,t−1 − µi), σi²),    i = 1, . . . , m,    (2)
hm+j,t ∼ N(φm+j hm+j,t−1, σ²m+j),    j = 1, . . . , r.    (3)

More specifically, U t describes the idiosyncratic (series-specific) variances while Vt contains the variances of underlying orthogonal factors ft ∼ Nr (0, Vt ) that govern the contemporaneous dependence. The autoregressive process in (3) is assumed to have mean zero to identify the unconditional scaling of the factors.
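To make the data-generating process concrete, the following minimal R sketch simulates a data set from (1)-(3); it is our own illustration with arbitrary parameter values, not code from the paper or the factorstochvol package.

```r
# Minimal simulation sketch for the factor SV model (1)-(3); all parameter
# values below are arbitrary choices for illustration.
set.seed(1)
m <- 10; r <- 2; Tn <- 1000                  # Tn plays the role of T
Lambda <- matrix(rnorm(m * r, 0, 0.3), m, r) # static m x r loadings
lev <- c(rnorm(m, -1, 0.3), rep(0, r))       # levels mu_i; factor levels are zero
phi <- rep(0.95, m + r); sig <- rep(0.2, m + r)
h <- matrix(0, Tn, m + r); h[1, ] <- lev
for (t in 2:Tn)                              # AR(1) log-variances, cf. (2)-(3)
  h[t, ] <- rnorm(m + r, lev + phi * (h[t - 1, ] - lev), sig)
f <- matrix(rnorm(Tn * r), Tn, r) * exp(h[, m + 1:r] / 2)  # f_t ~ N_r(0, V_t)
eps <- matrix(rnorm(Tn * m), Tn, m) * exp(h[, 1:m] / 2)    # idiosyncratic part
y <- f %*% t(Lambda) + eps                   # y_t = Lambda f_t + eps_t; rows = time
```

Row t of y then has conditional covariance Λ Vt Λ′ + Ut as in (1).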


This setup is commonly written in the following hierarchical form (e.g. Chib et al. 2006):

yt | Λ, ft, Ut ∼ Nm(Λ ft, Ut),
ft | Vt ∼ Nr(0, Vt),

where the distributions are assumed to be conditionally independent for all points in time. To make further exposition clearer, let y = (y1 · · · yT) denote the m × T matrix of all observations, f = (f1 · · · fT) the r × T matrix of all latent factors, and h = (h′1 · · · h′m+r)′ the (T + 1) × (m + r) matrix of all m + r log-variance processes hi = (hi0, hi1, . . . , hiT), i = 1, . . . , m + r. The vector θi = (µi, φi, σi)′ is referred to as the vector of parameters, where µi is the level, φi the persistence, and σi² the innovation variance of hi. To denote specific rows and columns of matrices, we use the "dot" notation, i.e. Xi· refers to the ith row and X·j to the jth column of X.

The proportions of variances explained through the common factors for each component series, Cit = 1 − Uii,t/Σii,t for i = 1, . . . , m, are referred to as the communalities. Here, Uii,t and Σii,t denote the ith diagonal elements of Ut and Σt, respectively. As by construction 0 ≤ Uii,t ≤ Σii,t, the communality for each component series and for all points in time lies between zero and one. The joint (overall) communality Ct = m⁻¹ Σᵢ₌₁ᵐ Cit is simply defined as the arithmetic mean over all series.

Three comments are in order. First, the variance-covariance decomposition in (1) can be rewritten as Σt = Λt Λ′t + Ut with Λt := Λ Vt^(1/2). An essential assumption within the factor framework is that both Vt and Ut are diagonal matrices. This implies that the factor loadings Λt are dynamic but can only vary column-wise over time. Consequently, the time-variability of Σt's off-diagonal elements is cross-sectionally restricted while its diagonal elements are allowed to move independently across series. Hence, the "strength" of a factor, i.e. its cross-sectional explanatory power, varies jointly for all series loading on it. Consequently, it is likely that more factors are needed to properly explain the co-volatility dynamics of a multivariate time series than in models which allow for completely unrestricted time-varying factor loadings (Lopes and Carvalho 2007), correlated factors (Vt not diagonal, see Zhou et al. 2014), or approximate factor models (Ut not diagonal, see Bai and Ng 2002). Our specification, however, is less prone to overfitting and has the significant advantage of vastly simplified computations.

Second, identifying loadings for latent factor models is a long-standing issue that goes back to at least Anderson and Rubin (1956). Even though this problem is alleviated somewhat when factors are allowed to exhibit conditional heteroskedasticity (Sentana and Fiorentini 2001; Rigobon 2003), most authors have chosen an upper triangular constraint of the loadings matrix with unit


diagonal elements, thereby introducing dependence on the ordering of the data (see Frühwirth-Schnatter and Lopes 2017). However, when estimation of the actual factor loadings is not the primary concern (but rather a means to estimate and predict the covariance structure), this issue is less striking because a unique identification of the loadings matrix is not necessary.2 This allows leaving the factor loadings matrix completely unrestricted, thus rendering the method invariant with respect to the ordering of the series.

2 The conditional covariance matrix Σt = Λ Vt Λ′ + Ut involves a rotation-invariant transformation of Λ.

Third, note that even though the joint distribution of the data is conditionally Gaussian, its stationary distribution has thicker tails. Nevertheless, generalizations of the univariate SV model to cater for even more leptokurtic distributions (e.g. Liesenfeld and Jung 2000) or asymmetry (e.g. Yu 2005) can straightforwardly be incorporated in the current framework. All of these extensions, however, tend to increase both sampling inefficiency as well as running time considerably and could thus preclude inference in very high dimensions.

2.2 Prior Distributions

The usual prior for each (unrestricted) element of the factor loadings matrix is a zero-mean Gaussian distribution, i.e. Λij ∼ N(0, τij²) independently for each i and j, where τij² ≡ τ² is a constant specified a priori (e.g. Pitt and Shephard 1999; Aguilar and West 2000; Chib et al. 2006; Ishihara and Omori 2017; Kastner et al. 2017). To achieve more shrinkage, we model this variance hierarchically by placing a hyperprior on τij². This approach is related to Bhattacharya and Dunson (2011) and Pati et al. (2014) who investigate a similar class of priors for homoskedastic factor models. More specifically, let

Λij | τij² ∼ N(0, τij²),    τij² | λi² ∼ G(ai, ai λi²/2),    λi² ∼ G(ci, di).    (4)

Intuitively, each prior variance τij² provides element-wise shrinkage governed independently for each row by λi². By integrating out τij², it can be seen that the conditional variance of Λij | λi² is 2/λi² and the excess kurtosis of Λij is 3/ai. The hyperparameters ai, ci, and di are fixed a priori, whereas ai in particular plays a crucial role in the amount of shrinkage this prior implies. Choosing ai small enforces strong shrinkage towards zero, while choosing ai large imposes little shrinkage. For more elaborate discussions on Bayesian shrinkage in general and the effect of ai specifically, see Griffin and Brown (2010) and Polson and Scott (2011). Note that the Bayesian Lasso prior (Park and Casella 2008) arises as a special case when ai = 1.


One can see prior (4) as row-wise shrinkage with element-wise adaptation in the sense that all variances in row i can be thought of as "random effects" from the same underlying distribution. In other words, each series has high and a priori independent mass not to load on any factor; this can thus be thought of as series-specific shrinkage. For further aspects of introducing hierarchical prior structures via the Normal-Gamma distribution, see Griffin and Brown (2017) and Huber and Feldkircher (2017). Analogously, it turns out to be fruitful to also consider column-wise shrinkage with element-wise adaptation, i.e.

Λij | τij² ∼ N(0, τij²),    τij² | λj² ∼ G(aj, aj λj²/2),    λj² ∼ G(cj, dj).

This means that each factor has high and a priori independent mass not to be loaded on by any series; this can thus be thought of as factor-specific shrinkage. Concerning the univariate SV priors, we follow Kastner and Frühwirth-Schnatter (2014). For the m idiosyncratic and r factor volatilities, the initial states hi0 are distributed according to the stationary distributions of (2) and (3), respectively. Furthermore, p(µi, φi, σi) = p(µi)p(φi)p(σi), where the level µi ∈ R is equipped with the usual Gaussian prior µi ∼ N(bµ, Bµ), the persistence parameter φi ∈ (−1, 1) is implied by (φi + 1)/2 ∼ B(a0, b0), and the volatility of volatility parameter σi ∈ R⁺ is chosen according to σi² ∼ Bσ χ²₁ = G(1/2, 1/(2Bσ)).
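As a quick illustration of the hierarchy in (4), the following R sketch (our own; the hyperparameter values are hypothetical) draws from the marginal prior of a single loading. A small ai visibly concentrates mass near zero while keeping heavy tails.

```r
# Draws from the row-wise Normal-Gamma prior (4) for one row i;
# hyperparameter values are hypothetical.
n <- 1e5; a_i <- 0.1; c_i <- 1; d_i <- 1
lambda2_i <- rgamma(n, shape = c_i, rate = d_i)               # lambda_i^2 ~ G(c_i, d_i)
tau2_ij <- rgamma(n, shape = a_i, rate = a_i * lambda2_i / 2) # tau_ij^2 | lambda_i^2
Lambda_ij <- rnorm(n, 0, sqrt(tau2_ij))                       # Lambda_ij | tau_ij^2
```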

3 Statistical Inference

There are a number of methods to estimate factor SV models, such as quasi-maximum likelihood (e.g. Harvey et al. 1994), simulated maximum likelihood (e.g. Liesenfeld and Richard 2006; Jungbacker and Koopman 2006), and Bayesian MCMC simulation (e.g. Pitt and Shephard 1999; Aguilar and West 2000; Chib et al. 2006; Han 2006). For high-dimensional problems of this kind, Bayesian MCMC proves to be a very efficient estimation method because it allows simulating from the high-dimensional joint posterior by drawing from lower-dimensional conditional posteriors.

3.1 MCMC Estimation

One substantial advantage of MCMC methods over other ways of learning about the posterior distribution is that they constitute a modular approach due to the conditional nature of the sampling steps. Consequently, conditionally on the matrix of variances τ = (τij)1≤i≤m; 1≤j≤r,

we can adapt the sampling steps of Kastner et al. (2017). For obtaining draws for τ, we follow Griffin and Brown (2010). The MCMC sampling steps for the factor SV model are:

1. For factors and idiosyncratic variances, obtain m conditionally independent draws of the idiosyncratic log-volatilities from hi | yi·, Λi·, f, µi, φi, σi and their parameters from µi, φi, σi | yi·, Λi·, f, hi for i = 1, . . . , m. Similarly, perform r updates for the factor log-volatilities from hm+j | fm+j,·, φm+j, σm+j and their parameters from φm+j, σm+j | fm+j,·, hm+j for j = 1, . . . , r. This amounts to m + r univariate SV updates.3

2a. Row-wise shrinkage only: For i = 1, . . . , m, sample from

λi² | τi· ∼ G(ci + ai r̃, di + (ai/2) Σⱼ₌₁^r̃ τij²),

where r̃ = min(i, r) if the loadings matrix is restricted to have zeros above the diagonal and r̃ = r in the case of an unrestricted loadings matrix. For i = 1, . . . , m and j = 1, . . . , r̃, draw from τij² | λi, Λij ∼ GIG(ai − 1/2, ai λi², Λij²).4

2b. Column-wise shrinkage only: For j = 1, . . . , r, sample from

λj² | τ·j ∼ G(cj + aj (m − j̃ + 1), dj + (aj/2) Σᵢ₌j̃^m τij²),

where j̃ = j if the loadings matrix is restricted to have zeros above the diagonal and j̃ = 1 otherwise. For j = 1, . . . , r and i = j̃, . . . , m, draw from τij² | λj, Λij ∼ GIG(aj − 1/2, aj λj², Λij²).4

3. Letting Ψi = diag(τi1⁻², τi2⁻², . . . , τir̃⁻²), draw Λ′i· | f, yi·, hi, Ψi ∼ Nr̃(biT, BiT) with BiT = (X′i Xi + Ψi)⁻¹ and biT = BiT X′i ỹi·. Hereby, ỹi· = (yi1 e^(−hi1/2), . . . , yiT e^(−hiT/2))′ denotes the ith normalized observation vector and

Xi = ( f11 e^(−hi1/2)  · · ·  fr̃1 e^(−hi1/2)
             ⋮                      ⋮
       f1T e^(−hiT/2)  · · ·  fr̃T e^(−hiT/2) )


is the T × r̃ design matrix. This constitutes a standard Bayesian regression update.

3*. When inference on the factor loadings matrix is sought, optionally redraw Λ using deep interweaving (Kastner et al. 2017) to speed up mixing. This step is of less importance if one is interested in the (predictive) covariance matrix only.

4. Draw the factors from ft | Λ, yt, ht ∼ Nr(bmt, Bmt) with Bmt = (X′t Xt + Vt⁻¹)⁻¹ and bmt = Bmt X′t ỹt. Hereby, ỹt = (y1t e^(−h1t/2), . . . , ymt e^(−hmt/2))′ denotes the normalized observation vector at time t and

Xt = ( Λ11 e^(−h1t/2)  · · ·  Λ1r e^(−h1t/2)
             ⋮                      ⋮
       Λm1 e^(−hmt/2)  · · ·  Λmr e^(−hmt/2) )

is the m × r design matrix. This constitutes a standard Bayesian regression update.

3 There is a vast body of literature on efficiently sampling univariate SV models. For the paper at hand, we use the R package stochvol (Kastner 2016).
4 The Generalized Inverse Gaussian distribution GIG(m, k, l) has a density proportional to x^(m−1) exp(−½(kx + l/x)). To draw from this distribution, we use the algorithm described in Hörmann and Leydold (2013) which is implemented in the R package GIGrvg (Leydold and Hörmann 2017).

The above sampling steps are implemented in an efficient way within the R package factorstochvol. Table 1 displays the empirical run time in milliseconds per MCMC iteration. Note that using more efficient linear algebra routines such as Intel MKL leads to substantial speed gains only for models with many factors. To a certain extent, computation can further be sped up by computing the individual steps of the posterior sampler in parallel. In practice, however, doing so is only useful in shared-memory environments (e.g. through multithreading/multiprocessing), as the increased communication overhead in distributed-memory environments easily outweighs the speed gains.

                    r = 0   r = 1   r = 5   r = 10   r = 20   r = 50
plain R, m = 10       4       5       8       14
MKL, m = 10           5       6      10       15
plain R, m = 100     43      46      56       74      131      451
MKL, m = 100         45      49      57       69       90      185
plain R, m = 500    222     240     279      361      600     1993
MKL, m = 500        225     240     269      310      389      693

Table 1: Empirically obtained runtime per MCMC iteration on a single i5-5300U CPU (2.30GHz) core running Xubuntu Linux 16.04 using R 3.3.3 linked against the default linear algebra packages as well as Intel MKL (single thread). Measured in milliseconds for m ∈ {10, 100, 500} dimensional time series of length T = 1000.
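To make step 2a concrete, the following R sketch (our own variable names, not the factorstochvol internals) updates the shrinkage hyperparameters for one row i. It assumes the GIG(lambda, chi, psi) parametrization of GIGrvg::rgig with density proportional to x^(lambda−1) exp(−(chi/x + psi·x)/2), so that the paper's GIG(m, k, l) corresponds to lambda = m, psi = k, chi = l.

```r
# Sampling step 2a for one row i of the loadings matrix (sketch).
library(GIGrvg)
update_row <- function(Lambda_i, tau2_i, a_i, c_i, d_i) {
  r_tilde <- length(Lambda_i)
  lambda2_i <- rgamma(1, shape = c_i + a_i * r_tilde,
                      rate = d_i + a_i / 2 * sum(tau2_i))
  tau2_i <- vapply(Lambda_i,
                   function(l) rgig(1, lambda = a_i - 0.5,
                                    chi = l^2, psi = a_i * lambda2_i),
                   numeric(1))
  list(lambda2_i = lambda2_i, tau2_i = tau2_i)
}
```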

3.2 Prediction

Given draws of the joint posterior distribution of parameters and latent variables, it is in principle straightforward to predict future covariances and consequently also future observations.

This gives rise to the predictive density (Geweke and Amisano 2010), defined as

p(yt+1 | yᵒ[1:t]) = ∫_K p(yt+1 | yᵒ[1:t], κ) × p(κ | yᵒ[1:t]) dκ,    (5)

where κ denotes the vector of all unobservables, i.e. parameters and latent variables. The superscript o in yᵒ[1:t] denotes ex post realizations (observations) for the set of points in time {1, . . . , t} of the ex ante random values y[1:t] = (y1 · · · yt). The integration space K simply stands for the space of the possible values for κ. Because (5) is the integral of the likelihood function where the values of κ are weighted according to their posterior distribution, it can be seen as the forecast density for an unknown value yt+1 after accounting for the uncertainty about κ, given the history yᵒ[1:t].

As with most quantities of interest in Bayesian analysis, computing the predictive density can be challenging because it constitutes an extremely high-dimensional integral which cannot be solved analytically. However, it may be approximated at a given "future" point y^f through Monte Carlo integration,

p(y^f | yᵒ[1:t]) ≈ (1/K) Σₖ₌₁^K p(y^f | yᵒ[1:t], κ^(k)[1:t]),    (6)

where κ^(k)[1:t] denotes the kth draw from the posterior distribution up to time t. If (6) is evaluated at y^f = yᵒt+1, it is commonly referred to as the (one-step-ahead) predictive likelihood at time t + 1, denoted PLt+1. Also, draws from (5) can straightforwardly be obtained by generating values y^(k)t+1 from the distribution given through the (in our case multivariate Gaussian) density p(yt+1 | yᵒ[1:t], κ^(k)[1:t]).
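In practice, (6) is best computed from the K per-draw log densities with the log-sum-exp trick; a minimal sketch (our own helper, not part of any package):

```r
# Numerically stable Monte Carlo estimate of (6): logdens holds the K
# values log p(y^f | y_[1:t], kappa^(k)).
log_pred_lik <- function(logdens) {
  mx <- max(logdens)
  mx + log(mean(exp(logdens - mx)))
}
```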

For the model at hand, two ways of evaluating the predictive likelihood particularly stand out. First, one could average over k = 1, . . . , K densities of

Nm(Λ^(k)[1:t] f^(k)t+1,[1:t], U^(k)t+1,[1:t]),

evaluated at yᵒt+1, where the subscript t+1 denotes the corresponding one-step-ahead predictive draws and U^(k)t+1,[1:t] = diag(exp(h^(k)1,t+1,[1:t]), . . . , exp(h^(k)m,t+1,[1:t])). Note that because U^(k)t+1,[1:t] is by construction diagonal, this method only requires univariate Gaussian evaluations and is thus computationally efficient. Nevertheless, because evaluation is done conditionally on realized values of ft+1,[1:t], it is extremely unstable in many dimensions. Moreover, since the numerical inaccuracy increases with an increasing number of factors r, this approach can lead to systematic undervaluation of PLt+1 for larger r. Thus, in what follows, we recommend an alternative approach.

To obtain PLt+1, we suggest averaging over k = 1, . . . , K densities of

Nm(0, Λ^(k)[1:t] V^(k)t+1,[1:t] (Λ^(k)[1:t])′ + U^(k)t+1,[1:t]),

evaluated at yᵒt+1, where V^(k)t+1,[1:t] = diag(exp(h^(k)m+1,t+1,[1:t]), . . . , exp(h^(k)m+r,t+1,[1:t])). This form of the predictive likelihood is obtained by analytically performing the integration in (5) with respect to ft+1,[1:t]. Consequently, it is numerically more stable, irrespective of the number of factors r. However, it requires a full m-variate Gaussian density evaluation for each k and is thus computationally much more expensive. To a certain extent, the computational burden can be mitigated by using the Woodbury matrix identity,

Σt⁻¹ = Ut⁻¹ − Ut⁻¹ Λ (Vt⁻¹ + Λ′ Ut⁻¹ Λ)⁻¹ Λ′ Ut⁻¹,

along with the matrix determinant lemma,

det(Σt) = det(Vt⁻¹ + Λ′ Ut⁻¹ Λ) det(Vt) det(Ut).

This substantially speeds up the repetitive evaluation of the multivariate Gaussian distribution if r ≪ m.

We apply these results for comparing competing models A and B between time points t1 and t2 and consider cumulative log predictive Bayes factors defined through

log BFt1,t2(A, B) = Σₜ₌ₜ₁₊₁^t2 [log PLt(A) − log PLt(B)],

where PLt(A) and PLt(B) denote the predictive likelihood of model A and B at time t, respectively. When the cumulative log predictive Bayes factor is greater than 0 at a given point in time, there is evidence in favor of model A, and vice versa. Thereby, data up to time t1 is regarded as prior information and out-of-sample evaluation starts at time t1 + 1.
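A minimal R sketch of this evaluation (our own illustration, assuming Λ is m × r and v, u hold the diagonals of Vt+1 and Ut+1) combines the two identities above so that only r × r systems have to be solved:

```r
# Log density of N_m(0, Lambda V Lambda' + U) at y via Woodbury and the
# matrix determinant lemma; v and u are the diagonals of V and U.
ldmvn_lowrank <- function(y, Lambda, v, u) {
  m <- length(y); r <- length(v)
  Uinv_Lambda <- Lambda / u                             # U^{-1} Lambda
  W <- diag(1 / v, r) + crossprod(Lambda, Uinv_Lambda)  # V^{-1} + Lambda' U^{-1} Lambda
  cholW <- chol(W)
  # log det(Sigma) = log det(W) + log det(V) + log det(U)
  logdet <- 2 * sum(log(diag(cholW))) + sum(log(v)) + sum(log(u))
  z <- crossprod(Uinv_Lambda, y)                        # Lambda' U^{-1} y
  quad <- sum(y^2 / u) - sum(backsolve(cholW, z, transpose = TRUE)^2)
  -0.5 * (m * log(2 * pi) + logdet + quad)
}
```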

4 Simulation Studies

The aim of this section is to apply the model to simulated data in order to illustrate the shrinkage properties of the Normal-Gamma prior for the factor loadings matrix elements. For this purpose, we first illustrate several scenarios on a single ten-dimensional data set. Second, we investigate the performance of our model in a full Monte Carlo simulation based on 100 simulated data sets. Third, and finally, we investigate to what extent these results carry over to higher dimensions.

In what follows, we compare five specific prior settings. Setting 1 refers to the usual standard

Gaussian prior with variance τij2 ≡ τ 2 = 1 and constitutes the benchmark. Setting 2 is the row-wise Bayesian Lasso where ai = 1 for all i. Setting 3 is the column-wise Bayesian Lasso where aj = 1 for all j. Setting 4 is the Normal-Gamma prior with row-wise shrinkage where ai = 0.1 for all i. Setting 5 is the Normal-Gamma prior with column-wise shrinkage where aj = 0.1 for all j. Throughout this section, prior hyperparameters are chosen as follows: bµ = 0, Bµ = 1000, Bσ = 1. The prior hyperparameters for the persistence of the latent log variances are fixed at a0 = 10, b0 = 2.5 for the idiosyncratic volatilities and a0 = 2.5, b0 = 2.5 for the factor volatilities; note that the parameters of the superfluous factor are only identified through the prior. The shrinkage hyperparameters are set as in Belmonte et al. (2014), i.e. ci = cj = di = dj = 0.001 for all applicable i and j. For each setting, the algorithm is run for 110 000 iterations of which the first 10 000 draws are discarded as burn-in.

4.1 The Shrinkage Prior Effect: An Illustration

To investigate the effects of different priors on the posteriors of interest, we simulate a single data set from a two-factor model for m = 10 time series of length T = 1000. For estimation, an overfitting model with three latent factors is employed. The nonzero parameter values used for simulation are picked randomly and are indicated as black circles in Figure 1; some loadings are set to zero, indicated by black dots. We set Λij to zero if j > i for simulation and estimation.

Figure 1 shows smoothed kernel density estimates of posterior loadings under the different prior assumptions. The signs of the loadings have not been identified, so that a multimodal posterior distribution hints at a "significant" loading whereas a unimodal posterior hints at a zero loading; see also Frühwirth-Schnatter and Wagner (2010). It stands out that only very little shrinkage is induced by the standard Gaussian prior. The other priors, however, impose considerably tighter posteriors. For the nonzero loadings on factor one, e.g., the row-wise Bayesian Lasso exhibits the strongest degree of shrinkage. Little difference between the various shrinkage priors can be spotted for the nonzero loadings on factor two. Turning towards the zero loadings, the strongest shrinkage is introduced by both variants of the Normal-Gamma prior, followed by the different variants of the Bayesian Lasso and the standard Gaussian prior. This is particularly striking for the loadings on the superfluous third factor. The difference between row- and column-wise shrinkage for the Lasso variants can most clearly be seen in row 9 and column 3, respectively. The row-wise Lasso captures the zero row 9 better, while the column-wise Lasso captures the zero column 3 better. Because of

the increased element-wise shrinkage of the Normal-Gamma prior, the difference between the row-wise and the column-wise variants is minimal.

[Figure 1: Kernel density estimates of posterior factor loadings under different priors. The standard Gaussian prior (setting 1) in red solid strokes, the row-wise Lasso prior (setting 2) in blue long-dashed strokes, the column-wise Lasso prior (setting 3) in green short-dashed strokes, the row-wise Normal-Gamma prior (setting 4) in purple dotted strokes, the column-wise Normal-Gamma prior (setting 5) in orange dashed-dotted strokes. The vertical axis is capped at 30.]


In the context of covariance modeling, however, factor loadings can be viewed as a mere means to parsimony, not the actual quantity of interest. Thus, Figure 2 displays selected time-varying correlations. The top panel shows a posterior interval estimate (mean plus/minus two standard deviations) for the correlation of series 1 and series 2 (which is nonzero) under all five prior settings; the bottom panel depicts the interval estimate for the correlation of series 9 and 10 (which is zero). While the relative differences between the settings in the nonzero correlation case are relatively small, the zero correlation case is picked up substantially better when shrinkage priors are used. Posterior means are closer to zero and the posterior credible intervals are tighter.

[Figure 2: "True" (gray, solid) and estimated posterior correlations between series 1 and series 2 (top) as well as series 9 and series 10 (bottom). To illustrate estimation uncertainty, posterior means plus/minus 2 times posterior standard deviations are displayed.]

To conclude, we briefly examine predictive performance by investigating cumulative log predictive Bayes factors. Thereby, the first 1000 points in time are treated as prior information; then 1-day- and 10-days-ahead predictive likelihoods are recursively evaluated until t = 1500. Table 2 displays the sum of these values for the respective models in relation to the 2-factor model


with the standard Gaussian prior. This way, numbers greater than zero can be interpreted as evidence in favor of the respective model. Not very surprisingly, log Bayes factors are highest for the 2-factor model; within this class, models imposing stronger shrinkage perform slightly better, in particular when considering the longer 10-day horizon. Underfitting models predict very poorly both on the short and the longer run, while overfitting models appear almost on par with the baseline model when shrinkage priors are used. This suggests that shrinkage safeguards against overfitting, at least to a certain extent.

                              no fac    1 fac    2 fac   3 fac   4 fac   5 fac
1-day-ahead:
Standard Gaussian            −845.80  −316.41    0.00   −2.81   −5.83   −8.38
Row-wise NG (ci = di = 1)    −845.80  −317.75    0.27   −0.44   −1.74   −2.34
Col-wise NG (cj = dj = 1)    −845.80  −317.45    0.38   −0.77   −1.61   −2.06
Row-wise NG (ci = di = 0.01) −845.80  −317.71    0.40   −0.29   −1.36   −1.97
Col-wise NG (cj = dj = 0.01) −845.80  −317.52    1.04   −0.12   −0.88   −2.07
10-days-ahead:
Standard Gaussian            −787.41  −255.92    0.00   −4.98   −7.75  −12.67
Row-wise NG (ci = di = 1)    −787.41  −256.85    2.90    1.73    0.87   −0.05
Col-wise NG (cj = dj = 1)    −787.41  −256.44    3.14    1.17    0.90    0.09
Row-wise NG (ci = di = 0.01) −787.41  −256.81    3.18    1.98    1.28    0.53
Col-wise NG (cj = dj = 0.01) −787.41  −255.90    2.85    1.61    0.84    0.16

Table 2: Estimated log Bayes factors at t = 1500 against the 2-factor model using a standard Gaussian prior, where data up to t = 1000 is treated as the training sample. Lines 1 to 5 correspond to cumulative 1-day-ahead Bayes factors, lines 6 to 10 correspond to 10-days-ahead predictive Bayes factors.

4.2 Medium Dimensional Monte Carlo Study

For a more comprehensive understanding of the shrinkage effect, the above study is repeated for 100 different data sets where all latent variables are generated randomly for each realization. In Table 3, the medians of the respective relative RMSEs (root mean squared errors, averaged over time) between the true and the estimated pairwise correlations are depicted. The part above the diagonal represents the relative performance of the row-wise Lasso prior (setting 2) with respect to the baseline prior (setting 1), the part below the diagonal represents the relative performance of the row-wise Normal-Gamma prior (setting 4) with respect to the row-wise Lasso prior (setting 2). Clearly, gains are highest for series 9 which is by construction completely uncorrelated to the other series. Additionally, geometric averages of these performance indicators are displayed in the first row (setting 2 vs. baseline) and in the last row (setting 4 vs. baseline). They can be seen as the average relative performance of one specific series’ correlation estimates with all other series. We remark that choosing ci and di small is crucial


for the shrinkage effect of the Bayesian Lasso, while the performance of the Normal-Gamma prior is relatively robust with regard to these choices. Simulation studies with less extreme hyperparameter choices (ci = di = 1) show that performance gains of the Bayesian Lasso over the standard Gaussian prior nearly vanish while performance gains of the Normal-Gamma prior remain stable. This indicates that the shrinkage effect of the Bayesian Lasso is strongly dependent on the particular choice of these hyperparameters (governing row-wise shrinkage) while the Normal-Gamma can adapt better through increased element-wise shrinkage.

An overall comparison of the errors under different priors is provided in Table 4, which lists RMSEs and MAEs for all prior settings, averaged over the non-trivial correlation matrix entries as well as time. Note again that results under the Lasso prior are sensitive to the particular choices of the global shrinkage hyperparameters as well as the choice of row- or column-wise shrinkage, which is hardly the case for the Normal-Gamma prior. Interestingly, the performance gains achieved through shrinkage prior usage are higher when absolute errors are considered. This is coherent with the extremely high kurtosis of Normal-Gamma-type priors which, while placing most mass around zero, allow for large values.

             1     2     3     4     5     6     7     8     9    10
Average 1   1.05  1.03  1.05  1.07  1.04  1.05  1.08  1.03  1.32  1.05
 1            ·   1.00  1.00  1.08  1.00  1.00  1.10  1.00  1.29  1.00
 2          1.00    ·   1.00  1.00  1.00  1.00  1.00  1.00  1.26  1.00
 3          1.00  1.01    ·   1.07  1.00  1.00  1.09  1.00  1.31  0.99
 4          3.04  1.00  2.85    ·   1.06  1.07  1.00  1.00  1.30  1.10
 5          1.00  1.01  1.00  3.01    ·   1.00  1.06  1.00  1.31  1.00
 6          1.00  1.01  1.00  3.00  1.00    ·   1.07  1.00  1.32  0.99
 7          3.02  1.00  2.81  1.00  2.87  2.77    ·   1.00  1.28  1.09
 8          1.00  1.00  1.00  1.01  1.01  1.01  1.01    ·   1.27  1.00
 9          2.85  2.61  2.70  2.69  2.73  2.71  2.77  2.54    ·   1.31
10          1.00  1.01  1.00  2.84  1.00  1.00  2.89  1.01  2.72    ·
Average 2   1.45  1.12  1.40  2.01  1.40  1.41  2.04  1.12  2.79  1.40

Table 3: Relative RMSEs of pairwise correlations. Above the diagonal: Row-wise Lasso (ai = 1) vs. benchmark standard Gaussian prior with geometric means (first row). Below the diagonal: Row-wise Normal-Gamma (ai = 0.1) vs. row-wise Lasso prior (ai = 1) with geometric means (last row). Numbers greater than one mean that the former prior performs better than the latter. All values reported are medians of 100 repetitions where ci = di = 0.001.

                      GSH    Abs. RMSE   Rel. RMSE   Abs. MAE   Rel. MAE
Standard Gaussian      —       7.581         —         5.397        —
Row-wise Lasso       0.001     7.587      100.449      5.324     101.850
Col-wise Lasso       0.001     7.593      100.176      5.372     100.622
Row-wise Lasso       1.000     7.602       99.995      5.416     100.146
Col-wise Lasso       1.000     7.584       99.987      5.415     100.119
Row-wise NG          0.001     7.351      102.793      4.891     110.252
Col-wise NG          0.001     7.395      102.525      4.957     109.912
Row-wise NG          1.000     7.392      102.641      4.941     109.735
Col-wise NG          1.000     7.384      102.622      4.925     109.722

Table 4: Different error measures (×10⁻²) of posterior mean correlation estimates under various priors. For all scenarios, 3-factor SV models are fit to 10-dimensional data of length 1000 simulated from a 2-factor SV model. The column titled "GSH" contains the values of ci = cj = di = dj. All values reported are medians of 100 repetitions.

                      GSH    Abs. RMSE   Rel. RMSE   Abs. MAE   Rel. MAE
Standard Gaussian      —       8.906         —         7.074        —
Row-wise Lasso       0.001     8.293      107.047      6.553     107.844
Col-wise Lasso       0.001     8.425      105.642      6.659     106.125
Row-wise Lasso       1.000     8.215      107.811      6.496     108.392
Col-wise Lasso       1.000     8.345      106.508      6.609     107.058
Row-wise NG          0.001     7.773      114.466      6.074     116.411
Col-wise NG          0.001     7.802      114.525      6.085     116.768
Row-wise NG          1.000     7.799      114.386      6.099     116.391
Col-wise NG          1.000     7.778      114.121      6.070     116.410

Table 5: Different error measures (×10⁻²) of posterior mean correlation estimates under various priors. For all scenarios, unrestricted 11-factor SV models are fit to 100-variate data of length 1000 simulated from 10-factor SV models. The column titled "GSH" contains the values of ci = cj = di = dj. All values reported are medians of 100 repetitions.

4.3 High Dimensional Monte Carlo Study

The findings are similar if dimensionality is increased; in analogy to above, we report overall RMSEs and MAEs for 495 000 pairwise correlations, resulting from m = 100 component series at T = 1000 points in time. The factor loadings for the r = 10 factors are again randomly sampled with 43.8% of the loadings being equal to zero, resulting in about 2.6% pairwise correlations that vanish. Consequently, 100 data sets are generated; for each of these, a separate (overfitting) factor SV model using r = 11 factors without any prior restrictions on the factor loadings matrix is fit. Table 5 reports the medians of the aggregated error measures. In this setting, the shrinkage priors outperform the standard Gaussian prior by a relatively large margin; the effect of the specific choice of the global shrinkage hyperparameters is less pronounced.

5 Application to S&P 500 Data

In this section we apply the SV factor model to stock prices listed in the Standard & Poor's 500 index. We only consider firms which have been continuously included in the index from

November 1994 until December 2013, resulting in m = 300 stock prices on 5001 days, ranging from 11/1/1994 to 12/31/2013. The data was obtained from Bloomberg Terminal in January 2014. Instead of considering raw prices, we investigate percentage log-returns which we demean a priori. The presentation consists of two parts. First, we exemplify inference using a multivariate stochastic volatility model and discuss the outcome. Second, we perform out-of-sample predictive evaluation and compare different models. To facilitate interpretation of the results discussed in this section, we consider the GICS5 classification into 10 sectors listed in Table 6.

GICS sector                   Members
Consumer Discretionary           45
Consumer Staples                 28
Energy                           23
Financials                       54
Health Care                      30
Industrials                      42
Information Technology           27
Materials                        23
Telecommunications Services       3
Utilities                        25

Table 6: GICS sectors and the number of members within the S&P 500 data set.

5.1 A Four-Factor Model for 300 S&P 500 Members

To keep graphical representation feasible, we only focus on the latest 2000 returns of our data set, i.e. 5/3/2006 to 12/31/2013. This time frame is chosen to include both the 2008 financial crisis as well as the period before and thereafter. Furthermore, we restrict our discussion to a four-factor model. This choice is somewhat arbitrary but allows for a direct comparison to a popular model based on four observed (Fama-French plus Momentum) factors. A comparison of predictive performance for varying numbers of factors is discussed in Section 5.2; the Fama-French plus Momentum model is introduced in Section 5.3.

We run our sampler employing the Normal-Gamma prior with row-wise shrinkage for 110 000 draws and discard the first 10 000 draws as burn-in.6 Of the remaining 100 000 draws, every 10th draw is kept, resulting in 10 000 draws used for posterior inference. Hyperparameters are set as follows: ai ≡ a = 0.1, ci ≡ c = 1, di ≡ d = 1, bµ = 0, Bµ = 100, a0 = 20, b0 = 1.5, Bσ = 1, Bm+j = 1 for j = 1, . . . , r. To prevent factor switching, we set all elements above the diagonal to zero. The leading series are chosen manually after a preliminary unidentified run such that series with high loadings on that particular factor (but low loadings on the other factors) become leaders. Note that this intervention is only necessary for interpreting the factor loadings matrix, not for covariance estimation or prediction.

5 Global Industry Classification Standard, retrieved from https://en.wikipedia.org/w/index.php?title=List_of_S%26P_500_companies&oldid=589980759 on April 11, 2016.
6 To keep the presentation at a reasonable length, and because qualitative as well as quantitative results are very similar, we omit details about the Normal-Gamma prior with column-wise shrinkage.

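A run of this kind might be set up along the following lines. This is a hedged sketch: fsvsample() is the package's main sampler, but the exact argument names for chain length, burn-in, thinning, and the shrinkage prior may differ across factorstochvol versions, so consult the package manual.

```r
# Hedged usage sketch of the factorstochvol package; 'returns' is a
# hypothetical T x m matrix of demeaned percentage log-returns.
library(factorstochvol)
y <- as.matrix(returns)
res <- fsvsample(y, factors = 4)   # see ?fsvsample for sampler settings
```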

We observe excellent mixing for both the covariance and the correlation matrix MCMC draws. To exemplify, trace plots of the first 1000 draws after burn-in and thinning of the log determinants of the covariance and correlation matrices at t = T are displayed in Figure 3.

[Figure 3: Trace plots for 1000 draws after burn-in and thinning of the log determinant of the covariance (left panel) and the correlation matrix (right panel) for t = T.]

[Figure 4: Top panel: 300 mean posterior variances, i.e. E(diag(Σt) | y) for t = 1, . . . , T (logarithmic scale). Bottom panel: Posterior mean of the joint communality Ct (bold line) along with mean plus/minus two posterior standard deviations (light lines).]

To illustrate the substantial degree of volatility co-movement, mean posterior variances are depicted in the top panel of Figure 4. Clear spikes can be spotted during the financial crisis in late 2008 but also in early 2010 and late 2011. This picture is mirrored (to a certain extent) in the bottom panel, which displays the posterior distribution of the joint communality Ct. At


times, the joint communality reaches high values of 0.7 and more.

Median posterior factor loadings are visualized in Figure 5. In the top panel it can be seen that all series significantly load on the first factor, which consequently could be interpreted to represent the joint dynamics of the wider US equity market. Investigating the second factor, it stands out that due to the use of the Normal-Gamma prior a considerable amount of loadings are shrunk towards zero. Main drivers are all in the sector Utilities. Also, companies in the sectors Consumer Staples, Health Care and (to a certain extent) Financials load positively here. Both the loadings on factor 3 as well as the loadings on factor 4 are substantially shrunk towards zero. Exceptions are Energy and Materials companies for factor 3 and Financials for factor 4.

[Figure 5: Median loadings on the first two factors (top) and the last two factors (bottom) of a 4-factor model applied to m = 300 demeaned stock price log-returns listed in the S&P 500 index. Shading: Sectors according to the Global Industry Classification Standard.]

The corresponding factor log variances are displayed in Figure 6. Apart from featuring similar low- to medium-frequency properties, each process exhibits specific characteristics. First, notice the sharp increase of volatility in early 2010 which is mainly visible for the "overall" factor 1. The second factor (Utilities) displays a pre-crisis volatility peak during early 2008. The third factor, driven by Energy and Materials, shows relatively smooth volatility behavior while the fourth factor, governed by the Financials, exhibits a comparably "nervous" volatility evolution.

[Figure 6: Latent factor log variances hm+j,·, j = 1, . . . , 4 (top to bottom). Bold line indicates the posterior mean; light lines indicate mean ± 2 standard deviations.]

Finally, we show three examples of the posterior mean of the correlation matrix Σt in Figure 7. The series are grouped according to the alphabetically ordered industry sectors (and simply sorted according to their ticker symbol therein). An animation displaying the mean correlation matrix for all points in time is available at https://vimeo.com/217021226. Considering the last trading day in 2006, highly correlated clusters appear within Energy and Utilities, to a certain extent also within Financials, Industrials and Materials. Not very surprisingly, there exists only low correlation between companies in the sectors Consumer Discretionary/Staples and Energy but higher correlation between Energy, Industrials and Materials. Looking at the last trading day of 2008, the overall picture changes radically. Higher correlation can be spotted throughout, both within sectors but also between sectors. There are only a few companies that show little and virtually no companies that show no correlation with others. Another two years later, we again see a different overall picture. Lower correlations throughout become apparent, with moderate correlations remaining within the sectors Energy, Utilities, and Financials.

[Figure 7: Posterior mean of the time-varying correlation matrix E(Σt | y), exemplified for t ∈ {173, 696, 1218}, which corresponds to the last trading day in 2006, 2008, and 2010, respectively. The matrix has been rearranged to reflect the different GICS sectors in alphabetical order, cf. Table 6.]

5.2 Predictive Likelihoods for Model Selection

Even for univariate volatility models, evaluating in- or out-of-sample fit is not straightforward because the quantity of interest (the conditional standard deviation) is not directly observable.


While in lower dimensions this issue can be circumvented to a certain extent using intraday data and computing realized measures of volatility, the difficulty becomes more striking when the dimension increases. Thus, we focus on iteratively predicting the observation density out-of-sample, which is then evaluated at the actually observed values. Because this approach involves re-estimating the model for each point in time, it is computationally costly but can be parallelized in a trivial fashion on multi-core computers.

For the S&P 500 data set, we begin by using the first 3000 data points (until 5/2/2006) to estimate the one-day-ahead predictive likelihood for day 3001 as well as the ten-day-ahead predictive likelihood for day 3010. In a separate estimation procedure, the first 3001 data points (until 5/3/2006) are used to estimate the one-day-ahead predictive likelihood for day 3002 and the corresponding ten-day-ahead predictive likelihood for day 3011, etc. This procedure is repeated for 1990 days until the end of the sample is reached.
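Since the expanding-window evaluation just described is embarrassingly parallel, it can be sketched in R as follows; estimate_and_predict() is a hypothetical stand-in for a full MCMC re-fit returning the log predictive likelihoods for one forecast origin.

```r
# Trivially parallel expanding-window exercise; estimate_and_predict() is
# hypothetical and stands in for re-running the sampler on y[1:t, ].
library(parallel)
origins <- 3000:4989   # 1990 forecast origins
pl <- mclapply(origins, function(t)
  estimate_and_predict(y[1:t, ], horizons = c(1, 10)),
  mc.cores = detectCores())
```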

We use a no-factor model as the baseline, which corresponds to 300 individual stochastic volatility models fitted to each component series separately. For each date, values greater than zero mean that the model outperforms the baseline model up to that point in time. Competitors of the no-factor SV model are r-factor SV models with r = 1, . . . , 20 under the usual standard Gaussian prior and under the Normal-Gamma prior with ai ≡ 0.1 and ci ≡ di ≡ 1 employing row-wise shrinkage. All other parameters are kept identical, i.e. bµ = 0, Bµ = 100, a0 = 20, b0 = 2.5 (idiosyncratic persistences), a0 = 2.5, b0 = 2.5 (factor persistences), Bσ = 0.1, Bm+j = 0.1 for j = 1, . . . , r. Note that because the object of interest in this exercise does not require the factor loadings matrix to be identified, no a priori restrictions are placed on Λ. This alleviates the problem of arranging the data in any particular order before running the sampler. Other competing models are discussed in the following sections.

Accumulated log predictive likelihoods for the entire period are displayed in Figure 8. Gains in predictive power are substantial up to around 8 factors with little difference for the two priors. After this point, the benefit of adding even more factors turns out to be less pronounced. On the contrary, the effect of the priors becomes more pronounced. Again, while differences tend to be muted for models with fewer factors, the benefit of shrinkage grows when r gets larger.

[Figure 8: Accumulated 1-day-ahead and 10-days-ahead log predictive likelihoods for models with 0, 1, . . . , 20 factors (lines: 1-day-ahead with normal prior, 1-day-ahead with NG prior, 10-days-ahead with normal prior, 10-days-ahead with NG prior). Data until t = 3000 is treated as training data.]

While joint models with r > 0 outperform the marginal (no-factor) model for all points in time, days of particular turbulence particularly stand out. To illustrate this, we display average and top three log predictive gains over the no-factor model in Table 7. The biggest gains can be seen on "Black Monday 2011" (August 8), when US stock markets tumbled after a credit rating downgrade of US sovereign debt by Standard and Poor's.

The trading day before this, August 4, displays the third highest gain. The 27th of February in 2007 also proves to be an interesting date to consider. This day corresponds to the burst of the Chinese stock bubble that led to a major crash in Chinese stock markets, causing a severe decline in equity markets worldwide. It appears that joint modeling of stock prices is particularly important on days of extreme events when conditional correlations are often higher.

                        Mean   2/27/2007   8/4/2011   8/8/2011
1 factor, Gaussian      84.83   1415.13    1189.78    1446.03
2 factors, Gaussian     91.88   1455.39    1261.38    1462.23
3 factors, Gaussian     98.55   1449.31    1237.97    1479.84
4 factors, Gaussian    101.50   1449.21    1254.36    1502.17
10 factors, Gaussian   114.81   1482.36    1271.59    1557.63
20 factors, Gaussian   117.69   1481.15    1293.98    1568.60
1 factor, NG prior      85.03   1421.45    1194.08    1445.51
2 factors, NG prior     91.56   1450.88    1239.42    1461.73
3 factors, NG prior     98.57   1448.90    1231.94    1471.58
4 factors, NG prior    101.47   1443.86    1240.68    1536.77
10 factors, NG prior   115.32   1482.32    1283.90    1546.43
20 factors, NG prior   119.21   1499.08    1293.74    1578.54

Table 7: Average and top 3 daily 1-day-ahead log predictive gains over the no-factor SV model.

5.3 Using Observable Instead of Latent Factors

An alternative to estimating latent factors from the data is to use observed factors instead (cf. Wang et al. 2011). To explore this route, we investigate an alternative model with four observed factors, the three Fama-French factors plus the momentum factor (see footnote 7). The Fama-French factors consist of the excess return on the market, a size factor SMB, and a book-to-market factor HML (Fama and French 1993); the momentum factor MOM (Carhart 1997) captures the empirically observed tendency for falling asset prices to fall further, and rising prices to keep rising. For the estimation of this model we proceed exactly as before, except that we omit the last step of our posterior sampler and keep f fixed at the observed values. Without presenting qualitative results in detail due to space constraints, we note that both the loadings on and the volatilities of the market excess returns show a remarkably close resemblance to those corresponding to the first latent factor displayed in Figures 5 and 6. To a certain extent (although much less pronounced), this is also true for SMB and the second latent factor as well as for HML and the fourth latent factor. However, most of the loadings on MOM are shrunk towards zero, and there is no recognizable similarity to the remaining latent factor from the original model. Log predictive scores for this model are very close to those for the SV model with two latent factors. In what follows, we term this approach FF+MOM.

5.4 Comparison to Other Models

We now turn to investigating the statistical performance of the factor SV model via out-of-sample predictive measures as well as its suitability for optimal asset allocation. The competitors are: moving averages (MAs) of sample covariance matrices over a window of 500 trading days; exponentially weighted moving averages (EWMAs) of sample covariances defined by $\Sigma_{t+1} = (1-\alpha)\, y_t y_t' + \alpha\, \Sigma_t$; the Ledoit-Wolf shrinkage estimator (Ledoit and Wolf 2004); and FF+MOM described above. While this choice is certainly not exhaustive, it includes many of the approaches most widely used in practice.

For comparison, we use two benchmarking methods. First, we consider the minimum variance portfolio implied by $\hat{\Sigma}_{t+1}$ (the point estimate or posterior mean estimate, respectively), which uniquely defines the optimal portfolio weights

$$\omega_{t+1} = \frac{\hat{\Sigma}_{t+1}^{-1}\,\iota}{\iota'\,\hat{\Sigma}_{t+1}^{-1}\,\iota},$$

where ι denotes an m-variate vector of ones. Using these weights, we compute the corresponding realized portfolio returns $r_{t+1}$ for t = 3000, 3001, . . . , 3999, effectively covering an evaluation period from 5/3/2006 to 3/1/2010.
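As an illustration, a minimal R sketch of this evaluation step is given below. Here Sigma_hat stands for whichever one-step-ahead covariance forecast is being assessed (the posterior mean for the factor SV models, point estimates for the benchmarks), and the EWMA recursion shown is the one defined above; the variable names are ours, not part of any package.

```r
# Minimum variance weights: omega = Sigma^{-1} iota / (iota' Sigma^{-1} iota).
mv_weights <- function(Sigma_hat) {
  w <- solve(Sigma_hat, rep(1, ncol(Sigma_hat)))  # Sigma^{-1} iota via a linear solve
  w / sum(w)                                      # sum(w) equals iota' Sigma^{-1} iota
}

# EWMA benchmark: Sigma_{t+1} = (1 - alpha) y_t y_t' + alpha Sigma_t.
ewma_update <- function(Sigma_t, y_t, alpha = 0.94) {
  (1 - alpha) * tcrossprod(y_t) + alpha * Sigma_t  # tcrossprod gives the outer product
}

# Realized portfolio return for day t + 1 from weights formed at day t:
# r_next <- sum(mv_weights(Sigma_hat) * y[t + 1, ])
```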

Footnote 7: The Fama-French+Momentum factors are available at a daily frequency from Kenneth French's web page at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. The data was downloaded on February 21, 2017; missing values were replaced with zeroes, and the data was standardized to have unconditional mean zero and variance one.


                           SD     Avg  Sharpe      PLPS
Equal weight portfolio  29.02    0.00    0.00         —
MA (500 days)           16.56   −2.39   −0.14   −320.17
EWMA (α = 0.94)         35.90   −4.43   −0.12  −3 × 10⁹
EWMA (α = 0.99)         18.42   −4.18   −0.23  −1409.97
Ledoit-Wolf             12.53    6.14    0.49     25.68
FF+MOM                  16.82   12.68    0.75     74.56
no-factor SV            22.53    1.02    0.05      0.00
N-FSV 1                 19.03    9.45    0.50     66.29
N-FSV 2                 18.92    9.32    0.49     75.25
N-FSV 3                 15.50    7.36    0.48     81.47
N-FSV 4                 14.58    8.26    0.57     87.07
N-FSV 10                12.94   12.45    0.96     97.74
N-FSV 20                12.57    7.81    0.62    101.82
N-FSV 50                12.41    5.03    0.40    106.57
NG-FSV 1                18.98    9.44    0.50     66.29
NG-FSV 2                18.84    9.37    0.50     75.21
NG-FSV 3                15.52    7.20    0.46     81.39
NG-FSV 4                14.85    8.36    0.56     86.80
NG-FSV 10               12.94   12.00    0.93     97.74
NG-FSV 20               12.02   13.33    1.11    102.30
NG-FSV 50               12.42   10.78    0.87    107.23

Table 8: Predictive performance measures, averaged over 1000 trading days after 5/3/2006. SD: annualized empirical standard deviations of portfolio returns. Avg: annualized average excess returns over the equal weight portfolio. Sharpe: quotients of Avg and SD. PLPS: average one-day-ahead pseudo log predictive scores over the no-factor SV model. N-FSV stands for the factor SV model with the standard normal prior; NG-FSV stands for the factor SV model with the row-wise Normal-Gamma prior (ai ≡ 0.1).

In the first three columns of Table 8, we report annualized empirical standard deviations, annualized average excess returns over those obtained from the equal weight portfolio, and the quotient of these two measures, the Sharpe ratio (Sharpe 1966). Considering the portfolio standard deviation presented in the first column, the Ledoit-Wolf shrinkage estimator implies an annualized standard deviation of about 12.5, which is matched only by factor SV models with many factors. Lower-dimensional factor SV models, including FF+MOM, as well as simple MAs and highly persistent EWMAs do not perform quite as well but typically stay well below 20. Less persistent EWMAs, the no-factor SV model and the naïve equal weight portfolio exhibit standard deviations higher than 20. Considering average returns, FF+MOM and factor SV models with around 10 to 20 factors tend to do well over the given time span. In the third column, we list Sharpe ratios, where factor SV models with 10 to 20 factors show superior performance, in particular when the Normal-Gamma prior is employed.


Note, however, that columns two and three have to be interpreted with some care, as average asset and portfolio returns generally have a high standard error.

Second, we use what we coin pseudo log predictive scores (PLPSs), i.e. Gaussian approximations to the actual log predictive scores. This simplification is necessary because most of the abovementioned methods only deliver point estimates of the forecast covariance matrix, and it is not clear how to properly account for estimation uncertainty. Moreover, the PLPS is simpler to evaluate, as there is no need to numerically solve a high-dimensional integral. Consequently, it is frequently used instead of the actual LPS in high dimensions while still allowing for an evaluation of the covariance accuracy (Adolfson et al. 2007; Carriero et al. 2016; Huber 2016). More specifically, we use data up to time t to determine a point estimate $\hat{\Sigma}_{t+1}$ for $\Sigma_{t+1}$ and compute the logarithm of the multivariate Gaussian density $N_m(0, \hat{\Sigma}_{t+1})$ evaluated at the actually observed value $y_{t+1}^{o}$ to obtain the one-day-ahead PLPS for time t + 1.
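In R, the PLPS therefore reduces to a multivariate Gaussian log density. The sketch below uses mvtnorm::dmvnorm (whose default mean is the zero vector) and, as an alternative that avoids forming an explicit inverse and determinant for large m, a base-R version working directly with the Cholesky factor; Sigma_hat again denotes the point (or posterior mean) forecast and is our notation.

```r
library(mvtnorm)

# PLPS for time t+1: log N_m(y_obs; 0, Sigma_hat).
plps <- function(y_obs, Sigma_hat) {
  dmvnorm(y_obs, sigma = Sigma_hat, log = TRUE)
}

# Equivalent computation via the Cholesky factor Sigma_hat = R'R:
# log|Sigma| = 2 * sum(log(diag(R))) and y' Sigma^{-1} y = ||R^{-T} y||^2.
plps_chol <- function(y_obs, Sigma_hat) {
  R <- chol(Sigma_hat)
  z <- backsolve(R, y_obs, transpose = TRUE)  # solves R'z = y_obs
  -0.5 * length(y_obs) * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z * z)
}
```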

In terms of average PLPSs (the last column in Table 8), factor SV models clearly outperform all other models under consideration. In particular, even when r = 4 is chosen to match the number of observed factors, the model with latent factors outperforms FF+MOM. Using latent factors is thus generally preferable; note, however, that the four-factor FF+MOM does better than the single- and no-factor SV models. Generally speaking, many factors appear to be needed to accurately represent the underlying data structure, irrespective of the prior choice. Considering the computational simplicity of the Ledoit-Wolf estimator, its prediction accuracy is quite remarkable: it clearly outperforms the no-factor SV model which, in turn, beats simple MAs and EWMAs.

6 Conclusion and Outlook

The aim of this paper was to present an efficient and parsimonious method for estimating high-dimensional time-varying covariance matrices through factor stochastic volatility models. We did so by proposing an efficient Bayesian MCMC algorithm that incorporates parsimony by modeling the covariance structure through common latent factors which themselves follow univariate SV processes. Moreover, we introduced additional sparsity by placing a hierarchical shrinkage prior, the Normal-Gamma prior, on the factor loadings. We showed the effectiveness of our approach through simulation studies and illustrated the effect of different shrinkage specifications. We applied the algorithm to a high-dimensional data set consisting of stock returns of 300 S&P 500 members and conducted an out-of-sample predictive study to compare different prior settings and to investigate the choice of the number of factors.

Moreover, we discussed the out-of-sample performance of a minimum variance portfolio constructed from the model-implied weights and related it to a number of competitors often used in practice. Because the algorithm scales linearly in both the series length T and the number of component series m, applying it to even higher dimensions is straightforward; we have experimented with simulated data in thousands of dimensions for thousands of points in time and successfully recaptured the time-varying covariance matrix.

Further research could be directed towards incorporating prior knowledge into the hierarchical structure of the Normal-Gamma prior, e.g. by choosing the global shrinkage parameters according to industry sectors. Alternatively, Villani et al. (2009) propose a mixture-of-experts model to cater for smoothly changing regression densities; it might be fruitful to adopt this idea in the context of covariance matrix estimation by including either observed (Fama-French) or latent factors as predictors and allowing for other mixture types than the ones discussed there. While not being the focus of this work, it is also easy to extend the proposed method by exploiting the modular nature of Markov chain Monte Carlo methods. In particular, it is straightforward to combine it with mean models such as (sparse) vector autoregressions (e.g., Bańbura et al. 2010; Kastner and Huber 2017), dynamic regressions (e.g., Korobilis 2013), or time-varying parameter models (e.g., Koop and Korobilis 2013; Huber et al. 2017).

Acknowledgments

Variants of this paper were presented at the 6th European Seminar on Bayesian Econometrics (ESOBE 2015), the 2015 NBER-NSF Time Series Conference, the 2nd Vienna Workshop on High-Dimensional Time Series in Macroeconomics and Finance 2015, the 2015 International Work-Conference on Time Series Analysis, the 2016 ISBA World Meeting, the 10th International Conference on Computational and Financial Econometrics 2016, the CORE Econometrics & Finance Seminar 2017 and the 61st World Statistics Congress 2017. The author thanks all participants, in particular Luc Bauwens, Manfred Deistler, John Geweke, Florian Huber, Sylvia Kaufmann, Hedibert Lopes, Ruey Tsay, Stefan Voigt, Mike West, as well as the handling editor Herman van Dijk and two anonymous referees for crucially valuable comments and suggestions. Special thanks go to Mark Jensen who discussed this paper at the ESOBE 2015 and Sylvia Frühwirth-Schnatter who supported the author throughout the development of this work.


References

Adolfson, M., J. Lindé, and M. Villani (2007). "Forecasting Performance of an Open Economy DSGE Model". Econometric Reviews 26(2–4), 289–328.
Aguilar, O. and M. West (2000). "Bayesian Dynamic Factor Models and Portfolio Allocation". Journal of Business & Economic Statistics 18(3), 338–357.
Ahelegbey, D. F., M. Billio, and R. Casarin (2016). "Bayesian Graphical Models for Structural Vector Autoregressive Processes". Journal of Applied Econometrics 31(2), 357–386.
Anderson, T. W. and H. Rubin (1956). "Statistical Inference in Factor Analysis". In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 5: Contributions to Econometrics, Industrial Research, and Psychometry. University of California Press, pp. 111–150. url: http://projecteuclid.org/euclid.bsmsp/1200511860.
Bai, J. and S. Ng (2002). "Determining the Number of Factors in Approximate Factor Models". Econometrica 70(1), 191–221.
Bańbura, M., D. Giannone, and L. Reichlin (2010). "Large Bayesian Vector Auto Regressions". Journal of Applied Econometrics 25(1), 71–92.
Basturk, N., S. Grassi, L. Hoogerheide, and H. K. van Dijk (2016). Time-Varying Combinations of Bayesian Dynamic Models and Equity Momentum Strategies. Discussion Paper TI 2016-099/III. Tinbergen Institute. url: http://www.tinbergen.nl/discussionpaper/?paper=2685.
Belmonte, M. A., G. Koop, and D. Korobilis (2014). "Hierarchical Shrinkage in Time-Varying Parameter Models". Journal of Forecasting 33(1), 80–94.
Bhattacharya, A. and D. B. Dunson (2011). "Sparse Bayesian Infinite Factor Models". Biometrika 98(2), 291–309.
Bitto, A. and S. Frühwirth-Schnatter (2016). Achieving Shrinkage in a Time-Varying Parameter Model Framework. arXiv pre-print 1611.01310v1. url: https://arxiv.org/abs/1611.01310.
Carhart, M. M. (1997). "On Persistence in Mutual Fund Performance". The Journal of Finance 52(1), 57–82.
Caron, F. and A. Doucet (2008). "Sparse Bayesian Nonparametric Regression". In: Proceedings of the 25th International Conference on Machine Learning (ICML'2008), pp. 88–95.
Carriero, A., T. E. Clark, and M. Marcellino (2016). "Common Drifting Volatility in Large Bayesian VARs". Journal of Business & Economic Statistics 34(3), 375–390.
Chib, S., F. Nardari, and N. Shephard (2006). "Analysis of High Dimensional Multivariate Stochastic Volatility Models". Journal of Econometrics 134(2), 341–371.
Engle, R. F. and B. Kelly (2012). "Dynamic Equicorrelation". Journal of Business & Economic Statistics 30(2), 212–228.
Fama, E. F. and K. R. French (1993). "Common Risk Factors in the Returns on Stocks and Bonds". Journal of Financial Economics 33(1), 3–56.
Frühwirth-Schnatter, S. and H. F. Lopes (2017). Parsimonious Bayesian Factor Analysis When the Number of Factors is Unknown. Tech. rep.
Frühwirth-Schnatter, S. and R. Tüchler (2008). "Bayesian Parsimonious Covariance Estimation for Hierarchical Linear Mixed Models". Statistics and Computing 18(1), 1–13.
Frühwirth-Schnatter, S. and H. Wagner (2010). "Stochastic Model Specification Search for Gaussian and Partial Non-Gaussian State Space Models". Journal of Econometrics 154(1), 85–100.
Geweke, J. and G. Amisano (2010). "Comparing and Evaluating Bayesian Predictive Distributions of Asset Returns". International Journal of Forecasting 26(2), 216–230.
Griffin, J. E. and P. J. Brown (2010). "Inference with Normal-Gamma Prior Distributions in Regression Problems". Bayesian Analysis 5(1), 171–188.
Griffin, J. E. and P. J. Brown (2017). "Hierarchical Shrinkage Priors for Regression Models". Bayesian Analysis 12(1), 135–159.


Gruber, L. and M. West (2016). "GPU-Accelerated Bayesian Learning and Forecasting in Simultaneous Graphical Dynamic Linear Models". Bayesian Analysis 11(1), 125–149.
Han, Y. (2006). "Asset Allocation with a High Dimensional Latent Factor Stochastic Volatility Model". Review of Financial Studies 19(1), 237–271.
Harvey, A. C., E. Ruiz, and N. Shephard (1994). "Multivariate Stochastic Variance Models". Review of Economic Studies 61(2), 247–264.
Hörmann, W. and J. Leydold (2013). "Generating Generalized Inverse Gaussian Random Variates". Statistics and Computing 24(4), 1–11.
Huber, F. (2016). "Density Forecasting Using Bayesian Global Vector Autoregressions With Stochastic Volatility". International Journal of Forecasting 32(3), 818–837.
Huber, F. and M. Feldkircher (2017). "Adaptive Shrinkage in Bayesian Vector Autoregressive Models". Journal of Business & Economic Statistics, forthcoming.
Huber, F., G. Kastner, and M. Feldkircher (2017). A New Approach Toward Detecting Structural Breaks in Vector Autoregressive Models. arXiv pre-print 1607.04532. url: https://arxiv.org/abs/1607.04532.
Ishihara, T. and Y. Omori (2017). "Portfolio Optimization Using Dynamic Factor and Stochastic Volatility: Evidence on Fat-Tailed Error and Leverage". The Japanese Economic Review 68(1), 63–94.
Jungbacker, B. and S. J. Koopman (2006). "Monte Carlo Likelihood Estimation for Three Multivariate Stochastic Volatility Models". Econometric Reviews 25(2–3), 385–408.
Kastner, G. (2016). "Dealing with Stochastic Volatility in Time Series Using the R Package stochvol". Journal of Statistical Software 69(5), 1–30.
Kastner, G. (2017). factorstochvol: Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models. R package version 0.8.4. url: https://CRAN.R-project.org/package=factorstochvol.
Kastner, G. and S. Frühwirth-Schnatter (2014). "Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models". Computational Statistics and Data Analysis 76, 408–423.
Kastner, G. and F. Huber (2017). Sparse Bayesian Vector Autoregressions in Huge Dimensions. arXiv pre-print 1704.03239. url: https://arxiv.org/abs/1704.03239.
Kastner, G., S. Frühwirth-Schnatter, and H. F. Lopes (2017). "Efficient Bayesian Inference for Multivariate Factor Stochastic Volatility Models". Journal of Computational and Graphical Statistics, forthcoming.
Kaufmann, S. and C. Schumacher (2013). Bayesian Estimation of Sparse Dynamic Factor Models With Order-Independent Identification. Working Paper 13.04. Swiss National Bank, Study Center Gerzensee. url: http://ideas.repec.org/p/szg/worpap/1304.html.
Koop, G. and D. Korobilis (2013). "Large Time-Varying Parameter VARs". Journal of Econometrics 177(2), 185–198.
Korobilis, D. (2013). "Hierarchical Shrinkage Priors for Dynamic Regressions With Many Predictors". International Journal of Forecasting 29(1), 43–59.
Ledoit, O. and M. Wolf (2004). "Honey, I Shrunk the Sample Covariance Matrix". The Journal of Portfolio Management 30(4), 110–119.
Leydold, J. and W. Hörmann (2017). GIGrvg: Random Variate Generator for the GIG Distribution. R package version 0.5. url: http://CRAN.R-project.org/package=GIGrvg.
Liesenfeld, R. and J.-F. Richard (2006). "Classical and Bayesian Analysis of Univariate and Multivariate Stochastic Volatility Models". Econometric Reviews 25(2–3), 335–360.
Liesenfeld, R. and R. C. Jung (2000). "Stochastic Volatility Models: Conditional Normality Versus Heavy-Tailed Distributions". Journal of Applied Econometrics 15(2), 137–160.
Loddo, A., S. Ni, and D. Sun (2011). "Selection of Multivariate Stochastic Volatility Models via Bayesian Stochastic Search". Journal of Business & Economic Statistics 29(3), 342–355.


Lopes, H. F. and C. M. Carvalho (2007). "Factor Stochastic Volatility With Time Varying Loadings and Markov Switching Regimes". Journal of Statistical Planning and Inference 137(10), 3082–3091.
Lopes, H. F., R. E. McCulloch, and R. S. Tsay (2016). Parsimony Inducing Priors for Large Scale State-Space Models. Tech. rep.
Nakajima, J. and M. West (2013). "Dynamic Factor Volatility Modeling: A Bayesian Latent Threshold Approach". Journal of Financial Econometrics 11(1), 116–153.
Nakajima, J. and M. West (2017). "Dynamics and Sparsity in Latent Threshold Factor Models: A Study in Multivariate EEG Signal Processing". Brazilian Journal of Probability and Statistics, forthcoming.
Oh, D. H. and A. J. Patton (2017). "Modeling Dependence in High Dimensions with Factor Copulas". Journal of Business & Economic Statistics 35(1).
Pakel, C., N. Shephard, K. Sheppard, and R. F. Engle (2014). Fitting Vast Dimensional Time-Varying Covariance Models. Tech. rep.
Park, T. and G. Casella (2008). "The Bayesian Lasso". Journal of the American Statistical Association 103(452), 681–686.
Pati, D., A. Bhattacharya, N. S. Pillai, and D. Dunson (2014). "Posterior Contraction in Sparse Bayesian Factor Models for Massive Covariance Matrices". The Annals of Statistics 42(3), 1102–1130.
Philipov, A. and M. E. Glickman (2006). "Factor Multivariate Stochastic Volatility via Wishart Processes". Econometric Reviews 25(2–3), 311–334.
Pitt, M. K. and N. Shephard (1999). "Time-Varying Covariances: A Factor Stochastic Volatility Approach". In: Bayesian Statistics 6 – Proceedings of the Sixth Valencia International Meeting. Ed. by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith. Oxford University Press, pp. 547–570.
Polson, N. G. and J. G. Scott (2011). "Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction". In: Bayesian Statistics 9 – Proceedings of the Ninth Valencia International Meeting. Ed. by J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West. Oxford University Press, pp. 501–538.
R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. url: https://www.R-project.org/.
Rigobon, R. (2003). "Identification Through Heteroskedasticity". Review of Economics and Statistics 85(4), 777–792.
Sentana, E. and G. Fiorentini (2001). "Identification, Estimation and Testing of Conditionally Heteroskedastic Factor Models". Journal of Econometrics 102(2), 143–164.
Sharpe, W. F. (1966). "Mutual Fund Performance". The Journal of Business 39(1), 119–138.
Villani, M., R. Kohn, and P. Giordani (2009). "Regression Density Estimation Using Smooth Adaptive Gaussian Mixtures". Journal of Econometrics 153(2), 155–173.
Wang, H., C. Reeson, and C. M. Carvalho (2011). "Dynamic Financial Index Models: Modeling Conditional Dependencies via Graphs". Bayesian Analysis 6(4), 639–664.
Yu, J. (2005). "On Leverage in a Stochastic Volatility Model". Journal of Econometrics 127(2), 165–178.
Zhao, Z. Y., M. Xie, and M. West (2016). "Dynamic Dependence Networks: Financial Time Series Forecasting and Portfolio Decisions". Applied Stochastic Models in Business and Industry 32(3), 311–332.
Zhou, X., J. Nakajima, and M. West (2014). "Bayesian Forecasting and Portfolio Decisions Using Dynamic Dependent Sparse Factor Models". International Journal of Forecasting 30(4), 963–980.

