Bayesian Inference and Selection in Smooth Transition Autoregressive Models

Hedibert Freitas Lopes and Esther Salazar

Department of Statistical Methods, Federal University of Rio de Janeiro

http://acd.ufrj.br/~hedibert/lavras2003.pdf


Outline of the talk

• Logistic smooth transition autoregressive (LSTAR) models
• Bayesian inference through Markov Chain Monte Carlo (MCMC)
• Model selection: information criteria
  – Traditional: AIC, BIC
  – Bayesian deviance: DIC
• Transdimensional modeling: reversible jump MCMC (RJMCMC)
• Stochastic volatility (SV) models
  – Univariate SV + LSTAR: SV-LSTAR
  – Factor stochastic volatility + LSTAR: FSV-LSTAR
• Final thoughts

Canadian Lynx

Total number (log) of Canadian lynx captured annually at the Mackenzie River, 1821 to 1934.


SP500

New York Stock Exchange's Standard and Poor's index (S&P500), 07/01/1986 to 31/12/1997 (3127 observations).


Latin American stock indexes

Log-returns of five markets: USA (Dow Jones), Brazil (IBOVESPA), Mexico (MEXBOL), Argentina (MERVAL) and Chile (IPSA). The series are observed daily, 08/01/1994 to 02/14/2001 (Lopes and Migon 2002).


LSTAR model of order p - LSTAR(p)

LSTAR(p) models are autoregressive models with a smooth, endogenous, logistic transition structure:

$$y_t = \theta_{01} + \sum_{i=1}^{p} \theta_{i1} y_{t-i} + \left( \theta_{02} + \sum_{i=1}^{p} \theta_{i2} y_{t-i} \right) F(\gamma(y_{t-d} - c)) + \varepsilon_t \qquad (1)$$

where $\varepsilon_t \sim N(0, \tau^2)$ and $F(\gamma(y_{t-d} - c)) = \{1 + \exp(-\gamma(y_{t-d} - c))\}^{-1}$.

• $\gamma > 0$: smoothness parameter,
• $c$: location parameter,
• $d$: delay parameter,
• $y_{t-d}$: transition variable.
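As a concrete illustration, here is a minimal sketch of the transition function and the one-step conditional mean in equation (1); the function names and the NumPy implementation are this sketch's own, not code from the paper.

```python
import numpy as np

def logistic_F(z, gamma, c):
    """Logistic transition function {1 + exp(-gamma (z - c))}^{-1}."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c)))

def lstar_mean(lags, y_lag_d, theta1, theta2, gamma, c):
    """Conditional mean of y_t in equation (1), given the lagged values
    lags = (y_{t-1}, ..., y_{t-p}) and the transition variable y_{t-d}."""
    x = np.concatenate(([1.0], lags))  # intercept plus the p lags
    return x @ theta1 + (x @ theta2) * logistic_F(y_lag_d, gamma, c)
```

As $\gamma$ grows, the logistic function approaches a step at $c$ and the model tends to a two-regime threshold autoregression; as $\gamma$ tends to zero, the two regimes merge into a single linear AR(p).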

Figure 1: The logistic function $\{1 + \exp(-\gamma z)\}^{-1}$.


Bayesian inference in LSTAR(p) models

Let $\theta_1 = (\theta_{01}, \theta_{11}, \ldots, \theta_{p1})$, $\theta_2 = (\theta_{02}, \theta_{12}, \ldots, \theta_{p2})$, $\phi = (\gamma, c, \tau^2)$ and $\theta = (\theta_1, \theta_2, \phi)$. Then

$$p(\theta \mid y, p, d) \propto p(y \mid \theta, p, d)\, \pi(\theta \mid p, d)$$

• We assume conditionally conjugate prior distributions:
  – $\theta_i \sim N(m_i, \sigma_i^2 I)$, $i = 1, 2$,
  – $\gamma \sim G(a, b)$, $c \sim N(m_c, \sigma_c^2)$ and $\tau^2 \sim IG(v/2, v s^2/2)$.
• Exact Bayesian inference is possible through MCMC.
• $\gamma$ and $c$ are highly correlated (Teräsvirta 1994); see Figure 2.
• $\gamma$ and $c$ are, therefore, sampled jointly.
• $d$ is easily sampled: discrete prior.
• Uncertainty about $p$ is assessed through RJMCMC (Green 1995).
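Since $\gamma$ and $c$ are sampled jointly, a natural building block of the MCMC scheme is a joint random-walk Metropolis step for $(\gamma, c)$ given the remaining parameters. The sketch below assumes the conditional Gaussian likelihood of equation (1); `lstar_loglik`, `prior_logpdf` and the step sizes are assumptions of this sketch, not the authors' code.

```python
import numpy as np

def lstar_loglik(y, theta1, theta2, gamma, c, tau2, d):
    """Gaussian log-likelihood of the LSTAR(p) model in equation (1),
    conditioning on the first max(p, d) observations."""
    p = len(theta1) - 1
    m = max(p, d)
    T = len(y)
    X = np.column_stack([np.ones(T - m)] +
                        [y[m - i:T - i] for i in range(1, p + 1)])
    F = 1.0 / (1.0 + np.exp(-gamma * (y[m - d:T - d] - c)))
    resid = y[m:] - (X @ theta1 + (X @ theta2) * F)
    return -0.5 * np.sum(resid**2) / tau2 - 0.5 * (T - m) * np.log(2 * np.pi * tau2)

def update_gamma_c(y, theta1, theta2, gamma, c, tau2, d,
                   prior_logpdf, steps=(1.0, 0.05), rng=np.random):
    """One joint random-walk Metropolis update of (gamma, c)."""
    gamma_new = gamma + steps[0] * rng.standard_normal()
    c_new = c + steps[1] * rng.standard_normal()
    if gamma_new <= 0:  # outside the support of the gamma prior: reject
        return gamma, c
    log_ratio = (lstar_loglik(y, theta1, theta2, gamma_new, c_new, tau2, d)
                 + prior_logpdf(gamma_new, c_new)
                 - lstar_loglik(y, theta1, theta2, gamma, c, tau2, d)
                 - prior_logpdf(gamma, c))
    if np.log(rng.uniform()) < log_ratio:
        return gamma_new, c_new
    return gamma, c
```

Proposing the pair jointly, ideally with increments tuned to the shape of the likelihood surface in Figure 2, avoids the slow mixing that one-at-a-time updates suffer when two parameters are strongly correlated.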

Figure 2: Likelihood function of $(\gamma, c)$ for an LSTAR(1) with $T = 1000$, $\theta_{01} = 0$, $\theta_{11} = 0.74$, $\theta_{02} = 0.02$, $\theta_{12} = 0.1$, $\gamma = 20$ and $\tau^2 = 0.02^2$, for $c \in \{0.02, 0.05, 0.10, 0.12, 0.15\}$.


Model selection: information criteria

• AIC (Akaike 1974): $\mathrm{AIC} = -2 \log p(y \mid \hat{\theta}) + 2k$
• BIC (Schwarz 1978): $\mathrm{BIC} = -2 \log p(y \mid \hat{\theta}) + k \log T$

where $k$ is the number of parameters, which equals $2p + 5$ in an LSTAR(p) model.

It is not always trivial to define $k$: what if $d$ and, more importantly, $p$ are included in the analysis?
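For concreteness, a minimal sketch computing both criteria for an LSTAR(p) fit from its maximized log-likelihood; `info_criteria` and `loglik_hat` are illustrative names of this sketch.

```python
import numpy as np

def info_criteria(loglik_hat, p, T):
    """AIC and BIC for an LSTAR(p) model with k = 2p + 5 parameters:
    two blocks of p+1 autoregressive coefficients plus gamma, c and tau^2."""
    k = 2 * p + 5
    return -2.0 * loglik_hat + 2 * k, -2.0 * loglik_hat + k * np.log(T)
```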

The Deviance Information Criterion (DIC)

Dempster (1974): the posterior distribution of the deviance $D(\theta) = -2 \log p(y \mid \theta)$.

Spiegelhalter, Best, Carlin, and van der Linde (2002):

$$\mathrm{DIC} = -2 \log p(y \mid \bar{\theta}) + 2 p_D \qquad (2)$$

$$\mathrm{DIC} = \underbrace{\bar{D}}_{\text{goodness of fit}} + \underbrace{p_D}_{\text{model complexity}} \qquad (3)$$

where $\bar{\theta} = E(\theta \mid y)$, $\bar{D} = E(D(\theta) \mid y)$ and $p_D = \bar{D} - D(\bar{\theta})$.

• $p_D$: effective number of parameters.
• Smaller values of DIC suggest a better-fitting model.
• DIC generalizes the AIC.
• DIC approximately describes the expected posterior loss, under a logarithmic loss function, of adopting a particular model.

Computing DIC through MCMC output

The DIC is computationally attractive since its terms can easily be computed during an MCMC run, as opposed to, for instance, Bayes factors. More specifically, let $\theta^{(1)}, \ldots, \theta^{(M)}$ be an MCMC sample from $p(\theta \mid y)$. Then

$$\bar{D} \approx \frac{1}{M} \sum_{i=1}^{M} D(\theta^{(i)}) \qquad \text{and} \qquad D(\bar{\theta}) \approx D\left( \frac{1}{M} \sum_{i=1}^{M} \theta^{(i)} \right)$$

DIC in hierarchical (spatio-temporal) models: Zhu and Carlin (2000).
DIC in stochastic volatility models: Berg, Meyer, and Yu (2002).
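The approximations above translate directly into code. Below is a minimal sketch assuming `draws` is an (M, dim) array of stored posterior samples and `deviance` evaluates $D(\theta) = -2 \log p(y \mid \theta)$ for one parameter vector; both names are hypothetical.

```python
import numpy as np

def dic(draws, deviance):
    """DIC and p_D from MCMC output, following equations (2)-(3)."""
    d_bar = np.mean([deviance(theta) for theta in draws])  # E[D(theta) | y]
    d_at_mean = deviance(draws.mean(axis=0))               # D(theta_bar)
    p_d = d_bar - d_at_mean                                # effective number of parameters
    return d_bar + p_d, p_d
```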

Simulated series

Besides analysing the Canadian lynx series, we simulated $T = 1000$ observations from the following LSTAR(2):

$$y_t = 1.8 y_{t-1} - 1.06 y_{t-2} + (0.02 - 0.9 y_{t-1} + 0.795 y_{t-2}) F(y_{t-2}) + \varepsilon_t \qquad (4)$$

where $F(y_{t-2}) = [1 + \exp\{-20(y_{t-2} - 0.02)\}]^{-1}$ and $\varepsilon_t \sim N(0, 0.02^2)$.

Figure 3: Simulated LSTAR(2): $(\theta_{01}, \theta_{11}, \theta_{21}) = (0.0, 1.8, -1.06)$, $(\theta_{02}, \theta_{12}, \theta_{22}) = (0.02, -0.9, 0.795)$, $(\gamma, c, \tau) = (20, 0.02, 0.02)$.
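A series like the one in Figure 3 can be drawn from equation (4) in a few lines; the burn-in length, the zero initial values and the seed are choices of this sketch.

```python
import numpy as np

def simulate_lstar2(T=1000, burn=200, rng=None):
    """Simulate T observations from the LSTAR(2) model in equation (4)."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = np.zeros(T + burn)
    for t in range(2, T + burn):
        F = 1.0 / (1.0 + np.exp(-20.0 * (y[t - 2] - 0.02)))
        y[t] = (1.8 * y[t - 1] - 1.06 * y[t - 2]
                + (0.02 - 0.9 * y[t - 1] + 0.795 * y[t - 2]) * F
                + 0.02 * rng.standard_normal())
    return y[burn:]

y = simulate_lstar2()
```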

 p   d     AIC       BIC       DIC
 1   1   -4637.6   -4603.2   -4644.9
 1   2   -4934.2   -4899.9   -4936.0
 1   3   -4805.8   -4771.5   -4811.4
 2   1   -4957.4   -4913.3   -4975.4
 2   2   -4980.6   -4936.5   -4985.9
 2   3   -4946.8   -4902.7   -4912.5
 3   1   -3905.8   -3851.9   -3927.8
 3   2   -4974.0   -4920.1   -4982.6
 3   3   -4945.5   -4891.5   -4960.4

Table 1: Model selection - simulated series.

Canadian Lynx


 p   d     AIC       BIC       DIC
 1   1    91.857   110.82     83.684
 1   2    91.760   110.73     83.789
 1   3    91.808   110.77     84.245
 2   1    5.6963    30.082    -4.4869
 2   2    88.260   112.65   -131.59
 2   3    5.8371    30.223    -9.1606
 3   1    51.394    81.198    37.193
 3   2    51.371    81.175    37.011
 3   3    51.235    81.040    36.811

Table 2: Model selection - Canadian lynx.

 Par    True value   Posterior mean (st. dev.)
 θ01     0            0.00 (0.01)
 θ11     1.8          1.74 (0.16)
 θ21    -1.06        -1.01 (0.13)
 θ02     0.02         0.01 (0.014)
 θ12    -0.9         -0.77 (0.25)
 θ22     0.795        0.71 (0.21)
 γ      20           26.1 (9.94)
 c       0.02         0.02 (0.02)
 τ²      0.02²        0.02² (1.74e-5)

Table 3: Simulated series: posterior means of the LSTAR(2) parameters. Chain length = 5000. Standard deviations in parentheses.

 Par    Posterior mean (st. dev.)
 θ01     0.3367 (0.2046)
 θ11     1.2344 (0.0732)
 θ21    -0.2894 (0.1145)
 θ02    -0.7422 (0.4001)
 θ12     0.3758 (0.1531)
 θ22    -0.3118 (0.1807)
 γ      10.637 (4.2304)
 c       3.295 (0.0767)
 τ²      0.0417 (0.0048)

Table 4: Canadian lynx: posterior means of the LSTAR(2) parameters. Chain length = 5000. Standard deviations in parentheses.

Conditional distribution of d

Inference about $d$ is trivial. Assume, for simplicity, that $d$ is uniformly distributed on the integers $1, 2, \ldots, d_m$, i.e. $p(d = j) = 1/d_m$, $j \in \{1, \ldots, d_m\}$. Therefore,

$$p(d = j \mid y, \theta) \propto p(y \mid \theta, d = j)\, p(d = j) \propto p(y \mid \theta, d = j), \qquad j = 1, \ldots, d_m$$

$\Pr(d \mid y)$ is estimated by the proportion of visits to each $d$.

Simulated data: $\Pr(d = 2 \mid y, p = 2) = 0.98$. Canadian lynx: $\Pr(d = 2 \mid y, p = 2) = 0.75$.

We discuss next a transdimensional algorithm (Peter Green's RJMCMC) that is used to assess the uncertainty about $p$, the autoregressive lag order.
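Before moving on, here is a minimal sketch of the corresponding Gibbs step for $d$, reusing the hypothetical `lstar_loglik` from the earlier sketch. Note that, for the comparison across values of $d$ to be fair, all likelihoods should condition on the same initial stretch of the series.

```python
import numpy as np

def sample_d(y, theta1, theta2, gamma, c, tau2, d_max, rng=np.random):
    """Draw d from its discrete full conditional under a uniform prior."""
    logp = np.array([lstar_loglik(y, theta1, theta2, gamma, c, tau2, d)
                     for d in range(1, d_max + 1)])
    prob = np.exp(logp - logp.max())  # stabilize on the log scale
    prob /= prob.sum()
    return rng.choice(np.arange(1, d_max + 1), p=prob)
```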

Reversible Jump Markov Chain Monte Carlo

RJMCMC is a generalization of the Metropolis-Hastings algorithm that allows moves between parameter spaces of different dimensions.

• $J(k \to k')$: the probability of proposing a move from model $k$ to model $k'$.
• $q(\theta^{(k')} \mid k', k, \theta^{(k)})$: proposal distributions.
• The acceptance probability is $A\{(k, \theta^{(k)}) \to (k', \theta^{(k')})\} = \min(1, A)$, where

$$A = \underbrace{\frac{p(y \mid \theta^{(k')}, k')\, p(\theta^{(k')} \mid k')\, p(k')}{p(y \mid \theta^{(k)}, k)\, p(\theta^{(k)} \mid k)\, p(k)}}_{\text{Target density ratio}} \times \underbrace{\frac{J(k' \to k)\, q(\theta^{(k)} \mid k, k', \theta^{(k')})}{J(k \to k')\, q(\theta^{(k')} \mid k', k, \theta^{(k)})}}_{\text{Transition probability ratio}}$$

• The choice of $q$ is crucial to the success of the algorithm.
• $q(\theta^{(k')} \mid k', k, \theta^{(k)})$ should mimic $p(\theta^{(k')} \mid k', y)$.

Lopes (2000): http://www.stat.duke.edu/phdthesis

RJMCMC for LSTAR(k) Time Series

In the case of an LSTAR model, $\theta$ can be sampled directly from its posterior distribution given $\phi = (\gamma, c, \tau^2)$, i.e. $\theta^{(k')} \sim p(\theta^{(k')} \mid k', \phi)$, where $k$ is the model index. Therefore,

$$A = \frac{p(y \mid \theta^{(k')}, k', \phi)\, p(\theta^{(k')} \mid k')\, p(k')}{p(y \mid \theta^{(k)}, k, \phi)\, p(\theta^{(k)} \mid k)\, p(k)} \times \frac{J(k' \to k)}{J(k \to k')} \times \frac{p(\theta^{(k)} \mid y, k, \phi)}{p(\theta^{(k')} \mid y, k', \phi)}$$

But

$$\frac{p(y \mid \theta, k, \phi)\, p(\theta \mid k)\, p(k)}{p(\theta \mid y, k, \phi)} \propto p(k \mid y, \phi) \propto p(k) \int p(y \mid \theta, k, \phi)\, p(\theta \mid k)\, d\theta$$

The acceptance probability can therefore be written as

$$\min\left( 1, \frac{p(k' \mid y, \phi)}{p(k \mid y, \phi)} \times \frac{J(k' \to k)}{J(k \to k')} \right)$$

When $J(k' \to k) = J(k \to k')$ for all $k, k'$, the acceptance probability becomes

$$\min\left( 1, \frac{p(k' \mid y, \phi)}{p(k \mid y, \phi)} \right)$$

In other words, the posterior odds ratio drives the jumping rate.

The table below summarizes a run of 5000 draws of the reversible jump Markov chain Monte Carlo algorithm applied to the simulated LSTAR(2) series.

         d
 p      1      2      3
 1      0      0      0
 2      1   4785      9
 3      0      7    183
 4      1      1     13

Table 5: Simulated LSTAR(2).
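Under a symmetric proposal, the model-jump step reduces to a few lines. The sketch below assumes a hypothetical routine `log_post_k` returning $\log p(k \mid y, \phi)$ up to an additive constant (under the conditionally conjugate normal priors, the required integrated likelihood is available in closed form).

```python
import numpy as np

def jump_model(k, k_max, log_post_k, rng=np.random):
    """Propose a new model order uniformly (a symmetric J) and accept
    with probability min(1, p(k' | y, phi) / p(k | y, phi))."""
    k_new = rng.choice([kk for kk in range(1, k_max + 1) if kk != k])
    log_ratio = log_post_k(k_new) - log_post_k(k)
    return k_new if np.log(rng.uniform()) < log_ratio else k
```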

Stochastic volatility with LSTAR(p) structure

The most traditional (univariate) stochastic volatility model is

$$y_t \sim N(0, e^{h_t})$$
$$h_t = \mu + \phi(h_{t-1} - \mu) + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$$

which is the well-known SVAR(1) model. See, for instance, Jacquier, Polson, and Rossi (1994), Shephard (1996), Kim, Shephard, and Chib (1998) and Chib, Nardari, and Shephard (2002).

We generalize the SVAR(1) model by replacing the simple AR(1) structure with an LSTAR(p):

$$h_t \mid \mathbf{h}_{t-1} = g(\theta, \mathbf{h}_{t-1}) + \varepsilon_t$$

where $\mathbf{h}_{t-1} = (h_{t-1}, \ldots, h_{t-p}, h_{t-d})$ and

$$g(\theta, \mathbf{h}_{t-1}) = \theta_{01} + \sum_{i=1}^{p} \theta_{i1} h_{t-i} + \left( \theta_{02} + \sum_{i=1}^{p} \theta_{i2} h_{t-i} \right) F(\gamma(h_{t-d} - c))$$
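A minimal sketch of the SV-LSTAR data-generating process: the log-volatility follows the LSTAR recursion $g(\theta, \cdot)$ and returns are drawn as $y_t \sim N(0, e^{h_t})$. The parameter values in the example call are illustrative, not estimates from the paper.

```python
import numpy as np

def simulate_sv_lstar(T, theta1, theta2, gamma, c, sigma, d=1, rng=None):
    """Simulate returns y and log-volatilities h from an SV-LSTAR(p) model."""
    rng = np.random.default_rng(1) if rng is None else rng
    p = len(theta1) - 1
    m = max(p, d)
    h = np.zeros(T + m)
    for t in range(m, T + m):
        lags = h[t - p:t][::-1]  # h_{t-1}, ..., h_{t-p}
        F = 1.0 / (1.0 + np.exp(-gamma * (h[t - d] - c)))
        h[t] = (theta1[0] + theta1[1:] @ lags
                + (theta2[0] + theta2[1:] @ lags) * F
                + sigma * rng.standard_normal())
    h = h[m:]
    y = np.exp(h / 2) * rng.standard_normal(T)  # y_t ~ N(0, exp(h_t))
    return y, h

y, h = simulate_sv_lstar(T=1000, theta1=np.array([-0.1, 0.95]),
                         theta2=np.array([0.0, -0.3]),
                         gamma=5.0, c=0.0, sigma=0.2)
```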

The SV-LSTAR(p) model is a particular case of West and Harrison's non-normal/nonlinear dynamic model (West and Harrison 1997).

Given $h_0, h_1, \ldots, h_T$, inference about $\theta$ follows the previous algorithms.

Given $\theta$, inference about $h_0, h_1, \ldots, h_T$ is performed using Jacquier, Polson and Rossi's algorithm (Jacquier, Polson, and Rossi 1994).

Figure 4: SP500: posterior mean of the log-volatility from an SV-LSTAR(2).

Common Factor Structure

See Lopes and West (2003) for more details.

Factor stochastic volatility

• Factor model: $(y_t \mid f_t, \beta_t, \Sigma_t) \sim N(\beta_t f_t, \Sigma_t)$, for $\Sigma_t = \mathrm{diag}(\sigma_{t1}^2, \ldots, \sigma_{tn}^2)$.
• Factors: $(f_t \mid H_t) \sim N(0, H_t)$, for $H_t = \mathrm{diag}(e^{h_{t1}}, \ldots, e^{h_{tk}})$.
• Factors' volatilities:
  – FSV-AR(1): $h_{tj} \sim$ AR(1); Lopes, Aguilar, and West (2000), Lopes and Migon (2002)
  – FSV-MSAR(1): $h_{tj} \sim$ AR(1) with Markov switching; Marinho and Lopes (2002)
  – FSV-LSTAR(p): $h_{tj} \sim$ LSTAR(p)
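A minimal sketch of the factor stochastic volatility data-generating process, with a fixed loadings matrix and AR(1) log-volatilities for the factors (the time-varying $\beta_t$ and the LSTAR law for $h_{tj}$ are omitted for brevity); all settings are illustrative.

```python
import numpy as np

def simulate_fsv(T, beta, sigma2, mu, phi, omega, rng=None):
    """Simulate y_t = beta f_t + e_t with f_tj ~ N(0, exp(h_tj))."""
    rng = np.random.default_rng(2) if rng is None else rng
    n, k = beta.shape
    h = np.zeros((T, k))
    h[0] = mu
    for t in range(1, T):  # AR(1) log-volatilities of the factors
        h[t] = mu + phi * (h[t - 1] - mu) + omega * rng.standard_normal(k)
    f = np.exp(h / 2) * rng.standard_normal((T, k))    # common factors
    e = np.sqrt(sigma2) * rng.standard_normal((T, n))  # idiosyncratic noise
    return f @ beta.T + e, f, h

beta = np.array([[1.0], [0.8], [0.6], [0.9], [0.7]])   # n = 5 series, k = 1 factor
y, f, h = simulate_fsv(T=500, beta=beta, sigma2=0.1 * np.ones(5),
                       mu=np.zeros(1), phi=0.95 * np.ones(1),
                       omega=0.2 * np.ones(1))
```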

Figure 5: FSV-LSTAR(2) - common factor.

Figure 6: FSV-LSTAR(2) - factor's variance.

Figure 7: FSV-MS - common factors.

Figure 8: FSV-MS - factors' variances.

Final thoughts

• A few comments:
  – MCMC in LSTAR-type models is fairly straightforward;
  – DIC is an alternative to AIC and BIC;
  – RJMCMC makes Bayesian model averaging possible;
  – Uncertainty about $d$ and $p$ is fully accounted for.
• Some (not necessarily good) ideas:
  – More general switching mechanisms;
  – More general stochastic volatility models;
  – Lubrano (2000) suggests using $(\theta_1 - \theta_2) \mid \gamma \sim N(0, e^{\gamma} \sigma^2)$ and $p(\gamma) \propto 1/(1 + \gamma)$;
  – Sequential Monte Carlo with hyperparameters: Doucet and Tadic (2002), Storvik (2002), Polson, Stroud, and Muller (2002);
  – Time-varying number of factors, $k_t$.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19, 716–723.

Berg, A., R. Meyer, and J. Yu (2002). Deviance information criterion for comparing stochastic volatility models. Working paper, Department of Statistics, University of Auckland. http://www.stat.auckland.ac.nz/~meyer/dic.ps

Chib, S., F. Nardari, and N. Shephard (2002). Markov chain Monte Carlo methods for stochastic volatility models. Journal of Econometrics 108, 281–316.

Dempster, A. (1974). The direct use of likelihood for significance testing. In Proceedings of Conference on Foundational Questions in Statistical Inference, pp. 335–352.

Doucet, A. and V. Tadic (2002). Parameter estimation in general state-space models using particle methods. Working paper, Signal Processing Group, Department of Engineering.

Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.

Jacquier, E., N. Polson, and P. Rossi (1994). Bayesian analysis of stochastic volatility models (with discussion). Journal of Business and Economic Statistics 12(4), 371–417.

Kim, S., N. Shephard, and S. Chib (1998). Stochastic volatility: likelihood inference and comparison with ARCH models. Review of Economic Studies 65, 361–393.

Lopes, H. (2000). Bayesian Analysis in Latent Factor and Longitudinal Models. Ph.D. thesis, Duke University.

Lopes, H. and M. West (2003). Bayesian model assessment in factor analysis. Statistica Sinica (forthcoming).

Lopes, H. F., O. Aguilar, and M. West (2000). Time-varying covariance structures in currency markets. In Proceedings of the XXII Brazilian Meeting of Econometrics.

Lopes, H. F. and H. Migon (2002). Comovements and contagion in emergent markets: stock indexes volatilities. Case Studies in Bayesian Statistics 6, 285–300.

Marinho, C. and H. F. Lopes (2002). Simulation-based sequential analysis of Markov switching stochastic volatility models. Technical report, Department of Statistical Methods, Federal University of Rio de Janeiro.

Polson, N., J. Stroud, and P. Muller (2002). Practical filtering for stochastic volatility models. Working paper, Graduate School of Business.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464.

Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility. In D. Cox, D. Hinkley, and O. Barndorff-Nielsen (Eds.), Time Series Models in Econometrics, Finance and Other Fields. London: Chapman and Hall.

Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64, 583–639.

Storvik, G. (2002). Particle filters for state-space models with the presence of unknown static parameters. IEEE Transactions on Signal Processing 50, 281–289.

Teräsvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive models. Journal of the American Statistical Association 89(425), 208–218.

West, M. and P. Harrison (1997). Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag.

Zhu, L. and B. Carlin (2000). Comparing hierarchical models for spatio-temporally misaligned data using the deviance information criterion. Statistics in Medicine 19, 2265–2278.