Modeling dynamic diurnal patterns in high frequency financial data

Preliminary draft
Ryoko Ito∗
University of Cambridge
January 10, 2013

Abstract

We extend the dynamic conditional score (DCS) model of Andres and Harvey (2012) and develop an intra-day DCS model with a dynamic cubic spline to be fitted to high-frequency observations with periodic behavior. The spline, which builds on the work of Harvey and Koopman (1993), allows for periodic behavior and seasonal effects that may evolve over time. The periodic component can be estimated simultaneously with all other components of the model by the method of maximum likelihood. We fit our spline-DCS model to ultra-high-frequency observations of IBM trade volume. We estimate the dynamics of the conditional scale of the data and illustrate the impressive in-sample and predictive performance of our model.

1 Introduction

Countless studies in finance have been devoted to modeling and forecasting financial variables using variants of multiplicative error models, whose dynamics are driven by a martingale process constructed from the moment conditions of observations. Prototypes of such models are the autoregressive conditional heteroscedasticity (ARCH) model of Engle (1982), the generalized ARCH (GARCH) model of Bollerslev (1986) and Taylor (1986), and the autoregressive conditional duration (ACD) model of Engle and Russell (1998). These classes of observation-driven models are among the most popular for forecasting volatility of returns, trade duration, and intensity in finance. Their success can be attributed to their simplicity, practicality, and the ease of making statistical inference. However, because the predictive filter of these models is essentially a linear function of past observations (or squared observations in the context of return volatility), the filter can respond too strongly to extreme observations, whose effects can be too persistent, causing strong serial correlation in the estimation residuals that is difficult to remove. From a statistical perspective, whether an extreme observation should be treated as an outlier depends on the shape of the distribution against which the data is tested. If we should anticipate occasional extreme observations with non-negligible probability, it may be pertinent to capture them as part of the tail(s) of the underlying distribution and subdue their effects on the dynamics of the conditional distribution, instead of reacting sensitively to them via persistent movements in the time-varying parameter of the distribution.

∗ Faculty of Economics, University of Cambridge. Email: [email protected]. I would like to thank my supervisor Professor Andrew Harvey for his helpful guidance throughout the year. I would also like to thank Philipp Andres and Stephen Thiele for providing thoughtful comments on an early version of this paper.


Harvey and Chakravarty (2008) take a step in the direction called for above by introducing a new type of observation-driven model called Beta-t-EGARCH. Beta-t-EGARCH falls in the class of dynamic conditional score (DCS) models formally defined and studied by Harvey (2010), Harvey (2013), and Andres and Harvey (2012). The advantages of DCS are fourfold. First, the parameters of the model can be estimated easily by the method of maximum likelihood. The consistency and asymptotic normality of the maximum likelihood estimators are established by Harvey (2010) and Andres and Harvey (2012). Second, analytical expressions of the multi-step optimal predictors,1 as well as their mean square errors, are available. Third, these analytic and asymptotic properties of the model extend to a number of heavy- or long-tailed distributions against which the distribution of observations may be tested. Fourth, because the predictive filter of the model is driven by the conditional score,2 the effects of news on the filtered dynamics are bounded even in the presence of extreme observations.

The predictive DCS filter can be applied either to the scale parameter or the location parameter of a given distribution.3 The basic DCS model applied to scale is defined as follows. Given a sequence of T observations (y_t)_{t=0}^T, we denote its underlying random data generating process by (Y_t)_{t=0}^T. We assume that this process is defined on the probability space (Ω, F, P) equipped with a filtration (F_t)_{t=0}^T, and that F_0 is trivial so that Y_0 is almost surely constant. Each Y_t follows some unknown cumulative distribution function (cdf) F_t for t = 1, ..., T. To model the observations (y_t)_{t=0}^T, the scale DCS model assumes that there exist a constant location factor c ∈ R and a standard cdf F such that

    (Y_t − c)/α_{t|t−1} | F_{t−1} ~ iid F,    t = 1, ..., T,    (1)

for a sequence of time-varying scale factors (α_{t|t−1})_{t=1}^T, where α_{t|t−1} > 0 for all t = 1, ..., T. The subscript ·_{t|t−1} reflects the conditionality of the variable at time t on F_{t−1}. Here F is short-hand notation for the standard cdf F(·; θ) re-centered around the origin, with a constant vector θ of distribution parameters that includes neither c nor α_{t|t−1}. The probability density/mass function (pdf) of F is denoted f. The location DCS model of Harvey and Luati (2012) assumes instead that the location parameter c is time varying and can be estimated by the DCS filter. The first-order DCS filter for scale is defined as

    α_{t|t−1} = exp(λ_{t|t−1}),

    λ_{t|t−1} = δ + φ λ_{t−1|t−2} + κ u_{t−1},    t = 1, ..., T,    (2)

where u_t is the conditional score

    u_t = ∂ log( e^{−λ_{t|t−1}} f(y_t e^{−λ_{t|t−1}}; θ) ) / ∂λ_{t|t−1}

for t = 1, ..., T. We also have δ ∈ R and κ > 0, with |φ| < 1 if (λ_{t|t−1})_{t=0}^T is stationary. This scale DCS model filters the dynamics of α_{t|t−1} via the exponential link function with the canonical link parameter λ_{t|t−1} ∈ R for t = 1, ..., T. The exponential link is useful as it eliminates the need to impose parameter restrictions to ensure that the scale parameter α_{t|t−1} remains positive for all t. However, the choice of link function should always be consistent with the parametric choice of F for the asymptotic properties of DCS to go through. For example, the logit link may be used if F is binomial. See Harvey (2013). The choice of F should always

1 This optimality is in terms of minimum mean square error (MMSE).
2 It is the first derivative of the log-likelihood of a single observation, where the derivative is taken with respect to the dynamic parameter of the model.
3 See Harvey and Chakravarty (2008) and Andres and Harvey (2012) for the DCS model applied to scale, and Harvey and Luati (2012) for an application to location.


depend on the main characteristics and empirical distribution of the data.4 Andres and Harvey (2012) study a special case where y_t is non-negative for all t, and consider a number of non-negative distributions for F, including the generalized gamma (GG), generalized beta (GB), and log-normal. Beta-t-EGARCH of Harvey and Chakravarty (2008) is another special case of DCS, in which F is characterized by the Student's t-distribution in the context of return volatility. If F is characterized by the GG distribution, u_t is a linear function of a gamma distributed random variable. In contrast, if F is characterized by the GB distribution, u_t is a linear function of a beta distributed random variable. In both cases, u_t is iid given F_{t−1} for t = 1, ..., T. See Appendix A. Appendix B describes the features of the DCS model in greater detail.

DCS is a relatively new research frontier and, as such, its practicality in empirical financial analysis is still to be explored in a variety of applications. This paper contributes to the DCS literature by constructing a variant of the DCS model to be fitted to ultra-high-frequency observations of financial variables. Our empirical results illustrate that DCS becomes a powerful tool for approximating the local dynamics and conditional distribution of high-frequency data, provided that we thoroughly understand the main characteristics of the data and formulate the parametric assumptions to reflect them.

There are a number of stylized characteristics of high-frequency financial data. One of them is a concentration of zero-valued observations. See Hautsch (2012) for recent evidence. Our model deals with a non-negligible number of zero observations by decomposing the conditional distribution of the data at zero and applying the DCS filter only to non-zero observations. This decomposition method is a standard technique in economics and finance, and it is adopted by many.
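The first-order recursion in (2) is straightforward to implement. The sketch below is illustrative and not taken from the paper: it assumes F is a gamma distribution with known shape parameter γ and unit scale, for which the score with respect to λ works out to u_t = y_t e^{−λ_{t|t−1}} − γ; all parameter values are hypothetical.

```python
import math
import random

def dcs_scale_filter(y, delta, phi, kappa, gamma):
    """First-order DCS filter (2) for the log-scale lambda_{t|t-1}.

    Illustrative assumption: (Y_t / alpha_{t|t-1}) | F_{t-1} ~ Gamma(gamma, 1),
    whose score with respect to lambda is u_t = y_t * exp(-lambda_t) - gamma.
    """
    lam = delta / (1.0 - phi)              # start at the unconditional mean
    path = []
    for y_t in y:
        path.append(lam)
        u = y_t * math.exp(-lam) - gamma   # bounded response to news
        lam = delta + phi * lam + kappa * u
    return path

# Simulate from the same model (hypothetical parameter values) and filter it.
random.seed(0)
delta, phi, kappa, gamma = 0.05, 0.95, 0.05, 2.0
lam, y = delta / (1 - phi), []
for _ in range(1000):
    y.append(random.gammavariate(gamma, 1.0) * math.exp(lam))
    u = y[-1] * math.exp(-lam) - gamma
    lam = delta + phi * lam + kappa * u

lambdas = dcs_scale_filter(y, delta, phi, kappa, gamma)
```

Because the filter uses the true parameters and the same initialization as the simulation, it reproduces the λ path used to generate the data; since the score has conditional mean zero, the filtered scores average close to zero.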
See, for instance, Amemiya (1973), Heckman (1974), McCulloch and Tsay (2001), Rydberg and Shephard (2003), and Hautsch et al. (2010). Moreover, intra-day returns and trading activity often exhibit diurnal patterns as well as weekly, monthly, and quarterly seasonal effects. (See, for example, Taylor (2005), Hautsch (2012), and Lo and Wang (2010), among many others.) Diurnal patterns have been modeled in the present literature mainly by applying the Fourier representation, using a deterministic spline, or computing sample moments for each bin of intra-day trading hours.5 These methods generally assume that the diurnal pattern is a deterministic function of time that does not evolve over time. They may also require the diurnal component and other stochastic components to be estimated in isolation by a two-step procedure in order to "diurnally adjust" the data. This can render the asymptotic properties of statistical tests invalid.

We take a novel approach to this issue by integrating the DCS model with the dynamic cubic spline model of Harvey and Koopman (1993). There are three main benefits to this approach. First, it allows for the possibility that the diurnal U-shaped patterns may evolve over time. Second, the diurnal patterns can be estimated easily and simultaneously with all other components of the model by the method of maximum likelihood. Third, the presence of seasonal effects can be reflected in the shape of the diurnal patterns and can be formally tested by a classical likelihood-based test. In this paper, we use a special case of the spline, termed the weekly spline, to check whether intra-weekly seasonality (the day-of-the-week effect) is operative in the data by testing

4 If we assume that F is the standard normal distribution, c = E[Y_t] and α² = Var(Y_t). In this case, as the first and second moments completely describe the shape of the distribution, there are no parameters in θ of F to be estimated. If we assume that F is Burr, c may be better described as the location of the pole (or sometimes the vertical asymptote) of the distribution. Obviously and yet importantly, c and α are not the same as the first or second moment of the distribution in this case. If F is highly non-Gaussian, θ is often non-empty and includes parameters that determine the shape of F, such as skewness, excess kurtosis, and the heaviness of the tail(s). For highly non-Gaussian distributions, the first and second moments (even when they exist) may carry very little information about the shape of the distribution.
5 See, for instance, Andersen and Bollerslev (1998), Engle and Russell (1998), Zhang et al. (2001), Campbell and Diebold (2005), Engle and Rangel (2008), Brownlees et al. (2011), and Engle and Sokalska (2012).


whether there is a statistically significant difference in the diurnal shape of different weekdays. This method of testing for the day-of-the-week effect contrasts with the standard testing procedure using seasonal dummies, which essentially capture seasonality by allowing for a level shift in the variable being estimated on each weekday. See, for instance, Andersen and Bollerslev (1998) and Lo and Wang (2010) for examples of the use of day-of-the-week dummies. The weekly spline may be preferred over seasonal dummies if seasonality manifests in the data in a non-linear fashion via differences in the shape of intra-day patterns.

Harvey and Koopman (1993) originally developed the dynamic cubic spline to estimate the intra-weekly patterns of hourly electricity demand. It was later employed by Harvey et al. (1997) to model a changing seasonal component of weekly money supply in the U.K., and by Bowsher and Meeks (2008) to forecast zero-coupon yield curves. Bowsher and Meeks (2008) interpret the spline as a special type of "dynamic factor model", where the knots of the spline are the factors and the factor loadings are treated as given and specified according to the requirement that the model has to be a cubic spline.

Our proposed spline-DCS model will be fitted to intra-day observations of the trade volume of IBM stock traded on the New York Stock Exchange (NYSE). Trade volume is a measure of the intensity of trading activity. There are a variety of volume measures used in the literature, including the number of shares traded, dollar volume, number of transactions, turnover (shares traded divided by shares outstanding), and dollar turnover. Given that the purpose of this paper is to develop the spline-DCS model and illustrate its practicality, we arbitrarily choose trade volume as measured by the number of shares traded.
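The day-of-the-week test mentioned above, comparing a restricted model with one common diurnal shape against an unrestricted model with weekday-specific shapes, is a classical likelihood-ratio comparison. The following schematic is not from the paper; the log-likelihood values and degrees of freedom are hypothetical placeholders.

```python
# Schematic likelihood-ratio (LR) test for the day-of-the-week effect.
# Restricted model: one common diurnal spline; unrestricted model: a
# separate spline shape per weekday. All numbers are hypothetical.

def lr_statistic(loglik_restricted, loglik_unrestricted):
    """LR statistic; under the null it is asymptotically chi-squared with
    df equal to the number of extra parameters in the unrestricted model."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

stat = lr_statistic(-10423.7, -10418.2)   # placeholder log-likelihoods
chi2_crit = 15.507                        # 5% critical value, chi-squared(8)
day_of_week_effect = stat > chi2_crit
```

Here the test fails to reject the null of a common diurnal shape, mirroring the kind of conclusion reported for the weekend effect later in the paper.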
While this paper deals with non-negative time series, the spline-DCS model can also be applied to variables with support over the entire real line, in which case our model should be viewed as an extension of the Beta-t-EGARCH model of Harvey and Chakravarty (2008) and can be compared with studies of high-frequency asset returns, including the analysis of the deutsche mark-dollar exchange rate by Andersen and Bollerslev (1998).

We do not include terms to explicitly capture the overnight effect in our model, and instead assume that the periodic surge in market activity near market open is a natural part of the cyclical dynamics of the conditional distribution of the data. This contrasts with the standard methods for capturing overnight effects, such as treating extreme observations as outliers using dummy variables, differentiating day and overnight jumps, and treating events that occur in a specified period immediately after the open as censored. (See, for instance, Rydberg and Shephard (2003), Gerhard and Hautsch (2007), and Boes et al. (2007).) We choose to deviate from these standard methods for two reasons: it may take time for the overnight effect to diminish completely during the day, and it is generally difficult to identify which observations are due to overnight information if new information arrives after market open. Our model captures the overnight effect via periodic hikes in the time-varying scale parameter, reflecting the effects of overnight information as a cyclical elevation in the probability of extreme events near market open. This feature of our model also highlights one of the advantages of the exponential link function: volatile and dynamic movements in the scale parameter can be induced by relatively subtle movements in the link parameter to which the DCS filter is applied. These results are reported in Section 5.6.1.

The structure and main findings of this paper are as follows.
Section 2 gives an initial investigation of our trade volume data and motivates the construction of our model. We illustrate that the data exhibit diurnal U-shaped patterns, and that the empirical distribution and autocorrelation structure change with the length of the aggregation interval. Section 3 outlines our modeling assumptions and formally constructs the intra-day DCS model with a dynamic cubic spline. Sections 4 and 5 discuss methods for estimation and model selection. The in-sample and out-of-sample estimation results are reported in Sections 5 and 6. We find that the estimation results are mostly robust to marginal changes in the aggregation interval. A statistical test

for intra-weekly seasonality finds no evidence of the weekend effect. However, we find indications that a marginal change in the aggregation interval can affect whether seasonal effects manifest in the data (Section 5.4.1). We find that the Burr distribution, which is a special case of the GB distribution, achieves an impressive fit to our data. In particular, the probability integral transform (PIT) of the estimation residuals turns out to be remarkably close to being iid standard uniformly distributed. In contrast, the log-normal distribution gives an inferior fit due to the departure of the log-transformed data from normality, especially around the tail regions. Burr fits better than log-normal presumably because the shape of the standard log-normal distribution is determined by only one parameter, while the standard Burr distribution has two shape parameters, making the shape of Burr more flexible than that of log-normal (Section 5.2). We also find evidence that long memory is inherent in our data (Section 5.3). The out-of-sample performance tests show that the in-sample estimation results are stable, and that our model is able to provide good one-step and multi-step ahead forecasts over prediction horizons of several hundred or thousand steps (Section 6).

2 Data characteristics

Before proceeding to detailed modeling and forecasting results, it is useful to get an overall feel for the trade volume data. This section provides an initial investigation of our high-frequency trade volume series that serves to motivate our formal model in Section 3. We analyze the trade volume (in number of shares) of IBM stock traded on NYSE between Monday 28 February and Friday 31 March 2000, a period that includes 25 trading days and no public holidays. Our raw data set is in tick format and consists of the record of every trade in sequence of occurrence. The tick-data is irregularly spaced and often has multiple transactions in one second. The tick-data can be aggregated over equally spaced time intervals. In this section, we aggregate the raw tick-data over different aggregation intervals and describe the notable features of the data. For convenience, we refer to the aggregated series as IBM1s if the aggregation interval is 1 second, IBM1m if it is 1 minute, and so on.

2.1 Diurnal U-shaped patterns

The top panel of Figure 1 shows the time series plot of IBM1s during the market open hours of NYSE (9.30am-4pm local time) over a given trading week of March 2000. At the aggregation interval of 1 second, there are 23,400 observations per trading day. We observe several recurrent spikes in trade volume either immediately after the market opening or before closure. These extreme observations make the upper tail of the empirical distribution very long. (See Table 1 and Figure 3.) The right column of Figure 1 shows IBM1s smoothed by the simple moving average of the nearest 500 observations, which at this frequency corresponds to roughly eight minutes of trading. The smoothed series reflects the diurnal U-shaped patterns in trading activity. In particular, trade volume tends to be high soon after the market opening, falls during the quiet lunch hours of around 1pm, and picks up again before closure. One may naturally suspect that the shape of the diurnal patterns is slowly changing over time. These qualitative features are also present at the aggregation level of 1 minute. Other measures of trading activity based on durations (such as trade durations, midquote change durations, and volume durations) also exhibit similar diurnal U-shaped patterns. See, for example, Hautsch (2012, p. 41) for recent evidence. Extreme movements in trading activity in the first hour of the trading day can be caused by news transmitted overnight. Trading activity slows down towards lunch time as overnight information is processed, but picks up again in the afternoon as traders re-balance their positions before market closure.
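The U-shape described above can be made visible with a simple diurnal profile: average the volume observed in each intra-day bin across trading days. This is only an illustration of the smoothing idea, not the paper's estimator; the data below is a synthetic stand-in with 390 one-minute bins per day.

```python
import math

def diurnal_profile(volume, bins_per_day):
    """Mean volume in each intra-day bin, averaged across trading days."""
    n_days = len(volume) // bins_per_day
    profile = [0.0] * bins_per_day
    for d in range(n_days):
        for tau in range(bins_per_day):
            profile[tau] += volume[d * bins_per_day + tau]
    return [v / n_days for v in profile]

# Synthetic stand-in for IBM1m: 25 days of 390 one-minute bins, with
# volume high near open/close and low around lunch (a U-shape).
bins_per_day = 390
u_shape = lambda tau: 2.0 + math.cos(2.0 * math.pi * tau / (bins_per_day - 1))
volume = [1000.0 * u_shape(tau) for _ in range(25) for tau in range(bins_per_day)]

profile = diurnal_profile(volume, bins_per_day)
```

On real data the profile would be noisy rather than exact, but the open-high, lunch-low, close-high ordering is the feature the periodic component of the model is designed to capture.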


Figure 1 – IBM1s (left column) and the same series smoothed by the simple moving average of nearest 500 observations (right column). Time on the x-axis. Top panel: Monday 20 - Friday 24 March 2000. Each day covers trading hours between 9.30am-4pm (in the New York local time). Bottom panel: Wednesday 22 March 2000, covering 9.30am-4pm (in the New York local time).

2.2 Aggregation effect on a discrete random variable

The IBM stock is traded in fixed increments of 100 shares over our sampling period. This feature has two important implications for our analysis:

• Rounding effect: traders often round the number of shares to trade up or down to round numbers such as 500, 1000, 5000, 10000, and 15000. See Figure 2. The empirical frequency of the tick-data takes discrete jumps at the multiples of 500 (denoted 500N_{>0}), and the size of the jumps decays as volume increases. Thus the empirical cdf of the tick-data exhibits step-increases at 500N_{>0}. These jumps in empirical frequency moderate as the aggregation interval widens. The bottom panel of Figure 2 shows that, at the aggregation level of 10 seconds, the jumps are much smaller than those of the tick-data.

• Discreteness: at any aggregation level, the volume series increments in multiples of 100. Thus our series should be viewed as a realization of a sequence of discrete random variables or a discretely distributed continuous-time process. However, the empirical distribution becomes closer to that of a continuous random variable as the aggregation interval widens.
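A minimal sketch of the aggregation step follows. The ticks are synthetic: sizes are multiples of 100, with half of the trades at round numbers to mimic the clustering at multiples of 500 described above. Note that binned sums remain multiples of 100, consistent with the discreteness point.

```python
import random

def aggregate(ticks, interval):
    """Sum tick volumes into fixed bins of `interval` seconds.

    `ticks` is a list of (timestamp_in_seconds, volume) pairs; several
    trades may share the same second, and the spacing is irregular.
    """
    horizon = int(max(t for t, _ in ticks)) // interval + 1
    bins = [0] * horizon
    for t, v in ticks:
        bins[int(t) // interval] += v
    return bins

# Synthetic ticks over one 6.5-hour trading day (23,400 seconds).
random.seed(1)
round_numbers = [500, 1000, 5000, 10000, 15000]
ticks = []
for _ in range(2000):
    t = random.uniform(0.0, 23400.0)
    if random.random() < 0.5:
        v = random.choice(round_numbers)   # rounding effect
    else:
        v = 100 * random.randint(1, 50)    # generic multiple of 100
    ticks.append((t, v))

ibm10s = aggregate(ticks, 10)   # hypothetical 10-second aggregation
```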

2.3 Aggregation effect on empirical distribution and autocorrelation

Figure 2 – Frequency distribution (left) and empirical cdf (right) of non-zero trade volume. Aggregation interval: tick-data (top row) and 10 seconds (bottom row). Sampling period: 28 February - 31 March 2000.

Figure 3 shows the frequency distribution, empirical cumulative distribution function (cdf), and sample autocorrelation of IBM1s, IBM30s, and IBM1m. We observe a number of important features, including:

• Right skewness, long tail: the left and middle columns of Figure 3 show that our series are heavily right-skewed and have an extremely long upper tail. The length of the upper tail can also be seen in the distance between the maximum and the 99.9% sample quantile in Table 1. The length of the upper tail and the skewness are most extreme for IBM1s. These extreme features moderate as the aggregation interval widens. Note that the sample skewness statistics in Table 1 must be interpreted with care, as the theoretical skewness may not exist.

• Autocorrelation: the right column of Figure 3 shows that data aggregation affects autocorrelation. IBM1s shows statistically significant but weak autocorrelation that is very slow to decay. As the aggregation interval widens, the series become more strongly and statistically significantly autocorrelated. The autocorrelation of IBM1m is faster to decay than that of IBM1s.6

• Zero-observations: the frequency of zero-volume observations decreases as the aggregation interval widens. (See the bottom row of Table 1.) The frequency of zero-volume observations is approximately 80% for IBM1s, which compares with 0.06% for IBM1m.
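The sample autocorrelations referred to above use the standard estimator; a sketch follows, with a synthetic persistent series standing in for the volume data, which is not reproduced here.

```python
import random

def sample_autocorr(x, lag):
    """Standard sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

# A persistent AR(1) series decays slowly, like the aggregated volume series.
random.seed(2)
state, x = 0.0, []
for _ in range(5000):
    state = 0.9 * state + random.gauss(0.0, 1.0)
    x.append(state)

acf = [sample_autocorr(x, k) for k in (1, 10, 50)]
```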

2.4 Choice of aggregation interval for modeling

For modeling and estimation, the rest of our analysis will focus on IBM30s and IBM1m and ignore all other aggregation intervals. This choice of aggregation interval is controversial and deserves a formal explanation. As we illustrated in Section 2.2, the raw tick-data has a non-negative discrete distribution that is extremely right-skewed and has a long upper tail. It is difficult to determine what type of discrete distribution the data might follow, as there are a number of jumps in the empirical frequency due to the aforementioned rounding effect. We can remove the rounding effect by widening the aggregation interval until there are no obvious jumps in the empirical distribution. Widening the aggregation interval also moderates the extreme skewness and the length of the upper tail. Once the empirical distribution becomes sufficiently smooth, the distribution of the aggregated series can be approximated by continuous distributions, especially when the sample size is large. Thus data aggregation also widens the parametric choice of distributions we can use for modeling. However, we cannot widen the aggregation interval arbitrarily, as doing so risks introducing stronger and more statistically significant autocorrelation in the aggregated series, as we illustrated in Section 2.3. Strong autocorrelation in the data can leave strong autocorrelation in the estimation residuals that we may find difficult to remove. As the empirical distribution becomes sufficiently smooth at the aggregation intervals of 30 seconds and one minute, we will focus mainly on IBM1m in the rest of the analysis and use continuous distributions with non-negative support to characterize the distribution of our series. In order to explore the effects of changes in the aggregation interval on our inference, we will also report the results for IBM30s. Although our estimation results are robust to marginal changes in the aggregation interval, we find indications that the aggregation interval may affect statistical tests for seasonality. See Section 5.4.1.

6 These effects of data aggregation are interesting, as there may be a level of aggregation at which the degree of autocorrelation is maximized in some sense. This feature is to be studied in a separate paper.

Series                      IBM1s        IBM30s       IBM1m
Observations (total)        585,000      19,500       9,750
Mean                        351          10,539       21,297
Median                      0            6,100        14,000
Maximum                     1,560,000    1,652,100    1,652,100
Minimum                     0            0            0
Std. Dev.                   4,402        26,071       39,114
Skewness                    169          29           18
99.9% sample quantile       25,000       293,654      604,295
Max - 99.9% quantile        1,535,000    1,358,446    1,047,805
Frequency of zero-vol.      79.13%       0.47%        0.06%

Table 1 – Sample statistics of IBM trade volume. Sampling period: Monday 28 February - Friday 31 March 2000 (25 trading days with 6.5 trading hours per trading day).

3 Intra-day DCS with dynamic spline

The discussion thus far suggests that a periodic component capturing the diurnal U-shaped patterns will be important in a model fitted to the trade volume series. Moreover, this component should accommodate the possibility that the shape of the diurnal patterns may change slowly over time. One may naturally suspect that non-periodic factors coexist with the periodic component in the data. One such factor is a highly persistent low-frequency component governing movements around the periodic U-shaped patterns. The empirical autocorrelation structure suggests that there may be long memory inherent in the data. This can be captured by a combination of autoregressive components. The presence of a non-negligible number of zero-volume observations can be explained by another component governing the binary outcome of whether the next trade volume is zero or non-zero. Given the nature of our data, we find that fixing the location parameter (c) at zero is sufficient in our application. Thus we stipulate that the trade volume process is driven by the simultaneous interaction of several latent components: one determining zero or non-zero volume, one associated with diurnal periodic movements, one with persistent low-frequency movements, and one governing the long-memory feature of the data. We allow for conditional scale dynamics with the canonical link parameter having an unobserved-component structure, where contributions come from both the periodic and non-periodic components. The rest of this section gives the formal definitions of each of these components.

Figure 3 – Frequency distribution (left column), empirical cdf (middle column), and sample autocorrelation (right column). IBM1s (top row), IBM30s (middle row), and IBM1m (bottom row). Sampling period: 28 February - 31 March 2000. The axes of the cdf and autocorrelation charts are fixed across the rows so that they are easily comparable. The 200th lag corresponds approximately to 3 minutes prior for IBM1s, 1.5 hours prior for IBM30s, and 3 hours prior for IBM1m.

3.1 Redefining time indices

Henceforth, we redefine the time axis by τ = 0, ..., I to denote intra-day time and t = 1, ..., T to denote trading days up to T. τ = 0 and τ = I are the moments of market opening and closure on each trading day. We use the subscript ·_{t,τ} to denote the intra-day time τ on the t-th trading day. We assume that F_{1,0} is trivial, and that the first (aggregated) trading volume of each trading day is observed at τ = 1. Moreover, we assume that ·_{t+1,0} = ·_{t,I} for all t = 1, ..., T − 1. Thus the total sample size is I × T. In order to avoid superfluous subscripts, we suppress the conditionality of variables, writing ·_{t,τ} instead of ·_{t,τ|t,τ−1} even when the variable is conditional on F_{t,τ−1}. Finally, we introduce the following notation:

    Ψ_{T,I} = {(t, τ) ∈ {1, 2, ..., T} × {1, 2, ..., I}},
    Ψ_{T,I>0} = {(t, τ) ∈ {1, 2, ..., T} × {1, 2, ..., I} : y_{t,τ} > 0}.
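In code, the index sets Ψ_{T,I} and Ψ_{T,I>0} amount to enumerating (day, intra-day time) pairs and keeping those with strictly positive volume; a small sketch with a toy array:

```python
def index_sets(y):
    """Build Psi_{T,I} and Psi_{T,I>0} from a T-by-I array of volumes,
    where y[t-1][tau-1] is the volume at intra-day time tau on day t."""
    T, I = len(y), len(y[0])
    psi_all = {(t, tau) for t in range(1, T + 1) for tau in range(1, I + 1)}
    psi_pos = {(t, tau) for (t, tau) in psi_all if y[t - 1][tau - 1] > 0}
    return psi_all, psi_pos

y = [[0, 300, 500],        # day t = 1: zero volume at tau = 1
     [200, 0, 100]]        # day t = 2: zero volume at tau = 2
psi_all, psi_pos = index_sets(y)
```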


Figure 4 – Nested diagram of some of the useful non-negative distributions: the generalized gamma (GG) and generalized beta (GB) families and their special cases, including gamma, Weibull, exponential, Burr (Pareto IV), log-logistic (Pareto III), and Pareto II. The scale factor is assumed to be one in all cases.

3.2 A point-mass probability at the origin

The presence of a non-negligible number of zero-volume observations cannot be explained by conventional continuous distributions as the probability of observing a particular value is zero by definition. Although one can eliminate zero-volume observations by widening the aggregation interval, an better approach is to define a distribution such that: • its strictly positive support is captured by a conventional continuous distribution; and • the origin has a discrete point-mass probability. Formally, we characterize the cdf F : R≥0 → [0, 1] of a standard random variable X ∼ F by PF (X = 0) = p PF (X > 0) = 1 − p PF (X ≤ x|X > 0) = F ∗ (x)

p ∈ [0, 1] (3) x>0

where F∗ : R>0 → [0, 1] is the cdf of a conventional standard continuous random variable with the time-invariant parameter vector θ∗.7 Thus we have θ = (p, θ∗) for F. The properties of this type of distribution are studied in Hautsch et al. (2010). The idea of decomposing the distribution of a random variable using thresholds is not new, as it is essentially the same as the famous censored regression model (c.f. Amemiya (1973) and Heckman (1974)) and the decomposition models of McCulloch and Tsay (2001) or Rydberg and Shephard (2003). For non-negative series, F∗ can be characterized by a number of distributions including Weibull, Gamma, Burr, and log-normal, many of which are special cases of the GG and GB distributions. See Figure 4 for the relationships between these distributions.

Under the definition of F in (3), we apply the predictive DCS filter only to positive observations, so that ut,τ becomes the conditional score of F∗ and λt,τ responds to ut,τ−1 only when yt,τ−1 > 0. The idea is that we treat observations at zero as belonging to a tranquil period of no news, so that the link parameter λt,τ does not respond to exogenous shocks. This is essentially the same as treating zero-volumes as points of missing data in the DCS filter.

It is natural to suspect that the probability p of observing zero volume should change over time: p must be low immediately after market opening, when trading activity is intense, and high during the quiet lunch hours. That is, p should be dynamic and negatively correlated with the intensity of trading activity. Rydberg and Shephard (2003) and Hautsch et al. (2010) independently study decomposition models defining the conditional dynamics of p via the logit link in the context of high-frequency finance. Thus an interesting extension of our model is a hybrid cubic-spline DCS model for the conditional dynamics of p as well as the scale parameter α. However, in this paper, we keep p constant for simplicity and leave this extension for future research. We claim that this restrictive assumption on p is inconsequential in our application, as the number of zero-volumes in IBM1m and IBM30s is small.

The joint likelihood function based on F for the set of observations (yt,τ)(t,τ)∈ΨT,I is

L((yt,τ)(t,τ)∈ΨT,I ; θ) = ∏(t,τ)∈ΨT,I>0 (1 − p) exp(−λt,τ) f∗(εt,τ ; θ∗) × ∏(t,τ)∈ΨT,I ∩ ΨcT,I>0 p
                       = (1 − p)^A p^(T×I−A) ∏(t,τ)∈ΨT,I>0 exp(−λt,τ) f∗(εt,τ ; θ∗),

where f∗ is the pdf of F∗ and A = |ΨT,I>0|. The log-likelihood is

log L = A log(1 − p) + (T × I − A) log p + Σ(t,τ)∈ΨT,I>0 log(exp(−λt,τ) f∗(εt,τ ; θ∗)).

It is easy to check that the maximum likelihood estimator of p is p̂ = (T × I − A)/(T × I).

7 The unconditional n-th moment of X is well-defined as long as it is well-defined for F∗, because EF[X^n] = p EF[X^n | X = 0] + (1 − p) EF[X^n | X > 0] = (1 − p) EF∗[X^n].
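As a concrete sketch, the log-likelihood above and the closed-form p̂ can be written in a few lines. This is a hypothetical numpy illustration: `logpdf_star`, standing in for log f∗, is an assumed input and not part of the paper's notation.

```python
import numpy as np

def zero_augmented_loglik(y, lam, logpdf_star, p):
    """Log-likelihood of the zero-augmented model in (3): zeros contribute
    log p; positive observations contribute
    log(1 - p) - lambda + log f*(y * exp(-lambda))."""
    y, lam = np.asarray(y, float), np.asarray(lam, float)
    pos = y > 0
    eps = y[pos] * np.exp(-lam[pos])          # standardized residuals
    ll = np.sum(~pos) * np.log(p) + np.sum(pos) * np.log(1.0 - p)
    return ll + np.sum(-lam[pos] + logpdf_star(eps))

# The MLE of p is simply the sample fraction of zero observations.
y = np.array([0.0, 1.2, 0.0, 3.4, 0.7, 0.0])
p_hat = np.mean(y == 0.0)
```

With f∗ taken to be the unit exponential (log f∗(ε) = −ε), the two product terms in the likelihood map directly onto the two sums in the function.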

3.3 Unobserved components of scale

We noted earlier that our series exhibit diurnal U-shaped patterns and strong autocorrelation that decays slowly. We assume that these features are due to periodicity and autocorrelation in the conditional scale factor αt,τ. The standardized series εt,τ ≡ yt,τ/αt,τ given Ft,τ−1 is assumed to be iid and free of periodic behavior. As we will estimate αt,τ via the link parameter λt,τ, this means that λt,τ exhibits periodicity and autocorrelation. Before proceeding further, we note that by specifying the stationarity properties of λt,τ we do not mean to infer the true stationarity properties of the underlying process, as the spirit of our model is merely to obtain a good local approximation of the true data generating process. Our analysis is local in the sense that it works with discrete-time observations of what is actually a continuous-time process collected over a short sampling period.

Given these assumptions, we specify λt,τ as the sum of:
• the periodic component st,τ, which explains the diurnal U-shaped patterns and allows the shape of the diurnal patterns to change over time;
• the random-walk component µt,τ, which captures the highly persistent low-frequency dynamics around the periodic component; and
• the stationary (autoregressive) component ηt,τ, which is a mixture of several autoregressive components and explains the long-memory characteristics of the data.

Formally, our intra-day DCS model becomes:

yt,τ = εt,τ exp(λt,τ),    εt,τ|Ft,τ−1 ∼ iid F
λt,τ = δ + µt,τ + ηt,τ + st,τ

for τ = 1, . . . , I and t = 1, . . . , T. µt,τ is defined as

µt,τ = µt,τ−1 + κµ ut,τ−1 1l{yt,τ−1>0},    ∀ (t, τ) ∈ ΨT,I,

where ut,τ is the score of F∗. The estimation results in Section 5 suggest that the inclusion of this random-walk component does a good job of capturing the low-frequency dynamics of the data.8 ηt,τ is the stationary component defined as:

ηt,τ = Σ_{j=1}^{J} η(j)t,τ,    ∀ (t, τ) ∈ ΨT,I
η(j)t,τ = φ(j)1 η(j)t,τ−1 + φ(j)2 η(j)t,τ−2 + · · · + φ(j)m(j) η(j)t,τ−m(j) + κ(j)η ut,τ−1 1l{yt,τ−1>0}

for some J ∈ N>0. We assume that m(j) ∈ N>0 and that η(j)t,τ is stationary for all j = 1, . . . , J. The stationarity requirements on the autoregressive coefficients are the same as for ARMA models. By making ηt,τ a mixture of autoregressive components, we can capture any long-memory type behavior that λt,τ may exhibit, given the statistically significant and slowly decaying autocorrelation we observed in the data. The significance of the parameters of these components is tested by standard methods of model selection in Section 5.4. Although we defined ηt,τ in the most general form of J autoregressive components, we report in Section 5.4 that J = 2 works well empirically for our application. Moreover, we illustrate in Section 5.3 that ηt,τ with J = 2 fitted to our data exhibits long memory.

For the identifiability of each component, we assume that µt,τ is less sensitive to changes in ut,τ−1 than η(1)t,τ, which is in turn less sensitive than η(2)t,τ. Thus we impose the parameter restrictions κµ < κ(1)η < κ(2)η. Moreover, the scale of trade volume should increase in the wake of positive news. Thus we impose κµ > 0.
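To make the recursions concrete, here is a minimal sketch of the predictive filter for λ = δ + µ + η(1) + η(2), with the spline omitted and η(1) reduced to AR(1) for brevity. The exponential score u = ε − 1 is an illustrative stand-in for the score of F∗, not the paper's fitted distribution.

```python
import numpy as np

def component_filter(y, delta, kappa_mu, phi1, kappa1, phi2, kappa2):
    """Predictive DCS filter: each component is updated with the lagged
    score, and score updates are gated by 1l{y_{t-1} > 0}."""
    mu = eta1 = eta2 = 0.0                  # mu_{1,0} = 0, eta_{1,0} = 0
    u_prev, gate = 0.0, 0.0                 # F_{1,0} is trivial
    lam = np.empty(len(y))
    for t, yt in enumerate(y):
        mu = mu + kappa_mu * u_prev * gate
        eta1 = phi1 * eta1 + kappa1 * u_prev * gate
        eta2 = phi2 * eta2 + kappa2 * u_prev * gate
        lam[t] = delta + mu + eta1 + eta2
        eps = yt * np.exp(-lam[t])
        u_prev = eps - 1.0 if yt > 0 else 0.0   # exponential-score stand-in
        gate = 1.0 if yt > 0 else 0.0
    return lam

lam = component_filter(np.array([1.0, 0.0, 2.5, 0.4]),
                       delta=0.1, kappa_mu=0.01,
                       phi1=0.97, kappa1=0.05, phi2=0.6, kappa2=0.2)
```

The chosen constants respect the identifiability ordering κµ < κ(1)η < κ(2)η; after a zero observation the score term is switched off, while the AR components still decay through φ(j).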

3.4 Dynamic cubic spline st,τ

The periodic component st,τ captures the diurnal U-shaped patterns. Diurnal patterns have been modeled in the financial literature mainly by the use of the Fourier representation, deterministic splines, or sample moments for each τ = 1, . . . , I. See, for instance, Andersen and Bollerslev (1998), Campbell and Diebold (2005), Engle and Rangel (2008), Brownlees et al. (2011), and Engle and Sokalska (2012), among many others. We pay attention to the following disadvantages of the existing methods for estimating diurnal patterns. First, they generally treat diurnal patterns as deterministic and time-invariant, which can be problematic if the patterns change randomly over time as markets evolve. Second, they usually require the diurnal U-shaped component and other stochastic components to be estimated in isolation by a two-step procedure, which may affect the asymptotic properties of statistical tests.

In specifying st,τ, we overcome these issues by applying the dynamic cubic spline model of Harvey and Koopman (1993). The main benefits of using this cubic spline are twofold. First, it allows for the possibility that the diurnal U-shaped patterns may evolve over time. Second, the diurnal patterns can be estimated very easily and simultaneously with all other components of the model by the method of maximum likelihood. Harvey and Koopman (1993) develop the spline as a contemporaneous filter whose dynamics are driven by contemporaneous Gaussian noise. In this paper, we make a subtle change by letting it be a predictive filter driven by the lagged score ut,τ−1. Thus the identification restrictions of our filter are also different. In order to formally define the dynamic cubic spline st,τ, we first give the definition of a special case, termed a static spline, in which the pattern of periodicity does not change over time. Some of the technical details are omitted in the following sections; the complete mathematical construction is given in Appendix C.

8 In other applications, this low-frequency component µt,τ can be an integrated random walk. See Harvey (2013, pp. 92).

3.4.1 Static daily spline

The cubic spline is termed a daily spline if its periodicity is complete over one trading day. The daily spline is a continuous piecewise function of time, connected at k + 1 knots for some k ∈ N>0 such that k < I. The coordinates of the knots along the time axis are denoted τ0 < τ1 < · · · < τk, where τ0 = 0, τk = I, and τj ∈ {1, . . . , I − 1} for j = 1, . . . , k − 1. The set of knots is also called the mesh. The y-coordinates (heights) of the knots are denoted γ = (γ0, γ1, . . . , γk)>. As the spline is periodic, we impose γ0 = γk, so that γ0 is redundant in estimation. Using the notation γ† = (γ†1, . . . , γ†k)> = (γ1, . . . , γk)>, the static daily spline (i.e. st,τ = sτ) is defined as

sτ = Σ_{j=1}^{k} 1l{τ ∈ [τj−1, τj]} zj(τ) · γ†,    τ = 1, . . . , I,    (4)

where zj : [τj−1, τj] → Rk for j = 1, . . . , k is a k-dimensional vector of deterministic functions that conveys all information about the polynomial order, continuity, periodicity, and zero-sum conditions of our static spline. See Appendix C for the derivation of this variable. The zero-sum condition ensures that the parameters in γ† are identified. To impose the zero-sum condition, we also need to set

γ†k = −(1/w∗k) Σ_{i=1}^{k−1} w∗i γ†i,

where w∗ = (w∗1, . . . , w∗k)> is defined in Appendix C. For a daily spline, we always fix τ0 = 0 at the market opening and τk = I at closure. As NYSE is open between 9.30am and 4.00pm, this means that we fix τ0 at 9.30am and τk at 4pm. The locations of τ1, . . . , τk−1 and the size of k depend on a number of factors such as the empirical features of the diurnal pattern, the number of intra-day observations, and estimation efficiency. Increasing the number of knots can sometimes improve the fit of the model, but using too many knots deteriorates estimation efficiency. In our subsequent analysis, we find that knots located at 9.30am, 11am, 12.30pm, 2.30pm, and 4pm work well in terms of estimation efficiency and goodness-of-fit. The shape of the spline up to 12.30pm captures the busy trading hours in the morning, the segment between 12.30pm and 2.30pm captures the quiet lunch hours, and the segment after 2.30pm captures any acceleration in trading activity before closure. By the periodicity condition of the spline, we set the coordinate (τk, γ†k) = (τk, γk) at 4pm to be the same as (τ0, γ0) at 9.30am. In this case, we have k = dim(γ†) = 4. Estimation efficiency deteriorates quickly, with little or no improvement in goodness-of-fit, when the number of knots per day is increased beyond this specification.

3.4.2 Weekly spline

The cubic spline becomes a weekly spline if we set the periodicity to be complete over one trading week instead of one day. In this case, we redefine τ0, τ1, . . . , τk as follows. We let τ̃0 < τ̃1 < · · · < τ̃k′ denote the coordinates along the time-axis of the intra-day mesh, where k′ < I, τ̃0 = 0, τ̃k′ = I, and τ̃j ∈ {1, . . . , I − 1} for j = 1, . . . , k′ − 1. Then the coordinates τ0, τ1, . . . , τk along the time-axis of the total mesh for the whole trading week are defined as τ0 = τ̃0 and τik′+j = τ̃j for i = 0, . . . , 4 and j = 1, . . . , k′. This means that (τj)kj=1 is no longer an increasing sequence. Thus we have k = 5k′ over one trading week. By the periodicity condition of the spline, we set the coordinate (τk, γk) at 4pm on Friday to be the same as (τ0, γ0) at 9.30am on Monday.

The weekly spline allows diurnal patterns to differ between weekdays and gives us scope to formally test whether the day-of-the-week effect is operative in the data. The day-of-the-week effect is a type of seasonality that may or may not manifest itself in a given set of financial data. Andersen and Bollerslev (1998) test for the presence of such an effect in high-frequency returns of the deutsche mark-dollar exchange rate (DM-D) using day-of-the-week dummies. Lo and Wang (2010) also use dummy variables to test for the day-of-the-week effect in weighted daily turnover9 and returns of indexes of NYSE and AMEX10 ordinary common shares. In essence, seasonal dummies capture the day-of-the-week effect by allowing for a level shift on each day in the variable being estimated. In contrast, our model captures seasonality by allowing the coordinates (and hence the shape) of the cubic spline to differ between weekdays. Thus we check for the day-of-the-week effect by testing whether the diurnal U-shaped patterns are significantly different between weekdays. More formally, we can test for the day-of-the-week effect by a likelihood ratio test under the null hypothesis:

H0 : (γ1, γ2, . . . , γk′)> = (γk′+1, γk′+2, . . . , γ2k′)> = · · · = (γ4k′+1, γ4k′+2, . . . , γ5k′)>.

The alternative hypothesis replaces = by ≠. The likelihood ratio statistic under the null asymptotically has the chi-square distribution with 4k′ degrees of freedom.
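The weekly mesh construction (τik′+j = τ̃j for i = 0, . . . , 4) amounts to repeating the intra-day knot positions five times. A minimal sketch; the knot positions, in intra-day observation units, are purely illustrative:

```python
def weekly_mesh(intraday_knots):
    """Build the weekly mesh from the intra-day mesh
    tilde_tau_0 < ... < tilde_tau_{k'}: tau_{i*k' + j} = tilde_tau_j
    for each weekday i = 0, ..., 4."""
    mesh = [intraday_knots[0]]             # tau_0 = tilde_tau_0 = 0
    for _ in range(5):                     # Monday through Friday
        mesh.extend(intraday_knots[1:])
    return mesh                            # length 5k' + 1

mesh = weekly_mesh([0, 180, 360, 600, 780])   # k' = 4, so k = 5k' = 20
```

As the text notes, the resulting sequence of intra-day coordinates is no longer increasing, because the same τ̃j values recur on each weekday.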
The way in which our weekly spline captures the day-of-the-week effect appears more intuitive than the use of dummy variables, as seasonality is embedded in the diurnal U-shaped patterns. However, a major disadvantage of the weekly spline is that the total number of knots for one week increases quickly (fivefold) with the number of intra-day knots. If we use 5 knots per trading day (k′ = 4) as in the daily spline, the total number of knots for the week is 25 (k = 20). Estimating this many knots can be computationally highly inefficient, especially when the sampling frequency is high.

3.4.3 Restricted weekly spline

Instead of letting the coordinates of the mesh be free for all weekdays, we can restrict them to be the same on selected days, so that the diurnal pattern is allowed to differ only on certain days of the week. We call this the restricted weekly spline. In this paper, we restrict the diurnal pattern to be the same on mid-weekdays (Tuesday-Thursday) and let the pattern differ on Monday and Friday. This restricted weekly spline captures the weekend effect, which, in the context of this paper, is the tendency of trading volume on Fridays and Mondays to display distinct patterns compared to mid-weekdays. More specifically, one may suspect that trading volume tends to accelerate more on Friday afternoons and begin higher on Monday mornings than on other weekdays, because more news may be disseminated over Saturday and Sunday than during any overnight period within the week.

The weekend effect can be viewed as a special case of the day-of-the-week effect. The evidence for the day-of-the-week effect in financial data appears mixed or generally weak, and largely depends on the estimation method and the variable being tested. For instance, Andersen and Bollerslev (1998) find that the day-of-the-week effect is insignificant in the DM-D returns once the calendar effect (e.g. daylight saving and public holidays) and the effects of major macroeconomic announcements are taken into account. However, they find indications of a weak (but clear) seasonality on Monday mornings and Friday afternoons. Lo and Wang (2010) find that returns exhibit a strong day-of-the-week effect. They also find that turnover is roughly constant across days, though Mondays and Fridays tend to have slightly lower turnover than mid-weekdays. The weekend effect is well-documented in many other studies (mainly in the context of asset returns)11 and appears to be more pronounced than the more general day-of-the-week effect. Thus, given the computational cost of the full-blown weekly spline, estimating the restricted weekly spline may be adequate if the ultimate goal is to establish a good forecasting model.

9 Turnover is another measure of trading activity. It is trading volume divided by the total number of shares outstanding.
10 Abbreviation of the American Stock Exchange.

Formally, we define (the static version of) the restricted weekly spline as:

sτ = Σ_{j=1}^{k} 1l{τ ∈ [τj−1, τj]} zj(τ) · Sγ†,    (5)
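The restriction works through the selection matrix S (defined next), which shares one k′-block of knot heights across Tuesday-Thursday. A small numpy sketch, assuming k′ = 4 knots per day:

```python
import numpy as np

def selection_matrix(k_prime):
    """5k' x 3k' matrix mapping (Monday, mid-week, Friday) coordinates
    to the five weekdays; the middle block is repeated for Tue-Thu."""
    I = np.eye(k_prime)
    Z = np.zeros((k_prime, k_prime))
    return np.block([[I, Z, Z],
                     [Z, I, Z],
                     [Z, I, Z],
                     [Z, I, Z],
                     [Z, Z, I]])

S = selection_matrix(4)                    # 20 x 12
```

Each row of S has a single unit entry, so Sγ† expands the 3k′ free coordinates into the full 5k′ weekly vector, and S⊤ collapses the weekly basis accordingly.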

where γ† = (γ̃†>1, γ̃†>2, γ̃†>3)> and γ̃†i for i = 1, 2, 3 are k′ × 1 vectors of mesh heights for Monday, mid-weekdays (Tuesday-Thursday), and Friday, respectively. S is the following 5k′ × 3k′ matrix with entries zeros and ones:

S = [ Ik′×k′   0       0
        0    Ik′×k′    0
        0    Ik′×k′    0
        0    Ik′×k′    0
        0      0     Ik′×k′ ].

We can rewrite (5) as

sτ = Σ_{j=1}^{k} 1l{τ ∈ [τj−1, τj]} z̃j(τ) · γ†,    (6)

where z̃j(τ) = S>zj(τ). Finally, we can let γ† be time-varying according to the dynamics in (7), with appropriate adjustments to the zero-sum conditions. In this case, we denote κ∗ = (κ̃∗>1, κ̃∗>2, κ̃∗>3)> and γ†t,τ = (γ̃†>1;t,τ, γ̃†>2;t,τ, γ̃†>3;t,τ)> for t = 1, . . . , T and τ = 0, . . . , I, where dim(κ̃∗j) = dim(γ̃†j;t,τ) = k′ for j = 1, 2, 3. If we place 5 knots per trading day as we specified for the daily spline, we have k′ = 4, giving k = 12.

3.4.4 Dynamic spline

The static daily spline (4) or the restricted weekly spline (6) becomes dynamic by letting γ† be time-varying:

st,τ = Σ_{j=1}^{k} 1l{τ ∈ [τj−1, τj]} zj(τ) · γ†t,τ,    γ†t,τ = γ†t,τ−1 + κ∗ · ut,τ−1 1l{yt,τ−1>0},    (7)

for τ = 1, . . . , I and t = 1, . . . , T, where κ∗ = (κ∗1, . . . , κ∗k)> and zj(τ) is replaced by z̃j(τ) for the restricted weekly spline. Harvey and Koopman (1993) use a set of contemporaneous disturbances to drive the dynamics of γ†t,τ instead of the lagged score. In terms of parameter identification, our dynamic spline in (7) still satisfies the zero-sum condition over one complete period due to the construction of zj(τ), but we also need to impose the parameter restrictions

γ†k;1,0 = −(1/w∗k) Σ_{i=1}^{k−1} w∗i γ†i;1,0,    κ∗k = −(1/w∗k) Σ_{i=1}^{k−1} w∗i κ∗i,    (8)

where γ†i;1,0 denotes the i-th element of γ†1,0. See Appendix C.

11 See Cross (1973), French (1980), Gibbons and Hess (1981), Harris (1986), Jaffe and Westerfield (1985), Keim and Stambaugh (1984), Lakonishok and Levi (1982), and Lakonishok and Smidt (1988).
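The restrictions in (8) pin down the last knot parameter from the first k − 1. A minimal sketch; the weights w∗ come from Appendix C, and the values used here are purely hypothetical:

```python
import numpy as np

def complete_with_zero_sum(free, w_star):
    """Append the k-th element so that w_star . v = 0, as in (8);
    `free` holds the first k-1 elements of gamma (or of kappa*)."""
    free = np.asarray(free, float)
    last = -np.dot(w_star[:-1], free) / w_star[-1]
    return np.append(free, last)

w_star = np.array([1.0, 2.0, 2.0, 1.0])    # hypothetical weights, k = 4
gamma0 = complete_with_zero_sum([0.3, -0.1, 0.2], w_star)
kappa = complete_with_zero_sum([0.02, 0.01, -0.03], w_star)
```

Both the initial heights γ†1,0 and the gain vector κ∗ are completed by the same rule, so the spline sums to zero over one full period throughout the recursion.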

3.4.5 Public holidays

Public holidays can be treated in the same way as overnight periods or weekends, where data is missing due to market closure. Whenever the sampling period includes public holidays, we leave λt,τ unchanged by setting

µt,τ = µt,τ−1,    η(1)t,τ = η(1)t,τ−1,    η(2)t,τ = η(2)t,τ−1,    γ†t,τ = γ†t,τ−1

for all τ = 1, . . . , I whenever t ∈ H ≡ {t ∈ N>0 : t is a public holiday}. The joint log-likelihood is defined only for days t ∈ Hc.
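In a filter implementation this amounts to skipping the score-driven updates on days t ∈ H. A sketch with a single AR(1) η component for brevity:

```python
def holiday_step(mu, eta, gamma, u_prev, gate, params, is_holiday):
    """One intra-day update of (mu, eta, gamma): frozen on public
    holidays, score-driven otherwise. `params` = (kappa_mu, phi,
    kappa_eta, kappa_star); `gate` is 1l{y_{t,tau-1} > 0}."""
    if is_holiday:
        return mu, eta, gamma              # carried forward unchanged
    kappa_mu, phi, kappa_eta, kappa_star = params
    mu = mu + kappa_mu * u_prev * gate
    eta = phi * eta + kappa_eta * u_prev * gate
    gamma = gamma + kappa_star * u_prev * gate
    return mu, eta, gamma
```

Note that on a holiday even the autoregressive decay pauses, unlike the zero-volume case, where only the score term is switched off; holiday observations are also dropped from the log-likelihood.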

3.5 Summary of the model specification

To summarize, our intra-day DCS model with dynamic spline is

yt,τ = εt,τ exp(λt,τ),    εt,τ|Ft,τ−1 ∼ iid F
λt,τ = δ + µt,τ + ηt,τ + st,τ
µt,τ = µt,τ−1 + κµ ut,τ−1 1l{yt,τ−1>0}
ηt,τ = Σ_{j=1}^{J} η(j)t,τ    (9)
η(j)t,τ = φ(j)1 η(j)t,τ−1 + φ(j)2 η(j)t,τ−2 + · · · + φ(j)m(j) η(j)t,τ−m(j) + κ(j)η ut,τ−1 1l{yt,τ−1>0},    j = 1, 2
st,τ = Σ_{j=1}^{k} 1l{τ ∈ [τj−1, τj]} zj(τ) · γ†t,τ
γ†t,τ = γ†t,τ−1 + κ∗ · ut,τ−1 1l{yt,τ−1>0}

for τ = 1, . . . , I and t = 1, . . . , T, where F is as defined in Section 3.2, st,τ is either a daily or weekly spline, ut,τ is the score of the conditional distribution F∗, and zj(τ) is replaced by z̃j(τ) for the restricted weekly spline. The parameters must also satisfy the identifiability and stationarity restrictions specified above.

4 Maximum likelihood estimation

All parameters of the model can be estimated by the method of maximum likelihood. Andres and Harvey (2012) show the asymptotic consistency and normality of the maximum likelihood estimator for the basic DCS model (2) where the scale parameter αt|t−1 (or λt|t−1 ) is stationary. In this paper, we do not formally verify that their asymptotic results also extend to our particular specification with nonstationary scale parameter. This is left as an interesting topic for future research.


4.1 Initialization of maximum likelihood

We choose the initial values of the components of λ1,0 as follows.

• η1,0: as we have E[η(1)t,τ] = E[η(2)t,τ] = 0 by the definition of ηt,τ, we set η1,0 = 0.
• µ1,0: as µt,τ is assumed to be a random walk, we have E[µt,τ] = µ1,0. So we impose the constraint µ1,0 = 0 for δ to be identified.
• s1,0 or γ†1,0: we treat the elements of γ†1,0 as unknown constant parameters to be estimated simultaneously with all other parameters of the model. We impose the zero-sum constraint w∗ · γ†1,0 = 0 as specified in (8).

As a preview of what we will see later, Figure 5 compares the estimated daily and restricted weekly splines with static coordinates γ† and dynamic coordinates γ†t,τ. We find that ŝt,τ successfully captures the diurnal U-shaped patterns on every trading day in all cases, and that
• the daily spline with static coordinates (γ†) restricts diurnal patterns to be the same for all trading days (top left panel);
• the restricted weekly spline with static coordinates (γ†) allows diurnal patterns to vary within the week but restricts the weekly pattern to be fixed across trading weeks (bottom left panel); and
• the dynamic splines with time-varying coordinates (γ†t,τ) are more flexible in capturing changing diurnal patterns across trading days than the static splines. The restricted weekly spline appears more dynamic than the daily spline (right panels).


Figure 5 – ŝt,τ: static daily spline (top left), dynamic daily spline (top right), static restricted weekly spline (bottom left), and dynamic restricted weekly spline (bottom right). Obtained by fitting variants of Model 2 of Table 2 to IBM30s. 28 February - 31 March 2000. Shaded regions distinguish different weeks of the sample period.



Figure 6 – Empirical cdf of ε̂t,τ > 0 against the cdf of Burr(ν̂, ζ̂) (left). Empirical cdf of the PIT of ε̂t,τ > 0 computed under F∗(·; θ̂∗) ∼ Burr(ν̂, ζ̂) (right). IBM1m over 28 February - 31 March 2000. Using Model 2 specified in Table 2.

5 Estimation results

This section reports the estimation results of fitting our model to IBM30s and IBM1m. As before, the sampling period is Monday 28 February to Friday 31 March 2000, which covers five consecutive trading weeks and no public holidays.

5.1 Choice of F∗: general-to-specific approach

Given the shape of the empirical distributions of the data, distributions in the GG, GB, and Pareto classes are among our candidates for F∗ for both IBM1m and IBM30s. Note that the GB distribution is closely related to the Pareto class of distributions. See Appendix A and Figure 4 for the formal definitions of these distributions and the relationships between them. Taking a general-to-specific approach, we sequentially estimate GB, Burr, and then log-logistic from the GB class of distributions. Within the GG class of distributions, we sequentially estimate GG, and then Gamma and Weibull. We find that the GB distribution is difficult to estimate as it has three parameters in θ∗. Burr fits both IBM30s and IBM1m very well. As the GG distribution is an asymptotic distribution of GB for large ζ, GG can be considered a special class of GB. However, the goodness of fit of the GG class of distributions is generally found to be inferior to that of the GB class. The inferior fit of GG is consistent with the estimated size of the GB parameter ζ, which is far from large for both IBM1m and IBM30s. Figure 6 illustrates the impressive fit of Burr for IBM1m. The quality of fit for IBM30s is equally impressive. The empirical cdf of non-zero ε̂t,τ appears to overlap the cdf of Burr(ν̂, ζ̂)12 almost everywhere, so that the two lines are visually indistinguishable. The empirical cdf of the PIT values of non-zero ε̂t,τ computed under F∗(·; θ̂∗) ∼ Burr(ν̂, ζ̂) lies along the diagonal, indicating that the distribution of the PIT is very close to the standard uniform distribution (denoted U[0, 1]). As an experiment, we have also fitted many other distributions, including F, inverse-Gamma, and log-Cauchy. However, the estimation results are inferior to those of Burr because either the residual PIT values become far from uniform or we cannot remove the serial dependence of the score.
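The PIT diagnostic can be reproduced in a few lines. A sketch assuming the standard Burr form F(y) = 1 − (1 + y^ν)^(−ζ), which is consistent with the exp-Burr pdf given in Footnote 14; the fitted ν̂ and ζ̂ are inputs:

```python
import numpy as np

def burr_cdf(y, nu, zeta):
    """Standard Burr cdf, F(y) = 1 - (1 + y^nu)^(-zeta) (assumed form)."""
    return 1.0 - (1.0 + y**nu) ** (-zeta)

def pit_sup_distance(residuals, nu, zeta):
    """Map positive residuals through the fitted Burr cdf and return the
    sup-distance between the empirical cdf of the PIT values and U[0,1];
    a value near zero means the PIT is close to standard uniform."""
    u = np.sort(burr_cdf(residuals[residuals > 0], nu, zeta))
    n = len(u)
    upper = np.max(np.arange(1, n + 1) / n - u)
    lower = np.max(u - np.arange(0, n) / n)
    return max(upper, lower)
```

Residuals simulated from the assumed Burr via the inverse cdf give a distance near zero, while deliberately misspecified shape parameters push it up.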

5.2 Comparing with log-normality

A standard approach to non-negative time series in the financial literature is to fit the log-normal distribution to data whenever the logarithm of the observations roughly resembles normality. In such cases, the degree of efficiency and bias in the estimated parameters depends on how far the distribution of the log-variable is from normality. See, for example, Alizadeh et al. (2002) for a discussion and an empirical example. In our case, the estimation results of F∗ ∼ log-normal are inferior to the results of Burr for both IBM1m and IBM30s, as the PIT values of the residuals turn out to be far from uniform.

Figure 7 shows the frequency distributions of the logarithm of non-zero IBM1m and IBM30s re-centered around the mean and standardized by one standard deviation (hereafter denoted log(IBM1m) and log(IBM30s)). While the frequency distribution of log(IBM1m) roughly resembles normality, log(IBM30s) looks far from normal. Figure 8 shows the empirical cdfs of the PIT values of log(IBM1m) and log(IBM30s) under the normality assumption. As the empirical cdfs meander around the diagonal but do not lie on it, the two series cannot be normal. The QQ-plots of log(IBM1m) and log(IBM30s) in Figure 9 show that the normal distribution appears to put too much weight on the lower tail and too little weight on the upper tail. Thus the failure of F∗ ∼ log-normal can be ascribed to the departure of the empirical distributions of log(IBM1m) and log(IBM30s) from normality, particularly in the tail regions. The superior performance of Burr over log-normal is presumably because the shape of log-normal is determined by only one parameter (σ) while Burr has two shape parameters (ν and ζ). This makes the shape of Burr more flexible than log-normal.13

To compare the shapes of Burr and log-normal, Figure 10 shows the pdf of the exponential-Burr distribution (denoted exp-Burr)14 against the normal distribution at different values of σ.15 For IBM30s, we find that the normal distribution is dissimilar to exp-Burr at all values of σ. For IBM1m, although the normal at σ = 0.7 appears close to exp-Burr, the symmetric shape of the normal distribution contrasts with the asymmetry of exp-Burr.

12 In a non-mathematical sense.

Figure 7 – The frequency distributions of log(IBM1m) (left) and log(IBM30s) (right) at non-zero observations. The series are re-centered around the mean and standardized by one standard deviation.

Figure 8 – Empirical cdfs of the PIT values of log(IBM1m) (left) and log(IBM30s) (right) at non-zero observations computed under the standard normal assumption. The series are re-centered around the mean and standardized by one standard deviation.

Figure 9 – The QQ-plots of log(IBM1m) (left) and log(IBM30s) (right) at non-zero observations. The series are re-centered around the mean and standardized by one standard deviation.

Figure 10 – Comparing exp-Burr(ν̂, ζ̂) against the normal distribution with parameter σ (denoted N(0, σ)). ν̂ and ζ̂ are obtained by fitting Model 2 to IBM30s (left) and IBM1m (right). See Footnote 14 for the pdf of exp-Burr.

5.3 Long memory

We find that, empirically, a two-component specification for ηt,τ is the most effective in capturing the autocorrelation structure of the data for both IBM1m and IBM30s. Including autoregressive lags of order two for η(1)t,τ and order one for η(2)t,τ was found to be appropriate for both series. (Henceforth, we denote this specification of ηt,τ by ηt,τ ∼ AR(2)+AR(1).) That is, the following specification

ηt,τ = η(1)t,τ + η(2)t,τ,    (t, τ) ∈ ΨT,I
η(1)t,τ = φ(1)1 η(1)t,τ−1 + φ(1)2 η(1)t,τ−2 + κ(1)η ut,τ−1 1l{yt,τ−1>0}
η(2)t,τ = φ(2)1 η(2)t,τ−1 + κ(2)η ut,τ−1 1l{yt,τ−1>0}

is our preferred specification, because it completely removes autocorrelation from the estimation residuals ε̂t,τ and the score ût,τ.

Figure 11 shows that the strong and slowly decaying autocorrelation in IBM1m and IBM30s is captured successfully by the model, so that the estimation residuals ε̂t,τ and the score ût,τ show no obvious signs of serial correlation. The absence of autocorrelation is also illustrated more formally by the Box-Ljung statistics in Table 2, which summarizes statistics for assessing the goodness-of-fit of our model fitted to IBM1m and IBM30s.16 Whenever we change the specification from ηt,τ ∼ AR(1) to ηt,τ ∼ AR(2)+AR(1), AIC and SIC17 fall and the autocorrelation in ût,τ is removed. Note that ût,τ exhibits stronger serial correlation than ε̂t,τ. This is because the score down-weighs (and is thus robust to) the effects of extreme observations.

13 For the formal definitions of Burr and log-normal, see Appendix A.
14 If a random variable X follows the standard Burr distribution with parameters ν and ζ, we say that log(X) follows the exponential-Burr distribution with the pdf
f(x; ν, ζ) = ν exp(xν)(exp(xν) + 1)^(−1−ζ) / B(1, ζ),    x > 0 and ν, ζ > 0,
where B(·, ·) denotes the Beta function.
15 If a random variable X follows the standard log-normal distribution with parameter σ, log(X) follows the normal distribution with mean 0 and standard deviation σ.
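Footnote 14's density is easy to verify by the change of variables f_logX(x) = f_X(e^x) · e^x. A sketch assuming the standard Burr pdf f(y) = ζν y^(ν−1)(1 + y^ν)^(−1−ζ), which is the form that integrates the footnote's formula back to levels (note B(1, ζ) = 1/ζ):

```python
import math

def burr_pdf(y, nu, zeta):
    """Standard Burr pdf, f(y) = zeta*nu*y^(nu-1)*(1 + y^nu)^(-1-zeta)
    (assumed form, consistent with Footnote 14)."""
    return zeta * nu * y ** (nu - 1.0) * (1.0 + y ** nu) ** (-1.0 - zeta)

def exp_burr_pdf(x, nu, zeta):
    """pdf of log X for X ~ Burr, as in Footnote 14; B(1, zeta) = 1/zeta."""
    beta_1_zeta = math.gamma(1.0) * math.gamma(zeta) / math.gamma(1.0 + zeta)
    enx = math.exp(x * nu)
    return nu * enx * (enx + 1.0) ** (-1.0 - zeta) / beta_1_zeta
```

The identity holds pointwise, which is how Figure 10's exp-Burr curves relate to the Burr fitted in levels.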


Figure 11 – Sample autocorrelation of trade volume (left), of ε̂t,τ (middle), and of ût,τ (right). IBM1m (top row) and IBM30s (bottom row). Using Model 2 as specified in Table 2. The 95% confidence interval is computed at ±2 standard errors. Sample period: 28 February - 31 March 2000.

16 In Table 2, Q(K) is the Box-Ljung statistic based on the first K autocorrelations, computed under the null hypothesis of no autocorrelation up to order K. Q(K) is compared against the chi-square distribution with K − q degrees of freedom (denoted χ²K−q), where q is the number of dynamic parameters of the model (e.g. the basic DCS in (1) has two dynamic parameters, φ and κ). We report the results for K ≈ √(T × I) and some values above it, following Harvey (1993, pp. 76). * and ** denote significance at the 5% and 1% levels, respectively.
17 AIC and SIC stand for the Akaike and Schwarz information criteria, respectively.
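The Q(K) statistic of footnote 16 is straightforward to compute. The following is an illustrative sketch (not the paper's code) of the Box-Ljung statistic with the degrees-of-freedom adjustment for the q dynamic parameters:

```python
import numpy as np
from scipy import stats

def ljung_box(x, K, q=0):
    """Box-Ljung statistic Q(K) on the first K sample autocorrelations,
    compared against chi-square with K - q degrees of freedom."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    rho = np.array([np.dot(xc[k:], xc[:-k]) / denom for k in range(1, K + 1)])
    Q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, K + 1)))
    pval = stats.chi2.sf(Q, df=K - q)
    return Q, pval
```

Applied to the residuals or scores, q would be set to the number of dynamic coefficients of the fitted model (e.g. q = 9 for Model 2), as in Table 2.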


                                IBM1m                                            IBM30s
                                Model 1   Model 2      Model 3   Model 4         Model 1   Model 2      Model 3   Model 4
Type of spline                  Daily     Daily        Weekly    Weekly          Daily     Daily        Weekly    Weekly
ηt,τ                            AR(1)     AR(2)+AR(1)  AR(1)     AR(2)+AR(1)     AR(1)     AR(2)+AR(1)  AR(1)     AR(2)+AR(1)
Number of dynamic coef. (q)     6         9            14        17              6         9            14        17
Maximum log-like                -104,158  -104,136     -104,139  -104,123        -195,540  -195,509     -195,524  -195,498
AIC                             21.314    21.310       21.313    21.310          20.031    20.028       20.031    20.029
SIC                             21.323    21.322       21.334    21.334          20.036    20.035       20.043    20.042
KS stat on ε̂t,τ (a)            1.242     0.968        1.051     1.086           0.903     0.924        0.868     0.870
KS stat: p-value                0.001*    0.004*       0.001*    0.000*          0.011     0.009*       0.016     0.015
Autocorrelation of ε̂t,τ: (b)
  Q(100)                        96.3      92.8         98.5      92.8            —         —            —         —
  Q(150)                        154.0     152.9        156.5     150.5           90.3      88.9         90.7      87.6
  Q(200)                        197.1     197.6        202.3     197.2           116.0     114.0        117.5     113.5
Autocorrelation of ût,τ:
  Q(100)                        128.8**   90.3         130.0**   94.4            —         —            —         —
  Q(150)                        163.6     126.2        164.6*    130.1           196.2**   142.3        201.0**   143.9
  Q(200)                        208.8     169.1        208.1     173.1           239.7*    183.4        244.1**   185.2

(a) Kolmogorov-Smirnov test statistic with finite-sample adjustment by √(T × I), computed under F*. Selected critical values obtained by simulation are tabulated in Appendix E. The simulation assumes that the three parameters α, ν, and ζ are estimated from data. * indicates significance at the 1% level.
(b) Box-Ljung statistic Q(K) based on the first K autocorrelations. H0: no autocorrelation up to order K. Compared against χ²K−q, where the degrees of freedom are adjusted for the number of dynamic coefficients q. * and ** denote significance at the 5% and 1% levels, respectively.

Table 2 – Goodness-of-fit statistics. F* ∼ Burr and all specifications use a dynamic spline with 5 knots per day. "Weekly" refers to the restricted weekly spline.

Figure 12 – Autocorrelation function of η̂t,τ of IBM1m, with a hyperbolic decay superimposed.

We find that the autocorrelation function of η̂t,τ decays slowly. The rate of decay compares with a hyperbolic decay, as shown in Figure 12. This suggests that this component may be picking up the presence of long memory in the data. The estimate of the degree of fractional integration (d), obtained by fitting η̂t,τ to a fractionally integrated filter, is d̂ = 0.173 (s.e. 0.079) for IBM1m and d̂ = 0.140 (s.e. 0.057) for IBM30s, confirming that there is long memory operative in ηt,τ and hence in the trade volume series. Appendix D explains the properties of a fractionally integrated model and how we obtained d̂.
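The estimation of d is detailed in Appendix D. As an illustration of one standard estimator of the fractional-integration order, the log-periodogram regression of Geweke and Porter-Hudak (1983) recovers d from the slope of the periodogram at low frequencies; the sketch below is a generic implementation, not necessarily the exact method of Appendix D:

```python
import numpy as np

def gph_estimate(x, m=None):
    """Log-periodogram (GPH) estimate of the fractional-integration order d:
    regress log I(w_j) on -2*log(2*sin(w_j/2)) over the first m Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))  # a common bandwidth choice
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    fft = np.fft.fft(x - x.mean())
    I = np.abs(fft[1:m + 1]) ** 2 / (2.0 * np.pi * n)  # periodogram ordinates
    X = np.column_stack([np.ones(m), -2.0 * np.log(2.0 * np.sin(freqs / 2.0))])
    beta, *_ = np.linalg.lstsq(X, np.log(I), rcond=None)
    return beta[1]  # the slope is the estimate of d
```

A value of d̂ significantly above zero, as found here for η̂t,τ, indicates long memory.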

5.4 Estimated coefficients and diagnostics

Table 3 shows the estimated coefficients of Model 2 as specified in Table 2.18 In the rest of the paper, we refer to model specifications by the model numbers at the top of Table 2. For both IBM1m and IBM30s, we have:

• κ̂(2)η > κ̂(1)η > κ̂µ > 0 as expected, which means that η(2)t,τ is more sensitive to changes in ut,τ−1 than η(1)t,τ, and that η(1)t,τ is more sensitive to ut,τ−1 than µt,τ.19

• 0 < φ̂(2)1 < 1, so η(2)t,τ ∼ AR(1) is stationary. The stationarity of η(1)t,τ ∼ AR(2) requires:20

    φ(1)1 + φ(1)2 < 1,    −φ(1)1 + φ(1)2 < 1,    and φ(1)2 > −1.    (10)

It is easy to check that these conditions are satisfied by φ̂(1)1 and φ̂(1)2, so that η(1)t,τ is also stationary.

• The estimate of the point-mass probability at zero is p̂ = 0.0006 for IBM1m and p̂ = 0.0047 for IBM30s. These estimates are consistent with the sample statistics in Table 1.

• ν̂ζ̂ ≈ 2.5 for IBM1m and ν̂ζ̂ ≈ 2.4 for IBM30s. This implies that only the first and second moments exist under F* ∼ Burr. (See Appendix A.1.) Thus the theoretical skewness exists for neither IBM1m nor IBM30s.

18 The coefficients for all other model specifications are tabulated in Appendix F.
19 In this application, these coefficient estimates are obtained without imposing parameter restrictions. In different applications, one may need to explicitly impose the identifying parameter restrictions.
20 See, for example, Harvey (1993, pp. 19).
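The stationarity conditions in (10) can be verified mechanically. The following is a small illustrative helper (with the Model 2 point estimates from Table 3 hard-coded):

```python
def ar2_is_stationary(phi1, phi2):
    """AR(2) stationarity triangle, as in (10):
    phi1 + phi2 < 1, -phi1 + phi2 < 1, and phi2 > -1."""
    return (phi1 + phi2 < 1.0) and (-phi1 + phi2 < 1.0) and (phi2 > -1.0)

# Point estimates of (phi1^(1), phi2^(1)) for Model 2 from Table 3.
print(ar2_is_stationary(0.417, 0.513))  # True (IBM1m)
print(ar2_is_stationary(0.561, 0.400))  # True (IBM30s)
```

Both estimated pairs lie comfortably inside the stationarity triangle.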


                IBM1m              IBM30s
κµ              0.008  (0.002)     0.007  (0.002)
φ(1)1           0.417  (0.097)     0.561  (0.129)
φ(1)2           0.513  (0.101)     0.400  (0.129)
κ(1)η           0.051  (0.010)     0.053  (0.008)
φ(2)1           0.619  (0.067)     0.676  (0.046)
κ(2)η           0.065  (0.010)     0.091  (0.009)
κ*1             0.000  (0.002)     0.000  (0.001)
κ*2             -0.003 (0.001)     -0.002 (0.001)
κ*3             0.000  (0.001)     0.000  (0.001)

                IBM1m              IBM30s
γ†1;1,0         0.094  (0.070)     0.122  (0.070)
γ†2;1,0         -0.464 (0.076)     -0.485 (0.079)
γ†3;1,0         -0.233 (0.058)     -0.229 (0.058)
δ               9.841  (0.157)     9.254  (0.181)
ν               2.229  (0.033)     1.635  (0.016)
ζ               1.134  (0.042)     1.467  (0.042)
p               0.0006 (0.0003)    0.0047 (0.0005)

Table 3 – Estimated coefficients of Model 2 for IBM1m and IBM30s over 28 February - 31 March 2000. Standard errors in parentheses are computed numerically.

5.4.1 Test for the weekend effect

In Table 2, we observe that SIC always increases (while the results are mixed for AIC) when we change the specification from the daily to the weekly spline. This is partly due to the inclusion of eight extra parameters in the restricted weekly spline, which is penalized severely by SIC. We can test whether the restricted weekly spline is preferred over the daily spline by a likelihood ratio test. The null hypothesis21 of the test is

    H0 : κ̃*1 = κ̃*2 = κ̃*3  and  γ̃†1;1,0 = γ̃†2;1,0 = γ̃†3;1,0.

The alternative hypothesis replaces = by ≠. As dim(γ̃†j;1,0) = dim(κ̃*j) = 4 for j = 1, 2, 3, the likelihood ratio statistics are compared against the chi-square distribution with sixteen degrees of freedom. If the null hypothesis is rejected, there is statistical evidence of the weekend effect over our sampling period; that is, the diurnal patterns of Mondays and Fridays are statistically significantly different from the patterns of mid-weekdays. Table 4 shows the test results for IBM1m and IBM30s. The null cannot be rejected at the 5% level for either series. However, it is important to note that the test statistic for IBM1m is only marginally insignificant, while the test statistic for IBM30s is comfortably outside the rejection region. This is interesting, as it indicates that our inference about seasonal effects can be influenced by the aggregation interval. These results are roughly consistent with Lo and Wang (2010), whose estimation results indicate that the average turnover is roughly constant across weekdays, in spite of being slightly lower on Mondays and Fridays. As the test results may depend not only on frequency but also on the sampling period, it would be interesting to apply this test to different sets of data collected over a variety of sampling periods.

Figure 13 shows the estimated daily spline ŝt,τ of Model 2. We find that ŝt,τ successfully captures the tendency of trade volume to be high in the morning, fall during the quiet lunch hours around 1pm, and pick up again in the afternoon. As it is a dynamic spline, the shape of the diurnal U-shaped pattern varies over time. The shape also appears similar for all weekdays, largely because the daily spline assumes that the diurnal patterns are not significantly different between weekdays.

21 We can also perform the test using the static spline, in which case the null and alternative hypotheses of the test become

    H0 : γ̃†1 = γ̃†2 = γ̃†3 (daily spline),    H1 : γ̃†1 ≠ γ̃†2 ≠ γ̃†3 (restricted weekly spline).


                                IBM1m              IBM30s
Model specification             Models 2 & 4       Models 2 & 4
Time-variation in spline        Dynamic (γ†t,τ)    Dynamic (γ†t,τ)
Degrees of freedom (df)         16                 16
Loglike, restricted             -104,136           -195,509
Loglike, unrestricted           -104,123           -195,498
Likelihood ratio stat           26.0               20.4
χ²df, p-value                   0.054              0.203

Table 4 – Likelihood ratio test for the weekend effect. Sample period: 28 February - 31 March 2000. Model specifications are in Table 2.

We can contrast the daily spline with the restricted weekly spline by comparing Figure 13 with Figure 14, which shows ŝt,τ of Model 4. The left panel illustrates that the restricted weekly spline is more dynamic, especially on Mondays and Fridays, than the daily spline. Diurnal patterns are similar on mid-weekdays, largely because the restricted weekly spline assumes that the patterns are significantly different only on Mondays and Fridays. As illustrated in the right panel, the restricted weekly spline captures the tendency of trade volume to accelerate more before the Friday close and begin higher shortly after the Monday open compared to any other day, in spite of the weekend effect being insignificant (only marginally so for IBM1m) by the likelihood ratio test.


Figure 13 – ŝt,τ of Model 2 for IBM1m. ŝt,τ over 6 - 31 March 2000 (top left). ŝt,τ of a typical day, Tuesday 14 March, from market open to close (top right). ŝt,τ of different weeks over 6 - 31 March 2000 superimposed (bottom left). The weekly average of ŝt,τ over 6 - 31 March 2000 (bottom right). Time along the x-axes.


Figure 14 – ŝt,τ of Model 4 fitted to IBM1m. ŝt,τ of different weeks over 6 - 31 March 2000 superimposed (left). The weekly average of ŝt,τ over 6 - 31 March 2000 (right). Time along the x-axes.

                              IBM1m                    IBM30s
Model specification           Model 2    Model 4       Model 2    Model 4
Type of spline                Daily      Weekly        Daily      Weekly
Degrees of freedom (df)       1          1             1          1
Loglike, restricted           -104,141   -104,128      -195,601   -195,591
Loglike, unrestricted         -104,136   -104,123      -195,509   -195,498
Likelihood ratio stat         10.8       10.8          185.2      185.8
χ²df, p-value                 0.001      0.001         0.000      0.000

Table 5 – Likelihood ratio statistics for the test of log-logistic against Burr. Sample period: 28 February - 31 March 2000. Model specifications are in Table 2.

5.4.2 General versus specific distribution

For both IBM1m and IBM30s, the estimates of the Burr parameter ζ are close to one. If ζ = 1, Burr reduces to the log-logistic distribution. However, a likelihood ratio test of the null hypothesis H0 : ζ = 1 (restricted) against the alternative H1 : ζ ≠ 1 (unrestricted) suggests that the estimated Burr distribution is not log-logistic. (See Table 5.)

5.4.3 Dynamic versus static spline

Some of the elements in κ* appear statistically insignificant.22 We can assess whether the dynamic spline is preferred over the static spline by comparing AIC and SIC, which are tabulated in Table 6. We find that both AIC and SIC are slightly larger for the dynamic spline than for the static spline, partly due to the penalty for the inclusion of κ*.23

22 The statistical significance is based on numerical standard errors. It may be possible to test the joint significance of κ* formally using some of the classical likelihood-based tests. However, the asymptotic distribution of the test statistics may not be the usual chi-square distribution under the null hypothesis (e.g. it may be a mixture of chi-squares), as κ* lies on the boundary of the parameter space. (See, for example, Harvey (1991, pp. 236) for a discussion of this issue.) Verifying the properties of the likelihood-based test statistics in this context is left for further research.
23


                              IBM1m                          IBM30s
Model specification           ——           Model 2           ——           Model 2
Total number of coef.         13           16                13           16
Variation in spline           Static (γ†)  Dynamic (γ†t,τ)   Static (γ†)  Dynamic (γ†t,τ)
AIC                           21.308       21.310            20.0280      20.0281
SIC                           21.318       21.322            20.033       20.035

Table 6 – Static versus dynamic splines. The criteria for the static spline are computed by setting κ* = 0 in Model 2. Model 2 is as specified in Table 2. F* ∼ Burr.

5.5 Remark on the Kolmogorov-Smirnov test

It is interesting to contrast the quality of the fit of our model illustrated above with the Kolmogorov-Smirnov (KS) goodness-of-fit test24 in Table 2. Despite the impressive fit of Burr demonstrated by the residual PITs in Figure 6, the KS test always rejects Burr at the 1% level for IBM1m. This partly reflects the fact that the size of the KS test at a given critical value tends to be smaller when the distribution being tested is generalized to include more unknown parameters in θ. For IBM30s, the KS statistics of Models 1, 3, and 4 are insignificant at the 1% level, so the null hypothesis of F* ∼ Burr cannot be rejected at the 1% level. This quality of fit for IBM30s is quite impressive. Note that Models 1 and 3 cannot be the best specification as ût,τ is significantly serially correlated. In all cases, Burr is rejected at the more conventional significance level of 5% for both IBM1m and IBM30s. Our application illustrates that, when parameters are estimated from data, the KS test can be too stringent in the sense that it rejects H0 for negligible deviations of the empirical distribution of ε̂t,τ from F*, especially when the sample size is very large. The test becomes even more stringent at a given sample size when the dimension of θ* increases. We find that the KS test is generally not a practical mode of model selection in this type of application.
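The simulation of KS critical values under estimated parameters (in the spirit of Lilliefors, 1967, cited in the references) can be sketched as a parametric bootstrap. In the sketch below, `rvs` and `fit` are hypothetical callables standing in for the model-specific sampling and re-estimation steps of Appendix E; this is an illustration of the general idea, not the paper's procedure.

```python
import numpy as np

def simulated_ks_critical_value(rvs, fit, n, level=0.05, n_sim=2000, seed=0):
    """Critical value of the finite-sample-adjusted KS statistic sqrt(n)*D_n,
    obtained by simulating from the hypothesized family and re-estimating
    its parameters on each draw (a Lilliefors-style parametric bootstrap)."""
    rng = np.random.default_rng(seed)
    sims = np.empty(n_sim)
    grid_hi = np.arange(1, n + 1) / n
    grid_lo = np.arange(0, n) / n
    for b in range(n_sim):
        x = np.sort(rvs(rng, n))   # sample from the hypothesized distribution
        cdf = fit(x)               # re-estimate the parameters, return the fitted cdf
        f = cdf(x)
        sims[b] = np.sqrt(n) * max(np.max(grid_hi - f), np.max(f - grid_lo))
    return np.quantile(sims, 1.0 - level)

# With no estimation step (fit returns the true cdf), the 5% critical value
# is close to the asymptotic Kolmogorov value of about 1.36.
cv = simulated_ks_critical_value(
    rvs=lambda rng, n: rng.uniform(size=n),
    fit=lambda x: (lambda z: z),   # U[0,1] cdf, no parameters estimated
    n=200,
)
print(round(cv, 2))
```

When `fit` actually re-estimates parameters from each simulated sample, the critical values fall below the Kolmogorov value, which is exactly why simulated critical values are needed in Table 2.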

5.6 Estimated components

Figure 15 shows ε̂t,τ, ût,τ and the estimated components of λ̂t,τ of Model 2 for IBM1m.25 All series are displayed over the entire sample period between 28 February and 31 March 2000. While the time series plot of IBM1m clearly exhibits periodic patterns, ε̂t,τ appears free of periodicity. The diurnal U-shaped patterns are captured by λ̂t,τ through the spline ŝt,τ. µ̂t,τ and η̂t,τ appear to satisfy their structural assumptions, as µ̂t,τ resembles a random walk while η̂(1)t,τ and η̂(2)t,τ resemble stationary AR(2) and AR(1) processes, respectively. ût,τ is bounded between −ν̂ ≈ −2.2 and ν̂ζ̂ ≈ 2.5, as expected.26

24 The Kolmogorov-Smirnov (KS) goodness-of-fit statistics in Table 2 take into account the finite-sample adjustment by √(T × I), and are computed under the null hypothesis

    H0 : εt,τ | {Ft,τ−1, {εt,τ > 0}} ∼ iid F*,    (11)

where F* ∼ Burr in this case. The alternative hypothesis replaces ∼ by ≁ in (11). Appendix E discusses the KS statistics in detail. As the unknown parameters in θ* of F* are estimated from data, we compute the critical values by simulation. Some of the simulated critical values at the 10%, 5%, and 1% levels are tabulated in Appendix E.
25 The results for IBM30s are very similar to Figure 15 and thus are omitted.
26 This is due to a property of the beta distribution. See (18) at the end of Appendix B.



Figure 15 – IBM1m (top left), ε̂t,τ (top middle), α̂t,τ (top right), λ̂t,τ (second-row left), µ̂t,τ (second-row middle), ŝt,τ (second-row right), η̂(1)t,τ (bottom left), η̂(2)t,τ (bottom middle), and ût,τ (bottom right) of Model 2 fitted to IBM1m over 28 February - 31 March 2000. Time along the x-axes.

5.6.1 Overnight effect

In Section 2.1, we observed that market activity tends to surge shortly after market opening and before closure, and this leads to extreme spikes in trade volume that recur after the open or before the close. These extreme observations make the upper tail of the empirical distribution very long. Standard approaches to such an overnight effect include treating extreme observations as outliers using dummy variables, differentiating day and overnight jumps, and treating events that occur within a specified period immediately after the open as censored.27 Such an approach is often employed in some observation-driven models (based on moments of data), termed autoregressive conditional parameter (ACP) models in Koopman et al. (2012). We do not explicitly model the overnight effect using jumps, dummy variables, or time thresholds because it may take time for the overnight effect to diminish completely during the day, and it is generally difficult to identify which observations are due to overnight information if there is arrival of new information. Our model assumes that periodic extreme events are a natural part of the cyclical dynamics of the conditional distribution of data. Figure 16 illustrates that our model captures the overnight effect by the periodic spikes in α̂t,τ = exp(λ̂t,τ), reflecting the cyclical surge in market activity due to the overnight effect. These results are intuitive, as overnight information can be interpreted simply as an elevated probability of extreme events. Note that α̂t,τ is far more volatile than λ̂t,τ. This highlights one of the advantages of the exponential link function: dynamic movements in the scale of the conditional distribution can be induced by subtle movements in the link parameter λt,τ to which the filter is applied.
27

See, for instance, Rydberg and Shephard (2003), Gerhard and Hautsch (2007), and Boes et al. (2007).



Figure 16 – Capturing periodic spikes in data: IBM1m (left) and α̂t,τ = exp(λ̂t,τ) (right). Sampling period: 28 February - 31 March. Time along the x-axes. Using Model 2 as specified in Table 2.

6 Out-of-sample performance

6.1 Model stability: one-step ahead forecasts

We use the predictive cdf to assess the stability of the estimated distribution and parameters, as well as the ability of our model to produce good one-step ahead forecasts over a given out-of-sample prediction period. The procedure is as follows. Henceforth, we use the following notation:

    Ψh = {(t, τ) ∈ {T + 1, . . . , T + h} × {1, . . . , I}}
    Ψh,>0 = {(t, τ) ∈ {T + 1, . . . , T + h} × {1, . . . , I} : yt,τ > 0}

Given a set of in-sample observations up to trading day T, we compute the predictive cdf for the next h trading days at each positive observation as

    F*(ε̂t,τ ; θ̂*),    ε̂t,τ = yt,τ / α̂t,τ,    ∀(t, τ) ∈ Ψh,>0    (12)

where all parameters are the maximum likelihood estimates computed using the observations up to day T. We take the estimation results from the previous sections to compute (12). The predictive cdf in (12) simply gives the PIT values of future observations. If our model is a good forecasting model, the predictive cdf in (12) should be distributed as U[0, 1], at least approximately, for each (t, τ). We assess the closeness of the PIT values to uniformity only qualitatively by inspecting the empirical cdf of (12).28 Figure 17 shows the empirical cdf of the PIT values for IBM1m over forecast horizons up to h = 20 days ahead. In this paper, we do not display the results for IBM30s as they are very similar to IBM1m. As we have 390 observations per day for IBM1m and 780 observations for IBM30s, h = 20 corresponds to 7,800 steps ahead for IBM1m and 15,600 steps ahead for IBM30s. We find that

• for both IBM1m and IBM30s, the distribution of the PIT values is roughly U[0, 1] for much of the 20-day forecast horizon, although there is non-negligible deterioration in the quality of fit;

28 One can formally test this using the predictive likelihood methods discussed in Mitchell and Wallis (2011). In our application, the predictive (log-)likelihood of a single observation is

    log lt,τ = 1{yt,τ>0} log(1 − p̂) + (1 − 1{yt,τ>0}) log p̂ + 1{yt,τ>0} log( α̂⁻¹t,τ f(ε̂t,τ ; θ̂*) )

for all (t, τ) ∈ Ψh.



Figure 17 – Empirical cdf of the PIT values (predictive cdf) for one-step ahead forecasts. Forecast horizons: 1 day ahead (top left), 5 days ahead (top right), 10 days ahead (bottom left), 20 days ahead (bottom right). IBM1m for forecast horizons between 3 - 23 April 2000. Using Model 2 as specified in Table 2.

• for both IBM1m and IBM30s, the empirical cdf of the PIT values is close to uniformity around the upper-middle quantiles (roughly between the 50% and 70% quantiles) for all of the horizons; and

• for IBM1m, the empirical cdf of the PIT values is particularly close to uniformity when h = 5 and h = 20. For IBM30s, it is closest when h = 5.

In summary, our estimated distributions and parameter values appear to be fairly stable and able to provide reasonable one-step ahead forecasts of the conditional distribution of our series up to a 20-day prediction horizon.
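The PIT computation in (12) and the uniformity check behind Figure 17 amount to only a few lines. The sketch below is illustrative, with a generic `cdf` callable standing in for the fitted Burr F*:

```python
import numpy as np

def pit_values(y, alpha_hat, cdf):
    """PIT values F*(y / alpha_hat) at the positive observations, as in (12)."""
    y = np.asarray(y, dtype=float)
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    pos = y > 0
    return cdf(y[pos] / alpha_hat[pos])

def max_uniform_gap(pit):
    """Largest gap between the empirical cdf of the PITs and the U[0,1] cdf."""
    u = np.sort(np.asarray(pit, dtype=float))
    n = len(u)
    return max(np.max(np.arange(1, n + 1) / n - u), np.max(u - np.arange(n) / n))
```

If the model forecasts well, the PITs are approximately U[0, 1] and the gap is small; a visibly non-uniform empirical cdf signals deterioration of the kind noted above.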

6.2 Multi-step ahead forecasts

We now examine multi-step forecasts, which are of greater interest than one-step forecasts as they give us an ultimate assessment of our model's predictive ability. We produce multi-step ahead density forecasts over a long forecast horizon using the estimation results obtained in Section 5. The procedure is as follows. We make an optimal forecast of αt,τ = exp(λt,τ) for (t, τ) ∈ Ψh conditional on FT,I. This optimality is in terms of MMSE, so that the prediction is E[exp(λt,τ)|FT,I] ≡ α̃t,τ for (t, τ) ∈ Ψh.29 We then standardize the actual future observations yt,τ by α̃t,τ for all (t, τ) ∈ Ψh. If our model is a good forecasting model, the standardized future observations ε̃t,τ ≡ yt,τ/α̃t,τ should be conditionally distributed as Burr(ν̂, ζ̂), at least approximately. Then F*(ε̃t,τ ; θ̂*) for (t, τ) ∈ Ψh,>0 with F* ∼ Burr gives the PIT values of future observations, and they should be distributed as U[0, 1]. The PIT values should also

29 See (16) and (17) in Appendix B for how this quantity is computed. The difference between the notations ·̂t,τ and ·̃t,τ is that the computation of the former does not require the use of conditional moment conditions.



Figure 18 – Empirical cdf of the PIT of multi-step forecasts. Forecast horizons: 1 day ahead (left), 5 days ahead (middle), 8 days ahead (right). IBM1m for forecast horizons between 3 - 23 April 2000. Using Model 2 as specified in Table 2.


Figure 19 – Autocorrelation of the PIT of multi-step forecasts. Forecast horizons: 1.5 hours ahead (left), one day ahead (middle), five days ahead (right). IBM1m for forecast horizons between 3 - 23 April 2000. Using Model 2 as specified in Table 2.

be free of autocorrelation if the predictions are good. We make predictions for up to h = 8 future trading days. As we have 390 observations per day for IBM1m and 780 observations for IBM30s, h = 8 corresponds to 3,120 steps ahead for IBM1m and 6,240 steps ahead for IBM30s. Figure 18 shows the empirical cdf of the PIT values for IBM1m. As before, we do not display the results for IBM30s as they are very similar to IBM1m. We find that the distribution of the PIT values is very close to U[0, 1] for the first day of the forecast horizon for both IBM1m and IBM30s. This means that the density prediction produced by our model is very good for the first 390 steps for IBM1m and the first 780 steps for IBM30s. Figure 19 shows the sample autocorrelation function of the PIT values for IBM1m. A Box-Ljung test indicates that the PIT values for the first quarter of the first forecast day, equivalent to 100 steps ahead for IBM1m, are not serially correlated. By the end of the first forecast day, the PIT values become autocorrelated, although the degree of autocorrelation is weak. If we consider this predictive performance in terms of the number of steps, these results are impressive, as the PIT values are remarkably close to being iid U[0, 1] for several hundred steps of the forecast horizon. Beyond the first forecast day, the quality of the density forecasts and the degree of autocorrelation deteriorate with the length of the forecast horizon. Any deterioration in the goodness-of-fit over a given out-of-sample period can be attributed to changes in fundamental factors such as the nature of periodicity, autocorrelation, and the shape of the underlying distribution. In order to reflect these changes, the model should be re-estimated frequently as more data become available.
When updating the model, one may want to keep the length of the sampling period roughly fixed and roll it forward instead of arbitrarily increasing the sample size because:

• the estimation efficiency deteriorates relatively quickly as the number of sampling days T increases in a high-frequency environment with large I; and

• our model is only a local approximation, as we emphasized earlier.
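The rolling-window updating scheme just described can be sketched generically. In the sketch below, `fit` and `forecast` are hypothetical placeholders for the model's estimation and prediction steps; only the window bookkeeping is the point being illustrated.

```python
def rolling_reestimation(data, window_days, step_days, fit, forecast):
    """Keep the length of the sampling period fixed and roll it forward:
    re-estimate on each window, then forecast the next step_days."""
    forecasts = []
    start = 0
    while start + window_days + step_days <= len(data):
        window = data[start:start + window_days]
        model = fit(window)                      # re-estimate on the fixed-length window
        forecasts.append(forecast(model, step_days))
        start += step_days                       # roll the window forward
    return forecasts
```

Fixing the window length keeps the local-approximation character of the model intact while still incorporating the newest observations at each re-estimation.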

7 Concluding remarks and further directions

The main contribution of this paper is the development of an intra-day DCS model with dynamic cubic spline to be fitted to ultra-high frequency observations with periodic behavior. We illustrated the excellent performance of our model in a financial application. The object of the empirical analysis is trade volume as measured by the number of shares traded, and, as such, this study also contributes to the literature dedicated to the analysis of market activity and intensity. This paper is a new addition to countless studies devoted to modeling and forecasting the dynamics of price and quantity in isolation. By the guiding principle of economic analysis, market prices are determined by the interaction of supply and demand, and thus such one-dimensional studies are ultimately not satisfactory. The next step towards a more complete analysis is therefore to construct a multivariate intra-day DCS model of price and volume jointly. One can also extend our model to study the correlation of scale with other variables, or construct a multivariate model for panel data using composite likelihood. We illustrated that our spline-DCS model deals with recurrent extreme observations by capturing them as part of the periodic movements in the dynamic conditional scale αt,τ. We also noted that this contrasts with the standard methods for dealing with the overnight effect using dummy variables, jumps, or censoring. It would be interesting to compare the performance of the spline-DCS model against that of some of the ACP models, such as the ACD or autoregressive conditional intensity (ACI) models of Engle and Russell (1998), and some of the multiplicative error models surveyed in Hautsch (2012, pp. 99), in a variety of applications.

References

Alizadeh, S., Brandt, M. W. and Diebold, F. X. (2002). Range-based estimation of stochastic volatility models, Journal of Finance 57(3): 1047–1091.

Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal, Econometrica 41(6): 997–1016.

Andersen, T. G. and Bollerslev, T. (1998). Deutsche mark-dollar volatility: intraday activity patterns, macroeconomic announcements, and longer run dependencies, Journal of Finance 53(1): 219–265.

Andres, P. and Harvey, A. C. (2012). The dynamic location/scale model: with applications to intra-day financial data, Cambridge Working Papers in Economics CWPE1240, University of Cambridge.

Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics, Journal of Econometrics 73(1): 5–59.

Boes, M.-J., Drost, F. C. and Werker, B. J. M. (2007). The impact of overnight periods on option pricing, Journal of Financial and Quantitative Analysis 42(2): 517–533.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31(3): 307–327.

Bollerslev, T. and Mikkelsen, H. O. (1996). Modeling and pricing long memory in stock market volatility, Journal of Econometrics 73(1): 151–184.

Bowsher, C. G. and Meeks, R. (2008). The dynamics of economic functions: Modelling and forecasting the yield curve, Economics Papers 2008-W05, Economics Group, Nuffield College, University of Oxford.

Brownlees, C. T., Cipollini, F. and Gallo, G. M. (2011). Intra-daily volume modelling and prediction for algorithmic trading, Journal of Financial Econometrics 9(3): 489–518.

Campbell, S. D. and Diebold, F. X. (2005). Weather forecasting for weather derivatives, Journal of the American Statistical Association 100(469): 6–16.

Cross, F. (1973). The behavior of stock prices on Fridays and Mondays, Financial Analysts Journal 29(6): 67–69.

Diebold, F. X. and Rudebusch, G. D. (1989). Long memory and persistence in aggregate output, Journal of Monetary Economics 24(2): 189–209.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50(4): 987–1007.

Engle, R. F. and Rangel, J. G. (2008). The spline-GARCH model for low-frequency volatility and its global macroeconomic causes, Review of Financial Studies 21(3): 1187–1222.

Engle, R. F. and Russell, J. R. (1998). Autoregressive conditional duration: A new model for irregularly spaced transaction data, Econometrica 66(5): 1127–1162.

Engle, R. F. and Sokalska, M. E. (2012). Forecasting intraday volatility in the US equity market: multiplicative component GARCH, Journal of Financial Econometrics 10(1): 54–83.

French, K. (1980). Stock returns and the weekend effect, Journal of Financial Economics 8(1): 55–69.

Gerhard, F. and Hautsch, N. (2007). A dynamic semiparametric proportional hazard model, Studies in Nonlinear Dynamics and Econometrics 11(2).

Geweke, J. and Porter-Hudak, S. (1983). The estimation and application of long memory time series models, Journal of Time Series Analysis 4(4): 221–238.

Gibbons, M. R. and Hess, P. (1981). Day of the week effects and asset returns, Journal of Business 54(4): 579–596.

Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing, Journal of Time Series Analysis 1(1): 15–29.

Harris, L. (1986). A transaction data study of weekly and intradaily patterns in stock returns, Journal of Financial Economics 16(1): 99–117.

Harvey, A. C. (1991). Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press.

Harvey, A. C. (1993). Time Series Models, 2nd edn, Harvester Wheatsheaf.

Harvey, A. C. (2010). Exponential conditional volatility models, Cambridge Working Papers in Economics CWPE1040, University of Cambridge.

Harvey, A. C. (2013). Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series, Econometric Society Monograph, Cambridge University Press (forthcoming).

Harvey, A. C. and Chakravarty, T. (2008). Beta-t-(E)GARCH, Cambridge Working Papers in Economics CWPE0840, University of Cambridge.

Harvey, A. C. and Koopman, S. J. (1993). Forecasting hourly electricity demand using timevarying splines, Journal of the American Statistical Association 88(424): 1228–1236. Harvey, A. C., Koopman, S. J. and Riani, M. (1997). The modeling and seasonal adjustment of weekly observations, Journal of Business & Economic Statistics 15(3): 354–68. Harvey, A. C. and Luati, A. (2012). Filtering with heavy tails, Cambridge Working Papers in Economics CWPE1255, University of Cambridge. Hautsch, N. (2012). Econometrics of financial high-frequency data, Springer, Berlin. Hautsch, N., Malec, P. and Schienle, M. (2010). Capturing the zero: a new class of zeroaugmented distributions and multiplicative error processes, SFB 649 Discussion Papers SFB649DP2010-055, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany. Heckman, J. J. (1974). 42(4): 679–94.

Shadow prices, market wages, and labor supply, Econometrica

Hosking, J. R. M. (1981). Fractional differencing, Biometrika 68(1): 165–176. Jaffe, J. and Westerfield, R. (1985). The week-end effect in common stock returns: the international evidence, Journal of Finance 40(2): 433–454. Keim, D. B. and Stambaugh, R. F. (1984). A further investigation of the weekend effect in stock returns, Journal of Finance 39(3): 819–835. Koopman, S. J., Lucas, A. and Scharth, M. (2012). Predicting time-varying parameters with parameter-driven and observation-driven models, Tinbergen Institute Discussion Papers 12020/4, Tinbergen Institute. Lakonishok, J. and Levi, M. (1982). Weekend effects on stock returns: a note, Journal of Finance 37(3): 883–889. Lakonishok, J. and Smidt, S. (1988). Are seasonal anomalies real? A ninety-year perspective, Review of Financial Studies 1(4): 403–425. Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown, Journal of the American Statistical Association 62(318): 399–402. Lo, A. W. and Wang, J. (2010). Stock market trading volume, in Y. At-Sahalia and L. Hansen (eds), Handbook of financial econometrics, Vol. 2, North-Holland, New York. McCulloch, R. E. and Tsay, R. S. (2001). Nonlinearity in high-frequency financial data and hierarchical models, Studies in Nonlinear Dynamics and Econometrics 5(1): 1–17. Mitchell, J. and Wallis, K. F. (2011). Evaluating density forecasts: forecast combinations, model mixtures, calibration and sharpness, Journal of Applied Econometrics 26(6): 1023–1040. Robinson, P. M. (1990). Time series with strong dependence, Advances in Econometrics: 6th world congress, Cambridge University Press, Cambridge. Rydberg, T. H. and Shephard, N. (2003). Dynamics of trade-by-trade price movements: decomposition and models, Journal of Financial Econometrics 1(1): 2–25. Taylor, S. (1986). Modelling financial time series, Wiley, Chichester West Sussex ; New York.

34

Taylor, S. (2005). Asset price dynamics, volatility, and prediction, Princeton University Press, Princeton, N.J. Zhang, M. Y., Russell, J. R. and Tsay, R. S. (2001). A nonlinear autoregressive conditional duration model with applications to financial transaction data, Journal of Econometrics 104(1): 179–207.

A    List of distributions and their properties

A.1    Generalized beta distribution

The (standard) generalized beta (GB) distribution (also known as the generalized beta distribution of the second kind) has the probability density function (pdf)

\[ f(x; \nu, \xi, \zeta) = \frac{\nu x^{\nu\xi-1}(x^{\nu}+1)^{-\xi-\zeta}}{B(\xi,\zeta)}, \qquad x > 0, \quad \nu, \xi, \zeta > 0, \]

where B(·,·) denotes the beta function. GB becomes the Burr distribution when ξ = 1 and the log-logistic distribution when ξ = ζ = 1. The Burr distribution is also called the Pareto Type IV distribution (denoted Pareto IV), and the log-logistic is also called Pareto III. Burr becomes Pareto II when ν = 1, and Burr becomes Weibull (defined in Appendix A.2) when ζ → ∞. GB with ν = 1 and ξ = ζ is a special case of the F distribution with degrees of freedom ν₁ = ν₂ = 2ξ.

If a non-standardized random variable Y follows the GB distribution, its pdf f_Y : ℝ_{>0} → ℝ with scale parameter α > 0 is

\[ f_Y(y; \alpha, \nu, \xi, \zeta) = f(y/\alpha; \nu, \xi, \zeta)/\alpha, \qquad y > 0. \]

The m-th moment of Y is

\[ \mathrm{E}[y^m] = \frac{\alpha^m\,\Gamma(\xi + m/\nu)\,\Gamma(\zeta - m/\nu)}{\Gamma(\xi)\,\Gamma(\zeta)}, \qquad m < \nu\zeta. \tag{13} \]

For a set of iid observations y₁, …, y_T, where each follows the non-standardized GB distribution, the log-likelihood of a single observation y_t can be written using the exponential link function α = exp(λ) with link parameter λ ∈ ℝ as

\[ \log f_Y(y_t) = \log(\nu) - \nu\xi\lambda + (\nu\xi - 1)\log(y_t) - \log B(\xi,\zeta) - (\xi+\zeta)\log[(y_t e^{-\lambda})^{\nu} + 1]. \]

The score u_t of the non-standardized GB computed at y_t is

\[ u_t \equiv \frac{\partial \log f_Y(y_t)}{\partial \lambda} = \frac{\nu(\xi+\zeta)(y_t e^{-\lambda})^{\nu}}{(y_t e^{-\lambda})^{\nu} + 1} - \nu\xi = \nu(\xi+\zeta)\, b_t(\xi,\zeta) - \nu\xi, \]

where we used the notation

\[ b_t(\xi,\zeta) \equiv \frac{(y_t e^{-\lambda})^{\nu}}{(y_t e^{-\lambda})^{\nu} + 1}. \]

By the properties of the GB distribution, we know that b_t(ξ, ζ) follows the beta distribution with parameters ξ and ζ, which is characterized by the moment generating function (mgf)

\[ M_b(z; \xi, \zeta) \equiv \mathrm{E}[e^{bz}] = 1 + \sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\xi + r}{\xi + \zeta + r} \right) \frac{z^k}{k!}. \]

It is easy to check that E[u_t] = 0.
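The beta representation of b_t gives a convenient way to sample from the GB distribution and to check the moment formula (13) and the zero-mean score numerically. Below is a sketch with arbitrarily chosen parameter values, assuming NumPy and SciPy are available; it is an illustration, not code used in the paper.

```python
import numpy as np
from scipy.special import gamma as G

rng = np.random.default_rng(0)
nu, xi, zeta = 2.0, 1.5, 2.5                 # arbitrary illustrative parameters
b = rng.beta(xi, zeta, size=1_000_000)       # b_t ~ Beta(xi, zeta)
x = (b / (1.0 - b)) ** (1.0 / nu)            # x then follows the standard GB

# score at lambda = 0 (alpha = 1): u = nu*(xi + zeta)*b - nu*xi
u = nu * (xi + zeta) * b - nu * xi
print(u.mean())                              # close to 0

# first moment against (13) with alpha = 1, m = 1 < nu*zeta
m = 1
analytic = G(xi + m / nu) * G(zeta - m / nu) / (G(xi) * G(zeta))
print(x.mean(), analytic)
```

The inverse map b = x^ν/(x^ν + 1) recovers the beta variable, which is exactly the b_t(ξ, ζ) defined above.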

A.2    Generalized gamma distribution

The (standard) generalized gamma (GG) distribution has the pdf

\[ f(x; \gamma, \nu) = \frac{\nu}{\Gamma(\gamma)}\, x^{\nu\gamma-1} \exp(-x^{\nu}), \qquad x > 0, \quad \gamma, \nu > 0, \]

where Γ(·) is the gamma function. The GG distribution becomes the gamma distribution when ν = 1, the Weibull distribution when γ = 1, and the exponential distribution when ν = γ = 1. If a non-standardized random variable Y follows the GG distribution, its pdf f_Y : ℝ_{>0} → ℝ with scale parameter α > 0 is

\[ f_Y(y; \alpha, \gamma, \nu) = f(y/\alpha; \gamma, \nu)/\alpha, \qquad y > 0. \]

The m-th moment of Y is E[y^m] = α^m Γ(γ + m/ν)/Γ(γ). For a set of iid observations y₁, …, y_T, where each follows the non-standardized GG distribution, the log-likelihood of a single observation y_t can be written using the exponential link function α = exp(λ) as

\[ \log f_Y(y_t) = \log(\nu) - \lambda + (\nu\gamma - 1)\log(y_t e^{-\lambda}) - (y_t e^{-\lambda})^{\nu} - \log \Gamma(\gamma). \]

The score u_t of the non-standardized GG computed at y_t is

\[ u_t \equiv \frac{\partial \log f_Y(y_t)}{\partial \lambda} = \nu (y_t e^{-\lambda})^{\nu} - \nu\gamma = \nu g_t(\nu) - \nu\gamma, \]

where we used the notation g_t(ν) ≡ (y_t e^{−λ})^ν. By the properties of the GG distribution, we know that g_t(ν) follows the (standard) gamma distribution with parameter γ, which is characterized by the mgf E[e^{gz}] = (1 − z)^{−γ} for z < 1. We also have E[u_t] = 0.
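The gamma representation of g_t(ν) yields the analogous numerical check for the GG case. As before, this is an illustrative sketch with arbitrary parameter values, not the code used for the study.

```python
import numpy as np
from scipy.special import gamma as G

rng = np.random.default_rng(1)
gam, nu = 2.0, 1.5                   # arbitrary illustrative parameters
g = rng.gamma(gam, size=1_000_000)   # g_t(nu) ~ Gamma(gam)
x = g ** (1.0 / nu)                  # x then follows the standard GG

u = nu * g - nu * gam                # score at lambda = 0 (alpha = 1)
print(u.mean())                      # close to 0

m = 1
analytic = G(gam + m / nu) / G(gam)  # m-th moment of the standard GG
print(x.mean(), analytic)
```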

A.3    Log-normal distribution

The (non-standardized) log-normal distribution has the pdf

\[ f_Y(y; \alpha, \sigma) = \frac{1}{y\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{1}{2}\left[ \frac{\log(y) - \log(\alpha)}{\sigma} \right]^2 \right), \qquad y > 0, \quad \alpha, \sigma > 0. \]

For a log-normally distributed random variable Y, the moments of all orders can be obtained easily from the mgf of the normal distribution as E[Y^m] = E[e^{m log(Y)}] for all m ∈ ℕ_{>0}. For a set of iid observations y₁, …, y_T, where each follows the log-normal distribution, the log-likelihood of a single observation y_t can be written using the exponential link function α = exp(λ) as

\[ \log f_Y(y_t) = -\log(y_t) - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2}\left[ \frac{\log(y_t) - \lambda}{\sigma} \right]^2. \]

The score u_t of the log-normal computed at y_t is

\[ u_t \equiv \frac{\partial \log f_Y(y_t)}{\partial \lambda} = \frac{\log(y_t) - \lambda}{\sigma^2}. \]

Thus, since log(y_t) is normally distributed, u_t is a normally distributed random variable. We also have E[u_t] = 0, as λ is the first moment of log(y_t).
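Because log(y_t) ~ N(λ, σ²), the score has mean zero and standard deviation 1/σ, which is easy to confirm by simulation. A small sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, sigma = 0.5, 0.8                # arbitrary illustrative parameters
y = np.exp(lam + sigma * rng.standard_normal(1_000_000))

u = (np.log(y) - lam) / sigma**2     # score at the link parameter lambda
print(u.mean(), u.std())             # ~0 and ~1/sigma = 1.25
```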

B    The DCS model of Andres and Harvey (2012)

Andres and Harvey (2012) introduce the DCS model for estimating the sequence of positive scaling factors (α_{t|t−1})_{t=1}^{T} of a given sequence of observations (y_t)_{t=1}^{T} so that the standardized series y_t/α_{t|t−1} ≡ ε_t satisfies

\[ \varepsilon_t \mid \mathcal{F}_{t-1} \sim \text{iid } F, \qquad t = 1, \ldots, T, \]

for some standard distribution F with a vector of parameters θ and pdf f. The DCS estimates α_{t|t−1} via an appropriately chosen canonical link function and the link parameter λ_{t|t−1} ∈ ℝ; the choice of link function depends on F. Using the exponential link function, the one-factor DCS model of Andres and Harvey is defined as

\[ y_t = \varepsilon_t \exp(\lambda_{t|t-1}), \qquad \lambda_{t|t-1} = \delta + \phi \lambda_{t-1|t-2} + \kappa u_{t-1}, \qquad \varepsilon_t \mid \mathcal{F}_{t-1} \sim \text{iid } F, \qquad t = 1, \ldots, T, \tag{14} \]

where δ, φ ∈ ℝ and κ ≥ 0 are the parameters of the model and u_t is the conditional score

\[ u_t \equiv \frac{\partial \log f_Y(y_t; \lambda_{t|t-1}, \theta)}{\partial \lambda_{t|t-1}} = \frac{\partial\left[ \log f(y_t e^{-\lambda_{t|t-1}}; \theta) - \lambda_{t|t-1} \right]}{\partial \lambda_{t|t-1}}, \qquad t = 1, \ldots, T. \]

We use f_Y to denote the pdf of the conditional distribution of y_t for each t. λ_{t|t−1} is stationary if |φ| < 1, in which case its unconditional first moment is ω = δ/(1 − φ). Andres and Harvey study the case where y_t is non-negative for all t, and consider a number of non-negative distributions for F, including GG, GB, and log-normal. However, the application of the DCS need not be limited to non-negative observations: the Beta-t-EGARCH model of Harvey and Chakravarty (2008) considers the case where y_t|F_{t−1} follows Student's t-distribution in the context of return volatility.

The notable features of the DCS model in (14) include:

• the asymptotic properties of the maximum likelihood estimators (MLEs) of the model. Andres and Harvey show the consistency and asymptotic normality of the MLEs for a number of useful distributions;

• the conditional score variable u_t, whose specification depends on the conditional distribution of y_t. If F is characterized by the GG distribution, u_t is a linear function of a gamma-distributed random variable; in contrast, if F is characterized by the GB distribution, u_t is a linear function of a beta-distributed random variable (see Appendix A). In both cases, u_t is iid given F_{t−1}. The sensitivity of the dynamics of λ_{t|t−1} to the effects of extreme observations is determined by F, as we will see later;

• the use of the exponential link function, which means that we do not need to impose parameter restrictions on δ, φ, and κ to ensure that α_{t|t−1} is positive;

• the availability of analytical expressions for the unconditional moments and optimal multi-step forecasts of y_t whenever the corresponding (conditional) moments of ε_t and the mgf of u_t exist.

The analytical expression for the m-th moment of y_t is readily available in terms of the m-th moment of ε_t and the mgf of u_t (whenever they are well defined) as

\[ \mathrm{E}[y_t^m] = \mathrm{E}[\varepsilon_t^m]\,\mathrm{E}[e^{m\lambda_{t|t-1}}] = \mathrm{E}[\varepsilon_t^m]\, e^{m\omega} \prod_{j=1}^{\infty} \mathrm{E}[e^{m\psi_j u_{t-j}}], \tag{15} \]

where the second equality uses the MA(∞) representation of λ_{t|t−1} with ψ_j = κφ^{j−1} for each j ∈ ℕ_{>0}. (Note that if u_t is a function of a gamma-distributed random variable, the mgf of u_t exists only on a restricted parameter space.) We can also derive the expression for the autocorrelation function of y_t using (15).

Given the observations up to time T, we can derive the expression for the optimal[30] l-step predictor E[y_{T+l}|F_T] for l ∈ ℕ_{>0} by writing λ_{T+l|T+l−1} as

\[ \lambda_{T+l|T+l-1} = \omega + \sum_{k=1}^{l-1} \psi_k u_{T+l-k} + \sum_{k=l}^{\infty} \psi_k u_{T+l-k} = \lambda_{T+l|T} + \sum_{k=1}^{l-1} \psi_k u_{T+l-k}, \tag{16} \]

where λ_{T+l|T} = ω + Σ_{k=l}^{∞} ψ_k u_{T+l−k} is the optimal l-step forecast of λ_{T+l|T+l−1}. (16) gives

\[ \mathrm{E}[e^{\lambda_{T+l|T+l-1}} \mid \mathcal{F}_T] = e^{\lambda_{T+l|T}} \prod_{k=1}^{l-1} \mathrm{E}_T[e^{\psi_k u_{T+l-k}}]. \tag{17} \]

Thus we can derive the expression for E[y_{T+l}|F_T] using (17) and the expression for E[ε_{T+l}|F_T] (whenever these moments exist under F). Moreover, the mean squared error (MSE) of E[y_{T+l}|F_T] can be similarly derived using

\[ \mathrm{Var}[e^{\lambda_{T+l|T+l-1}} \mid \mathcal{F}_T] = \mathrm{E}[e^{2\lambda_{T+l|T+l-1}} \mid \mathcal{F}_T] - \mathrm{E}[e^{\lambda_{T+l|T+l-1}} \mid \mathcal{F}_T]^2 = e^{2\lambda_{T+l|T}} \left[ \prod_{j=1}^{l-1} \mathrm{E}_T(e^{2\psi_j u_{T+l-j}}) - \prod_{j=1}^{l-1} \mathrm{E}_T[e^{\psi_j u_{T+l-j}}]^2 \right]. \]

The existence of the mgf of u_t is essential in these expressions. Simulating the l-step predictive distribution for the DCS model is easy because, by (16), it is the distribution of

\[ y_{T+l} = \varepsilon_{T+l}\, e^{\lambda_{T+l|T}} \prod_{j=1}^{l-1} e^{\psi_j u_{T+l-j}}. \]

Finally, when F is characterized by the GB distribution, the variable b_t(ξ, ζ) we defined in Appendix A is bounded between 0 and 1 by the properties of the beta distribution. This means that we have

\[ -\nu\xi \le u_t \le \nu\zeta. \tag{18} \]

These bounds on u_t determine the sensitivity of the dynamics of λ_{t|t−1} to the effects of extreme observations. If F is characterized by the GG distribution, u_t > −νγ and u_t is unbounded above by the properties of the gamma distribution.

[30] This optimality is in terms of the minimum mean square error (MMSE).
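The recursion in (14) is straightforward to simulate. The sketch below is illustrative rather than the authors' implementation: it uses the GG case of Appendix A.2 (so u_t = νg_t(ν) − νγ) with arbitrarily chosen parameter values, and starts the filter at its unconditional mean ω = δ/(1 − φ).

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5_000
gam, nu = 1.0, 1.2                    # GG shapes (Weibull when gam = 1)
delta, phi, kappa = 0.05, 0.95, 0.05  # arbitrary dynamic parameters

lam = np.empty(T + 1)
lam[0] = delta / (1.0 - phi)          # start at the unconditional mean omega
y = np.empty(T)
for t in range(T):
    eps = rng.gamma(gam) ** (1.0 / nu)                   # eps_t ~ standard GG
    y[t] = eps * np.exp(lam[t])
    u = nu * (y[t] * np.exp(-lam[t])) ** nu - nu * gam   # conditional score
    lam[t + 1] = delta + phi * lam[t] + kappa * u        # recursion (14)

print(lam[:T].mean())   # fluctuates around omega = delta / (1 - phi) = 1.0
```

Because E[u_t] = 0, the filtered λ_{t|t−1} is mean-reverting around ω, as the stationarity discussion above implies.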

C    Time-varying cubic spline of Harvey and Koopman (1993)

In this section, we use the time indices defined in Section 3.3 and formally explain the construction of the daily cubic spline of Section 3.4.1. Most of our notation is consistent with Harvey and Koopman. As before, we use the increasing sequence of times 0 = τ₀ < τ₁ < ⋯ < τ_k = I for some k < I to denote the knots (or mesh) along the time axis of our spline, where τ_j ∈ {1, …, I − 1} for each j = 1, …, k − 1. The y-coordinates (heights) of the knots are denoted γ = (γ₀, γ₁, …, γ_k). We denote the distance between knots along the time axis by h_j = τ_j − τ_{j−1} for j = 1, …, k. Our cubic spline function g : [τ₀, τ_k] → ℝ is a piecewise function of the form

\[ g(\tau) = \sum_{j=1}^{k} g_j(\tau)\, 1\!\{\tau \in [\tau_{j-1}, \tau_j]\}, \qquad \forall\, \tau \in [\tau_0, \tau_k], \]

where each function g_j : [τ_{j−1}, τ_j] → ℝ, j = 1, …, k, is a polynomial of order up to three. The g_j for j = 1, …, k must satisfy the following three conditions.

1. Continuity: the function g must be continuous at each knot (τ_j, γ_j), so that g_j(τ_j) = γ_j and g_j(τ_{j−1}) = γ_{j−1} for all j = 1, …, k. This means that we have, for j = 1, …, k,

\[ g_j(\tau_{j-1}) = g_{j-1}(\tau_{j-1}) \quad \text{and} \quad g_j'(\tau_{j-1}) = g_{j-1}'(\tau_{j-1}). \tag{19} \]

2. Order: g_j''(·) is a linear function on [τ_{j−1}, τ_j] for j = 1, …, k. This means that we have

\[ g_j''(\tau) = a_{j-1} + \frac{\tau - \tau_{j-1}}{h_j}(a_j - a_{j-1}) = \frac{\tau_j - \tau}{h_j}\, a_{j-1} + \frac{\tau - \tau_{j-1}}{h_j}\, a_j \tag{20} \]

for τ ∈ [τ_{j−1}, τ_j] and j = 1, …, k, where a₀ = g₁''(τ₀) and a_j = g_j''(τ_j) for j = 1, …, k.

3. Periodicity: to make g a periodic function of time, g₁ and g_k (at the beginning and the end of one period) must satisfy

\[ \gamma_0 = \gamma_k, \qquad g_1'(\tau_0) = g_k'(\tau_k), \qquad \text{and} \qquad g_1''(\tau_0) = g_k''(\tau_k), \]

so that a₀ = a_k.

We integrate (20) twice with respect to τ to find the expressions for g_j' and g_j for j = 1, …, k, recovering the integration constants using the continuity condition (19). From this, we obtain, for j = 1, …, k and τ ∈ [τ_{j−1}, τ_j],

\[ g_j'(\tau) = -\left[ \frac{(\tau_j - \tau)^2}{2 h_j} - \frac{h_j}{6} \right] a_{j-1} + \left[ \frac{(\tau - \tau_{j-1})^2}{2 h_j} - \frac{h_j}{6} \right] a_j + \frac{\gamma_j - \gamma_{j-1}}{h_j}, \tag{21} \]

\[ g_j(\tau) = (\tau_j - \tau)\left[ \frac{(\tau_j - \tau)^2 - h_j^2}{6 h_j}\, a_{j-1} + \frac{\gamma_{j-1}}{h_j} \right] + (\tau - \tau_{j-1})\left[ \frac{(\tau - \tau_{j-1})^2 - h_j^2}{6 h_j}\, a_j + \frac{\gamma_j}{h_j} \right] = \boldsymbol{r}_j(\tau) \cdot \boldsymbol{\gamma} + \boldsymbol{s}_j(\tau) \cdot \boldsymbol{a}, \tag{22} \]

where we used the notation a = (a₀, a₁, …, a_k)^⊤, and r_j(τ) and s_j(τ) are (k + 1) × 1 vectors defined as

\[ \boldsymbol{r}_j(\tau) = \left( 0, \ldots, 0,\; \frac{\tau_j - \tau}{h_j},\; \frac{\tau - \tau_{j-1}}{h_j},\; 0, \ldots, 0 \right)^{\!\top}, \]
\[ \boldsymbol{s}_j(\tau) = \left( 0, \ldots, 0,\; (\tau_j - \tau)\frac{(\tau_j - \tau)^2 - h_j^2}{6 h_j},\; (\tau - \tau_{j-1})\frac{(\tau - \tau_{j-1})^2 - h_j^2}{6 h_j},\; 0, \ldots, 0 \right)^{\!\top}. \tag{23} \]

The non-zero elements of the two vectors in (23) are at the j-th and (j + 1)-th entries.

By the periodicity condition, we have γ₀ = γ_k and a₀ = a_k, so that γ₀ and a₀ become redundant during estimation. Moreover, the conditions for g_j' in (19) applied to (21) give

\[ \frac{h_j}{h_j + h_{j+1}}\, a_{j-1} + 2 a_j + \frac{h_{j+1}}{h_j + h_{j+1}}\, a_{j+1} = \frac{6\gamma_{j-1}}{h_j(h_j + h_{j+1})} - \frac{6\gamma_j}{h_j h_{j+1}} + \frac{6\gamma_{j+1}}{h_{j+1}(h_j + h_{j+1})} \]

for j = 2, …, k − 1, and

\[ \frac{h_1}{h_1 + h_2}\, a_k + 2 a_1 + \frac{h_2}{h_1 + h_2}\, a_2 = \frac{6\gamma_k}{h_1(h_1 + h_2)} - \frac{6\gamma_1}{h_1 h_2} + \frac{6\gamma_2}{h_2(h_1 + h_2)}, \qquad j = 1, \]

\[ \frac{h_k}{h_k + h_1}\, a_{k-1} + 2 a_k + \frac{h_1}{h_k + h_1}\, a_1 = \frac{6\gamma_{k-1}}{h_k(h_k + h_1)} - \frac{6\gamma_k}{h_k h_1} + \frac{6\gamma_1}{h_1(h_k + h_1)}, \qquad j = k. \]

From these, we obtain k equations for the k "unknowns" a₁, …, a_k. Using the notation γ† = (γ₁†, …, γ_k†)^⊤ = (γ₁, …, γ_k)^⊤ and a† = (a₁†, …, a_k†)^⊤ = (a₁, …, a_k)^⊤, we write this system of equations in matrix form as P a† = Q γ†, where

\[ P = \begin{pmatrix}
2 & \frac{h_2}{h_1+h_2} & 0 & \cdots & 0 & \frac{h_1}{h_1+h_2} \\
\frac{h_2}{h_2+h_3} & 2 & \frac{h_3}{h_2+h_3} & 0 & \cdots & 0 \\
0 & \frac{h_3}{h_3+h_4} & 2 & \frac{h_4}{h_3+h_4} & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \frac{h_{k-1}}{h_{k-1}+h_k} & 2 & \frac{h_k}{h_{k-1}+h_k} \\
\frac{h_1}{h_1+h_k} & 0 & \cdots & 0 & \frac{h_k}{h_1+h_k} & 2
\end{pmatrix}, \]

\[ Q = \begin{pmatrix}
-\frac{6}{h_1 h_2} & \frac{6}{h_2(h_1+h_2)} & 0 & \cdots & 0 & \frac{6}{h_1(h_1+h_2)} \\
\frac{6}{h_2(h_2+h_3)} & -\frac{6}{h_2 h_3} & \frac{6}{h_3(h_2+h_3)} & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & \cdots & 0 & \frac{6}{h_{k-1}(h_{k-1}+h_k)} & -\frac{6}{h_{k-1} h_k} & \frac{6}{h_k(h_{k-1}+h_k)} \\
\frac{6}{h_1(h_k+h_1)} & 0 & \cdots & 0 & \frac{6}{h_k(h_k+h_1)} & -\frac{6}{h_k h_1}
\end{pmatrix}. \]

For a non-singular P, we have a† = P^{−1} Q γ†. Then (22) becomes

\[ g_j(\tau) = \boldsymbol{w}_j(\tau) \cdot \boldsymbol{\gamma}^{\dagger}, \qquad \boldsymbol{w}_j(\tau)^{\top} = \boldsymbol{r}_j(\tau)^{\top} + \boldsymbol{s}_j(\tau)^{\top} P^{-1} Q, \tag{24} \]

for τ ∈ [τ_{j−1}, τ_j], where r_j(τ) and s_j(τ) are now k × 1 vectors by the periodicity condition, with

\[ \boldsymbol{r}_1(\tau) = \left( \frac{\tau - \tau_0}{h_1},\; 0, \ldots, 0,\; \frac{\tau_1 - \tau}{h_1} \right)^{\!\top}, \qquad \boldsymbol{s}_1(\tau) = \left( (\tau - \tau_0)\frac{(\tau - \tau_0)^2 - h_1^2}{6 h_1},\; 0, \ldots, 0,\; (\tau_1 - \tau)\frac{(\tau_1 - \tau)^2 - h_1^2}{6 h_1} \right)^{\!\top}, \]

and r_j(τ) and s_j(τ) for j = 2, …, k as defined in (23), but with the non-zero elements shifted to the (j − 1)-th and j-th entries. Finally, using the expression in (24), we obtain our cubic spline as

\[ s_\tau = g(\tau) = \sum_{j=1}^{k} 1\!\{\tau \in [\tau_{j-1}, \tau_j]\}\, \boldsymbol{w}_j(\tau) \cdot \boldsymbol{\gamma}^{\dagger}, \qquad \forall\, \tau \in [\tau_0, \tau_k]. \tag{25} \]

The elements of γ† are parameters of the model to be estimated. For the parameters to be identified, we impose a zero-sum constraint on the elements of γ†. This constraint is

\[ \sum_{\tau \in [\tau_0, \tau_k]} s_\tau = \sum_{\tau \in [\tau_0, \tau_k]} \sum_{j=1}^{k} 1\!\{\tau \in [\tau_{j-1}, \tau_j]\}\, \boldsymbol{w}_j(\tau) \cdot \boldsymbol{\gamma}^{\dagger} = \boldsymbol{w}_* \cdot \boldsymbol{\gamma}^{\dagger} = 0, \]

where we used the notation

\[ \boldsymbol{w}_* = (w_{*1}, \ldots, w_{*k})^{\top} = \sum_{\tau \in [\tau_0, \tau_k]} \sum_{j=1}^{k} 1\!\{\tau \in [\tau_{j-1}, \tau_j]\}\, \boldsymbol{w}_j(\tau). \]

We can impose this condition by setting

\[ \gamma_k^{\dagger} = -\sum_{i=1}^{k-1} w_{*i}\, \gamma_i^{\dagger} \big/ w_{*k}. \]

Then (25) becomes

\[ s_\tau = \sum_{j=1}^{k} 1\!\{\tau \in [\tau_{j-1}, \tau_j]\} \sum_{i=1}^{k-1} \left[ w_{ji}(\tau) - \frac{w_{jk}(\tau)\, w_{*i}}{w_{*k}} \right] \gamma_i^{\dagger} = \sum_{j=1}^{k} 1\!\{\tau \in [\tau_{j-1}, \tau_j]\}\, \boldsymbol{z}_j(\tau) \cdot \boldsymbol{\gamma}^{\dagger} \]

for all τ ∈ [τ₀, τ_k], where w_{ji}(τ) denotes the i-th element of w_j(τ), and the i-th element of z_j(τ) is

\[ z_{ji}(\tau) = \begin{cases} w_{ji}(\tau) - w_{jk}(\tau)\, w_{*i}/w_{*k} & i \ne k \\ 0 & i = k \end{cases} \tag{26} \]

for τ ∈ [τ_{j−1}, τ_j] and i, j = 1, …, k. When estimating the model, it is convenient to compute w_* using the equation

\[ \boldsymbol{w}_*^{\top} = \boldsymbol{r}_*^{\top} + \boldsymbol{s}_*^{\top} P^{-1} Q, \]

where r_* and s_* are k × 1 vectors defined as

\[ \boldsymbol{r}_* = \left( \frac{\tau_2 - \tau_0}{2},\; \ldots,\; \frac{\tau_k - \tau_{k-2}}{2},\; \frac{\tau_1 - \tau_0 + \tau_k - \tau_{k-1}}{2} \right)^{\!\top}, \]

\[ \boldsymbol{s}_* = \left( \frac{\tau_2 - \tau_0 - h_2^3 - h_1^3}{24},\; \ldots,\; \frac{\tau_k - \tau_{k-2} - h_k^3 - h_{k-1}^3}{24},\; \frac{h_1(1 - h_1^2) + h_k(1 - h_k^2)}{24} \right)^{\!\top}. \]
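The linear system P a† = Q γ† and the weights in (24) translate directly into code. The following sketch is our own illustration with hypothetical knot values, not the implementation used in the paper: it builds P and Q with wrap-around (mod k) indices and evaluates the periodic spline; by construction it reproduces the knot heights at the knots, as the continuity conditions require.

```python
import numpy as np

def periodic_spline(knots, gamma, taus):
    """Evaluate the periodic cubic spline at the points `taus`, given knot
    times knots = (tau_0, ..., tau_k) and heights gamma = (gamma_1, ..., gamma_k)
    (gamma_0 = gamma_k by periodicity)."""
    k = len(knots) - 1
    h = np.diff(knots)                               # h_j = tau_j - tau_{j-1}
    P = np.zeros((k, k))
    Q = np.zeros((k, k))
    for j in range(1, k + 1):                        # row for a_j; indices wrap mod k
        hj, hj1 = h[j - 1], h[j % k]
        r, lo, hi = j - 1, (j - 2) % k, j % k
        P[r, r] += 2.0
        P[r, lo] += hj / (hj + hj1)
        P[r, hi] += hj1 / (hj + hj1)
        Q[r, lo] += 6.0 / (hj * (hj + hj1))
        Q[r, r] += -6.0 / (hj * hj1)
        Q[r, hi] += 6.0 / (hj1 * (hj + hj1))
    PinvQ = np.linalg.solve(P, Q)                    # so that a = P^{-1} Q gamma
    out = np.empty(len(taus))
    for i, t in enumerate(taus):
        j = int(min(np.searchsorted(knots, t, side="right"), k))  # t in [tau_{j-1}, tau_j]
        hj = h[j - 1]
        u, v = knots[j] - t, t - knots[j - 1]
        rj = np.zeros(k)
        sj = np.zeros(k)
        rj[(j - 2) % k] += u / hj                    # weight on gamma_{j-1}
        rj[j - 1] += v / hj                          # weight on gamma_j
        sj[(j - 2) % k] += u * (u * u - hj * hj) / (6 * hj)
        sj[j - 1] += v * (v * v - hj * hj) / (6 * hj)
        out[i] = (rj + sj @ PinvQ) @ gamma           # w_j(tau) . gamma, eq. (24)
    return out

# hypothetical knot mesh and heights
knots = np.array([0.0, 1.0, 2.5, 4.0, 6.0])
gamma = np.array([0.5, -1.0, 0.8, -0.3])
print(periodic_spline(knots, gamma, knots[1:]))      # recovers the knot heights
```

The value at τ₀ equals γ_k, and the first derivative matches across the period boundary, reflecting the periodicity condition a₀ = a_k.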

[Figure 20 – Decay rate of the (1 − L)^d coefficients: |a_k| plotted against k = 0, …, 9000.]
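The coefficients a_k plotted in Figure 20 satisfy the recursion a_k = a_{k−1}(k − 1 − d)/k with a₀ = 1, which follows from a_k = (1/k!)∏_{j=0}^{k−1}(j − d) in Appendix D. A quick way to reproduce the slow decay, with an arbitrarily chosen d:

```python
import numpy as np

d = 0.4                        # an arbitrary fractional-integration parameter
K = 9000
a = np.empty(K + 1)
a[0] = 1.0
for k in range(1, K + 1):      # a_k = a_{k-1} * (k - 1 - d) / k
    a[k] = a[k - 1] * (k - 1 - d) / k
print(np.abs(a[[1, 100, 1000, 9000]]))   # slow, hyperbolic decay
```

Doubling k shrinks |a_k| by only a factor of roughly 2^{−(1+d)}, in contrast to the geometric decay of short-memory models.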

The static spline in (25) becomes dynamic by letting γ† be time-varying according to

\[ s_{t,\tau} = \sum_{j=1}^{k} 1\!\{\tau \in [\tau_{j-1}, \tau_j]\}\, \boldsymbol{w}_j(\tau) \cdot \boldsymbol{\gamma}^{\dagger}_{t,\tau}, \qquad \boldsymbol{\gamma}^{\dagger}_{t,\tau} = \boldsymbol{\gamma}^{\dagger}_{t,\tau-1} + \boldsymbol{\kappa}^*\, u_{t,\tau-1}\, 1\!\{y_{t,\tau-1} > 0\}, \tag{27} \]

for τ = 1, …, I and t = 1, …, T, where κ* = (κ₁*, …, κ_k*)^⊤ is a vector of parameters. Harvey and Koopman use a set of contemporaneous disturbances instead of the lagged score. The dynamic spline (27) still needs to sum to zero over one complete period for the parameters to be identified. That is, (27) must satisfy

\[ \sum_{\tau=1}^{I} s_{t,\tau} = \boldsymbol{w}_* \cdot \boldsymbol{\gamma}^{\dagger}_{t,\tau} = 0, \qquad t = 1, \ldots, T. \]

We can ensure that this constraint holds by setting

\[ \boldsymbol{w}_* \cdot \boldsymbol{\gamma}^{\dagger}_{1,0} = 0 \qquad \text{and} \qquad \boldsymbol{w}_* \cdot \boldsymbol{\kappa}^* = 0. \]

These conditions, in turn, become zero-sum constraints on the elements of γ†_{1,0} and κ*. We can set

\[ \gamma^{\dagger}_{k;1,0} = -\frac{1}{w_{*k}} \sum_{i=1}^{k-1} w_{*i}\, \gamma^{\dagger}_{i;1,0}, \qquad \kappa^*_k = -\frac{1}{w_{*k}} \sum_{i=1}^{k-1} w_{*i}\, \kappa^*_i, \]

where γ†_{i;1,0} denotes the i-th element of γ†_{1,0}. Finally, using the definition of z_j(τ) for j = 1, …, k in (26), we can write (27) as

\[ s_{t,\tau} = \sum_{j=1}^{k} 1\!\{\tau \in [\tau_{j-1}, \tau_j]\}\, \boldsymbol{z}_j(\tau) \cdot \boldsymbol{\gamma}^{\dagger}_{t,\tau}, \qquad \boldsymbol{\gamma}^{\dagger}_{t,\tau} = \boldsymbol{\gamma}^{\dagger}_{t,\tau-1} + \boldsymbol{\kappa}^*\, u_{t,\tau-1}\, 1\!\{y_{t,\tau-1} > 0\}, \]

for τ = 1, …, I and t = 1, …, T.
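The identification constraints can be imposed exactly as described, by solving for the last element of γ†_{1,0} and κ*; the score-driven update then preserves w_* · γ†_{t,τ} = 0 automatically because w_* · κ* = 0. A minimal sketch with hypothetical weights and starting values:

```python
import numpy as np

def constrain(v, w_star):
    """Impose the zero-sum identification constraint w_star . v = 0 by
    solving for the last element, as in the text."""
    v = np.asarray(v, dtype=float).copy()
    v[-1] = -np.dot(w_star[:-1], v[:-1]) / w_star[-1]
    return v

def update_knots(gamma, kappa_star, u_prev, y_prev):
    """One step of the score-driven knot recursion in (27); the update is
    skipped when the previous observation is zero."""
    return gamma + kappa_star * u_prev if y_prev > 0 else gamma

# illustrative (hypothetical) weights and starting values
w_star = np.array([1.0, 2.0, 1.5, 0.5])
gamma0 = constrain([0.3, -0.2, 0.1, 0.0], w_star)
kappa = constrain([0.01, 0.02, -0.01, 0.0], w_star)
print(w_star @ gamma0, w_star @ update_knots(gamma0, kappa, 1.7, 5.0))  # both ~0
```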

D    Fractional integration

Consider the following fractionally integrated process:

\[ (1 - L)^d y_t = \varepsilon_t, \qquad t = 1, \ldots, T, \tag{28} \]

where d is a fraction, L is the lag operator, and (ε_t)_{t=1}^{T} is a stationary and invertible process. This process is stationary and invertible if |d| < 0.5 (Granger and Joyeux (1980), and Hosking (1981)).[31] Using the binomial formula, we can expand (1 − L)^d as

\[ (1 - L)^d = \sum_{k=0}^{\infty} \frac{\Gamma(k - d)}{\Gamma(-d)\,\Gamma(k + 1)}\, L^k = 1 + \sum_{k=1}^{\infty} \left( \frac{1}{k!} \prod_{j=0}^{k-1} (j - d) \right) L^k = 1 + \sum_{k=1}^{\infty} a_k L^k, \tag{29} \]

where we have used a_k ≡ (1/k!)∏_{j=0}^{k−1}(j − d) in the last equality. Although |a_k| → 0 as k → ∞, the convergence is very slow, as shown in Figure 20. These slowly decaying coefficients a_k implied by (1 − L)^d induce long memory in y_t.

Harvey (1993) and Taylor (2005) suggest the method for estimating d in the frequency domain developed by Geweke and Porter-Hudak (1983). The procedure is as follows. First, we rewrite (28) as y_t = (1 − L)^{−d} ε_t. Denote the power spectrum of (ε_t)_{t=1}^{T} by f_ε(x), where x ∈ [−π, π] is a frequency in radians. The power spectrum of y_t, denoted f_y(x), is

\[ f_y(x) = |1 - e^{-ix}|^{-2d} f_\varepsilon(x) = [2(1 - \cos x)]^{-d} f_\varepsilon(x) = \left[ 4 \sin^2\frac{x}{2} \right]^{-d} f_\varepsilon(x), \qquad x \in [-\pi, \pi]. \]

Take the logarithm of this equation and rearrange on both sides using the sample spectral density[32] I(x_j), computed at frequencies x_j = 2πj/T for j = 1, …, m with m ≪ T, to get

\[ \log I(x_j) = \log f_\varepsilon(0) - d \log\left[ 4 \sin^2\frac{x_j}{2} \right] + \log\left[ \frac{f_\varepsilon(x_j)}{f_\varepsilon(0)} \right] + \log\left[ \frac{I(x_j)}{f_y(x_j)} \right]. \]

The last term is asymptotically iid with a constant mean and variance π²/6 and can be treated as the disturbance term. The third term is negligible for frequencies near zero. Thus we have a regression model:

\[ \log I(x_j) = \beta_0 + \beta_1 \log\left[ 4 \sin^2\frac{x_j}{2} \right] + v_j, \]

where v_j is assumed to be iid with zero mean and variance π²/6. Regressing log I(x_j) on log[4 sin²(x_j/2)] and taking minus β̂₁ yields d̂. Consistency and asymptotic normality of d̂ when d < 0 are shown by Geweke and Porter-Hudak (1983), and consistency for 0 < d < 0.5 is shown by Robinson (1990). If m is chosen appropriately, d̂ has the limiting distribution

\[ \left( \hat{d} - d \right) \big/ \sqrt{\mathrm{Var}(\hat{d})} \;\xrightarrow{\,d\,}\; \mathrm{N}(0, 1), \]

where Var(d̂) is obtained from the regression (see Geweke and Porter-Hudak (1983) or Baillie (1996)). As noted in Taylor (2005) and Baillie (1996), one main issue is the choice of m, as this is associated with a bias-accuracy trade-off. Taylor (2005) suggests that a popular choice is to set m = T^θ, where θ is between 0.5 and 0.8. Other methods for choosing m are suggested in Baillie (1996). For this study, we chose m = ⌊T^{0.5}⌋, which was also used by Diebold and Rudebusch (1989).[33]

[31] Nonetheless, as emphasized in Bollerslev and Mikkelsen (1996), shocks to optimal forecasts over an out-of-sample period will dissipate for all values of d < 1. For d > 0.5, Bollerslev and Mikkelsen (1996) suggest that one could simulate pre-sample values instead of fixing the pre-sample values at their unconditional counterparts. However, they note that employing a non-parametric estimation procedure of this kind to determine the pre-sample values does not necessarily lead to notably different estimation results from the ones obtained by setting them at their unconditional counterparts.
[32] The sample spectral density is defined as

\[ I(x) = \frac{1}{2\pi}\left[ c(0) + 2 \sum_{\tau=1}^{T-1} c(\tau)\cos(x\tau) \right], \qquad x \in [-\pi, \pi], \]

where c(τ) denotes the sample autocovariance.
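The GPH regression can be sketched in a few lines. Following the text, the periodogram is computed at the Fourier frequencies x_j = 2πj/T for j = 1, …, m with m = ⌊T^θ⌋, and d̂ = −β̂₁. This is an illustrative implementation, not the code used for the study; at the Fourier frequencies, the FFT-based periodogram of the demeaned series coincides with the sample spectral density in footnote 32.

```python
import numpy as np

def gph_estimate(y, theta=0.5):
    """Log-periodogram (GPH) estimate of d with m = floor(T**theta) frequencies."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    m = int(np.floor(T ** theta))
    x = 2 * np.pi * np.arange(1, m + 1) / T        # Fourier frequencies x_j = 2*pi*j/T
    # periodogram of the demeaned series at x_1, ..., x_m
    I = np.abs(np.fft.fft(y - y.mean())[1:m + 1]) ** 2 / (2 * np.pi * T)
    X = np.log(4 * np.sin(x / 2) ** 2)             # regressor log[4 sin^2(x_j / 2)]
    Xc = X - X.mean()
    beta1 = Xc @ (np.log(I) - np.log(I).mean()) / (Xc @ Xc)   # OLS slope
    return -beta1                                  # d_hat = -beta_1

rng = np.random.default_rng(4)
print(gph_estimate(rng.standard_normal(4096)))     # near 0 for white noise
```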

E    Kolmogorov-Smirnov: simulated critical values

Given a set of n iid observations x₁, …, x_n, we can compute the Kolmogorov-Smirnov statistic using a cumulative distribution function F₀ : S_X → [0, 1] defined over the support S_X with a vector of parameters β (which may or may not include the location and scale factors) as

\[ D_n = \sup_{x \in S_X} |F_n(x) - F_0(x)|, \tag{30} \]

where F_n : ℝ → [0, 1] is the empirical cumulative distribution function of x₁, …, x_n, defined as

\[ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1\!\{x_i \le x\}, \qquad x \in \mathbb{R}. \]

To perform the KS test, we use KS_n = √n D_n, which adjusts (30) for the finite sample size. Given the iid observations x₁, …, x_n, if we want to test the null hypothesis H₀ : x₁ ∼ F₀ against the alternative H₁ : x₁ ≁ F₀ without specifying the parameter values in β, the asymptotic distribution of KS_n needs to be simulated. We tabulate some of the simulated critical values of the KS statistics (D_n and KS_n) for the two cases, F₀ ∼ log-logistic and F₀ ∼ Burr, in Table 7. For the log-logistic, we assume that ν and the scale parameter α are estimated from the data by the method of maximum likelihood. For the Burr, we assume that ν, ζ, and α are estimated from the data. In both cases, we simulate the set of n observations 5,000 times and find the sample quantiles. The critical values in these tables may be compared with the simulated critical values for the Gaussian case in Lilliefors (1967).

[33] In this appendix, T denotes the total sample size. Recall that, in the body of this paper before the appendices, the total sample size is denoted T × I, where T denotes the total number of sampling days and I is the sampling (aggregation) frequency.
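As a rough illustration of the simulation design behind Table 7 (not the code used in the paper), the log-logistic panel can be approximated with SciPy, whose `fisk` distribution is the standard log-logistic with shape c = ν; the sample size and replication count below are kept small for speed.

```python
import numpy as np
from scipy import stats

def ks_critical_values(n, B=2000, nu=2.0, alpha=1.0, seed=0):
    """Simulated critical values of KS_n = sqrt(n) * D_n under a fitted
    log-logistic (Fisk) null, re-estimating (nu, alpha) by ML per replication."""
    rng = np.random.default_rng(seed)
    ks = np.empty(B)
    for b in range(B):
        x = stats.fisk.rvs(nu, scale=alpha, size=n, random_state=rng)
        c_hat, _, s_hat = stats.fisk.fit(x, floc=0)   # ML with location fixed at 0
        d_n = stats.kstest(x, stats.fisk(c_hat, loc=0, scale=s_hat).cdf).statistic
        ks[b] = np.sqrt(n) * d_n
    return np.quantile(ks, [0.90, 0.95, 0.99])

print(ks_critical_values(50, B=200))   # compare with the nearby rows of Table 7
```

The quantiles sit well below the known-parameter Kolmogorov limits, mirroring the Lilliefors-type effect of estimating the parameters from the same data.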

Log-logistic:

                         D_n                       KS_n
                 Level of significance      Level of significance
Sample size (n)   0.10    0.05    0.01      0.10    0.05    0.01
       4          0.31    0.33    0.37      0.62    0.66    0.75
       5          0.29    0.31    0.34      0.64    0.69    0.75
       6          0.27    0.28    0.32      0.65    0.70    0.78
       7          0.25    0.27    0.30      0.66    0.71    0.80
       8          0.23    0.25    0.28      0.66    0.71    0.81
       9          0.22    0.24    0.27      0.67    0.72    0.82
      10          0.21    0.23    0.26      0.68    0.73    0.82
      11          0.21    0.22    0.25      0.68    0.73    0.83
      12          0.20    0.21    0.24      0.68    0.73    0.83
      13          0.19    0.20    0.24      0.68    0.74    0.85
      14          0.18    0.20    0.23      0.69    0.74    0.85
      15          0.18    0.19    0.22      0.70    0.75    0.86
      16          0.17    0.18    0.21      0.69    0.74    0.84
      17          0.17    0.18    0.21      0.69    0.75    0.87
      18          0.16    0.18    0.20      0.70    0.75    0.85
      19          0.16    0.17    0.20      0.70    0.75    0.87
      20          0.16    0.17    0.19      0.70    0.76    0.85
      25          0.14    0.15    0.18      0.70    0.77    0.88
      30          0.13    0.14    0.16      0.71    0.78    0.88
    1000          0.02    0.03    0.03      0.75    0.81    0.91
    5000          0.01    0.01    0.01      0.76    0.83    0.92
   10000          0.01    0.01    0.01      0.75    0.81    0.93
   15000          0.01    0.01    0.01      0.74    0.80    0.94
   20000          0.01    0.01    0.01      0.75    0.81    0.92

Burr:

                         D_n                       KS_n
                 Level of significance      Level of significance
Sample size (n)   0.10    0.05    0.01      0.10    0.05    0.01
    1000          0.02    0.02    0.03      0.73    0.79    0.89
    5000          0.01    0.01    0.01      0.72    0.78    0.89
   10000          0.01    0.01    0.01      0.73    0.79    0.91
   15000          0.01    0.01    0.01      0.73    0.79    0.90
   20000          0.01    0.01    0.01      0.73    0.79    0.91

Table 7 – Simulated critical values of the Kolmogorov-Smirnov statistics under the log-logistic (top) and Burr (bottom) distributions. The statistics are simulated 5,000 times. Unknown parameters estimated by the method of maximum likelihood are ν and α for the log-logistic and ν, ζ, and α for the Burr.


F    Estimated coefficients of other Models in Table 2

[Table 8 reports, for each model, the estimated dynamic parameters (κ_µ, the φ's, and the κ_η's), the spline parameters κ*₁, …, κ*_k and γ†_{1;1,0}, …, γ†_{k;1,0}, and δ, ν, ζ, and p, together with their standard errors, for Models 1, 3, and 4 fitted to IBM1m and IBM30s, and for IBM30s specifications with a larger number of knots.]

Table 8 – Estimated coefficients of other Models in Table 2. Standard errors are computed numerically. See Table 2 for the model specifications.
