Communications in Statistics — Theory and Methods, 47(13), 3138–3160. https://doi.org/10.1080/03610926.2017.1348523
A characteristic function-based approach to approximate maximum likelihood estimation

M. Bee^a and L. Trapin^b

^a Department of Economics and Management, University of Trento, Trento, Italy; ^b Scuola Normale Superiore, Pisa, Italy
ABSTRACT
The choice of the summary statistics in approximate maximum likelihood is often a crucial issue. We develop a criterion for choosing the most effective summary statistic and then focus on the empirical characteristic function. In the iid setting, the approximating posterior distribution converges to the approximate distribution of the parameters conditional upon the empirical characteristic function. Simulation experiments suggest that the method is often preferable to numerical maximum likelihood. In a time-series framework, no optimality result can be proved, but the simulations indicate that the method is effective in small samples.

ARTICLE HISTORY
Received March 2017; Accepted June 2017

KEYWORDS
Characteristic function; intractable likelihood; κ-nearest neighbor entropy; summary statistics
1. Introduction

Approximate maximum likelihood estimation (AMLE) is a simulation-based method developed by Rubio and Johansen (2013) for approximating maximum likelihood estimators (MLEs) without using the likelihood function. It is most commonly applied to models with intractable likelihood, which, broadly speaking, can be classified into two main categories. The first one contains distributions whose likelihood is not known in closed form, such as Tukey's gh and the class of stable distributions; the second one comprises models for which the computation of the likelihood is too expensive, typically because the density includes an intractable normalizing constant. Instances of the latter category are many conditionally specified spatial models and several distributions for spherical data used in directional statistics. These two cases are, however, not exhaustive, as there are many other examples of models with intractable likelihoods, such as stochastic processes with nonstandard dependence structure or some other kind of complexity.

As AMLE bears a close resemblance to approximate Bayesian computation (ABC), a key role is played by the summary statistics. When the sufficient statistics are known, they are the best summary statistics, in the sense that the posterior density of the parameters given the sufficient statistics can be made arbitrarily close to the likelihood. This is generally not true when conditioning with respect to other summary statistics. In many interesting frameworks, no low-dimensional sufficient statistics are available. The use of non-sufficient summary statistics impacts the properties of the estimators, and both the selection of a set of candidate summary statistics and the choice of the best one are non-trivial tasks. Ideally, one would like to have a method for picking a summary statistic with the
property that the corresponding AMLE is more efficient than the AMLEs obtained with other summary statistics. There are at least two problems with this strategy. First, the set of candidate summary statistics may be completely different across distributions, because of the features of the models, the existence of certain statistics, the computational difficulties associated with their evaluation, and so on. For example, high-order moments do not always exist and therefore cannot be considered a general choice. Second, even assuming one can select a set of statistics that exist, are meaningful, and are easy to compute in general, there is no way of ranking their performance a priori.

In this article, we try to address these issues, thus providing a twofold contribution. First, we propose to rank the summary statistics according to the average squared error of the posterior distribution. The technique, originally introduced in an ABC setting by Nunes and Balding (2010), is based on the minimization of the κ-nearest neighbor entropy of the posterior sample. We develop a modified version of their procedure that can also be used with infinite-dimensional summary statistics such as the continuous empirical characteristic function (chf). Second, we focus on the use of the empirical chf as summary statistic. The motivation for this choice rests on two results: the empirical chf converges to its theoretical counterpart, which always exists and whose informational content is equivalent to the likelihood, so that this solution is completely general, at least in univariate models; furthermore, estimation methods based on the minimization of a distance measure between the theoretical and empirical chfs are ML efficient (under an appropriate choice of the weighting function). Moreover, we derive a closed-form expression for the distance between the observed and simulated chfs. The latter feature is essential: it dramatically eases the computational burden, and the continuous empirical chf has better theoretical properties and is easier to implement than its discrete counterpart.

The estimation method proposed in this article may be a viable option even when the chf and the likelihood are explicitly available but the MLEs cannot be obtained in closed form, so that parameter estimation is carried out via numerical minimization of the distance between the theoretical and empirical chfs or numerical maximization of the likelihood. In such a situation, when the number of parameters gets large, the relative efficiency of the optimization routines commonly used for solving either problem with respect to simulation-based approaches is likely to deteriorate quickly, and the computational burden of standard numerical algorithms increases sharply.

The remainder of this article is organized as follows. Section 2 reviews AMLE and details the technique proposed for ranking the performances of a set of summary statistics. Section 3 shows how AMLE can be implemented using the chf as summary statistic and proves its properties. Section 4 contains simulation results for both the iid and the time-series setup. Section 5 discusses the results. Finally, the appendix contains the proof of the main theoretical result and some supplementary simulation outcomes.
2. Approximate maximum likelihood

The general theory of AMLE is due to Rubio and Johansen (2013), whereas applications to the autologistic model, Tukey's gh, and the Bingham distribution have been developed by Bee, Espa, and Giuliani (2015), Bee and Trapin (2016), and Bee, Benedetti, and Espa (2017), respectively.
Let x = (x_1, ..., x_n) be a sample from some distribution F(x; θ) with parameter vector θ ∈ Θ ⊂ R^d, where x_i ∈ R^q, and let L(θ; x_1, ..., x_n) be the likelihood function. If π(θ) denotes the prior distribution of θ, the posterior π(θ|x) is

$$\pi(\theta|x) = \frac{f(x|\theta)\,\pi(\theta)}{\int f(x|t)\,\pi(t)\,dt}. \qquad (1)$$

Consider approximating the likelihood function as follows:

$$\hat{f}_\varepsilon(x|\theta) = \int_{\mathbb{R}^{q\times n}} K_\varepsilon(x|z)\, f(z|\theta)\, dz,$$

where $K_\varepsilon(x|z)$ is a normalized Markov kernel and $\varepsilon$ is a scale parameter. If we substitute $f(x|\theta)$ with $\hat{f}_\varepsilon(x|\theta)$ in (1), we obtain an approximation of the posterior:

$$\hat{\pi}_\varepsilon(\theta|x) = \frac{\hat{f}_\varepsilon(x|\theta)\,\pi(\theta)}{\int \hat{f}_\varepsilon(x|t)\,\pi(t)\,dt}.$$
If the prior is uniform on a suitable set D ⊂ R^d, maximizing the likelihood and maximizing the posterior give the same result. Working with the entire sample x is typically impractical, so in most cases the kernel K_ε is defined on the space of some summary statistic η : R^{q×n} → R^l:

$$K_\varepsilon^{\rho}(\eta(x)\,|\,\eta(z)) \propto \begin{cases} 1 & \rho(\eta(x), \eta(z)) < \varepsilon, \\ 0 & \text{otherwise}, \end{cases}$$

where ρ : R^l × R^l → R^+ is a metric. In this setup, Rubio and Johansen (2013) prove that lim_{ε→0} π̂_ε(θ|η(x)) = π(θ|η(x)). If η is a jointly sufficient statistic for the unknown parameters of the model, the use of the summary statistic η(x) implies no loss of information with respect to the original sample; that is, conditioning upon the sufficient statistics is the same as conditioning upon the sample. Thus, it is highly recommended to use the sufficient statistics of the model, if available. A high-level description of AMLE is given in Algorithm 1.

Algorithm 1 (AMLE).
1. Simulate θ* from the prior distribution π(θ) = ∏_{i=1}^d π(θ_i), where π(θ_i) is U(θ_i^L, θ_i^U);
2. Generate x* = (x*_1, ..., x*_n) from f(·|θ*), where f is the density of interest;
3. Use x* to compute the summary statistics η(x*); accept θ* with probability ∝ K_ε^ρ(η(x*)|η(x)), otherwise return to Step 1;
4. Repeat Steps 1–3 until m vectors of simulated parameter values θ* = (θ*,1, ..., θ*,m) from the approximate posterior π̂_ε(θ|x) are accepted; θ* is the ABC sample;
5. Use θ* to find a nonparametric estimator φ̂ of the density π̂_ε(θ|x);
6. Compute the maximum of φ̂, θ̃_{m,ε}. This is an approximation of the MLE.

There are various possibilities for carrying out Steps 5 and 6. The first one, explicitly mentioned in the description above, resorts to kernel density estimation. In the d-parameter case, it is preferable to compute θ̃_{m,ε} as the maximum of the joint d-dimensional kernel density, estimated using the m d-dimensional accepted vectors. Alternatively, a "quick and dirty" approximation to θ̃_{m,ε} is the vector of the maxima of the d univariate kernel densities fitted to the marginals of φ̂. This is not necessarily the joint maximum, since it corresponds to a multivariate kernel density estimator with diagonal bandwidth matrix, which is not a consistent estimator. Thus, maximizing the marginal densities ignores the dependence structure, which may have a non-negligible effect on the location of the maximum. An even simpler approach computes θ̃_{m,ε} as the sample mean of the m accepted observations. The last two approaches may produce good approximations for models with asymptotically normal, or at least symmetric, posterior distributions, but this requirement is often difficult to check in models with intractable likelihoods. However, given that large sample sizes are necessary for reliable multivariate kernel density estimation, unless d is small (no larger than 3, say), in most multiparameter cases the second strategy is preferable. A strongly consistent method for estimating a multivariate mode is proposed by Abraham, Biau, and Cadre (2003).
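To make Algorithm 1 concrete, the following minimal sketch implements it for a toy normal location–scale model, with the sample mean and standard deviation as (sufficient) summary statistics. The prior bounds, the toy model, and the function names are our illustrative choices, not the authors' code; the acceptance rule anticipates the "keep the m = u · n_p closest draws" device described below.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def summary(x):
    # summary statistic eta(x); for N(mu, sigma) these are sufficient
    return np.array([x.mean(), x.std(ddof=1)])

def amle(x_obs, n_p=50_000, u=0.01):
    """Algorithm 1 with the 'keep the u * n_p closest draws' variant."""
    n = len(x_obs)
    eta_obs = summary(x_obs)
    # Step 1: draw candidates from independent uniform priors
    mu_star = rng.uniform(-5.0, 5.0, n_p)
    sigma_star = rng.uniform(0.1, 5.0, n_p)
    dist = np.empty(n_p)
    for i in range(n_p):
        # Step 2: simulate a dataset from f(. | theta*)
        x_sim = rng.normal(mu_star[i], sigma_star[i], n)
        # Step 3: Euclidean distance between summary statistics
        dist[i] = np.linalg.norm(summary(x_sim) - eta_obs)
    # Step 4: ABC sample = the m = u * n_p draws with smallest distance
    keep = np.argsort(dist)[: int(u * n_p)]
    abc = np.column_stack([mu_star[keep], sigma_star[keep]])
    # Steps 5-6: joint KDE of the ABC sample, maximized over the accepted points
    kde = gaussian_kde(abc.T)
    return abc[np.argmax(kde(abc.T))]

x_obs = rng.normal(1.0, 2.0, 200)
print(amle(x_obs))  # approximate MLE of (mu, sigma)
```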
Instead of choosing ε, for which it is difficult to guess an appropriate numerical value initially, as it depends on the distribution of the summary statistics, we fix in advance the total number n_p of simulated candidate parameter values and a fraction u ∈ [0, 1]. We then include in the ABC sample the m simulated values with the m smallest distances between the observed and simulated summary statistics, so that m = u · n_p; see Sousa, Fritz, Beaumont, and Chikhi (2009).

The properties of the estimators depend on the summary statistics: if they are sufficient, when the distance between η(x) and η(x*) goes to zero, π̂_ε(θ|η) converges to the posterior and, under slightly stronger conditions, the mode of π̂_ε(θ|η) converges to the MLE. When the summary statistics are not sufficient but satisfy certain mild conditions (Rubio and Johansen 2013, p. 1641), a similar result holds, but only as the sample size goes to infinity: when n → ∞, the AMLE converges almost surely to the true parameter value if a sufficiently small distance is used. While this guarantees that the procedure produces asymptotically meaningful estimators also with non-sufficient statistics, the performance in finite samples may vary widely, and it is therefore useful to develop a general technique for finding the summary statistics that produce the best performance.

2.1. Choosing the summary statistics

This section aims at finding a criterion for ranking the summary statistics on the basis of the features of the posterior distribution. If such a ranking can be found, the best summary statistic is then used in the AMLE algorithm for the estimation of the parameters. In the ABC literature, i.e., in a fully Bayesian framework, this issue has been addressed in various ways: see Blum et al. (2013) and Nunes and Prangle (2015). Even though, in principle, these methods can be applied to AMLE as well, most of them are not appropriate in the present setup. In particular, the semiautomatic ABC approach of Fearnhead and Prangle (2012) cannot be used with a complex-valued summary statistic such as the empirical chf,¹ and the method based on approximate sufficiency proposed by Joyce and Marjoram (2008) is limited to single-parameter inference. Given these constraints, we devise a criterion based on the minimization of the κ-nearest neighbor entropy Ê_κ of the posterior sample (Nunes and Balding 2010), which is equal to

$$\hat{E}_\kappa = \log\frac{\pi^{d/2}}{\Gamma(d/2+1)} - \psi(\kappa) + \log(m) + \frac{d}{m}\sum_{i=1}^{m}\log(d_{i,\kappa}), \qquad (2)$$

where Γ(·) and ψ(·) are, respectively, the Gamma and digamma functions, and d_{i,κ} is the Euclidean distance from θ*_i (i = 1, ..., m) to its κth closest neighbor in the ABC sample. The interpretation is that posterior samples with low values of Ê_κ are more concentrated and thus contain more information. This approach was first proposed in ABC and is particularly appealing when the prior is diffuse (Nunes and Prangle 2015).

¹ This remark applies to both the discrete and the continuous case; the latter has the additional problem that the number of summary statistics is infinite, because r ∈ R.

Nunes and Balding (2010) propose two ways of selecting the summary statistics via the κ-nearest neighbor entropy measure Ê_κ. A first possibility simply uses Ê_κ to construct a ranking, so that the best summary statistic is the one that minimizes (2). However, such a ranking may not be equivalent to the root-mean-squared-error (RMSE)-based ranking. Hence, a more sophisticated approach adds a step aimed at finding the summary statistic that minimizes the square root of the sum of squared errors (RSSE) of the posterior. The minimum-entropy criterion is first used to identify simulated datasets "close" to the observed one; each of these datasets is then treated as if it were the observed dataset, and the mean root integrated squared error of the ABC posterior approximation is minimized over sets of summary statistics. In a real-data setup, this second procedure is not feasible because the true parameter values are unknown, but Nunes and Balding (2010) show that a mean RSSE (MRSSE) can be computed over simulated datasets that are close to the observed one in terms of the Euclidean distance between the observed and simulated values of the summary statistics.²

² It is worth adding that, instead of the mean, one may use the median or the mode of the RSSE distribution. If the distribution is not symmetric, the outcomes may be significantly different.

These ranking procedures are barred in the present framework, because when the summary statistics are the values of the continuous chf, there is an infinite (uncountable) number of summary statistics. Hence, we develop an alternative approach that uses the entropy of the simulated datasets instead of the values of the summary statistics. A pseudo-code is as follows:
1. Compute Ê_κ^obs and Ê_κ,j^sim (j = 1, ..., m), namely the κ-nearest neighbor entropy of the observed and of each of the m simulated datasets;
2. Select the ñ datasets with the smallest distance |Ê_κ^obs − Ê_κ,j^sim|;
3. For each dataset selected at Step 2, compute the RSSE by means of the leave-one-out technique;
4. Compute the mean of these RSSEs, i.e., the MRSSE.

This algorithm is repeated for each summary statistic, and the summary statistic with the smallest MRSSE is eventually chosen. In a simulation experiment, the implementation of the algorithm is simplified, in that the computation of the RSSE no longer requires the leave-one-out method, and Step 3 is modified as follows:
3'. For each dataset selected at Step 2, compute

$$\mathrm{RSSE}_i = \sqrt{\frac{1}{m}\sum_{j=1}^{m} \big\|\theta^*_{j,i} - \theta\big\|^2},$$

where θ*_{j,i} is the jth accepted parameter vector of the ith dataset selected at Step 2, i = 1, ..., ñ, and θ is the true parameter vector. A code sketch of the entropy computation in Step 1 is given below.
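As a concrete reference for Step 1, the following sketch computes the κ-nearest neighbor entropy (2) of a sample. The function name and the use of SciPy's KD-tree are our illustrative choices; the sum in (2) is scaled by d/m, as in the Kozachenko–Leonenko-type estimator used by Nunes and Balding (2010).

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(sample, kappa=4):
    """kappa-nearest neighbor entropy estimate, Eq. (2).

    sample: (m, d) array -- e.g. the ABC sample of accepted parameter
    vectors, or a univariate dataset reshaped to (m, 1).
    """
    m, d = sample.shape
    # distance from each point to its kappa-th closest neighbor
    # (k = kappa + 1 because the closest "neighbor" is the point itself)
    dist, _ = cKDTree(sample).query(sample, k=kappa + 1)
    d_ik = dist[:, -1]
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit ball
    return log_ball - digamma(kappa) + np.log(m) + (d / m) * np.sum(np.log(d_ik))

# Step 2 of the selection procedure: rank simulated datasets by |E_obs - E_sim|
# e_obs = knn_entropy(x_obs.reshape(-1, 1))
# e_sim = np.array([knn_entropy(x.reshape(-1, 1)) for x in simulated_datasets])
# closest = np.argsort(np.abs(e_sim - e_obs))[:n_tilde]
```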
3. Characteristic function-based AMLE

Since the pioneering contribution of Parzen (1962), many authors have considered the chf for estimation purposes, first in the iid case and then in non-iid settings; see Yu (2004) for references and for a review of both approaches. Given a random sample x = (x_1, ..., x_n) from a distribution f(·; θ) with θ ∈ Θ ⊂ R^d, the empirical chf estimation method is based on the minimization of

$$\tilde{Q}_n(\theta) = \int \big|\psi_n(r; x) - \psi_\theta(r)\big|^2\, dW(r), \qquad (3)$$
where ψ_n(r; x) = (1/n) Σ_{i=1}^n exp{irx_i} and ψ_θ(r) = E(exp{irX}) are, respectively, the empirical and theoretical chfs, and W is a weighting function. The empirical chf estimator is the parameter vector that minimizes (3), denoted by θ̂_ECF = arg min_{θ∈Θ} Q̃_n(θ).

The properties of θ̂_ECF when the observations are iid differ between the discrete and the continuous case. In the former, the empirical chf method is sometimes as efficient as the generalized method of moments (depending on the identification of the system), but always less efficient than MLE (Yu 2004). In the continuous case, ML efficiency can be achieved provided that the optimal weighting function

$$w^*(r) = (2\pi)^{-1} \int \exp\{-irx\}\, \frac{\partial \log f_\theta(x)}{\partial \theta}\, dx$$

is used (Feuerverger and McDunnough 1981a). Considering that w*(r) is unknown whenever the likelihood is not available in closed form, Carrasco and Florens (2002) work out a different technique that guarantees asymptotic ML efficiency while requiring only the knowledge of the theoretical chf.

The method can also be used in non-iid settings. The empirical chf is then based on blocks of overlapping observations: if a stationary time series of length T is available, the blocks are defined as Z_j = (X_j, ..., X_{j+p})', j = 1, ..., T − p. Thus, p + 1 is the length of each block, and two adjacent blocks have p overlapping observations. The joint empirical chf is given by

$$\psi_n(r; x) = \frac{1}{T-p} \sum_{j=1}^{T-p} e^{i r' Z_j}.$$

From this point on, the method works exactly as in the iid case. Asymptotic ML efficiency results have been developed for Markov processes (Carrasco and Florens 2002), but not for non-Markov processes.

3.1. AMLE and characteristic function

For simplicity, in the rest of the article we call approximate characteristic function estimation (ACFE) the version of AMLE based on the empirical chf as summary statistic. The pseudo-code for ACFE is given below.

Algorithm 2 (ACFE).
1.–2. Same as Algorithm 1;
3. Use x* to compute the empirical chf ψ_n(r; x*); accept θ* with probability ∝ K_ε^ρ(ψ_n(r; x*)|ψ_n(r; x)), otherwise return to Step 1;
4.–6. Same as Algorithm 1.

From the algorithmic point of view, the only difference between AMLE and ACFE is Step 3. Hence, the crucial issue is the evaluation of the function

$$Q_n(x^*, x) = \int \big|\psi_n(r; x^*) - \psi_n(r; x)\big|^2\, dW(r), \qquad (4)$$

where ψ_n(r; x*) and ψ_n(r; x) are the empirical chfs computed, respectively, with the simulated and observed data. Note that Q_n(x*, x) is the distance between the simulated and observed summary statistics. Unfortunately, the optimal weights obtained by Feuerverger and McDunnough (1981b) and Carrasco and Florens (2002) cannot, in general, be used in AMLE: the former requires the knowledge of the likelihood, while computing the latter requires the theoretical chf.

In the discrete case, the integral reduces to a sum and W is a step function, so that computing (4) is trivial, as the investigator only has to choose the grid, but the estimator is not ML efficient. A code sketch of the block construction and of the discrete version of (4) follows.
The discrete empirical chf has already been used in AMLE by Rubio and Johansen (2013). In the continuous case, one has dW(r) = w(r)dr in (4). When w(r) is exponential, i.e., w(r) = e^{−cr²}, the following proposition shows that the integral (4) can be computed in closed form; this result is key to obtaining a computationally feasible algorithm.

Proposition 1. Let x and x* be two random samples of size n from some distribution F, let w(r) = e^{−cr²} be a weighting function, and let ψ_n(r; x) = (1/n) Σ_{i=1}^n exp{irX_i} be the empirical chf. The integral (4) is given by

$$
Q_n(x^*, x) = \int_{\mathbb{R}} \Bigg| \frac{1}{n}\sum_{j=1}^{n} e^{irx^*_j} - \frac{1}{n}\sum_{j=1}^{n} e^{irx_j} \Bigg|^2 e^{-cr^2}\, dr
= \frac{1}{n^2}\sqrt{\frac{\pi}{c}} \Bigg( \sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x^*_i - x^*_j)^2} + \sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x_i - x_j)^2} - 2 \sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x^*_i - x_j)^2} \Bigg).
$$

Proof. See Appendix A.1.
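A direct transcription of Proposition 1 is straightforward; the sketch below is our illustrative implementation (with a numerical-integration cross-check in the comments), not code from the paper.

```python
import numpy as np

def qn_closed_form(x_sim, x_obs, c=1.0):
    """Continuous chf distance (4) with w(r) = exp(-c r^2), via Proposition 1."""
    n = len(x_obs)
    assert len(x_sim) == n, "Proposition 1 assumes samples of equal size"
    def s(a, b):
        # sum_{i,j} exp(-(a_i - b_j)^2 / (4c)) using a broadcasted difference matrix
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (4 * c)).sum()
    return np.sqrt(np.pi / c) / n**2 * (
        s(x_sim, x_sim) + s(x_obs, x_obs) - 2 * s(x_sim, x_obs))

# Cross-check against direct numerical integration of (4):
# from scipy.integrate import quad
# ecf = lambda r, x: np.mean(np.exp(1j * r * x))
# integrand = lambda r: abs(ecf(r, x_sim) - ecf(r, x_obs)) ** 2 * np.exp(-r**2)
# quad(integrand, -np.inf, np.inf)[0]  # should match qn_closed_form(x_sim, x_obs, c=1.0)
```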
Remark. Given a summary statistic η and a random sample x = (x_1, ..., x_n)', Rubio and Johansen (2013, Proposition 1 and Remark 2) prove that

$$\lim_{\varepsilon \to 0} \hat{\pi}_\varepsilon(\theta|\eta(x)) = \pi(\theta|\eta(x)). \qquad (5)$$

Denoting by π̂_ε(θ|ψ_n(r; x)), r ∈ R, the approximation of the posterior obtained by means of AMLE based on the continuous empirical chf evaluated on a fixed grid, (5) shows that lim_{ε→0} π̂_ε(θ|ψ_n(r; x)) = π(θ|ψ_n(r; x)).
4. Applications

4.1. Tukey's gh distribution

AMLE estimation of the Tukey gh random variable X ∼ gh(a, b, g, h) has recently been considered by Bee and Trapin (2016). They use various sets of quantiles as summary statistics and find that the best one is the set of all the quantiles from 0.025 to 0.975 in increments of 0.025 (called Q.5 in their paper). Accordingly, we carry out discrete and continuous ACFE, as well as AMLE with the set of quantiles Q.5 (a sketch of the data-generating step appears below). The true parameter values are a = 10, b = 2, g = 0.2, h = 0.2. The outcomes obtained with the entropy criterion are displayed in Table A1, while the first two panels of Figure 1 show the bias and RMSE of the discrete and continuous ACFEs (denoted by θ̂_i^D and θ̂_i^C, respectively, where θ_i ∈ (a, b, g, h)) as well as of the quantile-based AMLE (θ̂_i^Q). The bottom panel displays the relative performance of the two ACFEs with respect to the quantile-based estimator, defined as RMSE(θ̂_i^D)/RMSE(θ̂_i^Q) and RMSE(θ̂_i^C)/RMSE(θ̂_i^Q), respectively, for the discrete and continuous ACFE. We use a sample of n = 200 observations from the gh distribution and carry out a simulation experiment with B = 100 replications.

From Table A1, it is clear that the performance of the discrete ACFE is generally poor, and the two best estimators are based on the continuous chf and on the quantiles. Accordingly, for estimation, we only employ the best discrete ACFE, with r = −2, −1.9, ..., 1.9, 2. Figure 1 supports the evidence of Table A1 and shows that θ̂_i^C performs better than θ̂_i^Q, in particular for a and h. Figure A5 confirms that ĥ^C outperforms ĥ^D and ĥ^Q.
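For reference, a gh sample can be generated by transforming standard normal draws. The sketch below uses the standard Tukey g-and-h transformation, which we assume matches the gh(a, b, g, h) parameterization used here (location a, scale b), together with the quantile summary statistic Q.5.

```python
import numpy as np

rng = np.random.default_rng(1)

def rgh(n, a=10.0, b=2.0, g=0.2, h=0.2):
    """Sample from the gh distribution via the Tukey g-and-h transformation
    of a standard normal draw: X = a + b * ((exp(g z) - 1) / g) * exp(h z^2 / 2)."""
    z = rng.standard_normal(n)
    core = (np.expm1(g * z) / g) if g != 0 else z  # the g -> 0 limit is the identity
    return a + b * core * np.exp(h * z**2 / 2)

def q5_summary(x):
    # the quantile summary statistic Q.5: quantiles 0.025, 0.050, ..., 0.975
    return np.quantile(x, np.arange(0.025, 1.0, 0.025))

x = rgh(200)
print(q5_summary(x)[:5])
```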
Figure 1. gh distribution. The first two panels show, respectively, the bias and the RMSE of θ̂_i^D, θ̂_i^C, and θ̂_i^Q (i = 1, ..., 4), where θ̂_i^Q is the AMLE estimator based on the quantiles. The third panel displays the relative performance of the two ACFEs with respect to the quantile-based estimator. The true parameters are: a = 10, b = 2, g = 0.2, h = 0.2, m = 100, u = 1%, c = 1, a* ∼ U(9, 11), b* ∼ U(1, 3), g* ∼ U(−0.1, 0.5), h* ∼ U(−0.1, 0.5), and the discrete approach uses r = −2, −1.9, ..., 1.9, 2. The observed sample size is n = 200 and the number of replications is B = 100.
The outcomes in Table A1 use the ACFE approach with c = 1; the results with c = 2, reported in Table A2, are very similar. The computing times are, respectively, 25 and 7.5 seconds for the continuous ACFE and for the quantile-based AMLE.

4.2. Stable distributions

MLE of stable distributions is notoriously difficult, mainly because the density is generally not available in closed form. Numerical maximization of an approximation of the likelihood has been developed by Nolan (2001); a Bayesian approach has been proposed by Peters, Sisson, and Fan (2012); and discrete ACFE is illustrated in Rubio and Johansen (2013). In the last two cases, the discrete chf is used, and the specification of the grid for r seems to be rather arbitrary: Peters, Sisson, and Fan (2012) employ r = −5, −4.5, ..., 4.5, 5, whereas Rubio and Johansen (2013) choose r = −250, −200, −100, −50, −10, 10, 50, 100, 200, 250. We compare the discrete and continuous ACFE methods to the numerical MLE method described by Nolan (2001) and implemented in the StableEstim R package (Kharrat and Boshnakov, 2016).
The experiment consists of simulating B = 100 times n = 200 observations from the St(1.9, 0.3, 1, 0) distribution. ACFE uses m = 100, ñ = 10, u = 1%, c = 1, α* ∼ U(0.65, 2), β* ∼ U(−1, 1), γ* ∼ U(0.8, 1.15), δ* ∼ U(−0.35, 0.35), and the candidate summary statistics S_1 = ψ_n(r_1), S_2 = ψ_n(r_2), S_3 = ψ_n(r_3), S_4 = ψ_n(r_4), and S_5, where r_1 = −400, −300, −200, −150, −100, −50, −10, 10, 50, 100, 150, 200, 300, 400; r_2 = −150, −100, −50, −30, −10, 10, 30, 50, 100, 150; r_3 = −250, −200, −150, −100, −75, −50, −25, −10, 10, 25, 50, 75, 100, 150, 200, 250; r_4 = −5, −4.5, ..., 4.5, 5; and S_5 is the continuous chf with r ∈ R. As in the preceding examples, we first compute Ê_4. Table A3 suggests that the continuous ACFE outperforms all the discrete-based implementations. Given this evidence, we use the continuous ACFE and the numerical MLE for estimation; Figure 2 shows the bias, the RMSE, and the relative performance of the two estimators. As in the gh example, continuous ACFE gives better results. Finally, the computing times are almost identical: a single replication of continuous ACFE (MLE) requires 152 (138) seconds. A sketch of the sampling step used in this experiment follows.
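Stable variates can be drawn with SciPy; the sketch below is illustrative, and one should keep in mind that stable parameterizations differ across libraries (the paper uses the S parameterization of Samorodnitsky and Taqqu), so the mapping of (α, β, γ, δ) to SciPy's (alpha, beta, scale, loc) is an assumption to verify.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(2)

# St(alpha, beta, gamma, delta): index 1.9, skewness 0.3, scale 1, location 0.
# scipy's levy_stable defaults to a one-parameterization convention; verify it
# matches the S parameterization used in the paper before relying on the draws.
alpha, beta, gamma, delta = 1.9, 0.3, 1.0, 0.0
x = levy_stable.rvs(alpha, beta, loc=delta, scale=gamma,
                    size=200, random_state=rng)

# these draws feed Step 2 of Algorithm 2; the distance in Step 3 can be the
# closed-form Q_n of Proposition 1, e.g. qn_closed_form(x_sim, x_obs, c=1.0)
```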
Figure 2. Stable distribution. The first two panels show, respectively, the bias and the RMSE of θ̂_i^C and θ̂_i^MLE (i = 1, ..., 4). The third panel displays the relative performance of the chf-based estimators with respect to the MLE. The true parameters are: α = 1.9, β = 0.3, γ = 1, δ = 0, m = 100, u = 1%, c = 1, α* ∼ U(1, 2), β* ∼ U(−0.3, 0.8), γ* ∼ U(0.8, 1.15), δ* ∼ U(−0.2, 0.2). The sample size is n = 200 and the number of replications is B = 100.
4.3. Stable Paretian GARCH

Our last example deals with a GARCH model with disturbances distributed as stable random variables (Bidarkota and McCulloch 2004; Curto, Pinto, and Tavares 2009). As pointed out in Section 3, even the chf estimation method is in general not ML efficient in a non-iid framework such as the one considered in the present example. Hence, ACFE is expected to be less efficient than MLE in large samples, but may still be better in small samples. The data-generating process is

$$
X_t = \mu + \sum_{i=1}^{m} a_i X_{t-i} + \sum_{j=1}^{n} b_j \epsilon_{t-j} + \epsilon_t, \qquad
\epsilon_t = \sigma_t z_t, \quad z_t \overset{iid}{\sim} St(\alpha, \beta),
$$
$$
\sigma_t^\delta = \omega + \sum_{i=1}^{p} \alpha_i \big(|\epsilon_{t-i}| - \gamma_i \epsilon_{t-i}\big)^\delta + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^\delta,
$$
where St(α, β) is the stable density with index α ∈ (0, 2], skewness parameter β ∈ [−1, 1], zero location parameter, and unit scale parameter.³ We restrict ourselves to the case m = n = 0, p = q = 1, and δ = 1 (a simulation sketch of this restricted model is given below). Numerical MLE has been developed by Curto, Pinto, and Tavares (2009) and implemented in the gsFit function of the GEVStableGarch R package (do Rego Sousa, Otiniano, and Lopes 2015). For this model, we use the ACFE approach described in Section 3 with block parameter p = 4. Yu (2004, p. 102) notes that the block length should be determined according to the amount of dependence: if the process can be well approximated by a Markov process of order l, then p = l should work well; in general, a larger p is statistically more efficient but computationally less efficient. We have carried out the same experiment also with p = 2 and p = 8: with the latter value, the results are very similar, whereas with the former the ACFE of α is worse (Figure A6), so that we stick to the choice p = 4.

The gsFit function uses starting values α_1^(0) = 0.1 and β_1^(0) = 0.8 and the restriction 1 ≤ α ≤ 1.99. In order to carry out a fair comparison, the intervals used for α_1 and β_1 are centered, respectively, at 0.1 and 0.8. The full set of ranges used is as follows: μ* ∼ U[μ^(0) − 0.1, μ^(0) + 0.1], ω* ∼ U[0, ω^(0) + 0.02], α_1* ∼ U[0, 0.2], β_1* ∼ U[0.6, 1], α* ∼ U[1, 2], β* ∼ U[−1, 0]. Here μ^(0) and ω^(0) are obtained by fitting to the data an ARMA(0, 0)–GARCH(1, 1) model with a constant. By fitting a stable distribution to the GARCH standardized residuals, we obtain a starting value β^(0) for β close to −1. The rather wide interval [−1, 0] employed for simulating β* is motivated by the fact that the estimate based on the standardized residuals has a large standard error.

gsFit computes the numerical MLE under the stationarity restriction λ_{α,β} α_1 + β_1 < 1, where λ_{α,β} is a constant that depends on α and β (Rachev and Mittnik 2000). However, if we enforce this condition in AMLE, the marginal distributions of the simulated values of the parameters are neither uniform nor independent. Thus, we estimate a (possibly non-stationary) stable Paretian GARCH, and check afterward whether candidate parameter values implying a non-stationary model are rejected. We use n_p = 20,000, u = 1%, and a sample size n = 100. The ACFEs are the modes of the univariate kernel densities.
³ We use the so-called S parameterization introduced by Samorodnitsky and Taqqu (1994) and Rachev and Mittnik (2000).
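The restricted model (m = n = 0, p = q = 1, δ = 1) is easy to simulate. The sketch below is our illustrative version: it assumes no asymmetry term (γ_1 = 0, consistent with the six estimated parameters μ, ω, α_1, β_1, α, β) and uses SciPy's levy_stable for the innovations, whose parameterization should be checked against the S parameterization used here.

```python
import numpy as np
from scipy.stats import levy_stable

def sim_stable_garch(n, mu, omega, a1, b1, alpha, beta, rng, burn=200):
    """Simulate X_t = mu + eps_t, eps_t = sigma_t z_t, z_t ~ St(alpha, beta),
    with sigma_t = omega + a1 * |eps_{t-1}| + b1 * sigma_{t-1} (delta = 1, gamma_1 = 0)."""
    z = levy_stable.rvs(alpha, beta, size=n + burn, random_state=rng)
    x = np.empty(n + burn)
    sigma, eps = omega / (1 - b1), 0.0  # crude initialization of the recursion
    for t in range(n + burn):
        sigma = omega + a1 * abs(eps) + b1 * sigma
        eps = sigma * z[t]
        x[t] = mu + eps
    return x[burn:]  # discard the burn-in to reduce dependence on the start

rng = np.random.default_rng(3)
x = sim_stable_garch(100, mu=0.0596, omega=0.0061, a1=0.0497, b1=0.9325,
                     alpha=1.9252, beta=-0.9516, rng=rng)
```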
We consider three estimators: the discrete ACFE with r = 1 (θ̂_i^D, where θ_i ∈ {μ, ω, α_1, β_1, α, β}), the continuous ACFE (θ̂_i^C), and the MLE (θ̂_i^MLE) computed numerically via the gsFit function. Figure 3 shows the bias and the RMSE of the estimators as well as the relative performance of the two ACFEs with respect to the MLE. The continuous ACFE performs best for all parameters except μ. The difference between the discrete and continuous ACFEs is negligible, with the notable exception of the estimator of α: in this case, the RMSE of α̂^C is considerably smaller, and also smaller than the RMSE of α̂^MLE.

The analysis has been repeated using the averages of the ABC samples instead of the modes of the univariate kernel densities. Figure A7 shows that the performance of the two ACFEs is worse than that of the ACFEs based on the modes of the kernel densities. The difference is larger for α, probably because the distributions of the accepted values are quite skewed, as they have an upper bound at 2. The kernel densities of the simulated distributions of α̂^D, α̂^C, and α̂^MLE displayed in Figure A8 confirm the ranking outlined above. The last panel shows that many simulated values of the MLE are very close to the theoretical upper bound α = 2: 35 of the 100 simulated estimates are actually equal to 1.99, i.e., the constraint set by the gsFit function. Uniform priors on larger intervals for some parameters do not produce significant differences with respect to the preceding case, as shown by Figure A9.
Figure 3. Stable Paretian GARCH. The first two panels show, respectively, the bias and the RMSE of θ̂_i^D, θ̂_i^C, and θ̂_i^MLE (i = 1, ..., 6). The third panel displays the relative performance of the two AMLE estimators. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 100, u = 1%, c = 1, p = 4, μ* ∼ U[μ^(0) − 0.1, μ^(0) + 0.1], ω* ∼ U[0, ω^(0) + 0.02], α_1* ∼ U[0, 0.4], β_1* ∼ U[0.6, 1], α* ∼ U[1, 2], β* ∼ U[−1, 0], and the discrete ACFE uses r = 1. The length of the series is n = 100 and the number of replications is B = 100.
Figure 4. Stable Paretian GARCH. The first two panels show, respectively, the bias and the RMSE of θ̂_i^C and θ̂_i^MLE (i = 1, ..., 6). The third panel displays the relative performance of the AMLE estimator. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 200, u = 0.4%, c = 1, p = 4, μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.2], α_1* ∼ U[0, 1], β_1* ∼ U[0, 1], α* ∼ U[1, 2], β* ∼ U[−1, 1]. The length of the series is n = 500 and the number of replications is B = 100.
In Figure A9 we set μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.2], α_1* ∼ U[0, 0.4], β_1* ∼ U[0.4, 1], α* ∼ U[1, 2], β* ∼ U[−1, 1]. Further experiments have been carried out with n = 500 using the MLE and the continuous ACFE, given that the latter outperforms its discrete counterpart; n_p is equal to 50,000, with u ∈ {0.1%, 0.2%, 0.4%, 0.8%, 1%, 1.2%}. The outcomes shown in Figure 4 refer to the case u = 0.4%, i.e., the value giving the best results in terms of RMSE.⁴ The MLE is now mostly preferable, but the performance is similar for the parameters μ and α. Moreover, in 84 cases out of 100, the MLE of β is equal to −0.99, i.e., the constraint used by the optimization algorithm. Finally, the RMSE of ACFE is larger for the GARCH parameters ω, α_1, and β_1. Even though the ACFE estimation procedure is unrestricted, the estimated parameters always satisfy the stationarity condition λ_{α,β} α_1 + β_1 < 1, both when n = 100 and when n = 500.
⁴ The outcomes for the experiment with u = 1% are reported in Figure A10.
ACFE computing times are longer than those of MLE but acceptable: continuous ACFE (MLE) takes approximately 545 (127) seconds when n = 100 and n_p = 20,000, and 3315 (530) seconds when n = 500 and n_p = 50,000.

Overall, the message of this section is that continuous ACFE outperforms numerical MLE when the sample size n is small, but not when n is larger. Given that MLEs are asymptotically optimal, whereas no ML efficiency result is known for approaches based on the joint chf in non-iid settings, the latter result is not surprising.
5. Conclusion

This article deals with approximate maximum likelihood in models where no sufficient statistic is available. Two difficulties are inherent in such a setup: the choice of the summary statistics is not obvious, and there is no way of ranking a priori the performances of the various summary statistics. We propose to use the empirical chf as summary statistic. Building on a technique first proposed by Nunes and Balding (2010), we also develop a method for choosing the best summary statistic. This is particularly important in view of the fact that the empirical chf is often compared to ad hoc summary statistics that may be preferable in specific applications and for small sample sizes.

From the applied point of view, the extension of the present approach to multivariate distributions requires further research. In the continuous case, this would amount to deriving a result analogous to Proposition 1 for the multivariate case, whereas in the discrete case it is necessary to understand more precisely the impact of the values of the multidimensional grid that defines the points where the chf is evaluated.
Acknowledgments We thank an anonymous reviewer whose valuable and constructive comments considerably improved an earlier version of the article.
ORCID

M. Bee: http://orcid.org/0000-0002-9579-3650
References

Abraham, C., G. Biau, and B. Cadre. 2003. Simple estimation of the mode of a multivariate density. The Canadian Journal of Statistics 31:23–34.

Bee, M., R. Benedetti, and G. Espa. 2017. Approximate maximum likelihood estimation of the Bingham distribution. Computational Statistics and Data Analysis 108:84–96.

Bee, M., G. Espa, and D. Giuliani. 2015. Approximate maximum likelihood estimation of the autologistic model. Computational Statistics and Data Analysis 84:14–26.

Bee, M., and L. Trapin. 2016. A simple approach to the estimation of Tukey's gh distribution. Journal of Statistical Computation and Simulation 86:3287–302.

Bidarkota, P. V., and J. H. McCulloch. 2004. Testing for persistence in stock returns with GARCH-stable shocks. Quantitative Finance 4:256–65.

Blum, M. G. B., M. A. Nunes, D. Prangle, and S. A. Sisson. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science 28:189–208.

Carrasco, M., and J. Florens. 2002. Efficient GMM estimation using the empirical characteristic function. Technical report, Department of Economics, University of Rochester.
Curto, J. D., J. C. Pinto, and G. N. Tavares. 2009. Modeling stock markets' volatility using GARCH models with Normal, Student's t and stable Paretian distributions. Statistical Papers 50:311–21.

do Rego Sousa, T., C. E. G. Otiniano, and S. R. C. Lopes. 2015. GEVStableGarch. R package version 1.1.

Fearnhead, P., and D. Prangle. 2012. Constructing summary statistics for approximate Bayesian computation: Semi-automatic ABC. Journal of the Royal Statistical Society B 74:419–74.

Feuerverger, A., and P. McDunnough. 1981a. On efficient inference in symmetric stable laws and processes. In Statistics and related topics, ed. M. Csorgo, 109–122. New York: North-Holland.

Feuerverger, A., and P. McDunnough. 1981b. On some Fourier methods for inference. Journal of the American Statistical Association 76:379–87.

Joyce, P., and P. Marjoram. 2008. Approximately sufficient statistics and Bayesian computation. Statistical Applications in Genetics and Molecular Biology 7 (1):26.

Kharrat, T., and G. N. Boshnakov. 2016. StableEstim. R package version 2.0.

Nolan, J. P. 2001. Maximum likelihood estimation and diagnostics for stable distributions. In Lévy processes: Theory and applications, ed. O. E. Barndorff-Nielsen, T. Mikosch, and S. I. Resnick, 379–400. Boston: Birkhäuser.

Nunes, M. A., and D. J. Balding. 2010. On optimal selection of summary statistics for approximate Bayesian computation. Statistical Applications in Genetics and Molecular Biology 9 (1):34.

Nunes, M. A., and D. Prangle. 2015. abctools: An R package for tuning approximate Bayesian computation analyses. The R Journal 7:189–205.

Parzen, E. 1962. On estimation of a probability density function and mode. Annals of Mathematical Statistics 33:1065–76.

Peters, G. W., S. Sisson, and Y. Fan. 2012. Likelihood-free Bayesian inference for α-stable models. Computational Statistics & Data Analysis 56:3743–56.

Rachev, S., and S. Mittnik. 2000. Stable Paretian models in finance. New York: Wiley.

Rubio, F. J., and A. M. Johansen. 2013. A simple approach to maximum intractable likelihood estimation. Electronic Journal of Statistics 7:1632–54.

Samorodnitsky, G., and M. S. Taqqu. 1994. Stable non-Gaussian random processes: Stochastic models with infinite variance. London: Chapman and Hall.

Sousa, V. C., M. Fritz, M. A. Beaumont, and L. Chikhi. 2009. Approximate Bayesian computation without summary statistics: The case of admixture. Genetics 181:1507–19.

Yu, J. 2004. Empirical characteristic function estimation and its applications. Econometric Reviews 23:93–123.
A. Proofs and supplementary results

A.1. Proof of Proposition 1

When dW(r) = w(r)dr = e^{−cr²}dr, the integral (4) is given by

$$Q_n(x^*, x) = \int_{\mathbb{R}} \Bigg| \frac{1}{n}\sum_{j=1}^{n} e^{irx^*_j} - \frac{1}{n}\sum_{j=1}^{n} e^{irx_j} \Bigg|^2 e^{-cr^2}\, dr.$$

The square of the absolute value can be rewritten as

$$\Bigg| \frac{1}{n}\sum_{j=1}^{n} e^{irx^*_j} - \frac{1}{n}\sum_{j=1}^{n} e^{irx_j} \Bigg|^2 = A^2 + B^2,$$

where

$$A = \frac{1}{n}\sum_{j=1}^{n} \cos(rx^*_j) - \frac{1}{n}\sum_{j=1}^{n} \cos(rx_j), \qquad
B = \frac{1}{n}\sum_{j=1}^{n} \sin(rx^*_j) - \frac{1}{n}\sum_{j=1}^{n} \sin(rx_j).$$

Accordingly, A² + B² can be decomposed as follows:

$$A^2 + B^2 = \frac{1}{n^2}\Bigg[\sum_{j=1}^{n}\cos(rx^*_j)\Bigg]^2 + \frac{1}{n^2}\Bigg[\sum_{j=1}^{n}\cos(rx_j)\Bigg]^2 \qquad (A1)$$
$$+ \frac{1}{n^2}\Bigg[\sum_{j=1}^{n}\sin(rx^*_j)\Bigg]^2 + \frac{1}{n^2}\Bigg[\sum_{j=1}^{n}\sin(rx_j)\Bigg]^2 \qquad (A2)$$
$$- \frac{2}{n^2}\sum_{j=1}^{n}\cos(rx^*_j)\sum_{j=1}^{n}\cos(rx_j) \qquad (A3)$$
$$- \frac{2}{n^2}\sum_{j=1}^{n}\sin(rx^*_j)\sum_{j=1}^{n}\sin(rx_j). \qquad (A4)$$
Now we can compute separately the four integrals obtained by multiplying Equations (A1)–(A4) by e^{−cr²}. We get

$$Q_n(x^*, x) = \int_{\mathbb{R}} \Bigg| \frac{1}{n}\sum_{j=1}^{n} e^{irx^*_j} - \frac{1}{n}\sum_{j=1}^{n} e^{irx_j} \Bigg|^2 e^{-cr^2}\, dr$$
$$= \frac{1}{n^2}\sqrt{\frac{\pi}{c}}\sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x^*_i - x^*_j)^2}
+ \frac{1}{n^2}\sqrt{\frac{\pi}{c}}\sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x_i - x_j)^2}
- \frac{2}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\int_{\mathbb{R}} \cos\big(r(x^*_i - x_j)\big)\, e^{-cr^2}\, dr. \qquad (A5)$$
The integral in (A5) is given by

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\int_{\mathbb{R}} \cos\big(r(x^*_i - x_j)\big)\, e^{-cr^2}\, dr = \sqrt{\frac{\pi}{c}}\sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x^*_i - x_j)^2},$$

so that, summarizing, we have

$$Q_n(x^*, x) = \frac{1}{n^2}\sqrt{\frac{\pi}{c}}\Bigg(\sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x^*_i - x^*_j)^2} + \sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x_i - x_j)^2} - 2\sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{1}{4c}(x^*_i - x_j)^2}\Bigg),$$

which completes the proof.

A.2. gh distribution

This section contains further outcomes of the simulation experiments referring to the gh distribution introduced in Section 4.1 of the paper. Tables A1 and A2 show the values of the 4-nearest neighbor entropy (Ê_4), the root sum of squared errors (RSSE), and the mean RSSE (MRSSE) for the gh(a, b, g, h) distribution, with c = 1 and c = 2, respectively.

Table A1. gh distribution: 4-nearest neighbor entropy (Ê_4), root sum of squared errors (RSSE), and mean RSSE (MRSSE) for the gh(a, b, g, h) distribution. The true parameters are: a = 10, b = 2, g = 0.2, h = 0.2, m = 100, ñ = 10, u = 1%, c = 1, a* ∼ U(9, 11), b* ∼ U(1, 3), g* ∼ U(−0.1, 0.5), h* ∼ U(−0.1, 0.5), S_1 = ψ_n(r_1), S_2 = ψ_n(r_2), S_3 = ψ_n(r_3), S_4 = ψ_n(r_4), where r_1 = −400, −300, −200, −150, −100, −50, −10, 10, 50, 100, 150, 200, 300, 400, r_2 = −10, −9, ..., 9, 10, r_3 = −2, −1.9, ..., 1.9, 2, r_4 ∈ R, and S_5 uses all the quantiles from 0.025 to 0.975. The observed sample size is n = 200 and the number of replications is B = 100.

          S1     S2     S3     S4     S5
Ê_4        ·      ·      ·     −·     −·
RSSE       ·      ·      ·      ·      ·
MRSSE      ·      ·      ·      ·      ·

Table A2. 4-nearest neighbor entropy (Ê_4), root sum of squared errors (RSSE), and mean RSSE (MRSSE) for the gh(a, b, g, h) distribution. The true parameters are: a = 10, b = 2, g = 0.2, h = 0.2, m = 100, ñ = 10, u = 1%, c = 2, a* ∼ U(9, 11), b* ∼ U(1, 3), g* ∼ U(−0.1, 0.5), h* ∼ U(−0.1, 0.5), S_1 = ψ_n(r_1), S_2 = ψ_n(r_2), S_3 = ψ_n(r_3), S_4 = ψ_n(r_4), where r_1 = −400, −300, −200, −150, −100, −50, −10, 10, 50, 100, 150, 200, 300, 400, r_2 = −10, −9, ..., 9, 10, r_3 = −2, −1.9, ..., 1.9, 2, r_4 ∈ R, and S_5 uses all the quantiles from 0.025 to 0.975. The sample size is n = 100 and the number of replications is B = 100.

          S1     S2     S3     S4     S5
Ê_4        ·      ·      ·     −·     −·
RSSE       ·      ·      ·      ·      ·
MRSSE      ·      ·      ·      ·      ·

Figure A5 displays the kernel density estimates of the distributions of the estimators of the parameter h obtained with the discrete and continuous ACFEs and with AMLE based on the quantiles from 0.025 to 0.975 in steps of 0.025. It is apparent that the distribution of ĥ^C is more regular and more concentrated near the true value of h than those of ĥ^D and ĥ^Q.

A.3. Stable distribution

This section contains further outcomes of the simulation experiments referring to the stable distribution introduced in Section 4.2 of the paper. Table A3 shows the values of the 4-nearest neighbor entropy (Ê_4), the RSSE, and the MRSSE for the St(1.9, 0.3, 1, 0) distribution.

A.4. Stable Paretian GARCH

This section contains the outcomes of simulation experiments based on the ARMA–GARCH stable Paretian model introduced in Section 4.3 of the article. Table A4 shows the values of the 4-nearest neighbor entropy (Ê_4), the RSSEs, and the MRSSE for the stable Paretian GARCH process. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252.
Figure A5. gh distribution. Kernel densities of the simulated distributions of ĥ^D (top), ĥ^C (middle), and ĥ^Q (bottom). The true parameters are: a = 10, b = 2, g = 0.2, h = 0.2, m = 100, ñ = 10, u = 1%, c = 1, a* ∼ U(9, 11), b* ∼ U(1, 3), g* ∼ U(−0.1, 0.5), h* ∼ U(−0.1, 0.5), and the discrete ACFE uses r = −2, −1.9, ..., 1.9, 2. The sample size is n = 100 and the number of replications is B = 100.

Table A3. Stable distribution: 4-nearest neighbor entropy (Ê_4), root sum of squared errors (RSSE), and mean RSSE (MRSSE) for the St(α, β, γ, δ) distribution. The parameters are: α = 1.9, β = 0.3, γ = 1, δ = 0, m = 100, ñ = 10, u = 1%, c = 1, α* ∼ U(0.65, 2), β* ∼ U(−1, 1), γ* ∼ U(0.8, 1.15), δ* ∼ U(−0.35, 0.35), S_1 = ψ_n(r_1), S_2 = ψ_n(r_2), S_3 = ψ_n(r_3), S_4 = ψ_n(r_4), where r_1 = −400, −300, −200, −150, −100, −50, −10, 10, 50, 100, 150, 200, 300, 400, r_2 = −150, −100, −50, −30, −10, 10, 30, 50, 100, 150, r_3 = −250, −200, −150, −100, −75, −50, −25, −10, 10, 25, 50, 75, 100, 150, 200, 250, r_4 = −5, −4.5, ..., 4.5, 5, and S_5 is the continuous chf with r ∈ R. The observed sample size is n = 200 and the number of replications is B = 100.

          S1     S2     S3     S4     S5
Ê_4       −·     −·     −·     −·     −·
RSSE       ·      ·      ·      ·      ·
MRSSE      ·      ·      ·      ·      ·
Figure A6 displays the bias, RMSE, and relative performance for the case where n = 100 and p = 2. Figure A7 displays the bias, RMSE, and relative performance for the case where n = 100, p = 4, and the averages of the ABC samples are used to find the ACFEs.
Table A4. Stable Paretian GARCH: 4-nearest neighbor entropy (Ê_4), root sum of squared errors (RSSE), and mean RSSE (MRSSE) for the ARMA–GARCH stable model. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 100, ñ = 10, u = 1%, c = 1, p = 4, μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.2], α_1* ∼ U[0, 0.2], β_1* ∼ U[0.6, 1], α* ∼ U[1, 2], β* ∼ U[−1, 0]. S_1 is the discrete empirical chf with r = 1 and S_2 is the continuous empirical chf. The length of the series is n = 100 and the number of replications is B = 100.

          S1     S2
Ê_4       −·     −·
RSSE       ·      ·
MRSSE      ·      ·
It is clear that, with respect to the estimators based on the modes of the univariate kernel densities, the performance is worse. Figure A8 shows the kernel densities of the simulated distributions of the three estimators of α obtained with the discrete empirical chf method (α̂^D), the continuous empirical chf method (α̂^C), and the numerical MLE (α̂^MLE). Also in this case α̂^C is more regular and stable than α̂^D and α̂^MLE. Figure A9 displays, for the case of larger ranges of the uniform priors, the bias and the RMSE of α̂^D, α̂^C, and α̂^MLE, as well as the relative performance of the two ACFEs with respect to the MLE. Figure A10 displays the bias, RMSE, and relative performance for the case where the observed time series has length n = 500 and the sample fraction is equal to 1%.
Figure A6. Stable Paretian GARCH. The first two panels show, respectively, the bias and the RMSE of θ̂_i^C and θ̂_i^MLE (i = 1, ..., 6). The third panel displays the relative performance. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 200, ñ = 10, u = 1%, c = 1, p = 2, μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.2], α_1* ∼ U[0, 0.4], β_1* ∼ U[0.6, 1], α* ∼ U[1, 2], β* ∼ U[−1, 0], and the discrete ACFE uses r = 1. The length of the simulated series is n = 100 and the number of replications is B = 100.
Figure A7. Stable Paretian GARCH. The first two panels show, respectively, the bias and the RMSE of θ̂_i^C and θ̂_i^MLE (i = 1, ..., 6). The third panel displays the relative performance. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 200, ñ = 10, u = 1%, c = 1, p = 4, μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.2], α_1* ∼ U[0, 0.4], β_1* ∼ U[0.6, 1], α* ∼ U[1, 2], β* ∼ U[−1, 0]. The length of the simulated series is n = 100 and the number of replications is B = 100. The ACFEs are the averages of the ABC samples.
Figure A8. Stable Paretian GARCH. Kernel densities of the simulated distributions of α̂^D (top), α̂^C (middle), and α̂^MLE (bottom). The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 200, ñ = 10, u = 1%, c = 1, p = 4, μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.02], α_1* ∼ U[0, 0.4], β_1* ∼ U[0.6, 1], α* ∼ U[1, 2], β* ∼ U[−1, 1], and the discrete ACFE uses r = 1. The length of the series is n = 100 and the number of replications is B = 100.
Figure A9. Stable Paretian GARCH. The first two panels show, respectively, the bias and the RMSE of θ̂_i^D, θ̂_i^C, and θ̂_i^MLE (i = 1, ..., 6). The third panel displays the relative performance of the two ACFEs. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 100, u = 1%, c = 1, p = 4, μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.02], α_1* ∼ U[0, 0.4], β_1* ∼ U[0.4, 1], α* ∼ U[1, 2], β* ∼ U[−1, 1], and the discrete ACFE uses r = 1. The length of the series is n = 100 and the number of replications is B = 100.
Figure A10. Stable Paretian GARCH. The first two panels show, respectively, the bias and the RMSE of θ̂_i^C and θ̂_i^MLE (i = 1, ..., 6). The third panel displays the relative performance. The parameters are: μ = 0.0596, ω = 0.0061, α_1 = 0.0497, β_1 = 0.9325, β = −0.9516, α = 1.9252, m = 200, ñ = 10, u = 1%, c = 1, p = 4, μ* ∼ U[μ^(0) − 0.2, μ^(0) + 0.2], ω* ∼ U[0, ω^(0) + 0.2], α_1* ∼ U[0, 0.4], β_1* ∼ U[0.6, 1], α* ∼ U[1, 2], β* ∼ U[−1, 0]. The length of the simulated series is n = 500 and the number of replications is B = 100.