Journal of Statistical Computation and Simulation Vol. 75, No. 6, June 2005, 409–423

Corrected estimates for Student t regression models with unknown degrees of freedom

KLAUS L. P. VASCONCELLOS* and SYDNEY GOMES DA SILVA

Departamento de Estatística, CCEN, Universidade Federal de Pernambuco, Cidade Universitária, Recife, PE 50740-540, Brazil

*Corresponding author. Email: [email protected]

(Revised 21 October 2002; in final form 9 June 2004)

We discuss analytical bias corrections for maximum likelihood estimators in a regression model where the errors are Student-t distributed with unknown degrees of freedom. We propose a reparameterization of the number of degrees of freedom that produces a bias corrected estimator with very good small sample properties. This unknown number of degrees of freedom is assumed greater than 1, to guarantee a bounded likelihood function. We discuss some special cases of the general model and present simulations which show that the corrected estimates perform better than their corresponding uncorrected versions in finite samples.

Keywords: Bias correction; Maximum likelihood estimation; Student-t regression model; Unknown degrees of freedom

1. Introduction

The performance of maximum likelihood estimates (MLEs) in small samples is an important subtopic of asymptotic theory. MLEs are typically biased estimates of the true parameter values. This bias does not constitute a serious problem if the sample size is reasonably large, because it is, in general, of order O(n−1), while the asymptotic standard error has order O(n−1/2). However, if n is not large enough, the bias can be significant, and it is useful in many practical situations to produce formulae that allow one to compute it approximately. This idea has led to extensive study of bias correction for MLEs. Box [1] gives a general expression for the n−1 bias in multivariate nonlinear models where covariance matrices are known. Pike et al. [2] investigate the bias in logistic linear models. For nonlinear regression models, Cook et al. [3] relate bias to the position of the explanatory variables in the sample space. Young and Bakir [4] show that bias correction can improve estimation in generalized log-gamma regression models. Cordeiro and McCullagh [5] give general matrix formulae for bias correction in generalized linear models. Firth [6] introduces an alternative bias corrected estimator, which corresponds to the solution of a modified score equation.


More recently, Cordeiro and Vasconcellos [7] obtain general matrix formulae for bias correction in multivariate nonlinear regression models with normal errors, while Cordeiro and Vasconcellos [8] obtain second-order biases of the MLEs in von Mises regression models. In addition, Cribari-Neto and Vasconcellos [9] compare the performance of some bias correction alternatives for the beta distribution. In this paper, we obtain general formulae for second-order biases of the MLEs of the parameters in a nonlinear Student-t regression model where the number of degrees of freedom is unknown. The t-distribution has been widely used in the literature, since it provides a robust alternative for estimation: MLEs for the location and scale parameters of the t-distribution are not as sensitive to atypical observations as those corresponding to the normal distribution. Jeffreys [10] uses the t-distribution to describe astronomical data. Zellner [11] studies the univariate linear regression model where the vector of observed responses is multivariate t distributed. West [12] performs Bayesian analysis related to the use of the t distribution in regression problems. Sutradhar and Ali [13] extend Zellner's work to cover MLEs in a multivariate regression model. Lange et al. [14] provide a rich illustration of the use of univariate and multivariate regression models with t distributed errors as robust extensions of the classical normal model and give a number of practical applications. Ferrari and Arellano-Valle [15] develop Bartlett and Bartlett-type corrections to improve likelihood ratio and score tests in the class of univariate linear t regression models discussed in ref. [14]. In addition, formulae for second-order biases of MLEs in univariate nonlinear t regression models are given in ref. [16], while their results are extended by Vasconcellos and Cordeiro [17] to multivariate regression models and by Vasconcellos et al. [18] to heteroskedastic models. The paper is organized as follows. In section 2, we introduce the nonlinear Student-t regression model with an unknown number of degrees of freedom. In sections 3 and 4, we use the general expression of Cox and Snell [19] to obtain formulae for second-order biases of the MLEs of the parameters. In section 5, we present some simulation results and propose a reparameterization of the number of degrees of freedom parameter, in order to obtain estimates with good finite sample properties. We verify, by simulation, that the corrected estimates have better finite sample performance than the uncorrected ones. Finally, in section 6, the main conclusions of this work are summarized.

2. Model definition

We consider a univariate nonlinear regression model where the observations y1, . . . , yn are independent, yi having a Student-t distribution with location parameter µi, scale parameter σ and ν degrees of freedom. The density of yi, for each i = 1, . . . , n, is therefore given by

\varphi(y; \mu_i, \sigma, \nu) = \frac{\nu^{\nu/2}\,\Gamma((\nu+1)/2)}{\sigma\sqrt{\pi}\,\Gamma(\nu/2)} \left[\nu + \left(\frac{y - \mu_i}{\sigma}\right)^2\right]^{-(\nu+1)/2}, \qquad (1)

where σ > 0 and ν > 1 are both unknown. We define the precision parameter φ = σ−2 for each observation yi. We assume that, for each i = 1, . . . , n, the location parameter µi can be expressed as µi = fi(xi, β), where fi(·) is a known function, continuously twice differentiable with respect to β, xi is an m × 1 vector of known explanatory variables and β = (β1, . . . , βp)T is a vector of unknown parameters. The vector of all unknown parameters is then given by θ = (βT, ν, φ)T. We also assume that p is small relative to n.
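As a numerical sanity check of equation (1), the following sketch (our code, not the authors') evaluates the density and confirms that it matches the standard location-scale Student-t parameterization available in scipy; the function name t_density is ours.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import t as student_t

def t_density(y, mu, sigma, nu):
    """Density (1): location mu, scale sigma > 0, nu > 1 degrees of freedom."""
    log_c = 0.5 * nu * np.log(nu) + gammaln(0.5 * (nu + 1)) \
            - np.log(sigma) - 0.5 * np.log(np.pi) - gammaln(0.5 * nu)
    z = (y - mu) / sigma
    return np.exp(log_c - 0.5 * (nu + 1) * np.log(nu + z**2))

# Agreement with the usual location-scale t parameterization:
assert np.isclose(t_density(1.3, 0.5, 2.0, 4.0),
                  student_t.pdf(1.3, df=4.0, loc=0.5, scale=2.0))
```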


It is important to stress that, although, in principle, the number ν of degrees of freedom can be any positive number, there are some estimation problems if we allow very small values of ν since, as Fernandez and Steel [20] have observed, the likelihood function can become unbounded. That is, for a given β = β0, let s = s(β0) be the number of observations for which yi = µi. Then, Fernandez and Steel [20] observe that if we take ν = ν0 sufficiently small that ν0−1 + 1 > n/s, the value of the likelihood function at (β0T, ν0, φ)T tends to infinity as φ tends to infinity. They also observe that this problem does not occur if we take ν0 large enough that ν0−1 + 1 < n/s, which forces the likelihood function to be bounded. Since s = p in most cases, then, assuming n > 2p, the restriction ν > 1 seems to solve this problem in most practical situations. This restriction is also convenient from the practical point of view: it means that we assume the expected value of each observation yi exists and is equal to µi. We must also emphasize that we are working under the assumption that the observations are independent. Therefore, the approach we use here is different from that of Zellner [11]. Instead, we follow Lange et al. [14], Ferrari and Arellano-Valle [15], Cordeiro et al. [16] and Vasconcellos and Cordeiro [17]. From equation (1) and using independence, the total log-likelihood ℓ(θ) for the (p + 2) × 1 vector θ = (βT, ν, φ)T of unknown parameters, given the observations y1, . . . , yn, becomes

\ell(\theta) = n\left[\frac{1}{2}\log\phi + g(\nu)\right] - \frac{\nu+1}{2}\sum_{i=1}^{n}\log(\nu + t_i^2), \qquad (2)

with each t_i = \phi^{1/2}(y_i - \mu_i) and g(\nu) given by

g(\nu) = \log\left[\frac{\nu^{\nu/2}\,\Gamma((\nu+1)/2)}{\pi^{1/2}\,\Gamma(\nu/2)}\right]. \qquad (3)

We assume that the standard regularity conditions [ref. 21, chapter 9] on ℓ(θ) and its first three derivatives hold as n tends to infinity; these conditions are usually satisfied in practice. We now introduce the notation used throughout the paper. The total log-likelihood derivatives with respect to the unknown parameters are indicated by indices, where lower-case letters r, s, t, . . . correspond to derivatives with respect to the β parameters, while the indices φ and ν correspond to derivatives with respect to these parameters. Thus, Ur = ∂ℓ/∂βr, Uφ = ∂ℓ/∂φ, Uνs = ∂²ℓ/∂ν∂βs, Ursφ = ∂³ℓ/∂βr∂βs∂φ and so on. The standard notation for the moments of these derivatives is used here [22]: κrs = E(Urs), κφ,ν = E(Uφ Uν), κrs,φ = E(Urs Uφ), κrst = E(Urst), etc., where all κ's refer to a total over the sample and are, in general, of order n. The derivatives of these quantities are denoted by κrs(t) = ∂κrs/∂βt, κrφ(ν) = ∂κrφ/∂ν, etc. Moreover, if δ = (ν, φ), we denote the information matrices with respect to β and δ by Kβ and Kδ, respectively. We assume that Kβ and Kδ are nonsingular, denoting the element in position (r, s) of Kβ−1 by κ^{r,s}; the elements of Kδ−1 will be denoted as κ^{ν,ν}, κ^{ν,φ} and κ^{φ,φ}. In addition, we use the notation in ref. [3] for the derivatives of µi = fi(xi; β) with respect to the components of β: fi^r = ∂fi/∂βr, fi^{r,s} = fi^r fi^s, fi^{rs} = ∂²fi/∂βr∂βs, fi^{rs,t} = fi^{rs} fi^t, etc. Differentiation of equation (2) yields

U_r = \phi^{1/2}(\nu+1)\sum_{i=1}^{n}\frac{f_i^r t_i}{\nu + t_i^2}, \qquad U_\phi = \frac{1}{2\phi}\left[n - (\nu+1)\sum_{i=1}^{n}\frac{t_i^2}{\nu + t_i^2}\right],

U_\nu = \frac{n}{2}\left[1 + \log\nu + \psi\!\left(\frac{\nu+1}{2}\right) - \psi\!\left(\frac{\nu}{2}\right)\right] - \frac{1}{2}\sum_{i=1}^{n}\log(\nu + t_i^2) - \frac{\nu+1}{2}\sum_{i=1}^{n}\frac{1}{\nu + t_i^2},


where ψ denotes the digamma function (derivative of the log of the gamma function). We assume that the log-likelihood function has a maximum at an interior point of the parametric space and that this maximum corresponds to the unique extreme point of the log-likelihood. Therefore, the MLEs of the parameters β, φ, ν can be obtained as the solution of a nonlinear system of p + 2 equations: Ur = 0 for r = 1, . . . , p, Uφ = 0 and Uν = 0. This system can, in practice, be solved using an iterative procedure that converges to the desired MLEs. A good numerical procedure to maximize the log-likelihood function is the MaxBFGS function implemented in the Ox programming language [23], which uses BFGS, the quasi-Newton method developed by Broyden, Fletcher, Goldfarb and Shanno [see also ref. 24]. This was the procedure used in the numerical studies of the present work. From here onwards, we assume that the MLEs β̂, φ̂ and ν̂ exist and are given by the joint solution of Ur = 0 for r = 1, . . . , p, Uφ = 0 and Uν = 0.
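The paper performs this maximization with the MaxBFGS routine of Ox; the following is a rough Python analogue (a sketch under our own conventions, not the authors' code), written for the simulation model of section 5 with mean function µi = xi(1 − exp(−β/xi)). Optimizing over (β, log φ, log(ν − 1)) keeps φ > 0 and ν > 1 automatically during the search; this internal transformation is our device for the optimizer, not part of the model.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

def loglik(theta, y, x):
    """Total log-likelihood (2) for the model mu_i = x_i (1 - exp(-beta/x_i))."""
    beta, log_phi, log_num1 = theta
    phi, nu = np.exp(log_phi), 1.0 + np.exp(log_num1)   # phi > 0, nu > 1 by construction
    mu = x * (1.0 - np.exp(-beta / x))
    t = np.sqrt(phi) * (y - mu)                          # t_i = phi^(1/2)(y_i - mu_i)
    g = 0.5 * nu * np.log(nu) + gammaln(0.5 * (nu + 1)) \
        - 0.5 * np.log(np.pi) - gammaln(0.5 * nu)        # g(nu) of equation (3)
    return len(y) * (0.5 * np.log(phi) + g) - 0.5 * (nu + 1) * np.sum(np.log(nu + t**2))

def fit(y, x, start=(1.0, 0.0, 1.0)):
    """Quasi-Newton (BFGS) maximization; returns (beta_hat, phi_hat, nu_hat, converged)."""
    res = minimize(lambda th: -loglik(th, y, x), np.array(start), method="BFGS")
    beta, log_phi, log_num1 = res.x
    return beta, np.exp(log_phi), 1.0 + np.exp(log_num1), res.success
```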

3. Biases of the estimates of β

From the log-likelihood defined in equation (2) and from basic properties of the Student-t distribution [ref. 25, appendix B], we obtain the following moments:

\kappa_{rs} = -\phi\,\frac{\nu+1}{\nu+3}\sum_{i=1}^{n} f_i^{r,s}, \qquad \kappa_{\phi\nu} = -\frac{n}{\phi(\nu+1)(\nu+3)}, \qquad \kappa_{\phi\phi} = -\frac{n\nu}{2\phi^2(\nu+3)},

\kappa_{\nu\nu} = \frac{n}{2}\left\{\frac{\nu+5}{\nu(\nu+1)(\nu+3)} + \frac{1}{2}\left[\psi'\!\left(\frac{\nu+1}{2}\right) - \psi'\!\left(\frac{\nu}{2}\right)\right]\right\}, \qquad \kappa_{r\phi} = \kappa_{r\nu} = 0,

\kappa_{rst} = -\phi\,\frac{\nu+1}{\nu+3}\sum_{i}\left(f_i^{r,st} + f_i^{s,rt} + f_i^{t,rs}\right), \qquad \kappa_{rs\phi} = \frac{\nu+2}{\phi(\nu+5)}\,\kappa_{rs},

\kappa_{rs\nu} = \frac{2(\nu-1)}{\nu(\nu+1)(\nu+5)}\,\kappa_{rs}, \qquad \kappa_{r\phi\phi} = \kappa_{r\phi\nu} = \kappa_{r\nu\nu} = 0, \qquad \kappa_{\phi\phi\phi} = -\frac{2(\nu+8)}{\phi(\nu+5)}\,\kappa_{\phi\phi},

\kappa_{\phi\phi\nu} = \frac{3(\nu-3)}{\nu(\nu+1)(\nu+5)}\,\kappa_{\phi\phi}, \qquad \kappa_{\phi\nu\nu} = -\frac{2(\nu-1)}{\nu(\nu+5)}\,\kappa_{\phi\nu},

\kappa_{\nu\nu\nu} = -\frac{n}{2}\left\{\frac{2\nu^2+21\nu+31}{\nu^2(\nu+1)(\nu+3)(\nu+5)} - \frac{1}{4}\left[\psi''\!\left(\frac{\nu+1}{2}\right) - \psi''\!\left(\frac{\nu}{2}\right)\right]\right\}.
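A minimal sketch (ours) that assembles the (φ, ν) block of the information matrix from these moments and inverts it; delta_information is a hypothetical helper name, and the entries of the inverse are the κ^{φ,φ}, κ^{φ,ν}, κ^{ν,ν} used below.

```python
import numpy as np
from scipy.special import polygamma

def delta_information(n, phi, nu):
    """Information matrix K_delta for (phi, nu) and its inverse."""
    G1 = 0.5 * (polygamma(1, (nu + 1) / 2) - polygamma(1, nu / 2))   # G'(nu)
    k_pp = -n * nu / (2 * phi**2 * (nu + 3))                         # kappa_{phi phi}
    k_pn = -n / (phi * (nu + 1) * (nu + 3))                          # kappa_{phi nu}
    k_nn = 0.5 * n * ((nu + 5) / (nu * (nu + 1) * (nu + 3)) + G1)    # kappa_{nu nu}
    K = -np.array([[k_pp, k_pn], [k_pn, k_nn]])   # information = -E(Hessian)
    K_inv = np.linalg.inv(K)                      # kappa^{phi,phi}, kappa^{phi,nu}, kappa^{nu,nu}
    return K, K_inv
```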

We immediately observe that β and δ are globally orthogonal [26], since κrφ = κrν = 0 for all r = 1, . . . , p. Therefore, the joint information matrix K for θ = (βT, δT)T is block-diagonal, K = diag{Kβ, Kδ}. In view of block-diagonality, the n−1 biases of β̂ and δ̂ can be obtained in a straightforward way. Our first goal is to derive the second-order bias of the MLE β̂ of β. To do so, we will follow the approach in ref. [18]. Let B(β̂s) be the n−1 bias of β̂s. From the general formula of Cox and Snell [19], the block-diagonality of K and the additional property that κrφφ = κrφν = κrνν = 0 for all r = 1, . . . , p, we have

B(\hat\beta_s) = \sum_{r,t,u} \kappa^{s,r}\kappa^{t,u}\left[\kappa_{rt}^{(u)} - \frac{1}{2}\kappa_{rtu}\right]. \qquad (4)


Thus, the second-order bias of β̂ will be the same, regardless of whether δ is known or unknown. Therefore, we obtain the result in ref. [16]. That is, the expression for the second-order bias vector B(β̂) can be written as

B(\hat\beta) = (F^T F)^{-1} F^T d, \qquad (5)

where F is an n × p matrix, having in its (i, r)th position the derivative of fi with respect to βr, i = 1, . . . , n; r = 1, . . . , p. In addition, d is the n-dimensional vector given by

d = -\frac{1}{2\phi}\left(\frac{\nu+3}{\nu+1}\right) H\,\mathrm{vec}\!\left[(F^T F)^{-1}\right],

where H is an n × p² matrix having in the ith row and [p(j − 1) + k]th column the second-order derivative of fi with respect to βj and βk, i = 1, . . . , n; j, k = 1, . . . , p, and vec stands for the operator that forms a column vector from a matrix by stacking its columns, one underneath the other. The second-order bias B(β̂) can thus be obtained from an ordinary linear regression of d on F. Our approach also has advantages for algebraic purposes, because it only involves products and inversions of matrices. Equation (5) depends on the β parameters only through the first and second partial derivatives of the mean functions with respect to these parameters. Although equation (5) has a simple form, its interpretation is not straightforward. One can use equation (5) with a computer algebra system such as MATHEMATICA [27] or MAPLE [28] to obtain closed-form expressions for B(β̂). All quantities have to be evaluated at β̂ and δ̂. It is then possible to obtain the bias-corrected estimate as β̃ = β̂ − B̂(β̂), where B̂(β̂) denotes the right-hand side of equation (5) evaluated at β̂ and δ̂. The corrected estimate β̃ is expected to have better sampling properties than β̂ in small samples. It is also interesting to observe that for a linear regression model the second-order bias will be 0, since, in this case, H = 0.
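A direct transcription of equation (5) as a sketch (our code): given F, H, φ̂ and ν̂, the bias is the OLS coefficient of a regression of d on F. The function name beta_bias is ours, and the caller must supply F and H from the derivatives of the chosen mean functions.

```python
import numpy as np

def beta_bias(F, H, phi, nu):
    """Equation (5). F: n x p with df_i/dbeta_r; H: n x p^2 with d2f_i/dbeta_j dbeta_k."""
    FtF_inv = np.linalg.inv(F.T @ F)
    # vec stacks columns; since (F'F)^{-1} and the second derivatives are
    # symmetric in (j, k), the exact column ordering of H is immaterial here.
    d = -(nu + 3) / (2 * phi * (nu + 1)) * (H @ FtF_inv.reshape(-1, order="F"))
    return FtF_inv @ F.T @ d          # OLS coefficient of d regressed on F
```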

4. Biases of the estimates of δ

We now proceed to the calculation of the n−1 bias of the MLE δ̂. For this purpose, we consider again Cox and Snell's formula. Since the information matrix for θ = (βT, δT)T is block-diagonal, we can write the n−1 bias of φ̂ as

B(\hat\phi) = -\frac{1}{2}\sum_{R,s,t} \kappa^{\phi,R}\kappa^{s,t}\kappa_{Rst} + \sum_{R,S,T} \kappa^{\phi,R}\kappa^{S,T}\left[\kappa_{RS}^{(T)} - \frac{1}{2}\kappa_{RST}\right], \qquad (6)

with the indices R, S, T varying over {φ, ν} and r, s, t representing the indices of the components of β = (β1, . . . , βp)T. Let B1(φ̂) denote the first sum of equation (6). We have

B_1(\hat\phi) = -\frac{1}{2}\sum_{R,s,t} \kappa^{\phi,R}\kappa^{s,t}\kappa_{Rst} = -\frac{\kappa^{\phi\phi}}{2}\sum_{s,t}\kappa^{st}\kappa_{\phi st} - \frac{\kappa^{\phi\nu}}{2}\sum_{s,t}\kappa^{st}\kappa_{\nu st},


with s, t = 1, . . . , p. Substituting the expressions for κφst and κνst, we obtain

B_1(\hat\phi) = -\frac{\kappa^{\phi\phi}(\nu+2)}{2\phi(\nu+5)}\sum_{s,t}\kappa^{st}\kappa_{st} - \frac{\kappa^{\phi\nu}(\nu-1)}{\nu(\nu+1)(\nu+5)}\sum_{s,t}\kappa^{st}\kappa_{st}.

Since \sum_{s,t}\kappa^{st}\kappa_{st} is the trace of the p × p identity matrix, which equals p, we arrive at

B_1(\hat\phi) = -\frac{p(\nu+2)}{2\phi(\nu+5)}\,\kappa^{\phi\phi} - \frac{p(\nu-1)}{\nu(\nu+1)(\nu+5)}\,\kappa^{\phi\nu}. \qquad (7)

For the second sum in equation (6), B2(φ̂), we write

B_2(\hat\phi) = \sum_{R,S,T} \kappa^{\phi,R}\kappa^{S,T}\left[\kappa_{RS}^{(T)} - \frac{1}{2}\kappa_{RST}\right] = \kappa^{\phi\phi}\sum_{S,T}\kappa^{S,T}\left[\kappa_{\phi S}^{(T)} - \frac{1}{2}\kappa_{\phi ST}\right] + \kappa^{\phi\nu}\sum_{S,T}\kappa^{S,T}\left[\kappa_{\nu S}^{(T)} - \frac{1}{2}\kappa_{\nu ST}\right],

where S, T vary over {φ, ν}. Therefore, each of the two sums in the last expression is a sum of four terms. Using the additional fact that κνν(φ) = 0, we can write

B_2(\hat\phi) = \left[\kappa_{\phi\phi}^{(\phi)} - \frac{1}{2}\kappa_{\phi\phi\phi}\right](\kappa^{\phi\phi})^2 + \left[\kappa_{\phi\phi}^{(\nu)} + 2\kappa_{\phi\nu}^{(\phi)} - \frac{3}{2}\kappa_{\phi\phi\nu}\right]\kappa^{\phi\phi}\kappa^{\phi\nu} + \left[\kappa_{\phi\nu}^{(\nu)} - \kappa_{\phi\nu\nu}\right](\kappa^{\phi\nu})^2

\qquad\qquad + \left[\kappa_{\phi\nu}^{(\nu)} - \frac{1}{2}\kappa_{\phi\nu\nu}\right]\kappa^{\phi\phi}\kappa^{\nu\nu} + \left[\kappa_{\nu\nu}^{(\nu)} - \frac{1}{2}\kappa_{\nu\nu\nu}\right]\kappa^{\phi\nu}\kappa^{\nu\nu}. \qquad (8)

Adding B1(φ̂), given in equation (7), to the expression for B2(φ̂), given in equation (8), we find the n−1 bias of the MLE φ̂. After tedious calculations, it can be shown that

\frac{n\nu(\nu+5)}{\phi}\,B(\hat\phi) = \frac{2\sum_{j=0}^{8} a_j(\nu)\nu^j}{\left[G_1\nu^3 + 2G_1\nu^2 + (1+G_1)\nu + 3\right]^2} + \frac{p\sum_{j=0}^{5} b_j(\nu)\nu^j}{G_1\nu^3 + 2G_1\nu^2 + (1+G_1)\nu + 3}, \qquad (9)

where the aj's and bj's are given by

a_8(\nu) = G_1^2, \quad a_7(\nu) = 9G_1^2, \quad a_6(\nu) = 32G_1^2 + 2G_1 - G_2, \quad a_5(\nu) = 58G_1^2 + 18G_1 - 8G_2,
a_4(\nu) = 57G_1^2 + 59G_1 + 1 - 18G_2, \quad a_3(\nu) = 29G_1^2 + 93G_1 + 11 - 16G_2,
a_2(\nu) = 6G_1^2 + 71G_1 + 45 - 5G_2, \quad a_1(\nu) = 21G_1 + 81, \quad a_0(\nu) = 30,
b_5(\nu) = G_1, \quad b_4(\nu) = 7G_1, \quad b_3(\nu) = 17G_1 + 1, \quad b_2(\nu) = 17G_1 + 8, \quad b_1(\nu) = 6G_1 + 21, \quad b_0(\nu) = 6,

G1 and G2 being, respectively, the first and second derivatives of G(ν) = ψ((ν + 1)/2) − ψ(ν/2).
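The following sketch (our transcription, under the G1 and G2 definitions just given) evaluates equation (9) numerically; phi_bias is a hypothetical helper name.

```python
from scipy.special import polygamma

def phi_bias(n, p, phi, nu):
    """n^{-1} bias of phi_hat, equation (9)."""
    G1 = 0.5 * (polygamma(1, (nu + 1) / 2) - polygamma(1, nu / 2))
    G2 = 0.25 * (polygamma(2, (nu + 1) / 2) - polygamma(2, nu / 2))
    D = G1 * nu**3 + 2 * G1 * nu**2 + (1 + G1) * nu + 3
    a = [30, 21*G1 + 81, 6*G1**2 + 71*G1 + 45 - 5*G2,            # a_0 .. a_8
         29*G1**2 + 93*G1 + 11 - 16*G2, 57*G1**2 + 59*G1 + 1 - 18*G2,
         58*G1**2 + 18*G1 - 8*G2, 32*G1**2 + 2*G1 - G2, 9*G1**2, G1**2]
    b = [6, 6*G1 + 21, 17*G1 + 8, 17*G1 + 1, 7*G1, G1]           # b_0 .. b_5
    S_a = sum(aj * nu**j for j, aj in enumerate(a))
    S_b = sum(bj * nu**j for j, bj in enumerate(b))
    return phi / (n * nu * (nu + 5)) * (2 * S_a / D**2 + p * S_b / D)
```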


Now, similarly, we can write the n−1 bias of ν̂ as

B(\hat\nu) = -\frac{1}{2}\sum_{R,s,t} \kappa^{\nu,R}\kappa^{s,t}\kappa_{Rst} + \sum_{R,S,T} \kappa^{\nu,R}\kappa^{S,T}\left[\kappa_{RS}^{(T)} - \frac{1}{2}\kappa_{RST}\right], \qquad (10)

with the indices R, S, T varying over {φ, ν} and r, s, t representing the indices of the components of β = (β1, . . . , βp)T. If B1(ν̂) denotes the first sum of equation (10), then

B_1(\hat\nu) = -\frac{1}{2}\sum_{R,s,t} \kappa^{\nu,R}\kappa^{s,t}\kappa_{Rst} = -\frac{\kappa^{\nu\phi}}{2}\sum_{s,t}\kappa^{st}\kappa_{\phi st} - \frac{\kappa^{\nu\nu}}{2}\sum_{s,t}\kappa^{st}\kappa_{\nu st},

with s, t = 1, . . . , p. The development is quite similar to what we did before for B1(φ̂). We find

B_1(\hat\nu) = -\frac{p(\nu+2)}{2\phi(\nu+5)}\,\kappa^{\nu\phi} - \frac{p(\nu-1)}{\nu(\nu+1)(\nu+5)}\,\kappa^{\nu\nu}. \qquad (11)

In addition, for the second sum in equation (10), B2(ν̂), we have

B_2(\hat\nu) = \sum_{R,S,T} \kappa^{\nu,R}\kappa^{S,T}\left[\kappa_{RS}^{(T)} - \frac{1}{2}\kappa_{RST}\right] = \kappa^{\nu\phi}\sum_{S,T}\kappa^{S,T}\left[\kappa_{\phi S}^{(T)} - \frac{1}{2}\kappa_{\phi ST}\right] + \kappa^{\nu\nu}\sum_{S,T}\kappa^{S,T}\left[\kappa_{\nu S}^{(T)} - \frac{1}{2}\kappa_{\nu ST}\right],

where S, T vary over {φ, ν}. It is not difficult to see that for this case we obtain

B_2(\hat\nu) = \left[\kappa_{\phi\phi}^{(\phi)} - \frac{1}{2}\kappa_{\phi\phi\phi}\right]\kappa^{\phi\phi}\kappa^{\phi\nu} + \left[\kappa_{\phi\phi}^{(\nu)} + \kappa_{\phi\nu}^{(\phi)} - \kappa_{\phi\phi\nu}\right](\kappa^{\phi\nu})^2 + \left[\kappa_{\phi\nu}^{(\phi)} - \frac{1}{2}\kappa_{\phi\phi\nu}\right]\kappa^{\phi\phi}\kappa^{\nu\nu}

\qquad\qquad + \left[2\kappa_{\phi\nu}^{(\nu)} - \frac{3}{2}\kappa_{\phi\nu\nu}\right]\kappa^{\phi\nu}\kappa^{\nu\nu} + \left[\kappa_{\nu\nu}^{(\nu)} - \frac{1}{2}\kappa_{\nu\nu\nu}\right](\kappa^{\nu\nu})^2. \qquad (12)

The n−1 bias of the MLE ν̂ will be given by the sum of B1(ν̂), given in equation (11), and B2(ν̂), given in equation (12). We obtain

n\,B(\hat\nu) = \frac{\nu+5}{\nu+1}\cdot\frac{\sum_{j=0}^{6} c_j(\nu)\nu^j}{\left[G_1\nu^3 + 2G_1\nu^2 + (1+G_1)\nu + 3\right]^2} + \frac{6p}{G_1\nu^3 + 2G_1\nu^2 + (1+G_1)\nu + 3}, \qquad (13)

where the cj's are given by

c_6(\nu) = G_2, \quad c_5(\nu) = 8G_2, \quad c_4(\nu) = 18G_2 - 3G_1, \quad c_3(\nu) = 16G_2 - 9G_1 - 2,
c_2(\nu) = 5G_2 - 9G_1 - 22, \quad c_1(\nu) = -3(G_1 + 22), \quad c_0(\nu) = -30.
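Equation (13) transcribed the same way (our sketch; nu_bias is a hypothetical name):

```python
from scipy.special import polygamma

def nu_bias(n, p, nu):
    """n^{-1} bias of nu_hat, equation (13)."""
    G1 = 0.5 * (polygamma(1, (nu + 1) / 2) - polygamma(1, nu / 2))
    G2 = 0.25 * (polygamma(2, (nu + 1) / 2) - polygamma(2, nu / 2))
    D = G1 * nu**3 + 2 * G1 * nu**2 + (1 + G1) * nu + 3
    c = [-30, -3*(G1 + 22), 5*G2 - 9*G1 - 22, 16*G2 - 9*G1 - 2,   # c_0 .. c_6
         18*G2 - 3*G1, 8*G2, G2]
    S_c = sum(cj * nu**j for j, cj in enumerate(c))
    return ((nu + 5) / (nu + 1) * S_c / D**2 + 6 * p / D) / n
```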


Clearly, all quantities in equations (9) and (13) have to be evaluated at δ̂ = (φ̂, ν̂)T, in order to obtain the corrected MLEs φ̃ = φ̂ − B̂(φ̂) and ν̃ = ν̂ − B̂(ν̂), where B̂ means the estimate of the second-order bias obtained using the maximum likelihood estimator δ̂. The corrected estimators are expected to have better sampling properties than the corresponding original MLEs in small sample sizes. The following figures give an idea of the behavior of the second-order bias as a function of both the true value of ν and the dimension p of the β parameter. Figure 1 shows the behavior of nB(φ̂)/φ, obtained from equation (9), as a function of ν, for four different values of p, namely p = 1, 3, 5 and 8. It is seen that this quantity is a decreasing function of ν. This is consistent with the fact that the information about φ, −κφφ, is an increasing function of ν, and the second-order bias is directly related to the information. We also observe that nB(φ̂)/φ increases with p. Figure 2 shows how nB(ν̂), as given by equation (13), varies as a function of ν, for the same four values of p. It is seen from this figure that nB(ν̂) is an increasing convex function of ν. Therefore, the second-order bias of ν̂ can become excessively large, even for relatively small values of ν and moderate values of the sample size (n around 100). This is again consistent with the fact that the information about ν, −κνν, decreases very quickly as ν increases. Hence, for large values of ν, we need a very large number of observations to produce enough information in order to have good estimation of this parameter.

Figure 1. Relative second-order bias term of φ̂.

Figure 2. Second-order bias term of ν̂.

5. Simulation studies

We performed Monte Carlo simulation studies for two different situations. The first one is a Student-t regression model, where the observations have means µi = xi(1 − exp(−β/xi)), xi being the values of the explanatory variables for each observation, assumed known. The regression parameter β, the precision parameter φ and the number ν of degrees of freedom are unknown. The second situation corresponds to independent and identically distributed observations of a Student t distribution with mean µ, precision parameter φ and ν degrees of freedom, all of them unknown. For the first situation, the true parameters were taken as β = 1, φ = 1 and ν = 4. The observations of the covariates xi were chosen as absolute values of random draws from a normal N(0, 1) distribution, those values being held constant throughout simulations with equal sample sizes. For the second situation, the true parameters were taken as µ = 0, φ = 1 and ν = 4. The number of observations in both situations was set at n = 150, 250 and 400. The simulations and the calculations of the MLEs and their biases were performed using version 3.10 of the Ox programming language [23]. For each particular situation and sample size, we carried out simulations based on 10,000 replications. In each of the 10,000 replications, we computed the MLEs by maximizing the log-likelihood using the MaxBFGS routine of Ox [23].


Then, we computed the second-order biases from formulae (5), (9) and (13), with all quantities evaluated at the MLEs. For each specific case (sample size and situation), we computed the sample means and standard errors of the MLEs and of their corrected estimates, based on their values in the 10,000 simulated samples. We also calculated, for each specific case, the failure rate of the simulation. This failure rate is due to the numerical instability of the log-likelihood: when the number of observations is not very large, the numerical maximization of the log-likelihood may not converge to a solution. For each specific case, we replicate until we have obtained 10,000 samples for which we successfully maximized the log-likelihood and obtained the corresponding estimates. The failure rate is defined as (N − 10,000)/N, where N is the total number of trials until we have obtained 10,000 successes. Table 1 gives the sample means of both uncorrected and corrected estimates for the regression model, with their respective standard errors in parentheses. For each sample size, the first line gives the original MLEs and the second line gives the corrected estimates, in both cases with standard errors in parentheses. Overall, it is clear from the figures in this table that the bias-corrected estimates of β and φ are slightly closer to the true parameter values than the unadjusted estimates. Thus, the second-order bias correction brings the corrected MLEs closer to the true parameter values. It should also be noted that reasonably large sample sizes are necessary for uncorrected estimates of ν to become accurate. However, because of the behavior of the second-order bias of ν̂, the corrected estimator will not always improve the bias. In terms of mean square error, we observe that the correction slightly improves the mean square errors in the estimation of β and φ, while the mean square error of the corrected estimator of ν can become excessively large. Thus, we need a large number of observations in order for the bias corrected estimator of ν to display good behavior both in terms of bias and mean square error. Table 2 gives the failure rate for each sample size corresponding to these simulations. As we can see, there can be convergence problems, because of the numerical instability of the log-likelihood, if the number of observations is not large. Hence, some of the simulated samples may not converge, although, as is clear from the table, this problem tends to vanish as the sample size increases. For example, from the table it can be seen that for n = 150, we had convergence problems 89 times, needing 10,089 trials in order to get 10,000 estimates. For n = 250, we only had six samples with convergence problems, so we needed 10,006 trials.
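A condensed sketch of this Monte Carlo design (ours; the paper used Ox 3.10), including the failure-rate bookkeeping; it reuses the fit() sketch given in section 2, and the seed, variable names and recording step are our assumptions.

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(1)                     # seed is arbitrary
n, beta, phi, nu = 150, 1.0, 1.0, 4.0
x = np.abs(rng.standard_normal(n))                 # |N(0,1)| covariates, held fixed
mu = x * (1.0 - np.exp(-beta / x))
successes, trials = 0, 0
while successes < 10_000:                          # lower the target for a quick check
    trials += 1
    y = mu + student_t.rvs(df=nu, size=n, random_state=rng) / np.sqrt(phi)
    *estimates, converged = fit(y, x)              # fit() from the earlier sketch
    if converged:
        successes += 1                             # record estimates and biases here
failure_rate = (trials - 10_000) / trials          # definition used in the text
```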

Table 1. Uncorrected and corrected estimates for a regression model with standard errors.

n      β = 1           φ = 1           ν = 4
150    1.037 (0.335)   1.019 (0.217)    5.271 (4.781)
       0.992 (0.307)   1.003 (0.209)   −3.200 (203)
250    1.023 (0.242)   1.009 (0.164)    4.610 (2.496)
       1.001 (0.234)   0.999 (0.161)    3.150 (31.9)
400    1.013 (0.185)   1.006 (0.129)    4.309 (1.186)
       1.000 (0.182)   1.000 (0.127)    3.958 (0.845)


Table 2. Failure rates for the simulations presented in table 1.

n      Rate
150    0.00882
250    0.000600
400    0

Table 3. Uncorrected and corrected estimates for the i.i.d. model with standard errors.

n      φ = 1           ν = 4
150    1.019 (0.219)    5.346 (5.278)
       1.003 (0.211)   −4.956 (149)
250    1.009 (0.164)    4.649 (3.162)
       1.000 (0.161)    1.324 (157)
400    1.006 (0.128)    4.316 (1.206)
       1.000 (0.126)    3.963 (0.857)

For n = 400, the failure rate was 0, thus indicating that we did not have convergence problems for this sample size. In table 3, we present simulation results for the case of independent and identically distributed observations with mean µ, precision parameter φ and ν degrees of freedom. We only analyze the bias correction for ν and φ, since the second-order bias for µ is 0. Again, the correction produces a slight decrease in bias and mean square error in the estimation of φ. However, as in the first situation, the bias corrected estimator for ν can again lead to very poor estimation. For the smallest sample, n = 150, the correction can produce unacceptable negative estimates. The mean square error of the corrected estimator for ν is also very large for the smallest samples. Only for the largest sample, with n = 400, did the bias correction reduce both the bias and the mean square error of the original estimator. Table 4 gives the failure rates. For n = 150, we had convergence problems 94 times, thus needing 10,094 trials. For n = 250, we only had five samples with convergence problems. For n = 400, all estimations were successful.

Table 4. Failure rates for the simulations presented in table 3.

n      Rate
150    0.00931
250    0.000500
400    0

Table 5. Estimates for regression model with reparameterization.

n      β = 1           φ = 1           ν = 4
150    1.037 (0.335)   1.019 (0.217)   5.318 (5.343)
       0.992 (0.307)   1.003 (0.209)   3.424 (0.755)
250    1.021 (0.230)   1.010 (0.165)   4.615 (2.381)
       1.000 (0.224)   1.000 (0.162)   3.813 (0.893)
400    1.014 (0.196)   1.006 (0.128)   4.303 (1.188)
       0.999 (0.192)   1.000 (0.126)   3.928 (0.823)

As the corrected estimator for ν can display poor performance, we have tried a different approach. Supposing we can assume that the true ν is greater than 1 (this seems to be a reasonable assumption, since it guarantees that the mean exists), we defined a new parameter λ as λ = log(ν − 1) and tried to maximize the log-likelihood function in terms of λ. As the function λ(ν) = log(ν − 1) has derivatives λ'(ν) = (ν − 1)−1 and λ''(ν) = −(ν − 1)−2, it is easy to see that the second-order bias of λ̂ can be written as

B(\hat\lambda) = \frac{B(\hat\nu)}{\nu - 1} - \frac{\kappa^{\nu,\nu}}{2(\nu-1)^2}, \qquad (14)

since κ^{ν,ν} represents the asymptotic variance of ν̂, of order n−1. Our proposal is, then, to estimate ν using the alternative estimator ν* = 1 + exp(λ̃), where λ̃ = λ̂ − B̂(λ̂) is the bias corrected estimator for λ, with B̂(λ̂) being obtained from equation (14). Table 5 shows the results for the regression model (first situation). We observe now that, even for n = 150, the correction becomes quite effective in reducing the bias of the original maximum likelihood estimation. Not only do we observe a bias reduced estimator, but the mean square error also becomes considerably smaller for the corrected estimator of ν. Table 6 gives the failure rates for this simulation. For n = 150, we had to discard 85 samples, thus needing 10,085 trials, while, for n = 250, we only had four samples with convergence problems. No convergence problems were observed for n = 400. Finally, we have observed the behavior of the alternative estimator ν* for the second situation, with i.i.d. observations. Table 7 shows the results of the simulations. Again, it is clear that the corrected estimator ν* has better performance than the original MLE, both in terms of bias and mean square error, in particular for smaller sample sizes. Table 8 shows the corresponding failure rates.

Table 6. Failure rates for the simulations presented in table 5.

n      Rate
150    0.00843
250    0.000400
400    0
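A minimal sketch (ours) of the reparameterized correction described above; nu_star is a hypothetical name, nu_bias is the equation (13) transcription from section 4, and kappa_nu_nu is the (ν, ν) entry of K_δ−1, e.g. from the delta_information sketch.

```python
import numpy as np

def nu_star(nu_hat, n, p, kappa_nu_nu):
    """nu* = 1 + exp(lambda_tilde), with lambda_tilde = lambda_hat - B(lambda_hat)."""
    lam_hat = np.log(nu_hat - 1.0)
    B_lam = nu_bias(n, p, nu_hat) / (nu_hat - 1.0) \
            - kappa_nu_nu / (2.0 * (nu_hat - 1.0) ** 2)    # equation (14)
    return 1.0 + np.exp(lam_hat - B_lam)
```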

Table 7. Estimates for the i.i.d. observations with reparameterization.

n      φ = 1           ν = 4
150    1.019 (0.219)   5.420 (6.083)
       1.003 (0.211)   3.421 (0.760)
250    1.009 (0.165)   4.639 (2.700)
       1.000 (0.161)   3.804 (0.885)
400    1.006 (0.129)   4.306 (1.166)
       1.000 (0.127)   3.932 (0.827)

Table 8. Failure rates for the simulations presented in table 7.

n      Rate
150    0.00872
250    0.000600
400    0

We discarded eighty-eight samples for n = 150, only six for n = 250 and none for n = 400.

6. Conclusions

We have considered a nonlinear regression model for which the observations are Student t distributed with precision parameter φ and ν degrees of freedom, both unknown. For this model, we obtained the second-order biases of the maximum likelihood estimators of the parameters. The maximum likelihood estimators of the parameters in the regression equation have second-order biases whose expressions, as we have seen, are the same for known or unknown ν. The second-order biases of the maximum likelihood estimators of φ and ν have qualitatively different behavior. The relative bias B(φ̂)/φ is, for each fixed sample size, a decreasing function of ν and an increasing function of p, the dimension of the β parameter. The second-order bias B(ν̂), on the other hand, is an increasing convex function of ν. These behaviors are somewhat expected, since the second-order bias is directly related to Fisher's information. As the second-order bias B(ν̂) can become very large, even for sample sizes of around 100 observations, the classical bias corrected estimator of ν can produce very poor estimation, as the simulation results confirm. This corrected estimator can even assume negative values. A proposal to solve this difficulty is to consider an alternative bias corrected estimator, ν* = 1 + exp(λ̃), where λ̃ = λ̂ − B̂(λ̂) is the bias corrected estimator for λ = log(ν − 1). The simulation results have shown that this alternative estimator has very good performance in reducing both the bias and the mean square error of the original maximum likelihood estimator for ν.


Another feature detected in our simulations is that there can be convergence problems when maximizing the log-likelihood function, due to the numerical instability of this function. For unconstrained optimization, the limit obtained in each maximization must correspond to a local maximum, a point of null gradient, since we cannot guarantee global maximization, as pointed out by Fernandez and Steel [20]. As expected, the intensity of this problem decreases as n, the sample size, increases. For n = 150, we expect to have convergence problems around 0.9% of the time. For n = 250, those problems occurred five or six times in 10,000 replications. For n = 400, the numerical instability seems to disappear.

Acknowledgement

We gratefully acknowledge partial financial support from CNPq.

References

[1] Box, M., 1971, Bias in nonlinear estimation (with discussion). Journal of the Royal Statistical Society B, 33, 171–201.
[2] Pike, M., Hill, A. and Smith, P., 1980, Bias and efficiency in logistic analysis of stratified case-control studies. International Journal of Epidemiology, 9, 89–95.
[3] Cook, R., Tsai, C. and Wei, B., 1986, Bias in nonlinear regression. Biometrika, 73, 615–623.
[4] Young, D. and Bakir, S., 1987, Bias correction for a generalized log-gamma regression model. Technometrics, 29, 183–191.
[5] Cordeiro, G.M. and McCullagh, P., 1991, Bias correction in generalized linear models. Journal of the Royal Statistical Society B, 53, 629–643.
[6] Firth, D., 1993, Bias reduction of maximum likelihood estimates. Biometrika, 80, 27–38.
[7] Cordeiro, G.M. and Vasconcellos, K.L.P., 1997, Bias correction for a class of multivariate nonlinear regression models. Statistics and Probability Letters, 35, 155–164.
[8] Cordeiro, G.M. and Vasconcellos, K.L.P., 1999, Second-order biases of the maximum likelihood estimates in von Mises regression models. Australian and New Zealand Journal of Statistics, 41, 901–910.
[9] Cribari-Neto, F. and Vasconcellos, K.L.P., 2002, Nearly unbiased maximum likelihood estimation for the beta distribution. Journal of Statistical Computation and Simulation, 72, 107–118.
[10] Jeffreys, H., 1939, Theory of Probability (Oxford: Clarendon Press).
[11] Zellner, A., 1976, Bayesian and non-Bayesian analysis of the regression model with multivariate Student-t error terms. Journal of the American Statistical Association, 71, 400–405.
[12] West, M., 1984, Outlier models and prior distributions in Bayesian linear regression. Journal of the Royal Statistical Society B, 46, 431–439.
[13] Sutradhar, B. and Ali, M., 1986, Estimation of parameters of a regression model with a multivariate t error variable. Communications in Statistics – Theory and Methods, 15, 429–450.
[14] Lange, K., Little, R. and Taylor, J., 1989, Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84, 881–896.
[15] Ferrari, S. and Arellano-Valle, R., 1996, Modified likelihood ratio and score tests in linear regression models using the t distribution. Brazilian Journal of Probability and Statistics, 10, 15–33.
[16] Cordeiro, G.M., Vasconcellos, K.L.P. and Santos, M.L.F., 1998, On the second-order bias of parameter estimates in nonlinear regression models with Student t errors. Journal of Statistical Computation and Simulation, 60, 363–378.
[17] Vasconcellos, K.L.P. and Cordeiro, G.M., 2000, Bias corrected estimates in multivariate Student t regression models. Communications in Statistics – Theory and Methods, 29, 797–822.
[18] Vasconcellos, K.L.P., Cordeiro, G.M. and Barroso, L.P., 2000, Improved estimation for robust econometric regression models. Brazilian Journal of Probability and Statistics, 14, 141–157.
[19] Cox, D. and Snell, E., 1968, A general definition of residuals. Journal of the Royal Statistical Society B, 30, 248–275.
[20] Fernandez, C. and Steel, M.F.J., 1999, Multivariate Student t regression models: pitfalls and inference. Biometrika, 86, 153–167.
[21] Cox, D. and Hinkley, D., 1974, Theoretical Statistics (London: Chapman and Hall).
[22] Lawley, D., 1956, A general method for approximating to the distribution of likelihood ratio criteria. Biometrika, 43, 295–303.
[23] Doornik, J.A., 2001, Ox: an Object-oriented Matrix Programming Language (4th edn) (London: Timberlake Consultants). http://www.nuff.ox.ac.uk/Users/Doornik.
[24] Fletcher, R., 1987, Practical Methods of Optimization (New York: John Wiley and Sons).
[25] Zellner, A., 1971, An Introduction to Bayesian Inference in Econometrics (New York: John Wiley and Sons).
[26] Cox, D. and Reid, N., 1987, Parameter orthogonality and approximate conditional inference (with discussion). Journal of the Royal Statistical Society B, 49, 1–39.
[27] Wolfram, S., 1991, Mathematica: a System for Doing Mathematics by Computer (Massachusetts: Addison-Wesley).
[28] Abell, M.L. and Braselton, J.P., 1994, The Maple V Handbook (New York: AP Professional).
