Maximum Likelihood Based Approximations of the Pitman Estimator in a Location Family

SPIRIDON PENEV
Department of Statistics
The University of New South Wales
2052 Sydney
AUSTRALIA
[email protected]  http://www.maths.unsw.edu.au/~spiro/

Abstract: - Two different types of Maximum Likelihood based approximations of the Pitman Estimator in a location family are discussed. Both approximations rely on asymptotic Laplace expansions for integrals. In the first approximation we try to find an M-estimator with a ψ-function that is better than minus the loglikelihood derivative of the density. In the second approximation we directly improve the Maximum Likelihood Estimator towards the Pitman Estimator by adding an O_p(1/n) correction term. This approximation can also be applied to multivariate parameters. The effect of the improvement can be felt at small sample sizes, when the Laplace approximation is usually better than the normal-based asymptotics.

Key-Words: - Saddlepoint; M-estimator; Pitman Estimator; Asymptotic Laplace Expansion; Maximum Likelihood; Location Family.
1 Introduction
Let us consider the classical problem of estimating a location parameter ϑ ∈ R^m in a family of distributions F_ϑ(x) with a Lebesgue density f_ϑ(x) = f(x − ϑ), based on n i.i.d. observations X_1, X_2, ..., X_n ∈ R^m. The best translation invariant estimator of ϑ with respect to quadratic loss is the Pitman Estimator (PE). It is defined as

\tilde{\vartheta}_n = \frac{\int_{R^m} t \prod_{i=1}^n f(x_i - t)\, dt}{\int_{R^m} \prod_{i=1}^n f(x_i - t)\, dt}   (1)

where t = (t_1, t_2, ..., t_m) and dt is a shorthand notation for dt_1 dt_2 ... dt_m. This estimator is outside the representative class of translation invariant estimators {ϑ_{n,ψ}}, ψ ∈ Ψ, each of which is defined as the solution of an m-dimensional M-equation

\sum_{i=1}^n \psi(X_i - \vartheta_{n,\psi}) = 0   (2)

for a suitable function ψ(·): R^m → R^m which satisfies the condition

\int \psi(x) f(x)\, dx = 0.   (3)

Estimators defined by (2) are easier to compute, especially when the dimension m ≥ 2, because one replaces the multidimensional integration to be performed in (1) by an iterative equation-solving procedure. In addition, it should be mentioned that while for quadratic loss the PE can be explicitly defined via (1), there is no explicit formula for its calculation under other types of symmetric subconvex losses of interest in Statistical Inference. There is also a theoretical and methodological interest in considering the class of translation invariant M-estimators and trying to find, or to approximate, the best one in this class. The Maximum Likelihood Estimator (MLE) ϑ̂_n is a special M-estimator which can be obtained by choosing ψ(x) = −f′(x)/f(x). It is asymptotically efficient but it is usually dominated by the PE for any finite sample size n (an exception is the case of a normal location family, where both estimators coincide). The estimators ϑ̃_n and ϑ̂_n are different in general, and it is interesting to find another translation invariant M-estimator which is closer to the PE than the MLE. We will ask for the best (with respect to mean-squared error) estimator in the class of translation invariant M-estimators. It is difficult to find it for a fixed sample size n. Recent simulations confirm, though, that saddlepoint approximations of the density of certain statistics, although asymptotic in nature, work remarkably well even down to sample sizes such as 3 or 4. The idea is therefore to replace the true density of a given M-estimator by its saddlepoint-approximated version and to substitute this approximation into the formula for computing the mean-squared error. One gets a very accurate estimator of the mean-squared error of the corresponding M-estimator and can try to find a suitable ψ-function which minimizes this approximation. It turns out that this (quasi)optimal ψ-function can be obtained by imposing a small correction on the function −f′(x)/f(x). Regrettably, the approach we apply to find the solution can hardly be implemented if m > 1. Therefore we develop a second approach for approximating the PE which can also be applied in the case m > 1. It is based on Laplace expansions of the two integrals in (1). The resulting approximation is of the form MLE + MLE-based correction + O_p(1/n^2). It should be noted that the corrected estimator we get is translation invariant. This is an important fact: earlier proposals of shrinking MLE-based estimators with small mean-squared error exist, but none of these estimators possesses the translation invariance property.
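As a concrete illustration (our addition, not part of the original text), the following sketch computes the PE (1) and the MLE numerically for a one-dimensional sample from the logistic location family used later in Section 4. The grid width, the sample and the root bracket are pragmatic assumptions.

```python
# A minimal sketch: Pitman estimator (1) by quadrature vs. the MLE, for the
# logistic density f(x) = exp(x)/(1+exp(x))^2, where -f'/f = tanh(x/2).
import numpy as np
from scipy.optimize import brentq

def f(x):
    return np.exp(x) / (1.0 + np.exp(x)) ** 2

def pitman(x, width=20.0, num=4001):
    """Ratio of int t * prod f(x_i - t) dt to int prod f(x_i - t) dt, see (1)."""
    t = np.linspace(x.mean() - width, x.mean() + width, num)
    loglik = np.sum(np.log(f(x[:, None] - t[None, :])), axis=0)
    w = np.exp(loglik - loglik.max())          # stabilised likelihood on the grid
    return np.trapz(t * w, t) / np.trapz(w, t)

def mle(x):
    """Solve sum psi0(x_i - theta) = 0 with psi0(x) = tanh(x/2)."""
    score = lambda th: np.sum(np.tanh((x - th) / 2.0))
    return brentq(score, x.min() - 1.0, x.max() + 1.0)

rng = np.random.default_rng(0)
x = rng.logistic(size=5)
print(pitman(x), mle(x))
```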
2 Realization of the first approach

Let ϑ_0 ∈ R^1 be the "true" parameter and p_n^{ϑ_0}(·) denote the density of the estimator ϑ_{n,ψ} which is defined by (2), (3). Note that because of the translation invariance the mean squared error of ϑ_{n,ψ} satisfies the relation

\int_{-\infty}^{+\infty} (t - \vartheta_0)^2\, p_n^{\vartheta_0}(t)\, dt = \int_{-\infty}^{+\infty} t^2\, \tilde{p}_n(t)\, dt

where p̃_n(t) = p_n^0(t) does not depend on ϑ_0. We want to minimize \int_{-\infty}^{+\infty} t^2 \tilde{p}_n(t)\, dt over the set of functions p̃_n resulting from the ψ-functions satisfying (3). Substituting for p̃_n its saddlepoint approximation, we get, along the lines of the derivations in [2], the following minimization problem:

\min_\psi \frac{\int t^2 B_t \sigma_t^{-1} c_t^{-n}\, dt}{\int B_t \sigma_t^{-1} c_t^{-n}\, dt}   (4)

under the side condition (3), where h_t(x) = \exp(\alpha(t)\psi(x - t)) f(x) and the function α(t) is the solution of the implicit equation

\int \psi(x - t)\, h_t(x)\, dx = 0.

Moreover:

c_t = \Big[\int \exp(\alpha(t)\psi(x - t)) f(x)\, dx\Big]^{-1}; \qquad \sigma_t^2 = c_t \int \psi^2(x - t)\, h_t(x)\, dx

and B_t = c_t \int \psi'(x - t)\, h_t(x)\, dx. Here and further on we skip the integration limits if the integration is over the real line. Note that in (4) we have substituted the renormalized version of the density approximation. As noted in [2], the renormalization improves the order of the approximation. The minimization problem (4) is of the form

\min_\psi \frac{\int q(t) \exp[-(n - \frac{1}{2})p(t)]\, dt}{\int \tilde{q}(t) \exp[-(n - \frac{1}{2})p(t)]\, dt}   (5)

under the side condition (3), where

p(t) = \ln\Big\{\Big[\int h_t(x)\, dx\Big]^{-1}\Big\}, \quad q(t) = \frac{t^2 \int \psi'(x-t) h_t(x)\, dx}{[\int \psi^2(x-t) h_t(x)\, dx]^{1/2}}, \quad \tilde{q}(t) = \frac{\int \psi'(x-t) h_t(x)\, dx}{[\int \psi^2(x-t) h_t(x)\, dx]^{1/2}}.

Some more smoothness assumptions on the functions ψ and f will be required in order to permit an elementary treatment. These are:

A1) ψ and f are 3 times differentiable on R and f(x) > 0 on R.

A2) The equation (2) has a unique solution and also the equation \int \psi(x - t) \exp(\alpha(t)\psi(x - t)) f(x)\, dx = 0 has a unique solution α(t).

A3) \int \psi'(x - t) h_t(x)\, dx > 0 for all t.

A4) 0 < \int \psi^2(x - t) h_t(x)\, dx < \infty and is continuous for all t.

A5) The function K(\alpha) = \ln \int \exp(\alpha \psi(x)) f(x)\, dx exists for all α in a set containing the origin.

A6) The integrals \int (\psi'(x-t))^2 \psi(x-t) h_t(x) dx, \int \psi^2(x-t)\psi'(x-t) h_t(x) dx, \int \psi^3(x-t) h_t(x) dx, \int (\psi'(x-t))^2 h_t(x) dx, \int \psi''(x-t)\psi(x-t) h_t(x) dx, \int \psi'''(x-t) h_t(x) dx, \int \psi''(x-t) h_t(x) dx and \int \psi^4(x-t) h_t(x) dx are finite and continuous in t.

Some more conditions will be needed when deriving the Euler Equation below.
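The definitions above translate directly into code. The following sketch (our illustration, not from the paper) computes α(t), c_t, σ_t² and B_t for a given ψ and f by quadrature; the integration bounds and the root bracket for α(t) are assumptions that suit light-tailed densities such as the logistic.

```python
# A sketch of the saddlepoint objects entering (4), under assumptions A2)-A4).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def saddlepoint_objects(psi, dpsi, f, t, lim=40.0):
    # alpha(t) solves the implicit equation  int psi(x-t) h_t(x) dx = 0,
    # with h_t(x) = exp(alpha * psi(x-t)) * f(x).
    def m(alpha):
        return quad(lambda x: psi(x - t) * np.exp(alpha * psi(x - t)) * f(x),
                    -lim, lim)[0]
    alpha = brentq(m, -5.0, 5.0)                  # bracket is an assumption
    h = lambda x: np.exp(alpha * psi(x - t)) * f(x)
    c = 1.0 / quad(h, -lim, lim)[0]                                     # c_t
    sigma2 = c * quad(lambda x: psi(x - t) ** 2 * h(x), -lim, lim)[0]   # sigma_t^2
    B = c * quad(lambda x: dpsi(x - t) * h(x), -lim, lim)[0]            # B_t
    return alpha, c, sigma2, B

f = lambda x: np.exp(x) / (1.0 + np.exp(x)) ** 2   # logistic density
psi = lambda x: np.tanh(x / 2.0)                   # psi0 = -f'/f
dpsi = lambda x: 0.5 / np.cosh(x / 2.0) ** 2
print(saddlepoint_objects(psi, dpsi, f, t=0.3))
```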
It is difficult to solve the minimization problem (5) directly. But a close look at its structure suggests the idea of using the Laplace method to approximate the values of the numerator and of the denominator for large sample size n. Using only the leading term in the Laplace approximation gives the solution ψ(x) = −f′(x)/f(x), i.e. the MLE. To get the next non-zero correction term, we need the next terms in the asymptotic expansion of the Laplace-type integrals in the numerator and in the denominator. A standard reference for this type of asymptotic expansion is [5], Chapter 3.8. Before proceeding further, we introduce some shorthand notation: \int \psi^2(x - t) h_t(x)\, dx = \int \psi^2 h_t etc.

Using the definition of the function α(t), we see that the stationary point t̄ of the function p(t) must satisfy \alpha(\bar t) \int \psi'(x - \bar t) h_{\bar t}(x)\, dx = 0. This combined with condition A3) tells us that α(t̄) = 0. Hence h_{t̄}(x) = f(x) and \int \psi(x - \bar t) f(x)\, dx = 0. At the same time the relation \int \psi(x) f(x)\, dx = 0 holds. This together with condition A2) leads to the conclusion that t̄ = t_0 = 0 is the only stationary point of the function p(t). In this case h_{t̄}(x) = f(x) holds, and we denote \int \psi'(x - t_0) h_{t_0}(x) dx = \int \psi' f, \int \psi^2(x - t_0) h_{t_0}(x) dx = \int \psi^2 f etc. We will also write α_t for α(t), α′_t for α′(t) etc. By differentiating p′(t) once again it can be shown (see (9) below) that p''(\bar t) = (\int \psi' f)^2 / \int \psi^2 f > 0, i.e. t̄ = t_0 = 0 is the minimal point of p(t). This point is an interior point of the integration path. Theorem 8.1 in [5], Ch. III gives the asymptotic Laplace expansion for the case where t_0 is the starting point of the integration path. But because of the relation g^{(k)}(-t)|_{t=0} = (-1)^k g^{(k)}(t)|_{t=0} for the k-th derivatives at zero, it is easy to see that we can use the results of this theorem also for parabolic-like extremes in the interior of the integration path, by simply multiplying half of the coefficients appearing in the theorem by 2 and setting the other half equal to zero. For the sake of completeness we give the form of the asymptotic expansion for x → ∞ as given in Theorem 8.1 of [5]:

\int_a^b e^{-x p(t)} q(t)\, dt \sim e^{-x p(a)} \sum_{s=0}^{\infty} \Gamma\Big(\frac{s+\lambda}{\mu}\Big) \frac{a_s}{x^{(s+\lambda)/\mu}}.   (6)

Here it is assumed that a is the only minimum of the function p(t) in [a, b) (b > a), that p′(t) and q(t) are continuous in a neighbourhood of a, and that the integral is absolutely convergent in (a, b) for sufficiently large values of x. It is assumed that the asymptotic expansions p(t) \sim p(a) + \sum_{s=0}^{\infty} p_s (t - a)^{s+\mu} and q(t) \sim \sum_{s=0}^{\infty} q_s (t - a)^{s+\lambda-1} hold for t → a+ with μ > 0, λ > 0 and p_0 ≠ 0, q_0 ≠ 0. Moreover, the asymptotic expansion (for t → a+)

p'(t) \sim \sum_{s=0}^{\infty} (s + \mu)\, p_s (t - a)^{s+\mu-1}

is assumed to hold. The first coefficients a_0, a_1 and a_2 in the expansion (6) are as follows:

a_0 = \frac{q_0}{\mu\, p_0^{\lambda/\mu}}, \qquad a_1 = \Big[\frac{q_1}{\mu} - \frac{(\lambda+1) p_1 q_0}{\mu^2 p_0}\Big] \frac{1}{p_0^{(\lambda+1)/\mu}},

a_2 = \Big\{\frac{q_2}{\mu} - \frac{(\lambda+2) p_1 q_1}{\mu^2 p_0} + \big[(\lambda+\mu+2) p_1^2 - 2\mu p_0 p_2\big] \frac{(\lambda+2) q_0}{2\mu^3 p_0^2}\Big\} \frac{1}{p_0^{(\lambda+2)/\mu}}.

Let us start with the approximation of the numerator in (5). In this case x = n − 1/2, μ = 2 and λ = 3, the minimal point of p(t) is t̄ = 0, and for \int_0^{\infty} q(t) \exp[-(n - \frac{1}{2})p(t)]\, dt one takes p_0 = p″(0)/2, p_1 = p‴(0)/6, p_2 = p^{(4)}(0)/24, q_0 = q″(0)/2, q_1 = q‴(0)/6, q_2 = q^{(4)}(0)/24. When integrating the same integrand from −∞ to 0 we can use the variable transformation t̃ = −t and then apply the same formulae for a_0, a_1, a_2, but this time with p_0 = p″(0)/2, p_1 = −p‴(0)/6, p_2 = p^{(4)}(0)/24, q_0 = q″(0)/2, q_1 = −q‴(0)/6, q_2 = q^{(4)}(0)/24. Hence, when approximating the integral along the whole real line by truncating the series at s = 2, the result will be

\int_{-\infty}^{+\infty} q(t) \exp[-(n - \tfrac{1}{2}) p(t)]\, dt = 2\Gamma(1.5)\, \frac{a_0}{(n - \frac{1}{2})^{3/2}} + 2\Gamma(2.5)\, \frac{a_2}{(n - \frac{1}{2})^{5/2}} + O(1/(n - \tfrac{1}{2})^{7/2}).   (7)
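The coefficient formulae and the two-sided combination used in (7) are easy to code; the sketch below (our addition, with our own function names) does so verbatim. The odd coefficient a_1 cancels between the two half-lines, while a_0 and a_2 double, as described above.

```python
# Coefficients a0, a1, a2 as quoted from Theorem 8.1 of [5], and the two-sided
# combination of (7) for an interior parabolic minimum; purely arithmetic.
from math import gamma

def laplace_coeffs(lam, mu, p0, p1, p2, q0, q1, q2):
    a0 = q0 / (mu * p0 ** (lam / mu))
    a1 = (q1 / mu - (lam + 1) * p1 * q0 / (mu ** 2 * p0)) / p0 ** ((lam + 1) / mu)
    a2 = (q2 / mu - (lam + 2) * p1 * q1 / (mu ** 2 * p0)
          + ((lam + mu + 2) * p1 ** 2 - 2 * mu * p0 * p2)
          * (lam + 2) * q0 / (2 * mu ** 3 * p0 ** 2)) / p0 ** ((lam + 2) / mu)
    return a0, a1, a2

def numerator_expansion(x, p0, p1, p2, q0, q1, q2):
    """Two-sided version of (6) with lambda = 3, mu = 2, i.e. formula (7):
    a1 cancels between the half-lines (p1, q1 flip sign), a0 and a2 double."""
    a0, _, a2 = laplace_coeffs(3, 2, p0, p1, p2, q0, q1, q2)
    return 2 * gamma(1.5) * a0 / x ** 1.5 + 2 * gamma(2.5) * a2 / x ** 2.5
```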
Here a_0 and a_2 are computed by the above formulae for λ = 3, μ = 2, p_0 = p″(0)/2, p_1 = p‴(0)/6, p_2 = p^{(4)}(0)/24, q_0 = q″(0)/2, q_1 = q‴(0)/6, q_2 = q^{(4)}(0)/24. Evaluating the derivatives of the functions p(t) and q(t) involves heavy algebra, but for our goals the computation is simplified by the fact that we only need their values at t = t_0 = 0 and that α(0) = 0. Let us indicate the computation on an example involving some lower order derivatives. Since α(t) is a solution of the implicit equation \int \psi h_t = 0, using the shorthand notation we have p'(t) = \alpha_t \int \psi' h_t / \int h_t. Hence p′(t̄) = 0 means α(t̄) = 0 and this, as already discussed, implies t̄ = 0 for the stationary point. Further differentiation leads to:

p''(t) = \frac{1}{(\int h_t)^2}\Big\{\Big[\alpha'_t \int \psi' h_t - \alpha_t \int \psi'' h_t + \alpha_t \alpha'_t \int \psi\psi' h_t - \alpha_t \int (\psi')^2 h_t\Big] \int h_t + \Big(\alpha_t \int \psi' h_t\Big)^2\Big\}.

Hence p''(0) = \alpha'(0) \int \psi' f. To get α′(0), we differentiate the implicit equation \int \psi h_t = 0:

-\int \psi' h_t + \alpha'_t \int \psi^2 h_t - \alpha_t \int \psi\psi' h_t = 0.   (8)

Using α(0) = 0 we get \alpha'(0) = \int \psi' f / \int \psi^2 f, which can be substituted to get

p''(0) = \frac{(\int \psi' f)^2}{\int \psi^2 f}.   (9)

One can continue along the same lines. One can differentiate p″(t) once again; on the right hand side, however, α″(t) appears. By differentiating the implicit equation (8) once again, we can express α″(t) by using f, ψ and their derivatives only. Then the result can be substituted where necessary to obtain a formula for p‴(t), etc. We give the final results for the derivatives of p at t = 0:

p'''(0) = \frac{6D^2\sigma^2 \int \psi\psi' f - 3D\sigma^4 \int \psi'' f - D^3 \int \psi^3 f}{\sigma^6}
p^{(4)}(0) = \alpha''' D - 3\alpha'' \int \psi'' f + 6\alpha'\alpha'' \int \psi\psi' f + 3\alpha' \int \psi''' f - 6(\alpha')^2 \int \psi\psi'' f - 6(\alpha')^2 \int (\psi')^2 f + 3(\alpha')^3 \int \psi^2\psi' f + 3(\alpha')^2 D^2

where

\sigma^2 = \int \psi^2 f, \qquad D = \int \psi' f, \qquad \alpha' = \alpha'(0) = \frac{D}{\sigma^2},

\alpha'' = \Big(4D\sigma^2 \int \psi\psi' f - \sigma^4 \int \psi'' f - D^2 \int \psi^3 f\Big)\Big/\sigma^6,

\alpha''' = -\frac{1}{\sigma^2}\Big[-\int \psi''' f + 6\alpha'' \int \psi\psi' f + 6\alpha' \int \psi''\psi f - 6\alpha' \int (\psi')^2 f - 9(\alpha')^2 \int \psi^2\psi' f + 3\alpha'\alpha'' \int \psi^3 f + (\alpha')^3 \int \psi^4 f\Big].
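The identity (9) offers a convenient numerical sanity check on the quantities just defined. The sketch below (our addition) computes p(t) = ln{[∫h_t dx]^{-1}} by quadrature for ψ_0 = −f′/f of the logistic density and compares a finite-difference p″(0) with D²/σ² = 1/3; the step size, bounds and root bracket are assumptions.

```python
# Numerical check of (9): p''(0) should equal D^2/sigma^2 = (int psi'f)^2/int psi^2 f.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

f = lambda x: np.exp(x) / (1.0 + np.exp(x)) ** 2
psi = lambda x: np.tanh(x / 2.0)
LIM = 40.0

def p(t):
    # p(t) = -ln int h_t(x) dx, with alpha(t) from the implicit equation
    m = lambda a: quad(lambda x: psi(x - t) * np.exp(a * psi(x - t)) * f(x),
                       -LIM, LIM)[0]
    a = brentq(m, -5.0, 5.0)
    return -np.log(quad(lambda x: np.exp(a * psi(x - t)) * f(x), -LIM, LIM)[0])

h = 1e-2
p2 = (p(h) - 2 * p(0.0) + p(-h)) / h ** 2                              # FD p''(0)
D = quad(lambda x: 0.5 / np.cosh(x / 2.0) ** 2 * f(x), -LIM, LIM)[0]   # int psi' f
sigma2 = quad(lambda x: psi(x) ** 2 * f(x), -LIM, LIM)[0]              # int psi^2 f
print(p2, D ** 2 / sigma2)   # the two numbers should agree to several decimals
```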
We proceed in a similar manner when finding the derivatives of the function q(t) and their numerical values at t = 0. Substituting in (7) we get finally:

\int_{-\infty}^{+\infty} q(t) \exp[-(n - \tfrac{1}{2})p(t)]\, dt = \frac{\sqrt{2\pi}\,\sigma^2}{D^2 (n - \frac{1}{2})^{3/2}} + \frac{\sqrt{2\pi}}{4D^6\sigma^4 (n - \frac{1}{2})^{5/2}} \Big\{ -4D\sigma^8 \int \psi''' f + 12D^2\sigma^6 \int (\psi')^2 f + 12D^2\sigma^6 \int \psi''\psi f + 24D^2\sigma^4 \Big(\int \psi\psi' f\Big)^2 - 48D\sigma^6 \int \psi'' f \int \psi\psi' f + 4D^3\sigma^4 \int \psi^2\psi' f - 6D^3\sigma^2 \int \psi^3 f \int \psi\psi' f - 2D^4\sigma^2 \int \psi'' f \int \psi^3 f + 15\sigma^8 \Big(\int \psi'' f\Big)^2 - 0.5\, D^2\sigma^4 \int \psi^4 f + \frac{5}{6} D^4 \Big(\int \psi^3 f\Big)^2 - \frac{15}{2} D^4\sigma^6 \Big\} + O(1/(n - \tfrac{1}{2})^{7/2}).   (10)
When approximating the denominator, one has x = n − 1/2, λ = 1, μ = 2, p_0 = p″(0)/2, p_1 = p‴(0)/6, p_2 = p^{(4)}(0)/24, q_0 = q̃(0), q_1 = q̃′(0), q_2 = q̃″(0)/2, and one obtains

\int_{-\infty}^{+\infty} \tilde{q}(t) \exp[-(n - \tfrac{1}{2})p(t)]\, dt = 2\Gamma(1/2)\, \frac{a_0}{(n - \frac{1}{2})^{1/2}} + 2\Gamma(3/2)\, \frac{a_2}{(n - \frac{1}{2})^{3/2}} + O(1/(n - \tfrac{1}{2})^{5/2})   (11)

where again a_0 and a_2 are computed by the same formulae using the new values of λ, q_0, q_1 and q_2. We will only give the final result here:

\int_{-\infty}^{+\infty} \tilde{q}(t) \exp[-(n - \tfrac{1}{2})p(t)]\, dt = \frac{\sqrt{2\pi}}{(n - \frac{1}{2})^{1/2}} + \frac{\sqrt{2\pi}}{24} \Big\{ -\frac{12 \int \psi^3 f \int \psi\psi' f}{D\sigma^4} + \frac{5(\int \psi^3 f)^2}{\sigma^6} - \frac{3 \int \psi^4 f}{\sigma^4} + \frac{12 \int \psi^2\psi' f}{D\sigma^2} - 9 \Big\} \frac{1}{(n - \frac{1}{2})^{3/2}} + O(1/(n - \tfrac{1}{2})^{5/2}).   (12)

In order to derive an approximation for the quotient of the left hand side expressions in (10) and (12), we have to evaluate an expression of the form

\frac{1}{n - \frac{1}{2}} \cdot \frac{\sigma^2/D^2 + a^*/(n - \frac{1}{2}) + O(1/n^2)}{1 + a/(n - \frac{1}{2}) + O(1/n^2)}.

This equals

\frac{\int q(t) \exp[-(n - \frac{1}{2})p(t)]\, dt}{\int \tilde{q}(t) \exp[-(n - \frac{1}{2})p(t)]\, dt} \approx \frac{1}{n - \frac{1}{2}} \cdot \frac{\sigma^2}{D^2} + \frac{1}{(n - \frac{1}{2})^2}\Big(a^* - \frac{\sigma^2}{D^2}\, a\Big) + O(1/n^3).   (13)

Taking into consideration only the first summand \frac{1}{n - 1/2} \cdot \frac{\sigma^2}{D^2} to approximate the left hand side, the solution of the minimization problem (5) will be ψ(x) = −f′(x)/f(x), that is, the MLE (see [3], Chapter 2). To get a more accurate approximation which takes the sample size n into account, we include the second summand

\frac{1}{(n - \frac{1}{2})^2}\Big(a^* - \frac{\sigma^2}{D^2}\, a\Big)   (14)

and try to find the ψ that minimizes the right hand side. The latter is

J(\psi) = \frac{1}{n - \frac{1}{2}} \Big\{ \frac{\sigma^2}{D^2} + \frac{1}{n - \frac{1}{2}} \Big[ -\frac{\sigma^4 \int \psi''' f}{D^5} + \frac{3\sigma^2 \int (\psi')^2 f}{D^4} + \frac{6(\int \psi\psi' f)^2}{D^4} + \frac{3\sigma^2 \int \psi''\psi f}{D^4} - \frac{12\sigma^2 \int \psi'' f \int \psi\psi' f}{D^5} + \frac{2 \int \psi^2\psi' f}{D^3} - \frac{\int \psi'' f \int \psi^3 f}{D^4} + \frac{15\sigma^4 (\int \psi'' f)^2}{4D^6} - \frac{3\sigma^2}{2D^2} \Big] \Big\}.   (15)
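Before turning to the variational treatment, note that the exact ratio in (4) can also be evaluated by brute-force quadrature, which gives a numerical benchmark for the expansion (15). The sketch below (our addition) does this for the logistic density and ψ_0; the grids, bounds, root bracket and the small n are assumptions.

```python
# Direct numerical evaluation of the ratio in (4): the saddlepoint
# approximation of the MSE of the M-estimator defined by psi.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

f = lambda x: np.exp(x) / (1.0 + np.exp(x)) ** 2   # logistic density
LIM, N = 40.0, 5

def weight(psi, dpsi, t):
    """B_t * sigma_t^{-1} * c_t^{-n} at the point t."""
    m = lambda a: quad(lambda x: psi(x - t) * np.exp(a * psi(x - t)) * f(x),
                       -LIM, LIM)[0]
    a = brentq(m, -5.0, 5.0)
    h = lambda x: np.exp(a * psi(x - t)) * f(x)
    c = 1.0 / quad(h, -LIM, LIM)[0]
    sigma2 = c * quad(lambda x: psi(x - t) ** 2 * h(x), -LIM, LIM)[0]
    B = c * quad(lambda x: dpsi(x - t) * h(x), -LIM, LIM)[0]
    return B / np.sqrt(sigma2) * c ** (-N)

def mse_sp(psi, dpsi, ts=np.linspace(-3.0, 3.0, 121)):
    w = np.array([weight(psi, dpsi, t) for t in ts])
    return np.trapz(ts ** 2 * w, ts) / np.trapz(w, ts)

psi0 = lambda x: np.tanh(x / 2.0)
dpsi0 = lambda x: 0.5 / np.cosh(x / 2.0) ** 2
print(mse_sp(psi0, dpsi0))   # MSE approximation for the MLE's psi-function
```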
To find the ψ that gives the minimum of the functional J(ψ) we could apply a variational technique. We consider the value of J(ψ + tψ̄) for sufficiently small real t and a function ψ̄ which, like ψ, is supposed to satisfy the condition \int \bar\psi f = 0 and conditions A1)−A6). Let us define the scalar product of the functions g_1(x) and g_2(x) as

\langle g_1(x), g_2(x) \rangle = \int_{-\infty}^{+\infty} g_1(s)\, g_2(s)\, f(s)\, ds.

We consider the limit \lim_{t \to 0} \frac{J(\psi + t\bar\psi) - J(\psi)}{t}, trying to represent it as a scalar product

\langle \Phi(x, \psi, \psi', \psi'', \psi'''), \bar\psi(x) \rangle

for a suitable function Φ. Then the necessary condition for the minimum will be Φ(x, ψ, ψ′, ψ″, ψ‴) = 0 (the Euler Equation). To carry out this program we require some more regularity conditions on the tail behaviour of f and ψ̄, so that the following differentiation-under-the-integral and integration-by-parts formulae hold:

\int \bar\psi' f = -\int f' \bar\psi.   (16)

Because of \int \bar\psi' f' = -\int \bar\psi'' f (a consequence of \int \bar\psi f = 0), after differentiating (16) once again one gets

\int \bar\psi'' f = \int \bar\psi f''   (17)

and after one more differentiation:

\int \bar\psi''' f = -\int \bar\psi f'''.   (18)

For example, when computing the limit

\lim_{t \to 0} \frac{1}{t}\Big\{ \frac{[\int (\psi + t\bar\psi)^2 f]^2 \int (\psi''' + t\bar\psi''') f}{(D + t\int \bar\psi' f)^5} - \frac{\sigma^4 \int \psi''' f}{D^5} \Big\}

by using (16) and (18), we conclude that the limit equals

\Big\langle \frac{4\sigma^2 \int \psi''' f}{D^5}\, \psi(x) - \frac{\sigma^4}{D^5}\, \frac{f'''(x)}{f(x)} + \frac{5\sigma^4 \int \psi''' f}{D^6}\, \frac{f'(x)}{f(x)},\ \bar\psi(x) \Big\rangle.

We proceed in a similar way with all terms which appear on the right hand side of (15). Adding the results, we get an expression of the form \frac{1}{n - \frac{1}{2}} \langle \Phi(x), \bar\psi(x) \rangle, from which the Euler Equation can be derived. It is:
\frac{2\psi(x)}{D^2} + \frac{2\sigma^2}{D^3}\,\frac{f'(x)}{f(x)} + \frac{1}{n - \frac{1}{2}} \Big\{ \Big[ -\frac{4\sigma^2 \int \psi''' f}{D^5} - \frac{6 \int \psi'\psi f'}{D^4} - \frac{24 \int \psi'' f \int \psi\psi' f}{D^5} + \frac{15\sigma^2 (\int \psi'' f)^2}{D^6} - \frac{3}{D^2} \Big] \psi(x) + \frac{\sigma^4}{D^5}\,\frac{f'''(x)}{f(x)} - \frac{3\sigma^2}{D^4}\,\frac{f''(x)}{f(x)}\,\psi(x) + \Big[ -\frac{12\sigma^2 \int \psi'' f}{D^5} + \frac{12 \int \psi\psi' f}{D^4} \Big] \frac{f'(x)}{f(x)}\,\psi(x) + \Big[ \frac{6\sigma^2 \int (\psi')^2 f}{D^5} + \frac{60\sigma^2 \int \psi'' f \int \psi\psi' f}{D^6} - \frac{6 \int \psi^2\psi' f}{D^4} - \frac{24 (\int \psi\psi' f)^2}{D^5} + \frac{\int \psi'' f \int \psi^3 f}{D^5} + \frac{45\sigma^4 (\int \psi'' f)^2}{2D^7} - \frac{3\sigma^2}{D^3} \Big] \frac{f'(x)}{f(x)} + \Big[ \frac{\int \psi^3 f}{D^4} + \frac{15\sigma^4 \int \psi'' f}{2D^6} \Big] \frac{f''(x)}{f(x)} + \frac{2\psi^2(x)}{D^3}\,\frac{f'(x)}{f(x)} + \frac{3 \int \psi'' f}{D^4}\,\psi^2(x) \Big\} = 0.   (19)

The above equation is of the form

\psi(x) = -\frac{\sigma^2}{D}\,\frac{f'(x)}{f(x)} - \frac{D^2}{2n - 1}\, \xi_{\psi,f}(x)   (20)

where ξ_{ψ,f}(x) contains some functionals and functions of ψ, f and their derivatives up to third order. It is hardly possible to find the exact solution of this equation because of the complicated form of ξ_{ψ,f}(x). But as we are interested in the approximation up to order O(1/n), we can substitute ψ_0(x) = −f′(x)/f(x) into the right hand side of (20) as an initial guess and get a one-step improvement ψ_1 of ψ_0 by computing the resulting right hand side of (20) for ψ = ψ_0. In this case we also have D = \int \psi_0' f = \int \psi_0^2 f = \sigma^2 = I_f (I_f denoting the Fisher information of the location family f_ϑ(x) = f(x − ϑ)). The final result for ψ_1 is:

\psi_1(x) = -\frac{f'(x)}{f(x)} - \frac{1}{2n - 1}\Big\{ \Big[ \frac{1}{I_f^2} \int \frac{(f'')^2}{f} - \frac{2}{3I_f^2} \int \frac{(f')^4}{f^3} + \frac{7}{8I_f^3} \Big(\int \frac{(f')^3}{f^2}\Big)^2 \Big] \frac{f'(x)}{f(x)} + \frac{3}{2I_f^2} \int \frac{(f')^3}{f^2} \cdot \frac{f''(x)}{f(x)} - \frac{5}{4I_f^2} \int \frac{(f')^3}{f^2} \cdot \frac{(f'(x))^2}{f^2(x)} + \frac{1}{I_f} \Big[ \frac{f'''(x)}{f(x)} - \frac{3f''(x)f'(x)}{f^2(x)} + \frac{2f'(x)^3}{f^3(x)} \Big] \Big\}.   (21)
Note that the improved function ψ_1 satisfies the side condition \int \psi_1 f = 0 only "up to order O(1/n)": in general, the equality \int \psi_1 f = \frac{1}{I_f(2n-1)} \int \frac{(f')^3}{f^2} holds. For large sample size this is not a great shortcoming. There is, however, one very important case when a considerable simplification occurs and ψ_1 satisfies the side condition exactly: the case of a symmetric density f. Then we get from (21):

\psi_1(x) = -\frac{f'(x)}{f(x)} - \frac{1}{2n - 1}\Big\{ \Big[ \frac{1}{I_f^2} \int \frac{(f'')^2}{f} - \frac{2}{3I_f^2} \int \frac{(f')^4}{f^3} \Big] \frac{f'(x)}{f(x)} + \frac{1}{I_f} \Big[ \frac{f'''(x)}{f(x)} - \frac{3f''(x)f'(x)}{f^2(x)} + \frac{2f'(x)^3}{f^3(x)} \Big] \Big\}.   (22)

REMARK 1. The correction terms to ψ_0 in (21) and (22) which do not contain the factor f′(x)/f(x) vanish if f is the normal density. This is in agreement with our expectations, because in this case ϑ̃_n = ϑ̂_n, and ϑ̂_n can be obtained by putting ψ(x) = c·f′(x)/f(x), c ≠ 0 being an arbitrary constant. If f is not the normal density, however, we can expect an improvement of ϑ̂_n towards ϑ̃_n by using ψ_1 as computed in (22). The magnitude of the improvement depends crucially on the behaviour of the function

\Big\{\frac{f'(x)}{f(x)}\Big\}'' = \frac{f'''(x)}{f(x)} - \frac{3f''(x)f'(x)}{f^2(x)} + \frac{2f'(x)^3}{f^3(x)}.

The more this function varies with x, the greater the effect of the improvement will be.
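Remark 1 can be checked numerically. In the sketch below (our addition), the bracketed coefficient of f′(x)/f(x) in (22) is computed for the normal density, where it should vanish, and for the logistic density, where — anticipating the functionals quoted in Section 4 (∫(f″)²/f = 1/5, ∫(f′)⁴/f³ = 1/5, I_f = 1/3) — it equals 3/5. The closed-form derivatives are our own.

```python
# Check of Remark 1: the f'(x)/f(x)-coefficient in (22) for two densities.
import numpy as np
from scipy.integrate import quad

def shrink_coeff(f, df, d2f, lim):
    If = quad(lambda x: df(x) ** 2 / f(x), -lim, lim)[0]      # Fisher information
    A = quad(lambda x: d2f(x) ** 2 / f(x), -lim, lim)[0]      # int (f'')^2 / f
    B = quad(lambda x: df(x) ** 4 / f(x) ** 3, -lim, lim)[0]  # int (f')^4 / f^3
    return A / If ** 2 - 2.0 * B / (3.0 * If ** 2)

phi = lambda x: np.exp(-x * x / 2.0) / np.sqrt(2.0 * np.pi)       # normal
print(shrink_coeff(phi, lambda x: -x * phi(x),
                   lambda x: (x * x - 1.0) * phi(x), lim=10.0))   # ~ 0

g = lambda x: np.exp(x) / (1.0 + np.exp(x)) ** 2                  # logistic
dg = lambda x: -g(x) * np.tanh(x / 2.0)
d2g = lambda x: g(x) * (np.tanh(x / 2.0) ** 2 - 0.5 / np.cosh(x / 2.0) ** 2)
print(shrink_coeff(g, dg, d2g, lim=35.0))                         # ~ 3/5
```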
3 Realization of the second approach

The advantage of the second approach is that it can be applied without difficulty in the case m > 1. We now try to approximate the PE as given in (1) directly. Introduce the functions

p_n(t) = -\frac{1}{n} \sum_{i=1}^n \ln f(X_i - t); \qquad q_k(t) = t_k,\ k = 1, 2, \ldots, m; \qquad \tilde{q}(t) = 1,

and

G_k(n) = \int_{R^m} q_k(t) \exp(-n\, p_n(t))\, dt, \qquad H(n) = \int_{R^m} \tilde{q}(t) \exp(-n\, p_n(t))\, dt

where t = (t_1, t_2, ..., t_m) and dt = dt_1 dt_2 ... dt_m. Then the k-th component of ϑ̃_n (denoted ϑ̃_{n(k)}) can be expressed as ϑ̃_{n(k)} = G_k(n)/H(n). The functions G_k(n) and H(n) have the "Laplace-like" form as before, but the function p_n(t) in the exponent explicitly depends on n. Therefore the multivariate Laplace expansion cannot be applied directly. On the other hand, an application of the law of large numbers to p_n(t) shows that its dependence on n is asymptotically negligible, and a formal application of Laplace's method could be expected to produce satisfactory results. Such a heuristic strategy has been applied in Bayesian inference for a long time, but a rigorous justification of the approach was first given in [4]. The authors deal with the problem of approximating integrals of the form \int_{R^m} q(t) \exp(-n\, p_n(t))\, dt, where the pairs of functions {q(t), p_n(t), n = 1, 2, ...} satisfy the analytical assumptions of Laplace's method ([4]). Looking at the form of the integrands in our problem, we can assert that for nice smooth densities f(x) these assumptions can be satisfied. We have to require that for f(x) there exist positive numbers ε, M and η and an integer n_0 such that n ≥ n_0 implies with probability one:

B1) The function p_n(t) has all continuous partial derivatives up to 6th order and for all t ∈ B_ε(ϑ̂_n) and all 1 ≤ j_1, ..., j_d ≤ m with 0 ≤ d ≤ 6: |\partial_{j_1 \ldots j_d} p_n(t)| < M.

B2) \det(D^2 p_n(\hat\vartheta_n)) > \eta, and the integrals G_k(n), k = 1, 2, ..., m, and H(n) exist and are finite for all n.

B3) For all δ ∈ (0, ε) it holds that

\{\det[n D^2 p_n(\hat\vartheta_n)]\}^{1/2} \int_{R^m \setminus B_\delta(\hat\vartheta_n)} \exp\{-n[p_n(t) - p_n(\hat\vartheta_n)]\}\, dt = O(n^{-2}),

\{\det[n D^2 p_n(\hat\vartheta_n)]\}^{1/2} \int_{R^m \setminus B_\delta(\hat\vartheta_n)} t_k \exp\{-n[p_n(t) - p_n(\hat\vartheta_n)]\}\, dt = O(n^{-2}),

k = 1, 2, ..., m. Here ϑ̂_n is the MLE, B_ε(t) denotes the open ball of radius ε centered at t ∈ R^m, and D²p_n(ϑ̂_n) is the Hessian of p_n evaluated at ϑ̂_n. Its (i, j)-th component will be written either as ∂_{ij}p_n(ϑ̂_n) or as p_{ij} for short, and the corresponding components of the inverse of the Hessian at the MLE will be denoted by p^{ij} (we shall omit the index n in p_n and its derivatives).

THEOREM. If the density f(x) satisfies conditions B1)−B3) then ϑ̃_{n(k)} equals
\hat\vartheta_{n(k)} - \frac{1}{6n} \sum_{i,j,l=1}^m \partial_{ijl}\, p_n(\hat\vartheta_n)\, \mu^4_{ijkl} + O_p(1/n^2)   (23)

where \mu^4_{ijkl} = p^{ij} p^{kl} + p^{ik} p^{jl} + p^{il} p^{jk}.

PROOF. The estimator ϑ̃_n can be computed by the formula ϑ̃_{n(k)} = G_k(n)/H(n), k = 1, 2, ..., m. The denominator H(n) is the same for all k = 1, 2, ..., m. The Laplace expansions of the functions G_k(n) and H(n) can be obtained by a straightforward application of Theorem 1 of [4]. Using only the leading terms in the numerator and in the denominator gives the coarse approximation ϑ̃_n ≈ ϑ̂_n. We involve one more term by applying the rule \frac{a + (1/n)b + O(1/n^2)}{1 + (1/n)d + O(1/n^2)} = a + \frac{1}{n}(b - ad) + O(\frac{1}{n^2}) and get (23).

COROLLARY. In the case of a one-dimensional location parameter (m = 1) the formula (23) simplifies further to yield:

\tilde\vartheta_n \approx \hat\vartheta_n - \frac{1}{2n} \cdot \frac{p_n'''(\hat\vartheta_n)}{[p_n''(\hat\vartheta_n)]^2}.

Given the density f(x), we can evaluate the derivatives of p_n at ϑ̂_n as follows:

p_n''(\hat\vartheta_n) = \frac{1}{n} \sum_{i=1}^n \Big\{ \frac{(f'(x_i - \hat\vartheta_n))^2}{f^2(x_i - \hat\vartheta_n)} - \frac{f''(x_i - \hat\vartheta_n)}{f(x_i - \hat\vartheta_n)} \Big\},

p_n'''(\hat\vartheta_n) = \frac{1}{n} \sum_{i=1}^n \Big\{ \frac{2(f'(x_i - \hat\vartheta_n))^3}{f^3(x_i - \hat\vartheta_n)} - \frac{3 f'(x_i - \hat\vartheta_n) f''(x_i - \hat\vartheta_n)}{f^2(x_i - \hat\vartheta_n)} + \frac{f'''(x_i - \hat\vartheta_n)}{f(x_i - \hat\vartheta_n)} \Big\}.
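The Corollary and the two derivative formulas yield a compact implementation. The following sketch (our addition) computes the MLE by root-finding and then applies the O(1/n) correction; the function name and the root bracket are assumptions.

```python
# Corrected estimator from the Corollary:
#   theta_corr = theta_hat - p_n'''(theta_hat) / (2 n [p_n''(theta_hat)]^2).
import numpy as np
from scipy.optimize import brentq

def corrected_mle(x, f, df, d2f, d3f):
    n = len(x)
    score = lambda th: np.sum(df(x - th) / f(x - th))   # equals n * p_n'(th)
    th = brentq(score, x.min() - 5.0, x.max() + 5.0)    # MLE: p_n'(th) = 0
    r1 = df(x - th) / f(x - th)
    r2 = d2f(x - th) / f(x - th)
    r3 = d3f(x - th) / f(x - th)
    p2 = np.mean(r1 ** 2 - r2)                          # p_n''(theta_hat)
    p3 = np.mean(2.0 * r1 ** 3 - 3.0 * r1 * r2 + r3)    # p_n'''(theta_hat)
    return th, th - p3 / (2.0 * n * p2 ** 2)
```

For the logistic density this reduces to the explicit formula for T̃_n displayed at the end of the next section.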
4 Some simulation results

In order to check the presence of the improvement effect in the correction formulae (22) and (23), we did some simulations for the case of the logistic distribution. The "standard form" of the logistic density function is f(x) = \frac{\exp(x)}{[1 + \exp(x)]^2}. It is easy to evaluate the functionals of the density which are used in formula (22): it holds that \int (f'')^2/f = 1/5, \int (f')^4/f^3 = 1/5 and I_f = 1/3. The asymptotically optimal choice of ψ(x) is ψ_0(x) = th(x/2). Hence, putting everything together in (22), we get

\psi_1(x) = \mathrm{th}(x/2) + \frac{1}{2n - 1}\Big\{ \frac{3}{5}\,\mathrm{th}(x/2) - \frac{3}{2} \cdot \frac{\mathrm{th}(x/2)}{\mathrm{ch}^2(x/2)} \Big\}   (24)

as a one-step correction towards the "optimal" ψ-function for small sample values of n when estimating the parameter ϑ in the location family f_\vartheta(x) = f(x - \vartheta) = \frac{\exp(x-\vartheta)}{[1+\exp(x-\vartheta)]^2} by a translation invariant M-estimator. We expect the M-estimator T_n, defined as a solution of the equation

\sum_{i=1}^n \psi_1(X_i - T_n) = 0,   (25)

to have a smaller MSE than the MSE of the MLE ϑ̂_n (which is defined as a solution of \sum_{i=1}^n \mathrm{th}\{\frac{X_i - \hat\vartheta_n}{2}\} = 0).

REMARK 2. The form of the function ψ_1 deserves a closer investigation. Let us bear in mind the behaviour of ch(x) for large x-values. Observe also that (up to a norming constant) ψ_1(x) is the influence function of the estimator T_n. The fact that for large x-values ψ_1 is nearer to the Ox axis than ψ_0 means that T_n is a sort of shrinking estimator (it gives less weight to observations far away from the origin). The difference between T_n and the usual shrinking estimators is, however, that T_n preserves the translation invariance property. We shall also note that the comparison of ψ_1 and ψ_0 for large values of the argument indicates in addition that T_n is expected to exhibit better robustness properties than the Maximum Likelihood Estimator.

REMARK 3. For small sample sizes such as n = 3, 4, 5, 6 the improvement is statistically significant. This is supported by our simulations. We used the NAG Fortran library subroutine G05DCF to generate logistically distributed random variables. The parameter ϑ_0 to be estimated was set equal to zero. (Note that the MSE is not influenced by the specific choice of ϑ_0.) The quantities to compare were the estimated MSE's of the estimators T_n and ϑ̂_n. The comparison was done using the t-test for two related samples. For a fixed sample size n and a given number of simulations N we computed the differences D_i = T^2_{n(i)} - \hat\vartheta^2_{n(i)} of the squares of the corresponding estimators obtained in the i-th simulation round (i = 1, 2, ..., N). Then we computed \bar{D} = \frac{1}{N} \sum_{i=1}^N D_i and s^2 = \frac{1}{N(N-1)} \sum_{i=1}^N (D_i - \bar{D})^2. The statistic D̄/s was used to measure the significance of the differences between the MSE's of the estimators T_n and ϑ̂_n. Under the null hypothesis of equal MSE's this quotient is t-distributed with (N − 1) degrees of freedom. Since N is large, this distribution virtually coincides with the standard normal distribution. Values of the quotient smaller than −1.96 indicate a significant improvement. This was observed in all simulations we did for n = 3, 4, 5, 6. Most significant were the results for n = 4. Significant values of the statistic arise as a rule at N about 4000. Afterwards the statistic rises gradually in absolute value. For N = 6000 we usually get values of about −3.5, for N = 8000 about −4.2 and for N = 10000 about −5.
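The paired comparison of Remark 3 can be reproduced in outline as follows (a sketch with our own generator and seed, in place of the NAG routine G05DCF; the sample size and N are the values quoted above, and the root bracket is an assumption).

```python
# Paired MSE comparison of T_n (M-estimator with psi_1 of (24)) and the MLE.
import numpy as np
from scipy.optimize import brentq

def psi1(x, n):   # corrected psi-function (24)
    t = np.tanh(x / 2.0)
    return t + (0.6 * t - 1.5 * t / np.cosh(x / 2.0) ** 2) / (2 * n - 1)

def m_est(x, psi_fn):
    g = lambda th: np.sum(psi_fn(x - th))
    return brentq(g, x.min() - 5.0, x.max() + 5.0)

rng = np.random.default_rng(1)
n, N = 4, 4000
D = np.empty(N)
for i in range(N):
    x = rng.logistic(size=n)                          # theta_0 = 0
    Tn = m_est(x, lambda y: psi1(y, n))
    mle = m_est(x, lambda y: np.tanh(y / 2.0))
    D[i] = Tn ** 2 - mle ** 2                         # paired squared errors
Dbar, s = D.mean(), D.std(ddof=1) / np.sqrt(N)
print(Dbar / s)   # values below -1.96 indicate a significant improvement
```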
A similar approach was undertaken to compare the estimator T̃_n obtained by using formula (23) with the MLE. It is easy to check that

\tilde{T}_n = \hat\vartheta_n - \frac{\sum_{i=1}^n \mathrm{th}\big(\frac{x_i - \hat\vartheta_n}{2}\big) \big/ \mathrm{ch}^2\big(\frac{x_i - \hat\vartheta_n}{2}\big)}{\Big[\sum_{i=1}^n \frac{1}{\mathrm{ch}^2(\frac{x_i - \hat\vartheta_n}{2})}\Big]^2}.

The quantities to compare were again the estimated MSE's of the estimators T̃_n and ϑ̂_n. The comparison was done as in the previous case and the results were comparable. Again the improvement was most significant for n = 4. Significant values of the statistic arise as a rule at N about 4000. For N = 6000 we usually get values of the statistic of about −3, for N = 8000 about −3.5 and for N = 10000 about −4. These values vary, of course, in each simulation run. We also computed the PE in each simulation run (using the NAG routine D01BAF for integration over infinite intervals) and estimated the relative improvement of the MSE by computing the quotients of the corresponding estimated quantities for \frac{MSE(\hat\vartheta_n) - MSE(T_n)}{MSE(\hat\vartheta_n) - MSE(\tilde\vartheta_n)} and \frac{MSE(\hat\vartheta_n) - MSE(\tilde{T}_n)}{MSE(\hat\vartheta_n) - MSE(\tilde\vartheta_n)}. For T_n the relative improvement was usually about 30%, whereas for T̃_n it was about 35%.

5 Conclusion

We suggested two types of improvement of the Maximum Likelihood estimator of the location parameter within the class of translation invariant M-estimators. These improvements behave well for small sample sizes (exactly the case of interest). It should be noted that the logistic distribution used in the test examples is close to the normal distribution in shape, and still the improvement was visible. More significant improvements can be achieved for other distributions with heavier tails.

References:
[1] G. Easton, Compromise Maximum Likelihood Estimators for Location, Journal of the American Statistical Association, Vol. 86, No. 416, 1991, pp. 1051-1064.
[2] C. Field and E. Ronchetti, Small Sample Asymptotics, IMS Lecture Notes-Monograph Series, Vol. 13, Hayward, California, 1990.
[3] F. Hampel, E. Ronchetti, P. Rousseeuw, W. Stahel, Robust Statistics: The Approach Based on Influence Functions, Wiley, New York, 1986.
[4] R. Kass, L. Tierney, J. Kadane, The Validity of Posterior Expansions Based on Laplace's Method, in: Bayesian and Likelihood Methods in Statistics and Econometrics, S. Geisser, S. Hodges, S. Press, A. Zellner (Editors), Elsevier/North-Holland, 1990, pp. 473-488.
[5] F. W. J. Olver, Introduction to Asymptotics and Special Functions, Academic Press, New York, 1974.