Multivariate normal approximation of the maximum likelihood estimator via the delta method
arXiv:1609.03970v1 [math.ST] 13 Sep 2016
Andreas Anastasiou∗ and Robert E. Gaunt† September 2016
Abstract We use the delta method and Stein’s method to derive, under regularity conditions, explicit upper bounds for the distributional distance between the distribution of the maximum likelihood estimator (MLE) of a d-dimensional parameter and its asymptotic multivariate normal distribution. Our bounds apply in situations in which the MLE can be written as a function of a sum of i.i.d. t-dimensional random vectors. We apply our general bound to establish a bound for the multivariate normal approximation of the MLE of the normal distribution with unknown mean and variance.
Keywords: Multi-parameter maximum likelihood estimation; multivariate normal approximation; delta method; Stein’s method AMS 2010 Subject Classification: Primary 60F05; 62E17; 62F12
1 Introduction
The asymptotic normality of maximum likelihood estimators (MLEs), under regularity conditions, is one of the most well-known and fundamental results in mathematical statistics. In a very recent paper, [1] obtained explicit upper bounds on the distributional distance between the distribution of the MLE of a d-dimensional parameter and its asymptotic multivariate normal distribution. In the present paper, we use a different approach to derive such bounds in the special situation where the MLE can be written as a function of a sum of i.i.d. t-dimensional random vectors. In this setting, our bounds improve on (or are at least as good as) those of [1] in terms of sharpness, simplicity and strength of distributional distance. The process used to derive our bounds consists of a combination of Stein's method and the multivariate delta method. It is a natural generalisation of the approach used by [2] to obtain improved bounds – only in cases where the MLE can be expressed as a function of a sum of i.i.d. random variables – over those obtained by [3] for the asymptotic normality of the single-parameter MLE.

Let us now introduce the notation and setting of this paper. Let θ = (θ1, . . . , θd) denote the unknown parameter found in a statistical model. Let θ0 = (θ01, . . . , θ0d) be the true (still unknown) value and let Θ ⊂ Rd denote the parameter space. Suppose X = (X1, . . . , Xn) is a random sample of n i.i.d. t-dimensional random vectors with joint probability density (or mass) function f(x|θ), where x = (x1, x2, . . . , xn). The likelihood function is L(θ; x) = f(x|θ), and its natural logarithm, called the log-likelihood function, is denoted by l(θ; x). The expected Fisher information matrix for one random vector will be denoted by I(θ). We shall assume that the MLE θ̂n(X) of the parameter of interest θ exists and is unique. It should, however, be noted that existence and uniqueness of the MLE cannot be taken for granted; see, for example, [4] for an example of non-uniqueness. Assumptions that ensure existence and uniqueness of the MLE are given in [9].
∗ Department of Statistics, The London School of Economics and Political Science, Columbia House, Houghton Street, London, WC2A 2AE, UK
† Department of Statistics, University of Oxford, 24-29 St. Giles', Oxford, OX1 3LB, UK
In addition to assuming the existence and uniqueness of the MLE, we shall assume that the MLE takes the following form. Let q : Θ → Rd be such that all the entries of the function q have continuous partial derivatives with respect to θ and that
\[
q(\hat{\theta}_n(x)) = \frac{1}{n}\sum_{i=1}^n g(x_i) = \bigg(\frac{1}{n}\sum_{i=1}^n g_1(x_i), \ldots, \frac{1}{n}\sum_{i=1}^n g_d(x_i)\bigg)^{\intercal} \tag{1.1}
\]
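To make the representation (1.1) concrete, consider (as a toy illustration of our own choosing, not a model considered in this paper) the Exp(θ) model with density f(x|θ) = θe^{−θx}: the MLE is θ̂n = 1/x̄, so taking q(θ) = 1/θ and g(x) = x gives q(θ̂n(x)) = x̄ = (1/n) Σ g(xi). The minimal sketch below, assuming numpy, simply checks this identity numerically; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0                                # true rate of the Exp(theta) toy model
x = rng.exponential(1 / theta0, size=1000)  # numpy parametrises by the scale 1/theta

theta_hat = 1 / x.mean()                    # MLE of the rate
q = lambda t: 1 / t                         # q(theta) = 1/theta
g = lambda xi: xi                           # g(x) = x

# Representation (1.1): q(theta_hat) equals the sample mean of g(X_i).
print(q(theta_hat), g(x).mean())
assert np.isclose(q(theta_hat), g(x).mean())
```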
for some g : Rt → Rd. This setting is a natural generalisation of that of [2] to multi-parameter MLEs. The representation (1.1) for the MLE allows us to approximate the MLE using a different approach to that given in [1], which results in a bound that improves on that obtained by [1]. As we shall see in Section 3, a simple and important example of an MLE of the form (1.1) is given by the normal distribution with unknown mean and variance: f(x|µ, σ²) = (2πσ²)^{-1/2} exp(−(x − µ)²/(2σ²)), x ∈ R.

In this paper, we derive general bounds for the distributional distance between an MLE of the form (1.1) and its limiting multivariate normal distribution. That is, in this setting, we give a quantitative result to complement the following well-known and fundamental theorem (see [6] for a proof).

Theorem 1.1. Let X1, X2, . . . , Xn be i.i.d. random vectors with probability density (mass) functions f(xi|θ), 1 ≤ i ≤ n, where θ ∈ Θ ⊂ Rd. Also let Z ∼ Nd(0, Id×d), where 0 is the d × 1 zero vector and Id×d is the d × d identity matrix. Then, under regularity conditions,
\[
\sqrt{n}\,[I(\theta_0)]^{1/2}\big(\hat{\theta}_n(X) - \theta_0\big) \xrightarrow{d} N_d(0, I_{d\times d}) \quad \text{as } n \to \infty.
\]
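As an informal numerical companion to Theorem 1.1 (a sketch under our own choice of model, sample size and number of replications, not part of the paper), one can simulate the N(µ, σ²) model of Section 3, form √n [I(θ0)]^{1/2}(θ̂n(X) − θ0) over many replications, and check that its sample mean and covariance are close to 0 and I2×2.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 1.0, 2.0, 500, 20000

# I(theta0)^{1/2} for the N(mu, sigma^2) model with theta = (mu, sigma^2)
I_sqrt = np.diag([1 / np.sqrt(sigma2), 1 / np.sqrt(2 * sigma2**2)])

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mu_hat = X.mean(axis=1)
sigma2_hat = X.var(axis=1)                       # the MLE uses the 1/n variance
theta_hat = np.column_stack([mu_hat, sigma2_hat])

W = np.sqrt(n) * (theta_hat - np.array([mu, sigma2])) @ I_sqrt.T
print("mean      ", W.mean(axis=0))              # close to (0, 0)
print("covariance\n", np.cov(W.T))               # close to the 2 x 2 identity
```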
In Section 2, we derive a general upper bound on the distributional distance between the distribution of the MLE of a d-dimensional parameter and the limiting multivariate normal distribution for situations in which (1.1) is satisfied. In Section 3, we apply our general bound to obtain a bound for the multivariate normal approximation of the MLE of the normal distribution with unknown mean and variance.
2 The general bound
Let us begin by introducing some notation. We let C_b^n(R^d) denote the space of bounded functions on R^d with bounded k-th order derivatives for k ≤ n. We abbreviate |h|_1 := sup_i ‖∂h/∂x_i‖ and |h|_2 := sup_{i,j} ‖∂²h/(∂x_i ∂x_j)‖, where ‖·‖ = ‖·‖_∞ is the supremum norm. For ease of presentation, we let the subscript (m) denote an index for which |θ̂n(x)_(m) − θ0(m)| is the largest among the d components:
\[
(m) \in \{1, 2, \ldots, d\} : \big|\hat{\theta}_n(x)_{(m)} - \theta_{0(m)}\big| \ge \big|\hat{\theta}_n(x)_j - \theta_{0j}\big|, \quad \forall j \in \{1, 2, \ldots, d\},
\]
and also
\[
Q_{(m)} = Q_{(m)}(X, \theta_0) := \big|\hat{\theta}_n(X)_{(m)} - \theta_{0(m)}\big|. \tag{2.2}
\]
The following theorem gives the main result of this paper.

Theorem 2.1. Let X1, X2, . . . , Xn be i.i.d. Rt-valued random vectors with probability density (or mass) function f(xi|θ), for which the parameter space Θ is an open subset of Rd. Let Z ∼ Nd(0, Id×d). Assume that the MLE θ̂n(X) exists and is unique. Let q : Θ → Rd be twice differentiable and g : Rt → Rd be such that the special structure (1.1) is satisfied. In addition, assume that for any θ0 ∈ Θ there exist 0 < ε = ε(θ0) and constants Cjkl, ∀j, k, l ∈ {1, 2, . . . , d}, such that |∂²qj(θ)/(∂θk ∂θl)| ≤ Cjkl for all θ ∈ Θ such that |θj − θ0j| < ε, ∀j ∈ {1, 2, . . . , d}. Let
\[
\xi_{ij} = \sum_{l=1}^d \big[K^{-1/2}\big]_{jl}\big(g_l(X_i) - q_l(\theta_0)\big), \qquad \text{where } K = [\nabla q(\theta_0)]\,[I(\theta_0)]^{-1}\,[\nabla q(\theta_0)]^{\intercal}. \tag{2.3}
\]
Also, let Q(m) be as in (2.2). Then, for all h ∈ C_b^2(R^d),
\[
\Big| E\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - E[h(Z)] \Big|
\le \frac{\sqrt{\pi}\,|h|_2}{4\sqrt{2}\,n^{3/2}} \sum_{i=1}^n \sum_{j,k,l=1}^d \Big\{ E|\xi_{ij}\xi_{ik}\xi_{il}| + 2\,\big|E[\xi_{ij}\xi_{ik}]\big|\,E|\xi_{il}| \Big\}
+ \frac{2\|h\|}{\epsilon^2} \sum_{j=1}^d E\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2
+ \frac{|h|_1}{\sqrt{n}} \sup_{\theta_0^*} \sum_{j=1}^d E\Big[\big|A_{1j}(\theta_0, X) - A_{2j}(\theta_0, \theta_0^*, X)\big| \,\Big|\, Q_{(m)} < \epsilon\Big], \tag{2.4}
\]
where the supremum is taken over all θ0* between θ̂n(x) and θ0, and
\[
A_{1j}(\theta_0, X) = n \sum_{k=1}^d \Big( [I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2} \Big)_{jk} \big( q_k(\hat{\theta}_n(X)) - q_k(\theta_0) \big),
\]
\[
A_{2j}(\theta_0, \theta_0^*, X) = \frac{n}{2} \sum_{k=1}^d \Big( [I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} \Big)_{jk} \sum_{m=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(X)_m - \theta_{0m}\big)\big(\hat{\theta}_n(X)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_m \partial\theta_l} q_k(\theta_0^*).
\]
Remark 2.2. 1. For fixed d, it is clear that the first term is O(n^{-1/2}). For the second term, using [5] we know that the mean squared error (MSE) of the MLE is of order n^{-1}. This yields, for fixed d, Σ_{j=1}^d E(θ̂n(X)_j − θ0j)² = O(n^{-1}). To establish that (2.4) is of order n^{-1/2}, it suffices to show that, for the third term, the quantities E[|A1j(θ0, X)| | Q(m) < ε] and E[|A2j(θ0, θ0*, X)| | Q(m) < ε] are O(1). For A2j(θ0, θ0*, X), we use the Cauchy–Schwarz inequality to get that
\[
E\big|\big(\hat{\theta}_n(X)_m - \theta_{0m}\big)\big(\hat{\theta}_n(X)_l - \theta_{0l}\big)\big| \le \sqrt{E\big(\hat{\theta}_n(X)_m - \theta_{0m}\big)^2}\,\sqrt{E\big(\hat{\theta}_n(X)_l - \theta_{0l}\big)^2}. \tag{2.5}
\]
Since, from [5], the mean squared error is of order n^{-1}, we apply (2.5) to the expression for A2j(θ0, θ0*, X) to get that E[|A2j(θ0, θ0*, X)| | Q(m) < ε] = O(1). Whilst we strongly suspect that E[|A1j(θ0, X)| | Q(m) < ε] is O(1), we have been unable to establish this fact. In the univariate d = 1 case this term vanishes, and this is also the case in our application in Section 3.

2. An upper bound on the quantity |E[h(√n [I(θ0)]^{1/2}(θ̂n(X) − θ0))] − E[h(Z)]| was given in [1]. A brief comparison of the bound in [1] and our bound follows. The bound in [1] is more general than ours: it covers the case of independent but not necessarily identically distributed random vectors, and it is applicable whatever the form of the MLE is, whereas our bound can be used only when (1.1) is satisfied. Furthermore, [1] gives bounds even for cases where the MLE cannot be expressed analytically. On the other hand, our bound is preferable when the MLE has a representation of the form (1.1): our bound in (2.4) applies to a wider class of test functions h (our class is C_b^2(R^d), as opposed to C_b^3(R^d)) and has a better dependence on the dimension d. Moreover, in situations where the MLE is already a sum of independent terms, our bound is easier to use, since applying q(x) = x to (2.13) yields R2 = 0 in (2.12). This means that our bound (2.4) simplifies to
\[
\Big| E\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - E[h(Z)] \Big| \le \frac{\sqrt{\pi}\,|h|_2}{4\sqrt{2}\,n^{3/2}} \sum_{i=1}^n \sum_{j,k,l=1}^d \Big\{ E|\xi_{ij}\xi_{ik}\xi_{il}| + 2\,\big|E[\xi_{ij}\xi_{ik}]\big|\,E|\xi_{il}| \Big\}.
\]
Another simplification of the general bound is given in Section 3, where we take X1, X2, . . . , Xn to be i.i.d. normal random variables with unknown mean and variance, a case which was also treated by [1]. Indeed, the simpler terms in our bound lead to a much improved bound for this example, with a superior dependence on the parameters and numerical constants that are two orders of magnitude smaller (see Remark 3.2).

3. If ∂²qk(θ)/(∂θm ∂θl) is uniformly bounded in Θ for all k, m, l ∈ {1, 2, . . . , d}, then we do not need to use ε or conditional expectations in deriving an upper bound. This leads to the following simpler bound:
\[
\Big| E\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - E[h(Z)] \Big| \le \frac{\sqrt{\pi}\,|h|_2}{4\sqrt{2}\,n^{3/2}} \sum_{i=1}^n \sum_{j,k,l=1}^d \big\{ E|\xi_{ij}\xi_{ik}\xi_{il}| + 2\,|E[\xi_{ij}\xi_{ik}]|\,E|\xi_{il}| \big\} + \frac{|h|_1}{\sqrt{n}} \sup_{\theta_0^*} \sum_{j=1}^d E\big|A_{1j}(\theta_0, X) - A_{2j}(\theta_0, \theta_0^*, X)\big|. \tag{2.6}
\]
To prove Theorem 2.1, we employ Stein's method and the multivariate delta method. Our strategy consists of benefiting from the special form of q(θ̂n(X)), which is a sum of random vectors. It is here where the multivariate delta method comes into play: instead of comparing θ̂n(X) to Z ∼ Nd(0, Id×d), we compare q(θ̂n(X)) to Z and then find upper bounds on the distributional distance between the distributions of θ̂n(X) and q(θ̂n(X)). As well as recalling the multivariate delta method, we state and prove two lemmas that will be used in the proof of Theorem 2.1.

Theorem 2.3. (Multivariate delta method) Let the parameter θ ∈ Θ, where Θ ⊂ Rd. Furthermore, let Yn be a sequence of random vectors in Rd which, for Z ∼ Nd(0, Id×d), satisfies
\[
\sqrt{n}\,(Y_n - \theta) \xrightarrow{d} \Sigma^{1/2} Z \quad \text{as } n \to \infty,
\]
where Σ is a symmetric, positive definite covariance matrix. Then, for any function b : Rd → Rd of which the Jacobian matrix ∇b(θ) ∈ R^{d×d} exists and is invertible, we have that, for Λ := [∇b(θ)] Σ [∇b(θ)]^⊺,
\[
\sqrt{n}\,\big(b(Y_n) - b(\theta)\big) \xrightarrow{d} \Lambda^{1/2} Z \quad \text{as } n \to \infty.
\]
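A minimal numerical sanity check of Theorem 2.3 (our own sketch, assuming numpy; the map b, the matrix Σ and all constants are arbitrary choices of ours): draw the sample mean Yn directly from its exact N(θ, Σ/n) distribution, apply b, and compare the empirical covariance of √n(b(Yn) − b(θ)) with Λ = [∇b(θ)]Σ[∇b(θ)]^⊺.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])        # covariance of a single observation
n, reps = 2000, 50000

def b(y):                                          # an arbitrary smooth map R^2 -> R^2, applied row-wise
    return np.column_stack([y[:, 0] * y[:, 1], y[:, 0] ** 2 + y[:, 1]])

grad_b = np.array([[theta[1], theta[0]],           # Jacobian of b evaluated at theta
                   [2 * theta[0], 1.0]])
Lambda = grad_b @ Sigma @ grad_b.T

# Draw the sample mean Y_n from its exact N(theta, Sigma/n) distribution.
Y = rng.multivariate_normal(theta, Sigma / n, size=reps)
D = np.sqrt(n) * (b(Y) - b(theta[None, :]))

print("empirical covariance of sqrt(n)(b(Y_n) - b(theta)):\n", np.cov(D.T))
print("Lambda = grad_b Sigma grad_b^T:\n", Lambda)
```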
Lemma 2.4. Let
\[
W = \frac{K^{-1/2}}{\sqrt{n}} \sum_{i=1}^n \big[ g(X_i) - q(\theta_0) \big], \tag{2.7}
\]
where K is as in (2.3). Then EW = 0 and Cov(W) = Id×d.

Proof. By the multivariate central limit theorem,
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n \big( g(X_i) - E\,g(X_1) \big) \xrightarrow{d} \mathrm{MVN}\big(0, \mathrm{Cov}(g(X_1))\big) \quad \text{as } n \to \infty, \tag{2.8}
\]
and, by Theorem 2.3,
\[
\sqrt{n}\,\big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) \xrightarrow{d} \mathrm{MVN}(0, K) \quad \text{as } n \to \infty. \tag{2.9}
\]
Since q(θ̂n(X)) = (1/n) Σ_{i=1}^n g(Xi) and X1, X2, . . . , Xn are i.i.d. random vectors, it follows that q(θ0) = E g(X1). Furthermore, K = Cov(g(X1)), because the limiting multivariate normal distributions in (2.8) and (2.9) must have the same mean and covariance. It is now immediate that EW = 0, and a simple calculation using that K = Cov(g(X1)) yields that Cov(W) = Id×d.
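The claim of Lemma 2.4 is easy to probe numerically. The sketch below is ours and jumps ahead to the normal example of Section 3, where g(x) = (x, (x − µ)²) and K = diag(σ², 2σ⁴); it forms W from (2.7) over many replications and reports its sample mean and covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, n, reps = 0.5, 1.5, 200, 10000

K_inv_sqrt = np.diag([1 / np.sqrt(sigma2), 1 / np.sqrt(2 * sigma2**2)])  # K^{-1/2}
q_theta0 = np.array([mu, sigma2])                                        # q(theta_0) = E g(X_1)

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
g = np.stack([X, (X - mu) ** 2], axis=2)                                 # g(X_i), shape (reps, n, 2)
W = (g - q_theta0).sum(axis=1) @ K_inv_sqrt.T / np.sqrt(n)               # W from (2.7)

print("E W      ", W.mean(axis=0))      # close to (0, 0)
print("Cov(W):\n", np.cov(W.T))         # close to the 2 x 2 identity
```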
Lemma 2.5. Suppose that X1,j, . . . , Xn,j are independent for each fixed j, but that the random variables Xi,1, . . . , Xi,d may be dependent for any fixed i. For j = 1, . . . , d, let Wj = (1/√n) Σ_{i=1}^n Xij and denote W = (W1, . . . , Wd)^⊺. Suppose that E[W] = 0 and Cov(W) = Id×d. Then, for all h ∈ C_b^2(R^d),
\[
\big| E[h(W)] - E[h(Z)] \big| \le \frac{\sqrt{\pi}\,|h|_2}{4\sqrt{2}\,n^{3/2}} \sum_{i=1}^n \sum_{j,k,l=1}^d \Big\{ E|X_{ij}X_{ik}X_{il}| + 2\,\big|E[X_{ij}X_{ik}]\big|\,E|X_{il}| \Big\}. \tag{2.10}
\]

Proof. The following bound follows immediately from Lemma 2.1 of [8]:
\[
\big| E[h(W)] - E[h(Z)] \big| \le \frac{1}{2n^{3/2}} \sum_{i=1}^n \sum_{j,k,l=1}^d \bigg\| \frac{\partial^3 f}{\partial x_j \partial x_k \partial x_l} \bigg\| \Big\{ E|X_{ij}X_{ik}X_{il}| + 2\,\big|E[X_{ij}X_{ik}]\big|\,E|X_{il}| \Big\}, \tag{2.11}
\]
where f solves the so-called Stein equation
\[
\nabla^{\intercal}\nabla f(w) - w^{\intercal}\nabla f(w) = h(w) - E[h(Z)].
\]
From Proposition 2.1 of [7], we have the bound ‖∂³f/(∂x_{i₁}∂x_{i₂}∂x_{i₃})‖ ≤ (√π/(2√2))|h|_2, which when applied to (2.11) yields (2.10).

Proof of Theorem 2.1. The triangle inequality yields
\[
\Big| E\big[ h\big( \sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0) \big) \big] - E[h(Z)] \Big| \le R_1 + R_2, \tag{2.12}
\]
where
\[
R_1 := \Big| E\big[ h\big( \sqrt{n}\,K^{-1/2}\big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) \big) \big] - E[h(Z)] \Big|, \qquad
R_2 := \Big| E\big[ h\big( \sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0) \big) - h\big( \sqrt{n}\,K^{-1/2}\big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) \big) \big] \Big|. \tag{2.13}
\]
The rest of the proof focuses on bounding the terms R1 and R2 .
Step 1: Upper bound for R1. For ease of presentation, we denote by
\[
W = \sqrt{n}\,K^{-1/2}\big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) = \frac{K^{-1/2}}{\sqrt{n}} \sum_{i=1}^n \big[ g(X_i) - q(\theta_0) \big],
\]
as in (2.7). Thus, W = (W1, W2, . . . , Wd)^⊺ with
\[
W_j = \frac{1}{\sqrt{n}} \sum_{i=1}^n \sum_{l=1}^d \big[K^{-1/2}\big]_{jl} \big( g_l(X_i) - q_l(\theta_0) \big) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_{ij}, \qquad \forall j \in \{1, 2, \ldots, d\},
\]
where, for each fixed j, the ξij = Σ_{l=1}^d [K^{-1/2}]_{jl}(g_l(Xi) − q_l(θ0)), i = 1, . . . , n, are independent random variables, since the Xi are assumed to be independent. Therefore, Wj can be expressed as a sum of independent terms. This, together with Lemma 2.4, ensures that the assumptions of Lemma 2.5 are satisfied. Therefore, (2.10) yields
\[
R_1 = \big| E[h(W)] - E[h(Z)] \big| \le \frac{\sqrt{\pi}\,|h|_2}{4\sqrt{2}\,n^{3/2}} \sum_{i=1}^n \sum_{j,k,l=1}^d \Big\{ E|\xi_{ij}\xi_{ik}\xi_{il}| + 2\,\big|E[\xi_{ij}\xi_{ik}]\big|\,E|\xi_{il}| \Big\}.
\]
Step 2: Upper bound for R2. For ease of presentation, we let
\[
C_1 := C_1(h, q, X, \theta_0) := h\big( \sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0) \big) - h\big( \sqrt{n}\,K^{-1/2}\big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) \big),
\]
and our aim is to find an upper bound for |E[C1]|. Let us denote by [A]_{[j]} the j-th row of a matrix A. We begin by using a first order Taylor expansion to obtain
\[
h\big( \sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(x) - \theta_0) \big) = h\big( \sqrt{n}\,K^{-1/2}\big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) \big) + \sum_{j=1}^d \Big\{ \sqrt{n}\,\big[[I(\theta_0)]^{1/2}\big]_{[j]} \big(\hat{\theta}_n(x) - \theta_0\big) - \sqrt{n}\,\big[K^{-1/2}\big]_{[j]} \big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) \Big\} \frac{\partial}{\partial x_j} h(t(x)),
\]
where t(x) is between √n [I(θ0)]^{1/2}(θ̂n(x) − θ0) and √n K^{-1/2}(q(θ̂n(X)) − q(θ0)). Rearranging the terms gives
\[
|C_1| \le |h|_1 \sum_{j=1}^d \Big| \sqrt{n}\,\big[[I(\theta_0)]^{1/2}\big]_{[j]} \big(\hat{\theta}_n(x) - \theta_0\big) - \sqrt{n}\,\big[K^{-1/2}\big]_{[j]} \big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) \Big|. \tag{2.14}
\]
For θ0* between θ̂n(x) and θ0, a second order Taylor expansion of qj(θ̂n(x)) about θ0 gives that, for all j ∈ {1, 2, . . . , d},
\[
q_j(\hat{\theta}_n(x)) = q_j(\theta_0) + \sum_{k=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big) \frac{\partial}{\partial\theta_k} q_j(\theta_0) + \frac{1}{2} \sum_{k=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_k \partial\theta_l} q_j(\theta_0^*),
\]
and after a small rearrangement of the terms we get that
\[
[\nabla q(\theta_0)]_{[j]} \big(\hat{\theta}_n(x) - \theta_0\big) = q_j(\hat{\theta}_n(x)) - q_j(\theta_0) - \frac{1}{2} \sum_{k=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_k \partial\theta_l} q_j(\theta_0^*).
\]
Since this holds for all j ∈ {1, 2, . . . , d}, we have that
\[
\nabla q(\theta_0)\big(\hat{\theta}_n(x) - \theta_0\big) = q(\hat{\theta}_n(X)) - q(\theta_0) - \frac{1}{2} \sum_{k=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_k \partial\theta_l} q(\theta_0^*),
\]
which then leads to
\[
\sqrt{n}\,[I(\theta_0)]^{1/2}\big(\hat{\theta}_n(X) - \theta_0\big) = \sqrt{n}\,[I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} \big( q(\hat{\theta}_n(X)) - q(\theta_0) \big) - \frac{\sqrt{n}}{2}\,[I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} \sum_{k=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_k \partial\theta_l} q(\theta_0^*). \tag{2.15}
\]
Subtracting the quantity √n K^{-1/2}(q(θ̂n(X)) − q(θ0)) from both sides of (2.15) now gives
\[
\sqrt{n}\,[I(\theta_0)]^{1/2}\big(\hat{\theta}_n(X) - \theta_0\big) - \sqrt{n}\,K^{-1/2}\big( q(\hat{\theta}_n(X)) - q(\theta_0) \big)
= \sqrt{n}\,\Big( [I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2} \Big) \big( q(\hat{\theta}_n(X)) - q(\theta_0) \big)
- \frac{\sqrt{n}}{2}\,[I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} \sum_{k=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_k \partial\theta_l} q(\theta_0^*).
\]
Componentwise we have that
\[
\sqrt{n}\,\big[[I(\theta_0)]^{1/2}\big]_{[j]} \big(\hat{\theta}_n(X) - \theta_0\big) - \sqrt{n}\,\big[K^{-1/2}\big]_{[j]} \big( q(\hat{\theta}_n(X)) - q(\theta_0) \big)
= \sqrt{n}\,\Big[ [I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2} \Big]_{[j]} \big( q(\hat{\theta}_n(X)) - q(\theta_0) \big)
- \frac{\sqrt{n}}{2}\,\Big[ [I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} \Big]_{[j]} \sum_{k=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_k \partial\theta_l} q(\theta_0^*)
\]
\[
= \sqrt{n} \sum_{k=1}^d \Big( [I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2} \Big)_{jk} \big( q_k(\hat{\theta}_n(X)) - q_k(\theta_0) \big)
- \frac{\sqrt{n}}{2} \sum_{k=1}^d \Big( [I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} \Big)_{jk} \sum_{m=1}^d \sum_{l=1}^d \big(\hat{\theta}_n(x)_m - \theta_{0m}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big) \frac{\partial^2}{\partial\theta_m \partial\theta_l} q_k(\theta_0^*). \tag{2.16}
\]
We need to take into account that ∂²qj(θ)/(∂θk ∂θl) is not uniformly bounded in θ. However, it is assumed that for any θ0 ∈ Θ there exist 0 < ε = ε(θ0) and constants Cjkl, ∀j, k, l ∈ {1, 2, . . . , d}, such that |∂²qj(θ)/(∂θk ∂θl)| ≤ Cjkl for all θ ∈ Θ such that |θj − θ0j| < ε, ∀j ∈ {1, 2, . . . , d}. Therefore, with ε > 0 and Q(m) as in (2.2), the law of total expectation yields
\[
R_2 = |E[C_1]| \le E|C_1| = E\big[ |C_1| \,\big|\, Q_{(m)} \ge \epsilon \big] P\big( Q_{(m)} \ge \epsilon \big) + E\big[ |C_1| \,\big|\, Q_{(m)} < \epsilon \big] P\big( Q_{(m)} < \epsilon \big). \tag{2.17}
\]
The Markov inequality applied to P(Q(m) ≥ ε) yields
\[
P\big( Q_{(m)} \ge \epsilon \big) = P\big( \big|\hat{\theta}_n(X)_{(m)} - \theta_{0(m)}\big| \ge \epsilon \big) \le \sum_{j=1}^d P\Big( \big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2 \ge \epsilon^2 \Big) \le \frac{1}{\epsilon^2} \sum_{j=1}^d E\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2. \tag{2.18}
\]
In addition, P(Q(m) < ε) ≤ 1, which is a reasonable bound because the consistency of the MLE, a result of (R.C.1)–(R.C.4), ensures that |θ̂n(x)(m) − θ0(m)| should be small and therefore P(Q(m) < ε) = P(|θ̂n(x)(m) − θ0(m)| < ε) ≈ 1. This result along with (2.18) applied to (2.17) gives
\[
R_2 \le \frac{2\|h\|}{\epsilon^2} \sum_{j=1}^d E\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2 + E\big[ |C_1| \,\big|\, Q_{(m)} < \epsilon \big]. \tag{2.19}
\]
For an upper bound on E[|C1| | Q(m) < ε], using (2.14) and (2.16) yields
\[
R_2 \le \frac{2\|h\|}{\epsilon^2} \sum_{j=1}^d E\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2 + \frac{|h|_1}{\sqrt{n}} \sup_{\theta_0^*} \sum_{j=1}^d E\Big[ \big| A_{1j}(\theta_0, X) - A_{2j}(\theta_0, \theta_0^*, X) \big| \,\Big|\, Q_{(m)} < \epsilon \Big],
\]
with A1j(θ0, X) and A2j(θ0, θ0*, X) given as in the statement of the theorem. Summing our bounds for R1 and R2 gives the assertion of the theorem.
3 Normally distributed random variables
In this section, we illustrate the application of the general bound of Theorem 2.1 by considering the important case that X1, X2, . . . , Xn are i.i.d. N(µ, σ²) random variables with θ0 = (µ, σ²). Here the density function is
\[
f(x|\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{1}{2\sigma^2}(x - \mu)^2 \Big), \quad x \in \mathbb{R}. \tag{3.20}
\]
It is well known that in this case the MLE exists, is unique and is equal to θ̂n(X) = (X̄, (1/n) Σ_{i=1}^n (Xi − X̄)²)^⊺; see for example [6, p.118]. As we shall see in the proof of the following corollary, the MLE has a representation of the form (1.1). Thus, Theorem 2.1 can be applied to derive the following bound.

Corollary 3.1. Let X1, X2, . . . , Xn be i.i.d. N(µ, σ²) random variables and let Z ∼ N2(0, I2×2). Then, for all h ∈ C_b^2(R²),
\[
\Big| E\big[ h\big( \sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0) \big) \big] - E[h(Z)] \Big| \le \frac{7\,|h|_2}{\sqrt{n}} + \frac{|h|_1}{\sqrt{2n}}. \tag{3.21}
\]
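To get a feel for the numbers in (3.21), the following sketch (ours; the test function h(x) = cos(x1 + x2), for which |h|_1 ≤ 1 and |h|_2 ≤ 1, and all constants are our own choices) evaluates the right-hand side of (3.21) and compares it with a Monte Carlo estimate of the left-hand side, using the standard fact that E cos(Z1 + Z2) = e^{-1} for Z ∼ N2(0, I2×2).

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, n, reps = 0.0, 1.0, 100, 100000

h = lambda w: np.cos(w[:, 0] + w[:, 1])        # a test function with |h|_1 <= 1 and |h|_2 <= 1
I_sqrt = np.diag([1 / np.sqrt(sigma2), 1 / np.sqrt(2 * sigma2**2)])

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
theta_hat = np.column_stack([X.mean(axis=1), X.var(axis=1)])   # MLE of (mu, sigma^2)
W = np.sqrt(n) * (theta_hat - np.array([mu, sigma2])) @ I_sqrt.T

lhs = abs(h(W).mean() - np.exp(-1.0))          # E cos(Z_1 + Z_2) = exp(-1) for Z ~ N_2(0, I)
rhs = 7 / np.sqrt(n) + 1 / np.sqrt(2 * n)      # right-hand side of (3.21)
print(f"Monte Carlo distance {lhs:.4f}  vs  bound {rhs:.4f}")
```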
Remark 3.2. An upper bound for the distributional distance between the MLE of the normal distribution with unknown mean and variance and its limiting multivariate normal distribution was also given in Corollary 2.3 of [1]. Our bound represents a significant improvement in terms of the constants: even though both bounds are of order n^{-1/2}, our numerical constants are two orders of magnitude smaller. Furthermore, our test functions h are less restrictive, coming from the class C_b^2(R²) rather than the class C_b^3(R²). Finally, the bound in (3.21) does not depend on the parameter θ0 = (µ, σ²), while the bound given in [1] depends on σ² and blows up in the limits σ² → 0 and σ² → ∞.

Proof. From the general approach in the previous section, we want to have q : Θ → R² such that
\[
q(\hat{\theta}_n(x)) = \frac{1}{n} \sum_{i=1}^n \begin{pmatrix} g_1(x_i) \\ g_2(x_i) \end{pmatrix}.
\]
We take
\[
q(\theta) = q(\theta_1, \theta_2) = \big( \theta_1,\; \theta_2 + (\theta_1 - \mu)^2 \big)^{\intercal}, \tag{3.22}
\]
which then yields
\[
q(\hat{\theta}_n(X)) = q\bigg( \bar{X},\, \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2 \bigg) = \bigg( \bar{X},\, \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2 + (\bar{X} - \mu)^2 \bigg)^{\intercal} = \bigg( \frac{1}{n}\sum_{i=1}^n X_i,\, \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 \bigg)^{\intercal}, \tag{3.23}
\]
and therefore
\[
g_1(x_i) = x_i, \qquad g_2(x_i) = (x_i - \mu)^2, \tag{3.24}
\]
and thus the MLE has a representation of the form (1.1). We now note that the Jacobian matrix of q(θ) evaluated at θ0 = (µ, σ²) is
\[
\nabla q(\theta_0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{3.25}
\]
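A quick numerical check of (3.22)–(3.24) (a sketch of ours, assuming numpy, and not part of the proof): for simulated data, q(θ̂n(X)) computed from the MLE coincides exactly with the sample mean of g(Xi).

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma2, n = 1.0, 2.0, 1000
x = rng.normal(mu, np.sqrt(sigma2), size=n)

theta_hat = np.array([x.mean(), x.var()])                    # MLE (X_bar, (1/n) sum (X_i - X_bar)^2)
q = lambda t: np.array([t[0], t[1] + (t[0] - mu) ** 2])      # q from (3.22)
g_bar = np.array([x.mean(), ((x - mu) ** 2).mean()])         # (1/n) sum g(X_i) with g from (3.24)

assert np.allclose(q(theta_hat), g_bar)                      # representation (1.1) holds
print(q(theta_hat), g_bar)
```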
Furthermore, it is easily deduced from (3.22) that |∂²qk(θ)/(∂θm ∂θl)| ≤ 2 for all k, m, l ∈ {1, 2}. Hence, using Remark 2.2, we conclude that the general bound is as in (2.6) and no positive constant ε is necessary.
Now, for l(θ0; X) denoting the log-likelihood function, we obtain from (3.20) that the expected Fisher information matrix is
\[
I(\theta_0) = -E\big[ \nabla^{\intercal}\nabla\, l(\theta_0; X) \big] = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}. \tag{3.26}
\]
The results in (3.25) and (3.26) yield
\[
K = [\nabla q(\theta_0)]\,[I(\theta_0)]^{-1}\,[\nabla q(\theta_0)]^{\intercal} = [I(\theta_0)]^{-1} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}. \tag{3.27}
\]
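The entries of (3.26) are standard, but they are also easy to confirm numerically. The sketch below (ours, assuming numpy; the parameter values are arbitrary) estimates the expected Fisher information of a single observation as the Monte Carlo second moment of the per-observation score and compares it with diag(1/σ², 1/(2σ⁴)).

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma2, reps = 1.0, 2.0, 2_000_000
x = rng.normal(mu, np.sqrt(sigma2), size=reps)

# Score of one observation for theta = (mu, sigma^2) under the density (3.20)
score = np.column_stack([(x - mu) / sigma2,
                         -0.5 / sigma2 + (x - mu) ** 2 / (2 * sigma2**2)])

I_mc = score.T @ score / reps                    # E[score score^T] = I(theta0)
I_exact = np.diag([1 / sigma2, 1 / (2 * sigma2**2)])
print(I_mc)
print(I_exact)
```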
Using (2.7), (3.23), (3.24) and (3.27), we get that
\[
W = \frac{K^{-1/2}}{\sqrt{n}} \sum_{i=1}^n \big[ g(X_i) - q(\theta_0) \big] = \frac{1}{\sqrt{n}} \begin{pmatrix} \frac{1}{\sigma} & 0 \\ 0 & \frac{1}{\sqrt{2}\sigma^2} \end{pmatrix} \sum_{i=1}^n \begin{pmatrix} X_i - \mu \\ (X_i - \mu)^2 - \sigma^2 \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n \frac{X_i - \mu}{\sigma\sqrt{n}} \\[6pt] \sum_{i=1}^n \frac{(X_i - \mu)^2 - \sigma^2}{\sqrt{2n}\,\sigma^2} \end{pmatrix},
\]
and therefore, for i ∈ {1, . . . , n},
\[
\xi_{i1} = \frac{X_i - \mu}{\sigma}, \qquad \xi_{i2} = \frac{(X_i - \mu)^2 - \sigma^2}{\sqrt{2}\,\sigma^2}.
\]
For the first term of the general upper bound in (2.6), we are required to bound certain expectations involving ξij . We first note that
(Xi − µ)/σ =d Z, where Z ∼ N(0, 1). We have, for all i = 1, . . . , n,
\[
E|\xi_{i1}| = E|Z| = \sqrt{\frac{2}{\pi}}, \qquad E\xi_{i1}^2 = EZ^2 = 1, \qquad E|\xi_{i1}|^3 = E|Z|^3 = 2\sqrt{\frac{2}{\pi}},
\]
and, by Hölder's inequality,
\[
E|\xi_{i2}| = \frac{1}{\sqrt{2}}\, E|Z^2 - 1| \le \frac{1}{\sqrt{2}} \big( E(Z^2 - 1)^2 \big)^{1/2} = 1, \qquad E\xi_{i2}^2 = \frac{1}{2}\, E(Z^2 - 1)^2 = 1,
\]
\[
E|\xi_{i2}|^3 = \frac{1}{2^{3/2}}\, E|Z^2 - 1|^3 \le \frac{1}{2^{3/2}} \big( E(Z^2 - 1)^4 \big)^{3/4} = \frac{1}{2^{3/2}} \big( EZ^8 - 4EZ^6 + 6EZ^4 - 4EZ^2 + 1 \big)^{3/4} = \frac{1}{2^{3/2}} \big( 105 - 60 + 18 - 4 + 1 \big)^{3/4} = 15^{3/4}.
\]
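The moment values and Hölder bounds above are easy to sanity-check by simulation; the sketch below (ours, assuming numpy) reports Monte Carlo estimates next to the constants √(2/π), 2√(2/π), 1 and 15^{3/4} used in the proof.

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.standard_normal(5_000_000)
xi1, xi2 = z, (z**2 - 1) / np.sqrt(2)            # xi_{i1} and xi_{i2} for sigma^2 = 1

print("E|xi1|   ", np.abs(xi1).mean(),      "vs ", np.sqrt(2 / np.pi))
print("E|xi1|^3 ", np.abs(xi1**3).mean(),   "vs ", 2 * np.sqrt(2 / np.pi))
print("E|xi2|   ", np.abs(xi2).mean(),      "<= ", 1.0)
print("E xi2^2  ", (xi2**2).mean(),         "vs ", 1.0)
print("E|xi2|^3 ", np.abs(xi2**3).mean(),   "<= ", 15**0.75)
```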
Therefore,
\[
\sum_{i=1}^n \sum_{j,k,l=1}^2 E|\xi_{ij}\xi_{ik}\xi_{il}| = \sum_{i=1}^n \Big( E|\xi_{i1}|^3 + 3\,E\xi_{i1}^2\,E|\xi_{i2}| + 3\,E|\xi_{i1}|\,E\xi_{i2}^2 + E|\xi_{i2}|^3 \Big) \le \sum_{i=1}^n \bigg( 2\sqrt{\frac{2}{\pi}} + 3 + 3\sqrt{\frac{2}{\pi}} + 15^{3/4} \bigg) < 14.612\,n, \tag{3.28}
\]
and
\[
\sum_{i=1}^n \sum_{j,k,l=1}^2 \big| E[\xi_{ij}\xi_{ik}] \big|\, E|\xi_{il}| = \sum_{i=1}^n \sum_{j,l=1}^2 E\xi_{ij}^2\, E|\xi_{il}| \le 2n \bigg( 1 + \sqrt{\frac{2}{\pi}} \bigg). \tag{3.29}
\]
Applying the results of (3.28) and (3.29) to the first term of the general bound in (2.6) yields
\[
\frac{\sqrt{\pi}\,|h|_2}{4\sqrt{2}\,n^{3/2}} \sum_{i=1}^n \sum_{j,k,l=1}^2 \Big\{ E|\xi_{ij}\xi_{ik}\xi_{il}| + 2\,\big|E[\xi_{ij}\xi_{ik}]\big|\,E|\xi_{il}| \Big\} \le \frac{\sqrt{\pi}\,|h|_2}{4\sqrt{2}\,n^{3/2}} \bigg\{ 4n\bigg(1 + \sqrt{\frac{2}{\pi}}\bigg) + 14.612\,n \bigg\} = \frac{6.833\,|h|_2}{\sqrt{n}} \le \frac{7\,|h|_2}{\sqrt{n}}. \tag{3.30}
\]
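The numerical constants in (3.28)–(3.30) can be reproduced directly; a minimal sketch of ours:

```python
import numpy as np

c1 = 2 * np.sqrt(2 / np.pi) + 3 + 3 * np.sqrt(2 / np.pi) + 15 ** 0.75  # per-observation bound in (3.28)
c2 = 2 * (1 + np.sqrt(2 / np.pi))                                      # per-observation bound in (3.29)
prefactor = np.sqrt(np.pi) / (4 * np.sqrt(2))                          # sqrt(pi) / (4 sqrt(2)) from (2.6)

print(c1)                          # 14.611..., i.e. < 14.612
print(prefactor * (c1 + 2 * c2))   # approximately 6.83, the constant appearing in (3.30) up to rounding
```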
For the second term, we need to calculate A1j(θ0, X) and A2j(θ0, θ0*, X) as in Theorem 2.1. The results of (3.25) and (3.27) yield
\[
[I(\theta_0)]^{1/2}\,[\nabla q(\theta_0)]^{-1} - K^{-1/2} = 0_{2\times 2},
\]
where 0_{2×2} denotes the 2 × 2 zero matrix. Therefore A1j(θ0, X) = 0 for j = 1, 2. In order to calculate A2j(θ0, θ0*, X), we note that
\[
\frac{\partial^2}{\partial\theta_m \partial\theta_l} q_1(\theta) = 0, \;\; \forall m, l \in \{1, 2\}, \qquad \frac{\partial^2}{\partial\theta_1 \partial\theta_2} q_2(\theta) = \frac{\partial^2}{\partial\theta_2 \partial\theta_1} q_2(\theta) = \frac{\partial^2}{\partial\theta_2^2} q_2(\theta) = 0, \qquad \frac{\partial^2}{\partial\theta_1^2} q_2(\theta) = 2.
\]
Using this result as well as (3.25) and (3.26), we get that
\[
A_{21}(\theta_0, \theta_0^*, X) = 0, \qquad A_{22}(\theta_0, \theta_0^*, X) = \frac{n}{2\sqrt{2}\,\sigma^2} \cdot 2\,(\bar{X} - \mu)^2 = \frac{n}{\sqrt{2}\,\sigma^2}\,(\bar{X} - \mu)^2.
\]
Therefore the last term of the general upper bound in (2.6) becomes
\[
\frac{|h|_1}{\sqrt{n}} \sum_{j=1}^2 E\big| A_{1j}(\theta_0, X) - A_{2j}(\theta_0, \theta_0^*, X) \big| = \frac{|h|_1}{\sqrt{n}}\, E\big| A_{22}(\theta_0, \theta_0^*, X) \big| = \frac{|h|_1\,\sqrt{n}}{\sqrt{2}\,\sigma^2}\, E\big[ (\bar{X} - \mu)^2 \big] = \frac{|h|_1}{\sqrt{2n}}. \tag{3.31}
\]
Finally, the bounds in (3.30) and (3.31) are applied to (2.6) to obtain the assertion of the corollary.
Acknowledgements This research occurred whilst AA was studying for a DPhil at the University of Oxford, supported by a Teaching Assistantship Bursary from the Department of Statistics, University of Oxford, and the EPSRC grant EP/K503113/1. AA is currently funded by EPSRC Fellowship EP/L014246/1. RG is supported by EPSRC grant EP/K032402/1.
References

[1] Anastasiou, A. Assessing the multivariate normal approximation of the maximum likelihood estimator from high-dimensional, heterogeneous data. arXiv:1510.03679, 2015.

[2] Anastasiou, A. and Ley, C. Bounds for the asymptotic normality of the maximum likelihood estimator using the Delta method. arXiv:1508.04948, 2015.

[3] Anastasiou, A. and Reinert, G. Bounds for the normal approximation of the maximum likelihood estimator. To appear in Bernoulli, 2016+.

[4] Billingsley, P. Statistical Methods in Markov Chains. Ann. Math. Stat. 32 (1961), 12–40.

[5] Cox, D. R. and Snell, E. J. A General Definition of Residuals. J. Roy. Stat. Soc. B Met. 30 (1968), 248–275.

[6] Davison, A. C. Statistical Models (First ed.). Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2008.

[7] Gaunt, R. E. Rates of Convergence in Normal Approximation Under Moment Conditions Via New Bounds on Solutions of the Stein Equation. J. Theoret. Probab. 29 (2016), 231–247.

[8] Gaunt, R. E. and Reinert, G. The rate of convergence of some asymptotically chi-square distributed statistics by Stein's method. arXiv:1603.01889, 2016.

[9] Makelainen, T., Schmidt, T. K. and Styan, G. P. H. On the existence and uniqueness of the maximum likelihood estimate of a vector-valued parameter in fixed size samples. Ann. Stat. 9 (1981), 758–767.