Computation of the Finite Sample Risk of M-Estimators on Neighborhoods

Peter Ruckdeschel, Matthias Kohl
Mathematisches Institut, Universität Bayreuth, D-95440 Bayreuth, Germany
e-mail: [email protected]

9th March 2005

Abstract

We compare various approaches for the determination of finite sample risks of one-dimensional location M-estimators on convex-contamination- and total-variation-type neighborhoods. As risks we consider MSE and certain over-/undershooting probabilities as in Huber (1968) and Rieder (1980). These methods include (numerically) exact formulae (via FFT, Kohl et al. (2005)), Edgeworth expansions, saddlepoint approximations, an approach by Fraiman et al. (2001), first-, second- and third-order asymptotics, as well as simulations.

Contents

1 Motivation and outline
2 Setup
3 Minimax-Estimators
4 Computation of the finite-sample risk for M-estimators
5 Higher order approximations of densities/cdf's
6 Numerical results
7 Application: comparison of different "optimal procedures"
8 Acknowledgement

1 Motivation and outline

1.1 Motivation

On the one hand, finite sample theory of estimation is bound to very restricted settings; on the other hand, asymptotics, while available quite generally, assumes the sample size to tend to infinity, which is hardly ever justified in practice. The standard way out in a situation with only a small sample available is simulation and the bootstrap. Another approach is to approximate the distribution function of the corresponding estimators more accurately by Edgeworth expansions or saddlepoint techniques (possibly further refined).

W.r.t. the hard- and software available these days, computation of exact finite sample risks seems a reasonable and straightforward idea. With R emerging as a standard (confer R Development Core Team (2005)), it is also worth the effort of implementing not only single cases but writing quite general code using OOP. This is pursued to some extent in our package distr available on CRAN, compare Ruckdeschel et al. (2005). The algorithms presented in this paper are realized in this setup, too.

As reference setup we use the probably best investigated statistical model, that is, one-dimensional location. We consider finite sample risks evaluated uniformly on convex-contamination/total-variation neighborhoods. As risks we use MSE and over-/undershooting probabilities as in Huber (1968). The latter provide an exact finite sample minimax theory. For MSE, however, only an asymptotic optimality theory is available (cf. Rieder (1994)). As a consequence, the optimal M-estimators we consider will be optimal either in a finite-sample setting or asymptotically. More recently, the present authors have been able to improve these asymptotic minimax estimators by providing finite sample corrections, confer Ruckdeschel (2004a) and Kohl (2005). Our goal in this paper is to provide cross-checks in the form of good computational approximations to the exact finite sample risks, in order to objectively assess the quality of these (and other) estimators.

1.2 Outline of the paper

In section 2, we present the setup in which the M-estimators are defined, that is, the ideal model, deviations from it, and the risks by which we judge the estimators. In section 3, we define our M-estimators, which arise as minimax solutions under the considered risks. In the key section of this paper, section 4, numerical algorithms are presented to calculate the corresponding finite sample risks. For both MSE and over-/undershooting risk we propose two algorithms each: Algorithms B and C are computationally more demanding but yield more accurate results for small n (n ∼ 30), while Algorithms A and D become increasingly exact for growing n and at the same time are much faster than the former. These algorithms are checked against analytically exact terms (which exist in but exceptional cases), against each other, against simulations, and against other methods like Edgeworth expansions and saddlepoint approximations, which are introduced in section 5. A comparison of the numeric results is presented in section 6, where we also include plots of the finite-sample cdf of our M-estimators. Finally, in section 7 we apply our algorithms to judge the different M-estimators as to their finite-sample risks.

2 Setup

2.1 Ideal model

We consider the one-dimensional location model

    y = θ + u                                (2.1)

with y the observed variable, u the error, and θ ∈ R the location parameter. Given a fixed number of i.i.d. observations y_i, i = 1, ..., n (n ∈ N), the location parameter θ ∈ R is to be estimated. For simplicity, in the sequel, we work with Gaussian errors, that is,

    F = N(0, 1)                              (2.2)

However, all numeric results can be obtained correspondingly for arbitrary error distributions with finite Fisher information of location and log-concave densities.

2.2 Deviations from the ideal model

Following Rieder (1980), we introduce the following types of neighborhoods of P_θ as deviations from the ideal model.

Definition 2.1 For given numbers ε, δ ∈ [0, 1) with ε + δ ∈ (0, 1),

    U_cv(θ) = U_cv(θ; ε, δ) = { Q ∈ M_1(B) | Q(dy) ≥ (1 − ε) P_θ(dy) − δ }        (2.3)

Remark 2.2 (a) These neighborhoods include contamination (δ = 0) as well as total variation neighborhoods (ε = 0). If δ = 0, we write U_c(θ) = U_c(θ; ε), and if ε = 0, we write U_v(θ) = U_v(θ; δ). In case of the MSE-risk introduced below, we write r for ε to be consistent with the literature.
(b) In case of asymptotic estimators we have to replace ε and δ by ε_n = ε/√n and δ_n = δ/√n, respectively.
(c) Due to the translation invariance of both the ideal model and the neighborhoods, we may restrict ourselves to the parameter value θ = 0; confer Rieder (1994, Section 7.2.3), where this is verified for the linear regression model. ////

As is shown in Ruckdeschel (2004a), if the loss function is unbounded, we have to thin out the original neighborhood. In case of the MSE-risk introduced below, we limit ourselves to contamination neighborhoods. As usual, we may interpret Q_n ∈ U_c(0, ε_n) as the distribution of the random variable Y defined as

    Y := (1 − U) Y^id + U Y^di                      (2.4)

for Y^id, U, Y^di stochastically independent, Y^id ∼ N(0, 1), U ∼ Bin(1, ε_n), and Y^di ∼ P^di for some arbitrary P^di ∈ M_1. The balls U_c(ε; n) defined as

    U_c(ε; n) := { Q_n^n | Q_n ∈ U_c(0, ε_n) }         (2.5)

are then thinned out to the sets Ũ_c(ε; n) of

    Q_n = L{ [(1 − U_i) Y_i^id + U_i Y_i^di]_{i≤n} | Σ_i U_i < n/2 }       (2.6)

that is, only those samples are retained where less than half the sample is contaminated; confer Ruckdeschel (2004a, Section 2; e.g. Proposition 2.1 and Theorem 3.4).
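The sampling scheme (2.4)–(2.6) can be illustrated by a minimal simulation sketch. We use Python/NumPy here as a stand-in for the R implementation mentioned in the introduction; the function name and the choice of P^di as a point mass at 100 (anticipating the "smeared Dirac at 100" used for the MSE computations below) are our own assumptions.

```python
import numpy as np

def sample_thinned(n, eps_n, y_di=100.0, rng=None):
    """Draw one sample of size n from the contaminated model (2.4),
    redrawing until fewer than n/2 observations are contaminated,
    i.e. the thinning condition of (2.6) holds."""
    rng = np.random.default_rng(rng)
    while True:
        U = rng.binomial(1, eps_n, size=n)    # contamination indicators U_i
        if U.sum() < n / 2:                    # retain only thinned samples
            Y_id = rng.standard_normal(n)      # ideal part, N(0, 1)
            return np.where(U == 1, y_di, Y_id)

y = sample_thinned(10, 0.25 / np.sqrt(10), rng=0)
```

By construction, every returned sample contains fewer than n/2 contaminated observations.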

2.3 Considered risks

We consider two types of finite sample risks: (maximal) mean squared error (maxMSE) and the probability of over-/undershooting a certain bound. In the risk definitions we suppress the dependence on the size of the neighborhood and assume ε, δ ∈ [0, 1), respectively 0 < r < √n, given.

2.3.1 MSE

For the maximal MSE at sample size n, we obtain as risk of an estimator S_n

    maxMSE(S_n) := sup_{Q_n ∈ Ũ_c(n)} n E_{Q_n} S_n^2          (2.7)

2.3.2 Over-/undershooting probability

For the over-/undershooting finite sample risk at sample size n we define

    Risk(S_n) = sup_{θ ∈ R} Risk_θ(S_n)             (2.8)

with

    Risk_θ(S_n) = sup_{Q_θ ∈ U_cv(θ)} max{ Q_θ^n(S_n > θ + τ), Q_θ^n(S_n < θ − τ) }     (2.9)

for a given constant τ ∈ (0, ∞).

Remark 2.3 Just as noted in Remark 2.2 (b), for an asymptotic version of this risk, we replace τ by some τ_n = τ/√n; also the over-/undershooting bounds are allowed to be asymmetric, i.e. we have two bounds τ_n′, τ_n″ with sum 2τ_n. For details confer Rieder (1980).

3 Minimax-Estimators

For both risks considered here, the minimax estimator may be realized as an M-estimator to monotone scores ψ; that is, following the notation in Huber (1981, pp. 46), for ψ_t(x) = ψ(x − t), let

    S_n′ := sup{ t | Σ_{i≤n} ψ_t(x_i) > 0 },    S_n″ := inf{ t | Σ_{i≤n} ψ_t(x_i) < 0 }      (3.1)

In the Gaussian location model, for all types of risks and neighborhoods considered here, the minimax scores are of Huber form

    ψ^(c)(u) = u min{ 1, c/|u| }                    (3.2)

and the minimax estimator is an equal randomization between S_n′ and S_n″.
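The defining relations (3.1)–(3.2) can be made concrete in a few lines. The following is a minimal numerical sketch (function names and the bisection approach are our own; it exploits that the score sum is nonincreasing in t):

```python
import numpy as np

def psi_huber(u, c):
    # Huber scores psi^(c)(u) = u * min(1, c/|u|), eq. (3.2)
    return np.clip(u, -c, c)

def m_estimates(x, c, tol=1e-10):
    """Smallest/largest solutions S'_n, S''_n of (3.1) via bisection;
    sum_i psi(x_i - t) is nonincreasing in t."""
    def score(t):
        return psi_huber(x - t, c).sum()
    lo, hi = x.min() - c, x.max() + c     # score > 0 at lo, < 0 at hi
    a, b = lo, hi                          # S'_n = sup{t | score(t) > 0}
    while b - a > tol:
        m = 0.5 * (a + b)
        if score(m) > 0:
            a = m
        else:
            b = m
    s1 = 0.5 * (a + b)
    a, b = lo, hi                          # S''_n = inf{t | score(t) < 0}
    while b - a > tol:
        m = 0.5 * (a + b)
        if score(m) < 0:
            b = m
        else:
            a = m
    s2 = 0.5 * (a + b)
    return s1, s2

s1, s2 = m_estimates(np.array([0.0, 1.0, 2.0]), c=10.0)
```

For c larger than the data spread the score is linear and both solutions coincide with the sample mean.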

3.1 MSE Risk

3.1.1 Higher Order Expansion for M-Estimators

In the location model from Section 2 (with F not necessarily Gaussian), in Ruckdeschel (2004a), we obtain an expansion for maxMSE(S_n) for M-estimators to monotone scores of the form

    maxMSE(S_n) = r^2 b^2 + v_0^2 + (r/√n) A_1 + (1/n) A_2 + o(n^{-1})       (3.3)

where b is the sup-norm and v_0 the L_2(P)-norm of the corresponding influence curve, and A_1 and A_2 are certain constants (in n) depending on r and the influence curve. The expansion up to o(1) will be called first-order, up to o(1/√n) second-order, and up to o(1/n) third-order asymptotics. In Ruckdeschel (2004a), one finds explicit expressions for A_1 and A_2, which in the Gaussian case read

    A_1 = v_0^2 + b^2 (1 + 2 r^2)                                            (3.4)

    A_2 = 2/3 ρ_1 v_0^3 + v_0^4 (3 ṽ_2 + l_3)
          + [ v_0^2 (3 ṽ_2 + 2 l_3) b^2 + 1 + 5 b^2 ] r^2
          + ( 1/3 l_3 b^4 + 3 b^2 ) r^4                                      (3.5)

for

    A_c = (2Φ(c) − 1)^{-1},   b = A_c c,   v_0^2 = 2 b^2 (1 − Φ(c)) + A_c (1 − 2 b ϕ(c))       (3.6)

    l_3 = 2 c ϕ(c) / (2Φ(c) − 1),
    ṽ_2 = ( 6Φ(c) − 4Φ(c)^2 − 2 − 2 c ϕ(c) ) / ( 2 c^2 (1 − Φ(c)) + 2Φ(c) − 1 − 2 c ϕ(c) )     (3.7)

    ρ_1 = 3 A_c^3 (1 − 2Φ(c) + 2 c ϕ(c)) v_0^{-3} + 3 v_0^{-1}                                  (3.8)

for Φ and ϕ the cumulative distribution function (cdf) of N(0, 1) and its density, respectively.

3.1.2 Asymptotically optimal c

Given expansion (3.3) we may, depending on r, determine first-, second-, and third-order optimal clipping bounds c. For first-order asymptotics, c_fo is determined such that

    r^2 c_fo = 2 E(u − c_fo)_+ = 2 ( ϕ(c_fo) − c_fo Φ(−c_fo) )          (3.9)

For second-order asymptotics, c_so is determined such that

    r^2 c_so ( 1 + (r^2 + 1)/(r^2 + r √n) ) = 2 ( ϕ(c_so) − c_so Φ(−c_so) )     (3.10)

These results may be read off from Rieder (1994, Theorem 5.5.7) and Ruckdeschel (2004b, Corollary 2.2), respectively. For third-order asymptotics, we determine c_to by numerical optimization such that the corresponding numerical value of the third-order asymptotic MSE is minimized. By means of the implicit function theorem, one shows that

    c_so = c_fo ( 1 − (1/√n) (r^3 + r)/(r^2 + 2Φ(c_fo)) + o(1/√n) )      (3.11)

confer Ruckdeschel (2004b, equation (2.20)). Similarly, we conjecture that

    c_to = c_so + O(1/n)                        (3.12)
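The defining equations (3.9) and (3.10), as transcribed above, are one-dimensional root-finding problems; a minimal sketch (function names our own):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def c_first_order(r):
    # Solve r^2 c = 2 (phi(c) - c Phi(-c)), eq. (3.9)
    f = lambda c: r**2 * c - 2 * (norm.pdf(c) - c * norm.cdf(-c))
    return brentq(f, 1e-6, 10)

def c_second_order(r, n):
    # Solve r^2 c (1 + (r^2+1)/(r^2 + r sqrt(n))) = 2 (phi(c) - c Phi(-c)), eq. (3.10)
    fac = 1 + (r**2 + 1) / (r**2 + r * np.sqrt(n))
    f = lambda c: r**2 * c * fac - 2 * (norm.pdf(c) - c * norm.cdf(-c))
    return brentq(f, 1e-6, 10)

c_fo = c_first_order(0.5)
c_so = c_second_order(0.5, 100)
```

In line with (3.11), the second-order bound lies below the first-order one.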

Additionally, we also determine c_o by numerical optimization such that the corresponding numerical value of the (finite-sample) MSE itself is minimized.

3.1.3 Approach of Fraiman et al. (2001)

In Fraiman et al. (2001), the authors work in a similar setup, that is, the one-dimensional location problem, where the center distribution is F = N(0, σ^2) and an M-estimator S_n to skew symmetric scores ψ is sought which minimizes the maximal risk on a neighborhood about F. Contrary to our approach, the authors work with convex contamination neighborhoods V = V(F, ε) to a fixed radius ε. They propose risks constructed by means of a function g : R × R_+ → R_+ of the asymptotic bias b(G, ψ) and the asymptotic variance v^2(G, ψ), where

    b(G, ψ)/√n := B,   B solving  (1 − ε) ∫ ψ_B dG + ε b = 0          (3.13)

    v^2(G, ψ) := [ (1 − ε) ∫ ψ_B^2 dG + ε b^2 ] / [ (1 − ε)^2 ( ∫ ψ̇_B dG )^2 ]     (3.14)

The risk of an M-estimator to IC ψ is taken as the function

    L_g(ψ) = sup_{G ∈ V} g( b(G, ψ), v^2(G, ψ)/n )          (3.15)

A mean squared error-type risk is given by g(u, v) = u^2 + v. It is not quite the MSE, as it employs the asymptotic terms b(·), v(·). Differently to the scores of type (3.2), the solutions to this problem are of the form

    ψ_{a,b,c,t}(x) = ψ̃_{a,b,t}( x min{ 1, c/|x| } )          (3.16)

    ψ̃_{a,b,t}(x) = a tanh(t x) + b [ x − t tanh(t x) ]        (3.17)

but the solutions optimal in this approach are numerically quite close (up to 10^{-5}) to the corresponding Huber-scores ψ^(c). To be able to compare their procedure to ours, for fixed sample size n, we translate their radius ε into our r/√n, and determine the optimal ψ̂_{a,b,c,t}(x) and the corresponding c_FYZ for the Huber-scores ψ^(c) "next" to ψ̂_{a,b,c,t}(x). Anyway, ψ̂_{a,b,c,t}(x) and ψ^(c) are practically indistinguishable; confer simplification 2 on p. 206 of Fraiman et al. (2001).
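The score family (3.16)–(3.17) is easy to transcribe; a small sketch follows. The parameter values used below are purely illustrative, not the optimal ones from Fraiman et al. (2001):

```python
import numpy as np

def psi_tilde(x, a, b, t):
    # eq. (3.17): a*tanh(t x) + b*(x - t*tanh(t x))
    return a * np.tanh(t * x) + b * (x - t * np.tanh(t * x))

def psi_fyz(x, a, b, c, t):
    # eq. (3.16): psi_tilde applied to the clipped argument x*min(1, c/|x|)
    return psi_tilde(np.clip(x, -c, c), a, b, t)

# illustrative (hypothetical) parameters a=1.0, b=0.1, c=1.5, t=1.0
y = psi_fyz(np.linspace(-3, 3, 7), 1.0, 0.1, 1.5, 1.0)
```

Like ψ^(c), the score is odd and constant outside [−c, c].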

3.2 Over-/Undershooting Risk

3.2.1 Contamination Case

Finite Sample Setup: For the minimax solution as in Huber (1968), c_c^fi is the unique solution to

    ε_n / (1 − ε_n) = exp(−2 c_c^fi τ_n) Φ(τ_n − c_c^fi) − Φ(−τ_n − c_c^fi)      (3.18)

Asymptotic Setup: In an asymptotic setup corresponding to this over-/undershooting risk, confer Rieder (1980), c_c^as is found as the unique solution to

    ε = 2 τ ( ϕ(c_c^as) − c_c^as Φ(−c_c^as) )        (3.19)

The derivation of this result is to be found in Kohl (2005, Section 10.3). In addition, in Lemma 11.1.3 (ibid.), the following relations between finite-sample and asymptotic results are derived.

Lemma 3.1 For ε_n ∈ (0, 1), τ_n ∈ (0, ∞) (n ∈ N fixed) it holds that

    c_c^fi = c_c^as + O(n^{-1/2})           (3.20)

and the O(n^{-1/2})-corrected optimal asymptotic clipping bound is

    c_c^as.c := c_c^as − (1/√n) · ε (ε + c_c^as τ) / ( 2 τ Φ(−c_c^as) )        (3.21)

where

    c_c^fi = c_c^as.c + O(n^{-1})           (3.22)
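Both clipping bounds are again roots of one-dimensional equations. A minimal sketch (function names our own), cross-checked against two values reported later in the paper: Huber's n = 2 example with τ_2 = 1.0, ε_2 = 0.0430 gives c_c^fi = 1.0, and τ = Φ^{-1}(0.995), ε = 0.2 gives c_c^as = 1.374:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def c_finite(eps_n, tau_n):
    # Finite-sample optimal clipping bound, contamination case, eq. (3.18)
    def f(c):
        return (np.exp(-2 * c * tau_n) * norm.cdf(tau_n - c)
                - norm.cdf(-tau_n - c) - eps_n / (1 - eps_n))
    return brentq(f, 1e-6, 20)

def c_asympt(eps, tau):
    # Asymptotic optimal clipping bound, eq. (3.19)
    f = lambda c: 2 * tau * (norm.pdf(c) - c * norm.cdf(-c)) - eps
    return brentq(f, 1e-6, 20)

c_fi = c_finite(0.0430, 1.0)
c_as = c_asympt(0.2, 2.576)
```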

3.2.2 Total Variation Case

Finite Sample Setup: For the minimax solution as in Huber (1968), c_v^fi is the unique solution to

    ( 1 + exp(−2 c_v^fi τ_n) ) δ_n = exp(−2 c_v^fi τ_n) Φ(τ_n − c_v^fi) − Φ(−τ_n − c_v^fi)      (3.23)

Asymptotic Setup: c_v^as is the unique solution to

    δ = τ ( ϕ(c_v^as) − c_v^as Φ(−c_v^as) )        (3.24)

The derivation of this result is to be found in Kohl (2005, Section 10.3). In addition, in Lemma 11.2.3 (ibid.), the following relations between finite-sample and asymptotic results are derived.

Lemma 3.2 For δ_n ∈ (0, 1), τ_n ∈ (0, ∞) (n ∈ N fixed) it holds that

    c_v^fi = c_v^as + O(n^{-1})             (3.25)

and the O(n^{-1})-corrected asymptotic optimal clipping bound is

    c_v^as.c = c_v^as − (1/n) · τ^2 c_v^as ( δ + τ ϕ(c_v^as) )^2 / ( 6 Φ(−c_v^as) )       (3.26)

where

    c_v^fi = c_v^as.c + O(n^{-3/2})         (3.27)

Remark 3.3 Lemma 3.1 and Lemma 3.2 show that there is a clear difference between contamination and total variation neighborhoods concerning the speed of convergence of the optimal clipping bounds (O(n^{-1/2}) vs. O(n^{-1})). We think this is caused by the higher symmetry of the total variation neighborhoods, which also shows up when calculating the Edgeworth expansions; confer Remark 5.1. ////

Remark 3.4 The assertion that the minimax estimator may be realized as an M-estimator to scores of form (3.2) is drawn from several references: For risk (2.8), this is shown in Huber (1968); in a corresponding asymptotic setup the same class of estimators arises in Rieder (1980). There is also an extension to (univariate) linear regression, confer Rieder (1989, 1995) and Kohl (2005). For the first order asymptotic maximal MSE the assertion is shown in Rieder (1994, Theorem 5.5.7) and in the construction part in Section 6.2.1 (ibid.). For the second order asymptotic maximal MSE it is shown in Ruckdeschel (2004a,b). For the third order asymptotic maximal MSE we do not know whether the assertion holds. ////

4 Computation of the finite-sample risk for M-estimators

In this section, we present algorithms for the computation of the finite-sample risk of the minimax M-estimators of the preceding section. We present two algorithms (Algorithms A and B) for the computation of the finite-sample risk (2.8), and two algorithms (Algorithms C and D) for the maxMSE. Different checks for the accuracy of the algorithms are included. Since we consider contamination and total variation neighborhoods simultaneously in this section, the index ∗ substitutes the indices c and v (contamination / total variation neighborhoods).

4.1 Exact expressions

We fix n ∈ N and radius ε_n (∗ = c), respectively δ_n (∗ = v), in [0, 1). Given some clipping bound c ∈ (0, ∞), we consider M-estimators S to scores of form (3.2) with equal randomization between the smallest and the largest solutions. In case of the over-/undershooting risk, we are interested in the finite-sample minimax estimator S̃_∗^fi, the asymptotic minimax estimator S̃_∗^as, and the estimator based on the O(n^{-1/2})- (∗ = c), respectively O(n^{-1})- (∗ = v), corrected asymptotic optimal clipping bound. Suppressing the dependency on the radius ε resp. δ, the finite-sample over-/undershooting risk of our M-estimator S for given τ_n ∈ (0, ∞) reads

    Risk(S; ∗) = max{ sup_{Q_{−τ_n} ∈ U_∗(−τ_n)} Q_{−τ_n}^n(S > 0),
                      sup_{Q_{τ_n} ∈ U_∗(τ_n)} Q_{τ_n}^n(S < 0) }        (4.1)

For the finite sample MSE risk, we obtain

    maxMSE(S_n) = n sup_{Q_n ∈ Ũ_c(r;n)} ∫ S_n^2 dQ_n          (4.2)

which we evaluate for M-estimators S to scores of form (3.2) with c one of c_fo, c_so, c_to, c_o, c_FYZ.

4.1.1 Preparations

For any M-estimator that is based on a score function χ_θ(u), which is measurable in u and monotone decreasing in θ ∈ R from strictly positive to strictly negative values, the following inclusions hold

    {S′ > θ} ⊆ { Σ_{i=1}^n χ_θ(y_i) > 0 } ⊆ {S′ ≥ θ}        (4.3)
    {S″ > θ} ⊆ { Σ_{i=1}^n χ_θ(y_i) ≥ 0 } ⊆ {S″ ≥ θ}        (4.4)

for any given y_1, ..., y_n; confer Huber (1981, pp. 45). By equivariance, we have χ_θ(y) = χ_0(y − θ). Therefore, we obtain for any such M-estimator S and any Q_{−τ_n} ∈ U_∗(−τ_n), respectively Q_{τ_n} ∈ U_∗(τ_n),

    Q_{−τ_n}^n(S > 0) = 1/2 [ Q_{−τ_n}^n(S′ > 0) + Q_{−τ_n}^n(S″ > 0) ]        (4.5)
        ≤ 1/2 [ Q_{−τ_n}^n( Σ_{i=1}^n χ_0(y_i) > 0 ) + Q_{−τ_n}^n( Σ_{i=1}^n χ_0(y_i) ≥ 0 ) ]      (4.6)
        ≤ 1/2 [ Q_{−τ_n}^n(S′ ≥ 0) + Q_{−τ_n}^n(S″ ≥ 0) ]        (4.7)

and corresponding relations for Q_{τ_n}^n(S < 0).

4.1.2 Q maximizing the over-/undershooting risk

By monotonicity of χ_0, the distribution of Σ_{i=1}^n χ_0(y_i) is monotone with respect to stochastic order; i.e., the probability of Σ_{i=1}^n χ_0(y_i) > [≥] 0 under Q_{−τ_n} ∈ U_∗(−τ_n) is maximal if

    Q_{−τ_n}(χ_0(y) = c) = Q_{−τ_n}(y ≥ c) = Q_0(y ≥ c + τ_n) = max!        (4.8)

Analogously, we obtain that the probability of Σ_{i=1}^n χ_0(y_i) < [≤] 0 under Q_{τ_n} ∈ U_∗(τ_n) is maximal if Q_0(y ≤ −c − τ_n) is maximal. So we are led to least favorable pairs Q′_{−τ_n} ∈ U_∗(−τ_n) and Q″_{τ_n} ∈ U_∗(τ_n), which in case (∗ = c) may be written as

    Q′_{−τ_n} = (1 − ε_n) N(−τ_n, 1) + ε_n H′_{−τ_n}        (4.9)
    Q″_{τ_n}  = (1 − ε_n) N(τ_n, 1) + ε_n H″_{τ_n}          (4.10)

with H′_{−τ_n} and H″_{τ_n} concentrated on [τ_n + c, ∞) and (−∞, −τ_n − c], respectively. In case (∗ = v), this leads us to Q′_{−τ_n} and Q″_{τ_n} having cdf (confer Rieder (1994, pp. 174))

    Q′_{−τ_n}(t) = ( Φ(t + τ_n) − δ_n )_+ + δ_n H′_{−τ_n}(t)        (4.11)
    Q″_{τ_n}(t)  = ( Φ(t − τ_n) + δ_n H″_{τ_n}(t) ) ∧ 1             (4.12)

with H′_{−τ_n} and H″_{τ_n} concentrated on [τ_n + c, ∞) and (−∞, −τ_n − c], respectively.

Remark 4.1 We obtain equality in (4.6) and (4.7) if Q_θ ∈ U_∗(θ) (θ ∈ R) is absolutely continuous (a.c.) (confer Huber (1981, Lemma 10.6.1)). Else, we may have strict inequalities. However, as shown in Kohl (2005, Remark 11.3.2), for any given non-a.c. H′_{−τ_n} or H″_{τ_n}, we can specify an a.c. distribution which is stochastically larger, respectively smaller, than H′_{−τ_n} or H″_{τ_n} such that equality holds in (4.6) and (4.7) for this new distribution. Also, there is at least equality in (4.7) even if H′_{−τ_n} or H″_{τ_n} is not absolutely continuous. But (4.5) could be strictly smaller than (4.6). That is, to make sure that the finite-sample risk is really attained under Q′_{−τ_n} and Q″_{τ_n}, (at least) 0 has to be a continuity point of the distributions of S′ and S″ under Q′_{−τ_n} and Q″_{τ_n}. ////

4.1.3 Terms for risk calculation

As a first step, we state the distribution of χ_0 under Q′_{−τ_n} and Q″_{τ_n}, where in case (∗ = c) we have

    Q′_{−τ_n}(χ_0(y) = −c) = (1 − ε_n) Φ(−c + τ_n)                                    (4.13)
    Q′_{−τ_n}(−c < χ_0(y) < t) = (1 − ε_n) [ Φ(t + τ_n) − Φ(−c + τ_n) ],  t ∈ (−c, c)   (4.14)
    Q′_{−τ_n}(χ_0(y) = c) = (1 − ε_n) Φ(−c − τ_n) + ε_n                               (4.15)

and correspondingly for Q″_{τ_n} (compare Kohl (2005), formulae (11.3.24)–(11.3.26)). In case (∗ = v) we get

    Q′_{−τ_n}(χ_0(y) = −c) = ( Φ(−c + τ_n) − δ_n )_+                                  (4.16)
    Q′_{−τ_n}(−c < χ_0(y) < t) = ( Φ(t + τ_n) − δ_n )_+ − ( Φ(−c + τ_n) − δ_n )_+,  t ∈ (−c, c)   (4.17)
    Q′_{−τ_n}(χ_0(y) = c) = 1 − ( Φ(c + τ_n) − δ_n )_+                                (4.18)

and correspondingly for Q″_{τ_n} (compare Kohl (2005), formulae (11.3.30)–(11.3.32)).

4.2 Numerical algorithms for over-/undershooting risk

With the preparations of the preceding subsection we may formulate our two algorithms for the numerical computation of the finite-sample risk (2.8).

4.2.1 Algorithm A

This procedure directly uses the distribution of χ_0 under Q′_{−τ_n} / Q″_{τ_n}, which can be read off from equations (4.13)–(4.18). The n-fold convolution of this distribution is calculated using Algorithm 3.4 of Kohl et al. (2005), based on the fast Fourier transform. This algorithm has proven to produce very accurate results; confer ibid.
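The core FFT idea (here a discretized stand-in for Algorithm 3.4 of Kohl et al. (2005), not their actual algorithm) fits in a few lines: zero-pad the lattice pmf, take the FFT, raise it to the n-th power, and invert.

```python
import numpy as np

def nfold_fft(p, n):
    """n-fold convolution of a probability vector p on a lattice via FFT."""
    m = n * (len(p) - 1) + 1      # support length of the n-fold convolution
    P = np.fft.rfft(p, m)         # zero-padded real FFT (no wrap-around)
    return np.fft.irfft(P**n, m)

# sanity check: sum of 10 fair coin flips is Binomial(10, 1/2)
q = nfold_fft(np.array([0.5, 0.5]), 10)
```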

X n



ψ˜∗ (yi ) > 0

< (Q0−τn )n

X n



ψ˜∗ (yi ) ≥ 0

(4.19)

i=1

i=1

and similarly for (Q00−τn )n . Hence the corresponding non-randomized estimate 21 (S 0 +S 00 ) is not minimax; see also Huber (1968, Section 4). In contrast, if n is odd, there is no mass at zero and equality holds in (4.19). Anyway, for increasing n the difference between > and ≥ in (4.19) becomes small very fast as the mass at zero decays exponentially in n . This immediately follows from Hall (1992, Theorem 2.3). ////

4.2.2 Algorithm B

Since all four cases (±τ_n, ∗ = c, v) may be treated analogously, we only specify the case (Q′_{−τ_n})^n and (∗ = c). To lighten the notation, we define R := Q′_{−τ_n}. In view of (4.13)–(4.15) we can rewrite

    R^n( Σ_{i=1}^n χ_0(y_i) > 0 ) = R^n( Σ_{i=1}^n [ (1 − V_i) Z_i + V_i W_i ] > 0 )        (4.20)

with the following stochastically independent random variables

    V_i i.i.d. ∼ Bin(1, p_{1,n}),   Z_i i.i.d. ∼ L( Z̃_i | Z̃_i ∈ [−c, c] ),  Z̃_i i.i.d. ∼ N(−τ_n, 1)     (4.21)

    W_i := 2 c W̃_i − c,   W̃_i i.i.d. ∼ Bin(1, p_{2,n})            (4.22)

where

    p_{1,n} := (1 − ε_n) [ Φ(−c + τ_n) + Φ(−c − τ_n) ] + ε_n       (4.23)
    p_{2,n} := [ (1 − ε_n) Φ(−c − τ_n) + ε_n ] / p_{1,n}           (4.24)

We abbreviate sums of these random variables of length m ≤ n by a superscript (m) at the random variable. By stochastic independence we get

    R^n( Σ_{i=1}^n χ_0(y_i) > 0 )
      = R^n( Z^(n) > 0 ) R^n( V^(n) = 0 ) + R^n( W̃^(n) > n/2 ) R^n( V^(n) = n )
        + Σ_{j=1}^{n−1} Σ_{k=0}^{j} R^{n−j}( Z^{(n−j)} > c(j − 2k) ) R^j( W̃^(j) = k ) R^n( V^(n) = j )      (4.25)

Of course, we can do the same calculations for > replaced by ≥. Thus, by absolute continuity of the law of Z^(m) under R^m, the finite-sample risk reads

    Risk(S; c) = R^n( Z^(n) > 0 ) R^n( V^(n) = 0 )
      + 1/2 [ R^n( W̃^(n) > n/2 ) + R^n( W̃^(n) ≥ n/2 ) ] R^n( V^(n) = n )
      + Σ_{j=1}^{n−1} Σ_{k=0}^{j} R^{n−j}( Z^{(n−j)} > c(j − 2k) ) R^j( W̃^(j) = k ) R^n( V^(n) = j )      (4.26)

The m-fold convolution of the law of Z_i is calculated using the FFT-Algorithm.

Remark 4.3 If n is even, the law of W̃^(n) puts mass at n/2. Thus,

    R^n( W̃^(n) > n/2 ) < R^n( W̃^(n) ≥ n/2 )        (4.27)

and again it follows that the non-randomized estimate (S′ + S″)/2 is not minimax; confer Remark 4.2. ////

4.2.3 Checks

Based on Algorithm B, we may compute the finite-sample risk numerically exactly, at least for sample size n = 2, where analytic calculations yield

    R^2( Z^(2) > 0 ) = 1 − R^2( Z^(2) ≤ 0 )
      = 1 − [ ∫_{−√2 c}^{0} ϕ(y + √2 τ_2) Φ(y + √2 c) dy − Φ(√2 τ_2) + Φ(√2 (τ_2 − c)) ]
            / [ Φ(c + τ_2) − Φ(−c + τ_2) ]^2        (4.28)

Moreover, if we choose ε, respectively δ, such that the corresponding optimal clipping bound satisfies c_∗ = τ_2 (∗ = c, v), we obtain

    R^2( Z^(2) ≤ 0 ) = [ 1 − 4 Φ(√2 c_∗) Φ(−√2 c_∗) ] / [ 2Φ(2 c_∗) − 1 ]^2      (4.29)

We restrict our considerations to the computation of the finite-sample risk of the asymptotic minimax estimator S̃_∗^as, since c_∗^as = τ_2 leads to a larger radius than c_∗^fi = τ_2, and Algorithm A is less accurate for this larger radius. Furthermore, we denote the absolute deviation of the finite-sample risk obtained with Algorithm A, respectively Algorithm B, from the numerically exact finite-sample risk Risk(S̃_∗^as; ∗) by errorA and errorB, respectively. The results for contamination neighborhoods are contained in Table 1, where 2^q denotes the number of lattice points used in step 2 of the FFT-Algorithm. The errors of these algorithms for total variation neighborhoods may be read off from Table 2.

(∗ = c)-Neighborhoods

  ε      | τ_2   | c_c^as | Risk(S̃_c^as; c) | q  | errorA  | errorB
  0.0480 | 2.000 | 2.000  | 0.052560         | 10 | 5.1e−05 | 2.4e−08
         |       |        |                  | 12 | 1.3e−05 | 1.5e−09
         |       |        |                  | 14 | 3.2e−06 | 9.5e−11
  0.2357 | 1.000 | 1.000  | 0.294290         | 10 | 8.4e−05 | 3.9e−08
         |       |        |                  | 12 | 2.1e−05 | 2.4e−09
         |       |        |                  | 14 | 5.2e−06 | 1.5e−10

Table 1: Precision of the computation of risk (2.8) for n = 2

(∗ = v)-Neighborhoods

  δ      | τ_2   | c_v^as | Risk(S̃_v^as; v) | q  | errorA  | errorB
  0.0240 | 2.000 | 2.000  | 0.027821         | 10 | 2.6e−05 | 2.6e−08
         |       |        |                  | 12 | 6.6e−06 | 1.6e−09
         |       |        |                  | 14 | 1.6e−06 | 1.0e−10
  0.1178 | 1.000 | 1.000  | 0.206917         | 10 | 3.9e−05 | 5.6e−08
         |       |        |                  | 12 | 9.7e−06 | 3.5e−09
         |       |        |                  | 14 | 2.4e−06 | 2.2e−10

Table 2: Precision of the computation of risk (2.8) for n = 2

Remark 4.4 (a) Algorithms A and B yield small, respectively very small, errors for sample size n = 2, which underlines the precision of the FFT-Algorithm. In addition, the accuracy of these algorithms turns out to be independent of the neighborhood type. Moreover, since both algorithms are based on the FFT-Algorithm, which maintains its high precision with increasing convolution power, we expect the same behavior for Algorithms A and B for increasing sample size n ∈ N. However, to have another cross-check, we compare the results of our algorithms with results obtained by numerical simulations; see Table 3.
(b) Huber (1968, p. 278) determines the finite-sample risk of the finite-sample minimax estimator S̃_∗^fi in case of n = 2, τ_2 = 1.0 and ε_2 = 0.0430, respectively δ_2 = 0.0396, which by (3.18), respectively (3.23), leads to c_c^fi = 1.0, respectively c_v^fi = 1.0. Our calculations yield Risk(S̃_c^fi; c) = 0.140359, respectively Risk(S̃_v^fi; v) = 0.142338, which confirms the results contained in Huber (1968). ////

We next study the differences between Algorithm A and Algorithm B for increasing sample size n. For this purpose, we only consider the finite-sample risk of the asymptotic minimax estimator, restricting the comparison to contamination neighborhoods. Moreover, we choose one "typical" situation, as the results are almost independent of τ and ε. That is, we fix τ = Φ^{-1}(0.995) ≈ 2.576 and ε = 0.2, which by (3.19) leads to c_c^as = 1.374, and determine the distance distAB between the results of Algorithm A and Algorithm B. In addition, we give the computation time in seconds T_A, respectively T_B, on an Athlon with 2.5 GHz and 512 MB RAM. Table 3 shows the corresponding results, where RiskB denotes the finite-sample risk of S_c^as computed with Algorithm B and 2^q stands for the number of lattice points used in step 2 of the FFT-Algorithm. Moreover, to give another cross-check for the results of our algorithms, we do some simulations. That is, we simulate 1e06 samples of size n, solve the M-equation and compute the empirical finite-sample risk and the corresponding 95% confidence interval. As it turns out, the results of Algorithms A and B always lie well within the 95% confidence interval.

  n  | emp. risk | 95% conf. int.   | q  | RiskB    | distAB  | T_A  | T_B
  3  | 0.0899    | [0.0892, 0.0904] | 10 | 0.090029 | 3.4e−05 | 0.01 | 0.05
     |           |                  | 12 | 0.090029 | 8.6e−06 | 0.03 | 0.19
     |           |                  | 14 | 0.090029 | 2.1e−06 | 0.18 | 1.25
  5  | 0.0634    | [0.0628, 0.0638] | 10 | 0.063232 | 3.3e−05 | 0.01 | 0.18
     |           |                  | 12 | 0.063232 | 8.3e−06 | 0.04 | 0.72
     |           |                  | 14 | 0.063232 | 2.1e−06 | 0.32 | 5.21
  10 | 0.0387    | [0.0382, 0.0391] | 10 | 0.038605 | 2.0e−05 | 0.02 | 0.94
     |           |                  | 12 | 0.038605 | 4.9e−06 | 0.12 | 5.30
     |           |                  | 14 | 0.038605 | 1.2e−06 | 0.76 | 31.54

Table 3: A comparison between Algorithm A and Algorithm B in case (∗ = c). [A description is to be found in the text above.]
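The simulation cross-check just described can be sketched as follows (a Monte-Carlo sketch with far fewer replications than the 1e06 used in the paper; the function name is our own, the parameters are the "typical" situation above, and contamination is placed at τ_n + c as in the least favorable Q′ of (4.9)):

```python
import numpy as np

def empirical_risk(n, c, eps_n, tau_n, n_sim=20000, rng=0):
    """Estimate Q'^n_{-tau_n}(S > 0) by simulation: ideal part N(-tau_n, 1),
    contaminating mass at tau_n + c; S > 0 essentially iff the Huber score
    sum is positive, cf. (4.3)."""
    rng = np.random.default_rng(rng)
    hits = 0
    for _ in range(n_sim):
        U = rng.binomial(1, eps_n, size=n)
        y = np.where(U == 1, tau_n + c, rng.standard_normal(n) - tau_n)
        hits += np.clip(y, -c, c).sum() > 0
    return hits / n_sim

# n = 3, tau = 2.576, eps = 0.2, c = 1.374 (values from the comparison above)
risk = empirical_risk(3, 1.374, 0.2 / 3**0.5, 2.576 / 3**0.5)
```

With these parameters the empirical value should land in the vicinity of the 0.0899 reported for n = 3 in Table 3, up to Monte-Carlo error.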

Remark 4.5 Obviously, Algorithm B is more accurate than Algorithm A (cf. Tables 1, 2), but Algorithm A is clearly faster (cf. Table 3). Moreover, the differences between Algorithm A and Algorithm B are small and get even smaller with increasing sample size n. Thus, for sample sizes n ≤ 10 we choose Algorithm B, whereas for n > 10 we work with Algorithm A, where we use q = 12 in almost all computations. ////

4.3 Numerical algorithms for MSE-risk

4.3.1 Q_n maximizing the MSE risk

As shown in Ruckdeschel (2004a), for an M-estimator S to scores of form (3.2), in order to maximize the MSE up to remainders of order o(1/n), it suffices to take P^di concentrated on [c + O(√(log(n)/n)), ∞); in our examples, a smeared Dirac measure at 100 will largely suffice. In particular, on an event Ω̃_n such that the contribution of Ω̃_n^c to the MSE-integral is o(1/n), ψ ≡ c [P^di]. Let Q̂_n ∈ Ũ_c(r; n) be constructed with an arbitrary such distribution P^di.

4.3.2 MSE given Σ_i U_i = k

By integration by parts, neglecting o(1/n) remainders,

    maxMSE(S_n) = n ∫ S_n^2 Q̂_n(S_n ∈ dx) = n ∫_0^∞ Q̂_n( |S_n| ≥ √t ) dt        (4.30)


The probability that we have k outliers in Ũ_c(r; n) is just

    ρ_{k;n} := P( Σ_i U_i = k ) = P( Bin(n, r_n) = k ) / P( Bin(n, r_n) < n/2 )        (4.31)
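The conditional outlier probabilities (4.31) are a direct transcription of binomial quantities; a minimal sketch (function name our own):

```python
import numpy as np
from scipy.stats import binom

def rho(k, n, r):
    # rho_{k;n} of eq. (4.31): Bin(n, r_n) conditioned on fewer than n/2 outliers
    rn = r / np.sqrt(n)
    kmax = int(np.ceil(n / 2)) - 1        # largest k with k < n/2
    denom = binom.cdf(kmax, n, rn)        # P(Bin(n, r_n) < n/2)
    return binom.pmf(k, n, rn) / denom
```

By construction the weights ρ_{k;n} over the admissible k sum to one.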

with r_n = r/√n, and hence, under the precautions of Remark 4.1, for u > 0,

    Q̂_n(S_n ≥ u) = Σ_{0≤k<n/2} Q̂_n( Σ_{i: U_i=0} ψ^(c)(y_i − u) > −kc | Σ_i U_i = k ) ρ_{k;n}
                  = Σ_{0≤k<n/2} ρ_{k;n} F^{n−k}( Σ_{i=1}^{n−k} ψ^(c)(y_i − u) > −kc )        (4.32)

4.3.3 Algorithm C

Analogously to Algorithm B, we evaluate (4.32) on a grid of values u_ν (ν = 1, ..., N); the random variables V_{i;ν}, Z_{i;ν}, W_{i;ν}, W̃_{i;ν} are defined analogously to (4.21)–(4.24), with u_ν entering the ideal part. Then

    R^n( Σ_{i=1}^{n−k} [ (1 − V_{i;ν}) Z_{i;ν} + V_{i;ν} W_{i;ν} ] > −kc )
      = R^n( Z_ν^{(n−k)} > −kc ) R^n( V_ν^{(n−k)} = 0 )
        + R^n( W̃_ν^{(n−k)} > n/2 − k ) R^n( V_ν^{(n−k)} = n − k )
        + Σ_{l=1}^{n−1−k} Σ_{h=0}^{l} R^{n−k−l}( Z_ν^{(n−k−l)} > c(l − k − 2h) ) R^l( W̃_ν^{(l)} = h ) R^n( V_ν^{(n−k)} = l )      (4.36)

Now for j = 1, ..., n, ν = 1, ..., N, and s = −(n − 1), ..., (n − 1), by means of the FFT-Algorithm, we determine the triple array

    R_{ν;s;j} := R^j( Z_ν^{(j)} > sc )        (4.37)

and then evaluate (4.36), respectively (4.32), term by term.

Remark 4.6 (a) Of course, this triple loop (over ν, s, j for R_{ν;s;j} and over k, l, h for (4.32) and (4.36)) makes this approach prohibitively slow already for n about 30.
(b) The reason why we cannot avoid this triple loop, as far as we see, is that the U_i are no longer stochastically independent after having thinned out the neighborhoods. This dependence, however, is rather weak and may be neglected for n about 30. ////

4.3.4 Algorithm D

On the other hand, the contribution of the event Σ_i U_i > n/2 is decaying exponentially, and the same goes for the probability that 0 is a mass point of ψ^(c) under Q̂_n^n. Hence, even for n about 30 we only make a minor error ignoring that these events are excluded. Then, we may directly consider the c.d.f. R(t) of ψ(Y − u) under Q̂_n on a u-grid, which is given by

    R(t) = I{ −c ≤ t < c } (1 − r_n) Φ(t + u) + I{ t ≥ c }        (4.38)

and correspondingly its convolutional powers.
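Algorithm D in a nutshell: discretize the cdf (4.38) on a lattice, take its n-fold FFT convolution, read off Q̂_n(S_n ≥ u), and finally integrate as in (4.30). The following is a stand-in sketch of both steps (function names, grid handling, and the substitution t = s^2 in the integral are our own):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def prob_overshoot(u, n, c, r_n, q=12):
    """P(sum_i psi^(c)(y_i - u) > 0) under Q_hat_n (outliers give psi = c),
    via a lattice discretization of (4.38) and FFT convolution."""
    grid = 2**q
    t = np.linspace(-c, c, grid)
    cdf = (1 - r_n) * norm.cdf(t + u)      # continuous part of (4.38)
    cdf[-1] = 1.0                          # remaining mass sits at psi = +c
    p = np.diff(np.concatenate(([0.0], cdf)))     # lattice pmf
    m = n * (grid - 1) + 1
    pn = np.fft.irfft(np.fft.rfft(p, m)**n, m)    # n-fold convolution
    s = np.linspace(-n * c, n * c, m)             # support of the score sum
    return float(pn[s > 0].sum())

def mse_from_tail(tail, n, upper=10.0):
    """Evaluate (4.30): maxMSE = n * int_0^inf Q(|S_n| >= sqrt(t)) dt,
    substituting t = s^2 so the integrand is 2 s * tail(s)."""
    val, _ = quad(lambda s: 2 * s * tail(s), 0.0, upper)
    return n * val

p0 = prob_overshoot(0.0, 3, 1.374, 0.0)
```

As a sanity check for the integration step: for S_n ∼ N(0, 1/n) the formula returns n · E S_n^2 = 1.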

Calculation of MSE

ˆ n (Sn ≥ u ) on the u -grid proBoth Algorithms C and D, after having determined Q ceed by numerical integration of (4.30) which we do with the R-function integrate, compare R Development Core Team (2005).

5 Higher order approximations of densities/cdf's

As already noted before, it is very difficult or even impossible to determine the exact finite-sample risk analytically for sample size n ≥ 3. Thus, one might think of higher order approximations of the distributions or densities, provided by Edgeworth expansions or saddlepoint approximations, to compute (at least) an approximation of the exact finite-sample risk.

5.1 Edgeworth expansions

We consider

    ξ_i = ( χ_0(y_i) − E_R χ_0 ) / √(Var_R χ_0)        (5.1)

with some χ_0(u) of form (3.2), and assume R = Q′_{−τ_n}, Q″_{τ_n} of form (4.9)–(4.12) to be absolutely continuous. Thus, E_R |ξ_i|^5 < ∞, and the corresponding Edgeworth expansion may be read off from Ibragimov (1967, Theorem 1) and Field and Ronchetti (1990, p. 16). That is, we obtain

    R^n( Σ_{i=1}^n ( χ_0(y_i) − E_R χ_0 ) < √n t √(Var_R χ_0) )
      = Φ(t) − ϕ(t) [ ρ_R/(6 √n) (t^2 − 1)
                      + (1/n) ( κ_R/24 (t^3 − 3t) + ρ_R^2/72 (t^5 − 10 t^3 + 15 t) ) ]
        + O(n^{-3/2})        (5.2)

where

    ρ_R = E_R [ ( (χ_0 − E_R χ_0)/√(Var_R χ_0) )^3 ]   and
    κ_R = E_R [ ( (χ_0 − E_R χ_0)/√(Var_R χ_0) )^4 ] − 3        (5.3)


To determine an approximation of the finite-sample risk (4.1), we have to choose

    t = −√n E_R χ_0 / √(Var_R χ_0)        (5.4)

Remark 5.1 RIf k is even, calculating ER χk0 R( k ∈ N ) in case (∗ = v) , we obtain EQ0−τ χk0 = χk0 dN (−τn , 1) and EQ00τn χk0 = χk0 dN (τn , 1) , respectively. Probably, n this also causes the faster convergence of the finite-sample risks in case of total variation neighborhoods at least for the Edgeworth expansions; confer Remark 6.1. ////
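Evaluating the expansion (5.2) at a given t is straightforward; the following Python sketch (illustrative, not the paper's implementation) computes the second-order approximation for given ρ_R, κ_R, and n. Dropping the 1/n term yields the first-order version.

```python
import math

def phi(t):
    # standard normal density
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def edgeworth2(t, rho, kappa, n):
    # Second-order Edgeworth approximation (5.2) for skewness rho,
    # excess kurtosis kappa, and sample size n; dropping corr2 gives
    # the first-order version.
    corr1 = rho / (6.0 * math.sqrt(n)) * (t * t - 1.0)
    corr2 = (kappa / 24.0 * (t ** 3 - 3.0 * t)
             + rho ** 2 / 72.0 * (t ** 5 - 10.0 * t ** 3 + 15.0 * t)) / n
    return Phi(t) - phi(t) * (corr1 + corr2)

# With rho = kappa = 0 the expansion collapses to the normal approximation
print(edgeworth2(0.0, 0.0, 0.0, 10))  # 0.5
```

In the risk computation, t is taken from (5.4) with the moments of χ₀ evaluated under R.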

5.2 Saddlepoint approximations

The assumptions A 4.1–A 4.5 of Field and Ronchetti (1990, Theorem 4.3) are fulfilled for χ_t(y) = χ₀(y − t) and R = Q′_{−τ_n}, Q″_{τ_n} as defined in (4.9)–(4.12), where we assume R to be absolutely continuous (i.e., dR = ρ dλ). Thus we can read off an asymptotic expansion of the density of the corresponding M-estimator, which is

\[
f_n(t) = \sqrt{\frac{n}{2\pi}}\; c^{-n}(t)\, \frac{A(t)}{\sigma(t)}\, \big(1 + O(n^{-1})\big),
\qquad t \in \mathbb{R}
\tag{5.5}
\]

where

\[
c^{-1}(t) = \int \exp\{\alpha(t)\chi_t(y)\}\, \rho(y)\, dy
\tag{5.6}
\]
\[
\sigma^2(t) = \int \chi_t(y)^2\, c(t) \exp\{\alpha(t)\chi_t(y)\}\, \rho(y)\, dy
\tag{5.7}
\]
\[
A(t) = \int \Big( \frac{\partial}{\partial t}\, \chi_t(y) \Big)\, c(t) \exp\{\alpha(t)\chi_t(y)\}\, \rho(y)\, dy
\tag{5.8}
\]

and α(t) is the solution α ∈ R of

\[
0 = \int \chi_t(y) \exp\{\alpha \chi_t(y)\}\, \rho(y)\, dy
\tag{5.9}
\]

Remark 5.2 (a) One key advantage of saddlepoint approximations, namely that they are invariant in n once the functions σ(t) and A(t) of (5.7) and (5.8) have been determined on a sufficiently dense grid, gets lost in our shrinking neighborhood setting, where the (least favorable) R changes from n to n. So in fact little is gained w.r.t. the FFT algorithm.
(b) Of course, saddlepoint approximations might also be used to fill the triple array R_{ν;s;j} of (4.37). But for n ≤ 30 we do not lose too much time using the FFT algorithm instead, while for n > 30 we are not sure about the performance. ////
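For each t, the saddlepoint equation (5.9) is a one-dimensional root-finding problem in α. The following Python sketch solves it for a Huber-type score χ_t(y) = min(c, max(−c, y − t)) under the standard normal density; the clipping height c = 1.5 is illustrative only and is not one of the optimal values derived in Section 3.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

c = 1.5  # illustrative clipping height (not one of the optimal values)

def chi(y, t):
    # Huber-type score chi_t(y) = min(c, max(-c, y - t))
    return np.clip(y - t, -c, c)

def saddle_lhs(alpha, t):
    # Right-hand side of (5.9): int chi_t(y) exp(alpha chi_t(y)) rho(y) dy
    f = lambda y: chi(y, t) * np.exp(alpha * chi(y, t)) * norm.pdf(y)
    return quad(f, -np.inf, np.inf)[0]

def alpha_of_t(t):
    # The alpha-derivative of the integral is positive, so the root is unique
    return brentq(lambda a: saddle_lhs(a, t), -10.0, 10.0)

# For t = 0 and the symmetric standard normal density, alpha(0) = 0
print(alpha_of_t(0.0))
```

With α(t) in hand, c(t), σ(t), and A(t) of (5.6)–(5.8) follow by three further quadratures per grid point.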

6 Numerical results

6.1 Over-/undershooting risk

We consider the finite-sample risk of the asymptotic minimax estimator and check the quality of the introduced higher order approximations against the very accurate results obtained by our Algorithms A and B. Table 4 shows the results for contamination neighborhoods, where we choose ε = 0.1, 0.5 and τ = Φ⁻¹(0.95), Φ⁻¹(0.995) ≈ 1.645, 2.576. For radius ε = 0.1 the saddlepoint approximation performs well down to sample size 5 or even 3; for radius ε = 0.5 the results get a little worse, and we need about 5–10 observations to obtain a good approximation. For the smaller τ the results of the Edgeworth expansion up to second order are comparable with those of the saddlepoint approximation; i.e., for radius ε = 0.1 already 3 observations suffice for a good approximation, whereas for radius ε = 0.5 we need about 5–10 observations. For τ = 2.576 and ε = 0.1 the Edgeworth expansion up to second order performs a little worse than the saddlepoint approximation, whereas for τ = 2.576 and ε = 0.5 the results are again comparable. The Edgeworth expansion up to first order yields acceptable results in all cases down to sample size 10; but we need 20 or even 50 observations to obtain results as good as those of the expansion up to second order and the saddlepoint approximation, respectively.

In case of total variation neighborhoods (cf. Table 5) the results are very similar. For the saddlepoint approximation and the Edgeworth expansion up to second order, a sample size of about 5 is enough to obtain a good approximation to the exact finite-sample risk. Again, the Edgeworth expansion up to first order performs a little worse, and we need 20–50 observations to get comparably good approximations.

Remark 6.1 (a) Apparently, the speed of convergence towards the asymptotic risk is much faster in case of total variation neighborhoods. As Lemma 3.1 (c) and Lemma 3.2 (c) show, $c^{\mathrm{fi}}_c = c^{\mathrm{as}}_c + O(n^{-1/2})$, whereas $c^{\mathrm{fi}}_v = c^{\mathrm{as}}_v + O(n^{-1})$. Moreover, our numerical calculations yield that the same holds for the corresponding finite-sample risks of the finite-sample minimax estimator; confer Subsubsection 7.2.1. However, this also seems to be true for the finite-sample risk of the asymptotic minimax estimator, as the results included in Tables 4 and 5 as well as in Tables 9 and 10 indicate.
(b) On the one hand, we could use these higher order approximations to obtain a better approximation of the finite-sample risk of the asymptotic minimax estimator, since the asymptotic risk of this estimator is much too optimistic for small sample sizes; confer the corresponding section in Kohl (2005). On the other hand, as the computation time of our Algorithm A, and even more so of our Algorithm B, strongly increases with the sample size, one could also think of saddlepoint approximations or Edgeworth expansions to determine the finite-sample risk of the finite-sample minimax estimator, especially for larger sample sizes (n > 20), since the computation time of these approximations is independent of n. In particular, the Edgeworth expansions can be evaluated with very little computational effort. ////

(∗ = c)-neighborhoods

  ε     τ      c_c^as   n     Risk(S̃_c^as; c)   RiskEW1   RiskEW2   RiskSP
  0.1   1.645  1.484    3          12.08           11.83     12.05     12.22
                        5          10.62           10.43     10.57     10.57
                        10          9.36            9.26      9.35      9.34
                        100         7.89            7.88      7.89      7.89
                        asymptotic risk: 7.42
  0.1   2.576  1.675    3           4.75            5.48      4.44      5.16
                        5           3.14            3.30      3.02      3.14
                        10          1.96            2.01      1.94      1.95
                        100         1.08            1.08      1.08      1.08
                        asymptotic risk: 0.91
  0.5   1.645  0.663    3          37.10           37.19     38.07     39.36
                        5          32.74           32.41     33.10     33.37
                        10         28.24           27.84     28.23     28.20
                        100        21.28           21.25     21.28     21.28
                        asymptotic risk: 18.74
  0.5   2.576  0.919    3          26.97           27.37     28.84     30.53
                        5          20.10           19.90     20.75     21.08
                        10         13.42           13.14     13.41     13.40
                        100         5.73            5.73      5.73      5.73
                        asymptotic risk: 3.98

Table 4: Approximation quality of Edgeworth expansions and saddlepoint approximations. [Risks are given in percent. Risk(S̃_c^as; c) denotes risk (4.1) of S̃_c^as; RiskEW1 and RiskEW2 denote the approximations by Edgeworth expansions up to first and second order, RiskSP by saddlepoint approximations.]

(∗ = v)-neighborhoods

  δ      τ      c_v^as   n     Risk(S̃_v^as; v)   RiskEW1   RiskEW2   RiskSP
  0.05   1.645  1.484    3          9.11            9.06      8.92      8.97
                         5          8.40            8.31      8.34      8.32
                         10         7.89            7.83      7.88      7.87
                         100        7.47            7.46      7.47      7.47
                         asymptotic risk: 7.42
  0.05   2.576  1.675    3          2.62            2.82      2.60      2.66
                         5          1.75            1.82      1.72      1.73
                         10         1.25            1.29      1.24      1.24
                         100        0.93            0.94      0.93      0.93
                         asymptotic risk: 0.91
  0.25   1.645  0.663    3         23.47           22.73     24.19     24.23
                         5         21.84           21.19     22.00     21.90
                         10        20.35           19.98     20.36     20.32
                         100       18.90           18.86     18.90     18.90
                         asymptotic risk: 18.74
  0.25   2.576  0.919    3         11.88           12.35     12.79     13.29
                         5          8.80            8.83      8.79      8.92
                         10         6.20            6.20      6.16      6.18
                         100        4.17            4.17      4.17      4.17
                         asymptotic risk: 3.98

Table 5: Approximation quality of Edgeworth expansions and saddlepoint approximations. [Notation is analogous to Table 4.]

6.2 MSE

Under R 1.7.1, we simulated M = 10000 runs of sample size n = 5, 10, 30, 100 in the ideal location model. In the contaminated situation, we used observations stemming from

\[
Q_n = \mathcal{L}\Big\{ \big[(1-U_i)\,Y_i^{\mathrm{id}} + U_i\, Y_i^{\mathrm{di}}\big]_i \;\Big|\; \textstyle\sum_i U_i \le \lceil n/2\rceil - 1 \Big\}
\tag{6.1}
\]

for U_i ∼ Bin(1, r/√n), Y_i^{id} ∼ N(0, 1), Y_i^{di} ∼ I_{{100}}, each sequence i.i.d. and all stochastically independent, and for contamination radius r = 0.1. As already indicated in Subsubsection 4.3.1, for the estimators considered here (to be introduced in a moment), a contamination point at 100 largely suffices to attain the maximal MSE on Ũ_c(r; n).

As estimators we considered M-estimators to scores of type (3.2) with clipping heights c = 0.7 (according to the H07-estimator from Andrews et al. (1972)) and c = c^{fo}(r) = 1.9483, the first-order optimal clipping height according to (3.9). All empirical MSEs come with an asymptotic 95% confidence interval, which is based on the CLT for the variables entering

\[
\mathrm{empMSE}_n = \frac{n}{10000} \sum_{j=1}^{10000} \big[S_n(\mathrm{sample}_j)\big]^2
\tag{6.2}
\]

To get an idea of the speed of convergence of the MSE to its asymptotic value, we consider the M-estimators for different sample sizes n. The simulated empirical risk comes with an (empirical) 95% confidence interval and is compared to the corresponding numerical approximations and to the first-order, second-order, and third-order expansions in (3.3). The results are tabulated in Tables 6 and 7.
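A stripped-down version of this simulation can be sketched in Python as follows. For simplicity the conditioning ∑_i U_i ≤ ⌈n/2⌉ − 1 in (6.1) is omitted (its effect is negligible for moderate n), the run count is reduced, and the location M-estimate is computed by direct root finding.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
c, r, n, M = 1.9483, 0.1, 30, 2000  # clipping height, radius, sample size, runs

def m_estimate(y):
    # Location M-estimate: root in theta of sum_i psi_c(y_i - theta)
    psi_sum = lambda th: np.clip(y - th, -c, c).sum()
    return brentq(psi_sum, y.min() - c, y.max() + c)

def emp_mse(contaminated):
    vals = np.empty(M)
    for j in range(M):
        y = rng.standard_normal(n)
        if contaminated:
            u = rng.binomial(1, r / np.sqrt(n), size=n)
            y = np.where(u == 1, 100.0, y)  # contaminating point mass at 100
        vals[j] = m_estimate(y) ** 2
    return n * vals.mean()  # empirical n * MSE, cf. (6.2)

mse_id, mse_cont = emp_mse(False), emp_mse(True)
print(mse_id, mse_cont)
```

For n = 30 the values should roughly match the "id" and "cont" entries of Table 6, up to simulation error.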

  Table 6: emp., num., and as. n·maxMSE at r = 0.1, c = c^{fo} = 1.9483

  n    situation   simulation S̄_M [low; up]   Algo C   Algo D   n^0     n^{-1/2}  n^{-1}
  5    id          0.981 [0.954; 1.009]       1.008    1.007    1.012   1.012     1.007
       cont        1.471 [1.419; 1.532]       1.501    1.612    1.054   1.292     1.331
  10   id          1.001 [0.973; 1.029]       1.010    1.009    1.012   1.012     1.010
       cont        1.288 [1.248; 1.328]       1.290    1.296    1.054   1.222     1.242
  30   id          1.028 [1.000; 1.057]       1.011    1.011    1.012   1.012     1.011
       cont        1.192 [1.158; 1.226]       1.165    1.167    1.054   1.151     1.158
  100  id          0.984 [0.956; 1.011]       –        1.010    1.012   1.012     1.012
       cont        1.081 [1.050; 1.111]       –        1.111    1.054   1.107     1.109

  Table 7: emp., num., and as. n·maxMSE at r = 0.1, c = 0.7

  n    situation   simulation S̄_M [low; up]   Algo C   Algo D   n^0     n^{-1/2}  n^{-1}
  5    id          1.147 [1.114; 1.179]       1.172    1.168    1.187   1.187     1.169
       cont        1.403 [1.359; 1.447]       1.434    1.535    1.205   1.342     1.345
  10   id          1.179 [1.139; 1.205]       1.177    1.174    1.187   1.187     1.178
       cont        1.331 [1.292; 1.369]       1.327    1.326    1.205   1.302     1.303
  30   id          1.209 [1.175; 1.242]       1.183    1.180    1.187   1.187     1.184
       cont        1.301 [1.264; 1.337]       1.265    1.262    1.205   1.261     1.261
  100  id          1.161 [1.128; 1.193]       –        1.182    1.187   1.187     1.186
       cont        1.212 [1.178; 1.246]       –        1.232    1.205   1.236     1.236

6.3 Finite sample distribution function of minimax estimators to over-/undershooting risk

We finally use Algorithm A to compute the cumulative distribution function of ∑_{i=1}^n ψ̃_c(y_i) under Q″_{τ_n}{}^n for different values of n and compare the results with the cumulative distribution function of the normal distribution which has minimum Kolmogorov distance. By symmetry, it suffices to consider the cumulative distribution function under Q″_{τ_n}{}^n only. Moreover, we give only the two "extreme" situations ε = 0.1 (δ = 0.05), τ = 1.645 (see Figures 1 and 2) and ε = 0.5 (δ = 0.25), τ = 2.576 (see Figures 3 and 4). For the total variation case, in the first situation already 5 observations seem to be enough to get quite close to a normal distribution, whereas in the second situation we need a sample size of about 10. This indicates that the speed of convergence in case of total variation neighborhoods is not only faster at zero (finite-sample risk) but uniformly over the whole support of the law of ∑_{i=1}^n ψ̃_v(y_i) under Q″_{τ_n}{}^n than under convex contamination.

Remark 6.2 If H″_{τ_n} is absolutely continuous, then Q″_{τ_n} is also absolutely continuous, and by Remark 4.1 the distributions of S̃_c^{fi} under Q″_{τ_n}{}^n and of ∑_{i=1}^n ψ̃_c(y_i) under Q″_{τ_n}{}^n coincide. ////

To determine the normal distribution of minimum Kolmogorov distance, we use a numerical approximation; i.e., we compute the Kolmogorov distance d_κ of the cumulative distribution functions of ∑_{i=1}^n ψ̃_c(y_i) under Q″_{τ_n}{}^n and N(μ, σ²) on a grid of 1e05 points and minimize this distance in μ and σ using the R function optim; compare R Development Core Team (2005). As we see, in both cases about 10 observations are enough to get already quite close to a normal distribution, and the jumps included in the cumulative distribution functions decay exponentially; confer also Remark 4.2.
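The minimization of d_κ can be mimicked in Python (the paper uses R's optim). As an illustrative stand-in for the computed finite-sample c.d.f. we tabulate an exactly normal target, so the minimizer should recover its parameters with d_κ near 0.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Illustrative stand-in for the computed finite-sample c.d.f.: tabulate the
# c.d.f. of N(2, 1.5^2) on a grid of 1e05 points.
grid = np.linspace(-6.0, 10.0, 100001)
F = norm.cdf(grid, loc=2.0, scale=1.5)

def kolmogorov_dist(par):
    mu, sigma = par
    if sigma <= 0.0:
        return 1.0  # penalize invalid scale parameters
    return np.max(np.abs(F - norm.cdf(grid, loc=mu, scale=sigma)))

# Analogue of R's optim: derivative-free minimization in (mu, sigma)
res = minimize(kolmogorov_dist, x0=[0.0, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-5, "fatol": 1e-8, "maxiter": 2000})
mu_hat, sigma_hat = res.x
print(mu_hat, sigma_hat, res.fun)
```

In the paper's setting, F would instead be the grid of values of the finite-sample c.d.f. produced by Algorithm A.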


7 Application: comparison of different "optimal procedures"

7.1 MSE

For the finite sample MSE risk, for the values n = 5, 10, 50, 100 and r = 0.1, 0.5, 1.0, we determine c^{fo}, c^{so}, c^{to}, c^{o}, c^{FYZ} and calculate the MSE risk for the corresponding M-estimators S. We compare the resulting ICs as to their clipping heights c and the corresponding numerically exact finite MSE. As the absolute value of the MSE is of secondary interest when we want to find the optimal procedure, for all but the procedure with c = c^{o} we determine the relative MSE, that is,

\[
\mathrm{relMSE}(\psi^{(c)}) = \mathrm{maxMSE}(\psi^{(c)}) / \mathrm{maxMSE}(\psi^{(c^o)})
\tag{7.1}
\]

For n = ∞, we evaluate the corresponding first-order asMSE. As a cross-check, the clipping heights c^{fo}, c^{so}, c^{to} are also determined for n = 10⁸. In case of c^{FYZ}, for all finite n the error tolerance used in optimize in R was 10⁻⁴, while for n = ∞ it was 10⁻¹². For c^{o} and n = 10⁸, an optimization of the (numerically) exact MSE would have been too time-consuming and has been skipped for this reason. Also, for n = 5, the radius r = 1.0, corresponding to ε = 0.447, is not admitted for an optimization of the risk proposed by Fraiman et al. (2001), and thus no result is available in this case.

  Table 8: Optimal clipping heights and corresponding relMSE_n

  r     quantity            n = 5     n = 10    n = 50    n = 100   n = ∞
  0.1   c^{fo}              1.948     1.948     1.948     1.948     1.948
        relMSE_n(c^{fo})    8.679%    4.065%    0.836%    0.448%    –
        c^{so}              1.394     1.484     1.663     1.724     1.948
        relMSE_n(c^{so})    0.833%    0.207%    0.014%    0.010%    –
        c^{to}              1.309     1.428     1.644     1.713     1.948
        relMSE_n(c^{to})    0.332%    0.066%    0.004%    0.006%    –
        c^{FYZ}             1.368     1.370     1.668     1.756     1.939
        relMSE_n(c^{FYZ})   0.658%    0.002%    0.021%    0.031%    –
        c^{o}               1.167     1.358     1.630     1.704     –
        MSE_n(c^{o})        1.388     1.239     1.129     1.107     –
  0.5   c^{fo}              0.862     0.862     0.862     0.862     0.862
        relMSE_n(c^{fo})    2.930%    2.655%    0.446%    0.218%    –
        c^{so}              0.650     0.690     0.767     0.790     0.862
        relMSE_n(c^{so})    0.756%    0.615%    0.036%    0.013%    –
        c^{to}              0.547     0.620     0.744     0.777     0.862
        relMSE_n(c^{to})    0.230%    0.191%    0.008%    0.003%    –
        c^{FYZ}             0.539     0.632     0.749     0.782     0.866
        relMSE_n(c^{FYZ})   0.200%    0.248%    0.011%    0.008%    –
        c^{o}               0.413     0.531     0.728     0.770     –
        MSE_n(c^{o})        4.632     3.039     2.008     1.879     –
  1.0   c^{fo}              0.436     0.436     0.436     0.436     0.436
        relMSE_n(c^{fo})    2.716%    3.132%    0.348%    0.149%    –
        c^{so}              0.320     0.340     0.380     0.394     0.436
        relMSE_n(c^{so})    1.411%    1.610%    0.076%    0.021%    –
        c^{to}              0.255     0.291     0.361     0.382     0.436
        relMSE_n(c^{to})    0.876%    0.999%    0.027%    0.006%    –
        c^{FYZ}             –         0.281     0.344     0.387     0.440
        relMSE_n(c^{FYZ})   –         0.892%    0.063%    0.012%    –
        c^{o}               0.001     0.125     0.334     0.366     –
        MSE_n(c^{o})        12.627    8.445     4.296     3.787     –

  A description of this table is given in Subsection 7.1.

7.2 Over-/undershooting risk

In this subsection we present various numerical comparisons between finite-sample and asymptotic results; i.e., we numerically check the asymptotics against finite-sample results obtained for fixed neighborhoods. To restrict the amount of results, we choose τ = Φ⁻¹(0.95), Φ⁻¹(0.995), such that 2τ corresponds to the width of 90% and 99% confidence intervals in case of the standard normal distribution. Moreover, in most cases we use q = 12 in Algorithms A and B. In Subsubsection 7.2.1 we compare the finite-sample risks of the corresponding M-estimators and numerically determine the speed of convergence of the finite-sample risk of the finite-sample minimax estimator.

7.2.1 Finite-sample risk

In the setup of the preceding subsection, we determine the finite-sample risk of the finite-sample minimax estimator S̃_c^{fi}, the asymptotic minimax estimator S̃_c^{as}, and the estimator S_c^{as.c} which is based on the O(n^{−1/2})-corrected asymptotically optimal clipping bound; correspondingly for total variation (with an O(n^{−1})-corrected bound).

Convex contamination: Although there are clear differences between the clipping bounds of the finite-sample and the asymptotic minimax estimator, the differences (in absolute values) between the corresponding finite-sample risks are only small; see Figure 5 and Table 9. Moreover, the finite-sample risk of the estimator which is based on the O(n^{−1/2})-corrected asymptotically optimal clipping bound is very close to the finite-sample risk of the finite-sample minimax estimator.

Total variation: The differences between the finite-sample risks (in absolute values) are small already for very small sample sizes (n ≤ 5); see Figure 6 and Table 10. In particular, the finite-sample risk of S_v^{as.c} is very close to the finite-sample risk of S̃_v^{fi} already for sample size n = 5.

  ε     τ      n     c_c^{fi}   c_c^{as.c}   c_c^{as}   Risk^{fi}   Risk^{as.c}   Risk^{as}
  0.1   1.645  3     0.903      0.837        1.484      11.327      11.337        12.079
               5     1.021      0.983                   10.188      10.192        10.617
               10    1.148      1.130                    9.156       9.156         9.357
               100   1.374      1.372                    7.872       7.872         7.888
  0.1   2.576  3     0.788      0.621        1.675       3.845       3.871         4.748
               5     0.953      0.859                    2.610       2.621         3.143
               10    1.142      1.098                    1.751       1.753         1.958
               100   1.497      1.493                    1.067       1.067         1.078
  0.5   1.645  3     0.198      0.113        0.663      36.695      36.699        37.100
               5     0.287      0.237                   32.161      32.167        32.741
               10    0.386      0.362                   27.866      27.869        28.244
               100   0.570      0.568                   21.239      21.239        21.284
  0.5   2.576  3     0.232      0.022        0.919      26.461      26.468        26.966
               5     0.347      0.224                   19.006      19.033        20.099
               10    0.487      0.428                   12.669      12.683        13.419
               100   0.769      0.763                    5.667       5.667         5.731

Table 9: Comparison of optimal clipping bounds and corresponding finite-sample risks in case (∗ = c). [Risks are given in percent.]

7.2.2 Speed of convergence

Convex contamination: Figure 7 shows the speed of convergence of the finite-sample risks. It seems to be of order n^{−1/2}, as in case of the optimal clipping bounds; confer Lemma 3.1.
Total variation: The results suggest that not only in case of the optimal clipping bound (cf. Lemma 3.2) but also in case of the corresponding finite-sample risk of the finite-sample minimax estimator the speed of convergence is of order O(n^{−1}). Figure 8 and the results obtained by the Box-Cox power transformation provided by the MASS package of Venables and Ripley (2002) strongly confirm this conjecture. Moreover, the results are almost independent of τ and δ.
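The rate diagnostics behind Figures 7 and 8 amount to regressing log(risk_n − risk_∞) on log n; the fitted slope estimates the convergence order. A small Python check on synthetic risks of the conjectured form (the values 7.42 and 4.0 are illustrative):

```python
import numpy as np

def estimated_order(ns, risks, risk_inf):
    # Slope of log(risk_n - risk_inf) against log(n): risks of the form
    # risk_inf + b * n^(-p) yield slope -p.
    slope, _ = np.polyfit(np.log(ns), np.log(np.asarray(risks) - risk_inf), 1)
    return slope

ns = np.array([3.0, 5.0, 10.0, 25.0, 100.0, 500.0])
risk_inf = 7.42                      # asymptotic risk, as in Table 4
risks = risk_inf + 4.0 / ns          # synthetic risks with rate n^(-1)
print(estimated_order(ns, risks, risk_inf))  # slope close to -1
```

Applied to the computed finite-sample risks, a slope near −1/2 supports the rate for (∗ = c) and a slope near −1 the rate for (∗ = v).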

  δ      τ      n     c_v^{fi}   c_v^{as.c}   c_v^{as}   Risk^{fi}   Risk^{as.c}   Risk^{as}
  0.05   1.645  3     1.120      0.903        1.484       8.879       8.972         9.106
                5     1.232      1.135                    8.306       8.321         8.399
                10    1.340      1.310                    7.863       7.864         7.891
                100   1.467      1.466                    7.465       7.465         7.465
  0.05   2.576  3     0.984      0.049        1.675       2.214       2.654         2.618
                5     1.160      0.700                    1.588       1.761         1.754
                10    1.351      1.188                    1.201       1.215         1.247
                100   1.630      1.627                    0.934       0.934         0.935
  0.25   1.645  3     0.467      0.394        0.663      23.347      23.360        23.471
                5     0.532      0.502                   21.762      21.766        21.843
                10    0.591      0.583                   20.326      20.326        20.350
                100   0.655      0.655                   18.898      18.898        18.898
  0.25   2.576  3     0.477      0.043        0.919      11.559      11.658        11.882
                5     0.595      0.393                    8.484       8.589         8.796
                10    0.722      0.656                    6.099       6.110         6.203
                100   0.893      0.892                    4.168       4.168         4.170

Table 10: Comparison of optimal clipping bounds and corresponding finite-sample risks in case (∗ = v). [Risks are given in percent.]

8 Acknowledgement

References

Andrews D.F., Bickel P.J., Hampel F.R., Huber P.J., Rogers W.H. and Tukey J.W. (1972): Robust estimates of location. Survey and advances. Princeton University Press, Princeton, N.J.

Field C. and Ronchetti E. (1990): Small sample asymptotics. Vol. 13 of IMS Lecture Notes - Monograph Series. Institute of Mathematical Statistics, Hayward, CA.

Fraiman R., Yohai V.J. and Zamar R.H. (2001): Optimal robust M-estimates of location. Ann. Stat., 29(1): 194–223.

Hall P. (1992): The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag.

Huber P.J. (1968): Robust confidence limits. Z. Wahrscheinlichkeitstheor. Verw. Geb., 10: 269–278.

Huber P.J. (1981): Robust statistics. Wiley Series in Probability and Mathematical Statistics. Wiley.

Ibragimov I. (1967): The Chebyshev-Cramér asymptotic expansions. Theor. Probab. Appl., 12: 454–469.

Kohl M. (2005): Numerical contributions to the asymptotic theory of robustness. Dissertation, Universität Bayreuth, Bayreuth.

Kohl M., Ruckdeschel P. and Stabla T. (2005): General purpose convolution algorithm for distributions in S4-classes by means of FFT. Technical report, February 2005. Also available at http://www.uni-bayreuth.de/departments/math/org/mathe7/RUCKDESCHEL/pubs/comp.pdf.

R Development Core Team (2005): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.

Rieder H. (1980): Estimates derived from robust tests. Ann. Stat., 8: 106–115.

Rieder H. (1989): A finite-sample minimax regression estimator. Statistics, 20(2): 211–221.

Rieder H. (1994): Robust asymptotic statistics. Springer Series in Statistics. Springer.

Rieder H. (1995): Robustness in structured models. In: Rinne H. et al. (Eds.) Grundlagen der Statistik und ihre Anwendungen. Festschrift für Kurt Weichselberger, p. 172–187. Physica-Verlag, Berlin.

Ruckdeschel P. (2004a): Higher order asymptotics for the MSE of M-estimators on shrinking neighborhoods. Unpublished manuscript. Also available at http://www.uni-bayreuth.de/departments/math/org/mathe7/RUCKDESCHEL/pubs/mest2.pdf.

Ruckdeschel P. (2004b): Consequences of higher order asymptotics for M-estimators. Unpublished manuscript. Also available at http://www.uni-bayreuth.de/departments/math/org/mathe7/RUCKDESCHEL/pubs/cmest.pdf.

Ruckdeschel P., Kohl M., Stabla T. and Camphausen F. (2005): S4 classes for distributions. Submitted. Also available at http://www.uni-bayreuth.de/departments/math/org/mathe7/RUCKDESCHEL/pubs/distr-rnews.pdf.

Venables W. and Ripley B. (2002): Modern Applied Statistics with S-Plus. Statistics and Computing. Springer, 4th edition.

[Figure 1: Finite-sample distributions for radius ε = 0.1 and τ = 1.645 in case of contamination neighborhoods. Each panel (n = 2, 3, 4, 5, 10, 20) shows the c.d.f. of ∑_{i=1}^n ψ̃_c(y_i) under Q″_{τ_n}{}^n together with the c.d.f. of the minimum Kolmogorov distance normal N(μ, σ²). Fitted values:
  n:    2       3       4       5       10      20
  μ:    1.089   1.437   1.776   2.104   3.338   5.045
  σ:    0.545   0.893   1.209   1.494   2.423   3.627
  d_κ:  0.1788  0.0771  0.0403  0.0223  0.0074  0.0034 ]

[Figure 2: Finite-sample distributions for radius δ = 0.05 and τ = 1.645 in case of total variation neighborhoods. Each panel (n = 2, 3, 4, 5, 10, 20) shows the c.d.f. of ∑_{i=1}^n ψ̃_v(y_i) under Q″_{τ_n}{}^n together with the c.d.f. of the minimum Kolmogorov distance normal N(μ, σ²). Fitted values:
  n:    2       3       4       5       10      20
  μ:    1.384   1.790   2.159   2.501   3.771   5.504
  σ:    0.584   1.000   1.369   1.671   2.582   3.794
  d_κ:  0.1387  0.0584  0.0294  0.0160  0.0066  0.0030 ]

[Figure 3: Finite-sample distributions for radius ε = 0.5 and τ = 2.576 in case of contamination neighborhoods. Each panel (n = 2, 3, 4, 5, 10, 20) shows the c.d.f. of ∑_{i=1}^n ψ̃_c(y_i) under Q″_{τ_n}{}^n together with the c.d.f. of the minimum Kolmogorov distance normal N(μ, σ²). Fitted values:
  n:    2       3       4       5       10      20
  μ:    0.069   0.239   0.416   0.634   1.577   3.081
  σ:    0.216   0.301   0.642   0.706   1.319   2.247
  d_κ:  0.2278  0.1939  0.1456  0.0973  0.0149  0.0048 ]

[Figure 4: Finite-sample distributions for radius δ = 0.25 and τ = 2.576 in case of total variation neighborhoods. Each panel (n = 2, 3, 4, 5, 10, 20) shows the c.d.f. of ∑_{i=1}^n ψ̃_v(y_i) under Q″_{τ_n}{}^n together with the c.d.f. of the minimum Kolmogorov distance normal N(μ, σ²). Fitted values:
  n:    2       3       4       5       10      20
  μ:    0.315   0.738   1.194   1.524   2.838   4.622
  σ:    0.751   0.731   0.748   0.942   1.723   2.725
  d_κ:  0.2806  0.1710  0.0930  0.0617  0.0119  0.0052 ]

[Figure 5: Finite-sample risk for sample size n ≤ 25 given radius ε = 0.1, 0.5 (∗ = c) and width τ = 2.576, 1.960, 1.645. Each panel plots, against n, the asymptotic risk and the finite-sample risks of the asymptotic, the O(n^{−1/2})-corrected, and the finite-sample minimax estimators.]

[Figure 6: Finite-sample risk for sample size n ≤ 25 given radius δ = 0.05, 0.25 (∗ = v) and width τ = 2.576, 1.960, 1.645. Each panel plots, against n, the asymptotic risk and the finite-sample risks of the asymptotic, the O(n^{−1})-corrected, and the finite-sample minimax estimators.]

[Figure 7: Finite-sample risk for increasing sample size n (up to 500) given radius ε = 0.1, 0.5 (∗ = c) and width τ = 2.576, 1.960, 1.645; legend as in Figure 5.]

[Figure 8: Finite-sample risk for increasing sample size n (up to 250) given radius δ = 0.05, 0.25 (∗ = v) and width τ = 2.576, 1.960, 1.645; legend as in Figure 6.]