
On the Efficient Score Function for some Semiparametric Location-Scale Models

Rodrigo Labouriau

January 16, 1996

Abstract

The paper studies some semiparametric extensions of the location-scale models. The efficient score function for these models is calculated using a technique based on expansions of the parametric partial score function in orthogonal polynomials. As an auxiliary result, a sufficient condition involving the Laplace transform is given for the class of polynomials to be dense in $L^2$. In several of the examples considered, the efficient score function depends on the nuisance parameter; however, this dependence is only through an intermediate finite-dimensional parameter. In some examples the efficient score function essentially does not depend on the nuisance parameter, which implies optimality of these estimating functions.

Department of Theoretical Statistics, University of Aarhus, and Department of Biometry and Informatics, Foulum Research Center, Danish Ministry of Agriculture and Fisheries.


Contents

1 Introduction
2 Preliminaries
3 Semiparametric Location-Scale Models
4 Calculation of the nuisance tangent space
5 Calculation of the efficient score function
  5.1 The case where the first two standardized cumulants are fixed
  5.2 The case where the first three standardized cumulants are fixed
6 Discussion
A Appendices
  A.1 The Laplace transform and polynomial approximation in $L^2$
  A.2 Calculation of the $L^2$-nuisance tangent space at $(0, 1, a)$
  A.3 Calculation of the tangent space at an arbitrary point
  A.4 Calculation of the first four orthogonal polynomials in $L^2(a)$

This paper is part of the PhD work of the author at the Department of Theoretical Statistics, University of Aarhus, under the supervision of Professor Ole E. Barndorff-Nielsen.


1 Introduction

This paper studies some semiparametric extensions of the location-scale model. The classic location-scale model is constructed by taking a (fixed) distribution and applying to it a shift (location) and a rescaling (scale) transformation. Now consider the situation where, instead of having a fixed distribution, one deals with a given class of distributions. A shift-rescaling transformation is applied to an unknown element of this class. Our interest is in estimating the shift and the rescaling, but now in the presence of the indeterminacy caused by the unknown element of the class of distributions that generated the data; i.e. the location (shift) and the scale (rescaling) are the parameters of interest and the unknown distribution is the nuisance parameter.

We will consider location-scale models defined for distributions contained in exponential families and with support equal to the whole real line. Here we do not know in which exponential family the supposed data distribution is contained. Additionally, we consider some models for which some standardized cumulants are fixed and known. This will allow us to obtain a range of semiparametric models of various sizes.

The main purpose of the paper is to study the behavior of the efficient score function (i.e. the projection of the partial score function onto the orthogonal complement of the nuisance tangent space) for estimating the location and the scale. To pursue this task, a detailed calculation of the nuisance tangent spaces is carried out.

In section 2 we review some notions of the classic theory of semiparametric models which are used in the rest of the paper. Section 3 presents and discusses the semiparametric location-scale model with which we work. The nuisance tangent spaces are calculated in section 4, with the details supplied in the appendices. Section 5 studies the efficient score function, and in subsections 5.1 and 5.2 we specialize to the cases where the first two and the first three standardized cumulants are fixed, respectively. Some discussion is provided in section 6. There is an appendix with a brief summary of the theory of the Laplace transform, adapted to the context we need, in which we prove a sufficient condition, in terms of the Laplace transform, for the class of polynomials to be dense in the $L^2$ space associated with a given probability measure.

2 Preliminaries

In this section we review some notions of the classic theory of semiparametric models. The notions studied here are: path differentiability, tangent sets and tangent spaces, nuisance tangent spaces and efficient score functions.

Let us consider a family of probability measures $\mathcal{P}$ defined on a common measurable space $(\mathcal{X}, \mathcal{B})$. It is assumed that the elements of $\mathcal{P}$ possess a common support, say $\mathcal{X}$, and that there exists a $\sigma$-finite measure $\lambda$ defined on $(\mathcal{X}, \mathcal{B})$ such that each member of $\mathcal{P}$ is absolutely continuous with respect to $\lambda$. Each $P \in \mathcal{P}$ is identified with a version of its Radon-Nikodym derivative with respect to $\lambda$. Denote the class of these densities by

$$\mathcal{P}^* = \left\{ \frac{dP}{d\lambda}(\cdot) : P \in \mathcal{P} \right\} .$$

Without loss of generality we assume that the versions of the Radon-Nikodym derivatives used are such that for all $P \in \mathcal{P}$ and for each $x \in \mathcal{X}$,

$$p(x) = \frac{dP}{d\lambda}(x) > 0 . \tag{1}$$

We introduce next the notation for $L^q$ spaces used throughout. For $q \in [1, \infty)$, the $L^q$-space with respect to the probability measure $P \in \mathcal{P}$ with density $p \in \mathcal{P}^*$ will be denoted by

$$L^q(P) = L^q(p) = \left\{ f : \mathcal{X} \to \mathbb{R} \;:\; \int_{\mathcal{X}} |f(x)|^q \, p(x) \, \lambda(dx) < \infty \right\} .$$

The usual norm of $L^q(p)$ will be denoted by $\| \cdot \|_{L^q(p)}$. In the special case of the Hilbert space $L^2(p)$, the natural inner product will be denoted, for all $f, g \in L^2(p)$, by

$$\langle f, g \rangle_P = \langle f, g \rangle_p = \int_{\mathcal{X}} f(x) g(x) p(x) \, \lambda(dx) .$$

Furthermore, sometimes the norm of $L^2(p)$ is denoted by $\| \cdot \|_P$ or $\| \cdot \|_p$. The space of $L^2(P)$ functions that have zero expectation under $P$ is denoted by

$$L^2_0(P) = L^2_0(p) = \left\{ f \in L^2(P) \;:\; \int_{\mathcal{X}} f(x) p(x) \, \lambda(dx) = 0 \right\} .$$

Given a set $A \subseteq L^2_0(P)$, we denote the closure of $A$ with respect to the topology of $L^2(P)$ by $cl_{L^2(P)}(A) = cl_{L^2(p)}(A)$, and the orthogonal complement of $A$ in $L^2_0(P)$ by $A^\perp$.

A path $\{p_t\}_{t \in V} \subseteq \mathcal{P}^*$ (converging to $p$) is differentiable at $p \in \mathcal{P}^*$ if there exists a neighborhood $V$ of zero from the right such that for each $t \in V$ we have the representation

$$p_t(\cdot) = p(\cdot) + t \, p(\cdot) \, \nu(\cdot) + t \, p(\cdot) \, r_t(\cdot) \tag{2}$$

for a certain $\nu(\cdot) \in L^2_0(p)$, and

$$r_t \longrightarrow 0 , \quad \text{as } t \downarrow 0 . \tag{3}$$

The convergence in (3) is in some appropriate sense to be specified later. The term $r_t$ in (2) will be referred to as the remainder term. The function $\nu : \mathcal{X} \to \mathbb{R}$ given in (2) is said to be the tangent associated with the differentiable path $\{p_t\}$. Here the tangent plays the role of the score function of a submodel parametrized by $t \in V$ at $p_0 = p$.

A path is said to be weakly differentiable if the sequence of remainder terms $\{r_t\}_{t \in V}$ in (2) and (3) converges to zero in the following sense:

$$\frac{1}{t} \int_{\{x : t|r_t| > 1\}} |r_t(x)| \, p(x) \, \lambda(dx) \longrightarrow 0 , \quad \text{as } t \downarrow 0 , \tag{4}$$

$$\int_{\{x : t|r_t| \le 1\}} |r_t(x)|^2 \, p(x) \, \lambda(dx) \longrightarrow 0 , \quad \text{as } t \downarrow 0 . \tag{5}$$

It can be shown that a path is weakly differentiable if and only if it is Hellinger differentiable (see Pfanzagl, 1985). A path $\{p_t\}_{t \in V}$ is $L^q$-differentiable (for $q \in [1, \infty]$) if the sequence of remainder terms $\{r_t\}_{t \in V}$ in (2) and (3) converges to zero in the $L^q$ sense, i.e.

$$\| r_t \|_{L^q(p)} \longrightarrow 0 , \quad \text{as } t \downarrow 0 . \tag{6}$$

Clearly, if $r, q \in [1, \infty]$ are such that $r \le q$ and a path is $L^q$-differentiable at $p \in \mathcal{P}^*$, then it is also $L^r$-differentiable at $p$ with the same tangent.

The tangent set of $\mathcal{P}$ at $p \in \mathcal{P}^*$ is the class

$$T^{o}(p) = T^{o}(p, \mathcal{P}) = \left\{ \nu \in L^2_0(p) \;:\; \exists \, V, \ \{p_t\}_{t \in V} \subseteq \mathcal{P}^*, \ \{r_t\}_{t \in V} \ \text{such that } \forall t \in V, \ \text{(2) and (3) hold} \right\} .$$

The tangent space of $\mathcal{P}$ at $p \in \mathcal{P}^*$ is given by

$$T(p) = T(p, \mathcal{P}) = cl_{L^2_0(p)} \left[ \mathrm{span} \{ T^{o}(p, \mathcal{P}) \} \right] .$$

Since the tangent sets and spaces depend on the notion of path differentiability adopted, we speak of $L^q$ (for $q \in [1, \infty)$) or weak (or Hellinger) tangent sets and tangent spaces. When necessary we use the notation $\overset{w}{T}{}^{o}$ and $\overset{w}{T}$ for the weak tangent set and tangent space respectively. Analogously, the $L^q$ tangent spaces (or sets) are represented by $\overset{q}{T}$ (or $\overset{q}{T}{}^{o}$).

Suppose now that the family $\mathcal{P}$ can be indexed in the following form

$$\mathcal{P} = \{ P_{\theta a} : \theta \in \Theta \subseteq \mathbb{R}^p \ \text{and} \ a \in \mathcal{A} \} . \tag{7}$$

We treat $\theta$ as a (finite-dimensional) parameter of interest and $a$ as a nuisance parameter of arbitrary nature (typically infinite-dimensional). It is of interest to define the submodel, for each $\theta \in \Theta$,

$$\mathcal{P}_\theta = \{ P_{\theta a} : a \in \mathcal{A} \} , \tag{8}$$

obtained by fixing the parameter of interest and letting the nuisance parameter vary freely. The tangent space of $\mathcal{P}_\theta$ at $(\theta, a) \in \Theta \times \mathcal{A}$ is called the nuisance tangent space at $(\theta, a)$. We use the notation $T_N$ to denote the nuisance tangent spaces (with similar conventions for the nuisance tangent sets and for expressing the specific notion of path differentiability, if necessary). The orthogonal complement of $T_N(\theta, a)$ with respect to $L^2_0(P_{\theta a})$, endowed with the restriction of the usual inner product of $L^2(P_{\theta a})$, is denoted by $T_N^\perp(\theta, a) = T_N^\perp$.

The partial parametric score function at $(\theta, a) \in \Theta \times \mathcal{A}$ is the score along the directions of the parameter of interest, defined by

$$l_{/\theta}(\cdot) = l_{/\theta}(\cdot \, ; \theta, a) = \left( l_{/\theta_1}(\cdot), \dots, l_{/\theta_p}(\cdot) \right)^T ,$$

where $l_{/\theta_i} : \mathcal{X} \to \mathbb{R}$ is given, for each $x \in \mathcal{X}$ and $i = 1, \dots, p$, by

$$l_{/\theta_i}(x) = \frac{\partial}{\partial \theta_i} \log p(x ; \theta, a) . \tag{9}$$

Here $\theta_1, \dots, \theta_p$ are the components of the vector $\theta$. It is presupposed that for all $x \in \mathcal{X}$ the derivatives in (9) are well defined, and that $l_{/\theta_i}(\cdot) \in L^2_0(P_{\theta a})$.

The orthogonal projection of the partial score function $l_{/\theta}$ onto $T_N^\perp$ is the efficient score function at $(\theta, a) \in \Theta \times \mathcal{A}$ for the parameter $\theta$, and it is denoted by $l^E_{\theta a}$. The function $l^E : \mathcal{X} \times \Theta \times \mathcal{A} \to \mathbb{R}^p$ given by, for all $x \in \mathcal{X}$, $\theta \in \Theta$, $a \in \mathcal{A}$,

$$l^E(x ; \theta, a) = l^E_{\theta a}(x) ,$$

is said to be the efficient score function.

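3 Semiparametric Location-Scale Models

In this section we define the semiparametric location-scale model used in the rest of the paper. Let $\lambda$ be the Lebesgue measure and $\mathcal{P}$ a family of probability measures defined on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, dominated by $\lambda$ and given by

$$\mathcal{P} = \left\{ P_{\mu\sigma a} \;:\; \frac{dP_{\mu\sigma a}}{d\lambda}(\cdot) = \frac{1}{\sigma} \, a\!\left( \frac{\cdot - \mu}{\sigma} \right) , \ \mu \in \mathbb{R}, \ \sigma \in \mathbb{R}_+, \ a \in \mathcal{A} \right\} . \tag{10}$$

Here $\mathcal{A}$ is the class of functions $a : \mathbb{R} \to \mathbb{R}$ such that (11)-(19) given below hold.

$$\forall x \in \mathbb{R} , \quad a(x) > 0 ; \tag{11}$$

$$\int_{\mathbb{R}} a(x) \, \lambda(dx) = 1 ; \tag{12}$$

$$\int_{\mathbb{R}} x \, a(x) \, \lambda(dx) = 0 ; \tag{13}$$

$$\int_{\mathbb{R}} x^2 a(x) \, \lambda(dx) = 1 ; \tag{14}$$

and

$$a \ \text{is differentiable } \lambda\text{-almost everywhere.} \tag{15}$$

We consider also the following technical conditions involving the Laplace transform and the behavior of the function $a$ in the tails. Assume that there exists a $\delta > 0$ (we stress that $\delta$ may depend on $a$) such that for all $s \in (-\delta, \delta)$

$$M[s ; a(\cdot)] = \int_{\mathbb{R}} e^{sx} a(x) \, \lambda(dx) < \infty , \tag{16}$$

and for all $s \in (-\delta, \delta)$

$$M\!\left[ s ; \frac{\{a'(\cdot)\}^2}{a(\cdot)} \right] = \int_{\mathbb{R}} e^{sx} \, \frac{\{a'(x)\}^2}{a(x)} \, \lambda(dx) < \infty . \tag{17}$$

Assume further that

$$\forall i \in \mathbb{N}_0 , \quad \lim_{x \to \infty} x^i a(x) = \lim_{x \to -\infty} x^i a(x) = 0 . \tag{18}$$

Here $\mathbb{N}_0 = \{0, 1, 2, \dots\}$. Let us impose also the following additional condition, for $i = 3, \dots, k$:

$$\int_{\mathbb{R}} x^i a(x) \, \lambda(dx) = m_i , \tag{19}$$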

where $k$ is an integer greater than 1 and $m_3, \dots, m_k$ are real quantities supposed given and fixed (if $k = 2$ we adopt the convention that condition (19) is void). Note that $m_1 = 0, m_2 = 1, m_3, \dots, m_k$ are in fact the first $k$ standardized cumulants of the distributions of $\mathcal{P}$, which are hence assumed to be fixed. Here the term standardized cumulants refers to the moments (about zero) of the standardized distribution (i.e. the distribution shifted and rescaled in order to have mean zero and variance one). We will treat $a \in \mathcal{A}$ as the non-parametric component of a semiparametric model and $(\mu, \sigma)$ as the parameters of interest.

Conditions (11) and (12) imply that each $a \in \mathcal{A}$ is a density of a probability measure with support equal to the whole real line. The identifiability of the parametrization given is a consequence of (13) and (14). Clearly, this is not the only possible way to obtain identifiability, but it turns out to be convenient for our purposes.

We stress that conditions (11)-(18) are essential for the development given. On the other hand, condition (19) was imposed only to enable us to control the size of the class of models to be considered. From the statistical viewpoint, condition (19) can be used to study the impact of knowing higher order standardized cumulants of the distribution in play. A potential field of application for these techniques is the study of turbulent flow of fluids where, due to the Kolmogorov theory, one can predict the values of the cumulants of the distribution involved (see Barndorff-Nielsen, 1978 and Barndorff-Nielsen et al., 1990). From the theoretical point of view we use condition (19) to impose constraints on $\mathcal{A}$, yielding semiparametric models of differing types. In fact, the model obtained without condition (19) is, according to the classification of Wellner et al. (1994), a nonparametric model, in the sense that the tangent spaces are the whole $L^2_0$ space. On the other hand, by imposing condition (19) (with some integer $k$ greater than 2) one obtains a genuine semiparametric model, in the sense that the tangent spaces are infinite-dimensional proper subsets of the $L^2_0$ spaces. We will then study the effect of this qualitative change on the efficient score function. This will illustrate how the estimation problem becomes harder when one jumps from a nonparametric model to a genuine semiparametric model.

The meaning of the conditions (11)-(18) is discussed in detail in the following. We fix for the rest of this section $(\mu, \sigma, a) \in \mathbb{R} \times \mathbb{R}_+ \times \mathcal{A}$.

We show now that condition (17) implies that the location and scale scores are in $L^2(P_{\mu\sigma a})$. The components of the score function with respect to the location and the scale parameters are given respectively by

$$l_{/\mu}(\cdot) = -\frac{1}{\sigma} \left\{ \frac{a'\!\left( \frac{\cdot - \mu}{\sigma} \right)}{a\!\left( \frac{\cdot - \mu}{\sigma} \right)} \right\} \tag{20}$$

and

$$l_{/\sigma}(\cdot) = -\frac{1}{\sigma} \left\{ 1 + \left( \frac{\cdot - \mu}{\sigma} \right) \frac{a'\!\left( \frac{\cdot - \mu}{\sigma} \right)}{a\!\left( \frac{\cdot - \mu}{\sigma} \right)} \right\} . \tag{21}$$

Note that condition (17) together with proposition 2 in appendix A.1 ensures that the functions $\{a'(\cdot)\}^2 / a(\cdot)$ and $(\cdot)^2 \{a'(\cdot)\}^2 / a(\cdot)$ are Lebesgue integrable. It can be seen that condition (18) implies that

$$\int_{\mathbb{R}} l_{/\mu}(x) \, a(x) \, \lambda(dx) = \int_{\mathbb{R}} l_{/\sigma}(x) \, a(x) \, \lambda(dx) = 0 ,$$

i.e. the location and the scale partial scores are unbiased. We will need condition (18) with polynomials of arbitrary order in the calculation of the nuisance tangent space and of the projection of the score function onto the orthogonal complement of the nuisance tangent space.

Condition (16) implies that the distribution associated with $a$ possesses finite moments of all orders and that the polynomials are dense in $L^2(a)$ (see proposition 2 and theorem 3 in appendix A.1). These properties will be crucial in the calculation of the nuisance tangent space and in the projection of the score function onto the orthogonal complement of the nuisance tangent space.

The following proposition gives a useful sufficient condition for verifying whether a given probability density satisfies the technical conditions (16)-(18).

Proposition 1 Let $a : \mathbb{R} \to \mathbb{R}$ be a function for which (11), (12) and (15) hold. Assume moreover that there exists $\delta > 0$ such that for all $s \in (-\delta, \delta)$,

$$\lim_{x \to +\infty} e^{sx} a(x) = \lim_{x \to -\infty} e^{sx} a(x) = 0$$

and

$$\lim_{x \to +\infty} e^{sx} \frac{a'(x)}{\sqrt{a(x)}} = \lim_{x \to -\infty} e^{sx} \frac{a'(x)}{\sqrt{a(x)}} = 0 .$$

Then the technical conditions (16)-(18) hold.

Proof: Conditions (16) and (17) follow from proposition 3 in appendix A.1, part i), and (18) from part ii). $\square$

Using the proposition above it is easy to see that the following classic families of distributions have probability densities satisfying the technical conditions (16)-(18): the normal distributions, the hyperbolic distributions, the Gumbel distributions, and the double exponential or Laplace distribution.
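As a rough numerical illustration of conditions (12)-(14) and (16), the following sketch integrates the standardized Laplace density $a(x) = 2^{-1/2} e^{-\sqrt{2}|x|}$ with a midpoint rule. The function names, the truncation of the integrals to $[-40, 40]$ and the grid size are assumptions made only for this illustration; they do not come from the paper.

```python
import math

def laplace_density(x):
    """Standardized Laplace density (mean 0, variance 1); illustrative choice of a."""
    return math.exp(-math.sqrt(2.0) * abs(x)) / math.sqrt(2.0)

def integrate(f, lo=-40.0, hi=40.0, n=200_000):
    """Midpoint rule on [lo, hi]; the truncation is an assumption of this sketch."""
    h = (hi - lo) / n
    return h * sum(f(lo + (j + 0.5) * h) for j in range(n))

def moment(i):
    """Numerical approximation of the integral of x^i a(x) dx."""
    return integrate(lambda x: x**i * laplace_density(x))

def mgf(s):
    """Numerical approximation of M[s; a] = integral of exp(s x) a(x) dx."""
    return integrate(lambda x: math.exp(s * x) * laplace_density(x))

total_mass = moment(0)   # condition (12): close to 1
mean = moment(1)         # condition (13): close to 0
variance = moment(2)     # condition (14): close to 1
m4 = moment(4)           # fourth standardized moment: close to 6, not 3
mgf_half = mgf(0.5)      # condition (16) at s = 0.5: close to 1 / (1 - 0.5**2 / 2)
```

For this density the Laplace transform is finite exactly for $|s| < \sqrt{2}$, which illustrates the remark that the bound $\delta$ in (16) may depend on $a$.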


4 Calculation of the nuisance tangent space

In this section we characterize the nuisance tangent spaces of the semiparametric location-scale model given in section 3 in terms of orthonormal polynomials. More precisely, we will calculate the $L^2$-, weak and $L^\infty$-nuisance tangent spaces for the semiparametric location-scale model presented. Here $(\mu, \sigma, a)$ will be treated as a fixed (but arbitrary) point of the parameter space $\mathbb{R} \times \mathbb{R}_+ \times \mathcal{A}$.

We denote by $\{e_i\}_{i \in \mathbb{N}_0}$ the result of a Gram-Schmidt orthonormalization process with respect to the inner product of $L^2(a)$ applied to the sequence of monomials, say $\{1, (\cdot), (\cdot)^2, \dots\}$. Let us adopt the convention that for each $i \in \mathbb{N}_0$ the polynomial $e_i(\cdot)$ is of degree $i$, and introduce the notation

$$e_i^{\mu\sigma}(\cdot) = e_i\!\left( \frac{\cdot - \mu}{\sigma} \right) .$$
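The Gram-Schmidt step can be sketched in a few lines once the moments of $a$ are available, since the $L^2(a)$ inner product of two polynomials depends on $a$ only through its moments. The code below is a minimal sketch under that assumption; the function names and the coefficient-list representation (index = power) are illustrative choices, not notation from the paper.

```python
import math

def inner(p, q, moments):
    """<p, q> in L2(a) for coefficient lists p, q, using moments[j] = E[x^j]."""
    return sum(pi * qj * moments[i + j]
               for i, pi in enumerate(p) for j, qj in enumerate(q))

def orthonormal_polynomials(moments, degree):
    """Classical Gram-Schmidt on 1, x, x^2, ...; needs moments[0 .. 2*degree]."""
    basis = []
    for n in range(degree + 1):
        mono = [0.0] * n + [1.0]              # the monomial x^n
        p = mono[:]
        for e in basis:
            c = inner(mono, e, moments)       # projection coefficient <x^n, e>
            for j, ej in enumerate(e):
                p[j] -= c * ej
        norm = math.sqrt(inner(p, p, moments))
        basis.append([pj / norm for pj in p])
    return basis

# Moments of the standard normal up to order 6 (a known special case).
normal_basis = orthonormal_polynomials([1, 0, 1, 0, 3, 0, 15], 3)

# Hypothetical standardized moments m3 = 0.5, m4 = 4 (illustration only).
skewed_basis = orthonormal_polynomials([1, 0, 1, 0.5, 4], 2)
```

For the normal moments this reproduces the normalized Hermite polynomials, e.g. $e_2(x) = (x^2 - 1)/\sqrt{2}$ and $e_3(x) = (x^3 - 3x)/\sqrt{6}$. Note that computing $e_i$ requires moments up to order $2i$.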

The following theorem gives a polynomial characterization of the $L^2$-nuisance tangent space.

Theorem 1 The $L^2$-nuisance tangent space of the location-scale model given by (10) at $(\mu, \sigma, a) \in \mathbb{R} \times \mathbb{R}_+ \times \mathcal{A}$ is

$$\overset{2}{T}_N(\mu, \sigma, a) = cl_{L^2(P_{\mu\sigma a})} \left[ \mathrm{span} \{ e_i^{\mu\sigma}(\cdot) : i = k+1, k+2, \dots \} \right] .$$

Proof: We give now the main steps of the proof (the details can be found in appendix A.2). First of all, it can be shown that under a semiparametric location-scale model the $L^2$-nuisance tangent space at $(\mu, \sigma, a) \in \mathbb{R} \times \mathbb{R}_+ \times \mathcal{A}$ is given by

$$\overset{2}{T}_N(\mu, \sigma, a) = \left\{ \nu\!\left( \frac{\cdot - \mu}{\sigma} \right) : \nu \in \overset{2}{T}_N(0, 1, a) \right\}$$

(see theorem 4 in appendix A.3 for a detailed proof). Therefore there is no loss of generality in restricting our attention to the case where $(\mu, \sigma, a) = (0, 1, a)$ and showing that

$$\overset{2}{T}_N(0, 1, a) = cl_{L^2(P_{01a})} \left\{ \mathrm{span} \left[ \{ e_i(\cdot) : i = k+1, k+2, \dots \} \right] \right\} .$$

For notational simplicity, in this proof we denote $\overset{2}{T}_N(0, 1, a)$ by $T_N$ (and use the analogous convention for the tangent sets). Define, for each $i \in \mathbb{N}$,

$$H_i = \mathrm{span} \{ e_1(\cdot), \dots, e_i(\cdot) \} .$$

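We prove next that $T_N \subseteq H_k^\perp$, where $H_k^\perp$ is the orthogonal complement of $H_k$ in $L^2_0(P_{01a})$. Take an arbitrary $\nu \in T_N^{o}$ and $h \in H_k$. There exists a differentiable path (at $a$) $\{a_t\} \subseteq \mathcal{A}$ with tangent $\nu$. Let $\{r_t\} \subseteq L^2_0(P_{01a})$ be the sequence of remainder terms of $\{a_t\}$. For each $t$,

$$| \langle \nu, h \rangle_a | = \left| \left\langle \frac{a_t - a}{t a} - r_t , h \right\rangle_a \right| = \left| \frac{1}{t} \left\{ \int h(x) a_t(x) \, \lambda(dx) - \int h(x) a(x) \, \lambda(dx) \right\} - \langle r_t, h \rangle_a \right| = | \langle r_t, h \rangle_a | ,$$

where the last equality follows from (12)-(14) and (19). Since $r_t \to 0$ in $L^2(P_{01a})$, $\langle r_t, h \rangle_a \to 0$ as $t \downarrow 0$. We conclude that $\langle \nu, h \rangle_a = 0$. Therefore $T_N^{o} \subseteq H_k^\perp$ and, since $H_k^\perp$ is a closed linear space,

$$T_N \subseteq H_k^\perp = cl_{L^2(P_{01a})} \left\{ \mathrm{span} \left[ \{ e_i(\cdot) : i = k+1, k+2, \dots \} \right] \right\} .$$

Next we sketch the proof that $H_k^\perp \subseteq T_N$. The verification of this inclusion can be reduced to proving that for each $i \in \{k+1, k+2, \dots\}$

$$h_i(\cdot) = \begin{cases} e_i(\cdot) , & \text{if } i \text{ is even;} \\ e_{i+1}(\cdot) - e_i(\cdot) , & \text{if } i \text{ is odd,} \end{cases}$$

is in $T_N(0, 1, a)$. The proof is done by showing that for $t$ small enough $a_t(\cdot) = a(\cdot) + t \, a(\cdot) \, h_i(\cdot)$ belongs to $\mathcal{A}$. The conditions (11)-(19) for $a_t$ are verified in appendix A.2. There, the crucial point is the verification of (16) and (17), which is done with the Laplace transform properties given in appendix A.1. $\square$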

We characterize next the weak and the $L^\infty$-nuisance tangent spaces. Consider the following condition on the family of distributions $\mathcal{P}$, which will be called the tail balance condition: for all $p, q \in \mathcal{P}^*$, $p(\cdot)/q(\cdot)$ is $\lambda$-a.s. bounded.

Theorem 2
i) Under the location-scale model given by (10), for each $(\mu, \sigma, a) \in \mathbb{R} \times \mathbb{R}_+ \times \mathcal{A}$ we have

$$cl_{L^2(P_{\mu\sigma a})} \left[ \mathrm{span} \{ e_i^{\mu\sigma}(\cdot) : i = k+1, k+2, \dots \} \right] \supseteq \overset{\infty}{T}_N(\mu, \sigma, a) . \tag{22}$$

ii) Equality in (22) holds under the tail balance condition.

Proof: i) The first statement follows directly from theorem 1 and the inclusion $\overset{\infty}{T}_N(\mu, \sigma, a) \subseteq \overset{2}{T}_N(\mu, \sigma, a)$. ii) The proof of the second part of the theorem is rather technical and can be found in a more general context in Labouriau (1995d). $\square$

It is a straightforward corollary of the theorem above that under the tail balance condition the weak nuisance tangent space at $(\mu, \sigma, a)$ is given by

$$\overset{w}{T}_N(\mu, \sigma, a) = cl_{L^2(P_{\mu\sigma a})} \left[ \mathrm{span} \{ e_i^{\mu\sigma}(\cdot) : i = k+1, k+2, \dots \} \right] . \tag{23}$$

Without the tail balance condition it is difficult to ensure the equality in (23); however, from the first part of theorem 2 we have the inclusion

$$cl_{L^2(P_{\mu\sigma a})} \left[ \mathrm{span} \{ e_i^{\mu\sigma}(\cdot) : i = k+1, k+2, \dots \} \right] \subseteq \overset{w}{T}_N(\mu, \sigma, a) .$$

Note that, as discussed in Bickel et al. (1993, page 76), the equality in (23) is not needed to establish a bound for the asymptotic concentration of sequences of regular estimators (or for the asymptotic variance of regular asymptotic linear estimating sequences), provided the bound given by using $\overset{2}{T}_N$ (instead of $\overset{w}{T}_N$) is attained (by a regular asymptotic linear estimating sequence). We will show some examples where this indeed happens. We can then work with the $L^2$-nuisance tangent spaces without essentially affecting our conclusions. The superscript "2" will be suppressed from the notation for the nuisance tangent space from now on.

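5 Calculation of the efficient score function

We calculate in this section the efficient score function for the location-scale model (10) by projecting the location and the scale partial scores onto the orthogonal complement of the nuisance tangent space. Throughout this section $(\mu, \sigma, a)$ is an arbitrary element of the parameter space $\mathbb{R} \times \mathbb{R}_+ \times \mathcal{A}$. We denote the probability measure $P_{\mu\sigma a}$ by $P_0$ and the inner product and the norm of $L^2(P_0)$ by $\langle \cdot, \cdot \rangle$ and $\| \cdot \|$ respectively. Moreover, $A^\perp$ will denote the orthogonal complement of $A$ in $L^2_0(P_0)$.

Recall that the $L^2$-nuisance tangent space at $(\mu, \sigma, a)$ is given by

$$T_N(\mu, \sigma, a) = cl_{L^2(P_0)} \left[ \mathrm{span} \{ e_{k+1}^{\mu\sigma}, e_{k+2}^{\mu\sigma}, \dots \} \right] .$$

Since $\{e_i^{\mu\sigma}\}_{i \in \mathbb{N}_0}$ is an orthonormal basis in $L^2(P_0)$, the orthogonal complement of $T_N(\mu, \sigma, a)$ in $L^2_0(P_0)$ is given by

$$T_N^\perp(\mu, \sigma, a) = \mathrm{span} \{ e_1^{\mu\sigma}, \dots, e_k^{\mu\sigma} \} . \tag{24}$$

The efficient score function is calculated now by expanding the location and scale scores in Fourier series and taking the terms of indices $1, \dots, k$. More precisely, since $l_{/\mu}(\cdot)$ is in $L^2(P_0)$ we have the following Fourier expansion in terms of the orthonormal basis $\{e_i^{\mu\sigma}\}_{i \in \mathbb{N}_0}$,

$$l_{/\mu}(\cdot) = c_1 e_1^{\mu\sigma}(\cdot) + \dots + c_k e_k^{\mu\sigma}(\cdot) + \sum_{i=k+1}^{\infty} c_i e_i^{\mu\sigma}(\cdot) , \tag{25}$$

and hence the location component of the efficient score function at $(\mu, \sigma, a)$ is given by

$$l^E_{/\mu}(\cdot) = c_1 e_1^{\mu\sigma}(\cdot) + \dots + c_k e_k^{\mu\sigma}(\cdot) , \tag{26}$$

and analogously the scale component of the efficient score function is

$$l^E_{/\sigma}(\cdot) = d_1 e_1^{\mu\sigma}(\cdot) + \dots + d_k e_k^{\mu\sigma}(\cdot) . \tag{27}$$

Here the Fourier coefficients in (26) and (27) are given, for all $i \in \mathbb{N}$, by

$$\begin{aligned}
c_i &= \langle l_{/\mu}(\cdot), e_i^{\mu\sigma}(\cdot) \rangle \\
&= -\int_{\mathbb{R}} \frac{1}{\sigma} \, \frac{a'\!\left( \frac{x - \mu}{\sigma} \right)}{a\!\left( \frac{x - \mu}{\sigma} \right)} \, e_i\!\left( \frac{x - \mu}{\sigma} \right) \frac{1}{\sigma} \, a\!\left( \frac{x - \mu}{\sigma} \right) \lambda(dx) \\
&= -\frac{1}{\sigma} \left\{ \Big[ e_i(y) a(y) \Big]_{-\infty}^{+\infty} - \int_{\mathbb{R}} e_i'(y) a(y) \, \lambda(dy) \right\} \\
&= \frac{1}{\sigma} \int_{\mathbb{R}} e_i'(y) a(y) \, \lambda(dy) .
\end{aligned} \tag{28}$$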


Here we used condition (18). A similar calculation leads to the following formula for the coefficients of (27):

$$d_i = \langle l_{/\sigma}(\cdot), e_i^{\mu\sigma}(\cdot) \rangle = \frac{1}{\sigma} \int_{\mathbb{R}} y \, e_i'(y) \, a(y) \, \lambda(dy) , \quad \text{for } i = 1, \dots, k . \tag{29}$$

We discuss now the dependence of the efficient score function on the nuisance parameter. First of all, since for each $i \in \mathbb{N}$, $e_i(\cdot)$ is a polynomial of degree $i$, the coefficient $c_i$ is a linear combination of the standardized cumulants up to order $k - 1$ of the distribution with density $a$ (see (28)). Moreover, the coefficients $d_i$ are linear combinations of the standardized cumulants of the distribution in play up to order $k$. We conclude that the coefficients of $l^E_{/\mu}(\cdot)$ and $l^E_{/\sigma}(\cdot)$ (given in (26) and (27)) depend on the nuisance parameter $a$ only through the standardized cumulants up to order $k$. However, the dependence of the efficient score function on the nuisance parameter is more complex, because the polynomials $e_0, e_1, \dots, e_k$ generated by the orthonormalization procedure (in $L^2(a)$) depend on higher order standardized cumulants. In fact the polynomial $e_k(\cdot)$ depends on the moments of order up to $2k$, because in order to normalize the polynomial of degree $k$ in the Gram-Schmidt procedure we have to divide the polynomial by its $L^2(a)$ norm, which clearly depends on the standardized cumulant of order $2k$. Summing up, the dependence of the efficient score function on the infinite-dimensional parameter for the location-scale model under study occurs here via a finite-dimensional intermediate parameter involving only the standardized cumulants of order up to $2k$.
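Formulas (28) and (29) reduce the Fourier coefficients to moments: if $e_i(y) = \sum_j b_j y^j$, then $c_i = \sigma^{-1} \sum_j j \, b_j \, m_{j-1}$ and $d_i = \sigma^{-1} \sum_j j \, b_j \, m_j$, with $m_0 = 1$. The sketch below evaluates these sums for the polynomials of degree one and two; the function names and the numerical values of $m_3$ and $m_4$ are hypothetical and serve only as an illustration.

```python
import math

def location_coefficient(poly, moments, sigma=1.0):
    """c_i = (1/sigma) * E[e_i'(Y)], cf. (28); poly[j] is the coefficient of y^j."""
    return sum(j * poly[j] * moments[j - 1] for j in range(1, len(poly))) / sigma

def scale_coefficient(poly, moments, sigma=1.0):
    """d_i = (1/sigma) * E[Y e_i'(Y)], cf. (29)."""
    return sum(j * poly[j] * moments[j] for j in range(1, len(poly))) / sigma

# Hypothetical standardized moments m_0, ..., m_4 (illustration only).
m3, m4 = 0.5, 4.0
moments = [1.0, 0.0, 1.0, m3, m4]

# Orthonormalizing 1, y, y^2 gives e_1(y) = y and
# e_2(y) = (y^2 - m3*y - 1) / gamma2 with gamma2^2 = m4 - m3^2 - 1.
gamma2 = math.sqrt(m4 - m3**2 - 1.0)
e1 = [0.0, 1.0]
e2 = [-1.0 / gamma2, -m3 / gamma2, 1.0 / gamma2]

c1, d1 = location_coefficient(e1, moments), scale_coefficient(e1, moments)
c2, d2 = location_coefficient(e2, moments), scale_coefficient(e2, moments)
```

With $\sigma = 1$ this gives $c_1 = 1$, $d_1 = 0$, $c_2 = -m_3/\gamma_2$ and $d_2 = 2/\gamma_2$: the sums themselves involve only low-order moments, while the normalizing constant $\gamma_2$ brings in $m_4$.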

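5.1 The case where the first two standardized cumulants are fixed

We study now in detail the case where only the standardized cumulants up to order 2 are fixed (i.e. $k = 2$). It will be shown that in this case the efficient score function is equivalent to an estimating function which is independent of the nuisance parameter and has the sample mean and standard deviation as roots. The first three elements of the basis $\{e_i\}_{i \in \mathbb{N}_0}$ are

$$e_0(\cdot) = 1 , \qquad e_1(\cdot) = (\cdot) , \qquad e_2(\cdot) = \frac{1}{\gamma_2} \left\{ (\cdot)^2 - m_3 (\cdot) - 1 \right\} , \quad \text{where } \gamma_2 = \sqrt{m_4 - m_3^2 - 1} . \tag{30}$$

The detailed calculations are given in appendix A.4, where an argument showing that $\gamma_2 > 0$, and hence that (30) is well defined, can also be found. Using the formulas (28) and (29) to calculate the coefficients of the efficient score function gives

$$c_1 = \frac{1}{\sigma} , \qquad c_2 = -\frac{m_3}{\sigma \gamma_2} , \qquad d_1 = 0 , \qquad d_2 = \frac{2}{\sigma \gamma_2} .$$

Inserting the coefficients given above into (26) and (27), we obtain that the efficient score function at $(\mu, \sigma, a)$ is given by

$$l^E_{/\mu}(\cdot) = c_1 e_1^{\mu\sigma}(\cdot) + c_2 e_2^{\mu\sigma}(\cdot) = \frac{1}{\sigma} \left[ \frac{\cdot - \mu}{\sigma} - \frac{m_3}{\gamma_2^2} \left\{ \left( \frac{\cdot - \mu}{\sigma} \right)^2 - m_3 \left( \frac{\cdot - \mu}{\sigma} \right) - 1 \right\} \right] \tag{31}$$

and

$$l^E_{/\sigma}(\cdot) = d_2 e_2^{\mu\sigma}(\cdot) = \frac{2}{\sigma \gamma_2^2} \left\{ \left( \frac{\cdot - \mu}{\sigma} \right)^2 - m_3 \left( \frac{\cdot - \mu}{\sigma} \right) - 1 \right\} . \tag{32}$$

Under independent repeated sampling, with sample $x = (x_1, \dots, x_n)^T$, we obtain the following expression for the efficient score function:

$$l^E(x ; \mu, \sigma, a) = \sigma^{-1} \begin{bmatrix} \displaystyle \sum_{i=1}^n \frac{x_i - \mu}{\sigma} - \frac{m_3}{\gamma_2^2} \sum_{i=1}^n \left\{ \left( \frac{x_i - \mu}{\sigma} \right)^2 - m_3 \left( \frac{x_i - \mu}{\sigma} \right) - 1 \right\} \\[2ex] \displaystyle \frac{2}{\gamma_2^2} \sum_{i=1}^n \left\{ \left( \frac{x_i - \mu}{\sigma} \right)^2 - m_3 \left( \frac{x_i - \mu}{\sigma} \right) - 1 \right\} \end{bmatrix} .$$

Multiplying the efficient score function by the nonsingular matrix

$$M = \begin{bmatrix} 1 & \dfrac{m_3}{2} \\[1.5ex] -m_3 & \dfrac{-m_3^2 - \gamma_2^2}{2} \end{bmatrix}$$

we obtain the following estimating function, which is equivalent to the efficient score function:

$$M \, l^E(x ; \mu, \sigma, a) = \sigma^{-1} \begin{bmatrix} \displaystyle \sum_{i=1}^n \frac{x_i - \mu}{\sigma} \\[2ex] \displaystyle n - \sum_{i=1}^n \left( \frac{x_i - \mu}{\sigma} \right)^2 \end{bmatrix} .$$

Note that the matrix $M$ is indeed nonsingular (its determinant is $-\gamma_2^2 / 2 \neq 0$; see a justification in appendix A.4). Hence the efficient score function is equivalent to an estimating function independent of the nuisance parameter, with roots

$$\hat\mu = \frac{1}{n} \sum_{i=1}^n x_i \qquad \text{and} \qquad \hat\sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^n (x_i - \hat\mu)^2 } .$$

In view of the optimality theory of estimating functions given in Labouriau (1996b), the efficient score function is optimal. Clearly, the sample mean and the sample variance are regular asymptotic linear estimators (for $\mu$ and $\sigma^2$). They are efficient, since the full tangent space is the whole $L^2_0$. Note that, in this case, the bound given by the $L^2$ path differentiability is attained (by regular asymptotic linear estimators); hence it coincides with the bound given by the weak path differentiability.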
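The independence of the roots from the nuisance parameter can be checked numerically. The following sketch evaluates the sample version of (31) and (32) at the sample mean and standard deviation for several hypothetical values of $(m_3, m_4)$; the data and the function name are made up for the illustration.

```python
import math

def efficient_score_k2(x, mu, sigma, m3, m4):
    """Sample efficient score (location, scale) for k = 2, following (31)-(32)."""
    gamma2_sq = m4 - m3**2 - 1.0          # squared L2(a) norm of y^2 - m3*y - 1
    y = [(xi - mu) / sigma for xi in x]
    u = sum(y)
    v = sum(yi**2 - m3 * yi - 1.0 for yi in y)
    location = (u - m3 / gamma2_sq * v) / sigma
    scale = 2.0 / gamma2_sq * v / sigma
    return location, scale

x = [0.3, 1.7, 2.2, 4.1, 9.0]             # made-up data
n = len(x)
mu_hat = sum(x) / n
sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in x) / n)

# Hypothetical nuisance values (m3, m4), each with m4 - m3^2 - 1 > 0.
scores = [efficient_score_k2(x, mu_hat, sigma_hat, m3, m4)
          for (m3, m4) in [(0.0, 3.0), (0.5, 4.0), (-1.0, 9.0)]]
```

Both components vanish (up to rounding) at $(\hat\mu, \hat\sigma)$ for every choice of $(m_3, m_4)$, in line with the equivalence to an estimating function free of the nuisance parameter.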

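5.2 The case where the first three standardized cumulants are fixed

We show in this section that in the case where the standardized cumulants up to order 3 are fixed (i.e. $k = 3$) the roots of the efficient score function do depend on the nuisance parameter, through the cumulants up to order 6. Moreover, we exemplify some situations where the roots of the efficient score function are not the sample mean and the sample variance.

We have computed in the last section the first coefficients of the efficient score function, namely $c_1, c_2, d_1$ and $d_2$. We calculate now the coefficients $c_3$ and $d_3$, which will allow us to compute the efficient score function by using (26) and (27). Note that the polynomial $e_3$ is given by (see appendix A.4)

$$e_3(\cdot) = \frac{1}{\gamma_3} \left\{ (\cdot)^3 - \frac{\beta}{\gamma_2^2} (\cdot)^2 - \left( m_4 - \frac{m_3 \beta}{\gamma_2^2} \right) (\cdot) - \left( m_3 - \frac{\beta}{\gamma_2^2} \right) \right\} ,$$

where

$$\beta = m_5 - m_3 m_4 - m_3$$

and

$$\gamma_3 = \left\| (\cdot)^3 - \frac{\beta}{\gamma_2^2} (\cdot)^2 - \left( m_4 - \frac{m_3 \beta}{\gamma_2^2} \right) (\cdot) - \left( m_3 - \frac{\beta}{\gamma_2^2} \right) \right\|_{L^2(a)} .$$

Note also that, according to the argument given in appendix A.4, $\gamma_3 > 0$ and hence $e_3(\cdot)$ is well defined.

Using (28) and (29) we obtain

$$c_3 = \frac{1}{\sigma} \int_{\mathbb{R}} e_3'(x) a(x) \, \lambda(dx) = \frac{1}{\sigma \gamma_3} \left\{ 3 - \left( m_4 - \frac{m_3 \beta}{\gamma_2^2} \right) \right\}$$

and

$$d_3 = \frac{1}{\sigma} \int_{\mathbb{R}} x \, e_3'(x) a(x) \, \lambda(dx) = \frac{1}{\sigma \gamma_3} \left\{ 3 m_3 - \frac{2 \beta}{\gamma_2^2} \right\} . \tag{33}$$

Then the efficient score function is given by

$$\begin{aligned}
l^E_{/\mu}(\cdot) &= c_1 e_1^{\mu\sigma}(\cdot) + c_2 e_2^{\mu\sigma}(\cdot) + c_3 e_3^{\mu\sigma}(\cdot) \\
&= \frac{1}{\sigma} \left[ \frac{\cdot - \mu}{\sigma} - \frac{m_3}{\gamma_2^2} \left\{ \left( \frac{\cdot - \mu}{\sigma} \right)^2 - m_3 \left( \frac{\cdot - \mu}{\sigma} \right) - 1 \right\} \right] \\
&\quad + \frac{c_3}{\gamma_3} \left\{ \left( \frac{\cdot - \mu}{\sigma} \right)^3 - \frac{\beta}{\gamma_2^2} \left( \frac{\cdot - \mu}{\sigma} \right)^2 - \left( m_4 - \frac{m_3 \beta}{\gamma_2^2} \right) \left( \frac{\cdot - \mu}{\sigma} \right) - \left( m_3 - \frac{\beta}{\gamma_2^2} \right) \right\}
\end{aligned}$$

and

$$\begin{aligned}
l^E_{/\sigma}(\cdot) &= d_2 e_2^{\mu\sigma}(\cdot) + d_3 e_3^{\mu\sigma}(\cdot) \\
&= \frac{2}{\sigma \gamma_2^2} \left\{ \left( \frac{\cdot - \mu}{\sigma} \right)^2 - m_3 \left( \frac{\cdot - \mu}{\sigma} \right) - 1 \right\} \\
&\quad + \frac{d_3}{\gamma_3} \left\{ \left( \frac{\cdot - \mu}{\sigma} \right)^3 - \frac{\beta}{\gamma_2^2} \left( \frac{\cdot - \mu}{\sigma} \right)^2 - \left( m_4 - \frac{m_3 \beta}{\gamma_2^2} \right) \left( \frac{\cdot - \mu}{\sigma} \right) - \left( m_3 - \frac{\beta}{\gamma_2^2} \right) \right\} .
\end{aligned}$$

We consider now some examples where the efficient score function simplifies a bit.

Example 1 Let us consider the case where $m_3 = m_5 = 0$ and $m_4 = 3$, i.e. the standardized cumulants up to order 5 coincide with those of the normal distribution. We have then that the coefficients $c_3$ and $d_3$ vanish, and hence the efficient score function coincides with the one obtained in the case where only the cumulants up to order 2 were fixed. Therefore, in this case the roots of the efficient score function are the sample mean and standard deviation. $\square$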

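Example 2 We study now the case where $m_3 = m_5 = 0$ and $m_4 \neq 3$. This is the case for instance for the Laplace distribution (double exponential) or the hyperbolic distribution with vanishing symmetry parameter, which are symmetric (hence $m_3 = m_5 = 0$) and have a standardized cumulant of fourth order different from 3. In this case the coefficients of the efficient score function are given by

$$c_1 = \frac{1}{\sigma} , \qquad c_2 = -\frac{m_3}{\sigma \gamma_2} = 0 , \qquad c_3 = \frac{1}{\sigma \gamma_3} (3 - m_4) \neq 0 ,$$

$$d_1 = 0 , \qquad d_2 = \frac{2}{\sigma \gamma_2} , \qquad d_3 = \frac{1}{\sigma \gamma_3} \left\{ 3 m_3 - \frac{2 \beta}{\gamma_2^2} \right\} = 0 .$$

Since here $\beta = 0$, we have $e_3(\cdot) = \gamma_3^{-1} \{ (\cdot)^3 - m_4 (\cdot) \}$, and the efficient score function is then of the form

$$l^E_{/\mu}(\cdot) = \frac{1}{\sigma} e_1^{\mu\sigma}(\cdot) + \frac{1}{\sigma \gamma_3} (3 - m_4) \, e_3^{\mu\sigma}(\cdot) \;\propto\; \frac{\cdot - \mu}{\sigma} + \frac{3 - m_4}{\gamma_3^2} \left\{ \left( \frac{\cdot - \mu}{\sigma} \right)^3 - m_4 \left( \frac{\cdot - \mu}{\sigma} \right) \right\}$$

and

$$l^E_{/\sigma}(\cdot) = d_2 e_2^{\mu\sigma}(\cdot) \;\propto\; - \left( \frac{\cdot - \mu}{\sigma} \right)^2 + 1 .$$

Under independent repeated sampling, with sample $x = (x_1, \dots, x_n)^T$, we obtain the following expressions for the efficient score function:

$$l^E_{/\mu}(x) \;\propto\; A \sum_{i=1}^n \left( \frac{x_i - \mu}{\sigma} \right)^3 + B \sum_{i=1}^n \frac{x_i - \mu}{\sigma} \tag{34}$$

and

$$l^E_{/\sigma}(x) \;\propto\; - \sum_{i=1}^n \left( \frac{x_i - \mu}{\sigma} \right)^2 + n , \tag{35}$$

where $A = \dfrac{3 - m_4}{\gamma_3^2}$ and $B = 1 - A \, m_4$. Now, equating (35) to zero we obtain

$$\hat\sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^n (x_i - \hat\mu)^2 } . \tag{36}$$

Equating (34) to zero, inserting (36) and rearranging, we obtain the following equation:

$$- n (A + B) \, \hat\mu^3 + 3 (A + B) \left( \sum_{i=1}^n x_i \right) \hat\mu^2 - \left[ (3A + B) \left( \sum_{i=1}^n x_i^2 \right) + \frac{2B}{n} \left( \sum_{i=1}^n x_i \right)^2 \right] \hat\mu + A \left( \sum_{i=1}^n x_i^3 \right) + \frac{B}{n} \left( \sum_{i=1}^n x_i^2 \right) \left( \sum_{i=1}^n x_i \right) = 0 .$$

We have thus obtained that $\hat\mu$ is a root of a polynomial of third degree with coefficients depending on the standardized cumulants up to order 6 (note that $\gamma_3$ depends on $m_6$). Hence $\hat\mu$ depends on the nuisance parameter, and the efficient score function cannot be equivalent to an estimating function independent of the nuisance parameter. $\square$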
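To see the effect numerically, the sketch below evaluates the location component $c_1 e_1 + c_3 e_3$ of Example 2 with $\sigma$ replaced by the root of the scale equation, and locates a root by bisection. It assumes $m_3 = m_5 = 0$, so that $e_3 \propto (\cdot)^3 - m_4 (\cdot)$ and $\gamma_3^2 = m_6 - m_4^2$. The values $m_4 = 6$, $m_6 = 90$ are those of the standardized Laplace distribution; the data, the function names and the bracketing interval are assumptions of this illustration.

```python
def location_score_example2(x, mu, m4, m6):
    """Location efficient score for m3 = m5 = 0, up to a positive factor,
    with sigma^2 replaced by the mean squared deviation about mu."""
    n = len(x)
    gamma3_sq = m6 - m4**2                  # squared L2(a) norm of y^3 - m4*y
    A = (3.0 - m4) / gamma3_sq
    sigma_sq = sum((xi - mu) ** 2 for xi in x) / n
    s1 = sum(xi - mu for xi in x)
    s3 = sum((xi - mu) ** 3 for xi in x)
    return A * s3 + (1.0 - A * m4) * sigma_sq * s1

def bisect_root(f, lo, hi, iterations=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = [0.0, 0.0, 0.0, 1.0, 5.0]               # made-up, asymmetric data
sample_mean = sum(x) / len(x)

score_laplace = lambda mu: location_score_example2(x, mu, m4=6.0, m6=90.0)
score_at_mean = score_laplace(sample_mean)  # not zero when m4 differs from 3
mu_root = bisect_root(score_laplace, min(x), max(x))

# With the normal values m4 = 3, m6 = 15 the score vanishes at the sample mean.
score_normal_at_mean = location_score_example2(x, sample_mean, m4=3.0, m6=15.0)
```

For these data the score under the Laplace moments is nonzero at the sample mean and the bisection returns a root different from it, whereas under the normal moments the sample mean is a root. Which root bisection finds when the cubic has several real roots depends on the bracketing interval.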


6 Discussion There exists in the literature many studies of the extensions of the pure location model. We refer to Stein (1956), van der Vaart (1988,1991) and Bickel et al. (1993), among others. In all the studies referred to the unknown distributions are assumed to be symmetric, which can simplify the mathematical treatment considerably. In this paper we presented a class of distributions that are not necessarily symmetric and we treat at the same time the problem of estimating the scale. It is to be noted that we did not attempt to obtain the largest class (or even a very large class) of distributions, containing asymmetric distributions, for which it is possible to treat the problem of estimating the location (and the scale). Rather, we restricted the discussion to some interesting in nite dimensional classes of distributions. It would be a considerable improvement to eliminate the technical assumptions on the Laplace transform and the polynomial decay (of all orders) of the tails of the densities (i.e. conditions (16)-(18) ). In the case of the location-scale model where only the rst two moments are xed, the (global) tangent space is the whole space L20 (see Labouriau, 1996a). That is, the model is not a semiparametric model in the terminology of J. Wellner (see Groeneboom and Wellner, 1992, page 7) but rather is a nonparametric model. This implies that there is essentially only one sequence of regular linear asymptotic estimators. Clearly, this sequence of estimators is optimal, even though optimality in a class containing only one element does not say very much. On the other hand, the ecient score function, in this case, does not essentially depend on the nuisance parameter, and hence it is a genuine estimating function. Moreover, according to Labouriau (1996b) (see also Jrgensen and Labouriau, 1995), the ecient score function is an optimal estimating function. 
This result is not surprising in view of the previous discussion on regular asymptotic linear sequences of estimators. Note also that this is in agreement with the literature for the location model with symmetry, namely, the sample mean and the sample variance are the optimal estimator for the location and the scale respectively. Hence, we do not improve the \traditional" way for estimating location (and scale), but instead we give an alternative justi cation for this procedure. When one xes some standardized cumulants of higher order, the model becomes a genuine semiparametric model, in the sense that its (global) tan21

gent space becomes a proper subspace of L20 . In this case the estimation problem is harder even though the family of distributions now is smaller than the \unrestricted case" discussed before. In this case the ecient score function and its roots will depend on the nuisance parameter, however only through a nite dimensional intermediate parameter, namely only through a nite number of standardized cumulants (2k). This suggests that plugging some reasonable estimators of the standardized cumulants (of order up to 2k) into the ecient score function, can produce reasonable asymptotic results. However, since one has to estimate standardized cumulants of high order (2k), one can expect a poor performance for nite samples of moderate size. The question of how to estimate the location and the scale in this case remains open. The method of local ecient estimation and the method of sieves can perhaps oer some attractive alternatives.

Acknowledgement

I wish to thank Professor Shun-Ichi Amari and Professor Ole Barndorff-Nielsen for many useful discussions related to this work.


References

[1] Barndorff-Nielsen, O.E. (1978). Hyperbolic distributions and distributions on hyperbolae. Scand. J. Statist. 5, 151-157.
[2] Barndorff-Nielsen, O.E.; Jensen, J.L. and Sørensen, M. (1990). Parametric modelling of turbulence. Phil. Trans. R. Soc. Lond. A 332, 439-445.
[3] Bickel, P.J.; Klaassen, C.A.J.; Ritov, Y. and Wellner, J.A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, London.
[4] Billingsley, P. (1986). Probability and Measure. Second edition. John Wiley and Sons, New York.
[5] Chow, Y.S. and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, Heidelberg.
[6] Dunford, N. and Schwartz, J.T. (1958). Linear Operators, Part I. Interscience, New York.
[7] Groeneboom, P. and Wellner, J.A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser Verlag, Berlin.
[8] Jørgensen, B. and Labouriau, R. (1995). Exponential Families and Theoretical Inference. Lecture notes at the University of British Columbia, Vancouver.
[9] Kendall, G.M. and Stuart, A. (1952). The Advanced Theory of Statistics. Vol. 1. Charles Griffin, London.
[10] Labouriau, R. (1996a). A review of path and functional differentiability. To appear.
[11] Labouriau, R. (1996b). Estimating and quasi estimating functions. To appear.
[12] Labouriau, R. (1996c). Characterizing estimating functions and nuisance ancillary quasi estimating functions. To appear.
[13] Labouriau, R. (1996d). Semiparametric L2-constrained models. To appear.
[14] Luenberger, D.G. (1969). Optimization by Vector Space Methods. John Wiley and Sons, New York.
[15] Pfanzagl, J. (1982). Contributions to a General Asymptotic Theory. Lecture Notes in Statistics 13. Springer-Verlag, New York.
[16] Pfanzagl, J. (1985). Asymptotic Expansions for General Statistical Models. Lecture Notes in Statistics 31. Springer-Verlag, New York.
[17] Pfanzagl, J. (1990). Estimation in Semiparametric Models: Some Recent Developments. Lecture Notes in Statistics 63. Springer-Verlag, New York.
[18] Stein, C. (1956). Efficient nonparametric testing and estimation. Proc. Third Berkeley Symp. Math. Statist. 1, 187-195, Univ. California Berkeley.
[19] Vaart, A.W. van der (1988). Estimating a real parameter in a class of semiparametric models. Ann. Statist. 16, 1450-1474.
[20] Vaart, A.W. van der (1991). Efficiency and Hadamard differentiability. Scand. J. Statist. 18, 63-75.


A Appendices

A.1 The Laplace transform and polynomial approximation in L2

Basic properties of the Laplace Transform

In this section we review the basic properties of the Laplace transform and prove some technical lemmas required in the study of the semiparametric location-scale model defined in section 3. The properties presented here are essentially well known for the case where the distribution is concentrated on the positive real line; however, the result concerning the density of the polynomials in $L^2$ is original. Let $f : \mathbb{R} \to [0, \infty)$ be a function such that for some $s \in \mathbb{R}$ the integral

\[ M(s; f) = \int_{\mathbb{R}} e^{sx} f(x)\,\lambda(dx) \tag{37} \]

converges. The function $M(\cdot\,; f) : \mathbb{R} \to [0, \infty]$ such that, for each $s \in \mathbb{R}$, $M(s; f)$ is given by (37) is said to be the Laplace transform of $f$. We now study some properties of the functions with finite Laplace transform in a neighborhood of zero.

Proposition 2 Let $f : \mathbb{R} \to [0, \infty)$ be a continuous function such that for some $\delta > 0$ and for all $s \in (-\delta, \delta)$

\[ M(s; f) < \infty . \tag{38} \]

Then $f$ possesses finite moments of all orders, i.e. for all $n \in \mathbb{N}_0$

\[ \int_{\mathbb{R}} x^n f(x)\,\lambda(dx) \in \mathbb{R} . \]

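Proof: Fix $s \in (-\delta, \delta)$ with $s \neq 0$. Since $M(s; f) < \infty$, $M(-s; f) < \infty$ and $e^{|sx|} \le e^{sx} + e^{-sx}$, the series version of the monotone convergence theorem (see Billingsley, 1986, page 214, theorem 16.6; footnote 2) gives

\[
\begin{aligned}
\infty &> \int_{\mathbb{R}} e^{sx} f(x)\,\lambda(dx) + \int_{\mathbb{R}} e^{-sx} f(x)\,\lambda(dx) \;\ge\; \int_{\mathbb{R}} e^{|sx|} f(x)\,\lambda(dx) \\
&= \int_{\mathbb{R}} \Big\{ \sum_{k=0}^{\infty} \frac{|sx|^k}{k!} \Big\} f(x)\,\lambda(dx) \\
&= \sum_{k=0}^{\infty} \Big\{ \int_{\mathbb{R}} \frac{|sx|^k}{k!} f(x)\,\lambda(dx) \Big\} \qquad \text{(from theorem 16.6 in Billingsley 1986)},
\end{aligned}
\]

and we conclude that the moments of all orders of $f$ are in $\mathbb{R}$. □

Footnote 2. The referred theorem states: "If $f_n \ge 0$, then $\int \sum_n f_n \, d\mu = \sum_n \int f_n \, d\mu$."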
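As a numerical illustration of proposition 2, the following minimal sketch checks the chain of inequalities used in the proof for one concrete function. The choice of the standard normal density for $f$, the value $s = 0.5$, the integration range and the helper names (`integrate`, `laplace_transform`, `abs_moment`) are all assumptions made for the example; none of them comes from the paper.

```python
import math

def integrate(g, lo=-20.0, hi=20.0, n=20_000):
    """Composite trapezoidal rule for g on [lo, hi] (tails beyond are negligible here)."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h

def f(x):
    """Standard normal density: an assumed example of a continuous nonnegative f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def laplace_transform(s):
    """Numerical version of M(s; f) as defined in (37)."""
    return integrate(lambda x: math.exp(s * x) * f(x))

def abs_moment(k):
    """Numerical version of the k-th absolute moment of f."""
    return integrate(lambda x: abs(x) ** k * f(x))

s = 0.5  # hypothetical point inside the neighborhood of zero

# Upper bound used in the proof: M(s; f) + M(-s; f).
bound = laplace_transform(s) + laplace_transform(-s)

# Truncated version of sum_k |s|^k / k! * integral |x|^k f(x) dx.
series = sum(abs(s) ** k / math.factorial(k) * abs_moment(k) for k in range(40))

print(f"M(s) + M(-s)        = {bound:.6f}")
print(f"series of moments   = {series:.6f}")
```

Every term of the series is nonnegative and the whole series is bounded by `bound`, so each absolute moment is bounded by `k! / s**k * bound`, which is the finiteness claimed in the proposition.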

The notion of Laplace transform can be extended to functions with range equal to the whole real line in the following way. Given a function $f : \mathbb{R} \to \mathbb{R}$ we define the positive and the negative part of $f$ respectively by

\[ f^{+}(\cdot) = f(\cdot)\,\chi_{[0,\infty)}\{f(\cdot)\} \quad \text{and} \quad f^{-}(\cdot) = -f(\cdot)\,\chi_{(-\infty,0]}\{f(\cdot)\} . \]

Here $\chi_A(\cdot)$ is the indicator function of the set $A$. Clearly we have the decomposition $f(\cdot) = f^{+}(\cdot) - f^{-}(\cdot)$. We define the Laplace transform of a function $f : \mathbb{R} \to \mathbb{R}$ as the function $M(\cdot\,; f) : \mathbb{R} \to [-\infty, \infty]$ given by

\[ M(\cdot\,; f) = M(\cdot\,; f^{+}) - M(\cdot\,; f^{-}) , \tag{39} \]

provided that at least one of the terms on the right side of (39) is finite (otherwise the Laplace transform of $f$ is not defined). The following proposition will be useful for the calculation of the $L^2$-nuisance tangent space of the location-scale model considered in section 3.

Proposition 3 Let $f : \mathbb{R} \to \mathbb{R}$ and $\delta > 0$ be such that for all $s \in [-\delta, \delta]$, $M(s; f) \in \mathbb{R}$. Then, for all $n \in \mathbb{N}$ and all $s \in (-\delta/2, \delta/2)$ we have

\[ M[s; (\cdot)^n f(\cdot)] \in \mathbb{R} . \]

Proof: Assume without loss of generality that the function $f$ is nonnegative. Take an arbitrary $s \in [-\delta/2, \delta/2]$ and $n \in \mathbb{N}$. By hypothesis, $f$ has finite Laplace transform in a neighborhood of zero; then, from proposition 2, $f$ has finite moments of all orders, in particular

\[ \int_{\mathbb{R}} x^{2n} f(x)\,\lambda(dx) \in \mathbb{R} . \]

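Using the Cauchy-Schwarz inequality we obtain

\[
\begin{aligned}
|M[s; (\cdot)^n f(\cdot)]| &= |\langle e^{(\cdot)s}, (\cdot)^n f(\cdot) \rangle| = |\langle e^{(\cdot)s} f^{1/2}(\cdot), (\cdot)^n f^{1/2}(\cdot) \rangle| \\
&\le \| e^{(\cdot)s} f^{1/2}(\cdot) \| \, \| (\cdot)^n f^{1/2}(\cdot) \| \qquad \text{(Cauchy-Schwarz inequality)} \\
&= \Big\{ \int_{\mathbb{R}} e^{2sx} f(x)\,\lambda(dx) \Big\}^{1/2} \Big\{ \int_{\mathbb{R}} x^{2n} f(x)\,\lambda(dx) \Big\}^{1/2} < \infty .
\end{aligned}
\]

The first factor equals $\{M(2s; f)\}^{1/2}$, which is finite because $2s \in [-\delta, \delta]$, and the second factor is finite because $f$ has finite moments of all orders. □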

The inner product and the norm of $L^2(a)$ are denoted by $\langle \cdot , \cdot \rangle_a$ and $\| \cdot \|_a$ respectively. The conditions we give will ensure that the measure determined by $a$ possesses all moments finite, i.e. for all $k \in \mathbb{N}$,

\[ \int_{\mathbb{R}} x^k a(x)\,\lambda(dx) \in \mathbb{R} . \]

In that case we can define the sequence of polynomials $\{e_i(\cdot)\}_{i \in \mathbb{N}_0} \subset L^2(a)$ as the result of a Gram-Schmidt orthonormalization process applied to the sequence $\{1, (\cdot), (\cdot)^2, \dots\}$. The following theorem gives a sufficient condition for $\{e_i(\cdot)\}$ to be a complete sequence in $L^2(a)$, which implies that the polynomials are dense in $L^2(a)$.

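Theorem 3 Let $a : \mathbb{R} \to \mathbb{R}$ be a function such that

\[ \forall x \in \mathbb{R}, \quad a(x) > 0 ; \tag{40} \]

\[ \exists\, \delta > 0 \ \text{such that} \ \forall s \in [-\delta, \delta], \quad M(s; a) = \int_{\mathbb{R}} e^{sx} a(x)\,\lambda(dx) < \infty . \tag{41} \]

Then the orthonormal sequence $\{e_i(\cdot)\}_{i \in \mathbb{N}_0}$ is complete in $L^2(a)$.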


Proof: First of all we observe that condition (41) implies that the measure determined by $a$ possesses finite moments of all orders (see proposition 2). Let $f : \mathbb{R} \to \mathbb{R}$ be a function in $L^2(a)$ such that for all $k \in \mathbb{N}_0$,

\[ \int_{\mathbb{R}} x^k f(x) a(x)\,\lambda(dx) = 0 . \tag{42} \]

We prove that $f(\cdot) = 0$ $a$-a.e., which implies the theorem (see Luenberger, 1969, Lemma 1, page 61). Define for each $k \in \mathbb{N}_0$, $t \in [-\delta/2, \delta/2]$ and $x \in \mathbb{R}$,

\[ f_k(x) = \frac{(xt)^k}{k!} f(x) a(x) . \]

We will use a series version of the dominated convergence theorem applied to $\{f_k\}$. In the following we find a Lebesgue integrable function dominating the partial sums of the functions $f_k$ uniformly (i.e. for all $n$), which will enable us to use the referred theorem. We have for each $n \in \mathbb{N}$, $t \in [-\delta/2, \delta/2]$ and $x \in \mathbb{R}$,

\[
\begin{aligned}
\Big| \sum_{k=0}^{n} f_k(x) \Big| &\le \sum_{k=0}^{n} |f_k(x)| = \sum_{k=0}^{n} \frac{|xt|^k}{k!} |f(x)| a(x) = |f(x)| a(x) \sum_{k=0}^{n} \frac{|xt|^k}{k!} \\
&\le |f(x)| a(x) \sum_{k=0}^{\infty} \frac{|xt|^k}{k!} = |f(x)| a(x) e^{|xt|} \le |f(x)| a(x) \{ e^{xt} + e^{-xt} \} \\
&= |f(x)| \sqrt{a(x)} \, \sqrt{a(x)} \, ( e^{xt} + e^{-xt} ) = g(x) ,
\end{aligned}
\tag{43}
\]

where the function $g$ is given, for all $x \in \mathbb{R}$, by

\[ g(x) = |f(x)| \sqrt{a(x)} \cdot \sqrt{a(x)} \, ( e^{xt} + e^{-xt} ) . \tag{44} \]

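We prove that the function $g$ is Lebesgue integrable. For, note that

\[ \big\| \, |f(\cdot)| \sqrt{a(\cdot)} \, \big\|^2_{L^2(\lambda)} = \int_{\mathbb{R}} |f(x)|^2 a(x)\,\lambda(dx) = \| f(\cdot) \|^2_a < \infty . \]

Then the first factor on the right side of (44) is in $L^2(\lambda)$. On the other hand,

\[ \big\| \sqrt{a(\cdot)} \, e^{(\cdot)t} \big\|^2_{L^2(\lambda)} = \int_{\mathbb{R}} e^{2tx} a(x)\,\lambda(dx) = M(2t; a) < \infty \]

and

\[ \big\| \sqrt{a(\cdot)} \, e^{-(\cdot)t} \big\|^2_{L^2(\lambda)} = \int_{\mathbb{R}} e^{-2tx} a(x)\,\lambda(dx) = M(-2t; a) < \infty . \]

Then the second factor on the right side of (44) is in $L^2(\lambda)$. Using the Cauchy-Schwarz inequality (see Luenberger, 1969, lemma 1, page 47) we obtain

\[
\int_{\mathbb{R}} g(x)\,\lambda(dx) = \big\langle |f(\cdot)| \sqrt{a(\cdot)} \, , \, \sqrt{a(\cdot)} \, \{ e^{(\cdot)t} + e^{-(\cdot)t} \} \big\rangle_{L^2(\lambda)}
\le \big\| \, |f(\cdot)| \sqrt{a(\cdot)} \, \big\|_{L^2(\lambda)} \, \big\| \sqrt{a(\cdot)} \, \{ e^{(\cdot)t} + e^{-(\cdot)t} \} \big\|_{L^2(\lambda)} < \infty .
\]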

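Since (43) holds for each $n \in \mathbb{N}$, $x \in \mathbb{R}$, $t \in [-\delta/2, \delta/2]$ and $g$ is Lebesgue integrable, we can use the series version of the dominated convergence theorem (see Billingsley, 1986, theorem 16.7, page 214; footnote 3) to obtain

\[
\begin{aligned}
\int_{\mathbb{R}} e^{xt} f(x) a(x)\,\lambda(dx) &= \int_{\mathbb{R}} \Big\{ \sum_{k=0}^{\infty} \frac{(xt)^k}{k!} f(x) a(x) \Big\} \lambda(dx) \\
&= \sum_{k=0}^{\infty} \Big\{ \int_{\mathbb{R}} \frac{(xt)^k}{k!} f(x) a(x)\,\lambda(dx) \Big\} = 0 \qquad \text{(from the series dominated convergence theorem)},
\end{aligned}
\]

where the last equality follows from (42). We conclude that for all $t \in [-\delta/2, \delta/2]$,

\[ M[t; f(\cdot) a(\cdot)] = 0 . \tag{45} \]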

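We show that (45) implies that $f(\cdot) = 0$ $a$-a.e. For,

\[
\begin{aligned}
\| f(\cdot) \|^2_a &= | \langle f(\cdot), 1 \rangle_a | = \big\langle \sqrt{f(\cdot)} \sqrt{f(\cdot)} \, , \, e^{\delta(\cdot)/4} e^{-\delta(\cdot)/4} \big\rangle_a = \big\langle \sqrt{f(\cdot)} \, e^{\delta(\cdot)/4} \, , \, \sqrt{f(\cdot)} \, e^{-\delta(\cdot)/4} \big\rangle_a \\
&\le \big\| \sqrt{f(\cdot)} \, e^{\delta(\cdot)/4} \big\|_a \, \big\| \sqrt{f(\cdot)} \, e^{-\delta(\cdot)/4} \big\|_a \qquad \text{(from the Cauchy-Schwarz inequality)} \\
&= \Big\{ \int_{\mathbb{R}} f(x) e^{\delta x/2} a(x)\,\lambda(dx) \Big\}^{1/2} \Big\{ \int_{\mathbb{R}} f(x) e^{-\delta x/2} a(x)\,\lambda(dx) \Big\}^{1/2} \\
&= \{ M[\delta/2; f(\cdot) a(\cdot)] \}^{1/2} \, \{ M[-\delta/2; f(\cdot) a(\cdot)] \}^{1/2} = 0 \qquad \text{(from (45))}.
\end{aligned}
\]

□

Footnote 3. The theorem states: "If $\sum_n f_n$ converges almost everywhere and $|\sum_{k=1}^{n} f_k| \le g$ almost everywhere, where $g$ is integrable, then $\sum_n f_n$ and the $f_n$ are integrable, and $\int \sum_n f_n \, d\mu = \sum_n \int f_n \, d\mu$."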
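The Gram-Schmidt construction of the polynomials $e_i$ can be sketched in a few lines. The example below assumes that $a$ is the standard normal density, which satisfies conditions (40) and (41); this choice, and the function names `normal_moment`, `inner` and `monic_orthogonal`, are assumptions made for illustration and do not come from the paper. The sketch works with exact rational arithmetic on the moments of $a$ and returns the orthogonal polynomials in monic form together with their squared norms, so that $e_i$ is obtained by dividing the $i$-th polynomial by the square root of its squared norm. It only illustrates the construction; it says nothing about completeness.

```python
from fractions import Fraction

def normal_moment(k):
    """k-th moment of the standard normal: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return Fraction(0)
    out = Fraction(1)
    for j in range(k - 1, 0, -2):
        out *= j
    return out

def inner(p, q, moment):
    """Inner product of two polynomials in L2(a).

    p and q are coefficient lists (lowest degree first); the inner product
    of x^i and x^j is the moment of order i + j.
    """
    return sum(pi * qj * moment(i + j)
               for i, pi in enumerate(p) for j, qj in enumerate(q))

def monic_orthogonal(n, moment):
    """Gram-Schmidt applied to 1, x, ..., x^n.

    Returns the monic orthogonal polynomials and their squared norms.
    """
    polys, norms2 = [], []
    for d in range(n + 1):
        p = [Fraction(0)] * d + [Fraction(1)]          # the monomial x^d
        for q, nq in zip(polys, norms2):
            c = inner(p, q, moment) / nq               # projection coefficient
            p = [pi - c * (q[i] if i < len(q) else 0) for i, pi in enumerate(p)]
        polys.append(p)
        norms2.append(inner(p, p, moment))
    return polys, norms2

polys, norms2 = monic_orthogonal(3, normal_moment)
for p, n2 in zip(polys, norms2):
    print([str(c) for c in p], "squared norm:", n2)
```

For the normal density the monic polynomials obtained are the probabilists' Hermite polynomials ($x^2 - 1$ and $x^3 - 3x$ in degrees two and three), with squared norms $2$ and $6$.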

Functions with exponentially decaying tails

The following proposition gives a sufficient condition, which is easy to verify, for having the Laplace transform defined in a neighborhood of zero.

Proposition 4 Let $f : \mathbb{R} \to [0, \infty)$ be a continuous function such that for some $\delta > 0$ and for all $s \in [-\delta, \delta]$

\[ \lim_{x \to -\infty} e^{sx} f(x) = \lim_{x \to +\infty} e^{sx} f(x) = 0 . \tag{46} \]

Then we have:
i) For all $s \in (-\delta, \delta)$ the Laplace transform of $f$, $M(s; f)$, is finite.
ii) For all $k \in \mathbb{N}$,

\[ \lim_{x \to -\infty} x^k f(x) = \lim_{x \to +\infty} x^k f(x) = 0 . \]

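Proof:

i) Take $s \in (-\delta, \delta)$. Condition (46) implies that there exists $L \in \mathbb{R}_+$ such that for all $x \in \mathbb{R} \setminus [-L, L]$, $e^{\delta x} f(x) < 1$ and $e^{-\delta x} f(x) < 1$. We have then

\[
\begin{aligned}
M(s; f) &= \int_{\mathbb{R}} e^{sx} f(x)\,\lambda(dx) \\
&= \int_{[-L,L]} e^{sx} f(x)\,\lambda(dx) + \int_{[L,\infty)} e^{sx} f(x)\,\lambda(dx) + \int_{(-\infty,-L]} e^{sx} f(x)\,\lambda(dx) \\
&= \int_{[-L,L]} e^{sx} f(x)\,\lambda(dx) + \int_{[L,\infty)} e^{(s-\delta)x} e^{\delta x} f(x)\,\lambda(dx) + \int_{(-\infty,-L]} e^{(s+\delta)x} e^{-\delta x} f(x)\,\lambda(dx) \\
&\le \int_{[-L,L]} e^{sx} f(x)\,\lambda(dx) + \int_{[L,\infty)} e^{(s-\delta)x}\,\lambda(dx) + \int_{(-\infty,-L]} e^{(s+\delta)x}\,\lambda(dx) < \infty .
\end{aligned}
\]

ii) For each $k \in \mathbb{N}$,

\[ \lim_{x \to +\infty} x^k f(x) = \lim_{x \to +\infty} \{ e^{-\delta x} x^k \} \{ e^{\delta x} f(x) \} = 0 \]

and

\[ \lim_{x \to -\infty} x^k f(x) = \lim_{x \to -\infty} \{ e^{\delta x} x^k \} \{ e^{-\delta x} f(x) \} = 0 . \]

□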

A.2 Calculation of the L2-nuisance tangent space at (0, 1, a)

In this appendix we complete the details of the second part of the proof of theorem ??. Recall the notational conventions given there: $\{e_i\}_{i \in \mathbb{N}_0}$ denotes the result of a Gram-Schmidt orthonormalization process with respect to the inner product of $L^2_0(a)$ applied to the sequence of polynomials $\{1, (\cdot), (\cdot)^2, \dots\}$. Moreover, $T^2_N(0, 1, a)$ is denoted by $T_N$ (and we use the analogous convention for the tangent sets). Define, for each $i \in \mathbb{N}$,

\[ H_i = \mathrm{span}\{ e_1(\cdot), \dots, e_i(\cdot) \} . \]

Lemma 1 Under the location-scale model (10) we have $H_k^{\perp} \subseteq T_N$.

Proof: For each $i \in \{k+1, k+2, \dots\}$ define

\[ h_i(\cdot) = \begin{cases} e_i(\cdot), & \text{if } i \text{ is even}; \\ e_{i+1}(\cdot) - e_i(\cdot), & \text{if } i \text{ is odd}. \end{cases} \]

We will prove that for $i \in \{k+1, k+2, \dots\}$, $h_i(\cdot) \in T_N^0 \subseteq T_N$. This implies the lemma. To see this, note that any linear combination of $\{h_{k+1}(\cdot), h_{k+2}(\cdot), \dots\}$ is then in $T_N$. In particular, if $i$ is even, then $e_i(\cdot) = h_i(\cdot) \in T_N$, and if $i$ is odd, then $e_i(\cdot) = h_{i+1}(\cdot) - h_i(\cdot) \in T_N$. Hence for all $i \in \{k+1, k+2, \dots\}$, $e_i(\cdot)$ is in $T_N$. Since $T_N$ is a closed linear subspace of $L^2(a)$,

\[ \mathrm{cl}_{L^2(P_{01a})} \{ \mathrm{span} [ \{ e_i(\cdot) : i \in \{k+1, k+2, \dots\} \} ] \} \subseteq T_N . \]

On the other hand, condition (16) and theorem 3 imply that the polynomials are dense in $L^2(a)$ and, since $H_k = \mathrm{span}\{ e_0(\cdot), \dots, e_k(\cdot) \}$ and $\{e_i(\cdot)\}_{i \in \mathbb{N}_0}$ is an orthonormal system, we have

\[ H_k^{\perp} = \mathrm{cl}_{L^2(P_{01a})} \{ \mathrm{span} [ \{ e_i(\cdot) : i = k+1, k+2, \dots \} ] \} . \]

We conclude that the lemma is proved if we show that for all $i \in \{k+1, k+2, \dots\}$, $h_i(\cdot) \in T_N^0$, and this is proved below. Let $\nu(\cdot) = h_i(\cdot)$ for a fixed but arbitrary $i \in \{k+1, k+2, \dots\}$. We prove that for $t$ small enough,

\[ a_t(\cdot) = a(\cdot) + t a(\cdot) \nu(\cdot) \in A . \tag{47} \]

Note that (47) amounts to taking a differentiable path with vanishing remainder term. Hence, if we show (47) we prove in fact that $\nu \in T_N^0$. We verify next that each $a_t$ (for $t$ in a neighborhood of zero) satisfies the conditions (11)-(19).

Verification of (11): Note that

\[ \lim_{x \to -\infty} \nu(x) = \lim_{x \to +\infty} \nu(x) = \infty . \]

Hence there exists $L > 0$ such that for all $x \in \mathbb{R} \setminus [-L, L]$, $\nu(x) > 0$. Then, since $a(\cdot)$ is strictly positive, for all $x \in \mathbb{R} \setminus [-L, L]$,

\[ a_t(x) = a(x) + t a(x) \nu(x) > 0 . \]

On the other hand, since $\nu(\cdot)$ is continuous its restriction to the compact interval $[-L, L]$ is bounded, and since $a(\cdot)$ is continuous and strictly positive, the restriction of $a(\cdot)$ to $[-L, L]$ is bounded away from zero. It can be easily shown then that for $t$ small enough and for all $x \in [-L, L]$, $a_t(x) > 0$.

Verification of (12)-(14) and (19): Given $i \in \{1, 2, \dots, k\}$ and $t \in \mathbb{R}_+$,

\[ \int_{\mathbb{R}} x^i a_t(x)\,\lambda(dx) = \int_{\mathbb{R}} x^i a(x)\,\lambda(dx) + t \int_{\mathbb{R}} x^i \nu(x) a(x)\,\lambda(dx) = m_i + t \cdot 0 = m_i . \]

The second to last equality comes from the fact that $\{e_i\}_{i \in \mathbb{N}_0}$ is an orthogonal system in $L^2(a)$ and from (12)-(14) and (19).

Verification of (15): Since the polynomials are of class $C^{\infty}$, the property follows immediately for $a_t$.

Verification of (16): We have for all $s \in (-\delta, \delta)$ (with $\delta = \delta(a)$) and all $t \in \mathbb{R}_+$

\[ M(s; a_t) = M(s; a) + t M(s; a\nu) . \]

Since $\nu$ is a polynomial and $M(s; a) < \infty$ by hypothesis, from proposition 3 (in the appendix on the Laplace transform), $M(s; a\nu) < \infty$; then, for all $s \in (-\delta, \delta)$, $M(s; a_t) < \infty$.

Verification of (17): A routine calculation yields

\[ \{ a_t'(\cdot) \}^2 = p(\cdot) \{ a'(\cdot) \}^2 + q(\cdot) \{ a(\cdot) \}^2 + w(\cdot) a(\cdot) a'(\cdot) , \tag{48} \]

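for some polynomials $p$, $q$ and $w$. We will show that for all $s \in [-\delta/2, \delta/2]$ and for $t \in \mathbb{R}_+$ small enough,

\[ M\Big[ s; p(\cdot) \frac{\{a'(\cdot)\}^2}{a_t(\cdot)} \Big] , \quad M\Big[ s; q(\cdot) \frac{a^2(\cdot)}{a_t(\cdot)} \Big] , \quad M\Big[ s; w(\cdot) \frac{a(\cdot) a'(\cdot)}{a_t(\cdot)} \Big] \in \mathbb{R} . \tag{49} \]

Using (48) together with (49) yields

\[ M\Big[ s; \frac{\{a_t'(\cdot)\}^2}{a_t(\cdot)} \Big] = M\Big[ s; p(\cdot) \frac{\{a'(\cdot)\}^2}{a_t(\cdot)} \Big] + M\Big[ s; q(\cdot) \frac{a^2(\cdot)}{a_t(\cdot)} \Big] + M\Big[ s; w(\cdot) \frac{a(\cdot) a'(\cdot)}{a_t(\cdot)} \Big] \in \mathbb{R} , \tag{50} \]

for all $s \in [-\delta/2, \delta/2]$, which implies (17).

We prove (49). Take $t$ small enough in such a way that for all $x \in \mathbb{R}$, $a_t(x) > 0$, and let $s$ be an arbitrary element of $[-\delta/2, \delta/2]$. The Cauchy-Schwarz inequality gives

\[
\Big| M\Big[ s; w(\cdot) \frac{a(\cdot) a'(\cdot)}{a_t(\cdot)} \Big] \Big| = \Big| \Big\langle e^{(\cdot)s/2} w(\cdot) \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \, , \, e^{(\cdot)s/2} \frac{a'(\cdot)}{\sqrt{a_t(\cdot)}} \Big\rangle_{L^2(\lambda)} \Big|
\le \Big\| e^{(\cdot)s/2} w(\cdot) \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \Big\|_{L^2(\lambda)} \Big\| e^{(\cdot)s/2} \frac{a'(\cdot)}{\sqrt{a_t(\cdot)}} \Big\|_{L^2(\lambda)} . \tag{51}
\]

We verify that each of the factors on the right side of (51) is finite. Note that since $\lim_{x \to \pm\infty} \nu(x) = \infty$, there exists $L > 0$ such that for all $x \in \mathbb{R} \setminus [-L, L]$, $a_t(x) > a(x)$. We have then

\[
\begin{aligned}
\Big\| e^{(\cdot)s/2} w(\cdot) \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \Big\|^2_{L^2(\lambda)} &= \int_{\mathbb{R}} e^{sx} w^2(x) \frac{a^2(x)}{a_t(x)}\,\lambda(dx) \\
&= \int_{\mathbb{R} \setminus [-L,L]} e^{sx} w^2(x) \frac{a^2(x)}{a_t(x)}\,\lambda(dx) + \int_{[-L,L]} e^{sx} w^2(x) \frac{a^2(x)}{a_t(x)}\,\lambda(dx) \\
&\le \int_{[-L,L]} e^{sx} w^2(x) \frac{a^2(x)}{a_t(x)}\,\lambda(dx) + \int_{\mathbb{R}} e^{sx} w^2(x) \frac{a^2(x)}{a(x)}\,\lambda(dx) < \infty .
\end{aligned}
\tag{52}
\]

The first integral in the last line is finite because its integrand is continuous and hence bounded on $[-L, L]$, and the second integral is finite because it coincides with the Laplace transform of $w^2(\cdot) a(\cdot)$, which according to proposition 3 is finite. Moreover,

\[
\Big\| e^{(\cdot)s/2} \frac{a'(\cdot)}{\sqrt{a_t(\cdot)}} \Big\|^2_{L^2(\lambda)} = \int_{\mathbb{R}} e^{sx} \frac{\{a'(x)\}^2}{a_t(x)}\,\lambda(dx)
\le \int_{[-L,L]} e^{sx} \frac{\{a'(x)\}^2}{a_t(x)}\,\lambda(dx) + \int_{\mathbb{R}} e^{sx} \frac{\{a'(x)\}^2}{a(x)}\,\lambda(dx) < \infty . \tag{53}
\]

The first integral in the last line is finite because its integrand is continuous and hence bounded on $[-L, L]$, and the second integral is finite because it coincides with the Laplace transform of $\{a'(\cdot)\}^2 / a(\cdot)$, which according to condition (17) is finite. Inserting (52) and (53) in (51) we obtain, for all $s \in [-\delta/2, \delta/2]$,

\[ \Big| M\Big[ s; w(\cdot) \frac{a(\cdot) a'(\cdot)}{a_t(\cdot)} \Big] \Big| < \infty . \]

We show now that $M[ s; q(\cdot) a^2(\cdot) / a_t(\cdot) ]$ is finite. Using the Cauchy-Schwarz inequality we obtain

\[
\Big| M\Big[ s; q(\cdot) \frac{a^2(\cdot)}{a_t(\cdot)} \Big] \Big| = \Big| \Big\langle e^{(\cdot)s/2} q(\cdot) \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \, , \, e^{(\cdot)s/2} \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \Big\rangle_{L^2(\lambda)} \Big|
\le \Big\| e^{(\cdot)s/2} q(\cdot) \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \Big\|_{L^2(\lambda)} \Big\| e^{(\cdot)s/2} \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \Big\|_{L^2(\lambda)} < \infty . \tag{54}
\]

To see that the right side of the expression above is finite, note that for every polynomial, say $r$, we have

\[
\begin{aligned}
\Big\| e^{(\cdot)s/2} r(\cdot) \frac{a(\cdot)}{\sqrt{a_t(\cdot)}} \Big\|^2_{L^2(\lambda)} &= \int_{\mathbb{R}} e^{sx} r^2(x) \frac{\{a(x)\}^2}{a_t(x)}\,\lambda(dx) \\
&= \int_{\mathbb{R} \setminus [-L,L]} e^{sx} r^2(x) \frac{\{a(x)\}^2}{a_t(x)}\,\lambda(dx) + \int_{[-L,L]} e^{sx} r^2(x) \frac{\{a(x)\}^2}{a_t(x)}\,\lambda(dx) \\
&\le \int_{[-L,L]} e^{sx} r^2(x) \frac{\{a(x)\}^2}{a_t(x)}\,\lambda(dx) + M[ s; r^2(\cdot) a(\cdot) ] < \infty .
\end{aligned}
\tag{55}
\]

Note that proposition 3 implies that $M[ s; r^2(\cdot) a(\cdot) ] < \infty$.

Finally, we show that $M[ s; p(\cdot) \{a'(\cdot)\}^2 / a_t(\cdot) ]$ is finite. For,

\[
\begin{aligned}
\Big| M\Big[ s; p(\cdot) \frac{\{a'(\cdot)\}^2}{a_t(\cdot)} \Big] \Big| &= \Big| \int_{\mathbb{R}} e^{sx} p(x) \frac{\{a'(x)\}^2}{a_t(x)}\,\lambda(dx) \Big| \\
&\le \int_{\mathbb{R} \setminus [-L,L]} e^{sx} |p(x)| \frac{\{a'(x)\}^2}{a_t(x)}\,\lambda(dx) + \int_{[-L,L]} e^{sx} |p(x)| \frac{\{a'(x)\}^2}{a_t(x)}\,\lambda(dx) \\
&\le \int_{[-L,L]} e^{sx} |p(x)| \frac{\{a'(x)\}^2}{a_t(x)}\,\lambda(dx) + M\Big[ s; |p(\cdot)| \frac{\{a'(\cdot)\}^2}{a(\cdot)} \Big] < \infty .
\end{aligned}
\]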

Verification of (18): Given $i \in \mathbb{N}_0$ we have for each $t \in \mathbb{R}_+$,

\[ \lim_{x \to \pm\infty} x^i a_t(x) = \lim_{x \to \pm\infty} \{ x^i a(x) + t x^i \nu(x) a(x) \} = 0 , \]

because $x^i \nu(x)$ is a polynomial and, from (18), for each polynomial $p(\cdot)$ we have $\lim_{x \to \pm\infty} p(x) a(x) = 0$. □

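A.3 Calculation of the tangent space at an arbitrary point

In this section we show how to extend the calculation of the strong nuisance tangent space of a semiparametric location-scale model at the point $(0, 1, a_0)$ to an arbitrary point $(\mu_0, \sigma_0, a_0) \in \mathbb{R} \times \mathbb{R}_+ \times A$. More precisely, we prove the following theorem.

Theorem 4 Under a semiparametric location-scale model the nuisance tangent space at $(\mu_0, \sigma_0, a_0) \in \mathbb{R} \times \mathbb{R}_+ \times A$ is given by

\[ T_N(\mu_0, \sigma_0, a_0) = \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in T_N(0, 1, a_0) \Big\} . \]

The theorem above holds for any of the notions of path differentiability given in Labouriau (1996a), and in particular for any path differentiability used in this paper. Therefore we do not specify the path differentiability adopted in this appendix. The point $(\mu_0, \sigma_0, a_0)$ will be considered fixed in the rest of this section. We introduce the following notation:

\[ P_{\mu_0 \sigma_0 a_0} = P_0 ; \qquad a_0^{*}(\cdot) = \frac{1}{\sigma_0} a_0\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) . \]

The inner product and the norm of $L^2(P_0)$ will be denoted by $\langle \cdot , \cdot \rangle_0$ and $\| \cdot \|_0$ respectively. We also use the symbol $\mathcal{P}_0$ to denote $\mathcal{P}_{\mu_0 \sigma_0}$. Note that

\[ \mathcal{P}_0 = \Big\{ \frac{1}{\sigma_0} a\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : a \in A \Big\} \]

and

\[ T_N^0(\mu_0, \sigma_0, a_0) = \Big\{ \nu' \in L^2_0(P_0) : \exists\, \epsilon > 0, \ \{a_t'\}_{t \in [0,\epsilon)} \subseteq \mathcal{P}_0, \ \{r_t'\}_{t \in [0,\epsilon)} \subseteq L^2(P_0) \ \text{such that (56) and (57) hold} \Big\} . \]

The conditions required above are

\[ a_t'(\cdot) = a_0^{*}(\cdot) + t a_0^{*}(\cdot) \nu'(\cdot) + t a_0^{*}(\cdot) r_t'(\cdot) \in \mathcal{P}_0 \tag{56} \]

and

\[ r_t'(\cdot) \longrightarrow 0 , \quad \text{as } t \downarrow 0 , \tag{57} \]

where the convergence above is one of the forms of convergence defined in section 2.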

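Lemma 2 If $\nu \in T_N^0(0, 1, a_0)$, then $\nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) \in T_N^0(\mu_0, \sigma_0, a_0)$.

Proof: Since $\nu \in T_N^0(0, 1, a_0)$ there exist $\epsilon > 0$, $\{a_t\}_{t \in [0,\epsilon)} \subseteq \mathcal{P}_{01} = A$ and $\{r_t\}_{t \in [0,\epsilon)} \subseteq L^2(P_{01a_0})$ such that for all $t \in [0, \epsilon)$,

\[ a_t(\cdot) = a_0(\cdot) + t a_0(\cdot) \nu(\cdot) + t a_0(\cdot) r_t(\cdot) \in A \tag{58} \]

and

\[ r_t(\cdot) \longrightarrow 0 , \quad \text{as } t \downarrow 0 . \]

Using (58), for all $t \in [0, \epsilon)$, we can write

\[
\begin{aligned}
\frac{1}{\sigma_0} a_t\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) &= \frac{1}{\sigma_0} a_0\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) + t \frac{1}{\sigma_0} a_0\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) + t \frac{1}{\sigma_0} a_0\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) r_t\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) \\
&= a_0^{*}(\cdot) + t a_0^{*}(\cdot) \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) + t a_0^{*}(\cdot) r_t\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) .
\end{aligned}
\tag{59}
\]

Clearly, $\frac{1}{\sigma_0} a_t\big( \frac{\cdot - \mu_0}{\sigma_0} \big) \in \mathcal{P}_0$, because $a_t(\cdot) \in A$. Suppose now that the convergence of the remainder term is in the $L^q$ sense (for $q \in [1, \infty)$). We have then

\[
\Big\| r_t\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) \Big\|^q_{L^q(P_0)} = \int_{\mathbb{R}} \Big\{ r_t\Big( \frac{x - \mu_0}{\sigma_0} \Big) \Big\}^q a_0^{*}(x)\,\lambda(dx) = \int_{\mathbb{R}} \{ r_t(y) \}^q a_0(y)\,\lambda(dy) = \| r_t(\cdot) \|^q_{L^q(a_0)} . \tag{60}
\]

Since, for all $t \in [0, \epsilon)$, $r_t(\cdot) \in L^q(P_{01a_0})$, then from (60), $r_t\big( \frac{\cdot - \mu_0}{\sigma_0} \big) \in L^q(P_0)$, and since $\| r_t(\cdot) \|_{L^q(P_{01a_0})} \longrightarrow 0$, also $\big\| r_t\big( \frac{\cdot - \mu_0}{\sigma_0} \big) \big\|_{L^q(P_0)} \longrightarrow 0$. We conclude that $\nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big)$ is in the $L^q$ version of $T_N^0(\mu_0, \sigma_0, a_0)$. An analogous argument can be used to prove the lemma for the weak nuisance tangent spaces. The idea again is to use a suitable change of variables in the integrals used to define the convergence of the remainder term. □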

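Lemma 3 If $\nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) \in T_N^0(\mu_0, \sigma_0, a_0)$, then $\nu(\cdot) \in T_N^0(0, 1, a_0)$.

Proof: We give next the argument for the $L^2$-nuisance tangent space. For the other notions of path differentiability the argument is analogous. Since $\nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) \in T_N^0(\mu_0, \sigma_0, a_0)$, there exist $\epsilon > 0$, $\{a_t'\}_{t \in [0,\epsilon)} \subseteq \mathcal{P}_0$ and $\{r_t'\}_{t \in [0,\epsilon)} \subseteq L^2(P_0)$ such that

\[ a_t'(\cdot) = a_0^{*}(\cdot) + t a_0^{*}(\cdot) \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) + t a_0^{*}(\cdot) r_t'(\cdot) \tag{61} \]

and

\[ \| r_t'(\cdot) \|_0 \longrightarrow 0 , \quad \text{as } t \downarrow 0 . \]

Since for each $t \in [0, \epsilon)$, $a_t'(\cdot) \in \mathcal{P}_0$, there exists $a_t(\cdot) \in A$ such that

\[ a_t'(\cdot) = \frac{1}{\sigma_0} a_t\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) . \]

Then (61) is equivalent to

\[ \frac{1}{\sigma_0} a_t\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) = \frac{1}{\sigma_0} a_0\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) + t \frac{1}{\sigma_0} a_0\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) + t \frac{1}{\sigma_0} a_0\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) r_t'(\cdot) . \]

Eliminating the common factor $1/\sigma_0$ and changing variables we obtain

\[ a_t(\cdot) = a_0(\cdot) + t a_0(\cdot) \nu(\cdot) + t a_0(\cdot) r_t'(\sigma_0(\cdot) + \mu_0) . \]

Note that

\[
\begin{aligned}
\| r_t'[\sigma_0(\cdot) + \mu_0] \|^2_{L^2(P_{01a_0})} &= \int_{\mathbb{R}} \{ r_t'[\sigma_0 x + \mu_0] \}^2 a_0(x)\,\lambda(dx) \\
&= \int_{\mathbb{R}} \{ r_t'(y) \}^2 \frac{1}{\sigma_0} a_0\Big( \frac{y - \mu_0}{\sigma_0} \Big) \lambda(dy) = \| r_t'(\cdot) \|^2_{L^2(P_0)} .
\end{aligned}
\tag{62}
\]

Then $r_t'(\sigma_0(\cdot) + \mu_0) \in L^2(P_{01a_0})$ and $\| r_t'[\sigma_0(\cdot) + \mu_0] \|^2_{L^2(P_{01a_0})} \longrightarrow 0$ as $t \downarrow 0$. We conclude that $\nu(\cdot) \in T_N^0(0, 1, a_0)$. □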

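We give now the proof of theorem 4.

Proof: (of theorem 4) From lemmas 2 and 3 we have

\[ T_N^0(\mu_0, \sigma_0, a_0) = \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in T_N^0(0, 1, a_0) \Big\} . \]

Using lemmas 4 and 5 given below we have

\[
\begin{aligned}
T_N(\mu_0, \sigma_0, a_0) &= \mathrm{cl}_{L^2(P_0)} [ \mathrm{span} \{ T_N^0(\mu_0, \sigma_0, a_0) \} ] \\
&= \mathrm{cl}_{L^2(P_0)} \Big[ \mathrm{span} \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in T_N^0(0, 1, a_0) \Big\} \Big] \\
&= \mathrm{cl}_{L^2(P_0)} \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in \mathrm{span} \{ T_N^0(0, 1, a_0) \} \Big\} \qquad \text{(from lemma 4)} \\
&= \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in \mathrm{cl}_{L^2(a_0)} [ \mathrm{span} \{ T_N^0(0, 1, a_0) \} ] \Big\} \qquad \text{(from lemma 5)} \\
&= \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in T_N(0, 1, a_0) \Big\} .
\end{aligned}
\]

□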

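We present now the two technical lemmas required in the proof given above.

Lemma 4 Given a class of functions $A$ we have

\[ \mathrm{span} \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in A \Big\} = \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in \mathrm{span}(A) \Big\} . \]

Proof: "$\subseteq$" Take $h \in \mathrm{span} \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in A \big\}$. Then there exist $n \in \mathbb{N}$, $t_1, \dots, t_n \in \mathbb{R}$ and $h_1, \dots, h_n \in \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in A \big\}$ such that

\[ h(\cdot) = \sum_{i=1}^{n} t_i h_i(\cdot) . \]

Since for each $i \in \{1, \dots, n\}$, $h_i \in \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in A \big\}$, there exists $\nu_i \in A$ such that $\nu_i\big( \frac{\cdot - \mu_0}{\sigma_0} \big) = h_i(\cdot)$. Clearly $\sum_{i=1}^{n} t_i \nu_i(\cdot) \in \mathrm{span}(A)$ and

\[ h(\cdot) = \sum_{i=1}^{n} t_i \nu_i\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) . \]

Hence $h \in \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in \mathrm{span}(A) \big\}$.

"$\supseteq$" Take $z \in \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in \mathrm{span}(A) \big\}$. Then there exists $\nu \in \mathrm{span}(A)$ such that $z(\cdot) = \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big)$. Since $\nu \in \mathrm{span}(A)$, there exist $n \in \mathbb{N}$, $t_1, \dots, t_n \in \mathbb{R}$ and $\nu_1, \dots, \nu_n \in A$ such that $\nu(\cdot) = \sum_{i=1}^{n} t_i \nu_i(\cdot)$. We have then

\[ z(\cdot) = \sum_{i=1}^{n} t_i \nu_i\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) \]

and hence $z \in \mathrm{span} \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in A \big\}$. □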

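Lemma 5 Let $A$ be a class of functions contained in $L^2(a_0)$. Then

\[ \mathrm{cl}_{L^2(P_0)} \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in A \Big\} = \Big\{ \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) : \nu \in \mathrm{cl}_{L^2(a_0)}(A) \Big\} . \]

Proof: Note that for all $f \in L^2(a_0)$ we have

\[
\Big\| f\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) \Big\|^2_{L^2(P_0)} = \int_{\mathbb{R}} f^2\Big( \frac{x - \mu_0}{\sigma_0} \Big) \frac{1}{\sigma_0} a_0\Big( \frac{x - \mu_0}{\sigma_0} \Big) \lambda(dx) = \int_{\mathbb{R}} f^2(y) a_0(y)\,\lambda(dy) = \| f(\cdot) \|^2_{L^2(a_0)} . \tag{63}
\]

We prove now the lemma.

"$\subseteq$" Take $z \in \mathrm{cl}_{L^2(P_0)} \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in A \big\}$. Then there exists a sequence $\{z_n\} \subseteq \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in A \big\}$ such that $z_n \to z$ in $L^2(P_0)$. Moreover, for all $n \in \mathbb{N}$ we have $z_n(\cdot) = \nu_n\big( \frac{\cdot - \mu_0}{\sigma_0} \big)$, for some $\nu_n \in A$. Since $\{z_n\}$ is convergent, it is a Cauchy sequence in $L^2(P_0)$. From (63), $\{\nu_n\}$ is a Cauchy sequence in $L^2(a_0)$ and, since $L^2(a_0)$ is complete, $\{\nu_n\}$ is convergent, say $\nu_n \to \nu$ in $L^2(a_0)$, for some $\nu \in \mathrm{cl}_{L^2(a_0)}(A)$. Using (63) and the $L^2(a_0)$-continuity of the norm $\| \cdot \|_{L^2(a_0)}$ we obtain

\[ \Big\| \nu\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) - z(\cdot) \Big\|_{L^2(P_0)} = \lim_{n \to \infty} \Big\| \nu_n\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) - z_n(\cdot) \Big\|_{L^2(P_0)} = 0 . \]

Hence, $z(\cdot) = \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big)$ and the inclusion follows.

"$\supseteq$" Take $z \in \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in \mathrm{cl}_{L^2(a_0)}(A) \big\}$, say $z(\cdot) = \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big)$. Since $\nu \in \mathrm{cl}_{L^2(a_0)}(A)$, there exists a sequence $\{\nu_n\} \subseteq A$ such that $\nu_n \to \nu$ in $L^2(a_0)$. Note that $\{\nu_n\}$ is a Cauchy sequence in $L^2(a_0)$. Define the sequence $\{z_n\}$ in $L^2(P_0)$ by, for all $n$, $z_n(\cdot) = \nu_n\big( \frac{\cdot - \mu_0}{\sigma_0} \big)$. From (63) we see that $\{z_n\}$ is a Cauchy sequence in $L^2(P_0)$ and then it is convergent with limit, say, $\zeta \in \mathrm{cl}_{L^2(P_0)} \big\{ \nu\big( \frac{\cdot - \mu_0}{\sigma_0} \big) : \nu \in A \big\}$. Using (63) and the $L^2(P_0)$-continuity of the norm $\| \cdot \|_{L^2(P_0)}$ we obtain

\[ \| \zeta(\cdot) - z(\cdot) \|_{L^2(P_0)} = \lim_{n \to \infty} \Big\| z_n(\cdot) - \nu_n\Big( \frac{\cdot - \mu_0}{\sigma_0} \Big) \Big\|_{L^2(P_0)} = 0 . \tag{64} \]

□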
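The change of variables behind identity (63) can be checked numerically. The sketch below is only an illustration under stated assumptions: $a_0$ is taken to be the standard normal density, the values of the location and scale (`mu0`, `sigma0`) and the test function `f` are hypothetical, and the integrals are approximated with a plain trapezoidal rule.

```python
import math

def integrate(g, lo, hi, n=40_000):
    """Composite trapezoidal rule for g on [lo, hi]."""
    h = (hi - lo) / n
    return h * (0.5 * (g(lo) + g(hi)) + sum(g(lo + i * h) for i in range(1, n)))

def a0(y):
    """Standard normal density (assumed example of a density in the class A)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

mu0, sigma0 = 1.5, 2.0          # hypothetical location and scale

def a0_star(x):
    """Location-scale transform of a0: (1/sigma0) * a0((x - mu0) / sigma0)."""
    return a0((x - mu0) / sigma0) / sigma0

def f(y):
    """Hypothetical element of L2(a0)."""
    return y ** 3 - 2.0 * y + 0.5

# Squared norm of f((. - mu0)/sigma0) in L2(P0): left side of (63).
lhs = integrate(lambda x: f((x - mu0) / sigma0) ** 2 * a0_star(x),
                mu0 - 20.0 * sigma0, mu0 + 20.0 * sigma0)

# Squared norm of f in L2(a0): right side of (63).
rhs = integrate(lambda y: f(y) ** 2 * a0(y), -20.0, 20.0)

print(f"left side  = {lhs:.8f}")
print(f"right side = {rhs:.8f}")
```

For this particular `f` the right side can also be computed by hand from the normal moments: $15 - 12 + 4 + 0.25 = 7.25$.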

A.4 Calculation of the first four orthogonal polynomials in L2(a)

We calculate here the first four orthogonal polynomials in $L^2(a)$ (i.e. $e_0$, $e_1$, $e_2$ and $e_3$) in terms of the standardized cumulants of the distribution given by $a$. Here $a$ is an arbitrary element of the class of probability densities $A$ defined in section 3. Throughout this appendix the inner product and the norm of $L^2(a)$ will be denoted by $\langle \cdot , \cdot \rangle$ and $\| \cdot \|$ respectively. We use also the notation, for $i \in \mathbb{N}_0$, $m_i = \int_{\mathbb{R}} x^i a(x)\,\lambda(dx)$. Recall that $\{e_i\}_{i \in \mathbb{N}_0}$ is the result of a Gram-Schmidt orthonormalization procedure applied to the sequence of polynomials $\{1, (\cdot), (\cdot)^2, \dots\}$. We start the Gram-Schmidt procedure by setting

\[ e_0(\cdot) = 1 , \]

and observing that $\| e_0 \| = 1$. Taking $e_1(\cdot) = (\cdot)$ we have

\[ \langle e_0, e_1 \rangle = \int_{\mathbb{R}} x a(x)\,\lambda(dx) = 0 \]

and

\[ \| e_1 \|^2 = \int_{\mathbb{R}} x^2 a(x)\,\lambda(dx) = 1 . \tag{65} \]

Hence $e_1$ satisfies the required orthonormality conditions. We calculate now $e_2$. The Gram-Schmidt orthogonalization procedure (see Luenberger, 1969, page 55) gives the following polynomial of degree 2 orthogonal to $e_0$ and $e_1$:

\[ p_2(\cdot) = (\cdot)^2 - \langle e_0, (\cdot)^2 \rangle e_0(\cdot) - \langle e_1, (\cdot)^2 \rangle e_1(\cdot) = (\cdot)^2 - m_3 (\cdot) - 1 . \]

Note that the polynomials $\{1, (\cdot), (\cdot)^2\}$ are linearly independent in $L^2(a)$ and $p_2$ is a linear combination of these polynomials with non-vanishing leading coefficient. Hence $p_2 \neq 0$ and consequently $\| p_2 \| \neq 0$. We denote $\| p_2 \|$ by $\gamma_2$. We stress that $\gamma_2 > 0$, which is a known result (see Kendall and Stuart, 1952). We have

\[ \gamma_2^2 = \int_{\mathbb{R}} ( x^2 - m_3 x - 1 )^2 a(x)\,\lambda(dx) = m_4 - m_3^2 - 1 . \]

We define

\[ e_2(\cdot) = \frac{1}{\gamma_2} \{ (\cdot)^2 - m_3 (\cdot) - 1 \} . \]

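We compute now $e_3$. The Gram-Schmidt orthogonalization procedure gives the following polynomial of degree 3 orthogonal to $e_0$, $e_1$ and $e_2$:

\[
\begin{aligned}
p_3(\cdot) &= (\cdot)^3 - \langle e_0, (\cdot)^3 \rangle e_0(\cdot) - \langle e_1, (\cdot)^3 \rangle e_1(\cdot) - \langle e_2, (\cdot)^3 \rangle e_2(\cdot) \\
&= (\cdot)^3 - \frac{m_5 - m_3 m_4 - m_3}{\gamma_2^2} (\cdot)^2 - \Big\{ m_4 - \frac{m_3 ( m_5 - m_3 m_4 - m_3 )}{\gamma_2^2} \Big\} (\cdot) - \Big\{ m_3 - \frac{m_5 - m_3 m_4 - m_3}{\gamma_2^2} \Big\} .
\end{aligned}
\]

Denote $\| p_3 \|$ by $\gamma_3$. Note that $p_3$ is a linear combination of linearly independent elements of $L^2(a)$ (in fact $e_0$, $e_1$, $e_2$ and $(\cdot)^3$) with a non-vanishing leading coefficient. Then $p_3(\cdot) \neq 0$ and consequently $\gamma_3 = \| p_3 \| > 0$ (this generalizes the result of Kendall and Stuart (1952) concerning $\gamma_2$). We have then

\[
e_3(\cdot) = \frac{1}{\gamma_3} p_3(\cdot) = \frac{1}{\gamma_3} \Big[ (\cdot)^3 - \frac{m_5 - m_3 m_4 - m_3}{\gamma_2^2} (\cdot)^2 - \Big\{ m_4 - \frac{m_3 ( m_5 - m_3 m_4 - m_3 )}{\gamma_2^2} \Big\} (\cdot) - \Big\{ m_3 - \frac{m_5 - m_3 m_4 - m_3}{\gamma_2^2} \Big\} \Big] .
\]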
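The closed forms for the polynomials of degree two and three can be checked with exact arithmetic on a concrete set of moments. The sketch below uses the moments of $X - 1$ with $X$ exponentially distributed with rate one, which has $m_1 = 0$, $m_2 = 1$ and a nonzero third moment. This distribution is an assumption chosen only because its moments are simple integers; its support is not the whole real line, so it is not an element of the class $A$, and only the moment algebra is being illustrated. The names `moment`, `inner`, `p2`, `p3` and `norm2_p2` are introduced for the example.

```python
from fractions import Fraction
from math import factorial

def moment(k):
    """k-th moment of X - 1 with X ~ Exp(1): k! * sum_{j=0}^{k} (-1)^j / j!."""
    return sum(Fraction((-1) ** j * factorial(k), factorial(j)) for j in range(k + 1))

def inner(p, q):
    """Inner product of polynomials given as coefficient lists (lowest degree first)."""
    return sum(pi * qj * moment(i + j)
               for i, pi in enumerate(p) for j, qj in enumerate(q))

m3, m4, m5 = moment(3), moment(4), moment(5)

# Squared norm of the degree-two polynomial, from the closed form m4 - m3^2 - 1.
norm2_p2 = m4 - m3 ** 2 - 1

# Degree-two polynomial x^2 - m3 x - 1.
p2 = [Fraction(-1), -m3, Fraction(1)]

# Degree-three polynomial from the closed form, with c = (m5 - m3 m4 - m3) / ||p2||^2.
c = (m5 - m3 * m4 - m3) / norm2_p2
p3 = [-(m3 - c), -(m4 - m3 * c), -c, Fraction(1)]

one, x = [Fraction(1)], [Fraction(0), Fraction(1)]
print("moments m3, m4, m5:", m3, m4, m5)
print("p2 =", [str(v) for v in p2], " squared norm:", inner(p2, p2))
print("p3 =", [str(v) for v in p3], " squared norm:", inner(p3, p3))
print("orthogonality:", inner(p2, one), inner(p2, x),
      inner(p3, one), inner(p3, x), inner(p3, p2))
```

With these moments ($m_3 = 2$, $m_4 = 9$, $m_5 = 44$) the closed forms give $x^2 - 2x - 1$ and $x^3 - 6x^2 + 3x + 4$, and all the printed inner products between distinct polynomials are zero.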
