Linear Estimation of Scale Parameter for Logistic ... - Sankhya

2 downloads 0 Views 186KB Size Report
best linear unbiased estimators (BLUE's) for small sample sizes. AMS (2000) ...... Linear estimation of standard deviation of logistic distribution: Theory and ...
Sankhy¯ a : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 870-884 c 2007, Indian Statistical Institute °

Linear Estimation of Scale Parameter for Logistic Distribution Based on Consecutive Order Statistics Patrick G.O. Weke University of Nairobi, Nairobi, Kenya Abstract Linear estimation of the scale parameter of the logistic population based on the sum of consecutive order statistics, when the location parameter is unknown, is discussed. A method based on a pair of single spacing and ‘zero-one’ weights rather than the optimum weights is presented. Limited simulations indicate small bias and variance of the estimator, and reasonably high relative efficiencies with respect to the Cramer-Rao lower bound and best linear unbiased estimators (BLUE’s) for small sample sizes. AMS (2000) subject classification. Primary 62F10. Keywords and phrases. Order statistics, logistic distribution, linear estimation, relative efficiency.

1

Introduction

The logistic distribution arises frequently in statistical modelling. It has been used in the analysis of survival data, graduation of mortality statistics and is used in some applications as a substitute for the normal distribution (Balakrishnan and Cohen, 1991). Let X1:n ≤ · · · ≤ Xn:n denote the order statistics from a random sample of size n from the logistic distribution whose cumulative distribution function is h n √ oi−1 F (x; µ, σ) = 1 + exp −π(x − µ)/σ 3 ; − ∞ < x < ∞, − ∞ < µ < ∞, σ > 0. The distribution is absolutely continuous, symmetric about the location parameter µ and has scale parameter σ. There has been much work on the

Scale parameter estimation for logistic distribution

871

estimation of the location and scale parameters of a distribution through linear functions of order statistics. Optimal weights for these linear combinations can be calculated in the case of a given distribution (Balakrishnan and Cohen, 1991). Dixon (1957) introduced the notion of using ‘zero-one’ weights rather than the optimum weights and noted that high efficiencies are achievable. This work has been extended by many authors, notably for the logistic distribution (Raghunandanan and Srinivasan, 1970) and the normal distribution (Wang, 1980). Raghunandanan and Srinivasan (1970) constructed simplified estimators of the location and scale parameters for complete and symmetrically censored samples for sample sizes 4 ≤ n ≤ 20. The principal objective of this paper is to develop, based on the work of Wang (1980), Weke et al. (2001) and Weke (2005), a simplified linear estimator of the scale parameter of the population when its location parameter is unknown. The proposed procedure is applicable to both censored and uncensored samples and for large sample sizes. Bias and variance of this estimator are examined through Monte Carlo simulations. 2

Deriving Expectation Formulae

Let U1 , U2 , . . . , Un be a random sample of size n from the uniform U (0, 1) distribution with pdf f (u) = 1, 0 ≤ u ≤ 1. Then, the density function of the i-th order statistics Ui:n , 1 ≤ i ≤ n, is given by fi (u; n) =

Γ(n + 1) ui−1 (1 − u)n−i , Γ(i)Γ(n − i + 1)

0 ≤ u ≤ 1.

(2.1)

In the sequel, we will work with some adjustments to the values of i and n, and a corresponding adjustment to the beta distribution given in (2.1). If α and β are positive real numbers such that 1 + α ≤ i ≤ n − β, we consider the modified pdf Γ(n − α − β + 1) ui−α−1 (1 − u)n−i−β , 0 ≤ u ≤ 1. Γ(i−α)Γ(n−i−β +1) (2.2) 0 0 0 0 Let i = i − α and n = n − α − β. Let Ui :n denote a random variable having the pdf of (2.2). The mean of this pdf is fi−α (u; n − α − β) =

i0 , (2.3) n0 + 1 and the central moment of order k is given by i (k − 1)πi0 (1 − πi0 ) h k−2 0) µk = E[(Ui0 :n0 − πi0 )k ] = E (U − π . (2.4) i+1:n+2 i n0 + 2 πi0 = E(Ui0 :n0 ) =

872

Patrick G.O. Weke

To see this, note that Z 1 µk = (u − πi0 )k−1 (u − πi0 ) 0

πi0 (1 − πi0 ) = − n0 + 2

Z 0

1

Γ(n0 + 1) 0 0 0 ui −1 (1 − u)n −i du Γ(i0 )Γ(n0 − i0 + 1)

(u − πi0 )k−1

Γ(n0 + 3) Γ(i0 + 1)Γ(n0 − i0 + 2) 0

0

0

×[i0 (1 − u) − (n0 − i0 + 1)u]ui −1 (1 − u)n −i du (integrating by parts) =

(k − 1)πi0 (1 − πi0 ) n0 + 2

Z

1

0 i0

(u − πi0 )k−2 0

Γ(n0 + 3) Γ(i0 + 1)Γ(n0 − i0 + 2)

0

×u (1 − u)n −i +1 du. Hence, the following results may easily be obtained. (i) E(Ui0 :n0 ) = πi0 . (ii) µ2 =

(2.5)

1 πi0 (1 − πi0 ). n0 + 2

(2.6)

2 πi0 (1 − πi0 )(1 − 2πi0 ). (n0 + 2)(n0 + 3) ( µ ¶ ) 1 − 2πi0 2 3πi0 (1 − πi0 ) πi0 ,1 (1 − πi0 ,1 ) + , (iv) µ4 = n0 + 2 n0 + 4 n0 + 3

(iii) µ3 =

where πi0 ,1 =

(2.7) (2.8)

i0 + 1 . n+3

(v) For any non-negative integer r, it is possible to find a quantity M , which does not depend on n0 and i0 , such that γr , the absolute central moment of order r of Ui0 :n0 , satisfies M , nr/2 M < (r+1)/2 , n

γr < and γr

if r is even (2.9) if r is odd.

The inequalities in (2.9) can be proved by induction from (2.4). Let X1:n , . . . , Xn:n be order statistics from a continuous and strictly increasing distribution function F .

Scale parameter estimation for logistic distribution

873

Definition 1. For any fixed c ∈ [0, 1], a sequence of order statistics {Xin :n }, n = 1, 2, . . ., is called a c-sequence, if in /n → c as n → ∞. Definition 2. For ε > 0, an ε-neighbourhood of the convergent sequence ζr0 , r = 1, 2, . . ., is defined as the set of all points ζ satisfying the inequalities |ζ − ζr0 | ≤ ε, for all r. Let G denote the inverse of the distribution function F . Let H denote the function H(u) = uα (1 − u)β , 0 ≤ u ≤ 1, (2.10) where α and β are positive constants. We use the notations Lr (u) = u(1 − u)

G(r+1) (u) , G(r) (u)

r = 1, 2, 3,

(2.11)

H (r) (u) , H(u)

r = 1, 2, 3,

(2.12)

and Kr (u) = ur (1 − u)r

where G(r) (u) and H (r) (u) denote the r-th order derivatives of G(u) and H(u), respectively. Further, let 1 (2.13) Q1 (πi0 ; α, β) = K1 (πi0 ) + L1 (πi0 ) and 2µ ¶ 1 K1 (πi0 ) Q2 (πi0 ; α, β) = − K2 (πi0 ) 2 +1 2 L1 (πi0 ) · ¸ 2 K2 (πi0 ) + (1 − 2πi0 ) 3 + 3K1 (πi0 ) + L2 (πi0 ) 3 L1 (πi0 ) " # 0 ) + (n0 + 3)−1 (1 − 2π 0 )2 1 πi,1 (1 − πi,1 i + 4 πi0 (1 − πi0 ) · ¸ K3 (πi0 ) ×4 +6K2 (πi0 )+4L2 (πi0 )K1 (πi0 )+L2 (πi0 )L3 (πi0 ) L1 (πi0 ) (2.14) Theorem 2.1. Let Xi:n, . . . Xn:n be order statistics from the distribution F , such that the following conditions hold. (a) For some positive real numbers α and β, the inverse G of the distribution function F is such that G(u)uα (1 − u)β is bounded in the closed interval [0, 1];

874

Patrick G.O. Weke

(b) G(u) and its first four derivatives are continuous and bounded in the closed interval [0, 1], and G(r) is bounded in the open interval (0, 1), possibly with the exception of a finite number of points; (c) {Xin :n } is a c-sequence, where c ∈ (0, 1); and (d) 0 ≤ α < in and 0 ≤ β < n − in + 1. Then, with the simplified notations: i for the index in , i0 for i − α and n0 for n − α − β, the following holds. E(Xi:n ) = G(πi0 ) + +

2(n0

n0

1 G(1) (πi0 )Q1 (πi0 ; α, β) +2

£ ¤ 1 G(1) (πi0 )Q2 (πi0 ; α, β) + O (n0 + 1)−3 2 + 1)

(2.15)

for all 1 ≤ i ≤ n. Proof. E(Xi:n ) = = = =

n! (i − 1)!(n − i)!

Z



x[F (x)]i−1 [1 − F (x)]n−i f (x)dx (2.16)

−∞

Z 1 n! G(u)ui−1 (1 − u)n−i du (i − 1)!(n − i)! 0 Z 1 n! G(u)H(u)ui−α−1 (1 − u)n−i−β du (i − 1)!(n − i)! 0 R1 0 0 E(GH) 0 G(u)H(u)fi (u; n )du = . (2.17) R1 E(H) H(u)fi0 (u; n0 )du 0

The expectations in the last expression in (2.17) are calculated by using the pdf in (2.2). From condition (a), G(u)H(u) in (2.17) is bounded. From (d), fi0 (u; n0 ) may be considered as a pdf, and therefore the integral in the numerator exists and is bounded. For similar reasons, the integral in the denominator of (2.17) exists and bounded, and its value is greater than zero. Hence, the ratio in (2.17) is well defined. From (c), πi0 is located in an ε-neighbourhood of c, which ensures that (d) is satisfied for large n. Denote an ε-neighbourhood of {πi0 } by Ω1 and

Scale parameter estimation for logistic distribution

875

the remainder of the interval [0,1] by Ω2 . Separation of the numerator of (2.17) into two parts leads to Z 1 G(u)H(u)fi0 (u; n0 )du 0 Z Z 0 = G(u)H(u)fi0 (u; n )du + G(u)H(u)fi0 (u; n0 )du Ω1

Ω2

= J1 + J2

(say).

By first considering J2 , and using the fact that GH is bounded, we have, by Chebyschev’s inequality, |J2 | ≤ M ε−k γk , (2.18) where the quantity M is a positive number and k can be any positive integer. Now, using condition (b), the integral J1 can be expanded in Taylor series as G(u)H(u) =

4 ¯ X 1 ¯ [G(u)H(u)](r) ¯ (u − πi0 )r r! u=πi0 r=0 ¯ 1 ¯ + [G(u)H(u)](5) ¯ (u − πi0 )5 , 5! u=u0

(2.19)

where u0 is some number in Ω1 . Since

Z

µr =

Z r

Ω1

0

(u − πi0 ) fi0 (u; n )du +

we have J1 =

Ω2

(u − πi0 )r fi0 (u; n0 )du,

4 ¯ X 1 ¯ (GH)(r) ¯ · µr − H1 + H2 , r! u=πi0

(2.20)

r=0

where H1 and H2

Z 4 ¯ X 1 (r) ¯ · (u − πi0 )r fi0 (u; n0 )du = (GH) ¯ r! u=πi0 Ω 2 r=0 Z ¯ 1 ¯ · (u − πi0 )5 fi0 (u; n0 )du. (GH)5 ¯ = 5! u=u0 Ω2

(2.21) (2.22)

For the same reasoning as for J2 in (2.18), we have |H1 | < M γ5

(2.23)

876

Patrick G.O. Weke

for all positive integer k, where M is a positive number. Since H(·) is positive definite and (GH)(5) |u=u0 is bounded due to condition (b), we also have |H2 | < M γ5 ,

(2.24)

where M is a positive number. Let k = 5 in (2.18) and (2.23). By considering (2.17), (2.18), (2.20), (2.23), (2.24) and (2.9), we obtain 4 ¯ X £ ¤ 1 ¯ E(GH) = (G(u)H(u))(r) ¯ · µk + O (n0 + 1)−3 . r! u=πi0

(2.25)

r=0

Similarly, the denominator in (2.17) simplifies to 4 ¯ X £ ¤ 1 ¯ (H)(r) ¯ · µk + O (n0 + 1)−3 . E(H) = r! u=πi0

(2.26)

r=0

Since µ1 = 0, formula (2.17), together with (2.25) and (2.26), yields E(Xi:n ) GH + 2!1 µ2 (GH)(2) + 3!1 µ3 (GH)(3) + 4!1 µ4 (GH)(4) + O[(n0 + 1)−3 ] (2.27) H + 2!1 µ2 H (2) 3!1 µ3 H (3) + 4!1 µ4 H (4) + O[(n0 + 1)−3 ] " # 1 (GH)(2) 1 (GH)(3) 1 (GH)(4) = G 1 + µ2 + µ3 + µ4 2! GH 3! GH 4! GH   à !2 (2) (2) (3) (4) 1 H 1 H 1 H 1 H  +O[(n0 +1)−3 ] ×1− µ2 + µ2 − µ3 − µ4 2! H 2! H 3! H 4! H   à ! µ à !2  ¶2 (2) (2) (2) (2) H (2) 1 (GH) H 1 H (GH)  = G1+ µ2 − − µ2  − 2! GH H 2 GH.H H à ! à !# 1 (GH)(3) H (3) 1 (GH)(4) H (4) + µ3 − + µ4 − +O[(n0 +1)−3 ] 3! GH H 4! GH H " à ! µ ¶Ã ! 2 1 G(1) H (1) G(2) 1 G(1) H (1) H (2) G(2) H (2) = G 1+ µ2 2 + − µ2 2 + 2! GH G 2 GH 2 GH à ! 1 G(1) H (2) G(2) H (1) G(3) + µ3 3 +3 + 3! GH GH G à !# G(1) H (3) G(2) H (2) G(3) H (1) G(4) 1 +6 +4 + +O[(n0 +1)−3 ], (2.28) + µ4 4 4! GH GH GH G =

Scale parameter estimation for logistic distribution

877

where all the functions are evaluated at πi0 . Multiplying the terms within the bracket in the above equation by G, and by taking out G(2) from the third, the fourth and the fifth terms, we obtain E(Xi:n )

"

= G + µ2 G

1 + µ3 3

(1)

# " Ã ! (2) H (1) 1 G(2) 1 (2) G(1) H (1) 2H + + G −µ2 2 (2) +1 H 2 G(1) 2 H G H

Ã

H (1) G(3) G(1) H (2) +3 3 (2) + (2) H G H G

!

à !# £ ¤ 1 G(1) H (3) G(3) H (1) G(4) G(3) H (2) + µ4 4 (2) +4 (2) +6 + (3) · (2) +O (n0 +1)−3 . 12 H G H G H G G Use of Equations (2.6)–(2.8) and (2.11)–(2.14) gives E(Xi:n ) = G(πi0 ) + +

2(n0

n0

1 G(1) (πi0 )Q1 (πi0 ; α, β) +2

£ ¤ 1 G(1) (πi0 )Q2 (πi0 ; α, β) + O (n0 + 1)−3 , (2.29) 2 + 1)

which completes the proof.

2

Note that Weke et al. (2001) used the above approach but expanded G(u)H(u) only up to the second term. Definition 3. If, for arbitrarily small ε > 0, the function G(·) satisfies (µ )# µ ¶ " ¶ 1 −1+ε 1 k 1+O ln G(u) = −c0 ln u u or

µ G(u) = c0 ln

1 1−u

¶k "

(µ 1+O ln

1 1−u

(2.30)

¶−1+ε )# (2.31)

for some positive constants c0 and k, then G(·) is called asymptotic logarithm transform (abbreviated as ALT) at u = 0 and u = 1, and a random variable having distribution G−1 is called an AL-variable.

878

Patrick G.O. Weke Furthermore, if the error terms in formulae (2.30) and (2.31) satisfy (  " " (µ ¶−1+ε )#(r) ¶−1 )(r) ¶−1+ε # µ µ 1 1 1 O O ln = ln ln , u u u for r = 1, 2, 3, 4, (2.32)

and " (µ O

1 ln 1−u

¶−1+ε )#(r)

( µ  = ln

1 1−u

¶−1 )(r)



"µ O ln

1 1−u

¶−1+ε # ,

for r = 1, 2, 3, 4, (2.33) then from the relations (2.30) and (2.31), it can be deduced that the r-th derivatives of G(u) satisfy "µ (µ ¶k #(r) " ¶−1+ε )# 1 1 ln 1+O ln , for r = 1, 2, 3, 4 G(r) (u) = −c0 u u (2.34) and "µ (µ ¶k #(r) " ¶−1+ε )# 1 1 (r) ln 1+O ln , for r = 1, 2, 3, 4. G (u) = c0 1−u 1−u (2.35) Let Lr (0) = lim Lr (u), and Lr (1) = lim Lr (u), r = 1, 2, 3. Then, it can u→0+

u→1−

be obtained from the relations (2.34) and (2.35) that Lr (0) = −r,

r = 1, 2, 3

(2.36)

r = 1, 2, 3.

(2.37)

and Lr (1) = r,

Let Kr (0) = lim Kr (u) and Kr (1) = lim Kr (u), r = 1, 2, 3. Then, it can u→0+

u→1−

be obtained by using the relation (2.12) that K1 (0) = α,

K2 (0) = α(α − 1) and K3 (0) = α(α − 1)(α − 2)

(2.38)

and K1 (1) = −β,

K2 (1) = β(β − 1) and K3 (1) = −β(β − 1)(β − 2). (2.39)

Scale parameter estimation for logistic distribution

879

Thus, it can be derived from relations (2.36)–(2.39) and (2.13) that Q1 (c; α, β) = α − 1/2 + O(c) = −β − 1/2 + O(1 − c).

(2.40) (2.41)

By choosing α = β = 1/2, we have Q1 (c; 1/2, 1/2) = O[c(1 − c)].

(2.42)

Similarly, by considering equations (2.36)–(2.39) and (2.14), the following equation is obtained µ ¶ ·µ ¶ µ ¶¸ 3 1 10 2 6 1 1− − 1− + 1− Q2 (πi0 ; 1/2, 1/2) = 6 n 4 n 4t n +O [c(1 − c)] + O(n−1 ),

(2.43)

where 4t = min[i − 1/2, n − i + 1/2]. Based on the preceding work, the theorem below is stated and proved. Theorem 2.2. Suppose that conditions (a), (b) and (d) in Theorem 2.1 are satisfied, and (a) G(u) is ALT at both u = 0 and u = 1, (b) equations (2.36) and (2.37) hold, and (c) i, the integer part of nc + 0.5, satisfies 0 < i < n. Then, E(Xi:n + Xi+1:n ) · ½µ ¶ ¾ i O[c(1 − c)] (1) = 2 G(c) + G (c) −c + n n+1 ½ µ ¶ ¾# (2) G (c) 1 1 1 1 + + (i−nc)2 − + + [1+2(i−nc)]O[c(1−c)] 2n2 6 4 4i 4i+1 +O(n−3 ).

(2.44)

Proof. From condition (c), {Xi:n } and {Xi+1:n } are c-sequences. Suppose that conditions (a), (b) and (d) in Theorem 2.1 are satisfied. Using

880

Patrick G.O. Weke

Equation (2.29), the following equation is obtained. E(Xi:n + Xi+1:n ) 0 +1 ½ ¾ iX 1 1 (2) (1) = G(πk )+ 0 G (πk )Q2 (πk ; α, β) G (πk )Q1 (πk ; α, β)+ n +2 2(n0 +1)2 k=i0 £ ¤ +O (n0 + 1)−3 . (2.45) Suppose that α = β = 1/2. By using conditions (a) and (b) of the theorem and substituting equations (2.42) and (2.43) into relation (2.45), we obtain

E(Xi:n + Xi+1:n ) =

0 +1 · iX

1 G(1) (πk )O[c(1 − c)] n + 1 k=i0 ¾¸ ½ 1 1 (2) 1 + O[c(1 − c)] + 2 G (πk ) − − 2n 12 24k ¡ ¢ + O n−3 . (2.46) G(πk ) +

Expanding G(πk ), G(1) (πk ) and G(2) (πk ) at πk = c up to the second, the first and the constant terms, respectively, and putting together terms with higher order in O(n−3 ), we obtain E(Xi:n + Xi+1:n )

£ ¤ 1 = 2G(c) + G(1) (c)(πi0 + πi0 +1 − 2c) + G(2) (c) (πi0 − c)2 + (πi0 +1 − c)2 2 O[c(1 − c)] O[c(1 − c)] + G(2) (c)(πi0 + πi0 +1 − 2c) +2G0 (c) n +½1 n + 1¾ µ ¶ 1 (2) 1 1 1 1 +2 2 G (c) − − + + O[c(1 − c)] + O(n−3 ) 2n 12 4 4i 4i+1 ¶ ¾ · ½µ O[c(1 − c)] i (1) −c + = 2 G(c) + G (c) n n+1 ½ µ ¶ ¾¸ 1 (2) 1 1 1 1 2 + 2 G (c) +(i−nc) − + + [1 + 2(i−nc)]O[c(1−c)] 2n 6 4 4i 4i+1 +O(n−3 ).

Hence the proof.

(2.47) 2

Scale parameter estimation for logistic distribution

881

Finally, it can easily be seen that for the standard logistic distribution, µ ¶ µ ¶ µ ¶ u u 1 G(u) = log = log − 1−u 1−u u ( µ µ ¶k " ¶−1+ε )# u 1 1 + O ln = c0 ln 1−u 1−u ( µ ¶ )# µ ¶k " 1 1 −1+ε −c0 ln 1 + O ln , (2.48) u u for c0 = k = 1. Hence, G(u) is asymptotic logarithm transform at both u = 0 and u = 1. Hence, it follows that the function GH is bounded. Consequently, it is concluded that a random variable having the logistic distribution is an AL-variable. The inverse function, G(u), of the logistic distribution also satisfies condition (b) in Theorem 2.2. Thus if i is the integer part of nc + 0.5, then equation (2.44) can be applied. 3

Estimator of Standard Deviation

The method discussed here is based on a single spacing and the expectation of the sum of consecutive order statistics in a sample of size n. Let a spacing c be defined in relation to the rank i of the order statistic Xi:n and sample size n, so that i is the integer part of nc+0.5. Since limn→∞ i/n = c, it can therefore be easily shown that σ ˆ1 , the desired estimator of σ, is asymptotically unbiased. Using the fact that the inverse function is √ ¶ µ 3 u −1 , τ= , 0 < u < 1, (3.1) G(u) = F (u; 0, 1) = τ ln 1−u π and by noting that i−0.5 ≤ c ≤ i+0.5 n n , the value the expansion point for the expectation in (2.44). simplifies to " ½ G(2) (c∗ ) 1 ∗ E(Xi:n + Xi+1:n ) = 2 G(c ) + − 2n2 6

c∗ = i/n is chosen as Thus, equation (2.44) 1 4

µ

1 1 + 4i 4i+1

+ O(n−3 ). An approximate expression for −E(Xi:n + Xi+1:n ) is · µ ¶ ½ µ ¶¾¸ n−i n(n − 2i) 1 1 1 1 Ei = 2τ ln + 2 − + . i 2i (n − i)2 6 4 4i 4i+1

¶¾#

(3.2)

(3.3)

882

Patrick G.O. Weke

Then, Ei = −En−i due to symmetry. Let Ri denote the i-th sample quasi-range: Ri = Xn−i:n − Xi+1:n , 1 ≤ i