Optimal Combinations of Pairs of Estimators

Alan T. Arnholt, Associate Professor
Department of Mathematical Sciences
Appalachian State University
Boone, NC 28608
[email protected]

Jaimie L. Hebert, Associate Professor and Chair
Department of Mathematics, Computer Science, and Statistics
Sam Houston State University
Huntsville, TX 77341
mth [email protected]

Abstract

Several authors consider the optimization of linear combinations of independent estimators with respect to mean squared error. The minimization of variance for convex combinations of estimators having a known correlation coefficient is also considered in the literature. We unify and generalize the results pertaining to these two problems by minimizing mean squared error for linear combinations of dependent estimators. We examine the role of the correlation coefficient in establishing the optimal weights for these combinations and uncover a relationship between these optimal weights and those provided in the literature for minimizing the mean squared error of a single estimator.

Key Words: Weighted estimator, coefficient of variation, mean squared error.

1 INTRODUCTION

Let X = X1, ..., Xn be a random sample from a population with distribution f(x; θ) and let T1(X) ≡ T1 and T2(X) ≡ T2 be estimators of θ. A problem addressed in most mathematical statistics courses is to find weights, α1∗ and α2∗, that optimize the linear combination T = α1 T1 + α2 T2 under a decision criterion such as minimum variance or minimum mean squared error. Although the problem is straightforward for most distributions, several authors have considered it under general assumptions on the population distribution. Despite their common theme, these papers have developed along two different lines, with one collection of papers minimizing variance for convex combinations and the other minimizing mean squared error for non-convex combinations. We unify these results in the present manuscript and present a theorem generalizing both approaches. We begin with a brief overview of the literature on this topic.

Khan (1968) minimizes variance over the set of all convex combinations of two independent, unbiased estimators of the mean θ when sampling from a normal population having a known coefficient of variation, CV = √a. Under these conditions, the estimators T1 ≡ X̄n and T2 ≡ cn s, where s is the sample standard deviation and

cn = √n Γ((n − 1)/2) / (√(2a) Γ(n/2)),   (1)

are unbiased estimators of the mean θ. Since the population is normally distributed, the estimators are also independent. The author optimizes the unbiased linear combination T = α x̄ + (1 − α) cn s with respect to minimum variance. The optimal weight in this case is

α∗ = dn / (dn + n⁻¹ a), where dn = n⁻¹(n − 1) a cn² − 1.   (2)
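As a small numerical illustration, the following Python sketch evaluates cn, dn, and the weight in (2) for a given sample size n and squared coefficient of variation a. The function name and the choice of log-gamma for numerical stability are ours; it simply transcribes formulas (1) and (2) as displayed above.

```python
from math import sqrt, lgamma, exp

def khan_weight(n, a):
    """Evaluate c_n, d_n, and the optimal convex weight alpha* from (1)-(2).

    n : sample size; a : squared coefficient of variation (CV = sqrt(a)).
    Log-gamma is used to avoid overflow for large n.
    """
    # c_n = sqrt(n) * Gamma((n-1)/2) / (sqrt(2a) * Gamma(n/2))
    c_n = sqrt(n) * exp(lgamma((n - 1) / 2) - lgamma(n / 2)) / sqrt(2 * a)
    # d_n as defined in (2)
    d_n = (n - 1) / n * a * c_n**2 - 1
    alpha_star = d_n / (d_n + a / n)   # weight placed on the sample mean
    return c_n, d_n, alpha_star

if __name__ == "__main__":
    print(khan_weight(n=20, a=0.25))
```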

It may be desirable, but it is not necessary, to restrict these combinations to convex combinations. In fact, Gleser and Healy (1976) consider the more general case T = α1 T1 + α2 T2, where α1 + α2 is not necessarily 1, and the Ti are any independent, unbiased estimators

of θ. Their only restriction regarding distributional characteristics is that the ratios vi = θ⁻² Varθ(Ti) are independent of θ for i = 1, 2. The quantity vi is the squared coefficient of variation (CV²) of the estimator Ti. This restriction holds, for instance, when the Ti are unbiased and the CV is known. Since the combination T is not necessarily convex, it is not necessarily an unbiased estimator of θ, and minimum mean squared error is a more appropriate optimization criterion. The authors show that the optimal weights in this case are

α1∗ = v2 / (v1 + v2 + v1 v2) and α2∗ = v1 / (v1 + v2 + v1 v2).   (3)
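A minimal sketch (plain Python; the function name is ours) of combining two independent, unbiased estimates with the weights in (3), where v1 and v2 are the squared coefficients of variation of T1 and T2:

```python
def gleser_healy_combination(t1, t2, v1, v2):
    """Combine two independent unbiased estimates t1, t2 of theta using (3).

    v1, v2 : squared coefficients of variation of the two estimators,
             assumed known and independent of theta.
    Returns alpha1*t1 + alpha2*t2, which is generally a shrunken (biased) estimate.
    """
    denom = v1 + v2 + v1 * v2
    alpha1 = v2 / denom
    alpha2 = v1 / denom
    return alpha1 * t1 + alpha2 * t2

# Example: two unbiased estimates of the same mean with CV^2 = 0.04 and 0.09.
print(gleser_healy_combination(10.2, 9.7, 0.04, 0.09))
```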

In each of these two papers, the authors work with combinations of independent estimators. Samuel-Cahn (1994) considers a more general case by optimizing a convex combination of unbiased, dependent estimators with known correlation coefficient, ρ. The optimal weight in this case is shown to be

α∗ = (1 − ρλ) / (1 − 2ρλ + λ²),   (4)

where λ² = Var(T1)/Var(T2).
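In code (a small Python sketch with our naming), the weight in (4) for the convex combination α T1 + (1 − α) T2 is simply:

```python
def samuel_cahn_weight(rho, lam):
    """Optimal convex weight alpha* = (1 - rho*lam) / (1 - 2*rho*lam + lam^2).

    rho : known correlation between the two unbiased estimators.
    lam : sqrt(Var(T1)/Var(T2)), assumed known and independent of theta.
    """
    return (1 - rho * lam) / (1 - 2 * rho * lam + lam**2)

# When rho = lam the weight is exactly 1, so T1 alone is best among convex combinations.
print(samuel_cahn_weight(0.5, 0.5))   # 1.0
print(samuel_cahn_weight(0.0, 0.5))   # 0.8
```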

The author assumes that λ² is known and independent of θ and provides an interesting collection of observations concerning the value of the optimal weight under various conditions on ρ and λ. It is interesting to note that if the coefficients of variation for both estimators are known and independent of θ, then this restriction is satisfied. Although the author never compares the weight (4) to those in Khan (1968) or Gleser and Healy (1976), the assumption of a known coefficient of variation is common to both. This commonality is the basis of our generalization in Section 2.

These three papers clearly share a common theme, and we examine this relationship more closely in the remainder of the manuscript. In Section 2, we consider the minimization of mean squared error for non-convex linear combinations of dependent estimators. Our results generalize those of Samuel-Cahn (1994) in the same manner that Gleser and Healy (1976) generalize Khan (1968). We provide optimal weights and present a collection of results similar to those of Samuel-Cahn (1994) regarding the relationship among the weights, the correlation coefficient, and the ratio of variances. Several examples are provided to illustrate these results.

2 OPTIMAL WEIGHTS UNDER MINIMUM MEAN SQUARED ERROR

The following theorem is a generalization of the results discussed in Section 1. The proof is straightforward and omitted.

Theorem 2.1. Let X = X1, ..., Xn be a random sample from a population with distribution f(x; θ) and let T1(X) ≡ T1 and T2(X) ≡ T2 be estimators of θ, possibly correlated, with E(Ti) = ki θ for i = 1, 2. Assume that the ratios vi = θ⁻² Varθ(Ti) are independent of θ and that Varθ(T1) < Varθ(T2). Furthermore, assume that the constants λ² = k2² v1 / (k1² v2) and ρ = cov(T1, T2) / √(Varθ(T1) Varθ(T2)) are independent of θ and known. Without loss of generality, we assume that k1 and k2 are both positive. Under these conditions, T∗ = α1∗ T1 + α2∗ T2 has uniformly minimum mean squared error (in θ) among all estimators that are linear in T1 and T2, where

α1∗ = (1 − ρλ) / {k1 [1 − 2ρλ + λ² + (1 + ρ²) v1/k1²]}  and  α2∗ = λ(λ − ρ) / {k2 [1 − 2ρλ + λ² + (1 + ρ²) v1/k1²]}.
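To make the weights concrete, here is a small Python sketch (function name ours) that directly transcribes the expressions for α1∗ and α2∗ as printed above, given k1, k2, v1, v2, and ρ:

```python
from math import sqrt

def theorem21_weights(k1, k2, v1, v2, rho):
    """Weights alpha1*, alpha2* as printed in Theorem 2.1.

    E(Ti) = ki*theta, vi = Var(Ti)/theta^2, rho = corr(T1, T2);
    all are assumed known and independent of theta.
    """
    lam = (k2 / k1) * sqrt(v1 / v2)          # lambda, with lambda^2 = k2^2 v1 / (k1^2 v2)
    common = 1 - 2 * rho * lam + lam**2 + (1 + rho**2) * v1 / k1**2
    alpha1 = (1 - rho * lam) / (k1 * common)
    alpha2 = lam * (lam - rho) / (k2 * common)
    return alpha1, alpha2

# Unbiased, uncorrelated case (k1 = k2 = 1, rho = 0) reduces to the Gleser-Healy
# weights v2/(v1+v2+v1*v2) and v1/(v1+v2+v1*v2).
print(theorem21_weights(1.0, 1.0, 0.04, 0.09, 0.0))
```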

The assumptions in Theorem 2.1 are a culmination of the assumptions required for the theorems appearing in the three papers described in the Introduction. In essence, we must assume that we know the coefficient of variation for the distribution of each estimator and the correlation between the two estimators. Samuel-Cahn (1994) considers convex combinations of two estimators and notes that when ρ = 0 the weights α and (1 − α) are proportional to σ2² and σ1², respectively. In our case, we consider general combinations, but note that when ρ = 0, the estimator

T∗ = T1 / {k1 [1 + k2² v1/(k1² v2) + v1/k1²]} + [k2² v1/(k1² v2)] T2 / {k2 [1 + k2² v1/(k1² v2) + v1/k1²]}

has uniformly minimum mean squared error among all estimators that are linear in T1 and

T2. Further, if T1 and T2 are unbiased, then k1 = k2 = 1 and we have

T∗ = (v2 T1 + v1 T2) / (v1 + v2 + v1 v2).

This is precisely the result (3) provided by Gleser and Healy (1976).

Samuel-Cahn (1994) considers several properties of the weights with respect to the relationship between the correlation coefficient and the variance ratio. We consider similar results in the remainder of the manuscript. Samuel-Cahn (1994) notes that the estimator with smaller variance always gets positive weight when optimizing convex combinations. A similar result can be established when optimizing non-convex combinations under mean squared error. The following corollary provides sufficient conditions under which the estimator with smaller mean squared error has positive weight after optimization.

Corollary 1. If k2 ≤ k1 √(v2/v1), then T1 has smaller mean squared error and α1∗ ≥ 0. When k2 ≥ k1 √(v2/v1), then T2 has smaller mean squared error and α2∗ ≥ 0.

The sufficient conditions, k2 ≤ k1 √(v2/v1) and k2 ≥ k1 √(v2/v1), may seem unusual, but they have a very clear interpretation under closer observation. Recall that E(Ti) = ki θ and vi = θ⁻² Varθ(Ti), i = 1, 2, so the condition implies that the ratio of variances exceeds the ratio of the scale biases for the estimators. This is a sufficient condition to ensure that MSE(T1) is smaller than MSE(T2). The resulting inequalities for the optimal weights follow directly from this inequality.

Samuel-Cahn (1994) also shows that α1∗ = 1 if and only if λ = ρ for convex combinations. It follows that α2∗ = 0 in this case, implying that the estimator T1 has smaller MSE than any convex combination of the two estimators. The following corollary states an analogous result for non-convex combinations.

Corollary 2. Under the conditions of Theorem 2.1, if λ = ρ then α2∗ = 0 and

α1∗ = 1 / {k1 [1 + (λ² / (1 − λ²)) (CV1² + CV2²)]},   (5)

where CVi is the coefficient of variation for Ti , i = 1, 2.

Corollary 2 implies that, as in Samuel-Cahn (1994), the statistic T2 should not be used to reduce MSE. But, unlike the Samuel-Cahn result, equation (5) implies that α1∗ is not necessarily 1 when minimizing over non-convex combinations. In fact, α1∗ may be greater than 1. With α2∗ = 0, we essentially construct a new estimator that reduces the mean squared error of T1 by premultiplication by a constant. This technique for reducing the MSE of a single statistic and the value of the constant in (5) are similar to the approach taken in Searls (1964) and Arnholt and Hebert (1995). The following examples illustrate the use of Theorem 2.1 and point out the similarity between this result and those of Searls (1964) and Arnholt and Hebert (1995).
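To illustrate the premultiplication idea with a sketch in Python: for an unbiased statistic T1 with squared coefficient of variation CV², the Searls (1964) multiplier is 1/(1 + CV²), and the exact MSE formula below shows the resulting reduction. The function name and the numerical values are ours.

```python
def mse_of_scaled(c, cv2, theta=1.0):
    """Exact MSE of c*T1 when T1 is unbiased for theta with CV^2 = cv2.

    MSE(c*T1) = theta^2 * ((c - 1)^2 + c^2 * cv2).
    """
    return theta**2 * ((c - 1) ** 2 + c**2 * cv2)

cv2 = 0.10                      # squared coefficient of variation of T1
c_searls = 1 / (1 + cv2)        # Searls-type multiplier for an unbiased statistic
print(mse_of_scaled(1.0, cv2))       # MSE of T1 itself: 0.100
print(mse_of_scaled(c_searls, cv2))  # smaller MSE: cv2/(1 + cv2), about 0.0909
```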

3 EXAMPLES

Example 1. For i = 1, . . . , n, let Xi be iid with E(Xi) = θ and Var(Xi) = σ². Let Zi be iid, independent of the Xi, with E(Zi) = 0 and Var(Zi) = τ². Assume that the squared coefficient of variation (CV²) for Xi is a known constant a, so that σ² = aθ², and suppose that the variance of Zi satisfies τ² = bθ², where b is a known constant. Note that the assumption on the variance of Zi is identical to the constraint on the variance of Xi, but the coefficient of variation for Zi is undefined since E(Zi) = 0. Now define the random variable Yi = Xi + Zi and consider the estimators of θ defined by T1 ≡ X̄n and T2 ≡ Ȳn. It is easy to verify that

(i) E(T1) = E(T2) = θ,
(ii) Var(T1) = aθ²/n and Var(T2) = (a + b)θ²/n,
(iii) CV1² = a/n and CV2² = (a + b)/n.

Since T2 = (1/n) Σ (Xi + Zi) = T1 + (1/n) Σ Zi, and T1 and Σ Zi are independent, we also have Cov(T1, T2) = E(T1² + (1/n) T1 Σ Zi) − θ² = Var(T1). It follows that ρ = σ/√(σ² + τ²) and λ² = Var(T1)/Var(T2) = σ²/(σ² + τ²), so ρ = λ and the optimal weights for combining T1 and T2 are

α1∗ = 1 / [1 + (a/b)(2a + b)/n]

and α2∗ = 0. It is interesting to note that while T2 is not used in the optimal combination, its coefficient of variation is included in the weight of T1.
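A quick simulation sketch (Python with NumPy; the normal distributions chosen for Xi and Zi are our own choice, made only to satisfy the stated moment conditions) can be used to check the moment claims above, in particular that the sample correlation between T1 and T2 is close to λ = σ/√(σ² + τ²):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, a, b, n, reps = 5.0, 0.25, 0.50, 20, 200_000

# X_i with mean theta and CV^2 = a; Z_i with mean 0 and variance b*theta^2.
X = rng.normal(theta, np.sqrt(a) * theta, size=(reps, n))
Z = rng.normal(0.0, np.sqrt(b) * theta, size=(reps, n))

T1 = X.mean(axis=1)                 # T1 = sample mean of the X_i
T2 = (X + Z).mean(axis=1)           # T2 = sample mean of the Y_i = X_i + Z_i

rho_hat = np.corrcoef(T1, T2)[0, 1]
lam = np.sqrt(a / (a + b))          # lambda = sigma / sqrt(sigma^2 + tau^2)
print(rho_hat, lam)                 # the two values should be close
```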

Example 2. Let X1, . . . , Xn be iid exponential with f(x) = (1/θ) exp(−x/θ). The estimator T1 ≡ X̄n is unbiased and satisfies Var(T1) = θ²/n and CV1² = n⁻¹. It is easily shown that if X(k) is the kth order statistic of the sample, then

E(X(k)^p) = θ^p p! k! C(n, k) Σ_{i=0}^{k−1} C(k−1, i) (−1)^i (1/(1 + i))^p,

where C(n, k) denotes the binomial coefficient. Now, letting

a_{n,k} = [k! C(n, k) Σ_{i=0}^{k−1} C(k−1, i) (−1)^i / (1 + i)]^{−1},

we find that the estimator T2 ≡ a_{n,k} X(k) is unbiased and satisfies Var(T2) = a_{n,k}² Var(X(k)) = θ² (2/a_{n,k} − 1) and CV2² = 2/a_{n,k} − 1. It follows that λ² = Var(T1)/Var(T2) = 1/[n(2/a_{n,k} − 1)]. Samuel-Cahn (1994) shows that

ρ = λ in this case, so

α1∗ = 1 / {1 + [a_{n,k} / (2n − (n − 1) a_{n,k})] (1/n + 2/a_{n,k} − 1)} = 1/(1 + 1/n) = n/(n + 1).

The optimal estimator then is T∗ = [n/(n + 1)] X̄n. This is precisely the coefficient and estimator determined by Searls (1964) to minimize MSE(X̄n) and by Arnholt and Hebert (1995) to minimize the MSE of general estimators.
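As a numerical check of this final claim (a Python sketch; the simulation setup is ours), the exact MSE of c X̄n for an exponential mean is θ²[(c − 1)² + c²/n], so the coefficient n/(n + 1) reduces the MSE from θ²/n to θ²/(n + 1):

```python
import numpy as np

theta, n, reps = 3.0, 10, 500_000
rng = np.random.default_rng(1)

xbar = rng.exponential(theta, size=(reps, n)).mean(axis=1)

mse_plain = np.mean((xbar - theta) ** 2)                   # MSE of X-bar
mse_searls = np.mean((n / (n + 1) * xbar - theta) ** 2)    # MSE of (n/(n+1)) X-bar

# Exact values: theta^2 / n and theta^2 / (n + 1).
print(mse_plain, theta**2 / n)
print(mse_searls, theta**2 / (n + 1))
```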


REFERENCES

Arnholt, A. T. and J. L. Hebert (1995), “Estimating the Mean with Known Coefficient of Variation,” The American Statistician 49, 367-369.

Gleser, L. J. and J. D. Healy (1976), “Estimating the Mean of a Normal Distribution with Known Coefficient of Variation,” Journal of the American Statistical Association 71, 977-981.

Khan, R. A. (1968), “A Note on Estimating the Mean of a Normal Distribution with Known Coefficient of Variation,” Journal of the American Statistical Association 63, 1039-1041.

Samuel-Cahn, E. (1994), “Combining Unbiased Estimators,” The American Statistician 48, 34-36.

Searls, D. T. (1964), “The Utilization of a Known Coefficient of Variation in the Estimation Procedure,” Journal of the American Statistical Association 59, 1225-1226.
