On Variance Estimators for Normal Population - InterStat

1 downloads 0 Views 285KB Size Report
Sample Counterpart of the Searles' Normal Mean Estimator”. The estimators per ... Universally Minimum Variance Unbiased Estimate (UMVUE). In this paper we ...
On Efficient Iterative Estimation Algorithm Using Sample Counterpart of the Searles’ Normal Mean Estimator with Exceptionally Large But Unknown Coefficient of Variation 1

Winston A. Richards, 2Robin Antoine, *2Ashok Sahai, and 3M. Raghunadh Acharya 1

Department of Mathematics and Statistics The Pennsylvania State University, Harrisburg 2

Department of Mathematics & Computer Science; Faculty of Science & Agriculture St. Augustine Campus; The University of The West Indies. Trinidad & Tobago; West Indies. & Department of Statistics and Computer Science, Aurora’s Post Graduate College; Osmania University, Hyderabad (Andhra Pradesh); India. 3

Abstract This paper addresses the issue of finding an optimal estimator of the normal population mean when the coefficient of variation is unknown but is expected to be exceptionally high, as per the pilot surveys of the population at hand. The paper proposes an “Efficient Iterative Estimation Algorithm Using Sample Counterpart of the Searles’ Normal Mean Estimator”. The estimators per this strategy have no close form, and hence are not amenable to an analytical study determining their relative efficiencies as compared to the usual unbiased sample mean estimator. Nevertheless, we examine these relative efficiencies of our estimators with respect to the usual unbiased estimator x by means of an illustrative numerical empirical study. MATLAB 7.7.0.471 (R2008b) is used in programming this illustrative „Simulated Empirical Numerical Study‟.

Key words and phrases: MVUE, MMSE, Complete Sufficient Statistic, Numerical Study. AMS Classification: 62F10.

[*[email protected] : Corresponding Author’s Email Address] 1

1 Introduction This paper addresses the issue of finding an optimal estimator of the normal population mean when the coefficient of variation is unknown but is expected to be exceptionally high, as per the pilot surveys of the population at hand. This might well happen to be the case particularly if the population mean θ is expected to be positive but very close to zero/ very small, even though the population standard deviation, say σ happens to be relatively quite large. Such cases are very exceptional but are known to be relevant practically in the statistical modeling applications in the areas of astronomy, stock-markets, biodiversity, and environmental sciences, etc. Searls (1964), Khan (1968), Gleser & Healy (1976), and Arnholt & Hebert (1995) considered the problem of estimating the normal population mean, when its coefficient of variation (c. v.) is known. It is very well known that the Searles (1964) estimator for the normal population mean gets to be: Say, SE = x / (1 + V2); wherein „V‟ is the coefficient of variation of „ x ‟.

(1.1)

As such, c. v. ( x ) = σ/ (√n. θ) = V.

(1.2)

The Searles (1964) estimator is known to be “MMSE”, and hence optimal. But this estimator could be used only when for such a Normal population, N (  , a2), a = V2 is known. _

In that case, the sufficient statistic ( X ; S2) is not complete. Consequently, when estimating the mean, variance, or essentially any function of the unknown parameter θ, we are groping in the dark when attempting to find its Universally Minimum Mean Squared Error Estimate (UMMSE), or to find the Universally Minimum Variance Unbiased Estimate (UMVUE). In this paper we have, however, been motivated by the results of Searls (1964) leading to the estimator “SE”, as in (1.1) concerning the estimation of the normal population mean when its ‟C.V‟ is unknown. The fact that „V‟ might not be known is a reality much more often in practice than when it is known. In fact the assumption that „V‟ is quite closely known is also seldom justified. As such, this paper proposes the estimator using the sample counter-part of “V2”, namely: Say v2 = S2/ {n.( x )2}; x and S2 being, respectively, the sample mean & the sample variance.

(1.3)

Hence, the initially proposed estimator of the Normal Population Mean „θ‟ happens to be as below: Say, SE (0) = x / (1 + v2); wherein „v2‟ is the estimated squared c. v. of „ x ‟, as in (1.3)

(1.4) 2

2 The Proposed Estimators We start by noting that the aforesaid estimator SE ( 0 ) involves infinite terms containing “ x and S2”, as per the „Taylor‟s Expansion! Hence, the proposed estimator SE ( 0 ) has no close-form expression, and thus is not amenable to any analytical study determining its relative efficiency as compared to the usual unbiased sample mean estimator. The only open recourse is to study this problem numerically with a large number of simulated samples of various illustrative sizes (Say „n‟) from the parent normal population with illustrative values of the population mean and variance or standard deviation. This type of “Simulated Numerical Empirical Study” has been attempted later in the subsequent section considering “51,000 replications” via simulated samples. But, before we take to that, we take up an observation which is simple but significant as follows. That the estimator SE ( 0 ) is intuitionally expected to perform better than the usual unbiased estimator „ x ‟, as is duly ascertained by the “Simulated Numerical Empirical Study”, we could achieve a better estimator by using that instead of x in the “Initially” proposed estimator SE ( 0 ) [At the Iteration # „0‟ of our proposed „Iterative Algorithm‟], as in (1.4). This leads us to propose: Say SE ( 1 ) = SE ( 0 )/ (1 + v2)

(2.1)

This happens to be the FIRST “Iteration”/ step of our proposed “Efficient Iterative Estimation Algorithm Using Sample Counterpart of the Searles’ Normal Mean Estimator with Unknown Coefficient of Variation”. In fact exactly analogously reasonable will be the proposition of the following estimator, inasmuch as it is intuitionally expected that the estimator SE ( 1 ) would perform better than SE ( 0 ), as is later duly ascertained by the “Simulated Numerical Empirical Study”. Say SE ( 2 ) = SE ( 1 )/ (1 + v2)

(2.2)

In general, now, we are ready to define our “Iterative Algorithm” to achieve more efficient estimator of the normal population mean, as below: Say SE ( I + 1 ) = SE ( I )/ (1 + v2) ; I (Iteration #) = 0 (1) M [Any chosen positive Integer].

(2.3) 3

Now, to quantify the improvement brought forth by these iteratively generated estimators proposed in our paper, we have taken to an empirical study in the following section. We define the Relative Efficiency, Reff ( . ) of the estimator at hand, relative to x , i.e., say,T in equation (2.1) follows in (2.4)  Reff ( • ) = [100.MSE ( • ) / V ( T )] % ; wherein MSE ( • ) = E [ • - θ ]2 .

(2.4)

In above, MSE ( • ) = V ( • ) + B2 ( • ) stands for the MSE {Mean Square Error} of the estimator at hand, V ( • ) for its variance, and B ( • ) for its bias. In this context, we have considered the estimators for their „Relative Efficiencies‟ in the following order as per the “Iteration #”, # = 1 (1) 7 for the illustration. 3 The Simulation Empirical Numerical Study In the preceding section, it is apparent, from the fact that in the absence of any „closed form analytical expressions‟ of the Reff ( • )‟s, that the answer to the question as to what is the relative gain/ achievement in pursuing the „Iteratively More Efficient Estimation‟ of the normal population mean lies in trying to know it through an illustrative “Simulation Empirical Numerical Study”, as is attempted in this section. These Reff ( • )„s have been calculated for NINE illustrative values of the population standard deviation, namely σ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, and 5.0. We could assume that, without any loss of generality & for the simplicity of the illustration, the NORMAL population mean θ = 0.1 [i.e. positive, but very close to zero!). Thus, The value of V [i.e., the c. v. ( x )] will be at least 0.5 even when n = 101! As also, we have carried out the illustrative numerical study for NINE examplevalues of the sample size, namely n = 3, 5, 7, 11, 21, 41, 61, 81, and 101. The values of the actual MSE‟s are calculated by considering the random samples of size „n‟ using ’51,000’ replications (pseudo- random normal samples of the size „n‟) for various estimators SE ( I )’s with I = 0 ( 1 ) 7 as also for the usual unbiased estimator, namely the sample mean “ x “. Hence values of Reff ( • )‟s are calculated as per (2.4). These Reff ( • )‟s are reported to the closest three decimal places of their respective actual values in nine tables in the APPENDIX, corresponding to each of the NINE illustrative values of the sample-size „n‟. MATLAB 7.7.0 (R2008b) is used in programming the calculations in this illustrative „Simulated Empirical Numerical Study‟. 4

4 Conclusions As expected, the “Relative Efficiencies” of the proposed “Iteratively More Efficient” estimators of the „Normal population mean‟ are progressively better with the increase of the “Iteration Number I” in the impugned estimator SE ( I ). Though we have limited the illustration up to I = 7, clearly the indications are well-supportive of the fact that we could do better by proceeding further to generate progressively more efficient estimators, till we please! Incidentally, this achievement has been the motivating aim behind this paper. It might be remarked in this context, that the extent-and-the-stage till which the proposed “Algorithm” of generating iteratively more efficient estimator depends on the largeness of the coefficient-of-variation “V” [i.e., the c. v. ( x )] as would be available through the value of its sample counterpart “v”,. This value of “v” might be used with the value of “ x ” [as an estimate of θ] together with that of S2 [as an estimate of σ2] to seek the guidance, from the outlined „Empirical Simulation Study‟ used in this paper, regarding the desirable number of “Iteration” till which the estimator if gainfully improvable.

5

References Arnholt, A. T. & Hebert, J. E. (1995), “Estimating the Mean with known Coefficient of Variation”, The American Statistician, 49, 367-369. Gleser, L. J., & Healy, J.D. (1976), “Estimating the Mean of a Normal Distribution with Known Coefficient of variation”, Journal of the American Statistical Association, 71, 977-981. Khan, R. A. (1968), “A Note on Estimating the Mean of a Normal Distribution with Known Coefficient of Variation”, Journal of the American Statistical Association, 63, 1039-1041. Searls, Donald T. (1964), “The Utilization of a known Coefficient of Variation in the Estimation Procedure”, Journal of the American Statistical Association, 59, 1225-1226.

6

APPENDIX.

Table A.1. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n=3. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 174.814 176.043 176.649 178.684 176.892 178.058 177.978 177.259 178.146 SE(1) 240.671 244.337 246.216 251.177 246.931 249.547 249.782 247.904 249.766 SE(2) 302.142 309.264 312.985 321.285 314.462 318.651 319.427 316.141 318.972 SE(3) 360.532 371.962 378.032 390.014 380.616 386.543 387.946 383.062 386.898 SE(4) 416.431 432.914 441.775 457.797 445.826 453.708 455.752 449.113 454.041 SE(5) 470.175 492.366 504.421 524.853 510.292 520.380 523.045 514.529 520.666 SE(6) 521.981 550.473 566.097 591.301 574.120 586.683 589.937 579.447 586.935 SE(7) 572.005 607.339 626.885 657.208 637.379 652.689 656.499 643.960 652.952

Table A.2. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 5. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 185.661 188.693 191.462 192.034 191.305 193.054 191.513 191.440 192.398 SE(1) 272.820 282.922 290.547 292.612 290.787 295.610 291.362 292.126 294.582 SE(0) 362.504 384.548 399.612 403.820 401.177 409.940 402.253 404.893 409.163 SE(0) 453.465 492.899 518.466 525.490 522.803 536.044 524.582 530.174 536.454 SE(0) 544.212 606.847 646.333 657.095 655.454 673.513 658.286 667.948 676.404 SE(0) 633.395 725.175 782.217 798.005 798.710 821.811 803.138 818.038 828.859 SE(0) 719.909 846.678 925.019 947.536 952.047 980.366 958.831 980.183 993.622 SE(0) 802.923 970.224 1073.600 1104.965 1114.882 1148.614 1125.016 1154.069 1170.473

Table A.3. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 7. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 188.754 194.119 195.436 198.016 197.067 197.739 198.384 197.723 198.371 SE(1) 282.463 300.279 305.542 312.289 310.383 311.833 314.799 312.009 314.928 SE(2) 380.138 419.471 432.740 445.450 443.453 445.964 453.147 446.627 453.822 SE(3) 478.528 549.891 576.513 597.621 597.485 601.353 614.730 602.933 616.915 SE(4) 574.453 688.892 735.378 768.210 773.019 778.503 800.139 781.679 805.512 SE(5) 665.404 833.491 907.306 956.188 970.114 977.482 1009.533 983.290 1020.621 SE(6) 749.697 980.684 1089.925 1160.159 1188.405 1198.020 1242.712 1207.940 1263.004 SE(7) 826.434 1127.671 1280.670 1378.427 1427.123 1439.556 1499.158 1455.581 1533.194

7

Table A.4. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 11. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 189.017 197.213 200.802 201.118 202.402 202.872 204.154 204.443 203.169 SE(1) 281.815 310.608 321.196 324.221 329.170 330.643 334.884 335.434 332.544 SE(2) 374.481 439.786 462.562 473.243 485.113 488.718 498.617 499.481 494.874 SE(3) 461.851 580.448 622.932 648.931 672.115 679.750 699.342 700.946 694.645 SE(4) 540.340 726.941 798.878 850.368 890.688 905.165 940.085 943.491 935.650 SE(5) 608.275 873.473 986.077 1075.202 1140.087 1165.274 1222.957 1230.108 1221.098 SE(6) 665.500 1014.965 1179.773 1319.820 1418.307 1459.241 1549.014 1562.967 1553.499 SE(7) 712.799 1147.560 1375.229 1579.635 1722.130 1785.075 1918.106 1943.228 1934.508

Table A.5. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 21. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 180.185 194.376 201.043 203.076 205.809 206.065 206.304 207.305 209.363 SE(1) 254.703 299.807 323.251 331.894 340.728 342.481 343.629 347.550 353.869 SE(2) 318.856 411.633 465.554 488.643 509.297 515.722 519.037 528.891 542.572 SE(3) 369.826 522.273 621.895 670.663 711.617 728.471 736.478 756.504 781.827 SE(4) 407.931 625.002 784.075 872.268 944.577 980.561 997.742 1033.556 1076.313 SE(5) 435.221 715.438 943.664 1085.721 1202.164 1268.798 1302.180 1360.869 1428.642 SE(6) 454.181 791.760 1093.580 1302.453 1476.148 1587.022 1646.442 1736.559 1838.833 SE(7) 467.047 854.143 1228.989 1514.325 1757.147 1926.629 2024.505 2155.892 2303.946

Table A.6. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 41. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 161.301 185.145 196.051 200.751 203.671 205.332 207.587 208.812 209.446 SE(1) 204.631 269.001 304.146 321.270 332.152 338.887 347.183 351.520 354.707 SE(2) 232.780 344.990 417.936 458.528 485.576 503.290 523.820 534.609 544.239 SE(3) 249.556 408.017 528.210 604.521 658.660 695.969 737.351 759.994 782.597 SE(4) 258.754 456.731 627.454 749.963 843.048 910.654 983.421 1025.673 1070.830 SE(5) 263.246 492.476 711.509 886.670 1029.117 1138.356 1253.867 1325.595 1405.776 SE(6) 264.941 517.747 779.438 1009.002 1207.868 1368.888 1537.843 1650.144 1779.773 SE(7) 265.035 535.121 832.465 1114.196 1372.362 1592.562 1823.534 1987.354 2181.193

8

Table A.7. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 61. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 147.479 174.867 189.005 195.680 199.802 204.528 206.676 207.160 208.812 SE(1) 173.930 239.416 281.333 303.763 319.159 334.870 342.576 345.886 352.211 SE(2) 187.009 290.384 370.058 418.781 455.291 490.519 509.308 520.261 536.457 SE(3) 192.291 327.308 448.071 532.066 600.367 664.929 702.506 728.684 762.611 SE(4) 193.412 352.356 511.712 635.988 745.017 848.276 913.655 965.128 1027.021 SE(5) 192.465 368.502 560.707 725.795 880.826 1029.833 1131.808 1219.940 1321.329 SE(6) 190.582 378.436 596.841 799.834 1001.957 1200.259 1345.810 1481.397 1633.450 SE(7) 188.353 384.218 622.658 858.747 1105.586 1353.073 1546.323 1737.750 1949.479

Table A.8. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 81. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 135.987 166.493 183.625 192.637 198.761 200.873 205.527 206.201 207.717 SE(1) 151.618 217.696 264.042 292.563 313.485 322.491 337.562 341.951 347.747 SE(2) 156.835 253.488 334.739 392.500 438.303 462.300 495.295 509.047 523.999 SE(3) 157.030 276.221 391.276 483.970 563.288 612.285 671.490 702.781 734.311 SE(4) 155.034 289.525 433.197 561.509 679.166 762.571 855.512 913.994 971.592 SE(5) 152.234 296.643 462.556 623.364 779.793 904.087 1036.053 1131.027 1224.857 SE(6) 149.274 299.932 482.234 670.503 862.726 1030.415 1203.637 1342.198 1481.239 SE(7) 146.441 300.943 494.924 705.241 928.405 1138.357 1352.072 1537.989 1728.415

Table A.9. Relative Efficiencies (%) of SE(I) ~ Searles Estimated Estimator at the Iteration # I = 0 (1) 7. For Sample Size ~ n= 101. SE(I) σ =1.000 σ =1.500 σ =2.000 σ =2.500 σ =3.000 σ =3.500 σ =4.000 σ =4.500 σ =5.000 SE(0) 127.160 158.926 177.122 188.143 194.840 199.448 201.880 205.066 206.682 SE(1) 135.655 199.183 245.644 278.938 300.471 315.418 326.224 336.558 343.396 SE(2) 136.322 224.176 301.196 365.050 410.094 442.933 470.514 494.106 512.221 SE(3) 133.887 238.180 342.391 439.531 514.328 572.921 626.441 671.149 708.697 SE(4) 130.453 245.162 370.971 499.202 605.998 696.367 783.569 857.586 923.577 SE(5) 126.908 247.975 389.851 544.295 681.679 806.635 932.170 1042.200 1144.733 SE(6) 123.599 248.439 401.825 576.938 741.237 900.316 1065.238 1215.125 1359.810 SE(7) 120.639 247.642 409.102 599.825 786.505 976.844 1179.180 1369.516 1558.672

9

Suggest Documents