Range estimation: A Monte Carlo examination of variability ... - InterStat

Range estimation: A Monte Carlo examination of variability, sample size and distributional shape

Robert M. Lynch, Kenneth Monfort College of Business Administration, University of Northern Colorado
Jon Fortune, Department of Health, State of Wyoming

Range estimation is a ‘first’ topic in introductory statistics and is usually discussed in concert with the ‘Empirical Rule’ and Chebyshev’s Theorem. Neither considers sample size, and neither estimates ranges with any precision for small samples. Popular textbooks also omit any discussion of how the range relates to the standard deviation, sample size and distributional form. In this paper, the authors examine (1) the impact of sample size on the range, (2) the use of the range to estimate the sample standard deviation, and (3) the effect of distributional form on the accuracy of estimation. Monte Carlo methods were used to produce samples from skewed and symmetrical distributions.

Snedecor and Cochran (1967) described the relationship between sample size, range and standard deviation for samples selected from normal populations. A principal finding of this paper is summarized in the abbreviated table below: as sample size increases, the range increases, and the table identifies the standard deviation distances that correspond to several commonly used sample sizes. For example, with a sample of size 10, the range can be estimated at 3 standard deviations, with the sample high and low estimated at +/- 1.5 standard deviations from the mean.
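For normal populations, the relationship the paper attributes to Snedecor and Cochran (1967) can be approximated numerically. A standard result (not stated in the paper) is that the expected range of n draws is d2(n) times sigma, where d2(n) is the integral over all x of [1 - Phi(x)^n - (1 - Phi(x))^n]. The d2 name is borrowed from quality-control notation, and the integration limits and step count below are assumptions chosen to give about three correct decimals; this is a sketch, not the authors' method.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d2(n: int, lo: float = -10.0, hi: float = 10.0, steps: int = 20_000) -> float:
    """E[R] / sigma for n normal draws, by the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        F = normal_cdf(lo + i * h)
        f = 1.0 - F**n - (1.0 - F) ** n   # P(min <= x < max)
        total += f / 2 if i in (0, steps) else f
    return total * h


for n in (10, 25, 100):
    print(n, round(d2(n), 2))
```

This gives about 3.08, 3.93 and 5.02 for n = 10, 25 and 100, close to the 3s, 4s and 5s of the abbreviated table.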

Table 1 Estimates for the Range

Sample Size   10          25          100         >100
Range         3s          4s          5s          6s
High/Low      +/- 1.5 s   +/- 2.0 s   +/- 2.5 s   +/- 3.0 s
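Table 1 can be turned into a quick calculator for the sample high and low. The helpers below are hypothetical and not part of the paper; they assume that a sample size falling between tabulated rows uses the multiplier of the largest tabulated size not exceeding n, and they center the estimated range on the mean, as the High/Low row does.

```python
def table1_multiplier(n: int) -> float:
    """Range-to-SD multiplier from Table 1.

    Assumption (not from the paper): sizes between tabulated rows use the
    row for the largest tabulated size not exceeding n.
    """
    if n < 10:
        raise ValueError("Table 1 starts at a sample size of 10")
    if n > 100:
        return 6.0
    multiplier = 3.0
    for size, m in ((10, 3.0), (25, 4.0), (100, 5.0)):
        if n >= size:
            multiplier = m
    return multiplier


def estimate_high_low(mean: float, sd: float, n: int) -> tuple[float, float]:
    """Estimated sample low and high: the mean -/+ half the estimated range."""
    half_range = table1_multiplier(n) / 2 * sd
    return mean - half_range, mean + half_range


# Abstract example: n = 10, so the range is about 3 s and the high/low about +/- 1.5 s.
print(estimate_high_low(mean=50.0, sd=10.0, n=10))  # (35.0, 65.0)
```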

Analysis

Two hundred and fifty samples were generated at each sample size from 5 to 800. Four distributional forms were considered: Normal, Uniform, Poisson (Mean = 3) and Poisson (Mean = 10). The latter two illustrate skewed distributions. The simulations were produced using the Excel data analysis functions. Several measures were determined:

a. The range, mean and standard deviation of each sample.
b. The maximum and minimum of the range-to-standard deviation ratio (R/S) across the 250 samples at each sample size.
c. The mean and median range of the 250 samples at each sample size.
d. The mean and median standard deviation of the 250 samples at each sample size.
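The simulation procedure above can be sketched with the Python standard library instead of Excel. This is a reconstruction, not the authors' workbook: the generators, Knuth's Poisson sampler, the fixed seed, the n - 1 divisor for the sample standard deviation, and skipping constant samples (R/S is undefined when s = 0, which can happen with Poisson data) are all assumptions. The returned summaries follow the column layout of Tables 2 through 5.

```python
import math
import random
import statistics


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for small means such as 3 and 10."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


# Stand-ins for the Excel generators used in the paper (assumed equivalent).
DISTRIBUTIONS = {
    "Normal": lambda rng: rng.gauss(0.0, 1.0),
    "Uniform": lambda rng: rng.random(),
    "Poisson (3)": lambda rng: poisson(rng, 3.0),
    "Poisson (10)": lambda rng: poisson(rng, 10.0),
}
SAMPLE_SIZES = [5, 10, 25, 50, 75, 100, 200, 400, 500, 600, 800]


def sample_sd(sample):
    """Sample standard deviation with the n - 1 divisor (assumed)."""
    n = len(sample)
    m = math.fsum(sample) / n
    return math.sqrt(math.fsum((v - m) ** 2 for v in sample) / (n - 1))


def summarize(draw, n, reps=250, rng=None):
    """Summaries over `reps` samples of size n, laid out like Tables 2 to 5."""
    rng = rng if rng is not None else random.Random()
    ratios, sds = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        r = max(sample) - min(sample)   # range of this sample
        s = sample_sd(sample)
        sds.append(s)
        if s > 0:                       # a constant sample has no defined R/S
            ratios.append(r / s)
    return {
        "max_rs": max(ratios), "min_rs": min(ratios),
        "mean_rs": statistics.mean(ratios), "median_rs": statistics.median(ratios),
        "mean_sd": statistics.mean(sds), "median_sd": statistics.median(sds),
    }


if __name__ == "__main__":
    rng = random.Random(2024)          # fixed seed so reruns match
    for name, draw in DISTRIBUTIONS.items():
        for n in (5, 25, 100):         # use SAMPLE_SIZES for the full grid
            row = summarize(draw, n, rng=rng)
            print(f"{name:12s} {n:4d} " + " ".join(f"{v:7.3f}" for v in row.values()))
```

Pure Python keeps the sketch dependency-free; running the full SAMPLE_SIZES grid takes a few seconds.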

The results of the simulations are presented in Tables 2 through 5, one table for each distributional form under study. Using Table 2 as an illustration: for a normal variable with mean 0 and standard deviation 1, 250 samples were generated at each sample size from 5 to 800, and the ratio of the range to the standard deviation (R/S) was computed for each sample. The table reports the maximum and minimum R/S across the 250 samples at each sample size, followed by the average and median R/S and the average and median sample standard deviation. The average and median R/S describe the relationship of the range to the standard deviation, while the average and median standard deviation gauge how far the sample standard deviation underestimates the population value. As one would expect, the underestimation appears in small samples. From Table 2, the median R/S for samples of size 5 is 2.478, so the range can be estimated at about 2.5 standard deviations. Conversely, for a sample of size 5 with a range of 4, the standard deviation (s) can be estimated at 1.6 (4/2.5). Tables 3 through 5 are interpreted in the same way for the other distributions.

Table 6 summarizes the estimation of the range from the standard deviation by distributional form; its values were selected from Tables 2 through 5 and rounded for simple estimation. Table 7 summarizes the estimation of the standard deviation from the range by distributional form. Lastly, Table 8 gives estimates of the range from the standard deviation that can be used across all distributions, and compares them with Chebyshev's results.
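The worked example (a range of 4 in a sample of 5 gives s of about 4/2.5 = 1.6) generalizes to s = R / k, with the multiplier k read from the tables. A minimal sketch using the all-distribution multipliers of Table 8; the helper names are hypothetical, and the rule for sizes between rows (use the largest tabulated size not exceeding n, so sizes 75 through 199 use 5.0) is an assumption.

```python
import bisect

# (sample size, multiplier k) pairs from Table 8: the range is roughly k * s.
TABLE8 = [(5, 2.5), (10, 3.0), (25, 4.0), (50, 4.5), (75, 5.0), (200, 6.0)]


def range_multiplier(n: int) -> float:
    """k for the largest tabulated sample size not exceeding n (assumed rule)."""
    sizes = [size for size, _ in TABLE8]
    i = bisect.bisect_right(sizes, n) - 1
    if i < 0:
        raise ValueError("Table 8 starts at a sample size of 5")
    return TABLE8[i][1]


def estimate_sd_from_range(r: float, n: int) -> float:
    """Rough standard deviation estimate: s is about R / k."""
    return r / range_multiplier(n)


print(estimate_sd_from_range(4, 5))  # 1.6, i.e. 4 / 2.5
```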
Of all the tables, Table 8 is the most useful in teaching elementary statistics.

Findings

Normal distribution and Poisson distribution (Mean = 10). The ratios of the range to the standard deviation for these two distributions are similar to each other and to those presented in Snedecor and Cochran (1967). The exception is at the larger sample sizes, N = 100 and N = 200.

Uniform distribution. The ratios are similar at smaller sample sizes but, as expected, the range is limited by the distribution's range of 0 to 1, so the average R/S levels off at about 3.45 for samples of 200 or more.

Poisson distribution (Mean = 3). For small sample sizes the results are similar to those for the Normal and Poisson (Mean = 10) distributions, but they are less closely aligned as sample size increases; at N = 800 the average R/S is 5.697, compared with 6.474 for the Normal and 6.228 for the Poisson (Mean = 10).

Discussion

The results presented in Tables 2 through 5 can be helpful in teaching the impact of sample size on the range and the underestimation of the standard deviation in small samples. As expected, the underestimation decreases quickly as sample size increases, regardless of distributional form. Each table gives the population standard deviation in its title, so one can observe at what sample size the sample standard deviation begins to closely approximate the population value. For example, at a sample size of 25 the average sample standard deviation for a normal variate is 0.998, against a population value of 1. Table 8 provides the student with estimates of the range from the standard deviation for samples of different sizes. Students may find it helpful to understand the impact of sample size on the standard deviation and to estimate the range of a sample when its size, mean and standard deviation are reported.
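The Uniform ceiling noted in the Findings follows from two standard results for Uniform(0, 1) draws, neither taken from the paper: the population standard deviation is 1/sqrt(12), about 0.289, and the expected range of n draws is (n - 1)/(n + 1). Their ratio, E[R]/sigma, rises toward sqrt(12), about 3.46. This sketch compares a ratio of expectations, whereas Table 3 averages R/S sample by sample, so the two should agree closely only for larger n.

```python
import math

SIGMA = 1 / math.sqrt(12)  # population SD of Uniform(0, 1), about 0.2887


def expected_range(n: int) -> float:
    """E[max - min] for n independent Uniform(0, 1) draws."""
    return (n - 1) / (n + 1)


def expected_range_over_sigma(n: int) -> float:
    """E[R] / sigma; increases toward sqrt(12) as n grows."""
    return expected_range(n) / SIGMA


for n in (5, 25, 200, 800):
    print(n, round(expected_range_over_sigma(n), 3))
print("limit", round(math.sqrt(12), 3))
```

For n = 800 this gives about 3.455, close to the 3.452 average R/S in Table 3.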

References

Snedecor, G. W., & Cochran, W. G. (1967). Statistical methods. Ames, IA: Iowa State University Press.

Table 2 Samples Drawn from a Normal Distribution (Mean = 0, StDev = 1)

N     Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5     2.837        1.904        2.475         2.478        0.914           0.862
10    3.913        2.464        3.164         3.146        0.941           0.926
25    5.142        3.185        3.940         3.884        0.998           0.982
50    5.800        3.605        4.496         4.418        0.995           0.998
75    6.770        3.570        4.791         4.760        0.989           0.988
100   9.323        4.026        5.009         4.955        0.999           0.990
200   10.462       4.279        5.543         5.460        1.004           1.006
400   11.646       4.761        6.021         5.897        1.000           0.997
500   11.895       5.005        6.092         6.023        0.997           0.996
600   11.886       5.316        6.214         6.050        0.998           0.996
800   11.861       5.312        6.474         6.389        0.999           0.999

Table 3 Samples Drawn from a Uniform Distribution (Mean = 0.5, StDev = 0.289)

N     Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5     2.813        1.847        2.402         2.418        0.283           0.288
10    3.740        2.270        2.921         2.900        0.281           0.282
25    3.922        2.649        3.220         3.214        0.289           0.289
50    4.186        2.868        3.360         3.326        0.286           0.288
75    3.894        2.994        3.382         3.392        0.289           0.288
100   3.902        3.079        3.409         3.403        0.288           0.288
200   3.770        3.184        3.449         3.449        0.287           0.287
400   3.682        3.252        3.458         3.458        0.288           0.288
500   3.639        3.253        3.444         3.441        0.289           0.289
600   3.616        3.314        3.450         3.449        0.289           0.289
800   3.607        3.307        3.452         3.451        0.289           0.289

Table 4 Samples Drawn from a Poisson Distribution (Mean = 3, StDev = 1.732)

N     Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5     2.828        1.826        2.435         2.434        1.623           1.581
10    3.906        2.145        3.084         3.076        1.695           1.687
25    5.388        2.533        3.838         3.801        1.729           1.720
50    5.910        3.126        4.350         4.334        1.719           1.708
75    6.289        3.288        4.538         4.520        1.712           1.696
100   6.285        3.486        4.691         4.577        1.734           1.736
200   6.546        3.970        5.045         5.002        1.725           1.729
400   7.152        4.329        5.330         5.249        1.728           1.729
500   7.785        4.388        5.480         5.375        1.734           1.734
600   7.499        4.489        5.529         5.413        1.726           1.725
800   7.434        4.455        5.697         5.638        1.729           1.727

Table 5 Samples Drawn from a Poisson Distribution (Mean = 10, StDev = 3.162)

N     Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5     2.828        1.932        2.460         2.468        2.918           2.846
10    4.015        2.304        3.152         3.140        3.076           3.019
25    5.196        2.935        3.905         3.881        3.088           3.118
50    5.851        3.359        4.459         4.422        3.148           3.148
75    6.367        3.714        4.799         4.732        3.145           3.120
100   6.560        3.764        4.958         4.924        3.151           3.140
200   7.187        4.448        5.433         5.381        3.170           3.176
400   7.751        4.652        5.795         5.746        3.172           3.178
500   8.047        4.773        5.957         5.961        3.163           3.160
600   8.072        4.879        6.127         6.041        3.162           3.164
800   7.726        5.037        6.228         6.162        3.161           3.157

Table 6 Range Estimation from Std Dev

Sample Size   Normal   Uniform   Poisson (3)   Poisson (10)
5             2.5 s    2.4 s     2.4 s         2.5 s
10            3.0 s    3.0 s     3.0 s         3.0 s
25            4.0 s    3.4 s     4.0 s         4.0 s
50            4.5 s    3.4 s     4.5 s         4.5 s
75            5.0 s    3.4 s     4.5 s         5.0 s
100           5.0 s    3.4 s     4.5 s         5.0 s
200           5.5 s    3.4 s     5.0 s         5.5 s
500           6.0 s    3.4 s     5.5 s         6.0 s

Table 7 Standard Deviation Estimated from Range

Sample Size   Normal   Uniform   Poisson (3)   Poisson (10)
5             R/2.5    R/2.4     R/2.4         R/2.5
10            R/3.0    R/3.0     R/3.0         R/3.0
25            R/4.0    R/3.4     R/4.0         R/4.0
50            R/4.5    R/3.4     R/4.5         R/4.5
75            R/5.0    R/3.4     R/4.5         R/5.0
100           R/5.0    R/3.4     R/4.5         R/5.0
200           R/5.5    R/3.4     R/5.0         R/5.5
500           R/6.0    R/3.4     R/5.5         R/6.0

Table 8 Estimates of the Range from the Standard Deviation

Sample Size   Estimate   Chebyshev percent
5             2.5 s      84
10            3.0 s      89
25            4.0 s      94
50            4.5 s      95
75            5.0 s      96
>=200         6.0 s      97
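The Chebyshev percent column of Table 8 matches 100(1 - 1/k^2), rounded, when k is the multiplier in the Estimate column. The paper does not state the formula, so this is a reading of the table rather than the authors' stated method. For reference, Chebyshev's theorem guarantees that at least 1 - 1/k^2 of the observations lie within k standard deviations on either side of the mean.

```python
# (sample size, range multiplier k) pairs from Table 8.
CHEBYSHEV_ROWS = [("5", 2.5), ("10", 3.0), ("25", 4.0), ("50", 4.5), ("75", 5.0), (">=200", 6.0)]


def chebyshev_percent(k: float) -> float:
    """Chebyshev lower bound, in percent, for the share within k SDs of the mean."""
    return 100 * (1 - 1 / k**2)


for size, k in CHEBYSHEV_ROWS:
    print(size, round(chebyshev_percent(k)))
```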