Range estimation: A Monte Carlo examination of variability, sample size and distributional shape

Robert M. Lynch
Kenneth Monfort College of Business Administration
University of Northern Colorado

Jon Fortune
Department of Health
State of Wyoming

Range estimation is a ‘first’ topic in introductory statistics and is usually discussed in concert with the ‘Empirical Rule’ and Chebyshev’s Theorem. Neither considers sample size, and neither estimates ranges with any precision for small samples. At the same time, popular textbooks omit discussion of the relationship of the range to the standard deviation, sample size and distributional form. In this paper, the authors examine (1) the impact of sample size on the range, (2) the use of the range to estimate the sample standard deviation, and (3) the effect of distributional form on the accuracy of estimation. Monte Carlo methods were used to produce samples for skewed and symmetrical distributions.

Snedecor (1967) described the relationship between sample size, range and standard deviation for samples selected from normal populations. A principal finding is summarized in the abbreviated table below. It suggests that as sample size increases, the range increases, and it identifies some key values for commonly used standard deviation distances. For example, with a sample of size 10, the range can be estimated at 3 standard deviations, with a high and low for the sample estimated at +/- 1.5 standard deviations from the mean.
Table 1
Estimates for the Range

Sample Size    10          25          100         >100
Range          3s          4s          5s          6s
High/Low       +/- 1.5 s   +/- 2.0 s   +/- 2.5 s   +/- 3.0 s
Analysis

Two hundred and fifty samples were generated at each sample size, with sample sizes ranging from 5 to 800. Four distributional forms were considered: Normal, Uniform, Poisson (Mean=3) and Poisson (Mean=10). The latter two illustrate skewed distributions. The simulations were produced using the Excel data analysis functions. Several measures were determined:

a. The Range, Mean and Standard Deviation were determined for each sample.
b. The maximum and minimum of the ratio of the range to the standard deviation (R/S) were determined for each run of 250 samples.
c. The Mean and Median Range were determined for the 250 samples at each sample size.
d. The Mean and Median Standard Deviation were determined for the 250 samples at each sample size.
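The procedure in steps a through d can be sketched in Python. This is only an illustrative sketch, not the authors' method: the paper used the Excel data analysis functions, whereas the code below uses the Python standard library. The names `poisson`, `DRAWS` and `summarize`, the seed, the use of the n - 1 sample standard deviation, and the rule of skipping samples whose standard deviation is zero are all assumptions made for this example.

```python
import math
import random
import statistics

def poisson(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

# The four distributional forms named in the text.
DRAWS = {
    "Normal(0, 1)": lambda rng: rng.gauss(0.0, 1.0),
    "Uniform(0, 1)": lambda rng: rng.random(),
    "Poisson(3)": lambda rng: poisson(rng, 3.0),
    "Poisson(10)": lambda rng: poisson(rng, 10.0),
}

def summarize(draw, n, n_samples=250, seed=1):
    """Summarize range, standard deviation and R/S over n_samples samples of size n."""
    rng = random.Random(seed)
    ranges, stdevs, ratios = [], [], []
    for _ in range(n_samples):
        sample = [draw(rng) for _ in range(n)]
        r = max(sample) - min(sample)
        s = statistics.stdev(sample)  # sample SD with n - 1 denominator (assumed)
        ranges.append(r)
        stdevs.append(s)
        if s > 0:  # R/S is undefined for a constant sample; skip it (assumed rule)
            ratios.append(r / s)
    return {
        "max_rs": max(ratios),
        "min_rs": min(ratios),
        "mean_rs": statistics.mean(ratios),
        "median_rs": statistics.median(ratios),
        "mean_range": statistics.mean(ranges),
        "median_range": statistics.median(ranges),
        "mean_sd": statistics.mean(stdevs),
        "median_sd": statistics.median(stdevs),
    }

# A few of the sample sizes used in the paper; the full study ran 5 to 800.
for name, draw in DRAWS.items():
    for n in (5, 25, 100):
        row = summarize(draw, n)
        print(name, n, {key: round(value, 3) for key, value in row.items()})
```

Because the random number generator differs from Excel's, the output should resemble the values in Tables 2 through 5 in pattern only and will not reproduce them digit for digit.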
The results of each of the simulations are presented in Tables 2 through 5. Table 6 summarizes the range estimates for the distributions and sample sizes examined. Table 7 provides estimates of the standard deviation from the range for each distribution and sample size. Lastly, Table 8 summarizes several general findings and compares the results to Chebyshev’s results.

Tables 2 through 5 describe the results for each distributional form under study. Using Table 2 as an illustration, 250 samples were generated at each sample size from 5 to 800 for a normal variable with mean = 0 and standard deviation = 1. The ratio of the range to the standard deviation (R/S) was computed for each sample. The maximum R/S and minimum R/S for the 250 samples at each sample size are presented, followed by the average R/S and median R/S for each sample size. The average and median standard deviations for the samples are also presented. The median and mean R/S provide insight into the relationship of the range to the standard deviation. The median and mean standard deviation provide a gauge of the underestimation of the standard deviation; as one would expect, the underestimation appears in small samples.

From Table 2, the data suggest that for samples of size 5, the range can be estimated at 2.478 x standard deviation (the median R/S), or about 2.5 standard deviations. Conversely, for a sample of size 5 with a range of 4, the standard deviation (s) can be estimated at 1.6 (4/2.5). Tables 3 through 5 are interpreted in the same way for the other distributions.

Table 6 summarizes the estimation of the range from the standard deviation by distributional form. These values were selected from Tables 2 through 5 and rounded for simple estimation. Table 7 summarizes the estimation of the standard deviation from the range by distributional form. Lastly, Table 8 can be used to estimate the range from the standard deviation across all distributions.
It is this last table that is useful in teaching elementary statistics.

Findings

Normal distribution and Poisson distribution (Mean=10). The ratios of the range to the standard deviation for the two distributions are similar to each other and to those presented in Snedecor (1967). The exception is at larger sample sizes, N=100 and N=200.

Uniform distribution. The ratios are similar at smaller sample sizes but, as expected, the range is limited by the distribution’s range of 0-1.

Poisson distribution (Mean=3). For small sample sizes the results are similar to those for the Normal and Poisson (Mean=10) distributions, but they are less closely aligned as sample size increases.

Discussion

The results presented in Tables 2 through 5 can be helpful in teaching the impact of sample size on the range and the underestimation present in the standard deviation for small samples. As expected, the underestimation decreases quickly as sample size increases, regardless of the distributional form. Each table notes the population standard deviation, so one can observe the sample size at which the sample standard deviation begins to closely approximate the population value. For example, at a sample size of 25, the sample standard deviation begins to approximate the population value for a normal variate. Table 8 provides the student with estimates of the range from the standard deviation for samples of different sizes. Students may find it helpful to understand the impact of sample size on the standard deviation and to estimate the range for a sample when the sample size, mean and standard deviation are reported.
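The worked example in the Analysis section (a sample of size 5 with a range of 4 gives s of about 4/2.5 = 1.6) can be expressed as a small Python helper built on the Table 8 multipliers. This is a sketch for classroom use only. The function names and the rule for sample sizes that fall between tabled values (use the largest tabled size that does not exceed n) are assumptions; the paper does not say how to treat in-between sizes.

```python
# Table 8 multipliers k, where the range is approximately k * s.
# The last entry stands for the ">=200" row.
TABLE_8 = [(5, 2.5), (10, 3.0), (25, 4.0), (50, 4.5), (75, 5.0), (200, 6.0)]

def multiplier(n):
    """Return the Table 8 multiplier for sample size n.

    Assumed rule (not from the paper): use the largest tabled
    sample size that does not exceed n.
    """
    if n < 5:
        raise ValueError("Table 8 starts at a sample size of 5")
    k = TABLE_8[0][1]
    for size, mult in TABLE_8:
        if n >= size:
            k = mult
    return k

def sd_from_range(r, n):
    """Estimate the standard deviation from the range: s = R / k."""
    return r / multiplier(n)

def range_from_sd(s, n):
    """Estimate the range from the standard deviation: R = k * s."""
    return multiplier(n) * s

print(sd_from_range(4.0, 5))   # 1.6, the worked example from the text
print(range_from_sd(2.0, 25))  # 8.0
```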
Acknowledgments

Snedecor, G. W., & Cochran, W. G. (1967). Statistical methods. Ames, IA: Iowa State University Press.
Table 2
Samples Drawn from a Normal Distribution, Mean = 0, StDev = 1

Sample Size   Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5             2.837        1.904        2.475         2.478        0.914           0.862
10            3.913        2.464        3.164         3.146        0.941           0.926
25            5.142        3.185        3.940         3.884        0.998           0.982
50            5.800        3.605        4.496         4.418        0.995           0.998
75            6.770        3.570        4.791         4.760        0.989           0.988
100           9.323        4.026        5.009         4.955        0.999           0.990
200           10.462       4.279        5.543         5.460        1.004           1.006
400           11.646       4.761        6.021         5.897        1.000           0.997
500           11.895       5.005        6.092         6.023        0.997           0.996
600           11.886       5.316        6.214         6.050        0.998           0.996
800           11.861       5.312        6.474         6.389        0.999           0.999

Table 3
Samples Drawn from a Uniform Distribution, Mean = .5, StDev = .289

N             Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5             2.813        1.847        2.402         2.418        0.283           0.288
10            3.740        2.270        2.921         2.900        0.281           0.282
25            3.922        2.649        3.220         3.214        0.289           0.289
50            4.186        2.868        3.360         3.326        0.286           0.288
75            3.894        2.994        3.382         3.392        0.289           0.288
100           3.902        3.079        3.409         3.403        0.288           0.288
200           3.770        3.184        3.449         3.449        0.287           0.287
400           3.682        3.252        3.458         3.458        0.288           0.288
500           3.639        3.253        3.444         3.441        0.289           0.289
600           3.616        3.314        3.450         3.449        0.289           0.289
800           3.607        3.307        3.452         3.451        0.289           0.289

Table 4
Samples Drawn from a Poisson Distribution, Mean = 3, StDev = 1.732

N             Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5             2.828        1.826        2.435         2.434        1.623           1.581
10            3.906        2.145        3.084         3.076        1.695           1.687
25            5.388        2.533        3.838         3.801        1.729           1.720
50            5.910        3.126        4.350         4.334        1.719           1.708
75            6.289        3.288        4.538         4.520        1.712           1.696
100           6.285        3.486        4.691         4.577        1.734           1.736
200           6.546        3.970        5.045         5.002        1.725           1.729
400           7.152        4.329        5.330         5.249        1.728           1.729
500           7.785        4.388        5.480         5.375        1.734           1.734
600           7.499        4.489        5.529         5.413        1.726           1.725
800           7.434        4.455        5.697         5.638        1.729           1.727
Table 5
Samples Drawn from a Poisson Distribution, Mean = 10, StDev = 3.162

N             Max of R/S   Min of R/S   Average R/S   Median R/S   Average StDev   Median StDev
5             2.828        1.932        2.460         2.468        2.918           2.846
10            4.015        2.304        3.152         3.140        3.076           3.019
25            5.196        2.935        3.905         3.881        3.088           3.118
50            5.851        3.359        4.459         4.422        3.148           3.148
75            6.367        3.714        4.799         4.732        3.145           3.120
100           6.560        3.764        4.958         4.924        3.151           3.140
200           7.187        4.448        5.433         5.381        3.170           3.176
400           7.751        4.652        5.795         5.746        3.172           3.178
500           8.047        4.773        5.957         5.961        3.163           3.160
600           8.072        4.879        6.127         6.041        3.162           3.164
800           7.726        5.037        6.228         6.162        3.161           3.157

Table 6
Range Estimation from Std Dev

Sample Size   Normal    Uniform   Poisson (3)   Poisson (10)
5             2.5 s     2.4 s     2.4 s         2.5 s
10            3.0 s     3.0 s     3.0 s         3.0 s
25            4.0 s               4.0 s         4.0 s
50            4.5 s               4.5 s         4.5 s
75                      3.4 s     4.5 s
100           5.0 s                             5.0 s
200           5.5 s               5.0 s         5.5 s
500           6.0 s               5.5 s         6.0 s

Table 7
Standard Deviation Estimated from Range

Sample Size   Normal    Uniform   Poisson (3)   Poisson (10)
5             R / 2.5   R / 2.4   R / 2.4       R / 2.5
10            R / 3.0   R / 3.0   R / 3.0       R / 3.0
25            R / 4.0             R / 4.0       R / 4.0
50            R / 4.5             R / 4.5       R / 4.5
75                      R / 3.4   R / 4.5
100           R / 5.0                           R / 5.0
200           R / 5.5             R / 5.0       R / 5.5
500           R / 6.0             R / 5.5       R / 6.0

Table 8
Estimates of the Range from the Standard Deviation

Sample Size   Estimate   Chebyshev percent
5             2.5 s      84
10            3.0 s      89
25            4.0 s      94
50            4.5 s      95
75            5.0 s      96
>=200         6.0 s      97
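The "Chebyshev percent" column of Table 8 is consistent with Chebyshev's bound, 100(1 - 1/k^2), evaluated at the range multiplier k itself (for example, k = 2.5 gives 84). That reading is inferred from the printed values and is not stated in the paper. The short Python check below reproduces the column under that assumption; the function name and the row list are introduced here for illustration.

```python
def chebyshev_percent(k):
    """Chebyshev lower bound, in percent, on the share of values
    lying within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 100.0 * (1.0 - 1.0 / k ** 2)

# (sample size label, multiplier k, percent printed in Table 8)
TABLE_8_ROWS = [
    ("5", 2.5, 84),
    ("10", 3.0, 89),
    ("25", 4.0, 94),
    ("50", 4.5, 95),
    ("75", 5.0, 96),
    (">=200", 6.0, 97),
]

for label, k, printed in TABLE_8_ROWS:
    # Computed bound, rounded to a whole percent, next to the printed value.
    print(label, k, round(chebyshev_percent(k)), printed)
```

Under this assumption every rounded bound matches the printed percentage.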