Area estimators variance from systematic samples.

Javier Gallego
JRC, I-21020 Ispra, Italy

e-mail: [email protected]

Abstract. We consider two-dimensional systematic samples on a square grid, using CORINE Land Cover as pseudo-ground-truth for a test case. A variance estimator based on comparing each observation with its neighbouring values is compared with the true variance and with the result of applying the usual variance estimator for simple random sampling.
Keywords: land cover area estimation, systematic sampling, area frame sampling.

1. Two-dimensional systematic sampling

Systematic sampling is generally more efficient than simple random sampling (SRS). One-dimensional systematic sampling is optimal if the autocorrelation is positive, decreasing and convex (Bellhouse, 1988). For point sampling in the plane, triangular grids perform slightly better than square grids (Olea, 1984), but square area units are more practical for field surveys. The main drawback of systematic sampling is the absence of an unbiased variance estimator. The classical variance formulae for random sampling are sometimes used, but they generally overestimate the variance. Other options are splitting the sample or combining several replicates (Koop, 1971), but the resulting variance estimates are often unstable, and drawing several replicates reduces efficiency (Gautschi, 1957). Other variance estimators compare each sample element with its neighbours. Wolter (1984) compares several estimators of this type for the one-dimensional case, and Matern (1986) studies similar estimators in the plane under an assumption of stationarity that may be debatable. Here we test neighbourhood variance estimators on semi-real data.
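Why the random-sampling formulae overestimate the variance can be seen on a toy example. The sketch below is hypothetical (not from the paper): it takes a one-dimensional population with a perfect linear trend, y_t = t for t = 0..99, draws all 10 systematic samples with step 10, and compares the exact variance of the systematic sample mean with the SRS variance and with the SRS formula applied to each systematic sample. The function name and the use of the finite-population correction are assumptions of the sketch; only the Python standard library is used.

```python
from statistics import mean, pvariance, variance


def systematic_vs_srs(y, k):
    """Compare variances of the sample mean for 1-D systematic sampling with step k.

    Returns (exact variance under systematic sampling,
             exact variance under SRS without replacement,
             average of the SRS variance formula applied to each systematic sample).
    """
    N = len(y)
    if N % k:
        raise ValueError("this sketch assumes N is a multiple of the step k")
    n = N // k
    fpc = 1 - n / N                                  # finite-population correction
    samples = [y[start::k] for start in range(k)]    # all k possible systematic samples
    # Each random start has probability 1/k, so the exact design variance is the
    # population variance of the k sample means.
    v_sys = pvariance([mean(s) for s in samples])
    v_srs = fpc * variance(y) / n                    # S^2 uses the N - 1 divisor
    v_srs_formula = mean(fpc * variance(s) / n for s in samples)
    return v_sys, v_srs, v_srs_formula


if __name__ == "__main__":
    # Hypothetical population with a perfect linear trend: y_t = t, t = 0..99.
    print(systematic_vs_srs(list(range(100)), k=10))
```

For this strongly autocorrelated population the function returns approximately (8.25, 75.75, 82.5): systematic sampling is about nine times more efficient than SRS, while the SRS formula overestimates its true variance about tenfold.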

2. Test data set and results

CORINE Land Cover (CLC) is a land cover map produced by photo-interpretation of satellite images and additional information (CEC, 1993). We use it as pseudo-truth, so that different estimators can be assessed with the whole population known. We assess area estimates for 7 major land cover types in Andalucía (Spain). A 1 km grid was overlaid, keeping only the cells fully inside the region to simplify the treatment of boundary effects (N = 86715 cells). Each cell (i, j) is identified by a row i and a column j. All 100 possible systematic samples with a 10 km step (1% sampling rate) are drawn. The target variable $y_c$ is the percentage of land cover class c in a cell. The following variances of the estimated mean $\bar{y}_c$ are compared:
• The expected variance under simple random sampling (SRS).
• The expected variance under systematic sampling (computable because the complete population is known).
• The average estimated variance obtained by applying the SRS formulae to the systematic samples.
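• The mean estimated variance from differences between neighbouring observations:

$$\hat{V}(\bar{y}_c) = \frac{1}{2mn} \sum_{ij} \left[ \left(d_{ij}^{N}\right)^2 + \left(d_{ij}^{E}\right)^2 \right] \qquad (1)$$

where $d_{ij}^{N} = y_c(i,j) - y_c(i+10,j)$ is the difference with the northern neighbour (10 rows away), $d_{ij}^{E}$ is the difference with the neighbour to the east, m is the total number of pairs and n is the sample size.

Table 1: Variance estimators of land cover from CLC in Andalucía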

                               Artificial  Arable  Perm. crops  Heterog. agric.  Forest  Nat. veg.  Water
Area (km²)                           1165   24626        12242            12859   11555      22933   1336
SRS std. error                        235    1125          844              867     811       1053    300
Systematic std. error                 204     750          646              660     506        943    161
Relative efficiency                  1.32    2.25         1.71             1.72    2.57       1.25   3.47
Est. std. error, SRS formula          236    1131          848              872     816       1058    301
  % bias of variance                   34     128           73               74     160         26    250
Est. std. error, neighbours           220     826          654              736     691        876    231
  % bias of variance                   16      21            3               24      87        -14    106
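The relative efficiencies and biases in Table 1 are ratios of variances. The exact definitions are not spelled out in the text; the ones below are inferred from the table values: relative efficiency = V_SRS / V_sys, and percentage bias = 100 (V_est / V_sys - 1), where V_sys is the true variance under systematic sampling. The short Python sketch recomputes them from the published standard errors. Because those standard errors are rounded, the results agree with the table only to within about 0.01 in efficiency and one percentage point in bias. The constant and function names are illustrative.

```python
# Standard errors (km^2) transcribed from Table 1, in column order.
CLASSES = ["Artificial", "Arable", "Perm. crops", "Heterog. agric.",
           "Forest", "Nat. veg.", "Water"]
SE_SRS = [235, 1125, 844, 867, 811, 1053, 300]              # true SE under SRS
SE_SYS = [204, 750, 646, 660, 506, 943, 161]                # true SE, systematic sampling
SE_EST_SRS_FORMULA = [236, 1131, 848, 872, 816, 1058, 301]  # mean estimate, SRS formula
SE_EST_NEIGHBOURS = [220, 826, 654, 736, 691, 876, 231]     # mean estimate, equation (1)


def relative_efficiency(se_srs, se_sys):
    """V_SRS / V_sys, i.e. the squared ratio of the two standard errors."""
    return (se_srs / se_sys) ** 2


def pct_bias_of_variance(se_est, se_true):
    """Percentage bias of an estimated variance relative to the true variance."""
    return 100 * ((se_est / se_true) ** 2 - 1)


if __name__ == "__main__":
    for name, srs, sys_, f, nb in zip(CLASSES, SE_SRS, SE_SYS,
                                      SE_EST_SRS_FORMULA, SE_EST_NEIGHBOURS):
        print(f"{name:16s} RE={relative_efficiency(srs, sys_):5.2f} "
              f"bias(SRS formula)={pct_bias_of_variance(f, sys_):6.1f}% "
              f"bias(neighbours)={pct_bias_of_variance(nb, sys_):6.1f}%")
```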

Table 1 gives the standard errors of the area estimates (in km²), which are easier to interpret than the variances of $\bar{y}_c$. Relative efficiency and bias are computed from the variances. Several comments can be made:
• Systematic sampling is more efficient than SRS (relative efficiency between 1.25 and 3.47), confirming results in the literature.
• The usual SRS variance estimator strongly overestimates the variance of systematic sampling (bias between 26% and 250%). The estimated variance is even slightly larger than the true variance under SRS.
• Estimating the variance by comparing neighbouring values gives a moderate overestimation for most classes (bias between 3% and 24%), but a strong overestimation for forest (87%) and water (106%) and an underestimation for natural vegetation (-14%). The correlograms of these land cover types still need to be analysed to understand why.
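To explore the neighbour estimator beyond Table 1, for example when analysing the classes where it over- or underestimates, equation (1) and the other three variances can be sketched in Python with only the standard library. This is a hypothetical reimplementation, not the author's code. It assumes a full rectangular population whose sides are multiples of the step k (the Andalucía region is irregular), includes the finite-population correction in the SRS formulae, and stores each systematic sample as a small grid so that adjacent entries are the i+10 and j+10 neighbours of equation (1). The function names and the synthetic demo field are illustrative only.

```python
import math
import random
from statistics import mean, pvariance, variance


def neighbour_variance(grid):
    """Equation (1) for one systematic sample stored as a rows x cols list of lists.

    Adjacent entries of `grid` are sample cells one sampling step apart, i.e. the
    (i+10, j) and (i, j+10) neighbours when the step is 10 cells.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols                       # sample size
    sq_sum, m = 0.0, 0                    # sum of squared differences, number of pairs
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:              # d^N: neighbour along the row index
                sq_sum += (grid[i][j] - grid[i + 1][j]) ** 2
                m += 1
            if j + 1 < cols:              # d^E: neighbour along the column index
                sq_sum += (grid[i][j] - grid[i][j + 1]) ** 2
                m += 1
    return sq_sum / (2 * m * n)


def compare_variances(pop, k):
    """Variances of the sample mean over all k*k systematic samples of `pop`.

    Returns (systematic, SRS, mean SRS-formula estimate, mean neighbour estimate).
    """
    if len(pop) % k or len(pop[0]) % k:
        raise ValueError("population sides must be multiples of k")
    values = [v for row in pop for v in row]
    N = len(values)
    # One sample per random start (si, sj): every k-th row and every k-th column.
    samples = [[row[sj::k] for row in pop[si::k]]
               for si in range(k) for sj in range(k)]
    flat = [[v for row in s for v in row] for s in samples]
    n = len(flat[0])
    fpc = 1 - n / N                                 # finite-population correction
    v_sys = pvariance([mean(f) for f in flat])      # every start equally likely
    v_srs = fpc * variance(values) / n
    v_srs_formula = mean(fpc * variance(f) / n for f in flat)
    v_neighbours = mean(neighbour_variance(s) for s in samples)
    return v_sys, v_srs, v_srs_formula, v_neighbours


if __name__ == "__main__":
    # Hypothetical smooth field plus noise on a 100 x 100 grid (not CLC data).
    random.seed(0)
    pop = [[50 + 30 * math.sin(i / 15) * math.cos(j / 20) + random.gauss(0, 5)
            for j in range(100)] for i in range(100)]
    for label, v in zip(("systematic", "SRS", "SRS formula", "neighbours"),
                        compare_variances(pop, k=10)):
        print(f"{label:12s} {v:.4f}")
```

As a check, on the artificial trend surface y(i, j) = i + j with k = 10 the function returns approximately (16.5, 16.5, 16.5, 0.5). Differencing removes a linear trend, so in this artificial case equation (1) strongly underestimates the true systematic variance.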

References
Bellhouse D.R. (1988), Systematic sampling, in Handbook of Statistics, ed. P.R. Krishnaiah, C.R. Rao, pp. 125-146, North-Holland/Elsevier, Amsterdam.
CEC (1993), CORINE Land Cover: guide technique, Report EUR 12585EN, Office for Publications of the European Communities, Luxembourg, 144 pp.
Gautschi W. (1957), Some remarks on systematic sampling, Annals of Mathematical Statistics, Vol. 28, pp. 385-394.
Koop J.C. (1971), On splitting a systematic sample for variance estimation, Annals of Mathematical Statistics, Vol. 42, No. 3, pp. 1084-1087.
Matern B. (1986), Spatial variation, Lecture Notes in Statistics, No. 36, Springer-Verlag.
Olea R.A. (1984), Sampling design optimization for spatial functions, Mathematical Geology, Vol. 16, No. 4, pp. 369-392.
Wolter K.M. (1984), An investigation of some estimators of variance for systematic sampling, Journal of the American Statistical Association, Vol. 79, No. 388, pp. 781-790.
