Proceedings of The First Sino-International Symposium on Probability, Statistics, and Quantitative Management Taipei, Taiwan, ROC June 6, 2004 pp. g1-g14
October, 2004
ICAQM
Adaptive Allocation Sampling Designs
Mohammad Salehi M.
Isfahan University of Technology
ABSTRACT
In conventional stratified sampling the population is partitioned into strata and simple random samples are selected independently within the strata. Given the total sample size and ignoring cost, optimal allocation assigns larger sample sizes to the strata that are larger and more variable. If there is no prior knowledge of the stratum variances, we may take the sample in two steps, the first being a stratified simple random sample (e.g., with sample sizes proportional to stratum sizes). The second step allocates additional sample units using information from the first step that is related to the stratum variances. We call such a design an adaptive allocation sampling design. In this article, our primary aim is to introduce adaptive allocation sampling designs suitable for estimating parameters of rare populations. We review some recently developed adaptive allocation sampling designs and introduce a new one that is appropriate for rare populations. We use real-life populations to evaluate the proposed sampling design.
1. Introduction
We consider the problem of choosing the stratum sample sizes in stratified random sampling. The optimal allocation, in the sense of minimum variance, calls for a larger sample in a stratum if the stratum is larger, more variable internally, or cheaper to sample. Ignoring cost, we need to know the stratum sizes and variances. In practice, however, the stratum variances are not known. One may use past experience to approximate the variances in a repeated survey, or use an auxiliary variable that is highly correlated with the variable of interest and whose stratum variances are known. If no prior information is available that leads to a decent estimate of the stratum variances, it is natural to take the sample in two steps. In the first step, a portion of the available resources is spent on a stratified random sample. Using information related to the stratum variances from the first step, we then allocate larger samples to the more variable strata. We call this kind of design an adaptive allocation sampling design. Thompson and Seber ([15], Section 7) provided an excellent review of adaptive allocation sampling designs with a predetermined (fixed) sample size. Our definition, however, covers variable sample size designs as well as fixed sample size designs.

Designing an efficient sampling scheme for surveying a rare population is one of the most challenging tasks confronting the sampling statistician. For a rare population, Kalton and Anderson [3] and Christman [1] recommended stratified random sampling in which the strata are constructed so that the rare elements are confined to very few strata, and those strata are disproportionately oversampled.

In Section 2, we briefly review some recently developed adaptive allocation sampling designs. In Section 3, we discuss a sequential adaptive allocation design, stratified sequential sampling, which is the stratified version of two-stage sequential sampling (Salehi and Smith [9]).

Received April 2004, revised September 2004. Mohammad Salehi M. is an Associate Professor in the School of Mathematical Science at Isfahan University of Technology, Isfahan, Iran; email: [email protected]. Published by: The International Chinese Association of Quantitative Management, Taipei, Taiwan, ROC.
2. Approximately Adaptive Allocation Sampling Designs
Consider a population of size $N$ partitioned into $H$ strata, with $N_h$ units in stratum $h$ ($h = 1, \ldots, H$). Let $y_{hi}$ be the value of the variable of study for unit $i$ in stratum $h$. In step $r$ ($r = 1, 2$), a simple random sample of size $n_{hr}$ is taken without replacement from stratum $h$, with $n_r = \sum_h n_{hr}$ and $n = n_1 + n_2$. The $n_{h1}$ and their sum $n_1$ are fixed, but $n_{h2}$ is given by

$$ n_{h2} = \begin{cases} (n - n_{1h})\,\dfrac{N_h s_{h1}}{\sum_{h'=1}^{H} N_{h'} s_{h'1}} & \text{if } \dfrac{N_h s_{h1}}{\sum_{h'=1}^{H} N_{h'} s_{h'1}} > n_{h1}, \\[2ex] 0 & \text{if } \dfrac{N_h s_{h1}}{\sum_{h'=1}^{H} N_{h'} s_{h'1}} \le n_{h1}, \end{cases} \qquad (1) $$

where $s_{h1}^2 = \sum_{i=1}^{n_{h1}} (y_{hi} - \bar{y}_{h1})^2 / (n_{h1} - 1)$ is the sample variance in stratum $h$ and $\bar{y}_{h1} = \sum_{i=1}^{n_{h1}} y_{hi} / n_{h1}$ is the sample mean for stratum $h$ in step 1. We use the conventional stratified sampling estimator of the population total $\tau = \sum_h \sum_i y_{hi}$,

$$ \hat{\tau}_{oa} = \sum_{h=1}^{H} N_h \bar{y}_h, $$

where $\bar{y}_h = \sum_{i=1}^{n_h} y_{hi} / n_h$ is the final sample mean in stratum $h$ and $n_h = n_{h1} + n_{h2}$. This estimator is biased. A simulation study shows that the size of the bias tends to be small if the variable of study is normally distributed. When the correlation between the sample mean and the sample variance in the first step is positive, $\hat{\tau}_{oa}$ tends to underestimate, and vice versa. The size of the bias decreases as the ratio $n_1/n$ of the total step 1 sample size to the total sample size increases. This is not surprising, as the bias is zero when the ratio is one.

Francis [2] introduced a version of adaptive allocation sampling in which the second step is carried out sequentially. His sampling design was developed in a fishery context. We now describe it in a general format.
If some prior information is available, the step 1 sample can be allocated among the strata using approximate Neyman optimal allocation. At step 2, if one additional unit is selected in stratum $h$, then, using the same estimate $s_{h1}^2$, the reduction in the estimated variance is

$$ G_h = N_h^2\left(1 - \frac{n_{h1}}{N_h}\right)\frac{s_{h1}^2}{n_{h1}} - N_h^2\left(1 - \frac{n_{h1}+1}{N_h}\right)\frac{s_{h1}^2}{n_{h1}+1} = N_h^2 s_{h1}^2\left(\frac{1}{n_{h1}} - \frac{1}{n_{h1}+1}\right) = \frac{N_h^2 s_{h1}^2}{n_{h1}(n_{h1}+1)}. \qquad (2) $$

The $G_h$'s are used to determine the step 2 allocation sequentially. The first unit of step 2 is chosen at random in the stratum for which $G_h$ is greatest. If stratum $j$ is selected, $G_j$ is recalculated as $N_j^2 s_{j1}^2 / \{(n_{j1}+1)(n_{j1}+2)\}$. The next unit is selected in the stratum for which $G_h$ is then a maximum, and so on. Since only $s_{h1}^2$ from step 1 sampling is used, all the second-step units can be determined before further sampling is carried out.

Thompson and Seber [15] noted that one approach to finding an estimator that is design unbiased under an adaptive allocation procedure is to use the Rao-Blackwell method. The idea is to take a simple random sample of units from each stratum and then return to those strata with, say, the largest y-values and sample more units. Optimal allocation calls for selecting more units from strata with greater sample variances, but using a complicated criterion such as the sample variance for extra sampling makes the Rao-Blackwell method complicated and inefficient. A natural solution is to choose a simpler criterion that is related to the stratum variance. Kremers [4] and Thompson and Seber [15] considered $\bar{y}_{h1} > c$, for a given $c$, as the criterion for extra sampling. If the stratum variances tend to increase with the stratum means, we expect this sampling design to be nearly as good as approximately optimal allocation. With this criterion the computations of the estimator and its variance estimator are manageable, and this Rao-Blackwell estimator can even be more efficient, in the sense of having smaller variance, than the Rao-Blackwell estimator that uses the sample variance as the criterion. For more detail see Thompson and Seber ([15], pp. 184-189).

From now on, we focus on estimating parameters of a rare and clustered population. For this problem, Kalton and Anderson [3] recommended stratified sampling with the following two points in mind:
1) Confine the rare events to very few strata.
2) Allocate more sampling units to the strata containing the rare events.
If we have some information about the location of the rare events, we partition the population region into strata such that the rare clusters are confined to a few strata, and allocate more sampling units to them. If we do not have such information, we partition the population region into strata whose sizes are roughly equal to the sizes of the clusters. Since the number of clusters is small, the clusters are naturally confined to a few strata. We therefore just need to allocate more sample units to the strata containing the rare events (clusters). In the next section we introduce stratified sequential sampling, which is a special case of two-stage sequential sampling (Salehi and Smith [9]).
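As an illustration of the sequential step 2 allocation of Francis [2] described earlier in this section, the following minimal Python sketch repeatedly gives the next unit to the stratum with the largest gain $G_h = N_h^2 s_{h1}^2 / \{n_h(n_h+1)\}$, where $n_h$ is the current sample size in the stratum. The function name, the argument names and the guard that stops allocating to an exhausted stratum are assumptions made for this sketch; they are not part of the original description.

```python
def sequential_step2_allocation(N, n1, s2, n2_total):
    """Allocate n2_total step-2 units, one at a time, by the largest gain G_h.

    N  : stratum sizes N_h
    n1 : step-1 sample sizes n_h1
    s2 : step-1 sample variances s_h1^2 (kept fixed during step 2)
    Returns the number of extra units assigned to each stratum.
    """
    current = list(n1)          # current sample size in each stratum
    extra = [0] * len(N)        # step-2 units allocated so far

    def gain(h):
        # Reduction in estimated variance from one more unit in stratum h,
        # following the closed form in equation (2).
        return N[h] ** 2 * s2[h] / (current[h] * (current[h] + 1))

    for _ in range(n2_total):
        # Assumption of this sketch: a fully sampled stratum gets no more units.
        candidates = [h for h in range(len(N)) if current[h] < N[h]]
        if not candidates:
            break
        best = max(candidates, key=gain)
        current[best] += 1
        extra[best] += 1
    return extra


# Two strata of 100 units, two initial units each, variances 4 and 1.
print(sequential_step2_allocation([100, 100], [2, 2], [4.0, 1.0], 4))
```

Because the gains depend only on the step 1 variances, the whole allocation can be computed before any step 2 unit is observed, which is the point made in the text.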
3. Stratified Sequential Sampling
Once again suppose that we have a total population of $N$ units partitioned into $H$ strata of size $N_h$ units ($h = 1, 2, \ldots, H$). We choose the strata sizes based on our information about the rare clusters, their size, and natural restrictions. We endeavor to confine the rare clusters to few strata so that we increase the probability of observing the rare events. Let the unit $(h, i)$ denote the $i$th unit in the $h$th stratum, with an associated measurement or count $y_{hi}$. Let $\tau_h = \sum_{i=1}^{N_h} y_{hi}$ be the sum of the y values in the $h$th stratum, and let $\tau = \sum_{h=1}^{H} \tau_h$ be the total for the whole population.

In the first step, we take an initial simple random sample of $n_{h1}$ units without replacement from stratum $h$ ($h = 1, 2, \ldots, H$), so that $n_1 = \sum_{h=1}^{H} n_{h1}$ is the total initial sample size. Let $C$ be the condition, say $y_{hi} > c$ for a given value $c$, that, if satisfied for at least one unit in the initial sample from stratum $h$, causes a predetermined number of additional units, say $n_{h2}$, to be selected at random from stratum $h$. As a result, $n_2 = \sum_{h=1}^{H} n_{h2}$, the number of adaptively added units in step 2, is a random variable. Let $l_h$ be the number of units satisfying condition $C$ in the final sample from stratum $h$.

Murthy's estimator was originally a Rao-Blackwell improvement of Raj's estimator [6]. Salehi and Seber [7] showed that Murthy's estimator is also a Rao-Blackwell improvement of a trivial unbiased estimator. They also showed that Murthy's estimator can be used for sequential sampling designs. Let $I_{hi}$ be an indicator function which takes the value 1 (with probability $p_{hi}$) when unit $i$ is selected as the first unit in stratum $h$ and 0 otherwise. Then

$$ \hat{t}_h = \sum_{i=1}^{N_h} \frac{y_{hi}}{p_{hi}} I_{hi} $$

is a trivial unbiased estimator of $\tau_h$ provided that $p_{hi} > 0$ for $i = 1, \ldots, N_h$. Let $s_h$ be the final sample set in stratum $h$. Using the Rao-Blackwell theorem, we have Murthy's estimator

$$ \hat{\tau}_h = E[\hat{t}_h \mid s_h] = \sum_{i \in s_h} \frac{P(s_h \mid i)}{P(s_h)}\, y_{hi}, \qquad (3) $$
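where $P(s_h)$ is the probability of obtaining the sample $s_h$ in stratum $h$ and $P(s_h \mid i)$ is the conditional probability of getting the sample $s_h$ given that the $i$th unit was selected in the first draw in stratum $h$. If $p_{hi} > 0$ for $i = 1, \ldots, N_h$, then by the Rao-Blackwell theorem the estimator $\hat{\tau}_h$ is unbiased for $\tau_h$. The variance of $\hat{\tau}_h$ is given by

$$ \mathrm{var}[\hat{\tau}_h] = \sum_{i=1}^{N_h} \sum_{i' < i} \left(1 - \sum_{s_h \ni i, i'} \frac{P(s_h \mid i) P(s_h \mid i')}{P(s_h)}\right) \left(\frac{y_{hi}}{p_{hi}} - \frac{y_{hi'}}{p_{hi'}}\right)^2 p_{hi}\, p_{hi'}. \qquad (4) $$

An unbiased estimator of this variance is

$$ \widehat{\mathrm{var}}[\hat{\tau}_h] = \sum_{i \in s_h} \sum_{i' < i,\; i' \in s_h} \left(\frac{P(s_h \mid i, i')}{P(s_h)} - \frac{P(s_h \mid i) P(s_h \mid i')}{P(s_h)^2}\right) \left(\frac{y_{hi}}{p_{hi}} - \frac{y_{hi'}}{p_{hi'}}\right)^2 p_{hi}\, p_{hi'}, \qquad (5) $$

where $P(s_h \mid i, i')$ is the conditional probability of getting the sample $s_h$ given that units $i$ and $i'$ were selected in the first two draws. Because the initial sample is a simple random sample, $p_{hi} = 1/N_h$. To evaluate (3) we need $P(s_h \mid i)/P(s_h)$, which is given by (see the Appendix)

$$ \frac{P(s_h \mid i)}{P(s_h)} = \begin{cases} \dfrac{N_h}{n_{h1}} & n_{h2} = 0, \\[2ex] \dfrac{N_h}{n_{h1}+n_{h2}} & n_{h2} > 0 \ \&\ l_h > n_{h2}, \\[2ex] \dfrac{N_h (n_{h1}+n_{h2}-1)!}{(n_{h1}+n_{h2})! - n_{h2}!\,(n_{h1}+n_{h2}-l_h)!/(n_{h2}-l_h)!} & n_{h2} > 0 \ \&\ l_h \le n_{h2} \text{ and } i \text{ satisfies } C, \\[2ex] \dfrac{N_h \{(n_{h1}+n_{h2}-1)! - n_{h2}!\,(n_{h1}+n_{h2}-1-l_h)!/(n_{h2}-l_h)!\}}{(n_{h1}+n_{h2})! - n_{h2}!\,(n_{h1}+n_{h2}-l_h)!/(n_{h2}-l_h)!} & n_{h2} > 0 \ \&\ l_h \le n_{h2} \text{ and } i \text{ does not satisfy } C. \end{cases} \qquad (6) $$

On substituting (6) into (3) we can compute $\hat{\tau}_h$. Note that when $n_{h2} = 0$, or $n_{h2} > 0$ and $l_h > n_{h2}$, $\hat{\tau}_h$ is $N_h$ times the sample mean, which is the estimator of the population total under simple random sampling. To evaluate (5) we need $P(s_h \mid i, i')/P(s_h)$, which is given by

$$ \frac{P(s_h \mid i, i')}{P(s_h)} = \begin{cases} \dfrac{N_h (N_h - 1)}{n_{h1}(n_{h1}-1)} & n_{h2} = 0, \\[2ex] \dfrac{N_h (N_h - 1)}{(n_{h1}+n_{h2})(n_{h1}+n_{h2}-1)} & n_{h2} > 0 \ \&\ l_h > n_{h2}, \\[2ex] \dfrac{N_h (N_h - 1)(n_{h1}+n_{h2}-2)!}{(n_{h1}+n_{h2})! - n_{h2}!\,(n_{h1}+n_{h2}-l_h)!/(n_{h2}-l_h)!} & n_{h2} > 0 \ \&\ l_h \le n_{h2} \text{ and either } i \text{ or } i' \text{ satisfies } C, \\[2ex] \dfrac{N_h (N_h - 1)\{(n_{h1}+n_{h2}-2)! - n_{h2}!\,(n_{h1}+n_{h2}-l_h-2)!/(n_{h2}-l_h)!\}}{(n_{h1}+n_{h2})! - n_{h2}!\,(n_{h1}+n_{h2}-l_h)!/(n_{h2}-l_h)!} & n_{h2} > 0 \ \&\ l_h \le n_{h2} \text{ and neither } i \text{ nor } i' \text{ satisfies } C. \end{cases} \qquad (7) $$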
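To make the use of the ratios $P(s_h \mid i)/P(s_h)$ in Murthy's estimator (3) concrete, here is a minimal Python sketch for a single stratum with $p_{hi} = 1/N_h$ and condition $y_{hi} > c$. The function names are hypothetical, the case formulas follow the Appendix derivation, and `n_h2_added` stands for the realized number of added units (0 if the condition was not met in the initial sample). The last function enumerates every possible sample of a tiny made-up stratum to check unbiasedness exactly; it is only an illustration, not part of the original study.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def murthy_ratio(N_h, n_h1, n_h2_added, l_h, satisfies_c):
    """P(s_h | i) / P(s_h) for a unit i of the final sample (four cases)."""
    m = n_h1 + n_h2_added                      # final sample size
    if n_h2_added == 0 or l_h > n_h2_added:
        return Fraction(N_h, m)                # simple random sampling weight
    # Orderings in which all l_h units satisfying C fall in the last n_h2 places.
    ways_c_last = factorial(n_h2_added) // factorial(n_h2_added - l_h)
    denom = factorial(m) - ways_c_last * factorial(m - l_h)
    numer = factorial(m - 1)
    if not satisfies_c:
        numer -= ways_c_last * factorial(m - 1 - l_h)
    return Fraction(N_h * numer, denom)


def murthy_estimate(y_sample, c, N_h, n_h1, n_h2_added):
    """Murthy's estimator of the stratum total from the final sample values."""
    l_h = sum(y > c for y in y_sample)
    return sum(murthy_ratio(N_h, n_h1, n_h2_added, l_h, y > c) * y
               for y in y_sample)


def exact_expectation(y_stratum, c, n_h1, n_h2):
    """Exact design expectation of the estimator by full enumeration."""
    N_h = len(y_stratum)
    p_initial = Fraction(1, comb(N_h, n_h1))
    total = Fraction(0)
    for initial in combinations(range(N_h), n_h1):
        if not any(y_stratum[i] > c for i in initial):
            sample = [y_stratum[i] for i in initial]
            total += p_initial * murthy_estimate(sample, c, N_h, n_h1, 0)
            continue
        rest = [i for i in range(N_h) if i not in initial]
        p_extra = Fraction(1, comb(len(rest), n_h2))
        for extra in combinations(rest, n_h2):
            sample = [y_stratum[i] for i in initial + extra]
            total += p_initial * p_extra * murthy_estimate(
                sample, c, N_h, n_h1, n_h2)
    return total


# Made-up stratum with total 35; the exact expectation equals the total.
print(exact_expectation([0, 0, 3, 12, 20], c=10, n_h1=2, n_h2=2))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the unbiasedness check is an equality rather than a floating-point approximation.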
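On substituting (6) and (7) into (5) we can compute $\widehat{\mathrm{var}}[\hat{\tau}_h]$. Note that when $n_{h2} = 0$, or $n_{h2} > 0$ and $l_h > n_{h2}$, $\widehat{\mathrm{var}}[\hat{\tau}_h]$ is the unbiased variance estimator of the population total estimator under simple random sampling. We now have

$$ \hat{\tau} = \sum_{h=1}^{H} \hat{\tau}_h, \qquad (8) $$

which is unbiased for $\tau$. Its variance is

$$ \mathrm{var}[\hat{\tau}] = \sum_{h=1}^{H} \mathrm{var}[\hat{\tau}_h]. \qquad (9) $$

An unbiased estimator of the above is

$$ \widehat{\mathrm{var}}[\hat{\tau}] = \sum_{h=1}^{H} \widehat{\mathrm{var}}[\hat{\tau}_h]. \qquad (10) $$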
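The variance estimator for one stratum can be sketched in the same spirit. The code below assumes the standard form of Murthy's variance estimator with $p_{hi} = 1/N_h$, so that each pair $(i, i')$ in the final sample contributes $\{P(s_h \mid i, i')/P(s_h) - P(s_h \mid i)P(s_h \mid i')/P(s_h)^2\}(y_{hi} - y_{hi'})^2$, and it takes the one-unit and two-unit probability ratios in the case form derived in the Appendix. All function names and the tiny stratum used in the check are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def _excluded(n_h2, l_h, free):
    # Orderings with all l_h units satisfying C in the last n_h2 places.
    return factorial(n_h2) // factorial(n_h2 - l_h) * factorial(free)


def ratio_one(N_h, n_h1, n_h2, l_h, c_i):
    """P(s_h | i) / P(s_h); n_h2 is the realized number of added units."""
    m = n_h1 + n_h2
    if n_h2 == 0 or l_h > n_h2:
        return Fraction(N_h, m)
    denom = factorial(m) - _excluded(n_h2, l_h, m - l_h)
    numer = factorial(m - 1) - (0 if c_i else _excluded(n_h2, l_h, m - 1 - l_h))
    return Fraction(N_h * numer, denom)


def ratio_two(N_h, n_h1, n_h2, l_h, either_c):
    """P(s_h | i, i') / P(s_h) for a pair of sampled units."""
    m = n_h1 + n_h2
    if n_h2 == 0 or l_h > n_h2:
        return Fraction(N_h * (N_h - 1), m * (m - 1))
    denom = factorial(m) - _excluded(n_h2, l_h, m - l_h)
    numer = factorial(m - 2) - (
        0 if either_c else _excluded(n_h2, l_h, m - 2 - l_h))
    return Fraction(N_h * (N_h - 1) * numer, denom)


def murthy_variance_estimate(y_sample, c, N_h, n_h1, n_h2):
    """Variance estimate for one stratum, assuming p_hi = 1 / N_h."""
    l_h = sum(y > c for y in y_sample)
    total = Fraction(0)
    for y_i, y_j in combinations(y_sample, 2):
        r_i = ratio_one(N_h, n_h1, n_h2, l_h, y_i > c)
        r_j = ratio_one(N_h, n_h1, n_h2, l_h, y_j > c)
        r_ij = ratio_two(N_h, n_h1, n_h2, l_h, y_i > c or y_j > c)
        total += (r_ij - r_i * r_j) * (y_i - y_j) ** 2
    return total


def expected_variance_estimate(y_stratum, c, n_h1, n_h2):
    """Exact design expectation of the variance estimate by enumeration."""
    N_h = len(y_stratum)
    p_initial = Fraction(1, comb(N_h, n_h1))
    total = Fraction(0)
    for initial in combinations(range(N_h), n_h1):
        if not any(y_stratum[i] > c for i in initial):
            ys = [y_stratum[i] for i in initial]
            total += p_initial * murthy_variance_estimate(ys, c, N_h, n_h1, 0)
            continue
        rest = [i for i in range(N_h) if i not in initial]
        p_extra = Fraction(1, comb(len(rest), n_h2))
        for extra in combinations(rest, n_h2):
            ys = [y_stratum[i] for i in initial + extra]
            total += p_initial * p_extra * murthy_variance_estimate(
                ys, c, N_h, n_h1, n_h2)
    return total


# For this made-up stratum the true variance of the estimator is 837/2.
print(expected_variance_estimate([0, 0, 3, 12, 20], c=10, n_h1=2, n_h2=2))
```

Summing the per-stratum estimates and variance estimates over $h$ then gives the population-level quantities. When no units are added, the sketch reduces to the usual simple random sampling variance estimator $N_h^2(1 - n_{h1}/N_h)s^2/n_{h1}$.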
4. Simulations
We simulated stratified sequential sampling on two real populations which were based on biological populations: blue-winged teal (Figure 1) and freshwater mussels (Figure 2). We computed the relative efficiencies of stratified sequential sampling over simple random sampling and conventional stratified sampling. To compute the variance under simple random sampling, say $\mathrm{var}[\hat{\tau}_s]$, and under conventional stratified sampling, say $\mathrm{var}[\hat{\tau}_{st}]$, we set the sample size equal to the effective sample size, namely $E[\nu]$, where $\nu$ is the final sample size for stratified sequential sampling. The expected final sample size is

$$ E[\nu] = \sum_{h=1}^{H} \left\{ n_{h1} + n_{h2}\left(1 - \binom{N_h - L_h}{n_{h1}} \Big/ \binom{N_h}{n_{h1}}\right) \right\}, $$

where $L_h$ is the number of units satisfying condition $C$ in stratum $h$. We simulated 10,000 samples for given $n_{h1}$ and $n_{h2}$. If $\hat{\tau}_k$ and $\nu_k$ denote the estimate and the final sample size for replication $k$, we then have

$$ \hat{E}[\hat{\tau}] = \bar{\tau} = \frac{1}{r} \sum_{k=1}^{r} \hat{\tau}_k \quad \text{and} \quad \hat{E}[\nu] = \frac{1}{r} \sum_{k=1}^{r} \nu_k, $$

where $r$ (= 10,000) is the number of replications. We then calculated

$$ \widehat{\mathrm{var}}(\hat{\tau}) = \frac{1}{r-1} \sum_{k=1}^{r} (\hat{\tau}_k - \bar{\tau})^2. $$

For all cases the Monte Carlo error, $(\bar{\tau} - \tau)$, was very close to zero.
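The simulation procedure just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used for the study: the two small strata are made up, the function names are hypothetical, and the per-unit weights are the ratios $P(s_h \mid i)/P(s_h)$ in the case form derived in the Appendix. It also evaluates the expected final sample size formula so that it can be compared with the Monte Carlo mean of $\nu_k$.

```python
import random
from math import comb, factorial
from statistics import mean, variance


def expected_final_sample_size(N, L, n1, n2):
    """E[nu]: each stratum adds n_h2 units unless no C-unit is drawn initially."""
    return sum(n_h1 + n_h2 * (1 - comb(N_h - L_h, n_h1) / comb(N_h, n_h1))
               for N_h, L_h, n_h1, n_h2 in zip(N, L, n1, n2))


def murthy_ratio(N_h, n_h1, n_added, l_h, satisfies_c):
    """P(s_h | i) / P(s_h) with n_added the realized number of extra units."""
    m = n_h1 + n_added
    if n_added == 0 or l_h > n_added:
        return N_h / m
    ways = factorial(n_added) // factorial(n_added - l_h)
    denom = factorial(m) - ways * factorial(m - l_h)
    numer = factorial(m - 1)
    if not satisfies_c:
        numer -= ways * factorial(m - 1 - l_h)
    return N_h * numer / denom


def draw_stratum(y, c, n_h1, n_h2, rng):
    """One stratified sequential draw; returns (estimate, final sample size)."""
    idx = rng.sample(range(len(y)), n_h1)
    added = 0
    if any(y[i] > c for i in idx):              # condition C met initially
        rest = [i for i in range(len(y)) if i not in idx]
        idx += rng.sample(rest, n_h2)
        added = n_h2
    values = [y[i] for i in idx]
    l_h = sum(v > c for v in values)
    estimate = sum(murthy_ratio(len(y), n_h1, added, l_h, v > c) * v
                   for v in values)
    return estimate, len(values)


def simulate(strata, c, n_h1, n_h2, r=10_000, seed=0):
    """Monte Carlo mean of the estimates, mean of nu, and variance (r - 1)."""
    rng = random.Random(seed)
    estimates, sizes = [], []
    for _ in range(r):
        draws = [draw_stratum(y, c, n_h1, n_h2, rng) for y in strata]
        estimates.append(sum(d[0] for d in draws))
        sizes.append(sum(d[1] for d in draws))
    return mean(estimates), mean(sizes), variance(estimates)


strata = [[0, 0, 3, 12, 20], [1, 0, 0, 0, 2]]   # made-up strata, total 38
tau_bar, nu_bar, var_hat = simulate(strata, c=10, n_h1=2, n_h2=2, r=20_000, seed=1)
print(tau_bar, nu_bar, var_hat)
print(expected_final_sample_size([5, 5], [2, 0], [2, 2], [2, 2]))
```

`statistics.variance` divides by $r - 1$, matching the Monte Carlo variance formula in the text, and a seeded `random.Random` instance keeps the run reproducible.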
Figure 1 Numbers of blue-winged teal as given by Smith et al. [10] and a demonstration of stratified sequential sampling. Units are partitioned into 8 strata, which are labelled along the left and right margins. Shaded units show results from stratified sequential sampling. Two units are initially selected without replacement from each stratum (units shaded light gray). Because the criterion is met, 4 additional units are selected without replacement from each of strata 1 and 5 (units shaded dark gray).
Figure 2 Populations of three species of freshwater mussels in a 40 m section of the Cacapon River, West Virginia, USA. An exhaustive search for mussels at the substrate surface was conducted during June 1994, and counts of individuals were recorded at a resolution of 0.25 m^2. Panel A shows the distribution of Elliptio complanata. Panel B shows the distribution of Elliptio fisheriana. Panel C shows the distribution of Lampsilis cariosa. River flow is from right to left.
4.1 Blue-Winged Teal Population
Smith et al. [10] used a population of blue-winged teal and populations of two other waterfowl species to evaluate adaptive cluster sampling (Thompson [13]). The populations came from comprehensive counts, which were made from helicopters during 13-15 December 1992 in central Florida. The blue-winged teal population is extremely clustered, with a total of $N = 200$ units (Figure 1). We partitioned it into $H = 8$ strata so that our results can be compared with those of Salehi and Seber [8]. As shown in Figure 1, each stratum contained $N_h = 25$ units. For one set of cases we let $n_{h1} = 1, 2, \ldots, 10$ and $n_{h2} = 1, 2, \ldots, 10$. In another set of cases we let $n_{h1} = 1, 2, 3, 4, 5$ and $n_{h2} = 13, 14, \ldots, 20$. We computed $E[\nu]$ and $\mathrm{var}[\hat{\tau}]$ using 10,000 replications for all cases. The condition for sequential sampling was $y_{hi} > 10$. We computed $\mathrm{var}[\hat{\tau}_s]$ with sample size equal to the effective sample size, $E[\nu]$, and computed $\mathrm{var}[\hat{\tau}_{st}]$ with a sample size of $E[\nu]/H$ units in each stratum. We defined the efficiency of stratified sequential sampling relative to simple random sampling and to conventional stratified sampling as

$$ \mathrm{eff}_s[\hat{\tau}] = \frac{\mathrm{var}[\hat{\tau}_s]}{\mathrm{var}[\hat{\tau}]}, \qquad \mathrm{eff}_{st}[\hat{\tau}] = \frac{\mathrm{var}[\hat{\tau}_{st}]}{\mathrm{var}[\hat{\tau}]}. $$
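The two relative efficiencies can be computed from the full population once the variance of the sequential estimator is available. The sketch below is a hedged Python illustration with hypothetical function names. It assumes the textbook variance of the expansion estimator under simple random sampling without replacement, $N^2(1 - n/N)S^2/n$, and it allows the non-integer sample sizes $E[\nu]$ and $E[\nu]/H$ used in the comparison.

```python
from statistics import variance


def srs_variance(y, n):
    """Variance of N * (sample mean) under SRS without replacement, size n."""
    N = len(y)
    # statistics.variance uses the N - 1 divisor, i.e. the usual S^2.
    return N ** 2 * (1 - n / N) * variance(y) / n


def stratified_variance(strata, n_per_stratum):
    """Variance of the conventional stratified estimator, equal n per stratum."""
    return sum(srs_variance(y, n_per_stratum) for y in strata)


def relative_efficiencies(strata, var_sequential, expected_nu):
    """Return (eff_s, eff_st) for a given variance of the sequential estimator."""
    pooled = [value for y in strata for value in y]
    eff_s = srs_variance(pooled, expected_nu) / var_sequential
    eff_st = stratified_variance(strata, expected_nu / len(strata)) / var_sequential
    return eff_s, eff_st


# Made-up example: one stratum of five units and a sequential variance of 6.
print(relative_efficiencies([[1, 0, 0, 0, 2]], var_sequential=6.0, expected_nu=2))
```

Values above one indicate that stratified sequential sampling has smaller variance than the competing design at the same effective sample size.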
The plot of $\mathrm{eff}_{st}[\hat{\tau}]$ for different values of $n_{h1}$ and $n_{h2}$ is given in Figure 3. The range of $\mathrm{eff}_{st}[\hat{\tau}]$ was from 0.98 to 2.1, or equivalently from a 2% loss to a 110% gain in efficiency over conventional stratified sampling. It was 0.98 for $n_{h1} = 1$, $n_{h2} = 1$. Efficiency was an increasing function of $m$, $n_{h1}$, and $n_{h2}$. The plot of $\mathrm{eff}_s[\hat{\tau}]$ for different values of $E[\nu]$ is given in Figure 4. The range of $\mathrm{eff}_s[\hat{\tau}]$ was from 0.98 to 2.09, which means from a 2% loss to a 109% gain in efficiency over simple random sampling. In only one case was $\mathrm{eff}_s[\hat{\tau}]$ less than one, namely $n_{h1} = 1$, $n_{h2} = 1$. Efficiency was an increasing function of $m$, $n_{h1}$, and $n_{h2}$.

To compare stratified adaptive cluster sampling (Thompson [14]; Salehi and Seber [8]) with stratified sequential sampling, we computed the variance of the modified Hansen-Hurwitz estimator, say $\mathrm{var}[\hat{\tau}_{HH}]$, the variance of the modified Horvitz-Thompson estimator, say $\mathrm{var}[\hat{\tau}_{HT}]$, and the effective sample size for stratified adaptive cluster sampling, say $E[\nu^*]$, for different initial sample sizes. Because the sequentially added units, $n_{h2}$, are confined to the selected primary units, it is more appropriate to compare $\mathrm{var}[\hat{\tau}]$ with the variances of the estimators for the non-overlapping scheme of the stratified adaptive design. For $n'_h = 1, 2, \ldots, 10$ (i.e., the initial sample size from each stratum), and with the condition $y_{hi} > 10$, we computed the variances of the estimators and the effective sample sizes for stratified adaptive cluster sampling, $E[\nu^*]$. We defined

$$ \mathrm{eff}_{\cdot}[\hat{\tau}_{HH}] = \frac{\mathrm{var}[\hat{\tau}_{\cdot}]}{\mathrm{var}[\hat{\tau}_{HH}]}, \qquad \mathrm{eff}_{\cdot}[\hat{\tau}_{HT}] = \frac{\mathrm{var}[\hat{\tau}_{\cdot}]}{\mathrm{var}[\hat{\tau}_{HT}]}, $$

where in place of the dot we used $s$ to denote simple random sampling and $st$ to denote conventional stratified sampling. For specific $m$ and $E[\nu^*]$, we found the closest $E[\nu]$ with the same $m$ from our study, and the results are given in Table 1. As expected, the efficiency of the HT estimator is greater than that of the HH estimator (Salehi, 2003). For 16 out of the 20 cases, the $\mathrm{eff}[\hat{\tau}]$ values were greater than the corresponding $\mathrm{eff}[\hat{\tau}_{HT}]$ values (Table 1).
Figure 3 The relative efficiency of estimator $\hat{\tau}$ for stratified sequential sampling over estimator $\hat{\tau}_{st}$ for conventional stratified sampling with sample size $E[\nu]/H$ in each stratum. Each panel represents a different value of $n_{h2}$. The target population was the blue-winged teal population presented in Figure 1.

4.2 Freshwater Mussel Population
Density of freshwater mussels is difficult to estimate well because of their tendency to be rare and clustered at some spatial scales (Strayer and Smith [12]). Smith et al. [11] applied adaptive cluster sampling to low-density populations of freshwater mussels and found that adaptive cluster sampling increased observations of individuals and rare species. However, sampling of edge units greatly increased effort with little or no gain in efficiency of the density estimates. Thus, an adaptive sampling procedure that reduces or eliminates sampling of edge units would be of great interest for freshwater mussel population assessments.

An exhaustive search for freshwater mussels at the substrate surface was conducted in a section of the Cacapon River, West Virginia, USA during June 1994. Individual mussels were counted and mapped to the nearest 0.25 m^2 (Figure 2). The three species, Elliptio complanata, E. fisheriana, and Lampsilis cariosa, exhibited different spatial distributions. The E. complanata population was not rare, but it was relatively clustered. The E. fisheriana population was relatively rare and clustered. The L. cariosa population was rare, but not clustered. Application of stratified sequential sampling to these populations demonstrates the performance of the design over a range of spatial distributions.

The region for the freshwater mussel populations was partitioned into 4800 quadrats. Simulations for 24, 48, and 96 strata were conducted. Because the results were similar in each case, we show results for 48 strata of 100 quadrats each. We computed the efficiency of stratified sequential sampling compared to conventional stratified sampling, $\mathrm{eff}_{st}[\hat{\tau}]$, for $n_{h1} = 4, 6, 8, 10, 12, 14$ and $n_{h2} = 2, 4, 6$. These cases were selected to arrive at sample sizes typical of freshwater mussel surveys.
Figure 4 The relative efficiency of estimator $\hat{\tau}$ for stratified sequential sampling over estimator $\hat{\tau}_s$ for simple random sampling with the same effective sample size $E[\nu]$. Each panel represents a different value of $n_{h2}$. The target population was the blue-winged teal population presented in Figure 1.

For the E. complanata population, stratified sequential sampling was efficient for 16 out of the 18 cases (Figure 5). Losses in efficiency occurred when $n_{h1} = 4$ and $n_{h2} = 6$. For the clustered and rare population of E. fisheriana, stratified sequential sampling was efficient in all cases (Figure 5). Efficiency ranged from 1.01 to 1.18. There did not appear to be much gain in efficiency for $n_{h1} > 10$. Efficiency tended to increase as $n_{h2}$ increased. For the rare but not clustered population of L. cariosa, stratified sequential sampling did not result in a major gain or loss in efficiency (Figure 5). Efficiency ranged from 0.97 to 1.03. Because the population is not clustered, extra sampling in the vicinity of observed mussels does not increase the likelihood of finding additional mussels. Thus, we did not expect a gain in efficiency. However, the application of stratified sequential sampling did not cause a loss in efficiency either.
Figure 5 The relative efficiency of estimator $\hat{\tau}$ for stratified sequential sampling over estimator $\hat{\tau}_{st}$ for conventional stratified sampling with sample size $E[\nu]/H$ in each stratum. The target populations were the Elliptio fisheriana, Elliptio complanata and Lampsilis cariosa populations presented in Figure 2.
5. Conclusion
The most important step when applying stratified sequential sampling is to partition the population units into strata that confine the rare events to a few strata in which a large proportion of the units contain the rare events. We then select relatively small $n_{h1}$ and relatively large $n_{h2}$. For example, we partitioned the blue-winged teal population into strata ranging from 50 strata of 4 units to 2 strata of 100 units and found stratified sequential sampling to be an efficient sampling design in all cases. The efficiency of stratified sequential sampling relative to simple random sampling, $\mathrm{eff}_s[\hat{\tau}]$, ranged from 1.05 to a stunning 2,051, and the efficiency relative to stratified sampling, $\mathrm{eff}_{st}[\hat{\tau}]$, ranged from 1.06 to an equally stunning 1,350. The most efficient sampling design corresponded to strata of 4 units with condition $y_{hi} > 0$, $n_{h1} = 1$, and $n_{h2} = 3$. In this design, one stratum contains units with counts of 7,144, 6,339, 150, and 6. This stratum has 13,639 of the 14,121 blue-winged teal in the population. All the units in this stratum will be in the sample with probability one under this sampling design.
We can provide general guidelines for choosing appropriate values for $C$, $n_{h1}$, and $n_{h2}$. Suppose that the condition has the form $y_{hi} > c$ and consider the two extreme cases. If we choose $c$ so small that almost all $y_{hi}$ are greater than $c$, stratified sequential sampling tends to behave like conventional stratified sampling with $n_h = n_{h1} + n_{h2}$. If we choose $c$ so large that almost all $y_{hi}$ are smaller than $c$, stratified sequential sampling tends to behave like conventional stratified sampling with $n_h = n_{h1}$. If the variable of interest is an indicator function (i.e., takes either 0 or 1), there is no problem choosing the condition $C$. In general, however, there should be a set of optimal (or nearly optimal) values for different population models.

Table 1 The relative efficiencies of estimators for the population of blue-winged teal in Figure 1. The estimators $\hat{\tau}_{HH}$ and $\hat{\tau}_{HT}$ are for the stratified adaptive cluster sampling design with the non-overlapping scheme. Their efficiencies for initial sample sizes $n'_h = 1, 2, \ldots, 10$ are presented in columns 6, 7, 8 and 9. The effective sample sizes, $E[\nu^*]$, are given in column 2. From our study summarized in Figures 3 and 4, we chose the closest effective sample sizes, $E[\nu]$, for stratified sequential sampling, and the corresponding relative efficiencies are given in columns 10 and 11.

n'_h  E[ν*]   n_h1  n_h2  E[ν]    eff_st[τ̂_HT]  eff_st[τ̂_HH]  eff_s[τ̂_HT]  eff_s[τ̂_HH]  eff_st[τ̂]  eff_s[τ̂]
1     10.88   1     7     10.84   1.27          1.27          1.26         1.26         1.18       1.17
2     21.28   2     7     21.29   1.30          1.28          1.29         1.27         1.19       1.18
3     31.25   2     20    31.22   1.33          1.29          1.33         1.28         1.62       1.62
4     40.82   3     16    41.26   1.37          1.30          1.36         1.30         1.57       1.56
5     50.05   4     13    49.40   1.40          1.31          1.40         1.31         1.59       1.59
6     58.75   4     20    58.83   1.44          1.32          1.44         1.32         1.99       1.98
7     67.60   7     7     68.02   1.48          1.32          1.47         1.33         2.08       2.08
8     75.98   7     10    75.66   1.52          1.34          1.51         1.34         1.69       1.69
9     84.15   8     9     83.23   1.55          1.35          1.55         1.35         1.68       1.68
10    92.13   9     9     92.44   1.59          1.36          1.59         1.36         1.77       1.76
Based on our study, larger values of $n_{h1}$ and smaller values of $n_{h2}$ resulted in a small improvement in efficiency, but the improvement was observed for almost all of the populations under study. For some cases, small values of $n_{h1}$ and large values of $n_{h2}$ resulted in a great improvement in efficiency, but there were some cases where such a design resulted in a loss of efficiency. If we do not know how clustered and rare the population is, we recommend moderate or larger values of $n_{h1}$ and relatively smaller values of $n_{h2}$. If we know the population is highly clustered and the clusters are confined to a few strata, we recommend setting $n_{h2}$ to be a large proportion of $N_h$. A potential inefficiency arises from having to traverse the same area when sampling the $n_{h2}$ units after already sampling the $n_{h1}$ units. It is important to mention that if we fail to make optimal choices for the strata, $C$, $n_{h1}$ and $n_{h2}$, the efficiency of stratified sequential sampling will be similar to that of conventional stratified sampling.

Stratified adaptive cluster sampling can itself be regarded as an adaptive allocation sampling design. Selecting the initial samples is the first step, and adding the neighboring units to construct the networks can be seen as allocating more sampling units (the second step). I am now working on another adaptive allocation sampling design. It is similar to stratified sequential sampling, with the following difference. If the number of selected units from stratum $h$ satisfying the condition is $m_h$, we select $t_h m_h$ more units in the second step, where $t_h$ is a predetermined value. I expect this sampling design to be more efficient than stratified sequential sampling.
Appendix
Evaluation of $P(s_h \mid i)/P(s_h)$

If $n_{h2} = 0$, $P(s_h)$ would be $1/\binom{N_h}{n_{h1}}$ and $P(s_h \mid i)$ would be $1/\binom{N_h - 1}{n_{h1} - 1}$. Thus

$$ \frac{P(s_h \mid i)}{P(s_h)} = \frac{N_h}{n_{h1}}. $$

If $n_{h2} > 0$ and $l_h > n_{h2}$, at least one sample unit satisfying $C$ is placed among the first $n_{h1}$ positions in every possible permutation of the $n_{h1} + n_{h2}$ sample units. This means that every possible initial sample of size $n_{h1}$ leads to the selection of another sample of size $n_{h2}$. Hence $P(s_h) = 1/\binom{N_h}{n_{h1}+n_{h2}}$ and $P(s_h \mid i) = 1/\binom{N_h - 1}{n_{h1}+n_{h2}-1}$. Thus

$$ \frac{P(s_h \mid i)}{P(s_h)} = \frac{N_h}{n_{h1} + n_{h2}}. $$

If $n_{h2} > 0$, $l_h \le n_{h2}$ and $i$ satisfies $C$, the probability of choosing any particular ordered sample is $1/\{N_h (N_h - 1) \cdots (N_h - n_{h1} - n_{h2} + 1)\}$. Those permutations without a sample unit satisfying $C$ among the first $n_{h1}$ positions do not give rise to $s_h$. To find the number of permutations not giving rise to $s_h$, we first allocate all the $l_h$ sample units satisfying $C$ to the last $n_{h2}$ positions, which can be done in $n_{h2}(n_{h2}-1) \cdots (n_{h2}-l_h+1)$ ways, and then allocate the $n_{h1}+n_{h2}-l_h$ remaining sample units to the remaining positions, which can be done in $(n_{h1}+n_{h2}-l_h)!$ ways. Hence the number of permutations giving rise to $s_h$ is $(n_{h1}+n_{h2})! - n_{h2}(n_{h2}-1) \cdots (n_{h2}-l_h+1)\,(n_{h1}+n_{h2}-l_h)!$. Thus

$$ P(s_h) = \frac{(n_{h1}+n_{h2})! - n_{h2}(n_{h2}-1) \cdots (n_{h2}-l_h+1)\,(n_{h1}+n_{h2}-l_h)!}{N_h (N_h - 1) \cdots (N_h - n_{h1} - n_{h2} + 1)}. $$

Since unit $i$ satisfies $C$, all permutations of the remaining sample units give rise to $s_h$, so $P(s_h \mid i) = 1/\binom{N_h - 1}{n_{h1}+n_{h2}-1}$. Thus

$$ \frac{P(s_h \mid i)}{P(s_h)} = \frac{N_h (n_{h1}+n_{h2}-1)!}{(n_{h1}+n_{h2})! - n_{h2}!\,(n_{h1}+n_{h2}-l_h)!/(n_{h2}-l_h)!}. $$

When $n_{h2} > 0$, $l_h \le n_{h2}$ and $i$ does not satisfy $C$, $P(s_h)$ is the same as in the previous case. The probability of choosing any particular ordered sample given that $i$ is drawn first is $1/\{(N_h - 1) \cdots (N_h - n_{h1} - n_{h2} + 1)\}$. Since $i$ does not satisfy $C$, those permutations without a sample unit satisfying $C$ among the remaining $n_{h1} - 1$ initial positions do not give rise to $s_h$ given $i$. To find the number of such permutations, we first allocate all the $l_h$ sample units satisfying $C$ to the last $n_{h2}$ positions, which can be done in $n_{h2}(n_{h2}-1) \cdots (n_{h2}-l_h+1)$ ways, and then allocate the $n_{h1}+n_{h2}-l_h-1$ remaining sample units to the remaining positions, which can be done in $(n_{h1}+n_{h2}-1-l_h)!$ ways. Hence the number of permutations giving rise to $s_h$ given $i$ is $(n_{h1}+n_{h2}-1)! - n_{h2}(n_{h2}-1) \cdots (n_{h2}-l_h+1)\,(n_{h1}+n_{h2}-1-l_h)!$. Thus

$$ P(s_h \mid i) = \frac{(n_{h1}+n_{h2}-1)! - n_{h2}(n_{h2}-1) \cdots (n_{h2}-l_h+1)\,(n_{h1}+n_{h2}-1-l_h)!}{(N_h - 1)(N_h - 2) \cdots (N_h - n_{h1} - n_{h2} + 1)}, $$

and therefore

$$ \frac{P(s_h \mid i)}{P(s_h)} = \frac{N_h \{(n_{h1}+n_{h2}-1)! - n_{h2}!\,(n_{h1}+n_{h2}-1-l_h)!/(n_{h2}-l_h)!\}}{(n_{h1}+n_{h2})! - n_{h2}!\,(n_{h1}+n_{h2}-l_h)!/(n_{h2}-l_h)!}. $$
References
[1] Christman, M. C. (2000). A review of quadrat-based sampling of rare, geographically clustered populations, Journal of Agricultural, Biological and Environmental Statistics, 5, 168-201.
[2] Francis, R. I. C. C. (1984). An adaptive strategy for stratified random trawl surveys, New Zealand Journal of Marine and Freshwater Research, 18, 59-71.
[3] Kalton, G. and Anderson, D. W. (1986). Sampling rare populations, Journal of the Royal Statistical Society A, 149, 65-82.
[4] Kremers, W. K. (1987). Adaptive sampling to account for unknown variability among strata, Preprint No. 128, Institut für Mathematik, Universität Augsburg, Germany.
[5] Murthy, M. N. (1957). Ordered and unordered estimators in sampling without replacement, Sankhya, 18, 379-390.
[6] Raj, D. (1956). Some estimators in sampling with varying probabilities without replacement, Journal of the American Statistical Association, 51, 269-284.
[7] Salehi M., M. and Seber, G. A. F. (2001). A new proof of Murthy's estimator which applies to sequential sampling, Australian & New Zealand Journal of Statistics, 43, 901-906.
[8] Salehi M., M. and Seber, G. A. F. (1997). Two-stage adaptive cluster sampling, Biometrics, 53, 959-970.
[9] Salehi M., M. and Smith (2004). Two-stage sequential sampling design: a neighborhood-free adaptive sampling procedure, submitted.
[10] Smith, D. R., Conroy, M. J. and Brakhage, D. H. (1995). Efficiency of adaptive cluster sampling for estimating density of wintering waterfowl, Biometrics, 51, 777-788.
[11] Smith, D. R., Villella, R. F. and Lemarie, D. P. (2003). Application of adaptive cluster sampling to low-density populations of freshwater mussels, Environmental and Ecological Statistics, 10, 7-15.
[12] Strayer, D. L. and Smith, D. R. (2003). A Guide to Sampling Freshwater Mussel Populations, Bethesda, Maryland: American Fisheries Society.
[13] Thompson, S. K. (1990). Adaptive cluster sampling, Journal of the American Statistical Association, 85, 1050-1059.
[14] Thompson, S. K. (1991). Stratified adaptive cluster sampling, Biometrika, 78, 389-397.
[15] Thompson, S. K. and Seber, G. A. F. (1996). Adaptive Sampling, New York: Wiley.