Journal of Mathematical Sciences, Vol. 103, No. 4, 2001

OPTIMAL ALLOCATION IN STRATIFIED AND DOUBLE RANDOM SAMPLING WITH A NONLINEAR COST FUNCTION

A. Chernyak (Kiev, Ukraine)

UDC 519.2

1. Introduction

Let a population of $N$ units be divided into subpopulations of $N_1, N_2, \dots, N_L$ units, respectively. These subpopulations are nonoverlapping, and $N_1 + \dots + N_L = N$. The subpopulations are called strata. Stratified random sampling is used when the variation of the response within strata is smaller than the variation within the entire population.

We will use the following notation for stratified random sampling: $L$ is the number of strata; $n_k$ is the number of sampling units selected from stratum $k$; $n$ is the total number of sampling units in the sample, $n = n_1 + \dots + n_L$; $W_k = N_k/N$ is the stratum weight; $y_{ki}$ is the value obtained for the $i$th unit from stratum $k$; $\bar{Y}_k = \sum_{i=1}^{N_k} y_{ki}/N_k$ is the true mean; $\bar{y}_k = \sum_{i=1}^{n_k} y_{ki}/n_k$ is the sample mean; $S_k^2 = (1/(N_k - 1)) \sum_{i=1}^{N_k} (y_{ki} - \bar{Y}_k)^2$ is the true variance.

For the population mean per unit $\bar{Y} = \sum_{k=1}^{L} W_k \bar{Y}_k$, the unbiased estimate is $\bar{y}_{st} = \sum_{k=1}^{L} W_k \bar{y}_k$ with variance

$$V(\bar{y}_{st}) = \sum_{k=1}^{L} \frac{W_k^2 S_k^2}{n_k} - \sum_{k=1}^{L} \frac{W_k^2 S_k^2}{N_k} \tag{1}$$

(see [2, Theorem 5.3]).

Let $C = \sum_{k=1}^{L} c_k f(n_k)$ be the cost function, where $c_k$ is the cost per unit in the $k$th stratum and $f(x)$ is some function. The sample sizes $n_k$ in the respective strata are chosen by the sampler. They may be selected to minimize $V(\bar{y}_{st})$ for a specified cost or to minimize the cost $C$ for a specified value of $V(\bar{y}_{st})$. In the case of the linear cost function ($f(x) = x$, $C = \sum_{k=1}^{L} c_k n_k$), optimal allocations were given in [2–4, 6, 7]. This paper is a continuation of research by the author (see [1]).

2. Results for Stratified Sampling

THEOREM 1. In stratified random sampling with the cost function $C = \sum_{k=1}^{L} c_k n_k^{\alpha}$, $\alpha > 0$, the variance of the estimated mean $\bar{y}_{st}$ is minimum for a specified cost $C$, and the cost is minimum for a specified variance $V$, when

$$\frac{n_k}{n} = \frac{(W_k^2 S_k^2/c_k)^{1/(1+\alpha)}}{\sum_{j=1}^{L} (W_j^2 S_j^2/c_j)^{1/(1+\alpha)}}. \tag{2}$$

If the cost is fixed, then

$$n = \frac{C^{1/\alpha} \sum_{k=1}^{L} (W_k^2 S_k^2/c_k)^{1/(1+\alpha)}}{\left[ \sum_{j=1}^{L} (W_j^2 S_j^2 c_j^{1/\alpha})^{\alpha/(1+\alpha)} \right]^{1/\alpha}}. \tag{3}$$

If $V$ is fixed, then

$$n = \frac{\sum_{k=1}^{L} (W_k^2 S_k^2 c_k^{1/\alpha})^{\alpha/(1+\alpha)} \, \sum_{j=1}^{L} (W_j^2 S_j^2/c_j)^{1/(1+\alpha)}}{V + \sum_{k=1}^{L} W_k S_k^2/N}. \tag{4}$$
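The allocation of Theorem 1 can be sketched numerically as follows. This is a minimal illustration, not part of the original paper: the function name `power_cost_allocation` and its argument names are hypothetical, $N$ is assumed so large that the term $\sum_k W_k S_k^2/N$ in (4) is negligible, and the returned sample sizes are real numbers that the sampler would still have to round.

```python
from math import fsum


def power_cost_allocation(W, S, c, alpha, C=None, V=None):
    """Allocation (2) with n from (3) (fixed cost C) or (4) (fixed variance V).

    Assumes N is large, so the finite-population term in (4) is dropped.
    Returns (n, [n_1, ..., n_L]) as unrounded real numbers.
    """
    if (C is None) == (V is None):
        raise ValueError("give exactly one of C or V")
    a = [(w * s) ** 2 for w, s in zip(W, S)]              # W_k^2 S_k^2
    # Terms (W_k^2 S_k^2 / c_k)^(1/(1+alpha)) appearing in (2)
    t = [(ak / ck) ** (1.0 / (1.0 + alpha)) for ak, ck in zip(a, c)]
    T = fsum(t)
    # Sum of (W_k^2 S_k^2 c_k^(1/alpha))^(alpha/(1+alpha)) from (3) and (4)
    U = fsum((ak * ck ** (1.0 / alpha)) ** (alpha / (1.0 + alpha))
             for ak, ck in zip(a, c))
    if C is not None:
        n = C ** (1.0 / alpha) * T / U ** (1.0 / alpha)   # formula (3)
    else:
        n = U * T / V                                     # formula (4), N large
    return n, [n * tk / T for tk in t]                    # proportions (2)


if __name__ == "__main__":
    # Two strata with W = (0.4, 0.6), S = (10, 20), c = (4, 9), linear cost, V = 1
    print(power_cost_allocation([0.4, 0.6], [10, 20], [4, 9], alpha=1, V=1))
```

With these inputs and $\alpha = 1$, $V = 1$, the sketch gives $n = 264$, $n_1 = 88$, $n_2 = 176$. For a fixed cost, the returned $n_k$ satisfy $\sum_k c_k n_k^{\alpha} = C$ before rounding.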

Proof. Take $\lambda$ as the Lagrangian multiplier and minimize

$$\sum_{k=1}^{L} \frac{W_k^2 S_k^2}{n_k} - \sum_{k=1}^{L} \frac{W_k^2 S_k^2}{N_k} + \lambda \left( \sum_{k=1}^{L} c_k n_k^{\alpha} - C \right)$$

for chosen $n_k$.

Added to the English translation. To be published in a forthcoming Russian issue.
© 2001 Plenum Publishing Corporation. 1072-3374/01/1034-0525 $25.00


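Hence, to minimize $V(\bar{y}_{st})$ for fixed $C$ or vice versa, we have

$$\lambda \alpha c_k n_k^{\alpha-1} = \frac{W_k^2 S_k^2}{n_k^2}$$

and

$$n_k = \left( \frac{W_k^2 S_k^2}{c_k \alpha \lambda} \right)^{1/(1+\alpha)}, \qquad n = \sum_{k=1}^{L} n_k = (\alpha\lambda)^{-1/(1+\alpha)} \sum_{k=1}^{L} \left( \frac{W_k^2 S_k^2}{c_k} \right)^{1/(1+\alpha)}. \tag{5}$$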

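Dividing $n_k$ by $n$, we obtain (2). If the cost is fixed, substitute the values of $n_k$ in the cost function $C$ and solve for $(\lambda\alpha)^{1/(1+\alpha)}$. This gives

$$(\alpha\lambda)^{1/(\alpha+1)} = \left[ \sum_{k=1}^{L} (W_k^2 S_k^2 c_k^{1/\alpha})^{\alpha/(1+\alpha)} \right]^{1/\alpha} C^{-1/\alpha}.$$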

Then for the total number of sampling units $n$ we obtain (3). If $V$ is fixed, substituting the values of $n_k$ (see (5)) in the formula for the variance (1) and solving for $(\lambda\alpha)^{1/(1+\alpha)}$, we obtain (4). The theorem is proved.

THEOREM 2. If the cost function is of the form $C = \sum_{k=1}^{L} c_k \log n_k$, then $V(\bar{y}_{st})$ is minimum for a specified cost $C$, and the cost is minimum for a specified variance $V$, when

$$\frac{n_k}{n} = \frac{W_k^2 S_k^2/c_k}{\sum_{j=1}^{L} W_j^2 S_j^2/c_j}. \tag{6}$$

If $C$ is fixed, then

$$n = \exp\left\{ \frac{C - \sum_{k=1}^{L} c_k \log(W_k^2 S_k^2/c_k)}{\sum_{k=1}^{L} c_k} \right\} \sum_{j=1}^{L} \frac{W_j^2 S_j^2}{c_j}.$$

If $V$ is fixed, then

$$n = \frac{\sum_{k=1}^{L} c_k \, \sum_{j=1}^{L} W_j^2 S_j^2/c_j}{V + \sum_{k=1}^{L} W_k S_k^2/N}.$$
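The logarithmic-cost allocation of Theorem 2 can be sketched in the same way. Again this is only an illustration under stated assumptions: the name `log_cost_allocation` is hypothetical, $N$ is taken to be large so that $\sum_k W_k S_k^2/N$ is neglected, and the results are unrounded.

```python
from math import exp, log, fsum


def log_cost_allocation(W, S, c, C=None, V=None):
    """Allocation (6) with n from the fixed-C or fixed-V formula of Theorem 2.

    Assumes N is large, so the finite-population term is dropped.
    Returns (n, [n_1, ..., n_L]) as unrounded real numbers.
    """
    if (C is None) == (V is None):
        raise ValueError("give exactly one of C or V")
    # Ratios W_k^2 S_k^2 / c_k, which determine the proportions (6)
    r = [(w * s) ** 2 / ck for w, s, ck in zip(W, S, c)]
    R = fsum(r)
    c_total = fsum(c)
    if C is not None:
        # n = exp{(C - sum c_k log(W_k^2 S_k^2 / c_k)) / sum c_k} * sum_j W_j^2 S_j^2 / c_j
        n = exp((C - fsum(ck * log(rk) for ck, rk in zip(c, r))) / c_total) * R
    else:
        # n = (sum c_k)(sum_j W_j^2 S_j^2 / c_j) / V for large N
        n = c_total * R / V
    return n, [n * rk / R for rk in r]


if __name__ == "__main__":
    print(log_cost_allocation([0.4, 0.6], [10, 20], [4, 9], V=1))
```

For $W = (0.4, 0.6)$, $S = (10, 20)$, $c = (4, 9)$ and $V = 1$ this returns $n = 260$, $n_1 = 52$, $n_2 = 208$.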

The proof of this theorem is similar to that of Theorem 1.

Remark 1. Let $N$ and the $N_k$ be large ($N \to \infty$, $N_k \to \infty$, $k = 1, \dots, L$). If $\alpha \to 0$, then the optimal sample sizes $n$ and $n_k$ tend to infinity in the case of a power cost function, but the proportion $n_k/n$ has the limit (6), that is, the proportion obtained for the logarithmic cost function.

Example. Consider two strata with the following characteristics:

Stratum   W_k   S_k   c_k
1         0.4   10    $4
2         0.6   20    $9

We consider four cases:
1) $C = c_1 n_1^2 + c_2 n_2^2$;
2) $C = c_1 n_1 + c_2 n_2$;
3) $C = c_1 \sqrt{n_1} + c_2 \sqrt{n_2}$;
4) $C = c_1 \log n_1 + c_2 \log n_2$.

Let $N$ be so large that the term $\sum_{k=1}^{L} W_k S_k^2/N$ is negligible.

TABLE 1. C = $100 is fixed.

Case   n      n_1   n_2    n_1/n   n_2/n
1      5      2     3      0.4     0.6
2      15     5     10     0.33    0.67
3      104    30    74     0.29    0.71
4      4006   801   3205   0.2     0.8

TABLE 2. V = 1 is fixed.

Case   n     n_1   n_2   C ($)
1      263   103   160   272836
2      264   88    176   1936
3      257   72    185   156
4      260   52    208   64
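As a worked check of two rows of Table 2, the fixed-variance formulas can be evaluated by hand for the linear cost (case 2, formula (4) with $\alpha = 1$) and the logarithmic cost (case 4, Theorem 2), with the term in $N$ neglected as assumed in the example:

```latex
\begin{align*}
\text{Case 2 } (\alpha = 1):\quad
 n &= \frac{\bigl(\sum_k W_k S_k \sqrt{c_k}\bigr)\bigl(\sum_j W_j S_j/\sqrt{c_j}\bigr)}{V}
    = \frac{(8 + 36)(2 + 4)}{1} = 264, \\
 n_1 &= 264 \cdot \tfrac{2}{6} = 88, \qquad n_2 = 264 \cdot \tfrac{4}{6} = 176, \qquad
 C = 4 \cdot 88 + 9 \cdot 176 = 1936; \\[4pt]
\text{Case 4 (logarithmic cost)}:\quad
 n &= \frac{(c_1 + c_2) \sum_j W_j^2 S_j^2/c_j}{V}
    = \frac{13\,(4 + 16)}{1} = 260, \\
 n_1 &= 260 \cdot \tfrac{4}{20} = 52, \qquad n_2 = 260 \cdot \tfrac{16}{20} = 208 .
\end{align*}
```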

In Table 1, both the total sample size and the sample size from each stratum increase from case 1 to case 4. In Table 2, the total sample size is almost the same, but the size of the sample from the first stratum decreases while that from the second stratum increases; the minimum cost decreases substantially.

3. Optimal Allocation in Double Sampling for Stratification

We now consider the method of double sampling, or two-phase sampling. The first sample is a simple random sample of size $n'$; $w_k = n'_k/n'$ is the proportion of the first sample falling in stratum $k$, and $E w_k = W_k$. The second sample is a stratified random sample of size $n$ in which the $y_{ki}$ are measured: $n_k$ units are drawn from stratum $k$. The first sample is used to estimate the strata weights, and the second sample is used to estimate the strata means $\bar{Y}_k$. We assume that the $n_k$ are a random subsample of the $n'_k$, $n_k = v_k n'_k$, $0 < v_k \le 1$, and that the $v_k$ are chosen in advance. For the population mean per unit $\bar{Y}$, the unbiased estimate is $\bar{y}_{\alpha} = \sum_{k=1}^{L} w_k \bar{y}_k$ with variance

$$V(\bar{y}_{\alpha}) = S^2 \left( \frac{1}{n'} - \frac{1}{N} \right) + \sum_{k=1}^{L} \frac{W_k S_k^2}{n'} \left( \frac{1}{v_k} - 1 \right) \tag{7}$$

(see [2, Theorems 12.1 and 12.2; 5, Theorem 1]), where

$$S^2 (N - 1) = \sum_{k=1}^{L} (N_k - 1) S_k^2 + \sum_{k=1}^{L} N_k (\bar{Y}_k - \bar{Y})^2$$

and $S^2$ is the population variance.

Let $C = c' (n')^{\alpha} + \sum_{k=1}^{L} c_k n_k$, $\alpha > 0$, be a cost function, where $c'$ is the cost of classification per unit and $c_k$ is the cost of measuring a unit in stratum $k$. Since the $n_k$ are random variables, we consider the expected cost for chosen $n'$ and $v_k$:

$$C^* = E(C) = c' (n')^{\alpha} + n' \sum_{k=1}^{L} c_k v_k W_k. \tag{8}$$
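The two quantities being traded off, the variance (7) and the expected cost (8), are simple to evaluate for given design parameters. The sketch below is illustrative only: the function and argument names (`n_first` for the first-phase sample size, `c_first` for the classification cost per unit) are hypothetical, and $N$ defaults to infinity, which drops the $1/N$ term.

```python
def double_sampling_variance(S2, W, Sk, v, n_first, N=float("inf")):
    """Variance (7): S^2 (1/n' - 1/N) + sum_k (W_k S_k^2 / n') (1/v_k - 1)."""
    within = sum(w * s ** 2 * (1.0 / vk - 1.0) for w, s, vk in zip(W, Sk, v))
    return S2 * (1.0 / n_first - 1.0 / N) + within / n_first


def expected_cost(c_first, ck, W, v, n_first, alpha):
    """Expected cost (8): c' (n')^alpha + n' sum_k c_k v_k W_k."""
    measuring = sum(c * vk * w for c, vk, w in zip(ck, v, W))
    return c_first * n_first ** alpha + n_first * measuring


if __name__ == "__main__":
    # Hypothetical inputs: S^2 = 300, two strata, first-phase sample of 1000 units
    W, Sk, ck, v = [0.4, 0.6], [10, 20], [4, 9], [0.5, 0.25]
    print(double_sampling_variance(300.0, W, Sk, v, 1000))
    print(expected_cost(1.0, ck, W, v, 1000, alpha=1))
```

With $v_k = 1$ for all $k$ the second term of (7) vanishes and the variance reduces to $S^2(1/n' - 1/N)$, which is a quick sanity check on any implementation.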

The objective is to choose $n'$ and the $v_k$ so as to minimize $V(\bar{y}_{\alpha})$ for a specified expected cost or to minimize the expected cost for a specified variance $V$. In the case of the linear cost function ($\alpha = 1$), the optimal allocations in double sampling for stratification were given in [2, 5].

4. Results for Double Sampling

THEOREM 3. The variance of the estimated mean $\bar{y}_{\alpha}$ is minimum for a specified cost $C^*$, and the expected cost is minimum for a specified variance, when

$$n' = \left( \frac{\beta S_n^2}{\alpha c'} \right)^{1/(\alpha+1)}, \qquad v_k = \frac{S_k}{\sqrt{c_k}} \left( \frac{\alpha c' \beta^{(\alpha-1)/2}}{S_n^2} \right)^{1/(\alpha+1)}, \tag{9}$$

where $S_n^2 = S^2 - \sum_{k=1}^{L} W_k S_k^2 > 0$. For the first problem, $\beta$ is the unique positive root of the equation

$$A \beta^{\alpha/(\alpha+1)} + B \sqrt{\beta} - C^* = 0, \tag{10}$$

where $A = [(S_n^2/\alpha)^{\alpha} c']^{1/(\alpha+1)}$ and $B = \sum_{k=1}^{L} \sqrt{c_k}\, W_k S_k$.
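For general $\alpha$, equation (10) has no closed-form solution, but its left-hand side is increasing in $\beta$ and negative at $\beta = 0$, so the root can be bracketed and found by bisection. The following sketch does this and then evaluates (9). It is an illustration under stated assumptions, not the author's procedure: the name `theorem3_fixed_cost` and its arguments are hypothetical, $B$ is taken as $\sum_k \sqrt{c_k}\, W_k S_k$, and no check that $v_k \le 1$ is performed.

```python
from math import sqrt


def theorem3_fixed_cost(Sn2, W, Sk, ck, c_first, alpha, C_star, tol=1e-12):
    """Solve (10) for beta by bisection; return (beta, n', [v_k]) from (9)."""
    A = ((Sn2 / alpha) ** alpha * c_first) ** (1.0 / (alpha + 1.0))
    B = sum(sqrt(c) * w * s for c, w, s in zip(ck, W, Sk))

    def g(b):
        # Left-hand side of (10); increasing in b, with g(0) = -C* < 0
        return A * b ** (alpha / (alpha + 1.0)) + B * sqrt(b) - C_star

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:              # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:       # bisection to relative tolerance tol
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    n_first = (beta * Sn2 / (alpha * c_first)) ** (1.0 / (alpha + 1.0))
    scale = (alpha * c_first * beta ** ((alpha - 1.0) / 2.0) / Sn2) ** (1.0 / (alpha + 1.0))
    v = [s / sqrt(c) * scale for s, c in zip(Sk, ck)]
    return beta, n_first, v


if __name__ == "__main__":
    # Hypothetical inputs: S_n^2 = 400, c' = 1, linear cost (alpha = 1), C* = 3200
    print(theorem3_fixed_cost(400.0, [0.4, 0.6], [10, 20], [4, 9], 1.0, 1, 3200.0))
```

For these hypothetical inputs $A = 20$ and $B = 44$, so (10) with $\alpha = 1$ gives $\beta = (3200/64)^2 = 2500$, $n' = 1000$, $v_1 = 0.25$, and $v_2 = 1/3$; substituting back into the expected cost returns $1000 + 1000 \cdot 2.2 = 3200$.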

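For the second problem, $\lambda = \beta^{-1}$ is the unique positive root of the equation

$$A_1 \lambda^{1/(\alpha+1)} + B \sqrt{\lambda} - V_1 = 0, \tag{11}$$

where $A_1 = \alpha A$ and $V_1 = V + S^2/N$.

Proof. Take $\lambda$ as the Lagrangian multiplier and minimize

$$S^2 \left( \frac{1}{n'} - \frac{1}{N} \right) + \sum_{k=1}^{L} \frac{W_k S_k^2}{n'} \left( \frac{1}{v_k} - 1 \right) + \lambda \left( c' (n')^{\alpha} + n' \sum_{k=1}^{L} c_k v_k W_k - C^* \right)$$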

for chosen $n'$ and $v_k$. Hence, to minimize $V(\bar{y}_{\alpha})$ for fixed $C^*$ or vice versa, we have

$$n' v_k = \frac{S_k}{\sqrt{c_k \lambda}}, \qquad \lambda \alpha c' (n')^{\alpha+1} = S_n^2.$$

These give $n' = (S_n^2/(\lambda \alpha c'))^{1/(\alpha+1)}$ and $v_k = S_k (\lambda \alpha c')^{1/(\alpha+1)} / (\sqrt{c_k \lambda}\, (S_n^2)^{1/(\alpha+1)})$.

If the cost is fixed, substitute the optimum values of $n'$ and $v_k$ in the cost function (8) and solve for $\beta = 1/\lambda$. Then we have (10). For example, if $\alpha = 1$, then $\beta = (C^*/(A + B))^2$; if $\alpha = 1/3$, then $\beta = (A^2 + 2BC^* - A\sqrt{A^2 + 4BC^*})^2/(4B^4)$. Hence, we obtain (9).

If $V$ is fixed, substitute the optimum $n'$ and $v_k$ in (7). For $\lambda$ we obtain (11). For example, if $\alpha = 1$, then $\lambda = V_1^2/(A_1 + B)^2$; if $\alpha = 3$, then $\lambda = (A_1^2 + 2BV_1 - A_1\sqrt{A_1^2 + 4BV_1})^2/(4B^4)$. The theorem is proved.

THEOREM 4. If the expected cost function is of the form $C^* = c' \log n' + n' \sum_{k=1}^{L} c_k v_k W_k$, then $V(\bar{y}_{\alpha})$ is minimum for a specified cost $C^*$ when

$$n' = \frac{\beta S_n^2}{c'}, \qquad v_k = \frac{S_k c'}{\sqrt{c_k \beta}\, S_n^2}, \tag{12}$$

where $\beta$ is the unique positive root of the equation

$$c' \log \beta + B \sqrt{\beta} + c' \log(S_n^2/c') - C^* = 0.$$

The expected cost $C^*$ is minimum for a specified variance $V$ when

$$n' = 2 c' S_n^2 \left( B^2 + 2 V_1 c' - B \sqrt{B^2 + 4 V_1 c'} \right)^{-1}, \qquad v_k = \frac{S_k \left( \sqrt{B^2 + 4 V_1 c'} - B \right)}{2 \sqrt{c_k}\, S_n^2}.$$

The proof of this theorem is similar to that of Theorem 3. It is clear from (9) or (12) that $v_k \le 1$ if $c' \ll c_k$ and/or $S_k^2$ is not too large relative to $S_n^2$.

REFERENCES

1. A. I. Chernyak, "Optimal allocation in stratified random sampling with nonlinear cost function," in: Abstracts of the 4th World Congress of the Bernoulli Society (1996), p. 152.
2. W. G. Cochran, Sampling Techniques, Wiley, New York (1977).
3. M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sample Survey Methods and Theory, Wiley, New York (1953).
4. M. N. Murthy, Sampling Theory and Methods, Statistical Publishing Society, Calcutta (1967).
5. J. N. K. Rao, "On double sampling for stratification and analytical surveys," Biometrika, 60, No. 1, 125–133 (1973).
6. C.-E. Sarndal, B. Swensson, and J. Wretman, Model Assisted Survey Sampling, Springer, New York (1992).
7. S. K. Thompson, Sampling, Wiley, New York (1992).
