This article was downloaded by: [University of the South Pacific] On: 11 March 2015, At: 14:06 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Journal of Applied Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20
Designing stratified sampling in economic and business surveys
M.G.M. Khan, K.G. Reddy & D.K. Rao
School of Computing, Information and Mathematical Sciences, The University of the South Pacific, Suva, Fiji
Published online: 09 Mar 2015.
To cite this article: M.G.M. Khan, K.G. Reddy & D.K. Rao (2015): Designing stratified sampling in economic and business surveys, Journal of Applied Statistics, DOI: 10.1080/02664763.2015.1018674
To link to this article: http://dx.doi.org/10.1080/02664763.2015.1018674
Journal of Applied Statistics, 2015 http://dx.doi.org/10.1080/02664763.2015.1018674
Designing stratified sampling in economic and business surveys M.G.M. Khan, K.G. Reddy ∗ and D.K. Rao School of Computing, Information and Mathematical Sciences, The University of the South Pacific, Suva, Fiji (Received 23 June 2014; accepted 9 February 2015)
In most economic and business surveys, the target variables (e.g. turnover of enterprises, income of households) follow skewed distributions with many small and few large units. When stratified sampling is used in such surveys, stratification by convenience, such as by demographic variables (e.g. gender, socioeconomic class, geographical region, religion, ethnicity) or other natural criteria, is widely practiced but may fail to form homogeneous strata and therefore does little to increase the precision of the estimates of the variables of interest. In this paper, a stratified sampling design for economic surveys based on auxiliary information is developed, which can be used for constructing optimum stratification and determining optimum sample allocation to maximize the precision of the estimate.
Keywords: economic survey; stratified random sampling; optimum stratification; optimum sample size allocation; mathematical programming problem; dynamic programming technique
1. Introduction
Stratified random sampling is an important sampling technique in most economic surveys, such as those estimating the per capita income, average cost of living, average return on investment, market share of a product, demand or import (or export) of a commodity, rate of unemployment and many other parameters estimated by countries' Statistics Offices. In stratified sampling, the sampling frame is divided into non-overlapping groups or strata in such a way that the strata constructed are internally homogeneous with respect to the survey variable (y) under study, which maximizes the precision of its estimate. Since in most practical situations it is difficult to construct such optimum strata, surveyors more often stratify the population in the most convenient manner, such as by geographical regions (e.g. North, Central, South, etc.), administrative regions (e.g. provinces, districts, etc.) or other natural criteria (e.g. urban–rural, size of firms (small–medium–large), sex, age, etc.). However, the stratification by convenience
*Corresponding author. Email: [email protected]
© 2015 Taylor & Francis
manner is not always a reasonable criterion, as the strata so obtained may not be internally homogeneous with respect to the variable of interest, which may reduce the precision of the estimates. Thus, one has to look for the optimum strata boundaries (OSB) that increase the precision of the estimates. The problem of determining OSB for a variable, when its frequency distribution f(y) is known, is well known in the sampling literature. The basic consideration involved in determining OSB is that the strata should be as internally homogeneous as possible, that is, in order to achieve maximum precision, the stratum variances $\sigma_h^2$ should be as small as possible for a given type of sample allocation. To achieve this, an ideal situation is that the distribution of the variable under study is known and the OSB are determined by cutting the range of this distribution at suitable points. This problem of determining OSB was first discussed by Dalenius [8], considering that the estimation and stratification variables are the same. He presented a set of minimal equations which are usually difficult to solve for OSB because of their implicit nature. Hence, attempts at determining approximately optimum stratum boundaries have subsequently been made by several authors, such as Dalenius and Gurney [10], Mahalanobis [30], Hansen and Hurwitz [16], Sethi [40], Aoyama [2] and Ekman [13]. When the frequency distribution of the auxiliary variable (x) is known, several approximation methods of determining OSB using the auxiliary variable have been suggested and discussed by many authors, such as Dalenius [9], Dalenius and Hodges [11], Sethi [40], Taga [48], Serfling [39], Singh and Sukhatme [43–45], Singh [41], Singh and Prakash [42], Cochran [7], Mehta et al. [31], Rizvi et al. [38] and Gupta et al. [15]. Attempts have also been made to determine the global OSB by many authors. Unnithan [50] proposed an iterative method that requires a suitable initial solution.
For a skewed population where a certainty stratum is necessary, Lavallée and Hidiroglou [28] proposed an algorithm to construct stratum boundaries for a power allocated stratified sample. Hidiroglou and Srinath [17] presented a more general form of the algorithm, which by assigning different values to operating parameters yields a power allocation, a Neyman allocation, or a combination of these allocations. Later, Sweet and Sigman [47] and Rivest [37] reviewed the Lavallée and Hidiroglou algorithm and proposed modified versions that incorporate the different relationships between the stratification and study variables. Detlefsen and Veum [12] investigated the Lavallée and Hidiroglou algorithm for several strata and observed that the algorithm's convergence was slow or non-existent. They also found that different starting points lead to different OSB for the same population. There are several other algorithms available in the literature; for example, Niemiro [36] proposed a random search method and Nicolini [35] suggested a natural class method. Lednicki and Wieczorkowski [29] presented a method of stratification using the simplex method of Nelder and Mead [33]. Later, Kozak [23] presented a modified random search algorithm, which was considerably faster and more efficient than those of Rivest [37] and Lednicki and Wieczorkowski [29]. Gunning and Horgan [14] proposed an alternative method to determine approximate OSB based on a geometric progression. Horgan [18] compared this approach with Dalenius and Hodges [11], Ekman [13] and Lavallée and Hidiroglou [28] and concluded that the geometric progression method is more efficient. However, Kozak and Verma [24] studied the usefulness of Gunning and Horgan's geometric progression method and obtained a different result: the geometric progression approach is less efficient than Lavallée and Hidiroglou's algorithm [25].
Another kind of stratification method that has been proposed in the literature is due to Bühler and Deutler [5]. They formulated the problem of determining OSB as an optimization problem and developed a computational technique to solve the problem by using dynamic programming. This approach was also used by Lavallée [26,27] for determining the OSB which would divide the population domain of two stratification variables into distinct subsets such that the precision of the variables of interest is maximized. Khan et al. [19–22] and Nand and Khan [32] also use
dynamic programming technique for determining the OSB when the frequency function of the survey variable is known. In this paper, a method of constructing optimum stratum boundaries (or optimum cut-off points) and determining the optimum sample size for each stratum is developed for a survey (or main) variable, which leads to substantial gains in the precision of the estimates. However, determining the OSB and the sample size based on the survey variable is not feasible in practice since the variable of interest is unavailable prior to conducting the survey. Thus, if the study variable can be regressed on an auxiliary variable, the proposed technique determines the OSB and the sample size based on the auxiliary variable, for which the data are readily available. As many economic variables are skewed in nature, the auxiliary variable is considered to have a positively skewed distribution with a gamma density function, which can characterize a wide range of economic and business data because of its versatility in fitting a variety of distributions. The problem of finding the OSB and optimum sample size is formulated as a mathematical programming problem (MPP) that seeks minimization of the variance of the estimated population parameter under Neyman allocation. The MPP is then solved for OSB by developing a solution procedure using a dynamic programming technique. A numerical example with a real data set of a skewed population that follows a gamma distribution, aimed at estimating an agricultural product or annual income in a national economic survey, is presented to illustrate the procedure developed in this paper. To emphasize the improvement over the existing procedures, the empirical findings are compared with the cum √f method of Dalenius and Hodges [11], the geometric method of Gunning and Horgan [14] and the Lavallée–Hidiroglou [28] method.
The results show that the proposed design yields a gain in efficiency on the variance of the estimator over the cum √f, geometric and L-H (Kozak) methods. A simulation study also reveals similar results.

2. The problem of OSB
Let the target population be stratified into L strata based on a single auxiliary variable x when the estimation of the mean of the study variable y is of interest. If a simple random sample of size $n_h$ is to be drawn from the hth stratum with sample mean $\bar{y}_h$ (h = 1, 2, ..., L), then the stratified sample mean, $\bar{y}_{st}$, is given by

$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h, \tag{1}$$

where $W_h$ is the proportion of the population contained in the hth stratum. When the finite population correction factors are ignored, under the Neyman [34] allocation

$$n_h = n \cdot \frac{W_h \sigma_{hy}}{\sum_{h=1}^{L} W_h \sigma_{hy}}, \tag{2}$$

the variance of $\bar{y}_{st}$ is given by

$$\operatorname{Var}(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{L} W_h \sigma_{hy}\right)^2}{n}, \tag{3}$$

where $W_h$ and $\sigma_{hy}^2$ are the stratum weight and the stratum variance for the hth stratum (h = 1, 2, ..., L), respectively, and n is the preassigned total sample size. Consider that the study variable has the regression model of the form

$$y = \lambda(x) + \epsilon, \tag{4}$$

where $\lambda(x)$ is a linear or a nonlinear function of x and $\epsilon$ is an error term such that $E(\epsilon \mid x) = 0$ and $V(\epsilon \mid x) = \phi(x) > 0$ for all x.
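As a minimal sketch of Equations (2) and (3), the following Python snippet computes the Neyman allocation and the resulting variance for given stratum weights and standard deviations. The function names and the three-stratum input values are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def neyman_allocation(weights, sigmas, n):
    """Equation (2): n_h = n * W_h * sigma_hy / sum_h(W_h * sigma_hy)."""
    products = [w * s for w, s in zip(weights, sigmas)]
    total = sum(products)
    return [n * p / total for p in products]


def variance_under_neyman(weights, sigmas, n):
    """Equation (3): Var = (sum_h W_h * sigma_hy)^2 / n, fpc ignored."""
    return sum(w * s for w, s in zip(weights, sigmas)) ** 2 / n


# Hypothetical inputs (illustration only, not the sugar cane data)
W = [0.5, 0.3, 0.2]          # stratum weights, summing to 1
sigma = [2.0, 5.0, 10.0]     # stratum standard deviations of y
n_h = neyman_allocation(W, sigma, n=90)
var = variance_under_neyman(W, sigma, n=90)
print(n_h, var)
```

In practice the allocations would still need to be rounded to integers; that step is left out of this sketch.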
Under model (4), the stratum mean $\mu_{hy}$ and the stratum variance $\sigma_{hy}^2$ can be expressed as [43]

$$\mu_{hy} = \mu_{h\lambda} \tag{5}$$

and

$$\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \mu_{h\phi}, \tag{6}$$

where $\mu_{h\lambda}$ and $\mu_{h\phi}$ are the expected values of the functions $\lambda(x)$ and $\phi(x)$, respectively, and $\sigma_{h\lambda}^2$ denotes the variance of $\lambda(x)$ in the hth stratum. If $\lambda$ and $\epsilon$ are uncorrelated, from model (4), $\sigma_{hy}^2$ can also be expressed as [10]

$$\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \sigma_{\epsilon h}^2, \tag{7}$$

where $\sigma_{\epsilon h}^2$ is the variance of $\epsilon$ in the hth stratum. It can be verified that expressions (6) and (7) are equivalent.

Let f(x); a ≤ x ≤ b be the frequency function of the auxiliary variable x that is used for the stratification. If the population mean of the study variable y is estimated under the Neyman allocation given in Equation (2), then the problem of determining the strata boundaries is to cut up the range, d = b − a, at (L − 1) intermediate points $a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b$ such that Equation (3) is minimum. For a fixed sample size n, minimizing the expression on the right-hand side of Equation (3) is equivalent to minimizing $\sum_{h=1}^{L} W_h \sigma_{hy}$. Thus, from Equation (6), we minimize

$$\sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}. \tag{8}$$

If f(x), $\lambda(x)$ and $\phi(x)$ are known and integrable, $W_h$, $\sigma_{h\lambda}^2$ and $\mu_{h\phi}$ can be obtained as functions of the boundary points $x_h$ and $x_{h-1}$ by using the following expressions:

$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx, \tag{9}$$

$$\sigma_{h\lambda}^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \lambda^2(x) f(x)\,dx - \mu_{h\lambda}^2, \tag{10}$$

and

$$\mu_{h\phi} = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \phi(x) f(x)\,dx, \tag{11}$$

where

$$\mu_{h\lambda} = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \lambda(x) f(x)\,dx \tag{12}$$

and $(x_{h-1}, x_h)$ are the boundaries of the hth stratum. Thus, the objective function in Equation (8) could be expressed as a function of the boundary points $x_h$ and $x_{h-1}$ only.
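To make Equations (9)–(12) concrete, the sketch below evaluates the four stratum quantities by numerical integration for arbitrary f, λ and φ supplied as Python functions. The use of composite Simpson's rule, the function names and the toy inputs (a uniform density with λ(x) = 2x and constant φ) are assumptions made for illustration; the paper itself works with closed forms for the gamma case.

```python
import math


def simpson(func, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def stratum_terms(f, lam, phi, x_lo, x_hi):
    """Return W_h, mu_h_lambda, sigma2_h_lambda, mu_h_phi (Equations (9)-(12))."""
    W = simpson(f, x_lo, x_hi)                                        # (9)
    mu_lam = simpson(lambda x: lam(x) * f(x), x_lo, x_hi) / W         # (12)
    var_lam = simpson(lambda x: lam(x) ** 2 * f(x), x_lo, x_hi) / W - mu_lam ** 2  # (10)
    mu_phi = simpson(lambda x: phi(x) * f(x), x_lo, x_hi) / W         # (11)
    return W, mu_lam, var_lam, mu_phi


# Toy inputs (hypothetical): uniform density on [0, 1], stratum [0, 0.5]
W, mu_lam, var_lam, mu_phi = stratum_terms(
    f=lambda x: 1.0, lam=lambda x: 2.0 * x, phi=lambda x: 0.5,
    x_lo=0.0, x_hi=0.5,
)
term = W * math.sqrt(var_lam + mu_phi)   # this stratum's contribution to (8)
print(W, mu_lam, var_lam, mu_phi, term)
```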
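Let $\phi_h(x_h, x_{h-1}) = W_h \sigma_{hy} = W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$. Then, the problem of determination of OSB can be expressed as the following optimization problem: Find $x_1, x_2, \ldots, x_L$ that

$$\begin{aligned} \text{Minimize}\quad & \sum_{h=1}^{L} \phi_h(x_h, x_{h-1}) \\ \text{subject to}\quad & a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b. \end{aligned} \tag{13}$$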
We further define

$$l_h = x_h - x_{h-1}; \quad h = 1, 2, \ldots, L, \tag{14}$$

where $l_h \ge 0$ denotes the range or width of the hth stratum. Obviously, with this definition of $l_h$, the range of the distribution, d = b − a, is expressed as a function of the stratum widths as

$$\sum_{h=1}^{L} l_h = \sum_{h=1}^{L} (x_h - x_{h-1}) = b - a = x_L - x_0 = d. \tag{15}$$
The hth stratification point $x_h$; h = 1, 2, ..., L is then expressed as

$$x_h = x_0 + \sum_{i=1}^{h} l_i$$

or,

$$x_h = x_{h-1} + l_h.$$
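The relation between stratum widths and stratification points is a running sum, which can be sketched as follows. The helper name is hypothetical; the input values are x0 from Section 6 and the two widths reported for L = 2 in Table 3.

```python
from itertools import accumulate


def boundaries_from_widths(x0, widths):
    """Return [x_0, x_1, ..., x_L] with x_h = x_0 + l_1 + ... + l_h."""
    return [x0 + s for s in accumulate(widths, initial=0.0)]


# x0 and the L = 2 widths as reported in the paper
bounds = boundaries_from_widths(0.494211, [12.411736, 56.283562])
print(bounds)
```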
Adding Equation (15) as a constraint, the problem (13) can be treated as an equivalent problem of determining optimum strata widths (OSW), $l_1, l_2, \ldots, l_L$, and is expressed as the following MPP:

$$\begin{aligned} \text{Minimize}\quad & \sum_{h=1}^{L} \phi_h(l_h, x_{h-1}), \\ \text{subject to}\quad & \sum_{h=1}^{L} l_h = d, \\ \text{and}\quad & l_h \ge 0; \quad h = 1, 2, \ldots, L. \end{aligned} \tag{16}$$
Initially, $x_0$ is known, as this is the initial value of the auxiliary variable. Therefore, the first term, that is, $\phi_1(l_1, x_0)$ in the objective function of the MPP (16) is a function of $l_1$ alone. Once $l_1$ is known, the second term $\phi_2(l_2, x_1)$ becomes a function of $l_2$ alone, and so on. Due to this special nature of the functions, the MPP (16) may be treated as a function of $l_h$ alone and can be expressed as

$$\begin{aligned} \text{Minimize}\quad & \sum_{h=1}^{L} \phi_h(l_h), \\ \text{subject to}\quad & \sum_{h=1}^{L} l_h = d, \\ \text{and}\quad & l_h \ge 0; \quad h = 1, 2, \ldots, L. \end{aligned} \tag{17}$$

3. The solution procedure
The problem (17) is a multistage decision problem in which the objective function and the constraint are separable functions of $l_h$, which allows us to use a dynamic programming technique [22]. Dynamic programming determines the optimum solution of a multi-variable problem by decomposing it into stages, each stage comprising a single-variable subproblem. A dynamic programming model is basically a recursive equation based on Bellman's principle of optimality [4]. This recursive equation links the different stages of the problem in a manner which guarantees that each stage's optimal feasible solution is also optimal and feasible for the entire problem [49, Chapter 10].
Consider the following subproblem of Equation (17) for the first k (< L) strata:

$$\begin{aligned} \text{Minimize}\quad & \sum_{h=1}^{k} \phi_h(l_h) \\ \text{subject to}\quad & \sum_{h=1}^{k} l_h = d_k, \\ \text{and}\quad & l_h \ge 0; \quad h = 1, 2, \ldots, k, \end{aligned} \tag{18}$$

where $d_k < d$ is the total width available for division into k strata or the state value at stage k. Note that $d_k = d$ for k = L. The transformation functions are given by

$$\begin{aligned} d_k &= l_1 + l_2 + \cdots + l_k, \\ d_{k-1} &= l_1 + l_2 + \cdots + l_{k-1} = d_k - l_k, \\ d_{k-2} &= l_1 + l_2 + \cdots + l_{k-2} = d_{k-1} - l_{k-1}, \\ &\;\;\vdots \\ d_2 &= l_1 + l_2 = d_3 - l_3, \\ d_1 &= l_1 = d_2 - l_2. \end{aligned}$$

Let $\Phi_k(d_k)$ denote the minimum value of the objective function of Equation (18), that is,

$$\Phi_k(d_k) = \min\left\{ \sum_{h=1}^{k} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k} l_h = d_k, \text{ and } l_h \ge 0;\ h = 1, 2, \ldots, k \text{ and } 1 \le k \le L \right\}.$$
With the above definition of $\Phi_k(d_k)$, the MPP (17) is equivalent to finding $\Phi_L(d)$ recursively by finding $\Phi_k(d_k)$ for k = 1, 2, ..., L and $0 \le d_k \le d$. We can write

$$\Phi_k(d_k) = \min\left\{ \phi_k(l_k) + \sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k, \text{ and } l_h \ge 0;\ h = 1, 2, \ldots, k \right\}.$$

For a fixed value of $l_k$; $0 \le l_k \le d_k$,

$$\Phi_k(d_k) = \phi_k(l_k) + \min\left\{ \sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k, \text{ and } l_h \ge 0;\ h = 1, 2, \ldots, k-1 \text{ and } 1 \le k \le L \right\}.$$
Using Bellman's principle of optimality, we write a forward recursive equation of the dynamic programming technique as

$$\Phi_k(d_k) = \min_{0 \le l_k \le d_k} \left[ \phi_k(l_k) + \Phi_{k-1}(d_k - l_k) \right], \quad k \ge 2. \tag{19}$$

For the first stage, that is, for k = 1:

$$\Phi_1(d_1) = \phi_1(d_1) \implies l_1^* = d_1, \tag{20}$$
where $l_1^* = d_1$ is the optimum width of the first stratum. The relations (19) and (20) are solved recursively for each k = 1, 2, ..., L and $0 \le d_k \le d$, and $\Phi_L(d)$ is obtained. From $\Phi_L(d)$ the optimum width of the Lth stratum, $l_L^*$, is obtained. From $\Phi_{L-1}(d - l_L^*)$ the optimum width of the (L − 1)th stratum, $l_{L-1}^*$, is obtained, and so on until $l_1^*$ is obtained.

4. The construction of OSB with gamma auxiliary variable

4.1 Gamma distribution
The gamma distribution is a two-parameter family of continuous probability distributions bounded at zero. It is frequently used as a probability model for waiting times; for instance, in life testing, the waiting time until death is a random variable that is frequently modeled with a gamma distribution. The gamma is a flexible life distribution model and is extremely useful in risk analysis modeling. Due to its moderately skewed profile, it is versatile in fitting a variety of distributions (e.g. chi-square or exponential). It can be used as a model in a range of disciplines, including climatology and economics, where it can be used for modeling rainfall and various economic data, such as insurance claims or risk, the size of loan defaults, wealth and income [6,46]. If the auxiliary variable x follows the gamma distribution (i.e. $x \sim \Gamma(r, \theta)$) on the interval $[x_0, x_L]$, it has the following two-parameter probability density function:

$$f(x; r, \theta) = \frac{1}{\theta^r \Gamma(r)}\, x^{r-1} e^{-x/\theta}, \quad x > 0;\ r, \theta > 0, \tag{21}$$

where r is the shape parameter, $\theta$ is the scale parameter and $\Gamma(r)$ is the gamma function defined by

$$\Gamma(r) = \int_0^\infty t^{r-1} e^{-t}\,dt, \quad r > 0. \tag{22}$$

Related to the function in Equation (22) are the upper incomplete gamma function $\Gamma(r, x)$ and the lower incomplete gamma function $\gamma(r, x)$, defined respectively as follows:

$$\Gamma(r, x) = \int_x^\infty t^{r-1} e^{-t}\,dt; \tag{23}$$

$$\gamma(r, x) = \int_0^x t^{r-1} e^{-t}\,dt. \tag{24}$$

There also exist regularized (normalized) incomplete gamma functions, which take values between 0 and 1 and can be stated as

$$Q(r, x) = \frac{1}{\Gamma(r)} \int_x^\infty t^{r-1} e^{-t}\,dt, \quad r, x > 0;\ \Gamma(r) \ne 0; \tag{25}$$

$$P(r, x) = \frac{1}{\Gamma(r)} \int_0^x t^{r-1} e^{-t}\,dt, \quad r, x > 0;\ \Gamma(r) \ne 0, \tag{26}$$

where Q(r, x) denotes the regularized upper incomplete gamma function while P(r, x) denotes the regularized lower incomplete gamma function [1, Chapter 6]. Note that Q(r, x) = 1 − P(r, x).

4.2 Estimating the regression model
To illustrate the formulation of the problem of determining OSB as an MPP for a skewed population with a gamma auxiliary variable, we use data on a sugar cane farming population of N = 13,894 farmers. The data were obtained from the Fiji Sugar Corporation (FSC), Fiji Islands, for the following three characteristics of each farmer:

(i) Production of sugar cane (in tonnes) during the period 2007–2008.
(ii) Annual income (in Fiji dollars).
(iii) Disposition area or land area under cultivation (in hectares).

Suppose that a national economic survey is to be carried out to estimate the total production of sugar cane using stratified random sampling. If an auxiliary variable is used to increase the precision of the estimate in the survey, then the disposition area may be a reasonable choice for the auxiliary variable, x. To estimate a regression model for the production of sugar cane (y) given in Equation (4), we observed that the data significantly fit a linear regression model with disposition area (x). Table 1 presents the analysis of variance (ANOVA) of the fitted regression model and Table 2 gives the summary of the estimates of the model parameters.

Table 1. ANOVA for regression model of sugarcane on disposition land.

| Source | Sum of squares (SS) | df | Mean square (MS) | F | p-value |
| Regression | 80,767,260 | 1 | 80,767,260 | 6339.12 | 0.000 |
| Residual error | 176,999,089 | 13,892 | 12,741 | | |
| Total | 257,766,349 | 13,893 | | | |

Table 2. Summary of model parameters.

| Predictor | Coefficient | SE Coeff. | t | p-value |
| α | 24.422 | 2.006 | 12.17 | .000 |
| β | 12.453 | 0.1564 | 79.62 | .000 |

The results reveal that the fitted regression model and the estimated parameters are highly significant, with p-values < .001. The coefficient of determination, $R^2 = SSR/SST$, with a value of 31.3% indicates a good strength of linear relationship between the two variables. The model is adequate to fit the data since $R^2$ is found to be one of the highest for the linear model when compared with the model summary of all the other nonlinear models available in standard statistical packages. Figure 1 depicts a positive linear association through the scatterplot of the production (y) versus the disposition area (x). A large number of points are concentrated in the vicinity of the fitted line since the population size is quite large. Therefore, the production of sugar cane (y) and the disposition area (x) are fairly assumed to follow the linear regression model given in Equation (4), where

$$\lambda(x) = \alpha + \beta x \tag{27}$$

and the least-squares estimates of the parameters are given by

$$\hat{\alpha} = 24.422 \quad\text{and}\quad \hat{\beta} = 12.453. \tag{28}$$

4.3 Estimating the distribution of auxiliary variable
For the data of the auxiliary variable, disposition area (x), the coefficient of variation is found to be 54.33%, whereas the skewness is +2.19, which indicates that the right tail of x is heavier than the left tail.

[Figure 1. Scatterplot of production vs. disposition area (x).]

[Figure 2. Frequency histogram of the disposition area (x).]

To determine the distribution, f(x), we construct a relative frequency histogram of x. Figure 2 reveals that the distribution of x is a right-skewed distribution that matches the gamma distribution. We also obtained the probability (P–P) plot of x to determine whether the distribution of x matches the gamma distribution. As the points are clustered around a straight line, Figure 3 reveals that x is fairly assumed to follow the gamma distribution with the probability density function given by Equation (21).

[Figure 3. Gamma P–P plot of the disposition area (x).]

The maximum likelihood estimates of the parameters of the gamma distribution were found to be

$$r = 3.836157 \quad\text{and}\quad \theta = 2.937784. \tag{29}$$

Using the Kolmogorov–Smirnov test, the maximum difference between the observed distribution and the gamma distribution is found to be non-significant, which supports that x follows the gamma distribution with the parameters given in Equation (29).

4.4 Formulation of the problem of determining OSB as an MPP
When the auxiliary variable x follows the gamma distribution with the density function given by Equation (21), using Equations (9)–(12), (23), (25) and (27), $W_h$, $\mu_{h\lambda}$ and $\sigma_{h\lambda}^2$ can be obtained as functions of the boundary points $(x_{h-1}, x_h)$ as follows:

$$W_h = Q\!\left(r, \frac{x_{h-1}}{\theta}\right) - Q\!\left(r, \frac{x_h}{\theta}\right). \tag{30}$$

Thus, substituting Equation (14), that is, $x_h = x_{h-1} + l_h$, in Equation (30) gives the stratum weight as

$$W_h = Q\!\left(r, \frac{x_{h-1}}{\theta}\right) - Q\!\left(r, \frac{x_{h-1} + l_h}{\theta}\right). \tag{31}$$

Similarly, $\mu_{h\lambda}$ is obtained by

$$\mu_{h\lambda} = \alpha + \frac{\beta\theta r\left[Q(r+1, x_{h-1}/\theta) - Q(r+1, (x_{h-1}+l_h)/\theta)\right]}{\left[Q(r, x_{h-1}/\theta) - Q(r, (x_{h-1}+l_h)/\theta)\right]} \tag{32}$$

and $\sigma_{h\lambda}^2$ is reduced to

$$\sigma_{h\lambda}^2 = \frac{\beta^2\theta^2 r(r+1)\left[Q(r+2, x_{h-1}/\theta) - Q(r+2, (x_{h-1}+l_h)/\theta)\right]}{\left[Q(r, x_{h-1}/\theta) - Q(r, (x_{h-1}+l_h)/\theta)\right]} - \beta^2\theta^2 r^2\, \frac{\left[Q(r+1, x_{h-1}/\theta) - Q(r+1, (x_{h-1}+l_h)/\theta)\right]^2}{\left[Q(r, x_{h-1}/\theta) - Q(r, (x_{h-1}+l_h)/\theta)\right]^2}. \tag{33}$$
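Equations (30)–(33) can be evaluated numerically once Q(r, x) is available. The sketch below is one possible pure-Python version: it computes P(r, x) from the standard power series of the lower incomplete gamma function and sets Q = 1 − P. The helper names and the choice of a series expansion are assumptions made for illustration, not the authors' implementation. The example call uses the parameter estimates of Equations (28) and (29) and the first stratum for L = 2 implied by Table 3.

```python
import math


def reg_lower_gamma(r, x, tol=1e-15, max_terms=100000):
    """Regularized lower incomplete gamma P(r, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / r
    total = term
    for n in range(1, max_terms):
        term *= x / (r + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(r * math.log(x) - x - math.lgamma(r)))


def reg_upper_gamma(r, x):
    """Regularized upper incomplete gamma Q(r, x) = 1 - P(r, x)."""
    return 1.0 - reg_lower_gamma(r, x)


def gamma_stratum(x_lo, x_hi, r, theta, alpha, beta):
    """Return W_h, mu_h_lambda, sigma2_h_lambda from Equations (30)-(33)."""
    def dQ(shape):
        return reg_upper_gamma(shape, x_lo / theta) - reg_upper_gamma(shape, x_hi / theta)

    W = dQ(r)                                                    # (30)
    mu_lam = alpha + beta * theta * r * dQ(r + 1) / W            # (32)
    var_lam = (beta ** 2 * theta ** 2 * r * (r + 1) * dQ(r + 2) / W
               - beta ** 2 * theta ** 2 * r ** 2 * dQ(r + 1) ** 2 / W ** 2)  # (33)
    return W, mu_lam, var_lam


# Estimates from Equations (28)-(29); stratum [x0, x0 + l1*] for L = 2
W1, mu1, var1 = gamma_stratum(0.494211, 12.905947,
                              r=3.836157, theta=2.937784,
                              alpha=24.422, beta=12.453)
print(W1, mu1, var1)
```

A library routine for the regularized incomplete gamma function could replace the series here; the series is used only to keep the sketch self-contained.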
Then, using Equations (8), (31) and (33), the formulated MPP given in Equation (17) to determine the optimum stratum widths and hence the optimum stratum boundaries could be expressed as

$$\begin{aligned} \text{Minimize}\quad & \sum_{h=1}^{L} \operatorname{Sqrt}\left\{ \begin{aligned} &\beta^2\theta^2 r(r+1)\left[Q\!\left(r, \tfrac{x_{h-1}}{\theta}\right) - Q\!\left(r, \tfrac{x_{h-1}+l_h}{\theta}\right)\right] \\ &\quad\times\left[Q\!\left(r+2, \tfrac{x_{h-1}}{\theta}\right) - Q\!\left(r+2, \tfrac{x_{h-1}+l_h}{\theta}\right)\right] \\ &- \beta^2\theta^2 r^2\left[Q\!\left(r+1, \tfrac{x_{h-1}}{\theta}\right) - Q\!\left(r+1, \tfrac{x_{h-1}+l_h}{\theta}\right)\right]^2 \\ &+ \mu_{h\phi}\left[Q\!\left(r, \tfrac{x_{h-1}}{\theta}\right) - Q\!\left(r, \tfrac{x_{h-1}+l_h}{\theta}\right)\right]^2 \end{aligned} \right\} \\ \text{subject to}\quad & \sum_{h=1}^{L} l_h = d, \\ \text{and}\quad & l_h \ge 0; \quad h = 1, 2, \ldots, L, \end{aligned} \tag{34}$$

where $d = x_L - x_0 = b - a$, $\beta$ is the regression coefficient, $\theta$ and r are the parameters of the gamma distribution, and $Q(\cdot)$ is the regularized upper incomplete gamma function given by Equation (25), whereas $\mu_{h\phi}$ is the expected variance given in Equation (7) for the error term in the regression model (4).

4.5 Estimating the variance of the error term
In the regression model given in Equation (4), it is assumed that the variance of the error term is $V(\epsilon \mid x) = \phi(x)$ for all x in the range (a, b), and the expected value of the function $\phi(x)$, given by $\mu_{h\phi}$, is obtained from Equation (11). Many authors have assumed that $\phi(x)$ may be of the form

$$\phi(x) = c x^g; \quad c > 0,\ g \ge 0, \tag{35}$$

where c and g are constants and in many populations $0 \le g \le 2$ (see [38,41,43]). Thus, from Equations (11), (21) and (35), we may compute $\mu_{h\phi}$ as a function of the boundary points as follows:

$$\mu_{h\phi} = \frac{c\,\theta^g\, \Gamma(r+g)\left[Q(r+g, x_{h-1}/\theta) - Q(r+g, (x_{h-1}+l_h)/\theta)\right]}{\Gamma(r)\left[Q(r, x_{h-1}/\theta) - Q(r, (x_{h-1}+l_h)/\theta)\right]}. \tag{36}$$

Therefore, one can determine the expected value of the stratum variance of the error term using Equation (36) if the values of the constants c and g are known. However, for our sample data, when a common regression model holds across the strata, we obtain the expected stratum variance of the error as

$$\mu_{h\phi} = \frac{SS_{Res}}{N - p} = MS_{Res} = 12741, \tag{37}$$

where $SS_{Res}$ and $MS_{Res}$ are the sum of squares of residuals and the mean square of residuals, respectively, and p is the number of parameters in the regression model.
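The value in Equation (37) can be checked against the sums of squares in Table 1. The short snippet below redoes that arithmetic, together with the R² and F values quoted in Section 4.2; the variable names are illustrative, and p = 2 is assumed because the model (27) has the two parameters α and β.

```python
# Figures taken from Table 1; p = 2 assumed for the linear model (27)
ss_reg = 80_767_260
ss_res = 176_999_089
N, p = 13_894, 2

ms_res = ss_res / (N - p)            # Equation (37)
r2 = ss_reg / (ss_reg + ss_res)      # coefficient of determination, SSR / SST
f_stat = (ss_reg / (p - 1)) / ms_res

print(round(ms_res), round(100 * r2, 1), round(f_stat, 2))
```

Up to rounding, these reproduce the 12741, 31.3% and 6339.12 reported in the text and in Table 1.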
5. Determination of optimum sample size

When the OSB $(x_h, x_{h-1})$ are determined as discussed in Sections 2–4, the optimum sample size $n_h$; h = 1, 2, ..., L that minimizes the variance of the estimate can easily be computed. If the study variable follows the regression model (4) with the auxiliary variable across the strata, then using Equations (2) and (6) the sample sizes $n_h$ under the Neyman allocation are obtained for a fixed total sample of size n as follows:

$$n_h = n \cdot \frac{W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}}{\sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}}; \quad h = 1, 2, \ldots, L, \tag{38}$$

where $W_h$, $\sigma_{h\lambda}^2$ and $\mu_{h\phi}$ are derived using Equations (9)–(12) for the optimum boundary points $(x_h, x_{h-1})$.
6. Numerical illustration
This section illustrates the computational details of the solution procedure using the dynamic programming technique discussed in Section 3 for determining the OSB of a skewed population with a gamma auxiliary variable. If the estimation of the production of sugar cane in an economic survey is of interest, then for this farming population the smallest and the largest values of the auxiliary variable x are $x_0 = 0.494211$ and $x_L = 69.189507$, respectively. This implies that the range of the distribution is

$$d = x_L - x_0 = 68.695296. \tag{39}$$

Substituting the values of $\beta$, r, $\theta$, $\mu_{h\phi}$ and d from Equations (28), (29), (37) and (39), respectively, and writing $d_h = l_1 + l_2 + \cdots + l_h$, the problem of determining the OSW given in MPP (34) is expressed as

$$\begin{aligned} \text{Minimize}\quad & \sum_{h=1}^{L} \operatorname{Sqrt}\left\{ \begin{aligned} &(12.453)^2(2.937784)^2\Big\{(3.836157)(4.836157) \\ &\quad\times\left[Q\!\left(5.836157, \tfrac{d_h - l_h + 0.494211}{2.937784}\right) - Q\!\left(5.836157, \tfrac{d_h + 0.494211}{2.937784}\right)\right] \\ &\quad\times\left[Q\!\left(3.836157, \tfrac{d_h - l_h + 0.494211}{2.937784}\right) - Q\!\left(3.836157, \tfrac{d_h + 0.494211}{2.937784}\right)\right] \\ &\quad- (3.836157)^2\left[Q\!\left(4.836157, \tfrac{d_h - l_h + 0.494211}{2.937784}\right) - Q\!\left(4.836157, \tfrac{d_h + 0.494211}{2.937784}\right)\right]^2\Big\} \\ &+ 12741\left[Q\!\left(3.836157, \tfrac{d_h - l_h + 0.494211}{2.937784}\right) - Q\!\left(3.836157, \tfrac{d_h + 0.494211}{2.937784}\right)\right]^2 \end{aligned} \right\} \\ \text{subject to}\quad & \sum_{h=1}^{L} l_h = 68.695296, \\ \text{and}\quad & l_h \ge 0; \quad h = 1, 2, \ldots, L. \end{aligned} \tag{40}$$
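Note that the (h − 1)th stratification point is

$$\begin{aligned} x_{h-1} &= x_0 + l_1 + l_2 + \cdots + l_{h-1} \\ &= 0.494211 + l_1 + l_2 + \cdots + l_{h-1} \\ &= 0.494211 + d_{h-1} \\ &= 0.494211 + d_h - l_h. \end{aligned}$$

Substituting this value of $x_{h-1}$, the recurrence relations (19) and (20) are used to solve the MPP (40), and they reduce as follows. For the first stage, k = 1:

$$\Phi_1(d_1) = \operatorname{Sqrt}\left\{ \begin{aligned} &(12.453)^2(2.937784)^2\Big\{(3.836157)(4.836157) \\ &\quad\times\left[Q\!\left(5.836157, \tfrac{0.494211}{2.937784}\right) - Q\!\left(5.836157, \tfrac{d_1 + 0.494211}{2.937784}\right)\right] \\ &\quad\times\left[Q\!\left(3.836157, \tfrac{0.494211}{2.937784}\right) - Q\!\left(3.836157, \tfrac{d_1 + 0.494211}{2.937784}\right)\right] \\ &\quad- (3.836157)^2\left[Q\!\left(4.836157, \tfrac{0.494211}{2.937784}\right) - Q\!\left(4.836157, \tfrac{d_1 + 0.494211}{2.937784}\right)\right]^2\Big\} \\ &+ 12741\left[Q\!\left(3.836157, \tfrac{0.494211}{2.937784}\right) - Q\!\left(3.836157, \tfrac{d_1 + 0.494211}{2.937784}\right)\right]^2 \end{aligned} \right\} \quad\text{at } l_1^* = d_1. \tag{41}$$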
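And for the stages k ≥ 2:

$$\Phi_k(d_k) = \min_{0 \le l_k \le d_k}\left[ \operatorname{Sqrt}\left\{ \begin{aligned} &(12.453)^2(2.937784)^2\Big\{(3.836157)(4.836157) \\ &\quad\times\left[Q\!\left(5.836157, \tfrac{d_k - l_k + 0.494211}{2.937784}\right) - Q\!\left(5.836157, \tfrac{d_k + 0.494211}{2.937784}\right)\right] \\ &\quad\times\left[Q\!\left(3.836157, \tfrac{d_k - l_k + 0.494211}{2.937784}\right) - Q\!\left(3.836157, \tfrac{d_k + 0.494211}{2.937784}\right)\right] \\ &\quad- (3.836157)^2\left[Q\!\left(4.836157, \tfrac{d_k - l_k + 0.494211}{2.937784}\right) - Q\!\left(4.836157, \tfrac{d_k + 0.494211}{2.937784}\right)\right]^2\Big\} \\ &+ 12741\left[Q\!\left(3.836157, \tfrac{d_k - l_k + 0.494211}{2.937784}\right) - Q\!\left(3.836157, \tfrac{d_k + 0.494211}{2.937784}\right)\right]^2 \end{aligned} \right\} + \Phi_{k-1}(d_k - l_k) \right]. \tag{42}$$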
Solving the recursive equations (41) and (42), the OSW $l_h^*$ and hence the OSB $x_h^* = x_{h-1}^* + l_h^*$ are obtained by executing a C++ computer program coded for the algorithm discussed in Section 3.
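The authors' C++ program is not reproduced in the paper. As a rough illustration of how the recursion (19)–(20) might be carried out, the following Python sketch discretizes the range [x0, xL] into G equal cells and applies the forward recursion over grid points, then backtracks to recover the boundaries. The grid discretization, the midpoint-rule integration of the gamma density, and all function and variable names are assumptions of this sketch rather than details from the paper, so its output is only a grid approximation to the OSB.

```python
import math


def gamma_pdf(x, r, theta):
    """Gamma density of Equation (21), evaluated on the log scale."""
    return math.exp((r - 1) * math.log(x) - x / theta
                    - r * math.log(theta) - math.lgamma(r))


def optimum_boundaries(L, x0, xL, r, theta, beta, mu_phi, G=300):
    """Grid-based dynamic programme for MPP (17); returns (boundaries, objective)."""
    h = (xL - x0) / G
    # Prefix sums of int x^k f(x) dx over grid cells (midpoint rule), k = 0, 1, 2
    M0, M1, M2 = [0.0], [0.0], [0.0]
    for c in range(G):
        xm = x0 + (c + 0.5) * h
        w = gamma_pdf(xm, r, theta) * h
        M0.append(M0[-1] + w)
        M1.append(M1[-1] + w * xm)
        M2.append(M2[-1] + w * xm * xm)

    def cost(i, j):
        """W_h * sqrt(beta^2 * Var(x | stratum) + mu_phi) for grid cells i..j-1."""
        W = M0[j] - M0[i]
        if W <= 0.0:
            return 0.0
        mean = (M1[j] - M1[i]) / W
        var_x = max(0.0, (M2[j] - M2[i]) / W - mean * mean)
        return W * math.sqrt(beta ** 2 * var_x + mu_phi)

    Phi = [cost(0, j) for j in range(G + 1)]          # stage k = 1, Equation (20)
    choices = []
    for k in range(2, L + 1):                         # stages k >= 2, Equation (19)
        new_Phi, arg = [], []
        for j in range(G + 1):
            best_i = min(range(j + 1), key=lambda i: Phi[i] + cost(i, j))
            new_Phi.append(Phi[best_i] + cost(best_i, j))
            arg.append(best_i)
        Phi = new_Phi
        choices.append(arg)

    # Backtrack from d_L = d to recover the grid index of each boundary
    cuts = [G]
    for arg in reversed(choices):
        cuts.append(arg[cuts[-1]])
    cuts.append(0)
    cuts.reverse()
    return [x0 + c * h for c in cuts], Phi[G]


# Values from Equations (28), (29), (37) and Section 6
X0, XL = 0.494211, 69.189507
R, THETA, BETA, MU_PHI = 3.836157, 2.937784, 12.453, 12741.0
bounds, objective = optimum_boundaries(3, X0, XL, R, THETA, BETA, MU_PHI)
print(bounds, objective)
```

The accuracy of the boundaries is limited by the cell width (xL − x0)/G, so a finer grid or a local refinement step would be needed to approach the six-decimal values reported in Table 3.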
7. Results and discussion
In this section, results are presented for two examples. In the first example, the OSB are obtained using the real data set discussed in Section 4.2, whereas in the second example, simulated data that follow the gamma distribution are used to obtain the OSB. In both examples, the effectiveness of the proposed dynamic programming method is compared with that of the following methods:

(i) Dalenius and Hodges' cum √f [11] method.
(ii) Geometric method by Gunning and Horgan [14].
(iii) Generalized Lavallée–Hidiroglou method with Kozak's [23] algorithm.

The stratification package recently developed by Baillargeon and Rivest [3] in the R statistical software is used to determine the OSB for the methods mentioned above. These OSB are then used to compute the sample size of each stratum and the variance of the estimated mean (or the values of the objective function) so that a comparative analysis can be carried out.

7.1 Example with real data
Table 3 presents the OSB together with the values of the objective function $\sum_{h=1}^{L} \phi_h(l_h) = \sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$ for four different numbers of strata, L = 2, 3, 4 and 5. For comparison purposes, the OSB determined by the cum √f method, the geometric method, the Lavallée–Hidiroglou method (using the stratification package with CV = 0.5433, obtained from the data) and the proposed dynamic programming method are presented in Table 4, together with the corresponding variances for each L. The optimum sample sizes $n_h$ for each stratum, obtained from Equation (38) with a fixed total sample size n = 600 using these OSB, are presented in Table 5 for the different methods.

Upon careful examination of the results in Tables 4 and 5, it is noted that the OSB and the sample sizes obtained by the cum √f and Lavallée–Hidiroglou methods are the closest to those of the proposed dynamic programming method, whereas the results of the geometric method differ vastly from them. Thus, the OSB and the sample sizes do differ between the methods, including the proposed dynamic programming method. The variance columns in Tables 4 and 5 show that the dynamic programming method yields the smallest variance for all L = 2, 3, 4 and 5. Thus, the study reveals that the dynamic programming technique is more efficient than the other methods, in the sense that it maximizes the precision of the estimate, when stratifying a skewed population with gamma distribution. Finally, the optimum stratum boundary points of the survey variable, y, obtained by using the regression model (4), (27) and (28) under the proposed dynamic programming method are presented in Table 6.

Table 3. OSW, OSB and optimum value of the variance.

| No. of strata L | OSW (l*_h) | OSB (x*_h = x*_{h−1} + l*_h) | Optimum value of the objective function Σ φ_h(l_h) = Σ W_h √(σ²_hλ + μ_hφ) |
|---|---|---|---|
| 2 | l*_1 = 12.411736, l*_2 = 56.283562 | x*_1 = 12.905947 | 120.7117590069 |
| 3 | l*_1 = 9.270456, l*_2 = 7.567950, l*_3 = 51.856892 | x*_1 = 9.764667, x*_2 = 17.332617 | 116.9919519821 |
| 4 | l*_1 = 7.649858, l*_2 = 5.282320, l*_3 = 6.965080, l*_4 = 48.798038 | x*_1 = 8.144069, x*_2 = 13.426389, x*_3 = 20.391469 | 115.4155695210 |
| 5 | l*_1 = 6.616776, l*_2 = 4.195930, l*_3 = 4.776070, l*_4 = 6.661330, l*_5 = 46.445190 | x*_1 = 7.110987, x*_2 = 11.306917, x*_3 = 16.082987, x*_4 = 22.744317 | 114.5997657314 |

Table 4. OSB for the different methods.

| L | Cum √f method: OSB | Cum √f method: Variance | Geometric method: OSB | Geometric method: Variance | L-H (Kozak's) method: OSB | L-H (Kozak's) method: Variance | Dynamic prog. technique: OSB | Dynamic prog. technique: Variance |
|---|---|---|---|---|---|---|---|---|
| 2 | 14.23 | 120.92468311 | 5.85 | 127.82887764 | 13.47 | 120.75187555 | 12.905947 | 120.71174060 |
| 3 | 10.8, 17.67 | 117.11136088 | 2.57, 13.32 | 120.31365835 | 8.53, 16.68 | 117.14688053 | 9.764667, 17.332617 | 116.99199104 |
| 4 | 10.8, 14.2, 21.1 | 116.06362532 | 1.7, 5.85, 20.11 | 120.55856673 | 7.54, 12.23, 19.40 | 115.47962722 | 8.144069, 13.426389, 20.391469 | 115.41559050 |
| 5 | 7.36, 10.8, 17.7, 24.5 | 114.81719654 | 1.33, 3.57, 9.58, 25.75 | 118.976665219 | 7.54, 11.49, 16.43, 24.34 | 114.64137652 | 7.110987, 11.306917, 16.082987, 22.744317 | 114.59979609 |
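The mapping from the boundaries on x to the boundaries on y reported in Table 6 can be sketched as follows. The slope 12.453 is the coefficient appearing in Equation (42); the intercept 24.422 is an assumption, inferred here from the first row of Table 6 (185.139758 − 12.453 × 12.905947) because the fitted regression model is not restated in this section. The function name `y_boundaries` is hypothetical.

```python
def y_boundaries(x_boundaries, alpha_hat, beta_hat):
    """Map stratum boundaries on x to boundaries on y via y = alpha + beta * x."""
    return [alpha_hat + beta_hat * x for x in x_boundaries]


ALPHA_HAT = 24.422  # assumption: intercept inferred from Table 6, not stated in this section
BETA_HAT = 12.453   # slope used in Equation (42)

# OSB on x for L = 2, ..., 5, copied from Table 3.
osb_x = {
    2: [12.905947],
    3: [9.764667, 17.332617],
    4: [8.144069, 13.426389, 20.391469],
    5: [7.110987, 11.306917, 16.082987, 22.744317],
}

for n_strata, xs in osb_x.items():
    print(n_strata, [round(y, 4) for y in y_boundaries(xs, ALPHA_HAT, BETA_HAT)])
```

Because the transformation is linear and increasing, the ordering of the boundaries is preserved and no re-optimization on y is needed.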
7.2 Example with simulated data

To illustrate the proposed stratification methodology with another example, a data set of size N = 10,000 following the gamma distribution with r = 1.5 and θ = 2.0 was randomly generated using the R software. The smallest and the largest values in the data set were x0 = 0.008892559 and xL = 22.591830031, which give the range d = xL − x0 = 22.58293747. For the purpose of stratification, it is assumed that the study variable has a linear regression on the auxiliary variable, with estimated values β = 5 and μhφ = 1.
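A data set of this kind can be generated along the following lines. This is only a sketch in Python rather than the R code used in the paper; the seed and the helper name `simulate_gamma` are assumptions, so the minimum, maximum and range will differ from the values quoted above.

```python
import random


def simulate_gamma(n, shape, scale, seed=1):
    """Draw n gamma variates with the given shape r and scale theta."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [rng.gammavariate(shape, scale) for _ in range(n)]


x = simulate_gamma(10_000, shape=1.5, scale=2.0)
x0, xL = min(x), max(x)
d = xL - x0  # range of the simulated auxiliary variable
print(x0, xL, d)
```

Only x0, xL and the gamma parameters enter the dynamic programming recursion, which is why the range d is computed right after the draw.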
Table 5. Optimum sample size for the different methods with n = 600.

| L | h | Cum √f method: n_h | Cum √f method: Variance | Geometric method: n_h | Geometric method: Variance | L-H method: n_h | L-H method: Variance | Proposed method: n_h | Proposed method: Variance |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 1, 2 | 438, 162 | 120.92468311 | 87, 513 | 127.82887764 | 414, 186 | 120.75187555 | 394, 206 | 120.71174060 |
| 3 | 1, 2, 3 | 319, 197, 84 | 117.11136088 | 9, 399, 192 | 120.31365835 | 215, 282, 103 | 117.14688053 | 273, 237, 90 | 116.9919910 |
| 4 | 1, 2, 3, 4 | 322, 120, 118, 40 | 116.06362532 | 2, 90, 460, 48 | 120.55856673 | 171, 203, 167, 59 | 115.47962722 | 199, 217, 137, 47 | 115.41559050 |
| 5 | 1, 2, 3, 4, 5 | 163, 156, 201, 62, 18 | 114.81719654 | 1, 24, 232, 330, 13 | 118.97666519 | 172, 176, 148, 85, 19 | 114.64137652 | 152, 188, 148, 84, 28 | 114.59979609 |
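The allocation step behind Table 5 can be sketched as below, assuming that Equation (38), which is not reproduced in this section, has the usual Neyman-type form n_h = n φ_h / Σ φ_h, where φ_h is the stratum term of the objective function. The rounding rule (largest remainder) and the function name `allocate` are assumptions, and the two input terms are rough hand-computed values used only for illustration.

```python
def allocate(n_total, stratum_terms):
    """Split n_total across strata in proportion to the given stratum terms."""
    total = sum(stratum_terms)
    raw = [n_total * t / total for t in stratum_terms]
    sizes = [int(v) for v in raw]
    # Hand out the units lost to truncation, largest fractional part first.
    leftover = n_total - sum(sizes)
    order = sorted(range(len(raw)), key=lambda h: raw[h] - sizes[h], reverse=True)
    for h in order[:leftover]:
        sizes[h] += 1
    return sizes


# Illustrative inputs (assumption): approximate stratum terms for L = 2.
print(allocate(600, [79.26, 41.45]))
```

With these rough inputs the sketch returns sample sizes of 394 and 206, in line with the proposed-method column of Table 5 for L = 2; exact agreement for other rows would depend on the precise form of Equation (38) and on the rounding convention used by the authors.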
Table 6. Optimum stratum boundary for survey variable (y).

| No. of strata L | OSB for x (x*_h) | OSB for y (ŷ*_h = α̂ + β̂ x*_h) | Variance |
|---|---|---|---|
| 2 | x*_1 = 12.905947 | ŷ*_1 = 185.139758 | 120.7117590069 |
| 3 | x*_1 = 9.764667, x*_2 = 17.332617 | ŷ*_1 = 146.0214069, ŷ*_2 = 240.2650795 | 116.9919519821 |
| 4 | x*_1 = 8.144069, x*_2 = 13.426389, x*_3 = 20.391469 | ŷ*_1 = 125.8400913, ŷ*_2 = 191.6208222, ŷ*_3 = 278.3569635 | 115.4155695210 |
| 5 | x*_1 = 7.110987, x*_2 = 11.306917, x*_3 = 16.082987, x*_4 = 22.744317 | ŷ*_1 = 112.9751211, ŷ*_2 = 165.2270374, ŷ*_3 = 224.7034371, ŷ*_4 = 307.6569796 | 114.5997657314 |
The OSB obtained by the dynamic programming method and the values of the objective function $\sum_{h=1}^{L} \phi_h(l_h) = \sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$ for four different numbers of strata, L = 2, 3, 4 and 5, are presented in Table 7. For comparison purposes, the OSB and the variances are also determined for the cum √f method, the geometric method and the Lavallée–Hidiroglou method using the stratification package with CV = 0.75. The computational results are presented in Table 8. The results in Table 8 show a trend in the OSB and the variances similar to that seen with the real data set. The proposed dynamic programming method produces marginally lower variances than the cum √f and Lavallée–Hidiroglou methods, whereas the variances of the geometric method are much higher than those of the proposed method. Thus, the proposed approach is more efficient than the others. Finally, the optimum sample size for each stratum, for a fixed total sample size n, and the optimum stratum boundary points of the survey variable, y, for given values of α and β, can also be obtained in the same way as in Tables 5 and 6.
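For orientation, the geometric boundaries used in the comparison can be reproduced from the data extremes alone. The sketch below applies the geometric progression rule of Gunning and Horgan [14], x_h = x_0 (x_L / x_0)^{h/L}, to the simulated data; the function name `geometric_boundaries` is hypothetical, and the stratification package may differ in implementation details.

```python
def geometric_boundaries(x_min, x_max, n_strata):
    """Stratum boundaries in geometric progression between x_min and x_max."""
    ratio = (x_max / x_min) ** (1.0 / n_strata)
    return [x_min * ratio ** h for h in range(1, n_strata)]


# Extremes of the simulated gamma data set quoted in Section 7.2.
x0, xL = 0.008892559, 22.591830031

for n_strata in (2, 3, 4, 5):
    print(n_strata, [round(b, 2) for b in geometric_boundaries(x0, xL, n_strata)])
```

Since x0 is very close to zero here, the common ratio is large and the lower boundaries crowd near the origin, which leaves most of the skewed mass in the upper strata and is consistent with the larger variances of the geometric method in Table 8.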
Table 7. OSW, OSB and optimum values for simulated gamma distribution.

| No. of strata L | OSW (l*_h) | OSB (x*_h = x*_{h−1} + l*_h) | Optimum value of the objective function Σ φ_h(l_h) = Σ W_h √(σ²_hλ + μ_hφ) |
|---|---|---|---|
| 2 | l*_1 = 3.567177, l*_2 = 19.015760 | x*_1 = 3.576069559 | 6.8430578649 |
| 3 | l*_1 = 2.340317, l*_2 = 2.988140, l*_3 = 17.254480 | x*_1 = 2.349209559, x*_2 = 5.337349559 | 4.7896729419 |
| 4 | l*_1 = 1.785598, l*_2 = 1.904530, l*_3 = 2.891600, l*_4 = 16.001209 | x*_1 = 1.785598, x*_2 = 3.690128, x*_3 = 6.581728 | 3.7238972083 |
| 5 | l*_1 = 1.462217, l*_2 = 1.432660, l*_3 = 1.828050, l*_4 = 2.841120, l*_5 = 15.018890 | x*_1 = 1.471109559, x*_2 = 2.903769559, x*_3 = 4.731819559, x*_4 = 7.572939559 | 3.0783175113 |
Table 8. OSB for the different methods.

| L | Cum √f method: OSB | Cum √f method: Variance | Geometric method: OSB | Geometric method: Variance | L-H (Kozak's) method: OSB | L-H (Kozak's) method: Variance | Dynamic prog. technique: OSB | Dynamic prog. technique: Variance |
|---|---|---|---|---|---|---|---|---|
| 2 | 3.4 | 6.852928111 | 0.45 | 11.33317374 | 3.55 | 6.843268706 | 3.576069559 | 6.8430578649 |
| 3 | 2.27, 5.65 | 4.81495398 | 0.12, 1.66 | 8.352624898 | 2.32, 5.31 | 4.789991583 | 2.349209559, 5.337349559 | 4.7896729419 |
| 4 | 2.27, 4.53, 7.91 | 3.853350243 | 0.06, 0.45, 3.18 | 6.427157331 | 1.76, 3.64, 6.49 | 3.724740266 | 1.785598, 3.690128, 6.581728 | 3.7238972083 |
| 5 | 1.14, 2.27, 4.53, 7.91 | 3.20625449 | 0.04, 0.20, 0.98, 4.71 | 5.617436052 | 1.55, 3.03, 4.93, 7.84 | 3.082802484 | 1.471109559, 2.903769559, 4.731819559, 7.572939559 | 3.0783175113 |

8. Conclusion
Stratified random sampling is an efficient and widely used sampling technique in economic and business surveys for estimating many parameters. When using stratified sampling, surveyors often encounter two major difficulties prior to drawing the sample: (i) how to construct the optimum strata within which the units are as homogeneous as possible and (ii) what the optimum size of the sample to be drawn from each stratum should be, so that the precision of the estimates of the parameters of the study or target variables is maximized. In this paper, a technique of stratified sampling is proposed to address these two problems, which can be used to estimate parameters more accurately. Moreover, optimum stratification based on the study variable is not feasible in practice, since the study variable is unknown prior to conducting the survey. Thus, the proposed technique uses auxiliary information in designing the sampling plan.
The problem of finding the OSB is formulated as an MPP that seeks to minimize the variance of the estimated population parameter and is solved using a dynamic programming technique. The OSB are then used to compute the optimum sample size for each stratum. Numerical examples using a real data set and a simulated data set are presented to illustrate the application and to show the computational details of the proposed technique. The results are presented together with those of Dalenius and Hodges' [11] cum √f method, the geometric method by Gunning and Horgan [14] and the generalized Lavallée–Hidiroglou [28] method with Kozak's [23] algorithm for a comparative analysis. It is found that, for populations with gamma distribution, constructing the strata and determining the sample allocation from the auxiliary variable with the proposed technique leads to substantial gains in the precision of the estimates.

Among the other methods available in the literature for determining the OSB using the auxiliary variable, most, such as Dalenius [9], Taga [48], Dalenius and Hodges [11], Singh and Sukhatme [43], Singh [41], Mehta et al. [31], Rizvi et al. [38] and Gupta et al. [15], are classical methods that obtain approximate stratum boundaries. Many authors, such as Unnithan [50], Lavallée and Hidiroglou [28], Sweet and Sigman [47] and Rivest [37], suggested iterative procedures. These iterative procedures require initial approximate solutions, and there is no guarantee that an iterative procedure will reach the global minimum in the absence of a suitable initial solution. The advantage of the proposed method is that it does not require any initial approximate solution. It is relatively more efficient and can be applied to skewed populations with improved efficiency.
More importantly, the technique has a wider scope of application than the other methods, as it requires only the values of the parameters of the auxiliary variable and not the complete data set, which is a mandatory requirement of the other techniques. According to this method, a population under study is subdivided efficiently into homogeneous strata based on an easily available auxiliary variable that has a regression relationship with the target variable. The practical implication of this stratification procedure is a reduction in the variance of the estimate of the target variable. A disadvantage of the proposed method is that the dynamic programming algorithm may become slower as the number of strata increases, especially when the range of the distribution is large. However, with the improving computing power of modern processors, this is expected to become less of a concern. Standardization of the auxiliary variable can also be tried to decrease the range. The method works with a single auxiliary variable, whereas in reality surveys involve multiple auxiliary variables. Developing methods for multiple auxiliary variables, as well as for other skewed distributions, is a possibility for future work.

Acknowledgments

The authors are grateful to the editor and the referees for their valuable comments and suggestions to improve the manuscript.
Disclosure statement

No potential conflict of interest was reported by the authors.
References

[1] M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1972.
[2] H. Aoyama, A study of the stratified random sampling, Ann. Inst. Stat. Math. 6 (1954), pp. 1–36.
[3] S. Baillargeon and L.P. Rivest, The construction of stratified designs in R with the package stratification, Surv. Methodol. 37(1) (2011), pp. 53–65.
[4] R.E. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
[5] W. Bühler and T. Deutler, Optimal stratification and grouping by dynamic programming, Metrika 22 (1975), pp. 161–175.
[6] A. Chakraborti and M. Patriarca, Gamma-distribution and wealth inequality, Pramana – J. Phys. 71(2) (2008), pp. 233–243.
[7] W.G. Cochran, Sampling Techniques, 3rd ed., John Wiley & Sons Inc., New York, 1977.
[8] T. Dalenius, The problem of optimum stratification-II, Skand. Aktuartidskr 33 (1950), pp. 203–213.
[9] T. Dalenius, Sampling in Sweden, Almqvist & Wiksell, Stockholm, 1957.
[10] T. Dalenius and M. Gurney, The problem of optimum stratification, Skand. Akt. 34 (1951), pp. 133–148.
[11] T. Dalenius and J.L. Hodges, Minimum variance stratification, J. Amer. Statist. Assoc. 54 (1959), pp. 88–101.
[12] R.E. Detlefsen and C.S. Veum, Design issues for the retail trade sample surveys of the U.S. Bureau of the Census, Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA, 1991, pp. 214–219.
[13] G. Ekman, Approximate expressions for conditional mean and variance over small intervals of a continuous distribution, Ann. Inst. Stat. Math. 30 (1959), pp. 1131–1134.
[14] P. Gunning and J.M. Horgan, A new algorithm for the construction of stratum boundaries in skewed populations, Surv. Methodol. 30(2) (2004), pp. 159–166.
[15] R.K. Gupta, R. Singh, and P.K. Mahajan, Approximate optimum strata boundaries for ratio and regression estimators, Aligarh J. Stat. 25 (2005), pp. 49–55.
[16] M.H. Hansen and W.N. Hurwitz, On the theory of sampling from finite population, Ann. Math. Statist. 14 (1953), pp. 333–362.
[17] M.A. Hidiroglou and K.P. Srinath, Problems associated with designing subannual business surveys, J. Bus. Econ. Stat. 11 (1993), pp. 397–405.
[18] J.M. Horgan, Stratification of skewed populations: A review, Int. Statist. Rev. 74(1) (2006), pp. 67–76.
[19] M.G.M. Khan, N. Ahmad, and S. Khan, Determining the optimum stratum boundaries using mathematical programming, J. Math. Model. Algorithms 8(4) (2009), pp. 409–423, doi:10.1007/s10852-009-9115-3.
[20] E.A. Khan, M.G.M. Khan, and M.J. Ahsan, Optimum stratification: A mathematical programming approach, Calcutta Statist. Assoc. Bull. 52(special) (2002), pp. 205–208.
[21] M.G.M. Khan, Najmussehar, and M.J. Ahsan, Optimum stratification for exponential study variable under Neyman allocation, J. Indian Soc. Agric. Stat. 59(2) (2005), pp. 146–150.
[22] M.G.M. Khan, N. Nand, and N. Ahmad, Determining the optimum strata boundary points using dynamic programming, Surv. Methodol. 34(2) (2008), pp. 205–214.
[23] M. Kozak, Optimal stratification using random search method in agricultural surveys, Stat. Transition 6(5) (2004), pp. 797–806.
[24] M. Kozak and M.R. Verma, Geometric versus optimisation approach to stratification: A comparison of efficiency, Surv. Methodol. 32(2) (2006), pp. 157–163.
[25] M. Kozak, M.R. Verma, and A. Zieliński, Modern approach to optimum stratification: Review and perspectives, Stat. Transit. – new Ser. 8(2) (2007), pp. 223–248.
[26] P. Lavallée, Some contributions to optimal stratification, Master thesis, Carleton University, Ottawa, Canada, 1987.
[27] P. Lavallée, Two-way optimal stratification using dynamic programming, Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA, 1988, pp. 646–651.
[28] P. Lavallée and M. Hidiroglou, On the stratification of skewed populations, Surv. Methodol. 14 (1988), pp. 33–43.
[29] B. Lednicki and R. Wieczorkowski, Optimal stratification and sample allocation between subpopulations and strata, Stat. Transit. 6 (2003), pp. 287–306.
[30] P.C. Mahalanobis, Some aspects of the design of sample surveys, Sankhya 12 (1952), pp. 1–7.
[31] S.K. Mehta, R. Singh, and L. Kishore, On optimum stratification for allocation proportional to strata totals, J. Indian Statist. Assoc. 34 (1996), pp. 9–19.
[32] N. Nand and M.G.M. Khan, Optimum stratification for cauchy and power type study variables, J. Appl. Statist. Sci. 16(4) (2009), pp. 453–462.
[33] J.A. Nelder and R. Mead, A simplex method for function minimization, Comput. J. 7 (1965), pp. 308–313.
[34] J. Neyman, On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection, J. R. Stat. Soc. 97 (1934), pp. 558–606.
[35] G. Nicolini, A Method to Define Strata Boundaries, Departmental Working Papers 2001-01, Department of Economics, University of Milan, Italy, 2001. Available at www.economia.unimi.it/pubb/wp83.pdf.
[36] W. Niemiro, Konstrukcja optymalnej stratyfikacja metoda poszukiwan losowych (Optimal stratification using random search method), Wiadomosci Statystyczne 10 (1999), pp. 1–9.
[37] L.P. Rivest, A generalization of Lavallée and Hidiroglou algorithm for stratification in business survey, Surv. Methodol. 28 (2002), pp. 191–198.
[38] S.E.H. Rizvi, J.P. Gupta, and M. Bhargava, Optimum stratification based on auxiliary variable for compromise allocation, Metron 28(1) (2002), pp. 201–215.
[39] R.J. Serfling, Approximately optimal stratification, J. Amer. Statist. Assoc. 63 (1968), pp. 1298–1309.
[40] V.K. Sethi, A note on optimum stratification of populations for estimating the population means, Aust. J. Statist. 5 (1963), pp. 20–33.
[41] R. Singh, Approximately optimum stratification on the auxiliary variable, J. Amer. Statist. Assoc. 66 (1971), pp. 829–833.
[42] R. Singh and D. Parkash, Optimum stratification for equal allocation, Ann. Inst. Statist. Math. 27 (1975), pp. 273–280.
[43] R. Singh and B.V. Sukhatme, Optimum stratification, Ann. Inst. Statist. Math. 21(3) (1969), pp. 515–528.
[44] R. Singh and B.V. Sukhatme, Optimum stratification in sampling with varying probabilities, Ann. Inst. Statist. Math. 24 (1972), pp. 485–494.
[45] R. Singh and B.V. Sukhatme, Optimum stratification with ratio and regression methods of estimation, Ann. Inst. Statist. Math. 25 (1973), pp. 627–633.
[46] E.W. Stacy, A generalization of the gamma distribution, Ann. Math. Stat. 33(3) (1962), pp. 1187–1192.
[47] E.M. Sweet and R.S. Sigman, Evaluation of model-assisted procedures for stratifying skewed populations using auxiliary data, Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA, 1995, pp. 491–496.
[48] Y. Taga, On optimum stratification for the objective variable based on concomitant variables using prior information, Ann. Inst. Statist. Math. 19 (1967), pp. 101–129.
[49] H.A. Taha, Operations Research: An Introduction, 8th ed., Pearson Education, Inc., Upper Saddle River, NJ, 2007.
[50] V.K.G. Unnithan, The minimum variance boundary points of stratification, Sankhya 40(C) (1978), pp. 60–72.