Int. J. Agricult. Stat. Sci. Vol. 13, No. 1, pp. 265-271, 2017
ISSN : 0973-1903
ORIGINAL ARTICLE
A MEDIAN BASED REGRESSION TYPE ESTIMATOR OF THE FINITE POPULATION MEAN

S. K. Yadav1, Lakhan Singh2*, S. S. Mishra1, Prem Prakash Mishra3 and Surendra Kumar4

1 Department of Mathematics and Statistics, Dr. RML Avadh University, Faizabad - 224 001, India.
2 Department of Statistics, H. N. B. G. University, Garhwal, Srinagar - 264 174, India.
3 Department of Mathematics, National Institute of Technology, Chumukedima, Nagaland - 797 103, India.
4 Department of Mathematics, Govt. Degree College, Pihani, Hardoi - 241 406, India.
E-mail: [email protected]

Abstract : In the present study, a regression type estimator of the finite population mean of the study variable is proposed using the population median of the study variable. The expressions for the bias and mean squared error of the proposed estimator are derived up to the first order of approximation. The minimum value of the mean squared error is also obtained for the proposed estimator. A theoretical comparison is made with the mean per unit estimator, the usual ratio estimator, the usual regression estimator and the estimators of Bahl and Tuteja (1991), Kadilar (2016) and Subramani (2016). The conditions under which the proposed estimator performs better than the other estimators are given. The theoretical findings for the proposed and the other estimators are examined through a numerical example. It is seen that the proposed estimator performs better than the other existing estimators.

Key words : Bias, Ratio estimator, Mean squared error, Simple random sampling, Efficiency.
*Author for correspondence. Received January 10, 2017; Revised March 28, 2017; Accepted April 17, 2017.

1. Introduction

Sampling is a good alternative to complete enumeration whenever the population is very large and taking observations on every unit is costly and time-consuming. The most appropriate estimator of a population parameter is the corresponding statistic; thus, the most suitable estimator of the population mean is the sample mean. An estimator should ideally have desirable properties such as unbiasedness and minimum variance. Although the sample mean is an unbiased estimator of the population mean, its sampling distribution is not closely concentrated around the true population mean, so it has a reasonably large variance. Our aim is to search for an estimator, possibly biased, whose sampling distribution is very close to the true value of the parameter, that is, one with minimum mean squared error. This problem is usually addressed by using an auxiliary variable that is highly positively or negatively correlated with the study variable. Such auxiliary information is collected at additional survey cost. It would be better and more economical to have information on some parameter of the study variable itself: if estimation is improved by using this information, the improvement comes without increasing the cost of the survey. The additional information on the population median of the study variable, which is readily available in many situations, is utilized in the present manuscript. For example, in surveys involving the estimation of average income or average marks, it is very reasonable to assume that the population mean is unknown whereas the population median is known.

Let the finite population under consideration consist of $N$ distinct and identifiable units and let $(x_i, y_i)$, $i = 1, 2, ..., n$, be a bivariate sample of size $n$ taken from $(X, Y)$ using simple random sampling without replacement (SRSWOR). Let $\bar{X}$ and $\bar{Y}$ respectively be the population means of the auxiliary and the study variables, and let $\bar{x}$ and $\bar{y}$ be the corresponding sample means. It is well established that, in simple random sampling, the sample means $\bar{x}$ and $\bar{y}$ are unbiased estimators of the population means $\bar{X}$ and $\bar{Y}$, respectively.

The population mean is one of the most important measures of central tendency in almost all fields, including Medical sciences, Biological sciences, Agriculture, Industry, Social sciences and Humanities. Thus, the estimation of the population mean is of great significance in these fields. The following four examples, given by Subramani (2016), are of interest for the estimation of the population mean using information on the population median of the study variable.

Example 1.1 : In an Indian University, 5000 students entered for the University examination. The results are given below. The problem is to estimate the average marks scored by the students (population mean). Here, it is reasonable to assume that the median of the marks is known, since we have the following information.

Table 1 : Results of the University Examination.

| Passed with | Percentage of marks | Number of students | Cumulative total |
|---|---|---|---|
| Distinction | 75-100 | 850 | 850 |
| First Class | 60-75 | 3100 | 3950 |
| Second Class | 50-60 | 600 | 4550 |
| Failed | 0-50 | 450 | 5000 |
| Total | | 5000 | 5000 |
The median value will be between 60 and 75. Approximately one can assume the population median value as 67.5.
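The way the median class is read off Table 1 can be sketched in Python as follows. This is only an illustrative sketch: the function name `median_class` and the tuple layout are hypothetical, and taking the midpoint of the median class follows the approximation used in Example 1.1 rather than an interpolated grouped-data median.

```python
def median_class(classes):
    """Locate the class containing the median of grouped data.

    `classes` is a list of (label, lower, upper, frequency) tuples listed
    in monotone order of the class limits (ascending or descending).
    Returns the label of the median class and the midpoint of its limits.
    """
    total = sum(freq for _, _, _, freq in classes)
    if total == 0:
        raise ValueError("frequency table is empty")
    half = total / 2
    cumulative = 0
    for label, lower, upper, freq in classes:
        cumulative += freq
        if cumulative >= half:
            # Midpoint of the class limits, as used in Example 1.1.
            return label, (lower + upper) / 2


# Rows of Table 1: class label, percentage limits, number of students.
table_1 = [
    ("Distinction", 75, 100, 850),
    ("First Class", 60, 75, 3100),
    ("Second Class", 50, 60, 600),
    ("Failed", 0, 50, 450),
]

label, approx_median = median_class(table_1)
print(label, approx_median)  # prints: First Class 67.5
```

The same helper applies to any grouped table of this kind, provided the classes are listed in order and have finite limits.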
Example 1.2 : In an Indian University, 800 faculty members work in different categories, and the basic salaries drawn by the different categories of faculty members are given in Table 2. The problem is to estimate the average salary drawn by the faculty members (population mean) per month. Here, it is reasonable to assume that the median of the salary is known based on the information given in Table 2.

Example 1.3 : In the estimation of the body mass index (BMI) of the 350 patients of a hospital, it is reasonable to assume that the population median of the BMI is known based on the information given in Table 3.

Example 1.4 : In the problem of estimating the blood pressure of the 202 patients of a hospital, it is reasonable to assume that the median of the blood pressure is known based on the information available in Table 4.
2. Review of Estimation of Population Mean

The most suitable estimator of the population mean $\bar{Y}$ is the corresponding sample mean $\bar{y}$ of the study variable $Y$, given by

$$t_0 = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{1}$$

The sample mean is an unbiased estimator of the population mean and its variance, up to the first order of approximation, is

$$V(t_0) = \frac{1-f}{n} S_y^2 = \frac{1-f}{n}\bar{Y}^2 C_y^2 \tag{2}$$
Table 2 : Salary of University faculty members.

| Category | Basic salary in Indian Rupees (IRs) per month* | Number of faculty members | Cumulative total |
|---|---|---|---|
| Senior Professor | 56000+10000** | 20 | 20 |
| Professor - Grade I | 43000+10000 | 40 | 60 |
| Professor - Grade II | 37400+10000 | 60 | 120 |
| Associate Professor - Grade I | 37400+10000 | 80 | 200 |
| Associate Professor - Grade II | 37400+9000 | 100 | 300 |
| Assistant Professor - Grade I | 15100+8000 | 110 | 410 |
| Assistant Professor - Grade II | 15100+7000 | 140 | 550 |
| Assistant Professor - Grade III | 15100+6000 | 250 | 800 |
| Total | | 800 | 800 |

*Actual salary depends on their experience in their designation and other allowances.
**The basic salary is the sum of the basic (the first value) and the academic grade pay (the second value), which will differentiate people with same designation but different grades.

The population median value will be assumed as IRs. 15100+8000 = IRs. 23100.
Table 3 : Body mass index of 350 patients of a hospital.

| Category | BMI range (kg/m2) | Number of patients | Cumulative total |
|---|---|---|---|
| Very severely underweight | less than 15 | 15 | 15 |
| Severely underweight | from 15.0 to 16.0 | 35 | 50 |
| Underweight | from 16.0 to 18.5 | 67 | 117 |
| Normal (healthy weight) | from 18.5 to 25 | 92 | 209 |
| Overweight | from 25 to 30 | 47 | 256 |
| Obese Class I (Moderately obese) | from 30 to 35 | 52 | 308 |
| Obese Class II (Severely obese) | from 35 to 40 | 27 | 335 |
| Obese Class III (Very severely obese) | over 40 | 15 | 350 |
| Total | | 350 | 350 |

The median value will be between 18.5 and 25. Approximately one can assume the population median of the BMI value as 21.75.

Table 4 : Blood pressure of 202 patients of a hospital.

| Category | Systolic, mmHg | Number of patients | Cumulative no. of patients |
|---|---|---|---|
| Hypotension | < 90 | 10 | 10 |
| Desired | 90–119 | 112 | 122 |
| Pre-hypertension | 120–139 | 40 | 162 |
| Stage 1 Hypertension | 140–159 | 20 | 182 |
| Stage 2 Hypertension | 160–179 | 13 | 195 |
| Hypertensive Emergency | ≥ 180 | 7 | 202 |
| Total | | 202 | 202 |

The median value will be between 90 and 119. Approximately one can assume the population median value as 104.5.
where $C_y = \dfrac{S_y}{\bar{Y}}$, $S_y^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$ and $f = \dfrac{n}{N}$.

Cochran (1940) utilized an auxiliary variable positively correlated with the study variable and proposed the traditional ratio estimator as

$$t_1 = \bar{y}\,\frac{\bar{X}}{\bar{x}} \tag{3}$$

The above estimator is a biased estimator of the population mean; its bias and mean squared error, up to the first order of approximation, are respectively given by

$$B(t_1) = \frac{1-f}{n}\bar{Y}\left[C_x^2 - C_{yx}\right]$$

$$MSE(t_1) = \frac{1-f}{n}\bar{Y}^2\left[C_y^2 + C_x^2 - 2C_{yx}\right] \tag{4}$$

where $C_x = \dfrac{S_x}{\bar{X}}$, $S_x^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2$, $\rho_{yx} = \dfrac{Cov(x, y)}{S_x S_y}$, $Cov(x, y) = \dfrac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X})$ and $C_{yx} = \rho_{yx} C_y C_x$.

The traditional linear regression estimator of the population mean is given by

$$t_2 = \bar{y} + \beta_{yx}(\bar{X} - \bar{x}) \tag{5}$$

where $\beta_{yx}$ is the regression coefficient of the line $Y$ on $X$. It is an unbiased estimator of the population mean and its variance, up to the first order of approximation, is given by

$$V(t_2) = \frac{1-f}{n}\bar{Y}^2 C_y^2\left[1 - \rho_{yx}^2\right] \tag{6}$$

Bahl and Tuteja (1991) proposed the following exponential ratio type estimator of the population mean using additional information on the population mean of the auxiliary variable:

$$t_3 = \bar{y}\exp\left[\frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}}\right] \tag{7}$$

The bias and the mean squared error of the above estimator, up to the first order of approximation, are given by

$$B(t_3) = \frac{1-f}{8n}\bar{Y}\left[3C_x^2 - 4C_{yx}\right] \tag{8}$$

$$MSE(t_3) = \frac{1-f}{n}\bar{Y}^2\left[C_y^2 + \frac{C_x^2}{4} - C_{yx}\right] \tag{9}$$

Kadilar (2016) proposed the following modified exponential type estimator of the population mean using the auxiliary variable:

$$t_4 = \bar{y}\left(\frac{\bar{x}}{\bar{X}}\right)^{\alpha}\exp\left[\frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}}\right]$$

where $\alpha$ is the characterizing scalar. The bias and the mean squared error of the above estimator, up to the first order of approximation, respectively are

$$B(t_4) = \frac{1-f}{n}\bar{Y}\left[\left(\frac{\alpha(\alpha - 2)}{2} + \frac{3}{8}\right)C_x^2 + \left(\alpha - \frac{1}{2}\right)C_{yx}\right]$$

$$MSE(t_4) = \frac{1-f}{n}\bar{Y}^2\left[C_y^2 + \frac{(2\alpha - 1)^2}{4}C_x^2 + (2\alpha - 1)C_{yx}\right] \tag{10}$$

The optimum value of the characterizing scalar $\alpha$ is

$$\alpha_{opt} = \frac{1}{2} - \rho_{yx}\frac{C_y}{C_x} \tag{11}$$

The minimum value of the mean squared error of the estimator $t_4$ is

$$MSE_{min}(t_4) = \frac{1-f}{n}\bar{Y}^2 C_y^2\left[1 - \rho_{yx}^2\right] \tag{12}$$

which is equal to the variance of the usual regression estimator.

Subramani (2016) utilized the additional information on the population median of the study variable and proposed the following ratio estimator of the population mean of the study variable:

$$t_5 = \bar{y}\,\frac{M}{m}$$

where $M$ and $m$ denote the population median and the sample median of the study variable. The bias and the mean squared error of the above estimator, up to the first order of approximation, respectively are

$$B(t_5) = \frac{1-f}{n}\bar{Y}\left[C_m^2 - C_{ym} - \frac{Bias(m)}{M}\right]$$

$$MSE(t_5) = \frac{1-f}{n}\bar{Y}^2\left[C_y^2 + R_5^2 C_m^2 - 2R_5 C_{ym}\right] \tag{13}$$

where $R_5 = \dfrac{\bar{Y}}{M}$, $C_m = \dfrac{S_m}{M}$, $S_m^2 = \dfrac{1}{{}^{N}C_n}\sum_{i=1}^{{}^{N}C_n}(m_i - M)^2$, $S_{ym} = \dfrac{1}{{}^{N}C_n}\sum_{i=1}^{{}^{N}C_n}(\bar{y}_i - \bar{Y})(m_i - M)$ and $C_{ym} = \dfrac{S_{ym}}{\bar{Y}M}$.

For a detailed study of the modified ratio type estimators of the population mean of the study variable, reference can be made to Subramani (2013), Subramani and Kumarapandiyan (2012a,b,c, 2013a,b), Tailor and Sharma (2009), Yadav and Pandey (2011), Yadav and Adewara (2013), Yadav et al. (2014, 2015), Yadav et al. (2016a, 2016b, 2016c, 2016d) and Abid et al. (2016).

3. Proposed Estimator

Using the information on the population median of the study variable, we propose the following regression type estimator of the population mean of the study variable:

$$t_p = \bar{y} + \beta_{ym}(M - m) \tag{14}$$

where $\beta_{ym}$ is the regression coefficient of the line $\bar{y}$ on $m$ and is to be estimated such that the variance of $t_p$ is minimum. Here, $t_p$ is an approximately unbiased estimator of the population mean $\bar{Y}$ for known $\beta_{ym}$. Now,

$$V(t_p) = V(\bar{y}) + \beta_{ym}^2 V(m) - 2\beta_{ym} Cov(\bar{y}, m) \tag{15}$$
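The variance in Equation (15) is a quadratic in the regression coefficient, so its behaviour can be checked exactly on a small population by enumerating every SRSWOR sample. The following Python sketch does this under stated assumptions: the ten population values are hypothetical (they are not taken from the paper), the helper names `exact_moments` and `variance_tp` are illustrative, and the coefficient is set to the value Cov(ȳ, m)/V(m) that minimizes the quadratic.

```python
from itertools import combinations
from statistics import mean, median


def exact_moments(population, n):
    """Enumerate every SRSWOR sample of size n and return the exact
    V(ybar), V(m), Cov(ybar, m) and the list of (ybar, m) pairs."""
    pairs = [(mean(s), median(s)) for s in combinations(population, n)]
    k = len(pairs)
    ybar_mean = sum(a for a, _ in pairs) / k
    m_mean = sum(b for _, b in pairs) / k
    v_y = sum((a - ybar_mean) ** 2 for a, _ in pairs) / k
    v_m = sum((b - m_mean) ** 2 for _, b in pairs) / k
    cov = sum((a - ybar_mean) * (b - m_mean) for a, b in pairs) / k
    return v_y, v_m, cov, pairs


def variance_tp(beta, population, n):
    """Exact variance of t_p = ybar + beta * (M - m) over all samples."""
    M = median(population)
    *_, pairs = exact_moments(population, n)
    values = [a + beta * (M - b) for a, b in pairs]
    centre = sum(values) / len(values)
    return sum((t - centre) ** 2 for t in values) / len(values)


# Hypothetical right-skewed population of N = 10 values (not from the paper).
y = [12.0, 15.0, 18.0, 22.0, 25.0, 31.0, 40.0, 47.0, 58.0, 90.0]
n = 3

v_y, v_m, cov, _ = exact_moments(y, n)
beta_opt = cov / v_m               # minimizer of the quadratic in beta
rho_sq = cov ** 2 / (v_y * v_m)    # squared correlation of ybar and m

quadratic = v_y + beta_opt ** 2 * v_m - 2 * beta_opt * cov  # Equation (15)
direct = variance_tp(beta_opt, y, n)                        # by enumeration

print(f"V(ybar)            = {v_y:.4f}")
print(f"V(t_p) at beta_opt = {direct:.4f}")
print(f"V(ybar)(1 - rho^2) = {v_y * (1 - rho_sq):.4f}")
```

Under these assumptions the last two printed values coincide and neither exceeds V(ȳ). The enumeration is feasible only for small N and n, since the number of samples grows combinatorially.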
Table 5 : Parameter values and constants computed from three populations.

| Parameters | Popln-1 (n = 3) | Popln-2 (n = 3) | Popln-3 (n = 3) | Popln-1 (n = 5) | Popln-2 (n = 5) | Popln-3 (n = 5) |
|---|---|---|---|---|---|---|
| $N$ | 34 | 34 | 20 | 34 | 34 | 20 |
| $n$ | 3 | 3 | 3 | 5 | 5 | 5 |
| ${}^{N}C_n$ | 5984 | 5984 | 1140 | 278256 | 278256 | 15504 |
| $\bar{Y}$ | 856.4118 | 856.4118 | 41.5 | 856.4118 | 856.4118 | 41.5 |
| $\bar{M}$ | 747.7223 | 747.7223 | 40.2351 | 736.9811 | 736.9811 | 40.0552 |
| $M$ | 767.5 | 767.5 | 40.5 | 767.5 | 767.5 | 40.5 |
| $\bar{X}$ | 208.8824 | 199.4412 | 441.95 | 208.8824 | 199.4412 | 441.95 |
| $R_1$ | 4.0999 | 4.2941 | 0.0939 | 4.0999 | 4.2941 | 0.0939 |
| $R_5$ | 1.1158 | 1.1158 | 1.0247 | 1.1158 | 1.1158 | 1.0247 |
| $C_y^2$ | 0.222726 | 0.222726 | 0.01575 | 0.125014 | 0.125014 | 0.008338 |
| $C_x^2$ | 0.157785 | 0.172408 | 0.014818 | 0.088563 | 0.096771 | 0.007845 |
| $C_m^2$ | 0.172341 | 0.172341 | 0.015931 | 0.100833 | 0.100833 | 0.006606 |
| $C_{ym}$ | 0.137284 | 0.137284 | 0.012549 | 0.07314 | 0.07314 | 0.005394 |
| $C_{yx}$ | 0.084194 | 0.087264 | 0.009964 | 0.047257 | 0.048981 | 0.005275 |
| $\rho_{yx}$ | 0.4491 | 0.4453 | 0.6522 | 0.4491 | 0.4453 | 0.6522 |
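Some rows of Table 5 can be cross-checked against one another. The Python sketch below recomputes the number of possible samples and the correlation between the study and auxiliary variables from the tabulated constants for n = 3. It also evaluates the correlation between the sample mean and the sample median under the assumption (not stated explicitly in the paper) that it is related to C_ym in the same way as ρ_yx is related to C_yx, and uses it to examine the Section 4 condition comparing the proposed estimator with the usual regression estimator.

```python
from math import comb, sqrt

# Constants for n = 3, copied from Table 5 (Popln-1, Popln-2, Popln-3).
N = [34, 34, 20]
n = 3
reported_samples = [5984, 5984, 1140]
cy2 = [0.222726, 0.222726, 0.01575]
cx2 = [0.157785, 0.172408, 0.014818]
cm2 = [0.172341, 0.172341, 0.015931]
cym = [0.137284, 0.137284, 0.012549]
cyx = [0.084194, 0.087264, 0.009964]
reported_rho_yx = [0.4491, 0.4453, 0.6522]

# Number of SRSWOR samples of size n from a population of size N.
samples = [comb(size, n) for size in N]

# rho_yx recovered from C_yx = rho_yx * C_y * C_x.
rho_yx = [c / sqrt(a * b) for c, a, b in zip(cyx, cy2, cx2)]

# Assumption: rho_ym is related to C_ym the same way rho_yx is to C_yx.
rho_ym = [c / sqrt(a * b) for c, a, b in zip(cym, cy2, cm2)]

for j in range(3):
    better = rho_ym[j] ** 2 > rho_yx[j] ** 2
    print(f"Popln-{j + 1}: samples={samples[j]}, rho_yx={rho_yx[j]:.4f}, "
          f"rho_ym={rho_ym[j]:.4f}, rho_ym^2 > rho_yx^2: {better}")
```

Under these assumptions the recomputed ρ_yx agrees with the tabulated row to four decimals, and the squared correlation with the sample median exceeds the squared correlation with the auxiliary variable in all three populations.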
Table 6 : Bias of the existing and proposed estimators.

| Estimator | Popln-1 (n = 3) | Popln-2 (n = 3) | Popln-3 (n = 3) | Popln-1 (n = 5) | Popln-2 (n = 5) | Popln-3 (n = 5) |
|---|---|---|---|---|---|---|
| $t_1$ | 63.0241 | 72.9186 | 0.2015 | 35.3748 | 40.9285 | 0.1067 |
| $t_3$ | 4.4436 | 5.4714 | 0.0068 | 1.39995 | 1.7238 | 0.0019 |
| $t_5$ | 52.0924 | 52.0924 | 0.4118 | 57.7705 | 57.7705 | 0.5061 |
| $t_p$ | 31.2036 | 31.2036 | 0.3616 | 43.3777 | 43.3777 | 0.6018 |
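As a worked check, the bias and mean squared error reported for the ratio estimator t1 can be recomputed from the Table 5 constants. The Python sketch below rests on one assumption that should be read with care: since the tabulated C values change with the sample size, they are treated here as already containing the sampling factor (1 − f)/n, so that factor is not applied again. The dictionary layout and function names are illustrative only.

```python
# Constants copied from Table 5 (Popln-1 and Popln-3, n = 3).
populations = {
    "Popln-1": {"Y_bar": 856.4118, "cy2": 0.222726,
                "cx2": 0.157785, "cyx": 0.084194},
    "Popln-3": {"Y_bar": 41.5, "cy2": 0.01575,
                "cx2": 0.014818, "cyx": 0.009964},
}


def ratio_bias(p):
    # First-order bias of t1; the sampling factor is assumed to be
    # already contained in the tabulated constants.
    return p["Y_bar"] * (p["cx2"] - p["cyx"])


def ratio_mse(p):
    # First-order mean squared error of t1 under the same assumption.
    return p["Y_bar"] ** 2 * (p["cy2"] + p["cx2"] - 2 * p["cyx"])


for name, p in populations.items():
    print(f"{name}: B(t1) = {ratio_bias(p):.4f}, MSE(t1) = {ratio_mse(p):.2f}")
```

Under this assumption the computed values agree, up to rounding of the tabulated constants, with the t1 entries reported for n = 3 in Table 6 (63.0241 and 0.2015) and Table 7 (155579.70 and 18.32).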
$\beta_{ym}$ is estimated by minimizing Equation (15). So $\dfrac{\partial V(t_p)}{\partial \beta_{ym}} = 0$ gives the estimate of $\beta_{ym}$ as

$$\beta_{ym} = \frac{Cov(\bar{y}, m)}{V(m)} \tag{16}$$

Substituting the value of $\beta_{ym}$ in (15), we get the minimum value of $V(t_p)$ as

$$Min.V(t_p) = V(\bar{y})\left[1 - \rho_{ym}^2\right] \tag{17}$$

where $\rho_{ym} = \dfrac{Cov(\bar{y}, m)}{\sqrt{V(\bar{y})\,V(m)}}$.

4. Efficiency Comparison

A theoretical comparison of the proposed estimator with the other existing estimators mentioned above is made in this Section, and the conditions under which it performs better than the other estimators are given.

From Equations (17) and (2), we have

$V(t_0) - MSE_{min}(t_p) > 0$, if $\rho_{ym}^2 > 0$.

If the above condition is satisfied, the proposed estimator is better than the usual mean per unit estimator of the population mean. From Equations (17) and (4), we have $MSE(t_1) - MSE_{min}(t_p) > 0$, if

$R_1^2 C_x^2 - 2R_1 C_{yx} + \rho_{ym}^2 > 0$, or $R_1^2 C_x^2 + \rho_{ym}^2 > 2R_1 C_{yx}$.

If the above condition is fulfilled, the proposed estimator performs better than the usual ratio estimator of Cochran (1940). From Equations (17) and (6), we have
Table 7 : Variance / Mean squared error of the existing and proposed estimators.

| Estimator | Popln-1 (n = 3) | Popln-2 (n = 3) | Popln-3 (n = 3) | Popln-1 (n = 5) | Popln-2 (n = 5) | Popln-3 (n = 5) |
|---|---|---|---|---|---|---|
| $t_0$ | 163356.40 | 163356.40 | 27.12 | 91690.37 | 91690.37 | 14.36 |
| $t_1$ | 155579.70 | 161801.63 | 18.32 | 87325.38 | 90817.69 | 9.70 |
| $t_2$ | 39633.17 | 39801.98 | 4.41 | 12486.31 | 12539.49 | 1.23 |
| $t_3$ | 39672.88 | 39803.44 | 4.63 | 12498.81 | 12539.95 | 1.29 |
| $t_4$ | 39633.17 | 39801.98 | 4.41 | 12486.31 | 12539.49 | 1.23 |
| $t_5$ | 88379.06 | 88379.06 | 11.33 | 58356.92 | 58356.92 | 7.15 |
| $t_p$ | 25270.68 | 25270.68 | 2.86 | 9003.44 | 9003.44 | 1.02 |
Table 8 : PRE of the proposed estimator tp with respect to existing estimators.

| Estimator | Popln-1 (n = 3) | Popln-2 (n = 3) | Popln-3 (n = 3) | Popln-1 (n = 5) | Popln-2 (n = 5) | Popln-3 (n = 5) |
|---|---|---|---|---|---|---|
| $t_0$ | 646.4266 | 646.4266 | 948.2517 | 1018.3930 | 1018.3930 | 1407.8430 |
| $t_1$ | 615.6530 | 640.2741 | 640.5594 | 969.9113 | 1008.7000 | 950.9804 |
| $t_2$ | 156.8346 | 157.5026 | 154.1958 | 138.6838 | 139.2744 | 120.5882 |
| $t_3$ | 156.9917 | 157.5084 | 161.8881 | 138.8226 | 139.2795 | 126.4706 |
| $t_4$ | 156.8346 | 157.5026 | 154.1958 | 138.6838 | 139.2744 | 120.5882 |
| $t_5$ | 349.7296 | 349.7296 | 396.1538 | 648.1625 | 648.1625 | 700.9804 |
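The PRE entries can be recomputed from the variances and mean squared errors reported in Table 7. The Python sketch below assumes the usual definition PRE = 100 × MSE(existing estimator) / MSE(t_p), which the paper does not spell out; the dictionary and function names are illustrative.

```python
# Variance / MSE values copied from Table 7, in the column order
# Popln-1, Popln-2, Popln-3 for n = 3, then the same for n = 5.
mse = {
    "t0": [163356.40, 163356.40, 27.12, 91690.37, 91690.37, 14.36],
    "t1": [155579.70, 161801.63, 18.32, 87325.38, 90817.69, 9.70],
    "t2": [39633.17, 39801.98, 4.41, 12486.31, 12539.49, 1.23],
    "t3": [39672.88, 39803.44, 4.63, 12498.81, 12539.95, 1.29],
    "t4": [39633.17, 39801.98, 4.41, 12486.31, 12539.49, 1.23],
    "t5": [88379.06, 88379.06, 11.33, 58356.92, 58356.92, 7.15],
    "tp": [25270.68, 25270.68, 2.86, 9003.44, 9003.44, 1.02],
}


def pre(existing, proposed):
    """Percentage relative efficiency of the proposed estimator."""
    return 100.0 * existing / proposed


pre_table = {
    name: [round(pre(e, p), 4) for e, p in zip(values, mse["tp"])]
    for name, values in mse.items()
    if name != "tp"
}

for name, row in pre_table.items():
    print(name, row)
```

A PRE above 100 in a given column indicates that the reported mean squared error of t_p is smaller than that of the estimator in that row.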
$MSE(t_2) - MSE_{min}(t_p) > 0$, if

$\rho_{ym}^2 - \rho_{yx}^2 > 0$.

The proposed estimator is better than the usual regression estimator, if the above condition is satisfied. From Equations (17) and (9), we have $MSE(t_3) - MSE_{min}(t_p) > 0$, if

$\dfrac{C_x^2}{4} - C_{yx} + \rho_{ym}^2 > 0$, or $\dfrac{C_x^2}{4} + \rho_{ym}^2 > C_{yx}$.

If the above condition is satisfied, the proposed estimator performs better than the Bahl and Tuteja (1991) estimator of the population mean. From Equations (17) and (12), we have

$MSE(t_4) - MSE_{min}(t_p) > 0$, if

$\rho_{ym}^2 - \rho_{yx}^2 > 0$.

The proposed estimator is better than the Kadilar (2016) estimator, if the above condition is satisfied. From Equations (17) and (13), we have

$MSE(t_5) - MSE_{min}(t_p) > 0$, if

$R_5^2 C_m^2 - 2R_5 C_{ym} + \rho_{ym}^2 > 0$, or $R_5^2 C_m^2 + \rho_{ym}^2 > 2R_5 C_{ym}$.

If the above condition is satisfied, the proposed estimator is better than the Subramani (2016) estimator of the population mean using information on the median of the study variable.

5. Numerical Study

To judge the performances of the proposed and the existing estimators, we have considered the populations given in Subramani (2016). Tables 5, 6, 7 and 8 present, respectively, the parameter values along with constants, the biases of the existing and proposed estimators, the variances and mean squared errors of the existing and proposed estimators, and the percentage relative efficiencies (PRE) of the proposed estimator over the existing estimators.

6. Results and Discussion

The present paper deals with the estimation of the population mean using information on the population median of the study variable. A regression type estimator of the population mean of the study variable using the population median of the study variable is proposed. The expressions for the bias and mean squared error of the proposed estimator have been obtained up to the first order of approximation. The minimum value of the mean squared error of the proposed estimator is also obtained. The proposed estimator is compared with the existing estimators, and the conditions under which it performs better than the other existing estimators have been given. A numerical study is also carried out to judge the performances of the various estimators. From Table 7, it is seen that the proposed estimator has the minimum mean squared error among the estimators of the population mean considered. The proposed estimator is better than the mean per unit estimator, the ratio estimator of Cochran (1940), the Bahl and Tuteja (1991) estimator, the usual regression estimator and the estimators of Kadilar (2016) and Subramani (2016). The proposed estimator may be used for improved estimation of the population mean under the simple random sampling scheme.
Acknowledgements

The authors are very thankful to the anonymous referees for critically examining the manuscript and giving suggestions which improved the earlier draft.
References

Abid, M., N. Abbas, R. A. K. Sherwani and H. Z. Nazir (2016). Improved Ratio Estimators for the population mean using non-conventional measure of dispersion. Pakistan Journal of Statistics and Operations Research, XII(2), 353-367.

Bahl, S. and R. K. Tuteja (1991). Ratio and product type exponential estimator. Information and Optimization Sciences, XII(I), 159-163.

Cochran, W. G. (1940). The Estimation of the Yields of the Cereal Experiments by Sampling for the Ratio of Grain to Total Produce. The Journal of Agric. Science, 30, 262-275.

Kadilar, G. O. (2016). A new exponential type estimator for the population mean in simple random sampling. Journal of Modern Applied Statistical Methods, 15(2), 207-214.

Subramani, J. (2013). Generalized modified ratio estimator of finite population mean. Journal of Modern Applied Statistical Methods, 12(2), 121–155.

Subramani, J. and G. Kumarapandiyan (2012a). Estimation of population mean using coefficient of variation and median of an auxiliary variable. International Journal of Probability and Statistics, 1(4), 111–118.

Subramani, J. and G. Kumarapandiyan (2012b). Modified ratio estimators using known median and coefficient of kurtosis. American Journal of Mathematics and Statistics, 2(4), 95–100.

Subramani, J. and G. Kumarapandiyan (2012c). Estimation of population mean using known median and coefficient of skewness. American Journal of Mathematics and Statistics, 2(5), 101–107.

Subramani, J. and G. Kumarapandiyan (2013a). Estimation of population mean using deciles of an auxiliary variable. Statistics in Transition-New Series, 14(1), 75–88.

Subramani, J. and G. Kumarapandiyan (2013b). A new modified ratio estimator of population mean when median of the auxiliary variable is known. Pakistan Journal of Statistics and Operation Research, 9(2), 137–145.

Subramani, J. (2016). A new median based ratio estimator for estimation of the finite population mean. Statistics in Transition New Series, 17, 4, 1-14.

Tailor, R. and B. Sharma (2009). A modified ratio-cum-product estimator of finite population mean using known coefficient of variation and coefficient of kurtosis. Statistics in Transition-New Series, 10(1), 15–24.

Yadav, S. K. and H. Pandey (2011). Improved Exponential Estimators of Population Mean Using Qualitative Auxiliary Information under Two Phase Sampling. Investigations in Mathematical Sciences, 1, 85-94.

Yadav, S. K. and A. A. Adewara (2013). On Improved Estimation of Population Mean using Qualitative Auxiliary Information. Mathematical Theory and Modeling, 3, 11, 42-50.

Yadav, S. K., S. S. Mishra and A. K. Shukla (2014). Improved Ratio Estimators for Population Mean Based on Median Using Linear Combination of Population Mean and Median of an Auxiliary Variable. American Journal of Operational Research, 4, 2, 21-27.

Yadav, S. K., S. S. Mishra and A. K. Shukla (2015). Estimation Approach to Ratio of Two Inventory Population Means in Stratified Random Sampling. American Journal of Operational Research, 5, 4, 96-101.

Yadav, S. K., S. S. Mishra, A. K. Shukla, S. Kumar and R. S. Singh (2016a). Use of Non-Conventional Measures of Dispersion for Improved Estimation of Population Mean. American Journal of Operational Research, 6, 3, 69-75.

Yadav, S. K., S. A. T. Gupta, S. S. Mishra and A. K. Shukla (2016b). Modified Ratio and Product Estimators for Estimating Population Mean in Two-Phase Sampling. American Journal of Operational Research, 6, 3, 61-68.

Yadav, S. K., J. Subramani, S. S. Mishra and A. K. Shukla (2016c). Improved Ratio-Cum-Product Estimators of Population Mean Using Known Population Parameters of Auxiliary Variables. American Journal of Operational Research, 6, 2, 48-54.

Yadav, S. K., S. Misra, S. S. Mishra and N. Chutiman (2016d). Improved ratio estimators of population mean in Adaptive Cluster Sampling. Journal of Statistics Applications and Probability Letter, 3, 1, 1-6.