IJMS, Vol. 11, No. 3-4, (July-December 2012), pp. 245-254
© Serials Publications ISSN: 0972-754X
A BAYESIAN APPROACH IN STRUCTURAL EQUATION MODELING FOR HUMAN FERTILITY IN MANIPUR
Dilip C. Nath, Atanu Bhattacharjee & H. Brojeshwor Singh
Abstract: Structural Equation Modeling (SEM) is useful for dealing with unobserved heterogeneity in data. This paper applies SEM through a Bayesian approach to account for unobserved heterogeneity in the fertility level of women residing in Manipur. A high educational level is found to be associated with lower fertility than a low educational level, and a low income level is found to be a contributing factor for high fertility. The findings may serve as baseline information for future policy.
Keywords: MCMC, Heterogeneity.
1. INTRODUCTION
Son preference in India is well documented and is linked to the country's declining sex ratio. Less well researched are the effects of women's education and family status on the preferred sex of the next child. Bardhan (1982), Das Gupta (1987) and Nair (1996) concluded that couples favour sons over daughters in matters of nutrition, health care and property; this practice may contribute to the declining sex ratio. The desired number of children, and particularly the desired sex of the next child, can be influenced by the family background of the couple. It is difficult to capture directly the effect of “family status” and “family attitude” on the number of children desired by parents, so both are treated as latent variables. Socio-demographic research usually involves two types of variables, observed and latent. A latent variable can only be measured through observed variables, so it is important to establish the relationship between the two. Structural Equation Modeling (SEM) serves this purpose (Lee, 2007). Different tools are available for fitting SEMs in specific situations, for instance the Fletcher-Powell (FP) algorithm used by Jöreskog and Yang (1996), the LISREL programme of Jöreskog and Sörbom (1996) and the EQS6 programme of Bentler and Wu (2002). Little work has been done so far on the desired number of children among women in Manipur, despite its direct links with socio-economic condition, education and family status.
1st International Conference on Mathematics and Mathematical Sciences (ICMMS), 7 July 2012.
This may be due to a lack of suitable data or a lack of interest among researchers. The aim of this paper is to investigate the effect of mother’s education, family status and family attitude on the desired number of children. These effects are studied across different districts of the state using retrospectively reported data. SEM has been applied through a Bayesian approach to identify the effects on the desired sex and number of children. Summary measures such as the mean and median have been calculated. The work explores the determinants of the desired number of children through latent variables in data on 1296 ever-married mothers in Manipur. Different models have been fitted, and the sampled values of the parameters of interest have been compared with those from the frequentist approach.
2. THE DATA COLLECTION METHODS
For this study, data have been taken from the survey entitled “Effect of sex preference on fertility differential: A case study of rural Manipur valley”, conducted from April 2009 to December 2009. Natural fertility is assumed to exist in the study population. The study subjects are eligible married women in their reproductive span. A pre-tested, semi-structured interview schedule was used to collect the required information from the eligible women. As no sampling frame (list of eligible women) was available, cluster sampling was adopted. Data were collected by personal interview. First, the respondents were identified and convinced of the need for and importance of the survey, so that they understood the significance of their co-operation in producing valid information that would help policy makers and programme executors plan future health services effectively for the state. After a good rapport had been established, the factual information was collected from the respondents.
3. BAYESIAN APPROACH
Suppose the researcher has formulated conclusions based on a well-established SEM. The researcher, however, suspects that an unobservable moderating factor accounts for heterogeneity. Jöreskog (1971) and Sörbom (1974) preferred to use prior knowledge of the sub-groups. In the absence of such prior knowledge, the general tendency is to use cluster analysis. In addition, measurement error is difficult to handle in SEM. The usual route to the best-fitting model is through data reduction techniques, which are not robust. For this problem, Jedidi et al. (1997) discussed a methodology for detecting and treating unobserved heterogeneity in SEM. The Bayesian approach, implemented through Markov chain Monte Carlo (MCMC), offers several advantages over conventional maximum likelihood estimation (MLE). Robert and
Casella (1999) have explained how to obtain samples of the parameters through MCMC. This paper examines fertility among women in Manipur using primary data on ever-married women. To obtain the relation between the latent and response variables, SEM has been applied through the Bayesian approach to cross-sectional data. The principle of model comparison is not to determine an ‘accurate’ model but to infer which model, among a set of reasonable choices, is most ‘useful’, i.e., represents an optimal balance between accuracy and complexity. In other words, Bayesian model inference has nothing to say about ‘accurate’ models; all it grants is an inference about which model is more probable given a data set. The Bayesian approach gives results consistent with the frequentist approach (Wong et al., 1985 and West et al., 1985). In the Bayesian approach, inference is obtained through MCMC, unlike in the frequentist approach.
4. MODEL ASSUMPTION
The desired sex and number of children have been examined as functions of the parents’ education level. Here, simple regression models have been fitted with the Bayesian approach. The models are:

$$\text{Model 1: Desired number of sons by wife} = \beta_0 + \beta_1 \cdot \text{Educational qualification of wife} \qquad (1)$$

$$\text{Model 2: Desired number of sons by husband} = \beta_0 + \beta_1 \cdot \text{Educational qualification of husband} \qquad (2)$$

$$\text{Model 3: Desired number of daughters by wife} = \beta_0 + \beta_1 \cdot \text{Educational qualification of wife} \qquad (3)$$

$$\text{Model 4: Desired number of daughters by husband} = \beta_0 + \beta_1 \cdot \text{Educational qualification of husband} \qquad (4)$$

The SEM approach has been further applied to the latent variable by

$$F_{S_i,i} = \gamma_1 S_{i,1} H_{1i} + \gamma_2 S_{i,2} H_{2i} + e_{S_i,i}, \quad \text{where } e_{ki} \sim N(0, 0.5). \qquad (5)$$
The unobserved affect $F$ (number of children) is assumed to depend on two latent variables, “Family Status” and “Family Attitude”. Both latent variables are measured through the observed independent variable “Education Qualification”, which is separated into “Education Qualification of Husband” and “Education Qualification of Wife”. There are $k = 2$ unobserved groups for the effects of “family status” and “family attitude”, of equal size $\pi_k = 0.5$, the first
being “having living children” (with latent group indicator $S_i = 1$) and the second “desire no more child” ($S_i = 2$). “Family status” and “family attitude” have a positive impact on the affect in the first group but a negative impact in the second group. Family status is coded “1” for nuclear family and “2” for joint family. Family behaviour is coded “1” for higher income family, “2” for middle income family and “3” for lower income family. $H_{1i}$ and $H_{2i}$ represent “having living children” and “desire no more child”, as representatives of the latent variables “family status” and “family attitude” respectively. The model in equation (5) has been extended to

$$x_{1i} = \alpha_{x1} + \lambda_{x11} H_{1i} + \lambda_{x12} H_{2i} + w_{1i}, \qquad (6)$$

$$x_{2i} = \alpha_{x2} + \lambda_{x21} H_{1i} + \lambda_{x22} H_{2i} + w_{2i}, \qquad (7)$$

$$x_{3i} = \alpha_{x3} + \lambda_{x31} H_{1i} + \lambda_{x32} H_{2i} + w_{3i}, \qquad (8)$$

$$x_{4i} = \alpha_{x4} + \lambda_{x41} H_{1i} + \lambda_{x42} H_{2i} + w_{4i}. \qquad (9)$$

It is assumed that the error terms satisfy $w_{pi} \sim N(0, 0.5)$, that the intercepts are $\alpha_{x1} = \alpha_{x2} = \alpha_{x3} = \alpha_{x4} = 0$, and that $\lambda_{x11} = \lambda_{x21} = \lambda_{x31} = \lambda_{x41} = 1$. The terms $(H_{1i}, H_{2i})$ are assumed to follow a normal distribution with mean zero. The response variables $Y_{1i}$ (desired number of sons) and $Y_{2i}$ (desired number of daughters) are obtained by,
$$Y_{1i} = \alpha_{y1} + \lambda_{y1} F_i + u_{1i}, \qquad (10)$$

$$Y_{2i} = \alpha_{y2} + \lambda_{y2} F_i + u_{2i}, \qquad (11)$$

where

$$\alpha_{y1} = \alpha_{y2} = 0; \quad \lambda_{y1} = \lambda_{y2} = 1; \quad \text{and} \quad u_{pi} \sim N(0, 0.5). \qquad (12)$$

The model is identified by setting $F_i = \gamma_1 H_{1i} + \gamma_2 H_{2i} + e_i$ with $\mathrm{var}(e_i) = 1$. The coefficients $\gamma$ are assumed to follow an $N(0, 1)$ prior distribution.
4.1 Analyzing Data from a Cross Sectional Study
The response of interest is the number of children of a woman. A higher number of children may be the result of low education and low literacy of the mother and her family. A woman with good education and moderate literacy may be expected to have at most two children. The models are given by

$$F_{S_i,i} = (\text{Coefficient}) \cdot S_{i,1} H_{1i} + (\text{Coefficient}) \cdot S_{i,2} H_{2i} + e_{S_i,i}, \qquad (13)$$

$$x_{1i} = (\text{Intercept} = \alpha_{x.1}) + (\text{Coefficient} = \lambda_{1,1}) \cdot H_{1i} + (\text{Coefficient} = \lambda_{1,2}) \cdot H_{2i} + w_{1i}. \qquad (14)$$
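To make the generative structure of equations (5) to (12) concrete, the following is a minimal simulation sketch, not the authors' code. Several choices are assumptions made only for illustration: $S_{i,k}$ is read as an indicator that subject $i$ belongs to group $k$, the second argument of $N(0, 0.5)$ is treated as a variance, the latent variables are given unit variance, and the unspecified loadings on $H_{2i}$ are set to 1. The function name `simulate_sem` and its default values are hypothetical.

```python
import numpy as np

def simulate_sem(n=1296, gamma=(0.06, -0.05), lam_x=((1.0, 1.0),) * 4,
                 lam_y=(1.0, 1.0), err_var=0.5, seed=0):
    """Draw one synthetic data set following equations (5)-(12).

    Assumptions (not stated in the paper): S_{i,k} is a 0/1 group
    indicator, 0.5 is a variance, and H has unit variance.
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(err_var)
    # Latent variables H1 ("family status") and H2 ("family attitude"), mean zero
    H = rng.normal(0.0, 1.0, size=(n, 2))
    # Latent group S_i in {1, 2} with equal sizes pi_k = 0.5
    S = rng.integers(1, 3, size=n)
    S1 = (S == 1).astype(float)
    S2 = (S == 2).astype(float)
    # Structural equation (5)
    F = gamma[0] * S1 * H[:, 0] + gamma[1] * S2 * H[:, 1] + rng.normal(0.0, sd, n)
    # Measurement equations (6)-(9), intercepts fixed at 0
    lam_x = np.asarray(lam_x)                      # shape (4, 2)
    X = H @ lam_x.T + rng.normal(0.0, sd, (n, 4))
    # Response equations (10)-(11), intercepts fixed at 0
    lam_y = np.asarray(lam_y)                      # shape (2,)
    Y = F[:, None] * lam_y[None, :] + rng.normal(0.0, sd, (n, 2))
    return {"S": S, "H": H, "F": F, "X": X, "Y": Y}

data = simulate_sem()
print(data["X"].shape, data["Y"].shape)
```

Simulating from the assumed model in this way is a common check before fitting: a sampler that cannot recover known parameter values from synthetic data is unlikely to be trustworthy on the survey data.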
The posterior means, standard deviations and 95% highest posterior density (HPD) interval estimates generated from the model are given in Table 2.

Table 1: Descriptive Statistics of Education Level, Age at Marriage and Desired Number of Sons and Daughters among Manipur Women. Entries are counts, with percentages in parentheses.

| Education | Husband | Wife |
|---|---|---|
| Illiterate | 37 (2.9) | 226 (17.4) |
| Literate but below matric | 152 (11.7) | 239 (18.4) |
| 10th | 359 (27.7) | 352 (27.2) |
| Intermediate | 290 (22.4) | 253 (19.5) |
| Graduate | 395 (30.5) | 199 (15.4) |
| Post Graduate & above | 63 (4.9) | 27 (2.1) |

| Age at marriage | Husband | Wife |
|---|---|---|
| Below 20 | 55 (4.2) | 341 (26.3) |
| 20-25 | 362 (27.9) | 475 (36.7) |
| 25-30 | 461 (35.6) | 330 (25.5) |
| 30-35 | 289 (22.3) | 110 (8.5) |
| 35 & above | 129 (10.0) | 40 (3.1) |

| Desired number of sons | Husband | Wife |
|---|---|---|
| 1 | 322 (24.8) | 322 (24.8) |
| 2 | 625 (48.2) | 640 (49.4) |
| 3 | 277 (21.4) | 271 (20.9) |
| 4 | 61 (4.7) | 53 (4.1) |
| 5 | 10 (0.8) | 9 (0.7) |
| 6 | 1 (0.1) | 1 (0.1) |

| Desired number of daughters | Husband | Wife |
|---|---|---|
| 1 | 627 (48.4) | 639 (49.3) |
| 2 | 545 (42.1) | 544 (42.0) |
| 3 | 101 (7.8) | 90 (6.9) |
| 4 | 15 (1.2) | 15 (1.2) |
| 5 | 6 (0.5) | 6 (0.5) |
| 6 | 2 (0.2) | 2 (0.2) |
5. DESCRIPTIVE STATISTICS
Table 1 shows the distribution of study subjects according to education, age at marriage, desired number of sons and desired number of daughters. The largest group of wives, 475 (36.7%), married at 20-25 years of age, while the largest group of husbands, 461 (35.6%), married at 25-30 years. Education up to graduate level was found for 395 (30.5%) of the husbands and 199 (15.4%) of the wives. The most common educational level among wives was the 10th standard, 352 (27.2%); among husbands from the same community, 359 (27.7%) were educated up to the 10th standard. Most husbands, 625 (48.2%), desire two sons, followed by 322 (24.8%) who desire one son, while 627 (48.4%) desire only one daughter, followed by 545 (42.1%) who desire two daughters. The same pattern holds for wives: the largest groups, 640 (49.4%) and 639 (49.3%), desire two sons and one daughter respectively. Both husbands and wives thus desire more sons than daughters.
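The comparison of sons and daughters above can be summarized in a single number per group. The short sketch below computes the mean desired number of sons and of daughters from the Table 1 frequencies; the counts are taken from the table, while the helper name `mean_desired` is only illustrative.

```python
# Frequencies from Table 1 for desired numbers 1, 2, ..., 6
counts = {
    ("husband", "sons"): [322, 625, 277, 61, 10, 1],
    ("wife", "sons"): [322, 640, 271, 53, 9, 1],
    ("husband", "daughters"): [627, 545, 101, 15, 6, 2],
    ("wife", "daughters"): [639, 544, 90, 15, 6, 2],
}

def mean_desired(freqs):
    """Mean desired number of children for counts at values 1..len(freqs)."""
    total = sum(freqs)
    return sum(k * f for k, f in enumerate(freqs, start=1)) / total

means = {key: mean_desired(f) for key, f in counts.items()}
for (who, sex), m in means.items():
    n = sum(counts[(who, sex)])
    print(f"{who:8s} {sex:10s} n={n}  mean desired number={m:.2f}")
```

Each group of counts sums to the sample size of 1296, and the mean desired number of sons exceeds the mean desired number of daughters for both husbands and wives, in line with the statement in the text.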
6. MODEL SELECTION
Two independent chains of 20,000 iterations each were run, with a burn-in period of 5000 iterations to allow the normal proposal distribution to finish adapting. The chains appeared to converge well before the end of the burn-in period. The posterior estimates of the regression parameters (from a two-chain run of 5000 iterations with 1000 burn-in) are not the same. With the estimated coefficients, the model in equation (5) becomes

$$F_{S_i,i} = 0.06 \, S_{i,1} H_{1i} - 0.05 \, S_{i,2} H_{2i} + e_{S_i,i}, \qquad (15)$$

$$x_{1i} = -0.05 + 1.02 \, H_{1i} + 1.12 \, H_{2i} + w_{1i}. \qquad (16)$$
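The two-chain scheme described at the start of this section (20,000 iterations per chain, the first 5000 discarded) can be illustrated on a toy problem. The sketch below is not the WinBUGS run used in the paper: it applies a random-walk Metropolis sampler with a fixed normal proposal (no adaptive phase) to the mean of synthetic normal data, and checks convergence with the Gelman-Rubin statistic. The target model, the step size and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_post(theta, y, prior_sd=10.0, sigma=1.0):
    """Log posterior (up to a constant): N(theta, sigma^2) data, N(0, prior_sd^2) prior."""
    return -0.5 * np.sum((y - theta) ** 2) / sigma**2 - 0.5 * theta**2 / prior_sd**2

def metropolis(y, n_iter, start, step, rng):
    """Random-walk Metropolis with a fixed normal proposal."""
    chain = np.empty(n_iter)
    theta, lp = start, log_post(start, y)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        lp_prop = log_post(prop, y)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m chains, n draws) array."""
    chains = np.asarray(chains)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=200)               # synthetic data, not the survey
n_iter, burn_in = 20000, 5000
# Two chains started from dispersed values
chains = np.array([metropolis(y, n_iter, s, 0.2, rng) for s in (-5.0, 5.0)])
kept = chains[:, burn_in:]                       # discard the burn-in draws
r_hat = gelman_rubin(kept)
print(f"R-hat = {r_hat:.3f}, posterior mean = {kept.mean():.3f}")
```

A value of the statistic close to 1 suggests that the two chains are sampling the same distribution; starting the chains from dispersed values makes the check more informative.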
In the model, the intercepts $\alpha_{x.1}$ and $\alpha_{x.2}$ have posterior means (HPD intervals) of -0.05 (-0.16, 0.02) and -0.03 (-0.10, 0.07) respectively. The posterior mean of the coefficient $\rho$ is 0.34 with 95% HPD interval (-0.56, 0.38). Across the two chains, the posterior means (95% HPD intervals) of $\lambda_{1,1}$ and $\lambda_{1,2}$ are 1.02 (0.95, 1.17) and 1.12 (0.89, 1.16), and that of $w_{pi}$ is 0.03 (0.02, 0.04). The posterior summaries are given in Table 2. The parameters $\alpha_{y.1}$ and $\alpha_{y.2}$ of the response equations have been estimated at 0.08 and 0.13 respectively. For $\gamma_1$ and $\gamma_2$, the posterior means are 0.06 (-0.05, 0.17) and -0.05 (-0.18, 0.05) respectively. The posterior means (SD) of $\lambda_{1,1}$ and $\lambda_{1,2}$ are 1.02 (0.07) and 1.12 (0.09) respectively. It can be concluded that the latent variables family status and family attitude are positively associated with a smaller number of children in the population. The posterior means in this regard are found to be -1.05 (0.02) and -0.85 (0.09) respectively. The estimated posterior means of $\lambda_{1,1}$ and $\lambda_{1,2}$ indicate that a higher number of children is desired by joint families than by nuclear families, and by lower income families than by higher income families.

Table 2: The Posterior Mean of the Parameters in the Bayesian Approach

| Parameter | Mean (SD) | (2.5%, 97.5%) |
|---|---|---|
| $\alpha_{x.1}$ | -0.05 (0.05) | (-0.16, 0.02) |
| $\alpha_{x.2}$ | -0.03 (0.06) | (-0.10, 0.07) |
| $\alpha_{x.3}$ | 0.05 (0.07) | (-0.10, 0.11) |
| $\alpha_{x.4}$ | -0.02 (0.06) | (-0.13, 0.08) |
| $\alpha_{y.1}$ | 0.08 (0.04) | (0.08, 0.13) |
| $\alpha_{y.2}$ | 0.13 (0.05) | (0.04, 0.18) |
| $\gamma_1$ | 0.06 (0.02) | (-0.05, 0.17) |
| $\lambda_{1,1}$ | 1.02 (0.07) | (0.95, 1.17) |
| $\lambda_{1,2}$ | 1.12 (0.04) | (0.89, 1.16) |
| $\lambda_{y,1}$ | 1.05 (0.02) | (1.22, 0.95) |
| $\lambda_{y,2}$ | -0.85 (0.09) | (-1.07, -0.83) |
| $\rho$ | -0.34 (0.04) | (-0.56, -0.38) |
| $w_{pi}$ | 0.03 (0.01) | (0.02, 0.04) |
| $\gamma_2$ | -0.05 (0.03) | (-0.18, 0.05) |
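Summaries of the kind reported in Table 2 (posterior mean, SD, 2.5% and 97.5% quantiles, and an HPD interval) are all computed from retained MCMC draws. The sketch below shows one common way to do this; the draws are stand-in normal samples generated only for illustration, not the paper's MCMC output, and the function names `hpd_interval` and `summarize` are hypothetical.

```python
import numpy as np

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing a fraction `mass` of the sorted draws."""
    x = np.sort(np.asarray(draws))
    n = len(x)
    k = int(np.ceil(mass * n))                 # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]        # width of every window of k draws
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def summarize(draws):
    """Posterior mean, SD, equal-tailed 95% limits and 95% HPD interval."""
    draws = np.asarray(draws)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return {"mean": draws.mean(), "sd": draws.std(ddof=1),
            "q2.5": lo, "q97.5": hi, "hpd95": hpd_interval(draws)}

rng = np.random.default_rng(0)
# Stand-in draws centred on 1.02 with SD 0.07 (illustrative values only)
draws = rng.normal(1.02, 0.07, size=10000)
summary = summarize(draws)
print(summary)
```

For a symmetric, unimodal posterior the equal-tailed and HPD intervals nearly coincide; they differ for skewed posteriors, where the HPD interval is the shorter of the two.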
An SEM has been applied to model the variables under consideration. Four simultaneous models have been fitted to the couples’ desired number of sons and that of
daughters. All variables of interest have been measured in nominal format. Estimates have been obtained as posterior means. The complete cases have been analysed with WinBUGS 2.13. The credible interval (97.5%, 2.5%) has been computed for each model.

Table 3: Structural Equation Model Fitted by Bayesian Approach in Manipur Women’s Data

| Model | Parameter | Mean (SD) | (97.5%, 2.5%) |
|---|---|---|---|
| Model 1 | $\beta_0$ | 2.58 (0.20) | (3.00, 2.19) |
| Model 1 | $\beta_1$ | -0.47 (0.08) | (-0.31, -0.64) |
| Model 2 | $\beta_0$ | 3.76 (0.41) | (4.61, 2.96) |
| Model 2 | $\beta_1$ | -0.11 (0.17) | (0.24, -0.46) |
| Model 3 | $\beta_0$ | 2.46 (0.18) | (2.84, 2.12) |
| Model 3 | $\beta_1$ | -0.53 (0.09) | (-0.36, -0.73) |
| Model 4 | $\beta_0$ | 4.03 (0.38) | (4.81, 3.30) |
| Model 4 | $\beta_1$ | -0.29 (0.19) | (0.08, -0.68) |
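Estimates such as those in Table 3 come from fitting a simple regression of the form $y = \beta_0 + \beta_1 x + e$ by Gibbs sampling. The sketch below shows one way this could be done outside WinBUGS, together with the deviance information criterion (DIC) that the paper uses for model comparison. It is an illustration under stated assumptions, not the authors' implementation: the priors ($N(0, 100)$ on each coefficient and an Inverse-Gamma(0.01, 0.01) prior on the error variance), the education codes 1 to 6, and the synthetic data are all hypothetical.

```python
import numpy as np

def gibbs_regression(x, y, n_iter=5000, burn_in=1000, tau2=100.0,
                     a0=0.01, b0=0.01, seed=0):
    """Gibbs sampler for y = beta0 + beta1 * x + e, e ~ N(0, sigma2)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(x)), x])
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = 1.0
    betas = np.empty((n_iter, p))
    sigma2s = np.empty(n_iter)
    for t in range(n_iter):
        # beta | sigma2, y ~ N(m, V) under the assumed N(0, tau2 * I) prior
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        m = V @ Xty / sigma2
        beta = m + np.linalg.cholesky(V) @ rng.standard_normal(p)
        # sigma2 | beta, y ~ Inverse-Gamma(a0 + n/2, b0 + SSR/2)
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
        betas[t], sigma2s[t] = beta, sigma2
    return betas[burn_in:], sigma2s[burn_in:]

def deviance(beta, sigma2, x, y):
    """Minus twice the normal log-likelihood."""
    resid = y - beta[0] - beta[1] * x
    return len(y) * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2

def dic(betas, sigma2s, x, y):
    """DIC = mean deviance + pD, with pD = mean deviance - deviance at posterior means."""
    d_bar = np.mean([deviance(b, s, x, y) for b, s in zip(betas, sigma2s)])
    d_hat = deviance(betas.mean(axis=0), sigma2s.mean(), x, y)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d

# Synthetic stand-in for "education code vs desired number of sons"
rng = np.random.default_rng(42)
edu = rng.integers(1, 7, size=500).astype(float)      # hypothetical codes 1..6
sons = 2.5 - 0.3 * edu + rng.normal(0.0, 0.8, size=500)

betas, sigma2s = gibbs_regression(edu, sons)
post_mean = betas.mean(axis=0)
ci = np.percentile(betas, [2.5, 97.5], axis=0)
dic_value, p_d = dic(betas, sigma2s, edu, sons)
print("posterior means:", post_mean)
print("95% intervals:", ci.T)
print(f"DIC = {dic_value:.1f}, effective parameters pD = {p_d:.2f}")
```

When several candidate models are fitted to the same response, the one with the smallest DIC is preferred; the DIC values of models with different response variables are not directly comparable.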
The SEM results are given in Table 2. They show that family behaviour has a positive influence on the desired number of children. Standardized regression coefficients have been used in this work, so the magnitude of each factor can be compared directly with the other factors in the model. The ultimate aim of the work is to find the best-fitting model. SEM has been applied specifically to observe the effect of the latent variables. In the Bayesian approach, model fit can be checked in different ways; the most widely used criteria are BIC, DIC and AIC. The DIC value of each model has been examined and weighed against the simplicity of the model. The minimum DIC value (324.67) is observed for Model 3, so Model 3 is the best-fitting model in this sample. However, all the assumed models are significant and play an equally important role for further policy implication.
7. DISCUSSION
The Gibbs sampler, an MCMC technique, is useful for sampling from conditional posterior distributions. The Gibbs sampler has been run after discarding a burn-in period, with convergence diagnosed by trace plots and standard tests. Posterior descriptive statistics have been computed from the collected samples, and the posterior means of the conditional distributions of the latent variables have been calculated. The package WinBUGS (Spiegelhalter et al., 2003) has been used to run the Gibbs sampling algorithm. Arminger et al. (1998) and Lee et al. (2004) have applied the Bayesian approach to deal with nonlinear
structural equation modelling. Ansari et al. (2000) and Lee et al. (2003) have applied the Bayesian approach to control the heterogeneity effect through SEM. Raftery (1993) has derived a solution for model selection in SEMs under the Bayesian approach. Scheines et al. (1999) and Lee et al. (2000) have given comprehensive accounts of the Bayesian SEM method. Bayesian model selection has been carried out here through BIC and DIC. It offers a simple way to select a model and draw conclusions on the objective of interest, and it is becoming a widely accepted tool for solving real-life problems. Raftery (1994) has applied Bayesian inference to model selection in social research. Our aim is to apply the Bayesian approach to a simple SEM. Since maximum likelihood is useful mainly for large samples, Bayesian inference with posterior means becomes indispensable for small samples. Because practical difficulties often limit real research problems to small samples, this is an essential area to explore. Kaakinen et al. (2010) have applied SEM to find the association between a network of variables and BMI in the Northern Finland Birth Cohort study. Ishii et al. (2010) have explored the relation between environmental, social and psychological influences by SEM in Japan. Dey (Pal) and Chaudhuri (2009) have conducted a study to assess the attitude of mothers towards gender preference for children in Hooghly district of West Bengal; the desire for the next child to be a son was found to be significantly higher than the desire for a daughter. Malahi et al. (1999) have reported a higher preference for sons among mothers in urban Himachal Pradesh. Pandey et al. (2002) have found that boys are likely to receive more care than girls in West Bengal. Here, the levels of education, family status and family behaviour have been examined in relation to the sex preference for the next child among Manipur women. The sex preference of husbands has also been examined. In earlier studies, a clear preference for sons has been observed. In the study population, higher education appears to contribute to equal sex preference for the next child.
8. CONCLUSIONS
This work deals with the non-response problem in fertility data on sex preference among Manipuri mothers. The desired numbers of daughters and sons have been studied for the couples. In particular, SEM has been applied through the Bayesian approach to obtain robust conclusions. The effect of the latent variables on the response of interest has been measured by SEM. The methods presented here can be extended to other covariates related to the couples’ desired sex of the next child. The foremost advantage of the proposed method is the use of prior information to obtain posterior inference for the SEM. However, the frequentist approach can be useful as an alternative. The DIC has been used as the model selection criterion in the study. The best-fitting model has been selected based on the minimum
DIC value. Higher education has been found to be associated with equal sex preference among Manipuri mothers. More extensive applied research is required on the effect of other latent variables on the response of interest through SEM.
REFERENCES
[1] Ansari A., and Jedidi K., (2000), Bayesian Factor Analysis for Multilevel Binary Observations, Psychometrika, 64: 475–496.
[2] Bardhan, (1982), “Little Girls and Death in India”, Economic and Political Weekly, 1448–50.
[3] Bentler P. M., and Wu E. J. C., (2002), EQS6 for Windows User Guide, Encino, CA: Multivariate Software, Inc.
[4] Das Gupta M., (1987), “Selective Discrimination Against Female Children in Rural Punjab, India”, Population and Development Review, 13: 77–100.
[5] David et al., (2005), Have Applied the Bayesian Approach in SEM to Compare the Performance with Frequency Approach Using a Democratization and Industrialization Data.
[6] David B. Dunson, Jesus Palomo, and Ken Bollen, (2005), Bayesian Structural Equation Modeling, Technical Report #2005-5.
[7] Jöreskog K. G., and Yang F., (1996), Nonlinear Structural Equation Models: The Kenny Judd Model with Interaction Effects.
[8] Heinz Herzig, Ulla Sovio, Amanda J. Bennett, Leena Peltonen, Mark I. McCarthy, Paul Elliott, Bianca De Stavola, and Marjo-Riitta Järvelin, (2010), Life-Course Analysis of a Fat Mass and Obesity-Associated (FTO) Gene Variant and Body Mass Index in the Northern Finland Birth Cohort 1966 Using Structural Equation Modeling, American Journal of Epidemiology, 172(6).
[9] Sik-Yum Lee, (2007), Structural Equation Modeling: A Bayesian Approach, John Wiley & Sons Ltd.
[10] Indira Dey (Pal), and Ramendra Narayan Chaudhuri, (2009), Gender Preference and Its Implications on Reproductive Behavior of Mothers in a Rural Area of West Bengal, Indian J. Community Med., 34(1): 65–67, (January).
[11] Jöreskog K. G., and Sörbom D., (1996), LISREL 8: Structural Equation Modeling with the SIMPLIS Command Language, Scientific Software International.
[12] G. A. Marcoulides, and R. E. Schumacker (Eds.), Advanced Structural Equation Modeling Techniques (pp. 57–88). Hillsdale, NJ: LEA.
[13] Marika Kaakinen, Esa Läärä, Anneli Pouta, Anna-Liisa Hartikainen, Jaana Laitinen, Tuija H. Tammelin, Karl.
[14] Kaori Ishii, Ai Shibata, and Koichiro Oka, (2010), Environmental, Psychological, and Social Influences on Physical Activity Among Japanese Adults: Structural Equation Modeling Analysis, International Journal of Behavioral Nutrition and Physical Activity, 7: 61.
[15] Lee S.-Y., and Shi J.-Q., (2000), Bayesian Analysis of Structural Equation Model with Fixed Covariates, Structural Equation Modeling: A Multidisciplinary Journal, 7: 411–430.
[16] Malahi P., and Raina G., (1999), Preferences for the Gender of Children and Its Implications for Reproductive Behaviour in Urban Himachal Pradesh, J. Fam Welfare, 45: 23–30.
[17] Nair P. M., (1996), “Imbalance of Sex Ratio of Children in India”, Demography India, 25: 177–87.
[18] Raftery A. E., (1994), Bayesian Model Selection in Social Research, Working Paper No. 94-12, Center for Studies in Demography and Ecology, Univ of Washington.
[19] Pandey A., Sengupta P. G., Mondal S. K., Gupta D. N., Manna B., Ghosh S., Sur D., and Bhattacharya S. K., (2002), Gender Differences in Healthcare-Seeking During Common Illnesses in a Rural Community of West Bengal, India, J. Health Popul Nutr., 20(4): 306–11, (December).
[20] Christian P. Robert, and George Casella, (1999), Monte Carlo Statistical Methods, Springer.
[21] Scheines R., Hoijtink H., and Boomsma A., (1999), Bayesian Estimation and Testing of Structural Equation Models, Psychometrika, 64: 37–52.
[22] Spiegelhalter D. J., Thomas A., Best N., and Gilks W., (2003), WinBUGS, Version 1.4 User Manual, MRC Biostatistics Unit. URL: http://weinberger.mrc-bsu.cam.ac.uk/bugs/Welcome.html.
[23] Raftery A. E., (1993), Bayesian Model Selection in Structural Equation Models, In Testing Structural Equation Models (Edited by K. A. Bollen and J. S. Long).
[24] Arminger G., and Muthén B. O., (1998), A Bayesian Approach to Nonlinear Latent Variable Models Using the Gibbs Sampler and the Metropolis-Hastings Algorithm, Psychometrika, 63: 271–300.
[25] Lee S.-Y., (1992), Bayesian Analysis of Stochastic Constraints in Structural Equation Models, British Journal of Mathematical and Statistical Psychology, 75: 115–125.
[26] Lee S.-Y., and Song X.-Y., (2003), Bayesian Model Selection for Mixtures of Structural Equation Models with an Unknown Number of Components, British Journal of Mathematical and Statistical Psychology, 56: 145–165.

Dilip C. Nath, Atanu Bhattacharjee and H. Brojeshwor Singh
Department of Statistics, Gauhati University, Guwahati-781 014, India.
E-mails: [email protected], [email protected], [email protected]