A latent variable model for mixed continuous and ordinal ... - Sankhya

1 downloads 0 Views 200KB Size Report
Outcomes related to mixed correlated ordinal and continuous data with ... given the categorical variables (for a mixed Poisson and continuous responses.
Sankhy¯ a : The Indian Journal of Statistics 2010, Volume 72-B, Part 1, pp. 38-57 c 2010, Indian Statistical Institute

A Latent Variable Model for Mixed Continuous and Ordinal Responses with Nonignorable Missing Responses: Assessing the Local Influence via Covariance Structure E. Bahrami Samani Shahid Beheshti University, Tehran, Iran

M. Ganjali Shahid Beheshti University, Tehran, Iran

S. Eftekhari Shahid Beheshti University, Tehran, Iran

Abstract A joint model for mixed ordinal and continuous responses, with missing values in both variables, and their missing mechanisms is proposed. Full likelihood-based approach that yields maximum likelihood estimates of the model parameters is used. For data with missing responses in both variables some modified Pearson residuals are also proposed. A common way to investigte if perturbations of model components influence key results of the analysis is to compare the results derived from the original and perturbed models using an influence graph. For this, influence of a small perturbation of elements of the covariance structure of the model on likelihood displacement is also studied. The model is illustrated using data from a foreign language achievement study. AMS (2000) subject classification. Primary 62P99. Keywords and phrases. Latent variables, missing responses, ordinal and continuous responses, sensitivity analysis, likelihood displacement

1

Introduction

Outcomes related to mixed correlated ordinal and continuous data with possibility of missing values, are pervasive and research on analyzing them needs to be promoted. Schafer (1997), for example, describes data from Raymond and Roberts (1983). These data were collected in order to measure the Foreign Language Attitude Scale (FLAS), an instrument to predict

A latent variable model for ordinal responses

39

success in the study of foreign languages. A number of variables were measured for a sample of students enrolled in foreign language courses at the Pennsylvania State University in the early 1980s. These variables include the student’s score on FLAS (continuous), and the student’s final grade in the foreign language course (GRD) (ordinal). In this data, GRD and FLAS are the correlated responses and the effect of explanatory variables on GRD and FLAS should be investigated simultaneously. Consequently, we need to consider a method in which these variables can be modelled jointly. For joint modelling of responses, one method is to use the general location model of Olkin and Tate (1961), where the joint distribution of the continuous and categorical variables is decomposed into a marginal multinomial distribution for the categorical variables and a conditional multivariate normal distribution for the continuous variables, given the categorical variables (for a mixed Poisson and continuous responses where Olkin and Tate’s method is used see Yang et al. (2007) and for joint modelling of mixed outcomes using latent variables see McCulloch (2007). A second method for joint modelling is to decompose the joint distribution as a multivariate marginal distribution for the continuous responses and a conditional distribution for categorical variables given the continuous variables. Cox and Wermuth (1992) empirically examined the choice between these two methods. The third method uses simultaneous modelling of categorical and continuous variables taking into account the association between the responses by the correlation between errors in the model for responses. For more details of this approach see, for example, Heckman (1978) in which a general model for simultaneously analyzing two mixed correlated responses is introduced and Catalano and Ryan (1992) who extended and used the model for a cluster of discrete and continuous outcomes. Rubin (1976) and Little and Rubin (2002) made important distinctions between the various types of missing mechanism. They define the missing mechanism as missing completely at random (MCAR) if missingness is dependent neither on the observed responses nor on the missing responses, and missing at random (MAR) if, given the observed responses, it is not dependent on the missing responses. Missingness is defined as non-random if it depends on the unobserved responses. From a likelihood point of view MCAR and MAR are ignorable but not missing at random (NMAR) is nonignorable. For mixed data with missing outcomes, Little and Schuchter (1987) and Fitzmaurice and Laird (1997) used the general location model of Olkin and

40

E. Bahrami Samani, M. Ganjali and S. Eftekhari

Tate (1961) with the assumption of missingness at random (MAR) to justify ignoring the missing data mechanism. This means that they used all available responses, without a model for missing mechanism, to obtain parameter estimates using the EM (Expectation Maximization) algorithm. A model for mixed continuous and discrete binary responses with possibility of missing responses is introduced by Ganjali (2003). Ganjali and Shafie (2007) present a transition model for an ordered cluster of mixed continuous and discrete binary responses with non-monotone missingness. In this paper a general latent variable model for simultaneously handling response and non-response in mixed ordinal and continuous data with potentially nonrandom missing values in both types of responses is presented. With this model, the dependence between responses can be taken into account by the correlation between errors in the models for continuous and ordinal responses and it is shown how we can incorporate a model for a complicated pattern of missing responses. So, by this general model, not only we are able to joint model ordinal response (as an extention for binary response) and continuous response (considering missing mechanisms for both responses) but also the effect of both correlated responses on both correlated missing mechanisms can be investigated. For goodness of fit, it is common practice to find abnormal values of response in univariate regression models and there are many approaches for doing this in statistical literature (Draper and Smith, 1986, ch. 2). However, for mixed data, especially with missing values, to detect abnormal observation, correlation between responses are not taken into account or typically some approaches, e.g. residuals, ignoring the missing mechanisms (see for example, Hirano et al., 2001) are used. In this paper we defined some modified Pearson residuals for mixed ordinal and continuous responses to find abnormal values of responses in the existence of missing values. We will also propose influence graphs which are helpful to assess robustness of the model parameters to assumptions made on the missing mechanisms. In the next Section, the model, the likelihood and the modified residuals are presented and then in the subsequent Section, the proposed methodology will be applied on the data from the study on foreign language acquisition. As a sensitivity analysis for these data the influence of a small perturbation of the covariance structure of the model on likelihood displacement will also be investigated. Finally, concluding remarks are given.

A latent variable model for ordinal responses 2

41

Model for Mixed Continuous and Ordinal Responses

2.1. Model and likelihood. Suppose two responses, Yi , as an ordinal response with c levels and Zi as a continuous response are measured on the ith individual. Variables Yi and Zi are correlated and must be modelled ∗ denote the underlying latent variable simultaneously. Let Yi∗ , RY∗ i and RZ i of the ordinal response, the non-response mechanism for the ordinal variable and non-response mechanism for the continuous variable, respectively, and the ordinal variable of the ith individual with c levels is defined as  Yi∗ < η1 ,  1 j+1 ηj ≤ Yi∗ < ηj+1 , j = 1, ..., c − 2 Yi =  c Yi∗ ≥ ηc−1 , where η1 , ..., ηc−1 are the cut-point parameters with η1 < η2 < ... < ηc−1 and for model fitting, we define ξ1 = η1 , ξ2 = η2 − η1 ,..., ξc−1 = ηc−1 − ηc−2 where ξj ≥ 0 for j = 2, ..., c − 1. The indicator variable for responding to Y is defined as  1 RY∗ i > 0 RYi = 0 otherwise, and the indicator variable for responding to Z is defined as  ∗ >0 1 RZ i RZ i = 0 otherwise. The joint model takes the form:

Y ∗ = β1′ X1 + ε1 Z = RY∗ ∗ RZ

= =

β2′ X2 + σε2 α1′ X3 + ε3 α2′ X4 + ε4

(2.1) (2.2) (2.3) (2.4)

where Xj for j = 1, 2, 3, 4 are vectors of explanatory variables. It is assumed that E(εl ) = 0 for l = 1, 2, 3, 4. The covariance matrix of the vector of errors (ε1 , ε2 , ε3 , ε4 )′ is   1 ρ12 ρ13 ρ14  ρ12 1 ρ23 ρ24   Σ=  ρ13 ρ23 1 ρ34  ρ14 ρ24 ρ34 1

The vector of parameters β1 , β2 , α1 and α2 , the cut-point parameters η1 , ..., ηc−1 or equivalently ξ1 , ..., ξc−1 and the correlation parameters ρjj ′ for j < j ′ ,

42

E. Bahrami Samani, M. Ganjali and S. Eftekhari

j = 1, 2, 3 and j ′ = 2, 3, 4 should be estimated. The vector β2 includes an intercept parameter but the vector β1 , due to having cut-point parameters in the model, does not include any intercept. In this model any multivariate distribution can be assumed for the errors in the model. Here a multivariate normal distribution is assumed. The indicator of response of Y (RY ) in this model is correlated with both responses, the indicator of response of Z (RZ ) also is correlated with both responses and the continuous response Z is correlated with the ordinal response Y . If, one of the correlation parameters ρjj ′ for j = 1, 2 and j ′ = 3, 4 is found to be significant, then we have a non random missing process. On the other hand, if ρjj ′ for j = 1, 2 and j ′ = 3, 4 are found to be 0s, the missing data is completely at random and can be ignored. There is also a correlation between the two non- responses mechanisms which if is positive indicates that high rate of non-response of one variable is associated with high rate of non-responses for the other variable. If, one of the correlation parameters ρjj ′ for j = 1, 2 and j ′ = 3, 4 is found to be significant, then we may have a NMAR mechanism (we have NMAR mechanism for sure if the condition given in (2), below, does not satisfy). On the other hand, if ρjj ′ for j = 1, 2 and j ′ = 3, 4 are found to be 0s, the missing data is MCAR and can be ignored. For finding the condition for MAR let W = (Z, Y ) = (Wobs , Wmis ), ∗ , W ∗ ) and R∗ = (R∗ , R∗ ) where W ∗ is the vector W ∗ = (Z, Y ∗ ) = (Wobs z mis Y obs ∗ of latend variables related to the observed part of W = (Z, Y ), Wmis is the vector of latend variables related to the missing part of W = (Z, Y ). According to our joint model, the vector of responses along with the missing ∗ , W ∗ , R∗ ) has a multivariate normal distribuindicators (W ∗ , R∗ ) = (Wobs mis tion with the following covariance structure,   Σo,o Σo,m Σo,R∗ Σ =  Σm,o Σm,m Σm,R∗  ΣR∗ ,o ΣR∗ ,m ΣR∗ ,R∗

where

∗ ,W∗ ) Σo,o = cov(Wobs obs ∗ ,W∗ ) Σo,m = cov(Wobs mis ∗ , R∗ ) Σo,R∗ = cov(Wobs ΣR∗ ,R∗ = cov(R∗ , R∗ ).

The joint density function can also be partitioned as ∗ ∗ ∗ f (W ∗ , R∗ ) = f (Wmis , R∗ |Wobs )f (Wobs ),

A latent variable model for ordinal responses

43

∗ , R∗ |W ∗ ) and f (W ∗ ) have, respectively, a conditional and where f (Wmis obs obs a marginal normal distribution. According to the missing mechanism definitions, to have a MAR mechanism the covariance matrix of the above mentioned conditional normal distribution, ∗ , R∗ |W ∗ ) Σm,R∗ |o = cov(Wmis obs

=



Σm,m Σm,R∗ ΣR∗ ,m ΣR∗ ,R∗

=



Σm,m − Σ′m,o Σ−1 Σm,R∗ − Σ′m,o Σ−1 o,o Σm,o o,o ΣR∗ ,o ′ −1 ′ ∗ ∗ ∗ ΣR ,m − ΣR∗ ,o Σo,o Σm,o ΣR ,R − ΣR∗ ,o Σ−1 o,o ΣR∗ ,o





Σm,o ΣR∗ ,o

′

Σ−1 o,o

Σm,o ΣR∗ ,o





should satisfy the following constraint, Σm,R∗ − Σ′m,o Σ−1 o,o ΣR∗ ,o = 0.

(2.5)

For our application (see section of application), we have missing values only ∗ = Z and W ∗ for our ordinal variable and we may have Wobs mis = Y . For ∗ missing mechanism we only need to define R = Ry∗ , as we do not have any missing value for our continuous response, and Σm,m = 1, Σm,R∗ = ρ23 , ΣR∗ ,R∗ = 1 Σo,o = σ 2 , Σm,o = σρ12 , ΣR∗ ,o = σρ13 so that the above constraint will be reduced to, Σm,R∗ − Σ′m,o Σ−1 o,o ΣR∗ ,o = ρ23 − ρ12 ρ13 = 0. The Likelihood for this model, which has been given in appendix A, shows the simplification obtained by using the assumption of normality for errors of the model in the system of equations (1). 2.2. Residuals. The missing values of responses create problems for the usual residual diagnostics, (see Hirano et al., 1998). The Pearson residuals for continuous response, assuming MCAR, can take the form rz =

z − E(Z|x2 ) . (V ar(Z|x2 ))1/2

(2.6)

These unconditional residuals will be misleading if missingness is MAR or NMAR as in these cases the values of the continuous variable come from

44

E. Bahrami Samani, M. Ganjali and S. Eftekhari

a conditional or truncated conditional distribution. We can examine the residuals of the responses conditional on being observed. Let start with using theoretical form, involving E(Z|RZ = 1), E(Y |RY = 1), V ar(Z|RZ = 1) and V ar(Y |RY = 1) rather than their predicted values to define residuals. The Pearson residuals for continuous response can take the form rz =

z − E(Z|RZ = 1) , (V ar(Z|RZ = 1))1/2

(2.7)

where E(Z|RZ = 1) = β2′ X2 + σρ24 M, V ar(Z|Rz = 1) = σ 2 [(1 − ρ224 ) + ρ224 (1 − M α2′ X4 − M 2 )] where M =

φ(−α′2 X4 ) Φ(α′2 X4 )

and that for ordinal response can take the form ry =

y − E(Y |RY = 1) , [V ar(Y |RY = 1)]1/2

(2.8)

where, for example, when c = 3 we have E(Y |RY = 1) = 3 +

M12 + M11 − (M2 + M1 ) Φ(α1′ X3 )

15(M12 + M11 − (M2 + M1 )) Φ(α1′ X3 ) M12 + M11 − (M2 + M1 ) 2 2M12 − M2 −[ ] − Φ(α1′ X3 ) Φ(α1′ X3 ) where Φ(·), φ(·) and Φ12 (·, ·; ·) are, respectively, the cumulative univariate, density function and bivariate normal distributions and M1 = Φ(η1 −β1′ X1 ), M2 = Φ(η2 −β1′ X1 ), M11 = Φ12 (η1 −β1′ X1 , −α1′ X3 ; ρ13 ) and M12 = Φ12 (η2 − β1′ X1 , −α1′ X3 ; ρ13 ). The estimated Pearson residuals (ˆ rz , rˆy ) can be found by using the maximum likelihood estimates of the parameters, obtained by system (1), in (2.7) and (2.8). As residuals in (2.7) and (2.8) are defined by conditioning on observing the responses, they differ from those of Ten Have et al. (1998), who found the expectation and variance of responses, and consequently the residuals, unconditional on the fact that responses should be observed. Then, to calculate the residuals they assumed no link between response and non response mechanisms. This gives biased estimates of the means and variances of responses if missingness is not at random. However, the residuals in equations (2.7) and (2.8) do not take into account the correlation between responses. Some modified Pearson residuals, defined in appendix B, are conditioned on the indicator of responses, and can take into account the correlation between responses. V ar(Y |RY = 1) =

A latent variable model for ordinal responses 3

45

Applicaton

3.1. Data. In this section, we present an application of the proposed methodology to a foreign language achievement study (Raymond and Roberts, 1983, and Schafer, 1997 (chapter 9)). The main purpose of the foreign language achievement study was to investigate the psychometric properties of the Foreign Language Attitude (FLAS), an instrument developed for predicting proficiency in a variety of foreign language learning settings. The data we used in our analysis are based on N = 236 students enrolled in foreign language courses at Pennsylvania State University in the 1980s with complete records on the following variables: (1) Gender (0=male and 1=female); (2) Age, age group (0= less than 20 years and 1=larger than or equal to 20 years); (3) PRI, number of prior foreign language courses (1=none, 2=at least 1 course); (4) LAN, the foreign language studied by the student (1=French, 2=Spanish, 3=Other); (5) CGPA, current college grade point average; (6) FLAS, a continuous variable representing the student’s score on the Foreign Language Attitude Scale; and incomplete records for (7) GRD, an ordinal variable with four levels representing the student’s final letter grade in the foreign language course (1 if D, 2 if C, 3 if B and 4 if A). The vector of explanatory variables is X=(Gender, Age, PRI, LAN1 , LAN2 , CGPA) where LAN1 and LAN2 are dummy variables for languages French and Spanish, respectively. The two correlated responses are FLAS and GRD. The mean of FLAS is 82.406 and its standard deviation is 13.655. A Kolmogorov Smrinov test of normality assumption for FLAS is not rejected (P-value=0.776). A frequency table for student grade shows that 14.8% percent of GRD values are missing. The highest level of GRD is for level A ( 50.4%) and the lowest is for level D (3.6% ), the other two levels contributions are (22.4%) for level B and (8.8%) for level C. We also consider the interaction effect between Gender and LAN, due to its significant effect in pervious works of Schafer (1997) and De Leon and Carrire (2007), who assume an ignorable missing mechanism, for these data.

3.2. Models for BHPS data. For comparative purposes, three models are considered. The first model (model I) uses only complete cases and does

46

E. Bahrami Samani, M. Ganjali and S. Eftekhari

not consider the correlation between two responses. This model is F LAS = β0 + β1 Gender + β2 Age + β3 P RI + β4 LAN1 + β5 LAN2 CGP A +β6 CGP A + β7 Gender × LAN1 + β8 Gender × LAN2 + σε1 (3.1) GRD



= γ1 Gender + γ2 Age + γ3 P RI + γ4 LAN1 + γ5 LAN2 +γ6 CGP A + γ7 Gender × LAN1 + γ8 Gender × LAN2 + ε2 (3.2)

where there is no correlation between ε1 and ε2 . The second model (model II) uses model I and takes into account the correlation between two responses (ρ12 ). Model III considers the full model, model with missing mechanism, with the following form

F LAS = β0 + β1 Gender + β2 Age + β3 P RI + β4 LAN1 + β5 LAN2 +β6 CGP A + β7 Gender × LAN1 + β8 Gender × LAN2 + σε1 (3.3) GRD



= γ1 Gender + γ2 Age + γ3 P RI + γ4 LAN1 + γ5 LAN2 +γ6 CGP A + γ7 Gender × LAN1 + γ8 Gender × LAN2 + ε2 (3.4)

∗ RGRD

= α0 + α1 Gender + α2 Age + α3 P RI + α4 LAN1 + α5 LAN2 + α6 CGP A + α7 Gender × LAN1 + α8 Gender × LAN2 + ε3 (3.5)

The correlation structure of the errors in this model is   1 ρ12 ρ13 Σ =  ρ12 1 ρ23  . ρ13 ρ23 1

4

Sensitivity Analysis for Data

Sensitivity analysis is the study of how model output varies with changes in model inputs. An assessment of the influence of minor perturbations of

A latent variable model for ordinal responses

47

the model is important (vide, Cook, 1986). Likelihood displacement and global influence will be used for measuring the influence of a perturbation of the covariance structure of the model in system (1), on deviation from assumption of MAR. Generally, one introduce perturbations into the model through the q × 1 vector ω which is restricted to some open subset Ω of Rq . Let L(θ|ω) denote the log-likelihood corresponding to the perturbed model for a given ω in Ω. For a given set of observed data, where θ is a p × 1 vector of unknown parameters, we assume that there is an ω0 in Ω such that L(θ) = L(θ|ω0 ) for all θ. Finally, Let θˆ and θˆω denote the maximum likelihood estimators under L(θ) and L(θ|ω). To assess the influence of varying ω throughout Ω, we consider the likelihood displacement defined as: ˆ − L(θˆω )]. LD(ω) = 2[L(θ) Let see how we can use this approach for our purposes. In our application the continuous variable is always observed, and so condition for MAR of ordinal response would be f (y ∗ , Ry∗ |z) = f (y ∗ |z)f (Ry∗ |z). As (y ∗ , z, Ry∗ ) is assumed to have a multivariate normal distribution with ′ mean µ = β1′ X1 , β2′ X2 , α1′ X3 and covariance matrix  1 σρ12 ρ13 Σ =  σρ12 σ 2 σρ23  ρ13 σρ23 1 

for MAR we should have   1 ρ13 Σy∗ ,Ry∗ |z = − ρ13 1

σρ12 σρ23



(σ 2 )

−1

σρ12 σρ23

′

=0

which gives the following condition for MAR: ρ13 − ρ12 ρ23 = 0 The perturbation from MAR to NMAR is what we like to consider for our ordinal response. We can use likelihood displacement for the effect of

48

E. Bahrami Samani, M. Ganjali and S. Eftekhari

perturbation from MAR to NMAR. Let h = ρ13 − ρ12 ρ23 . Here, ω = h, ω0 = 0 and q = 1. Denote the log-likelihood function by L(θ |h ) =

n X

Li (θ |h )

i=1

where Li (θ |h ) is the contribution of the ith individual to the log-likelihood and θ is the parameter vector. Here, L(θ |h = 0 ) is the log-likelihood function which corresponds to a MAR mechanism. Suppose h can be perturbed around 0. Let θˆ be MLE for θ obtained by maximizing L(θ) = L(θ |h = 0 ) and let θˆh denote the MLE for θ under L(θ |h ). Now one compare θˆh and θˆ as local influence. Strongly different estimates show that the estimation procedure is highly sensitive to such modification. We can quantify the differences using Cook’s likelihood displacement defined as ˆ − L(θˆh )] LD(h) = 2[L(θ) LD(h) will be large if L(θ |h ) is strongly curved at θˆ which means that θ is estimated with high precision, and small otherwise. A graph of LD(h) versus h provides a global sensitivity analysis. 4.1. Results . Results of using three models are given in Table I and II. Model (I) shows significant effect of LAN2 (Spanish) and LAN2 × Gender on FLAS and significant effect of Gender, Age, PRI, LAN2 , CGPA and LAN1 × Gender on GRD. The FLAS score of Spanish language learners is less than that of learners of the other languages. The FLAS score of the female Spanish language learners is more than that of males or females of other languages or males of Spanish language learners. The odds of obtaining grade A for males is more than female and the odds of obtaining grade A decreases as age increases. The model also shows that the experience of prior language course increases the odds of obtaining grade A. Model (II) shows a positive correlation between FLAS and GRD (ˆ ρ12 =0.277, P-value=0.000). This gives the conclusion that high values of FLAS are associated with high values of latent variable of GRD∗ (high level for GRD). The results for this model is close to those of model (I). In Table II, three different forms of model (III) is considered. The first form, assumes a missing completely at random mechanism for GRD ∗ which means no correlation between error terms of FLAS, RGRD and GRD∗ (MCAR, ρ13 = ρ23 = 0). In this case, model (III) gives the same results as model (I).

A latent variable model for ordinal responses

49

Table 1: Results of using models I (model using complete cases and not considering the correlation between two responses) and model II (model using complete cases and considering the correlation between two responses) for foreign language achievement study (Parameters significant at %5 level are highlighted in bold, for Gender the baseline is male and for Age the baseline is less than 20) Model I Model II Parameter Est. S.E. Est. S.E. β0 (Constant) 81.625 6.260 81.970 7.235 β1 (Gender ) ∗ 4.475 2.433 4.293 2.574 β2 (Age) -1.748 1.746 -1.547 1.766 β3 (PRI) ∗∗ -1.066 1.953 -1.247 2.128 β4 (LAN1 ) -0.421 2.971 -1.192 2.925 β5 (LAN2 ) -7.288 2.769 -7.946 2.795 β6 (CGP A) 0.281 1.809 0.186 2.047 β7 (LAN1 × Gender) 2.106 4.345 1.582 4.340 β8 (LAN2 × Gender) 10.625 4.015 10.647 4.045 σ 12.92 0.683 11.909 0.593 ρ12 0.277 0.081 γ1 (Gender) 0.882 0.316 0.886 0.320 γ2 (Age) 0.541 0.189 0.583 0.190 γ3 (PRI) 0.484 0.203 0.490 0.205 γ4 (LAN1 ) 0.108 0.291 0.088 0.288 γ5 (LAN2 ) -0.668 0.276 -0.671 0.280 γ6 (CGPA) 1.355 0.211 1.354 0.210 γ7 (LAN1 ×Gender) -1.421 0.466 -1.377 0.469 γ8 (LAN2 × Gender) -0.447 0.440 -0.484 0.445 ξ1 2.875 0.705 2.896 0.708 ξ2 0.885 0.177 0.885 0.177 ξ3 1.127 0.137 1.121 0.136 −log-likelihood 1103.981 947.765

50

E. Bahrami Samani, M. Ganjali and S. Eftekhari

The second form of model (III) considered the missing mechanism for GRD as missing at random (MAR, h = 0). The correlation parameter ρ23 is ∗ significant and that shows a positive correlation between FLAS and RGRD (ˆ ρ23 =0.196, P-value=0.039). In the third form of the model there is no constrain for the missing mechanism of GRD (NMAR, h 6= 0). We use the likelihood ratio to test for NMAR. By these results we can conclude that the two responses are correlated and also the missing indicator for GRD is related to both responses. This leads to have a not at random missing mechanism (Deviance= 10.086 with 1 d.f., P-value=0.002). The mean of latent variable for grade is less for Spanish language learners than that for other language learners which gives an expectation of low value for GRD (low levels of GRD). On the other hand, the mean of GRD∗ in increasing with high values for CGPA, which leads to high level for GRD. GRD∗ is also less for French female learners than other learners. Some covariates are also effective in responding to GRD. For example, male are more likely to respond. French people are more likely to respond and people with higher values of CGPA are more likely to respond. To search for Sensitivity analysis we find likelihood displacement for different values of h. LD(h) [computed from equation (8)] for different values of h in the interval (−1, 1) is plotted in figure 1. This curve shows a high curvature around h = 0, so that the differing values of h affects the model results, hence final results of the model is highly sensitive to the elements of covariance structure. Using correlated residuals, introduced in equation (3.1), for model (III) with NMAR mechanism, we find five outliers with residuals larger than 3. Refitting the model without those observations the same results are obtained and the plot in figure 1 is also give the same conclusion that data are missing not at random. 5

Discussion

In this paper a latent variable model was presented for simultaneously modelling ordinal and continuous correlated responses. The model was formulated in such a way that one can handle the possibility of missing responses. It can also be used for more than two variables with missing responses. Some modified residual were also presented to detect outliers. We assumed a bivariate normal distribution for errors in the model. However, any other bivariate distribution such as t or logistic may also be considered. In our application we only had missing values for ordinal response and it makes our sensitivity analysis to be simpler. If both variables have missing

A latent variable model for ordinal responses

51

Table 2: Results using three models (model with assumption of missing completely at random, MCAR, model with missing at random, MAR, assumption and model with not imposing any assumption on the missing mechanisms, NMAR) for foreign language achievement study (parameters significant at 5% level are highlighted in bold) M CAR M AR N M AR Parameter Est. S.E. Est. S.E. Est. S.E. β0 (Constant) 81.625 6.168 81.970 7.235 82.007 6.164 β1 (Gender)∗ 4.475 2.394 4.293 2.574 4.375 2.370 β2 (Age) -1.748 1.715 -1.547 1.766 -1.750 1.707 ∗∗ β3 (P RI) -1.066 1.913 -1.247 2.128 -1.046 1.910 β4 (LAN1 ) -0.420 2.879 -1.192 2.925 -0.321 2.583 β5 (LAN2 ) -7.288 2.712 -7.946 2.795 -7.189 2.650 β6 (CGP A) 0.280 1.794 0.186 2.047 0.290 1.783 β7 (LAN1 × Gender) 2.106 4.257 1.582 4.340 1.806 4.006 β8 (LAN2 × Gender) 10.626 3.936 10.647 4.045 10.826 3.904 σ 12.675 0.583 11.909 0.593 12.098 0.583 γ1 (Gender) 0.886 0.315 0.886 0.320 1.019 0.285 γ2 (Age) 0.576 0.188 0.583 0.190 0.479 0.177 γ3 (P RI) 0.457 0.203 0.490 0.205 0.353 0.190 γ4 (LAN1 ) 0.106 0.277 0.088 0.288 0.090 0.262 γ5 (LAN2 ) -0.652 0.276 -0.671 0.280 -0.598 0.253 γ6 (CGP A) 1.349 0.208 1.354 0.210 1.174 0.194 γ7 (LAN1 × Gender) -1.358 0.458 -1.377 0.469 -1.560 0.429 γ8 (LAN2 × Gender) -0.483 0.439 -0.484 0.445 -0.726 0.407 ξ1 2.889 0.700 2.896 0.708 2.843 0.656 ξ2 0.881 0.176 0.885 0.177 0.829 0.167 ξ3 1.115 0.135 1.121 0.136 1.001 0.122 α0 (Cutpoint) -1.700 0.795 -1.751 0.792 -2.913 0.869 α1 (Gender) -0.580 0.283 -0.587 0.289 -0.561 0.258 α2 (Age) 0.690 0.235 0.702 0.237 0.931 0.239 α3 (P RI) 0.052 0.245 0.060 0.247 0.030 0.235 α4 (LAN1 ) 0.358 0.435 0.345 0.434 0.786 0.373 α5 (LAN2 ) -0.035 0.360 -0.026 0.408 - 0.013 0.152 α6 (CGP A 0.763 0.234 0.778 0.233 1.103 0.256 α7 (LAN1 × Gender) 0.896 0.683 0.890 0.670 0.267 0.649 α8 (LAN2 × Gender) 1.085 0.554 1.066 0.593 0.971 0.454 ρ12 0.293 0.085 0.293 0.085 0.288 0.080 ρ13 - -0.910 0.236 ρ23 - 0.196 0.102 0.196 0.095 −log-likelihood 1182.475 1180.719 1175.676

52

E. Bahrami Samani, M. Ganjali and S. Eftekhari

Figure 1: Likelihood displacement against values of (the value that perturb the model from missing at random to not missing at random).

values, to see the behavior of the likelihood displacement, one has to perturb the model from MAR to NMAR by looking at the nonzero values of the left part of the equation (2) (say matrix H). If likelihood displacement gets so large with nonzero H that means that the results are highly sensitive to the modification and a joint model of missing mechanism and responses is necessary. We think this is a very interesting topic of research for future study. References De Leon, A. R. and Carrire, K. C. (2007). General mixed-data model: Extension of general location and grouped continuous models. Canad. J. Statist., 35, 533-548. Catalano, P. and Ryan, L. M. (1992). Bivariate latent variable models for clustered discrete and continuous outcomes. J. Amer. Statist. Assoc., 50, 1078-1095. Cox, D. R. and Wermuth, N. (1992). Response models for mixed binary and quantitative variables. Biometrika, 79, 441-461. Cook, R. D. (1986). Assessment of local influence. Journal of the Royal Statistical Society, Series B, 48, 133-169. Draper. N. and Smith. H. (1986). Applied Regression Analysis. Wiley, New York, 2nd -edition. Fitzmaurce, G. M. and Laird, N. M. (1997). Regression models for mixed discrete and continuous responses with potentially missing value. Biometrics, 53, 110-122. Ganjali. (2003). A model for mixed continuous and discrete responses with possibility of missing responses. J. Sciences, Islamic Republic of Iran, 14, 53-60. Ganjali. and Shafie, K. (2007). A transition model for an ordered cluster of mixed continuous and discrete responses with non- monotone missingness. J. Applied Statistical Sciences, 15, 331-343.

A latent variable model for ordinal responses

53

Heckman, J. J. (1978). Dummy endogenous variable in a Simultaneously equation system. Econometrica, 46, 931-959. Hirano, K. Imbens, G. Ridder, G. and Rubin, D. (2001). Combining Panel data sets with attrition and refreshment samples. Econometrica, 69, 1645-1659. Long, J. S. (1997). Regression models for categorical and limited dependent variables. Sage Publication. London. Little, R. J. and Schluchter M. (1987). Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika, 72, 497-512. Little, R. J. and Rubin, D. (2002). Statistical analysis with missing data. Second edition. New york, Wiley. Olkin, L. and Tate, R. F. (1961). Multivariate correlation models with mixed discrete and continuous variables. Ann. Math. Statist., 32, 448-456. McCulloch, C. M. (2007). Joint modelling of mixed outcome type using latend variables. Statistical Methods in Medical Research, 17, 53-73. Raymond, M. R. and Roberts, D. M. (1983). Development and validation of a foreign language attitude scale. Educational and Psychological Measurement, 43, 12391246. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592. Schafer, J. L. (1997). Analysis of incomplete multivariate data, Chapman and Hall. Ten Have, T. R. T., Kunselman, A. R., Pulkstenis, E. P. and Landis, J. R. (1998). Mixed effects logistic regression models for longitudinal binary response data with informative dropout. Biometrics, 54, 367-383. Yang, Y., Kang, J., Mao, K. and Zhang, J. (2007). Regression models for mixed Poisson and continuous longitudinal data. Statistics in Medicine, 26, 3782-3800.

Appendix A: Likelihood For Mixed Ordinal and continuous Data With Missing Responses For individual who observe neither y nor z the likelihood is L = P r(Ry = 0, Rz = 0) = Φ12 (−α1′ X3 , −α2′ X4 ; ρ34 ) where Φ12 (·; ·) is the cumulative bivariate normal distribution with mean   1 ρ34 vector of 0 and Σ34 = . ρ34 1 For individual who observe only y the likelihood is L = P r(Y = y, Ry = 1, Rz = 0) = P r(Y = y, Rz = 0) − P r(Y = y, Ry = 0, Rz = 0) = Pr(ηy−1 ≤ Y ∗ ≤ ηy , Rz∗ < 0) − Pr(ηy−1 ≤ Y ∗ ≤ ηy , Ry∗ < 0, Rz∗ < 0) = Φ12 (ηy − β1′ X1 , −α2′ X4 ; ρ14 ) − Φ12 (ηy−1 − β1′ X1 , −α2′ X4 ; ρ14 ) −Φ123 (ηy − β1′ X1 , −α1′ X3 , −α2′ X4 ; Σ134 ) +Φ123 (ηy−1 − β1′ X1 , −α1′ X3 , −α2′ X4 ; Σ134 )

54

E. Bahrami Samani, M. Ganjali and S. Eftekhari

where Σ134

 1 ρ13 ρ14 =  ρ13 1 ρ34  ρ14 ρ34 1 

and Φ123 (·, ·; ·) is cumulative three-variate normal distribution with mean vector of 0. For individual who observe only z the likelihood is

L = f (z) Pr(Ry = 0 , Rz = 1 |z ) = f (z)[Pr(Ry = 0 |z ) − Pr(Ry = 0 , Rz = 0 |z )] = f (z)[Pr(Ry∗ < 0 |z ) − Pr(Ry∗ < 0 , Rz∗ < 0 |z )] = f (z) [Φ (ψ(X3 , 0, α1 , β)) − Φ12 (ψ(X3 , 0, α1 , β2 ), ψ(X4 , 0, α2 , β; ρ34.2 )] , where ρ

ψ(Xj , a, b, c) = ρij.2 =

a − b′ Xj − σ2j (z − c′ X2 ) q , j = 3, 4, 1 − ρ22j

ρij − ρ2i ρ2j q , i, j = 3, 4, j 6= i. q 2 2 1 − ρ2i 1 − ρ2j

For individual with both z and y observed the likelihood is L = f (z) Pr(Y = y , Ry = 1 , Rz = 1 |z ) = f (z)[Pr(Y = y |z ) − Pr(Y = y , Ry = 0 |z )

− Pr(Y = y , Rz = 0 |z ) + Pr(Y = y , Ry = 0 , Rz = 0 |z )] = f (z)[Pr(ηy−1 ≤ Y ∗ ≤ ηy |z ) − Pr(ηy−1 ≤ Y ∗ ≤ ηy , Ry∗ < 0 |z ) − Pr(ηy−1 ≤ Y ∗ ≤ ηy , Rz∗ < 0 |z ) + Pr(ηy−1 ≤ Y ∗ ≤ ηy , Ry∗ < 0, Rz∗ < 0 |z )] = f (z) [Φ(ψ(X1 , ηy , β1 , β2 )) − Φ(ψ(X1 , ηy−1 , β1 , β2 )) −Φ12 (ψ(X1 , ηy , β1 , β2 ), ψ(X3 , 0, α1 , β); ρ13.2 ) +Φ12 (ψ(X1 , ηy−1 , β1 , β2 ), ψ(X3 , 0, α1 , β); ρ13.2 ) −Φ12 (ψ(X1 , ηy , β1 , β2 ), ψ(X4 , 0, α2 , β2 ); ρ14.2 ) +Φ12 (ψ(X1 , ηy−1 , β1 , β2 ), ψ(X4 , 0, α2 , β2 ); ρ14.2 ) +Φ123 (ψ(X1 , ηy , β1 , β2 ), ψ(X3 , 0, α1 , β2 ), ψ(X4 , 0, α2 , β2 ); Σ134|2 ) +Φ123 (ψ(X1 , ηy−1 , β1 , β2 ), ψ(X3 , 0, α1 , β2 ), ψ(X4 , 0, α2 , β2 ); Σ134|2 )

A latent variable model for ordinal responses

55

where 

1

Σ134|2 =  ρ13.2 ρ14.2

with η0 = −∞, ηc = +∞.

 ρ13.2 ρ14.2 1 ρ34.2  ρ34.2 1

Appendix B: Residual To obtain the correlated bivariate modified residuals when both responses are observed, let 1 2 ˆ− riP = Σ ˆi ) (B.1) ρ (Ki − µ where Ki = (Zi , Yi )′ , ˆ i |Ry = 1, Rz = 1), E(Y ˆ i |Ry = 1, Rz = 1))′ µ ˆi = (E(Z i i i i and ˆρ = Σ



ˆ Vˆ ar(Z|Rz = 1, Ry = 1) Cov(Z, Y |Rz = 1, Ry = 1) ˆ ˆ Cov(Z, Y |Rz = 1, Ry = 1) V ar(Y |Rz = 1, Ry = 1)



,

where Cov(Z, Y |Rz = 1, Ry = 1) = E(ZY |Rz = 1, Ry = 1) − E(Z|Rz = 1, Ry = 1)E(Y |Rz = 1, Ry = 1) = E(ZE(Y |Z, Rz = 1, Ry = 1)) −E(Z|Rz = 1, Ry = 1)E(Y |Rz = 1, Ry = 1) Z = z E(Y |z, Rz = 1, Ry = 1) fZ (z|Rz = 1, Ry = 1) dz

−E(Z|Rz = 1, Ry = 1)E(Y |Rz = 1, Ry = 1) # Z " X z( y P (Y = y|z, Rz = 1, Ry = 1)) fZ (z|Rz = 1, Ry = 1) dz = y



Z

z fZ (z|Rz = 1, Ry = 1) dz

 "X y

#

y P (Y = y|Rz = 1, Ry = 1))

(B.2)

56

E. Bahrami Samani, M. Ganjali and S. Eftekhari

where P (Y = j|z, Rz = 1, Ry = 1) = {[Φ(ψ(X1 , ηj , β1 , β2 )) − Φ(ψ(X1 , ηj−1 , β1 , β2 )] −[Φ12 (ψ(X1 , ηj , β1 , β2 ), ψ(X3 , 0, α1 , β2 ); ρ13.2 ) −Φ12 (ψ(X1 , ηj−1 , β1 , β2 ), ψ(X3 , 0, α1 , β2 ); ρ13.2 ) +[Φ12 (ψ(X1 , ηj , β1 , β2 ), ψ(X4 , 0, α2 , β2 ); ρ14.2 ) −Φ12 (ψ(X1 , ηj−1 , β1 , β2 ), ψ(X4 , 0, α2 , β2 ); ρ14.2 ) −[Φ123 (ψ(X1 , ηj , β1 , β2 ), ψ(X3 , 0, α1 , β2 ), ψ(X4 , 0, α2 , β2 ); Σ134|2 ) −Φ123 (ψ(X1 , ηj−1 , β1 , β2 ), ψ(X3 , 0, α1 , β2 ), ψ(X4 , 0, α2 , β2 ); Σ134|2 )]}/N. P (Y = j|Rz = 1, Ry = 1) = {[Φ(ηj − β1′ X1 ) − Φ(ηj−1 − β1′ X1 )] − [Φ12 (ηj − β1′ X1 , −α1′ X3 ; ρ13 ) −Φ12 (ηj−1 − β1′ X1 , −α2′ X4 ; ρ14 )] +[Φ123 (η1 − β1′ X1 , −α1′ X3 , −α2′ X4 ; Σ134 ) −Φ123 (ηj−1 − β1′ X1 , −α1′ X3 , −α2′ X4 ; Σ134 )]}/N ∗ where N

= 1 − [Φ(ψ(X3 , 0, α1 , β) − Φ(ψ(X4 , 0, α2 , β) −Φ12 (ψ(X3 , 0, α1 , β2 ), (ψ(X4 , 0, α2 , β); ρ34.2 ) +Φ123 ((ψ(X1 , ηj , β1 , β2 ), (ψ(X3 , 0, α1 , β2 ), (ψ(X4 , 0, α2 , β2 ); Σ134|2 )

N



= 1 − [Φ(−α1′ X3 ) + Φ(−α2′ X4 ) − Φ12 (−α1′ X3 , −α2′ X4 ) +Φ123 (η1 − β1′ X1 − α1′ X3 , −α2′ X4 ]

The Pearson residual for ith observation is based on the Pearson goodnessof-fit statistic n X 2 χ2p (Ki , µ ˆi ) χ = i=1

with the following ith component:

ˆ −1 (Ki − µ χ2p (Ki , µ ˆi ) = (Ki − µ ˆi )′ Σ ˆi ). ρi ˆ ρ in (B.1) A Cholesky decomposition is used for finding the square root of Σ and the function integrate in R is used to numerically calculate the integral given in (B.2).

A latent variable model for ordinal responses E. Bahrami Samani Assistant Professor of Statistics Departments of Statistics Shahid Beheshti University GC, Tehran E-mail: ehsan bahrami [email protected]

M. Ganjali Associate Professor of Statistics Departments of Statistics Faculty of Mathematical Sciences Shahid Beheshti University GC, Tehran E-mail: [email protected]

S. Eftekhari Ph.D. Student Departments of Statistics Shahid Beheshti University GC, Tehran E-mail: s [email protected]

Paper received June 2008; revised October 2009.

57