PSYCHOMETRIKA—VOL. 72, NO. 3, 413–435 SEPTEMBER 2007 DOI: 10.1007/s11336-005-1382-y

A LATENT TRANSITION MODEL WITH LOGISTIC REGRESSION

HWAN CHUNG MICHIGAN STATE UNIVERSITY

THEODORE A. WALLS UNIVERSITY OF RHODE ISLAND

YOUSUNG PARK KOREA UNIVERSITY

Latent transition models increasingly include covariates that predict prevalence of latent classes at a given time or transition rates among classes over time. In many situations, the covariate of interest may be latent. This paper describes an approach for handling both manifest and latent covariates in a latent transition model. A Bayesian approach via Markov chain Monte Carlo (MCMC) is employed in order to achieve more robust estimates. A case example illustrating the model is provided using data on academic beliefs and achievement in a low-income sample of adolescents in the United States.

Key words: latent transition, logistic regression, MCMC, academic achievement.

This research was partially supported by the National Institute on Drug Abuse Grant 1-R03-DA021639. This research was partially supported by the National Institute on Drug Abuse Grant 1-P50-DA10075, The Methodology Center, The Pennsylvania State University. This research was partially supported by the National Institute of Mental Health funds as part of the Studying Diverse Lives research support program at the Henry A. Murray Research Archive, Institute for Quantitative Science, Harvard University.

Requests for reprints should be sent to Hwan Chung, Assistant Professor, Department of Epidemiology, Michigan State University, B 601 West Fee Hall, East Lansing, MI 48824, USA. E-mail: [email protected]; or to Theodore Walls, Assistant Professor, Department of Psychology, University of Rhode Island, 10 Chafee Road, Suite 15W, Kingston, RI 02881, USA. E-mail: [email protected].

© 2007 The Psychometric Society

1. Introduction

Latent transition (LT) models are designed to estimate the transition rate of individuals among hypothesized latent classes over each successive longitudinal measurement occasion. LT models derive from the family of latent class (LC) models in which manifest items are treated as fallible indicators of unseen states that are subject to measurement error (Goodman, 1974; Clogg & Goodman, 1984). Specifically, LT models are specified to estimate simultaneously the probability that an individual belongs to any of several hypothesized latent classes and the probability of individuals transitioning among latent classes from occasion to occasion. The probability of individuals’ transitions among latent classes is typically estimated by means of a first-order Markov chain.

A natural way to extend the LT model is to include manifest covariates designed to explain variation in latent classes and transition rates of individuals among classes over time. These covariates may account for differences in the prevalence of a particular latent class and/or class transition rates over time. Such an approach is desirable when researchers are interested in assessing possible influences on the probability of LC membership and individuals’ transitions among classes.

This approach may be extended to the case of unobserved or latent covariates. For example, in many prevention studies, early pubertal timing has been identified as a risk factor for adolescent substance use onset in females (Chung, Park, & Lanza, 2005; Lanza & Collins,
2002). In this case, several items reflecting an adolescent’s perception of her body are important in determining pubertal status at a given age. None of these items are perfect indicators of the covariate pubertal status. Rather, the items must be aggregated in some way to provide an omnibus (or latent) construct of pubertal status. This construct could be crafted in the same way as a typical LC model. Then, the effects of pubertal timing on substance use behavior can be examined by comparing the prevalence of a specific substance use class and the incidence of transitions over time for early and on-time maturing females. Accordingly, the purpose of this paper is to provide a model summary and limited demonstration of the LT model with both observed and latent covariates. The paper pursues two major aims: demonstration and estimation. In terms of demonstration, a worked example with real data is provided. The practical relevance of this model is that in many studies there are many related covariates that could account for important differences in class prevalence or transition rates. Note from the outset, however, the focus here is not at all on substantive findings although we do suggest some speculative interpretations. In terms of estimation, this consideration of the LT model with covariates also covers some estimation topics and possibilities in detail. We present a Bayesian alternative to maximum likelihood (ML) estimation of the LT model and explore problems in drawing Bayesian inferences for the LT model where: (a) latent classes are allowed to vary across time and for each value of the manifest covariate; and (b) the prevalence of the latent classes and their subsequent transitions is allowed to be dependent on the latent covariate. This paper is organized in four parts. First, the logistic LT model with covariates is described. 
Second, a detailed explanation of estimation strategies, with attention to the Bayesian approach and its requisite steps, is given. Third, a synopsis of the achievement data example demonstrates the model implementation and interpretation. Fourth, the paper concludes with general considerations in light of the development of this model. Throughout, note that the phrase “latent classes” is used to refer to the LC structure of main interest, that for which transition rates are also estimated. By contrast, the phrase “latent covariate” or “group” is used to refer to the classes for the latent covariate.

2. Model

2.1. A Latent Class Model

We begin with the latent class (LC) model for a latent covariate (group). The basic idea of the LC model is that associations among item variables arise from the assumption that the population is composed of different groups. The LC model associates item variables through their group membership based on the assumption of local independence (Lazarsfeld & Henry, 1968). To specify an LC model for a group variable, let $W = (W_1, \ldots, W_K)$ be $K$ discrete items measuring the group variable $G$, where variable $W_k$ takes possible values from 1 to $u_k$. We refer to $W$ as group items. The group variable $G$ represents the group membership ranging from 1 to $C$. Denote $x_i = (x_{i1}, \ldots, x_{ip})'$ to be the $p \times 1$ vector of manifest covariates affecting the prevalence of group membership for the $i$th individual. Then, the probability of a particular item-response pattern $w_i = (w_{i1}, \ldots, w_{iK})$ can be written as

$$P[W = w_i \mid x_i] = \sum_{g=1}^{C} \gamma_g(x_i) \prod_{k=1}^{K} \prod_{l=1}^{u_k} \eta_{kl|g}^{I(w_{ik} = l)}, \tag{1}$$

where:
• $\gamma_g(x_i)$ represents the probability of the $i$th individual being in group $g$.
• $\eta_{kl|g}$ represents the probability of response $l$ to the $k$th group item, $W_k$, given a group membership in $g$. We refer to $\eta_{kl|g}$ as the group item-response probability.
• $I(A)$ is the usual indicator function such that $I(A) = 1$ if $A$ is satisfied and $I(A) = 0$ otherwise.

In (1) we have assumed local independence, that is, the items $W_1, \ldots, W_K$ are conditionally independent within each group $g$. The probability that the $i$th individual has group membership $g$ is specified by the logistic link function given as

$$\gamma_g(x_i) = P[G = g \mid x_i] = \frac{\exp\{x_i' \beta_g\}}{\sum_{j=1}^{C} \exp\{x_i' \beta_j\}}, \tag{2}$$

where $\beta_g = (\beta_{1g}, \ldots, \beta_{pg})'$ for $g = 1, \ldots, C - 1$ is a $p \times 1$ vector of logistic regression coefficients influencing the log-odds that an individual falls into group $g$ relative to the baseline group $C$ (i.e., $\beta_C = 0$).
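As a rough illustration of (1) and (2), the following Python sketch evaluates the group probabilities and the probability of one group-item response pattern. The function names, the 0-based storage of the item-response probabilities, and all numerical values are illustrative assumptions and are not taken from the paper or its software.

```python
import math
from itertools import product


def group_probs(x, betas):
    """Eq. (2): P[G = g | x] for g = 1..C.

    betas holds the C-1 free coefficient vectors; the baseline group C
    has an implicit zero vector (beta_C = 0).
    """
    scores = [sum(xj * bj for xj, bj in zip(x, b)) for b in betas] + [0.0]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [v / total for v in weights]


def pattern_prob(w, x, betas, eta):
    """Eq. (1): P[W = w | x], with eta[g][k][l-1] = P[W_k = l | G = g+1]."""
    prob = 0.0
    for gamma_g, eta_g in zip(group_probs(x, betas), eta):
        term = gamma_g
        for k, response in enumerate(w):
            term *= eta_g[k][response - 1]  # local independence within a group
        prob += term
    return prob


# Hypothetical values: C = 2 groups, K = 2 binary items,
# an intercept plus one covariate.
x = [1.0, 0.5]
betas = [[0.2, -0.4]]
eta = [
    [[0.9, 0.1], [0.8, 0.2]],  # group 1
    [[0.3, 0.7], [0.4, 0.6]],  # group 2 (baseline)
]
all_patterns = list(product([1, 2], repeat=2))
total_prob = sum(pattern_prob(w, x, betas, eta) for w in all_patterns)
print(pattern_prob((1, 1), x, betas, eta), total_prob)
```

Summing `pattern_prob` over every possible pattern should return 1, which is a convenient sanity check on any parameter values plugged into (1).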

2.2. A Latent Transition Model

We use the following notation to estimate the prevalence of latent classes and their transitions over time, conditional on membership in group $g$. Let $Y_t = (Y_{1t}, \ldots, Y_{Mt})$ be a vector of $M$ survey items to measure latent class at time $t$ for $t = 1, \ldots, T$, where each variable $Y_{mt}$ can take values from 1 to $r_m$. Correspondingly, let $S = (S_1, \ldots, S_T)$ be the LC membership from $t = 1$ to $T$, each component taking values from 1 to $L$. We refer to $Y_{mt}$ and $S_t$ as the class item and class variable, respectively. Let $p_t$ be the number of covariates measured at time $t$ for $t = 1, \ldots, T$ and let $x_{i1} = (x_{i11}, \ldots, x_{ip_1 1})'$ denote a $p_1 \times 1$ vector of covariates for individual $i$ that may influence class prevalence at the first time point. Further, let $x_{it} = (x_{i1t}, \ldots, x_{ip_t t})'$, measured at time $t$, denote a $p_t \times 1$ vector of covariates that may influence subsequent transitions of classes for $t = 2, \ldots, T$. The probability that the $i$th subject provides responses $y_i = (y_{i1}, \ldots, y_{iT})$ conditioned on $G = g$ and $(x_{i1}, \ldots, x_{iT})$ is

$$P[Y_1 = y_{i1}, \ldots, Y_T = y_{iT} \mid G = g, x_{i1}, \ldots, x_{iT}] = \sum_{s_1=1}^{L} \cdots \sum_{s_T=1}^{L} \delta_{s_1|g}(x_{i1}) \prod_{t=2}^{T} \tau^{(t)}_{s_t|s_{t-1}g}(x_{it}) \prod_{t=1}^{T} \prod_{m=1}^{M} \prod_{j=1}^{r_m} \rho_{mjt|s_t g}^{I(y_{imt} = j)}, \tag{3}$$

where:
• $\delta_{s_1|g}(x_{i1})$ represents the probability of the $i$th individual belonging to class $s_1$ at time 1 given a group membership in $g$.
• $\tau^{(t)}_{s_t|s_{t-1}g}(x_{it})$ represents the transitional probability of class membership in $s_t$ at time $t$ given the previous class membership in $s_{t-1}$ and the group membership in $g$.
• $\rho_{mjt|s_t g}$ represents the probability of response $j$ to the $m$th item at time $t$, $Y_{mt}$, given a class membership in $s_t$ at time $t$ and a group membership in $g$. We refer to $\rho_{mjt|s_t g}$ as the class item-response probability.

Similar to (1), we assume that $Y_{1t}, \ldots, Y_{Mt}$ are conditionally independent given $S_t$ for $t = 1, \ldots, T$. The sequence $S_t$ constitutes a first-order Markov chain for $t = 2, \ldots, T$. The marginal probability of the class membership at the initial time $t = 1$ is

$$\delta_{s_1|g}(x_{i1}) = P[S_1 = s_1 \mid G = g, x_{i1}] = \frac{\exp\{x_{i1}' \beta^{(1)}_{s_1|g}\}}{\sum_{j=1}^{L} \exp\{x_{i1}' \beta^{(1)}_{j|g}\}} \tag{4}$$

for $s_1 = 1, \ldots, L$, where $\beta^{(1)}_{L|g} = 0$. The transitional probability that the $i$th individual changes his or her class to $S_t = s_t$ from the previous class $S_{t-1} = s_{t-1}$ is, for $s_t = 1, \ldots, L$ and $t = 2, \ldots, T$,

$$\tau^{(t)}_{s_t|s_{t-1}g}(x_{it}) = P[S_t = s_t \mid S_{t-1} = s_{t-1}, G = g, x_{it}] = \frac{\exp\{x_{it}' \beta^{(t)}_{s_t|s_{t-1}g}\}}{\sum_{j=1}^{L} \exp\{x_{it}' \beta^{(t)}_{j|s_{t-1}g}\}}, \tag{5}$$

where $\beta^{(t)}_{s_t|s_{t-1}g} = 0$ when $s_t = s_{t-1}$. Note that class $L$ at $t = 1$ serves as a baseline in (4), whereas the same class at the previous occasion is the baseline class in (5). Therefore, the coefficient vector $\beta^{(t)}_{s_t|s_{t-1}g}$ in (5) can be interpreted as the log-odds of transitioning to class $s_t$ at time $t$ from the previous class $s_{t-1}$ versus remaining in the same class as the previous $s_{t-1}$, given a group membership in $g$.

2.3. A Latent Transition Logistic Regression Model

Using the specifications from (1) to (5), the contribution of the $i$th individual to the likelihood function of $W$ and $Y_1, \ldots, Y_T$ is given by

$$P[W = w_i, Y_1 = y_{i1}, \ldots, Y_T = y_{iT} \mid x_i, x_{i1}, \ldots, x_{iT}] = \sum_{g=1}^{C} \gamma_g(x_i) \prod_{k=1}^{K} \prod_{l=1}^{u_k} \eta_{kl|g}^{I(w_{ik} = l)} \times \sum_{s_1=1}^{L} \cdots \sum_{s_T=1}^{L} \delta_{s_1|g}(x_{i1}) \prod_{t=2}^{T} \tau^{(t)}_{s_t|s_{t-1}g}(x_{it}) \prod_{t=1}^{T} \prod_{m=1}^{M} \prod_{j=1}^{r_m} \rho_{mjt|s_t g}^{I(y_{imt} = j)}. \tag{6}$$
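To make the structure of (6) concrete, here is a minimal Python sketch for the two-wave case ($T = 2$). It assumes the covariate-dependent probabilities from (2), (4), and (5) have already been evaluated for the individual and are passed in as nested lists; the function name, the array layout, and the numerical values are hypothetical.

```python
from itertools import product


def joint_prob(w, y1, y2, gamma, eta, delta, tau, rho):
    """Eq. (6) for T = 2 and one individual.

    gamma[g], delta[g][s1], and tau[g][s1][s2] are the values of (2), (4),
    and (5) already evaluated at this individual's covariates.
    eta[g][k][l-1] and rho[t][g][s][m][j-1] are item-response probabilities.
    """
    n_classes = len(delta[0])
    total = 0.0
    for g, gamma_g in enumerate(gamma):
        # Group part: gamma_g times the group item-response probabilities.
        group_part = gamma_g
        for k, response in enumerate(w):
            group_part *= eta[g][k][response - 1]
        # Class part: sum over all (s1, s2) paths of the Markov chain.
        class_part = 0.0
        for s1, s2 in product(range(n_classes), repeat=2):
            term = delta[g][s1] * tau[g][s1][s2]
            for t, (y, s) in enumerate(((y1, s1), (y2, s2))):
                for m, response in enumerate(y):
                    term *= rho[t][g][s][m][response - 1]
            class_part += term
        total += group_part * class_part
    return total


# Hypothetical values: C = 2, L = 2, one binary group item, one binary class item.
gamma = [0.6, 0.4]
eta = [[[0.9, 0.1]], [[0.2, 0.8]]]
delta = [[0.7, 0.3], [0.4, 0.6]]
tau = [[[0.8, 0.2], [0.1, 0.9]], [[0.5, 0.5], [0.3, 0.7]]]
rho_one_wave = [[[[0.85, 0.15]], [[0.25, 0.75]]], [[[0.6, 0.4]], [[0.1, 0.9]]]]
rho = [rho_one_wave, rho_one_wave]  # same measurement model at both waves

total_prob = sum(
    joint_prob((w,), (a,), (b,), gamma, eta, delta, tau, rho)
    for w, a, b in product([1, 2], repeat=3)
)
print(total_prob)
```

The explicit enumeration of class paths costs $L^T$ terms per group, which is fine for two waves; for longer sequences a forward recursion over $t$ would be the usual substitute.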

Our model for two time periods is depicted in Figure 1.

FIGURE 1. A diagram of latent transition regression model.

The LT approach has been applied widely. For example, Collins and Wugalter (1992) and Van de Pol and Langeheine (1994) applied LT models without covariates. Van de Pol and Langeheine (1990) incorporated categorical covariates in an LT model using multiple groups. Also, LT regression models formulated by Pfeffermann, Skinner, and Humphreys (1998) and by Vermunt, Langeheine, and Böckenholt (1999) took into account continuous or time-varying manifest covariates in the analysis of latent classes and their transitions. They outlined regression models in which manifest covariates may predict individuals’ class membership and their subsequent transitions. Muthén and Muthén (2000) applied general growth mixture modeling to examine the latent trajectory as a function of LC membership. In our LT regression model, we extend their regression-based LT models by allowing latent covariates to predict the marginal prevalence of the latent class at the first wave and its subsequent transitions.

3. Bayesian Approaches Based on MCMC

3.1. Motivation

In most applications of the LT model, parameters have been estimated by means of maximum likelihood (ML) methods using conventional iterative algorithms such as Newton-Raphson and the expectation-maximization (EM) algorithm (Rubin, 1976). When the loglikelihood function is nicely shaped (i.e., not far from quadratic), ML methods have good finite sample properties and provide reliable model selection tools. Upon convergence, standard errors for the standard model parameters are obtained by inverting the Hessian matrix of the loglikelihood function. However, the ML method does not routinely provide standard errors for combinations of parameters. In addition, normality is often questionable with a finite sample size, in which case standard errors may not provide a useful portrayal of uncertainty. For example, in the situation where some ML estimates lie on the boundary, we can fix those parameters in order to obtain standard errors for other parameters. However, the Hessian matrix at the ML solution may still be singular under this restricted model.

Given the difficulties associated with inference for LT models, some have adopted a Bayesian approach, simulating random draws of parameters from a posterior distribution using Markov chain Monte Carlo (MCMC) (Robert, 1996; Hoijtink, 1998; Garrett & Zeger, 2000; Garrett, Eaton, & Zeger, 2002; Chung, Flaherty, & Schafer, 2006). The motivation for our approach is that MCMC may offer greater flexibility in model fit assessment and various hypothesis tests without appealing to large-sample approximations (Richardson & Green, 1997; Gelman, Meng, & Stern, 1996). For example, one can easily construct model diagnostics using the posterior predictive check distribution (Rubin & Stern, 1994) and provide standard errors of any desired combination of parameters.

3.2. Basic Strategy

For simplicity, we hereafter consider our model in (6) for only two time periods, although the extension to more than two is straightforward. Let $\eta$ and $\rho$ denote the vectorized item-response probabilities for all $\eta$- and $\rho$-parameters, and let $\beta$, $\beta^{(1)}$, and $\beta^{(2)}$ represent the respective vectors for all $\beta$-, $\beta^{(1)}$-, and $\beta^{(2)}$-coefficients defined in logistic functions (2), (4), and (5), respectively. In Bayesian analysis for LT regression models, we are interested in describing the posterior distribution, $P[\eta, \rho, \beta, \beta^{(1)}, \beta^{(2)} \mid w, y]$, where $w = (w_1, \ldots, w_n)$ and $y = (y_{11}, \ldots, y_{n1}, y_{12}, \ldots, y_{n2})$ represent the sample, and conditioning on covariates is implicit. Except in trivial examples, this distribution is difficult to portray; however, if the latent membership for each individual were known, the augmented posterior $P[\eta, \rho, \beta, \beta^{(1)}, \beta^{(2)} \mid w, y, z]$ would be easy to simulate. Here, $z = (z_1, \ldots, z_n)$ denotes the latent membership, where $z_i$ is a three-dimensional array indicating the membership to which the $i$th individual belongs, so that $z_{i(g,s_1,s_2)} \in \{0, 1\}$ and $\sum_{g=1}^{C} \sum_{s_1=1}^{L} \sum_{s_2=1}^{L} z_{i(g,s_1,s_2)} = 1$. That is, if individual $i$ belongs to group $g$, class $s_1$ at time 1 and $s_2$ at time 2, then $z_{i(g,s_1,s_2)}$ equals 1, and 0 otherwise.

In the first step of the MCMC procedure, the Imputation or I-step, we generate a random draw for each $z_{i(g,s_1,s_2)}$ given the observed data $w$ and $y$ and the current parameters. In the second step, the Posterior or P-step, we draw new random values for the parameters from the augmented posterior distribution, which regards the latent membership indicator $z_{i(g,s_1,s_2)}$ as known. Repeating this two-step procedure creates a sequence of iterates converging to the stationary posterior distribution. This stream of parameter values (after a suitable burn-in period) is summarized in various ways to produce approximate Bayesian estimates, intervals, tests, etc. Details of this two-step procedure are given in the next subsection.

Bayesian methods for LC models have been described by Hoijtink (1998), Garrett and Zeger (2000), Lanza, Collins, Schafer, and Flaherty (2005), and Chung et al. (2006). When covariates are not included in the model, a simple MCMC algorithm may be implemented as an iterative two-step procedure which can be regarded as a form of data augmentation (Tanner & Wong, 1987) or Gibbs sampling (Gelfand & Smith, 1990). When covariates are included in the model, however, because there is no simple conjugate prior family for the coefficients of a multinomial logistic model, we embed steps of a Metropolis algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953) for the $\beta$’s into the Gibbs sampler (Robert & Casella, 2004).

3.3. I-Step

In the I-step, given current simulated parameter values, we calculate the posterior probabilities of latent membership using

$$\theta_{i(g,s_1,s_2)} = P[G = g, S_1 = s_1, S_2 = s_2 \mid w_i, y_i, x_i, x_{i1}, x_{i2}] = \frac{P[G = g, S_1 = s_1, S_2 = s_2, W = w_i, Y = y_i \mid x_i, x_{i1}, x_{i2}]}{P[W = w_i, Y = y_i \mid x_i, x_{i1}, x_{i2}]} \tag{7}$$

for $g = 1, \ldots, C$ and $s_1, s_2 = 1, \ldots, L$. Then we draw $z_{i(g,s_1,s_2)}$ from Multinomial$(1, \theta_{i(g,s_1,s_2)})$ independently for all individuals, and calculate marginal counts $z_{ig} = \sum_{s_1=1}^{L} \sum_{s_2=1}^{L} z_{i(g,s_1,s_2)}$, $z_{i(g,s_1)} = \sum_{s_2=1}^{L} z_{i(g,s_1,s_2)}$, and $z_{i(g,s_2)} = \sum_{s_1=1}^{L} z_{i(g,s_1,s_2)}$.

Once latent membership has been imputed, the augmented posterior factors into independent likelihood functions for parameters. It is convenient to choose priors that cause all parameter vectors to be a posteriori independent given the observed data $w$, $y$, and $z$. One way to achieve this is to make the priors independent. In situations where the priors are independent, the augmented posterior given latent membership may be expressed as

$$
\begin{aligned}
P[\eta, \rho, \beta, \beta^{(1)}, \beta^{(2)} \mid w, y, z] \propto{} & \left[ P(\eta) \prod_{g=1}^{C} \prod_{k=1}^{K} \prod_{l=1}^{u_k} \eta_{kl|g}^{n_{kl|g}} \right] \left[ P(\rho) \prod_{g=1}^{C} \prod_{t=1}^{2} \prod_{s_t=1}^{L} \prod_{m=1}^{M} \prod_{j=1}^{r_m} \rho_{mjt|s_t g}^{n_{mjt|s_t g}} \right] \\
& \times \left[ P(\beta) \prod_{i=1}^{n} \prod_{g=1}^{C} \gamma_g(x_i)^{z_{ig}} \right] \left[ P(\beta^{(1)}) \prod_{i=1}^{n} \prod_{g=1}^{C} \prod_{s_1=1}^{L} \delta_{s_1|g}(x_{i1})^{z_{i(g,s_1)}} \right] \\
& \times \left[ P(\beta^{(2)}) \prod_{i=1}^{n} \prod_{g=1}^{C} \prod_{s_1=1}^{L} \prod_{s_2=1}^{L} \tau^{(2)}_{s_2|s_1 g}(x_{i2})^{z_{i(g,s_1,s_2)}} \right],
\end{aligned} \tag{8}
$$

where $n_{kl|g} = \sum_{i=1}^{n} z_{ig} I(w_{ik} = l)$ and $n_{mjt|s_t g} = \sum_{i=1}^{n} z_{i(g,s_t)} I(y_{imt} = j)$. The choice of priors and the P-step will be discussed below.

3.4. P-Step

In the P-step, we draw new random values for all parameters independently from (8). Applying the Jeffreys prior to $\eta_{k|g} = (\eta_{k1|g}, \ldots, \eta_{ku_k|g})$ and $\rho_{mt|s_t g} = (\rho_{m1t|s_t g}, \ldots, \rho_{mr_m t|s_t g})$, new random values for $\eta_{k|g}$ are drawn from the Dirichlet$(n_{k1|g} + 1/2, \ldots, n_{ku_k|g} + 1/2)$, and $\rho_{mt|s_t g}$ from the Dirichlet$(n_{m1t|s_t g} + 1/2, \ldots, n_{mr_m t|s_t g} + 1/2)$, independently for $k = 1, \ldots, K$; $m = 1, \ldots, M$; $g = 1, \ldots, C$; $s_t = 1, \ldots, L$; and $t = 1, 2$. A draw from the Dirichlet distribution can be obtained by normalizing independent gamma variates (Kennedy & Gentle, 1980).
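The I-step draw, the count $n_{kl|g}$, and the Dirichlet update for one group item can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the membership probabilities are hypothetical and already marginalized to the group level, only a single binary item is used, and the helper names are invented for the example.

```python
import random


def draw_cell(theta, rng):
    """Draw one index from Multinomial(1, theta) by inverting the CDF."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(theta):
        cumulative += p
        if u < cumulative:
            return index
    return len(theta) - 1  # guard against rounding in the cumulative sum


def response_counts(groups, responses, n_groups, n_levels):
    """n_{kl|g} for one item k: responses l counted within each imputed group g."""
    counts = [[0] * n_levels for _ in range(n_groups)]
    for g, response in zip(groups, responses):
        counts[g][response - 1] += 1
    return counts


def draw_dirichlet(counts, rng, prior=0.5):
    """Dirichlet(counts + 1/2) draw by normalizing independent gamma variates."""
    gammas = [rng.gammavariate(n + prior, 1.0) for n in counts]
    total = sum(gammas)
    return [v / total for v in gammas]


rng = random.Random(2007)
# Hypothetical posterior group probabilities for five individuals (C = 2)
# and their responses to one binary group item.
theta = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.5, 0.5], [0.1, 0.9]]
responses = [1, 2, 1, 1, 2]

groups = [draw_cell(t, rng) for t in theta]          # I-step
counts = response_counts(groups, responses, 2, 2)    # sufficient statistics
eta_draw = [draw_dirichlet(c, rng) for c in counts]  # P-step for eta
print(groups, counts, eta_draw)
```

The default `prior=0.5` corresponds to the $+1/2$ terms in the Dirichlet parameters above; in the full model the same draw would be repeated for every item, group, class, and wave.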

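We impose noninformative uniform priors for $\beta$-parameters. Then we apply a Metropolis algorithm in which a candidate for the next parameter set is sampled from a multivariate $t$ distribution,

$$\beta^{c} \sim t_{\nu}\big(\tilde{\beta},\, c\,\Sigma_{\hat{\beta}}\big), \qquad \beta^{(1)c}_{g} \sim t_{\nu_1}\big(\tilde{\beta}^{(1)}_{g},\, c_1 \Sigma_{\hat{\beta}^{(1)}_{g}}\big), \qquad \beta^{(2)c}_{s_1 g} \sim t_{\nu_2}\big(\tilde{\beta}^{(2)}_{s_1 g},\, c_2 \Sigma_{\hat{\beta}^{(2)}_{s_1 g}}\big),$$

where a tilde denotes the current value of a parameter, $c$, $c_1$, and $c_2$ are scalars, and $\nu$, $\nu_1$, and $\nu_2$ are degrees of freedom. The variances $\Sigma_{\hat{\beta}}$, $\Sigma_{\hat{\beta}^{(1)}_{g}}$, and $\Sigma_{\hat{\beta}^{(2)}_{s_1 g}}$ we adopted are the negative inverses of the second derivatives of the logarithm of (8) evaluated at the ML estimates $\hat{\beta}$, $\hat{\beta}^{(1)}_{g}$, and $\hat{\beta}^{(2)}_{s_1 g}$. Note that this inverse typically exists even if ML estimates for item-response probabilities lie on the boundary of the parameter space, because regression coefficients and item-response probabilities are independent in (8). Using four degrees of freedom (i.e., $\nu = \nu_1 = \nu_2 = 4$), we modify the scalar values if the acceptance rate of the simulation is much too high or too low (Gelman, Carlin, Stern, & Rubin, 2004). The candidate points for regression coefficients $\beta^{c}$, $\beta^{(1)c}_{g}$, and $\beta^{(2)c}_{s_1 g}$ are then accepted with probabilities $\alpha(\tilde{\beta}, \beta^{c})$, $\alpha(\tilde{\beta}^{(1)}_{g}, \beta^{(1)c}_{g})$, and $\alpha(\tilde{\beta}^{(2)}_{s_1 g}, \beta^{(2)c}_{s_1 g})$, respectively. These acceptance probabilities are defined as

$$\alpha(\tilde{\beta}, \beta^{c}) = \min\left(1,\ \prod_{i=1}^{n} \prod_{g=1}^{C} \left[ \frac{\gamma_g(x_i)\big|_{\beta^{c}}}{\gamma_g(x_i)\big|_{\tilde{\beta}}} \right]^{z_{ig}} \right),$$

$$\alpha\big(\tilde{\beta}^{(1)}_{g}, \beta^{(1)c}_{g}\big) = \min\left(1,\ \prod_{i=1}^{n} \prod_{s_1=1}^{L} \left[ \frac{\delta_{s_1|g}(x_{i1})\big|_{\beta^{(1)c}_{g}}}{\delta_{s_1|g}(x_{i1})\big|_{\tilde{\beta}^{(1)}_{g}}} \right]^{z_{i(g,s_1)}} \right),$$

$$\alpha\big(\tilde{\beta}^{(2)}_{s_1 g}, \beta^{(2)c}_{s_1 g}\big) = \min\left(1,\ \prod_{i=1}^{n} \prod_{s_2=1}^{L} \left[ \frac{\tau^{(2)}_{s_2|s_1 g}(x_{i2})\big|_{\beta^{(2)c}_{s_1 g}}}{\tau^{(2)}_{s_2|s_1 g}(x_{i2})\big|_{\tilde{\beta}^{(2)}_{s_1 g}}} \right]^{z_{i(g,s_1,s_2)}} \right).$$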
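A single Metropolis update for the group coefficients $\beta$ might look like the Python sketch below. It assumes the proposal is centered at the current draw, so that it is symmetric and the acceptance probability reduces to the ratio given above. The covariates, imputed groups, and the lower-triangular factor `chol` (standing in for a Cholesky factor of $c$ times the variance matrix) are hypothetical values, and the function names are invented for this example.

```python
import math
import random


def log_target(beta, X, groups, n_groups):
    """Log of prod_i gamma_{g_i}(x_i) under a flat prior.

    beta is a flat list of (C-1)*p coefficients; the last group is the baseline.
    """
    p = len(X[0])
    betas = [beta[g * p:(g + 1) * p] for g in range(n_groups - 1)]
    total = 0.0
    for x, g in zip(X, groups):
        scores = [sum(a * b for a, b in zip(x, bg)) for bg in betas] + [0.0]
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += scores[g] - log_norm
    return total


def t_candidate(center, chol, nu, rng):
    """Multivariate t draw: center + sqrt(nu / chi2_nu) * (chol @ z)."""
    z = [rng.gauss(0.0, 1.0) for _ in center]
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    scale = math.sqrt(nu / chi2)
    return [
        c + scale * sum(chol[i][j] * z[j] for j in range(i + 1))
        for i, c in enumerate(center)
    ]


def metropolis_step(beta, X, groups, n_groups, chol, nu, rng):
    """One Metropolis update; the proposal is symmetric about the current beta."""
    candidate = t_candidate(beta, chol, nu, rng)
    log_ratio = (log_target(candidate, X, groups, n_groups)
                 - log_target(beta, X, groups, n_groups))
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return (candidate, True) if accept else (beta, False)


rng = random.Random(4)
X = [[1.0, 0.2], [1.0, -1.3], [1.0, 0.7], [1.0, 1.5]]  # hypothetical covariates
groups = [0, 1, 0, 0]                                   # imputed groups, C = 2
chol = [[0.5, 0.0], [0.1, 0.4]]                         # hypothetical scale factor
beta, accepted = [0.0, 0.0], 0
for _ in range(200):
    beta, moved = metropolis_step(beta, X, groups, 2, chol, 4, rng)
    accepted += moved
print(beta, accepted / 200)
```

Tracking the acceptance rate, as in the last lines, is what would inform the adjustment of the scalars $c$, $c_1$, and $c_2$ described above.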

Extending this procedure to a data set with missing items in $w$ and $y$ is straightforward. In the I-step, given the current parameter guesses, the posterior probabilities are calculated only from the observed part of $w$ and $y$. We denote it $\theta^{obs}_{i(g,s_1,s_2)}$ to distinguish it from the previous $\theta_{i(g,s_1,s_2)}$ in (7). The latent membership $z_{i(g,s_1,s_2)}$ is drawn from Multinomial$(1, \theta^{obs}_{i(g,s_1,s_2)})$ independently for all individuals. In the I-step we also generate the missing part of each $w_i$ and $y_i$ as follows. Suppose the $m$th class item at time $t$ is missing for individual $i$ who belongs to group $g$ and class $s_t$ at time $t$; then $y_{imt}$ is randomly drawn from Multinomial$(1, \rho_{mt|s_t g})$, where $\rho_{mt|s_t g}$ is a set of samples from the previous iteration. In the P-step, the updated parameters are simulated in the manner described above, treating the imputed values for the missing elements of $w_i$ and $y_i$ as known.

4. A Case Example

4.1. Data

Our main focus in this paper is to describe the latent transition model with logistic regression. In order to demonstrate our model, we drew data for a limited case example from the Henry A. Murray Research Archive at the Institute for Quantitative Science, Harvard University, a data archive for studies on a diversity of topics involving human lives. Specifically, we utilized data from the Prince George’s County Study of Adolescent Development in Multiple Contexts (PGC) (Eccles, Early, Frasier, Belansky, & McCarthy, 1997; Roeser & Eccles, 1998; Wong, Eccles, & Sameroff, 2003). Data were collected from multiple informants on an economically and ethnically diverse sample of adolescents and their families.

4.2. Participants

The sample of 1482 families with adolescent children is unique in that it includes a large proportion (61%) of African-American families and a broad range of socioeconomic status among both African-American and European-American families. The sample is drawn from a county with several different ecological settings including rural, low-income, and high-risk urban neighborhoods. Data collection began in the fall of 1991 as the adolescents entered middle school (beginning of 7th grade). For more information on the study and the context in which the data were collected, see Roeser and Eccles (1998) and Wong et al. (2003), as well as the various web pages covering the study at the Institute for Quantitative Science.

4.3. Model Specification

Our overall goal for our data example was to demonstrate the utility of the latent transition model with logistic regression in a research domain in which the need to incorporate latent covariates in longitudinal analysis arises frequently. We adopted an interest in whether students fell into identifiable latent classes based on the combination of their actual performance and academic beliefs, and whether class membership was stable or varying over the time spent in middle school. Hence, the main dependent construct of interest, academic adjustment, was operationalized as a class variable.
This construct was based on two class items from school recorded grades in math (MathGrd) and a combination of verbal classes (literature or grammar or both, depending on student class enrollment) (VerbGrd) as well as on responses to three self-administered, student-reported interval scale items reflecting the student’s feelings about the importance of school: Schooling is not so important for kids like me (Impor), I learn more useful things from friends and relatives than I learn in school (NoLearn), and I have so much to do at home that I don’t have enough time to do homework (NoTime). Student self-reported data was available on academic adjustment variables at wave 1 (7th grade; Fall, 1991) and wave 3 (end of 8th grade). We did not consider wave 2, as not all variables of interest were collected at that time. All items were rescaled as binary indicators: 1 = (fail/D/C), 2 = (B/A) for MathGrd and VerbGrd, and 1 = (strongly disagree/disagree/neither agree nor disagree), 2 = (agree/strongly agree) for Impor, NoLearn, and NoTime. A latent covariate, self-concept of academic ability (SCAA), was developed from two items in an interval subscale: How good are you in math? (Math) and How good are you in other school subjects? (OtherSub). Responses fall into ordered categories ranging from “not at all good” to “very good.” These items were rescaled to be binary values of 1 = (below median) and 2 = (above median) using a median split. The same scoring procedure was employed to create binary data indicating two groups for SCAA: students who believe they are good in school subjects (positive belief), and students who believe they are not (negative belief). We were further interested in determining the influence of selected manifest covariates on SCAA, academic adjustment, and students’ class transitions in academic adjustment over time. 
Age (Age, adolescent actual age), gender (Sex, 0 = male, 1 = female), and racial ethnic status (Race, 0 = European-American or other, 1 = African-American) were used as manifest covariates. The overall model is shown in Figure 5, which corresponds exactly in form to the conceptual model discussed earlier, shown in Figure 1.


After removing a small number of individuals with missing covariates, there were n = 1460 students studied in the analysis. Attrition in the original study was 29%; hence, study variables were missing at wave 3 for this percentage, although information on covariates was available (Roeser & Eccles, 1998). The overall percentage of missing cells for this analysis was 12%, due to the attrition in study variables at wave 3. Our algorithm allows missing values to occur among the items, provided that they are missing at random (Rubin, 1976). Students in the study were predominantly age 12 (69%) and 13 (25%), although a handful of subjects were 11 (2.5%) or 14 (2.1%) at wave 1. The racial ethnic composition was 38% European-American or other race, and 63% African-American, and the gender was 51% male and 49% female.

Previous reports utilizing these data are extensive; however, we found that work by Roeser and Eccles (1998) was useful in considering our findings. In this work, note that the academic self-concept scale was highly reliable as reported using Cronbach’s alpha (α = .78 at wave 1 and α = .82 at wave 3). A selective consideration of the results indicates that SCAA declines over time across the sample, that academic grades were higher for girls, and that European-Americans received higher grades than African-Americans. Note that Roeser and Eccles operationally define the term academic adjustment as a global construct including several behavioral and psychological determinants. This differs from our above operationalization of the term, which is based on fewer and less-studied survey items that we selected from the data archive. Moreover, note that the Roeser and Eccles academic adjustment construct in fact includes SCAA items, which we treat as indicator items in defining latent groups, separately from the items used in defining our academic adjustment latent classes.

4.4. Model Selection

This study uses the LT regression model to examine stage-sequential patterns of students’ academic adjustment over the period of middle adolescence. In particular, this study investigates whether hypothesized membership in latent groups based on students’ SCAA leads to differential prevalence of latent class membership of academic adjustment. In addition, we evaluate whether the transition rate of latent classes of academic adjustment, conditioned on SCAA, is: (a) stable across the conjoint latent classes over time; and (b) well explained by observed demographic covariates.

The first and most crucial step in LT regression analysis is to choose an appropriate number of classes. As shown in Bandeen-Roche, Miglioretti, Zeger, and Rathouz (1997), the model (6) has an appealing marginalization property: averaging over an arbitrary distribution for the covariates produces a conventional LT model with the same number of classes and identical values for item-response probabilities. Therefore, we do not need to consider covariates when selecting the number of classes. Using Bayesian techniques as suggested by Garrett and Zeger (2000), the estimated probabilities of major response patterns under the three-, four-, and five-class models were compared using a posterior predictive check distribution (PPCD). This comparison is based on the following process. Let $\theta_r = n^{-1} \sum_{i=1}^{n} \Pr(Y_i = y_r)$ represent the marginal probability of the $r$th response pattern from the suggested LT model, where $y_r$ denotes the $r$th response pattern of items, and let $\theta_r^{obs}$ be the observed probability of the $r$th response pattern from the saturated model, which provides perfect fit given the incomplete items.
Because of missing observations in the items, we calculated the expected observed response pattern probability θˆrobs by the data augmentation algorithms for incomplete categorical data sets under the saturated multinomial model, which imposes no restrictions on the types of relationships among Y1 , . . . , YT (Schafer, 1997). By comparing the probability θr to an expected observed probability θˆrobs , we can see how well a model reproduces the observed data. To do this, θˆr was  calculated at each of 5000 iterations as given in (6) for r = 1, . . . , M m=1 rm . Then the average of these values and a 95% Bayesian confidence interval was taken (Garrett & Zeger, 2000). Figure 2

422

PSYCHOMETRIKA

θˆrobs

1121211111

2221222221

1111111111

1222212222

2221122222

1121211211

1122211111

2221122212

2222212221

2222122222

2222122212

response pattern

1121211212

1121211222

2222122221

2222222211

2222222221

1122211221

2221222211

2221222212

1122211212

2221222222

2222222212

2222222222

1122211222

θˆrobs

response pattern

θˆrobs

θˆrobs

2221222112

1222222212

1121111112

1121111212

1122222222

2221212212

1121212111

2121222222

2222211222

1222212212

2122222222

response pattern

2221222111

1111111211

1121211221

1122111222

2222212222

1111111222

2222212212

2221122211

1122211211

1122212222

1121111111

2222212211

1222222222

θˆrobs

response pattern

2211122211

2121111111

1121212222

1121112212

1122111211

2121211222

2121211212

2221212211

1221212222

1122222212

2221122111

response pattern

2121212222

1122221222

2122211212

2222122211

1122221211

2222222122

1122212211

2222221211

1122111111

2222211212

2211222222

1111211222

1222211212

θˆrobs

response pattern

FIGURE 2.

Estimated observed (θˆrobs ) and expected prevalence (θˆr , from three-class (◦), four-class (), and five-class (×) model) and their 95% confidence intervals of the 72 most prevalent patterns.

displays the estimated and observed probabilities for the 72 most prevalent patterns with three-, four-, and five-class models. With five binary items at each of the two waves, there is a total of $2^{10} = 1024$ possible response patterns. We should be concerned primarily with the most prevalent patterns in order to capture the most significant features in the distribution of students. Among the 952 patterns not shown here, the probability of the most prevalent pattern was less than 0.32%, and most of the observed probabilities were within the 95% confidence intervals of predictive probabilities. For each graph in Figure 2, the observed probability $\hat{\theta}_r^{obs}$ (the solid horizontal line) was centered, and its 95% interval (dashed horizontal line) was scaled to be equal. Each of the six plots presents 12 response patterns. A vertical line represents the 95% confidence interval for the estimated probability $\hat{\theta}_r$. There are three types of vertical lines for each pattern, corresponding to the three-, four-, and five-class models. The response patterns are presented in order of observed pattern prevalence. For example, the first plot in the first row contains the 12 most prevalent patterns (response pattern 2222222222 was most prevalent (expected observed frequency = 94)

FIGURE 3. Estimated observed ($\hat{\theta}_r^{obs}$) and expected prevalence ($\hat{\theta}_r$, from the full model (◦), the ρ-restricted model (), and the restricted model (×)) and their 95% confidence intervals of the 72 most prevalent patterns. Six panels of 12 patterns each; horizontal axis: response pattern; vertical axis: $\hat{\theta}_r^{obs}$.

and 2222122221 was least prevalent (expected observed frequency = 14.4)), and the second plot in the third row contains the 12 least prevalent patterns (expected observed frequency of 2211122211 was 4.8). Among the 24 most prevalent patterns (see the two plots in the first row of Figure 2), seven estimates from the three-class model fell outside the boundaries (over the two plots, seven vertical bars for the “◦” estimates do not overlap with the dotted horizontal lines), whereas only three estimates from the four- and five-class models fell outside. However, over the six plots, most patterns were predicted almost equally well by the four- and five-class models. Therefore, we selected the four-class structure over the five-class structure under the principle of parsimony.

One difficulty when the ρ-parameters depend on both waves and groups is that the definition of academic adjustment may differ for each combination of wave and group. Thus, using the same PPCD procedure as for Figure 2, we assessed the following three LT models: the full model where ρ-parameters are allowed to vary across groups and waves, and δ- and


τ-parameters are allowed to vary across groups; the ρ-restricted model where only ρ-parameters are constrained to be equal across groups and waves; and the restricted model where all ρ-, δ-, and τ-parameters are constrained to be equal. Out of the 72 most prevalent patterns, we found that the full model was slightly better than the ρ-restricted model: under the full model, the intervals of $\hat{\theta}_r$ for 65 patterns overlapped with the intervals of their observed response probabilities, whereas 57 intervals did under the ρ-restricted model. The restricted model fitted much worse: only 34 intervals of $\hat{\theta}_r$ covered the intervals of their observed response probabilities. Even more troubling was that, out of the 24 most prevalent patterns, 20 intervals did not cover the intervals of observed probabilities. By using the ρ-restricted model, even though it ignores some of the fine details of the item distributions for groups or waves, we gain much in terms of interpretability and ease of comparing prevalence and transition rates across groups.

Another reason for preferring the ρ-restricted model is that the ρ-parameters estimated under the full model are relatively stable across groups and waves. When the ρ-parameters estimated from the ρ-restricted model are compared with those from the full model in Figure 4, the relative stability of the parameter estimates gives us confidence that this constrained model captures the most essential features of the class structure. Figure 4 displays the point and 95% interval estimates for the unconstrained ρ-parameters for all waves and groups superimposed over those for the constrained ρ-parameters. For each class, the constrained estimate (the solid line) and interval endpoints (the dotted lines) appear as horizontal lines, and the dots with vertical lines represent the unconstrained estimates with their intervals. A large majority of the intervals from the constrained model overlap the estimates from the unconstrained model, indicating very little difference from the unconstrained model.
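The interval-overlap comparison used in the PPCD procedure can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the posterior draws of $\hat{\theta}_r$ are stored as an iterations-by-patterns array and that the interval for each observed probability is already available. The function names and the toy numbers are hypothetical.

```python
import numpy as np


def summarize_draws(theta_draws, level=0.95):
    """Posterior mean and equal-tailed interval for each response pattern.

    theta_draws has shape (n_iterations, n_patterns); each row holds the
    pattern probabilities computed at one MCMC iteration.
    """
    alpha = (1.0 - level) / 2.0
    mean = theta_draws.mean(axis=0)
    lower = np.quantile(theta_draws, alpha, axis=0)
    upper = np.quantile(theta_draws, 1.0 - alpha, axis=0)
    return mean, lower, upper


def count_overlaps(model_lower, model_upper, obs_lower, obs_upper):
    """Number of patterns whose model interval overlaps the observed interval."""
    overlaps = (model_lower <= obs_upper) & (obs_lower <= model_upper)
    return int(overlaps.sum())


# Toy illustration with made-up numbers (not the data analyzed in the paper).
rng = np.random.default_rng(0)
toy_draws = rng.beta(a=[20, 5, 2], b=[180, 195, 198], size=(5000, 3))
_, lo, hi = summarize_draws(toy_draws)
toy_obs_lower = np.array([0.08, 0.10, 0.005])
toy_obs_upper = np.array([0.12, 0.14, 0.020])
print(count_overlaps(lo, hi, toy_obs_lower, toy_obs_upper), "of 3 patterns overlap")
```

Counting overlaps among the most prevalent patterns, as the text does with 65, 57, and 34 out of 72, then gives a simple summary for ranking competing models.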
Model identification is imperative in estimating the parameters of an LT model. As discussed by McHugh (1956) and Goodman (1974), within the frequentist framework, a model is locally identifiable if the Hessian matrix is nonsingular. We fixed some parameters in order to make the remaining parameters identifiable: under the equality constraints on the ρ-parameters, 17 transitional probabilities were close to 0 or 1; hence, those values were fixed to 0 or 1 a posteriori and not estimated. These constrained parameters are presented in Table 4. The estimated transitional probabilities in Table 4 will be discussed later. We found that the Hessian matrix in the ML solution was still singular under this model. To investigate identifiability within the Bayesian framework, we used a simple ad hoc test: we generated 10 samples of size 5000 from the model with specified parameter values, applied the MCMC algorithm to each sample, and verified that it supplied estimates close to the original specified parameter values. Across the samples, the largest absolute differences between the estimates and the original values were (0.03, 0.03, 0.02, 0.03, 0.11) for (η, ρ, γ, δ, τ), respectively.

Finally, Age, Sex, and Race were added as manifest covariates (i.e., $x$ and $x_1$) predicting SCAA groups and initial class membership of the fitted four-class model. For transitions of the classes, we used Age and Race as manifest covariates (i.e., $x_2$) (see Figure 5). Sex was not included in $x_2$ due to sparse data.

In a typical application of MCMC, the analyst runs the algorithm for a burn-in period to eliminate dependence on the starting values. After the burn-in, averaging the output stream of simulated parameters produces estimates for the posterior means and variances (Tierney, 1994). Various methods for choosing the length of the series and the burn-in period have appeared in the literature (Geweke, 1992; Roberts, 1992; Gelman & Rubin, 1992; Best, Cowles, & Vines, 1995).
We use time-series plots and autocorrelation functions to visually monitor the behavior of output values from MCMC and to confirm our choice for the length of the burn-in period. After a burn-in of 2000 cycles, we typically use 5000 cycles of MCMC to estimate posterior means and variances. We also run five independent chains of 5000 cycles after burn-in and use Gelman’s $\sqrt{\hat{R}}$ diagnostic (Gelman et al., 2004) to assess convergence. The $\sqrt{\hat{R}}$ is near 1 for all estimates, with a range of (0.9995, 1.0032), showing that approximate convergence was reached. Results of our final model are reported below.
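As an illustration of the convergence check, the potential scale reduction factor can be computed from several post-burn-in chains of a single scalar parameter. The sketch below uses the common textbook form, the square root of the pooled variance estimate divided by the within-chain variance. It is an assumption that this matches the exact variant used in the analysis, and the function name is hypothetical.

```python
import numpy as np


def sqrt_r_hat(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains has shape (m, n): m independent chains of n post-burn-in draws.
    Returns sqrt(var_plus / W), where W is the mean within-chain variance
    and var_plus = (n - 1) / n * W + B / n pools it with the
    between-chain variance B.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)        # B
    within = chains.var(axis=1, ddof=1).mean()   # W
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))


# Toy chains: five chains of 5000 draws from the same distribution.
rng = np.random.default_rng(1)
well_mixed = rng.normal(size=(5, 5000))
print(round(sqrt_r_hat(well_mixed), 4))
```

Values close to 1 indicate that the chains are sampling from approximately the same distribution; chains stuck in different regions produce values well above 1.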

FIGURE 4. Constrained (horizontal lines) and unconstrained (vertical lines) estimates and 95% intervals for class item-response probabilities. Panels: Class 1 through Class 4, each showing the items MathGrd, Impor, VerbGrd, NoLearn, and NoTime by group and wave; vertical axis: probability (0 to 1.0).

4.5. Parameter Estimates

Readers may wish to consult applications and reviews of the basic latent transition model and the introductory literature before proceeding to interpret this extended model; possible resources for this review include Collins and Wugalter (1992), Lanza, Flaherty, and Collins (2003), and Martin, Velicer, and Fava (1996). The hypothesized relations among the items, latent variables, and covariates for our example are depicted in Figure 5. In this figure, an unlabeled arrow from one variable to another, as in A → B, does not indicate a causal relationship, but rather a statistical association, parametrized in terms of the conditional distribution of B given A. Most of the specific estimates are described in the text for easier comparison. Working from left to right in Figure 5, consider estimates for the latent group G for SCAA. Three sets of parameters are of substantive interest here: the η-, γ-, and β-parameters. Their 95% confidence intervals are presented in Table 1. First, the η-parameters


FIGURE 5. A diagram of LT regression model with estimates.

reflect the probability that students believe their SCAA is above average for mathematics and for other subjects for each latent group: negative SCAA or positive SCAA. For example, among students in the negative belief group, the probability of responding “good” or “very good” is .233 to Math and .230 to OtherSub; by contrast, the values are .840 and .839 for students in the positive belief group. This indicates that the negative group is composed of students who adhere to a set of beliefs suggesting that their academic ability is below average. Second, the γ-estimates for each SCAA group reflect the proportion of the population estimated to reside in each latent group: 50.4% of students are expected to belong to the negative belief group and 49.6% to the positive belief group. Third, point estimates and their 95% intervals for the β-coefficients pertaining to each covariate are reported in Table 1. The 95% intervals for the coefficients of Sex and Race include 0, whereas the interval for Age does not. The exponentiated coefficients may be interpreted as estimated odds ratios. For instance, the estimated odds of belonging to the negative group versus the positive group are exp(.429) = 1.536 times higher for each unit increase in Age. There is a 95% probability that the true value of the odds ratio lies between exp(.209) = 1.232 and exp(.665) = 1.944.

Based on the identified SCAA groups, we now investigate whether membership in an SCAA group leads to different prevalence of class membership $S_1$ and $S_2$ for academic adjustment. The estimated ρ-parameters are presented on the right side of Figure 5. The values represent the probability of responding at the higher end of the response scale for the set of academic adjustment items (i.e., A/B to MathGrd and VerbGrd and agree/strongly agree to Impor, NoLearn, and NoTime), for a given latent class.
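The odds-ratio arithmetic for the Age coefficient can be checked directly. The short sketch below exponentiates the point estimate and the interval endpoints quoted from Table 1; the helper name is hypothetical, and exponentiating the endpoints is valid because the exponential function is monotone.

```python
import math


def odds_ratio_with_interval(beta, lower, upper):
    """Exponentiate a logistic-regression coefficient and its interval endpoints."""
    return math.exp(beta), math.exp(lower), math.exp(upper)


# Age coefficient for the negative SCAA group, as quoted in the text.
age_or, age_lo, age_hi = odds_ratio_with_interval(0.429, 0.209, 0.665)
print(f"odds ratio {age_or:.3f}, 95% interval [{age_lo:.3f}, {age_hi:.3f}]")
```

The same transformation applies to the Sex and Race coefficients, whose intervals include 0 and therefore map to odds-ratio intervals that include 1.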
Inspection of these values, combined with judgment based on the prevailing theory about academic beliefs, led to the adoption of the class names shown in Figure 5 (i.e., Class 1 = Low/Low, Class 2 = Low/High, Class 3 = High/Low and Class 4 = High/High). In these class names, the first part refers to school grades and the second part, after the slash, refers to the school importance variables. Values close to zero or one indicate good measurement of the latent class. The obtained values suggest that the school grade items

TABLE 1. Bayesian estimates of probabilities of responding “good” or “very good,” and their 95% confidence intervals for Self-Concept of Academic Ability (SCAA) (positive belief group is baseline).

                                     Negative               Positive
Item-response prob. (η)  Math        .233 [.175, .289]      .840 [.790, .891]
                         OtherSub    .230 [.177, .283]      .839 [.784, .897]
Group prevalence (γ)                 .504 [.449, .564]      .496 [.436, .551]
Regression coeff. (β)    Age         .429 [.209, .665]      0
                         Sex         −.124 [−.420, .168]    0
                         Race        .149 [−.168, .456]     0


TABLE 2. The probabilities of responding “A/B” to MathGrd and VerbGrd and “agree/strongly agree” to Impor, NoLearn, and NoTime for each latent class of Academic Adjustment.

Item-response probability (ρ)
Class   MathGrd             VerbGrd             Impor               NoLearn             NoTime
1       .141 [.097, .189]   .181 [.128, .235]   .427 [.355, .498]   .163 [.107, .218]   .263 [.192, .333]
2       .182 [.144, .221]   .249 [.201, .297]   .939 [.909, .967]   .666 [.619, .713]   .859 [.818, .899]
3       .733 [.665, .792]   .806 [.749, .859]   .563 [.486, .636]   .187 [.128, .243]   .405 [.338, .471]
4       .872 [.836, .905]   .949 [.925, .970]   .993 [.979, 1.00]   .695 [.653, .741]   .799 [.763, .833]

are relatively good measures given a four-class model, and all five items combined support a meaningful interpretation for each class. Their 95% confidence intervals are given in Table 2. The δ-parameters in Figure 5 show the probability of a student belonging to a class at waves 1 (t = 1) and 3 (t = 2 because wave 2 is omitted), given each SCAA group. Based on their 95% confidence intervals, given in Table 3, marginal prevalences of academic adjustment are different across SCAA groups at both waves. The prevalence of Class 4 (High/High), that is, high grades and high perceived school importance, is larger for the positive academic belief group at both waves (review the “Negative” and “Positive” columns in S1 and S2 , for the High/High row in Figure 5). Other classes, however, are smaller for the positive academic ability group, reflecting a main effect for perceived academic adjustment regardless of age, gender, and race. For both SCAA groups, membership in the “Low” attitude classes (Classes 1 and 3) is increasing over time, and classes of “High” attitudes (Classes 2 and 4) are decreasing. The estimated regression coefficients β (1) given in Table 3 show how age, gender, race, and SCAA groups influence the prevalence of latent classes of academic adjustment at wave 1. Age effects for Classes 1 and 2 in the positive belief group are in the same direction, such that older students are more likely to be in classes involving “Low” grades (Classes 1 and 2) than in Class 4. Female students are less likely to be in Class 1, 2, or 3, than in Class 4 for both SCAA groups. African-American students are more likely to be in Classes 1 or 2 than in Class 4 for both SCAA groups. Interestingly, Age, Sex, and Race do not explain the difference in the prevalence of class membership between Class 3 and Class 4. The effects of these covariates are displayed graphically in Figure 6. 
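To make the role of the $\beta^{(1)}$ coefficients concrete, the following sketch turns a set of coefficients into class prevalences with a baseline-category logit, using Class 4 as the baseline (its coefficients are 0). The coefficient values are read from Table 3 for the positive SCAA group. The intercepts are not reported in this section, so the zeros used here are placeholders, and the covariate coding (centered age, 0/1 indicators for Sex and Race) is an assumption made only for illustration.

```python
import numpy as np

# Rows: Classes 1-4; columns: Age, Sex, Race (positive SCAA group, Table 3).
beta_positive = np.array([
    [0.945, -2.321, 1.305],
    [0.706, -1.678, 1.358],
    [0.375, -1.275, -0.760],
    [0.0, 0.0, 0.0],           # Class 4 is the baseline category
])

# Hypothetical intercepts: placeholders, NOT estimates from the paper.
intercepts = np.zeros(4)


def class_prevalence(x, beta, intercepts):
    """Baseline-category logit: P(class c | x) proportional to exp(a_c + b_c'x)."""
    logits = intercepts + beta @ np.asarray(x, dtype=float)
    logits = logits - logits.max()   # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()


# Two hypothetical covariate profiles that differ by one unit of Age.
younger = class_prevalence([0.0, 1.0, 0.0], beta_positive, intercepts)
older = class_prevalence([1.0, 1.0, 0.0], beta_positive, intercepts)
print(np.round(younger, 3), np.round(older, 3))
```

Whatever the intercepts are, a one-unit increase in Age multiplies the odds of Class 1 versus Class 4 by exp(.945), which is how the coefficients are interpreted in the text.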
For each of the four pairs of graphs in Figure 6, the left panel represents the marginal prevalence of the class for the negative SCAA group, and the right panel shows the marginal prevalence of the class for the positive SCAA group. The prevalence of classes involving low grades (i.e., Classes 1 and 2) increases with age, but the prevalence of those involving high grades (i.e., Classes 3 and 4) decreases. African-American males are most likely to be in the classes involving low grades, regardless of age and SCAA, and they are least likely to be in the classes involving high grades, except Class 3 for the positive SCAA group, where African-American females are least prevalent. Correspondingly, European-American females are most likely to be in the high-grade classes (except Class 3 for the positive SCAA group, where European-American males are most prevalent), but they are least likely to be in the classes involving low grades, regardless of age and SCAA group status. The prevalence of Class 4 has the steepest decreasing trend with age for the positive SCAA group, but the decline is similar for all covariates. In the positive SCAA group, Class 4 is consistently more prevalent among females than males. By contrast, Class 1 is more prevalent among males than females in the negative SCAA group.

Table 4 gives transitional probabilities of moving from one class at wave 1 to another class at wave 3 conditional on SCAA group membership. These estimates are not reported in Figure 5 for space reasons, but their conceptual position is indicated with an asterisk ($\tau^{*}$). The diagonal

TABLE 3. The prevalence of latent classes at waves 1 and 3, the estimated regression coefficients $\beta^{(1)}$, and their 95% confidence intervals.

                     Class prevalence (δ)                     Regression coefficients ($\beta^{(1)}$)
SCAA group   Class   Wave 1              Wave 3               Age                    Sex                        Race
Negative     1       .221 [.173, .282]   .345 [.281, .414]    .542 [−.126, 1.169]    −1.785 [−2.688, −.907]     1.321 [.631, 1.989]
             2       .438 [.374, .515]   .314 [.240, .393]    .548 [−.030, 1.210]    −1.243 [−2.156, −.335]     1.396 [.589, 2.116]
             3       .203 [.151, .258]   .223 [.161, .280]    −.114 [−.835, .561]    .414 [−.712, 1.588]        −.087 [−.960, .796]
             4       .138 [.092, .195]   .118 [.072, .163]    0                      0                          0
Positive     1       .046 [.022, .075]   .167 [.120, .213]    .945 [.056, 1.868]     −2.321 [−4.641, −.952]     1.305 [.037, 3.211]
             2       .256 [.205, .304]   .136 [.089, .196]    .706 [.254, 1.162]     −1.678 [−2.368, −1.090]    1.358 [.766, 2.006]
             3       .112 [.075, .155]   .210 [.147, .272]    .375 [−.368, 1.033]    −1.275 [−2.567, −.358]     −.760 [−1.607, .090]
             4       .586 [.525, .635]   .488 [.421, .555]    0                      0                          0

FIGURE 6. Prevalence of latent classes across Age, Sex, and Race at wave 1: European-American male (—–); European-American female (- - -); African-American male (· · ·); and African-American female (-·-·-). Panels: Class 1 through Class 4, each for negative and positive SCAA; horizontal axis: Age (11 to 14); vertical axis: probability (.00 to .90).

values in Table 4 represent the probability of remaining at wave 3 (t = 2) in the same class as at wave 1. The differences in the estimated transitional probabilities between SCAA groups and their corresponding 95% confidence intervals are also reported in Table 4. A difference greater than 0 indicates a higher transitional probability in the positive SCAA group than in the negative SCAA group. Figure 7 shows the probabilities of diagonal values in Table 4 across age and race. Using 95% confidence intervals, no difference between males and females was detected. These plots indicate that the age trends are similar for all groups and classes, but fairly dramatic differences due to ethnicity exist in Class 2 for both SCAA groups, reflecting higher stability (probability of remaining in the previous class) for African-American students in this “higher” school importance and lower academic performance class.


TABLE 4. Transitional probabilities (τ) for each group of Self-Concept of Academic Ability and their difference with 95% confidence intervals.

Negative SCAA
Class at    Class at wave 3
wave 1      1                     2                        3                      4
1           .774 [.629, .899]     .226 [.101, .371]        .000                   .000
2           .397 [.265, .521]     .603 [.479, .735]        .000                   .000
3           .000                  .000                     .791 [.617, .909]      .209 [.091, .383]
4           .000                  .000                     .447 [.231, .668]      .553 [.332, .769]

Positive SCAA
Class at    Class at wave 3
wave 1      1                     2                        3                      4
1           1.00                  .000                     .000                   .000
2           .473 [.286, .626]     .527 [.374, .714]        .000                   .000
3           .000                  .000                     .597 [.365, .809]      .403 [.191, .635]
4           .000                  .000                     .243 [.162, .321]      .757 [.679, .838]

Difference (Positive SCAA − Negative SCAA)
Class at    Class at wave 3
wave 1      1                     2                        3                      4
1           .226 [.101, .371]     −.226 [−.371, −.101]     .000                   .000
2           .075 [−.188, .287]    −.075 [−.287, .185]      .000                   .000
3           .000                  .000                     −.193 [−.446, .067]    .193 [−.068, .445]
4           .000                  .000                     −.205 [−.441, .039]    .205 [−.039, .441]
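The point estimates in Tables 3 and 4 can be cross-checked against the first-order Markov structure of the model: each row of a transition matrix should sum to 1, and the wave 3 prevalence should be approximately the wave 1 prevalence multiplied by the transition matrix. The sketch below types in the point estimates as read from the two tables. Agreement is only approximate because a posterior mean of a product is not the product of posterior means, and the variable names are hypothetical.

```python
import numpy as np

# Transition matrices (rows: class at wave 1, columns: class at wave 3), Table 4.
tau_negative = np.array([
    [0.774, 0.226, 0.000, 0.000],
    [0.397, 0.603, 0.000, 0.000],
    [0.000, 0.000, 0.791, 0.209],
    [0.000, 0.000, 0.447, 0.553],
])
tau_positive = np.array([
    [1.000, 0.000, 0.000, 0.000],
    [0.473, 0.527, 0.000, 0.000],
    [0.000, 0.000, 0.597, 0.403],
    [0.000, 0.000, 0.243, 0.757],
])

# Class prevalences at wave 1 and wave 3, Table 3.
delta1_negative = np.array([0.221, 0.438, 0.203, 0.138])
delta1_positive = np.array([0.046, 0.256, 0.112, 0.586])
delta3_negative = np.array([0.345, 0.314, 0.223, 0.118])
delta3_positive = np.array([0.167, 0.136, 0.210, 0.488])


def wave3_prevalence(delta_wave1, tau):
    """Marginal prevalence at the second occasion under a first-order Markov chain."""
    return delta_wave1 @ tau


for name, d1, tau, d3 in [
    ("negative", delta1_negative, tau_negative, delta3_negative),
    ("positive", delta1_positive, tau_positive, delta3_positive),
]:
    implied = wave3_prevalence(d1, tau)
    print(name, np.round(implied, 3), "max gap:", np.abs(implied - d3).max().round(4))

# Difference in transition probabilities between the two SCAA groups.
tau_difference = tau_positive - tau_negative
```

The implied and reported wave 3 prevalences agree to within rounding-level error, which supports the row and column assignments used when reading the flattened tables.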

5. Discussion

In our analyses using LT analysis, we were able to identify a plausible LC structure for academic adjustment from survey items related to both school grades and attitudes toward school. In addition, we investigated the effect of age, gender, race, and SCAA on the identified latent classes for academic adjustment. Supporting evidence for this latent structure was provided not only by quantitative measures, but by its straightforward substantive interpretability and relative stability over the two-year period. Based on a four-class model, the two latent SCAA groups had differential class prevalence and different patterns of transition in academic adjustment. These prevalences and transition patterns were further explored through the use of logistic regression on manifest covariates.

In the prevailing statistical methods for latent variables, such as factor analysis and structural equation modeling, the goal is to examine if observed variables can be explained largely in terms of a small number of latent variables and to relate the latent variables in structural equations. Many research questions, however, require methods that can examine the relationships among individuals reflected in their membership in pragmatically meaningful groups. Latent transition analysis is now becoming more popular among social scientists because the model can identify population subgroups and their transitions over time. LT models examine the structure of individuals’ item responses and form discrete classes with their similar item-response profiles. This categorical approach to the latent variable allows us to make no further assumptions about the nature of the classes (e.g., nominal or ordinal). In other words, individuals classified into a specific class have both dimensional and configurational similarity in the distribution of item-response probabilities.
This type of information cannot be attained easily or at all using methods chiefly designed for continuous latent variables.

FIGURE 7. Probabilities of remaining in the previous classes (diagonal τ’s) across Age and Race: European-American (—–) and African-American (- - -). Panels: Class 1 through Class 4, each for negative and positive SCAA; horizontal axis: Age (11 to 14); vertical axis: probability (0.0 to 1.0).

We analyzed the PGC data using Bayesian estimation and found that the Bayesian inference by MCMC is an attractive alternative to ML: Bayesian methods open up new possibilities for model checking and fit assessment via the posterior predictive check distribution, and they still provide interval estimates without difficulty when hypothesis tests involving the combinations of parameters are necessary to address specific research questions. However, new challenges, such as subjectivity of the priors or a label switching problem may arise. The priors should be chosen with care. Most of the previous literature on MCMC for LT models emphasizes conjugate priors that are symmetric with respect to the labeling of classes. These tend to simplify computations and may have been thought to avoid subjective biases. With conjugate priors, however, the labeling problem becomes more acute, because class labels may permute during the simulation run. A


variety of strategies for resolving problems related to label switching are reviewed by Chung, Loken, and Schafer (2004). We used time-series plots to visually inspect whether label switching occurred, and this problem was not present in the current study. However, this problematic aspect of the LT model has yet to be fully explored and should be the focus of future research.

Although using a Bayesian approach via MCMC may overcome some of the shortcomings of ML, the ML methods remain efficient ways to analyze LT models in many studies: they have good finite sample properties and provide reliable model selection tools. While not giving up the concept of overall LRT, procedures such as the parametric bootstrap (Collins, Fidler, Wugalter, & Long, 1993; Langeheine, Pannekoek, & van de Pol, 1996) and the adjusted LRT (Lo, Mendell, & Rubin, 2001) have been developed for testing the number of classes. These procedures are easy to access via computer software (e.g., Mplus (Muthén & Muthén, 2004) and Latent GOLD (Vermunt & Magidson, 2005)). In addition, ML estimation allows nested models to be compared statistically.

Although many substantive interpretations could be pursued in light of the model, we mention only a few in this presentation. Note that, as indicated in the Introduction, our exploration was intended to achieve model demonstration goals and to provide practitioners with a well-worked and plausible example. While we offer some speculative interpretations, enough to aid the reader in following the model, several criticisms could be levied from a substantive perspective. For example, educational psychologists may question whether the placement of self-concept as a covariate is proper, perhaps favoring a joint regulation of academic adjustment and self-concept, or a converse ordering. Nonetheless, we do point out a few parallels to research in related areas that we think are worthy of future consideration.
Across model estimates, a clear academic advantage in class membership was found for European-Americans, and, in some cases, for females. Conversely, African-American males reflected higher prevalence in groups and classes that reflect lower school importance and academic self-concept. Age accounted for a decrement or increase in class prevalence differentially for positive and negative self-concept of academic ability groups, although we note that the 11- and 15-year-old groups were very small. Students with important statuses not included in our model, such as being retained in grade or, less likely, advanced in grade, could have been overrepresented in these groups. The presence of several classes in which African-American students’ views are inconsistent with their actual performance is reminiscent of findings in correlational studies suggesting an unrealistic “optimistic bias” in nonmajority US ethnicity middle school students (Lopez, Little, Oettingen, & Baltes, 1998; Walls & Little, 2005). The presence of inconsistent school importance/performance views is in accord with an increasing body of literature on ethnic differences in academic beliefs, finding stark inconsistencies in academic-relevant views and actual performance. At the same time, the higher prevalence of European-Americans in the high academic performance, but low school importance class, for the positive academic self-concept group, could reflect the presence of a talented adolescent group whose members may hold attitudes that school is boring or not challenging. Hence, membership in any group–class combination that is not high/high is less than optimal in terms of alignment of student beliefs with academic goals in the school context. An advantage of this model is that it enables review of specific differences in outcomes due to the conjoint analysis of covariate groups and latent classes.
Correlational and regression-based analyses can only reflect covariate influences in relation to the full distribution of each study variable. By contrast, this model enables an explanation of meaningful group membership and rates. Finally, these findings are consistent with previous reports on these data, such as Roeser and Eccles (1998), but are quite different in the specific type of inferences that they support, with group membership and transition being the primary objective of inference.

Some specific limitations to interpretation of our model as deployed include the limited previous work in scale construction of the school importance variables used in our academic adjustment classes. We also note the possible challenges to interpretation of our classes, given


that the constitution of the two middle classes (i.e., Classes 2 and 3) has not been studied previously. Finally, we note our divergence from the construct operational definitions of the source study and the incorporation of rather ad hoc theoretical positions as needed to deploy a new model demonstration with these data. Researchers working in the educational achievement area are cautioned to interpret these results carefully amidst extant literature that is primarily based on correlational and regression-based analyses. Accordingly, we look forward to new opportunities that may arise from future applications based on this demonstration, especially those that may serve to better explicate determinants of academic adjustment in middle school.

In summary, this presentation provided thorough coverage of the specification, estimation, and application of the latent transition model with logistic regression. We provided a limited demonstration using real data to help elucidate the model. Our hope is that substantive practitioners will be able to utilize this demonstration in their research and that it may provide some guidance on graphics, model development, and model interpretation. To this end, we have made the algorithms available in this manuscript, and the software written in R is available at the following web address: www.uri.edu/walls/software.htm. In addition, this model may be estimable with a range of other software packages, and, in this circumstance, readers will likely still find our model demonstration and discussion of parameter interpretation useful.

References

Bandeen-Roche, K., Miglioretti, D.L., Zeger, S.L., & Rathouz, P.J. (1997). Latent variable regression for multiple discrete outcomes. Journal of the American Statistical Association, 92, 1375–1386.
Best, N., Cowles, M., & Vines, S. (1995). Coda: Convergence diagnostics and output analysis software for Gibbs sampler output, version 0.3 (Technical Report). MRC Biostatistics Unit.
Chung, H., Flaherty, B.P., & Schafer, J.L. (2006). Latent class logistic regression: Application to marijuana use and attitudes among high school seniors. Journal of the Royal Statistical Society, Series A, 169, 723–743.
Chung, H., Loken, E., & Schafer, J.L. (2004). Difficulties in drawing inferences with finite-mixture models: A simple example with a simple solution. The American Statistician, 58, 152–158.
Chung, H., Park, Y., & Lanza, S.T. (2005). Latent transition analysis with covariates: Pubertal timing and substance use behaviors in adolescent females. Statistics in Medicine, 24, 2895–2910.
Clogg, C.C., & Goodman, L.A. (1984). Latent structure analysis of a set of multidimensional contingency tables. Journal of the American Statistical Association, 79, 762–771.
Collins, L.M., Fidler, P.L., Wugalter, S.E., & Long, J.L. (1993). Goodness-of-fit testing for latent class models. Multivariate Behavioral Research, 28, 375–389.
Collins, L.M., & Wugalter, S.E. (1992). Latent class models for stage-sequential dynamic latent variables. Multivariate Behavioral Research, 27, 131–157.
Eccles, J.S., Early, D., Frasier, K., Belansky, E., & McCarthy, K. (1997). The relation of connection, regulation, and support for autonomy to adolescent’s functioning. Journal of Adolescent Research, 12, 263–286.
Garrett, E.S., Eaton, W.W., & Zeger, S.L. (2002). Methods for evaluating the performance of diagnostic tests in the absence of a gold standard: A latent class model approach. Statistics in Medicine, 21, 1289–1307.
Garrett, E.S., & Zeger, S.L. (2000). Latent class model diagnosis. Biometrics, 56, 1055–1067.
Gelfand, A.E., & Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (2004). Bayesian data analysis (2nd ed.). London: Chapman & Hall.
Gelman, A., Meng, X.L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica, 6, 733–807.
Gelman, A., & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian statistics (Vol. 4, pp. 169–193). Oxford: Oxford University Press.
Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Hoijtink, H. (1998). Constrained latent class analysis using the Gibbs sampler and posterior predictive p-values: Applications to educational testing. Statistica Sinica, 8, 691–711.
Kennedy, W.J., & Gentle, J.E. (1980). Statistical computing. New York: Marcel Dekker.
Langeheine, R., Pannekoek, J., & van de Pol, F. (1996). Bootstrapping goodness-of-fit measures in categorical data analysis. Sociological Methods and Research, 24, 492–516.

Lanza, S.T., & Collins, L.M. (2002). Pubertal timing and the onset of substance use in females during early adolescence. Prevention Science, 3, 69–82.
Lanza, S.T., Collins, L.M., Schafer, J.L., & Flaherty, B.P. (2005). Using data augmentation to obtain standard errors and conduct hypothesis tests in latent class and latent transition analysis. Psychological Methods, 10, 84–100.
Lanza, S.T., Flaherty, B.P., & Collins, L.M. (2003). Latent class and latent transition analysis. In J.A. Schinka, & W.F. Velicer (Eds.), Handbook of psychology (pp. 663–685). Hoboken, NJ: Wiley.
Lazarsfeld, P.F., & Henry, N.W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Lo, Y., Mendell, N.R., & Rubin, D.B. (2001). Testing the number of components in a normal mixture. Biometrika, 88, 767–778.
Lopez, D.F., Little, T.D., Oettingen, G., & Baltes, P.B. (1998). Self-regulation and school performance: Is there optimal level of action-control? Journal of Experimental Child Psychology, 70, 54–74.
Martin, R.A., Velicer, W.F., & Fava, J.L. (1996). Latent transition analysis to the stages of change for smoking cessation. Addictive Behaviors, 21, 67–80.
McHugh, R.B. (1956). Efficient estimation and local identification in latent class analysis. Psychometrika, 21, 331–347.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1091.
Muthén, B.O., & Muthén, L.K. (2000). Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882–891.
Muthén, L.K., & Muthén, B.O. (2004). Mplus user’s guide (3rd ed.). Los Angeles: Muthén & Muthén.
Pfeffermann, D., Skinner, C., & Humphreys, K. (1998). The estimation of gross flows in the presence of measurement error using auxiliary variables. Journal of the Royal Statistical Society, Series A, 161, 13–32.
Richardson, S., & Green, P.J. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B, 59, 731–792.
Robert, C.P. (1996). Mixtures of distributions: Inference and estimation. In W.R. Gilks, S. Richardson, & D.J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 441–464). London: Chapman & Hall.
Robert, C.P., & Casella, G. (2004). Monte Carlo statistical methods (2nd ed.). New York: Springer-Verlag.
Roberts, G.O. (1992). Convergence diagnostics of the Gibbs sampler. In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian statistics (Vol. 4, pp. 775–782). Oxford: Oxford University Press.
Roeser, R.W., & Eccles, J.S. (1998). Adolescent’s perceptions of middle school: Relation to longitudinal changes in academic and psychological adjustment. Journal of Research on Adolescence, 8(1), 123–158.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D.B., & Stern, H.S. (1994). Testing in latent class models using a posterior predictive check distribution. In A. von Eye, & C.C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 420–438). Thousand Oaks, CA: Sage.
Schafer, J.L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.
Tanner, M.A., & Wong, W.H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–550.
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Annals of Statistics, 22, 1701–1762.
Van de Pol, F., & Langeheine, R. (1990). Mixed Markov latent class models. In C.C. Clogg (Ed.), Sociological methodology 1990 (pp. 213–247). Oxford: Blackwell.
Van de Pol, F., & Langeheine, R. (1994). Discrete-time mixed Markov models. In A. Dale, & R.B. Davies (Eds.), Analyzing social and political change: A casebook of methods (pp. 170–197). London: Sage.
Vermunt, J.K., Langeheine, R., & Böckenholt, U. (1999). Discrete-time discrete-state latent Markov models with time-constant and time-varying covariates. Journal of Educational and Behavioral Statistics, 24, 179–207.
Vermunt, J.K., & Magidson, J. (2005). Latent GOLD 4.0 user’s guide. Belmont, MA: Statistical Innovations.
Walls, T.A., & Little, T.D. (2005). Relations among personal agency, motivation, and school adjustment in early adolescence. Journal of Educational Psychology, 97(1), 23–31.
Wong, C.A., Eccles, J.S., & Sameroff, A. (2003). The influence of ethnic discrimination and ethnic identification on African American adolescents’ school and socioemotional adjustment. Journal of Personality, 71(6), 1197–1232.

Manuscript received 25 AUG 2005
Final version received 24 OCT 2006
Published Online Date: 16 MAY 2007
