MULTIVARIATE BEHAVIORAL RESEARCH, 42(2), 233–259 Copyright © 2007, Lawrence Erlbaum Associates, Inc.

Fuzzy Clusterwise Growth Curve Models via Generalized Estimating Equations: An Application to the Antisocial Behavior of Children

Heungsun Hwang and Yoshio Takane
McGill University, Montreal, QC, Canada

Wayne S. DeSarbo
Pennsylvania State University, Pennsylvania

The growth curve model has been a useful tool for the analysis of repeated measures data. However, it is designed for an aggregate-sample analysis based on the assumption that the entire sample of respondents is drawn from a single homogeneous population. Thus, this method may not be suitable when heterogeneous subgroups with qualitatively distinct patterns of trajectories exist in the population. In this paper, the growth curve model is generalized to a fuzzy clustering framework, which explicitly accounts for such group-level heterogeneity in trajectories of change over time. Moreover, the proposed method estimates parameters based on generalized estimating equations, thereby relaxing the assumption of correct specification of the population covariance structure among repeated responses. The performance of the proposed method in recovering parameters and the number of clusters is investigated based on two Monte Carlo analyses involving synthetic data. In addition, the empirical usefulness of the proposed method is illustrated by an application concerning the antisocial behavior of a sample of children.

Correspondence concerning this article should be addressed to Heungsun Hwang, Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, Montreal, QC, H3A 1B1, Canada. Email: [email protected]

The Growth Curve Model (GCM; Grizzle & Allen, 1969; Khatri, 1966; Potthoff & Roy, 1964; Rao, 1965) has been a useful method for the analysis of longitudinal data. The traditional GCM may be viewed as a constrained fixed-effects multivariate regression model on which prespecified basis functions such as polynomials are imposed (Takane & Hunter, 2001). This method enables one to discern the average trajectory of repeated assessments on a response variable over time. At the same time, it helps to examine the effects of time-invariant explanatory variables on the temporal trajectory. The traditional GCM is different from its random-effects counterpart (e.g., Laird & Ware, 1982; Vonesh & Carter, 1987), which is equivalent to latent growth curve models (Meredith & Tisak, 1990) or a two-level hierarchical linear model where time points are nested within an individual (Bryk & Raudenbush, 1992; Goldstein, 1987). The random-effects GCM permits modeling the average or intraindividual trajectory of repeated measures over time in addition to interindividual variations in the trajectory. On the other hand, the fixed-effects GCM focuses on modeling the average trajectory of change. The main focus of this paper is on the traditional, fixed-effects GCM (hereafter GCM is used to indicate the fixed-effects model).

In GCM, the population covariance matrix of repeated measurements is typically assumed unconstrained or unstructured, indicating that all covariances have to be estimated (Rao, 1965; Reinsel & Velu, 1998; von Rosen, 1991). When the number of time points becomes large or the number of repeated assessments is unequal across individuals, the unconstrained covariance matrix is computationally unattractive, resulting in less reliable parameter estimates (Duncan, Duncan, Hops, & Stoolmiller, 1995; Laird & Ware, 1982). Alternatively, the population covariance matrix may be conceived as being constrained in a particular manner (e.g., Khatri, 1988). If the constrained covariance matrix is consistent with the data, more reliable estimates of the covariances may be obtained because fewer covariances need to be estimated.
In practice, however, the population covariance structure of repeated measures data is unlikely to be known in advance. Thus, it is difficult to decide which constrained structure is tenable for the data. This often leads to misspecification of the population covariance matrix, resulting in the distortion of statistical inferences (Diggle, Heagerty, Liang, & Zeger, 2002). One effective approach for dealing with this problem is to estimate the parameters of GCM based on Generalized Estimating Equations (GEE; Liang & Zeger, 1986; Zeger & Liang, 1986). GEE is a multivariate extension of the quasi-likelihood method (Wedderburn, 1974; McCullagh, 1983), which offers asymptotically consistent parameter estimates even if the covariance structure of repeated measures data is not correctly specified. As such, the combined approach of GCM and GEE (GCM-GEE) allows the analysis of longitudinal data under specifications of a variety of different population covariance structures (e.g., Cheong, Fotiu, & Raudenbush, 2001; Hwang & Takane, 2005; Raudenbush, 1995).

Despite the merit of relaxing distributional assumptions, the GCM-GEE approach estimates parameters by aggregating the data across sample individuals under the assumption that all individuals belong to a single, homogeneous population. Hence, this method is not ideally suited for investigating heterogeneity involving different trajectories of change over time for different groups of individuals. Such group- or cluster-level heterogeneity has been well studied in a variety of areas. For instance, two distinct trajectories of change in antisocial behavior, labeled 'life-course persistent' and 'adolescent-limited', have been previously recognized (Moffitt, 1993). Bagozzi (1982) found that consumer belief structures tended to differ across different groups of customers or market segments. Moreover, accounting for cluster-level heterogeneity has been shown to be important in modeling brand choice decisions (Kamakura, Kim & Lee, 1996). Furthermore, in situations where this heterogeneity is present, aggregate sample analyses that ignore it are likely to yield biased results (cf. DeSarbo & Cron, 1988; Jedidi, Jagpal, & DeSarbo, 1997; Muthén, 1989).

In this paper, the GCM-GEE methodology is generalized to account for cluster-level heterogeneity in trajectories inherent to repeated measures data and in their relations to other variables. Specifically, GCM-GEE is combined with fuzzy clustering (Bezdek, 1974a; Bezdek, Coray, Gunderson, & Watson, 1981; Dunn, 1974; Hathaway & Bezdek, 1993; Wedel & Steenkamp, 1989) in a unified framework. Fuzzy clustering is a form of overlapping clustering in which individuals are partially assigned to more than one cluster. There are a number of reasons for combining GCM-GEE with fuzzy clustering. First, GCM-GEE is a limited-information method that concerns only the mean and covariance structure of repeated assessments. As such, it is infeasible to apply a full information or likelihood-based method such as finite mixture linear models (e.g., DeSarbo & Cron, 1988; DeSarbo, Wedel, Vriens, & Ramaswamy, 1992) to GCM-GEE.
On the other hand, the fuzzy clustering algorithm is well suited for GCM-GEE because it does not rely on the likelihood. Moreover, the fuzzy clustering approach offers advantages over traditional nonoverlapping clustering methods such as k-means. For example, the partial classification of fuzzy clustering appears more attractive than the hard classification of nonoverlapping clustering methods because it is often difficult to identify clear boundaries between clusters (Wedel & Kamakura, 1998). Also, the fuzzy clustering algorithm is computationally more efficient because less dramatic changes tend to occur in estimating cluster membership values (McBratney & Moore, 1985). Furthermore, the partial memberships derived from fuzzy clustering allow additional insights into the phenomenon under study by identifying fractional membership in the derived clusters, which nonoverlapping clustering methods cannot address (Everitt, Landau, & Leese, 2001).

The proposed fuzzy clusterwise method is comparable to finite mixture latent growth curve models or Latent Growth Mixture Models (LGMM), which aim to identify clusters of relatively homogeneous individuals; the parameters of the latent growth curve model are estimated within each of those clusters (Bauer & Curran, 2003; Li, Duncan, & Duncan, 2001; Muthén & Shedden, 1999; Nagin, 1999). Under the assumption of multivariate normality of observed variables conditional upon cluster membership, LGMM seeks to maximize a likelihood function, typically by using the Expectation-Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). Under the additional assumption of correct model specification, LGMM provides asymptotically efficient parameter estimates. Moreover, a number of information criteria such as AIC and BIC can be calculated in LGMM, which may help determine the number of clusters.

However, the latent growth mixture modeling approach has limitations. For instance, it requires the correct specification of the covariance structure of repeated measures for valid estimation. As stated earlier, in most social science applications, it is difficult to determine the correct form of the population covariance structure in advance. Moreover, finite mixture approaches usually require very large samples for parameter estimation. It is known that a sample size of at least $M(M + 1)/2$ in each cluster is required for obtaining a positive definite covariance matrix within each cluster, where M is the number of variables (Wedel & Kamakura, 1998). Furthermore, the estimation procedures of finite mixture approaches typically suffer from slow convergence and are computationally intensive (DeSarbo, Grewal, & Hwang, 2006). Note that the multiple optima of the likelihood function in LGMM make it very important to reestimate the model many times using different starting values each time (e.g., 50–100 sets of starting values for complex models) in order to ensure that the global solution is obtained (Hipp & Bauer, 2006). In practice, this procedure may be an additional source of intensive computation in LGMM.
Lastly, the standard likelihood ratio statistic for nested model comparisons cannot be used in finite mixture models because it does not follow an asymptotic chi-square distribution due to the violation of the regularity properties of the likelihood function (cf. McLachlan & Peel, 2000). Although information criteria have been frequently used for model selection, they are also based on the same regularity properties as the standard likelihood ratio statistic (Wedel & Kamakura, 1998). On the other hand, the proposed method does not require the correct form of the population covariance matrix due to its adoption of GEE for parameter estimation. As will be seen in a later section regarding Monte Carlo analyses with synthetic data, the proposed method seems to perform well in small samples; it is less afflicted by the problem of slow convergence; and the computational burden of the proposed method is relatively minor compared to the finite mixture approach.

This paper is organized as follows. The next two sections briefly review the GCM-GEE approach and fuzzy clustering to facilitate the derivation of the proposed method. Monte Carlo simulation studies are carried out to evaluate the performance of the proposed method and an empirical application to antisocial behavior in children illustrates the usefulness of the proposed method. The final section is devoted to discussing several additional aspects of the proposed method as well as directions for future research.

THE GROWTH CURVE MODEL VIA GENERALIZED ESTIMATING EQUATIONS

Let $y_i$ denote a T by 1 vector of repeated measurements of individual i ($i = 1, \ldots, N$) on a single response variable across T time points. Let $x_i$ denote a P by 1 vector of time-invariant explanatory variables for individual i. The traditional GCM (Potthoff & Roy, 1964) is given by:

$$y_i = A B x_i + e_i, \tag{1}$$

where A is a T by D ($D = \delta + 1$) matrix of known basis functions such as polynomial functions of order $\delta$, which represents a temporal pattern of change in the response variable (see Biesanz, Deeb-Sossa, Papadakis, Bollen, & Curran (2004) for diverse examples of basis functions), B is a D by P matrix of unknown coefficients, and $e_i$ is a T by 1 vector of errors for individual i. It is assumed that $e_i \sim N(0, \Sigma_i)$. In Equation (1), the population covariance matrix $\Sigma_i$ is unconstrained, so that it contains $T(T - 1)/2$ distinct covariances and T variances among repeated assessments. As stated earlier, the unconstrained covariance matrix is often computationally less attractive than constrained ones. GEE can be utilized to estimate model parameters of Equation (1) under a variety of constrained covariance structures. When GEE is applied to GCM, the covariance matrix of $y_i$ is replaced by a "working" covariance matrix. Let $V_i$ denote the "working" covariance matrix for $y_i$, defined by:

$$V_i = \alpha R_i(\theta), \tag{2}$$

where $\alpha$ is a scale parameter, and $R_i(\theta)$ is a working correlation matrix for $y_i$. It is assumed that $R_i(\theta)$ is a function of a parameter vector $\theta$, which is the same across all individuals. $R_i(\theta)$ can take various forms, depending on which covariance structure is assumed in Equation (1). The covariance structures commonly available in GEE are independence, exchangeable, auto-regressive, and unconstrained. The independence covariance structure represents no correlations among repeated responses. The exchangeable (or compound symmetry) covariance structure indicates that all covariances between repeated measurements are identical, i.e., $\mathrm{corr}(y_{it}, y_{it'}) = \theta$ for all $t \neq t'$ ($t = 1, \ldots, T$). If time intervals are unequal (i.e., arbitrary number of observation times) across individuals or observations are not balanced across time points, the exchangeable covariance structure may be appropriate (Duncan et al., 1995; Li, Maddalozzo, Harmer, & Duncan, 1998). The auto-regressive covariance structure regards a correlation as a function of the time between two repeated measurements. The first order autoregressive (AR-1) structure (Jöreskog, 1970; Guttman, 1954) is typically adopted for time-series data (Duncan et al., 1995; Liang & Zeger, 1986), which is equivalent to $\mathrm{corr}(y_{it}, y_{it'}) = \theta^{|t - t'|}$ for all $t \neq t'$. The unconstrained structure leads to $T(T - 1)/2$ distinct correlations among repeated assessments. The unconstrained covariance structure appears suitable when the number of repeated assessments is small and is equal across individuals (Duncan et al., 1995; Ghisletta & Spini, 2004; Liang & Zeger, 1986).

In GEE, the "working" correlation matrix need not be correctly specified in order to obtain asymptotically consistent parameter estimates because GEE relies only on correct specification of the marginal expectation or mean model, treating covariances as nuisance parameters. If the working correlation matrix is correctly specified, however, the resultant parameter estimates are efficient. In addition, GEE provides asymptotically consistent covariance estimates of the parameter estimates even if the covariance matrix is misspecified.
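The parametric working correlation structures described in this section can be written down directly. The following Python sketch is illustrative only: the function names `working_correlation` and `working_covariance` and their arguments are assumptions introduced here, not part of the original method. The unconstrained structure is left out because it is estimated from residuals rather than built from a single parameter.

```python
import numpy as np

def working_correlation(T, structure, theta=0.0):
    """Return a T x T working correlation matrix R(theta).

    Hypothetical helper for illustration; `structure` is one of
    "independence", "exchangeable", or "ar1".
    """
    # lags[t, s] = |t - s|, the distance between two time points
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    if structure == "independence":
        return np.eye(T)
    if structure == "exchangeable":
        # corr(y_t, y_s) = theta for all t != s
        return np.where(lags == 0, 1.0, float(theta))
    if structure == "ar1":
        # corr(y_t, y_s) = theta ** |t - s|
        return float(theta) ** lags
    raise ValueError(f"unknown structure: {structure}")

def working_covariance(alpha, R):
    """Equation (2): V = alpha * R(theta), with alpha a scale parameter."""
    return alpha * R
```

For example, `working_correlation(4, "ar1", 0.5)` gives correlations 0.5, 0.25, and 0.125 at lags 1, 2, and 3, while the exchangeable structure with the same parameter gives 0.5 at every lag.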

FUZZY CLUSTERING

Traditional clustering methods such as k-means are based on the assumption of hard classification. That is, each individual belongs to only one of a set of mutually exclusive clusters with rigidly defined boundaries. In contrast, fuzzy clustering is a set of classification algorithms that use fuzzy set theory (Zadeh, 1965), which permits individuals to belong totally, partially, or not at all to a cluster due to the vagueness of cluster boundaries. More specifically, two assumptions underlie fuzzy clustering: (1) an individual can be assigned to more than one cluster, where the degree of membership in a cluster lies between 0 and 1, and (2) the sum of the memberships of an individual across all clusters must be equal to one (Wedel & Kamakura, 1998). The gradual degree of membership indicates the strength of the association between an individual and a cluster.

Let $u_{ci}$ denote a membership value for individual i in the c-th cluster ($c = 1, \ldots, C$), which satisfies the two assumptions, i.e., $0 \le u_{ci} \le 1$ and $\sum_{c=1}^{C} u_{ci} = 1$. Let m denote the prescribed fuzzy weight scalar, often called the 'fuzzifier' (Bezdek, 1974a), which influences the degree of fuzziness of the solution. The main purpose of fuzzy clustering algorithms is to estimate the fuzzy membership of individuals in a prescribed number of clusters. The algorithms typically arrive at partitioning individuals by minimizing the following criterion:

$$\phi = \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci}^{m} \, SS(y_i - B_c x_i), \tag{3}$$

with respect to $B_c$ and $u_{ci}$, where $SS(Z)$ indicates the Sum of Squares of Z. When $x_i = 1$ in Equation (3), this is equivalent to the criterion of the fuzzy c-means algorithm (Bezdek, 1974a; Dunn, 1974). On the other hand, when $x_i$ includes other predictor variables, Equation (3) becomes the criterion for fuzzy c-regression or fuzzy clusterwise regression (Hathaway & Bezdek, 1993; Wedel & Steenkamp, 1989). It is easy to see that when $C = 1$, Equation (3) is equivalent to the least squares criterion for ordinary regression analysis because $u_{ci} = 1$ for all individuals.

In fuzzy clustering, the value of the fuzzy weight m is typically decided a priori. In principle, any value of m can be chosen from the range $1 < m < \infty$. However, when $m \to 1$, all memberships converge to 0 or 1 (hard classification). When $m \to \infty$, all memberships approach $1/C$ (Arabie, Carroll, DeSarbo, & Wind, 1981). Although there have been some heuristic procedures to determine the value of m (e.g., McBratney & Moore, 1985; Okeke & Karnieli, 2006; Wedel & Steenkamp, 1989), there does not seem to exist any proven formal way of deciding m. In practice, $m = 2$ is the most popular choice in fuzzy clustering (Bezdek, 1981; Gordon, 1999; Hruschka, 1986; Wedel & Steenkamp, 1991).

In fuzzy clustering, a number of so-called cluster validity measures (Bezdek, 1981; Roubens, 1982) are used to decide how many clusters are inherent to the data or, equivalently, to determine C. Roubens (1982) recommended the Fuzziness Performance Index (FPI) and the Normalized Classification Entropy (NCE) as the most useful cluster validity measures for fuzzy clustering. The FPI and NCE are given by:

$$\mathrm{FPI} = 1 - (C \cdot \mathrm{PC} - 1)/(C - 1),$$

where PC is the Partition Coefficient (Bezdek, 1974b), defined as $\mathrm{PC} = \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} u_{ci}^{2}$, and

$$\mathrm{NCE} = \mathrm{PE} / \log C,$$

where PE is the Partition Entropy (Bezdek, 1974b), defined as $\mathrm{PE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} u_{ci} \log u_{ci}$. These measures indicate the separation status of clusters, i.e., how well the derived clusters are separated from each other. The smaller the values of FPI and NCE are, the more distinctly separated the clusters are from each other. Thus, the appropriate number of clusters should result in smaller values of FPI and NCE. Note that NCE and FPI can be used only when $C > 1$.

Moreover, the changes in the values of Equation (3) or other related measures (e.g., the average adjusted $R^2$ across clusters) may be graphically examined against the number of clusters in order to determine the optimum number of clusters (Wedel & Steenkamp, 1989). The number of clusters may be chosen as the 'elbow' point in the trajectory of the value of Equation (3) over clusters, beyond which no substantial changes in the value occur. Besides these heuristics, in practice, nonstatistical criteria for evaluating the usefulness and relevance of clusters (e.g., cluster size, interpretability, etc.) should also be considered for deciding the value of C (Arabie & Hubert, 1994; Wedel & Kamakura, 1998).
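The two cluster validity measures defined in this section are simple functions of the membership matrix. The sketch below is a minimal illustration, assuming memberships are stored as a C by N NumPy array `U` whose columns sum to one; the function names are hypothetical, and the term $0 \log 0$ is treated as 0 by convention.

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/N) * sum_i sum_c u_ci^2 for a C x N membership matrix U."""
    return np.sum(U ** 2) / U.shape[1]

def fpi(U):
    """Fuzziness Performance Index: 1 - (C * PC - 1) / (C - 1), C > 1."""
    C = U.shape[0]
    return 1.0 - (C * partition_coefficient(U) - 1.0) / (C - 1.0)

def nce(U):
    """Normalized Classification Entropy: PE / log C, C > 1."""
    C, N = U.shape
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(U > 0, U, 1.0)
    pe = -np.sum(U * np.log(safe)) / N
    return pe / np.log(C)
```

Both indices equal 0 for a perfectly crisp partition and 1 when every membership equals $1/C$, which matches the interpretation that smaller values indicate better separated clusters.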

FUZZY CLUSTERWISE GROWTH CURVE MODELS VIA GENERALIZED ESTIMATING EQUATIONS

In spite of its data analytic flexibility, the GCM-GEE methodology is designed for an aggregate sample analysis of repeated measures data. As such, the GCM-GEE approach is not appropriate for testing hypotheses about developmental trajectories for qualitatively distinct (unknown) population subgroups. In this section, an extension of the GCM-GEE methodology is therefore proposed that enables investigators to test such hypotheses regarding cluster-level heterogeneity for repeated measures data. Specifically, the method proposed herein aims to identify fuzzy clusters of individuals that are relatively homogeneous and, simultaneously, obtain the parameters of GCM-GEE within each of the clusters. This problem of combining GCM-GEE and fuzzy clustering in a single framework is equivalent to minimizing the following criterion:

$$\varphi = \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci}^{m} \, SS(y_i - A B_c x_i), \tag{4}$$

with respect to $B_c$, $V_{ci}$, and $u_{ci}$. The proposed method subsumes the GCM-GEE method as a special case where there is only one cluster ($C = 1$). We shall call this proposed methodology Fuzzy Clusterwise GCM-GEE (FCGG) hereafter.

To minimize Equation (4) for parameter estimation, two main optimization steps are repeated until convergence. In the first step, the GCM parameters for each cluster ($B_c$ and $V_{ci}$) are estimated given the fuzzy membership parameter ($u_{ci}$); in the second step, the fuzzy membership parameter is estimated for the fixed GCM parameters. The detailed description of the optimization procedures is provided in the Appendix. The proposed algorithm monotonically decreases the value of Equation (4), which, in turn, is bounded from below. The algorithm is therefore convergent. However, this does not guarantee that the convergence point is always the global minimum. To safeguard against local minima, two alternative procedures may be used. First, the fuzzy c-means algorithm can be applied to the data and the resultant memberships used as rational initial starts for $u_{ci}$. Second, the estimation procedures can be repeated with alternative random initial starts for $u_{ci}$, the criterion values after convergence compared, and the solution associated with the smallest criterion value selected as the final one.

In FCGG, the overall fit of a hypothesized model can be measured by:

$$\mathrm{FIT} = 1 - \frac{\sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci}^{m} \, SS(y_i - A B_c x_i)}{\sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci}^{m} \, SS(y_i)}. \tag{5}$$

This index indicates how much variance of the response variable is accounted for by the hypothesized (multiple-cluster) model. The values of FIT range from 0 to 1. The larger this value, the more variance is explained. FIT is a function of Equation (4) that summarizes the discrepancies between the model and the data. When $C = 1$ and $T = 1$, this index is equivalent to $R^2$.

Prior to clusterwise or disaggregate analyses, FCGG requires the specification of its model components such as the overall temporal pattern of repeated assessments (A) and the structure among population covariances over clusters ($V_{ci}$). In the situation where no substantive theory or previous experience is available for the model specification, the functional form of A may be decided by applying GCM-GEE (or equivalently FCGG with $C = 1$) to the aggregate sample, as A is assumed to be identical across clusters in FCGG. Similarly, the aggregate-level structure of population covariances is first selected by GCM-GEE, and in turn it may be used for subsequent clusterwise or disaggregate analyses under the assumption that the covariance structure is invariant over clusters (i.e., $V_{ci} = V_i$). This is a common practice in LGMM, where one imposes equality constraints on variance components over clusters (Bauer & Curran, 2003). Although this invariance assumption appears somewhat restrictive, it is acceptable in FCGG as the consistency of parameter estimates is not affected by a potential misspecification of $V_{ci}$. To select the final forms of A and $V_i$, the GCM-GEE method fits to the aggregate sample a broad range of growth curve models with different temporal patterns of change as well as different covariance structures. Because of the absence of the actual likelihood, however, it is infeasible to use likelihood-based information criteria such as AIC and BIC for model selection in GCM-GEE.
Alternatively, Pan's (2001) QIC (the Quasi-likelihood under the Independence model Criterion) may be used, which is a modification of AIC for GEE. In QIC, the likelihood function in AIC is replaced by the (log) quasi-likelihood function obtained under $R(\theta) = I$ and the penalty term is adjusted. The QIC is defined as:

$$\mathrm{QIC} = -2Q(B) + 2\,\mathrm{trace}(\Gamma^{-1} \Psi), \tag{6}$$

where $Q(B)$ is the value of the quasi-likelihood under the independence assumption computed by the GEE estimator of B based on any working correlations, and $\Gamma$ and $\Psi$ are the matrices of the naive and robust covariance estimates of B, respectively (refer to the Appendix for the computation of these matrices). The second term in Equation (6) reflects the degree of the differences between the naive and robust covariance estimates of B, which indicates how consistent the working covariance matrix is with the true covariance matrix (Zeger & Liang, 1986). For GCM-GEE, specifically, the quasi-likelihood under the independence assumption is given by:

$$-\sum_{i=1}^{N} (y_i - A B x_i)'(y_i - A B x_i)/2 \tag{7}$$

(see McCullagh & Nelder, 1989, p. 326). Like the AIC model selection heuristic, a model that minimizes QIC is regarded as the most appropriate one among fitted models. Note that QIC is used to decide on the temporal pattern of change and the covariance structure of repeated measures when $C = 1$.

To decide the number of clusters inherent to the data, like other fuzzy clustering methods, FCGG also utilizes cluster validity measures such as FPI and NCE. Moreover, the (inverted) elbow point in the trend of the FIT values over clusters may be examined, as FIT is proportional to the value of Equation (4) and is easier to interpret than Equation (4). However, no formal investigation of the performance of FIT has been carried out. (This will be tested with synthetic data in the next section.) FCGG adopts $m = 2$ as the default value of the fuzzy weight, which is the most popular choice for fuzzy clustering methods.
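The two-step alternating scheme and the FIT index of Equation (5) can be illustrated with a simplified sketch. It is not the procedure of the Appendix: it assumes an independence working correlation, so the update of $B_c$ reduces to weighted least squares, and it uses the standard fuzzy c-regression membership update with random starts. The function names, argument layout (Y is T by N, X is P by N), and stopping rule are assumptions made for this illustration.

```python
import numpy as np

def fcgg_independence_sketch(Y, X, A, C, m=2.0, n_iter=500, tol=1e-10, seed=0):
    """Alternate between B_c and u_ci updates to reduce Equation (4).

    Assumption: independence working correlation (no V_ci is estimated).
    Returns the list of B_c (each D x P), U (C x N), and the criterion.
    """
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    U = rng.random((C, N))
    U /= U.sum(axis=0)                      # memberships sum to one
    A_pinv = np.linalg.pinv(A)              # (A'A)^{-1} A'
    prev = np.inf
    for _ in range(n_iter):
        W = U ** m
        # Step 1: weighted least squares for each cluster,
        # B_c = (A'A)^{-1} A' Y W_c X' (X W_c X')^{-1}
        B = [A_pinv @ Y @ (X * W[c]).T @ np.linalg.pinv(X @ (X * W[c]).T)
             for c in range(C)]
        # Squared residual SS(y_i - A B_c x_i) for every cluster and person
        D2 = np.stack([np.sum((Y - A @ Bc @ X) ** 2, axis=0) for Bc in B])
        D2 = np.maximum(D2, 1e-12)          # guard against division by zero
        # Step 2: membership update, u_ci proportional to d_ci^(-1/(m-1))
        inv = D2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
        crit = float(np.sum(U ** m * D2))   # value of Equation (4)
        if prev - crit < tol:
            break
        prev = crit
    return B, U, crit

def fit_index(Y, U, crit, m=2.0):
    """Equation (5): 1 - criterion / sum_c sum_i u_ci^m SS(y_i)."""
    total = np.sum(U ** m * np.sum(Y ** 2, axis=0))
    return 1.0 - crit / total
```

In practice the sketch would be run from several random seeds, keeping the solution with the smallest criterion value, in line with the safeguard against local minima described earlier in this section.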

MONTE CARLO SIMULATIONS

Two Monte Carlo simulation studies were conducted to investigate the performance of the proposed method. The first study focuses on the recovery of clusterwise coefficient parameters and cluster memberships in small samples. The second study evaluates the usefulness of the cluster validity measures (FPI, NCE, and FIT) for the proposed method. For both studies, $T = 4$, $P = 3$, $C = 2$, and $m = 2$ were selected. Moreover, $B_c$ and $\Sigma_{ci}$ were chosen as follows:

$$B_1 = \begin{bmatrix} 3 & .7 & .3 \\ 1 & .4 & .2 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 & .6 & .8 \\ .5 & .2 & .4 \end{bmatrix},$$

$$\Sigma_{1i} = \begin{bmatrix} 3 & & & \\ .6 & 3 & & \\ .4 & .6 & 3 & \\ .2 & .4 & .6 & 3 \end{bmatrix}, \quad \Sigma_{2i} = \begin{bmatrix} 2 & & & \\ .3 & 2 & & \\ .2 & .3 & 2 & \\ .1 & .2 & .3 & 2 \end{bmatrix}. \tag{8}$$

As seen above, the structure of $\Sigma_{ci}$ corresponds with an AR-1 process. Furthermore, A is specified as a matrix of orthogonal polynomials of order 1 across all clusters, which represents a linear change of repeated assessments over time:

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -3 & -1 & 1 & 3 \end{bmatrix}'.$$

In the first simulation study, sample sizes varied as follows: N = 20, 50, and 100. For each sample size, $x_{ci}$ was drawn from the uniform distribution once and was considered fixed. The first element of $x_{ci}$ was always one, representing an intercept term in GCM-GEE. Then, $y_{ci}$ was drawn from a multivariate normal distribution $N(A B_c x_{ci}, \Sigma_{ci})$. The size of each cluster was equal to $N/2$. Five hundred Monte Carlo samples were generated for each sample size. MATLAB code was written to implement the estimation procedures of the proposed method.¹ Given $x_{ci}$ and A, each Monte Carlo sample was analyzed under the specifications of four different covariance structures (Independence, Exchangeable, AR-1, and Unconstrained). The proposed method was found to converge in all cases with the rational starts for the membership parameters.

Table 1 provides the mean parameter estimates and the absolute values of relative biases ($100 \times |\text{estimate} - \text{parameter}| / \text{parameter}$) obtained under different sample sizes and covariance structures. Table 1 also provides the mean congruence coefficient (Tucker, 1951) between parameters and their estimates as a global measure of parameter recovery. The congruence coefficient is calculated as follows: Let $\theta$ and $\rho$ denote the vectors of the parameters and the estimates obtained from a single Monte Carlo sample, respectively. Then, the congruence coefficient is $(\theta' \rho) / (\sqrt{\theta' \theta} \sqrt{\rho' \rho})$. The congruence coefficient indicates the degree of similarity between parameters and their estimates. Furthermore, Table 1 presents the average rate of misclassification of individuals under different sample sizes and covariance structures.
In order to compute the misclassification rate per sample, the estimated fuzzy membership values of individuals were converted into binary membership values, that is, 1 was assigned to the cluster associated with the highest fuzzy membership value and 0 to the other clusters. The parameter estimates obtained under the four different covariance structures were akin to each other within the same sample size, thus yielding quite 1 The

MATLAB code for FCGG may be obtained by contacting the first author.

244 TABLE 1 Mean Parameter Estimates and Absolute Values of Relative Biases Obtained from the Simulation Study N D 20 Indep PAR

EST

3.0 .7 .3 1.0 .4 .2 1.0 .6 .8 .5 .2 .4 CC MR

2.96 .70 .36 .93 .47 .21 .98 .64 .88 .50 .12 .38

Exch

Bias

EST

1.3 .0 20.0 7.0 17.5 5.0 2.0 16.7 10.0 .0 40.0 5.0 .82 10.36

2.96 .77 .28 .93 .47 .25 .99 .60 .84 .53 .08 .38

N D 50 AR-1

Bias

EST

1.3 10.0 6.7 7.0 17.5 25.0 1.0 .0 5.0 6.0 60.0 5.0 .83 10.61

2.98 .76 .29 .95 .46 .20 .96 .78 .79 .53 .14 .34

Uncon

Bias

EST

0.7 8.6 3.3 5.0 15.0 .0 4.0 30.0 1.3 6.0 30.0 15.0 .83 10.88

2.96 .75 .32 .90 .47 .21 1.01 .72 .79 .53 .04 .35

CC D Average congruence coefficient. MR D Average misclassification rate (%).

Indep

Bias

EST

1.3 7.1 6.7 10.0 17.5 5.0 1.0 20.0 1.3 6.0 80.0 12.5 .81 13.45

3.07 .58 .31 1.02 .36 .18 1.11 .52 .73 .38 .31 .44

Bias

2.3 17.1 3.3 2.0 10.0 10.0 11.0 13.3 8.75 24.0 55.0 10.0 .91 4.84

Exch EST

Bias

3.07 2.3 .68 2.9 .22 26.7 1.03 3.0 .36 10.0 .19 5.0 1.14 14.0 .49 18.3 .71 11.3 .38 24.0 .31 55.0 .44 10.0 .91 4.74

N D 100 AR-1

EST

Bias

3.06 2.0 .70 .0 .18 40.0 1.02 2.0 .38 5.0 .16 20.0 1.13 13.0 .50 16.7 .73 8.8 .37 26.0 .32 60.0 .43 7.5 .91 4.67

Uncon EST

Bias

3.05 1.7 .64 8.6 .29 3.3 1.02 2.0 .37 7.5 .16 20.0 1.13 13.0 .49 18.3 .75 6.3 .36 28.0 .33 65.0 .45 12.5 .91 4.75

Indep EST

Bias

3.03 1.0 .69 1.4 .28 6.7 1.01 1.0 .39 2.5 .18 10.0 1.02 2.0 .61 1.7 .78 2.5 .47 6.0 .20 .0 .40 .0 .95 2.17

Exch EST

Bias

3.01 0.3 .70 .0 .31 3.3 1.01 1.0 .38 5.0 .19 5.0 1.03 3.0 .60 .0 .78 2.5 .45 10.0 .21 5.0 .41 2.5 .95 2.20

AR-1 EST

Bias

3.03 1.0 .67 4.3 .27 10.0 1.01 1.0 .40 .0 .20 .0 1.00 .0 .62 3.3 .80 .0 .47 6.0 .20 .0 .41 2.5 .95 2.22

Uncon EST

Bias

3.04 4.0 .69 1.4 .27 10.0 1.00 .0 .39 2.5 .19 5.0 1.04 4.0 .61 1.7 .76 5.0 .44 12.0 .21 5.0 .43 7.5 .95 2.31

CLUSTERWISE GROWTH CURVE MODEL

245

similar congruence coefficients (Table 1). This suggests that misspecification of the population covariance structure did not substantially influence the recovery of parameters in the proposed method. When N = 50, the mean congruence coefficient exceeded .90 under all covariance structures, which is a conventional rule-of-thumb criterion for an acceptable degree of congruence (Mulaik, 1972). Moreover, the average rate of misclassification was lower than 5% under all covariance structures. However, the estimates of many parameters involved rather large relative biases. On the other hand, when N = 100, the mean congruence coefficient was around .95 and the estimates of the parameters had small relative biases under every covariance structure. Moreover, the average rate of misclassification was about 2% under all covariance structures. Thus, the proposed method seems to recover parameters very well with sample sizes of 100.

The second simulation study investigated how well the three cluster validity measures identified the true number of clusters in the data. For this study, one hundred Monte Carlo samples were generated from the same normal distributions as in the first study with N = 100. The proposed method was applied to the samples under varying numbers of clusters from 1 to 10. The mean values of the three measures across the different numbers of clusters are exhibited in Figure 1. It is apparent that the trend of the mean FIT values had an (inverted) elbow point at C = 2 and the mean values of both FPI and NCE were minimized at C = 2. Thus, the three model selection heuristics seem to perform well for revealing the number of clusters in these synthetic data.

FIGURE 1 The mean values of FIT, FPI, and NCE for different multiple-cluster growth curve models for synthetic data.
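As a rough illustration of how the two membership-based validity measures behave, the sketch below computes FPI and NCE from a fuzzy membership matrix. It assumes the usual Roubens (1982) forms, FPI = 1 − (C·PC − 1)/(C − 1) with PC the mean sum of squared memberships, and NCE = PE/log C with PE the mean partition entropy; the definitions given earlier in the paper take precedence if they differ. The function name `fpi_nce` is hypothetical.

```python
import numpy as np

def fpi_nce(U):
    """Fuzziness performance index and normalized classification entropy.

    U is a (C, N) membership matrix with C >= 2 whose columns sum to one.
    Formulas are the assumed Roubens (1982) versions, not copied from the paper.
    """
    U = np.asarray(U, dtype=float)
    C, N = U.shape
    pc = np.sum(U ** 2) / N                      # partition coefficient
    fpi = 1.0 - (C * pc - 1.0) / (C - 1.0)
    safe = np.where(U > 0, U, 1.0)               # makes 0 * log(0) contribute 0
    pe = -np.sum(U * np.log(safe)) / N           # partition entropy
    nce = pe / np.log(C)
    return fpi, nce
```

Under these definitions both measures equal 0 for a crisp partition and 1 when every membership is 1/C, which is why smaller values point to a better-separated solution.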

EMPIRICAL APPLICATION: ANTISOCIAL BEHAVIOR DATA

The Data

The present data are part of the National Longitudinal Survey of Youth (NLSY) reported in Curran (1998). In the NLSY, a large sample of children and their mothers were administered a set of assessment instruments every other year from 1986 to 1992. From the original NLSY sample, Curran (1998) selected 221 pairs of children and mothers based on three selection criteria. First, children had to be between 6 and 8 years of age at the first time point of assessment. Second, they had to complete interviews at all four time points. Finally, only one biological child was considered from each mother. The average child’s age was 6.9 years (SD = .62) and the average mother’s age was 25.5 years (SD = 1.87) at the first time point.

In these data, the antisocial behavior of children was repeatedly measured over the four equally spaced time points. Antisocial behavior was measured as the sum of the mother’s responses to 6 items from the Behavior Problems Index antisocial behavior subtest, and could range in value from 0 to 12. Descriptive statistics of the antisocial behavior variable measured at the four occasions are provided in Table 2. Besides the repeated assessments of antisocial behavior, there were two time-invariant variables: gender and cognitive stimulation for children at home. They were measured once at the initial time point. A measure of cognitive stimulation was obtained as the sum of the mother’s responses to 14 items in the cognitive stimulation subscale of the Home Observation for Measurement of the Environment-Short Form (HOME-SF). The scores of this measure could range from 0 to 14. The mean score of cognitive stimulation was 9.10 (SD = 2.46). Gender was dummy coded (female = 0 and male = 1). About 48% of children were female. The same MATLAB code used in the simulation studies was used for the analyses of these data.

TABLE 2 Descriptive Statistics of the Repeated Measures of Antisocial Behavior at Four Time Points

          Mean    SD      Min   Max   Skewness (SE)   Kurtosis (SE)
Time 1    1.49    1.54    0     7     1.26 (.16)      1.66 (.33)
Time 2    1.84    1.79    0     9     1.02 (.16)      .65 (.33)
Time 3    1.88    1.80    0     10    1.11 (.16)      1.45 (.33)
Time 4    2.07    2.09    0     9     .98 (.16)       .40 (.33)

Aggregate, Single-Cluster Analyses

At first, aggregate sample analyses in which no cluster-level heterogeneity was taken into account (i.e., C = 1) were conducted to identify an average intra-individual change in antisocial behavior over time and to specify the covariance structure of the repeated assessments of antisocial behavior. Moreover, the aggregate-level effects of time-invariant explanatory variables on the average temporal change were investigated. For the aggregate analyses, antisocial behavior was used as a response variable while gender and cognitive stimulation were employed as time-invariant explanatory variables.

As shown in Table 2, the mean levels of antisocial behavior were found to increase monotonically over the four time points. This suggested that there existed a linear trend of change in antisocial behavior over the four assessments. Moreover, in these data, the level of antisocial behavior was measured at only four time points. This relatively small number of measurements seems insufficient to reveal a complex non-linear temporal trend in trajectories of the response variable (MacCallum, Kim, Malarkey, & Kiecolt-Glaser, 1997). Furthermore, previous studies with the same data supported such a linear trend of change (Curran & Bollen, 1999; Hwang & Takane, 2004). Accordingly, a linear-trend GCM was initially assumed for the data. Additional growth curve models with two different temporal patterns of change, namely a quadratic trend and stability over time (i.e., no time-dependent change) in antisocial behavior, were also considered for the purpose of model comparison. (The exact forms of A for the three models are described below.) Moreover, four different types of covariance structures were specified for each of the aggregate growth curve models: Independence, Exchangeable, AR-1, and Unconstrained. The GCM-GEE methodology was applied to fit the aggregate-level models. Table 3 provides the QIC values of the fitted models.
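To make the four working covariance structures concrete, the following sketch builds the corresponding working correlation matrices for four repeated measures, following the standard GEE definitions in Liang and Zeger (1986). The correlation value `rho = 0.5` and the function name `working_corr` are illustrative assumptions, not values from the paper. The unconstrained structure imposes no pattern and is estimated freely, so here it simply returns whatever estimate the caller supplies.

```python
import numpy as np

def working_corr(kind, T=4, rho=0.5, estimate=None):
    """Return a T x T working correlation matrix of the named kind."""
    idx = np.arange(T)
    if kind == "independence":
        return np.eye(T)
    if kind == "exchangeable":
        # one common correlation rho between every pair of occasions
        return (1 - rho) * np.eye(T) + rho * np.ones((T, T))
    if kind == "ar1":
        # correlation decays as rho**|j - k| with the lag between occasions
        return rho ** np.abs(idx[:, None] - idx[None, :])
    if kind == "unconstrained":
        # no structure imposed: the caller provides a symmetric estimate
        if estimate is None:
            raise ValueError("unconstrained structure needs an estimate")
        return np.asarray(estimate, dtype=float)
    raise ValueError(f"unknown structure: {kind}")

R_ar1 = working_corr("ar1")
R_exch = working_corr("exchangeable")
```

In the AR-1 case only adjacent occasions share the full correlation `rho`, which is the pattern the QIC comparison can favor or reject against the alternatives.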
In Table 3, model 1 posited that the level of antisocial behavior varied in a quadratic fashion over the four assessments. In model 1, A was prespecified as a matrix of polynomials of order 2 as follows:

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 4 \\ 1 & 4 & 16 \\ 1 & 6 & 36 \end{bmatrix} \tag{9}
\]
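Equation (9) is a second-order polynomial basis evaluated at times 0, 2, 4, and 6. A minimal sketch of how such a matrix could be generated is given below, assuming NumPy and a hypothetical helper name `growth_basis`; lower-order bases are simply its leading columns.

```python
import numpy as np

def growth_basis(times, order):
    """Columns are times**0, times**1, ..., times**order."""
    return np.vander(np.asarray(times), N=order + 1, increasing=True)

times = [0, 2, 4, 6]                     # origin placed at the first assessment
A_quadratic = growth_basis(times, 2)     # same entries as Equation (9)
A_linear = growth_basis(times, 1)        # first two columns only
A_stable = growth_basis(times, 0)        # a single column of ones
```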


TABLE 3 Summary of Fit for Various Aggregate Growth Curve Models with Four Different Covariance Structures for the Antisocial Behavior Data

Temporal Pattern       Covariance Structure   QIC
Model 1 (Quadratic)    Independence           2,706.1
                       Exchangeable           2,704.0
                       AR-1                   2,701.9
                       Unconstrained          2,704.9
Model 2 (Linear)       Independence           2,707.1
                       Exchangeable           2,702.5
                       AR-1                   2,701.2
                       Unconstrained          2,702.7
Model 3 (Stable)       Independence           2,752.3
                       Exchangeable           2,744.6
                       AR-1                   2,746.2
                       Unconstrained          2,748.5

To keep years as the unit of time, the elements of the second column (i.e., the linear components) of A increase by 2, placing the origin of time at the first time point. Model 2 represented the linear-trend GCM, which was initially assumed for the data. This model specified A as a matrix consisting of the first two columns in Equation (9) (i.e., the first-order polynomials). Model 3 assumed no change, or stability, in the level of antisocial behavior over time, in which A was specified as a matrix of polynomials of zero order. In addition, the four types of covariance structures of the repeated assessments of antisocial behavior were specified for each of the three growth curve models.

The linear-trend GCM with the AR-1 covariance structure was chosen as the final aggregate model because it had the smallest QIC value. This final model accounted for about 54% of the variance of the response variable (FIT = .541). It posited that the level of antisocial behavior increased in a linear manner over the four assessments. As stated above, this linear pattern of change in antisocial behavior was reported in other aggregate analyses of the same data (Curran & Bollen, 1999; Hwang & Takane, 2004). However, the present GCM-GEE analysis additionally indicated that the covariances among the four repeated measurements of antisocial behavior were likely to be structured in a first-order autoregressive or simplex pattern. This directly implies that the level of antisocial behavior at time point t is affected by that at time point t − 1.

The GEE estimates of the coefficients obtained from the best fitting aggregate model, along with their robust standard errors in parentheses, are shown in Table 4. The first column of the table, labeled Initial, exhibits the effects of the time-invariant explanatory variables on antisocial behavior at the initial status, and the second column, labeled Linear, displays the effects of the explanatory variables on the linear growth rate of antisocial behavior over time.

TABLE 4 The GEE Coefficient Estimates and Their Robust Standard Errors in Parentheses of the Final Aggregate Growth Curve Model for the Antisocial Behavior Data

                         Initial        Linear
Intercept                1.68 (.35)     .30 (.04)
Gender                   .56 (.10)      .05 (.04)
Cognitive stimulation    −.05 (.19)     −.02 (.01)

The estimated mean intercept was equal to 1.68 (SE = .35) and the mean growth rate was equal to .30 (SE = .04). This indicates that there existed a substantial level of antisocial behavior at the initial assessment and a significant growth of antisocial behavior over the four time points. The effect of gender on the initial status of antisocial behavior was .56 (SE = .10), suggesting that boys showed a higher level of antisocial behavior than girls at the initial status. The effect of gender on the growth rate was .05 (SE = .04), indicating that boys tended to increase antisocial behavior at a slightly higher rate than girls over the four time points. However, this effect was not significant. The effects of cognitive stimulation on the initial status and the growth rate of antisocial behavior were equal to −.05 (SE = .19) and −.02 (SE = .01), respectively. This indicates lower initial levels of antisocial behavior, in addition to a slower growth rate of antisocial behavior, for children receiving higher levels of cognitive stimulation at home compared to those receiving lower levels. However, the effect of cognitive stimulation on the initial status was not significant.

Multiple-Cluster Analyses

To take into account cluster-level heterogeneity, the proposed clusterwise method was applied to fit the same linear-trend GCM with the AR-1 covariance structure to the data with varying numbers of clusters. In these multiple-cluster analyses, FCGG was found to converge relatively quickly under all numbers of clusters; for instance, it converged in 10 iterations when C = 2 and in 210 iterations when C = 10 using the rational starts for the membership parameters.
Figure 2 provides the values of the three cluster validity measures for different multiple-cluster linear-trend GCMs with the AR-1 covariance structure. As shown in the figure, the values of FIT increased only gradually beyond C = 2, suggesting that no substantial improvements in FIT were obtained by having more than two clusters. Moreover, the minimum values of FPI and NCE were obtained at C = 2. Thus, C = 2 was adopted for further analyses.

FIGURE 2 The values of FIT, FPI, and NCE for different multiple-cluster growth curve models for the antisocial behavior data.

This two-cluster model accounted for about 71% of the variance of the response variable (FIT = .707). Table 5 provides the GEE coefficient estimates obtained from the two-cluster analysis along with their robust standard errors in parentheses. In order to highlight distinct characteristics of the obtained solutions across the two clusters, statistically significant estimates are emphasized for interpretational purposes here.

TABLE 5 The GEE Coefficient Estimates and Their Robust Standard Errors in Parentheses of the Two-Cluster Growth Curve Model for the Antisocial Behavior Data

                           Initial        Linear
Cluster 1 (n1 = 139)
  Intercept                1.34 (.24)     .05 (.03)
  Gender                   .28 (.06)      .01 (.02)
  Cognitive stimulation    .06 (.13)      .00 (.01)
Cluster 2 (n2 = 82)
  Intercept                1.80 (.55)     .58 (.07)
  Gender                   .66 (.15)      .04 (.06)
  Cognitive stimulation    .04 (.29)      −.04 (.02)

The estimated intercept and growth rate were 1.34 (SE = .24) and .05 (SE = .03), respectively, in cluster 1. This indicates that there was a significant level of antisocial behavior at the initial status while there existed no significant increase in antisocial behavior over the four assessments. On the other hand, the intercept and growth rate estimates in cluster 2 were 1.80 (SE = .55) and .58 (SE = .07), respectively. This suggests a substantial level of antisocial behavior at the initial status and a significant linear growth over the four time points. Thus, the two clusters involved substantively different temporal patterns of change in antisocial behavior: The individuals in cluster 1 were likely to show a relatively low level of antisocial behavior at the initial assessment and little change in the level of antisocial behavior over the four time points. On the other hand, those in cluster 2 showed a relatively high level of antisocial behavior at the first assessment and a significant rate of increase in antisocial behavior across the four time points.

It was also shown that in cluster 1, gender had a significant and positive effect on the initial level of antisocial behavior (.28, SE = .06), indicating that boys displayed a higher level of antisocial behavior than girls at the initial status. No variables showed significant impacts on the growth rate of antisocial behavior. On the other hand, in cluster 2, gender had a significant and positive effect on the initial level of antisocial behavior (.66, SE = .15), again suggesting a higher level of antisocial behavior by boys at the initial status compared to girls. In this cluster, cognitive stimulation exhibited a significant and negative impact on the growth rate of antisocial behavior (−.04, SE = .02). This suggests that children who were cognitively more stimulated at home increased antisocial behavior at a lower rate over the four assessments. Thus, in the first cluster, only gender had a significant impact on the initial level of antisocial behavior. On the other hand, in the second cluster, gender had a significant effect on the initial level of antisocial behavior while cognitive stimulation displayed a significant influence on the increase of antisocial behavior over time. When respondents were classified based on a membership cut-off point of .5, the sizes of clusters 1 and 2 were 139 (63%) and 82 (37%), respectively.

To summarize, FCGG was used to fit multiple-cluster linear-trend growth curve models with the AR-1 covariance structure to the antisocial behavior data. The number of clusters was determined to be C = 2 based on the cluster validity heuristics. The two-cluster linear-trend model estimated by the proposed method seemed to reveal substantively distinct developmental trajectories in antisocial behavior within the two groups of children. Moreover, the influences of the time-invariant explanatory variables on the developmental pathways were shown to be different across the two clusters. Importantly, these cluster-specific insights were missed completely by the aggregate GCM-GEE approach. Therefore, the proposed method appears to be helpful in studying qualitatively different longitudinal processes on antisocial behavior and different consequences of its antecedents across two relatively heterogeneous subgroups of children.

Finally, it is worthwhile to mention that the results of the above analyses should be interpreted with caution, as the normality assumption underlying the proposed method may not be valid for the antisocial behavior data, as alluded to in Table 2, although the same assumption has also been made for the data in the previous studies (e.g., Curran & Bollen, 1999; Hwang & Takane, 2004).
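The significance statements in this section are consistent with a simple two-sided Wald test that divides each estimate by its robust standard error. The paper does not state which test was used, so the sketch below is an assumption: it treats |estimate / SE| ≥ 1.96 as significant at the 5% level and applies a hypothetical helper `wald_z` to a few of the rounded values reported for Tables 4 and 5.

```python
def wald_z(estimate, se, critical=1.96):
    """Return (z, significant) for a two-sided Wald test (assumed 5% level)."""
    z = estimate / se
    return z, abs(z) >= critical

# (label, estimate, robust SE), using values as reported in the text
rows = [
    ("aggregate: mean growth rate", 0.30, 0.04),
    ("aggregate: gender on growth rate", 0.05, 0.04),
    ("cluster 1: growth rate", 0.05, 0.03),
    ("cluster 2: growth rate", 0.58, 0.07),
    ("cluster 2: cognitive stimulation on growth rate", -0.04, 0.02),
]
for label, est, se in rows:
    z, sig = wald_z(est, se)
    print(f"{label}: z = {z:.2f}, significant = {sig}")
```

Because the reported values are rounded to two decimals, ratios that land close to the critical value should be read with caution.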

CONCLUDING REMARKS

An extension of the traditional aggregate-level growth curve model was proposed that simultaneously classifies individuals into heterogeneous groups and estimates parameters based on generalized estimating equations for each of the derived groups. The proposed method permits investigators to examine intercluster differences in developmental trajectories of time-dependent measures and, at the same time, their relations to time-invariant variables. In addition, it allows the specification of an increased variety of population covariance structures for repeated measures data. The performance of the proposed method was evaluated with synthetic data with different sample sizes. The analyses of synthetic data demonstrated the effectiveness of the proposed method in recovering parameters within clusters and determining the number of clusters, especially in small samples. In addition, the usefulness of the proposed method was empirically demonstrated by the analysis of longitudinal data on antisocial behavior of children. The proposed method was able to identify two clusters which involved qualitatively different longitudinal processes on antisocial behavior of a sample of children and different consequences of its antecedents.

In spite of its clear technical and empirical implications, the proposed method has limitations as well. One limitation comes from the fact that the proposed method is based on the traditional fixed-effects growth curve model, which focuses on modeling the average or intra-individual change of trajectories over time within heterogeneous subgroups of the population. Technically, the fixed-effects GCM is beneficial when combined with GEE because it is quite compatible with the marginal modeling approach of GEE. However, the proposed method is not suitable for illuminating inter-individual variations in trajectories of change, which may be of substantive importance in some longitudinal studies (Raudenbush, 2001). If its data requirements, such as a large sample and correct model specification, are met, LGMM may be an alternative for testing both intra-individual change over time and inter-individual differences in the temporal change of repeated measures data within different subgroups of the population. On the other hand, the proposed method may be a sensible choice when modeling the average temporal change in repeated measures data and their relations to predictors across different subgroups of the population is the primary interest of study (see Diggle et al. (2002) and Ghisletta and Spini (2004) for such situations where inferences about the average change of trajectories are the main objective of longitudinal studies).

Another potential disadvantage of the proposed method is that its parameter estimates are not efficient unless the correct specification of the population covariance matrix is ensured. This lack of efficiency, which stems from the reliance on GEE methodology, may be regarded as a trade-off with respect to the flexibility of the method in terms of model specification and computation. Again, when the correct specification of the population covariance matrix is guaranteed, it may often be more beneficial to apply LGMM because of its effectiveness for testing inter-individual differences as well as intra-individual change of longitudinal data.

A number of future research topics may be further considered to extend the capabilities of the proposed method. For example, although it is not a problem unique to this method, more formal rules are needed for determining the fuzzy weight. Similarly, a fruitful research area would be to develop a more confirmatory way of selecting the number of clusters in fuzzy clustering, as some ad hoc decisions still need to be made with current cluster validity measures. For instance, the graphical judgment on the (inverted) elbow point of FIT may often be subjective. Nonparametric procedures such as bootstrap scree tests (Hong, Mitchell, & Harshman, 2006) may be adopted for addressing this issue. Moreover, the proposed method is currently geared for the analysis of continuous longitudinal variables. It may be interesting to apply a similar approach to discrete longitudinal variables. Furthermore, it may be of interest to examine hypotheses concerning model parameters within/across clusters. For instance, one may want to investigate whether certain coefficients in $B_c$ are identical to each other across clusters. Such hypotheses may be tested in the proposed method by imposing linear constraints (e.g., equality or zero constraints) on parameters (e.g., Böckenholt & Takane, 1994; Takane, Yanai, & Mayekawa, 1991). Incorporating linear constraints into the proposed method would be an important extension, allowing the evaluation of a variety of hypotheses on parameters within/across clusters.

ACKNOWLEDGMENTS

The work reported in this paper was supported by Grant 290439 and Grant A6394 from the Natural Sciences and Engineering Research Council of Canada to the first and second authors, respectively. The authors wish to thank the Editor and two anonymous reviewers for their constructive comments which helped improve the overall quality and readability of this manuscript.


REFERENCES

Arabie, P., Carroll, J. D., DeSarbo, W. S., & Wind, J. (1981). Overlapping clustering: A new method for product positioning. Journal of Marketing Research, 18, 310–317.
Arabie, P., & Hubert, L. (1994). Cluster analysis in marketing research. In R. P. Bagozzi (Ed.), Advanced methods of marketing research (pp. 160–189). Oxford: Blackwell.
Bagozzi, R. P. (1982). A field investigation of causal relations among cognition, affect, intentions, and behavior. Journal of Marketing Research, 19, 562–584.
Bauer, D. J., & Curran, P. J. (2003). Distributional assumptions of growth mixture models: Implications for over-extraction of latent trajectory classes. Psychological Methods, 8, 338–363.
Bezdek, J. C. (1974a). Numerical taxonomy with fuzzy sets. Journal of Mathematical Biology, 1, 57–71.
Bezdek, J. C. (1974b). Cluster validity with fuzzy set. Journal of Cybernetics, 3, 58–72.
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Bezdek, J. C., Coray, C., Gunderson, R., & Watson, J. (1981). Detection and characteristics of cluster substructure. II. Fuzzy c-varieties and convex combinations thereof. SIAM Journal on Applied Mathematics, 40, 358–372.
Biesanz, J. C., Deeb-Sossa, N., Papadakis, A. A., Bollen, K. A., & Curran, P. J. (2004). The role of coding time in estimating and interpreting growth curve models. Psychological Methods, 9, 30–52.
Böckenholt, U., & Takane, Y. (1994). Linear constraints in correspondence analysis. In M. J. Greenacre & J. Blasius (Eds.), Correspondence analysis in social sciences (pp. 112–127). London: Academic Press.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park: Sage Publications.
Cheong, Y. F., Fotiu, R. P., & Raudenbush, S. W. (2001). Efficiency and robustness of alternative estimators for two- and three-level models: The case of NAEP. Journal of Educational and Behavioral Statistics, 26, 411–429.
Curran, P. J. (1998). Introduction to hierarchical linear models of individual growth: An applied example using the SAS data system. Paper presented at the first international institute on developmental science, University of North Carolina, Chapel Hill.
Curran, P. J., & Bollen, K. A. (1999). A hybrid latent trajectory model of stability and change: Applications in developmental psychology. Paper presented at the biennial meeting of the society for research on child development, Albuquerque, New Mexico.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM-algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
DeSarbo, W. S., & Cron, W. L. (1988). A conditional mixture maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5, 249–289.
DeSarbo, W. S., Grewal, R., & Hwang, H. (2006). A clusterwise bilinear multidimensional scaling methodology for marketing research: An application to the estimation of strategic groups. Unpublished manuscript.
DeSarbo, W. S., Wedel, M., Vriens, M., & Ramaswamy, V. (1992). Latent class metric conjoint analysis. Marketing Letters, 3, 273–288.
Diggle, P. J., Heagerty, P., Liang, K.-Y., & Zeger, S. L. (2002). Analysis of longitudinal data. Oxford: Oxford University Press.
Duncan, T. E., Duncan, S. C., Hops, H., & Stoolmiller, M. (1995). An analysis of the relationship between parent and adolescent marijuana use via generalized estimating equation methodology. Multivariate Behavioral Research, 30, 317–339.


Dunn, J. C. (1974). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3, 32–57.
Everitt, B. S., Landau, S., & Leese, M. (2001). Cluster analysis (4th ed.). London: Arnold Press.
Ghisletta, P., & Spini, D. (2004). An introduction to generalized estimating equations and an application to assess selectivity effects in a longitudinal study on very old individuals. Journal of Educational and Behavioral Statistics, 29, 412–437.
Goldstein, H. I. (1987). Multilevel models in educational and social research. London: Oxford University.
Gordon, A. D. (1999). Classification. London: Chapman & Hall/CRC.
Grizzle, J. E., & Allen, D. M. (1969). Analysis of growth and dose response curves. Biometrics, 25, 357–381.
Guttman, L. A. (1954). A new approach to factor analysis: The radix. In P. F. Lazarsfeld (Ed.), Mathematical thinking in the social sciences (pp. 3–22). New York: Columbia University Press.
Hathaway, R. J., & Bezdek, J. C. (1993). Switching regression models and fuzzy clustering. IEEE Transactions on Fuzzy Systems, 1, 195–204.
Hipp, J. P., & Bauer, D. J. (2006). Local solutions in the estimation of growth mixture models. Psychological Methods, 11, 36–53.
Hong, S., Mitchell, S., & Harshman, R. A. (2006). Bootstrap scree tests: A Monte Carlo simulation and applications to published data. British Journal of Mathematical and Statistical Psychology, 59, 35–57.
Hruschka, H. (1986). Market definition and segmentation using fuzzy clustering methods. International Journal of Research in Marketing, 3, 117–134.
Hwang, H., & Takane, Y. (2004). A multivariate reduced-rank growth curve model with unbalanced data. Psychometrika, 69, 65–79.
Hwang, H., & Takane, Y. (2005). Estimation of growth curve models with structured error covariances by generalized estimating equations. Behaviormetrika, 32, 141–153.
Jedidi, K., Jagpal, H. S., & DeSarbo, W. S. (1997). Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16, 39–59.
Jöreskog, K. G. (1970). Estimation and testing of simplex models. British Journal of Mathematical and Statistical Psychology, 23, 121–145.
Kamakura, W. A., Kim, B., & Lee, J. (1996). Modeling preference and structural heterogeneity in consumer choice. Marketing Science, 15, 152–172.
Khatri, C. G. (1966). A note on a MANOVA model applied to problems in growth curves. Annals of the Institute of Statistical Mathematics, 18, 75–86.
Khatri, C. G. (1988). Robustness study for a linear growth model. Journal of Multivariate Analysis, 24, 66–87.
Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.
Li, F., Duncan, T. E., & Duncan, S. C. (2001). Latent growth modeling of longitudinal data: A finite growth mixture modeling approach. Structural Equation Modeling, 8, 493–530.
Li, F., Maddalozzo, G. F., Harmer, P., & Duncan, T. E. (1998). Analysis of longitudinal data of repeated observations using generalized estimating equations methodology. Measurement in Physical Education and Exercise Science, 2, 93–113.
Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
MacCallum, R. C., Kim, C., Malarkey, W. B., & Kiecolt-Glaser, J. K. (1997). Studying multivariate change using multilevel models and latent curve models. Multivariate Behavioral Research, 32, 215–253.
McBratney, A. B., & Moore, A. W. (1985). Application of fuzzy sets to climatic classification. Agricultural and Forest Meteorology, 35, 165–185.


McCullagh, P. (1983). Quasi-likelihood functions. The Annals of Statistics, 11, 59–67.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. London: Chapman & Hall.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: John Wiley & Sons.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107–122.
Moffitt, T. E. (1993). Adolescent-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review, 100, 674–701.
Mulaik, S. A. (1972). The foundations of factor analysis. New York: McGraw-Hill.
Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557–585.
Muthén, B. O., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463–469.
Nagin, D. (1999). Analyzing developmental trajectories: A semi-parametric, group-based approach. Psychological Methods, 4, 139–157.
Okeke, F., & Karnieli, A. (2006). Linear mixture model approach for selecting fuzzy exponent value in fuzzy c-means algorithm. Ecological Informatics, 1, 117–124.
Pan, W. (2001). Akaike’s information criterion in generalized estimating equations. Biometrics, 57, 120–125.
Potthoff, R. F., & Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51, 313–326.
Rao, C. R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika, 52, 447–458.
Raudenbush, S. W. (1995). Reexamining, reaffirming, and improving application of hierarchical models. Journal of Educational and Behavioral Statistics, 20, 210–220.
Raudenbush, S. W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology, 52, 501–525.
Reinsel, G. C., & Velu, R. P. (1998). Multivariate reduced-rank regression. Theory and application. New York: Springer.
Roubens, M. (1982). Fuzzy clustering algorithms and their cluster validity. European Journal of Operational Research, 10, 294–301.
Takane, Y., & Hunter, M. A. (2001). Constrained principal component analysis: A comprehensive theory. Applicable Algebra in Engineering, Communication, and Computing, 12, 391–419.
Takane, Y., Yanai, H., & Mayekawa, S. (1991). Relationships among several methods of linearly constrained correspondence analysis. Psychometrika, 56, 667–684.
Tucker, L. R. (1951). A method for synthesis of factor analysis studies (Personnel Research Section Report No. 984). Washington, D.C.: U.S. Department of the Army.
von Rosen, D. (1991). The growth curve model: A review. Communications in Statistics: Theory and Methods, 9, 2791–2822.
Vonesh, E. F., & Carter, R. L. (1987). Efficient inference for random-coefficient growth curve models with unbalanced data. Biometrics, 43, 617–628.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439–447.
Wedel, M., & Kamakura, W. A. (1998). Market segmentation: Conceptual and methodological foundations. Boston: Kluwer Academic Publishers.
Wedel, M., & Steenkamp, J.-B. E. M. (1989). Fuzzy clusterwise regression approach to benefit segmentation. International Journal of Research in Marketing, 6, 241–258.
Wedel, M., & Steenkamp, J.-B. E. M. (1991). A clusterwise regression method for simultaneous fuzzy market structuring and benefit segmentation. Journal of Marketing Research, 28, 385–396.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zeger, S. L., & Liang, K. Y. (1986). The analysis of discrete and continuous longitudinal data. Biometrics, 42, 121–130.


APPENDIX: PARAMETER ESTIMATION For the parameter estimation of the proposed method, two main optimization steps are alternated as follows until convergence: Step 1: The GCM parameters are updated for fixed um ci . The first step is equivalent to minimizing: ®D

C X N X

SS.yci

ABc xci /;

(A.1)

cD1 i D1

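where $\mathbf{y}_{ci} = \sqrt{u_{ci}^{m}}\,\mathbf{y}_i$ and $\mathbf{x}_{ci} = \sqrt{u_{ci}^{m}}\,\mathbf{x}_i$. The GEE method is utilized to update $\mathbf{B}_c$ under the specification of a certain covariance structure within each cluster. Specifically, the GEE estimation procedure repeats two sub-steps as follows:

In the first sub-step, the quasi-likelihood estimate of $\mathbf{B}_c$ is obtained for fixed $\mathbf{V}_{ci}$. Let $\mathbf{b}_c = \mathrm{vec}(\mathbf{B}_c)$, where $\mathrm{vec}(\mathbf{Z})$ is a vector formed by stacking all columns of $\mathbf{Z}$ one below another. Given that $\mathbf{y}_{ci} - \mathbf{A}\mathbf{B}_c\mathbf{x}_{ci} = \mathrm{vec}(\mathbf{y}_{ci} - \mathbf{A}\mathbf{B}_c\mathbf{x}_{ci}) = \mathbf{y}_{ci} - (\mathbf{x}_{ci}' \otimes \mathbf{A})\mathbf{b}_c = \mathbf{y}_{ci} - \boldsymbol{\Delta}_{ci}\mathbf{b}_c$, where $\boldsymbol{\Delta}_{ci} = \mathbf{x}_{ci}' \otimes \mathbf{A}$, $\mathbf{b}_c$ is the solution of a quasi-score function:

$$\sum_{i=1}^{N} \boldsymbol{\Delta}_{ci}' \mathbf{V}_{ci}^{-1} (\mathbf{y}_{ci} - \boldsymbol{\Delta}_{ci}\mathbf{b}_c) = \mathbf{0}. \tag{A.2}$$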

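Thus, solving (A.2) for $\mathbf{b}_c$ yields:

$$\hat{\mathbf{b}}_c = \left\{ \sum_{i=1}^{N} \boldsymbol{\Delta}_{ci}' \mathbf{V}_{ci}^{-1} \boldsymbol{\Delta}_{ci} \right\}^{-1} \left\{ \sum_{i=1}^{N} \boldsymbol{\Delta}_{ci}' \mathbf{V}_{ci}^{-1} \mathbf{y}_{ci} \right\}. \tag{A.3}$$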
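The update in (A.3) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `update_bc`, the list-based data layout, and the dimension names `T`, `P`, and `Q` are assumptions introduced here, and the working covariances $\mathbf{V}_{ci}$ are taken as given.

```python
import numpy as np

def update_bc(y_list, x_list, A, V_list):
    """Sketch of (A.3): solve the quasi-score equation for b_c = vec(B_c).

    y_list: weighted response vectors y_ci, each of length T (hypothetical layout)
    x_list: weighted covariate vectors x_ci, each of length Q
    A:      T x P within-subject design matrix
    V_list: T x T working covariance matrices V_ci
    """
    P, Q = A.shape[1], x_list[0].shape[0]
    lhs = np.zeros((P * Q, P * Q))
    rhs = np.zeros(P * Q)
    for y, x, V in zip(y_list, x_list, V_list):
        D = np.kron(x.reshape(1, -1), A)        # Delta_ci = x_ci' (kron) A, T x PQ
        lhs += D.T @ np.linalg.solve(V, D)      # Delta' V^{-1} Delta
        rhs += D.T @ np.linalg.solve(V, y)      # Delta' V^{-1} y
    b_hat = np.linalg.solve(lhs, rhs)
    # vec() stacks columns, so rebuild B_c in column-major order
    return b_hat.reshape((P, Q), order="F")
```

Using `np.linalg.solve` instead of forming $\mathbf{V}_{ci}^{-1}$ explicitly is a numerical design choice of this sketch, not something prescribed by the text.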

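The updated $\mathbf{B}_c$ is reconstructed from $\hat{\mathbf{b}}_c$. In the next sub-step, $\alpha_c$ and $\mathbf{R}_{ci}(\boldsymbol{\theta})$ are estimated by the current Pearson residuals (McCullagh & Nelder, 1989, p. 37), defined as follows:

$$\boldsymbol{\eta}_{ci} = \mathbf{y}_{ci} - \mathbf{A}\mathbf{B}_c\mathbf{x}_{ci}. \tag{A.4}$$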

Then, $\alpha_c$ is updated by:

$$\hat{\alpha}_c = \frac{1}{N - S} \sum_{i=1}^{N} \boldsymbol{\eta}_{ci}' \boldsymbol{\eta}_{ci}, \tag{A.5}$$


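where $S$ is the number of parameters in $\mathbf{B}_c$. In turn, $\boldsymbol{\eta}_{ci}$ and $\alpha_c$ are used to update $\mathbf{R}_{ci}(\boldsymbol{\theta})$. The derivation of $\mathbf{R}_{ci}(\boldsymbol{\theta})$ from $\boldsymbol{\eta}_{ci}$ and $\alpha_c$ depends on which covariance structure is specified for repeated assessments (refer to Liang & Zeger, 1986). These two sub-steps are alternated until no substantial changes in $\mathbf{B}_c$ occur within each cluster.

Step 2: The fuzzy membership parameter $u_{ci}$ is updated for fixed $\mathbf{B}_c$ and $\mathbf{V}_{ci}$. Let $d_{ci} = \mathrm{SS}(\mathbf{y}_i - \mathbf{A}\mathbf{B}_c\mathbf{x}_i)$. Then, $u_{ci}$ is updated by:

$$\hat{u}_{ci} = \left[ \sum_{k=1}^{C} \left( \frac{d_{ci}}{d_{ki}} \right)^{1/(m-1)} \right]^{-1}. \tag{A.6}$$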
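As a rough illustration of Step 2, the membership update in (A.6) might be coded as follows. The function names, the column-per-respondent array layout, and the small floor `eps` that guards against a zero distance are all assumptions of this sketch rather than part of the method as published.

```python
import numpy as np

def squared_distances(Y, X, A, B_list):
    """Return the C x N array of d_ci = SS(y_i - A B_c x_i).

    Y is T x N (one column per respondent), X is Q x N, A is T x P,
    and B_list holds one P x Q coefficient matrix per cluster.
    """
    return np.array([((Y - A @ B @ X) ** 2).sum(axis=0) for B in B_list])

def update_memberships(d, m=2.0, eps=1e-12):
    """Sketch of (A.6): fuzzy memberships from a C x N array of distances."""
    d = np.maximum(d, eps)                    # assumed guard against d_ci = 0
    w = d ** (-1.0 / (m - 1.0))               # d_ci^{-1/(m-1)}
    # Dividing by the column sum equals 1 / sum_k (d_ci / d_ki)^{1/(m-1)}
    return w / w.sum(axis=0, keepdims=True)
```

For instance, with two clusters, $m = 2$, and distances $d_{1i} = 4$ and $d_{2i} = 1$ for some respondent, the sketch returns memberships of 0.2 and 0.8, so the closer cluster receives the larger weight and each column sums to one.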

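Formula (A.6) can be derived as follows. Minimizing Equation (4) under the membership constraint is equivalent to minimizing:

$$L = \sum_{c=1}^{C} \sum_{i=1}^{N} u_{ci}^{m} d_{ci} - \lambda \left( \sum_{c=1}^{C} u_{ci} - 1 \right), \tag{A.7}$$

where $\lambda$ is a Lagrangian multiplier. Solving $\partial L / \partial u_{ci} = m u_{ci}^{m-1} d_{ci} - \lambda = 0$ for $u_{ci}$ yields:

$$\hat{u}_{ci} = \left( \frac{\lambda}{m d_{ci}} \right)^{1/(m-1)}. \tag{A.8}$$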

Using $\partial L / \partial \lambda = 1 - \sum_{c=1}^{C} u_{ci} = 0$ and Equation (A.8) leads to:

$$\hat{\lambda} = \left( \sum_{c=1}^{C} \left( \frac{1}{m d_{ci}} \right)^{1/(m-1)} \right)^{1-m}. \tag{A.9}$$
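The Lagrangian solution can be checked numerically for a single respondent. The values of `m` and `d` below are arbitrary, hypothetical numbers chosen only for illustration; the sketch evaluates the multiplier from (A.9), plugs it into (A.8), and compares the result with the closed form in (A.6).

```python
import numpy as np

m = 2.5                                   # hypothetical fuzzifier, m > 1
d = np.array([0.8, 2.0, 5.0])             # hypothetical d_ci for one respondent, C = 3
p = 1.0 / (m - 1.0)

# (A.9): multiplier that enforces the sum-to-one constraint
lam = np.sum((1.0 / (m * d)) ** p) ** (1.0 - m)

# (A.8): memberships expressed through the multiplier
u_from_lagrange = (lam / (m * d)) ** p

# (A.6): closed-form memberships, one entry per cluster c
u_closed_form = np.array([1.0 / np.sum((dc / d) ** p) for dc in d])

print(u_from_lagrange.sum())                        # constraint check
print(np.allclose(u_from_lagrange, u_closed_form))  # agreement of (A.8) and (A.6)
```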

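Then, Equation (A.6) is obtained by inserting Equation (A.9) in (A.8).

Once $\mathbf{B}_c$ is estimated, its asymptotic covariance estimates can also be obtained. If the covariance structure is correctly specified, i.e., $\mathbf{V}_{ci} = \boldsymbol{\Sigma}_{ci}$, a consistent estimate of the asymptotic covariance matrix of $\mathbf{B}_c$ is given by:

$$\boldsymbol{\Gamma}_c = \left\{ \sum_{i=1}^{N} \boldsymbol{\Delta}_{ci}' \mathbf{V}_{ci}^{-1} \boldsymbol{\Delta}_{ci} \right\}^{-1}. \tag{A.10}$$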


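The elements in (A.10) are called the naive covariance estimates. If the covariance structure is mis-specified, i.e., $\mathbf{V}_{ci} \neq \boldsymbol{\Sigma}_{ci}$, the asymptotic covariance estimate of $\mathbf{B}_c$ is given by:

$$\boldsymbol{\Psi}_c = \boldsymbol{\Gamma}_c \left\{ \sum_{i=1}^{N} \boldsymbol{\Delta}_{ci}' \mathbf{V}_{ci}^{-1} (\mathbf{y}_{ci} - \mathbf{A}\mathbf{B}_c\mathbf{x}_{ci})(\mathbf{y}_{ci} - \mathbf{A}\mathbf{B}_c\mathbf{x}_{ci})' \mathbf{V}_{ci}^{-1} \boldsymbol{\Delta}_{ci} \right\} \boldsymbol{\Gamma}_c. \tag{A.11}$$

The elements in Equation (A.11) are called the robust covariance estimates because they are consistent even if the structure of $\boldsymbol{\Sigma}_{ci}$ is mis-specified.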
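The naive and robust estimates in (A.10) and (A.11) could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `gee_covariances` and the list-based inputs are hypothetical, and the rows and columns of the returned matrices are ordered as in $\mathrm{vec}(\mathbf{B}_c)$, that is, column by column.

```python
import numpy as np

def gee_covariances(y_list, x_list, A, B, V_list):
    """Sketch of (A.10) and (A.11) for one cluster.

    Returns (naive, robust), both PQ x PQ, for the P x Q coefficient matrix B.
    """
    P, Q = B.shape
    b = B.reshape(-1, order="F")                  # b_c = vec(B_c)
    bread = np.zeros((P * Q, P * Q))
    meat = np.zeros((P * Q, P * Q))
    for y, x, V in zip(y_list, x_list, V_list):
        D = np.kron(x.reshape(1, -1), A)          # Delta_ci = x_ci' (kron) A
        r = y - D @ b                             # residual y_ci - A B_c x_ci
        s = D.T @ np.linalg.solve(V, r)           # Delta' V^{-1} r
        bread += D.T @ np.linalg.solve(V, D)      # Delta' V^{-1} Delta
        meat += np.outer(s, s)                    # Delta' V^{-1} r r' V^{-1} Delta
    naive = np.linalg.inv(bread)                  # Gamma_c in (A.10)
    robust = naive @ meat @ naive                 # Psi_c in (A.11)
    return naive, robust
```

Standard errors would then follow as the square roots of the diagonal of either matrix; comparing the two sets gives an informal indication of how much the assumed covariance structure matters.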
