Behavior Genetics, Vol. 35, No. 6, November 2005 (© 2005) DOI: 10.1007/s10519-005-7284-z

Application of Bayesian Inference using Gibbs Sampling to Item-Response Theory Modeling of Multi-Symptom Genetic Data

Lindon Eaves,1,4 Alaattin Erkanli,2,3 Judy Silberg,1 Adrian Angold,3 Hermine H. Maes,1 and Debra Foley1

Received 25 Mar. 2003; Final 23 May 2005

Several “genetic” item-response theory (IRT) models are fitted to the responses of 1086 adolescent female twins to the 33 multi-category items of the Mood and Feelings Questionnaire, which relate to depressive symptomatology in adolescence. A Markov chain Monte Carlo (MCMC) algorithm, Gibbs sampling, is used for inference within a Bayesian framework and is implemented in the program WinBUGS 1.4. The final model incorporated separate genetic and non-shared environmental traits (“A and E”) and item-specific genetic effects. Simpler models gave markedly poorer fit to the observations, as judged by the deviance information criterion (DIC). The common genetic factor showed major loadings on melancholic items, while the environmental factor loaded most highly on items relating to self-deprecation. The MCMC approach provides a convenient and flexible alternative to maximum likelihood for estimating the parameters of IRT models for relatively large numbers of items in a genetic context. Additional benefits of the IRT approach are discussed, such as the estimation of latent trait scores, including genetic factor scores, and their sampling errors.

KEY WORDS: Bayesian inference; depression; genetic; Gibbs sampling; item-response theory; Markov Chain Monte Carlo; multivariate; twins.

1 Virginia Institute for Psychiatric and Behavioral Genetics, Department of Human Genetics, Virginia Commonwealth University, Richmond, Virginia, USA.
2 Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, USA.
3 Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, North Carolina, USA.
4 To whom correspondence should be addressed at Virginia Institute for Psychiatric and Behavioral Genetics, PO Box 980003, Virginia Commonwealth University, Richmond, VA 23298-0003, U.S.A. e-mail: [email protected].

0001-8244/05/1100-0765/0 © 2005 Springer Science+Business Media, Inc.

INTRODUCTION

This paper illustrates the application of Markov chain Monte Carlo (MCMC) methods to a problem in genetic analysis that has so far proved tedious to solve by conventional maximum likelihood approaches, namely the synthesis of genetic modeling, especially of twin data, with modeling of the latent structure underlying large numbers of categorical items using item-response theory (IRT). We illustrate the approach by fitting a conventional IRT model to the responses of female adolescent MZ and DZ twins to 33 three-category items from the Mood and Feelings Questionnaire (MFQ; Angold, 1995). The approach allows us to estimate simultaneously the parameters of the IRT model and of the genetic model for twin resemblance, and to estimate the scores of individual subjects. It yields samples from the posterior distributions, given the data, of the model parameters, including the subjects' latent trait scores. From these samples it is possible to obtain point estimates of the parameters, such as the mean, mode or median of these distributions, and also to assess the credibility of the estimates, through the standard deviation/variance of the posterior distributions or through credible intervals for the parameters.

Item-response theory has a long history (see e.g., the chapters by Birnbaum in Lord and Novick, 1968) and provides a theoretically appealing framework for conceiving the relationship between categorical outcomes, such as responses to items on a psychological test, and latent traits. From one perspective, IRT is thus a generalization of the traditional factor model to categorical outcomes. The IRT model postulates that the probability that an individual exceeds a given response threshold, defined by the transition from one response category to the next, is a monotonic function (typically logistic or probit) of the subject's level on one or more continuous latent traits. The “item” parameters are the category thresholds (“item difficulty parameters” or “item extremities”) and the loadings of the items on the latent trait or traits (“item discriminating powers”). The basic IRT framework also provides a fertile conceptual structure for addressing many clinically and genetically important questions, such as etiological and phenotypic heterogeneity in complex behavioral disorders indexed by multiple symptoms. Several programs are available for fitting a variety of IRT models to data on unrelated subjects (e.g., Fraser, 1988; Muthén and Muthén, 2001; Thissen, 1995). However, as far as we can tell, none of these has yet been used to apply the IRT model to kinship data, such as data on MZ and DZ twins, and to test hypotheses about the genetic and environmental influences underlying trait variation and covariation between relatives. Two problems typically arise in trying to fit IRT models to twin data. First, the twins are correlated, and behavior geneticists typically want to test models that imply a variety of constraints on the covariance structure between relatives, such as those implied by the familiar “ACE” model for twin resemblance (e.g., Neale and Cardon, 1992).
Second, in the case of twins, item parameters need to be constrained across latent traits and groups, to allow for the fact that the same factor structure applies to all subjects. Part of the problem of applying IRT to twin data stems from the fact that, although it is relatively easy to write the likelihood for a “twin IRT” model (see e.g., Eaves et al., 1987), it is far harder to maximize the likelihood with respect to the parameters of a genetic IRT model, because the likelihood of each pair must be integrated numerically over the infinity of possible latent trait values in multiple dimensions. Although this is not impossible, it has proved very tedious and, to our knowledge, there are few, if any, large-scale applications of IRT to behavior-genetic data.

The recent advance in computer power has unleashed a variety of computer-intensive approaches to statistical analysis that are only just beginning to have an impact on the way behavior geneticists and twin researchers think about their data. One approach that is starting to receive some attention is Bayesian inference implemented through Markov chain Monte Carlo algorithms. Gilks et al. (1996) provide a good general overview of the concepts, and the freely available program WinBUGS 1.4 (Spiegelhalter et al., 2003) provides a relatively user-friendly platform for its implementation. Recent papers illustrate the approach for survival analysis (Do et al., 2000) and for non-linear developmental change and genotype–environment interaction (Eaves and Erkanli, 2003; Eaves et al., 2003). These three publications give more extended accounts of the conceptual and practical background to the application of MCMC methods to several different types of problem in twin data analysis. In this paper, we focus on an application to IRT modeling and provide WinBUGS code that illustrates the relative transparency and flexibility of the approach.

Briefly, the MCMC approach constructs a Markov chain on the (parameter) space of unknown quantities such that, starting from a set of trial values (e.g., means, regressions, genetic variances, genetic and environmental effects), successive iterations after an initial series (the “burn-in”) represent samples from the unknown joint distribution. This is the so-called stationary distribution of the Markov chain; in the Bayesian context, it is the joint posterior distribution of all the parameters.
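The burn-in and ergodic-averaging ideas can be illustrated without any genetic model. The sketch below assumes a hypothetical stand-in for MCMC output: an AR(1) chain whose stationary distribution is N(0, 1), started deliberately far from the bulk of that distribution. The early iterations are discarded as burn-in and the rest are averaged. With g(X) = X², the ergodic average should approach E[X²] = 1, and the sorted sequence {g(X^r)} summarizes the whole distribution (median, 95% interval). The function names are ours, for illustration only.

```python
import math
import random
from typing import Callable, List, Tuple


def ar1_chain(n_iter: int, phi: float = 0.5, x0: float = 10.0, seed: int = 2) -> List[float]:
    """Hypothetical stand-in for MCMC output: an AR(1) Markov chain
    X^r = phi * X^(r-1) + e_r, with e_r ~ N(0, 1 - phi^2), whose stationary
    distribution is N(0, 1). The start x0 = 10 is deliberately far from the bulk."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - phi * phi)
    x, out = x0, []
    for _ in range(n_iter):
        x = phi * x + rng.gauss(0.0, sd)
        out.append(x)
    return out


def ergodic_summary(chain: List[float], burn_in: int,
                    g: Callable[[float], float]) -> Tuple[float, float, Tuple[float, float]]:
    """Ergodic average (1/R) * sum_r g(X^r) over the post-burn-in draws, plus the
    median and a 95% equal-tailed interval of the sequence {g(X^r)}."""
    values = sorted(g(v) for v in chain[burn_in:])
    R = len(values)

    def quantile(p: float) -> float:
        return values[min(R - 1, int(p * R))]   # simple empirical quantile

    mean = sum(values) / R
    return mean, quantile(0.5), (quantile(0.025), quantile(0.975))


chain = ar1_chain(21000)
mean_sq, median_sq, interval = ergodic_summary(chain, burn_in=1000, g=lambda x: x * x)
# For X ~ N(0, 1): E[X^2] = 1, and the median of X^2 (chi-square, 1 df) is about 0.455
print(mean_sq, median_sq, interval)
```

In a real analysis the chain would come from the sampler itself and the burn-in length would be judged from trace plots rather than fixed in advance.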
We note that, in this context, the “parameters” include not only the usual parameters of the structural model (means, genetic and environmental variances, etc.) but also the latent genetic and environmental deviations of the individual twins and any missing observed values. The MCMC iterations are generated by simulating values of the unknown parameters, conditional on the data, using specially tailored transition probability kernels (also called proposal distributions) that are easy to simulate from and that also guarantee convergence (in distribution) of the simulated Markov chain to the true joint posterior distribution. After the burn-in period, the probability distribution of any unknown quantity, and its moments, such as the expected value of any function of it, can be obtained to any desired degree of precision by taking an (ergodic) average of the successive values over a sufficiently large number of iterations.

The Gibbs sampler (Creutz, 1979; Gelfand and Smith, 1990; Geman and Geman, 1984; Ripley, 1979) is perhaps the most popular MCMC approach for constructing a Markov chain with the desired properties. In Gibbs sampling, the Markov kernels consist of the conditional distributions of each variable of interest given all the other variables. As an example, consider random variables X and Y with an unknown joint distribution [X, Y], and assume further that each of the conditional distributions [X | Y] and [Y | X] is available in closed analytical form. Here the Markov kernels are the conditionals [X | Y] and [Y | X]. So, if $X^0$ and $Y^0$ are initial values, the Markov chain is constructed on the XY space by simulating successively a sequence $\{X^r, Y^r\}$, drawing $X^r$ from $[X \mid Y^{r-1}]$ and then $Y^r$ from $[Y \mid X^r]$, for r = 1, 2, ..., R. Under mild regularity conditions, and provided these conditionals are bona fide probability distributions, it can be shown that the distribution of $(X^R, Y^R)$ converges to the joint distribution [X, Y] as R tends to infinity. Thus, for sufficiently large R, the $\{X^r, Y^r\}$ will resemble draws from the true joint distribution [X, Y]. Within the Bayesian context, X and Y are usually unknown parameters (e.g., the mean and variance), [X | Y] and [Y | X] (suppressing the conditioning on the data) are the conditional posterior distributions, and [X, Y] is the joint posterior distribution of X and Y. For example, the marginal posterior distribution [X] of X can be approximated by Monte Carlo integration: for each X = x,

$$[X](x) \approx \frac{1}{R} \sum_{r=1}^{R} [x \mid Y^r].$$

Alternatively, a kernel density estimate, or a histogram, of the sequence $\{X^r\}$ can be used.
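The two-variable Gibbs sampler just described can be sketched in a few lines. The sketch assumes, for illustration only, that [X, Y] is a standard bivariate normal with correlation rho, so that both conditionals are normal in closed form. The second function implements the Monte Carlo estimate of the marginal density [X](x) by averaging the conditional densities [x | Y^r]. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def gibbs_bivariate_normal(rho: float, n_iter: int, burn_in: int,
                           x0: float = 0.0, y0: float = 0.0,
                           seed: int = 1) -> List[Tuple[float, float]]:
    """Systematic-scan Gibbs sampler for a standard bivariate normal [X, Y]
    with correlation rho. Both full conditionals are closed-form normals:
    [X | Y = y] = N(rho * y, 1 - rho^2) and [Y | X = x] = N(rho * x, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = x0, y0
    draws = []
    for r in range(burn_in + n_iter):
        x = rng.gauss(rho * y, sd)   # X^r drawn from [X | Y^(r-1)]
        y = rng.gauss(rho * x, sd)   # Y^r drawn from [Y | X^r]
        if r >= burn_in:             # keep only post-burn-in draws
            draws.append((x, y))
    return draws


def marginal_density_x(x: float, draws: List[Tuple[float, float]], rho: float) -> float:
    """Monte Carlo estimate [X](x) = (1/R) * sum_r [x | Y^r]."""
    sd = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _, y in draws:
        z = (x - rho * y) / sd
        total += math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return total / len(draws)


rho = 0.8
draws = gibbs_bivariate_normal(rho, n_iter=20000, burn_in=1000)
mean_x = sum(x for x, _ in draws) / len(draws)
mean_xy = sum(x * y for x, y in draws) / len(draws)   # estimates E[XY], which equals rho here
dens_at_0 = marginal_density_x(0.0, draws, rho)       # exact N(0, 1) density at 0 is about 0.399
print(mean_x, mean_xy, dens_at_0)
```

Averaging conditional densities in this way usually gives a smoother estimate of [X](x) than a histogram of the same draws, because each term is itself a full density.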
Similarly, the expectation of a function g(X) is approximated by the ergodic Monte Carlo average

$$E[g(X)] \approx \frac{1}{R} \sum_{r=1}^{R} g(X^r).$$

Note that a desirable byproduct of the MCMC approach is that not only the expectation but the entire posterior distribution of g(X) is approximated, by the sequence $\{g(X^r)\}$. There are several other ways to construct an MCMC sampler, such as general Metropolis–Hastings algorithms; in fact, Gibbs sampling is a special case of these general algorithms. A more thorough account of the approach, which is beyond the scope of this paper, may be found in Tierney (1994), Gilks et al. (1996), and Brooks (1998). A recent paper by Besag (2000) reviews several MCMC approaches and provides a comprehensive list of references.

DATA

The data chosen to illustrate the method comprise the responses of 373 MZ and 170 DZ adolescent female twin pairs aged 8–16 to the 33 items of the MFQ (Angold, 1995), which is designed to assess symptoms of depression. The twins completed the MFQ as part of a much more extensive psychiatric assessment in the home during the first wave of the Virginia Twin Study of Adolescent Behavioral Development (VTSABD; Eaves et al., 1997; Hewitt et al., 1997; Simonoff et al., 1997). The items and the raw endorsement frequencies for the three response categories are given for the whole sample of 1086 individual twins in Table I.

GENETIC IRT MODEL

The model combines two main features. The first is the “psychometric model” or “measurement model”, which relates the probability of item endorsement to the latent trait(s) hypothesized in the second, the “individual differences model” or “structural model”. In the case of twin data, the model for individual differences incorporates hypotheses about the number of latent traits and the contributions of genetic and environmental factors to variation in latent trait values. An additional layer may reflect hypotheses about the distribution of the latent traits. In this illustrative example, we start by assuming that a single dimension of liability to depression underlies the responses to all 33 items and that all genetic and environmental effects operate through a single “common pathway” (see e.g., Neale and Cardon, 1992). That is, we hypothesize that the same pattern of item loadings applies to both latent genetic and environmental effects. Later, we relax this assumption and consider the possibility that genetic and environmental effects operate through independent pathways and that there might be item-specific genetic effects.
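Under a single common-pathway trait, and the logistic item model defined in the next paragraph, even the likelihood of one twin's responses requires integrating the latent trait out, which is the numerical burden that makes maximum likelihood tedious. The minimal sketch below is our own illustration, not the authors' code: it handles binary items only, assumes the trait is N(0, 1), and integrates it out with a simple midpoint rule on a grid. The function names and grid settings are hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def item_prob(x: float, a: float, b: float) -> float:
    """Logistic IRT endorsement probability 1 / (1 + exp(-b (x - a)))."""
    return 1.0 / (1.0 + math.exp(-b * (x - a)))


def marginal_likelihood(responses: Sequence[int], a: Sequence[float], b: Sequence[float],
                        n_points: int = 61, lo: float = -6.0, hi: float = 6.0) -> float:
    """Likelihood of one subject's binary responses with the latent trait
    X ~ N(0, 1) integrated out by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n_points
    total = 0.0
    for q in range(n_points):
        x = lo + (q + 0.5) * h                      # midpoint of the q-th grid cell
        lik = 1.0
        for y, ak, bk in zip(responses, a, b):      # items are independent given X
            p = item_prob(x, ak, bk)
            lik *= p if y == 1 else 1.0 - p
        total += lik * normal_pdf(x) * h
    return total


def grid_evaluations(n_points: int, dims: int) -> int:
    """Number of integrand evaluations for a full product grid in `dims` dimensions."""
    return n_points ** dims


lik = marginal_likelihood([1, 0, 1], a=[-0.5, 0.0, 1.0], b=[1.2, 0.8, 1.5])
print(lik, grid_evaluations(61, 2), grid_evaluations(61, 8))
```

A product grid with q points per dimension needs q^d evaluations for d latent dimensions, and a twin pair with correlated traits needs a grid over the traits of both twins. With item-specific factors the dimension grows with the number of items, which is why sampling the latent traits by MCMC is attractive here.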
Thus, if $Y_{ijk}$ is the response of the jth twin of the ith pair to the kth (binary) item, the probability that the twin endorses the kth item is related to the latent trait, X, by the logistic link

$$P(Y_{ijk} = 1) = (1 + e^{-z})^{-1}, \quad z = b_k (X_{ij} - a_k).$$

Here $X_{ij}$ is the score of the twin on the latent trait, $a_k$ is the “difficulty” of the kth item (i.e., the trait value at which the endorsement probability is 50%; see e.g., Lord and Novick, 1968), and $b_k$ is the discriminating power of the item (i.e., the regression of endorsement probability on the latent trait when $X_{ij} = a_k$). The $b_k$ correspond conceptually to factor loadings in the conventional factor model. Lord and Novick (op. cit.) provide an extensive graphical and mathematical treatment of several IRT models and further details of how to interpret the model parameters. The extension to multi-category items is fairly simple (see e.g., Nelder and McCullagh, 1989). Under the (logistic) IRT model, we assume that the discriminating powers are constant over the range of the latent trait, but add further difficulty parameters to reflect the difficulty levels (“response thresholds”) associated with the additional categories.

The simple model for individual differences assumes that there is a single latent trait with identical loadings in MZ and DZ twins and in first and second twins of each pair. The distribution of the latent trait in the twins is assumed to be bivariate normal, with covariances determined by the contribution of genetic and environmental effects to the resemblance of MZ and DZ twins (see e.g., Neale and Cardon, 1992). Given the consensus that shared environmental effects do not contribute much to liability to depression, we have further simplified our illustrative model by assuming that only additive genetic effects (A) contribute to twin similarity and that the only effects of the environment are those specific to individuals (E). In order to fix the scale of the item parameters, we have to fix the total variance of the latent trait. We chose ultimately to scale the item parameters so that the variance of the latent trait is unity.

Table I. MFQ Item Response Frequencies (percentage of N = 1086 twins giving response 0, 1 or 2)

No.  Item                                0      1      2
  1  Miserable or unhappy            42.09  51.89   6.01
  2  Did not enjoy anything          83.90  14.11   1.90
  3  Less hungry than usual          60.50  26.78  12.73
  4  Ate more than usual             57.51  27.06  15.43
  5  Tired/just sat around           63.50  27.33   9.17
  6  Slower movement                 79.75  15.23   5.02
  7  Very restless                   65.15  26.51   8.34
  8  Felt no good                    83.48  12.80   3.72
  9  Blamed myself                   61.32  26.36  12.31
 10  Hard to decide                  40.36  42.35  17.29
 11  Cranky with parents             54.58  37.02   8.41
 12  Talked less                     57.15  31.68  11.17
 13  Talked more slowly              85.29  11.07   3.64
 14  Cried a lot                     74.17  18.32   7.51
 15  Future no good                  86.22  10.64   3.14
 16  Life not worth living           86.61  10.00   3.39
 17  Thoughts of death               72.49  19.11   8.40
 18  Family better off w/o me        89.04   7.91   3.05
 19  Thought of killing myself       93.56   4.95   1.49
 20  Did not want to see friends     87.18  11.33   1.49
 21  Hard to concentrate             65.12  29.01   5.87
 22  Bad things might happen         73.49  19.82   6.69
 23  Hated myself                    84.01  12.45   3.54
 24  Felt I was a bad person         86.87  11.48   1.65
 25  Thought I was ugly              63.97  27.60   8.43
 26  Worried about aches and pains   72.55  20.61   6.84
 27  Felt lonely                     64.80  28.36   6.84
 28  Thought nobody loved me         84.78  11.99   3.23
 29  Did not have fun at school      71.26  22.54   6.19
 30  Not as good as other kids       75.02  19.35   5.62
 31  Did everything wrong            85.60  12.67   1.74
 32  Did not sleep as well           70.17  21.71   8.12
 33  Slept more than usual           65.48  24.59   9.93

ALTERNATIVE MODELS

The initial model outlined above assumes that the probability of item endorsement is a logistic function of a single heritable latent trait. This is equivalent to what has become known as the “common pathway” model in conventional structural modeling of twin data, because the scaling of the loadings of responses on the latent genetic and environmental factors implies that genetic and environmental effects operate through a single underlying common pathway (e.g., a physiological or neurochemical system; cf. Martin and Eaves, 1977). This need not be the case. A more general class of models allows for several latent factors, genetic or environmental, with different patterns of loadings (“discriminating powers”) for different sources of variation. In addition, the above models assume that all item-specific effects are stochastic and uncorrelated across members of twin pairs. If there are specific genetic effects on individual items, residual effects on liability are correlated across twins within pairs for individual items. A more general form of the above model is a generalization of the genetic factor model to the IRT case. With responses coded 0, ..., n − 1 as in Table I, we write

$$P(Y_{ijk} \geq l) = (1 + e^{-z_{ijkl}})^{-1}, \quad z_{ijkl} = \sum_{m=1}^{p} b_{km} X_{ijm} - a_{kl}, \quad l = 1, \ldots, n - 1,$$

where n is the number of response categories and the summation is over the m = 1, ..., p latent traits. The columns of the matrix of loadings B are scaled initially so that each of the latent traits is N[0, 1]. Specific factors are defined by having zero loadings on all but one item. If there is one latent genetic factor, one latent environmental factor, and t items, the number of latent traits is t + 2. Excluding structural zeros, there are 3t loadings and t(n − 1) thresholds. The latent traits are expected to be correlated between the members of a twin pair. The correlation will depend on how the various genetic and environmental differences contribute to variation in the latent traits. We consider the model for genetic and environmental contributions to the variance components between and within MZ and DZ twin pairs below, under Implementation in WinBUGS.
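The cumulative-logit form of the multi-trait model can be sketched as follows. This is our own illustration under stated assumptions: the thresholds a_kl increase with l, responses are coded 0, ..., n − 1, and category probabilities are obtained as differences of adjacent cumulative probabilities. Because the thresholds sit on the z scale, the single-trait binary model corresponds to a_k1 = b_k a_k. The helper `count_parameters` (a hypothetical name) reproduces the counts stated above for one genetic factor, one environmental factor and t item-specific factors.

```python
import math
from typing import Dict, List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(loadings: Sequence[float], traits: Sequence[float],
                   thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under a cumulative-logit IRT model.

    loadings   -- b_km, m = 1..p (zero for traits the item does not load on)
    traits     -- latent trait values X_ijm, m = 1..p
    thresholds -- a_kl, l = 1..n-1, assumed increasing in l
    Uses P(Y >= l) = logistic(sum_m b_km * X_ijm - a_kl) and returns
    [P(Y = 0), ..., P(Y = n - 1)] as differences of adjacent cumulative terms.
    """
    eta = sum(b * x for b, x in zip(loadings, traits))
    # P(Y >= 0) = 1 and P(Y >= n) = 0 pad the list of cumulative probabilities
    cum = [1.0] + [logistic(eta - a) for a in thresholds] + [0.0]
    return [cum[l] - cum[l + 1] for l in range(len(thresholds) + 1)]


def count_parameters(t: int, n: int) -> Dict[str, int]:
    """Latent traits per twin, non-zero loadings and thresholds for a model with one
    genetic factor, one environmental factor and t item-specific factors."""
    return {"latent_traits": t + 2, "loadings": 3 * t, "thresholds": t * (n - 1)}


# One three-category item loading on a genetic trait, an environmental trait and
# its own specific factor (illustrative values, not estimates from the paper)
probs = category_probs(loadings=[0.8, 0.5, 0.3], traits=[0.2, -0.1, 0.4],
                       thresholds=[0.0, 1.5])
print([round(p, 3) for p in probs], count_parameters(33, 3))
```

For the 33 three-category MFQ items, `count_parameters(33, 3)` gives 35 latent traits per twin, 99 loadings and 66 thresholds.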

COMPARING MODELS

In conventional likelihood-based approaches, (nested) models are compared by computing the likelihood-ratio chi-square difference between the more general and the more restricted models. A number of further criteria, such as the Akaike Information Criterion (AIC; Akaike, 1974), have been developed in an attempt to optimize the trade-off between goodness of fit and parsimony. Within the Bayesian framework, Spiegelhalter et al. (2001) have proposed the “deviance information criterion” (DIC), a generalization of the AIC that assesses the ability of a model to predict a test data set. The approach uses two statistics available from iterations of the stable Markov chain of the MCMC algorithm. If l is the likelihood evaluated at a given set of parameter values, the average deviance, $\bar{D}$, is the average value of $-2\ln(l)$ over many iterations for a given model, and $\hat{D}$ is $-2\ln(l)$ evaluated at the average parameter values of the same model over a large number of successive iterations. Spiegelhalter et al. show that the effective number of parameters, $p_D$, is approximately $p_D = \bar{D} - \hat{D}$, and the DIC is $\bar{D} + p_D$. Thus, as with the AIC, the DIC penalizes improvement in fit for loss of parsimony.

IMPLEMENTATION IN WINBUGS

There is no single way to implement the genetic IRT model in MCMC. In the current beta version of WinBUGS 1.4 (Spiegelhalter et al., 2003), we parameterized the model for twin resemblance as a variance components model (cf. Jinks and Fulker, 1970). Apart from imparting a natural “flow” to the computations, this formulation is, strictly, more appropriate when the assignment to first and second twins is random rather than fixed. The WinBUGS code for the model for binary items is given in Appendix 1. In describing the model, we follow the notation in the appendix.

The simplest “genetic” latent trait model assumes what has been called the “common pathway” model, in which the loadings of the items on the latent genetic trait are a constant multiple of the corresponding environmental loadings (see e.g., Martin and Eaves, 1977; Neale and Cardon, 1992). In this case, there is a single latent trait, X (ignoring specific effects). The contributions of genes and environment to the within- and between-pair variance components for MZ and DZ twins may then be specified. Given that the within-family environmental variance (E) is scaled to unity, we allow the total (additive) genetic variance, $\sigma^2_g$, to be free. Following Jinks and Fulker (1970), we then write the components of variance between MZ and DZ pairs as

$$\sigma^2_{bMZ} = \sigma^2_g, \qquad \sigma^2_{bDZ} = \sigma^2_g / 2.$$

Similarly, the components of variance within pairs are

$$\sigma^2_{wMZ} = 1, \qquad \sigma^2_{wDZ} = 1 + \sigma^2_g / 2.$$

It is a minor alteration to add the effects of the between-family environment (C) to the between-pair variance components if needed. The proportion of variance in the latent trait attributable to additive genetic factors is thus

$$h^2 = \frac{\sigma^2_g}{1 + \sigma^2_g}.$$

We chose the variance components formulation of the quantitative genetic model rather than the variance–covariance formulation (now) more familiar to twin researchers. The variance components formulation of the nested analysis of variance has a more transparent relationship with the hierarchical mixed models that permit a wide range of extensions of the twin design to other applications. It also reflects more faithfully the assumption, implicit in most genetic models for twin resemblance, that like-sex twins are unordered within a pair. Furthermore, the independence of between- and within-pair effects in the ANOVA model simplifies the coding of the MCMC application.

The MCMC algorithm generates successive samples from the full conditional distributions of subject and item parameters, employing a sequence of iterations which, ultimately, are (non-independent) samples from the posterior distribution of the model parameters, including item parameters, individual subject parameters, and parameters of the genetic model. Variance components are assumed to be sampled from a gamma distribution. Other parameters are assumed to be normal. The pair means and the within-pair deviations from the pair means are assumed to be normal: means of MZ pairs are $N[0, \sigma^2_{bMZ}]$ and means of DZ pairs are $N[0, \sigma^2_{bDZ}]$. The corresponding within-pair deviations are assumed to be $N[0, 1]$ for MZ twins and $N[0, \sigma^2_{wDZ}]$ for DZ twins. WinBUGS uses the amount of information (“precision”, τ) to parameterize the variability of samples. Thus, the precision of MZ pair means is $\tau_g = 1/\sigma^2_g$ and the precision of DZ pair means is $2\tau_g$. In our case, we have no prior knowledge of the parameters of the distributions of the model parameters, so we assume very broad (“uninformative”) prior distributions for the item parameters and variance components. In the WinBUGS code (Appendices 1 and 2), the uninformative priors are represented by assigning very small values to the precision of the prior gamma and normal distributions (see also the examples in Spiegelhalter et al., 2003).
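The variance components parameterization can be checked by simulation. The sketch below uses plain Python rather than WinBUGS, and its function names are hypothetical. It draws pair means and within-pair deviations with the MZ and DZ variances given above, so the sample twin correlations should approach h² for MZ pairs and h²/2 for DZ pairs. It also shows the precision conversions WinBUGS expects, and a small `dic` helper that computes p_D = D̄ − D̂ and DIC = D̄ + p_D from deviance draws, as described under Comparing Models.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_pairs(sigma2_g: float, n_pairs: int, zygosity: str,
                   rng: random.Random) -> List[Tuple[float, float]]:
    """Latent trait values for twin pairs under the AE variance components model:
    X_ij = pair mean + within-pair deviation, with between-pair variance sigma2_g (MZ)
    or sigma2_g / 2 (DZ) and within-pair variance 1 (MZ) or 1 + sigma2_g / 2 (DZ)."""
    if zygosity == "MZ":
        var_b, var_w = sigma2_g, 1.0
    elif zygosity == "DZ":
        var_b, var_w = sigma2_g / 2.0, 1.0 + sigma2_g / 2.0
    else:
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    pairs = []
    for _ in range(n_pairs):
        m = rng.gauss(0.0, math.sqrt(var_b))            # pair mean, shared by both twins
        pairs.append((m + rng.gauss(0.0, math.sqrt(var_w)),
                      m + rng.gauss(0.0, math.sqrt(var_w))))
    return pairs


def pair_correlation(pairs: List[Tuple[float, float]]) -> float:
    """Pearson correlation between first and second members of the pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)


def dic(deviance_draws: Sequence[float], deviance_at_mean: float) -> Tuple[float, float]:
    """Return (DIC, p_D) from deviance draws and the deviance at the posterior means."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_mean
    return d_bar + p_d, p_d


sigma2_g = 1.5
h2 = sigma2_g / (1.0 + sigma2_g)   # heritability of the latent trait
tau_g = 1.0 / sigma2_g             # WinBUGS precision of MZ pair means
tau_dz_means = 2.0 * tau_g         # precision of DZ pair means, i.e. 1 / (sigma2_g / 2)

rng = random.Random(3)
r_mz = pair_correlation(simulate_pairs(sigma2_g, 20000, "MZ", rng))
r_dz = pair_correlation(simulate_pairs(sigma2_g, 20000, "DZ", rng))
print(h2, r_mz, r_dz, dic([10.0, 12.0, 14.0], 9.0))
```

Both zygosity groups have total variance 1 + σ²_g, so only the split between the between-pair and within-pair components differs, which is what lets the model identify σ²_g.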
We compared the multi-category “common pathway” model, C(AE), with a number of alternatives. The more complex models were: (1) the common pathway model plus genetic specifics on the individual items, C(AE)+S; (2) the separate pathways model with no specifics, AE; and (3) the separate pathways model with item-specific genetic effects, AE+S. Appendix 2 gives the modified WinBUGS code for the multi-category case with independent pathways for genetic and environmental effects and item-specific genetic effects. The principal alteration reflects the fact that dichotomous items represent Bernoulli trials, with the probability of “success” determined by the item parameters and the latent trait, whereas multi-category responses are sampled from the multinomial distribution.

RESULTS

Figures 1 and 2 illustrate the performance of the MCMC algorithm for some of the parameters of the multi-category IRT model over 1000 cycles after a 1000-iteration burn-in. The item difficulty parameters a[15,1] and a[15,2], for the lower and upper thresholds respectively, and the sensitivity parameter b[15] for item 15 (“The future is no good”) are shown, together with the proportion of variance explained by additive genetic factors, $h^2$. The absence of any apparent long-term cycling of the parameter values across the time series suggests that the MCMC algorithm is yielding reliable estimates of the parameter values (cf. Gilks et al., 1996). Figure 2 shows the sequence of iterations for the latent trait scores of a pair of MZ twins. Based on the 1000 iterations presented, the 95% confidence intervals of the subjects' scores are 0.150
