Statistical Applications in Genetics and Molecular Biology Volume 4, Issue 1

2005

Article 33

Structured Antedependence Models for Functional Mapping of Multiple Longitudinal Traits

Wei Zhao∗

Wei Hou†

Ramon C. Littell‡

Rongling Wu∗∗



∗ University of Florida, [email protected]
† University of Florida, [email protected]
‡ University of Florida, [email protected]
∗∗ University of Florida, [email protected]

© Copyright 2005 by the authors. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepress, which has been given certain exclusive rights by the author. Statistical Applications in Genetics and Molecular Biology is produced by The Berkeley Electronic Press (bepress). http://www.bepress.com/sagmb

Structured Antedependence Models for Functional Mapping of Multiple Longitudinal Traits∗ Wei Zhao, Wei Hou, Ramon C. Littell, and Rongling Wu

Abstract

In this article, we present a statistical model for mapping quantitative trait loci (QTL) that determine growth trajectories of two correlated traits during ontogenetic development. The model is derived within the maximum likelihood context: mathematical aspects of growth processes are incorporated to model the mean vector, and structured antedependence (SAD) models are used to approximate time-dependent covariance matrices for longitudinal traits. It provides a quantitative framework for testing the relative importance of two mechanisms, pleiotropy and linkage, in contributing to genetic correlations during ontogeny. The model has been employed to map QTL affecting stem height and diameter growth trajectories in an interspecific hybrid progeny of Populus, leading to the successful discovery of three pleiotropic QTL on different linkage groups. The implications of this model for genetic mapping within a broader context are discussed.

KEYWORDS: Structured antedependence model, EM algorithm, Growth curve, Multivariate analysis, Quantitative trait loci



We thank two anonymous referees for their constructive comments on this manuscript. This work is partially supported by grants from the National Science Foundation of China to R. W. (09 95671). The publication of this manuscript is approved as journal series No. R-10586 by the Florida Agricultural Experimental Station.

Zhao et al.: Multivariate SAD Models for Functional Mapping

1 Introduction

Many economically or biomedically important traits, such as plant size, tumor size or daily milk yield, are expressed continuously throughout life or for a period of life. These traits, called longitudinal traits, are traditionally analyzed in terms of a set of heritabilities at each age and correlations between different ages, but with no consideration of the time-dependent continuity that must exist between successive ages. Kirkpatrick and colleagues introduced the idea of genetic covariance functions to study growth data (Kirkpatrick and Heckman, 1989; Kirkpatrick et al., 1990; Kirkpatrick et al., 1994). Schaeffer and his group proposed an alternative approach, random regression, to model individual curves (Jamrozic and Schaeffer, 1997; Jamrozic et al., 1997). More extensive parametric modelling for the covariance structure has been performed by Pletcher and Geyer (1999) and Jaffrézic and Pletcher (2000).

In real life, it is common for the change of one trait to cause a correlated change in the other due to some shared physiological mechanisms. For example, the height growth of the main stem in forest trees is always correlated with its radial growth (Wu and Stettler, 1996). This issue has intrigued foresters and tree physiologists for many decades (Causton, 1985), because knowledge about the height-diameter correlation helps to select superior genotypes with optimal architecture and high-yielding stem wood production. Trees seem to have developmental mechanisms that relate their height growth with diameter growth in a coordinated manner. Such coordinated mechanisms have been thought to be under strong genetic control. Jaffrézic et al. (2003, 2004) have recently extended parametric modelling to analyze multiple genetically correlated traits, but their approaches, based on pure phenotypic data, cannot identify specific quantitative trait loci (QTL) that determine genetic correlations between traits.
With the advent of molecular marker technologies, we are in an excellent position to map and characterize individual QTL involved in quantitative variation. It is now essential to develop a novel statistical model for mapping QTL that contribute to the time-dependent correlation between different traits. More recently, a statistical method implemented with growth model theories has been proposed for QTL mapping (Ma et al., 2002; Wu et al., 2004a, 2004b). The basic principle of this method, called functional mapping, is to express the genotypic means of a QTL at different time points in terms of a continuous growth function with respect to time t. Under this principle, the parameters describing the shape of growth curves, rather than the genotypic means as expected in traditional mapping strategies, are estimated

Produced by The Berkeley Electronic Press, 2006


within a maximum likelihood framework. Unlike traditional mapping strategies, functional mapping estimates the parameters that model the structure of the covariance matrix among multiple different time points and, therefore, largely reduces the number of parameters being estimated for variances and covariances, especially when the number of time points is large.

The objective of this study is to extend the newly developed growth-based statistical method to map QTL affecting growth trajectories of two correlated traits. There has been much discussion about functional mapping to model the time-specific residual covariance matrix based on the first-order autoregressive (AR(1)) model (Wu et al., 2004b). The AR(1) model assumes the stationarity of residual variance and covariance across different ages. To ensure that the residual errors are homoscedastic and normal, a transformation approach for the phenotypes has been used. With the transform-both-sides (TBS) approach (Carroll and Ruppert, 1984), Wu et al. (2004b) were able to maintain the functional relationship between phenotypic change and age within their QTL mapping context. Although the TBS-based model has the potential to relax the assumption of variance stationarity, the covariance stationarity issue remains unsolved. Núñez-Antón and Zimmerman (2000) proposed a so-called structured antedependence (SAD) model to model the age-specific change of correlation in the bivariate analysis of longitudinal traits. The SAD model has been employed in several studies and displays many favorable properties (Zimmerman and Núñez-Antón, 2001). In this article, we incorporate the SAD model into the QTL mapping framework for bivariate longitudinal traits by adequately modelling the covariance structure. We will use an example for stem height and diameter growth in Populus to demonstrate the utilization of our model.
It can be anticipated that the model proposed in this article will have great implications for the design of an efficient early selection program and for asking and addressing biological questions at the interface of genetics, development and evolution.

2 The Method

2.1 The SAD model

The antedependence model was originally proposed by Gabriel (1962). It states that an observation at a particular time t depends on the previous

http://www.bepress.com/sagmb/vol4/iss1/art33


ones, with the degree of dependence decaying with time lag. If an observation at time t is independent of all observations before t − r, the antedependence model is said to be of order r. The antedependence model has been extended to fit the structure of time-dependent variances and correlations, leading to the structured antedependence (SAD) model (Núñez-Antón, 1997; Núñez-Antón and Zimmerman, 2000). We will incorporate the SAD model into the QTL mapping framework for bivariate longitudinal traits.

Let us consider a simple backcross design of n progeny derived from two contrasting homozygous inbred lines and a genetic linkage map constructed for the backcross. For each backcross progeny i, we measure two different longitudinal traits, $x_i$ and $y_i$, at a finite set of times, 1, ..., T. Assume that all progeny are measured at the same set of time points with an even interval. The genetic map will be used to map possible pleiotropic QTL, i.e., those that jointly affect the two longitudinal traits. Let j = 1, 2 denote two possible QTL genotypes, QQ and Qq, respectively, in the backcross. Assuming the first-order structured antedependence (SAD(1)) model for residual errors, the relationship between two traits x and y at time t for individual i, as determined by a putative QTL, can be modelled by

$$ x_i(t) = \sum_{j=1}^{2} \xi_{ij}\, g_{1j}(t) + e_{1i}(t) = \sum_{j=1}^{2} \xi_{ij}\, g_{1j}(t) + \phi_1 e_{1i}(t-1) + \psi_1 e_{2i}(t-1) + \epsilon_{1i}(t), \tag{1} $$

$$ y_i(t) = \sum_{j=1}^{2} \xi_{ij}\, g_{2j}(t) + e_{2i}(t) = \sum_{j=1}^{2} \xi_{ij}\, g_{2j}(t) + \phi_2 e_{2i}(t-1) + \psi_2 e_{1i}(t-1) + \epsilon_{2i}(t), \tag{2} $$

where $\xi_{ij}$ is the indicator variable, equal to 1 if QTL genotype j is considered for individual i and 0 otherwise, $g_{kj}(t)$ is the genotypic value of QTL genotype j for trait k (k = 1 for trait x and 2 for trait y) at time t, $e_{ki}(t)$ is the residual of individual i for trait k including the polygenic and random error effects, $\phi_k$ and $\psi_k$ are the antedependence parameters induced by trait k itself or by the other trait, respectively, and $\epsilon_{1i}(t)$ and $\epsilon_{2i}(t)$ are the error terms assumed to be bivariate normally distributed with mean zero and variance

matrix

$$ \Sigma_\epsilon(t) = \begin{pmatrix} \gamma_1^2(t) & \gamma_1(t)\gamma_2(t)\rho(t) \\ \gamma_1(t)\gamma_2(t)\rho(t) & \gamma_2^2(t) \end{pmatrix}, $$

where $\gamma_1^2(t)$ and $\gamma_2^2(t)$ are termed time-dependent “innovation variances” that can be described by a parametric function such as a polynomial of time (Pourahmadi, 1999),

$$ \log \gamma_1^2(t) = u_1 + v_1 t + w_1 t^2, \tag{3} $$

$$ \log \gamma_2^2(t) = u_2 + v_2 t + w_2 t^2, \tag{4} $$

and $\rho(t)$ is the correlation between the error terms of the two traits, which is assumed to be a function of time t, expressed as (Pourahmadi, 1999; Jaffrézic et al., 2003)

$$ \rho(t) = \mathrm{Corr}(\epsilon_1(t), \epsilon_2(t)) = e^{-\lambda_1 t} - e^{-\lambda_2 t}. \tag{5} $$
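As a rough illustration of Eqs. 3 to 5, the sketch below assembles the innovation covariance matrix at a single time point. The function name `innovation_cov` and all parameter values are hypothetical and chosen only to show how the pieces fit together; they are not taken from the authors' implementation.

```python
import math

def innovation_cov(t, u, v, w, lam):
    """Return the 2x2 innovation covariance matrix at time t (Eqs. 3-5).

    u, v, w: length-2 sequences of polynomial coefficients, one entry per trait.
    lam: (lambda_1, lambda_2), the two decay rates of the innovation correlation.
    """
    # Eqs. 3-4: log-quadratic innovation variances, one per trait
    var = [math.exp(u[k] + v[k] * t + w[k] * t ** 2) for k in range(2)]
    # Eq. 5: correlation as a difference of two exponential decays
    rho = math.exp(-lam[0] * t) - math.exp(-lam[1] * t)
    cov = rho * math.sqrt(var[0] * var[1])
    return [[var[0], cov], [cov, var[1]]]

# Hypothetical parameter values, for illustration only
sigma_3 = innovation_cov(3, u=(0.0, 0.1), v=(0.05, 0.02), w=(0.0, 0.0), lam=(0.1, 2.0))
```

With positive decay rates satisfying λ1 < λ2, the correlation in Eq. 5 starts at zero for t = 0 and stays between 0 and 1, so the resulting matrix is a valid covariance matrix.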

There is no correlation between the error terms of two traits at different time points, i.e., $\mathrm{Corr}(\epsilon_1(t_1), \epsilon_2(t_2)) = 0$ for $t_1 \neq t_2$. The matrix representation of a bivariate model (Eqs. 1 and 2) for specifying the time-dependent residuals based on the SAD(1) model can be written as

$$ \begin{pmatrix} e_1(1) \\ e_2(1) \\ e_1(2) \\ e_2(2) \\ \vdots \\ e_1(T) \\ e_2(T) \end{pmatrix} = \begin{pmatrix} I & 0 & \cdots & 0 & 0 \\ V & I & \cdots & 0 & 0 \\ V^2 & V & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ V^{T-1} & V^{T-2} & \cdots & V & I \end{pmatrix} \begin{pmatrix} \epsilon_1(1) \\ \epsilon_2(1) \\ \epsilon_1(2) \\ \epsilon_2(2) \\ \vdots \\ \epsilon_1(T) \\ \epsilon_2(T) \end{pmatrix}, \tag{6} $$

where

$$ V = \begin{pmatrix} \phi_1 & \psi_1 \\ \psi_2 & \phi_2 \end{pmatrix} \quad \text{and} \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. $$

Denote the left-hand side of Eq. 6 by e, whose covariance matrix is expressed as Var(e) = Σ. Pourahmadi (1999) and Núñez-Antón and Woodworth (1994) provided a procedure for calculating this covariance matrix and its inverse, $\Sigma^{-1}$, based on the Cholesky decomposition of the matrix. To specify the variance matrix, Σ, based on the bivariate SAD(1) model, we need to define a vector of parameters arrayed in $\Omega_v = (u_k, v_k, w_k, \lambda_k, \phi_k, \psi_k)$. In the Appendix, we provide the closed forms for the determinant and inverse of the matrix Σ, which are embedded within the standard EM algorithm to provide efficient estimates of $\Omega_v$ (see below).
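The covariance matrix Σ implied by Eq. 6 can also be built numerically, block by block, from the recursion e(t) = V e(t − 1) + ε(t). The sketch below is one way to do this in plain Python; it is not the closed form given in the Appendix, it assumes e(0) = 0 (as the first block row of Eq. 6 implies), and the helper names and numerical values are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def add2(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def sad1_covariance(V, innovation_covs):
    """Return blocks[s][t] = Cov(e(s+1), e(t+1)) as 2x2 matrices.

    V: 2x2 antedependence matrix of Eq. 6.
    innovation_covs[t]: 2x2 innovation covariance at time t+1.
    """
    T = len(innovation_covs)
    Vt = transpose2(V)
    blocks = [[None] * T for _ in range(T)]
    prev = [[0.0, 0.0], [0.0, 0.0]]              # Var(e(0)) = 0 by assumption
    for t in range(T):
        # Var(e(t)) = V Var(e(t-1)) V' + Sigma_eps(t)
        blocks[t][t] = add2(matmul2(matmul2(V, prev), Vt), innovation_covs[t])
        prev = blocks[t][t]
    for s in range(T):
        for t in range(s + 1, T):
            # For t > s: Cov(e(t), e(s)) = V Cov(e(t-1), e(s))
            blocks[t][s] = matmul2(V, blocks[t - 1][s])
            blocks[s][t] = transpose2(blocks[t][s])
    return blocks

# Hypothetical values: no cross-trait antedependence, constant innovations, T = 4
V = [[0.9, 0.0], [0.0, 0.8]]
sigma_eps = [[1.0, 0.5], [0.5, 2.0]]
blocks = sad1_covariance(V, [sigma_eps] * 4)
```

Stacking the 2 × 2 blocks in time order gives the full 2T × 2T matrix Σ; for long series the closed forms or a Cholesky-based approach referred to above would be preferable.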

2.2 The growth law

The time-dependent genotypic values of different QTL genotypes for the two traits described in Eqs. 1 and 2 can be modelled by a particular mathematical function with biological meaning. For growth, the sigmoidal (or logistic) function is thought to capture age-specific change in growth in living systems (West et al., 2001). The logistic growth curve as a biological law can be mathematically described by

$$ g(t) = \frac{a}{1 + b e^{-rt}}, \tag{7} $$

where a is the asymptotic or limiting value of g when t → ∞, a/(1 + b) is the initial value of g when t = 0 and r is the relative rate of growth (von Bertalanffy, 1957). If different genotypes at a putative QTL have different combinations of these parameters, this implies that the QTL plays a role in governing the difference of growth trajectories. Assuming that a putative QTL affects growth curves, the genotypic values of QTL genotype j for two traits x and y can be modelled by Eq. 7, with parameters arrayed in $\Omega_g = (\Omega_{g_{kj}}) = (a_{kj}, b_{kj}, r_{kj})$.
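To make the roles of a, b and r in Eq. 7 concrete, the following sketch evaluates the logistic curve and the age at which the growth rate is maximal. Setting the second derivative of Eq. 7 to zero gives t* = ln(b)/r, where g(t*) = a/2. The function names and the parameter values are hypothetical and only illustrate two genotypes with different curve parameters.

```python
import math

def logistic_growth(t, a, b, r):
    """Genotypic value g(t) = a / (1 + b * exp(-r * t)) from Eq. 7."""
    return a / (1.0 + b * math.exp(-r * t))

def inflection_time(b, r):
    """Age at which the growth rate dg/dt is maximal; there g equals a / 2."""
    return math.log(b) / r

# Hypothetical curve parameters (a, b, r) for the two backcross genotypes of one trait
params = {"QQ": (18.0, 5.4, 0.48), "Qq": (16.0, 5.0, 0.53)}
for genotype, (a, b, r) in params.items():
    t_star = inflection_time(b, r)
    print(genotype, round(t_star, 2), round(logistic_growth(t_star, a, b, r), 2))
```

Two genotypes that differ in any of the three parameters trace different trajectories, which is what the test of a QTL effect on the curve parameters exploits.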

2.3 The likelihood function

The statistical foundation of QTL detection and mapping is the mixture model. In this mixture model, each observation $z_i = (x_i, y_i)$ is assumed to have arisen from one of J (J possibly unknown but finite) components, each component being modelled by a density from the parametric family f:

$$ z_i \sim p(z_i \mid \varpi_i, \varphi, \eta) = \varpi_{1|i} f(z_i; \varphi_1, \eta) + \cdots + \varpi_{J|i} f(z_i; \varphi_J, \eta), \tag{8} $$

where $\varpi_i = (\varpi_{1|i}, \ldots, \varpi_{J|i})$ are the mixture proportions, which are constrained to be non-negative and sum to unity; $\varphi = (\varphi_1, \ldots, \varphi_J)$ are the component-specific parameters, with $\varphi_l$ being specific to component l; and η is a parameter which is common to all components.


In the backcross population, $\varpi_{j|i}$, which is progeny-specific, represents the conditional probability of QTL genotype j for progeny i given its marker genotype at two flanking markers $M_\eta$ and $M_{\eta+1}$. The forms of these conditional probabilities depend on the recombination fraction between the QTL and the markers (and therefore the QTL position described by θ). The parametric family f is represented by a multivariate normal distribution with the mean vector and covariance matrix fit by a set of parameters, $\Omega = (\Omega_g, \Omega_v)$, that specify the growth curves and SAD model, respectively. The likelihood of the bivariate longitudinal observations, z, in the backcross population, affected by a QTL, is formulated as

$$ L(\Omega, \theta \mid z, M) = \prod_{i=1}^{n} \left[ \varpi_{1|i} f_1(z_i) + \varpi_{2|i} f_2(z_i) \right], \tag{9} $$

where

$$ f_j(z) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (z - g_j)\, \Sigma^{-1} (z - g_j)^{\top} \right], $$

with the mean vector

$$ g_j = \left( g_{1j}(1), g_{2j}(1), \ldots, g_{1j}(T), g_{2j}(T) \right) \tag{10} $$

whose elements are given by Eq. 7, and the covariance matrix modelled by $\Omega_v$.

In practice, a few simplifications are needed in modelling the structure of Σ. First, we assume that the innovation variances, $\gamma_1^2$ and $\gamma_2^2$, are constant over all the T time points. With this assumption, the residual variance for trait k at time t is given by

$$ \mathrm{Var}(e_k(t)) = \frac{1 - \phi_k^{2t}}{1 - \phi_k^{2}}\, \gamma_k^2. \tag{11} $$

For $t_2 \ge t_1$, the correlation function of the SAD(1) model is written as

$$ \mathrm{Corr}(e_k(t_1), e_k(t_2)) = \phi_k^{\,t_2 - t_1} \sqrt{\frac{1 - \phi_k^{2t_1}}{1 - \phi_k^{2t_2}}}, \quad t_2 > t_1. \tag{12} $$

It can be seen from Eq. 11 that even for constant innovation variances, the residual variance can change with time (Jaffrézic et al., 2003). Also, according to Eq. 12, the correlation function is non-stationary for the SAD model because the correlation does not depend only on the time interval $t_2 - t_1$.


Second, we assume that the innovation correlation exists only between the two traits at the same time point but not between different time points. Also, this innovation correlation is constant over different ages, i.e., $\mathrm{Corr}(\epsilon_1(t), \epsilon_2(t)) = \rho$. Third, the antedependence parameters caused by correlated traits are equal to zero, i.e., $\psi_1 = \psi_2 = 0$ (see also Jaffrézic et al., 2003). Thus, the residual correlation between the two traits at different times, $t_1$ for trait x and $t_2$ for trait y, is derived as

$$ \mathrm{Corr}(e_1(t_1), e_2(t_2)) = \frac{\phi_2^{\,t_2 - t_1} - \phi_1^{\,t_1} \phi_2^{\,t_2}}{1 - \phi_1 \phi_2} \left( \sqrt{\frac{1 - \phi_1^{2t_1}}{1 - \phi_1^{2}} \cdot \frac{1 - \phi_2^{2t_2}}{1 - \phi_2^{2}}} \right)^{-1} \rho, \quad t_2 > t_1. \tag{13} $$

It is seen from Eq. 13 that the cross-correlation function is not symmetrical for the SAD model, i.e., the correlation between trait x at time $t_1$ and trait y at time $t_2$ is not equal to the correlation between trait x at time $t_2$ and trait y at time $t_1$.
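The variance and correlation expressions in Eqs. 11 to 13 follow from unrolling the SAD(1) recursion. The short derivation below is a sketch under the three simplifying assumptions of this section, with the additional assumption that $e_k(0) = 0$, as the first block row of Eq. 6 implies.

```latex
% With \psi_1 = \psi_2 = 0, unrolling Eqs. 1-2 gives, for k = 1, 2,
e_k(t) = \sum_{s=1}^{t} \phi_k^{\,t-s}\, \epsilon_k(s) .

% Variance (Eq. 11): a geometric sum over independent innovations
\mathrm{Var}(e_k(t)) = \gamma_k^2 \sum_{s=1}^{t} \phi_k^{\,2(t-s)}
                     = \frac{1 - \phi_k^{2t}}{1 - \phi_k^{2}}\, \gamma_k^2 .

% Same-trait covariance; dividing by the two standard deviations gives Eq. 12
\mathrm{Cov}(e_k(t_1), e_k(t_2)) = \phi_k^{\,t_2 - t_1}\, \mathrm{Var}(e_k(t_1)),
\qquad t_2 > t_1 .

% Cross-trait covariance: only innovations at the same time point are correlated
\mathrm{Cov}(e_1(t_1), e_2(t_2))
  = \rho\, \gamma_1 \gamma_2 \sum_{s=1}^{t_1} \phi_1^{\,t_1 - s}\, \phi_2^{\,t_2 - s}
  = \rho\, \gamma_1 \gamma_2\,
    \frac{\phi_2^{\,t_2 - t_1} - \phi_1^{\,t_1} \phi_2^{\,t_2}}{1 - \phi_1 \phi_2},
\qquad t_2 > t_1 .
```

Dividing the last expression by the square root of $\mathrm{Var}(e_1(t_1))\,\mathrm{Var}(e_2(t_2))$ cancels $\gamma_1 \gamma_2$ and yields Eq. 13. Swapping $t_1$ and $t_2$ replaces $\phi_2^{\,t_2 - t_1}$ by a power of $\phi_1$, which is why the cross-correlation is not symmetrical unless $\phi_1 = \phi_2$.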

2.4 Computational algorithm

Two different methods are used to estimate the QTL position and the parameters modelling the mean vector and covariance matrix. In practical computations, the QTL position parameter (θ) can be viewed as a fixed parameter because a putative QTL can be searched at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire linkage map. The log-likelihood ratio test statistic for a QTL at a particular map position is displayed graphically to generate a likelihood map or profile. The genomic position that corresponds to a peak of the profile is the maximum likelihood estimate (MLE) of the QTL location.

The parameters that model the mean-covariance structure can be estimated with the EM algorithm (Dempster et al., 1977; Lander and Botstein, 1989; Ma et al., 2002; Wu et al., 2004a, 2004b). The Nelder-Mead simplex algorithm, originally proposed by Nelder and Mead (1965), provides a more efficient way to estimate these parameters (Zhao et al., 2004a). This algorithm is a direct search method for nonlinear unconstrained optimization. It attempts to minimize a scalar-valued nonlinear function using only function values, without any derivative information (explicit or implicit). The algorithm uses linear adjustment of the parameters until some convergence criterion is met. The term “simplex” arises because the feasible solutions for the parameters may be represented by a polytope figure called a “simplex”. The simplex is a line in one dimension, a triangle in two dimensions and a tetrahedron in three dimensions.

As shown in Ma et al. (2002), a standard EM algorithm can be implemented to estimate the unknown parameters by differentiating the log-likelihood function of Equation 9 with respect to each unknown, setting the derivatives equal to zero and solving the resultant log-likelihood equations. Each log-likelihood equation is described in terms of the posterior probabilities for each individual to carry a particular QTL genotype. Thus, a loop of iterations is defined by the E step, in which the posterior probability is calculated, and the M step, in which each unknown is solved using the log-likelihood equation. For the functional mapping considered here, it is very difficult to derive the log-likelihood equations for the curve parameters ($\Omega_{g_{kj}}$) because they appear in non-linear expressions. As demonstrated by Zhao et al. (2004a), the simplex algorithm can be used as an alternative approach to estimate the curve parameters.

One of the favorable properties of the SAD(1) model is the existence of closed forms of the residual covariance matrix and its determinant and inverse (see the Appendix). These closed forms make it possible to derive closed forms of the log-likelihood equations for the matrix-structuring parameters contained in $\Omega_v = (u_k, v_k, w_k, \lambda_k, \phi_k, \psi_k)$. In this study, we combine the strengths of the simplex algorithm for the estimation of $\Omega_{g_{kj}}$ and the EM algorithm for the estimation of $\Omega_v$.

After the point estimates of the parameters are obtained by the simplex algorithm, we derive the approximate variance-covariance matrix and evaluate the sampling errors of the estimates ($\hat{\theta}$, $\hat{\Omega}_{g_{kj}}$, $\hat{\Omega}_v$). The techniques for so doing involve calculation of the incomplete-data information matrix, which is the negative second-order derivative of the incomplete-data log-likelihood. The incomplete-data information can be calculated by extracting the information for the missing data from the information for the complete data (Louis, 1982). A different, so-called supplemented EM (SEM) algorithm was proposed by Meng and Rubin (1991) to estimate the approximate variance-covariance matrices, which can also be used for the calculation of the sampling errors for the MLEs of the parameters ($\theta$, $\Omega_{g_{kj}}$, $\Omega_v$).
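The E step described above needs two ingredients at each test position: the conditional probability of each QTL genotype given the flanking markers, and the posterior probability after seeing the phenotype. The sketch below shows both for a backcross. It assumes no crossover interference and a hypothetical 0/1 coding of marker genotypes; the function names are illustrative, and the log densities are taken as given rather than computed from the SAD(1) model.

```python
import math

def qtl_conditional_probs(m_left, m_right, r1, r2):
    """P(QTL genotype | flanking markers) in a backcross, assuming no interference.

    m_left, m_right: marker genotypes coded 1 (parental type linked to QTL
    genotype 1) or 0 (type linked to QTL genotype 2).
    r1, r2: recombination fractions left marker-QTL and QTL-right marker.
    Returns (P(genotype 1), P(genotype 2)).
    """
    joint = []
    for q in (1, 0):
        # A marker differing from the QTL type requires a recombination event
        p_left = (1 - r1) if m_left == q else r1
        p_right = (1 - r2) if m_right == q else r2
        joint.append(p_left * p_right)
    total = sum(joint)
    return joint[0] / total, joint[1] / total

def posterior_probs(prior, log_dens):
    """E step: posterior probability of each QTL genotype for one progeny.

    prior: conditional probabilities from the markers.
    log_dens: log f_j(z_i) for each genotype j.
    """
    m = max(log_dens)                      # subtract the max for numerical stability
    weights = [p * math.exp(ld - m) for p, ld in zip(prior, log_dens)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical progeny: markers 1 and 0, QTL closer to the left marker
prior = qtl_conditional_probs(1, 0, r1=0.1, r2=0.2)
post = posterior_probs(prior, log_dens=[-12.3, -11.8])
```

Scanning the genome then amounts to recomputing `r1` and `r2` at each test position within a marker interval and repeating the E and M steps until convergence.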

2.5 Model Selection

Jaffrézic et al. (2003) proposed an ad hoc approach for model selection. Their strategy is to increase the antedependence order until the additional antedependence coefficient is close to zero. Núñez-Antón and Zimmerman (2000) proposed using the AIC information criterion to select the best model. Hurvich and Tsai (1989) showed that AIC can drastically underestimate the expected Kullback-Leibler information when only a few repeated measurements are available. Instead, they derived a corrected AIC, expressed, for trait k, as

$$ \mathrm{AIC_C} = T \log \hat{\gamma}_k^2 + T\, \frac{T + r}{T - (r + 2)}, \tag{14} $$

where $\hat{\gamma}_k^2$ is the white noise (innovation) variance, T is the number of repeated measurements and r is the order of the model. The number of parameters is penalized so heavily that models selected by AICC are typically much more parsimonious than those selected by AIC (Hurvich and Tsai, 1989). The model corresponding to the minimum AICC is chosen.
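The corrected criterion is simple enough to compute directly. The sketch below assumes the Hurvich and Tsai (1989) form in which the penalty term is T(T + r)/(T − (r + 2)); with the rounded innovation variances reported for stem height in Table 1 and T = 11, this form reproduces the tabulated AICC values to within rounding. The function name `aicc` is hypothetical.

```python
import math

def aicc(gamma2_hat, T, r):
    """Corrected AIC for an order-r model fitted to T repeated measurements.

    The penalty is written as T * (T + r) / (T - (r + 2)), following
    Hurvich and Tsai (1989); this form is an assumption of the sketch.
    """
    if T - (r + 2) <= 0:
        raise ValueError("AICC needs T > r + 2")
    return T * math.log(gamma2_hat) + T * (T + r) / (T - (r + 2))

# Rounded innovation variances for stem height as reported in Table 1 (T = 11)
height_gamma2 = {1: 0.47, 2: 0.45, 3: 0.44, 4: 0.43, 5: 0.43, 6: 0.43}
scores = {r: aicc(g2, T=11, r=r) for r, g2 in height_gamma2.items()}
best_order = min(scores, key=scores.get)
```

Because the penalty grows quickly as r approaches T − 2, a small reduction in the innovation variance at higher orders is not enough to offset it when only 11 measurements are available.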

2.6 Hypothesis tests

With longitudinal observations for both traits, the hypotheses tested in this article are

$$ H_0: \Omega_{g_{kj}} \equiv \Omega_{g_k}, \quad k = 1, 2 $$
$$ H_1: \text{at least one of the equalities above does not hold}, \tag{15} $$

where $\Omega_{g_k}$ is the set of parameters for the mean curve. The likelihood values $L_0$ and $L_1$ under $H_0$ and $H_1$ are calculated. The test is performed with a log-likelihood ratio statistic

$$ LR = -2 \ln \frac{L_0}{L_1}. \tag{16} $$

An empirical approach for determining the critical threshold is based on permutation tests, as advocated by Churchill and Doerge (1994). By repeatedly shuffling the relationships between marker genotypes and phenotypes, a series of maximum log-likelihood ratios is calculated, from the distribution of which the critical threshold is determined. The LR statistic is plotted against test locations, and a peak in the LR profile corresponds to the position of a QTL.

The pleiotropic effect of this QTL on both traits x and y can be tested by formulating two null hypotheses separately,

$$ H_{01}: \Omega_{g_{1j}} \equiv \Omega_{g_1} $$

and

$$ H_{02}: \Omega_{g_{2j}} \equiv \Omega_{g_2}. $$

Only when both null hypotheses are rejected can the QTL be considered to pleiotropically affect the two traits.

Another mechanism for genetic correlation is the linkage of different QTL at nearby genomic locations. In other words, if one QTL only affects trait x and a second QTL only affects trait y but they are located together in a genomic region, traits x and y will be genetically correlated. Our model provides a way to test the relative importance of pleiotropy and linkage for genetic correlation. Assuming two QTL, one for trait x and the other for trait y, the likelihood function of bivariate longitudinal traits can now be written as

$$ L(\Omega, \theta \mid z, M) = \prod_{i=1}^{n} \left[ \varpi_{11|i} f_{11}(z_i) + \varpi_{12|i} f_{12}(z_i) + \varpi_{21|i} f_{21}(z_i) + \varpi_{22|i} f_{22}(z_i) \right], \tag{17} $$

where $\varpi_{j_1 j_2|i}$ is the conditional probability of joint genotype $j_1 j_2$ ($j_1, j_2 = 1, 2$) from QTL 1 and 2, conditional upon a marker genotype for individual i. The locations of the two putative QTL are described by $\theta_1$ and $\theta_2$. The vectors of the curve parameters, $\Omega_{g_{j_1 j_2}}$, are specified separately for different joint QTL genotypes. To determine whether there is a single QTL for traits x and y, we formulate the following hypotheses

$$ H_0: \theta_1 = \theta_2 $$
$$ H_1: \theta_1 \neq \theta_2. $$

The rejection of the null hypothesis implies that traits x and y are genetically correlated because of two different but linked QTL.
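The permutation procedure for the critical threshold can be sketched generically. In the code below, `max_lr` stands for a hypothetical user-supplied function that runs the full genome scan on a given pairing of phenotypes and markers and returns the maximum LR; the scan itself is not implemented here, and the quantile rule is one simple choice among several.

```python
import math
import random

def permutation_threshold(phenotypes, markers, max_lr, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide LR threshold in the spirit of Churchill and Doerge (1994).

    phenotypes, markers: per-progeny records in matching order.
    max_lr: hypothetical callable returning the maximum LR over all test
            positions for a given (phenotypes, markers) pairing.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)              # break the marker-phenotype association
        null_max.append(max_lr(shuffled, markers))
    null_max.sort()
    # Empirical (1 - alpha) quantile of the permutation distribution
    k = math.ceil((1 - alpha) * n_perm - 1e-9)
    k = min(max(k, 1), n_perm)
    return null_max[k - 1]
```

Each longitudinal record is shuffled as a whole, so the within-individual covariance structure is preserved while any association with the markers is destroyed. A chromosome-wide threshold is obtained the same way by restricting `max_lr` to one linkage group.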

3 Results

The plant material used was derived from the triple hybridization of Populus. A P. deltoides clone (designated I-69) was used as a female parent to mate with an interspecific P. deltoides × P. nigra clone (designated I-45) as a male parent (Wu et al., 1992). Both P. deltoides I-69 and P. euramericana (P. deltoides × P. nigra) I-45 were selected at the Research Institute for Poplars in Italy in the 1950s and were introduced to China in 1972. In the spring of 1988, a total of 450 one-year-old rooted three-way hybrid seedlings were planted at a spacing of 4 × 5 m at a forest farm near Xuzhou City, Jiangsu Province, China. The total stem heights and diameters measured at the end of each of 11 growing seasons are used in this example. A subset (90) of genotypes randomly selected from the 450 hybrids was used to construct parent-dependent genetic linkage maps with random amplified polymorphic DNAs, amplified fragment length polymorphisms and inter-simple sequence repeats based on a two-way pseudo-test backcross design (Yin et al., 2002). These maps provide a basic framework for mapping growth-specific QTL with our bivariate model. As an example, our QTL analysis will be based on the genetic map constructed from heterozygous markers segregating in the parent P. deltoides.

The logistic curve described by Eq. 7 was used to fit the growth trajectories of stem height and diameter for each hybrid tree using non-linear least squares approaches. Statistical tests indicate a good fit at the significance level P < 0.001. In general, stem height is strongly correlated with stem diameter, with greater correlations at younger than at older ages (Fig. 1). It is thus possible that specific QTL are involved in determining the genetic correlation between stem height and diameter. Given the dynamic nature of the correlation, the genetic effects of these QTL should also be age-dependent.

To use the SAD model for mapping QTL that are responsible for both height and diameter growth trajectories in this Populus dataset, we need to determine its best order. A more precise approach for so doing is to estimate the residual variances under the full model of hypothesis (15) and then calculate the AICC values for different orders using Eq. 14. However, this would be computationally expensive because at each antedependence order we need to estimate numerous parameters. Here, we instead based our order determination on the reduced model of hypothesis (15), in which there is only one mean curve that explains the growth data. The AICC values were found to increase with order from models SAD(1) to SAD(6) for both stem height and diameter (Table 1). Also, for the SAD models of higher orders, the first-order antedependence coefficient is markedly larger than the higher-order coefficients. Thus, we think that the SAD(1) model should be reasonable to fit the pseudo-test backcross data used in this example. Along with the constant innovation variance, this model was incorporated to approximate the structure of the covariance matrix for growth trajectories in this interspecific hybrid progeny of Populus.

[Figure 1: eleven scatter-plot panels; x-axis: Diameter (cm); y-axis: Height (m). Panel titles with sample correlations: Year 1 (0.89), Year 2 (0.85), Year 3 (0.89), Year 4 (0.85), Year 5 (0.82), Year 6 (0.79), Year 7 (0.66), Year 8 (0.67), Year 9 (0.66), Year 10 (0.68), Year 11 (0.64).]

Figure 1: Scatter plots of stem height (measured in meters) against diameter (measured in centimeters) at different ages during ontogeny in the pseudo-test backcross progeny derived from an interspecific cross in Populus. Sample correlations between the two growth traits at different ages are given. Note that these scatter plots are displayed with different scales, aimed to make a clear comparison across ages.

Table 1: AICC information criteria for stem height and diameter growth from an interspecific poplar hybrid progeny. The best model is the one with the minimum corrected AIC value.

Structure   AICC    γ̂²     φ1     φ2      φ3     φ4     φ5       φ6
Height
SAD(1)       8.26   0.47   1.03
SAD(2)      11.68   0.45   1.25   -0.28
SAD(3)      16.58   0.44   1.29   -0.47   0.21
SAD(4)      23.78   0.43   1.29   -0.43   0.07   0.15
SAD(5)      34.77   0.43   1.29   -0.42   0.06   0.19   -0.05
SAD(6)      53.09   0.43   1.29   -0.42   0.06   0.17   -0.009   -0.04
Diameter
SAD(1)      17.81   1.13   1.07
SAD(2)      21.10   1.06   1.30   -0.29
SAD(3)      25.94   1.02   1.35   -0.52   0.23
SAD(4)      33.26   1.02   1.35   -0.51   0.19   0.04
SAD(5)      44.25   1.02   1.35   -0.50   0.17   0.08   -0.04
SAD(6)      62.58   1.02   1.35   -0.50   0.18   0.06   -0.008   -0.04

γ̂² is the estimated white noise variance and the φ's are the coefficients of the SAD models.

By scanning all the linkage groups, we detected two QTL, on linkage groups 9 and 10, that exceed the genome-wide critical threshold and each affect both height and diameter growth trajectories. We also detected a chromosome-wide QTL on linkage group 15. Figure 2 illustrates a plot of the likelihood ratios (LRs) between the full model (there is a QTL) and the reduced model (there is no QTL) across all the linkage groups. For comparison, the LR profiles for single-trait analyses are also plotted. The three detected pleiotropic QTL are located at 114 cM on linkage group D9, 16 cM on linkage group D10 and 64 cM on linkage group D15, because the LR peaks at these positions far exceed the genome- or chromosome-wide critical thresholds. A permutation test was performed to determine the empirical threshold for declaring the existence of QTL throughout all the linkage groups. The detection of the QTL on linkage group D15, which cannot be detected from separate mapping of height or diameter, suggests that the bivariate mapping model has greater power than traditional single-trait mapping models. However, this is not always true; in some cases, e.g., linkage groups 9 and 10, these two approaches gave similar


results, possibly because the two traits display an approximately identical QTL-phenotype correlation or because such a correlation for one of the traits is close to zero (Jiang and Zeng, 1995).

Table 2 tabulates the MLEs of the curve parameters and of the parameters that model the structure of the variance matrix, as well as the asymptotic standard errors of these estimates derived from Fisher's information matrix. All the parameters can be estimated with reasonable precision. The MLEs of the curve parameters in Table 2 were used to draw the growth curves of the two genotypes at each QTL for both height and diameter (Fig. 3). The genetic effects of the three detected QTL on growth trajectories display both similarities and discrepancies in dynamic pattern between height and diameter. All of these QTL affect the timing of the inflection point, i.e., the age at which growth rate reaches a maximal value (West et al., 2001), for both height and diameter (Fig. 3). Also, the inflection point for the same QTL genotype occurs at approximately the same age for height and diameter.

Table 2: The MLEs of the model parameters, with their asymptotic standard errors in parentheses, under the SAD(1) model for stem height and diameter growth in an interspecific poplar hybrid progeny. The estimate ρ̂ is a single value per QTL, shared by the two traits.

| Chr. no. | Position (cM) | Trait | QQ: â1 | QQ: b̂1 | QQ: r̂1 | Qq: â2 | Qq: b̂2 | Qq: r̂2 | φ̂k | ρ̂ | γ̂k² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 114 | Height | 18.13 (0.50) | 5.40 (0.23) | 0.48 (0.012) | 15.72 (0.44) | 5.05 (0.21) | 0.53 (0.014) | 0.48 (0.024) | 0.65 (0.020) | 1.02 (0.016) |
| | | Diameter | 25.28 (0.88) | 9.93 (0.52) | 0.65 (0.017) | 19.96 (0.72) | 9.58 (0.57) | 0.78 (0.023) | 1.05 (0.053) | | 1.07 (0.014) |
| 10 | 16 | Height | 17.17 (0.51) | 5.30 (0.23) | 0.49 (0.013) | 16.31 (0.45) | 5.15 (0.21) | 0.52 (0.013) | 0.48 (0.025) | 0.66 (0.020) | 1.03 (0.016) |
| | | Diameter | 22.74 (0.85) | 9.68 (0.57) | 0.70 (0.020) | 21.77 (0.80) | 9.76 (0.56) | 0.73 (0.022) | 1.11 (0.057) | | 1.07 (0.014) |
| 15 | 64 | Height | 17.12 (0.47) | 5.30 (0.22) | 0.50 (0.013) | 16.42 (0.47) | 5.27 (0.23) | 0.51 (0.013) | 0.49 (0.025) | 0.66 (0.020) | 1.02 (0.016) |
| | | Diameter | 22.60 (0.81) | 9.78 (0.55) | 0.71 (0.022) | 21.50 (0.79) | 9.87 (0.58) | 0.73 (0.023) | 1.10 (0.055) | | 1.08 (0.013) |
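As a rough illustration of how the curve parameters in Table 2 translate into inflection ages, the sketch below evaluates them for the QTL on linkage group D9. It assumes the logistic form g(t) = a / (1 + b e^{-rt}) commonly used in functional mapping (Ma et al., 2002), which is consistent with the inflection age ln b / r quoted in the caption of Figure 3; the function names are illustrative and not taken from the article.

```python
import math

def logistic_growth(t, a, b, r):
    """Assumed logistic growth curve g(t) = a / (1 + b * exp(-r * t))."""
    return a / (1.0 + b * math.exp(-r * t))

def inflection_age(b, r):
    """Age at which the growth rate is maximal for the curve above: ln(b) / r."""
    return math.log(b) / r

# MLEs (a, b, r) for the QTL on linkage group D9, copied from Table 2.
d9_estimates = {
    ("Height", "QQ"): (18.13, 5.40, 0.48),
    ("Height", "Qq"): (15.72, 5.05, 0.53),
    ("Diameter", "QQ"): (25.28, 9.93, 0.65),
    ("Diameter", "Qq"): (19.96, 9.58, 0.78),
}

for (trait, genotype), (a, b, r) in d9_estimates.items():
    t_star = inflection_age(b, r)
    size_at_inflection = logistic_growth(t_star, a, b, r)  # a / 2 for this curve
    print(f"{trait:8s} {genotype}: inflection at {t_star:.2f} years, "
          f"size {size_at_inflection:.2f}")
```

Under these assumptions the QQ curves inflect at roughly 3.5 years for both height and diameter, which agrees with the observation that the inflection point of a given QTL genotype occurs at about the same age for the two traits.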

We draw the age-dependent curve of genetic effect for each QTL on

http://www.bepress.com/sagmb/vol4/iss1/art33

Zhao et al.: Multivariate SAD Models for Functional Mapping

[Figure 2: multi-panel plot of the likelihood ratio (LR, y-axis, 0 to 40) against test position (x-axis), one panel per linkage group (D1, D2, D3-1, D3-2, D4, D4-1, D4-2, D4-3, D5 to D19). Legend: Diameter, Height, Combined. Scale bar: 20 cM.]

Figure 2: The profile of log-likelihood ratio (LR) between the full and reduced model estimated from the bivariate SAD(1) model for analyzing stem height and diameter growth trajectories in an interspecific poplar hybrid progeny across all the linkage groups. The LRs of single trait analyses for height and diameter are obtained using Zhao et al.'s (2005) approach and plotted in broken or dotted curves. The LR profiles for bivariate and single traits are expressed in red and blue, respectively. The horizontal lines indicate the critical thresholds for declaring the genome-wide (solid) or chromosome-wide (dotted) existence of QTL for the bivariate analysis, determined on the basis of 100 permutation tests. The ticks on the x-axis indicate the positions of molecular markers. The linkage map used is one constructed with heterozygous markers from the P. deltoides parent (Yin et al., 2002).
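The permutation-based thresholds mentioned in the caption follow the general idea of Churchill and Doerge (1994): the phenotypes are shuffled relative to the marker genotypes, the scan is repeated, and the maximum statistic of each shuffle is recorded. The sketch below is only a toy version of that procedure. It replaces the bivariate SAD(1) likelihood ratio with a simple squared difference of genotype means, uses one scalar phenotype per individual rather than a longitudinal record, and all names and data in it are hypothetical.

```python
import random

def max_scan_statistic(phenotypes, marker_genotypes):
    """Toy stand-in for the LR scan: largest squared difference in mean
    phenotype between the two genotype classes over all markers."""
    best = 0.0
    for genotypes in marker_genotypes:
        group1 = [p for p, g in zip(phenotypes, genotypes) if g == 1]
        group0 = [p for p, g in zip(phenotypes, genotypes) if g == 0]
        diff = sum(group1) / len(group1) - sum(group0) / len(group0)
        best = max(best, diff * diff)
    return best

def permutation_threshold(phenotypes, marker_genotypes,
                          n_perm=100, alpha=0.05, seed=1):
    """Approximate (1 - alpha) quantile of the maximum scan statistic
    when phenotypes are shuffled relative to the marker genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_maxima.append(max_scan_statistic(shuffled, marker_genotypes))
    null_maxima.sort()
    index = min(n_perm - 1, int(round((1.0 - alpha) * n_perm)))
    return null_maxima[index]

# Hypothetical data: 40 individuals, 5 markers coded 0/1, no real QTL.
data_rng = random.Random(0)
n_ind, n_markers = 40, 5
marker_genotypes = [[(i >> m) & 1 for i in range(n_ind)]
                    for m in range(n_markers)]
phenotypes = [data_rng.gauss(0.0, 1.0) for _ in range(n_ind)]

# Genome-wide: maximum over all markers; chromosome-wide: over one group only.
genome_wide = permutation_threshold(phenotypes, marker_genotypes)
chromosome_wide = permutation_threshold(phenotypes, marker_genotypes[:2])
print(genome_wide, chromosome_wide)
```

Because the chromosome-wide maximum is taken over a subset of the positions, its threshold cannot exceed the genome-wide one when the same shuffles are used, which is why a QTL can be significant at the chromosome-wide level only.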


height and diameter using the additive genetic effect function $a_k(t) = g_{k1}(t) - g_{k2}(t)$ (Fig. 4). The QTL on linkage groups D10 and D15 tend to trigger their effects on growth earlier for height than diameter. Although the genetic effect increases with age for both height and diameter, the slope of the genetic effect is much steeper on diameter than height growth.
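The additive genetic effect function above can be evaluated directly from the Table 2 estimates. The following sketch does this for the QTL on linkage group D9, assuming that each genotype curve has the logistic form g(t) = a / (1 + b e^{-rt}) (an assumption consistent with the inflection age ln b / r in Figure 3, but not restated in this section); the helper names and the age grid are illustrative.

```python
import math

def logistic_growth(t, a, b, r):
    """Assumed logistic growth curve g(t) = a / (1 + b * exp(-r * t))."""
    return a / (1.0 + b * math.exp(-r * t))

def additive_effect(t, params_qq, params_het):
    """a_k(t) = g_k1(t) - g_k2(t): difference between the QQ and Qq curves."""
    return logistic_growth(t, *params_qq) - logistic_growth(t, *params_het)

# (a, b, r) estimates for linkage group D9 from Table 2.
height_qq, height_het = (18.13, 5.40, 0.48), (15.72, 5.05, 0.53)
diameter_qq, diameter_het = (25.28, 9.93, 0.65), (19.96, 9.58, 0.78)

for age in range(1, 12):  # illustrative grid of ages in years
    effect_height = additive_effect(age, height_qq, height_het)
    effect_diameter = additive_effect(age, diameter_qq, diameter_het)
    print(f"age {age:2d}: height effect {effect_height:6.2f} m, "
          f"diameter effect {effect_diameter:6.2f} cm")
```

At large ages each effect approaches the difference between the two asymptotes, a1 - a2, which under these estimates is larger for diameter than for height.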

4 Discussion

Understanding the genetic architecture of different biological traits and their co-regulation mechanisms during ontogeny is fundamental to quantitative developmental genetics (Cheverud et al., 1983, 1996; Atchley, 1984; Atchley and Zhu, 1997; Vaughn et al., 1999; Mackay, 2001). Traditionally, this knowledge has been thought to be crucial for the study of evolutionary biology and the design of plant and animal breeding (Lynch and Walsh, 1998; Schlichting and Pigliucci, 1998). With a continuously increasing demand in these fields, it is emerging as a conceptual framework for shedding light on the genetic basis of disease pathogenesis in biomedical research.

In this article, we have developed a new statistical model for functional mapping of QTL that affect growth curves of multiple traits. Functional mapping, which incorporates the longitudinal nature of complex traits, provides a novel avenue for identifying and mapping QTL for biological processes (Ma et al., 2002; Wu et al., 2004a, 2004b). Extensive attempts have been made to incorporate the principle of functional mapping into the molecular dissection of ubiquitous but difficult biological problems related to QTL-QTL interactions (i.e., epistasis) (Wu et al., 2004a), QTL-sex interactions (Zhao et al., 2004b) and environment-dependent expression of QTL effects (Zhao et al., 2004c). The model presented in this article concerns the genetic mapping of multiple biologically related traits and can be regarded as an important supplement to the models currently available.

Compared with the published models for QTL mapping at single time points (Jiang and Zeng, 1995; Eaves et al., 1996; Knott and Haley, 2000; Korol et al., 2001), the model reported here has two advantages. The first advantage is its biological relevance. It provides a quantitative, testable framework for studying phenotypic integration at the interplay between gene action and development at the molecular level.
This model integrates information from ontogeny and timing of development to model the dynamic changes of genetic effects over a time course in development. There is a wealth of evidence for the time-dependent alterations of genetic expression


[Figure 3: three panels, one per QTL (D9, D10, D15). x-axis: Age (year); y-axes: Height (m) and Diameter (cm).]

Figure 3: Growth curves for the two genotype groups, QQ (solid curves) and Qq (broken curves), for stem height (blue) and diameter (red) at the QTL, detected by our SAD-based model, on linkage groups D9 (A), D10 (B) and D15 (C). The growth and ages at the inflection points are indicated on the growth curves, which are connected to the axes by the vertical and horizontal lines. At the age of the inflection point, estimated by ln b/r, growth rate reaches the maximal value (West et al., 2001).


[Figure 4: three panels, one per QTL (D9, D10, D15). x-axis: Age (year); y-axes: Height (m) and Diameter (cm), each ranging from -1 to 6.]

Figure 4: Dynamic changes of the additive genetic effect on stem height (blue) and diameter growth trajectories (red) due to the QTL detected on linkage groups D9 (A), D10 (B) and D15 (C).


(Cheverud et al., 1983, 1996; Atchley, 1984; Atchley and Zhu, 1997). For example, there may be completely different genes acting at different stages of development. This model is well suited to detecting such stage-specific genes. The second advantage of this model is that it capitalizes on the structured antedependence (SAD) model, which has proven to be powerful for studying time series data in statistical problems (Pourahmadi, 1999; Núñez-Antón and Zimmerman, 2000). The implementation of nonstationary variances and correlations through the SAD model overcomes the limitations due to the stationarity assumptions used in the autoregressive (AR) model.

Our bivariate model has been employed to detect pleiotropic QTL that govern both stem height and diameter growth trajectories during ontogeny in an interspecific poplar hybrid progeny. This model is powerful in that it can detect a significant QTL that cannot be detected when a single-trait mapping approach is used (see also Jiang and Zeng, 1995; Eaves et al., 1996; Knott and Haley, 2000; Korol et al., 2001). The detection of QTL on linkage groups D9 and D10 is concordant with our previous findings from single-trait mapping implemented with the AR model (Wu et al., 2003).

The different effects of these detected QTL on stem height and radial growth have an ecological basis. In general, the QTL initiate their impacts earlier on height than on diameter, but trigger greater effects on diameter than on height during development. Plants tend to invest more resources in height than in radial growth at earlier developmental stages in order to obtain adequate space in a competitive community, but tend to invest more resources in radial than in height growth at later stages in order to maintain their competitive advantages. Plants regulate these competition-typical processes via the activation of particular QTL that are associated with development.
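The contrast drawn above between the stationary AR model and the nonstationary SAD model can be made concrete with a small numerical sketch. It assumes a univariate SAD(1) process e(t) = φ e(t-1) + ε(t) that starts at e(1) = ε(1) with a constant innovation variance γ², and plugs in the D9 estimates of φ and γ² from Table 2; the function names and the choice of 11 time points are illustrative assumptions.

```python
def sad1_variances(phi, gamma2, n_times):
    """Variances of e(1), ..., e(n_times) for e(t) = phi * e(t-1) + eps(t),
    with e(1) = eps(1) and Var(eps(t)) = gamma2 for all t."""
    variances = [gamma2]
    for _ in range(1, n_times):
        variances.append(phi * phi * variances[-1] + gamma2)
    return variances

def sad1_lag1_correlations(phi, variances):
    """Corr(e(t), e(t+1)) = phi * sqrt(Var e(t) / Var e(t+1))."""
    return [phi * (variances[t] / variances[t + 1]) ** 0.5
            for t in range(len(variances) - 1)]

n_times = 11  # illustrative number of measurement ages

# Height on D9: phi = 0.48, gamma^2 = 1.02 (Table 2).
var_height = sad1_variances(0.48, 1.02, n_times)
cor_height = sad1_lag1_correlations(0.48, var_height)

# Diameter on D9: phi = 1.05, gamma^2 = 1.07 (Table 2).
var_diameter = sad1_variances(1.05, 1.07, n_times)
cor_diameter = sad1_lag1_correlations(1.05, var_diameter)

print([round(v, 3) for v in var_height])
print([round(v, 3) for v in var_diameter])
print([round(c, 3) for c in cor_diameter])
```

In this sketch both the variances and the lag-one correlations change with time, which a stationary AR(1) structure rules out by construction. The diameter estimate φ > 1 is also worth noting: a stationary AR(1) process requires |φ| < 1, whereas the SAD(1) covariance remains well defined over a finite number of time points.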
Longitudinal data analysis is one of the most pressing challenges encountered in applied statistics (Davidian and Giltinan, 1995; Diggle et al., 2002). More time points tend to provide better estimates of the parameters. However, in our experience, when the sample size is adequately large, for example 300-400, 4-6 time points can also provide reasonable estimates. In practice, a typical setting involves multidimensional measurements taken on subjects at irregular intervals with high levels of missingness for a variety of reasons, both informative and noninformative. Our model can be generalized to a dynamic mixture model to handle longitudinal data measured at uneven time intervals and with different measurement patterns among different individuals. With the development of nonparametric approaches (Green and Silverman, 1994), our model can be extended to analyze longitudinal traits that do not obey a growth curve. It can be anticipated that the model proposed in this article and its


extensions will handle various complexities of longitudinal data. It will have great implications for the design of an efficient early selection program in plant and animal breeding, for the identification of genes that control human disease progression, and for asking and addressing biological questions at the interface of genetics, development and evolution.

References

Atchley, W. R. (1984). Ontogeny, timing of development, and genetic variance-covariance structure. American Naturalist 123, 519-540.
Atchley, W. R. and Zhu, J. (1997). Developmental quantitative genetics, conditional epigenetic variability and growth in mice. Genetics 147, 765-776.
von Bertalanffy, L. (1957). Quantitative laws in metabolism and growth. Quarterly Review of Biology 32, 217-31.
Carroll, R. J. and Ruppert, D. (1984). Power-transformations when fitting theoretical models to data. Journal of the American Statistical Association 79, 321-8.
Causton, D. R. (1985). Biometrical, structural and physiological relationships among tree parts. In: Attributes of Trees As Crop Plants, Edited by M. G. R. Cannell and J. E. Jackson. Titus Wilson & Sons, Cumbria, UK.
Churchill, G. A. and Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics 138, 963-71.
Cheverud, J. M., Rutledge, J. J. and Atchley, W. R. (1983). Quantitative genetics of development - Genetic correlations among age-specific trait values and the evolution of ontogeny. Evolution 37, 895-905.
Cheverud, J. M., Routman, E. J., Duarte, F. A. M., van Swinderen, B., Cothran, K. and Perel, C. (1996). Quantitative trait loci for murine growth. Genetics 142, 1305-1319.
Davidian, M. and Giltinan, D. M. (1995). Nonlinear Models for Repeated Measurement Data. Chapman and Hall, London.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1-38.
Diggle, P. J., Heagerty, P., Liang, K. Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data. Oxford University Press, Oxford, UK.
Eaves, L. J., Neale, M. C. and Maes, H. (1996). Multivariate multipoint


linkage analysis of quantitative trait loci. Behavior Genetics 26, 519-525.
Gabriel, K. R. (1962). Ante-dependence analysis of an ordered set of variables. The Annals of Mathematical Statistics 33, 201-12.
Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London, UK.
Hurvich, C. M. and Tsai, C. L. (1989). Regression and time series model selection in small samples. Biometrika 76, 297-307.
Jaffrézic, F. and Pletcher, S. D. (2000). Statistical models for estimating the genetic basis of repeated measurements and other function-valued traits. Genetics 156, 913-922.
Jaffrézic, F., Thompson, R. and Hill, W. G. (2003). Structured antedependence models for genetic analysis of repeated measures on multiple quantitative traits. Genetical Research 82, 55-65.
Jaffrézic, F., Thompson, R. and Pletcher, S. D. (2004). Multivariate character process models for the analysis of two or more correlated function valued traits. Genetics 168, 477-487.
Jamrozik, J. and Schaeffer, L. R. (1997). Estimates of genetic parameters for a test day model with random regressions for yield traits of first lactation Holsteins. Journal of Dairy Science 80, 762-770.
Jamrozik, J., Kistemaker, G. J., Dekkers, J. C. M. and Schaeffer, L. R. (1997). Comparison of possible covariates for use in a random regression model for analysis of test day yields. Journal of Dairy Science 80, 2550-2556.
Jiang, C. J. and Zeng, Z. B. (1995). Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140, 1111-1127.
Kirkpatrick, M. and Heckman, N. (1989). A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. Journal of Mathematical Biology 27, 429-450.
Kirkpatrick, M., Hill, W. G. and Thompson, R. (1994). Estimating the covariance structure of traits during growth and aging, illustrated with lactation in dairy cattle. Genetical Research 64, 57-69.
Kirkpatrick, M., Lofsvold, D. and Bulmer, M. (1990). Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124, 979-993.
Knott, S. A. and Haley, C. S. (2000). Multitrait least squares for quantitative trait loci detection. Genetics 156, 899-911.


Korol, A. B., Ronin, Y. I., Itskovich, A. M., Peng, J. and Nevo, E. (2001). Enhanced efficiency of quantitative trait loci mapping analysis based on multivariate complexes of quantitative traits. Genetics 157, 1789-1803.
Lander, E. S. and Botstein, D. (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185-199.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B 44, 226-233.
Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
Ma, C. X., Casella, G. and Wu, R. L. (2002). Functional mapping of quantitative trait loci underlying the character process: A theoretical framework. Genetics 161, 1751-1762.
Mackay, T. F. C. (2001). Quantitative trait loci in Drosophila. Nature Reviews Genetics 2, 11-20.
Meng, X. L. and Rubin, D. B. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association 86, 899-909.
Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Computer Journal 7, 308-313.
Núñez-Antón, V. (1997). Longitudinal data analysis: Non-stationary error structures and antedependent models. Applied Stochastic Models and Data Analysis 13, 279-287.
Núñez-Antón, V. and Zimmerman, D. L. (2000). Modeling nonstationary longitudinal data. Biometrics 56, 699-705.
Núñez-Antón, V. and Woodworth, G. G. (1994). Analysis of longitudinal data with unequally spaced observations and time-dependent correlated errors. Biometrics 50, 445-456.
Pletcher, S. D. and Geyer, C. J. (1999). The genetic analysis of age-dependent traits: Modeling the character process. Genetics 153, 825-835.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika 86, 677-90.
Schlichting, C. D. and Pigliucci, M. (1998). Phenotypic Evolution: A Reaction Norm Perspective. Sinauer Associates, Sunderland, MA.


Vaughn, T. T., Pletscher, L. S., Peripato, A., King-Ellison, K., Adams, E., Erikson, C. and Cheverud, J. M. (1999). Mapping quantitative trait loci for murine growth: a closer look at genetic architecture. Genetical Research 74, 313-322.
West, G. B., Brown, J. H. and Enquist, B. J. (2001). A general model for ontogenetic growth. Nature 413, 628-631.
Wu, R. L., Ma, C. X., Yang, M. C. K., Chang, M., Santra, U., Wu, S. S., Huang, M., Wang, M. and Casella, G. (2003). Quantitative trait loci for growth in Populus. Genetical Research 81, 51-64.
Wu, R. L., Ma, C. X., Lin, M. and Casella, G. (2004a). A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 166, 1541-1551.
Wu, R. L., Ma, C. X., Lin, M., Wang, Z. H. and Casella, G. (2004b). Functional mapping of growth QTL using a transform-both-sides logistic model. Biometrics 60, 729-738.
Wu, R. and Stettler, R. F. (1996). The genetic dissection of juvenile canopy structure and function in a three-generation pedigree of Populus. Trees - Structure and Function 11, 99-108.
Wu, R. L., Wang, M. X. and Huang, M. R. (1992). Quantitative genetics of yield breeding for Populus short rotation culture. I. Dynamics of genetic control and selection models of yield traits. Canadian Journal of Forest Research 22, 175-182.
Yin, T. M., Zhang, X. Y., Huang, M. R., Wang, M. X., Zhuge, Q., Tu, S. M., Zhu, L. H. and Wu, R. L. (2002). The molecular linkage maps of the Populus genome. Genome 45, 541-555.
Zhao, W., Wu, R. L., Ma, C. X. and Casella, G. (2004a). A fast algorithm for functional mapping of complex traits. Genetics 167, 2133-2137.
Zhao, W., Ma, C. X., Cheverud, J. M. and Wu, R. L. (2004b). A unifying statistical model for QTL mapping of genotype-sex interaction for developmental trajectories. Physiological Genomics 19, 218-227.
Zhao, W., Zhu, J., Gallo-Meagher, M. and Wu, R. L. (2004c). A unified statistical model for functional mapping of environment-dependent genetic expression and genotype × environment interactions for ontogenetic development. Genetics 168, 1751-1762.
Zhao, W., Chen, Y. Q., Casella, G., Cheverud, J. M. and Wu, R. L. (2005). A non-stationary model for functional mapping of longitudinal quantitative


traits. Bioinformatics, 2469-2476.
Zimmerman, D. L. and Núñez-Antón, V. (2001). Parametric modeling of growth curve data: An overview (with discussion). Test 10, 1-73.

Appendix: The closed forms for the determinant and inverse of Σ

As seen from above, recall

\[
e = \begin{pmatrix} e_1(1) \\ e_2(1) \\ e_1(2) \\ e_2(2) \\ \vdots \\ e_1(T) \\ e_2(T) \end{pmatrix}, \quad
\Lambda = \begin{pmatrix}
I & 0 & \cdots & 0 & 0 \\
V & I & \cdots & 0 & 0 \\
V^2 & V & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
V^{T-1} & V^{T-2} & \cdots & V & I
\end{pmatrix}, \quad \text{and} \quad
\varepsilon = \begin{pmatrix} \varepsilon_1(1) \\ \varepsilon_2(1) \\ \varepsilon_1(2) \\ \varepsilon_2(2) \\ \vdots \\ \varepsilon_1(T) \\ \varepsilon_2(T) \end{pmatrix}.
\]

The covariance matrix Σ of e can be written as $\Sigma = \Lambda \Sigma_\varepsilon \Lambda'$. The determinant of $\Sigma^{-1}$ and the quadratic form $z'\Sigma^{-1}z$ can be derived by considering the explicit forms of $\Lambda^{-1}$ and $\Sigma_\varepsilon^{-1}$. Letting

\[
L = \begin{pmatrix}
I & 0 & \cdots & 0 & 0 \\
-V & I & \cdots & 0 & 0 \\
0 & -V & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -V & I
\end{pmatrix},
\]

we have

\[
L\Lambda = I, \qquad \Lambda^{-1} = L.
\]

Assuming zero correlation between innovation errors (both within and between traits) at different time points, the covariance of the innovation errors has a block diagonal form

\[
\Sigma_\varepsilon = \bigoplus_{t=1}^{T} \Sigma_\varepsilon(t),
\]

where the covariance matrix $\Sigma_\varepsilon(t)$ for innovation errors $\varepsilon_1(t)$ and $\varepsilon_2(t)$ is

\[
\Sigma_\varepsilon(t) = \begin{pmatrix}
\gamma_1^2(t) & \gamma_1(t)\gamma_2(t)\rho(t) \\
\gamma_1(t)\gamma_2(t)\rho(t) & \gamma_2^2(t)
\end{pmatrix}.
\]
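The identity LΛ = I stated above can be checked numerically for a small T. The sketch below is not part of the original derivation: it builds Λ and L from an arbitrary 2 × 2 matrix V using plain Python lists, and the helper names as well as the entries of V are hypothetical choices made only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def place_block(big, block, row, col):
    """Copy a 2x2 block into block position (row, col) of a larger matrix."""
    for i in range(2):
        for j in range(2):
            big[2 * row + i][2 * col + j] = block[i][j]

def build_lambda(v, t_max):
    """Lower block-triangular Lambda with V^(s-t) in block (s, t), s >= t."""
    lam = [[0.0] * (2 * t_max) for _ in range(2 * t_max)]
    power = identity(2)  # V^0
    for lag in range(t_max):
        for t in range(t_max - lag):
            place_block(lam, power, t + lag, t)
        power = matmul(v, power)  # next power of V
    return lam

def build_l(v, t_max):
    """Block-bidiagonal L: I on the diagonal, -V on the first subdiagonal."""
    big = identity(2 * t_max)
    neg_v = [[-x for x in row] for row in v]
    for t in range(t_max - 1):
        place_block(big, neg_v, t + 1, t)
    return big

V = [[0.5, 0.1], [0.2, 1.05]]  # hypothetical antedependence matrix
T = 4
product = matmul(build_l(V, T), build_lambda(V, T))
max_dev = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(2 * T) for j in range(2 * T))
print(max_dev < 1e-12)
```

The check relies only on the block structure, so it holds for any 2 × 2 matrix V, including one with off-diagonal entries linking the two traits.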


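Then, because Λ is block lower triangular with identity blocks on its diagonal, |Λ| = 1 and

\[
|\Sigma|^{-1} = |\Sigma_\varepsilon|^{-1} = \prod_{t=1}^{T} \left[ (1 - \rho^2(t))\, \gamma_1^2(t)\, \gamma_2^2(t) \right]^{-1},
\]

and

\[
\Sigma^{-1} = L' \left( \bigoplus_{t=1}^{T} \Sigma_\varepsilon^{-1}(t) \right) L
= \begin{pmatrix}
\Sigma_\varepsilon^{-1}(1) + V'\Sigma_\varepsilon^{-1}(2)V & -V'\Sigma_\varepsilon^{-1}(2) & 0 & \cdots & 0 \\
-\Sigma_\varepsilon^{-1}(2)V & \Sigma_\varepsilon^{-1}(2) + V'\Sigma_\varepsilon^{-1}(3)V & -V'\Sigma_\varepsilon^{-1}(3) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -\Sigma_\varepsilon^{-1}(T)V & \Sigma_\varepsilon^{-1}(T)
\end{pmatrix}
= A_1 + A_2 + A_3 + A_3',
\]

where

\[
A_1 = \bigoplus_{t=1}^{T} \Sigma_\varepsilon^{-1}(t),
\]

\[
A_2 = \begin{pmatrix}
V'\Sigma_\varepsilon^{-1}(2)V & 0 & \cdots & 0 & 0 \\
0 & V'\Sigma_\varepsilon^{-1}(3)V & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & V'\Sigma_\varepsilon^{-1}(T)V & 0 \\
0 & 0 & \cdots & 0 & 0
\end{pmatrix},
\]

\[
A_3 = \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
-\Sigma_\varepsilon^{-1}(2)V & 0 & \cdots & 0 & 0 \\
0 & -\Sigma_\varepsilon^{-1}(3)V & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -\Sigma_\varepsilon^{-1}(T)V & 0
\end{pmatrix}.
\]

Observations x = [x(1), . . . , x(T)] and y = [y(1), . . . , y(T)] for each trait can be combined into z, where

\[
z = [\, \underbrace{x(1), y(1)}_{z'(1)},\ \underbrace{x(2), y(2)}_{z'(2)},\ \ldots,\ \underbrace{x(T), y(T)}_{z'(T)} \,]'.
\]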


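The quadratic form $z'\Sigma^{-1}z$ can then be divided into three parts,

\[
z'\Sigma^{-1}z = z'A_1 z + z'A_2 z + 2 z'A_3 z,
\]

where, with $D_t$ denoting the covariance matrix of the innovation errors at time t,

\begin{align*}
z'A_1 z &= \sum_{t=1}^{T} z'(t)\, D_t^{-1}\, z(t), \\
z'A_2 z &= \sum_{t=1}^{T-1} z'(t)\, V' D_{t+1}^{-1} V\, z(t), \\
z'A_3 z &= -\sum_{t=1}^{T-1} z'(t+1)\, D_{t+1}^{-1} V\, z(t).
\end{align*}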
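The three sums above allow the quadratic form to be evaluated without forming the full inverse covariance matrix. The sketch below, offered only as an illustration under stated assumptions, compares the three-part expression with the equivalent innovation form (Lz)' Σ_ε^{-1} (Lz), where (Lz)(1) = z(1) and (Lz)(t) = z(t) - V z(t-1). The vectors z(t) are made-up numbers, V is assumed diagonal with the D9 estimates of φ from Table 2, and the innovation covariance is held constant over time; none of these choices is prescribed by the appendix.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def innovation_cov(gamma1_sq, gamma2_sq, rho):
    """2x2 covariance of the innovations of the two traits at one time."""
    cross = rho * (gamma1_sq * gamma2_sq) ** 0.5
    return [[gamma1_sq, cross], [cross, gamma2_sq]]

def quad_three_parts(z, v, d_inv):
    """z'A1z + z'A2z + 2 z'A3z computed from the three block sums."""
    n = len(z)
    a1 = sum(dot(z[t], matvec(d_inv[t], z[t])) for t in range(n))
    a2 = sum(dot(matvec(v, z[t]), matvec(d_inv[t + 1], matvec(v, z[t])))
             for t in range(n - 1))
    a3 = -sum(dot(z[t + 1], matvec(d_inv[t + 1], matvec(v, z[t])))
              for t in range(n - 1))
    return a1 + a2 + 2.0 * a3

def quad_via_innovations(z, v, d_inv):
    """(Lz)' Sigma_eps^{-1} (Lz), used here as an independent check."""
    total = dot(z[0], matvec(d_inv[0], z[0]))
    for t in range(1, len(z)):
        vz = matvec(v, z[t - 1])
        u = [z[t][0] - vz[0], z[t][1] - vz[1]]
        total += dot(u, matvec(d_inv[t], u))
    return total

V = [[0.48, 0.0], [0.0, 1.05]]  # assumed diagonal V built from Table 2 (D9)
z = [[0.3, -0.5], [0.9, 0.1], [-0.4, 0.7], [1.2, -0.2]]  # made-up z(t)
d_inv = [inv2(innovation_cov(1.02, 1.07, 0.65)) for _ in z]

q_parts = quad_three_parts(z, V, d_inv)
q_check = quad_via_innovations(z, V, d_inv)
print(abs(q_parts - q_check) < 1e-9)
```

Both routes cost a number of 2 × 2 operations proportional to T, which is the practical benefit of the closed form when the likelihood is evaluated many times during a genome scan.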
