A Statistical Model for Testing the Pleiotropic ... - Semantic Scholar

2 downloads 0 Views 419KB Size Report
quantitative trait loci (QTL) that control the phenotypic plasticity of a complex trait through differentiated expressions of pleiotropic QTL in different environments.
Copyright Ó 2008 by the Genetics Society of America DOI: 10.1534/genetics.107.081794

A Statistical Model for Testing the Pleiotropic Control of Phenotypic Plasticity for a Count Trait Chang-Xing Ma,*,1 Qibin Yu,†,1 Arthur Berg,‡ Derek Drost,† Evandro Novaes,† Guifang Fu,§ John Stephen Yap,‡ Aixin Tan,‡ Matias Kirst,† Yuehua Cui** and Rongling Wu‡,2 *Department of Biostatistics, University at Buffalo, SUNY, Buffalo, New York 14214, †School of Forest Resources and Conservation, ‡Department of Statistics, §Department of Mathematics, University of Florida, Gainesville, Florida 32611 and **Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824 Manuscript received September 11, 2007 Accepted for publication February 14, 2008 ABSTRACT The differences of a phenotypic trait produced by a genotype in response to changes in the environment are referred to as phenotypic plasticity. Despite its importance in the maintenance of genetic diversity via genotype-by-environment interactions, little is known about the detailed genetic architecture of this phenomenon, thus limiting our ability to predict the pattern and process of microevolutionary responses to changing environments. In this article, we develop a statistical model for mapping quantitative trait loci (QTL) that control the phenotypic plasticity of a complex trait through differentiated expressions of pleiotropic QTL in different environments. In particular, our model focuses on count traits that represent an important aspect of biological systems, controlled by a network of multiple genes and environmental factors. The model was derived within a multivariate mixture model framework in which QTL genotype-specific mixture components are modeled by a multivariate Poisson distribution for a count trait expressed in multiple clonal replicates. A two-stage hierarchic EM algorithm is implemented to obtain the maximum-likelihood estimates of the Poisson parameters that specify environment-specific genetic effects of a QTL and residual errors. By approximating the number of sylleptic branches on the main stems of poplar hybrids by a Poisson distribution, the new model was applied to map QTL that contribute to the phenotypic plasticity of a count trait. The statistical behavior of the model and its utilization were investigated through simulation studies that mimic the poplar example used. This model will provide insights into how genomes and environments interact to determine the phenotypes of complex count traits.

O

NE of the most important challenges facing modern biology is to understand the genetic mechanisms underlying the adaptation of biological traits to environmental factors and use this knowledge to predict the response of biological structure, organization, and function to changing environments (Franks et al. 2007; Huntley 2007). When grown in different environments, an organism may show a range of phenotypes. Such a capacity of the organism to alter its phenotypes in response to changing environment, defined as phenotypic plasticity, has been recognized by many early biologists (Waddington 1942; Schmalhausen 1949). Some earlier theoretical models, including homeostasis, were useful for explaining why some individuals are insensitive to changes in environment (Waddington 1942). With a rapid and dramatic change in global climate stimulated by human activities (Grether 2005), the study of phenotypic plasticity has received a resurgence of interest and a renaissance in its funda1

These authors contributed equally to this work. Corresponding author: Department of Statistics, University of Florida, 409 McCarty Hall C, Gainesville, FL 32611. E-mail: [email protected] 2

Genetics 179: 627–636 (May 2008)

mental role in shaping evolutionary adaptation and consequences (Scheiner 1993; Via et al. 1995; Schlichting and Smith 2002; West-Eberhard 2003, 2005; Wu et al. 2004; De Jong 2005). Because different genotypes display a wide range of variation in the level of their plasticity response (Ungerer et al. 2003), there must be a genetic basis for phenotypic plasticity to environmental change. For this reason, the identification of specific genes that contribute to plastic responses has now become an important research area for understanding the genetic and developmental machineries of organismic adaptation and evolution (Gibert et al. 2007). Genetic mapping of complex traits with molecular markers has been proven powerful for the genomewide characterization of quantitative trait loci (QTL) that regulate phenotypic plasticity (Wu 1998; Leips and Mackay 2000; Kliebenstein et al. 2002; Ungerer et al. 2003; Gutteling et al. 2007). The statistical test that measures differences in genetic effects of a QTL across different environments provides meaningful procedures for investigating the genetic mechanisms hypothesized to explain the genetic basis of phenotypic plasticity. The overdominance hypothesis proposes that

628

C.-X. Ma et al.

a heterozygote at relevant genes shows higher stability, or lower plasticity, than a homozygote and the degree of stability is proportional to heterozygosity for these loci (homeostasis; Gillespie and Turelli 1989). The pleiotropic hypothesis states that differential expression of the same loci across environments causes phenotypic plasticity (allelic sensitivity; Via and Lande 1985). The epistatic hypothesis suggests that specific plasticity genes exist that interact epistatically with the loci for the mean value of the trait to regulate environmental sensitivity (gene regulation; Scheiner and Lyman 1989). These hypotheses have been tested with results from QTL mapping in different species from Arabidopsis (Kliebenstein et al. 2002; Ungerer et al. 2003) to Populus (Wu 1998; Rae et al. 2008), Drosophila (Leips and Mackay 2000; Geiger-Thornsberry and Mackay 2002; Anholt and Mackay 2004), and Caenorhabditis (Gutteling et al. 2007). All these studies were based on the phenotypic plasticity of continuously varying traits. The genetic control of phenotypic plasticity for count traits—another group of important traits to agriculture, biology, and biomedicine—is still poorly understood. Statistical models for genetic mapping of continuous traits that are normally distributed have been well developed in the past two decades (Lander and Botstein 1989; Jansen and Stam 1994; Zeng 1994; Kao et al. 1999; Wu et al. 2007; Rae et al. 2008). The idea of mapping continuous traits has been extended to map binary or ordinal traits that vary in a discontinuous manner on the basis of a threshold model by assuming a continuously distributed liability underlying these traits (Visscher et al. 1996; Xu and Atchley 1996; Yi and Xu 1999; Li et al. 2006). Different from continuous traits (taking an infinite number of trait values) and ordinal traits (taking a fixed number of discrete trait values), there is also a group of traits measured in counts that are discrete yet may take an infinite number of values. Count data, such as cell numbers, branch numbers, or bristle numbers, play a unique role in determining the phenotypic plasticity of a biological system (Norga et al. 2003). More recently, Cui et al. (2006) generalized a parametric mapping strategy, as proposed by Rebaı¨ (1997), Shepel et al. (1998), and Sen and Churchill (2001), to map and test QTL controlling count traits. Different from traditional treatments based on normality or threshold assumptions, Cui et al. (2006) and others (Rebaı¨ 1997; Shepel et al. 1998; Sen and Churchill 2001) modeled count data by incorporating an intrinsic Poisson distribution, thus leading to more reasonable biological interpretations about the genetic effects of a ‘‘count’’ QTL. Cui et al.’s model can be used as a standard procedure for mapping QTL contributing to the genetic control of complex count traits. In this article, we integrate the Poisson distribution into a general mapping framework for the identification of QTL that control the phenotypic plasticity of a count trait through their environment-dependent expression.

A particular study of phenotypic plasticity should make use of the advantages of a randomized complete block design with two or more different treatment levels and multiple replicates per each treatment level. Traditional interval mapping generally takes the means of a phenotypic trait over different replicates under each treatment level and then compares the difference in the genetic effect of each detected QTL across different treatment levels (Hayes et al. 1993). However, by taking averages over replicates, the averaged count traits may no longer be integer valued. Additionally, the covariance structure among the replicates cannot be utilized when averages are taken. To overcome these drawbacks, count observations in individual replicates are incorporated into the mapping model by invoking a multivariate Poisson distribution with dimension equal to the number of replicates. The estimation of the parameters that describe the multivariate Poisson distribution is obtained by the EM algorithm. The implementation of this algorithm into a mixture-based mapping model generates a two-stage hierarchical EM algorithm in which the Poisson parameters that define the environment-dependent genetic effects of a QTL can be estimated. The new model was used to map the QTL that affect sylleptic branch counts and their plasticity across two different fertilization regimes in a Populus hybrid population. Computer simulations are further used to study the performance of the method for mapping environment-sensitive QTL for complex count traits. MODEL

Model structure and estimation: Consider a backcross population with n progeny in which there are two different genotypes at each locus. We assume that a genetic linkage map is constructed with polymorphic markers for this backcross. In many plants, such as poplar trees, clonal propagation is possible, thus allowing the same progeny to be genotypically replicated. Suppose the backcross considered is planted in a randomized complete block design with two treatment levels (e.g., low and high fertilization) and R clonal replicates within each treatment level. Each of the plants studied is measured for a count trait of interest, e.g., the number of branches on a main stem. This experimental design allows the characterization of genetic factors that control the response of each backcross progeny to different treatment levels. Suppose there is a putative QTL segregating with two different genotypes Qq (coded by 1) and qq (coded by 2) in the backcross that affects the phenotypic plasticity of the trait across two treatment levels. This QTL is located somewhere in the genome, which can be detected by the linkage map. Assume the QTL resides between a pair of flanking markers M1 and M2 each with two genotypes coded by 1 and 0. For each backcross progeny, it may carry one (and only one) QTL genotype, 1 or 2. The

A Model for Testing Allelic Sensitivity

probability with which a particular progeny (i) carries QTL genotype 1 or 2 depends on the marker genotypes of this progeny at the two flanking markers (M1 and M2) that bracket the QTL. Under the assumption of independent crossovers, the probability of a QTL genotype given a marker genotype can be derived in terms of the recombination fractions between M1 and QTL, between QTL and M2, and between the two markers. Given that each progeny has a known marker genotype, 11, 10, 01, or 00, the conditional probability of QTL genotype Qq for a given progeny i given its marker genotype is denoted as v1ji and the conditional probability of QTL genotype qq is v2ji ¼ 1  v1ji (Wu et al. 2007). The trait values of backcross progeny i in R different replicates under treatment level k (k ¼ 1, 2), arrayed in Xik ¼ (Xik1, . . . , XikR), are distributed as a mixture function with two different groups of QTL genotypes; i.e., Xik  P ðX ik j Qk Þ ¼ v1ji P1 ðX ik j Q1jk Þ 1 v2ji P2 ðX ik j Q2 jk Þ; ð1Þ where Qk ¼ (Q1jk, Q2jk) contains parameters specific to QTL genotype j for treatment k, and Pj(Xik j Qjjk) is a probability density function for the count trait, which can be described by a multivariate Poisson distribution, expressed as !  sik Y R R uXikr X R  Y X Xikl jjkr Pj ðX ik j Qjjk Þ ¼ exp  ujjkr X ! r r ¼1 r ¼1 ikr r ¼0 l¼1 !r ujjk0 3 r ! QR r ¼1 ujjkr sik ¼ minðXik1 ; . . . ; XikR Þ;

ð2Þ

where each Xikr follows a Poisson distribution with mean parameter uj jkr that is the genotypic mean of the count trait for QTL genotype j in replicate r at treatment level k and covariance parameter uj jk0 that is the QTL genotype- and treatment level-specific covariance of the count trait between all the pairs of replicates. If uj jk0 ¼ 0, then the variables are independent and the multivariate Poisson distribution reduces to the product of independent Poisson distributions. Assuming that the trait values from different levels of treatment are independent, we construct a likelihood of the unknown parameters given the trait values and marker information (M) in terms of the mixture model (1) by combining the two treatment levels; that is Lðv1ji ; v2ji ; Qk j Xik ; MÞ n Y ½v1ji P1 ðXi1 j Q1j1 Þ 1 v2ji P2 ðX i1 j Q2j1 Þ ¼

629

The maximum-likelihood estimates (MLEs) of the parameters can be obtained by maximizing the likelihood (3). In this clonal design, it is reasonable to assume that genotypic means of the count trait are equal among different replicates at each treatment level; that is, ujjk1 ¼ . . . ¼ ujjkR [ ujjk :

Equation 4 suggests that, unlike the general multivariate Poisson model in which different variables are assumed to have different means, our QTL mapping model assumes the same mean for all the variables. Thus, the set of parameters being estimated in the likelihood (3) is Qjjk ¼ (ujjk0, ujjk) ( j ¼ 1, 2; k ¼ 1, 2). The EM algorithm can be implemented to estimate the maximum-likelihood estimates (MLEs) of these parameters as follows. E step: Given the data and the current values of the ðtÞ ðtÞ estimates after the tth iteration Q1jk and Q2jk , we calculate the conditional expectations of the complete data, which include the pseudovalues of progeny i within treatment level k, ðtÞ

s1jik

ðtÞ

s2jik

  ðtÞ P1 Xik  1 j Q1jk   ; ¼ ðtÞ P1 X ik j Q1jk   ðtÞ P2 Xik  1 j Q2jk   ; ¼ ðtÞ P2 X ik j Q2jk

and the posterior probabilities, with which progeny i at treatment level k carries QTL genotype j, ðtÞ

V1jik

ðtÞ

V2jik

  ðtÞ v1ji P1 X ik j Q1jk    ; ¼ ðtÞ ðtÞ v1ji P1 Xik j Q1jk 1 v2ji P2 Xik j Q2jk   ðtÞ v2ji P2 X ik j Q2jk    : ¼ ðtÞ ðtÞ v1ji P1 Xik j Q1jk 1 v2ji P2 Xik j Q2jk

M step: The estimates of parameters are updated by using ðt11Þ

ðtÞ

3

i¼1

½v1ji P1 ðXi2 j Q1j2 Þ 1 v2ji P2 ðX i2 j Q2j2 Þ:

ðt11Þ u2jk0

ðt11Þ u1jk

ð3Þ

Pn

ðtÞ

i¼1

u1jk0 ¼ u1jk0 Pn ¼

ðt11Þ u2jk

ðtÞ u2jk0

Pn ¼

¼

Pn

ðtÞ

V1jik s1jik

i¼1

ðtÞ

V1jik

;

ðtÞ ðtÞ i¼1 V2jik s2jik ; Pn ðtÞ i¼1 V2jik

ðtÞ  i¼1 V1jik Xik Pn ðtÞ i¼1 V1jik

 u1jk0 ;

ðtÞ  i¼1 V2jik Xik Pn ðtÞ i¼1 V2jik

 u2jk0 ;

Pn

i¼1 n Y

ð4Þ

ðt11Þ

ðt11Þ

630

C.-X. Ma et al.

PR

where Xik ¼ ð1=RÞ r ¼1 Xikr . Both the E and the M steps are iterated until the estimates converge to stable values according to some prespecified convergence criterion. As usual, the QTL position is estimated via a fixed approach with which the existence of a QTL is scanned at every 2 cM within a marker interval. Significant peaks of the log-likelihood ratio value profile across the genome are thought to be the estimated location of a QTL (see below). Hypothesis testing: After the genetic parameters are obtained, we need to test whether there is a QTL that affects the count trait at the two levels of treatments. The existence of a QTL can be tested by formulating the hypotheses H0: u1j1 ¼ u2j1 ¼ u1 and u1j2 ¼ u2j2 ¼ u2 vs: H1: Not H0 ; ð5Þ where the null hypothesis H0 states that the data can be fit with only one mean for each treatment level, whereas in the alternative hypothesis H1 there are two distinct means showing that there is a segregating QTL for the trait. The test statistic is the log-likelihood ratio (LR) of the full (H1) over reduced model (H0), expressed as " # ˜ 1; Q ˜ 2Þ LðQ ; LR ¼ 2 log ˆ 1j1 ; Q ˆ 0j1 ; Q ˆ 1j2 ; Q ˆ 2j2 Þ Lðv ˆ 1ji ; v ˆ 2ji ; Q where the tildes and the circumflexes denote the MLEs of the unknown parameters under H0 and H1, ˆ 1j1 ; respectively. Note that the estimation of ðv ˆ 1ji ; v ˆ 2ji ; Q ˆ ˆ ˆ Q2j1 ; Q1j2 ; Q2j2 Þ depends on both phenotypic values and ˜ 1; Q ˜ 2 Þ demarker data, whereas the estimation of ðQ pends only on phenotypic values. The critical threshold for the declaration of a QTL can be determined from permutation tests (Churchill and Doerge 1994). If a significant QTL is found, then we can test whether this QTL has a different effect on trait values at two different treatment levels. This test is formulated as H0: u1j1 ¼ u2j1 ¼ u1

vs: H1: Not H0 ;

ð6Þ

H0: u1j2 ¼ u2j2 ¼ u2

vs: H1: Not H0 :

ð7Þ

and

If the null hypotheses above are both rejected, this implies that the significant QTL detected affects pleiotropically trait phenotypes at two different treatment levels. Otherwise, this QTL is operational only at one treatment level, which causes genotype-by-environment interactions for the trait. In practice, although the QTL is pleiotropic, its effect on the trait may depend on the level of treatment. This means that the same QTL has different effects in different environments. Differential expression of a QTL across two different environments can be tested using

H0: u1j1  u2j1 ¼ u1j2  u2j2 H1: u1j1  u2j1 6¼ u1j2  u2j2 :

ð8Þ

The rejection of the above null hypothesis means that a significant genotype-by-environment interaction exists due to allelic sensitivity to a varying environment. Significant genotype-by-environment interactions are one of the genetic causes of phenotypic plasticity. The log-likelihood-ratio test statistics for hypotheses (6)–(8) can be determined from simulation studies.

RESULTS

A worked example: An interspecific hybrid family (designated as 52-124) was generated by crossing Populus deltoides (genotype ILL101) to P. trichocarpa (genotype Gf93-968). One F1 female clone (52-225) was then crossed with a different P. deltoides individual (D124), which resulted in a pseudobackcross progeny of 397 individuals. Family 52-124 was genotyped with 171 microsatellite (SSR) markers. The linkage map was constructed using MapMaker version 3.0 (Lander et al. 1987) following the two-way pseudotestcross mapping strategy (Grattapaglia and Sederoff 1994), generating a separate map for each of the parents. Markers were assigned to linkage groups at a minimum threshold of LOD 3 and a recombination fraction of 0.35. The resulting linkage was mapped onto 19 linkage groups, equivalent to the Populus chromosome number, with all markers displaying internally consistent linkage patterns. Greenwood cuttings (clonal replicates) from each individual of family 52-124 were produced. All plants were initially grown for 3 weeks under 5 mm of nitrogen (N) fertilizer, in a modified Hockings solution (Cooke et al. 2003). Next, plants were subjected to two different N treatment levels (0 and 25 mm) in a randomized complete block design, with three replicates per level. The number of sylleptic branches derived from the main stem (i.e., those derived from lateral buds without an intervening stage of rest; Wu and Hinckley 2001) were counted for each tree 30 days later. We use the pseudotest backcross mapping strategy proposed by Grattapaglia and Sederoff (1994) to map branch numbers. This strategy constructs two different sets of linkage maps with molecular markers that are segregating in one parent but not in the other. Although it does not take into account all types of markers segregating in the family, the pseudotest backcross strategy has been useful for mapping QTL controlling quantitative traits in forest trees (Grattapaglia et al. 1996). The number of branches on the main stems was observed to approximately follow a Poisson distribution under each of the fertilization levels (Figure 1). For a possible use of traditional normality-based approaches, the data were subjected to log transformation, but the transformed data still tended to obey a Poisson

A Model for Testing Allelic Sensitivity

631

Figure 1.—Histograms of the number of sylleptic branches on the main stem in interspecific poplar hybrids grown under low- and high-fertilization levels.

distribution rather than a normal distribution. Thus, existing QTL mapping models based on normality assumption may not work in this particular case. There are many more branches when trees are planted under a high than a low fertilization level, but the degree of increase for branch number with higher fertilization level seems to vary among different progeny (Figure 2), suggesting that genotype-by-environment interactions are important for branch number. The new method was used to scan for the existence and distribution of QTL that control branch numbers throughout father- and mother-based linkage maps using a joint likelihood (3). A significant QTL was detected between markers G2431 and P2855 on the mother-based linkage map since the peak of the log-likelihood ratio profile (1493.5) is beyond the genomewide critical threshold (1391.1) determined from 1000 permutation tests (Figure 3). This significant QTL detected was further tested for its effects on branch number separately under each fertilization level according to hypotheses (6) and (7). It

Figure 2.—Changes in the number of sylleptic branches on the main stem in interspecific poplar hybrids across different fertilization levels (low and high). Each thin line in the background denotes averaged branch numbers of a hybrid among three clonal replicates at low and high fertilizations, whereas the thick line in the foreground is the average of all hybrids at the two fertilization levels.

displays a significant effect under high fertilization (P ¼ 0.02), but it is marginally significant under low fertilization (P ¼ 0.05). The additive genetic effect of this QTL estimated from the pseudotest backcross is 5.968 under high fertilization, compared with 2.777 under a low fertilization level (Table 1). A hypothesis test with Equation 8 suggests that these two additive effects are significantly different (P ¼ 0.03), suggesting the existence of a QTL-by-environment interaction (Figure 4). This QTL explains 49.5 and 70.2% of the total phenotypic variance for branch number under low and high fertilization levels, respectively. Monte Carlo simulation: To examine the statistical properties of the new model, we performed simulation studies under different scenarios. Consider a backcross for which a linkage group of length 200 cM was simulated with 11 equally spaced markers. Two different sample sizes (n ¼ 100 and 300) were assumed to simulate the backcross. The backcross progeny was planted in a complete randomized block design with two different treatment levels and three clonal replicates per treatment level. A QTL that determines a count trait was hypothesized at 30 cM from the first marker of the linkage group. The phenotypic values of each backcross at each treatment level were simulated by assuming a trivariate Poisson distribution with a mean depending on the genotype of the assumed QTL. The genetic effect of a QTL is expressed as the difference between two QTL genotypes, which is described by u1jk  u2jk at treatment level k. Three different simulation scenarios were considered: (1) The genetic effect of a QTL is large and there are covariances among the three replicates, (2) the genetic effect of a QTL is large and there is no covariance among the three replicates, and (3) the genetic effect of a QTL is small and there are covariances among the three replicates. Simulation under each scenario was repeated 1000 times to estimate the means and mean square errors of the parameter estimates. Table 2 tabulates the results of QTL position and effect estimation by the new model. The model can provide reasonable estimates of all the parameters with

632

C.-X. Ma et al.

Figure 3.—Profile of the log-likelihood ratios (LR) across the length of linkage group 10 from the mother-based linkage map that test the existence of a QTL responsible for the number of sylleptic branches on the main stem in interspecific poplar hybrids grown under low- and high-fertilization levels. Solid blue lines are associated with a joint model that combines low- and high-fertilization levels, whereas dashed and dotted lines are associated with high- and low-fertilization levels, respectively. Horizontal lines indicate the genomewide thresholds over a total of 19 linkage groups determined from 100 permutation tests. The vertical line with arrow indicates the location of a significant QTL detected. Names of the markers and their map distances (in centimorgans) are given at the bottom.

a modest sample size (100) under the three simulation scenarios considered. The estimation accuracy and precision can increase dramatically when the sample size increases to 300. The Poisson parameters can be better estimated when there is no covariance among replicates (scenario 2) than when such a covariance exists (scenario 1), but the reverse is true for the estimation of the QTL position. As expected, when two QTL genotypes diverge more largely, the QTL position can be better estimated (scenario 1 vs. 3). The power to detect significant QTL-by-environment interactions under different simulation scenarios was also calculated. Such power is the largest under scenario 1, followed by scenarios 3 and 2. In all the cases, increasing sample sizes can improve the power for the detection of QTL-by-environment interactions. We conducted an additional simulation study in which multiple QTL are involved in the genetic control of a count trait. We simulated a genome composed of four different chromosomes (A–D), each 200 cM long,

covered by 11 equally spaced markers. Three QTL are located at 45, 95, and 155 cM from the first marker on the first three chromosomes, respectively. The phenotypic data were simulated as trivariate Poisson random variables based on different QTL genotypes and each given treatment level. Due to the assumption in Equation 4, the simulation for a given treatment level k requires only two parameters, the covariance uj1 j2 j3 jk0 and mean parameter uj1 j2 j3 jk, for a given three-QTL genotype j1j2j3 (j1 ¼ 1, 2 for the first QTL on chromosome A, j2 ¼ 1, 2 for the second QTL on chromosome B, and j3 ¼ 1, 2 for the third QTL on chromosome C). Assuming no epistasis, any three-QTL genotypic mean

TABLE 1 The estimates of the genomic position and genetic effects for the QTL detected on the mother-based linkage map for the number of sylleptic branches in a pseudotest backcross strategy of interspecific hybrids in poplars grown under low- and high-fertilization levels Low Marker interval Genotypic value Qq (u1jk) qq (u2jk) Additive effect (u1jk  u2jk)

High

G2431–P2855 3.434 0.657 2.777

7.628 1.660 5.968

Figure 4.—Plots of two different genotypes at the QTL detected on linkage group 10 from the mother-based linkage maps for the number of sylleptic branches on the main stem in interspecific poplar hybrids grown under low- and high-fertilization levels.

A Model for Testing Allelic Sensitivity

633

TABLE 2 The MLEs of the genomic position and effect for a hypothesized QTL (described as u1jk  u2jk) simulated under three different scenarios to control a count trait in different environments in a pseudotest backcross strategy Environment 1

True value n ¼ 100 n ¼ 300

True value n ¼ 100 n ¼ 300

True value n ¼ 100 n ¼ 300

Environment 2 ujj20

u1j2  u2j2

Scenario 1 1 0.9981 (0.0231) 1.0009 (0.0076)

3 3.0818 (0.1347) 3.0391 (0.0530)

2 1.9495 (0.0716) 2.0021 (0.0299)

0 0.0280 (0.0030) 0.0252 (0.0017)

Scenario 2 1 0.9860 (0.0096) 0.9741 (0.0042)

0 0.0960 (0.0238) 3.0608 (0.0091)

2 1.8767 (0.0469) 1.9391 (0.0139)

1 0.9962 (0.0377) 0.9912 (0.0157)

Scenario 3 1 0.9860 (0.0220) 1.0218 (0.0073)

3 3.0122 (0.0864) 2.9426 (0.0377)

1.2 1.2080 (0.0335) 1.2026 (0.0102)

Position

ujj10

29 29.06 (4.52) 28.78 (1.72)

1 1.0260 (0.0587) 0.9985 (0.0154)

29 29.06 (9.72) 29.02 (2.04) 29 28.74 (15.00) 28.92 (1.84)

u1j1  u2j1

Power

100 100

93 97

97 100

Numbers in parentheses are the square roots of mean square errors of the parameters obtained from 1000 simulation replicates. The power to detect significant QTL-by-environment interactions under different simulation scenarios is given.

can be expressed as the sum of genotypic means at individual QTL; that is, u111jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk ; u112jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk 1 a3 ; u121jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk 1 a2 ; u122jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk 1 a2 1 a3 ; u211jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk 1 a1 ; u212jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk 1 a1 1 a3 ; u221jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk 1 a1 1 a2 ; u222jk ¼ uj1 ¼1jk 1 uj2 ¼1jk 1 uj3 ¼1jk 1 a1 1 a2 1 a3 ; where a1, a2, and a3 are the additive genetic effects of the simulated QTL, respectively. Three QTL were assumed to display different effects at two different levels of treatment. Figure 5 illustrates the profile of the LR values calculated from an arbitrary simulation run to test the distribution of QTL across all four simulated chromosomes. As shown, all three QTL can be detected from our model. Table 3 lists the MLEs of the positions and genetic effects of different QTL and the square roots of mean square errors of the MLEs when different sample sizes are assumed. Our model can estimate the positions of three different QTL and their genetic effects with reasonable accuracy and precision. As expected, QTL of larger effects can be better estimated than those of smaller effects. Also, the estimation precision can be increased with increased sample sizes.

DISCUSSION

Genetic mapping of QTL has been instrumental for the characterization of major loci that are responsible for a variety of quantitative traits (Mackay 2001; Anholt and Mackay 2004; Li et al. 2006; Paterson 2006). More recently, QTL mapping techniques have been considerably extended to approach many biologically meaningful issues, among which genetic mapping of the phenotypic plasticity of a count trait has not received adequate attention as compared to its fundamental importance in statistics and biology. While most statistical models for QTL mapping are based on the normality assumption for continuous traits (Lander and Botstein 1989; Zeng 1994; Jiang and Zeng 1995; Lynch and Walsh 1998; Wu et al. 2007) or the log-normality assumption for binary or ordinal traits (Li et al. 2006), there is also a group of traits in nature and of importance to agriculture and biomedicine that are expressed in counts, such as the numbers of leaves, branches, and seeds produced by a plant or the number of cancer cells. These count traits are typically better described by a Poisson distribution than by a normal distribution, and thus many existing models may not be appropriate for mapping this type of trait. To fully consider the statistical nature of count traits, some authors attempted to incorporate the Poisson distribution into a standard mapping model (Rebaı¨ 1997; Shepel et al. 1998; Sen and Churchill 2001; Cui et al. 2006),

634

C.-X. Ma et al.

Figure 5.—Profile of the loglikelihood ratios (LR), calculated from an arbitrary simulation run, across the genome composed of four chromosomes (A, B, C, and D). The vertical dashed lines are the locations of three hypothesized QTL. The horizontal lines are the genomewide critical value for declaring the existence of any QTL determined from permutation tests.

aimed to map ‘‘count’’ QTL that determine a unique aspect of biological systems. There is no difficulty in using current-interval mapping, composite-interval mapping, or multiple-interval mapping approaches to map the phenotypic plasticity of a biological trait, defined as the phenotypic difference of the same genotype across a range of environments (Wu 1998; Leips and Mackay 2000; Kliebenstein et al. 2002; Ungerer et al. 2003), although a mechanistic

TABLE 3 The MLEs of the genomic positions and effects for three hypothesized QTL distributed over the genome that control a count trait in different environments in a pseudotest backcross strategy

QTL 1 n ¼ 100 n ¼ 300 QTL 2 n ¼ 100 n ¼ 300 QTL 3 n ¼ 100 n ¼ 300

Position

u1j1  u2j1

u1j2  u2j2

45 45.07 (0.36) 45.03 (0.36)

0 0.0108 (0.3490) 0.0175 (0.3490)

1 1.0320 (0.5344) 1.059 (0.5344)

95 94.79 (0.74) 94.80 (0.40)

1 0.9304 (0.2933) 0.9819 (0.1651)

2 2.0807 (0.4579) 2.0010 (0.2523)

155 154.85 (0.50) 154.85 (0.28)

0 0.0134 (0.3493) 0.0017 (0.1766)

1.8 1.7965 (0.5051) 1.8017 (0.2594)

Numbers in parentheses are the square roots of mean square errors of the parameters obtained from 200 simulation replicates.

basis of the difference of QTL expression between different environments is not considered. However, by extending these mapping approaches into a multivariate case in which phenotypic vales of a trait in different environments can be regarded as different ‘‘traits’’ ( Jansen et al. 1995; Jiang and Zeng 1995), environmentdependent genetic effects of a QTL can be estimated and tested. The idea of multitrait QTL mapping was used in this study to map a QTL that triggers a pleiotropic effect on a count trait expressed in different environments. If such a pleiotropic effect is different between environments, genotype-by-environment interactions at this QTL result (Ungerer et al. 2003). Our model capitalizes on the merit of a randomized complete block design, characteristic of the study of phenotypic plasticity, in which the same genotypes are grown in different environments with multiple replicates per environment used to minimize microenvironmental noises. By incorporating a multivariate Poisson distribution, our model allows the modeling of count data in individual replicates, thus avoiding losing the integer-valued structure of the data when means over replicate traits are taken. The estimation of the multivariate Poisson distribution is a difficult issue, but a powerful EM algorithm has been implemented to estimate the parameters (Karlis 2003; Tsiamyrtzis and Karlis 2005). We integrate this multivariate Poisson-based model into a standard mixture model for QTL mapping, leading to a two-stage hierarchical EM algorithm that can estimate and test the genetic effects of a QTL expressed in different environments. Through computer simulation, we investigated the statistical behavior of the new model in terms of its estimation accuracy and precision and the power to detect environment-dependent QTL for a count trait. Our simulation results suggest that a modest number of mapping

A Model for Testing Allelic Sensitivity

progeny (100) can offer reasonable parameter estimation and power that is improved with increasing genotypic differences. The application of our model to a real example for Populus genetics leads to the detection of a significant QTL for the number of sylleptic branches that increases significantly with better fertilization. By the continuous development of an axillary meristem into a branch without an intervening stage of rest, syllepsis has been thought to be an important determinant of wood production (Wu and Hinckley 2001). The identification of an environmentsensitive QTL for the number of sylleptic branches is consistent with previous findings that this type of branch is under strong genetic control and highly plastic to changes in environment (Wu and Stettler 1998). Our model can be used to test one of three existing hypotheses proposed to explain phenotypic plasticity—allelic sensitivity; i.e., different expression of a QTL in different environments causes the phenotypic plasticity of a trait (Via and Lande 1985). Although the pseudotest backcross design used, in which only two, rather than all three possible, genotypes exist, does not allow for the test of the overdominance hypothesis (Gillespie and Turelli 1989), this hypothesis can be tested by including F2-type codominant markers and QTL, i.e., those loci that are segregating for both parents and thus form two homozygotes and one heterozygote, in the mapping population derived from outcrossing trees (Lin et al. 2003). It is also possible that our model can be modified to model the gene regulation hypothesis of phenotypic plasticity by assuming different but epistatically interacting QTL for plastic responses and means of a trait across different environments (Scheiner and Lyman 1989; Weber and Scheiner 1992). In addition, there are some other issues that deserve further investigation. In genetics, it is practically useful to implement composite-interval mapping (Zeng 1994; Jansen and Stam 1994) and multiple-interval mapping (Kao et al. 1999) into our Poisson-based mapping model because these techniques have a capacity to remove the background noise of QTL effects and/or map a network of interacting QTL at the same time. In the current statistical model, we used a single Poisson parameter (u0) to model the covariance structure among different replicates, but the power of our model can increase if more sophisticated modeling is used for the covariance structure (Karlis 2003). Also, Cui et al. (2006) recently generalized the Poisson-based mapping model to take into account the dispersion of count data. By estimating and testing a dispersion parameter in our model, we are able to estimate the genetic effects of a QTL on the direction and degree of the dispersion of a count trait. Characterizing how the genetic architecture of a complex trait differs across environments is an important first step toward elucidating the mechanistic basis of genotype-by-environment interactions and ultimately

635

predicting final phenotypes in a given environment. To achieve this goal, it is necessary to identify genes underlying quantitative variation in plastic response of various complex traits (including count traits) to changes in environments and determine how these genes act singly or interact epistatically to determine different phenotypes across ecologically relevant environments. The model proposed in this article and its extensions as discussed above will provide a powerful tool to gain insights into how genomes and environments interact to determine phenotypes and further alter the evolutionary process of adaption. We thank the graduate students in STA 6178 class of Spring 2007 at the University of Florida for their contributions to this work. The preparation of this manuscript was supported by National Science Foundation grant 0540745 to R.W.

LITERATURE CITED Anholt, R. R., and T. F. C. Mackay, 2004 Quantitative genetic analyses of complex behaviours in Drosophila. Nat. Rev. Genet. 5: 838– 849. Churchill, G. A., and R. W. Doerge, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971. Cooke, J. E. K., K. A. Brown, R. L. Wu and J. M. Davis, 2003 Gene expression associated with N-induced shifts in resource allocation in poplar. Plant Cell Env. 26: 757–770. Cui, Y. H., D.-Y. Kim and J. Zhu, 2006 On the generalized Poisson regression mixture model for mapping quantitative trait loci with count data. Genetics 174: 2159–2172. de Jong, G., 2005 Evolution of phenotypic plasticity: patterns of plasticity and the emergence of ecotypes. New Phytol. 166: 101–117. Franks, S. J., S. Sim and A. E. Weis, 2007 Rapid evolution of flowering time by an annual plant in response to a climatic fluctuation. Proc. Natl. Acad. Sci. USA 104: 1278–1282. Geiger-Thornsberry, G. L., and T. F. C. Mackay, 2002 Association of single-nucleotide polymorphisms at the Delta locus with genotype by environment interaction for sensory bristle number in Drosophila melanogaster. Genet. Res. 79: 211–218. Gibert, J. M., F. Peronnet and C. Schlo¨tterer, 2007 Phenotypic plasticity in Drosophila pigmentation caused by temperature sensitivity of a chromatin regulator network. PLoS Genet. 3: e30. Gillespie, J. H., and M. Turelli, 1989 Genotype-environment interactions and the maintenance of polygenic variation. Genetics 121: 129–138. Grattapaglia, D., and R. R. Sederoff, 1994 Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudotestcross: mapping strategy and RAPD markers. Genetics 137: 1121–1137. Grattapaglia, D., F. L. G. Bertolucci, R. Penchel and R. R. Sederoff, 1996 Genetic mapping of quantitative trait loci controlling growth and wood quality traits in Eucalyptus grandis using a maternal halfsib family and RAPD markers. Genetics 144: 1205–1214. Grether, G. F., 2005 Environmental change, phenotypic plasticity, and genetic compensation. Am. Nat. 166: E115–E123. Gutteling, E. W., J. A. G. Riksen, J. Bakker and J. E. Kammenga, 2007 Mapping phenotypic plasticity and genotype environment interactions affecting life-history traits in Caenorhabditis elegans. Heredity 98: 28–37. Hayes, P. M., B. H. Liu, S. J. Knapp, F. Chen, B. Jones et al., 1993 Quantitative trait locus effects and environmental interaction in a sample of North-American barley germ plasm. Theor. Appl. Genet. 87: 392–401. Huntley, B., 2007 Limitations on adaptation: Evolutionary response to climatic change? Heredity 98: 247–248. Jansen, R. C., and P. Stam, 1994 High resolution mapping of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.

636

C.-X. Ma et al.

Jansen, R. C., J. W. Van Ooijen, P. Stam, C. Lister and C. Dean, 1995 Genotype-by-environment interaction in genetic mapping of multiple quantitative trait loci. Theor. Appl. Genet. 91: 33–37. Jiang, C., and Z.-B. Zeng, 1995 Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111–1127. Kao, C. H., Z.-B. Zeng and R. D. Teasdale, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216. Karlis, D., 2003 An EM algorithm for multivariate Poisson distribution and related models. J. Appl. Stat. 30: 63–77. Kliebenstein, D. J., A. Figuth and T. Mitchell-Olds, 2002 Genetic architecture of plastic methyl jasmonate responses in Arabidopsis thaliana. Genetics 161: 1685–1696. Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199. Lander, E. S., P. Green, J. Abrahamson, A. Barlow, M. Daley et al., 1987 MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174–181. Leips, J., and T. F. C. Mackay, 2000 Quantitative trait loci for lifespan in Drosophila melanogaster: interactions with genetic background and larval density. Genetics 155: 1773–1788. Li, J., S. C. Wang and Z.-B. Zeng, 2006 Multiple interval mapping for ordinal traits. Genetics 173: 1649–1663. Lin, M., X.-Y. Lou, M. Chang and R. L. Wu, 2003 A general statistical framework for mapping quantitative trait loci in nonmodel systems: issue for characterizing linkage phases. Genetics 165: 901–913. Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA. Mackay, T. F. C., 2001 Quantitative trait loci in Drosophila. Nat. Rev. Genet. 2: 11–20. Norga, K. K., M. C. Gurganus, C. L. Dilda, A. Yamamoto, R. F. Lyman et al., 2003 Quantitative analysis of bristle number in Drosophila mutants identifies genes involved in neural development. Curr. Biol. 13: 1388–1397. Paterson, A. H., 2006 Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nat. Rev. Genet. 7: 174–184. Rae, A. M., M. P. C. Pinel, C. Bastien, M. Sabatti, N. R. Street et al., 2008 QTL for yield in bioenergy Populus: identifying G 3 E interactions from growth at three contrasting sites. Tree Genet. Genomics 4: 97–112. Rebaı¨, A., 1997 Comparison of methods for regression interval mapping in QTL analysis with non-normal traits. Genetics 69: 69–74. Scheiner, S. M., 1993 Genetics and evolution of phenotypic plasticity. Annu. Rev. Ecol. Syst. 24: 35–68. Scheiner, S. M., and R. F. Lyman, 1989 The genetics of phenotypic plasticity. 1. Heritability. J. Evol. Biol. 2: 95–107. Schlichting, C. D., and H. Smith, 2002 Phenotypic plasticity: linking molecular mechanisms with evolutionary outcomes. Evol. Ecol. 16: 189–201. Schmalhausen, I. I., 1949 Factors of Evolution. Blakiston, Philadelphia.

Sen, S., and G. A. Churchill, 2001 A statistical framework for quantitative trait mapping. Genetics 159: 371–387. Shepel, L. A., H. Lan, J. D. Haag, G. M. Brasic, M. E. Gheen et al., 1998 Genetic identification of multiple loci that control breast cancer susceptibility in the rat. Genetics 149: 289–299. Tsiamyrtzis, P., and D. Karlis, 2005 Strategies for efficient computation of multivariate Poisson probabilities. Commun. Stat. Simul. Comput. 33: 271–292. Ungerer, M. C., S. S. Halldorsdottir, M. D. Purugganan and T. F. Mackay, 2003 Genotype–environment interactions at quantitative trait loci affecting inflorescence development in Arabidopsis thaliana. Genetics 165: 353–365. Via, S., and R. Lande, 1985 Genotype-environment interactions and the evolution of phenotypic plasticity. Evolution 39: 505–522. Via, S., R. Gomulkievicz, G. de Jong, S. M. Scheiner and C. D. Schlichting, 1995 Adaptive phenotypic plasticity: consensus and controversy. Trends Ecol. Evol. 10: 212–217. Visscher, P. M., C. S. Haley and S. A. Knott, 1996 Mapping QTLs for binary traits in backcross and F2 populations. Genet. Res. 68: 55–63. Waddington, C. H., 1942 Canalization of development and the inheritance of acquired characters. Nature 150: 563–565. Weber, S. L., and S. M. Scheiner, 1992 The genetics of phenotypic plasticity. 4. Chromosomal localization. J. Evol. Biol. 5: 109–120. West-Eberhard, M. J., 2003 Developmental Plasticity: An Evolution. Oxford University Press, New York. West-Eberhard, M. J., 2005 Developmental plasticity and the origin of species differences. Proc. Natl. Acad. Sci. USA 102: 6543–6549. Wu, R. L., 1998 The detection of plasticity genes in heterogeneous environments. Evolution 52: 967–977. Wu, R. L., and T. M. Hinckley, 2001 Phenotypic plasticity of sylleptic branching: genetic design of tree architecture. Crit. Rev. Plant Sci. 20: 467–485. Wu, R., and R. F. Stettler, 1998 Quantitative genetics of growth and development in Populus. III. The phenotypic plasticity of crown structure and function. Heredity 81: 299–310. Wu, R. L., J. E. Grissom, S. E McKeand and D. M. O’Malley, 2004 Phenotypic plasticity of fine root growth increases plant productivity in pine seedlings. BMC Ecol. 4: 14. Wu, R. L., C.-X. Ma and G. Casella, 2007 Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. Springer-Verlag, New York. Xu, S., and W. R. Atchley, 1996 Mapping quantitative trait loci for complex binary diseases using line crosses. Genetics 143: 1417– 1424. Yi, N., and S. Xu, 1999 A random model approach to mapping quantitative trait loci for complex binary traits in outbred populations. Genetics 153: 1029–1040. Zeng, Z.-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.

Communicating editor: G. Gibson

Suggest Documents