
Journal of Animal Ecology 2013, 82, 39–54

doi: 10.1111/1365-2656.12013

‘HOW TO…’ PAPER

Quantifying individual variation in behaviour: mixed-effect modelling approaches

Niels J. Dingemanse1,2* and Ned A. Dochtermann3

1Evolutionary Ecology of Variation Group, Max Planck Institute for Ornithology, Eberhard-Gwinner-Straße, 82319 Seewiesen (Starnberg), Germany; 2Department Biology II, Behavioural Ecology, Ludwig-Maximilians-University of Munich, Großhadener Strasse 2, 82152 Planegg-Martinsried, Germany; and 3Department of Biological Sciences, North Dakota State University, 1340 Bolley Drive, Fargo, ND, 58102, USA

Summary

1. Growing interest in proximate and ultimate causes and consequences of between- and within-individual variation in labile components of the phenotype – such as behaviour or physiology – characterizes current research in evolutionary ecology.
2. The study of individual variation requires tools for quantification and decomposition of phenotypic variation into between- and within-individual components. This is essential as variance components differ in their ecological and evolutionary implications.
3. We provide an overview of how mixed-effect models can be used to partition variation in, and correlations among, phenotypic attributes into between- and within-individual variance components.
4. Optimal sampling schemes to accurately estimate (with sufficient power) a wide range of repeatabilities and key (co)variance components, such as between- and within-individual correlations, are detailed.
5. Mixed-effect models enable the usage of unambiguous terminology for patterns of biological variation that currently lack a formal statistical definition (e.g. ‘animal personality’ or ‘behavioural syndromes’), and facilitate cross-fertilisation between disciplines such as behavioural ecology, ecological physiology and quantitative genetics.

Key-words: accuracy, metabolism, mixed-effect model, multi-response model, personality, physiology, plasticity, random regression, reaction norm, repeatability, statistical power

*Correspondence author. E-mail: [email protected]

Introduction

The proximate causes and ultimate consequences of between-individual variation have long intrigued biologists. For example, a standardized measure of individual variation, individual repeatability (Glossary), has often been quantified because repeatable variation approximates the raw material for selection to act upon (Endler 1986). While biologists have long studied the evolutionary repercussions of heritable between-individual variation (e.g. Lynch & Walsh 1998), more recently, behavioural ecologists have started to study from an adaptive viewpoint (i) why individuals are repeatable vs. flexible, (ii) which conditions favour between-individual vs. within-individual variance (Glossary) or between-individual vs. within-individual correlations (Glossary), and (iii) how evolutionary and ecological processes are affected by individuality (Bolnick et al. 2011; Fogarty, Cote & Sih 2011; Benton 2012; Sih et al. 2012; Wolf & Weissing 2012). Many of these questions about between- and within-individual differences differ from evolutionary biologists’ focus (e.g. Walsh & Blows 2009) on how genetic (co)variance affects evolution. Recent adaptive modelling has, for example, given rise to a suite of hypotheses about the ecological conditions favouring repeatable – but not necessarily heritable (Dingemanse & Wolf 2010) – variation in behavioural traits, and between-individual – but not necessarily genetic (Dingemanse & Wolf 2010) – correlations between behaviours (Wolf, van Doorn & Weissing 2008; Houston 2010; Frankenhuis & Panchanathan 2011; Mathot et al. 2012). Questions about within- vs. between-individual variation require that phenotypic (co)variation is partitioned into variance components (Glossary) (Lynch & Walsh

© 2012 The Authors. Journal of Animal Ecology © 2012 British Ecological Society

Table 1. Key (co)variance components at the between-individual (ind) and within-individual level (e)

Variance component [a] | Statistical description | Mixed-effect model required
1. Vind0y | Individual variance in phenotype for attribute y/Individual variation in reaction norm intercept for attribute y | URIM (eqn 1b)
2. Ve0y | Within-individual (error) variance for attribute y | URIM (eqn 1b)
3. Vind1y | Individual variance in phenotypic plasticity (reaction norm slope) for attribute y | URSM (eqn 5b)
4. COVind0y,ind1y | Covariance between the intercept and slope of individual reaction norms for attribute y | URSM (eqn 5b)
5. COVind0y,ind0z | Individual covariance between (reaction norm intercepts of) attributes y and z | MRIM (eqn 7b)
6. COVe0y,e0z | Covariance between changes in the expression of attributes y and z within individuals | MRIM (eqn 7b)
7. COVind0y,ind1z | Covariance between individual mean (reaction norm intercept) of one attribute (y) and individual plasticity (reaction norm slope) of another (z) | MRSM (eqn S6, Supporting information)
8. COVind1y,ind1z | Individual covariance between level of plasticity in attribute y and level of plasticity in attribute z/Covariance between reaction norm slopes of two attributes (y, z) | MRSM (eqn S6, Supporting information)

URIM, Univariate Random Intercept Model; URSM, Univariate Random Slopes Model; MRIM, Multivariate Random Intercepts Model; MRSM, Multivariate Random Slopes Model.
[a] Variance components 7 and 8 are not detailed in the main text. For a discussion and MM-implementation of these components, see Text S10 (Supporting information).

1998) (Table 1). Tools for doing so are available in the statistical (Henderson 1982; Goldstein 1995; Snijders & Bosker 1999; McCulloch & Searle 2000; Pinheiro & Bates 2000; Gelman & Hill 2007; Zuur et al. 2009; Hadfield 2010; Hox 2010) and quantitative genetics literature (Lynch & Walsh 1998; Schaeffer 2004; Wilson et al. 2010). Unfortunately, the rationale for their use is – in our opinion – often presented in a manner that is inaccessible to ecologists, in part because of the statistical jargon and the focus on the estimation of genetic parameters. Considerable progress has nevertheless been made over the last few years due to attempts by ecologists to bridge this gap (e.g. Bolker et al. 2009; van de Pol & Wright 2009; Nakagawa & Schielzeth 2010). Lack of familiarity with variance partitioning nevertheless remains a problem, particularly in the field of behavioural ecology, where current research critically hinges upon variance partitioning and inappropriate conclusions might be drawn without explicit knowledge about variance components.

We give three illustrative examples from the current behavioural ecology literature. First, behavioural consistency, for example in avian provisioning rates, has been proposed as a quality indicator trait under sexual selection (Schuett, Tregenza & Dall 2010). The hypothesis predicts that males show lower within-individual variance compared with females. Published empirical tests of the proposed hypothesis typically quantify whether males have lower repeatability compared with females, which constitutes an inappropriate test because the hypothesis concerns within-individual variance, not necessarily repeatability. Second, social interactions have been proposed to induce selection for different behavioural types, for example a mix of individuals that are relatively bold vs. relatively shy (Bergmüller & Taborsky 2010). Published empirical tests of the proposed hypothesis instead sometimes quantify the raw phenotypic variance as a function of sociality, whereas the hypothesis is solely concerned with the amount of between-individual variation. Third, behavioural ecologists have proposed that animals differ in suites of correlated behaviours (Réale et al. 2007). The hypothesis implies the presence of nonzero between-individual correlations across behavioural attributes, yet empirical research often reports phenotypic correlations based on single measurements of each behaviour (Dingemanse, Dochtermann & Nakagawa 2012). These examples illustrate the inherent difficulty in translating ecological hypotheses that concern variation at different levels into empirical tests. We appreciate that unfamiliarity with terms such as within- and between-individual correlations contributes to confusion about appropriate study designs and statistical tests of hypotheses.

In this paper, we therefore aim to provide an overview of key between- and within-individual (co)variance components to assist ecologists in testing ecological hypotheses about variation. Using ecological jargon and examples, we detail how mixed-effect models (MMs) can be used to address questions about within- and between-individual variance components. We thereby hope to facilitate (i) usage of unambiguous statistical terminology for distinct patterns of biological variation that have typically been defined verbally (e.g. ‘animal personality’ or ‘behavioural syndromes’ [Glossary]), (ii) usage of appropriate statistical paradigms for their estimation and (iii) design of studies aimed at estimating individual (co)variance components with sufficient power. We focus throughout on between- and within-individual variance components, which we do intentionally


because many ecological questions do not require further partitioning (see above). Wilson et al. (2010) outline approaches for further partitioning into genetic (vs. nongenetic) components for pedigreed data sets. While we focus on behavioural traits as examples, our recommendations apply generally to labile phenotypic traits.

Overview

Rather than providing a small number of tutorial examples, as carried out in other ‘How to’ papers, we focus on demonstrative questions and associated MMs, which can be flexibly applied to specific interests of individual readers. Our paper therefore consists of four main parts. The first, entitled Mixed-effect models, introduces MMs and details the framework which we subsequently use to describe why MMs are an important alternative to conventional approaches. The second, entitled Univariate MMs, details how MMs can be applied to (i) estimate between- and within-individual variability and thus repeatability (Glossary), (ii) avoid pseudo-repeatability estimates (Glossary) and (iii) estimate individual differences in phenotypic plasticity. The third, entitled Multivariate MMs, briefly introduces how MMs can be applied to decompose correlations between phenotypic attributes. Each part has distinct sections – detailing models aimed at addressing specific questions – which are constructed such that they can be consulted independently; parts two and three conclude with discussions of optimal sampling designs and sample sizes. The Univariate MMs section details simple models followed by progressively more complex variations. This sequencing is not intended to encourage working towards the most significant description of the data set, because step-wise approaches are statistically problematic (e.g. Whittingham et al. 2006; Forstmeier & Schielzeth 2011; Simmons, Nelson & Simonsohn 2011) and inhibit general inferences (Dochtermann & Jenkins 2011). Instead, we recommend fitting models to test a priori hypotheses. The fourth main part is an extensive Supporting information section that includes additional text, tables and illustrations along with programming code and sample data for three commonly used software packages (ASReml, R and SAS).
We start with a listing of the sorts of questions that can be addressed with MMs and variance decomposition to avoid misunderstanding (e.g. Bennington & Thayne 1994; Engqvist 2005) of what information these methods can provide (Table 1 details where these questions are addressed):

1. Do individuals differ in average phenotypic response? For example, over multiple reproductive events do some snakes on average produce larger clutch sizes than other conspecifics? [Variance component 1 (Vind0y) in Table 1];
2. How much variation in phenotypic response is there within individuals? For example, to what extent do individual butterflies vary their metabolism between days? [Variance component 2 (Ve0y) in Table 1];
3. Do individuals differ in responsiveness (plasticity) to environmental variation? For example, do some individuals of a plant species alter seed set more dramatically in response to variation in precipitation than other individuals? [Variance component 3 (Vind1y) in Table 1];
4. Are responses of individuals correlated across phenotypic attributes? For example, do birds that have high average levels of reactive oxygen metabolites across multiple sampling events also show higher average levels of antioxidant capacity? [Variance component 5 (COVind0y,ind0z) in Table 1];
5. Within the same individual, do changes in expression for one phenotypic attribute go together with changes in another? For example, are within-individual changes in testosterone correlated with within-individual changes in bill colouration? [Variance component 6 (COVe0y,e0z) in Table 1].

In Text S2 (Supporting information), we provide further examples of questions that deal with more complex patterns of individual variation (variance components 4, 7 and 8 in Table 1).

Mixed-effects models

Mixed-effect models incorporate two types of parameters, fixed and random, and hence consist of two key parts (Eisenhart 1947; Bennington & Thayne 1994; Pinheiro & Bates 2000; Bolker et al. 2009):

1. The effects that predictor variables – which can be continuous (covariates) or categorical (factors) – have on the mean of response variables. Such effects are called ‘fixed’ whenever specific effects are estimated at their observed levels (e.g. differences in means between four specific years of study).
2. Effects on response variables generated by variation within and among levels of a predictor variable (factor). Effects of such predictor variables are called ‘random’ whenever variance is estimated among observed levels sampled from a population of levels (e.g. the variance among years in general inferred from a sample of 50 years).

Utility of MMs vs. alternatives

Alternative statistical approaches to MMs are available; we compare their utility here. With regard to univariate analyses, repeatability (Glossary) is often estimated using analyses of variance (e.g. Lessells & Boag 1987). MMs are preferable because, unlike classical approaches, they allow direct estimation of between- and within-individual variances (Nakagawa & Schielzeth 2010), providing insight into whether differences in repeatability between groups are attributable to differences specifically in between-individual variances, within-individual variances or both (Jenkins 2011). Analysis of variance approaches also assume balanced/complete sampling, a difficult condition to meet for many ecological studies and a condition not required for MM use. Further, only MMs allow for calculation of repeatability of traits with non-Gaussian error distributions (Nakagawa & Schielzeth 2010).

With regard to multivariate analyses, alternative approaches have often been applied, specifically to estimate the repeatable component of phenotypic correlations (i.e. between-individual correlations; Glossary). Between-individual correlations are typically based on the correlation between an individual’s average value of y (ȳj) and z (z̄j) or between estimates (i.e. best linear unbiased predictors or BLUPs) derived from univariate MMs. Unfortunately, such correlations between mean values provide estimates of between-individual correlations that are biased due to within-individual variation (Snijders & Bosker 1999; Dingemanse, Dochtermann & Nakagawa 2012) and do not appropriately acknowledge uncertainty around the estimates (which also applies to BLUPs; Hadfield et al. 2010). Multivariate MMs avoid these problems and can also prevent drawing additional improper inferences. Specifically, ecologists often assume that correlations based on single measures per attribute per individual represent between-individual correlations (Dingemanse, Dochtermann & Nakagawa 2012). However, raw phenotypic correlations generally poorly predict between-individual correlations, particularly when between- and within-individual correlations are very different (Fig. 1a; Text S11, Supporting information). MM-based estimates match true values much more closely (Fig. 1).

Despite major advantages, MMs are complex tools and therefore easily misspecified or interpreted inappropriately (Bennington & Thayne 1994; Bolker et al. 2009; van de Pol & Wright 2009; Schielzeth & Forstmeier 2009; Zuur et al. 2009; Hadfield et al. 2010). We therefore provide considerable detail regarding their specification. However, as this study highlights the ecological questions that can be addressed by decomposing (co)variances, we do not discuss details like diagnostic tools, the method by which statistical models are computationally fit (e.g. maximum-likelihood vs. Bayesian methods), how inferences are drawn (e.g. P-values vs. information criteria), or specifics of dealing with non-normal error distributions (Bolker et al. 2009; Zuur et al. 2009; Nakagawa & Schielzeth 2010; see also Text S1, Supporting information); those issues have been extensively detailed in the statistical books and journal papers cited above or elsewhere in the animal ecology literature (e.g. Garamszegi et al. 2009). We also strongly recommend that readers properly familiarize themselves with basic model assumptions prior to applying these tools themselves (cf. Pinheiro & Bates 2000; Bolker et al. 2009; Zuur et al. 2009).
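The bias in correlations between individual means can be illustrated with a short calculation. The sketch below is not taken from the paper or its Supporting information: the function name and arguments are hypothetical, and it assumes a balanced design with n measurements per individual and residuals that are independent across measurements. Under those assumptions the covariance between individual means is the between-individual covariance plus the within-individual covariance divided by n, and the variance of each mean is the between-individual variance plus the within-individual variance divided by n.

```python
import math

def expected_correlation_of_means(v_ind_y, v_e_y, v_ind_z, v_e_z, r_ind, r_e, n):
    """Expected correlation between individual means of attributes y and z.

    Illustrative assumptions (not from the paper): balanced design with n
    measurements per individual; residuals independent across measurements.
    """
    # Convert the two correlations into covariances at each level.
    cov_ind = r_ind * math.sqrt(v_ind_y * v_ind_z)
    cov_e = r_e * math.sqrt(v_e_y * v_e_z)
    # Averaging over n measurements shrinks within-individual terms by 1/n.
    cov_means = cov_ind + cov_e / n
    var_mean_y = v_ind_y + v_e_y / n
    var_mean_z = v_ind_z + v_e_z / n
    return cov_means / math.sqrt(var_mean_y * var_mean_z)

# Both traits have repeatability 0.5; between- and within-individual
# correlations have opposite signs (0.5 and -0.5).
for n in (1, 2, 10, 1000):
    r = expected_correlation_of_means(0.5, 0.5, 0.5, 0.5, 0.5, -0.5, n)
    print(n, round(r, 3))
```

With n = 1 the function returns the raw phenotypic correlation, which is zero in this example even though the between-individual correlation is 0.5; the correlation of means approaches 0.5 only as n becomes large.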

Univariate MMs

Introduction

We introduce here the notation for the simplest univariate MM, where a constant (β0) and the differences between individuals are modelled by including what are known as random intercepts to decompose phenotypic variance (VPy) into between- and within-individual components (eqn 1a):

$$y_{ij} = (\beta_0 + \mathrm{ind}_{0j}) + e_{0ij} \qquad \text{(eqn 1a)}$$

Here, a single phenotypic element (yij), such as a life history decision (lay date) by individual j exhibited at instance i, is the sum of β0 (the grand mean value of average individual responses) and each individual’s unique average response. This individual contribution is estimated as the difference from the population mean by including random intercepts to model differences in mean response between individuals (ind0j). This random intercept is assumed to be normally distributed (N) with a mean of zero and a variance (Ωind) termed the between-individual variance (estimated as Vind0: the variance across random intercepts of individuals; eqn 1b) and can in principle be fit whenever there are individuals with multiple measurements (see section Sampling designs and sample size requirements for more requirements). A residual error (e0ij) is also assumed to be normally distributed, with zero mean and a variance (Ωe) representing the within-individual variance (Ve0; eqn 1b):

$$[\mathrm{ind}_{0j}] \sim N(0, \Omega_{\mathrm{ind}}) : \Omega_{\mathrm{ind}} = [V_{\mathrm{ind}_0}]$$
$$[e_{0ij}] \sim N(0, \Omega_{e}) : \Omega_{e} = [V_{e_0}] \qquad \text{(eqn 1b)}$$

We note that while Ve0 is the ‘residual error’ representing measurement error and general environmental variance, it has biological relevance as it includes average within-individual plasticity towards any stimulus that is statistically unaccounted for (Westneat et al. 2011). This notation, used throughout, differs from the typical statistical notation but is one that we find both unambiguous and intuitive: variances are abbreviated as V, covariances as Cov and random effects for individuals as ‘ind’. Equation 1a can be expanded to include additional fixed effects (β terms), like environmental covariates, and the impact of doing so on variance components is a focus of various later sections. Notably, the inclusion of fixed effects requires considerable thought (e.g. whether within-subject centring [Glossary] and transformations should be applied), as detailed in Text S3 (Supporting information).

Simple repeatability analysis

Univariate MMs (eqn 1) can be used to decompose the ‘raw’ phenotypic variance in a single response variable (y) into between- and within-individual variances (Fig. 2a). Those components are informative in their own right (Jenkins 2011) – indicating the degree to which the expression of a trait differs between individuals vs. the degree to which a single observation differs from an individual’s mean – and are also used to calculate repeatability (Falconer & Mackay 1996).


Fig. 1. Effects of sampling design on accuracy of estimates of between-individual (rind) and within-individual (re) correlations, and on the power to ‘significantly’ identify a between-individual correlation. (a) Accuracy, estimated as the root mean square error (RMSE) of between- (first row of panels) and within-individual (second row of panels) correlation estimates for varying numbers of individuals and samples per individual. RMSE was calculated based on an MM estimate of the correlation vs. the known correlation used in the generation of simulated data. In addition, we also include RMSE based on a correlation based on a single measure per individual (closed circles). Panels along rows correspond to different combinations of between- and within-individual effects. All estimates are based on repeatabilities of 0.5 for both traits. (b) Accumulation of power (1 − β) relative to different numbers of individuals and number of samples per individual for the ability to detect a between-individual correlation for two traits with repeatabilities of 0.5 and equal residual variances. Simulation methods are detailed in Texts S11–S12 (Supporting information).

Repeatability is of key importance because it provides a standardized estimate of individuality that can be compared across studies and is part of quantitative genetics theory by setting an upper limit to heritability (but see, e.g. Dohm 2002 for important caveats). Repeatability represents the proportion of phenotypic variation (VP) attributable to differences between individuals (eqn 2):

$$\text{repeatability} = \frac{V_{\mathrm{ind}_0}}{V_{\mathrm{ind}_0} + V_{e_0}} \qquad \text{(eqn 2)}$$

where VP = Vind0 + Ve0. Confidence intervals for repeatabilities can be calculated following Nakagawa & Schielzeth (2010; see Text S17, Supporting information for worked examples). Equation 2 assumes a Gaussian error distribution and that repeated measures were taken under the same conditions (Lynch & Walsh 1998). When this first assumption is not met, alternative estimators of repeatability are available (detailed by Nakagawa & Schielzeth 2010), though researchers should also consider whether additional fixed effects – for example ‘sex’ when modelling variation in morphology for a sexually dimorphic species – can account for non-normality (additional fixed effects do change the interpretation of repeatability, detailed below). When the second assumption is not met, repeatability will be misestimated.
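To make the link between the random-intercept model (eqn 1) and repeatability (eqn 2) concrete, the sketch below simulates data from eqn 1a and recovers the two variance components. It is an illustration only: the function names are hypothetical, and for simplicity it uses classical one-way analysis-of-variance moment estimators for a balanced design rather than fitting a mixed-effect model, which is the approach the text recommends for real (typically unbalanced) data.

```python
import random
import statistics

def simulate_random_intercepts(n_ind, n_obs, beta0, v_ind, v_e, seed=1):
    """Simulate y_ij = (beta0 + ind_0j) + e_0ij for a balanced design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_ind):
        ind0 = rng.gauss(0.0, v_ind ** 0.5)  # individual's deviation from beta0
        data.append([beta0 + ind0 + rng.gauss(0.0, v_e ** 0.5)
                     for _ in range(n_obs)])
    return data

def variance_components(data):
    """Moment (one-way ANOVA) estimators; valid for balanced data only."""
    n_obs = len(data[0])
    means = [statistics.fmean(obs) for obs in data]
    # Mean square within: average of the per-individual sample variances.
    ms_within = statistics.fmean(statistics.variance(obs) for obs in data)
    # Mean square among: n_obs times the variance of individual means.
    ms_among = n_obs * statistics.variance(means)
    v_e = ms_within
    v_ind = max((ms_among - ms_within) / n_obs, 0.0)  # truncated at zero
    return v_ind, v_e

def repeatability(v_ind, v_e):
    """Eqn 2: proportion of phenotypic variance among individuals."""
    return v_ind / (v_ind + v_e)

data = simulate_random_intercepts(n_ind=400, n_obs=5, beta0=10.0,
                                  v_ind=1.0, v_e=1.0, seed=1)
v_ind_hat, v_e_hat = variance_components(data)
print(v_ind_hat, v_e_hat, repeatability(v_ind_hat, v_e_hat))
```

Because the simulated between- and within-individual variances are equal, the estimated repeatability should lie near 0.5, with sampling error that shrinks as the number of individuals and observations per individual increases.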

Avoiding ‘pseudo-repeatability’

The study of individual variation is generally a messy undertaking: we often try to measure phenotypic attributes of a set of individuals under identical conditions but fail to do so in practice – the norm in field studies. When individuals differ among each other in the conditions under which they were assayed, repeatability estimates can become inflated (Austin & Shaffer 1992; Catry et al. 1999), which leads to ‘pseudo-repeatability’ (Glossary). This inflation occurs when predictor variables (i.e. fixed effects) that influence the phenotype within individuals vary between individuals because of a biased sampling scheme. Imagine, for example, that one is interested in the repeatability of parental provisioning rate, and therefore aims to monitor each of n nests for four consecutive days using video recordings. If some cameras


failed after having recorded data for only 2 days, while others worked fine for the whole period, pseudo-repeatability would occur unless dealt with statistically: the former nests would be monitored only when the nestlings were relatively young (e.g. 8 and 9 days post-hatching), whereas all other nests were monitored over a longer period (e.g. 8–11 days post-hatching). Because provisioning rates typically increase with nestling age within nests, values of Vind0 (hence repeatability) would be overestimated if the between-nest variation in nestling age during sampling was ignored (Westneat et al. 2011): Vind0 is conflated with differences among individuals due to nestling age effects. Pseudo-repeatability can be avoided by including a between-individual fixed covariate capturing variation due to biased sampling, x̄j (eqn 3):

$$y_{ij} = (\beta_0 + \mathrm{ind}_{0j}) + \beta_{1B}\,\bar{x}_j + e_{0ij} \qquad \text{(eqn 3)}$$

where x̄j is calculated by averaging the covariate (xij; i.e. nestling age for parent j during focal observation period i) over all observations of the same parent j (i.e. x̄j = 8.5 for nests sampled only when the nestlings were young, and x̄j = 9.5 for nests sampled over all 4 days), and β1B represents the regression coefficient of the dependence of y on x at the between-individual level (‘B’ for between). Pseudo-repeatability can sometimes also be avoided by including additional random effects, for example territory identity to avoid overestimation of between-individual variance (Browne et al. 2007; van de Crommenacker et al. 2011; Text S4, Supporting information). Repeatability estimated from models controlling for between-individual fixed effects (e.g. β1B x̄j; eqn 3) represents the proportion of ‘phenotypic variance not accounted for by fixed effects’ (VP − VFIXED) explained by differences between individuals. This conditional repeatability (Glossary) will often be the biologically relevant parameter; for example, in our provisioning example, the raw repeatability was inflated due to failure to observe all nests for the same period of time and the conditional estimate represented the biological repeatability. Mixed-effect models with between-individual fixed effects (x̄j or xj) can also be used when issues related to biased sampling schemes do not apply. Inclusion of between-individual fixed effects (like ‘sex’) enables calculation of average within-class repeatability. However, such exercises assume that Vind0 and Ve0 do not vary between the classes. In the section Comparing variance components across data sets (Text S7, Supporting information), we discuss how multivariate MMs can be used to test for violations of this assumption (see Dingemanse et al. 2012 for a univariate approach for dealing with the same issue).

Explaining variance components

Simple models like eqn 1 provide information about phenotypic variance exhibited between and within individuals. As a next step, researchers are often interested in factors explaining these variance components. For example, experiences during development might have long-lasting effects on an individual’s later phenotype. One might therefore ask what proportion of between-individual variation (Vind0) in, for example, aggressiveness was due to early-life between-individual differences, such as variation in maternal hormones across eggs. Similarly, what proportion of within-individual variation (Ve0) is attributable to environmental factors? For such a question one could assay the phenotype of all individuals under the same range of environmental conditions (e.g. record aggressiveness over a gradient of conspecific density) and ask what portion of Ve0 was due to average within-individual plasticity. To address both of these possibilities, both between-individual (x1) and within-individual (x2) fixed effects can be included (eqn 4):

$$y_{ij} = (\beta_0 + \mathrm{ind}_{0j}) + \beta_{1B}\,x_{1j} + \beta_{2W}\,x_{2ij} + e_{0ij} \qquad \text{(eqn 4)}$$

where yij represents the level of aggressiveness of individual j at instance i. x1 could represent maternal hormone levels in the egg from which individual j was born and x2 the conspecific density experienced by individual j at instance i. As x1 is a covariate that differs between but not within individuals, it represents a between-individual effect (hence B). In contrast, x2 is a covariate that varies within an individual and thus represents a within-individual effect (hence W). β1B and β2W are the coefficients relating, respectively, x1 and x2 to yij. Comparison of the values of Vind0 and Ve0 between a model where these between- and within-individual fixed effects were included (eqn 4) vs. excluded (e.g. eqn 1a) provides quantitative information on variance in each variance component explained by these fixed effects (for guidelines, see Snijders & Bosker 1999). Equation 4 assumes that all individuals experienced the same set of conditions for any within-individual fixed effect (x2 in eqn 4), such that the average value of such fixed effects would not vary among individuals. If this condition is not satisfied, the within-individual fixed effect would be conflated with between-individual variation, and within-subject centring (Glossary) methods may be needed to distinguish within- from between-individual effects (Snijders & Bosker 1999; van de Pol & Wright 2009) as detailed in Text S9 (Supporting information). Furthermore, effects of predictor variables with considerable measurement error, for example environmental states like predator density, will be estimated with bias and therefore require specific modelling approaches (e.g. Schafer 1987; Bartlett, De Stavola & Frost 2009). Care is also needed in avoiding spurious results due to failure to fit nonlinear effects of covariates, and the exact choice of the covariate (e.g. population vs. local density) generating specific results.
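The between-individual covariate used to avoid pseudo-repeatability and the within-subject centring mentioned above both rest on the same split of a covariate into an individual's mean and the deviations from that mean. The sketch below illustrates that split for the nestling-age example; the function name, nest labels and data layout are hypothetical, and the code only prepares the two covariates rather than fitting any model.

```python
from statistics import fmean

def within_subject_centre(x_by_individual):
    """Split a covariate into between- and within-individual parts.

    Returns {individual: (mean_x, deviations)}, where mean_x is the
    individual's average covariate value (between-individual part) and
    deviations are x_ij - mean_x (within-individual part).
    """
    centred = {}
    for individual, xs in x_by_individual.items():
        mean_x = fmean(xs)
        centred[individual] = (mean_x, [x - mean_x for x in xs])
    return centred

# Nestling age (days post-hatching) at each observation of two nests.
nestling_age = {
    "nest_A": [8, 9],          # camera failed after 2 days
    "nest_B": [8, 9, 10, 11],  # camera recorded all 4 days
}
centred = within_subject_centre(nestling_age)
for nest, (mean_age, deviations) in centred.items():
    print(nest, mean_age, deviations)
```

The individual means (8.5 and 9.5 days) are the values that would enter a model as the between-individual covariate, while the deviations carry only within-individual variation in nestling age.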
When one wishes to estimate how much between- or within-individual variation in one phenotypic attribute (e.g. a behavioural response) remains after controlling for variation in another (e.g. circulating hormone levels), multivariate MMs (introduced below), where both phenotypic attributes are treated as response variables (y and z), may instead be applied (see eqns S2a–d of Text S8 (Supporting information) for how such conditional estimates are calculated). Multivariate models are particularly appropriate when it is not obvious which phenotypic attribute should be considered predictor vs. response.

estimating individual variation in plasticity

Phenotypes have thus far been considered a function of between- and within-individual fixed effects, implying that the range of phenotypes expressed by a single individual can be characterized by a regression line with the same slope for all individuals (eqns 1, 3 and 4; Fig. 2a). Individuals differed in the intercept of this reaction norm because a random intercept was included for each individual ($ind_{0j}$) (eqn 1b), but all individuals shared the same reaction norm slope (e.g. $\beta_{2W}$ in eqn 4). In other words, individuals could vary in their average phenotype but not in their phenotypic plasticity (Fig. 2a). Here, we extend MMs to include individual variation in reaction norm slopes (Fig. 2b).

Consider our previous example wherein aggressiveness of individuals was a function of maternal hormone levels ($\beta_{1B}x_{1j}$; eqn 4) and density ($\beta_{2W}x_{2ij}$; eqn 4). In doing so, we assumed that all individuals responded in the same manner to density. However, this assumption may often not hold because plasticity varies among individuals (reviewed by Nussey, Wilson & Brommer 2007; Dingemanse et al. 2010; Mathot et al. 2012). In our previous example, this would be characterized by some individuals increasing (or decreasing) aggressive behaviour to a greater degree than others for the same change in conspecific density (Fig. 2b). We can statistically model this relationship by including a within-individual fixed effect covariate ($x_{ij}$; i.e. density) into our basic model (eqn 1), while also fitting random slopes ($ind_{1j}$) around the population-average slope $\beta_1$ of the dependence of $y_{ij}$ on $x_{ij}$, which is called ‘random regression’ (Henderson 1982; Meyer 1998; Schaeffer 2004) (eqn 5a):

$$y_{ij} = (\beta_0 + ind_{0j}) + (\beta_1 + ind_{1j})x_{ij} + e_{0ij} \qquad \text{(eqn 5a)}$$

where, as above, $y_{ij}$ would represent the level of aggression displayed by individual j at instance i. Here, $x_{ij}$ is the density experienced by individual j at instance i. $\beta_1$ corresponds to $\beta_{2W}$ in eqn 4 and represents the average within-individual response to changes in density (i.e. the population-mean slope). A random intercept is fitted for each level of individual identity ($+ind_{0j}$) as before. What is new here is that the individual’s response to density can deviate from the population-mean slope ($+ind_{1j}$). Random intercepts and slopes are modelled as being drawn from a bivariate normal distribution (MVN) with a mean of zero. The variances and covariances for this distribution are defined by the variance in intercepts among individuals ($V_{ind_0}$), the between-individual variance in slope ($V_{ind_1}$), and the covariance between intercepts and slopes ($Cov_{ind_0,ind_1}$; eqn 5b). The residual error ($e_{0ij}$) is modelled as normally distributed with a mean of zero and an estimated within-individual variance ($V_{e_0}$; eqn 5b):

$$\begin{bmatrix} ind_{0j} \\ ind_{1j} \end{bmatrix} \sim MVN(0, \Omega_{ind}): \quad \Omega_{ind} = \begin{bmatrix} V_{ind_0} & Cov_{ind_0,ind_1} \\ Cov_{ind_0,ind_1} & V_{ind_1} \end{bmatrix}$$

$$[e_{0ij}] \sim N(0, \Omega_e): \quad \Omega_e = [V_{e_0}] \qquad \text{(eqn 5b)}$$

$\Omega_{ind}$ is, notably, a symmetrical matrix: the elements below the diagonal are mirrored above the diagonal. Note further that the intercept-slope covariance can be expressed as a correlation ($r_{ind_0,ind_1}$), where $r_{ind_0,ind_1} = Cov_{ind_0,ind_1} / \sqrt{V_{ind_0} V_{ind_1}}$, and that $V_{ind_1}$, and the sign of $r_{ind_0,ind_1}$, are specific to the measurement and scaling of the covariate (Schaeffer 2004); those parameters therefore cannot typically be compared across studies. Furthering our hypothetical measures of aggression, a negative intercept-slope correlation, as depicted in Fig. 2b, would suggest that individuals that have low average aggression scores compared with others also increase their aggression at a greater than average rate in response to increases in conspecific density. Intercept-slope correlations often differ from zero; see for example Mathot et al. (2012).

Importantly, the application of random regression implies that the between-individual variance is no longer stable over the (density) gradient, and $V_{ind_0}$ now uniquely represents the between-individual variance at a specific section of the environment (i.e. where all covariates have the value zero). Similarly, if one were to calculate repeatability using eqn 2, the estimated value would be solely applicable to that specific section of the data. Specifically, repeatability can vary dramatically over the gradient when $r_{ind_0,ind_1}$ is tight (Text S5, Supporting information), requiring the evaluation of important assumptions (Text S6, Supporting information). For example, failure to acknowledge the presence of nonlinear effects of reaction norm slopes would automatically lead to inappropriate conclusions about the presence of individual variation in plasticity.
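The dependence of the between-individual variance on the covariate can be made concrete with a short numerical sketch. Under the random regression model of eqn 5a, the between-individual variance at covariate value x is the variance of $ind_{0j} + ind_{1j}x$, that is $V_{ind_0} + 2x\,Cov_{ind_0,ind_1} + x^2 V_{ind_1}$; dividing this quantity by itself plus $V_{e_0}$ gives a repeatability that is conditional on x. The function names and all parameter values below are illustrative assumptions and are not taken from any data set.

```python
import math


def between_individual_variance(x, v_ind0, v_ind1, cov_ind0_ind1):
    """Variance among individuals of (ind0j + ind1j * x) at covariate value x."""
    return v_ind0 + 2.0 * x * cov_ind0_ind1 + x ** 2 * v_ind1


def conditional_repeatability(x, v_ind0, v_ind1, cov_ind0_ind1, v_e0):
    """Between-individual variance at x as a proportion of the total variance at x."""
    v_between = between_individual_variance(x, v_ind0, v_ind1, cov_ind0_ind1)
    return v_between / (v_between + v_e0)


# Hypothetical variance components (assumed for illustration only).
V_IND0, V_IND1, V_E0 = 1.0, 0.25, 1.0
R_IND0_IND1 = -0.5  # negative intercept-slope correlation, as depicted in Fig. 2b
COV_IND0_IND1 = R_IND0_IND1 * math.sqrt(V_IND0 * V_IND1)  # covariance from correlation

for x in (-2.0, 0.0, 2.0):
    rep = conditional_repeatability(x, V_IND0, V_IND1, COV_IND0_IND1, V_E0)
    print(f"x = {x:+.1f}: repeatability = {rep:.3f}")
```

At x = 0 the function reduces to $V_{ind_0}/(V_{ind_0} + V_{e_0})$, which is why the value of $V_{ind_0}$, and any repeatability derived from it, applies only to the point where the covariate equals zero. With the assumed negative intercept-slope covariance, repeatability is higher at x = −2 than at x = +2.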

sampling designs and sample size requirements

What type of sampling designs and sample sizes are needed to estimate specific between- and within-individual (co)variance components? Minimum design requirements for the analysis of single phenotypic attributes are provided in scenarios 3 and 4 of Table 2. Sample sizes needed for the accurate estimation of these variance components are, in contrast, less obvious. Recommendations about optimal sample sizes vary substantially, depending on (i) what one aims to optimize (accuracy or power), (ii) the variance component of interest, and (iii) constraints imposed by the study system (Snijders & Bosker 1999; Maas & Hox 2004;


Scherbaum & Ferreter 2009; Hox 2010; Martin et al. 2011; van de Pol 2012). Unfortunately, many published recommendations are drawn from situations relevant to the social sciences and have limited relevance for ecologists, as they typically focus on cases where a few subjects (for example schools) are sampled with a great number of repeats (for example students). In contrast, ecologists often work with larger numbers of subjects (individuals) but face constraints in the number of repeated samples that could possibly be collected. Martin et al. (2011) and van de Pol (2012) recently discussed sample size requirements for the estimation of intercept-slope correlations ($r_{ind_0,ind_1}$) and provide software for sampling optimization for several variance components. Thus, we focus here on simulations (detailed in Texts S13–S14, Supporting information) asking how sample size affects the accuracy of, and the power to detect, repeatabilities of different magnitudes. Our simulations imply that when repeatability exceeds 0.5 it can be demonstrated with acceptable statistical power (~0.8) with few (e.g. 25) individuals sampled only twice (Fig. 3b). For lower repeatability values 4 samples per individual are typically required whenever the total number of individuals is low (100; Fig. 3b). Moreover, it will generally not be possible to detect repeatabilities of 0.1 with only two repeats per individual (Fig. 3b), as noted previously (Martin et al. 2011). Furthermore, accuracy of the estimated value of repeatability increases with true repeatability (Fig. 3a), suggesting that sampling considerations are particularly relevant for traits with low values of repeatability. Finally, optimal sample sizes greatly depend on the level of inaccuracy deemed acceptable. When the total sample size (number of individuals × number of repeats) is a constraining factor, inaccuracy can be decreased by increasing the number of samples per individual at the cost of the number of sampled individuals (Fig. 3b), though only when repeatability is 0.3.

Table 2. Four distinct sampling schemes (scenarios) and their estimable (co)variance components (defined in Table 1). We print ‘Data’ for points in time (1–4) where phenotypic data of the phenotypic attribute y and/or z have been collected for the same individual (1–3), and ‘–’ when no data have been collected

                      Scenario 1      Scenario 2      Scenario 3      Scenario 4
Individual   Time     y      z        y      z        y      z        y      z
1            1        Data   Data     Data   –        Data   Data     Data   –
1            2        –      –        –      Data     Data   Data     Data   –
1            3        –      –        –      –        –      –        –      Data
1            4        –      –        –      –        –      –        –      Data
2            1        Data   Data     Data   –        Data   Data     Data   –
2            2        –      –        –      Data     Data   Data     Data   –
2            3        –      –        –      –        –      –        –      Data
2            4        –      –        –      –        –      –        –      Data
3            1        Data   Data     Data   –        Data   Data     Data   –
3            2        –      –        –      Data     Data   Data     Data   –
3            3        –      –        –      –        –      –        –      Data
3            4        –      –        –      –        –      –        –      Data
…            …        …      …        …      …        …      …        …      …

Component(s)a                                                                  Scenario 1   Scenario 2   Scenario 3   Scenario 4
                                                                               Estimable?   Estimable?   Estimable?   Estimable?
Phenotypic variances of y and z ($V_{P_y}$, $V_{P_z}$)                         Yes          Yes          Yes          Yes
Phenotypic covariance between y and z ($Cov_{P_y,P_z}$)                        Yes          Yes          Yes          Yes
Between-individual variances of y and z ($V_{ind_{0y}}$, $V_{ind_{0z}}$)       No           No           Yes          Yes
Between-individual covariance between y and z ($Cov_{ind_{0y},ind_{0z}}$)      No           No           Yes          Yes
Within-individual variances ($V_{e_{0y}}$, $V_{e_{0z}}$)                       No           No           Yes          Yes
Within-individual covariance between y and z ($Cov_{e_{0y},e_{0z}}$)           No           No           Yes          No

a Between-individual variances in level of plasticity of y and z ($V_{ind_{1y}}$, $V_{ind_{1z}}$) can in principle be estimated in scenarios 3 and 4, provided that repeated measures data were taken in different contexts.
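The logic of an accuracy simulation of this kind can be sketched in a few lines of Python. This is a simplified stand-in and not a reproduction of the simulations in Texts S13–S14: it assumes balanced data, fixes the total phenotypic variance at 1, estimates repeatability with the one-way ANOVA estimator rather than with a mixed model, and uses an arbitrary number of simulated data sets. All function names are hypothetical.

```python
import math
import random


def simulate_data(n_ind, n_rep, repeatability, rng):
    """Simulate y_ij = ind_0j + e_0ij with V_ind0 = repeatability and V_e0 = 1 - repeatability."""
    sd_ind = math.sqrt(repeatability)
    sd_e = math.sqrt(1.0 - repeatability)
    data = []
    for _ in range(n_ind):
        ind0 = rng.gauss(0.0, sd_ind)  # random intercept, constant within an individual
        data.append([ind0 + rng.gauss(0.0, sd_e) for _ in range(n_rep)])
    return data


def anova_repeatability(data):
    """ANOVA estimator of repeatability for balanced data (equal repeats per individual)."""
    n_ind, n_rep = len(data), len(data[0])
    means = [sum(row) / n_rep for row in data]
    grand_mean = sum(means) / n_ind
    ms_among = n_rep * sum((m - grand_mean) ** 2 for m in means) / (n_ind - 1)
    ss_within = sum((y - m) ** 2 for row, m in zip(data, means) for y in row)
    ms_within = ss_within / (n_ind * (n_rep - 1))
    v_ind = (ms_among - ms_within) / n_rep  # can be negative in small samples
    return v_ind / (v_ind + ms_within)


def rmse_of_repeatability(n_ind, n_rep, repeatability, n_sim=200, seed=1):
    """Root mean square error of the estimate across n_sim simulated data sets."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(n_sim):
        estimate = anova_repeatability(simulate_data(n_ind, n_rep, repeatability, rng))
        squared_errors.append((estimate - repeatability) ** 2)
    return math.sqrt(sum(squared_errors) / n_sim)


# Compare a few hypothetical designs for a true repeatability of 0.3.
for n_ind, n_rep in [(25, 2), (25, 4), (100, 2)]:
    print(n_ind, n_rep, round(rmse_of_repeatability(n_ind, n_rep, 0.3), 3))
```

Replacing the ANOVA estimator with a fitted mixed model, and counting how often a test of the between-individual variance is significant, would turn the same loop into a power simulation.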

summary

Univariate MMs can address a wide range of questions regarding variation – and its sources – both between and within individuals (detailed in Table S1, Supporting information). Specifically, univariate MMs facilitate:
1 The estimation of between- and within-individual variation and repeatability;
2 The estimation of individual variation in plasticity;
3 Assignment of variation to fixed effects, and separation of within- and between-individual fixed effects;
4 Statistical control for nonrandom distributions of individuals over environments.


Fig. 2. We illustrate here how between- and within-individual variance components are separated by plotting seven measurements of aggressiveness (y-axis) for five individuals (numbered) whose behaviour was assayed over a range of densities (x-axis). (a) Grey lines represent the average phenotypic value of each individual; the variance among lines represents the between-individual variance ($V_{ind_0}$). The variance in within-individual deviations from individual means represents the within-individual variance ($V_{e_0}$). All lines are parallel, and individuals therefore do not vary in behavioural plasticity. This is not the case in panel (b), where individuals do differ in plasticity ($V_{ind_1} > 0$) and where behavioural reaction norm intercepts and slopes are negatively correlated ($r_{ind_0,ind_1} < 0$).

Multivariate MMs

introduction

Here, we discuss how multivariate MMs can be used to decompose phenotypic correlations. We detail first how between- and within-individual effects contribute to raw phenotypic correlations, as well as the biological underpinnings of correlations at each level. We then, for simplicity, focus on bivariate MMs, which may be applied whenever repeated measures for individuals are available for two phenotypic attributes (Table 2), for example a behavioural (y) and a physiological (z) response. In Text S7 (Supporting information), we further detail how multivariate MMs may be used to ask whether specific variance components, or repeatabilities, of single phenotypic attributes differ between data sets (e.g. sexes, treatments, populations). In Text S16 (Supporting information), we also discuss how more complicated multivariate relationships can be evaluated.

correlations between labile phenotypic attributes

The association between two phenotypic characteristics – for example maximal metabolic rate (y) and basal metabolic rate (z) – is typically estimated by calculating a phenotypic correlation ($r_{P_y,P_z}$) (eqn 6a):

$$r_{P_y,P_z} = \frac{Cov_{P_y,P_z}}{\sqrt{V_{P_y} V_{P_z}}} \qquad \text{(eqn 6a)}$$

where $Cov_{P_y,P_z}$ is the covariance between maximal (y) and basal metabolic rates (z), and $V_{P_y}$ and $V_{P_z}$ are the corresponding phenotypic variances.

As discussed earlier, $r_{P_y,P_z}$ does not, on its own, tell us much about the nature of the association between y and z because it is shaped by correlations at two distinct levels: between and within individuals. A between-individual correlation is present when individual mean values of y ($\bar{y}_j$) correlate with individual mean values of z ($\bar{z}_j$) (as in Fig. 4a, panel 3). A within-individual correlation exists when an individual’s change in y between time period t and t + 1 is correlated with its change in z over the same period (as in Fig. 4b, panel 4). Statistically, within-individual covariances are generated by covariances between deviations from individual mean values for y (i.e. $y_{ij} - \bar{y}_j$) and z (i.e. $z_{ij} - \bar{z}_j$) (Fig. 4b, panel 4). For two labile and repeatable phenotypic attributes (y and z), the between- and within-individual correlations jointly contribute to the phenotypic correlation as (eqn 6b; Dingemanse, Dochtermann & Nakagawa 2012):

$$r_{P_y,P_z} = r_{ind_{0y},ind_{0z}} \sqrt{\frac{V_{ind_{0y}}}{V_{ind_{0y}} + V_{e_{0y}}} \cdot \frac{V_{ind_{0z}}}{V_{ind_{0z}} + V_{e_{0z}}}} + r_{e_{0y},e_{0z}} \sqrt{\frac{V_{e_{0y}}}{V_{ind_{0y}} + V_{e_{0y}}} \cdot \frac{V_{e_{0z}}}{V_{ind_{0z}} + V_{e_{0z}}}} \qquad \text{(eqn 6b)}$$

where the geometric mean repeatability – the square-root of the product of the repeatabilities of the two phenotypic attributes, $\sqrt{\frac{V_{ind_{0y}}}{V_{ind_{0y}} + V_{e_{0y}}} \cdot \frac{V_{ind_{0z}}}{V_{ind_{0z}} + V_{e_{0z}}}}$ –


determines the contribution of the between-individual correlation ($r_{ind_{0y},ind_{0z}}$) to the overall phenotypic correlation. Practically, $r_{P_y,P_z}$ will approximate the between-individual correlation ($r_{ind_{0y},ind_{0z}}$) most closely for cases where y and z are both highly repeatable (as in Fig. 4a) compared with cases where they are not (as in Fig. 4b). Similarly, eqn 6b simplifies to $r_{P_y,P_z} = r_{e_{0y},e_{0z}}$ for the unlikely scenario where y and z both completely lack between-individual variance (i.e. $V_{ind_{0y}} = 0$ and $V_{ind_{0z}} = 0$). Equation 6b can also simplify to $r_{P_y,P_z} = r_{ind_{0y},ind_{0z}}$ when both phenotypic attributes completely lack within-individual variability. The latter condition could apply to suites of phenotypic attributes that become fixed in adulthood (e.g. skull dimensions, arm or leg lengths). For nonlabile traits like these, $r_{ind_{0y},ind_{0z}}$ can be estimated from a single measurement per attribute per individual (assuming zero measurement error). Otherwise, one cannot appropriately infer $r_{ind_{0y},ind_{0z}}$ or $r_{e_{0y},e_{0z}}$ without statistically decomposing $r_{P_y,P_z}$ (as discussed in the section Utility of MMs vs. alternatives). Hence, when research questions explicitly ask for the estimation of $r_{ind_{0y},ind_{0z}}$ (e.g. behavioural syndrome research; Dingemanse, Dochtermann & Nakagawa 2012), or of both $r_{ind_{0y},ind_{0z}}$ and $r_{e_{0y},e_{0z}}$ (e.g. life history research; Reznick, Nunney & Tessier 2000), specific sampling designs (where each individual is assayed more than once) and special decomposition tools (multivariate MMs) will be necessary research requirements (detailed below). For labile phenotypic attributes, like behavioural, physiological or life history traits, that typically exhibit intermediate repeatabilities, $r_{P_y,P_z}$ will equal neither $r_{e_{0y},e_{0z}}$ nor $r_{ind_{0y},ind_{0z}}$. For example, repeatabilities of behavioural responses average around 0.37 (Bell, Hankison & Laskowski 2009), implying an average geometric mean repeatability below 0.37. Consequently, within-individual correlations would influence phenotypic correlations at least (1 – 0.37)/0.37 = 1.70 times more strongly than would between-individual correlations. Phenotypic correlations for such labile phenotypic attributes therefore largely reflect within-individual correlations, and significant phenotypic correlations should not blindly be taken as evidence for between-individual correlations. Our reading of the current literature leads us to the conclusion that this concern is insufficiently appreciated. In fact, we found few ecological examples outside of the quantitative genetics literature where raw phenotypic correlations were statistically decomposed into within- vs. between-individual components (Browne et al. 2007; van de Crommenacker et al. 2011; Mutzel et al. 2011; Wilson et al. 2011; Adriaenssens & Johnsson 2012; Dochtermann et al. 2012).

Fig. 3. Effects of sampling design on the accuracy of estimates of repeatability (for definition, see eqn 2), and power to ‘significantly’ identify nonzero values of repeatability. (a) Accuracy, estimated as the root mean square error (RMSE) of repeatabilities for varying numbers of individuals and samples per individual. RMSE was calculated based on an MM estimate of repeatability vs. the known repeatability (ranging from 0.1 to 0.9) used in the generation of simulated data. (b) Accumulation of power ($1 - \beta$) relative to different numbers of individuals and number of samples per individual for the ability to detect repeatability. Simulation methods are detailed in Texts S13–S14 (Supporting information).
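The weighting of between- and within-individual correlations in eqn 6b can be checked numerically. The sketch below rewrites eqn 6b in terms of the two repeatabilities, using the fact that $V_{e_0}/(V_{ind_0} + V_{e_0})$ equals one minus the repeatability. The function names are hypothetical, and the correlation values in the example are assumptions chosen only for illustration.

```python
import math


def repeatability(v_ind0, v_e0):
    """Proportion of the phenotypic variance that is among individuals."""
    return v_ind0 / (v_ind0 + v_e0)


def phenotypic_correlation(r_ind, r_e, rep_y, rep_z):
    """Eqn 6b expressed through the repeatabilities of y and z."""
    weight_between = math.sqrt(rep_y * rep_z)                  # geometric mean repeatability
    weight_within = math.sqrt((1.0 - rep_y) * (1.0 - rep_z))   # complementary weight
    return r_ind * weight_between + r_e * weight_within


# Both traits with the average behavioural repeatability of 0.37.
REP = 0.37
weight_ratio = (phenotypic_correlation(0.0, 1.0, REP, REP)
                / phenotypic_correlation(1.0, 0.0, REP, REP))
print(f"within/between weight ratio: {weight_ratio:.2f}")

# Assumed example: a between-individual correlation of 0.5 and no within-individual correlation.
print(f"phenotypic correlation: {phenotypic_correlation(0.5, 0.0, REP, REP):.3f}")
```

Under these assumptions a between-individual correlation of 0.5 appears as a much weaker phenotypic correlation, which illustrates why the raw phenotypic correlation is a poor proxy for either component when repeatabilities are intermediate.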

within- and between-individual correlations: how they are caused and why they often differ

What are the biological underpinnings of between-individual and within-individual correlations? Proximally, we can distinguish three main contributors to phenotypic correlations: genetic mechanisms, environmental mechanisms and methodological artefacts (Fig. 5), although other contributors also exist (e.g. gene-environment interactions; Sgro & Hoffmann 2004). In Text S15 (Supporting information), we provide a suite of examples of ecological questions that can be addressed by partitioning correlations into between- and within-individual effects. Here,


we provide a brief summary before detailing the simplest implementations of MMs that may be used for such purposes.

Genetic variation underpins phenotypic correlations whenever phenotypic attributes are linked through genetic correlations, for example when the same genes affect the expression of multiple phenotypic attributes (pleiotropy) or when genes affecting the expression of one phenotypic attribute are correlated with the genes affecting the expression of another (linkage disequilibrium) (Lynch & Walsh 1998). Genes are attributes of individuals, and genetic variation thus contributes to variation at the between-individual level. Hence, genes that affect multiple aspects of the phenotype cause both between-individual (i.e. repeatable) variation in y and z ($V_{ind_0}$) and between-individual correlations (Fig. 5).

Environmental variation can underpin correlations through a variety of mechanisms, both between and within individuals (Fig. 5). At the between-individual level, environmentally induced correlations are called permanent environment correlations; at the within-individual level, they are simply called environmental correlations (Fig. 5). Importantly, ‘permanent’ refers here to environmental variation causing between-individual differences over the time span within which the repeated measures were taken (Wilson et al. 2010); it does not imply that such environmental factors have effects that are permanent.

Finally, within-individual correlations in particular can also be underpinned by correlated measurement errors (Fig. 5). This could be due to differences in accuracy or precision of equipment or result from other methodological practices (e.g. due to effects of the order in which phenotypic assays are conducted; Dochtermann 2010). Fortunately, such biases can be quantified and statistically controlled either with appropriate sampling designs (Dochtermann 2010) or the inclusion of additional random effects (Text S4, Supporting information).

Fig. 4. Illustrations of situations where a positive phenotypic correlation ($r_{P_y,P_z}$) originates primarily from a positive (a) between-individual correlation ($r_{ind_{0y},ind_{0z}}$) vs. (b) within-individual correlation ($r_{e_{0y},e_{0z}}$), where each of nine individuals (numbers) is assayed once for each of two phenotypic attributes (y and z) within each of two time periods (scenario 3 in Table 2). For both situations we plot (from left to right): (1) y vs. z at t = 1, i.e. $y_{1j}$ vs. $z_{1j}$; (2) y vs. z at t = 2, i.e. $y_{2j}$ vs. $z_{2j}$; (3) the average value of y vs. the average value of z, i.e. $\bar{y}_j$ vs. $\bar{z}_j$; (4) the deviation of each observation from the individual’s mean of y vs. z, i.e. $(y_{ij} - \bar{y}_j)$ vs. $(z_{ij} - \bar{z}_j)$. Attributes y and z are tightly correlated at each point in time, either (a) due to a tight between-individual correlation because both traits have high repeatabilities or (b) due to a tight within-individual correlation because both traits have low repeatabilities.

decomposing phenotypic covariances using mms

Just as univariate MMs were used to decompose phenotypic variances, multivariate MMs can be used to decompose phenotypic covariances into between- and within-individual covariance components whenever repeated measures of two (or more) phenotypic attributes are available for a set of individuals (Table 2). Multivariate MMs share the characteristics discussed for univariate MMs except that the covariances between the response variables are explicitly considered. A bivariate equivalent of eqn 1 – where no fixed effects are included in the linear equation except for the constant ($\beta_0$), and where the phenotypic (co)variance is decomposed between vs. within individuals – is (eqn 7a):

$$\begin{aligned} y_{ij} &= (\beta_{0y} + ind_{0yj}) + e_{0yij} \\ z_{ij} &= (\beta_{0z} + ind_{0zj}) + e_{0zij} \end{aligned} \qquad \text{(eqn 7a)}$$

where y and z represent two phenotypic attributes. As in the first examples for using univariate MMs, instance i for individual j is modelled here by fitting a random intercept for each level of individual ($ind_{0j}$). Typically, $\beta_{0y}$ and $\beta_{0z}$ are modelled as being distinct (e.g. Matsuyama & Ohashi 1997). At first glance eqn 7a appears to simply be two univariate MMs. Importantly, this is not the case because of how the between- and within-individual effects are estimated with multivariate MMs. As was the case with univariate MMs, the random intercepts ($ind_{0j}$) and the within-individual contributions ($e_{0ij}$) to y and z are modelled as having means of zero. However, in this bivariate case, neither the random intercepts nor the residual errors are independent. Instead, the random intercepts are assumed to follow a multivariate normal distribution with a variance-covariance structure ($\Omega_{ind}$) specifying the between-individual variances ($V_{ind_{0y}}$ and $V_{ind_{0z}}$) and the between-individual covariance between the two attributes ($Cov_{ind_{0y},ind_{0z}}$; eqn 7b). The residual errors ($e_{0ij}$) are likewise assumed to be drawn from a multivariate normal distribution, with means of zero, within-individual variances ($V_{e_{0y}}$ and $V_{e_{0z}}$), and a within-individual covariance ($Cov_{e_{0y},e_{0z}}$; eqn 7b):

$$\begin{bmatrix} ind_{0yj} \\ ind_{0zj} \end{bmatrix} \sim MVN(0, \Omega_{ind}): \quad \Omega_{ind} = \begin{bmatrix} V_{ind_{0y}} & Cov_{ind_{0y},ind_{0z}} \\ Cov_{ind_{0y},ind_{0z}} & V_{ind_{0z}} \end{bmatrix}$$

$$\begin{bmatrix} e_{0yij} \\ e_{0zij} \end{bmatrix} \sim MVN(0, \Omega_e): \quad \Omega_e = \begin{bmatrix} V_{e_{0y}} & Cov_{e_{0y},e_{0z}} \\ Cov_{e_{0y},e_{0z}} & V_{e_{0z}} \end{bmatrix} \qquad \text{(eqn 7b)}$$

where the between- ($r_{ind_{0y},ind_{0z}}$) and within-individual ($r_{e_{0y},e_{0z}}$) correlations can be calculated from the between- and within-individual variances and covariances as (eqn 7c,d):

$$r_{ind_{0y},ind_{0z}} = \frac{Cov_{ind_{0y},ind_{0z}}}{\sqrt{V_{ind_{0y}} V_{ind_{0z}}}} \qquad \text{(eqn 7c)}$$

$$r_{e_{0y},e_{0z}} = \frac{Cov_{e_{0y},e_{0z}}}{\sqrt{V_{e_{0y}} V_{e_{0z}}}} \qquad \text{(eqn 7d)}$$
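To illustrate what the decomposition in eqns 7a–d recovers, the following Python sketch simulates data under the bivariate model and then estimates both correlations. It is not a multivariate MM: for simplicity it swaps in a method-of-moments estimator that is valid only for balanced data of the scenario 3 type (every individual assayed for y and z at the same occasions). It also assumes $\beta_{0y} = \beta_{0z} = 0$, unit phenotypic variances and equal repeatabilities for both traits; the function names and parameter values are hypothetical.

```python
import math
import random


def simulate_bivariate(n_ind, n_rep, r_ind, r_e, rep, rng):
    """Simulate eqn 7a for a balanced design with repeatability `rep` for both traits."""
    sd_ind, sd_e = math.sqrt(rep), math.sqrt(1.0 - rep)

    def correlated_pair(r, sd):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        return sd * a, sd * (r * a + math.sqrt(1.0 - r * r) * b)

    data = []
    for _ in range(n_ind):
        ind_y, ind_z = correlated_pair(r_ind, sd_ind)    # random intercepts
        rows = []
        for _ in range(n_rep):
            e_y, e_z = correlated_pair(r_e, sd_e)        # residual errors
            rows.append((ind_y + e_y, ind_z + e_z))
        data.append(rows)
    return data


def decompose(data):
    """Return (r_ind, r_e); assumes balanced data and positive variance estimates."""
    n_ind, n_rep = len(data), len(data[0])
    means = [(sum(y for y, _ in rows) / n_rep, sum(z for _, z in rows) / n_rep)
             for rows in data]
    grand_y = sum(my for my, _ in means) / n_ind
    grand_z = sum(mz for _, mz in means) / n_ind

    # Within-individual part: deviations of observations from individual means.
    ss_yy = ss_zz = ss_yz = 0.0
    for rows, (my, mz) in zip(data, means):
        for y, z in rows:
            ss_yy += (y - my) ** 2
            ss_zz += (z - mz) ** 2
            ss_yz += (y - my) * (z - mz)
    df_within = n_ind * (n_rep - 1)
    v_e_y, v_e_z, cov_e = ss_yy / df_within, ss_zz / df_within, ss_yz / df_within

    # Between-individual part: (co)variances of individual means still contain a
    # 1/n_rep share of the within-individual (co)variance, which is subtracted.
    df_among = n_ind - 1
    v_ind_y = sum((my - grand_y) ** 2 for my, _ in means) / df_among - v_e_y / n_rep
    v_ind_z = sum((mz - grand_z) ** 2 for _, mz in means) / df_among - v_e_z / n_rep
    cov_ind = (sum((my - grand_y) * (mz - grand_z) for my, mz in means) / df_among
               - cov_e / n_rep)

    r_ind = cov_ind / math.sqrt(v_ind_y * v_ind_z)   # eqn 7c
    r_e = cov_e / math.sqrt(v_e_y * v_e_z)           # eqn 7d
    return r_ind, r_e


# Assumed example: opposite signs at the two levels.
example = simulate_bivariate(500, 4, 0.6, -0.3, 0.5, random.Random(1))
print(decompose(example))
```

The example deliberately uses correlations of opposite sign at the two levels, a situation in which the raw phenotypic correlation alone would be uninformative about either component. A multivariate MM remains the appropriate tool for unbalanced data and for obtaining uncertainty estimates.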

testing the influence of fixed effects on correlations

The questions asked about between- and within-individual variances can also be asked about between- and within-individual correlations. In the univariate MMs section, we detailed how the inclusion of fixed effects may help explain variance between and/or within individuals (eqns 3, 4, 5). One can apply the same approaches when asking questions about sources of covariance by fitting fixed effects into the bivariate MMs described in eqn 7. For example, inclusion of genotypic information as a fixed (or random) effect would reveal the extent to which $Cov_{ind_{0y},ind_{0z}}$ was determined by a particular gene with pleiotropic effects, whereas inclusion of the time of day a measurement was taken could reveal whether $Cov_{e_{0y},e_{0z}}$ is attributable to diurnal variation.

sampling designs and sample size requirements

For the analyses of correlations between phenotypic attributes, repeated observations for two (or more) phenotypic attributes are needed to estimate between- and within-individual correlations (Table 2). Beyond these general requirements, some specific sampling designs are as follows:
1 Two phenotypic attributes, y and z, are both assayed at the same (two or more) points in time (scenario 3 in Table 2; Fig. 4). For example, lay date (y) and clutch size (z) are both measured at the onset of each breeding season and repeated measures collected for all individuals that survive across seasons. Such a design allows for the estimation of between-individual and within-individual correlations (Table 2).
2 Two phenotypic attributes are both assayed repeatedly but never at the same time (scenario 4 in Table 2). For example, both foraging behaviour (y) and nestling provisioning effort (z) are assayed more than once for the same set of individuals, but the former is assayed once a month in winter, whereas the latter is assayed once per month in summer. Such a design also allows for the estimation of between-individual correlations but within-individual correlations are nonestimable (Table 2).

Other scenarios where each phenotypic attribute is assayed only once, either at the same point in time or at different points in time, do not allow the decomposition of phenotypic (co)variances (scenarios 1 and 2; Table 2). Unfortunately, beyond the above-mentioned general structural requirements there is little guidance regarding optimal sample sizes, an issue we address below using simulation studies. For different combinations of $r_{ind_{0y},ind_{0z}}$ and $r_{e_{0y},e_{0z}}$, we asked whether the accuracy with which these parameters are estimated is a function of the number of individuals and number of samples per individual, and which of these two aspects is most important (simulation details are discussed in Texts S11–S12, Supporting information). We also discuss how the statistical power to detect significant between-individual correlations depends on sample size for a range of between-individual correlations (where repeatability = 0.5 and $r_{e_{0y},e_{0z}} = 0$).
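The structural requirements of the four scenarios in Table 2 can be written down as a small check on a sampling schedule. The sketch below is an illustrative simplification: it considers only whether a design could in principle support each correlation, ignores sample size entirely, and treats a correlation as estimable as soon as at least one individual meets the structural requirement. The record layout and function name are assumptions made for this example.

```python
def estimable_correlations(records):
    """Check which correlations a sampling schedule could support in principle.

    records: iterable of (individual, time, has_y, has_z) tuples, where has_y and
    has_z state whether attribute y or z was assayed at that time point.
    """
    y_times, z_times = {}, {}
    for individual, time, has_y, has_z in records:
        if has_y:
            y_times.setdefault(individual, set()).add(time)
        if has_z:
            z_times.setdefault(individual, set()).add(time)
    individuals = set(y_times) | set(z_times)

    # Between-individual correlation: repeated measures of both y and z.
    between = any(
        len(y_times.get(i, set())) > 1 and len(z_times.get(i, set())) > 1
        for i in individuals
    )
    # Within-individual correlation: y and z assayed together on two or more occasions.
    within = any(
        len(y_times.get(i, set()) & z_times.get(i, set())) > 1
        for i in individuals
    )
    return {"between_individual": between, "within_individual": within}


# The four scenarios of Table 2 for three individuals.
IDS = (1, 2, 3)
scenario_1 = [(i, 1, True, True) for i in IDS]
scenario_2 = [(i, t, t == 1, t == 2) for i in IDS for t in (1, 2)]
scenario_3 = [(i, t, True, True) for i in IDS for t in (1, 2)]
scenario_4 = [(i, t, t <= 2, t >= 3) for i in IDS for t in (1, 2, 3, 4)]

for name, design in [("1", scenario_1), ("2", scenario_2),
                     ("3", scenario_3), ("4", scenario_4)]:
    print("scenario", name, estimable_correlations(design))
```

A check of this kind is useful at the planning stage only; whether the components can be estimated with acceptable accuracy and power depends on the sample sizes discussed in this section.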


Fig. 5. Hierarchical diagram illustrating how ‘raw’ phenotypic correlations ($r_{P_y,P_z}$) are underpinned by the joint influences of between- ($r_{ind_{0y},ind_{0z}}$) and within-individual correlations ($r_{e_{0y},e_{0z}}$), which are in turn shaped by genetic, permanent environment, environment and error correlations, which are themselves due to genetic and environmental variation and measurement error. A range of biological examples is given in Text S15 (Supporting information).

When traits have a repeatability of 0.5 and between-individual correlations are |0.5|, most sample sizes provide acceptable power (0.8) (Fig. 1b). When $r_{ind_{0y},ind_{0z}}$ is |0.5| and 25–50 individuals are sampled, power to detect a between-individual correlation is initially lower than 0.8 but rapidly increases with sample size (Fig. 1b). In contrast, sample sizes greatly affect power to detect between-individual correlations below |0.5|: a large number of individuals (125) should be sampled more than twice to detect values of $r_{ind_{0y},ind_{0z}}$ of 0.3 with acceptable power. When $r_{ind_{0y},ind_{0z}}$ is 0.3 and individuals are sampled only twice, sufficient power is only reached with a total of 200 individuals. Power to detect between-individual correlations of |0.1| is always extremely low (Fig. 1b) and would require total sample sizes far larger than those considered in our simulations (2000). When the total sample size is limiting, our simulations suggest that power is somewhat increased by favouring more individuals rather than more samples per individual. For example, a power of 0.8 to statistically detect values of $r_{ind_{0y},ind_{0z}}$ of 0.5 may be achieved with either 75 individuals sampled for each attribute twice (total sample size = 150) or with 50 individuals sampled for each attribute four times (total sample size = 200) (Fig. 1b). In addition to considering power, the accuracy of estimates of between- and within-individual correlations might also be of concern because estimated values are less accurate for between- compared with within-individual correlations (Fig. 1a). Within-individual correlations are also estimated with greater accuracy than between-individual correlations across all total sample sizes (Fig. 1a). When both correlations are of interest, sample sizes should thus be optimized with respect to between-individual correlations.

Even with large sample sizes, multivariate MMs should be applied with caution (see references cited above) as they bear a larger number of assumptions compared with univariate MMs, such as an assumption of multivariate normality (e.g. Snijders & Bosker 1999). Moreover, covariance estimates derived from multivariate MMs assume that the phenotypic attributes being examined are associated in a linear manner; transformation can sometimes alleviate violations of this assumption. Nonetheless, decomposition of correlations into between- and within-individual components necessitates these approaches (Dingemanse, Dochtermann & Nakagawa 2012).

summary

Multivariate MMs can address a wide range of questions regarding covariation – and its sources – both between and within individuals. Specifically, multivariate MMs facilitate:
1 The decomposition of phenotypic correlations into between- and within-individual correlations (Figs 4 and 5);
2 Assignment of covariances to fixed effects;
3 Comparison of variance components, and repeatabilities, among data sets (Text S7, Supporting information).



Discussion

We have detailed in this paper how mixed-effect models can be applied to estimate a suite of between- and within-individual variance components (Table 1) of key importance to ecologists. Our study focused primarily on key questions of sampling designs (Table 2), sample sizes (Figs 1 and 3) and models to estimate specific variance components, while detailing how bias such as pseudorepeatability (Glossary) can be investigated and controlled for statistically, or how one would model how variation in ecological variables (e.g. conspecific density) affects the magnitude of variance components.

We hope that our discussion of between- and within-individual variance components (Table 1) will help facilitate the introduction of statistical definitions of biological patterns of ecological relevance. For example, the animal personality literature is full of anthropomorphic and misleading verbal terminology that hampers progress. The statistical framework reviewed here provides a means to define distinct terms statistically. For example, defining personality vs. plasticity as behavioural reaction norm intercepts and slopes respectively enables these distinct patterns of variation to be studied within a single framework (Dingemanse et al. 2010; Westneat et al. 2011) and compared across studies (Mathot et al. 2012), while statistically defining behavioural syndromes as nonzero between-individual correlations clarifies the types of study designs necessary for their study (Dingemanse, Dochtermann & Nakagawa 2012). We further hope that the application of these statistical models will enable researchers to develop biological hypotheses not previously considered in their study organisms, for example for why phenotypic correlations might vary within vs. between their individual subjects. Finally, above all we hope that this paper helps researchers to construct appropriate statistical models to test ecological hypotheses about how variance components are shaped by natural and sexual selection.

Acknowledgements

We thank Dan Nussey for stimulating us to write this article; Jon Brommer, Wolfgang Forstmeier, Denis Réale, Dave Westneat and Jon Wright for inspiring discussions on mixed-effect modelling; Tim Coulson, Jarrod Hadfield, Martijn van de Pol, Dave Westneat and two anonymous reviewers for constructive editorial/reviewer comments; and Yimen Araya-Ajoy, Cynthia Downs and Shinichi Nakagawa for commenting on the manuscript. Dave Westneat kindly provided the SAS-code for the section ‘Do it yourself’ (Text S17, Supporting information). N.J.D. was supported by the Max Planck Society (MPG).

References Adriaenssens, B. & Johnsson, J.I. (2012) Natural selection, plasticity and the emergence of a behavioural syndrome in the wild. Ecology Letters, in press, doi: 10.1111/ele.12011. Austin, C.C. & Shaffer, H.B. (1992) Short-term, medium-term, and longterm repeatability of locomotor performance in the tiger salamander Ambystoma californiense. Functional Ecology, 6, 145–153.

Bartlett, J.W., De Stavola, B.L. & Frost, C. (2009) Linear mixed models for replication data to efficiently allow for covariate measurement error. Statistics in Medicine, 28, 3158–3178.
Bell, A.M., Hankison, S.J. & Laskowski, K.L. (2009) The repeatability of behaviour: a meta-analysis. Animal Behaviour, 77, 771–783.
Bennington, C.C. & Thayne, W.V. (1994) Use and misuse of mixed-model analysis of variance in ecological studies. Ecology, 75, 717–722.
Benton, T.G. (2012) Individual variation and population dynamics: lessons from a simple system. Proceedings of the Royal Society of London Series B, 367, 200–210.
Bergmüller, R. & Taborsky, M. (2010) Animal personality and social niche specialisation. Trends in Ecology & Evolution, 25, 504–511.
Bolker, B.M., Brooks, M.E., Clark, C.J., Geange, S.W., Poulsen, J.R., Stevens, M.H.H. & White, J.S.S. (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution, 24, 127–135.
Bolnick, D.I., Amarasekare, P., Araujo, M.S., Burger, R., Levine, J.M., Novak, M., Rudolf, V.H.W., Schreiber, S.J., Urban, M.C. & Vasseur, D.A. (2011) Why intraspecific trait variation matters in community ecology. Trends in Ecology & Evolution, 26, 183–192.
Browne, W.J., McCleery, R.H., Sheldon, B.C. & Pettifor, R.A. (2007) Using cross-classified multivariate mixed response models with application to life history traits in great tits (Parus major). Statistical Modelling, 7, 217–238.
Catry, P., Ruxton, G.D., Ratcliffe, N., Hamer, K.C. & Furness, R.W. (1999) Short-lived repeatabilities in long-lived great skuas: implications for the study of individual quality. Oikos, 84, 473–479.
van de Crommenacker, J., Komdeur, J., Burke, T. & Richardson, D.S. (2011) Spatio-temporal variation in territory quality and oxidative status: a natural experiment in the Seychelles warbler (Acrocephalus sechellensis). Journal of Animal Ecology, 80, 668–680.
Dingemanse, N.J., Dochtermann, N.A. & Nakagawa, S. (2012) Defining behavioural syndromes and the role of “syndrome deviation” to study its evolution. Behavioral Ecology and Sociobiology, 66, 1543–1548.
Dingemanse, N.J. & Wolf, M. (2010) A review of recent models for adaptive personality differences. Philosophical Transactions of the Royal Society of London Series B, 365, 3947–3958.
Dingemanse, N.J., Kazem, A.J.N., Réale, D. & Wright, J. (2010) Behavioural reaction norms: where animal personality meets individual plasticity. Trends in Ecology & Evolution, 25, 81–89.
Dingemanse, N.J., Bouwman, K.M., van de Pol, M., van Overveld, T., Patrick, S.C., Matthysen, E. & Quinn, J.L. (2012) Variation in personality and behavioural plasticity across four populations of the great tit Parus major. Journal of Animal Ecology, 81, 116–126.
Dochtermann, N.A. (2010) Behavioral syndromes: carry-over effects, false discovery rates and a priori hypotheses. Behavioral Ecology, 21, 437–439.
Dochtermann, N.A. & Jenkins, S.H. (2011) Developing and evaluating multiple hypotheses in behavioral ecology. Behavioral Ecology and Sociobiology, 65, 37–45.
Dochtermann, N.A., Jenkins, S.H., Swartz, M.J. & Hargett, A.C. (2012) The roles of competition and environmental heterogeneity in the maintenance of behavioral variation and covariation. Ecology, 93, 1330–1339.
Dohm, M.R. (2002) Repeatability estimates do not always set an upper limit to heritability. Functional Ecology, 16, 273–280.
Eisenhart, C. (1947) The assumptions underlying the analysis of variance. Biometrics, 3, 1–21.
Endler, J.A. (1986) Natural Selection in the Wild. Princeton University Press, Princeton, NJ.
Engqvist, L. (2005) The mistreatment of covariate interaction terms in linear model analyses of behavioural and evolutionary ecology studies. Animal Behaviour, 70, 967–971.
Falconer, D.S. & Mackay, T.F.C. (1996) Introduction to Quantitative Genetics. Longman, New York.
Fogarty, S., Cote, J. & Sih, A. (2011) Social personality polymorphism and the spread of invasive species: a model. American Naturalist, 177, 273–287.
Forstmeier, W. & Schielzeth, H. (2011) Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner’s curse. Behavioral Ecology and Sociobiology, 65, 47–55.
Frankenhuis, W.E. & Panchanathan, K. (2011) Balancing sampling and specialization: an adaptationist model of incremental development. Proceedings of the Royal Society of London Series B, 278, 3558–3565.
Garamszegi, L.Z., Calhim, S., Dochtermann, N., Hegyi, G., Hurd, P.L., Jorgensen, C., Kutsukake, N., Lajeunesse, M.J., Pollard, K.A., Schielzeth, H., Symonds, M.R.E. & Nakagawa, S. (2009) Changing philosophies and tools for statistical inferences in behavioral ecology. Behavioral Ecology, 20, 1363–1375.
Gelman, A. & Hill, J. (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York.
Goldstein, H. (1995) Multilevel Statistical Models. Arnold, London.
Hadfield, J.D. (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. Journal of Statistical Software, 33, 1–22.
Hadfield, J.D., Wilson, A.J., Garant, D., Sheldon, B.C. & Kruuk, L.E.B. (2010) The misuse of BLUP in ecology and evolution. American Naturalist, 175, 116–125.
Henderson, C.R. (1982) Analysis of covariance in the mixed model – higher-level, non-homogeneous, and random regressions. Biometrics, 38, 623–640.
Houston, A.I. (2010) Models of metabolism and personality. Philosophical Transactions of the Royal Society of London Series B, 365, 3969–3975.
Hox, J.J. (2010) Multilevel Analysis: Techniques and Applications. Routledge, New York.
Jenkins, S.H. (2011) Sex differences in repeatability of food-hoarding behaviour of kangaroo rats. Animal Behaviour, 81, 1155–1162.
Lessells, C.M. & Boag, P.T. (1987) Unrepeatable repeatabilities: a common mistake. Auk, 104, 116–121.
Lynch, M. & Walsh, B. (1998) Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
Maas, C.J.M. & Hox, J.J. (2004) The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational Statistics & Data Analysis, 46, 427–440.
Martin, J.G.A. & Réale, D. (2008) Temperament, risk assessment and habituation to novelty in eastern chipmunks, Tamias striatus. Animal Behaviour, 75, 309–318.
Martin, J.G.A., Nussey, D., Wilson, A. & Réale, D. (2011) Measuring individual differences in reaction norms in field and experimental studies: a power analysis of random regression models. Methods in Ecology and Evolution, 2, 362–374.
Mathot, K.J., Wright, J., Kempenaers, B. & Dingemanse, N.J. (2012) Adaptive strategies for managing uncertainty may explain personality-related differences in behavioural plasticity. Oikos, 121, 1009–1020.
Matsuyama, Y. & Ohashi, Y. (1997) Mixed models for bivariate response repeated measures data using Gibbs sampling. Statistics in Medicine, 16, 1587–1601.
McCulloch, C.E. & Searle, S.R. (2000) Generalized, Linear, and Mixed Models. Wiley, New York.
Meyer, K. (1998) Estimating covariance functions for longitudinal data using a random regression model. Genetics Selection Evolution, 30, 221–240.
Mutzel, A., Kempenaers, B., Laucht, S., Dingemanse, N.J. & Dale, J. (2011) Circulating testosterone levels do not affect exploration in house sparrows: observational and experimental tests. Animal Behaviour, 81, 731–739.
Nakagawa, S. & Schielzeth, H. (2010) Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews, 85, 935–956.
Nussey, D.H., Wilson, A.J. & Brommer, J.E. (2007) The evolutionary ecology of individual phenotypic plasticity in wild populations. Journal of Evolutionary Biology, 20, 831–844.
Pinheiro, J.C. & Bates, D.M. (2000) Mixed-Effects Models in S and S-PLUS. Springer, New York.
van de Pol, M. (2012) Quantifying individual variation in reaction norms: how study design affects the accuracy, precision and power of random regression models. Methods in Ecology and Evolution, 3, 268–280.
van de Pol, M. & Wright, J. (2009) A simple method for distinguishing within- versus between-subject effects using mixed models. Animal Behaviour, 77, 753–758.
Réale, D., Reader, S.M., Sol, D., McDougall, P. & Dingemanse, N.J. (2007) Integrating temperament in ecology and evolutionary biology. Biological Reviews, 82, 291–318.
Reznick, D., Nunney, L. & Tessier, A. (2000) Big houses, big cars, superfleas and the costs of reproduction. Trends in Ecology & Evolution, 15, 421–425.
Schaeffer, L.R. (2004) Application of random regression models in animal breeding. Livestock Production Science, 86, 35–45.
Schafer, D.W. (1987) Covariate measurement error in generalized linear models. Biometrika, 74, 385–391.

Scherbaum, C.A. & Ferreter, J.M. (2009) Estimating statistical power and required sample sizes for organisational research using multi-level modeling. Organizational Research Methods, 12, 347–367.
Schielzeth, H. & Forstmeier, W. (2009) Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology, 20, 416–420.
Schuett, W., Tregenza, T. & Dall, S.R.X. (2010) Sexual selection and animal personality. Biological Reviews, 85, 217–246.
Sgro, C.M. & Hoffmann, A.A. (2004) Genetic correlations, tradeoffs and environmental variation. Heredity, 93, 241–248.
Sih, A., Cote, J., Evans, M., Fogarty, S. & Pruitt, J. (2012) Ecological implications of behavioural syndromes. Ecology Letters, 15, 278–289.
Simmons, J.P., Nelson, L.D. & Simonsohn, U. (2011) False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.
Snijders, T.A.B. & Bosker, R.J. (1999) Multilevel Analysis – An Introduction to Basic and Advanced Multilevel Modelling. Sage, London.
Walsh, B. & Blows, M.W. (2009) Abundant genetic variation plus strong selection = multivariate genetic constraints: a geometric view of adaptation. Annual Review of Ecology Evolution and Systematics, 40, 41–59.
Westneat, D.F., Hatch, M.I., Wetzel, D.P. & Ensminger, A.L. (2011) Individual variation in parental care reaction norms: integration of personality and plasticity. American Naturalist, 178, 652–667.
Whittingham, M.J., Stephens, P.A., Bradbury, R.B. & Freckleton, R.P. (2006) Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75, 1182–1189.
Wilson, A.J., Réale, D., Clements, M.N., Morrissey, M.M., Postma, E., Walling, C.A., Kruuk, L.E.B. & Nussey, D.H. (2010) An ecologist’s guide to the animal model. Journal of Animal Ecology, 79, 13–26.
Wilson, A.J., de Boer, M., Arnott, G. & Grimmer, A. (2011) Integrating animal personality research and animal context theory: aggressiveness in the green swordtail Xiphophorus helleri. PLoS ONE, 6, 1–13.
Wolf, M., van Doorn, G.S. & Weissing, F.J. (2008) Evolutionary emergence of responsive and unresponsive personalities. Proceedings of the National Academy of Sciences of the United States of America, 105, 15825–15830.
Wolf, M. & Weissing, F.J. (2012) Animal personalities: consequences for ecology and evolution. Trends in Ecology & Evolution, 27, 452–461.
Zuur, A.F., Ieno, E.N., Walker, N.J., Saveliev, A.A. & Smith, G.M. (2009) Mixed Effects Models and Extensions in Ecology with R. Springer, New York.

Received 6 July 2011; accepted 13 September 2012
Handling Editor: Martijn van de Pol

Glossary

Between-individual correlation: Phenotypic correlation at the between-individual level, that is, the individual average phenotypic responses of two traits are correlated; called a behavioural syndrome in the context of behaviour (Dingemanse, Dochtermann & Nakagawa 2012). [$r_{\mathrm{ind}_{0y},\mathrm{ind}_{0z}}$; eqn 7c]
Behavioural reaction norm: The function describing the relationship between the behavioural phenotype and environmental gradient within the same individual (Martin & Réale 2008). [eqn 5]
Between-individual variance: The amount of phenotypic variance attributable to differences between individuals in average phenotype. [$V_{\mathrm{ind}_{0y}}$; Table 1]
Individual repeatability: The proportion of phenotypic variance that is attributable to differences between individuals, where phenotypic variance represents the sum of the between- and within-individual variance (Falconer & Mackay 1996).
Personality: Variation among individuals in the intercept of their behavioural reaction norm (Dingemanse et al. 2010).


Pseudo-repeatability: Biased (inflated) repeatability estimate because predictor variables that influence the phenotype within individuals vary between individuals because of a biased sampling scheme; called pseudo-personality in the context of behaviour (Westneat et al. 2011).
Variance component: A random factor explaining phenotypic variance, such as individual or territory identity.
Within-individual correlation: Phenotypic correlation at the within-individual level, that is, two phenotypic attributes show correlated changes within individuals. [$r_{e_{0y},e_{0z}}$; eqn 7d]
Within-individual variance: Amount of phenotypic variance attributable to differences in phenotype among observations of the same individual. [$V_{e_{0y}}$; Table 1]
Within-subject centring: Expressing the observation of a covariate that varies both within and between individuals as a deviation from its mean value over all observations of the same individual (van de Pol & Wright 2009). [eqn S4]
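Two of the glossary definitions, individual repeatability and within-subject centring, are simple enough to illustrate numerically. The following Python sketch is only an illustration under stated assumptions: the function names, the variance values and the temperature covariate are hypothetical and not taken from the article, and the variance components are assumed to have been estimated beforehand with a mixed-effect model (the sketch does not fit one).

```python
from statistics import mean


def repeatability(v_between: float, v_within: float) -> float:
    """Between-individual variance as a proportion of total phenotypic variance.

    Both arguments are assumed to be variance components that were
    already estimated elsewhere (e.g. by a mixed-effect model).
    """
    total = v_between + v_within
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return v_between / total


def within_subject_centre(ids, x):
    """Split a covariate into a between- and a within-individual part.

    Returns two lists aligned with the input: each observation's
    individual mean, and its deviation from that mean.
    """
    groups = {}
    for ind, value in zip(ids, x):
        groups.setdefault(ind, []).append(value)
    means = {ind: mean(values) for ind, values in groups.items()}
    between = [means[ind] for ind in ids]
    within = [value - means[ind] for ind, value in zip(ids, x)]
    return between, within


# Hypothetical illustration values (not data from the article).
ids = ["a", "a", "a", "b", "b", "b"]
temperature = [10.0, 12.0, 14.0, 20.0, 21.0, 25.0]

between, within = within_subject_centre(ids, temperature)
r = repeatability(v_between=2.0, v_within=6.0)

print(between)  # individual means, repeated per observation
print(within)   # deviations from each individual's own mean
print(r)        # proportion of variance among individuals
```

In a model with both a between- and a within-individual effect of the covariate, the two returned lists would typically be entered as separate fixed-effect predictors.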

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Text S1. Error term distributions.
Text S2. Examples of questions about individual variation.
Text S3. How to include fixed effects.
Text S4. Including additional random terms.
Text S5. The effects of differences in plasticity on understanding repeatability.
Text S6. Important assumptions of random regression models.
Text S7. Comparing variance components across datasets.
Text S8. Controlling for other labile attributes.
Text S9. Fixed effects that vary within and between individuals.
Text S10. Estimating covariances between reaction norms.
Text S11. Accuracy of correlation estimates.
Text S12. Power to detect between-individual correlations.
Text S13. Accuracy of repeatability estimates.
Text S14. Power to detect repeatability.
Text S15. Causes and consequences of within- and between-individual correlations.
Text S16. Testing hypothesized covariance structures.
Text S17. Do it yourself.

Data S1. Simulated dataset no. 1 (DataS1.txt).
Data S2. Simulated dataset no. 2 (DataS2.txt).
Data S3. Simulated dataset no. 3 (DataS3.txt).

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

