doi:10.1111/j.1420-9101.2008.01529.x
REVIEW
Bayesian approaches in evolutionary quantitative genetics
R. B. O’HARA,* J. M. CANO, O. OVASKAINEN, C. TEPLITSKY,‡ & J. S. ALHO
*Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
Department of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
‡Muséum National d’Histoire Naturelle, CRBPO, Paris, France
Keywords: Bayesian analysis; evolution; hierarchical models; quantitative genetics; statistics.
Abstract
The study of evolutionary quantitative genetics has been advanced by the use of methods developed in animal and plant breeding. These methods have proved to be very useful, but they have some shortcomings when used in the study of wild populations and evolutionary questions. Problems arise from the small size of data sets typical of evolutionary studies, and the additional complexity of the questions asked by evolutionary biologists. Here, we advocate the use of Bayesian methods to overcome these and related problems. Bayesian methods naturally allow errors in parameter estimates to propagate through a model and can also be written as a graphical model, giving them an inherent flexibility. As packages for fitting Bayesian animal models are developed, we expect the application of Bayesian methods to evolutionary quantitative genetics to grow, particularly as genomic information becomes more and more associated with environmental data.
Introduction
In recent years, the use of quantitative genetics to tackle real-world evolutionary problems has seen considerable progress in several areas. On the biological side, the use of designed crosses and long-term pedigrees to infer quantitative genetic parameters has become more common (e.g. Merilä et al., 2001; Kruuk, 2004), and there have also been advances in the use of genetic markers to study quantitative genetic problems (e.g. Beraldi et al., 2007). Simultaneously, there have been advances in statistical methods, largely driven by increases in computational power, which have extended the range of analyses that can be applied to data. This has made it possible to fit more complex models, which better reflect biological reality, to the considerable amounts of data that are being generated. These developments have occurred in several fields, and for researchers investigating specific, real-world problems (such as the evolution of a focal population of a species), following these different advances can be difficult.
Correspondence: Robert B. O’Hara, Department of Mathematics and Statistics, PO Box 68 (Gustaf Hällströmin katu 2b), FI-00014 University of Helsinki, Helsinki, Finland. Tel.: +358 9 191 51479; fax: +358 9 191 51400.
Underlying many of the computational advances has been the use of the Bayesian framework, which has become increasingly popular in many areas of the life sciences (e.g. Holder & Lewis, 2003; Beaumont & Rannala, 2004; Spiegelhalter et al., 2004) as an approach for fitting large, complex models to data. These methods give the scientist great flexibility to build models that reflect the data and problems being analysed, but their increased difficulty and newness create a barrier to their widespread use. Our purpose is to give an overview of the use of Bayesian methods in evolutionary quantitative genetics, and to outline how these methods can help the field advance in the near future. Our principal argument is that emerging biological questions and data will be difficult to deal with using classical statistics, so the flexibility that Bayesian methods possess will lead to an increase in their use in evolutionary biology. Different types of data are being accumulated in evolutionary studies, from laboratory crosses, wild populations and newer data from molecular markers. These data can be used to ask how wild populations evolve, but their analysis needs new integrated approaches, such as those currently emerging in the Bayesian field. An important role is played by software that enables scientists to develop their own models, helping them to fit the models they want
© 2008 THE AUTHORS. J. EVOL. BIOL. 21 (2008) 949–957 JOURNAL COMPILATION © 2008 EUROPEAN SOCIETY FOR EVOLUTIONARY BIOLOGY
without having to code the complex algorithms themselves.
Statistical challenges in evolutionary quantitative genetics
The foundations for quantitative genetics were provided independently by Fisher (1918) and Wright (1921a, b, c) in work which also laid the groundwork for several statistical ideas. This work provided the basis for animal breeding programmes for almost 80 years (Lush, 1937; Lynch & Walsh, 1998 and references therein). Advances in statistical theory – driven by problems in animal breeding – led to the development of restricted maximum likelihood (REML) methods (Patterson & Thompson, 1971) and ultimately the animal model (Henderson, 1975). These developments in the analysis of pedigree data have only been transferred from animal breeding to natural populations and evolutionary problems relatively recently (e.g. Knott et al., 1995; Kruuk et al., 2000; Merilä et al., 2001). At the same time, Bayesian methods were being developed in animal breeding (e.g. Sorensen & Gianola, 2002; Gianola et al., 2003), but so far have rarely been used for evolutionary problems (for exceptions, see below).
At present, maximum-likelihood (ML) and REML (Shaw, 1987) methods are mainly used to estimate quantitative genetic parameters (Blasco, 2001; Kruuk, 2004; Thompson, 2008). This is because these methods can handle large, complex pedigrees with computational efficiency. Typical problems in animal breeding involve the analysis of data from millions of individuals, so computational efficiency is essential. This contrasts with the situation in evolutionary biology for two fundamental reasons. First, data sets are almost always much smaller, and hence making the fullest possible use of the data is more important than computational speed. Second, the questions to be addressed in evolutionary research are often more complex than in animal breeding: the focus is not on estimating the genetic quality of individuals, but rather on understanding the underlying causes of the observed genetic and phenotypic variability.
For example, the aim may be to determine whether temporal variation in genotypic composition of a population is due to selection, or what traits are closely linked to fitness, or why quantitative traits have diverged between populations. The problems that need to be addressed in the statistical analyses are therefore different: they revolve more around accounting for the uncertainty associated with small data sets, accounting for the effects of uncontrolled factors such as viability selection (Hadfield, 2008) and combining the different sources of information in a single inference. This suggests that a trade-off can be made, with flexibility and full accounting for uncertainty being more important than computational efficiency. The small size of data sets in evolutionary biology is an inevitable result of the practical constraints arising from
the way they are collected: either sampling in the field over long periods of time or from making experimental crosses in the laboratory. This can lead to a lack of information, which need not manifest itself simply as a low number of individuals, but also through the number of links in the pedigree structure (see references in Kruuk, 2004). Depending on the structure of the pedigree and the model used, the information available to estimate some parameters can be small even for large pedigrees. For instance, the power to estimate genetic dominance (and several other parameters of the model) depends on the degree of relatedness among individuals in a pedigree, rather than on the overall number of individuals per se. Some breeding designs (e.g. nested full-sib designs, Lynch & Walsh, 1998) deliberately confound dominance with other effects, in which case the total number of individuals is not the limiting factor.
Many of the problems associated with small amounts of data spring from difficulties in quantifying the estimation error properly. This can be especially difficult in models with a hierarchical structure (as is commonplace in evolutionary quantitative genetics), in which error can propagate from level to level. For example, a classic procedure in quantitative genetics is to use a two-step approach to estimate the additive genetic variance and the breeding values of the individuals in a pedigree (Lynch & Walsh, 1998; Kruuk, 2004): (1) use an ML/REML mixed effects approach to estimate additive genetic variance and other variance components, and (2) estimate the best linear unbiased predictions (BLUPs) of the breeding values conditioned on the REML estimates (Sorensen & Gianola, 2002). Because the predicted BLUPs are conditioned on the value of the variance components, the error in the variance is ignored when estimating their standard error.
For extensive pedigrees with a very accurately estimated variance matrix this may not be of major concern, but with smaller pedigrees typical of evolutionary biology this may lead to overconfidence in the predicted breeding values. Even worse, selection on a trait can be quantified by regression of fitness against breeding values. But, because the breeding values are predicted with error, this regression will be biased downwards (Postma, 2006); it is simply regression towards the mean.
The problems of error propagation become more acute when different kinds of information have to be integrated. For example, predicting the response to selection in a set of quantitative traits requires the combination of information about genetic covariances and selection gradients, including all of their associated estimation errors (e.g. Lynch & Walsh, 1998, pp. 180–181). As a case in point, Coulson et al. (2003) estimated selection on neonatal traits in red deer using a path analysis and a hierarchical decomposition of selection (van Tienderen, 2000). This allowed them to show how selection on neonatal traits was acting through different components
of fitness in different years (e.g. through offspring or mother survival), as these fitness components were affected differently by density and weather. The complexity of the data meant that the study used 55 different analyses per trait, with the results from one analysis being fed (as point estimates) into the next. Although this allowed the authors to carry out their analyses using standard statistical packages and methods, they were not able to let the estimation errors propagate through the analysis, so there may be undue confidence in the final result. More subtly, the estimates at the start of the process are not as precise as they could be, because the data were analysed sequentially: the analyses at the start of the process are able to inform those later, but the converse is not true – the extra information appearing later in the analysis cannot be fed back to the start of the sequence. In addition, ignoring estimation errors can lead to biased estimates of other parameters (e.g. errors-in-variables regression), so a simultaneous analysis can reduce this bias too.
Therefore, the two major statistical challenges in evolutionary quantitative genetics are to: (1) account for all the uncertainty in the data, a problem especially when sample sizes are small; and (2) avoid splitting the problem into parts, which removes the need to use a different method for each part and allows information to flow efficiently between different parts of the analysis. Here, we discuss how Bayesian inference can provide a more accurate way of estimating uncertainty, and a more flexible framework that allows the simultaneous estimation of all parameters of interest in a single model.
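The downward bias that arises when a predictor is itself estimated with error can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis from any of the studies cited: the sample size, variances and true slope are arbitrary, and the "predicted" covariate is simply the true value plus independent noise (classical measurement error), which is a simplification of how predicted breeding values behave.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20000           # assumed sample size (large, so the bias is clearly visible)
true_slope = 0.5    # assumed true slope of fitness on the covariate
var_x = 1.0         # assumed variance of the true covariate
var_err = 1.0       # assumed variance of the estimation error in the covariate

x_true = rng.normal(0.0, np.sqrt(var_x), n)
x_obs = x_true + rng.normal(0.0, np.sqrt(var_err), n)    # covariate seen with error
fitness = true_slope * x_true + rng.normal(0.0, 1.0, n)  # response depends on the true value


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_true_x = ols_slope(x_true, fitness)   # close to true_slope
slope_obs_x = ols_slope(x_obs, fitness)     # shrunk towards zero

# Under classical measurement error the expected slope is the true slope
# multiplied by the reliability var_x / (var_x + var_err).
expected_attenuated = true_slope * var_x / (var_x + var_err)

print(f"slope using true covariate:     {slope_true_x:.3f}")
print(f"slope using error-prone values: {slope_obs_x:.3f}")
print(f"expected attenuated slope:      {expected_attenuated:.3f}")
```

A simultaneous model would treat `x_true` as unknown and estimate it jointly with the slope, rather than plugging in `x_obs` as if it were exact.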
A brief introduction to Bayesian analyses
The underlying idea in Bayesian inference is that uncertainty can be represented as a probability distribution, which summarizes what values of the parameters are likely (Gelman et al., 2004). The effect of data is to change this distribution, and formally this is carried out using Bayes’ rule:
$$p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta), \tag{1}$$
where $\theta$ denotes the parameter(s), and $y$ the data. $p(\theta)$ is the prior distribution, i.e. the distribution of the parameter before the data are seen, based on our prior belief. $p(y \mid \theta)$ is the likelihood of observing the data given the parameter, and $p(\theta \mid y)$ is the posterior distribution, i.e. the distribution of the parameter after the information in the data has been included. Because the posterior distribution is a proper probability distribution, it can be manipulated in the same way as any other probability distribution. For example, parameters that are not of interest (e.g. variance components when breeding values are being estimated, or missing data when this is present) can be integrated out. The distributions of the parameters that remain (their marginal distributions) will then
include the uncertainty in the removed parameters. Inferences made about these remaining parameters and functions of them will therefore incorporate all of the sources of uncertainty, and not (for example) give undue confidence because some of the nuisance parameters are fixed at a single value. For example, in the regression of fitness against predicted breeding values mentioned above, the Bayesian approach would be to estimate the distribution of the breeding values and integrate the regression over the posterior distributions of the breeding values (modern methods of computational statistics make this much easier in practice than it sounds).
Another advance in the use of Bayesian methods comes from the development of graphical models (Fig. 1). The contribution of every variable to the posterior can be written as a product of factors, some resembling a prior distribution and others a likelihood (e.g. Lauritzen & Spiegelhalter, 1988). These can then be combined in chains, with any parameter being in the prior distribution for some parameters, and the likelihood for others. Hence, local models can be built up into a larger structure, and the graphical nature of the model makes it easier to guarantee that the end result is still a consistent probabilistic model. For example, in the animal model, the expected value of a trait (which appears in the likelihood for the data) is assumed to be a sum of the fixed effects and several normally distributed terms, such as the additive, dominance and environmental effects. These, in turn, can be written as functions of other variables, for example the additive variance or environmental covariates such as temperature. Being able to write a model as a graph is of more than just theoretical interest: it also provides a method for building complex models by putting together simple models, and can even provide a visual, intuitive interface for the biologist developing their model (Fig. 1).
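As a hedged illustration of the two ideas above, the factorization behind a graphical model and the integration over nuisance parameters can be written out for a simple univariate animal model. The notation is assumed here for illustration and does not come from the text: $y$ is the vector of phenotypes, $\beta$ the fixed effects, $a$ the breeding values, $A$ the additive relationship matrix, and $\sigma^2_a$ and $\sigma^2_e$ the additive and residual variances.

```latex
% Joint posterior as a product of local factors, one per node of the graph
\[
p(\beta, a, \sigma^2_a, \sigma^2_e \mid y) \;\propto\;
  p(y \mid \beta, a, \sigma^2_e)\;
  p(a \mid \sigma^2_a, A)\;
  p(\beta)\, p(\sigma^2_a)\, p(\sigma^2_e)
\]

% Marginal posterior of the breeding values: the variance components are
% integrated out, so their uncertainty is carried into the inference about a
\[
p(a \mid y) \;=\; \int\!\!\int
  p(a \mid \sigma^2_a, \sigma^2_e, y)\;
  p(\sigma^2_a, \sigma^2_e \mid y)\;
  d\sigma^2_a \, d\sigma^2_e
\]
```

Seen this way, a two-step procedure that conditions on point estimates of the variance components amounts to replacing $p(\sigma^2_a, \sigma^2_e \mid y)$ with a point mass at those estimates, which is why that source of uncertainty drops out.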
A major practical problem with statistical inference based on complex models is fitting them to data. This is particularly so with Bayesian models, where large, multidimensional integrals need to be evaluated to compute the posterior distribution. As these evaluations are only possible analytically in special cases, most Bayesian analyses are carried out by simulating the posterior distribution, typically by Markov chain Monte Carlo (MCMC) methods (e.g. Gilks et al., 1995). The graphical description of the model actually helps with constructing MCMC algorithms as well, as it allows the posterior to be broken down into smaller pieces, which can then be updated sequentially, either individually or in small blocks of parameters (Gilks et al., 1995). The graph (such as shown in Fig. 1) shows what other parameters are needed to update a given block of parameters. This general approach has been implemented in BUGS, a general purpose MCMC package (Thomas et al., 2006), which thus provides a flexible tool for Bayesian analyses.
One criticism of Bayesian methods surrounds their use of prior distributions. These are, ultimately, subjective
Fig. 1 A hypothetical example of a Bayesian hierarchical model represented as a graph. The data consist of mark–recapture data acquired for several generations, and the aim of the modelling exercise is the estimation of the variance components affecting survival. The model consists of three main components, which are a relatedness model, an animal model and a capture–recapture model. The relatedness model is used to infer the pedigree structure from data consisting of direct observations (which can be considered as a prior) and genetic marker data. The pedigree feeds into the animal model through coefficients of co-ancestry, which are required for the estimation of additive effects. Finally, the survival rate of an individual depends on the additive and environmental effects. Any part of the model can be modified or extended in isolation. Rectangles represent constants, ellipses random variables and arrows conditional dependencies.
and when there is no strong prior knowledge, it may be difficult to find a suitable prior distribution. Ideally, these priors should lead to inferences with statistical properties equivalent to those of frequentist estimation, e.g. the resulting 95% intervals should contain the true value 95% of the time. Recent investigations, following on from Gelman (2006), suggest that particular care should be taken when choosing an uninformative prior, but that noninformative and weakly informative priors for variance components (such as additive genetic variance) with good frequentist properties are available (O’Hara & Merilä, 2005; Van Dongen, 2006a).
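The block-wise updating described above, and the need to state priors explicitly, can be made concrete with a minimal Gibbs sampler for a one-way random-effects (family) model, a much simpler relative of the animal model. This is a sketch under stated assumptions and is not the algorithm used by BUGS or by any package cited here: the data are simulated, the design is balanced, the mean has a flat prior, and both variances have inverse-gamma(1, 1) priors chosen purely for convenience. That prior is exactly the kind of choice the text says deserves care.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Simulated data: every value below is an assumption for illustration ---
n_fam, n_per = 40, 10
true_mu, true_s2a, true_s2e = 10.0, 1.0, 2.0
fam = np.repeat(np.arange(n_fam), n_per)             # family index of each individual
a_true = rng.normal(0.0, np.sqrt(true_s2a), n_fam)   # true family effects
y = true_mu + a_true[fam] + rng.normal(0.0, np.sqrt(true_s2e), fam.size)


def inv_gamma(shape, scale):
    """Draw once from an inverse-gamma(shape, scale) distribution."""
    return 1.0 / rng.gamma(shape, 1.0 / scale)


def gibbs(y, fam, n_fam, n_iter=4000, burn=1000, prior_shape=1.0, prior_scale=1.0):
    """Gibbs sampler for y_ij = mu + a_i + e_ij with normal a_i and e_ij."""
    n = y.size
    counts = np.bincount(fam, minlength=n_fam)
    mu, s2a, s2e = y.mean(), 1.0, 1.0
    a = np.zeros(n_fam)
    draws = []
    for it in range(n_iter):
        # Block 1: overall mean, given family effects and residual variance
        mu = rng.normal(np.mean(y - a[fam]), np.sqrt(s2e / n))
        # Block 2: family effects, given the mean and both variances
        prec = counts / s2e + 1.0 / s2a
        sums = np.bincount(fam, weights=y - mu, minlength=n_fam)
        a = rng.normal((sums / s2e) / prec, np.sqrt(1.0 / prec))
        # Block 3: between-family variance, given the family effects only
        s2a = inv_gamma(prior_shape + n_fam / 2.0, prior_scale + 0.5 * np.sum(a ** 2))
        # Block 4: residual variance, given the mean and family effects
        resid = y - mu - a[fam]
        s2e = inv_gamma(prior_shape + n / 2.0, prior_scale + 0.5 * np.sum(resid ** 2))
        if it >= burn:
            draws.append((mu, s2a, s2e))
    return np.array(draws)


draws = gibbs(y, fam, n_fam)

# Any function of the parameters inherits the full posterior uncertainty:
# here, the intraclass correlation computed draw by draw.
icc = draws[:, 1] / (draws[:, 1] + draws[:, 2])
icc_mean = icc.mean()
icc_interval = np.percentile(icc, [2.5, 97.5])
print(f"posterior mean ICC: {icc_mean:.2f}, 95% interval: {icc_interval.round(2)}")
```

Each block is updated using only the quantities it is directly connected to in the model graph, which is the property that general-purpose MCMC software exploits. Rerunning the sampler with different `prior_shape` and `prior_scale` values is a simple way to check how sensitive the variance estimates are to the prior.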
The uses of Bayesian analyses in evolutionary quantitative genetics
Computational methods
Although Bayesian methods were introduced into animal breeding in the early 1980s (Gianola & Foulley, 1983; Gianola & Fernando, 1986; Blasco, 2001), computational difficulties hampered their spread in the field. The development of new computational methods (i.e. MCMC) in the late 1980s and the continued increase in computer power meant that Bayesian methods could more readily be applied to quantitative genetics. Since the early 1990s, applications have been developing at a steady pace (Wang et al., 1993, 1994; Sorensen et al., 1994), covering an impressive array of topics and methods of paramount interest for evolutionary biologists. Estimation
techniques and associated software have been developed to fit multivariate normal animal models with an arbitrary number of causal components (Jensen et al., 1994; Van Tassel & Van Vleck, 1996; Ovaskainen et al., 2008), and extended to models with binary traits (Sorensen & Gianola, 2002; Kovač & Groeneveld, 2003; Bennewitz et al., 2007). Recently, approaches using Gaussian Markov Random Fields to fit the multivariate animal model, assuming a variety of statistical distributions, have been developed with evolutionary applications in mind (Steinsland & Jensen, 2005).
One problem in expanding the use of quantitative genetic models in evolutionary biology is the lack of suitable and easily usable software with which to conduct the analyses. While software such as GIBBSF90 (Misztal et al., 2002), MTGSAM (Van Tassel & Van Vleck, 1996) and a Bayesian option in VCE (Kovač & Groeneveld, 2003) is available, it has been developed for animal breeding problems rather than evolutionary analyses. It is therefore not always easy to extend the analyses to some of the inferences required in evolutionary biology. An alternative is to implement these methods in general statistical software packages, or in any programming language. However, doing this requires specific programming expertise, so an easier solution would be to use general purpose MCMC software such as BUGS (Thomas et al., 2006). Several studies have used BUGS in evolutionary problems (see below), but these have analysed data from fixed crosses, which makes the models easier to set up and run.
Damgaard (2007) has shown how BUGS can be used to fit animal models, but this requires preprocessing of the data to transform it into a format that makes fitting the model in BUGS easier, so the method is somewhat less attractive for nonexperts. There is therefore a need for either an easy-to-use general purpose Bayesian animal model software package or a specific implementation of the animal model in BUGS.
Although not yet routine, Bayesian methods have been applied to several evolutionary problems. So far, most of the applications have used designed crosses (for an exception, see Ros et al., 2004), for which the genetic models are easier to formulate than the general pedigree model. Although there are relatively few applications (as we now review), the range of problems is broad, and demonstrates some of the flexibility of the Bayesian approach in going beyond standard problems.
Analysis of laboratory crosses
As examples of the analysis of laboratory crosses, Pakkasmaa et al. (2003) and Merilä et al. (2004) analysed data from individuals measured in different environmental conditions, to disentangle the different variance components (genetic and environmental) affecting the traits, whilst separating out the effects of the different rearing environments. One aspect of these studies worth noting is that they did not have to assume that the traits were normally distributed, and included binary traits such as survival. Previous studies have estimated genetic parameters assuming normality for traits that are not normally distributed (such as lifetime reproductive success, e.g. Kruuk et al., 2000), but the effect of this on the inferences is unclear.
Developmental instability
Bayesian methods have also been used to fit hierarchical models to investigate the quantitative genetics behind developmental instability (DI). Van Dongen (2001) and Waldmann (2004) both developed Bayesian methods to analyse DI, fluctuating asymmetry (FA) and their inheritance.
A crucial part of this (as was already recognized, e.g. Merilä & Björklund, 1995; Whitlock, 1998) is that the trait (FA) needs to be measured several times, so that any variation due to measurement error can be estimated and included in the uncertainty of the estimate of FA. On top of this, because the trait (i.e. FA) is calculated as the variance of two observations, it is intrinsically poorly estimated (Van Dongen, 2006b). Van Dongen (2007) showed that, because of this, large sample sizes (of the order of several thousands of individuals) are needed to obtain reasonable estimates of heritability of FA. Van Dongen & Talloen (2007) looked at FA in several traits, and found that averaging over multiple traits can give a better estimate of FA, although the biological interpretation of this consensus FA may be unclear. Their
extension of the model to multiple traits allowed them to estimate the genetic and environmental correlations, although the confidence intervals were extremely wide and suggested that almost any genetic correlation was possible. Again, this implies that more data would be needed.
Modelling mean and variance
Ros et al. (2004) used Bayesian methods to fit a double hierarchical linear model (Lee & Nelder, 2006) to look at the genetic control of adult weight in the snail Helix aspersa, examining both the value of the trait and the phenotypic plasticity (i.e. the variance in the trait). They did this by modelling the genetic contributions to both the mean and variance of the trait, fitting a model where both the mean and the (log of the) variance in weight were considered as correlated traits. These were found to be strongly correlated, with higher breeding values for the mean weight being associated with more variation in weight. The flexibility of the Bayesian approach made it easier to fit the model to the data, despite the complex structure.
Estimation of QST
Bayesian methods have also been used to estimate quantitative genetic divergence between populations. From a quantitative genetic perspective, this can be measured with a statistic called QST (e.g. Spitze, 1993):
$$Q_{ST} = \frac{V_P}{V_P + 2V_A}, \tag{2}$$
where $V_P$ is the additive genetic variation among populations and $V_A$ is the additive genetic variance within populations. This statistic is typically used in studies where molecular markers are used to estimate FST, and breeding designs to estimate different quantitative genetic components to assess variation between populations through QST. Under neutrality, QST should equal FST (e.g. Merilä & Crnokrak, 2001). Several studies have used a Bayesian approach to estimate QST (e.g. Palo et al., 2003; Cano et al., 2004; Waldmann et al., 2005; Evanno et al., 2006; Knopp et al., 2007). The estimates of QST have tended to have wide confidence intervals (e.g. Evanno et al., 2006), a reflection of the uncertainty in the data (O’Hara & Merilä, 2005). Palo et al. (2003) went further than a simple QST–FST comparison, by looking at correlations between pairwise estimates of QST, FST and geographical distance. They concluded that the non-neutral phenotypic variation along a cline was explained by latitude, rather than by genetic distance. The Bayesian approach meant that the full posterior distribution of the correlations could be calculated, taking into account the nonindependence between the estimates. Failing to account for nonindependence would have led to confidence intervals that were too narrow, and hence overconfidence in the
estimates. This would have led to the conclusion that some of the contradictory patterns seen were real and in need of explanation, rather than being the result of statistical noise.
QTL analysis
The earliest use of Bayesian methods in evolutionary quantitative genetics was in QTL analysis. Hurme et al. (2000) examined data where only maternal and offspring marker information was available. They therefore took advantage of the flexibility of Bayesian methods to deal with the missing marker data, by treating the missing data as extra parameters to be estimated, and then integrating the parameters out. Their cross was of trees from southern and northern Finland, which differ in the length of their growth season (this being shorter in northern Finland), and hence in traits such as time of budset and frost hardiness. Hurme et al. were able to show the presence of four QTLs for budset, and seven for frost hardiness, i.e. 11 loci which have a noticeable effect on phenotypes that are subject to selection, and which may therefore have been involved in adaptation to the local environments.
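Returning to the QST statistic of eqn 2, the way a posterior distribution for a derived quantity is obtained can be sketched in a few lines: the statistic is simply evaluated on every posterior draw of the variance components. The code below is illustrative only. The "posterior draws" are invented gamma-distributed numbers standing in for saved MCMC output, they are drawn independently although real draws of the two variances would come jointly from one fitted model, and the FST value is a hypothetical constant whose own uncertainty is ignored.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-ins for MCMC output (invented values, not results from any study):
# in a real analysis these would be the saved joint posterior draws.
n_draws = 5000
v_between = rng.gamma(shape=2.0, scale=0.5, size=n_draws)   # draws of V_P
v_within = rng.gamma(shape=8.0, scale=0.25, size=n_draws)   # draws of V_A


def qst(v_p, v_a):
    """Q_ST = V_P / (V_P + 2 V_A), applied element-wise to posterior draws."""
    return v_p / (v_p + 2.0 * v_a)


qst_draws = qst(v_between, v_within)
qst_median = np.median(qst_draws)
qst_interval = np.percentile(qst_draws, [2.5, 97.5])

# Hypothetical neutral-marker F_ST, treated as known for this sketch
fst_assumed = 0.1
prob_qst_exceeds_fst = np.mean(qst_draws > fst_assumed)

print(f"Q_ST median: {qst_median:.2f}, 95% interval: {qst_interval.round(2)}")
print(f"posterior probability that Q_ST > F_ST: {prob_qst_exceeds_fst:.2f}")
```

Because the ratio is computed draw by draw, the resulting interval reflects the uncertainty in both variance components at once, which is one reason such intervals tend to be wide when the number of populations is small.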
Where is the field going?
Evolutionary quantitative genetics is asking more and more complex questions. This entails combining data from several sources. This can be seen, for example, in recent work combining genetic and environmental data to show that environmental quality not only affects selection patterns (e.g. Milner et al., 1999; Coulson et al., 2003), but also levels of heritability in wild populations (Charmantier & Garant, 2005; Wilson et al., 2006). Long-term data provide rich material to link ecological and evolutionary dynamics, and Bayesian techniques are ideal for combining these. Developments in molecular ecology and genomics are also feeding into evolutionary ecology, and will undoubtedly become increasingly important for evolutionary quantitative genetics in the near future. Here, we outline some of the areas of work where Bayesian methods may help to integrate these different types of data.
Pedigrees are needed to estimate quantitative genetic parameters. Collecting them through behavioural observation is time consuming. An alternative approach would be to use markers to reconstruct the pedigree, and feed that into the quantitative genetic estimation. The Bayesian approach would then allow the uncertainties in the pedigree to be integrated into the whole analysis, as well as letting the phenotypic data inform the pedigree reconstruction. Bayesian methods for pedigree reconstruction have already been developed. Hadfield et al. (2006) were able to combine the genetic information (from microsatellite data) with behavioural (mother dominance status) and spatial (distance between male
and offspring territories) data to estimate parentage in a Seychelles warbler population, so nongenetic data can also be integrated into the estimation. Gasbarra et al. (2007) have developed a Bayesian method to reconstruct deeper pedigrees from genetic data, so partially observed pedigrees could be completed, and fed into an animal model, again accounting for the uncertainty in the pedigree estimation (Falconer & MacKay, 1996).
Because the space of possible pedigrees is huge, exploring that space can be difficult and computationally costly. A simpler approach is to estimate the relatedness matrix directly from the marker data, without going through the pedigree estimation step, and to regress the phenotypic difference in the trait of interest against pairwise relatedness (e.g. Ritland, 1996). This was recently evaluated by van Kleunen & Ritland (2005), who showed that the method underestimated heritability. This is expected, as the method is based on regressing pairwise differences in trait values against relatedness. The error in the estimates of relatedness is ignored, so the estimate of the regression slope is biased downwards (e.g. see Gustafson, 2004). The Bayesian approach to the problem would allow the measurement error in the relatedness estimates to be included in the regression. This would give a more accurate estimate of the uncertainty in the estimates, as well as reducing the bias caused by ignoring the errors in estimating the relatedness. In future, the numbers of loci that are used in relatedness estimates will increase (van Kleunen & Ritland used eight allozymes), which will increase the precision of the estimates. However, a large number of loci are probably needed to obtain reasonable confidence intervals (Csilléry et al., 2006; Oliehoek et al., 2006). In principle, this approach should be less accurate than estimating the pedigree fully, but should be quicker precisely because it misses out the step of estimating the pedigree.
How much information is lost is an open question.
The rapid developments in genetics and genomics will not just bolster the methods for asking current questions, but will also allow new questions to be asked. Focus is shifting towards asking about the actual genes involved in selection. For example, Slate et al. (2002) used QTL techniques to find loci affecting birth weight in a natural population of red deer (Cervus elaphus). As genomic data accumulate and statistical methods develop, it will become possible to find the genes affecting fitness. For example, Hanski & Saccheri (2006) found that the enzyme phosphoglucose isomerase (Pgi) has substantial effects on the fitness of individuals of the Glanville fritillary butterfly, affecting both flight metabolic performance and fecundity. Biologically, the challenge will be to find the location of genes and loci affecting traits in wild populations, and to determine the exact function of the individual genes and their combinations. Doing this will require the integration of phenotypic, pedigree and marker data. Bayesian methods
for QTL analysis, including the use of pedigree information, are already available (Yi & Xu, 2000), and models that include population structure (Yu et al., 2005) have already been developed and could easily be fitted in a Bayesian framework. Hence, the Bayesian approach has great potential for integrating new molecular data into current models and analyses of ecological genetics. This will help us to describe the genetic architecture behind traits that are under selection, and to identify the genes whose frequencies are changing in response to selection.
A key practical problem is the implementation of these estimation methods. At present, many of the packages that are available for fitting quantitative genetic models have been written for specific applications in animal breeding (see above). Data from natural populations will never be as clean, and problems with data collection and confounding variables will be more severe. In principle, the flexibility of Bayesian approaches means that these problems can be tackled, but implementation can be difficult. For many Bayesian problems, the BUGS package (Thomas et al., 2006) has been very successful in providing a platform for fitting complex hierarchical models, where the difficulties in the estimation are solved by the programme itself, rather than by the analyst. A number of analyses in quantitative and evolutionary genetics have already been carried out in BUGS (e.g. Palo et al., 2003; O’Hara, 2005; Sillanpää & Bhattacharjee, 2006; Damgaard, 2007). However, further development of tools specific to quantitative genetics is required to facilitate the analysis of large pedigrees in BUGS.
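One example of the pedigree-specific computation that such tools would need to handle is the construction of the additive relationship matrix used by an animal model. The Python sketch below uses the standard tabular method and is offered only as an illustration; it is not taken from BUGS or from any of the packages cited above. The function name `relationship_matrix` and the input coding (individuals numbered from 0 with parents listed before their offspring, and -1 for an unknown parent) are assumptions made for this example.

```python
import numpy as np


def relationship_matrix(sires, dams):
    """Additive relationship matrix A computed by the tabular method.

    Assumed (hypothetical) input coding: individuals are numbered
    0..n-1 so that every parent precedes its offspring; sires[i] and
    dams[i] give the parents of individual i, with -1 meaning unknown.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        if s >= i or d >= i:
            raise ValueError("parents must be listed before their offspring")
        # Diagonal: 1 + inbreeding coefficient, i.e. half the
        # relationship between the two parents when both are known.
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        # Off-diagonal: mean of the relationships between the earlier
        # individual j and each known parent of i.
        for j in range(i):
            a_ij = 0.0
            if s >= 0:
                a_ij += 0.5 * A[j, s]
            if d >= 0:
                a_ij += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a_ij
    return A


if __name__ == "__main__":
    # Toy pedigree: 0 and 1 are founders, 2 and 3 are their full-sib
    # offspring, and 4 is the offspring of the full sibs 2 and 3.
    sires = [-1, -1, 0, 0, 2]
    dams = [-1, -1, 1, 1, 3]
    print(relationship_matrix(sires, dams))
```

The double loop scales with the square of the number of individuals, which hints at why large pedigrees call for specialised tools rather than a direct implementation of this kind.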
Conclusions
Bayesian methods have considerable potential to help in the study of natural systems, which are characterized by complex networks of interactions and hence often require complex tools to tease apart the interactions. The role of the Bayesian approach in modelling such systems will be to provide the glue that binds together the different sources of data, filling in the cracks that occur due to unavoidable problems with acquiring information, and ultimately allowing us to extract the information available in the data efficiently. This can best be achieved by drawing together techniques and methods from different areas of research – animal and plant breeding, human genetics and statistics – and integrating them to help us learn more about how evolution occurs in the wild.
Acknowledgments
We would like to thank Juha Merilä for comments on earlier versions of this manuscript, and two referees for their suggestions. Our research has been supported by the Academy of Finland (through grant 205371 to R.B.O’H., 213457 and 211173 to O.O., 1108601 to J.M.C. and 799001 to C.T.), and by the Finnish Ministry of Education to J.S.A.
References
Beaumont, M.A. & Rannala, B. 2004. The Bayesian revolution in genetics. Nat. Rev. Genet. 5: 251–261.
Bennewitz, J., Morgades, O., Preisinger, R., Thaller, G. & Kalm, E. 2007. Variance component and breeding value estimation for reproductive traits in laying hens using a Bayesian threshold model. Poultry Sci. 86: 823–828.
Beraldi, D., McRae, A.F., Gratten, J., Slate, J., Visscher, P.M. & Pemberton, J.M. 2007. Mapping quantitative trait loci underlying fitness-related traits in a free-living sheep population. Evolution 61: 1403–1416.
Blasco, A. 2001. The Bayesian controversy in animal breeding. J. Anim. Sci. 79: 2023–2046.
Cano, J.M., Laurila, A., Palo, J. & Merilä, J. 2004. Population differentiation in G matrix structure due to natural selection in Rana temporaria. Evolution 58: 2013–2020.
Charmantier, A. & Garant, D. 2005. Environmental quality and evolutionary potential: lessons from wild populations. Proc. R. Soc. Lond. B 272: 1415–1425.
Coulson, T., Kruuk, L.E.B., Tavecchia, G., Pemberton, J. & Clutton-Brock, T.H. 2003. Estimating selection on neonatal traits in red deer using elasticity path analysis. Evolution 57: 2879–2892.
Csilléry, K., Johnson, T., Beraldi, D., Clutton-Brock, T.H., Coltman, D., Hansson, B., Spong, G. & Pemberton, J. 2006. Performance of marker-based relatedness estimators in natural populations of outbred vertebrates. Genetics 173: 2091–2101.
Damgaard, L.H. 2007. How to use Winbugs to draw inferences in animal models. J. Anim. Sci. 85: 1363–1368.
Evanno, G., Castella, E. & Goudet, J. 2006. Evolutionary aspects of population structure for molecular and quantitative traits in the freshwater snail Radix balthica. J. Evol. Biol. 19: 1071–1082.
Falconer, D.S. & MacKay, T.F.C. 1996. Introduction to Quantitative Genetics, 4th edn. Longman, Essex, UK.
Fisher, R.A. 1918. The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinburgh 52: 399–433.
Gasbarra, D., Pirinen, M., Sillanpää, M.J., Salmela, E. & Arjas, E. 2007. Estimating genealogies from unlinked marker data: a Bayesian approach. Theor. Popul. Biol. 72: 305–322.
Gelman, A.J. 2006. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 1: 515–534.
Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. 2004. Bayesian Data Analysis, 2nd edn. Chapman & Hall/CRC, Boca Raton, FL.
Gianola, D. & Fernando, R.L. 1986. Bayesian methods in animal breeding theory. J. Anim. Sci. 63: 217–244.
Gianola, D. & Foulley, J.L. 1983. Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15: 201–223.
Gianola, D., Perez-Enciso, M. & Toro, M.A. 2003. On marker-assisted prediction of genetic value: beyond the ridge. Genetics 163: 347–365.
Gilks, W.R., Richardson, S. & Spiegelhalter, D.J. 1995. Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, London, UK.
Gustafson, P. 2004. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Chapman & Hall/CRC, London, UK.
Hadfield, J.D. 2008. Estimating evolutionary parameters when viability selection is operating. Proc. R. Soc. Lond. B 275: 723–734.
Hadfield, J.D., Richardson, D.S. & Burke, T. 2006. Towards unbiased parentage assignment: combining genetic, behavioural and spatial data in a Bayesian framework. Mol. Ecol. 15: 3715–3730.
Hanski, I. & Saccheri, I. 2006. Molecular-level variation affects population growth in a butterfly metapopulation. PLoS Biol. 4: 719–726.
Henderson, C.R. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423–447.
Holder, M. & Lewis, P.O. 2003. Phylogeny estimation: traditional and Bayesian approaches. Nat. Rev. Genet. 43: 275–284.
Hurme, P., Sillanpää, M.J., Arjas, E., Repo, T. & Savolainen, O. 2000. Genetic basis of climatic adaptation in Scots Pine by Bayesian quantitative trait locus analysis. Genetics 156: 1309–1322.
Jensen, J., Wang, C.S., Sorensen, D.A. & Gianola, D. 1994. Bayesian-inference on variance and covariance components for traits influenced by maternal and direct genetic-effects, using the Gibbs sampler. Acta Agric. Scand. Anim. Sci. 44: 193–201.
van Kleunen, M. & Ritland, K. 2005. Estimating heritabilities and genetic correlations with marker-based methods: an experimental test in Mimulus guttatus. J. Hered. 96: 368–375.
Knopp, T., Crochet, P.-A., Cano Arias, J.M. & Merilä, J. 2007. Contrasting levels of variation in neutral and quantitative genetic loci on Island populations of moor frogs (Rana arvalis). Conserv. Genet. 8: 45–56.
Knott, S.A., Sibly, R.M., Smith, R.H. & Møller, H. 1995. Maximum likelihood estimation of genetic parameters in life-history studies using the ‘Animal Model’. Funct. Ecol. 9: 122–126.
Kovač, M. & Groeneveld, E. 2003. VCE-5 user’s guide and reference manual version 5.1. Department of Animal Sciences, University of Ljubljana, Slovenia.
Kruuk, L.E.B. 2004. Estimating genetic parameters in natural populations using the ‘animal model’. Philos. Trans. R. Soc. Lond. B 359: 873–890.
Kruuk, L.E.B., Clutton-Brock, T.H., Slate, J., Pemberton, J.H., Brotherstone, S. & Guinness, F.E. 2000. Heritability of fitness in a wild mammal population. Proc. Natl Acad. Sci. 97: 698–703.
Lauritzen, S.L. & Spiegelhalter, D.J. 1988. Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. B 50: 157–224.
Lee, Y. & Nelder, J.A. 2006. Double hierarchical generalized linear models (with discussion). J. R. Stat. Soc. A 55: 139–185.
Lush, J.L. 1937. Animal Breeding Plans. Iowa State University Press, Ames, IA.
Lynch, M. & Walsh, B. 1998. Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
Merilä, J. & Björklund, M. 1995. Fluctuating asymmetry and measurement error. Syst. Biol. 44: 97–101.
Merilä, J. & Crnokrak, P. 2001. Comparison of marker gene and quantitative genetic differentiation among populations. J. Evol. Biol. 14: 892–903.
Merilä, J., Kruuk, L.E.B. & Sheldon, B. 2001. Cryptic evolution in a wild bird population. Nature 412: 76–79.
Merilä, J., Söderman, F., Räsänen, K., Laurila, A. & O’Hara, R. 2004. Local adaptation and genetics of acid-stress tolerance in the moor frog. Conserv. Genet. 5: 513–522.
Milner, J.M., Albon, S.D., Illius, A.W., Pemberton, J.M. & Clutton-Brock, T.H. 1999. Repeated selection of morphometric traits in the Soay sheep on St Kilda. J. Anim. Ecol. 68: 472–488.
Misztal, I., Tsuruta, S., Strabel, T., Auvray, B., Druet, T. & Lee, D.H. 2002. BLUPF90 and related programs (BGF90). Proc. 7th World Congr. Genet. Appl. Livest. Prod., Montpellier, France. Communication No. 28-07.
O’Hara, R.B. 2005. Comparing the effects of genetic drift and fluctuating selection on genotype frequency changes in the scarlet tiger moth. Proc. R. Soc. Lond. B 272: 211–217.
O’Hara, R.B. & Merilä, J. 2005. Bias and precision in QST estimates: problems and some solutions. Genetics 171: 1331–1339.
Oliehoek, P.A., Windig, J.J., van Arendonk, J.A.M. & Bijma, P. 2006. Estimating relatedness between individuals in general populations with a focus on their use in conservation programs. Genetics 173: 483–496.
Ovaskainen, O., Cano, J.M. & Merilä, J. 2008. A Bayesian framework for comparative quantitative genetics. Proc. R. Soc. Lond. B 275: 669–678.
Pakkasmaa, S., Merilä, J. & O’Hara, R.B. 2003. Genetic and maternal effect influences on viability of common frog tadpoles under different environmental conditions. Heredity 91: 117–124.
Palo, J.U., O’Hara, R.B., Laugen, A.T., Laurila, A., Primmer, C.R. & Merilä, J. 2003. Latitudinal divergence of common frog (Rana temporaria) life history traits by natural selection: evidence from a comparison of molecular and quantitative genetic data. Mol. Ecol. 12: 1963–1978.
Patterson, H.D. & Thompson, R. 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58: 545–554.
Postma, E. 2006. Implications of the difference between true and predicted breeding values for the study of natural selection and micro-evolution. J. Evol. Biol. 19: 309–320.
Ritland, K. 1996. Marker-based method for inferences about quantitative inheritance in natural populations. Evolution 50: 1062–1073.
Ros, M., Sorensen, D., Waagepetersen, R., Dupont-Nivet, M., SanCristobal, M., Bonnet, J.-C. & Mallard, J. 2004. Evidence for genetic control of adult weight plasticity in the snail Helix aspersa. Genetics 168: 2089–2097.
Shaw, R.G. 1987. Maximum-likelihood approaches applied to quantitative genetics of natural-populations. Evolution 41: 812–826.
Sillanpää, M. & Bhattacharjee, M. 2006. Association mapping of complex trait loci with context-dependent effects and unknown context variable. Genetics 174: 1597–1611.
Slate, J., Visscher, P.M., MacGregor, S., Stevens, D., Tate, M.L. & Pemberton, J.M. 2002. A genome scan for quantitative trait loci in a wild population of red deer (Cervus elaphus). Genetics 162: 1863–1873.
Sorensen, D. & Gianola, D. 2002. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer-Verlag, New York.
Sorensen, D.A., Wang, C.S., Jensen, J. & Gianola, D. 1994. Bayesian-analysis of genetic change due to selection using Gibbs sampling. Genet. Sel. Evol. 26: 333–360.
Spiegelhalter, D.J., Abrams, K. & Myles, J.P. 2004. Bayesian Approaches to Clinical Trials and Health Care Evaluation. John Wiley & Sons, Chichester, UK.
Spitze, K. 1993. Population-structure in Daphnia obtusa – quantitative genetic and allozymic variation. Genetics 135: 367–374.
Steinsland, I. & Jensen, H. 2005. Making Inference from Bayesian Animal Models Utilising Gaussian Markov Random Field Properties. Preprint, Statistics No. 10/2005, Norwegian University of Science and Technology, Trondheim, Norway. Available at: http://www.math.ntnu.no/preprint/statistics/2005/S102005.pdf.
Thomas, A., O’Hara, B., Ligges, U. & Sturtz, S. 2006. Making BUGS Open. R News 6: 12–17.
Thompson, R. 2008. Estimation of quantitative genetic parameters. Proc. R. Soc. Lond. B 275: 679–686.
van Tienderen, P.H. 2000. Elasticities and the link between demographic and evolutionary dynamics. Ecology 81: 666–679.
Van Dongen, S. 2001. Modelling developmental instability in relation to individual fitness: a fully Bayesian latent variable model approach. J. Evol. Biol. 14: 552–563.
Van Dongen, S. 2006a. Prior specification in Bayesian statistics: three cautionary tales. J. Theor. Biol. 242: 90–100.
Van Dongen, S. 2006b. Fluctuating asymmetry and developmental instability in evolutionary biology: past, present and future. J. Evol. Biol. 19: 552–563.
Van Dongen, S. 2007. What do we know about the heritability of developmental instability? Answers from a Bayesian model. Evolution 61: 1033–1042.
Van Dongen, S. & Talloen, W. 2007. Phenotypic and genetic variations and correlations in multitrait developmental instability: a multivariate Bayesian model applied to Speckled Wood butterfly (Pararge aegeria) wing measurements. Genet. Res. 89: 155–163.
Van Tassel, C.P. & Van Vleck, L.D. 1996. Multiple-trait Gibbs sampler for animal models: flexible programs for Bayesian and likelihood-based (co)variance component inference. J. Anim. Sci. 74: 2586–2597.
Waldmann, P. 2004. A quantitative genetic method for estimating developmental instability. Evolution 58: 238–244.
Waldmann, P., García-Gil, M.R. & Sillanpää, M.J. 2005. Comparing Bayesian estimates of genetic differentiation of molecular markers and quantitative traits: an application to Pinus sylvestris. Heredity 94: 623–629.
Wang, C.S., Rutledge, J.J. & Gianola, D. 1993. Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25: 41–62.
Wang, C.S., Rutledge, J.J. & Gianola, D. 1994. Bayesian-analysis of mixed linear-models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91–115.
Whitlock, M. 1998. The repeatability of fluctuating asymmetry: a revision and extension. Proc. R. Soc. Lond. B 265: 1429–1431.
Wilson, A.J., Pemberton, J.M., Pilkington, J.G., Coltman, D.W., Mifsud, D.V., Clutton-Brock, T.H. & Kruuk, L.E.B. 2006. Environmental coupling of selection and heritability limits evolution. PLoS Biol. 4: e216.
Wright, S. 1921a. Systems of mating. I. The biometric relations between parents and offspring. Genetics 6: 111–123.
Wright, S. 1921b. Systems of mating. II. The effects of inbreeding on the genetic composition of a population. Genetics 6: 124–143.
Wright, S. 1921c. Systems of mating. III. Assortative mating based on somatic resemblance. Genetics 6: 144–161.
Yi, N. & Xu, S. 2000. Bayesian mapping of quantitative trait loci under the identity-by-descent-based variance component model. Genetics 156: 411–422.
Yu, J., Pressoir, G., Briggs, W.H., Bi, I.V., Yamasaki, M., Doebley, J.F., McMullen, M.D., Gaut, B.S., Nielsen, D.M., Holland, J.B., Kresovich, S. & Buckler, E.S. 2005. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38: 203–208.
Received 4 December 2007; revised 26 February 2008; accepted 28 February 2008
© 2008 THE AUTHORS. J. EVOL. BIOL. 21 (2008) 949–957 JOURNAL COMPILATION © 2008 EUROPEAN SOCIETY FOR EVOLUTIONARY BIOLOGY