Methods in Ecology and Evolution 2011, 2, 155–162
doi: 10.1111/j.2041-210X.2010.00063.x
A unified approach to model selection using the likelihood ratio test
Fraser Lewis1*, Adam Butler2 and Lucy Gilbert3
1Scottish Agricultural College, King’s Buildings, West Mains Road, Edinburgh EH9 3JG, UK; 2Biomathematics and Statistics Scotland, The King’s Buildings, Edinburgh EH9 3JZ, Scotland, UK; and 3Macaulay Institute, Craigiebuckler, Aberdeen AB15 8QH, UK
Summary
1. Ecological count data typically exhibit complexities such as overdispersion and zero-inflation, and are often weakly associated with a relatively large number of correlated covariates. The use of an appropriate statistical model for inference is therefore essential. A common selection criterion for choosing between nested models is the likelihood ratio test (LRT). Widely used alternatives to the LRT are based on information-theoretic metrics such as the Akaike Information Criterion.
2. It is widely believed that the LRT can only be used to compare the performance of nested models – i.e. in situations where one model is a special case of another. There are many situations in which it is important to compare non-nested models, so, if true, this would be a substantial drawback of using LRTs for model comparison. In fact, the LRT can be used to compare both nested and non-nested models. This is well established in the statistical literature, but not widely used in ecological studies.
3. The main obstacle to the use of the LRT with non-nested models has, until relatively recently, been the difficulty of writing down an explicit formula for the distribution of the LRT statistic under the null hypothesis that one of the models is true. With modern computing power it is possible to overcome this difficulty by using a simulation-based approach.
4. To demonstrate the practical application of the LRT to both nested and non-nested model comparisons, a case study involving data on questing tick (Ixodes ricinus) abundance is presented. These data contain complexities typical in ecological analyses, such as zero-inflation and overdispersion, for which comparison between models of differing structure – e.g. non-nested models – is of particular importance.
5. Choosing between competing statistical models is an essential part of any applied ecological analysis. The LRT is a standard statistical test for comparing nested models. By use of simulation the LRT can also be used in an analogous fashion to compare non-nested models, thereby providing a unified approach for model comparison within the null hypothesis testing paradigm. A simple practical guide is provided on how to apply this approach to the key models required in the analysis of count data.
Key-words: information theoretic metrics, likelihood ratio test, model selection, non-nested models
Introduction
Choosing an appropriate statistical model is complex. It is also crucially important because inferences about the ecological system under study will be based on the statistical model(s) identified as being well supported by the data. Broadly speaking, quantitative approaches to model comparison can be divided into two main paradigms: information-theoretic
*Correspondence author. E-mail:
[email protected] Correspondence site: http://www.respond2articles.com/MEE/
approaches (ITMC) and more traditional approaches based upon null hypothesis testing (NHT). The key difference between the two approaches is that ITMC approaches compare sets of models in a symmetric fashion (in the sense that all models have, a priori, an equivalent status), whereas NHT approaches compare pairs of models and do so in an inherently asymmetric way (they assume that the null model is valid unless the data show that there is significant evidence to reject this in favour of the alternative model). There is ongoing debate within the ecological literature about the relative merits and drawbacks of the ITMC and NHT approaches. In recent years there has been a shift away
2010 The Authors. Methods in Ecology and Evolution 2010 British Ecological Society
from the use of NHT approaches within ecological research, and towards the use of ITMC approaches. This paper does not attempt to revisit or review the debate about the relative philosophical merits of the ITMC and NHT approaches, or to establish the precise circumstances in which each approach should be used. Instead, it demonstrates that one of the key perceived practical limitations of the NHT approach [the restriction of the likelihood ratio test (LRT) to comparisons of nested models] can be overcome in a relatively straightforward way. It is contended that this is of substantial practical relevance to ecologists and evolutionary biologists, since it is widely accepted – even amongst proponents of the ITMC approach – that there are at least some circumstances in which the NHT approach provides an appropriate basis for model comparison. Burnham & Anderson (2002) and others (e.g. Posada & Buckley 2004) strongly advocate ITMC as a general framework for model comparison and model selection within observational studies (which are ubiquitous in ecology), but acknowledge that NHT methods can play a legitimate role in the analysis of designed experiments. Stephens et al. (2005) contend that NHT methods also have a valuable role to play in the analysis of observational studies that are concerned primarily with determining the effects of a single variable. Extending NHT to non-nested situations naturally leads, for example, to the consideration of comparisons between models which may differ by only a single parameter but which are structurally much more complex, and therefore give rise to an arguably asymmetric comparison – testing for the existence of zero-inflation within the context of ecological count data is one such important practical example. The key criticisms of the NHT approach (by Burnham & Anderson 2002; Stephens et al. 
2005, and others) relate to their use as a basis for selecting the ‘best’ model from amongst a candidate set of competing models, especially when this set is large and stepwise selection is required.
Likelihood ratio tests provide an established and widely used basis for model selection within the NHT framework (Neyman & Pearson 1928a,b). LRTs are generally used to compare two nested models – i.e. in situations where one of the models is a special case of the other – with the null hypothesis that the data are drawn from the simpler of the two models. It is often assumed that LRTs can only be used to compare nested models. If true, this would be a substantial disadvantage of the LRT approach, since applied research often requires the comparison of models which are non-nested (i.e. neither model is a special case of the other). Neither the negative Binomial (NB) nor the negative Binomial with zero inflation (NBZIF) is, for example, nested within the negative Binomial with hurdle (NBH), so a comparison of the performance of these three models requires the comparison of non-nested models.
The LRT can, in fact, be applied to both nested and non-nested pairs of models. Cox (1962) proposed a modification of the LRT that can be applied to non-nested model comparisons. The use of the LRT in this way, however, has to date been extremely rare in practice, at least within the ecological and biological literature, perhaps because the modification required to the usual likelihood ratio is problem-specific and
potentially mathematically challenging. Williams (1970) proposed an alternative approach based on the use of Monte Carlo simulation, which is very straightforward to implement. The simulation-based approach is relatively computationally intensive, but modern computing power means that this should no longer be a substantive limitation. The LRT can therefore be considered as a single unified test for performing both nested and non-nested model comparisons. Tests other than the LRT are also available for comparing non-nested models within an NHT framework (e.g. see Vuong 1989; Shapiro 2009), and one of these, the Vuong Test, is considered later in the analyses.
The primary goal of this article is to demonstrate that the use of the LRT to compare both nested and non-nested models is straightforward, and provides a unified approach to model comparison within the paradigm of NHT in applied ecology and biology. In the Materials and methods section, the use of the LRT for comparing nested models is first reviewed, before illustrating how it can be extended, via simulation, to non-nested comparisons. This methodology is then illustrated using simulated count data, and using real count data on tick abundance (including a comparison with ITMC approaches). All relevant computer code is provided as part of Appendix S1. This is followed by a brief discussion of the practical issues involved in using the LRT to compare non-nested models, and an outline of possible avenues for future work.
Materials and methods
MODELLING AND INFERENCE
This paper is concerned with model comparison – with comparing the relative performance of two or more models describing the empirical properties of an observed data set. The word ‘model’ is used to refer to statistical models, such as linear regression models, generalized linear models, and zero-inflated Poisson regression models, which contain unknown parameters whose values can be estimated from data. Purely deterministic models are excluded from this framework. In richly parameterised models, such as those containing random effects, estimating the number of degrees of freedom (or alternatively the number of ‘effective’ parameters) is neither straightforward nor necessarily unambiguous (Hodges & Sargent 2001). Hence, attention here is restricted to models that contain fixed effects only. Two models are said to be ‘nested’ if one of the models constitutes a special case of the other model. A linear regression model which contains ‘temperature’ as a covariate, is, for example, nested within an otherwise identical model that includes both ‘temperature’ and ‘rainfall’ as covariates, because the former model can be obtained by fixing the coefficient associated with ‘rainfall’ in the latter model to be zero. A model that contains ‘temperature’ as a covariate is not, however, nested within a model that just contains ‘rainfall’ as a covariate. Each model is assumed to contain a set of unknown parameters, and the values of these parameters are assumed to have been estimated from an observed data set of size n using maximum likelihood. Let Li represent the likelihood value associated with the maximum likelihood estimate for the parameters of model i and let pi denote the number of unknown parameters within model i. When illustrating ideas the focus will be upon ecological count data, and will consider six possible distributions for the response
variable: Poisson, Poisson with zero inflation (PZIF), Poisson with hurdle (PH), NB, NBZIF, and NBH. These models are all designed for unrestricted count data: see Martin et al. (2005) and Potts & Elith (2006) for specific details of these models and of their application to ecological data (Appendix S1 contains a brief comparison between hurdle and non-hurdle models for Poisson data). Three of the models (NB, NBZIF, and NBH) explicitly allow for overdispersion, whilst four models (PH, NBH, PZIF and NBZIF) explicitly allow for zero-inflation: these are both common features of ecological count data. Comparisons between the PH model and the Poisson and PZIF models are non-nested, as are comparisons between the NBH model and the NB and NBZIF models.
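The two ways of handling excess zeros can be made concrete with a small sketch. The definitions below are the standard textbook forms of the zero-inflated and hurdle Poisson distributions, not code from Appendix S1; the function names and the parameter names `pi_zero` and `p_zero` are hypothetical, and the parameter values are chosen only for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson (PZIF): with probability pi_zero the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base

def hurdle_pmf(k, lam, p_zero):
    """Hurdle Poisson (PH): P(Y = 0) = p_zero, and positive counts follow a
    Poisson(lam) distribution truncated to exclude zero."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - poisson_pmf(0, lam))

lam = 2.0  # illustrative value only
models = [
    ("Poisson", lambda k: poisson_pmf(k, lam)),
    ("PZIF", lambda k: zip_pmf(k, lam, 0.3)),
    ("PH", lambda k: hurdle_pmf(k, lam, 0.05)),
]
for name, pmf in models:
    total = sum(pmf(k) for k in range(200))
    print(f"{name:8s} P(Y=0) = {pmf(0):.3f}   sum of pmf = {total:.6f}")
```

In this sketch the zero-inflated form can only add zeros relative to the Poisson, whereas the hurdle form sets the zero probability freely and so can also place less mass at zero.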
MODEL COMPARISON USING LIKELIHOOD RATIO TESTS
Comparing two nested models
Begin by considering the situation in which there are two models, A and B, and model A is nested within B. The performance of the two models can be compared using the likelihood ratio test statistic (LRTS),
$$Q = 2\log(L_B/L_A) = 2(\log L_B - \log L_A),$$
which is equal to twice the log of the ratio of the likelihoods. The value of LB must be larger than or equal to that of LA because model A is a special case of model B. The LRTS must, therefore, be non-negative, even if the data have actually been drawn from the simpler of the two models (model A). Whether the additional complexity involved in fitting model B rather than model A leads to a greater improvement in performance than would have been obtained by chance alone is assessed by evaluating the distribution of Q under the null hypothesis that model A is the true model (this is called the ‘reference distribution’). It is conventional to reject the null hypothesis (that the simpler model is consistent with the data), and therefore to select model B over model A, if the LRTS Q exceeds the 95% quantile of the reference distribution. If the LRTS lies below this quantile then the null hypothesis is not rejected, and model A is selected in favour of model B. For nested models the reference distribution can be shown, asymptotically (i.e. as the sample size becomes sufficiently large), to be a chi-squared distribution with pB − pA d.f., where pA and pB denote the number of parameters in model A and model B, respectively. It is standard to assume that this reference distribution also remains approximately valid for finite samples, so long as the sample size is moderately large.
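As a minimal sketch of the nested test, consider counts from two groups: model A assumes a common Poisson mean (one parameter) and model B allows a separate mean for each group (two parameters), so pB − pA = 1. The data, function names and the choice of models are hypothetical; with one degree of freedom the chi-squared tail probability can be written with the complementary error function, which avoids any dependency beyond the standard library.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of independent Poisson counts with common mean lam > 0."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in counts)

def nested_lrt(group1, group2):
    """LRT of a common Poisson mean (model A) against group-specific means (model B)."""
    pooled = group1 + group2
    loglik_a = poisson_loglik(pooled, sum(pooled) / len(pooled))
    loglik_b = (poisson_loglik(group1, sum(group1) / len(group1))
                + poisson_loglik(group2, sum(group2) / len(group2)))
    # Q = 2 (log L_B - log L_A); clip tiny negative values caused by rounding.
    q = max(2.0 * (loglik_b - loglik_a), 0.0)
    # Upper tail of a chi-squared distribution with 1 d.f.: P(X > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value

# Hypothetical counts, for illustration only.
open_moor = [0, 1, 0, 2, 1, 0, 3, 1]
woodland = [4, 2, 5, 3, 6, 2, 4, 3]
q, p = nested_lrt(open_moor, woodland)
print(f"Q = {q:.2f}, P = {p:.5f}")
```

Model B would be selected here if P falls below the chosen significance level, which is equivalent to Q exceeding the corresponding quantile of the reference distribution.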
observed ratio test statistic is, indeed, unaffected by changing from model A as the null model, to using model B as the null – only the sign is changed. The reference distribution that is obtained under the null hypothesis that model A is true will, however, differ from that which is obtained under the null hypothesis that model B is true, and may differ in potentially complicated ways.
Williams (1970) presented a strategy for addressing both of these issues. Williams argues that both model A and model B should be considered as possible null models. The observed value of the LRTS, Q, can then fall into one of four categories:
1. An LRT with A as the null model is non-significant, but an LRT with B as the null model is significant: model A is therefore preferred over model B;
2. An LRT with B as the null model is non-significant but an LRT with A as the null model is significant: model B is therefore preferred over model A;
3. Both of the LRTs are significant: neither model is therefore deemed to be appropriate; or
4. Neither of the LRTs is significant: no discrimination between the two models is possible.
Assume that model A is the null model, and that it is wished to evaluate the reference distribution of the LRTS under this null model. If it is possible to simulate from model A, then it is possible (Williams 1970) to construct an arbitrarily good approximation to the reference distribution via simulation, without needing to derive the mathematical properties of the distribution. The simulation-based approach is very straightforward, if somewhat computationally intensive, to implement:
1. Generate a large number, S, of simulated data sets from model A (with the parameters of the model taken to be equal to the maximum likelihood estimates obtained from fitting model A to the observed data set);
2. Fit models A and B to each of these simulated data sets, and, in each case, use the fitted models to calculate the value of the LRTS, Q(k), for k = 1,…, S;
3. Compare the observed value of the LRTS, Q, against the simulated values, Q(1),…, Q(S).
If Q is extreme relative to the simulated values then the null hypothesis (that the data have been drawn from model A) is rejected. The P-value for this LRT will be approximately equal to the proportion of simulated test statistics (Q(1),…, Q(S)) that are larger in magnitude than the observed test statistic (Q). The quality of this approximation will improve as the number of simulations, S, increases. An equivalent approach can be used under the null hypothesis that the data have been generated from model B.
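The three simulation steps and the four-way decision rule can be sketched as follows. This is an illustration under stated assumptions, not the code from Appendix S1: model A is taken to be a Poisson distribution and model B a geometric distribution (a non-nested pair with closed-form maximum likelihood estimates), all function names are hypothetical, and the P-value is computed one-sidedly, with the statistic signed so that large positive values count against whichever model is the null.

```python
import math
import random

def fit_poisson(y):
    """Model A: return (MLE of the mean, maximised log-likelihood)."""
    lam = sum(y) / len(y)
    if lam == 0.0:  # all-zero sample: the likelihood equals 1
        return lam, 0.0
    return lam, sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in y)

def fit_geometric(y):
    """Model B, with P(Y = k) = p (1 - p)^k: return (MLE of p, log-likelihood)."""
    p = 1.0 / (1.0 + sum(y) / len(y))
    if p == 1.0:
        return p, 0.0
    return p, sum(math.log(p) + k * math.log(1.0 - p) for k in y)

def draw_poisson(lam, rng):
    """Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw_geometric(p, rng):
    """Inversion sampling: number of failures before the first success."""
    if p >= 1.0:
        return 0
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p))

def lrts(y, null):
    """Q = 2 (log L_alternative - log L_null), where null is 'A' or 'B'."""
    ll_a, ll_b = fit_poisson(y)[1], fit_geometric(y)[1]
    return 2.0 * (ll_b - ll_a) if null == "A" else 2.0 * (ll_a - ll_b)

def simulated_p_value(y, null, n_sim=500, seed=1):
    rng = random.Random(seed)
    q_obs = lrts(y, null)
    if null == "A":
        lam = fit_poisson(y)[0]
        draw = lambda: draw_poisson(lam, rng)
    else:
        p = fit_geometric(y)[0]
        draw = lambda: draw_geometric(p, rng)
    # Steps 1 and 2: simulate from the fitted null model, refit both models.
    q_sim = [lrts([draw() for _ in y], null) for _ in range(n_sim)]
    # Step 3: proportion of simulated statistics at least as large as observed.
    return sum(q >= q_obs for q in q_sim) / n_sim

def williams_decision(p_null_a, p_null_b, alpha=0.05):
    reject_a, reject_b = p_null_a < alpha, p_null_b < alpha
    if reject_a and reject_b:
        return "neither model is appropriate"
    if reject_a:
        return "prefer model B"
    if reject_b:
        return "prefer model A"
    return "no discrimination possible"

rng = random.Random(42)
counts = [draw_geometric(0.25, rng) for _ in range(200)]  # synthetic data
p_a = simulated_p_value(counts, "A")
p_b = simulated_p_value(counts, "B")
print(p_a, p_b, williams_decision(p_a, p_b))
```

Running the procedure twice, once with each model as the null, yields the pair of P-values needed for the four categories above; increasing `n_sim` refines the approximation at the cost of more model fits.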
ALTERNATIVE APPROACHES TO MODEL COMPARISON
Comparing two non-nested models
Now, consider the situation in which models A and B are non-nested. The LRTS, Q, still quantifies the relative performance of these models, but it is now possible for this value to be either positive or negative (because model A is no longer a special case of model B). There are two complications in extending the LRT to this situation:
1. The asymptotic distribution of the LRTS under the null hypothesis will not, in general, be chi-squared, and can be difficult to evaluate mathematically; and
2. It is not clear whether model A or model B should be treated as the ‘null’ model when performing the hypothesis test.
The second of these complications requires some comment, because it is not immediately obvious that the results of the LRT will depend upon which model is used as the null. The magnitude of the
Vuong’s non-nested hypothesis test (Vuong 1989) provides an alternative to the standard LRT, and is based on the Kullback–Leibler Information Criterion. Vuong shows that a scaled version of the LRTS
$$Q_v = \frac{Q - h}{2\sqrt{V(Q)\,n}}$$
converges, asymptotically (as the sample size, n, tends to infinity) to a standard normal distribution, N(0,1). V(Q) denotes the variance of the LRTS, whilst h denotes a bias correction term. The simplest version of the Vuong Test includes no adjustment for bias (h = 0), but an adjusted version in which h = (pB – pA) log n is commonly used. It is assumed that the normal distribution should, therefore, also provide a valid reference distribution for finite but reasonably large
$$\mathrm{AIC}_i = -2\log L_i + 2p_i.$$
Models with low AIC values will have a higher level of empirical support than models with high AIC values. The model with the lowest AIC is generally taken to be the best supported model, and models with AIC values that are only slightly larger than this (often models with a difference in AIC values of less than two, although there are certain circumstances in which this rule-of-thumb can be highly inappropriate; Burnham & Anderson 2002) are also taken to have a good level of empirical support. An interesting aside is that the 2p term in AIC is in fact a correction for bias rather than an explicit penalty designed to favour more parsimonious models, arguably contrary to popular belief (see Pan 1999 for details). The Bayesian Information Criterion (BIC),
$$\mathrm{BIC}_i = -2\log L_i + p_i\log n,$$
utilizes a more conservative penalty than AIC (unless the sample size n is extremely small), and will therefore tend to select simpler models that contain fewer parameters.
Note that there are close connections between AIC, BIC, the LRTS and the Vuong Test. The difference in AIC between two models i and j is equal to Q + 2(pi – pj), where Q = 2 log(Lj/Li) is the LRTS for comparing models i and j. The difference in BIC between these models is equal to Q + (pi – pj) log n, which is equal to the (un-normalized) adjusted test statistic for the Vuong Test. Figure 1 illustrates the different values of Q that are required, using AIC, BIC and a standard LRT, in order to prefer the more complicated model over the simpler model in a comparison of two nested models.
The LRT, Vuong Test, AIC and BIC are all designed to compare the performance of models that have been fitted to data via maximum likelihood estimation. Bayesian methods offer an alternative framework for statistical modelling and inference, and are rapidly being adopted in ecology (see review by Ellison 2004) and evolutionary biology (e.g. Ronquist & Huelsenbeck 2003 and Drummond & Rambaut 2007; which have very high citation rates). 
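Returning briefly to the maximum likelihood criteria, the connections between AIC, BIC, the LRTS and the Vuong statistic can be checked numerically. The sketch below assumes that each model supplies a log-likelihood contribution for every observation, and it interprets V(Q) as the sample variance of the per-observation log-likelihood differences (an assumption, following the usual presentation of the Vuong Test); the function name and the example models (Poisson for A, geometric for B) are hypothetical. With Q defined as 2(log L_B − log L_A), positive values of the statistic indicate a better fit for model B.

```python
import math

def compare(ll_a, ll_b, p_a, p_b):
    """Summarise two fitted models from their per-observation log-likelihoods.

    ll_a, ll_b: lists of log-likelihood contributions, one entry per observation.
    p_a, p_b:   numbers of estimated parameters in models A and B.
    """
    n = len(ll_a)
    log_l_a, log_l_b = sum(ll_a), sum(ll_b)
    q = 2.0 * (log_l_b - log_l_a)                      # LRTS
    diffs = [b - a for a, b in zip(ll_a, ll_b)]        # pointwise log ratios
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / n
    h = (p_b - p_a) * math.log(n)                      # bias adjustment
    return {
        "q": q,
        "h": h,
        "aic_a": -2.0 * log_l_a + 2.0 * p_a,
        "aic_b": -2.0 * log_l_b + 2.0 * p_b,
        "bic_a": -2.0 * log_l_a + p_a * math.log(n),
        "bic_b": -2.0 * log_l_b + p_b * math.log(n),
        "vuong_raw": q / (2.0 * math.sqrt(var_diff * n)),
        "vuong_adjusted": (q - h) / (2.0 * math.sqrt(var_diff * n)),
    }

# Hypothetical counts; Poisson (A) and geometric (B) fits have closed-form MLEs.
y = [0, 0, 1, 3, 0, 7, 2, 0, 5, 1, 0, 12, 4, 0, 2, 9]
mean_y = sum(y) / len(y)
p_geo = 1.0 / (1.0 + mean_y)
ll_pois = [-mean_y + k * math.log(mean_y) - math.lgamma(k + 1) for k in y]
ll_geom = [math.log(p_geo) + k * math.log(1.0 - p_geo) for k in y]

result = compare(ll_pois, ll_geom, p_a=1, p_b=1)
for key, value in result.items():
    print(f"{key:15s} {value: .3f}")
```

Because both example models have a single parameter, h is zero here and the raw and adjusted statistics coincide; with unequal parameter counts the numerator of the adjusted statistic equals the difference in BIC between the two models.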
A range of different metrics for model comparison are available within the Bayesian framework. The standard approach for comparing two or more models is based on computing Bayes factors (Lewis & Raftery 1997). An alternative choice, which is extremely popular in ecology, is to use the Deviance Information Criterion (Spiegelhalter et al. 2002): this is relatively straightforward to calculate (unlike Bayes factors), but it is not without controversy (e.g. see the discussion in Spiegelhalter et al. 2002). Many authors have convincingly argued that it is preferable to use measures of model performance as a basis for simultaneously drawing inferences about models and parameter values, rather than as a basis for selecting a single model (or small set of models) that have relatively good performance. Burnham & Anderson (2002) argue that AIC provides a metric for multi-model inference within the context of
sample sizes, so the practical application of the test involves comparing Qv against the quantiles of a standard normal distribution. Because Q = 2 log(LB/LA), positive values of Qv indicate stronger support for model B: if Qv > c then model B is typically preferred over model A (with c chosen to reflect the desired level of significance), if Qv < −c then model A is preferred over model B, and if −c ≤ Qv ≤ c the performance of the models cannot be distinguished on the basis of the observed data. A key distinction between the simulation-based non-nested LRT and the Vuong Test is that the latter relies on an assumption of asymptotic normality whereas the former does not; the Vuong Test is therefore likely to be less accurate when the sample size is small. A completely different paradigm for model selection involves calculating the value of the Akaike Information Criterion (AIC) (Akaike 1973) for each model i, where
maximum likelihood estimation (Burnham & Anderson 2002), but most methodological statistical research in this area is focused upon Bayesian Model Averaging (Hoeting et al. 1999) – recent technical advances (see the review in Sisson 2005) have increased the popularity and accessibility of this relatively new and potentially technically challenging area (e.g. King & Brooks 2008).
Fig. 1. The minimum cut-off value Qcrit of the likelihood ratio test (LRT) statistic Q between two nested models that is required in order for the more complicated model to be preferred over the simpler model, based on: a standard likelihood ratio test with P = 0.05 (purple), a standard LRT with P = 0.01 (blue), model selection using Akaike Information Criterion (AIC) (green), model selection using Bayesian Information Criterion (BIC) when n = 30 (brown), model selection using BIC when n = 100 (red), model selection using BIC when n = 500 (pink) and model selection using BIC when n = 10 000 (orange). E.g. LRT (at P = 0.05) is more conservative than AIC up to a difference of seven parameters. [Plot of Qcrit (vertical axis) against number of parameters (horizontal axis).]
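The cut-off values summarised in Fig. 1 follow directly from the definitions: the more complicated model is preferred by AIC when Q exceeds 2(pB − pA), by BIC when Q exceeds (pB − pA) log n, and by the LRT when Q exceeds the relevant chi-squared quantile. The sketch below tabulates these thresholds, assuming that the horizontal axis of the figure is the difference in the number of parameters; the chi-squared quantile is obtained from a series expansion and bisection so that only the standard library is needed, and all function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-squared CDF via the series for the regularised lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_quantile(prob, df):
    """Invert the CDF by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def q_crit_table(max_diff=12, sample_sizes=(30, 100, 500, 10000)):
    """Rows of (difference, LRT 0.05, LRT 0.01, AIC, BIC for each sample size)."""
    rows = []
    for d in range(1, max_diff + 1):
        bic = [d * math.log(n) for n in sample_sizes]
        rows.append((d, chi2_quantile(0.95, d), chi2_quantile(0.99, d), 2.0 * d, *bic))
    return rows

print("diff  LRT.05  LRT.01     AIC  BIC30  BIC100  BIC500  BIC10000")
for row in q_crit_table():
    print(f"{row[0]:4d}" + "".join(f"{value:8.2f}" for value in row[1:]))
```

The table makes the crossover mentioned in the caption visible: the LRT threshold at P = 0.05 sits above the AIC threshold for small differences in the number of parameters and falls below it once the difference reaches eight.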
CASE STUDY: FACTORS AFFECTING QUESTING IXODES RICINUS ABUNDANCE
A practical example of applying the LRT to non-nested model comparisons is presented using field data from a recent study designed to identify factors correlated with the questing abundance of I. ricinus. These data provide an excellent example of the relevance of non-nested model comparison when dealing with ecological count data and how this can be addressed using the LRT. ITMC approaches, based on AIC and BIC, are also considered throughout the analyses.
Survey design and data collection
Questing ticks were surveyed in four broad habitat types: Calluna-dominated open moorland, grass-dominated open pasture, commercial conifer forest plantations and semi-natural mixed woodland (usually pine-dominated). Abundance surveys were conducted on 66 different patches of habitat within 13 different land holding units (estates and farms) in the Highland and Grampian regions of Scotland. Within most estates all four habitat categories were surveyed, and it was ensured that habitats surveyed within estates were adjacent to each other (to minimize the confounding effects of climate, aspect, altitude etc.). Adjacent habitats were also surveyed on the same day to minimize the confounding effects of daily changes in weather conditions on tick questing behaviour. All surveys were conducted within a 6-week period during July–August. To provide an index of relative tick abundance, blanket drags were used to survey questing I. ricinus ticks (Gray & Lohan 1982). Between 10 and 21 blanket drags per habitat patch were conducted. For each
drag a 1 m square of thick woollen material was used and pulled slowly for 10 m before turning it over and carefully counting and removing all larvae, nymphs and adult ticks. In the current analyses only nymph counts are considered. At the time and location of each drag, the temperature and relative humidity were recorded, in order to take into account conditions likely to affect questing behaviour. At the start, middle and end of each 10 m drag the height/density index of the ground vegetation was recorded using a sward stick, and the mean values were used to summarize each blanket drag. At the beginning and end of each drag, dung counts over a 1 m diameter area were recorded to estimate relative herbivore (tick host) abundance. Dung counts for species other than deer were extremely sparse, so in the subsequent example only deer are considered, and these counts are used to indicate presence or absence. Observations from a total of 800 blanket drags were available for analysis.
STATISTICAL ANALYSES
A comprehensive statistical analysis of these data is beyond the scope of this article. Important statistical issues such as the imperfect detectability of the survey method (e.g. blanket drags, see Daniels, Falco, & Fish 2000), and the potential impact of residual spatial dependence induced by the nested structure of the design are therefore ignored, and the focus is instead on two key characteristics of these data: zero-inflation and over-dispersion. A single response variable is considered: the number of nymph ticks per blanket drag. Initially six possible distributions for this variable are considered (Section 2.1), but, because the data showed overwhelming evidence of overdispersion, it was decided to restrict attention solely to the NB, NBZIF and NBH models (see the Appendix S1 for Poisson model comparisons). Likewise, eight potential covariates were initially considered – relative humidity; temperature; vegetation height; vegetation density; habitat type; presence of deer; use of fencing, and Julian day – but on the basis of preliminary analyses (see the Appendix S1 for more details) it was decided to restrict attention to the three of these (deer presence, habitat type and Julian day) that appeared to be most consistently and strongly related to tick abundance. Also, for simplicity, attention is restricted to models that only include main effects (but evidence for a habitat-by-deer interaction is briefly considered, see Section 3.1). These considerations lead to a set of 136 possible models (2^3 = 8 NB models, 8 × 8 = 64 NBZIF models and 64 NBH models). Note that residual spatial dependence would ideally be dealt with by including ‘habitat patch’ and ‘estate’ as nested random effects, but established procedures for fitting PH, NBH, PZIF and NBZIF models within a mixed modelling framework are not available. Akaike Information Criterion and BIC were used to identify the best supported models from within this candidate set. 
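The size of the candidate set can be verified by enumerating it. In the sketch below each model is represented as a tuple of family, count-part covariates and zero-part covariates; this representation and the function names are hypothetical. The parameter count additionally assumes that habitat type (four categories) contributes three coefficients and that each model part has an intercept, which reproduces the totals quoted in the Results (for example 7 parameters for the full NB model and 13 for the full NBH model).

```python
from itertools import combinations

COVARIATES = ("deer presence", "habitat type", "Julian day")
# Assumed number of regression coefficients contributed by each covariate.
N_COEF = {"deer presence": 1, "habitat type": 3, "Julian day": 1}

def subsets(items):
    """All subsets of the covariates, including the empty set."""
    return [combo for r in range(len(items) + 1) for combo in combinations(items, r)]

def n_parameters(family, count_part, zero_part):
    """Intercept, slopes and NB dispersion, plus the zero part where present."""
    total = 1 + sum(N_COEF[c] for c in count_part) + 1
    if zero_part is not None:
        total += 1 + sum(N_COEF[c] for c in zero_part)
    return total

candidates = []
for count_part in subsets(COVARIATES):
    candidates.append(("NB", count_part, None))
    for zero_part in subsets(COVARIATES):
        candidates.append(("NBZIF", count_part, zero_part))
        candidates.append(("NBH", count_part, zero_part))

for family in ("NB", "NBZIF", "NBH"):
    print(family, sum(1 for model in candidates if model[0] == family))
print("total", len(candidates))
```

Each tuple in `candidates` could then be passed to a model-fitting routine and its AIC and BIC recorded, which is the exhaustive comparison described above.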
Stepwise selection using standard (nested) LRTs was also, for comparison, used as a basis for selecting the best supported model within each of the three distributional families (density types). The key part of the analysis, however, involves using the non-nested LRT to compare the three models that appeared to be best supported (according to both AIC and stepwise selection) within each of the three distributional families. The comparisons of the NBH and NBZIF models against the NB constitute, in effect, a test of the existence of zero-inflation relative to the NB model, and in this context it seems natural to regard the NB model as the null model. It seems natural to treat these comparisons in an asymmetric way (using NHT), given the considerable additional structural complexity required by the inclusion of a zero-inflation or hurdle term into the model. Comparisons between the NBH and NBZIF models involve determining the nature of any zero-inflation,
and in this context it seems appropriate to consider both the NBH and NBZIF as possible null models. All model fitting was done within R (R Development Core Team, 2006), in particular using the pscl (Zeileis, Kleiber, & Jackman 2008) and lmtest libraries (Zeileis & Hothorn 2002).
SIMULATION STUDY
In considering the LRT as a basis for non-nested model comparisons, it is of interest to examine how this approach performs on simulated data, when the model and parameters which generated the data are known. The Appendix S1 contains a simulation study comprising 24 different simulated data sets, based loosely on the tick case study data, to each of which twenty different count models (comprising Poisson, NB, zero-inflation and hurdle densities) are fitted. Models with random effects are also briefly considered. All relevant technical details are provided, along with all R code and extensive results comparing the performance of the LRT, AIC, BIC and the Vuong Test on each simulated data set for each model. In brief, over the range of data and models examined, as sample sizes become large (e.g. thousands of observations) there is strong consistency across the different model selection strategies. As sample sizes become smaller (e.g. several hundreds of observations) the performance of each of the different strategies is highly variable, making it difficult to draw general conclusions as to the relative performance of each approach.
Results
Results of the case study analysing tick abundance data are now presented.
MODEL COMPARISON USING INFORMATION CRITERIA
Akaike Information Criterion and BIC values were calculated for each of the 136 models under consideration (full details can be found in the Appendix S1). The model with the lowest AIC value is an NBH distribution with deer presence, habitat type and Julian day as covariates in both the zero and count parts of the model (13 parameters in total). Recall that covariates in the zero part of the model are those correlated with absence (presence) of ticks; covariates in the count part of the model are correlated with the count (left truncated at one) of ticks conditional on the zero part of the model. Five other models also have, according to AIC, a moderate level of empirical support: a model that is similar to the best model but excludes deer presence from the count part of the distribution (ΔAIC = 2.52, 12 parameters), a model that is similar to the best model but excludes Julian day from the zero part of the distribution (ΔAIC = 2.77, 12 parameters), an NBZIF model that excludes Julian day from the zero part of the distribution (ΔAIC = 3.37, 12 parameters), an NBZIF model that includes all three variables in both the zero and count part of the model (ΔAIC = 5.25, 13 parameters), and an NBH model with Julian day and habitat in the count part and deer and habitat in the zero part (ΔAIC = 5.28, 11 parameters). The results that are obtained using BIC are substantially different from those obtained using AIC. The model with the lowest BIC value is the NB distribution with deer presence, habitat
type and Julian day as covariates (7 parameters). The only other moderately well-supported models are an NBH model with Julian day and habitat in the count part and deer and habitat in the zero part (ΔBIC = 4.55, 11 parameters), and an NB model containing deer and habitat (ΔBIC = 4.62, 6 parameters). The second best supported model according to BIC is the sixth best supported model according to AIC. The remaining five models with lowest AIC have a low level of support (ΔBIC values between 6.47 and 13.89).
Table 1. Application of the non-nested LRT to the tick case study: P-values for model selection between the response densities NB, NBZIF and NBH (based on 10 000 simulations). NB is always treated as the null model; in the NBZIF vs. NBH comparison each model is considered in turn as the null model
Alternative model
Null model
STEPWISE SELECTION
Backward and forward selection algorithms, based on standard likelihood ratio tests for nested models, were also, for comparison, used to select a set of covariates from within each of the three distributional models. Note that there are issues of multiple testing when performing a series of LRTs which can result in inflated false discovery rates, so the results of stepwise selection should be treated with caution (adjustments are available which attempt to address this issue – e.g. Benjamini & Hochberg 1995 – but have not been considered here). The forward and backward algorithms (using a cut-off for statistical significance of 0.05) selected identical models, and, in all cases, these models also turned out to be identical to those selected by AIC as having the highest degree of empirical support within each of the three distributional families. Note, however, that the P-values associated with individually dropping Julian day from the zero part (0.0291) or dropping deer presence from the count part (0.0336) of the best performing model within the NBH family were only marginally significant.
MODEL COMPARISON USING NON-NESTED LIKELIHOOD RATIO TESTS
The non-nested LRT and the Vuong Test were used to compare the models that were identified (using both AIC and stepwise selection) to be the best fitting models from within each of the three distributional families (NB, NBZIF and NBH). Four comparisons are considered:
1. Negative Binomial as the null model and NBZIF as the alternative model;
2. Negative Binomial as the null model and NBH as the alternative model;
3. Negative Binomial with zero inflation as the null model and NBH as the alternative model;
4. Negative Binomial with hurdle as the null model and NBZIF as the alternative model.
The first two of these comparisons can be viewed as tests for the existence of zero inflation within over-dispersed count data, since the NBZIF and NBH models both allow the proportion of zeros to be higher than that given by the NB model. It is arguably natural to perform such comparisons within an NHT framework, on the basis that they can be regarded as being asymmetric: there is a standard model (the NB), and the task is to investigate if there are important features of the data that cannot be described using this model. The third and fourth comparisons seek to identify the nature of any zero-inflation
NB NBZIF NBH
NZIF
NBH