Considering Spatial Correlations Between Binary Response Variables in Forestry: An Example Applied to Tree Harvest Modeling

Mathieu Fortin, Simon Delisle-Boulianne, and David Pothier

Abstract: In forestry, many phenomena, such as tree mortality or harvesting, are thought to be spatially correlated. However, the statistical methods that account for spatial correlations with Bernoulli-distributed response variables are not well known. In this study, we implement a new approach recently developed by Bhat and Sener (2009). This approach is based on the Farlie-Gumbel-Morgenstern (FGM) copula family and was tested in a context of tree harvest modeling. Empirical and estimated Spearman’s correlation coefficients (SCC) were compared to assess the goodness of fit of the model. The empirical SCCs showed decreasing correlations as the distance increased between the trees. A copula including a correlation function based on a negative exponential function accounted for this trend. Although the FGM copula is limited to cases in which the dependence is moderate, it worked fairly well in this case study and resulted in a model that had a better fit than a traditional generalized linear mixed model. The comparison between this copula and other families of copula remains to be investigated. FOR. SCI. 59(3):253–260.

Keywords: Farlie-Gumbel-Morgenstern copula, Spearman’s correlation coefficient, spatial dependence, correlated binary outcomes, maximum likelihood estimator

In forestry, observational units are often characterized by autocorrelation. For instance, trees within a sample plot are more likely to have similar growth behavior than trees from different plots. When a continuous response variable, such as diameter growth, is modeled, the occurrence of autocorrelation results in correlated error terms. These temporal or spatial correlations are known to affect statistical inferences, in the sense that confidence intervals no longer achieve their nominal coverage (Gregoire et al. 1995). As a consequence, the explanatory variables cannot be appropriately selected according to their significance levels.

Over the last two decades, many articles in forestry literature have addressed this issue of autocorrelation with continuous response variables. The use of random effects and/or correlation structures has been found to be an efficient way to account for autocorrelation in linear and nonlinear modeling (e.g., Gregoire et al. 1995, Hall and Bailey 2001). The emergence of software that can implement the estimators for such complex models explains the popularity of mixed models and correlation structure models.

In forestry, many variables of interest follow a Bernoulli distribution. For example, this is the case of binary outcomes such as tree mortality or tree removal by harvesting (e.g., Arii et al. 2008, Fortin et al. 2008a). Currently available software makes it possible to consider spatial or temporal dependence through the specification of random effects, which results in a generalized linear mixed model. If the random effect is specified on the model intercept, the correlation is assumed to be fairly constant among individuals of the same cluster. If the random effect is specified on a covariate, the correlations may vary, depending on the value of the covariate. Because continuous response variables often exhibit decreasing correlations in time or space (Fortin et al. 2007, Fox et al. 2007a), it can reasonably be assumed that discrete variables do the same. Nevertheless, the methods for modeling these response variables under the assumption of spatially or temporally dependent correlations are still poorly known.

The copula approach represents an alternative to random effects and correlation structures in the modeling of statistical dependence. This approach consists of defining a multivariate joint distribution, based on the marginal distributions of the response variables. Copula models have been used in forestry for modeling joint distributions of continuous random variables (e.g., Wang et al. 2008, 2010, Kershaw et al. 2010), but their application in a context of spatial-dependent correlations has not been thoroughly addressed. To our knowledge, Eskelson et al. (2011) present the only example in forestry literature of copula models that account for spatial correlations. Their study relied on a Gaussian copula with marginal response variables that followed beta distributions. In this study, we used a different copula formulation, which was developed by Bhat and Sener (2009), to demonstrate that this approach can be applied to binary variables. More specifically, we used tree harvest modeling data as a real-world case study for implementing this modified copula. Our objective is not to discuss the harvest model itself, but rather to test whether or not this copula approach can improve the fit of the model and to determine how well spatial correlations are taken into account. In addition, the advantages and disadvantages of this copula approach compared with those of other commonly used statistical methods are discussed.

Manuscript received October 18, 2011; accepted April 24, 2012; published online May 24, 2012; http://dx.doi.org/10.5849/forsci.11-129. Mathieu Fortin ([email protected]), AgroParisTech, AgroParisTech-Centre, Nancy, rue Girardet, France. Simon Delisle-Boulianne ([email protected]), Université Laval, France. David Pothier ([email protected]), Université Laval, France. Acknowledgments: We thank two anonymous reviewers for their helpful comments on a preliminary version of this article as well as Annabelle Moisan-de-Serres and Jean-François Provencher (Université Laval) for their contribution to field measurements. Special thanks are due to Jean-François Belzile, David Bélanger, and Pascal Gauthier (Coopérative forestière des Hautes-Laurentides) for their support in this project. William Parsons (Université de Sherbrooke) edited the text. Copyright © 2013 by the Society of American Foresters.

Forest Science 59(3) 2013

Methods

Statistical Developments

Let $y_{ij}$ be a Bernoulli-distributed random variable with $i$ and $j$ being the cluster and the observation indices, respectively, such that $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p_i$. Using a traditional logistic model, the probability of having $y_{ij} = 1$ is usually modeled through a logit link function as follows

$$\Pr(y_{ij} = 1 \mid x_{ij}, \beta) = \frac{e^{x_{ij}\beta}}{1 + e^{x_{ij}\beta}} \equiv \pi_{ij} \tag{1}$$

where $x_{ij}$ is a vector of predictors and $\beta$ is a vector of unknown parameters. Under the assumption of independence, the likelihood ($\ell$) of the response vector for cluster $i$, i.e., $y_i$, is the product of the marginal probability masses

$$\ell(y_i \mid X_i, \beta) = \prod_{j=1}^{p_i} f(y_{ij} \mid x_{ij}, \beta) \tag{2a}$$

$$f(y_{ij} \mid x_{ij}, \beta) = \left(\frac{e^{x_{ij}\beta}}{1 + e^{x_{ij}\beta}}\right)^{y_{ij}} \left(1 - \frac{e^{x_{ij}\beta}}{1 + e^{x_{ij}\beta}}\right)^{1 - y_{ij}} \tag{2b}$$

where $X_i$ is a matrix the rows of which are all the $x_{ij}$ and $f(y_{ij} \mid x_{ij}, \beta)$ is the probability mass function of $y_{ij}$ with prior knowledge of vectors $x_{ij}$ and $\beta$. Re-expressing the likelihood function 2a as a function of $\beta$ conditional on $y_i$ and $X_i$ and maximizing the logarithm of this new likelihood expression make it possible to obtain the maximum likelihood estimate of $\beta$. In terms of likelihood, a copula can be expressed as follows (Choroś et al. 2010)

$$\ell(y_i \mid X_i, \beta, \delta) = c(u_{i1}, u_{i2}, \ldots, u_{ip_i} \mid \delta) \prod_{j=1}^{p_i} f(y_{ij} \mid x_{ij}, \beta) \tag{3}$$

where $c(u_{i1}, u_{i2}, \ldots, u_{ip_i} \mid \delta)$ is the density of the copula with variables $u_{ij}$ being derived from the marginal distributions of the $y_{ij}$ and $\delta$ is a vector of parameters that parameterize the copula function. Different parametric families of copula functions exist, such as the Farlie-Gumbel-Morgenstern (FGM) family, the Gaussian copula, and the Archimedean family, together with others that have been proposed in the statistical literature (cf. Trivedi and Zimmer 2005). In all cases, the copula function assumes that the margins are uniform on the range [0, 1], and, consequently, the derivations usually rely on the marginal cumulative density functions. An interesting property of the Bernoulli distribution is that the probability mass function already meets this requirement. Actually, when a logit link function is used, the probability mass function of the Bernoulli distribution as shown in 2b is similar to the cumulative density function of the logistic distribution. Given this property, the FGM copula for a bivariate case, i.e., $y_i = (y_{i1}, y_{i2})^T$, can be expressed as (Trivedi and Zimmer 2005, p. 15)

$$\ell(y_i \mid X_i, \beta, \delta) = f(y_{i1} \mid x_{i1}, \beta) \cdot f(y_{i2} \mid x_{i2}, \beta) \times \left[1 + \theta(\delta)\,(1 - f(y_{i1} \mid x_{i1}, \beta))(1 - f(y_{i2} \mid x_{i2}, \beta))\right] \tag{4}$$

where $\theta(\delta)$ is a correlation function based on parameters $\delta$. To account for the correlation among the observations of a particular cluster, Bhat and Sener (2009) proposed a modified FGM copula for the multivariate case, which leads to the following likelihood function

$$\ell(y_i \mid X_i, \beta, \delta, Z_i) = \prod_{j=1}^{p_i} f(y_{ij} \mid x_{ij}, \beta) \times \left[1 + \sum_{j=1}^{p_i - 1} \sum_{j'=j+1}^{p_i} (-1)^{y_{ij} + y_{ij'}}\, \theta(\delta, z_{ij}, z_{ij'})\,(1 - f(y_{ij} \mid x_{ij}, \beta))(1 - f(y_{ij'} \mid x_{ij'}, \beta))\right] \tag{5}$$

where $\theta(\delta, z_{ij}, z_{ij'})$ is a correlation function based on parameters $\delta$ and two vectors of covariates $z_{ij}$ and $z_{ij'}$ associated with observations $j$ and $j'$ in cluster $i$. The value of function $\theta(\delta, z_{ij}, z_{ij'})$ is bound to the interval $[-1, 1]$. If $\theta(\delta, z_{ij}, z_{ij'}) = 0$, then the dependence is assumed to be negligible and the likelihood expression 5 reduces to 2a. Negative and positive estimates for $\theta(\delta, z_{ij}, z_{ij'})$ indicate negative and positive correlations, respectively. For convenience, function $\theta(\delta, z_{ij}, z_{ij'})$ can eventually account for the distance between the pair of observations if the observation coordinates are included in $z_{ij}$ and $z_{ij'}$. For instance, Bhat and Sener (2009) defined $\theta(\delta, z_{ij}, z_{ij'})$ as

$$\theta(\delta, z_{ij}, z_{ij'}) = \frac{e^{\delta_1 / d(z_{ij}, z_{ij'})}}{1 + e^{\delta_1 / d(z_{ij}, z_{ij'})}} \tag{6}$$


where vector $\delta$ had a single parameter and $d(z_{ij}, z_{ij'})$ was the Euclidean distance between observations $j$ and $j'$ in cluster $i$. The value of the function $\theta(\delta, z_{ij}, z_{ij'})$ is actually not the true correlation. In fact, because of its mathematical formulation, the FGM copula cannot accommodate correlations that exceed the range $[-1/3, 1/3]$ (Trivedi and Zimmer 2005, Table 2.1). For bivariate logistic distributions such as the one underlying the likelihood function 5, Spearman’s rank correlation coefficient can be estimated as $3 \cdot \theta(\delta, z_{ij}, z_{ij'})/\pi^2$ and is further restricted to the range $[-0.304, 0.304]$ (Gumbel 1961, Equation 6.8). From 5, the log-likelihood function with respect to both $\beta$ and $\delta$, using the whole data set is

$$L(\beta, \delta \mid y, X, Z) = \sum_{i=1}^{n} \sum_{j=1}^{p_i} \ln\left(f(y_{ij} \mid x_{ij}, \beta)\right) + \sum_{i=1}^{n} \ln\left[1 + \sum_{j=1}^{p_i - 1} \sum_{j'=j+1}^{p_i} (-1)^{y_{ij} + y_{ij'}}\, \theta(\delta, z_{ij}, z_{ij'})\,(1 - f(y_{ij} \mid x_{ij}, \beta))(1 - f(y_{ij'} \mid x_{ij'}, \beta))\right] \tag{7}$$

An interesting property of the log-likelihood function 7 is that vectors $\beta$ and $\delta$ can be estimated simultaneously. Although the computations are more demanding, optimization remains feasible with most small- and medium-sized problems. On the other hand, the log-likelihood function suffers from a limiting condition. It is defined only if the density expressed in the additional term is positive. In other words, all the additional terms must be positive, i.e.,

$$1 + \sum_{j=1}^{p_i - 1} \sum_{j'=j+1}^{p_i} (-1)^{y_{ij} + y_{ij'}}\, \theta(\delta, z_{ij}, z_{ij'})\,(1 - f(y_{ij} \mid x_{ij}, \beta))(1 - f(y_{ij'} \mid x_{ij'}, \beta)) > 0 \quad \text{for } \forall i. \tag{8}$$

If condition 8 is not met, the natural logarithm is undefined, which makes it impossible to evaluate function 7.

Predictions

The function $f(y_{ij} \mid x_{ij}, \beta)$ provides the marginal probabilities that $y_{ij} = 1$ or $y_{ij} = 0$ for observation $j$ in cluster $i$. Likewise, the likelihood expression 5 provides the joint probability of observing the vector of observations $y_i$. This joint probability behaves like the likelihood of any mixed model and can serve to enhance predictions. Considering a simple bivariate case, the joint probabilities can be calculated as

$$\ell(y_i = (1,1)^T \mid X_i, \beta, \delta, Z_i) = f(y_{i1} = 1 \mid x_{i1}, \beta) \cdot f(y_{i2} = 1 \mid x_{i2}, \beta) \times \left[1 + \theta(\delta, z_{i1}, z_{i2})\,(1 - f(y_{i1} = 1 \mid x_{i1}, \beta))(1 - f(y_{i2} = 1 \mid x_{i2}, \beta))\right] \tag{9a}$$

$$\ell(y_i = (1,0)^T \mid X_i, \beta, \delta, Z_i) = f(y_{i1} = 1 \mid x_{i1}, \beta) \cdot f(y_{i2} = 0 \mid x_{i2}, \beta) \times \left[1 - \theta(\delta, z_{i1}, z_{i2})\,(1 - f(y_{i1} = 1 \mid x_{i1}, \beta))(1 - f(y_{i2} = 0 \mid x_{i2}, \beta))\right] \tag{9b}$$

$$\ell(y_i = (0,1)^T \mid X_i, \beta, \delta, Z_i) = f(y_{i1} = 0 \mid x_{i1}, \beta) \cdot f(y_{i2} = 1 \mid x_{i2}, \beta) \times \left[1 - \theta(\delta, z_{i1}, z_{i2})\,(1 - f(y_{i1} = 0 \mid x_{i1}, \beta))(1 - f(y_{i2} = 1 \mid x_{i2}, \beta))\right] \tag{9c}$$

$$\ell(y_i = (0,0)^T \mid X_i, \beta, \delta, Z_i) = f(y_{i1} = 0 \mid x_{i1}, \beta) \cdot f(y_{i2} = 0 \mid x_{i2}, \beta) \times \left[1 + \theta(\delta, z_{i1}, z_{i2})\,(1 - f(y_{i1} = 0 \mid x_{i1}, \beta))(1 - f(y_{i2} = 0 \mid x_{i2}, \beta))\right] \tag{9d}$$

If $y_{i1}$ is observed, the conditional probability that $y_{i2} = 1$ can be derived as

$$\Pr(y_{i2} = 1 \mid X_i, \beta, \delta, Z_i, y_{i1} = 1) = \frac{\ell(y_i = (1,1)^T \mid X_i, \beta, \delta, Z_i)}{f(y_{i1} = 1 \mid x_{i1}, \beta)} \tag{10}$$

Implementation

Many optimization algorithms are available in mathematics. In statistics, the Newton-Raphson algorithm is among the most efficient ones for optimizing log-likelihood functions of linear mixed models (Wolfinger et al. 1994). We relied on this algorithm to optimize the log-likelihood function 7. The algorithm is largely described in Wolfinger et al. (1994). Basically, the function reaches a maximum (or a minimum) when its first partial derivatives with respect to the model parameters tend toward 0. The algorithm approximates these first derivatives through a first-order Taylor expansion, i.e.,

$$g(\gamma_{r+1} \mid y, X) \approx g(\gamma_r \mid y, X) + \Delta_r \cdot H(\gamma_r \mid y, X) \tag{11a}$$

$$g(\gamma_r \mid y, X) = \left.\frac{\partial L(\gamma \mid y, X)}{\partial \gamma}\right|_{\gamma = \gamma_r} \tag{11b}$$

$$H(\gamma_r \mid y, X) = \left.\frac{\partial^2 L(\gamma \mid y, X)}{\partial \gamma\, \partial \gamma^T}\right|_{\gamma = \gamma_r} \tag{11c}$$

where vector $\gamma = (\beta^T, \delta^T)^T$ and $r$ is the iteration of the optimization algorithm. The first derivatives and the second derivatives with respect to the model parameters are usually referred to as the gradient and the Hessian matrices, respectively. Setting the approximation 11a equal to 0 and solving the equation for term $\Delta_r$ yields

$$\Delta_r = -\left(H(\gamma_r \mid y, X)\right)^{-1} g(\gamma_r \mid y, X). \tag{12}$$

Term $\Delta_r$ is actually the optimization step for iteration $r$. The algorithm stops when a convergence criterion based on the relative improvement is met. According to likelihood theory, $-\left(H(\gamma_r \mid y, X)\right)^{-1}$ is an estimator of the variance-covariance matrix of the model parameters (Wolfinger et al. 1994). The log-likelihood function and the Newton-Raphson optimization algorithm were implemented in the software using the Java language. To facilitate the convergence of the log-likelihood function, we also implemented a grid search before the optimization. The grid search made it easier to select starting values for which condition 8 would be met.

Correlation Diagnostics

With mixed models, Pearson product-moment correlation estimates can be calculated for different levels of grouping (e.g., Fortin et al. 2008b). The use of Pearson correlations assumes that the relationship between the two variables is linear, which is obviously not the case for two Bernoulli-distributed variables. In such a case, Spearman’s correlation coefficient (SCC) is more appropriate. Spearman’s correlation is within the Pearson family of correlation coefficients, but it is a nonparametric measure of statistical dependence, where ranks replace the actual values of the variables in the product-moment correlation formula.

Empirical SCCs can be derived from the data in three steps. First, all possible pairs of observations within the clusters are computed. Second, the residuals, i.e., the observations minus the predicted probabilities $\hat{\pi}_{ij}$ as shown in 1, are calculated for the observations of all pairs. Finally, SCC is calculated from the ranks based on the values of the residuals. Such an SCC is an estimate of the within-cluster correlation. If the distances between the observations are available, the same process can be done for different classes of distance. Note that the contribution of each pair of observations to the correlation estimate is the product of the ranks based on the values of the residuals. Considering the commutative property of multiplication, the order of the observations within each pair, i.e., which one is first or second, has no influence on the correlation estimate.
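The three steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names and the input layout (a list of clusters, each a list of `(y, pi_hat)` tuples) are hypothetical, and entering every pair in both orders is an assumption made here so that the estimate does not depend on which observation is listed first.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Product-moment correlation (raises ZeroDivisionError if a or b is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def empirical_scc(clusters):
    """Spearman correlation of residuals over all within-cluster pairs.

    clusters: list of clusters, each a list of (y, pi_hat) tuples, where y is
    the observed 0/1 outcome and pi_hat the predicted probability (Equation 1).
    """
    first, second = [], []
    for cluster in clusters:
        residuals = [y - pi_hat for y, pi_hat in cluster]  # step 2
        for r_a, r_b in combinations(residuals, 2):        # step 1
            # Assumption: both orderings are entered to keep the estimate symmetric.
            first.extend([r_a, r_b])
            second.extend([r_b, r_a])
    # Step 3: product-moment correlation computed on the ranks.
    return pearson(average_ranks(first), average_ranks(second))


# Toy data: two plots of two trees each, both harvested or both retained.
toy = [[(1, 0.3), (1, 0.4)], [(0, 0.3), (0, 0.4)]]
scc = empirical_scc(toy)
```

To obtain estimates by distance class, the same function could be applied after keeping only the pairs whose distance falls within a given class.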

Case Study

In the Province of Québec, Canada, public forests, of which approximately 15% are managed through selection cutting and other partial cutting treatments, cover more than 690,000 km². Managing such large areas is challenging and requires a chain of models to predict future forest conditions. Because these conditions are intimately linked to future harvesting, forest managers need to identify the trees that are the most prone to being harvested whenever partial harvesting occurs in a particular stand. For this purpose, the authorities have decided to rely on empirical statistical models based on the most recent harvesting data from stands that have been harvested by partial cutting. These models are simple logistic models with binary outcomes, i.e., a particular tree is cut or it remains alive in the forest. They are not spatially explicit because tree coordinates are unavailable, but they take advantage of hundreds of thousands of observations. It can be reasonably assumed that harvesting follows a spatial pattern, which would result in spatial correlations between trees that are close to one another. In 2009, we undertook research to test this hypothesis.

Data

A total of 55 randomly distributed sample plots were established over the study area, which was located northwest of Montréal, Québec (46°29′ N, 76°21′ W). Each plot was circular and covered 900 m², within which all living trees with dbh (1.3 m height) greater than 9.0 cm were considered in the analysis. The number of trees in each plot ranged from 26 to 62 trees, with an average of 46 individuals. For each tree, the species and dbh were recorded. The spatial coordinates of each individual were also measured from the center of the plot using a theodolite for the angle and a hypsometer for the distance. The plots were revisited after cutting to identify the trees that had been harvested. The data are summarized in Table 1.

Table 1. Summary of the data set.

| Species group | s* | No. of trees before harvesting | No. of trees harvested | dbh (1.3 m) range before harvesting (cm) |
|---|---|---|---|---|
| Betula alleghaniensis Britt. | 1 | 169 | 30 | 9.1–63.1 |
| Acer rubrum L. | 2 | 128 | 23 | 9.1–47.0 |
| Acer saccharum Marsh. | 3 | 1,077 | 222 | 9.1–68.5 |
| Short-lived broadleaved species | 4 | 49 | 19 | 9.1–61.8 |
| Fagus grandifolia Ehrh. | 5 | 714 | 101 | 9.1–62.8 |
| Ostrya virginiana (Mill.) K. Koch. | 6 | 136 | 29 | 9.1–27.6 |
| Long-lived coniferous species | 7 | 39 | 6 | 9.1–42.0 |
| Abies balsamea (L.) Mill. | 8 | 114 | 59 | 9.1–30.2 |
| Other broadleaved species | 9 | 81 | 4 | 9.1–72.5 |
| Total | | 2,507 | 493 | |

* Species group index.

Logistic Model

We first defined the response variable as a binary variable, the value of which would be 1 if the tree had been harvested or 0 otherwise. Some preliminary trials showed that the probability of harvesting followed different trends, depending on the merchantability, which is defined according to the dbh. For broadleaved species, merchantability requires the dbh to be ≥ 23.1 cm, whereas conifers are considered as merchantable when their dbh is ≥ 9.1 cm. A dummy variable was created in the data set to indicate whether or not a tree was merchantable according to its species group. After several trials, the following model was found to have no major lack of fit and was then selected for the implementation of the modified FGM copula:

$$\Pr(y_{ij} = 1 \mid x_{ij}, \beta) = \frac{e^{x_{ij}\beta}}{1 + e^{x_{ij}\beta}} \tag{13a}$$

$$x_{ij}\beta = \beta_0 + \beta_{1,s} + (\beta_2 + \beta_3 m_{ij})\,\Delta dbh_{ij} + \beta_4 m_{ij}\,\Delta dbh_{ij}^2 \tag{13b}$$

where $y_{ij}$ is the binary outcome (1 if harvested or 0 if not) for tree $j$ in plot $i$, $s$ is the species group index as reported in Table 1, $m_{ij}$ is the dummy variable whose value is 1 if the tree is merchantable or 0 otherwise, $\Delta dbh_{ij}$ is the difference between the observed dbh and the minimal merchantable diameter, i.e., $\Delta dbh_{ij} = dbh_{ij} - 23.1$ for broadleaved species and $\Delta dbh_{ij} = dbh_{ij} - 9.1$ for coniferous species with $dbh_{ij}$ being expressed in cm. Actually, $\Delta dbh_{ij}$ expressed the departure from the merchantable limit in terms of diameter. The value of this variable is negative when the tree is nonmerchantable and positive when it is. This offset ensured that the model converged toward the same probability around the merchantable limit, regardless of the species group.

To test the implementation of the FGM copula, we fitted four models using the aforementioned data. The first model was a simple generalized linear model under the assumption of independent observations. The second and third models implemented the FGM copula as shown in the log-likelihood function 7. In these second and third models, we assumed that vector $\delta$ had a single element and that the correlations were limited to the trees in the same plot. In the second model, we assumed that this correlation was constant and, therefore, spatially uniform, i.e., $\theta(\delta, z_{ij}, z_{ij'}) = \delta_1$. In the third model, the correlation was assumed to be spatially dependent and to follow a negative exponential function, i.e., $\theta(\delta, z_{ij}, z_{ij'}) = e^{\delta_1 d(z_{ij}, z_{ij'})}$. The fourth model was a generalized linear mixed-effects (both random and fixed) model, which is a common approach for clustered binary data (e.g., Fortin et al. 2008a). In this model, we assumed that there was a plot random effect on the intercept such that

$$\Pr(y_{ij} = 1 \mid x_{ij}, \beta, b_i) = \frac{e^{x_{ij}\beta + b_i}}{1 + e^{x_{ij}\beta + b_i}} \tag{14}$$

where $b_i$ is a plot random effect, which is normally distributed with mean 0 and variance $\sigma^2_{plot}$. Because the plot random effects are unobserved, the parameters are estimated through the following marginal likelihood function (Pinheiro and Bates 2000, p. 62)

$$\ell(y_i \mid X_i, \beta, \sigma^2_{plot}) = \int \varphi(b_i \mid 0, \sigma^2_{plot}) \cdot \prod_{j=1}^{p_i} f(y_{ij} \mid x_{ij}, \beta, b_i)\, db_i \tag{15}$$

where $\varphi(b_i \mid 0, \sigma^2_{plot})$ is the probability density function of the normal distribution with mean 0 and variance $\sigma^2_{plot}$. Estimates of $\beta$ and $\sigma^2_{plot}$ are obtained by maximizing the log-likelihood function derived from 15. Because the integral usually has no closed-form solution, this optimization is done through numerical approximation. Algorithms such as the Laplacian approximation or the Gaussian quadrature are generally used for the optimization (cf. Pinheiro and Bates 1995). The glmer function in the lme4 package in R implements the Laplacian approximation algorithm (Bates et al. 2011).

For the sake of simplicity, we will refer to these four models as the simple model, the uniform-correlation model, the spatial-correlation model, and the mixed model, respectively. Goodness of fit of the four models was compared with respect to their maximum log likelihoods, and Akaike and Bayesian information criteria (cf. Pinheiro and Bates 2000, p. 84). Further, we computed empirical SCCs for 1-m distance classes and compared these with the SCCs predicted by the spatial-correlation model.
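As an illustration of how the linear predictor 13b and the logit link 13a combine, the following Python sketch evaluates the marginal harvest probability of a single tree. The function and argument names are hypothetical and are not taken from the authors' software. The coefficient values used in the example are the spatial-correlation estimates reported in Table 2 for species group 3 (Acer saccharum); the two resulting probabilities agree, up to rounding, with the marginal probabilities of 0.306 and 0.212 quoted in the Discussion.

```python
import math

# Minimal merchantable dbh (cm) by species type, as stated in the text.
MERCHANTABLE_DBH_CM = {"broadleaved": 23.1, "conifer": 9.1}


def harvest_probability(dbh_cm, species_type, beta0, beta1_s, beta2, beta3, beta4):
    """Marginal probability of harvest from Equations 13a and 13b.

    species_type is "broadleaved" or "conifer" (hypothetical labels);
    beta1_s is the species-group effect for the tree's group s.
    """
    threshold = MERCHANTABLE_DBH_CM[species_type]
    delta_dbh = dbh_cm - threshold          # departure from the merchantable limit
    m = 1 if dbh_cm >= threshold else 0     # merchantability dummy
    eta = beta0 + beta1_s + (beta2 + beta3 * m) * delta_dbh + beta4 * m * delta_dbh ** 2
    return 1.0 / (1.0 + math.exp(-eta))     # logit link, Equation 13a


# Spatial-correlation model estimates from Table 2 (species group 3).
coefficients = dict(beta0=-4.675, beta1_s=2.029, beta2=-0.105, beta3=0.285, beta4=-2.815e-3)
p_large = harvest_probability(35.8, "broadleaved", **coefficients)
p_small = harvest_probability(10.4, "broadleaved", **coefficients)
```

Because the quadratic term is multiplied by the merchantability dummy, it only contributes for merchantable trees, which is how Equation 13b is written.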

Results

The fit statistics and maximum likelihood estimates of the parameters are shown in Table 2. All parameter estimates were significantly different from 0 at a probability level of $\alpha = 0.05$. The fixed-effect parameter estimates and their SEs were of the same magnitude across the models. The fit statistics clearly indicated that the observations were correlated within the plots, because the uniform-correlation, spatial-correlation, and mixed models had better fits than the simple model. Among the three models that took into account the correlations, the spatial-correlation model had the best fit. This result indicated that the correlations were not uniform and tended to decrease with increasing distance between the individuals (Figure 1). The decreasing pattern of SCC estimates produced by the spatial-correlation model closely matches the empirical SCC estimates, whereas the uniform-correlation model clearly failed to reproduce this pattern (Figure 1). However, the spatial-correlation model slightly underestimated correlations between individuals that were less than 1 m apart and overestimated those correlations for distances greater than 1 m.

Table 2. Fit statistics and maximum likelihood estimates of the parameters for the simple, uniform-correlation, spatial-correlation, and mixed models.

| | Simple model | Uniform-correlation model | Spatial-correlation model | Mixed model |
|---|---|---|---|---|
| Fit statistics | | | | |
| Log-likelihood | -1,091.92 | -1,077.16 | -1,062.14 | -1,074.23 |
| Akaike Information Criterion | 2,207.84 | 2,180.32 | 2,150.27 | 2,174.45 |
| Bayesian Information Criterion | 2,277.76 | 2,256.07 | 2,226.02 | 2,250.20 |
| Parameter estimates | | | | |
| $\hat{\beta}_0$ | -4.857 (0.553) | -4.646 (0.556) | -4.675 (0.555) | -4.855 (0.574) |
| $\hat{\beta}_{1,1}$ | 1.813 (0.567) | 1.723 (0.565) | 1.761 (0.566) | 1.637 (0.586) |
| $\hat{\beta}_{1,2}$ | 2.354 (0.583) | 2.323 (0.583) | 2.280 (0.587) | 2.281 (0.604) |
| $\hat{\beta}_{1,3}$ | 2.017 (0.532) | 1.953 (0.531) | 2.029 (0.533) | 1.942 (0.548) |
| $\hat{\beta}_{1,4}$ | 3.175 (0.615) | 3.104 (0.620) | 3.109 (0.625) | 3.151 (0.650) |
| $\hat{\beta}_{1,5}$ | 1.842 (0.541) | 1.714 (0.542) | 1.741 (0.542) | 1.685 (0.560) |
| $\hat{\beta}_{1,6}$ | 2.581 (0.576) | 2.382 (0.575) | 2.411 (0.577) | 2.392 (0.595) |
| $\hat{\beta}_{1,7}$ | 1.101 (0.699) | 1.147 (0.693) | 1.242 (0.693) | 1.169 (0.715) |
| $\hat{\beta}_{1,8}$ | 4.076 (0.571) | 3.940 (0.570) | 4.135 (0.572) | 4.135 (0.592) |
| $\hat{\beta}_{1,9}$ | NA | NA | NA | NA |
| $\hat{\beta}_2$ | -0.106 (0.017) | -0.104 (0.017) | -0.105 (0.017) | -0.108 (0.017) |
| $\hat{\beta}_3$ | 0.295 (0.035) | 0.286 (0.034) | 0.285 (0.034) | 0.300 (0.036) |
| $\hat{\beta}_4$ | -2.950 × 10^-3 (0.600 × 10^-3) | -2.867 × 10^-3 (0.594 × 10^-3) | -2.815 × 10^-3 (0.586 × 10^-3) | -3.059 × 10^-3 (0.630 × 10^-3) |
| $\hat{\delta}_1$ | NA | 0.181 (0.039) | -0.170 (0.028) | NA |
| $\hat{\sigma}^2_{plot}$ | NA | NA | NA | 0.282† |

NA, not applicable. SEs in parentheses.
† The glmer function in R does not provide SEs for the covariance parameters.
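The information criteria in Table 2 can be checked against the reported log-likelihoods with the usual definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The short Python sketch below does this under two assumptions that are not stated explicitly in the text: the number of observations n is taken as the 2,507 trees of Table 1, and the parameter counts k (12 for the simple model, 13 for the three other models) are inferred by counting the estimated parameters in Table 2. The function names are hypothetical.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


N_TREES = 2507  # assumption: total number of trees in Table 1

# (log-likelihood from Table 2, inferred number of estimated parameters)
models = {
    "simple": (-1091.92, 12),
    "uniform-correlation": (-1077.16, 13),
    "spatial-correlation": (-1062.14, 13),
    "mixed": (-1074.23, 13),
}

criteria = {
    name: (aic(log_lik, k), bic(log_lik, k, N_TREES))
    for name, (log_lik, k) in models.items()
}
```

Under these assumptions the computed values agree with the tabulated criteria to within rounding, and the spatial-correlation model has the smallest AIC and BIC.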

Discussion In this study, we implemented the modified FGM copula developed by Bhat and Sener (2009) in a context of tree harvest modeling. Our results showed that the correlations were not spatially uniform; rather, they tended to decrease with increasing distance between individuals. Decreasing temporal or spatial correlations have already been observed for continuous variables such as diameter increment (e.g., Fox et al. 2007a) and have been taken into account in single-tree growth models (e.g., Fox et al. 2007b, Fortin et al. 2008b). However, for binary responses that are typical of processes such as tree mortality or harvesting, possible distance-dependent correlations have not been explored in most studies related to forestry. To our knowledge, this study represents a first attempt to formalize tree harvest modeling using a likelihood expression that accounts for spatial correlations. Other algorithms, such as the one proposed by Arii et al. (2008), rely on nonparametric methods such as nearest-neighbor methods. In forestry, the only other study comparable to ours was conducted by Eskelson

Figure 1. SCC estimates as a function of the distance between two individuals for the uniform-correlation (——) and spatialcorrelation (– – –) models. F, empirical SCC estimates calculated from the data for 1-m distance classes.

258

Forest Science 59(3) 2013

et al. (2011), in a context of understory vegetation cover modeling. The likelihood optimization of copula models is usually harder to achieve and more demanding in terms of computation than it is for generalized linear models and generalized linear mixed models. Eskelson et al. (2011) used a Gaussian copula to model spatial dependence. Gaussian copulas tend to be more complex than FGM copulas, and their distributions cannot be written in a simple closed form (Alexander 2008, p. 266). Although the FGM copula cannot accommodate the full range of correlations, its simplicity remains a major advantage. Like Bhat and Sener (2009), we implemented the maximum likelihood estimator and an optimization algorithm that enables a full maximum likelihood approach, i.e., all the parameters are estimated simultaneously (see Trivedi and Zimmer 2005, chap. 4). Convergence was harder to achieve than for a regular generalized linear model, but it could be reached in most cases, in part because of the grid search that made it possible to define plausible starting values. In most cases of binary outcome modeling, the spatial correlations are estimated through random effects. To do so, generalized linear mixed models that are based on maximum likelihood estimators are available in software form, and convergence is usually easily achieved. However, a cluster random effect on the intercept assumes a uniform (or near-uniform) correlation within a particular cluster. This approach is actually similar to the distance-independent copula that we implemented in the uniform-correlation model and might not be appropriate for decreasing patterns such as the one observed in this case study. In terms of fit, the generalized linear mixed model we fitted in this case study was similar to the uniform-correlation model but still inferior to the spatial-correlation model (Table 2). 
Nonuniform correlations can be achieved by specifying the cluster random effect on a particular covariate rather than the model intercept. However, this leads to more complex interpretations as the magnitude of the within-cluster correlations changes with the value of covariate. Other methods that consider spatial correlations in binary outcome models exist, such as generalized estimating equations or penalized quasi-likelihood estimators with correlation structures (McCulloch et al. 2008, p. 24). In addition to criticisms regarding the reliability of the statistical inferences (Breslow and Lin 1995, Lin and Breslow 1996), these methods also suffer from the inability to estimate the joint probability of observing the data (Lindsey and Lambert 1998). A major consequence of this limitation is that the joint distribution is undefined and random samples cannot be drawn. Thus, any realistic Monte Carlo simulation is impossible with these estimators. The copula approach provides a joint probability for any combination of event/nonevent responses. Moreover, because it is based on a maximum likelihood estimator, a copula approach also enables comparison with other models through information criteria. For a simple demonstration of the applicability of the copula approach, let us consider two sugar maple trees that are 1 m apart and whose dbh is 35.8 and 10.4 cm, respectively. According to the spatial-correlation model, the marginal probabilities of being harvested are estimated as 0.306

and 0.212 for these two trees. With use of the properties of the likelihood 5 as shown in the Predictions section, the joint probabilities for these two trees are Pr(y ⫽ 共1,1兲T ) ⫽ 0.306 䡠 0.212 䡠 共1 ⫹ e⫺0.170䡠1 共1 ⫺ 0.306兲 䡠 共1 ⫺ 0.212兲兲 ⫽ 0.095 Pr(y ⫽ 共1,0兲T ) ⫽ 0.306 䡠 0.788 䡠 共1 ⫺ e⫺0.170䡠1 共1 ⫺ 0.306兲 䡠 共1 ⫺ 0.788兲兲 ⫽ 0.211 Pr(y ⫽ 共0,1兲T ) ⫽ 0.694 䡠 0.212 䡠 共1 ⫺ e⫺0.170䡠1 共1 ⫺ 0.694兲 䡠 共1 ⫺ 0.212兲兲 ⫽ 0.117 Pr(y ⫽ 共0,0兲T ) ⫽ 0.694 䡠 0.788 䡠 共1 ⫹ e⫺0.170䡠1 共1 ⫺ 0.694兲 䡠 共1 ⫺ 0.788兲兲 ⫽ 0.577 The copula actually increases the probabilities of observing the same outcome, while decreasing the probabilities of observing different outcomes for these two trees. Now, let us consider that the largest tree was harvested. Then, the conditional probability for the second tree to be harvested can be estimated as Pr(y2 ⫽ 1兩y1 ⫽ 1) ⫽ 0.095/0.306 ⫽ 0.310. This conditional probability is larger than the marginal probability estimated at 0.212. Consequently, the small tree has a larger probability of being harvested because its neighbor was harvested. In terms of limitations, the modified FGM copula that we used in this case study remains within the restricted range of correlations that it can accommodate. In this case study, the largest empirical Spearman correlation coefficient was estimated at 0.370 for trees that are less than 1 m apart, whereas the copula is limited to maximum correlations of 0.304. Although the difference is not large, there is no doubt that the fit would be improved if the model could accommodate correlations larger than 0.304. Other copula specifications exist, but their applications in cases of spatial correlation remain more complex because of their mathematical forms and a full maximum likelihood approach may not be feasible. 
In many cases, the maximum likelihood estimation of other families of copulas has to rely on a two-step algorithm in which the parameters of the marginal models are estimated first to enable the estimation of the copula parameters in a second step (see Trivedi and Zimmer 2005, chap. 4). A comparison between the different families of copulas in a context of spatial dependence between Bernoulli-distributed variables is a topic for future research.
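The two-tree demonstration above can be reproduced numerically. The following Python sketch is only an illustration and not the authors' implementation. It assumes the bivariate form used in the worked example: the product of the two marginal probabilities of the observed outcomes is multiplied by 1 ± e^(−0.170·d)·(1 − π1)·(1 − π2), with a plus sign when the two outcomes match and a minus sign otherwise. The function name `fgm_joint_probabilities` and its arguments are hypothetical.

```python
import math


def fgm_joint_probabilities(p1, p2, distance, decay=0.170):
    """Joint probabilities of two binary outcomes under the bivariate form
    used in the worked example (assumed here, not taken from the authors' code).

    p1, p2   : marginal probabilities of the event (harvest) for each tree
    distance : distance between the two trees (m)
    decay    : parameter of the negative exponential correlation function
               (0.170 is the value that appears in the worked example)
    """
    theta = math.exp(-decay * distance)  # dependence decreases with distance
    joint = {}
    for y1 in (1, 0):
        for y2 in (1, 0):
            # Marginal probability of the outcome considered for each tree
            m1 = p1 if y1 == 1 else 1.0 - p1
            m2 = p2 if y2 == 1 else 1.0 - p2
            # Matching outcomes are made more likely, differing ones less likely
            sign = 1.0 if y1 == y2 else -1.0
            joint[(y1, y2)] = m1 * m2 * (1.0 + sign * theta * (1.0 - m1) * (1.0 - m2))
    return joint


joint = fgm_joint_probabilities(0.306, 0.212, distance=1.0)

# Conditional probability that the small tree is harvested given that
# the large tree was harvested
conditional = joint[(1, 1)] / (joint[(1, 1)] + joint[(1, 0)])

for outcome, probability in joint.items():
    print(outcome, round(probability, 3))
print("Pr(y2 = 1 | y1 = 1) =", round(conditional, 3))
```

Rounded to three decimals, the four joint probabilities match the values reported in the example (0.095, 0.211, 0.117, and 0.577), and the conditional probability is 0.310. The four probabilities sum to 1, and summing over the outcomes of one tree returns the marginal probability of the other, which is the property that makes the conditional calculation valid.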

Conclusions
Moderate distance-dependent correlations can be accommodated by the modified FGM copula proposed by Bhat and Sener (2009). In the context of tree harvest modeling, the application of this copula model resulted in an improved fit, as indicated by the maximum log-likelihood and information criteria. Spearman’s correlation coefficients that were directly estimated from the data compared fairly well with the predicted correlations, although the model tended to slightly underestimate the correlation for individuals close to one another and to slightly overestimate it for individuals farther apart. The underestimation for close individuals was due to a limitation inherent to the FGM copula: it cannot accommodate correlations outside the range [−0.304, 0.304]. Another major benefit of this approach is that its likelihood function defines a joint probability distribution, which can be used, for instance, to generate correlated data in Monte Carlo simulations.
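As an illustration of how a joint probability distribution makes Monte Carlo simulation possible, the following Python sketch draws correlated harvest outcomes for a single pair of trees. It is a minimal sketch under stated assumptions, not the authors' procedure: it is restricted to two trees, it reuses the bivariate form and the parameter value 0.170 from the two-tree example in the Discussion, and the function names `fgm_joint_probabilities` and `simulate_pairs` are hypothetical.

```python
import math
import random


def fgm_joint_probabilities(p1, p2, distance, decay=0.170):
    """Joint probabilities of two binary outcomes (assumed bivariate form
    from the two-tree example; decay=0.170 is the value shown there)."""
    theta = math.exp(-decay * distance)
    joint = {}
    for y1 in (1, 0):
        for y2 in (1, 0):
            m1 = p1 if y1 == 1 else 1.0 - p1
            m2 = p2 if y2 == 1 else 1.0 - p2
            sign = 1.0 if y1 == y2 else -1.0
            joint[(y1, y2)] = m1 * m2 * (1.0 + sign * theta * (1.0 - m1) * (1.0 - m2))
    return joint


def simulate_pairs(p1, p2, distance, n_draws, seed=0):
    """Draw n_draws outcome pairs (y1, y2) from the joint distribution."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    joint = fgm_joint_probabilities(p1, p2, distance)
    outcomes = list(joint.keys())
    weights = [joint[outcome] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights, k=n_draws)


draws = simulate_pairs(0.306, 0.212, distance=1.0, n_draws=100_000)

# Simulated share of realizations in which both trees are harvested
share_both = sum(1 for y1, y2 in draws if y1 == 1 and y2 == 1) / len(draws)
print("Simulated Pr(both harvested):", round(share_both, 3))
```

With a large number of draws, the simulated share of realizations in which both trees are harvested should approach the joint probability of about 0.095 obtained in the two-tree example, which is higher than the product of the two marginal probabilities (about 0.065).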

Literature Cited
ALEXANDER, C. 2008. Practical financial econometrics. Market risk analysis II. John Wiley & Sons, New York. 424 p.
ARII, K., J.P. CASPERSEN, T.A. JONES, AND S.C. THOMAS. 2008. A selection harvesting algorithm for use in spatially explicit individual-based forest simulation models. Ecol. Model. 211:251–266.
BATES, D., M. MAECHLER, AND B. BOLKER. 2011. Package “lme4.” R Project Organization. Available online at cran.r-project.org/web/packages/lme4/lme4.pdf; last accessed Mar. 15, 2012.
BHAT, C.R., AND I.N. SENER. 2009. A copula-based closed-form binary logit choice model for accommodating spatial correlation across observational units. J. Geogr. Syst. 11:243–272.
BRESLOW, N.E., AND X. LIN. 1995. Bias correction in generalised linear mixed models with single component of dispersion. Biometrika 82:81–91.
CHOROŚ, B., R. IBRAGIMOV, AND E. PERMIAKOVA. 2010. Copula estimation. P. 77–92 in Copula theory and its application: Proc. of the workshop held in Warsaw, Poland, 25–26 September 2009, Jaworski, P., F. Durante, W. Härdle, and T. Rychlik (eds.). Lecture Notes in Statistics, Proceedings 198, Springer, New York.
ESKELSON, B.N.I., L. MADSEN, J.C. HAGAR, AND H. TEMESGEN. 2011. Estimating riparian understory vegetation cover with beta regression and copula models. For. Sci. 57:212–221.
FORTIN, M., S. BÉDARD, J. DEBLOIS, AND S. MEUNIER. 2008a. Predicting individual tree mortality in northern hardwood stands under uneven-aged management in southern Québec, Canada. Ann. For. Sci. 65:205.
FORTIN, M., S. BÉDARD, J. DEBLOIS, AND S. MEUNIER. 2008b. Accounting for error correlations in diameter increment modeling: A case study applied to northern hardwood stands in Quebec, Canada. Can. J. For. Res. 38:2274–2286.
FORTIN, M., G. DAIGLE, C.-H. UNG, J. BÉGIN, AND L. ARCHAMBAULT. 2007. A variance-covariance structure to take into account repeated measurements and heteroscedasticity in growth modeling. Eur. J. For. Res. 126:573–585.
FOX, J.C., H. BI, AND P.K. ADES. 2007a. Spatial dependence and individual-tree growth models. I. Characterising spatial dependence. For. Ecol. Manage. 245:10–19.
FOX, J.C., H. BI, AND P.K. ADES. 2007b. Spatial dependence and individual-tree growth models. II. Modeling spatial dependence. For. Ecol. Manage. 245:20–30.
GREGOIRE, T.G., O. SCHABENBERGER, AND J.P. BARRETT. 1995. Linear modeling of irregularly spaced, unbalanced, longitudinal data from permanent-plot measurements. Can. J. For. Res. 25:137–156.
GUMBEL, E.J. 1961. Bivariate logistic distributions. J. Am. Stat. Assoc. 56:335–349.
HALL, D.B., AND R.L. BAILEY. 2001. Modeling and prediction of forest growth variables based on multilevel nonlinear mixed models. For. Sci. 47:311–321.
KERSHAW, J.A. JR., E.W. RICHARDS, J.B. MCCARTER, AND S. OBORN. 2010. Spatially correlated forest stand structures: A simulation approach using copulas. Comput. Electron. Agric. 74:120–128.
LIN, X., AND N.E. BRESLOW. 1996. Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 91:1007–1016.
LINDSEY, J.K., AND P. LAMBERT. 1998. On the appropriateness of marginal models for repeated measurements in clinical trials. Stat. Med. 17:447–469.
MCCULLOCH, C.E., S.R. SEARLE, AND J.M. NEUHAUS. 2008. Generalized, linear, and mixed models, 2nd ed. Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ. 384 p.
PINHEIRO, J.C., AND D.M. BATES. 1995. Approximations to the log-likelihood function in the nonlinear mixed-effects model. J. Comput. Graph. Stat. 4(1):12–35.
PINHEIRO, J.C., AND D.M. BATES. 2000. Mixed-effects models in S and S-PLUS. Springer, New York. 528 p.
TRIVEDI, P.K., AND D.M. ZIMMER. 2005. Copula modeling: An introduction for practitioners. Found. Trends Econom. 1:1–111.
WANG, M., K. RENNOLLS, AND S. TANG. 2008. Bivariate distribution modeling of tree diameters and heights: Dependency modeling using copulas. For. Sci. 54:284–293.
WANG, M., A. UPADHYAY, AND L. ZHANG. 2010. Trivariate distribution modeling of tree diameter, height, and volume. For. Sci. 56:290–300.
WOLFINGER, R., R. TOBIAS, AND J. SALL. 1994. Computing Gaussian likelihoods and their derivatives for general linear mixed models. SIAM J. Sci. Comput. 15:1294–1310.
