Simultaneous prediction of plot-level and tree-level ...


This is the manuscript draft as submitted to Forest Science first. The accepted version Manso, R., Ningre, F. and Fortin, M. 2018. Simultaneous prediction of plot-level and tree-level harvest occurrences with correlated random effects. Forest Science. DOI: 10.1093/forsci/fxy015. will be available on-line at the journal site in due course.

Simultaneous prediction of plot-level and tree-level harvest occurrences with correlated random effects

Rubén Manso1,2,*, François Ningre1, and Mathieu Fortin3

1 INRA, UMR 1092 LERFoB, 1 rue de l’Arboretum, 54280, Champenoux, France
2 Forest Research, Northern Research Station, Roslin, Midlothian, UK, EH25 9SY
3 AgroParisTech, UMR 1092 LERFoB, 14 rue Girardet, 54042 Nancy, France

* Corresponding author: e-mail: [email protected], [email protected]

ACKNOWLEDGEMENTS

The authors want to thank all the people involved in the measurements of the permanent-plot network, and especially Daniel Rittié (LERFoB), who gathered and formatted the data. Special thanks are due to Adam Ash (Forest Research), who thoroughly proofread the original version of the manuscript. The UMR 1092 LERFoB is supported by a grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (ANR-11-LABX-0002-01, Lab of Excellence ARBRE).

Management and Policy Implications

The probability that a plot is thinned and the probability that a tree within that plot is harvested are likely to be correlated. Neglecting this correlation would lead to an underestimation of the variance of prediction error in harvest models. As a result, forest managers using growth simulators that include two-level harvest models would have a false idea of precision in stand projections. This may result in wrong decisions in the long run. The proposed method deals with this issue and provides theoretically unbiased variance estimates.


PREDICTION OF PLOT-LEVEL AND TREE-LEVEL HARVEST OCCURRENCES

ABSTRACT

In forestry, harvest models have become popular for forecasting thinning under business-as-usual scenarios. There are two binary processes involved in thinning operations: (i) whether a plot is to be thinned and (ii) whether a particular tree within that plot is to be harvested. These processes can be modeled using logistic regressions. The data used to fit such models come from forest inventories, where the observations are not usually independent. Random effects can be used to deal with these correlations. However, fitting the plot-level and tree-level models independently hinders the estimation of the covariance between the random effects of both models. The objective of this paper was to develop a statistical method for the simultaneous prediction of harvest probabilities at the plot and tree levels in a single mixed-effects model. We developed a maximum likelihood estimator based on the joint distribution of the probability that a given plot is thinned and the probability that a given tree within that plot is harvested. The estimator was derived from a zero-altered binomial form, but it assumed distinct harvest probabilities for each single tree. The estimator was tested in the case study of mixed stands of oak (Quercus spp.) and beech (Fagus sylvatica L.) in Northern France.

Keywords: harvest models, zero-altered models, mixed-effects models, simultaneous estimation, joint distribution

1 INTRODUCTION

Forest dynamics simulators require at least a growth module and a mortality module, which are usually implementations of statistical models. When these simulators are intended to make predictions of forest evolution in managed stands, a harvest module is also needed (Fortin 2014). In contrast to growth and mortality, harvest modules have traditionally relied on algorithms that deterministically decide when, where and how to cut according to some management rules (e.g. Pukkala and Miina 1998). The use of these algorithms is conceptually straightforward if we assume that the realized harvesting strictly follows what was planned. However, for many reasons, the intended management often differs from its practical application. When working with business-as-usual scenarios, statistical harvest models aimed at emulating real management decisions are a competing alternative to traditional algorithms (Antón-Fernández and Astrup 2012).

Technically speaking, harvest models have to deal with a binary outcome at both the plot and the tree levels (i.e. either a plot is thinned or not; either a tree is cut down or not). Logistic regression is the usual statistical technique to model such response variables (cf. Hosmer and Lemeshow 2000). Therefore, the output of harvest models is the probability that a plot be thinned or that a tree be harvested, depending on the level at which the model applies. Among the few papers on this subject, several studies have strictly focused on forecasting the probabilities at the plot level (e.g. Antón-Fernández and Astrup 2012; Melo et al. 2017), sometimes discriminating between different harvest modalities (e.g. Sterba et al. 2000). Fortin (2014) fitted tree-level models using data from plots located in areas with ongoing thinning operations.

A more informative method would be the combination of both levels, allowing for thinned and unthinned plots in the training dataset. This implies two independent logistic models, with potentially different covariates in each of them. These combined models would predict the harvest probability for a particular plot and, given that this plot is harvested, the harvest probability for a particular tree within it. This two-level alternative was developed in the context of forest dynamics simulators by Thurnher et al. (2011). A similar approach was used by Eastaugh and Hasenauer (2012), who substituted predictions of the proportion of harvested stand volume in thinned plots for tree-level harvest probabilities.

Given the hierarchical structure of most forest inventories, correlations between observations from the same plot or measurement year are likely to occur. The consequence of not dealing with these correlations is the underestimation of the variance of the fixed-effects parameter estimates (Gregoire et al. 1995). The correlation issue has been addressed through copulas in tree-level harvest models (Fortin et al. 2013; Delisle-Boulianne et al. 2014). Mixed-effects modeling is an alternative way to cope with correlations and is the most popular in forest science. Standard statistical software such as SAS or R allows for the inclusion of random effects in logistic models and could be easily used to improve the existing two-level harvest models as well. However, a major challenge remains: the covariance between the random effects included in both models cannot be ascertained. This could only be achieved by simultaneously estimating the parameters from both models.

One way to do this is to define the joint distribution of the probability of a plot-level harvest and the probabilities of individual-tree harvests conditional on harvest occurrence at plot level. Depicting this joint probability was the main objective of the present study. The approach was inspired by the mathematical description of the so-called zero-inflated distributions, i.e. distributions holding more zeros than those expected at random (cf. Lambert 1992). Zero-inflated distributions are basically defined through the coupling of a Bernoulli distribution, which yields the probability of observing a zero event, with another distribution that yields the probability of observing non-zero events. This can also be interpreted as an event involving two processes where one is conditional on the other. In the current context, harvesting a particular plot can be seen as a first Bernoulli process, while harvesting a tree within this plot can be interpreted as another Bernoulli process conditional on the first one.

In order to illustrate the approach, we chose the case study of even-aged mixed stands of oak (Quercus spp.) and beech (Fagus sylvatica L.) in Northern France. Specifically, we used a network of permanent plots that was set up to investigate different thinning schemes for both species. A two-level harvest model with mixed effects was fitted to these data and random effects at the plot and tree levels were allowed to be correlated.

2 MATERIAL AND METHODS

2.1 DATA

The data used in the present study were selected from a series of historical and ongoing silvicultural experiments monitored by the Laboratoire d’Etude de Ressources Forêt-Bois (LERFoB) that were set up between 1883 and 1956. This compilation constitutes the LERFoB permanent-plot network, which is primarily aimed at analysing the effect of management on growth and mortality patterns in even-aged sessile oak (Quercus petraea (Matt.) Liebl.) and European beech stands across Northern France. In this respect, low thinnings were applied with the objective of keeping the stand density below a given value. All control plots (i.e. those never thinned) were discarded for the purposes of this study. Given the heterogeneous origin of the data, diameter distribution, plot size and measurement intervals changed notably from one site to another. This information is provided in Table 1 together with the proportion of occasions where thinnings took place.

(Insert Table 1 here)

The data we used comprised over 175 000 records of diameter at breast height (DBH, 1.3 m) from some 33 000 trees within 75 plots in Northern France (Fig. 1) measured between 1922 and 2012. For each DBH measurement, tree status was also recorded (alive, dead or recently harvested; see Table 2 for a summary of the proportions of harvested trees). While beech and/or oak predominate, other species were also present, of which European hornbeam (Carpinus betulus L.) was the most common.

(Insert Fig. 1)

(Insert Table 2 here)

2.2 STATISTICAL DEVELOPMENTS

The probability that a harvest takes place in a plot and that some trees in this plot are then harvested can be thought of as a joint probability distribution derived from many univariate Bernoulli distributions. Let us assign indices $i$, $j$ and $k$ to the plot, the tree and the measurement year, respectively. Plot-level harvest occurrence can be seen as a Bernoulli process with outcome $q_{ik} = 1$ and $\Pr(q_{ik} = 1) = \lambda_{ik}$ if plot $i$ is harvested at year $k$. Otherwise, $q_{ik} = 0$ and $\Pr(q_{ik} = 0) = 1 - \lambda_{ik}$. The probability mass function $f(q_{ik}; \lambda_{ik})$ of plot-level harvest occurrence is then

$$f(q_{ik}; \lambda_{ik}) = \lambda_{ik}^{q_{ik}} (1 - \lambda_{ik})^{1 - q_{ik}} \tag{1}$$

Similarly, the harvest of individual trees is also a Bernoulli process with outcome $u_{ijk} = 1$ when tree $j$ is felled in plot $i$ at year $k$, or $u_{ijk} = 0$ otherwise. As the harvest of a tree is conditional on the harvest occurrence at plot level, the probability that $u_{ijk} = 1$ needs to be defined accordingly as $\Pr(u_{ijk} = 1 \mid q_{ik} = 1) = \pi_{ijk}$. Likewise, the probability of a tree being left standing after a thinning operation is $\Pr(u_{ijk} = 0 \mid q_{ik} = 1) = 1 - \pi_{ijk}$. The joint probability mass function of observing $q_{ik}$ and $\mathbf{u}_{i \cdot k} = (u_{i1k}, u_{i2k}, \dots, u_{i n_{ik} k})^T$, the latter being conditional on $q_{ik} = 1$, is then

$$\Pr(q_{ik}, \mathbf{u}_{i \cdot k}) = (1 - \lambda_{ik})^{1 - q_{ik}} \left( \lambda_{ik} \, \frac{\prod_{j=1}^{n_{ik}} \pi_{ijk}^{u_{ijk}} (1 - \pi_{ijk})^{1 - u_{ijk}}}{1 - \prod_{j=1}^{n_{ik}} (1 - \pi_{ijk})} \right)^{q_{ik}} \tag{2}$$

The denominator $1 - \prod_{j=1}^{n_{ik}} (1 - \pi_{ijk})$ in Eq. 2 is the probability that at least one tree is harvested. This truncation at zero is required since a plot-level harvest occurrence cannot, by definition, result in no harvested tree.
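As an illustration of Eq. 2, the following minimal Python sketch evaluates the joint probability for a single plot and measurement year. The function name `joint_pmf` and its arguments are hypothetical and are not part of the original analysis. The explicit zero returned for outcomes that the zero-altered form rules out (harvested trees in an unthinned plot, or a thinned plot without any harvested tree) is our reading of the truncation described above, not a statement taken from the text.

```python
import math

def joint_pmf(q, u, lam, pi):
    """Joint probability of Eq. 2 for one plot and year (illustrative sketch).

    q   : 1 if the plot was thinned, 0 otherwise
    u   : list of 0/1 tree-level harvest indicators
    lam : plot-level harvest probability (lambda_ik)
    pi  : list of tree-level conditional harvest probabilities (pi_ijk)
    """
    if q == 0:
        # Assumption: an unthinned plot cannot contain harvested trees.
        return (1.0 - lam) if not any(u) else 0.0
    if not any(u):
        # Truncation at zero: a thinned plot has at least one harvested tree.
        return 0.0
    numerator = math.prod(p if ui == 1 else 1.0 - p for ui, p in zip(u, pi))
    prob_at_least_one = 1.0 - math.prod(1.0 - p for p in pi)
    return lam * numerator / prob_at_least_one
```

Under these assumptions the probabilities of all admissible outcomes sum to one, which is a quick way to check the role of the denominator.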

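If parameters $\lambda_{ik}$ and $\pi_{ijk}$ are modeled as functions of some covariates, then it is possible to link the harvest probabilities to management, site conditions or other factors thought to influence the occurrence of harvest at plot and tree levels. In order to link $\lambda_{ik}$ and $\pi_{ijk}$ to a set of covariates $(\mathbf{x}_{ik}, \mathbf{z}_{ijk})$, we used the logit link function (McCullagh and Nelder 1989), so that

$$\lambda_{ik} = \frac{e^{\mathbf{x}_{ik} \boldsymbol{\beta}}}{1 + e^{\mathbf{x}_{ik} \boldsymbol{\beta}}} \tag{3}$$

$$\pi_{ijk} = \frac{e^{\mathbf{z}_{ijk} \boldsymbol{\gamma}}}{1 + e^{\mathbf{z}_{ijk} \boldsymbol{\gamma}}} \tag{4}$$

where $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are vectors of parameters. In the end, a model likelihood can be expressed on the basis of Eq. 2:

$$L(\boldsymbol{\beta}, \boldsymbol{\gamma} \mid \mathbf{q}, \mathbf{u}, \mathbf{X}, \mathbf{Z}) = \prod_i \prod_k \Pr(q_{ik}, \mathbf{u}_{i \cdot k} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}, \mathbf{x}_{ik}, \mathbf{Z}_{i \cdot k}) \tag{5}$$

where matrix $\mathbf{Z}_{i \cdot k}$ has its rows equal to the $\mathbf{z}_{ijk}$.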
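The fixed-effects likelihood of Eq. 5 can be sketched in a few lines of Python, working on the log scale for numerical stability. This is only an illustration under stated assumptions: the data layout (a list of tuples, one per plot and year) and the names `inv_logit` and `log_likelihood` are hypothetical, and the study itself relied on SAS rather than on this code.

```python
import numpy as np

def inv_logit(eta):
    """Inverse of the logit link used in Eqs. 3 and 4."""
    return 1.0 / (1.0 + np.exp(-eta))

def log_likelihood(beta, gamma, plots):
    """Log of Eq. 5 without random effects (illustrative sketch).

    plots is an iterable of tuples (q, x, u, Z), one per plot and year:
      q : 0/1 plot-level harvest indicator
      x : 1-D array of plot-level covariates
      u : 1-D array of 0/1 tree-level indicators (ignored when q == 0)
      Z : 2-D array whose rows are the tree-level covariate vectors
    """
    total = 0.0
    for q, x, u, Z in plots:
        lam = inv_logit(x @ beta)
        if q == 0:
            total += np.log1p(-lam)  # log(1 - lambda_ik)
            continue
        pi = inv_logit(Z @ gamma)
        log_num = np.sum(u * np.log(pi) + (1 - u) * np.log1p(-pi))
        log_den = np.log1p(-np.prod(1.0 - pi))  # log Pr(at least one tree cut)
        total += np.log(lam) + log_num - log_den
    return total
```

A general-purpose optimizer could then be applied to the negative of this function to obtain maximum likelihood estimates of the fixed effects.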

Due to the hierarchical structure of the data, the observations within the same plot and year may not be independent. Random effects can be specified in the model in order to relax the assumption of independence. In order to simplify the derivation, we will focus on plot random effects only, but the developments can be extended to year random effects as well.

Random effects were included in Eqs. 3 and 4 so that

$$\lambda_{ik}(b_{i,1}) = \frac{e^{\mathbf{x}_{ik} \boldsymbol{\beta} + b_{i,1}}}{1 + e^{\mathbf{x}_{ik} \boldsymbol{\beta} + b_{i,1}}} \tag{6}$$

$$\pi_{ijk}(b_{i,2}) = \frac{e^{\mathbf{z}_{ijk} \boldsymbol{\gamma} + b_{i,2}}}{1 + e^{\mathbf{z}_{ijk} \boldsymbol{\gamma} + b_{i,2}}} \tag{7}$$

where $\mathbf{b}_i = (b_{i,1}, b_{i,2})^T$ is a vector of plot random effects, which is assumed to follow a bivariate normal distribution, such that $\mathbf{b}_i \sim N_2(\mathbf{0}, \mathbf{G})$.

Adapting Eq. 5 accordingly leads to a conditional likelihood that depends on the unobserved random effects $\mathbf{b}_i$. Parameter estimation relies on this conditional likelihood marginalized over the distribution of the random effects (Pinheiro and Bates 2000, p. 62):

$$L(\boldsymbol{\beta}, \boldsymbol{\gamma}, \mathbf{G} \mid \mathbf{q}, \mathbf{u}, \mathbf{X}, \mathbf{Z}) = \prod_i \int \prod_k \Pr(q_{ik}, \mathbf{u}_{i \cdot k} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}, \mathbf{x}_{ik}, \mathbf{Z}_{i \cdot k}, \mathbf{b}_i) \, \mathrm{pdf}(\mathbf{b}_i, \mathbf{G}) \, d\mathbf{b}_i \tag{8}$$

where $\mathrm{pdf}(\mathbf{b}_i, \mathbf{G})$ is the density of the bivariate normal distribution with mean $\mathbf{0}$ and variance-covariance $\mathbf{G}$. This complex likelihood can be maximized using the NLMIXED procedure available in SAS (SAS Institute Inc. 2008). An example of the code we used in this study is shown in Appendix A.

Matrix $\mathbf{G}$ contains the variances of $b_{i,1}$ and $b_{i,2}$ on its diagonal whereas the off-diagonal elements consist of the covariance between $b_{i,1}$ and $b_{i,2}$:

$$\mathbf{G} = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} \\ \sigma_{1,2} & \sigma_2^2 \end{pmatrix} \tag{9}$$

It can be reasonably assumed that the covariance between random effects $b_{i,1}$ and $b_{i,2}$ is non-null, which means that the harvest probabilities at plot and tree levels would be somehow correlated. This covariance parameter $\sigma_{1,2}$ can be estimated together with the rest of the parameters through the maximization of the likelihood in Eq. 8. In contrast, if we assume that $\sigma_{1,2} = 0$, then the model would be equivalent to fitting a plot-level and a tree-level model independently from each other. We will refer to this last model as the constrained model as opposed to the unconstrained model in which the covariance is allowed to be non-null.
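To make the marginalization in Eq. 8 concrete, the sketch below integrates an arbitrary conditional likelihood over a bivariate normal distribution with variance-covariance matrix G, using a tensor-product Gauss-Hermite rule and the Cholesky factor of G. It is a simplified, non-adaptive stand-in written for illustration only; the function name `marginalize` and the number of nodes are assumptions, and the quadrature settings of the SAS fit are not reproduced here.

```python
import numpy as np

def marginalize(cond_lik, G, n_nodes=15):
    """Approximate the integral of cond_lik(b) * pdf(b; 0, G) over b.

    cond_lik : function of a length-2 array b = (b_1, b_2), for instance the
               product over years of Eq. 2 evaluated with Eqs. 6 and 7
    G        : 2 x 2 variance-covariance matrix of the random effects (Eq. 9)
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    chol = np.linalg.cholesky(G)  # G = chol @ chol.T
    total = 0.0
    for z1, w1 in zip(nodes, weights):
        for z2, w2 in zip(nodes, weights):
            # Change of variables from the Hermite weight exp(-z^2) to N2(0, G)
            b = np.sqrt(2.0) * chol @ np.array([z1, z2])
            total += w1 * w2 * cond_lik(b)
    return total / np.pi
```

Setting the off-diagonal element of G to zero in such a routine corresponds to the constrained model, so both fits could share the same code path.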

2.3 MODEL SPECIFICATIONS AND EVALUATION

Different variables were tested sequentially in vectors $\mathbf{x}_{ik}$ and $\mathbf{z}_{ijk}$. Because our dataset included mixed stands, a species effect was specified in interaction with the other covariates. This species effect had four classes: “oak”, “beech”, “hornbeam” and “others”, the last one grouping all other marginal species.

Preliminary fits were carried out on a model without random effects and a visual check of the Pearson residuals (Hosmer and Lemeshow 2000, p. 155) at the plot and the tree level served to assess a suitable combination of variables in $\mathbf{x}_{ik}$ and $\mathbf{z}_{ijk}$. These preliminary tests showed that $\pi_{ijk}$ exhibited a complex pattern with respect to the relative DBH, which was calculated as $\mathrm{rDBH}_{ijk} = \mathrm{DBH}_{ijk} / \mathrm{MQD}_{ik}$, where $\mathrm{MQD}_{ik}$ is the mean quadratic diameter of plot $i$ at year $k$. A segmented regression was then used to accommodate this pattern. Two segments were required, with $\mathrm{rDBH}_{ijk} = 0.9$ being set as the heuristic joint between them. At the plot level the time elapsed since the last harvest ($\mathrm{LASTCUT}_{ik}$) seemed to be the covariate that best predicted the probability of harvest. For the early measurements the time since the last harvest was unobserved, either because the plots had never been harvested or because the last harvest was carried out before the beginning of the monitoring. In these cases, plot basal area ($\mathrm{BA}_{ik}$) was used instead.

The preliminary model structure was

$$\mathbf{x}_{ik} \boldsymbol{\beta} = \beta_0 + \beta_1 \mathrm{BA}_{ik} r_{ik} + \beta_2 \mathrm{LASTCUT}_{ik} (1 - r_{ik}) \tag{10}$$

$$\mathbf{z}_{ijk} \boldsymbol{\gamma} = \gamma_0 + (\gamma_{1,s} + \gamma_{2,s} m_{ijk}) d_{ijk} + (\gamma_3 + \gamma_4 (1 - m_{ijk})) d_{ijk}^2 \tag{11}$$

where $r_{ik}$ is a dummy variable that takes the value 1 if the time elapsed since the last harvest is unknown or 0 otherwise; $d_{ijk} = \mathrm{rDBH}_{ijk} - 0.9$ is a transformation that allows both segments to converge to the same value at the joint; $m_{ijk}$ is another dummy variable that equals 1 if $d_{ijk} \geq 0$, and 0 otherwise; $s$ stands for the species index.
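The two linear predictors of Eqs. 10 and 11 translate directly into code. The sketch below is illustrative only: the function names are hypothetical, an unknown time since the last harvest is encoded as `None` (our convention), and the species-specific parameters are passed in already selected for the species of the tree.

```python
def plot_linear_predictor(beta, basal_area, lastcut):
    """Eq. 10. lastcut is None when the time since the last harvest is unknown."""
    b0, b1, b2 = beta
    r = 1 if lastcut is None else 0  # dummy r_ik
    years = 0.0 if lastcut is None else lastcut
    return b0 + b1 * basal_area * r + b2 * years * (1 - r)

def tree_linear_predictor(g0, g1_s, g2_s, g3, g4, dbh, mqd, joint=0.9):
    """Eq. 11 for a tree of species s, with the segments joined at rDBH = 0.9."""
    d = dbh / mqd - joint      # d_ijk = rDBH_ijk - 0.9
    m = 1 if d >= 0 else 0     # dummy m_ijk
    return g0 + (g1_s + g2_s * m) * d + (g3 + g4 * (1 - m)) * d ** 2
```

Because both the linear and the quadratic terms vanish at d = 0, the two segments return the same value, namely the intercept, at the joint, which is the convergence property mentioned above.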

The aforementioned Pearson residuals were computed over regular intervals of a particular covariate to identify any lack of fit. For each interval $g$ of this covariate, we defined a subset of observed events as well as their corresponding predicted probabilities. If $\bar{y}_g$ and $\hat{\bar{y}}_g$ are the means of the observations and the predictions, respectively, then the Pearson residual for interval $g$ ($\mathrm{res}_g$) can be calculated as

$$\mathrm{res}_g = \frac{\bar{y}_g - \hat{\bar{y}}_g}{\sqrt{\hat{\bar{y}}_g (1 - \hat{\bar{y}}_g) / n_g}} \tag{12}$$

where $n_g$ is the number of observations in interval $g$. As a result of the joint probability used in Eq. 2, the observations were $q_{ik} \cdot u_{ijk}$ whereas the predictions were calculated as $\hat{\lambda}_{ik} \cdot \hat{\pi}_{ijk}$.
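A minimal sketch of the grouped Pearson residuals of Eq. 12 is given below. The function name, the half-open intervals defined by `edges`, and the dictionary output are illustrative choices and are not taken from the original analysis.

```python
import math

def pearson_residuals(covariate, observed, predicted, edges):
    """Pearson residual (Eq. 12) for each interval [edges[g], edges[g + 1]).

    observed  : 0/1 values, here q_ik * u_ijk
    predicted : joint predicted probabilities, here lambda_ik * pi_ijk
    Returns a dict mapping the interval index g to its residual.
    """
    groups = {}
    for x, y, p in zip(covariate, observed, predicted):
        for g in range(len(edges) - 1):
            if edges[g] <= x < edges[g + 1]:
                groups.setdefault(g, []).append((y, p))
                break
    residuals = {}
    for g, pairs in groups.items():
        n_g = len(pairs)
        y_bar = sum(y for y, _ in pairs) / n_g
        p_bar = sum(p for _, p in pairs) / n_g
        residuals[g] = (y_bar - p_bar) / math.sqrt(p_bar * (1.0 - p_bar) / n_g)
    return residuals
```

Plotting these residuals against the interval midpoints gives the kind of visual check described in the text.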

Once the covariates were specified, the model was re-fitted with plot and year random effects. Two fits were carried out: a first one where random effects were allowed to correlate (unconstrained model) and a second one with independent random effects (constrained model).

A major issue with the predictions of this kind of mixed-effects model is that the predictions with the random effects set to 0 are not population-averaged predictions (McCulloch et al. 2008, p. 190). One method for obtaining population-averaged predictions is to integrate the predictions that are conditional on the random effects over the distribution of the random effects, just as is done for the likelihood function in Eq. 8. This integral has no closed-form solution but can be approximated using Gauss-Hermite quadrature (Fortin 2013).
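The difference between a prediction with the random effect set to 0 and a population-averaged prediction can be illustrated with a single normally distributed random effect on the logit scale. The sketch below is a simplified univariate case under our own assumptions (function name, number of nodes); the joint predictions of the study involve two correlated random effects and would require a bivariate rule.

```python
import numpy as np

def population_averaged(eta, sigma, n_nodes=20):
    """Approximate E[inv_logit(eta + b)] for b ~ N(0, sigma^2).

    eta   : fixed-effects linear predictor
    sigma : standard deviation of the random effect
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes  # map Hermite nodes to N(0, sigma^2)
    conditional = 1.0 / (1.0 + np.exp(-(eta + b)))
    return float(np.sum(weights * conditional) / np.sqrt(np.pi))
```

For a positive linear predictor the averaged probability falls between 0.5 and the conditional prediction at b = 0, which is why setting the random effects to 0 does not yield a population-averaged value.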

Pearson residuals were computed based on approximated population-averaged predictions and their patterns with respect to the different covariates were checked again. The goodness-of-fit of the resulting models was assessed using Akaike’s Information Criterion (AIC).

The unconstrained model was also evaluated based on the Hosmer-Lemeshow test (Hosmer and Lemeshow 2000). The test relies on a statistic that is computed similarly to the aforementioned Pearson residuals. Predictions of the joint probability that a plot be harvested and that a particular tree in this plot be harvested were ranked and grouped according to prediction deciles. For each group the squared differences between the mean value of predictions and observations were standardized and summed. The Hosmer-Lemeshow statistic is asymptotically distributed as a $\chi^2$ with 8 degrees of freedom under the null hypothesis that observed values do not significantly differ from the predictions (Hosmer and Lemeshow 2000). All predictions were pseudo-independent and population-averaged. This was achieved through a 10-fold cross-validation and the marginalisation of the resulting predictions.
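The statistic described above can be sketched as the sum of squared group-level Pearson residuals over groups of ranked predictions. This is one common way of writing the Hosmer-Lemeshow statistic and is offered as an assumption about the computation, not as the exact code used in the study; the function name and the equal-size grouping rule are ours.

```python
def hosmer_lemeshow(observed, predicted, n_groups=10):
    """Hosmer-Lemeshow-type statistic on groups of ranked predictions.

    observed  : 0/1 outcomes
    predicted : predicted probabilities, strictly between 0 and 1
    Requires at least n_groups observations. The statistic is usually
    compared with a chi-square distribution with n_groups - 2 degrees
    of freedom.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        n_g = len(chunk)
        p_bar = sum(p for p, _ in chunk) / n_g
        y_bar = sum(y for _, y in chunk) / n_g
        # Squared standardized difference between observed and predicted means
        statistic += n_g * (y_bar - p_bar) ** 2 / (p_bar * (1.0 - p_bar))
    return statistic
```

With the default of ten groups this corresponds to the decile grouping and the 8 degrees of freedom mentioned in the text.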

3 RESULTS

When simultaneously tested in both parts of the unconstrained model, only the year random effects improved the model fit, with an AIC of 98 345 compared with 103 041 for a model without random effects. The resulting parameter estimates are shown in Table 3. The unconstrained model proved only slightly better than the constrained one (AIC 98 353).

(Insert Table 3 here)

Although the Pearson residuals did not show any evidence of lack of fit when the unconstrained model was being fitted, the Hosmer-Lemeshow statistic revealed a significant lack of agreement between observations and predictions ($\chi^2_8 = 1963.99$; p-value < 0.0001). Graphical comparison of ranked observations and predictions suggested that this divergence could be related to an underestimation of the events within the 10th decile (Fig. 2). Concordance was evident for all other groups.

(Insert Fig. 2 here)

Based on the estimated elements of matrix $\mathbf{G}$, the correlation between the random effects could be roughly estimated as $\hat{\rho} = \hat{\sigma}_{1,2} / \sqrt{\hat{\sigma}_1^2 \hat{\sigma}_2^2}$. In our case study, a positive correlation of 0.66 was found. This means that if the probability that a plot is harvested is higher than the average of the population for a given year, the individual-tree probabilities would also be larger than those of the population.

In Figs. 3 and 4 we graphically illustrate how the different covariates affect the probabilities that a plot or a tree is harvested. The ranges of the covariates were set according to the values found in the dataset. At the tree level, we chose two different types of plots, a plot in a young stand and a plot in a mature stand. The mean quadratic diameter in these two plots was assumed to be 20 cm and 50 cm for the young and the mature stands, respectively.

(Insert Fig. 3 here)

The plot-level harvest probability increased from 0.45 to 0.70 as the time since the last harvest increased from 1 to 10 years (Fig. 3a). When the date of the last harvest was unknown, the predicted probability increased smoothly with plot basal area, from approximately 0.6 at 18 m² ha⁻¹ to 0.7 at 40 m² ha⁻¹ (Fig. 3b).

The effect of rDBH on the tree-level harvest probabilities followed a similar pattern in what we defined as a mature and a young stand (Fig. 4). Oak trees with small rDBH, i.e. when DBH is smaller than the mean quadratic diameter, were more intensely harvested than beech trees. For an rDBH of 0.50, the predicted probabilities of oak were approximately twice those of beech in both stands. Larger trees of both species were seldom harvested, with probabilities of 0.10 when the rDBH was close to 1.5. It was only when DBH was twice as large as the plot mean quadratic diameter that the predicted probabilities increased for beech trees, with values close to 0.25. In contrast, oak individuals did not seem to be harvested in practice in these situations. Such large trees with rDBH close to 2 only occurred in the young stand.

(Insert Fig. 4 here)

4 DISCUSSION

The present study lays the groundwork for simultaneous parameter estimation in two-level harvest models. The statistical developments needed for this implementation represent relevant progress in the current state of the art of zero-inflated models. However, the main achievement of our approach stems from the fact that the full variance-covariance matrix of the random effects can be estimated when random effects are present in both the plot-level and the tree-level parts of the model.

Using zero-inflated mixed models with correlated random effects is not new in forestry. Calama et al. (2011) fitted a zero-inflated log-normal model to forecast the weight of cones produced by a pine species, whereas Manso et al. (2014) worked with a zero-inflated binomial model to predict seed predation. In both cases the random effect correlation was taken into account. In contrast, two-level harvest models had never been analysed under a zero-inflated/altered framework. As a consequence, a simultaneous parameter estimation was not possible and, in turn, the correlation between the random effects could not be considered. The fact that the likelihood formulation of two-level harvest models with mixed effects is more complicated than those of the previous examples may have prevented modelers from testing this approach.

In forestry, zero-inflated and zero-altered models have been aimed at modeling various responses such as recruitment (Fortin and DeBlois 2007; Ledo et al. 2015), seed predation (Manso et al. 2014), cone production (Calama et al. 2011), wildfire damage (Guo et al. 2016) or LiDAR-based tree detection (Korhonen et al. 2016). One application that falls conceptually closer to harvesting is that of tree mortality modeling. In this respect some authors have proposed the Poisson distribution as the underlying process driving mortality in a given plot, provided that mortality takes place at all, so that the number of dead trees in such a context can be represented as a count variable (Affleck 2006).

A weakness of the Poisson distribution is that it has no upper bound whereas there is only a finite number of individuals that can die in a given plot. In order to deal with this issue an offset variable, also called the exposure, can be specified (McCullagh and Nelder 1989, p. 206). As a result, model predictions can be interpreted as a proportion. A more natural approach would be to treat the conditional process as a binary response as well, which would lead to a zero-inflated/altered binomial (e.g. Hall 2000). The binomial distribution therefore provides the expectation that a tree dies, provided that all trees share the same probability.

The same rationale could apply to achieve simultaneous parameter estimation in two-level harvest models as follows: given that a plot is going to be harvested, the number of harvested trees could be assumed Poisson- or binomial-distributed. A major limitation of either approach is that the same probability is assumed for every single tree, as mentioned before. This assumption is not valid as shown in previous works on the harvesting of individual trees (Thurnher et al. 2011; Fortin et al. 2013; Delisle-Boulianne et al. 2014; Fortin 2014). In partial harvesting each tree has a distinctive probability of being harvested that depends on a set of covariates. The Poisson approach does not allow for an individual probability assignment whilst the binomial distribution can be adapted. Given that a binomial process is simply the repetition of a Bernoulli trial, the binomial distribution that represents the harvesting of the individual trees can be derived from the individual probabilities. This concept has been defined as a binomial distribution with unequal probabilities (see Friedman 1984).
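The idea of a binomial distribution with unequal probabilities can be illustrated by building the distribution of the number of harvested trees one Bernoulli trial at a time. The sketch below is a generic dynamic-programming construction written for illustration; the function name is hypothetical and the code is not taken from the cited reference.

```python
def unequal_binomial_pmf(probs):
    """Distribution of the number of successes among independent Bernoulli
    trials with unequal probabilities.

    probs : list of individual probabilities, e.g. the pi_ijk of one plot
    Returns a list pmf where pmf[k] = Pr(exactly k trees are harvested).
    """
    pmf = [1.0]  # with no trees, zero harvested trees is certain
    for p in probs:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1.0 - p)  # this tree is left standing
            updated[k + 1] += mass * p      # this tree is harvested
        pmf = updated
    return pmf
```

The first element of the returned list is the probability that no tree is harvested, which is the quantity removed by the zero truncation in Eq. 2.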

The distinction between zero-inflated and zero-altered models lies in the truncation of zero in the second distribution that composes the joint distribution (Zeileis et al. 2008). This truncation implies that the zeros cannot be observed if a harvest is carried out. This was true in our case study because we modeled the plot-level occurrence and the plots were large in area. If the plots are small and the focus is on the stand-level occurrence, it may happen that no trees are harvested in a given plot simply because it was too small. In such a case a pure zero-inflated formulation would be more appropriate.

The sign of the correlation between the random effects of the plot-level and the tree-level parts in a zero-inflated model like that applied in our case study can be interpreted. A positive correlation implies that the random effects tend to be both positive or both negative. If the random effects are positive then the plot has a higher probability of being harvested and, if it is, more trees will be harvested within that plot. Negative random effects indicate that a plot has a lower probability of being harvested and, even if it is, fewer trees will be harvested. In practice, the main implication of a positive correlation is that the deviation of predictions with respect to the mean would be larger than what is predicted when the random effects are uncorrelated. Under the assumption of independent random effects, probabilities higher than the mean at the plot level may be compensated by lower probabilities at the tree level and vice versa, which would eventually lead to underestimating the true variance of the prediction errors. It is also well known that neglecting random effects leads to an underestimation of the variance of the estimates of the fixed effects (Gregoire et al. 1995), which may result in an additional impact on the uncertainty of predictions.

The method presented in this study was developed to account for the effects of correlations between random effects in uncertainty assessment, but it was not possible to quantify the implications of ignoring them. The unconstrained model did not markedly outperform the constrained one in terms of AIC, making further comparisons via simulation exercises redundant. Nevertheless, our approach makes it possible to test this possibility and to compare the model with simpler alternatives, which may prove useful in other cases.

Concerning covariates, the variable with the strongest effect on the probability of harvest at the plot level was the time elapsed since the last harvest; however, this variable may be unknown when harvesting has not yet occurred or took place prior to monitoring. Such gaps in the data can be handled with dummy variables, and we used the plot basal area in place of the time since the last harvest when the latter was unknown. In preliminary trials we assumed that basal area would be the major driver of harvest occurrence and that the time elapsed since the last harvest would not be needed; however, the preliminary model resulted in a poor fit. The significant effect of the time since the last harvest on predictions of plot-level harvest occurrence can be explained by the management of these stands. Once the stem exclusion stage as defined by Oliver and Larson (1996) is over, these stands are usually thinned on a more or less regular basis, and density is not a major criterion in management (see ONF 2007; Sardin 2008).
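
The dummy-variable switch can be sketched as follows. The function name, argument names and coefficient values are hypothetical; the structure mirrors the plot-level linear predictor of the SAS listing in the appendix, under the assumption that the variable `ST` in that listing is the plot basal area.

```python
import math


def plot_harvest_probability(basal_area, time_since_last_cut, g, v=0.0):
    """Plot-level harvest probability with a switch for unknown harvest history.

    time_since_last_cut is None when the last harvest is unknown.
    g = (g0, g1, g2, g3) are hypothetical coefficients; v is the year random effect.
    """
    if time_since_last_cut is None:
        linear = g[0] + g[1] * basal_area           # basal area stands in
    else:
        linear = g[2] + g[3] * time_since_last_cut  # known harvest history
    return 1.0 / (1.0 + math.exp(-(linear + v)))


g_example = (-2.0, 0.1, -3.0, 0.5)  # illustrative values only
prob_unknown = plot_harvest_probability(20.0, None, g_example)  # uses basal area
prob_known = plot_harvest_probability(20.0, 6.0, g_example)     # uses time since last cut
print(prob_unknown, prob_known)
```

Each branch has its own intercept and slope, so the two sub-models are estimated jointly without forcing a common scale on basal area and elapsed time.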

At the tree level, the relative diameter and the species group were the two variables found to have a significant effect on harvest occurrence. Tree diameter and species were also the most important covariates in other studies (Fortin 2014; Thurnher et al. 2011). In this respect, we found lower probabilities for larger stems, in contrast to the findings of other studies (e.g. Fortin 2014). Management differences may explain this divergence. In Fortin (2014) the stands were mainly uneven-aged and managed using selection cutting, while in our case the stands are even-aged and low thinnings are applied. Therefore, it is not surprising to find higher probabilities of harvest for smaller stems (Fig. 4). The different species-specific patterns found in the present study are related to the species' ecology. For instance, the fact that oak is harvested more than beech is related to its shade tolerance; beech is very shade-tolerant whereas oak is intermediate. Suppressed oak trees have a low probability of survival, as shown in Manso et al. (2015). In this context, forest managers prefer to harvest them while they are still alive and valuable.

5 CONCLUSIONS

The covariance of random effects between plot-level and tree-level harvest models cannot be estimated if the models are fitted independently. In the present study, an approach inspired by zero-inflated modeling was developed that simultaneously estimates the parameters at both plot and tree levels and thereby the covariance of the random effects. The method was applied in a case study and compared to the classical approach of estimating the parameters independently. The method presented did not clearly outperform the existing approach; however, we believe that it is still a relevant contribution to harvest modeling, given the potential implications of neglecting random effect covariance on uncertainty assessment.

REFERENCES

Affleck, D. L. (2006). Poisson mixture models for regression analysis of stand-level mortality. Canadian Journal of Forest Research, 36(11):2994–3006.

Antón-Fernández, C. and Astrup, R. (2012). Empirical harvest models and their use in regional business-as-usual scenarios of timber supply and carbon stock development. Scandinavian Journal of Forest Research, 27(4):379–392.

Calama, R., Mutke, S., Tomé, J., Gordo, J., Montero, G., and Tomé, M. (2011). Modelling spatial and temporal variability in a zero-inflated variable: The case of stone pine (Pinus pinea L.) cone production. Ecological Modelling, 222(3):606–618.

Delisle-Boulianne, S., Fortin, M., Achim, A., and Pothier, D. (2014). Modelling stem selection in northern hardwood stands: assessing the effects of tree vigour and spatial correlations using a copula approach. Forestry, 87:607–617.

Eastaugh, C. S. and Hasenauer, H. (2012). A statistical thinning model for initialising large-scale ecosystem models. Scandinavian Journal of Forest Research, 27(6):567–577.

Fortin, M. (2013). Population-averaged predictions with generalized linear mixed-effects models in forestry: an estimator based on Gauss-Hermite quadrature. Canadian Journal of Forest Research, 43:129–138.

Fortin, M. (2014). Using a segmented logistic model to predict trees to be harvested in forest growth forecasts. Forest Systems, 23(1):139.

Fortin, M. and DeBlois, J. (2007). Modeling tree recruitment with zero-inflated models: The example of hardwood stands in southern Quebec, Canada. Forest Science, 53(4):529–539.

Fortin, M., Delisle-Boulianne, S., and Pothier, D. (2013). Considering spatial correlations between binary response variables in forestry: an example applied to tree harvest modeling. Forestry, 59(3):253–260.

Friedman, M. F. (1984). On the extended binomial distribution. Computers & Operations Research, 11(3):241–243.

Gregoire, T., Schabenberger, O., and Barrett, J. (1995). Linear modelling of irregularly spaced, unbalanced, longitudinal data from permanent-plot measurements. Canadian Journal of Forest Research, 25:137–156.

Guo, F., Wang, G., Innes, J., Ma, Z., Liu, A., and Lin, Y. (2016). Comparison of six generalized linear models for occurrence of lightning-induced fires in northern Daxing’an Mountains, China. Journal of Forestry Research, 27(2):379–388.

Hall, D. (2000). Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics, 56:1030–1039.

Hosmer, D. J. and Lemeshow, S. (2000). Applied logistic regression. John Wiley & Sons, New York, 2nd edition.

Korhonen, L., Salas, C., Østgård, T., Lien, V., Gobakken, T., and Næsset, E. (2016). Predicting the occurrence of large-diameter trees using airborne laser scanning. Canadian Journal of Forest Research, 46(4):461–469.

Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1–14.

Ledo, A., Cayuela, L., Manso, R., and Condés, S. (2015). Recruitment of woody plants in a cloud forest: a combination of spatial mechanisms. Journal of Vegetation Science, 26(5):876–888.

Manso, R., Morneau, F., Ningre, F., and Fortin, M. (2015). Incorporating stochasticity from extreme climatic events and multi-species competition relationships into single-tree mortality models. Forest Ecology and Management, 354:243–253.

Manso, R., Pardos, M., and Calama, R. (2014). Climatic factors control rodent seed predation in Pinus pinea L. stands in Central Spain. Annals of Forest Science, 71(8):873–883.

McCullagh, P. and Nelder, J. A. (1989). Generalized linear models. Monographs on Statistics and Applied Probability 37. Chapman & Hall, New York, 2nd edition.

McCulloch, C., Searle, S., and Neuhaus, J. M. (2008). Generalized, linear, and mixed models. John Wiley & Sons, New York.

Melo, L. C., Schneider, R., Manso, R., Saucier, J.-P., and Fortin, M. (2017). Using survival analysis to predict the harvesting of forest stands in Quebec, Canada. Canadian Journal of Forest Research, 0:accepted.

Oliver, C. D. and Larson, B. C. (1996). Forest Stand Dynamics. Updated Edition. John Wiley and Sons, New York, USA.

ONF (2007). Gestion des hêtraies dans les forêts publiques françaises. Office National des Forêts.

Pinheiro, J. and Bates, D. (2000). Mixed effects models in S and S-PLUS. Springer, New York.

Pukkala, T. and Miina, J. (1998). Tree-selection algorithms for optimizing thinning using a distance-dependent growth model. Canadian Journal of Forest Research, 28:693–702.

Sardin, T. (2008). Chênaies continentales. Guide des sylvicultures. Office National des Forêts.

SAS Institute Inc. (2008). SAS/STAT 9.2 User’s Guide. SAS Institute Inc., Cary, NC.

Sterba, H., Golser, M., Moser, M., and Schadauer, K. (2000). A timber harvesting model for Austria. Computers and Electronics in Agriculture, 28(2):133–149.

Thurnher, C., Klopf, M., and Hasenauer, H. (2011). Forests in transition: a harvesting model for uneven-aged mixed species forests in Austria. Forestry, 84(5):517–526.

Zeileis, A., Kleiber, C., and Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8).

A IMPLEMENTATION OF THE MODEL IN SAS

The following code reproduces the elements needed to fit the two-level harvest model introduced in the present paper under the unconstrained assumption for the variance-covariance matrix of the random effects.

/*******************************************************************/
proc nlmixed data=... ;
  parms
    g0 = , ... , g3 =        /* Initial values of parameters, plot-level */
    b0 = , ... , b10 =       /* Initial values of parameters, tree-level */
    su2 = , sv2 = , suv = ;  /* Initial values of the random-effect variances and covariance */
  /* Arrays for the different covariates. 1433 is the maximum number of
     observations in a plot in the present example. Array names match the
     names used in the linear predictor below. */
  array ssd [1433] ssd1-ssd1433; /* scaled mean square diameter */
  array bee [1433] bee1-bee1433; /* dummy species = beech */
  array oak [1433] oak1-oak1433; /* dummy species = oak */
  array hor [1433] hor1-hor1433; /* dummy species = hornbeam */
  array oth [1433] oth1-oth1433; /* dummy species = others */
  array thr [1433] thr1-thr1433; /* dummy dbh > threshold */
  array cut [1433] cut1-cut1433; /* dummy cut tree */
  allcutProb = 1;   /* accumulates the probability that no tree is cut */
  likTreeLevel = 0; /* accumulates the tree-level log-likelihood */
  do i = 1 to N; /* N is the number of trees in each plot */
    marginalLinearTermsTree = b0 +
      (b1*bee[i] + b2*oak[i] + b3*hor[i] + b4*oth[i] +
      (b5*bee[i] + b6*oak[i] + b7*hor[i] + b8*oth[i])*thr[i])*ssd[i] +
      (b9 + b10*thr[i])*(ssd[i]**2);
    linearTermsTree = marginalLinearTermsTree + u;
    pi = exp(linearTermsTree)/(1 + exp(linearTermsTree));
    allcutProb = allcutProb*(1-pi);
    likTreeLevel = likTreeLevel + cut[i]*log(pi) + (1-cut[i])*log(1-pi);
  end;
  marginalLinearTermsPlot = notCutYet*(g0 + g1*ST) +
                            alreadyCut*(g2 + g3*timeSinceLastCut);
  linearTermsPlot = marginalLinearTermsPlot + v;
  p = exp(linearTermsPlot)/(1 + exp(linearTermsPlot));
  /* cutPlot: dummy thinned plot. The last term removes the outcome
     "plot thinned but no tree cut" from the tree-level distribution. */
  logLik = (1-cutPlot)*log(1 - p) + cutPlot*log(p) + cutPlot*likTreeLevel
           - cutPlot*log(1 - allcutProb);
  model cutPlot ~ general(logLik);
  random u v ~ normal([0, 0], [su2, suv, sv2]) subject=year;
run;
/*******************************************************************/
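
The structure of the conditional log-likelihood in this listing can be checked with a short Python sketch. It is an illustration rather than a port of the SAS code: the function name and the two-tree example are hypothetical, and it assumes that the final term of `logLik` is subtracted, which is what turns the tree-level part into a distribution truncated at "no tree cut".

```python
import math
from itertools import product


def plot_log_likelihood(cut_plot, cut_trees, p, tree_probs):
    """Conditional log-likelihood of one plot given the random effects.

    cut_plot: 1 if the plot was thinned, else 0.
    cut_trees: 0/1 harvest indicators, one per tree (ignored when cut_plot is 0).
    p: plot-level thinning probability; tree_probs: tree-level harvest probabilities.
    """
    if cut_plot == 0:
        return math.log(1.0 - p)
    none_cut_prob = math.prod(1.0 - pi for pi in tree_probs)
    tree_part = sum(
        math.log(pi) if y else math.log(1.0 - pi)
        for y, pi in zip(cut_trees, tree_probs)
    )
    # Dividing by 1 - P(no tree cut) excludes the all-zero outcome.
    return math.log(p) + tree_part - math.log(1.0 - none_cut_prob)


# Hypothetical two-tree plot: the probabilities of all admissible outcomes
# (plot not thinned, or thinned with at least one tree cut) should sum to one.
p, tree_probs = 0.6, [0.2, 0.3]
total = math.exp(plot_log_likelihood(0, [0, 0], p, tree_probs))
for cut_trees in product([0, 1], repeat=2):
    if any(cut_trees):
        total += math.exp(plot_log_likelihood(1, cut_trees, p, tree_probs))
print(total)
```

The sum over outcomes equals one only when the truncation term enters with a negative sign, which gives a quick sanity check on the sign convention.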

Tables

Table 1: Summary of the dataset by experiment including the range of the main variables for the considered measurement periods.

Experimental site | Initial date | Observations | DBH (cm) | Basal area (m² ha⁻¹) | Density (stems ha⁻¹) | Interval length (years) | Plot size (ha) | Prop. intervals with cuts
Allée de Blois | 1927 | 2946 | 20.4-85.3 | 23.8-37.4 | 145-304 | 1-10 | 1 | 0.57
Bois Brochet | 1931 | 6577 | 10.5-85 | 20.8-29.7 | 94-602 | 1-10 | 1 | 0.67
Beaulieu | 1923 | 3032 | 9.5-50.6 | 21.2-27.2 | 352-673 | 2-8 | 1 | 0.67
Butte de Tir | 1959 | 5821 | 7.6-53.2 | 19.6-34.7 | 168-1102 | 1-9 | 0.5 | 0.5
Camp Souverain | 1927 | 769 | 25.1-85.3 | 25.9-36.6 | 138-217 | 1-5 | 1 | 0.4
Camp Cusson | 1931 | 14501 | 7.3-90.7 | 20.0-43.1 | 81-813 | 1-7 | 1 | 0.61
Charlemagne | 1923 | 2606 | 6.4-80.5 | 29.7-47.6 | 156-988 | 2-10 | 0.2 | 0.82
Chatellier | 1934 | 5808 | 7-85.3 | 22.2-37.2 | 96-611 | 3-8 | 1 | 0.6
Chavigny | 1923 | 2209 | 6.4-68.8 | 31.9-45.2 | 155-1115 | 2-10 | 0.2 | 0.62
Chrétiennette | 1922 | 2366 | 7-73.8 | 18.2-39.6 | 96-531 | 3-10 | 0.3 | 0.62
Clés des Fossés | 1931 | 2326 | 14.3-96.1 | 30.4-38.7 | 98-483 | 1-10 | 1 | 0.73
Côtes aux Hêtreaux | 1927 | 1260 | 19.7-87.5 | 30.2-37 | 163-275 | 2-9 | 1 | 0.83
Croix de Saverne | 1946 | 945 | 6.7-58.3 | 28.6-30.5 | 277-335 | 5 | 1 | 0.33
Ducellier | 1945 | 1909 | 15.6-75.8 | 16.6-26.1 | 91-283 | 1-10 | 1 | 0.7
Epicéas | 1923 | 3260 | 9.2-75.1 | 21-44.2 | 160-776 | 2-10 | 0.2 | 0.53
Faîte | 1922 | 12671 | 4.5-61.8 | 10.1-33.6 | 165-2155 | 2-10 | 0.2 | 0.71
Grande Bouzule | 1928 | 19860 | 6-70.3 | 19.1-36.7 | 147-1223 | 1-9 | 1 | 0.58
Grand Pierrier | 1923 | 1128 | 8-66.2 | 31.2-39 | 156-1068 | 1-10 | 0.2 | 0.7
Hallet | 1951 | 13575 | 6-65.9 | 15.3-35.5 | 205-1162 | 2-10 | 1 | 0.79
Hermousset | 1934 | 9657 | 5.7-81.8 | 19.8-34.6 | 122-1433 | 3-10 | 1 | 0.89
Lacharmaie | 1925 | 589 | 35.7-77.0 | 2.0-23.6 | 7-126 | 2-6 | 2 | 0.8
Fontaine aux Ordons | 1923 | 1797 | 13.4-80.2 | 27.9-33.4 | 145-309 | 2-8 | 1 | 0.75
Launay-Morel | 1945 | 330 | 30.9-144.2 | 6.8-28.1 | 20-89 | 4-6 | 2 | 1
M. des Cordeliers | 1928 | 9024 | 14.6-87.2 | 21.8-35.8 | 92-567 | 1-9 | 1 | 0.67
Morat | 1931 | 798 | 14-106 | 15.6-46.3 | 34-137 | 5-7 | 2 | 0.8
Mortefert | 1922 | 569 | 39.2-106 | 15.6-31.6 | 35-107 | 3-7 | 1 | 0.57
Pauverts | 1928 | 14743 | 6-74.2 | 20.1-33.9 | 119-1242 | 1-10 | 1 | 0.58
Plantonnée | 1981 | 589 | 6-43.6 | 4.3-23.4 | 84-376 | 1-10 | 0.5 | 0.4
Plô du Poteau | 1931 | 639 | 25.5-76.7 | 5.8-21.5 | 28-156 | 1-6 | 0.9 | 0.4
Pré des Seigneurs | 1922 | 1167 | 29.6-95.5 | 34.7-44.4 | 107-203 | 3-10 | 1 | 0.57
Puiseux-en-Retz | 1922 | 3035 | 6.4-78.9 | 21-31.3 | 94-1572 | 3-10 | 0.5 | 0.7
Rennweg | 1951 | 1388 | 9.9-44.2 | 25.5-29.1 | 684-728 | 5 | 0.5 | 0.75
Carré latin de Réno | 1989 | 10354 | 5.7-54.7 | 16.3-34.1 | 295-1010 | 1-5 | 0.2 | 0.62
Richebourg | 1931 | 2100 | 17.5-112.7 | 30.0-34.6 | 88-212 | 2-8 | 2 | 0.71
Sablonnières | 1966 | 8521 | 7.0-65.3 | 10.9-35.4 | 81863 | 1-10 | 0.8 | 0.62
Sainte Marie | 1958 | 1310 | 3.8-41.7 | 22.6-23.5 | 606-740 | 4 | 1 | 1
Trésor | 1959 | 5478 | 6-69.4 | 23.6-31.9 | 199-878 | 4-10 | 1 | 1
Verbamont | 1923 | 766 | 17.8-88.8 | 9.2-21.8 | 58-172 | 1-8 | 1 | 0.5

Table 2: Frequencies of the proportion of cut trees (all, beech, oak) in the years and plots where thinnings took place. Proportions are set over the total and per species (when present in the plot).

proportion | all species/total | beech/total beech | beech/total | oak/total oak | oak/total
0-0.05 | 93 | 50 | 121 | 45 | 108
0.05-0.15 | 119 | 74 | 72 | 66 | 71
0.15-0.25 | 138 | 78 | 59 | 99 | 77
0.25-0.35 | 62 | 47 | 27 | 39 | 29
0.35-0.45 | 20 | 12 | 5 | 16 | 7
0.45-0.55 | 3 | 7 | 3 | 6 | 1
0.55-0.65 | 3 | 3 | 2 | 3 | 1
0.65-0.75 | 1 | 6 | 0 | 3 | 0
0.75-0.85 | 1 | 2 | 0 | 3 | 1
0.85-0.95 | 0 | 2 | 0 | 2 | 0
0.95-1 | 1 | 12 | 4 | 16 | 1

Table 3: Maximum likelihood parameter estimates and standard errors of the simultaneous plot and tree thinning model. $\sigma^2_{year,b_1}$ and $\sigma^2_{year,b_2}$ stand for the variances of the year random effects for the plot and tree models, while $\sigma_{year,b_1,b_2}$ is the covariance between them.

Parameter | estimate | standard error
β0 | -0.3694 | 0.2862
β1 | 0.0387 | 0.0135
β2 | 0.1760 | 0.0368
γ0 | -2.1306 | 0.1634
γ1,beech | -4.0513 | 0.1587
γ1,oak | -5.3613 | 0.1508
γ1,hornbeam | -0.9210 | 0.5513
γ1,others | -2.3803 | 0.7612
γ2,beech | -1.2513 | 0.3242
γ2,oak | -3.7132 | 0.3038
γ2,hornbeam | -2.0291 | 0.6480
γ2,others | -0.1231 | 1.5476
γ3 | 4.0450 | 0.1989
γ4 | -11.1102 | 0.4614
$\sigma^2_{year,b_1}$ | 1.5807 | -
$\sigma^2_{year,b_2}$ | 1.5076 | -
$\sigma_{year,b_1,b_2}$ | 1.0155 | -

Figure Captions

• Figure 1. Location of sample plots

• Figure 2. Predicted (circles) and observed (triangles) numbers of events by decile in the Hosmer-Lemeshow test

• Figure 3. Effect of the time since the last harvest (a) or the basal area (b) when this time is unknown on the plot-level probabilities of harvest. The 0.95 confidence envelopes are represented in gray

• Figure 4. Effect of the relative DBH, i.e. the ratio between tree dbh and plot mean quadratic diameter, on the conditional probabilities that a tree is harvested in a mature stand (a) and a young stand (b). The 0.95 confidence envelopes are represented in gray
