Indian Forester [February, 2009]
MODELLING AND VALIDATION FOR VOLUME ESTIMATION OF EUCALYPTUS

DELGERJARGAL DUGARJAV AND RAJIV PANDEY
Statistical Branch, Forest Research Institute, Dehra Dun (India)

Introduction

The human race has depended on various forest products for subsistence since time immemorial. However, population growth has over-burdened the limited, exhaustible natural resources, leading to their quantitative and qualitative reduction. The balance can be restored only by adopting or developing new technology to meet the growing need for resources, and planting multipurpose trees is one potential option. Information about these new plantation stands, particularly species composition, yield, volume and growth, is essential for management as well as for policy formulation. Precise and accurate methods for estimating tree volume are vital tools for efficient forest inventory, sound forest management and silvicultural research (Fonweban, 1999). Stand managers require the yield and volume of timber (Yamamoto, 1994; Tewari and Gadow, 1999), which depend on tree diameter, height, basal area, form, bark thickness and other parameters; the relationships between these parameters are the key to tree volume estimation.

Volume equations (models) have been the most widely used technique for estimating tree volume. A model should provide information that is sufficiently precise and comprehensive for its intended purpose while remaining simple, easily understood and helpful for drawing inferences; such a parsimonious model gives better predictions than an unnecessarily complicated one. The basic steps of modelling are (Vanclay, 1994): collection and analysis of data; formulation of a conjecture about the system to be studied; construction of a "real" model, making the problem as precise as possible by identification; approximation and idealization; formulation of the mathematical model (expression of the "real" model in symbolic terms); study of the system using mathematical ideas and techniques; and comparison of the results predicted by the mathematical model with the real world. Before a model is used, its validity for the intended purpose must be checked: the model predictions and coefficients should be examined, and the prediction error minimized, as soon as the model has been developed. The best model is the one with the least estimated prediction error.

Eucalyptus is a genus of multipurpose trees native to Australia, adapted to a variety of edaphic and climatic conditions ranging from tropical to warm temperate. The trees are generally evergreen, with new leaves appearing as the old ones are shed. Height varies greatly among Eucalyptus species, ranging from 30 to 45 m, with diameters of 1-2 m under Indian conditions (Tewari, 1992). Eucalyptus is a main source of pulpwood, match splints and plywood, as well as raw material for the rayon industry (Rajan, 1987). However, site-specific information on the volume of standing trees is not available. This paper attempts to develop volume models for Eucalyptus plantations grown at Hempur, Uttarakhand.
Methodology

For the present study, the Eucalyptus trees grown at NEPA farm constitute the target population. Sample trees were selected from this population by simple random sampling without replacement (SRSWOR) for estimating tree volume, and the sample size (n) was determined by the standard procedure (Cochran, 1972). To develop a model for predicting tree volume, height (in metres) and diameter at breast height (in centimetres) were recorded on the randomly selected trees; the trees were then felled and cut into small logs, and their volume was recorded. Tree volume was calculated with Smalian's formula, treating the bole of each Eucalyptus tree as a frustum of a paraboloid (Chaturvedi and Khanna, 2000).
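Smalian's formula estimates the volume of a log as the mean of its two end cross-sectional areas multiplied by its length, which is exact for a frustum of a paraboloid. The sketch below applies it log by log and sums over the bole. It assumes logs of equal length with diameters (cm) measured at each cut; the function names and the diameters in the example are hypothetical, since the paper does not report log lengths or individual log measurements.

```python
import math


def cross_section_area(d_cm):
    """Area (m^2) of a circular cross-section whose diameter is given in cm."""
    d_m = d_cm / 100.0
    return math.pi * d_m ** 2 / 4.0


def smalian_log_volume(d_base_cm, d_top_cm, length_m):
    """Smalian's formula: mean of the two end areas times the log length (m^3)."""
    return (cross_section_area(d_base_cm) + cross_section_area(d_top_cm)) / 2.0 * length_m


def tree_volume(end_diameters_cm, log_length_m):
    """Sum Smalian volumes over consecutive logs of equal (assumed) length.

    end_diameters_cm lists the diameters at successive cuts from the butt
    upward, so k + 1 diameters describe k logs.
    """
    return sum(
        smalian_log_volume(d1, d2, log_length_m)
        for d1, d2 in zip(end_diameters_cm, end_diameters_cm[1:])
    )


# Hypothetical tree: diameters at cuts 1 m apart along the bole.
print(round(tree_volume([14.0, 12.5, 11.0, 9.0, 6.0], 1.0), 4))  # 0.0372
```

If the actual logs had unequal lengths, the same per-log function applies with each log's own length.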
Study Region

The plantation is situated at NEPA farm, Hempur, north of Kashipur in Udham Singh Nagar District of Uttarakhand, at 29°17' N latitude and 78°58' E longitude. The area is almost flat, with elevation ranging from 250 m to 260 m and well-drained, fertile soil. NEPA Ltd. (a Government of India undertaking), with its registered office at Nepanagar, Madhya Pradesh, is engaged in the manufacture of newsprint. The company raised Eucalyptus plantations over 305 ha of land at Hempur during 1990-1993 to supply raw material (pulp) to a proposed newsprint mill at Aliganj in Moradabad, Uttar Pradesh. Selective felling was carried out in 1998 and 1999, and the felled trees were left to coppice.
Normality was checked with the W test of Shapiro and Wilk (1965). Empirical relationships were estimated through regression analysis. The model coefficients were tested with the t-test, and the adjusted R² (adj. R²) was also estimated. The adequacy of each model was tested with the chi-square test (Draper and Smith, 1960).

Validation Technique: There are two approaches to model validation. The first uses a single data set, which is divided into two parts: one part is used for model development and the other for validation, and vice versa (resampling procedures). The most commonly used methods of this kind are half-splitting, cross-validation, jackknifing and bootstrapping. The second uses two independent data sets, one for model development and the other for model validation (validation through independent data), judged by different evaluation criteria. The latter is not frequently used for forestry models. In either case, the estimates of the apparent error, true error and excess error of a model are critical for validation. The apparent error (also called resubstitution error) is computed by applying the fitted equation to the data used to calibrate the model, and it normally gives an optimistic view of model quality. The true error is computed by applying the fitted equation to data not used for calibration. The apparent error underestimates the true error (it is downwardly biased), and the difference between the two is the excess error:

True error = Apparent error + Excess error

Validation with an independent data set (evaluation criteria): For validation, the data set was divided into two parts by a random procedure, and the first part (the fitting data set) was used for model building. The predictive ability of the different models was then assessed on the second part (the validating data set) using the following evaluation criteria (Tewari and Kumar, 2003).

Average residual or prediction bias (B):

$$B = \frac{1}{n}\sum_{i=1}^{n} r_i$$

where r_i is the difference between the observed and predicted volume for the i-th tree in the validating data set.
The variance of the residuals:

$$\mathrm{Var}(B) = \frac{\sum_{i=1}^{n} (r_i - B)^2}{n-1}$$

The root mean square error (RMSE) provides a composite measure (combining bias and precision) of the overall accuracy of prediction; the smaller these values, the better the prediction:

$$\mathrm{RMSE} = \sqrt{B^2 + \mathrm{Var}(B)}$$

The coefficient of dispersion (CD), based on the standard deviation, measures the variation of the residuals relative to the bias and also provides a composite measure of the overall accuracy of prediction. The smaller the value, the better the prediction; moreover, it is unitless:

$$\mathrm{CD} = \frac{\sqrt{\mathrm{Var}(B)}}{B}$$
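The four criteria can be computed directly from the validation residuals. The sketch below is a plain-Python reading of the formulas above, with r_i = observed - predicted; the function name and the toy inputs are hypothetical. CD is undefined when B = 0, and its sign follows the sign of B.

```python
import math


def evaluation_criteria(observed, predicted):
    """Return (B, Var(B), RMSE, CD) for residuals r_i = observed_i - predicted_i."""
    r = [o - p for o, p in zip(observed, predicted)]
    n = len(r)
    bias = sum(r) / n                                     # B: mean residual
    var_b = sum((ri - bias) ** 2 for ri in r) / (n - 1)   # Var(B), n - 1 divisor
    rmse = math.sqrt(bias ** 2 + var_b)                   # combines bias and precision
    cd = math.sqrt(var_b) / bias                          # raises ZeroDivisionError if B == 0
    return bias, var_b, rmse, cd


# Toy observed and predicted volumes (m^3), for illustration only.
print(evaluation_criteria([0.10, 0.20, 0.30], [0.09, 0.21, 0.27]))
```

As a check against the paper's numbers, Equation 1's values in Table 4 (B = 0.00117, Var(B) = 0.00055) give √0.00055 / 0.00117 ≈ 20.04, the tabulated CD.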
Wilcoxon's signed rank test (Wilcoxon, 1945) was used to test the bias produced by each equation. This non-parametric test uses the information in the magnitudes of the differences between paired observations: the absolute differences are ranked from smallest to largest, and the ranks associated with positive differences and with negative differences are summed separately. The p-value associated with the resulting statistic is then found from an appropriate table. A rank was assigned to each equation for each evaluation criterion (Cao et al., 1980); the smaller the rank value, the better the performance of the model. The ranks for all criteria were then summed, and the equations were ranked by this sum to give a final rank for each equation, which indicates the model's performance with respect to all the criteria considered. The model with the lowest rank is the best of the available models.

Results and Discussion

Data for 249 randomly selected trees were collected for model development. Since it was not feasible to collect new data, the whole data set was divided into two sets by simple random sampling without replacement, using the random sample generation procedure of the S-PLUS software. The first data set (fitting data set) consisted of 173 observations and was used to fit the volume equations, while the second consisted of the remaining 76 observations and was used for validation. The descriptive statistics of the two data sets are given in Table 1.
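The paper performed this split with S-PLUS. As a rough Python equivalent (not the authors' code), SRSWOR of 173 fitting trees, with the remaining 76 kept for validation, can be sketched with the standard library's `random.sample`; the tree IDs, the seed and the helper name are hypothetical.

```python
import random


def srswor_split(ids, n_fit, seed=None):
    """Split units into fitting and validating sets by SRSWOR."""
    rng = random.Random(seed)
    fitting = rng.sample(ids, n_fit)          # drawn without replacement
    chosen = set(fitting)
    validating = [i for i in ids if i not in chosen]
    return fitting, validating


tree_ids = list(range(1, 250))                # 249 hypothetical tree IDs
fit_ids, val_ids = srswor_split(tree_ids, 173, seed=1)
print(len(fit_ids), len(val_ids))             # 173 76
```

Fixing the seed makes the split reproducible, which helps when the validation results must be re-checked later.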
The Shapiro-Wilk test and the randomness test confirmed that the data met the assumptions tested. Various volume models, drawn from the literature on the subject and from different combinations of variables suggested by graphical exploration, were fitted to the fitting data set. Among them, four models gave adj. R² values of 0.92 or more and were therefore selected for further study. The related statistics for each equation are given in Table 2, with rankings based on adj. R². Equation 2 gave the highest adj. R² (ranked first) and may therefore be preferred over the other equations. To check for trends, the residuals (observed volume - predicted volume) were calculated and plotted against the predicted volume (Fig. 1). The plots show no clear trend in the residuals (except for some trend in the case of Equation 1), so the residuals can be considered randomly distributed; the least dispersion was for Equation 2.

Table 1 Descriptive statistics for the model fitting and validating data sets

Parameter     | Fitting data set                   | Validating data set
              | Min.     Max.     Mean ± S.E.      | Min.     Max.     Mean ± S.E.
Diameter (cm) | 6.40     22.90    12.52 ± 0.29     | 6.40     22.70    12.09 ± 0.43
Height (m)    | 8.50     21.60    14.81 ± 0.23     | 8.60     23.00    14.31 ± 0.34
Volume (m³)   | 0.0088   0.3715   0.0945 ± 0.0056  | 0.0090   0.4250   0.0870 ± 0.0087

Table 2 Selected volume equations

Notation | Equation          | R²   | adj. R²  | a        | b                 | c
1        | V = a + b D + c H | 0.92 | 0.92 (3) | -0.16058 | 0.01570 (0.0010)* | 0.00395 (0.0013)*
2        | V = a + b D²H     | 0.98 | 0.98 (1) | -0.00066 | 0.00003 (0.0000)* |
3        | V = a + b D²      | 0.96 | 0.96 (2) | -0.02668 | 0.00070 (0.0000)* |
4        | V = a + b DH      | 0.96 | 0.96 (2) | -0.05457 | 0.00076 (0.0000)* |

Values in parentheses under the coefficients are standard errors; those under adj. R² are the rankings of the models.

Fig. 1 Plots of residuals (cub. m) against predicted volume (cub. m) for Equations 1, 2, 3 and 4

Model Validation: To assess the predictive ability of the different equations, the models were validated with the independent (validating) data set. For this purpose, the volume equations obtained from the fitting data set were applied to the validating data set, and the apparent error, true error and excess error were calculated for each equation. Since the true error is estimated by applying
the model to an independent data set, it provides important information for model evaluation. For least-squares regression models with an intercept, the apparent error (mean residual on the fitting data) is zero, so the true error directly expresses the excess error. The smaller the excess error, the better the predictive ability of the equation. Table 3 shows that Equations 1, 3 and 4 gave reasonable values of excess error, whereas Equation 2 gave the highest value; by the excess error criterion, the predictive ability of Equation 2 is therefore poorer than that of the other equations. This result also depends on the data set, that is, on its distribution, but here the data set is fairly large and covers a wide range (Table 1), so the result can be attributed to model performance.
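To make the error decomposition concrete, the sketch below fits V = a + b D²H by ordinary least squares on one data set and evaluates the mean residual (observed minus predicted) on the fitting data (apparent error) and on a separate data set (true error); their difference is the excess error. Using the mean residual as the error measure is an assumption, consistent with Tables 3 and 4, where each equation's true error matches its bias B. The data are simulated and hypothetical, not the NEPA farm measurements, and the helper names are illustrative. It also shows why the apparent error is zero: with an intercept, least-squares residuals sum to zero.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = a + X @ b; returns the vector [a, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def mean_residual(y, y_hat):
    """Mean of (observed - predicted), used here as the error measure."""
    return float(np.mean(y - y_hat))


# Hypothetical data for illustration only: D in cm, H in m, V in m^3.
rng = np.random.default_rng(42)


def simulate(n):
    d = rng.uniform(6.4, 22.9, n)
    h = rng.uniform(8.5, 21.6, n)
    v = -0.00066 + 0.00003 * d**2 * h + rng.normal(0.0, 0.01, n)
    return (d**2 * h).reshape(-1, 1), v


X_fit, v_fit = simulate(173)   # fitting data set
X_val, v_val = simulate(76)    # validating data set

coef = fit_ols(X_fit, v_fit)
apparent = mean_residual(v_fit, predict(coef, X_fit))  # ~0 for OLS with intercept
true_err = mean_residual(v_val, predict(coef, X_val))  # error on unseen data
excess = true_err - apparent                           # True = Apparent + Excess

# Adjusted R^2 on the fitting data, with p = 1 predictor (D^2 H).
resid = v_fit - predict(coef, X_fit)
r2 = 1 - np.sum(resid**2) / np.sum((v_fit - v_fit.mean())**2)
n, p = len(v_fit), X_fit.shape[1]
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"apparent={apparent:.1e} true={true_err:.5f} excess={excess:.5f} adjR2={adj_r2:.3f}")
```

Because the apparent error vanishes for these models, the true error and the excess error coincide, which is the pattern seen in Table 3.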
Table 3 Values of errors of volume equations

Equation | Apparent error | True error | Excess error | χ²
1        | 0.0000         | 0.0012     | 0.0012       | 0.3254
2        | 0.0000         | 0.0111     | 0.0111       | 0.1702
3        | 0.0000         | 0.0016     | 0.0016       | 0.2264
4        | 0.0000         | 0.0024     | 0.0024       | -0.0191
Table 4 Validation statistics for volume equations with the independent data set

Equation | Bias (m³)   | Var(B) (m⁶) | RMSE (m³)   | CD        | Σ Rank | Final rank
1        | 0.00117 (1) | 0.00055 (4) | 0.02349 (4) | 20.04 (4) | 13     | 4
2        | 0.01116 (4) | 0.00019 (1) | 0.01784 (3) | 1.24 (1)  | 9      | 2
3        | 0.00164 (2) | 0.00025 (3) | 0.01600 (2) | 9.64 (3)  | 10     | 3
4        | 0.00245 (3) | 0.00022 (2) | 0.01493 (1) | 6.05 (2)  | 8      | 1

Values in parentheses show the rank.
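The ranking scheme (Cao et al., 1980) can be reproduced from the published numbers. The sketch below ranks Equations 1-4 on each criterion (smaller is better), sums the ranks, and ranks the sums, giving the Σ Rank and final rank columns of Table 4; it then adds the adj. R² rank from Table 2 (larger is better) to give the combined final ranks of Table 5. Tied values share a rank and the next distinct value takes the next integer ("dense" ranking); this tie rule is inferred from Tables 2 and 5 rather than stated in the paper, and the helper name is hypothetical.

```python
def dense_rank(values, larger_is_better=False):
    """Rank values 1, 2, ...; ties share a rank and no ranks are skipped."""
    distinct = sorted(set(values), reverse=larger_is_better)
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] for v in values]


# Validation statistics for Equations 1-4, copied from Table 4.
# Bias is ranked on its raw value because all biases here are positive;
# with mixed signs one would presumably rank |B| instead (an assumption).
bias = [0.00117, 0.01116, 0.00164, 0.00245]
var_b = [0.00055, 0.00019, 0.00025, 0.00022]
rmse = [0.02349, 0.01784, 0.01600, 0.01493]
cd = [20.04, 1.24, 9.64, 6.05]

criterion_ranks = [dense_rank(c) for c in (bias, var_b, rmse, cd)]
rank_sum = [sum(r) for r in zip(*criterion_ranks)]   # Σ Rank column of Table 4
rv = dense_rank(rank_sum)                            # final rank on validating data

adj_r2 = [0.92, 0.98, 0.96, 0.96]                    # Table 2
rf = dense_rank(adj_r2, larger_is_better=True)       # rank on fitting data

overall = dense_rank([a + b for a, b in zip(rf, rv)])
print(rank_sum, rv, rf, overall)  # [13, 9, 10, 8] [4, 2, 3, 1] [3, 1, 2, 2] [3, 1, 2, 1]
```

Dense ranking reproduces the tie between Equations 2 and 4 (both ranked 1) in the combined result.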
The root mean square error (RMSE) provides a composite measure (combining bias and precision) of the overall accuracy of prediction: the bias (B) measures the accuracy of prediction, while the variance Var(B) measures its precision. The smaller these values, the better the prediction. To test the goodness of fit of the models, a chi-square test was applied to the validating data set (Table 3). The chi-square statistics were not significant, so the null hypothesis of adequate fit was not rejected; the models therefore pass the goodness-of-fit test and, after validation, can be used for prediction. All these statistics were considered in assessing the overall performance of each equation, and their values are given in Table 4.
Table 4 shows that Equation 1 has the lowest bias, whereas Equation 2 has the highest. For variance, Equation 2 has the lowest value and Equation 1 the highest. The combined effect of bias and variance, expressed as RMSE, is least for Equation 4 and highest for Equation 1. The coefficient of dispersion, calculated to evaluate the variation relative to the mean residual (the standard deviation being taken as the total variation), is least for Equation 2 and highest for Equation 1. A rank was assigned to each equation for each of these criteria, as described above, and the ranks were combined to give the final ranks on the validating data set. As a result, Equation 4 was ranked first, followed by Equations 2 and 3, with Equation 1 last. The asymptotic significance of Wilcoxon's signed rank test on the validating data set showed that, for all the equations, the null hypothesis of the test (that the difference between the sums of the positive and negative ranks is zero) is not rejected (Table 5). The final ranks can now be obtained by considering the ranks on both the fitting (Rf) and validating (Rv) data sets. Equations 2 and 4 have the lowest overall ranks, which means that they would perform better for predicting tree volume.

Table 5 Wilcoxon's signed rank test and combined result of the criterion ranks

Equation | Z       | Asymptotic significance | Rf | Rv | Σ Rank | Final rank
1        | -0.9759 | 0.3291                  | 3  | 4  | 7      | 3
2        | -1.3798 | 0.1677                  | 1  | 2  | 3      | 1
3        | -0.8103 | 0.4178                  | 2  | 3  | 5      | 2
4        | -1.1934 | 0.2327                  | 2  | 1  | 3      | 1
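Table 5 reports Z and an asymptotic (normal-approximation) significance. A minimal stdlib sketch of that version of the signed rank test is given below; it is not the software the authors used. Assumptions: zero residuals are dropped, tied |r_i| get average ranks, the variance has no tie correction, and Z is computed from the smaller rank sum (so Z ≤ 0, matching the negative values in Table 5). The function name and the residuals in the example are hypothetical; `scipy.stats.wilcoxon` is a maintained alternative.

```python
import math


def wilcoxon_signed_rank(residuals):
    """Normal-approximation Wilcoxon signed rank test that residuals centre on 0.

    Returns (W+, W-, Z, two-sided p). Needs at least one non-zero residual.
    """
    d = [r for r in residuals if r != 0]             # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                      # average ranks over ties in |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    t = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # no tie correction
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided p-value
    return w_plus, w_minus, z, p


# Hypothetical validation residuals (m^3).
print(wilcoxon_signed_rank([0.01, -0.02, 0.03, -0.005, 0.015]))
```

The p-value step agrees with Table 5: for Equation 1, Z = -0.9759 gives erfc(0.9759/√2) ≈ 0.329.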
It can therefore be concluded that the equations V = -0.00066 + 0.00003 D²H and V = -0.05457 + 0.00076 DH are preferred over the other equations considered in the present study (Table 5).

Conclusion

Model adequacy was ranked on the basis of different criteria, namely adj. R², bias, variance, root mean square error and coefficient of dispersion. On the basis of this ranking, the following equations are recommended over the other models: V = -0.00066 + 0.00003 D²H and V = -0.05457 + 0.00076 DH.
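As a worked example, the two recommended equations can be evaluated for a single tree. The sketch assumes D is diameter at breast height in cm, H is height in m and V is in m³, following the units recorded in the Methodology; the function names and the tree in the example are hypothetical. The equations were fitted to trees with D between 6.4 and 22.9 cm and H between 8.5 and 21.6 m (Table 1), so predictions outside that range are extrapolations.

```python
def volume_eq2(d_cm, h_m):
    """Recommended Equation 2: V = -0.00066 + 0.00003 D^2 H (m^3)."""
    return -0.00066 + 0.00003 * d_cm ** 2 * h_m


def volume_eq4(d_cm, h_m):
    """Recommended Equation 4: V = -0.05457 + 0.00076 D H (m^3)."""
    return -0.05457 + 0.00076 * d_cm * h_m


# Hypothetical tree: D = 20 cm, H = 20 m.
print(round(volume_eq2(20.0, 20.0), 4), round(volume_eq4(20.0, 20.0), 4))  # 0.2393 0.2494
```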
SUMMARY

Volume estimation models have been developed for Eucalyptus plantations at NEPA farm, Uttarakhand. The models were validated and ranked on the basis of different criteria, namely adj. R², bias, variance, root mean square error and coefficient of dispersion. On the basis of this ranking, the models V = -0.00066 + 0.00003 D²H and V = -0.05457 + 0.00076 DH were recommended.

Key words: Eucalyptus, Volume estimation, Modelling, Validation.
Modelling and Validation for Volume Estimation of Eucalyptus
Delgerjargal Dugarjav and Rajiv Pandey
Summary (Hindi version): Models for estimating the volume of Eucalyptus plantations have been developed for NEPA farm, Uttarakhand. The models were validated and ranked on the basis of different criteria, namely adj. R², bias, variance, root mean square error and coefficient of dispersion, and on that basis the models V = -0.00066 + 0.00003 D²H and V = -0.05457 + 0.00076 DH were recommended.

References

Cao, Q.V., H.E. Burkhart and T.A. Max (1980). Evaluation of two methods for cubic volume prediction for loblolly pine to any merchantable limit. For. Sci., 26: 71-80.
Chaturvedi, A.N. and L.S. Khanna (2000). Forest Mensuration and Biometry (3rd edn.). Khanna Bandhu, Dehra Dun.
Cochran, W.G. (1972). Sampling Techniques (3rd edn.). Wiley Eastern, New Delhi.
Draper, N. and H. Smith (1960). Applied Regression Analysis. John Wiley and Sons, New York.
Fonweban, J.N. (1999). An evaluation of numerical integration of taper functions for volume estimation in Eucalyptus saligna stands. J. Trop. For. Sci., 11: 410-419.
Rajan, B.K.C. (1987). Versatile Eucalyptus. Diana Publications, Bangalore.
Shapiro, S.S. and M.B. Wilk (1965). An analysis of variance test for normality. Biometrika, 52: 591-611.
Tewari, D.N. (1992). Monograph on Eucalyptus. Surya Publication, Dehra Dun.
Tewari, V.P. and K.V. Gadow (1999). Modeling the relationship between tree diameters and heights using SBB distribution. For. Ecol. Manage., 119: 171-176.
Tewari, V.P. and V.S.K. Kumar (2003). Volume equations and their validation for irrigated plantations of Eucalyptus camaldulensis in the hot desert of India. J. Trop. For. Sci., 15: 136-146.
Vanclay, J.K. (1994). Modelling Forest Growth: Application to Mixed Tropical Forests. CAB International, Wallingford.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1: 80-83.
Yamamoto, K. (1994). A simple volume estimation system and its application to three coniferous species. Can. J. For. Res., 24: 1289-1294.