Classification and modeling of trees outside forest in Central American landscapes by combining remotely sensed data and GIS
Inaugural dissertation for the attainment of the doctoral degree of the Faculty of Forest Sciences of the Albert-Ludwigs-Universität Freiburg i. Brsg.
submitted by
Bernal Herrera-Fernández from Costa Rica
Freiburg im Breisgau 2003
Acknowledgements
I wish to express my gratitude to Prof. Dr. Barbara Koch for accepting me as a graduate student at the Department of Remote Sensing and Landscape Information Systems (FELIS), and for supporting this research initiative. I also want to thank Dr. Matthias Dees for his supervision, contributions, and support during my studies and work at FELIS. I wish to thank Dr. Claus Peter Gross for commenting on an earlier draft of this document. I am indebted to Prof. Dr. Christoph Kleinn, who offered me the opportunity to work within the framework of the TROF project, for contributing to the definition of the subject of this dissertation, and for providing invaluable scientific feedback that improved the quality of the research. Gracias Christoph. Thanks to my colleagues at FELIS, especially to Felipe Guanaes Rego and Raymundo Villavicencio, for their friendship, patience, and support, and for the continuing discussions on topics of remote sensing, GIS, forestry, geography, and otherwise.
These acknowledgments would be incomplete without thanking Teresa Méndez, Anka Zimmerman, Christi Bianchini, Carolina Gebler, Verena Bushle, Guillermo Navarro, Robinson Cruz, and Randolph Welte for their friendship and support during my stay in the unforgettable city of Freiburg. I am deeply grateful to Jessica Deis for help in editing English grammar and style, and to Gernot Ramminger for the translation of the German executive summary. I want to thank the scientists involved in the TROF project, especially Tatjana Koukal and Prof. Dr. Wegner Schneider (IVFL, Austria), who kindly provided the TOF classification on IRS images. I extend my gratitude to David Morales (CATIE, Costa Rica), Guillermo Suazo (IHCAFE, Honduras), Tomas Pap-Vary, Jan Schultz, and Kai Türk (FELIS, Germany), and Vicente Watson (CCT, Costa Rica) for providing most of the thematic information used in this research. I would like to thank the German Academic Exchange Service (DAAD), which contributed a scholarship, and the European Community, through the TROF project (Project number ERB3514PL973202), for providing most of the raw data.
To the memory of my father
…A never-ending source of inspiration
Bernal Herrera-Fernández Department of Remote Sensing and Landscape Information Systems (FELIS) Faculty of Forestry and Environment University of Freiburg, Germany
Executive Summary
The present research was conceived and developed within the framework of the “Tree Resources Outside the Forest” (TROF) project funded by the European Commission (Project number ERB3514PL973202). Most of the raw data for this research initiative was provided by the project, and some of the results obtained in the project were used with the consent of the authors. Study sites are located in Costa Rica and Honduras, both covering an area of 199,600ha. Both areas are covered by an IRS 1-D panchromatic image with 5.8m spatial resolution. In the case of Costa Rica, a smaller area of 127,500ha was also selected for investigation. Twenty-three scanned aerial photos cover this area. For the purposes of this study, TOF comprises all trees outside the legal forest borders covering an area < 2ha.
In this research, an algorithm was developed for TOF extraction on digital aerial photos in a study area located in Costa Rica. Furthermore, the effect of biophysical and spatial covariables on the spatial distribution of TOF was assessed in the two study sites selected. The effects of the spatial resolution at which TOF was extracted, of TOF absence events, and of covariable scale on the factors affecting TOF spatial distribution were also assessed. TOF information was extracted from scanned aerial photos (scale 1:40,000) with 3m spatial resolution. First, a multi-resolution segmentation method was applied; then the classification was performed using an object-driven approach.
The results of this classification were compared with TOF classifications based on an IRS-1D panchromatic image and on Landsat ETM+. A set of biophysical and spatial covariables was evaluated as potential determinants of the spatial distribution of TOF. All covariables were stored and processed in a GIS and georeferenced to the respective national map projection system. Spatially explicit models were constructed using logistic regression techniques. The multi-resolution segmentation method proved very efficient in extracting the segments required for the classification of forest, non-forest, and TOF on scanned color aerial photos with 3m spatial resolution. The TOF-land area (i.e. land where TOF is likely to be observed) estimated from aerial photos was 3709.77ha, representing 3% of the total study area (127,500ha). In the case of Costa Rica, the most important factors affecting the presence of TOF classified on 3m spatial resolution images are altitude above sea level, distance to nearest settlement, and distance to nearest forest edge. The logistic model fitted using TOF classified on the IRS image in the same study site shows that the most important factors affecting TOF are mean annual rainfall, distance to nearest paved road, distance to nearest human settlement, and distance to nearest forest edge. Distance to the nearest paved road was the most important covariable determining TOF spatial distribution in the study site of Costa Rica, accounting for 57.1% of the probability of observing TOF. In the case of the study site located in Honduras, the most important factors affecting TOF spatial distribution are altitude, distance to nearest paved road, and distance to nearest human settlement.
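As an illustration of how a logistic model of this kind turns covariable values into a probability of TOF presence, the following minimal Python sketch evaluates the logistic function for one grid cell. The covariable names follow the factors named above for the aerial photo model; the intercept and coefficient values are hypothetical placeholders chosen for the example and are not the estimates fitted in this research.

```python
import math


def tof_presence_probability(intercept, coefficients, covariables):
    """Logistic model: P(TOF present) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * covariables[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical values for illustration only; NOT the fitted estimates of the thesis.
EXAMPLE_INTERCEPT = 0.0
EXAMPLE_COEFFICIENTS = {
    "altitude_m": -0.001,
    "dist_settlement_m": -0.0002,
    "dist_forest_edge_m": -0.0005,
}

# Covariable values of a single (made-up) grid cell.
cell = {
    "altitude_m": 800.0,
    "dist_settlement_m": 1500.0,
    "dist_forest_edge_m": 400.0,
}

p = tof_presence_probability(EXAMPLE_INTERCEPT, EXAMPLE_COEFFICIENTS, cell)
print(f"Estimated probability of TOF presence: {p:.3f}")
```

Applying the same function to every cell of a raster of covariables would yield a probability surface of the kind integrated in a GIS, under the assumption that all covariable layers share the same grid.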
The most important factors affecting TOF presence in each study site, as reflected by the logistic models, differed between study sites not only in the significant covariables selected, but also in their magnitude and sign. The results indicate a low to moderate capacity of the logistic models to discriminate between TOF presence and absence observations. In the case of Honduras, this capacity was 62.9%, while for the model fitted for the study site of Costa Rica it was 60.2%. The model fitted using TOF classified on aerial photos showed moderate discriminatory power, with a value of 68.1%. It should be borne in mind that there are many other factors affecting the presence of TOF in a landscape that were not available for this study due to information restrictions. These factors include, among others, land cover and distance to markets. Furthermore, there are other factors affecting TOF that are more difficult to quantify, such as the individual decisions of farmers on where a tree should be planted, which trees (for instance, those growing on pasture) should be cut, and when these activities should take place. An increase in the precision of some biophysical covariables, such as soil, altitude, slope, and annual rainfall, could improve the results obtained here. It is recommended that future research efforts in this area take these types of factors into consideration as determinants of TOF spatial distribution. The goodness-of-fit of the logistic model, its predictive power, the most important covariables determining TOF spatial distribution, and the structure of the spatial autocorrelation were dependent on the spatial resolution at which TOF were extracted. The capacity to discriminate between TOF presence and absence observations was 8% higher in the model fitted using TOF classified on aerial photos. The results obtained support the hypothesis that the number of TOF absence events and covariable observations affects the logistic model. Such an effect was observed in the number of covariables entering the models, and in the capacity to discriminate between TOF absence and TOF presence observations. Although the magnitude of the changes differed between study sites, the trend in the variation was approximately the same. The spatial accuracy of the covariables used to fit a logistic model showed important effects on the statistics, both in the univariable analysis and in the multivariable procedures. The most important effect of a lack of spatial accuracy occurs in the final set of factors entered in the TOF presence model. When a model is fitted using the original covariables, additional factors are considered as determinants of TOF presence. This situation could produce an overestimation or an underestimation of the relative importance of the factors affecting TOF presence, thereby leading the model user to misinterpretations.
Therefore, for modeling purposes it is imperative to establish clear standards of data collection, storage, and transformation when setting up a GIS. The TOF spatial logistic model based on aerial photos is the most precise, not only for the analysis of the most important factors affecting TOF, but also for the prediction of TOF probability. The differences in the most important covariables when compared to the model fitted on an IRS image could produce some misinterpretations. In terms of predictive capacity, the models can show important differences in the probability surface map produced. Since the model based on aerial photos more closely approximates field conditions, it is highly recommended that this model be used for both activities: analysis of the most important factors and TOF mapping. The models can be used to study possible future scenarios of TOF distribution in the landscape. These scenarios can be constructed based on different questions, e.g.: what will be the TOF probability in a specific study site if more paved roads are developed? What will be the probability of observing TOF if a given deforestation rate is reached? Using the capabilities offered by GIS technology, these scenarios could be spatially analyzed.
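A scenario question such as the paved road example can be sketched by re-evaluating a fitted model with modified covariable values. The Python sketch below is a simplified illustration with a single covariable (distance to nearest paved road); the intercept, the coefficient, its negative sign, and all distances are assumptions made for the example and do not reproduce the models fitted in this research.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical single-covariable model; values are assumed for illustration only.
INTERCEPT = 0.5
COEF_DIST_PAVED_ROAD = -0.0004  # per metre; sign and size are assumptions


def tof_probability(dist_paved_road_m):
    """Probability of TOF presence given the distance to the nearest paved road."""
    return logistic(INTERCEPT + COEF_DIST_PAVED_ROAD * dist_paved_road_m)


def compare_scenario(baseline_dist_m, dist_to_new_road_m):
    """Return (baseline, scenario) probabilities for each cell.

    In the scenario, a new paved road is built, so the distance to the nearest
    paved road becomes the smaller of the current and the new distance.
    """
    results = []
    for current, new in zip(baseline_dist_m, dist_to_new_road_m):
        scenario_dist = min(current, new)
        results.append((tof_probability(current), tof_probability(scenario_dist)))
    return results


# Made-up distances (metres) for three grid cells.
baseline = [200.0, 1500.0, 4000.0]
new_road = [3000.0, 500.0, 1000.0]

for before, after in compare_scenario(baseline, new_road):
    print(f"baseline {before:.3f} -> scenario {after:.3f}")
```

In a GIS setting the same comparison would be run on the covariable rasters, with the distance layer recomputed for each scenario before the model is applied.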
Considering that research and information needs at the landscape level are priorities if natural resources are to be managed sustainably, the model, using scenarios, can provide partial information to help improve landscape connectivity or to increase carbon pools in a given landscape. Thus, areas with a low probability of observing TOF can be given priority for establishing trees, thereby improving the forest connectivity of the area under analysis. Furthermore, the map produced by the model can also be used as an information input for forest inventory field missions, using the probability surfaces as strata, so that different sampling intensities can be planned for each stratum. The resulting logistic models can also serve as a point of departure for selecting new covariables, as well as for discarding those found to be not significant in the present study.
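The use of the probability surface as strata for field sampling can be sketched as follows. This Python example is a minimal illustration: the class breaks, the sampling intensities, and the probability values are hypothetical and would need to be defined by the inventory design.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical class breaks: stratum 0 = low, 1 = medium, 2 = high TOF probability.
STRATUM_BREAKS = [0.33, 0.66]

# Hypothetical sampling intensity per stratum, in percent of cells to be visited.
SAMPLING_INTENSITY_PCT = {0: 1, 1: 5, 2: 10}


def assign_stratum(probability, breaks=STRATUM_BREAKS):
    """Map a predicted TOF probability to a stratum index."""
    return bisect_right(breaks, probability)


def plan_field_samples(probabilities):
    """Return the number of cells to sample per stratum (rounded up)."""
    counts = Counter(assign_stratum(p) for p in probabilities)
    return {
        stratum: -(-counts[stratum] * SAMPLING_INTENSITY_PCT[stratum] // 100)
        for stratum in sorted(counts)
    }


# Made-up probability surface flattened to a list of cell values.
surface = [0.1] * 100 + [0.5] * 40 + [0.9] * 10
plan = plan_field_samples(surface)
print(plan)
```

Integer arithmetic is used for the rounding so that the planned sample sizes do not depend on floating-point effects.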
Contents
Acknowledgments
Executive summary
List of Tables
List of Figures
List of Appendices
List of abbreviations, acronyms, and symbols
1. Introduction
1.1 Forestry in Central America
1.2 The role of trees outside forest in developing countries
1.3 Research experience and information requirements in Central America
1.4 Starting point of this study
1.5 Research objectives
1.6 Organization of the thesis
2. State-of-the-art: concepts and applications
2.1 TOF: concepts, characteristics and research
2.1.1 Definitions
2.1.2 TOF classification
2.1.3 Environmental and socioeconomic functions
2.1.4 Assessment of TOF on a large-scale basis
2.2 Spatial modeling: concepts, methods, and applications
2.2.1 Spatial explicit models, remote sensing, and GIS
2.2.2 Logistic regression in the TOF modeling context
2.2.3 Spatial autocorrelation: diagnosis and consequences
2.2.4 Logistic regression models: applications in forest and land management
2.2.5 Errors sources in spatial statistical models
2.2.6 Scale effects on statistical models: an overview
2.3 Image segmentation and classification: concepts and methods
2.3.1 Image segmentation
2.3.2 Image classification: pixel- vs. object-driven approach
3. Materials and methods
3.1 Study areas
3.2 Data source
3.2.1 TOF classification on IRS panchromatic images
3.2.2 Digital thematic information
3.3 TOF definition
3.4 Forest and TOF segmentation - classification on aerial photos
3.4.1 Images pre-processing
3.4.2 Images processing
3.4.3 Accuracy assessment of classification
3.5 Response variable
3.5.1 Definition
3.5.2 Preliminary data analysis
3.6 Biophysical and spatial covariables
3.6.1 Setting the covariables information in geodatabases
3.6.2 Accuracy assessment of covariables
3.6.3 Covariables correction
3.7 Sampling design
3.7.1 General description
3.7.2 Definition of optimal distance between observations
3.7.3 Selection of number of TOF presence - absence observations
3.8 Multivariable spatial logistic models for TOF presence
3.8.1 Model building strategy
3.8.2 Model integration in a GIS
3.9 Effect of TOF spatial resolution, covariables scale, and spatial accuracy on TOF distribution
3.9.1 Effect of TOF spatial resolution on models statistics
3.9.2 Effect of TOF absence and covariables scale
3.9.3 Effect of covariable spatial accuracy
3.10 Summary of the spatial models fitted
4. Results
4.1 TOF segmentation and classification on aerial photos
4.1.1 Multi-resolution segmentation of TOF
4.1.2 Forest cover and TOF classification
4.1.3 Accuracy assessment
4.1.4 Comparison with IRS/Landsat image classification
4.2 Spatial modeling of TOF distribution classified on aerial photos in Costa Rica
4.2.1 TOF spatial characterization
4.2.2 Covariables characterization
4.2.3 Biophysical and spatial factors affecting TOF distribution
4.2.4 Spatial explicit model of TOF presence
4.3 Spatial modeling of TOF distribution classified on an IRS image in Costa Rica
4.3.1 TOF spatial characterization
4.3.2 Covariables characterization
4.3.3 Biophysical and spatial factors affecting TOF distribution
4.3.4 Spatial explicit model of TOF presence
4.4 Spatial modeling of TOF distribution classified on an IRS image in Honduras
4.4.1 TOF spatial characterization
4.4.2 Covariables characterization
4.4.3 Biophysical and spatial factors affecting TOF distribution
4.4.4 Spatial explicit model of TOF presence
4.4.5 Models comparison at country level
4.5 Effect of TOF spatial resolution, covariables scale, and spatial accuracy on TOF distribution
4.5.1 Effect of TOF spatial resolution on models statistics
4.5.2 Effect of TOF absence event and covariables scale
4.5.3 Effect of covariables spatial accuracy
5. Discussion
5.1 TOF segmentation and classification on aerial photos
5.2 What are the most important factors affecting TOF spatial distribution?
5.3 What is the influence of the specific study site conditions on factors affecting TOF?
5.4 What is the effect of the spatial resolution on TOF spatial models?
5.5 What is the effect of the number of TOF absence event and covariable scale on TOF spatial models?
5.6 What is the effect of the covariables spatial accuracy on TOF spatial model?
5.7 Applications of models
5.8 Limitations of the study
6. Conclusions
References
Appendices
List of Tables
2. State-of-the-art: concepts and methods
Table 2.1 Official forest definitions given by FAO, and national forest laws of Costa Rica and Honduras.
Table 2.2 Trees outside forest classification scheme (Morales 2001).
Table 2.3 Summary of the most important environmental and socioeconomic functions of TOF at local and landscape scales.
3. Materials and methods
Table 3.1 Major characteristics of the aerial photos (B = blue, G = green, R = red, NIR = near infrared, MIR = mid infrared, TIR = thermal infrared) (after Jensen 1996).
Table 3.2 Description of the biophysical and spatial covariables used in the analysis.
Table 3.3 General description of soil series in the study site in Honduras and their classification in terms of their soil agricultural potential.
Table 3.4 Classification and general description of soil units in the study site in Costa Rica in terms of their agricultural potential.
Table 3.5 Grid size, number of observations, and sampling fraction used in each study site.
4. Results
4.1 TOF segmentation and classification on aerial photos
Table 4.1 Accuracy measures for classification results on aerial photos, scale 1:40,000 and 3m spatial resolution.
Table 4.2 Land cover and TOF classifications based on aerial photos and on an IRS 1-D image.
4.2 Spatial modeling of TOF distribution classified on aerial photos in Costa Rica
Table 4.3 Descriptive statistics for the set of covariables used as determinants of TOF presence. Data for TOF classified on aerial photos in the study site of Costa Rica.
Table 4.4 Spatial displacement statistics for features digitized on topographic maps 1:50,000 using as reference the features of the 23 color aerial photos in the study site of Costa Rica.
Table 4.5 Univariable logistic regression models for continuous covariables used for fitting the multiple logistic model of TOF presence estimation using data from aerial photos in the study site of Costa Rica (p < 0.05).
Table 4.6 Goodness-of-fit statistics for the best models according to Score Test Approximation to Mallows' Cq (STAM). TOF classified on 23 color aerial photos in the study site of Costa Rica.
Table 4.7 Likelihood and Wald tests for the final model for TOF presence estimation in the study site of Costa Rica. Model fitted using data from aerial photos.
Table 4.8 Summary of the model fitting information for the final model for TOF presence estimation using data from aerial photos in the study site of Costa Rica.
Table 4.9 Akaike Information Criterion (AIC), Schwarz Criterion (SC), and -2 Log L statistics for the final model for TOF presence estimation using data from aerial photos in the study site of Costa Rica.
Table 4.10 Analysis of maximum likelihood estimates for the final model for TOF presence estimation in the study site of Costa Rica. TOF information extracted from 23 color aerial photos.
Table 4.11 Contribution of each individual covariable entering the model to the discrimination between TOF presence and absence observations (c) in the study site of Costa Rica. TOF information extracted from 23 aerial photos.
4.3 Spatial modeling of TOF distribution classified on an IRS image in Costa Rica
Table 4.12 Descriptive statistics for the set of covariables used as determinants of TOF presence. Data for TOF classified on an IRS-1D image in the study site of Costa Rica.
Table 4.13 Spatial displacement statistics for features digitized on topographic maps 1:50,000 using as reference the IRS-1D image features in the study site of Costa Rica.
Table 4.14 Univariable logistic regression models for continuous covariables used for fitting the multiple logistic model of TOF presence estimation using data from an IRS-1D image in the study site of Costa Rica (p < 0.05).
Table 4.15 Goodness-of-fit statistics for the best models according to Score Test Approximation to Mallows' Cq (STAM). TOF classified on an IRS-1D image in the study site of Costa Rica.
Table 4.16 Likelihood and Wald tests for the final model for TOF presence estimation in the study site of Costa Rica. Model fitted using data from an IRS-1D image.
Table 4.17 Summary of the model fitting information for the final model for TOF presence estimation using data from an IRS-1D image in the study site of Costa Rica.
Table 4.18 Akaike Information Criterion (AIC), Schwarz Criterion (SC), and -2 Log L statistics for the final model for TOF presence estimation using data from an IRS-1D image in the study site of Costa Rica.
Table 4.19 Analysis of maximum likelihood estimates for the final model for TOF presence estimation in the study site of Costa Rica. TOF information extracted from an IRS-1D image.
Table 4.20 Contribution of each individual covariable entering the model to the discrimination between TOF presence and absence observations (c) in the study site of Costa Rica. TOF information extracted from an IRS-1D image.
4.4 Spatial modeling of TOF distribution classified on an IRS image in Honduras
Table 4.21 Descriptive statistics for the set of covariables used as determinants of TOF presence. Data for TOF classified on an IRS-1D image in the study site of Honduras.
Table 4.22 Spatial displacement statistics for features digitized on topographic maps 1:50,000 using as reference the IRS-1D image features in the study site of Honduras.
Table 4.23 Univariable logistic regression models for continuous covariables used for fitting the multiple logistic model of TOF presence using data from an IRS-1D image in the study site of Costa Rica (p < 0.05).
Table 4.24 Goodness-of-fit statistics for the best models according to Score Test Approximation to Mallows' Cq (STAM). TOF classified on an IRS-1D image in the study site of Honduras.
Table 4.25 Likelihood and Wald tests for the final model for TOF presence estimation in the study site of Honduras. Model fitted using data from an IRS-1D image.
Table 4.26 Summary of the model fitting information for the final model for TOF presence estimation using data from an IRS-1D image in the study site of Honduras.
Table 4.27 Akaike Information Criterion (AIC), Schwarz Criterion (SC), and -2 Log L statistics for the final model for TOF presence using data from an IRS image in the study site of Honduras.
Table 4.28 Analysis of maximum likelihood estimates for the final model for TOF presence estimation in the study site of Honduras. TOF information extracted from an IRS-1D image.
Table 4.29 Contribution of each individual covariable entering the model for TOF presence to the discrimination between TOF presence and TOF absence observations (c) in the study site of Honduras. TOF information extracted from an IRS-1D image.
Table 4.30 Most important factors affecting TOF spatial distribution in the study sites of Costa Rica and Honduras using TOF classified on IRS 1-D images. The signs indicate the direction of the association with TOF presence. ns: not statistically significant.
Table 4.31 Goodness-of-fit tests and discrimination power of the logistic models fitted in the study sites.
4.5 Effect of TOF spatial resolution, covariables scale, and spatial accuracy on TOF distribution
Table 4.32 Univariable logistic regression models for the covariables using TOF classified on two spatial resolutions in the study site of Costa Rica (p < 0.05). Results for IRS-1D correspond to the same area covered by the 23 aerial photos.
Table 4.33 Goodness-of-fit statistics for the best logistic models according to Score Test χ². TOF classified on an IRS-1D image covering the same area as the aerial photos in the study site of Costa Rica.
Table 4.34 Summary of logistic models fitted using TOF classified on two spatial resolutions in the study site of Costa Rica.
Table 4.35 Analysis of maximum likelihood estimates for the final model for TOF presence estimation using data from an IRS-1D image covering the same area as the aerial photos in the study site of Costa Rica.
Table 4.36 Statistical significance of the models' parameters (p < 0.05) for TOF presence estimation classified on two different spatial resolutions in the study site of Costa Rica. The signs indicate the direction of the association with TOF presence. ns: not statistically significant.
Table 4.37 Statistics for logistic models adjusted using different grid sizes for sampling covariables in the study site of Honduras. Number of TOF presence observations = 1320, corresponding to a grid size of 250x250m. See text for further explanation.
Table 4.38 Statistics for logistic models adjusted using different grid sizes for sampling TOF absence observations (p < 0.05) in the study site of Costa Rica. Number of TOF presence observations = 920, corresponding to a grid size of 250x250m. See text for further explanation.
Table 4.39 Univariable logistic regression models using original and spatially corrected covariables (p < 0.05). TOF information extracted from aerial photos in the study site of Costa Rica.
Table 4.40 Goodness-of-fit statistics for the best models according to Score Test χ². TOF information extracted from 23 color aerial photos and using original covariables.
Table 4.41 Analysis of maximum likelihood estimates for the final model for TOF presence estimation using data from 23 color aerial photos and original covariables.
Table 4.42 Statistical significance of TOF models' parameters according to covariables spatial accuracy. ? Statistically significant at p < 0.05. ns: not statistically significant.
Table 4.43 Summary of models fitting information according to the covariables spatial accuracy.
List of Figures
2. State-of-the-art: concepts and methods
Figure 2.1 Trees outside forest in Central American landscapes. a) TOF associated with coffee plantations. b) Trees in line associated with pastures for cattle grazing. c) Trees in line along paved roads. d) Single trees and small woodlots in a human settlement. Photos: D. Morales and J. Morales.
3. Materials and methods
Figure 3.1 Study areas in Costa Rica and Honduras. The panchromatic images correspond to an IRS-1D. The red square is the area covered by a set of 23 color aerial photos (scale 1:40,000) in the study site located in Costa Rica.
Figure 3.2 Flowchart of the methodological steps applied in this research.
Figure 3.3 Class hierarchy and membership function used in the TOF classification algorithm.
Figure 3.4 Flowchart of the geospatial database prepared for covariables.
Figure 3.5 Sampling design applied for gathering TOF presence and absence observations. TOF classified on an IRS-1D image basis in Honduras.
Figure 3.6 Summary of the spatial logistic models fitted in this research. CR = study site of Costa Rica. HND = study site of Honduras. The information on the left corresponds to the models fitted, and the lines indicate the data sources combined to reach each aim, which appear on the right side of the figure. Area in hectares. AP = aerial photography.
4. Results
4.1 TOF segmentation and classification on aerial photos
Figure 4.1 Segmentation resolutions used to generate objects for classification purposes on 23 mosaicked color aerial photos in the study site of Costa Rica.
Figure 4.2 Texture images estimated. a) Angular Second Moment, b) Homogeneity with 3x3 window, c) Homogeneity with 7x7 window, d) Homogeneity with 7x7 window. See text for further details.
Figure 4.3 Land cover classification map in the study site of Costa Rica. Classification based on 23 color aerial photos, scale 1:40,000 and spatial resolution of 3m.
Figure 4.4 Difference in objects classification between aerial photos and IRS 1-D image.
Figure 4.5 Distribution of the TOF-land percentage from aerial photos and an IRS 1-D image in the study site of Costa Rica.
Figure 4.6 Distribution of the total TOF-land area extracted from aerial photos and an IRS 1-D image in the study site of Costa Rica.
Figure 4.7 Comparison between the percentages of TOF objects extracted from aerial photos and an IRS 1-D image.
4.2 Spatial modeling of TOF distribution classified on aerial photos in Costa Rica
Figure 4.8 TOF density map and spatial arrangement of some covariables in the study site of Costa Rica. Estimations based on TOF classified on 23 aerial photos.
Figure 4.9 Empirical semi-variograms in North-South (a) and East-West directions for TOF density (number of TOF objects/km²) in the study site of Costa Rica. TOF extracted from color aerial photos.
Figure 4.10 Spatial displacement between reference observations on the aerial photos and different features digitized on topographic maps (1:50,000) in the study site of Costa Rica.
Figure 4.11 Smoothed scatter plots for each covariable determining the TOF presence classified on aerial photos in the study site of Costa Rica.
Figure 4.12 Variation in the area under ROC curve (c) as the number of covariables in the logistic model for TOF presence estimation increases in the study site of Costa Rica. TOF information extracted from 23 color aerial photos.
Figure 4.13 Percentage of gain in c statistic when one additional covariable is added to the logistic model of TOF presence estimation in the study site of Costa Rica. TOF information extracted from 23 color aerial photos.
Figure 4.14 Empirical semi-variogram of Pearson residuals for the final model for TOF presence estimation in the study site of Costa Rica. TOF information extracted from 23 color aerial photos.
Figure 4.15 Difference in Pearson residuals versus estimated probability for the final model for TOF presence estimation in the study site of Costa Rica. TOF information extracted from 23 color aerial photos.
Figure 4.16 Predicted TOF presence probability in the study site of Costa Rica, integrating the model fitted for TOF presence estimation in a GIS. TOF information extracted from a set of 23 color aerial photos.
4.3 Spatial modeling of TOF distribution classified on an IRS 1-D image in Costa Rica
Figure 4.17 TOF density map and spatial arrangement of some covariables in the study site of Costa Rica. Estimations based on TOF classified on an IRS-1D image.
Figure 4.18 Empirical semi-variograms in North-South (a) and East-West directions for TOF density (number of TOF objects/km²) in the study site of Costa Rica. TOF information extracted from an IRS-1D image.
Figure 4.19 Spatial displacement between reference observations on the IRS-1D panchromatic image and different features digitized on topographic maps (1:50,000) in the study site of Costa Rica.
Figure 4.20 Smoothed scatter plots for each covariable determining the TOF presence classified on an IRS-1D image in the study site of Costa Rica.
Figure 4.21 Variation in the area under ROC curve (c) as the number of covariates in the logistic model for TOF presence estimation increases in the study site of Costa Rica. TOF information extracted from an IRS-1D image.
Figure 4.22 Percentage of gain in c statistic when one additional covariable is added to the logistic model of TOF presence estimation in the study site of Costa Rica. TOF information extracted from an IRS-1D image.
Figure 4.23 Empirical semi-variogram of Pearson residuals for the final model for TOF presence in the study site of Costa Rica. TOF information extracted from an IRS-1D image.
Figure 4.24 Difference in Pearson residuals versus estimated probability for the final model for TOF presence in the study site of Costa Rica. TOF information extracted from an IRS-1D image.
4.4 Spatial modeling of TOF distribution classified on an IRS 1-D image in Honduras
Figure 4.25 TOF density map and spatial arrangement of paved roads and human settlements in the study site of Honduras. Estimations based on TOF classified on an IRS-1D image.
Figure 4.26 Empirical semi-variograms in North-South (a) and East-West directions for TOF density (number of TOF objects/km²) in the study site of Honduras. TOF information extracted from an IRS-1D image.
Figure 4.27 Spatial displacement, in highlands a) and lowlands b), of reference observations on the IRS-1D panchromatic image and different features digitized on topographic maps (1:50,000) in the study site of Honduras.
Figure 4.28 Smoothed scatter plots for each covariable determining the TOF presence classified on an IRS-1D image in the study site of Honduras. See text for further details.
Figure 4.29 Variation in the area under ROC curve (c) as the number of covariates in the logistic model for TOF presence estimation increases in the study site of Honduras. TOF classified on an IRS-1D image.
Figure 4.30 Percentage of gain in c statistic when one additional covariable is added to the logistic model of TOF presence estimation in the study site of Honduras. TOF information extracted from an IRS-1D image.
Figure 4.31 Empirical semi-variogram of Pearson residuals for the final model for TOF presence estimation in the study site of Honduras. TOF information extracted from an IRS-1D image.
Figure 4.32 Difference in Pearson residuals versus estimated probability for the final model for TOF presence estimation in the study site of Honduras. TOF information extracted from an IRS-1D image.
4.5 Effect of TOF spatial resolution, covariables scale, and spatial accuracy on TOF distribution
Figure 4.33 Variation in the area under ROC curve (c) as the number of covariables in the logistic model for TOF presence estimation increases. TOF classified on IRS-1D covering the same area as the aerial photos in the study site of Costa Rica.
Figure 4.34 Percentage of gain in c statistic when one additional covariable is added to the logistic model of TOF presence estimation. TOF classified on IRS-1D covering the same area as the aerial photos.
Figure 4.35 Effect of a systematic increase in grid size on the goodness-of-fit statistics of the TOF spatial logistic model.
Figure 4.36 Effect of a systematic increase in grid size on the goodness-of-fit statistics of the TOF spatial logistic model in the study site of Costa Rica.
Figure 4.37 Variation of the area under ROC curve (c) as the number of covariates increases in the logistic model for TOF presence using data from 23 color aerial photos and original covariables.
xxi
List of Appendices
1 Example of SAS program for managing databases, univariable analysis and final models.
2 Spatial distribution of significant covariables determining TOF presence classified on IRS pan 5.8m spatial resolution image in the study site of Honduras.
3 Spatial distribution of significant covariables determining TOF presence classified on IRS pan 5.8m spatial resolution image in the study site of Costa Rica.
4 Spatial distribution of significant covariables determining TOF presence classified on 23 color aerial photos 3m spatial resolution in the study site of Costa Rica.
5 Representation of the three models of best goodness-of-fit for each number of covariables entering the logistic model for TOF presence estimation in the study site of Costa Rica. TOF classified on 23 color aerial photos. STAM = Score test approximation to Mallows' Cq.
6 Representation of the three models of best goodness-of-fit for each number of covariables entering the logistic model for TOF presence estimation in the study site of Costa Rica. TOF classified on IRS-1D image.
7 Representation of the three models of best goodness-of-fit for each number of covariables entering the logistic model for TOF presence estimation in the study site of Honduras. TOF classified on IRS-1D image.
8 Representation of the models of best goodness-of-fit for each number of covariables entering the logistic model for TOF presence classified on 23 color aerial photos and using original covariables.
9 Pearson residuals for covariables entered in the final model for TOF spatial prediction and classified on 23 aerial photos 3m spatial resolution in the study site of Costa Rica.
10 Pearson residuals for covariables entered in the final model for TOF spatial prediction and classified on IRS pan 5.8m spatial resolution image in the study site of Costa Rica.
11 Pearson residuals for covariables entered in the final model for TOF spatial prediction and classified on IRS pan 5.8m spatial resolution image in the study site of Honduras.
12 Zusammenfassung
List of abbreviations, acronyms, and symbols
c: Area under the ROC curve.
C: Chi-square corresponding to the HL test.
%C: Percentage of concordant pairs.
CATIE: Tropical Agricultural Research and Higher Education Center.
CV: Coefficient of variation.
%D: Percentage of discordant pairs.
DEM: Digital elevation model.
Df: Degrees of freedom.
e: Exponentiation.
FAO: Food and Agriculture Organization of the United Nations.
GIS: Geographic Information System.
HL: Hosmer-Lemeshow goodness-of-fit test.
IRS-1D: Indian Remote Sensing Satellite. Panchromatic, 5.8m spatial resolution.
IVFL: Institute of Surveying, Remote Sensing and Land Information, University of Agricultural Sciences, Vienna, Austria.
KIA: Kappa Index of Agreement.
n: Number of observations.
OLS: Ordinary least squares.
ROC: Receiver operating characteristic curve.
STAM: Score test approximation to Mallows' Cq.
STB: Standardized estimation of the parameter.
%T: Percentage of tied pairs.
TOF: Trees outside forest.
χ2: Chi-square test.
1. Introduction
1.1. Forestry in Central America
According to the last forest cover assessment, the total forest area in Central America was 17,400,000 ha in the year 2000, which corresponds to thirty-four percent of the Central American territory (FAO 2001). Although the sustainable management of forests has been established as a priority by the governments of the different countries, the loss of forest cover remains a major concern in the region (Rodríguez 1998). During the last decade, Central America has experienced one of the highest negative rates of forest area change in the world, with a deforestation rate of 340,000 ha per year (FAO 2001). Similar deforestation rates were reported for the 1980s by de Camino and McKenzie (1988). This deforestation is mainly attributed to the expansion of cropland and pasture (Geist and Lambin 2001), but other associated factors such as poverty, inequitable land distribution, population growth, economic incentives, and contradictory laws have also promoted the reduction of forest cover (Rodríguez 1998, Geist and Lambin 2001). This loss of forest cover has increased forest fragmentation. For example, in Costa Rica, Sanchez et al. (2001) report that from 1986 to 1991 forest fragmentation increased with the creation of 524 new islands with an area of 0.03-0.5 km2 and fifteen new islands greater than 5 km2. The increasing loss and fragmentation of forest cover on the one hand, and the need to conserve remnants of representative forest ecosystems and the increasing demand for forest products in developing countries (Mateo 1998, Rodríguez 1998, Salas 1998) on the other hand, make the development of innovative sustainable management tools imperative for other, less studied tropical forest resources such as trees outside forest.
1.2. The role of trees outside forest in developing countries
Trees outside forest (TOF), which comprise all trees excluded from the definition of forest and other wooded land, are embedded in a landscape matrix composed of different land uses. These trees can grow in meadows, in association with crops or pastures, along rivers, canals or roadsides, and in towns, gardens, and parks. The area occupied by TOF, the species composition, the volume, as well as specific characteristics such as geometry and spatial distribution, can change depending on the biophysical, socioeconomic and political characteristics of a particular landscape. It is currently recognized that TOF fulfil not only many ecological functions, such as conservation of biodiversity, erosion control, and carbon sequestration, but also economic functions, such as the provision of firewood, fodder, fence posts, and living fence posts (Estrada et al. 1993, Schroeder 1994, Current et al. 1995, Burel 1996, Harvey et al. 1999).
In the socioeconomic context, fuelwood is of particular interest because it remains the primary source of energy in developing countries, representing ca. 81% of the total wood harvest (FAO 1999). Although there are no precise data on the contribution of fuelwood from TOF as a proportion of total fuel consumption, it is clear that agroforestry systems in developing countries provide a large part of this resource (Current et al. 1995, Rodríguez 1998, FAO 1999).
1.3. Research experience and information requirements in Central America
Research efforts on TOF have been concentrated mainly at local scales. Only in recent years have TOF at landscape and regional scales captured the attention of the research community in the Neotropics (Kleinn 2000a). The need for information on TOF in Central American countries was recognized, and a three-year project was implemented with the aim of developing a method for the assessment and mapping of TOF in this region (Kleinn 2000b). Three institutions from Central America and three partners from Europe participated in the “Tree Resources Outside the Forest” (TROF) project (Kleinn and Morales 2001a). The project, funded by the European Commission (Project number ERB3514PL973202), established field sites in Costa Rica, Honduras, and Guatemala. The project developed a proposal for a classification of TOF, analyzed the potential of satellite imagery for TOF assessment, developed a sampling design, estimated the above-ground biomass, developed a structure for a natural resource information system, and recommended an assessment system (Kleinn and Morales 2001a).
The assessment of any natural resource is only the first step toward its sustainable management (FAO 2001). Reliable information on TOF presence, spatial distribution, type, quality, and temporal changes is also needed, particularly for larger areas such as provinces, countries or regions (FAO 2001, Kleinn and Morales 2001a).
In this context, the first goal of the present research is to contribute methodological tools for the assessment of TOF by means of aerial photography. Sustainable forest management requires reliable data for its implementation. Owing to the suitable geometric characteristics of aerial photography, it is possible to estimate TOF quantities and their spatial distribution with a high degree of precision. This sort of data can complement available natural resource databases, and it can provide the input information for the estimation of tree cover and for spatial analyses.
The second goal of this research is to provide information leading to an understanding of TOF spatial distribution, and of the forces that drive it, in rural landscapes of Central America by means of statistical models. Models link data and theory through a set of formal equations that represent the key relationships underlying the spatial distribution of TOF. The models permit the analysis of the driving forces that determine the location of TOF according to, for instance, slope classes, soil type, and infrastructure. Furthermore, on the basis of spatial models, it is possible to generate future scenarios of TOF distribution, allowing the examination of the impacts of policy initiatives on TOF distribution, for example in the design of biological corridors. The results of this research will provide new planning tools and relevant information that will contribute to sustainable TOF management in Central American countries.
1.4. Starting point of this study
The present research was conceived and developed within the framework of the TROF-Project described above. Most of the raw data provided for this research initiative, as well as some results obtained in the TROF-Project, are used here with the consent of the authors.
1.5. Research objectives
In order to help bridge the current research gap, this dissertation has the following objectives:
• To develop a standardized procedure for extracting TOF information from scanned color aerial photography in a landscape of north-western Costa Rica.
• To assess and compare the effect of biophysical and spatial factors on the spatial distribution of TOF extracted from an IRS-1D image (5.8m spatial resolution) in a landscape of central-western Honduras and in the landscape of Costa Rica.
• To assess and compare the effect of biophysical and spatial factors on the spatial distribution of TOF extracted from scanned aerial photos (3m spatial resolution) in the landscape of Costa Rica.
• To evaluate the effect of the spatial resolution at which TOF information is extracted on the biophysical and spatial factors affecting the spatial distribution of the resource.
• To evaluate the effect of the TOF absence event and of the scale of the biophysical and spatial factors, as well as the spatial precision of the latter information, on logistic spatial models for TOF presence prediction.
1.6. Organization of the thesis
The content of each chapter is briefly summarized as follows:
Chapter 2, State-of-the-art: concepts and applications. This chapter reviews the current knowledge regarding the TOF concept, classification, and research in the Neotropics. Based on TOF as an integration concept, biophysical and socioeconomic functions are highlighted. The conceptual framework and fundamentals of spatial modeling in the TOF context, as well as the role of scale, remote sensing and GIS, are reviewed. The final section focuses on the fundamentals of image classification using an object-driven approach. The advantages of this method as well as the multi-resolution segmentation and classification used in this research are developed.
Chapter 3, Materials and methods. In this chapter, the study areas and the thematic data sources are described. The methodological steps towards an automatic classification of TOF on aerial photos using an object-oriented approach are presented. This algorithm is expected to be one of the few available for automatic TOF detection on remote sensing imagery. In the next subchapters, the components of a spatial model are addressed. The definitions of the response variable and of the covariables used in the analysis, as well as the sampling design, are described in detail. A detailed model building strategy for spatial logistic models is presented. Such methodological procedures are expected to help improve the quality and precision of this type of model in future investigations. Finally, the methodology used to evaluate the effect of TOF spatial resolution, covariables scale, and spatial accuracy on TOF distribution is described.
Chapter 4, Results. This chapter summarizes the key results of the research. The results concerning the TOF segmentation on aerial photos, and the comparison with TOF land detected on remote sensing imagery of coarser spatial resolution, are presented. The spatial distribution of TOF according to a set of biophysical and spatial covariables, and the spatial modeling of TOF in the two study areas, are also documented. The most important factors affecting TOF in each study site and at different spatial resolutions are discussed. Furthermore, a comparison of the factors affecting TOF between countries is presented. Finally, the results of a special case study regarding the effect of TOF absence, covariables scale, and spatial accuracy are documented.
Chapters 5 and 6, Discussion and Conclusions. In Chapter 5 (Discussion), the implications of the results are highlighted. First, the results of the TOF segmentation and classification are discussed. The next section is organized in terms of research questions derived from the objectives that drive the research. In light of the research objectives, the following questions are answered:
• What are the most important factors affecting TOF spatial distribution?
• What is the effect of the study site conditions on factors affecting TOF spatial distribution?
• What is the effect of the spatial resolution of the response variable on factors affecting TOF?
• What is the effect of the TOF absence event and covariables scale on factors affecting TOF and logistic model statistics?
• What is the effect of the covariables spatial accuracy on factors affecting TOF spatial distribution and logistic model statistics?
The last two sections of Chapter 5 deal with the applications of the models and note their restrictions and the limitations of the study in general. Chapter 6 summarizes the main conclusions of the research and draws implications for future research efforts.
2 State-of-the-art: concepts and applications
2.1 TOF: concepts, characteristics and research
2.1.1 Definitions
TOF include tree resources ranging from single trees to systematically managed trees in agroforestry systems. According to FAO (1998), TOF are considered "trees on land not defined as forest and other wooded land". As Kleinn (2000a,b) points out, the TOF definition obviously depends on the "forest" definition used, which can differ depending on the objective of the study (e.g. large-area inventory) or can change according to a particular national forest law. Under the official forest definition of FAO, the tree crown cover must be more than 10%, and the area must be larger than 0.5 ha (Table 2.1).
In Costa Rica and Honduras, where this research is developed, the national forest laws do not include a definition of TOF. By analyzing the forest definition, it is possible to define what could be considered as TOF by default. In the Honduran forest law and its regulations, a minimum area for the forest definition is not included (Table 2.1). In the Costa Rican forest law, the minimum area of a forest is 2 ha (Table 2.1). Therefore, tree resources that fall below this threshold can be included in the definition of TOF.
A recent research initiative in Central America (TROF-Project) defined TOF as all trees outside the legal forest borders with a height > 5 m and a diameter at breast height (dbh) > 10 cm, comprising an area < 2 ha and having a crown cover < 20%. TOF include agricultural land, agroforestry plantations of fruit trees and rubberwood (CATIE et al. 1999). Figure 2.1 depicts some examples of TOF in Central American countries. The definition adopted in this research is based on the analysis of the Costa Rican forest law. However, for classification purposes additional criteria complement the definition, as explained in chapter 3.
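The idea of defining TOF "by default" from a forest definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the classification procedure used in this research: the class and function names are hypothetical, only the minimum area and crown cover thresholds quoted in the text and in Table 2.1 are used, all other legal criteria (e.g. tree density, native ecosystem) are ignored, and the patch is assumed to carry trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForestDefinition:
    """Minimum thresholds a tree-covered patch must exceed to count as forest."""
    name: str
    min_area_ha: float
    min_crown_cover_pct: float


# Thresholds as quoted in the text and Table 2.1; every other criterion of the
# official definitions is deliberately left out of this simplified sketch.
FAO = ForestDefinition("FAO", min_area_ha=0.5, min_crown_cover_pct=10.0)
COSTA_RICA = ForestDefinition("Costa Rica", min_area_ha=2.0, min_crown_cover_pct=70.0)


def is_forest(area_ha: float, crown_cover_pct: float,
              definition: ForestDefinition) -> bool:
    """Return True if the patch exceeds both thresholds of the definition."""
    return (area_ha > definition.min_area_ha
            and crown_cover_pct > definition.min_crown_cover_pct)


def is_tof_by_default(area_ha: float, crown_cover_pct: float,
                      definition: ForestDefinition) -> bool:
    """A tree-covered patch that is not forest is treated as TOF by default."""
    return not is_forest(area_ha, crown_cover_pct, definition)


# Hypothetical patch: 1.2 ha with 80% crown cover.
patch_area_ha, patch_cover_pct = 1.2, 80.0
for definition in (FAO, COSTA_RICA):
    is_tof = is_tof_by_default(patch_area_ha, patch_cover_pct, definition)
    print(f"{definition.name}: {'TOF' if is_tof else 'forest'}")
```

Under these assumptions the same hypothetical patch counts as forest with the FAO thresholds but as TOF by default with the Costa Rican minimum area, which illustrates how strongly the TOF definition depends on the forest definition chosen.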
Table 2.1. Official forest definitions given by FAO and by the national forest laws of Costa Rica and Honduras.
FAO: Land with tree crown cover (or equivalent stocking level) of more than 10% and area of more than 0.5 ha. Trees should be able to reach a minimum height of 5 meters (m) at maturity (FAO 1998).
Costa Rica: Native ecosystem, intervened or pristine, regenerated by natural succession or other forestry techniques, that occupies an area >2 ha, characterized by the presence of mature trees of several ages and species, with one or more strata covering more than 70% of that area and with >70 trees per ha of 15 cm diameter at breast height (Asamblea Legislativa 1996).
Honduras: Land covered by a vegetal association dominated by trees or shrubs of any size that are capable of producing wood, firewood or other forest products, of influencing the climate or soil, or of providing shelter to cattle and wildlife (Gobierno de Honduras 1984).
2.1.2 TOF classification
A classification is needed not only for a better understanding of the structure and composition of the resource, but also for the evaluation of the resource, the comparison among different TOF surveys, and its presentation in maps (Kleinn 2000a). TOF can be classified according to the land use where they are found, the geometry of the resource, and other criteria (Kleinn 2000a). A starting point for defining a classification system is the development of a list of attributes that characterize the tree resources in terms of their physical and functional attributes, as well as the land where they are found (Kleinn 2000a). Morales (2001) proposes to classify TOF according to two major criteria: according to their position, or considering TOF as individual elements. The classification of TOF according to their position reflects their relationship with the land use on which they are observed (Table 2.2).
Figure 2.1. Trees outside forest in Central American landscapes. a) TOF associated with coffee plantations. b) Trees in line associated with pastures for cattle grazing. c) Trees in line along paved roads. d) Single trees and small woodlots in a human settlement. Photos: D. Morales and J. Morales.
Table 2.2. Trees outside forest classification scheme (Morales 2001).
According to their relative position on the landscape
Land-use: Human settlements; Cattle grazing; Agriculture.
Areas where TOF are observed: Urban; Semi-urban; Rural; Pastures; Savannas; Annual crops; Permanent crops.
TOF class: Trees on human settlements; Trees associated with pastures; Trees associated with annual crops; Trees associated with permanent crops; Fruit plantation.
According to individual characteristics
Spatial arrangement: Scattered; In lines; In stripes; Systematic; In groups.
Nevertheless, the application of the classification presented in Table 2.2, or any other, will depend on the spatial and spectral resolutions of the remote sensing image applied for its detection, as well as the objective of the survey.
2.1.3 Environmental and socioeconomic functions
Using the classification given in Table 2.2, the environmental and socioeconomic functions of TOF can be summarized (Table 2.3). In the case of trees in human settlements, trees associated with permanent crops, trees associated with annual crops, and trees in fencerows, the role played in small farmers’ economies in Central American countries has been well documented (Current et al. 1995). These systems provide the farm economy with many goods such as timber, fuel, fodder, green manure, food, poles, posts, fruit, and shade (Beer et al. 1987, Nair 1993, Current et al. 1995, Harvey et al. 1999). The resource also plays an important role in carbon fixation (Schroeder 1994, Lopez et al. 1999), in the control of soil erosion (Pellek 1992), and in improving rainwater infiltration and retention (Nair 1993, Table 2.3).
Table 2.3. Summary of the most important environmental and socioeconomic functions of TOF at local and landscape scales.
TOF types (columns): Trees in human settlements; Trees associated with permanent crops; Trees in live fences; Trees associated with annual crops; Trees associated with grasses; Woody agricultural crops; Trees in savannahs; Trees in gallery forests.
Functions (rows):
Biological diversity: Habitat for plant and animal populations; Conservation of rare species; Movement for wide-ranging species; Dispersal between isolated populations; Maintenance of ecological processes.
Water resources: Surface drainage patterns; Ground water accession; Flood mitigation and control; Sedimentation; Water quality; Nutrient levels.
Timber production, agriculture and some environmental services: Soil erosion; Windbreaks for crops, pasture, and livestock; Timber production; Firewood; Fruits; Other non-timber products; CO2 fixation; Wildlife observation; Landscape aesthetics.
Community cohesion: Political division; Administrative division; Property boundaries.
Climate change: Habitat for species with limited dispersal ability; Pathway for redistribution of populations.
Trees in pastures and trees associated with annual or perennial crops play an important role in the conservation of both plant and animal species. In this sense, Estrada et al. (1993) state that scattered trees are very important for migratory and resident birds as nesting sites and as feeding and resting places. Other important TOF types are fencerows (trees in line), defined as rows of trees enclosing or separating fields (Merriam 1981, cited by Burel 1996). Fencerows play a role in physical flows such as wind or water, and in biological flows, acting as either a corridor or a barrier for individual movements (Burel 1996, Rosenberg et al. 1997, Bennett 1999). Other TOF types, such as small woodlots, also play important biophysical and socioeconomic roles (Turner 1996, Laurence et al. 1998, Table 2.3).
TOF can also be observed in the landscape near rivers or as fragments in gallery forests. Tropical gallery forests play an important role in the conservation of biodiversity throughout the landscape (Kellman et al. 1998). Studies on the role of these ecosystems in water quality conservation, hydrological regulation, control of soil erosion, carbon storage, and landscape connectivity have also been carried out (Rosenberg et al. 1997, Kellman et al. 1998, Bennett 1999, Delitti and Burger 2000). Other ecological and socioeconomic functions of these resources are summarized in Table 2.3.
Studies that integrate socioeconomic and biophysical elements related to TOF in a specific landscape do not appear to have been published yet and are needed. Instead, according to Mercer and Miller (1998), studies at local scales (i.e. on-farm research) have predominated. One of the major advances is the development of methods for integrating socioeconomic and biophysical approaches, but more research of wide applicability is needed in the socioeconomic area (Nair 1998). One priority research area is the analysis of the impacts of alternative policies (at local, landscape, regional, and national levels) on potential agroforestry-based rural development initiatives (Nair 1998, Mercer and Miller 1998). This priority can be extended to TOF information requirements, if the efforts are integrated.
2.1.4 Assessment of TOF on a large-scale basis
De Gier et al. (2001) modeled the aboveground woody biomass (fresh weight, volume, dry weight) of TOF using field data collected in three Central American countries (30-45 trees per study site). They also fitted models to combinations of data sets, yielding models applicable to one study site, to two, or even to all three countries. The data include different TOF types such as trees in line, groups of trees, and gallery trees. The developed equations, which explained between 32% and 85% of the variation depending on the site, were used for subsequent spatial analysis. In this sense, continuous biomass layers were produced using the field data. In order to fit a regression model that predicts the biomass, a set of biophysical covariables (slope, aspect, altitude) and spatial covariables (distance to nearest human settlement, distance to nearest road, distance to nearest river) was used. These relationships were established for five study sites located in Costa Rica and for data from two study sites in Honduras. The set of explanatory variables used accounts for only 15% of the total variation in TOF biomass.
Kleinn and Morales (2001b) developed a sampling strategy for TOF in Central American countries. The large-area inventory mainly utilizes three different data sources: satellite imagery, aerial photos, and field measurements. The sampling strategy developed allows the estimation of target variables such as number of species/floristic diversity, volume/biomass/carbon, and basal area. At the same time, area estimates can be made for different types of TOF land and of TOF configurations.
Schneider et al. (2001) developed automatic algorithms to classify TOF on satellite imagery. The procedure shows acceptable results for classifying TOF by combining a Landsat ETM+ image (30 m) and IRS panchromatic images (5.8 m; Koukal and Schneider 2001). They also tested the algorithm on an IKONOS panchromatic image (1 m) covering a relatively small area in the Pacific region of Costa Rica. These data are used in the present research; details on the methodology used by these authors are given in chapter 3.
2.2. Spatial modeling: concepts, methods, and applications
2.2.1 Spatially explicit models, remote sensing, and GIS
A spatial model is a model, understood as a simplified representation of a feature of research with the aim of description, explanation, forecasting, projection or planning, in a bispace that considers both space and the feature attribute (Long and McMillen 1987, Wegener 2000). Spatial statistical models are born from the combination of remote sensing, GIS, and multivariate, multitemporal mathematical models (Baker 1989, Sklar and Costanza 1991, Lambin 1994). Their emphasis is on the spatial distribution of landscape elements and/or on changes in landscape patterns. This sort of model differs from classical models in that the observations under analysis are not independent (Griffith 1996). One of the goals of these models is the projection and cartographic display of future landscape patterns that would result from the continuation of current land management practices or the lack thereof (Lambin 1994).
From these definitions, it seems imperative to differentiate between forecast and projection. Projection refers to the generation of a scenario of the response variable based on a set of arbitrary assumptions related to the magnitude, direction and rate of change of the underlying driving forces of the feature modeled. In contrast, forecast or prediction assumes that the assumptions made in the model truly represent future events (Long and McMillen 1987). Prediction produces the likely values of unobserved events, not necessarily those in the future (McCullagh and Nelder 1989).
Remote sensing researchers, technology producers, ecologists, and forest and land managers agree on the potential role of remote sensing as an information resource to support sustainable forest management. This potential is based largely on the unique characteristics of remote sensing data: synoptic, repetitive, quantitative, and spatially explicit capabilities (Franklin 2001).
On the other hand, the application of GIS technology in forest management and other fields of natural resource management has increased over time (Franklin 2001, Longley et al. 2001). This technology has allowed the management and integration of a large quantity of spatial and temporal information (Franklin 2001, Longley et al. 2001). Thus, as stated by Franklin (2001), “[...] enormous collections of empirical observations, but this creates the need for better, more powerful tools to help make sense of these data. Models represent one such powerful tool”. At the same time, GIS provides the input information for models and the appropriate platform to run them (e.g. Johnston 1987, Nisbet and Botkin 1993, Fotheringham and Wegener 2000).
In recent years, there has been an increasing interest in providing integration tools in the area of remote sensing, GIS and spatial models (Fotheringham and Wegener 2000, Franklin 2001). However, this integration has shown different degrees of development, from external models, where the integration with the GIS is carried out by means of ASCII or binary files, to full integration, where the linkage operates like a homogeneous system (Wegener 2000). Regardless of the degree of integration, the combination of remote sensing, GIS and statistical models has allowed the development of applications for solving forest management problems at local scales (Murray and Snyder 2000) as well as at landscape and coarser scales (Franklin 2001). In this sense, the development of such models and their empirical application to forest management problems have made important contributions to sustainable forest management.
In the next sections, the technical details of the logistic regression model are discussed, as well as some of its applications in forest management and land management, including those that integrate remote sensing and GIS.
2.2.2 Logistic regression in the TOF modeling context
Specification and theoretical background
Multiple logistic regression is designed to estimate the parameters of a univariate or multivariate explanatory model in situations in which the response variable is categorical and has only two possible values, 0 or 1 (absence or presence of TOF, respectively), and the predictors or covariables are continuous, categorical or a combination of both (Ryan 1997, Agresti 1990). The function describes a model that follows an S-shaped curve. The logistic model has been widely used in biology to model the stabilization of population growth in a resource-limited environment (Lambin 1994). Its theoretical foundations, used in modeling deforestation processes (Lambin 1994), can also justify its application in TOF modeling, as follows. TOF presence in a landscape increases as TOF area does. This area increment is attributed to forest conversion or conversion to other land uses (agricultural land converted to pasture, for instance). Socioeconomic mechanisms of regulation, such as deforestation, fragmentation, and governmental financial incentives, among others, will increase TOF establishment rates. Nevertheless, because resources (e.g. land) are limited or subject to competition, these mechanisms will force the TOF establishment rate to slow down and eventually stop. This theoretical flow follows the expected S-shaped pattern of a logistic regression model. There are other statistical methods to model presence/absence data, such as artificial neural networks and discriminant analysis. In species distribution studies, such techniques have not shown superiority over logistic regression models in predicting presence or absence events (Manel et al. 1999).
Logistic regression has proven to be a robust and flexible technique in empirical forest management and land-use applications (Lambin 1994), as discussed later in this chapter.
Components and interpretation
As a generalized linear model (McCullagh and Nelder 1989), the logistic model has three parts: the random component, the systematic component, and the link function (McCullagh and Nelder 1989, Agresti 1990). The random component refers to the response variable, whose observations are assumed to be independent and, for a presence/absence response, binomially distributed. The systematic component refers to the covariables or explanatory variables, while the link function relates the linear predictor to the expected value of the response variable. Unlike in OLS regression, the variance of the random component is not constant but depends on the expected value of the response (McCullagh and Nelder 1989, Agresti 1990). The specific form of the linear logistic regression model is:
$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
[2.1]
where p(x) = E(Y | x) is the expected value of Y (the binomial response variable, i.e. the random component) given a value of the covariable x (i.e. the systematic component), e is the base of the natural logarithm, β0 is the intercept, and β1 is the slope parameter; both parameters are estimated by the maximum likelihood method. Note that p = P(Y = 1) and 1 − p = P(Y = 0). A transformation of this probability, written π(x) from here on, is the logit transformation g(x), also known as the logit, which is defined for a set of n covariables as:
$$g(x) = \log\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
[2.2]
If equation [2.2] is solved for π (x) then equation [2.3] is obtained and the respective probabilities of an event can be estimated (Allison 1999).
$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n}}$$
[2.3]
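As an illustration of how equations [2.2] and [2.3] mirror each other, the following minimal sketch converts a linear predictor into a probability and back. The function names, coefficients and covariable values are hypothetical and chosen only for this example; they are not taken from the models fitted in this study.

```python
import math

def inv_logit(eta):
    """Equation [2.3]: turn a linear predictor into a probability."""
    # 1 / (1 + e^-eta) is algebraically equal to e^eta / (1 + e^eta)
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Equation [2.2]: log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def predict_probability(betas, xs):
    """betas = [b0, b1, ..., bn]; xs = [x1, ..., xn]."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return inv_logit(eta)

# Hypothetical intercept, two slopes and two covariable values.
betas = [-1.0, 0.8, -0.05]
xs = [2.0, 10.0]
p = predict_probability(betas, xs)   # linear predictor is 0.1 here
print(round(p, 4), round(logit(p), 4))
```

Applying `logit` to the predicted probability returns the linear predictor, which is the sense in which [2.3] is [2.2] solved for π(x).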
The odds ratio is useful for interpreting the logistic model. It is a measure of association that approximates how much more likely (or unlikely) it is for the response variable to be present for a given set of covariable values (Hosmer and Lemeshow 2000) and is given by:
$$\text{odds ratio} = e^{\beta_1}$$
[2.4]
The odds, on which the odds ratio is built, are the ratio of the expected number of times that the TOF presence event will occur to the expected number of times it will not occur (Allison 1999). There is a simple relationship between probabilities and odds, which is given by equation [2.5] (Allison 1999).
$$\text{odds} = \frac{\pi(x)}{1 - \pi(x)}$$
[2.5]
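The link between equations [2.2], [2.4] and [2.5] can be made explicit with a short derivation. It is a sketch for a single covariable x, and the probability of 0.8 in the last line is an arbitrary illustrative value.

```latex
\begin{align*}
\text{odds}(x) &= \frac{\pi(x)}{1 - \pi(x)} = e^{\beta_0 + \beta_1 x} \\
\frac{\text{odds}(x + 1)}{\text{odds}(x)} &= \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1} \\
\text{e.g. } \pi(x) = 0.8 &\;\Rightarrow\; \text{odds} = \frac{0.8}{0.2} = 4
\end{align*}
```

The second line shows why the exponentiated slope is read as the multiplicative change in the odds for a one-unit increase in x.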
The interpretation of model parameters differs depending on whether the covariables are dichotomous or continuous (Hosmer and Lemeshow 2000). In the case of a dichotomous covariable, where x can be 1 or 0, the odds ratio approximates how much more likely (or unlikely) the response variable is to be present among observations with x = 1 than among those with x = 0. For continuous covariables, the interpretation of the estimated parameter
depends on how it was entered into the model and on the particular unit of each covariable (Hosmer and Lemeshow 2000). Given equation [2.2], the slope parameter β1 gives the change in the log odds for an increase of one unit in the covariable x (Hosmer and Lemeshow 2000).
The method used in the logistic regression model to estimate the parameters is called maximum likelihood (McCullagh and Nelder 1989). This method seeks the parameter values (also called maximum likelihood estimators or logit parameters) that maximize the probability of obtaining the observed data (Hosmer and Lemeshow 2000). The estimation is an iterative algorithm that begins with an arbitrary initial estimate of the parameters; the algorithm then determines the direction and magnitude of the change in the logit parameters (the logit is the natural log of the odds) that will increase the log likelihood. After this initial estimate, the residuals (deviance) are evaluated and the parameters are re-estimated with an improved function. The process is repeated until a convergence criterion is met, that is, until the log likelihood no longer changes significantly (McCullagh and Nelder 1989, Hosmer and Lemeshow 2000).
The logistic model allows analysis of binomially distributed data without requiring the large number of observations necessary for the normal approximation (Bergerud 1996). Nevertheless, the set of tests used to evaluate the adequacy of the model requires sample sizes large enough (as in this study) for the observed statistics to behave like normally distributed statistics. If the sample sizes are large enough, and considering that the chi-square statistic is based on the normal distribution, the deviance and the Wald statistics can both be compared to the chi-square distribution to obtain probability values for the observed statistics (Bergerud 1996).
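The iterative estimation described above can be sketched for a model with a single covariable. The sketch assumes Newton-Raphson updates, a common choice but not necessarily the algorithm used by the statistical package in this study; the data, starting values and tolerance are hypothetical.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Binomial log likelihood of the model logit(pi) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, tol=1e-8, max_iter=50):
    """Newton-Raphson search for the maximum likelihood estimates."""
    b0, b1 = 0.0, 0.0                        # arbitrary initial estimates
    ll = log_likelihood(b0, b1, xs, ys)
    for iteration in range(1, max_iter + 1):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Score vector: direction in which the log likelihood increases.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix, built from the weights p * (1 - p).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Update = inverse information matrix times score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        new_ll = log_likelihood(b0, b1, xs, ys)
        converged = abs(new_ll - ll) < tol   # log likelihood stopped changing
        ll = new_ll
        if converged:
            break
    return b0, b1, ll, iteration

# Hypothetical presence/absence data (not separable by x).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1, loglik, n_iter = fit_logistic(xs, ys)
print(b0, b1, loglik, n_iter)
```

The stopping rule mirrors the convergence criterion in the text: iteration ends once the log likelihood changes by less than the chosen tolerance.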
Assumptions
The logistic regression model is less restrictive in its assumptions than OLS regression. However, the following assumptions still apply:
• Covariables are not linear functions of each other. In OLS regression, this problem is known as multicollinearity (Kleinbaum et al. 1998). When this assumption is violated, the standard errors of the logit (response variable) coefficients become inflated as the correlation between covariables increases (Ryan 1997).
• Linearity. Logistic regression does not require linear relationships between the covariables and the response variable as OLS regression does (Kleinbaum et al. 1998). However, it does assume a linear association between the covariables and the logit of the response variable (Hosmer and Lemeshow 2000).
• Error terms are assumed independent. This assumption is violated in the presence of spatially or temporally correlated data, as is discussed in section 2.2.4.
• Expected dispersion. The observed variance of the response variable can be smaller or larger than the expected variance (under- or overdispersion) (Hosmer and Lemeshow 2000). In the presence of moderate discrepancies, adjusted standard errors must be used, which produces wider confidence intervals. If the discrepancies are large (i.e. overdispersion), the maximum likelihood estimation does not exist and the model should be respecified (Hosmer and Lemeshow 2000).
• Large samples. Maximum likelihood estimation depends on large-sample asymptotic normality, meaning that the reliability of the estimates increases when there is a sufficient number of observations (Bergerud 1996, Hosmer and Lemeshow 2000).
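The first assumption in the list above can be screened before fitting a model. The sketch below uses the variance inflation factor (VIF) for the special case of two covariables, where VIF = 1 / (1 − r²); the VIF is a standard diagnostic but is not named in the sources cited here, and the covariable names and values are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two covariables."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_covariables(a, b):
    """With exactly two covariables, both share the same VIF."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Hypothetical covariables that rise together almost linearly.
elevation = [100, 200, 300, 400, 500]
slope = [5, 9, 16, 19, 26]
print(pearson_r(elevation, slope), vif_two_covariables(elevation, slope))
```

A VIF close to 1 suggests little redundancy between the two covariables, while a large value signals that the standard errors of their coefficients are likely to be inflated.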
Measurement of model’s goodness-of-fit and predictive power
In logistic regression, as in OLS regression, several statistics measure the agreement between observed and fitted values. Two of these measures, available in most statistical packages, are the Pearson χ² and the deviance. Under the assumption that the model is correct, both statistics are supposed to follow a χ² distribution (Hosmer and Lemeshow 2000). The residual deviance, which according to these authors could be used as a goodness-of-fit
measure, is defined as twice the difference between the maximum achievable log likelihood and that attained under the fitted model (McCullagh and Nelder 1989). The statistic behaves in part like the residual sum of squares in ordinary linear models (McCullagh and Nelder 1989). These authors suggest that the deviance is most useful for comparing two nested models and not as an absolute measure of goodness-of-fit.
Within the set of goodness-of-fit measures analyzed by Hosmer and Lemeshow (2000), the Hosmer-Lemeshow test seems to be a consistent test for such purposes, though it is not reported frequently in the natural resources studies consulted. To estimate this statistic, the observations are divided into approximately ten groups of equal size based on the percentiles of the estimated probability. The differences between the observed and expected numbers of observations in these groups are summarized by the Pearson χ² statistic, which is compared to a χ² distribution with t degrees of freedom (number of groups minus 2) (Hosmer and Lemeshow 2000). The null hypothesis tested is that "there is no difference between the observed and model-predicted values of the dependent variable". If, for instance, the value of p is < 0.05, the null hypothesis must be rejected, which indicates a lack of fit.
Another way to evaluate the goodness-of-fit of a logistic model is via a classification table, often used in discriminant analysis (Hosmer and Lemeshow 2000).
By means of classification tables, the estimated probabilities are used to predict group membership; if the model predicts group membership accurately, this is taken as evidence that the model fits. However, as demonstrated by Hosmer and Lemeshow (2000, p. 157), this is not always the case, because the distances between observed and expected values could be unsystematic and within the variation of the model. Classification is sensitive to the number of observations in the event groups and always favors classification into the larger group, regardless of whether the model fits. In this sense, Koutsias and Karteris (1998) point out, as one of the important prerequisites for fitting a logistic model, that the sample sizes for the presence and absence of the event to be modeled should be about the same in order to avoid bias in the final model. This argument is also supported by the research of Manel et al. (2001).
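The grouping step of the Hosmer-Lemeshow test described earlier in this section can be sketched as follows. The sketch assumes equal-sized groups formed after sorting by fitted probability and the usual degrees of freedom of the number of groups minus 2; the function name and the data are hypothetical, and the p-value is left out because the standard library has no χ² distribution function.

```python
def hosmer_lemeshow(ys, ps, groups=10):
    """Return (statistic, degrees of freedom) for 0/1 outcomes ys
    and fitted probabilities ps."""
    # Sort by fitted probability only, keeping the input order for ties.
    pairs = sorted(zip(ps, ys), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        mean_p = expected / size
        # Pearson chi-square contribution of this group.
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic, groups - 2

# Hypothetical fitted probabilities of 0.5 with alternating outcomes:
# every group of two holds one event, exactly as expected.
ps = [0.5] * 20
ys = [0, 1] * 10
print(hosmer_lemeshow(ys, ps))
```

A statistic near zero, as in this constructed case, means observed and expected counts agree within every group; large values point to lack of fit.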
Other measures of goodness-of-fit in the logistic model are those that approximate the coefficient of determination used in OLS. However, these statistics (e.g. pseudo-R2) are not recommended as absolute measures of fit; they should be used only to evaluate competing models in the model-building stage (Hosmer and Lemeshow 2000).
From the fundamentals presented above on the goodness-of-fit tests available for logistic modeling, some important remarks must be stressed. None of the goodness-of-fit tests considered yields by itself a reliable measure of the goodness-of-fit of a logistic model. Rather, the available statistics should be combined during the model-building stage (i.e. for selecting among competing models) and for the selection of the final model. Furthermore, the final decision for a model should not be based only on the statistical significance of the goodness-of-fit statistic, but should take into account the analysis of the model’s residuals (McCullagh and Nelder 1989, Hosmer and Lemeshow 2000), as discussed later.
The capability of the model to classify observations can be assessed by the area under the ROC curve (c). The c statistic ranges between 0 and 1 and provides a measure of the model's predictive ability, i.e. its ability to discriminate between observations that experience the outcome of interest (TOF presence) and those that do not. As a rule of thumb, a value of 0.5 indicates no discrimination (Hosmer and Lemeshow 2000). This statistic is estimated using the percentage of concordant observations (%C), the percentage of discordant observations (%D), and the percentage of tied observations (%T). The SAS Manual (SAS 2000) defines those concepts as follows: "Define an event response having Ordered Value of 1. 
A pair of observations with different responses is said to be concordant (discordant) if the observation with the response having the larger Ordered Value has the lower (higher) predicted event probability. If a pair of observations with different responses is neither concordant or discordant, it is a tie."
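The pair counting behind the c statistic can be sketched as follows. The sketch assumes that the event (TOF presence) is coded 1 and that c is computed as the share of concordant pairs plus half the share of ties; the function name and the data are hypothetical.

```python
def concordance(ys, ps):
    """Count concordant, discordant and tied pairs and derive c."""
    events = [p for y, p in zip(ys, ps) if y == 1]
    non_events = [p for y, p in zip(ys, ps) if y == 0]
    n_conc = n_disc = n_tie = 0
    # Only pairs with different responses are compared.
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                n_conc += 1      # the event has the higher predicted probability
            elif p_event < p_non_event:
                n_disc += 1
            else:
                n_tie += 1
    total = n_conc + n_disc + n_tie
    c = (n_conc + 0.5 * n_tie) / total
    return n_conc, n_disc, n_tie, c

# Hypothetical observed responses and predicted probabilities.
ys = [1, 1, 1, 0, 0, 0]
ps = [0.9, 0.6, 0.4, 0.5, 0.4, 0.2]
n_conc, n_disc, n_tie, c = concordance(ys, ps)
print(n_conc, n_disc, n_tie, round(c, 4))   # 7 1 1 0.8333
```

With three events and three non-events there are nine pairs; seven are concordant, one is discordant and one is tied, so c = 7.5 / 9.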
Analysis of residuals and influential observations
As in ordinary linear models, standard plots are recommended to evaluate the performance of a model (McCullagh and Nelder 1989). As a complementary measure of goodness-of-fit, plotting the standardized deviance residuals against the estimated probability is recommended. Plotting the residuals against significant covariables is another relevant source of information for checking the adequacy of a logistic model (McCullagh and Nelder 1989). In all these plots, a null pattern is expected. Although this has been a standard step in model-building strategies in linear regression, checks for systematic departure from the logistic model have not been explicitly reported in most empirical applications in ecology and natural resources management. A very useful statistic for detecting influential observations is one that examines the effect that deleting all events with a particular covariable pattern has on the value of the estimated coefficients, the likelihood ratio and the deviance.
Large values of these statistics allow the identification of those covariable patterns that are poorly fit (Hosmer and Lemeshow 2000).
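Deviance residuals for a binary response can be sketched as follows. The sketch computes raw (unstandardized) deviance residuals, since standardizing them would additionally require the leverage of each observation; the function name and the data are hypothetical.

```python
import math

def deviance_residual(y, p):
    """Signed square root of an observation's contribution to the deviance."""
    log_lik = y * math.log(p) + (1 - y) * math.log(1.0 - p)
    # The sign follows the raw residual y - p.
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)

# Hypothetical observed responses and fitted probabilities.
ys = [1, 0, 1, 0, 1]
ps = [0.8, 0.3, 0.4, 0.9, 0.5]
residuals = [deviance_residual(y, p) for y, p in zip(ys, ps)]

# The squared residuals add up to the residual deviance (-2 log likelihood).
deviance = -2.0 * sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
                      for y, p in zip(ys, ps))
print([round(r, 3) for r in residuals], round(deviance, 3))
```

Plotting such residuals against the fitted probabilities is one way to look for the null pattern mentioned above; in this example the fourth observation (absence with a fitted probability of 0.9) has the largest residual in absolute value.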
2.2.3 Spatial autocorrelation: diagnosis and consequences
Conceptual framework
As mentioned above, one of the important underlying assumptions in logistic models is the independence of the random component of the model (see section 2.2.2.3). Independence means that the observation Yi (i.e. TOF presence) at a given location or time is not influenced by the other observations Yi+1…n. Autocorrelation can occur in either time series or spatial data (Anselin 1992), but in this discussion only the latter is considered. Spatial autocorrelation quantifies the degree to which near and more distant objects are interrelated (Anselin 1992). The concept is based on the First Law of Geography (also
known as Tobler’s law), which states: “everything is related to everything else, but near things are more related than distant things” (Longley et al. 2001). Spatial dependence is beneficial in recognizing scale because it simplifies the perception of spatial variation, identifies the scale of the underlying variation, and provides a link between spatial variation and the sampling frame (Cressie 1991, Haining 1990, Atkinson 1997, Meisel and Turner 1998, Wu and Qi 2000). However, when autocorrelated data are used to fit a logistic model, the parameter estimates have higher variances (i.e. lower precision) than a model that ignores the autocorrelation would indicate (Gumpertz et al. 2001).
Diagnosis
Two of the most common statistics for describing the spatial distribution of objects in a given space are the Moran’s I and Geary’s C indices (Goodchild 1986, Anselin 1992). Equations [2.6] and [2.7] are used to calculate Geary’s C and Moran’s I, respectively.
$$C = \frac{\sum_i \sum_j w_{ij} c_{ij}}{2 \left(\sum_i \sum_j w_{ij}\right) \left[\sum_i (z_i - z_m)^2 / (n - 1)\right]}$$
[2.6]
$$I = \frac{\sum_i \sum_j w_{ij} c_{ij}}{\left(\sum_i \sum_j w_{ij}\right) \left[\sum_i (z_i - z_m)^2 / n\right]}$$
[2.7]
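Equations [2.6] and [2.7] can be sketched for a small raster. The sketch assumes binary weights w_ij = 1 for cells sharing an edge, the usual similarity terms c_ij = (z_i − z_j)² for Geary’s C and c_ij = (z_i − z_m)(z_j − z_m) for Moran’s I, and a count of the actual neighbors of each cell, so that edge cells contribute fewer than four weights. The function name and the grids are hypothetical.

```python
def moran_geary(grid):
    """Return (Moran's I, Geary's C) for a rectangular grid of values."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    w_sum = 0            # sum of the binary weights w_ij
    moran_num = 0.0
    geary_num = 0.0
    for r in range(rows):
        for c in range(cols):
            zi = grid[r][c]
            # Neighbors are the cells sharing an edge with cell (r, c).
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    zj = grid[rr][cc]
                    w_sum += 1
                    moran_num += (zi - mean) * (zj - mean)
                    geary_num += (zi - zj) ** 2
    moran_i = moran_num / (w_sum * (sum_sq / n))
    geary_c = geary_num / (2 * w_sum * (sum_sq / (n - 1)))
    return moran_i, geary_c

# Hypothetical presence/absence rasters: a checkerboard and two blocks.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
blocks = [[0, 0, 1, 1] for _ in range(4)]
print(moran_geary(checkerboard))
print(moran_geary(blocks))
```

In the checkerboard every neighbor differs, which gives I = −1 and C above 1; in the blocked grid most neighbors are alike, which gives a positive I and C below 1.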
In equations [2.6] and [2.7], $\sum_i \sum_j w_{ij}$ = 4n (the number of adjacent cells in the grid). The interpretation of these indices is given according to the following criteria (Goodchild 1986):
Geary C | Moran I
[interpretation criteria not recoverable from the source text]
[table fragment; surviving column headings: χ², Odds ratio]
0.2481 -0.1551 -0.00135 -0.0111 -0.3707 0.7399 -0.4921 0.4405 -0.5143 -1.8405 -1.5240 0.6270
0.0377 0.0268 0.000339 0.00146 0.0891 0.1200 0.1173 0.1115 0.1184 0.2152 0.1170 0.0802