AN APPROACH BASED ON ENTROPY TO DISAGGREGATE AGRICULTURAL DATA AT A LOCAL LEVEL: THE CASE OF THE ALENTEJO REGION

ANTÓNIO XAVIER
CEFAGE-UE (Center For Advanced Studies in Management and Economics) and MeditBio. E-mail: [email protected]

MARIA DE BELÉM COSTA FREITAS
Assistant Professor with Habilitation, Universidade do Algarve, Faculdade de Ciências e Tecnologia. MeditBio. E-mail: [email protected]

RUI FRAGOSO
Assistant Professor with Habilitation, Universidade de Évora and CEFAGE-UE (Center For Advanced Studies in Management and Economics). E-mail: [email protected]

MARIA DO SOCORRO ROSÁRIO
Direção de Serviços de Estatística, GPP (Gabinete de Planeamento e Políticas). E-mail: [email protected]
Abstract
European Union agriculture faces complex challenges at the beginning of a new programming period, and disaggregated information is crucial for policy analysis and evaluation. In recent years, demand has increased for tools to analyse the impact of agricultural policies on territories at a more local level. Previous studies carried out by the authors produced disaggregated information on agricultural land uses at pixel level for the Algarve using a kilometric grid. However, no approach had been developed for the Alentejo, whose size poses a challenge, namely at the computational level: it implies estimating data for more than 30,000 disaggregated units using the same kilometric grid. Therefore, the objective of this paper is to present an entropy approach to disaggregate agricultural data at a local level for the Alentejo Region. The approach provided satisfactory results, since the estimated values were a good approximation to the true values.

Keywords: data disaggregation, minimum cross entropy, land uses, Alentejo.
1. INTRODUCTION
Nowadays, European Union and Portuguese agriculture face very complex challenges: territorial policies at local scales require disaggregated analysis with detailed spatial references (Xavier et al., 2016). However, data are limited in the available statistical sources. The agricultural census is conducted every 10 years and provides information for the agrarian region, municipalities and parishes (INE, 2011). All other publications, although issued more frequently, have a lower resolution.

Several international studies, namely Kempen et al. (2005), You and Wood (2006), You et al. (2009) and Chakir (2009), combined spatial datasets and econometric models to obtain disaggregated data. In Portugal, Xavier et al. (2016) presented a data disaggregation approach at pixel level for the Algarve Region. This methodology seemed promising, but faced obstacles when applied to larger regions because of the very high computational burden. One region with these problems, where agricultural and livestock activities still have great importance, is the Alentejo (INE, 2011; Xavier et al., 2014).

Thus, the objective of this work is to present a combined entropy approach to disaggregate agricultural data (temporary and permanent crop areas) at a local level for the Alentejo, adapting the approach proposed by Xavier et al. (2016). The remainder of the paper is organised as follows: section two describes the selected approach; section three presents the empirical implementation; section four presents and discusses the results; finally, section five presents the main conclusions of this work.
2. THE METHODOLOGICAL APPROACH
The methodological approach combines cluster analysis, iterative methods and entropy to disaggregate agricultural data at a local level. It is implemented sequentially, following Xavier et al. (2016). The steps are as follows:
1) Previous estimate, which implies:
i) Application of an HJ-Biplot (Galindo, 1986) and cluster analysis to the main land uses provided by the land use cartography, in order to define homogeneous groups for data disaggregation.
ii) Implementation of the dasymetric mapping process proposed by Gallego and Peedell (2001), which redistributes farm data by land use classes.
iii) Improvement of the previous estimate using experts' analysis, depending on the available information.
2) Implementation of a cross entropy minimization process, which disaggregates the data while respecting the previous estimate, ensuring consistency with the aggregate and assuring that the biophysical and historical restrictions are respected.

3. EMPIRICAL IMPLEMENTATION
The study area chosen to implement the proposed model was the Alentejo Region, which covers about 27,160 km². It is a region where agriculture and livestock breeding still have great importance (INE, 2011; Xavier et al., 2014; Xavier and Freitas, 2014) and show several specific dynamics due to recent agricultural policies. Several areas of this region are also in demographic decline.
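As an illustration of step 1(ii) of the methodological approach, the following minimal Python sketch redistributes one parish total across its pixels in proportion to land use weights. It is not the authors' implementation: the function name, the land use labels and the weights are hypothetical, and the iterative adjustment of the weights described by Gallego and Peedell (2001) is not reproduced.

```python
def dasymetric_allocate(parish_total, pixel_classes, class_weights):
    """Split one parish total across its pixels in proportion to land use weights.

    pixel_classes: land use class of each pixel in the parish.
    class_weights: relative suitability of each class (hypothetical values).
    """
    raw = [class_weights.get(land_use, 0.0) for land_use in pixel_classes]
    total_weight = sum(raw)
    if total_weight == 0:
        raise ValueError("no pixel in this parish can host the crop")
    return [parish_total * w / total_weight for w in raw]


# Hypothetical parish: 120 ha of cereals and five 1 km pixels.
pixels = ["arable", "arable", "forest", "pasture", "arable"]
weights = {"arable": 1.0, "pasture": 0.2, "forest": 0.0}  # illustrative only
estimate = dasymetric_allocate(120.0, pixels, weights)
```

By construction the pixel values add up to the parish total, and pixels whose class has zero weight receive nothing. An estimate of this kind can then serve as the previous estimate that the entropy step adjusts.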
The main land use classes for implementing the HJ-Biplot method and the iterative process proposed by Gallego and Peedell (2001) result from simplifying the COS 2007 into a reduced number of classes, taking into account the detail of the available information and the objectives of the study. For the transformation process in the HJ-Biplot method, we selected double centring for the analysis of the main land cover classes. The homogeneous groups of municipalities were defined by applying a hierarchical cluster analysis to the Biplot coordinates. Euclidean distances were used as the dissimilarity index and Ward's method as the linkage method.

The groups of temporary crops considered were the following: cereals (CER), dried pulses (LEG), temporary pastures and forages (PCF), and horticultural crops (fresh vegetables, melons and strawberries, in open field, in market gardening and under glass or protective cover) and potatoes (HORTEBAT). The groups of permanent crops are: fresh fruits (FRTFRES), citrus (CITR), nuts fruits (FTRCRIJ), olive trees (OLIV), vineyards (VIN) and other permanent crops (OCP).

The entropy process guarantees that the data are disaggregated respecting the existing biophysical conditions and the general aggregate, which is known, providing data compatibility between the several layers of information. For the error definition, as in previous studies (Xavier et al., 2016), the three-sigma rule and the error limits that provided better results were used. These variables were disaggregated to the pixel level, totalling more than 30,000 disaggregated units, which were then aggregated by pixel in the required situations.

Validation followed the measures proposed by Xavier et al. (2016): the Weighted Prescription Absolute Deviation (WPAD), the Pearson correlation coefficient (R) and R2, with a focus on the Efficiency Indicator (You et al., 2009).
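The correlation-based validation measures can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that the text does not spell out: R2 is taken as the squared Pearson coefficient, and the Efficiency Indicator is taken in its usual modelling-efficiency form, one minus the ratio of the squared estimation errors to the squared deviations of the observed values from their mean. The function names and the sample values are hypothetical.

```python
import math


def pearson(observed, estimated):
    """Pearson correlation coefficient between observed and estimated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)


def modelling_efficiency(observed, estimated):
    """EF = 1 - sum((obs - est)^2) / sum((obs - mean(obs))^2) (assumed form)."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical observed and estimated crop areas (ha) for four units.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]
r = pearson(observed, estimated)
r2 = r ** 2
ef = modelling_efficiency(observed, estimated)
```

Under this definition EF equals 1 for a perfect fit, can become negative when the estimates perform worse than the observed mean, and never exceeds the squared Pearson coefficient, which is why it is the stricter of the two measures.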
The technical implementation is done using the General Algebraic Modelling System (GAMS).
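Outside GAMS, the core of the entropy step can be illustrated with a deliberately simplified Python sketch. It assumes only two kinds of restriction: the pixel values must add up to the known aggregate, and each pixel has an upper bound standing in for a biophysical limit. For this reduced problem, minimising the cross entropy with respect to the previous estimate amounts to scaling the previous estimate proportionally and capping the pixels that would exceed their bound. The error terms, the three-sigma supports and the historical restrictions of the actual model are not represented, and all names and numbers are hypothetical.

```python
def min_cross_entropy_capped(prior, caps, aggregate):
    """Minimise sum(x * ln(x / prior)) subject to sum(x) = aggregate, 0 <= x <= caps.

    Pixels with a zero prior stay at zero. The loop rescales the uncapped
    pixels and fixes any pixel that exceeds its cap until none does.
    """
    result = [0.0] * len(prior)
    free = [i for i, q in enumerate(prior) if q > 0]
    remaining = aggregate
    while free:
        scale = remaining / sum(prior[i] for i in free)
        over = [i for i in free if scale * prior[i] > caps[i]]
        if not over:
            for i in free:
                result[i] = scale * prior[i]
            return result
        for i in over:
            result[i] = caps[i]      # pixel is filled to its upper bound
            remaining -= caps[i]
        free = [i for i in free if i not in over]
    if remaining > 1e-9:
        raise ValueError("aggregate cannot be allocated within the caps")
    return result


# Hypothetical previous estimate (ha), pixel capacities (ha) and known aggregate.
prior = [50.0, 30.0, 20.0, 0.0]
caps = [30.0, 100.0, 100.0, 100.0]
areas = min_cross_entropy_capped(prior, caps, aggregate=100.0)
```

In this example the first pixel is capped at 30 ha and the remaining 70 ha are shared between the second and third pixels in the 30:20 ratio of the previous estimate. A general-purpose solver is needed once further restrictions are added.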
4. RESULTS AND DISCUSSION
4.1. HJ-Biplot and cluster analysis
For the HJ-Biplot approach, two axes were retained, accounting for 68.45% of the accumulated inertia. The HJ-Biplot representation and the spatial distribution of the groups of parishes are presented in the following figure. The groups are described as follows:
Group 1 - Parishes highly oriented to temporary crops;
Group 2 - Parishes oriented to temporary crops, where other uses also tend to gain relevance;
Group 3 - Parishes with mixed agricultural and forest uses;
Group 4 - Parishes oriented to forest and shrub areas;
Group 5 - Parishes with mixed agricultural uses;
Group 6 - Parishes with mixed uses oriented to heterogeneous agricultural areas;
Group 7 - Parishes oriented to permanent pastures and to heterogeneous agricultural areas;
Group 8 - Parishes highly oriented to forest and shrub uses.
Figure 1 - The HJ-Biplot representation and the spatial distribution of the clusters/groups (source: model results)
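The grouping step behind these clusters, a hierarchical analysis of the Biplot coordinates with Euclidean distances and Ward's linkage, can be sketched in plain Python as follows. This is an illustrative sketch only: the function name and the two-dimensional coordinates are hypothetical, and a statistical package would normally be used for the real data.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering that merges, at each step, the two clusters
    whose union gives the smallest increase in within-cluster sum of squares
    (Ward's criterion on Euclidean distances)."""
    # Each cluster is stored as (member indices, centroid coordinates).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        merged = ma + mb
        centroid = [(len(ma) * x + len(mb) * y) / len(merged)
                    for x, y in zip(ca, cb)]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((merged, centroid))
    return [sorted(members) for members, _ in clusters]


# Hypothetical coordinates of five parishes on the two retained axes.
coords = [(0.0, 0.1), (0.1, 0.0), (2.0, 2.1), (2.1, 2.0), (-2.0, 1.9)]
groups = ward_clusters(coords, n_clusters=3)
```

With these coordinates the two close pairs of parishes are merged first and the isolated parish remains a group of its own.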
4.2. Final results - entropy approach
The model produced results at "pixel" level. The following maps (figure 2) present examples of the area results per disaggregated unit, for a better understanding of the spatial patterns. These results allow the identification of some of the major contrasts in the distribution of temporary and permanent crops. For instance, the major areas of concentration of vineyards or olive trees are clearly identified.
Figure 2 - The final results at a disaggregated unit level (source: model results)
A WPAD of 54.31% for the temporary crops and 54.32% for the permanent crops was obtained (excluding the outliers with extreme results), which is satisfactory when compared with Xavier et al. (2016), who obtained values across the several parishes of 86% for the temporary crops and 38% for the permanent crops.

The Pearson correlation coefficients and R2 were calculated following the validation process of You and Wood (2006) and You et al. (2009). You and Wood (2006) obtained correlation coefficients between 0.4 and 0.65. You et al. (2009) validated their model using 4 crops and obtained an R2 of 0.8 for one crop, while the others presented values between 0.40 and 0.45. In our study (table 1), for temporary crops, the best results are presented by cereals, with a correlation coefficient of 0.816 and an R2 of 0.666, while the lowest value is observed in the dried pulses. All the other crops present values within the range obtained by You et al. (2009). For permanent crops the results are quite satisfactory when compared with previous studies, the best results being presented by vineyards, followed by olive trees, both with a relevant correlation coefficient. The coefficient of determination R2 presents its worst values in nuts fruits and citrus, which have residual expression in this group.

Table 1 - Pearson correlation coefficient, R2 and the modelling Efficiency (EF) indicator

Crops              Pearson   R2       EF
Temporary crops:
CER                0.816     0.666    0.657
HORTEBAT           0.726     0.527    0.51
OUT                0.636     0.404    0.364
PCF                0.768     0.59     0.527
LEG                0.436     0.19     -0.323
Permanent crops:
FTRCRIJ            0.414     0.171    0.171
VIN                0.942     0.887    0.831
OLIV               0.828     0.686    0.66
CITR               0.447     0.2      0.087
FRTFRES            0.746     0.557    0.543
OCP                0.83      0.688    0.688

CER - Cereals, LEG - Dried pulses, PCF - Temporary pastures and forages, HORTEBAT - Horticultural crops and potatoes, OUT - Other temporary crops, FRTFRES - Fresh fruits, CITR - Citrus, FTRCRIJ - Nuts fruits, OLIV - Olive trees, VIN - Vineyards, OCP - Other permanent crops. (source: model results)
Finally, the results were validated with the Efficiency Indicator (EF) proposed by You et al. (2009), who obtained values between 0.23 and 0.71, with 3 crops presenting values between 0.23 and 0.44. The results of the study presented in this paper exceed most of these thresholds (table 1), with EF greater than 0.4 in most cases (the exceptions being Other temporary crops (OUT), Dried pulses (LEG), Nuts fruits (FTRCRIJ) and Citrus (CITR)).

5. CONCLUDING REMARKS
The proposed methodology produced disaggregated data for the Alentejo Agrarian Region at pixel level; it proved robust and gives relevant information for policy analysis and evaluation. Information at local level allows official entities to define more equitable policies that also consider the biophysical aspects of the territory. It was also shown that the approach proposed by Xavier et al. (2016) may be applied to areas that imply greater complexity because of their size, and it is our conviction that it may be easily applied to the other Portuguese agrarian regions, overcoming the problems of statistical periodicity.
REFERENCES
Chakir, R. (2009). Spatial downscaling of agricultural land use data: an econometric approach using cross-entropy. Land Economics, 85(2), 238-251.
Fragoso, R., Martins, M.B., and Lucas, M.R. (2008). Generate disaggregated soil allocation data using a Minimum Cross Entropy Model. WSEAS Transactions on Environment and Development, 9(4), 756-766.
Galindo, M. (1986). Una alternativa de representación simultánea: HJ-Biplot. Qüestiió, 10(1), 13-23.
Gallego, F.J. and Peedell, S. (2001). Using CORINE Land Cover to map population density. Towards Agri-environmental indicators. Topic report 6/2001, European Environment Agency, Copenhagen, pp. 92-103.
Golan, A., Judge, G. and Miller, D. (1996). Maximum Entropy Econometrics: Robust Estimation with Limited Data. New York, USA: John Wiley & Sons.
INE - Instituto Nacional de Estatística (2011). Recenseamento geral da agricultura de 2009. Lisbon, Portugal: INE.
Kempen, M., Heckelei, T., Britz, W., Leip, A., Koeble, R., and Marchi, G. (2005). Computation of a European Agricultural Land Use Map - Statistical Approach and Validation. Discussion Paper. Bonn: Institute for Food and Resource Economics.
Martins, M.B., Fragoso, R., and Xavier, A. (2011). Spatial disaggregation of agricultural data in Castelo de Vide, Alentejo, Portugal: an approach based on maximum entropy. JP Journal of Biostatistics, 5(1), 1-16.
Martins, M.B., Xavier, A.M., and Fragoso, R. (2012). Redistributing agricultural data by a dasymetric mapping methodology. Agricultural and Resource Economics Review, 41(3), 351-366.
You, L. and Wood, S. (2006). An entropy approach to spatial disaggregation of agricultural production. Agricultural Systems, 90, 329-347.
You, L., Wood, S. and Wood-Sichra, U. (2009). Generating plausible crop distribution maps for Sub-Saharan Africa using a spatially disaggregated data fusion and optimization approach. Agricultural Systems, 99(2-3), 126-140.
Xavier, A., Costa Freitas, M.B., and Fragoso, R. (2014). Disaggregation of Statistical Livestock Data Using the Entropy Approach. Advances in Operations Research, Volume 2014, Article ID 397675, 9 pages.
Xavier, A. and Costa Freitas, M.B. (2014). Recent dynamics and trends of Portuguese agriculture - a Biplot analysis. New Medit, 13(4), 67-74.
Xavier, A., Martins, M.B., and Fragoso, R. (2011). A minimum cross entropy model to generate disaggregated data at the local level. 122nd EAAE Seminar "Evidence-based agricultural and rural policy making: Methodological and empirical challenges of policy evaluation", Ancona, 17-18 February 2011.
Xavier, A., Freitas, M.B., Socorro Rosário, M., and Fragoso, R. (2016). Disaggregating statistical data at field level: an entropy approach. CEFAGE Working Paper 2016/06.