APPLICATION OF STATISTICAL AND MACHINE LEARNING MODELS FOR GRASSLAND YIELD ESTIMATION BASED ON A HYPERTEMPORAL SATELLITE REMOTE SENSING TIME SERIES

Iftikhar Ali∗,1,2, Fiona Cawkwell1, Stuart Green2, Ned Dwyer3

1 Department of Geography, University College Cork, Cork, Ireland.
∗ Corresponding author: Email. iffi[email protected].
2 Spatial Analysis Unit, Teagasc, Dublin, Ireland.
3 Coastal & Marine Research Centre, University College Cork, Cork, Ireland.

ABSTRACT
More than 80% of agricultural land in Ireland is grassland, providing a major feed source for the pasture-based dairy farming and livestock industry. Intensive grass-based systems demand high levels of intervention by the farmer, with pasture cover (biomass) being the most important variable in land use management decisions, as well as playing a vital role in paddock and herd management. Many studies have estimated grassland biomass using satellite remote sensing data, but rarely in systems like Ireland's intensively managed, small-scale pastures, where grass is grazed as well as harvested for winter fodder. The objective of this study is to estimate grassland yield (kgDM/ha) from MODIS-derived vegetation indices on a near-weekly basis across the entire 300+ day growing season using three different methods: Multiple Linear Regression (MLR), Artificial Neural Networks (ANN) and Adaptive Neuro-Fuzzy Inference Systems (ANFIS). The results show that the ANFIS model produced the best result (R² = 0.86) compared with the ANN (R² = 0.57) and MLR (R² = 0.31).

Index Terms: Grassland, MODIS time series, ANN, ANFIS, biomass prediction

We acknowledge the Walsh Fellowship for funding.

1. INTRODUCTION
Grasslands cover about 40.5% of the Earth's land surface [1, 2]; as a widespread land cover type and one of the largest terrestrial ecosystems, they play an important role as a major carbon sink [3]. Besides regulating the global carbon cycle [2], grasslands provide grazed grass, the cheapest feed source for the livestock industry. Meeting the increasing global demand for dairy products [4] requires efficient grassland management, which in turn requires regular monitoring. Consistent, repeatable and objective monitoring of grasslands is important because overgrazing can lead to their degradation. One measure used as a surrogate for pasture carrying capacity, i.e. the number of livestock units that a pasture can feed for a specific length of time, is the quality and quantity of above-ground biomass. Grassland biomass can be estimated using both traditional (ground-based) and remote sensing methods. Traditional methods include visual ("eyeball") inspection, cut and dry, rising plate meter and field spectrometry; however, all of these methods are very time-consuming and are only applicable to small-scale monitoring [5]. With advances in space-borne sensor technology and increases in spatial and temporal resolution, satellite remote sensing is now regarded as the best alternative for large-scale monitoring [6]. Methodologies for grassland monitoring (such as classification, mapping, and retrieval of biophysical parameters) based on space-borne remotely sensed data are becoming more cost-effective as satellite data become more easily, and freely, available. A review of the literature suggests that approaches to grassland biomass estimation can be divided into three broad categories:




1. Vegetation index based regression models: satellite-derived vegetation parameters (mostly vegetation indices), validated by in-situ measurements, have commonly been used to develop regression models [5].
2. Biophysical simulation models: numerical simulation models, for example LINGRA [7], have been developed for forecasting grassland productivity. These have been refined further by the combined use of the C-Fix model, in-situ measurements and remote sensing data to estimate grassland gross primary production [8].
3. Machine learning: few studies in the literature have used machine learning methods for grassland biomass monitoring [9, 10], although this approach is quite common in crop and forest monitoring.


The objective of this paper is to estimate biomass for intensively managed grasslands using three different models, multiple linear regression (MLR), artificial neural networks (ANN) and adaptive neuro-fuzzy inference systems (ANFIS), and to compare the results with field measurements collected at the paddock level.

2. STUDY SITE AND DATA USED
The Moorepark study site (lat. 50° 7′ N, long. 8° 16′ W) is a research farm in the south of the Republic of Ireland. The site covers an area of 100 hectares and has been closely monitored for many years, providing a valuable source of grassland biomass, meteorological and farm management data. This study uses in-situ weekly biomass data (kgDM/ha) from 2001 to 2012. A 12-year time series of MODIS Terra 250 m 8-day surface reflectance composites (MOD09Q1) covering Ireland was downloaded from NASA's Land Processes Distributed Active Archive Center (LP DAAC), spanning the same period as the Moorepark field data.
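MOD09Q1 supplies only red and NIR reflectance, so every predictor in this study derives from those two bands. As a minimal sketch of the processing chain, the NumPy code below computes the five vegetation indices, smooths a per-pixel time series with a Savitzky-Golay filter after filling gaps, standardizes the variables and projects them onto principal components. The index formulas are the standard published forms (EVI2 with coefficients 2.5 and 2.4, SAVI with L = 0.5, MSAVI in its closed "MSAVI2" form, OSAVI with 0.16); the paper does not list its exact formulas, so these are assumptions. The window length, polynomial order, linear-interpolation gap filling and edge padding are illustrative choices rather than the study's settings, and all function names are hypothetical.

```python
import numpy as np

# Inputs are surface reflectance in 0-1 units. Raw MOD09Q1 values are scaled
# integers; multiply them by the product's scale factor (0.0001) first.

def vegetation_indices(red, nir):
    """Return NDVI, EVI2, SAVI, MSAVI and OSAVI for red and NIR reflectance."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    diff = nir - red
    return {
        "NDVI": diff / (nir + red),
        "EVI2": 2.5 * diff / (nir + 2.4 * red + 1.0),
        "SAVI": 1.5 * diff / (nir + red + 0.5),  # soil factor L = 0.5
        "MSAVI": (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * diff)) / 2,
        "OSAVI": diff / (nir + red + 0.16),
    }

def savgol_smooth(series, window=7, polyorder=2):
    """Fill NaN gaps by linear interpolation, then apply a Savitzky-Golay filter."""
    y = np.asarray(series, dtype=float).copy()
    t = np.arange(y.size)
    gaps = np.isnan(y)
    y[gaps] = np.interp(t[gaps], t[~gaps], y[~gaps])
    half = window // 2
    # Fit a polynomial of degree `polyorder` over each window by least squares;
    # row 0 of the pseudo-inverse gives the weights for the fitted centre value.
    offsets = np.arange(-half, half + 1)
    weights = np.linalg.pinv(np.vander(offsets, polyorder + 1, increasing=True))[0]
    padded = np.pad(y, half, mode="edge")  # simple edge handling (assumption)
    return np.correlate(padded, weights, mode="valid")

def standardize(X):
    """Scale each column (variable) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(Z, n_components):
    """Project standardized data onto its leading principal components via SVD."""
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return Z @ vt[:n_components].T, explained[:n_components]
```

A Savitzky-Golay filter of polynomial order 2 reproduces any quadratic exactly away from the series ends, which is why it smooths noise while preserving the shape of seasonal peaks better than a plain moving average.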

3. METHODOLOGY
Vegetation indices, including the Normalized Difference Vegetation Index (NDVI), the two-band Enhanced Vegetation Index (EVI2), the Soil Adjusted Vegetation Index (SAVI), the Modified Soil Adjusted Vegetation Index (MSAVI) and the Optimized Soil Adjusted Vegetation Index (OSAVI), were calculated from the red and NIR bands of the MODIS composites for training, testing and validation. A Savitzky-Golay filter was applied to smooth out noise in the time series and to fill gaps resulting from low-quality or missing data. The dataset was standardized to zero mean and unit variance, and Principal Component Analysis (PCA) was applied to reduce data dimensionality and dependencies between variables. Statistical (Multiple Linear Regression; MLR) and machine learning (Artificial Neural Networks, ANN; Adaptive Neuro-Fuzzy Inference Systems, ANFIS) models were then developed to estimate grass yield. Figure 1 shows the workflow for this study.

[Fig. 1 workflow boxes: MODIS data acquisition; information extraction (reading .hdf files using a Python script); calculation of vegetation indices (VI); temporal data filtering (Savitzky-Golay); data standardization (zero mean, unit standard deviation); principal component analysis (PCA); ground truth data (grass yield, kgDM/ha); Multiple Linear Regression (MLR); Artificial Neural Networks (ANNs); Adaptive Neuro-Fuzzy Inference Systems (ANFIS); grass yield estimation.]
Fig. 1. Flow diagram for biomass estimation methodology.

3.1. Model development

3.1.1. Multiple linear regression model
The Multiple Linear Regression (MLR) approach is used when several predictor variables drive the output, in order to establish a possible linear relationship between the dependent and independent variables [11]. Five vegetation indices and two spectral reflectance bands (red and NIR) were used as independent predictor variables.

3.1.2. Artificial Neural Networks model
Artificial Neural Networks (ANN) are based on the concept of a biological neuron, in which incoming information is processed by the neuron and the result flows out [12]. A single processing unit (an artificial neuron) computes the weighted sum of its inputs and passes it through an activation function, which gives the output of the unit. The mathematical representation of an artificial neuron and its activation function is given by [13]:

$$y_i = f\left(\sum_{j=1}^{N} w_{ij}\, y_j\right)$$

Each input $y_j$ has an associated weight $w_{ij}$, which is adjusted during the learning process; $y_i$ is the output of unit $i$, and $f$ is the activation function, which may be a sigmoid, linear or hyperbolic tangent function. In this study a feed-forward back-propagation neural network algorithm [13] was used, as shown in Fig. 2.

[Fig. 2 diagram: input layer, hidden layer(s), output layer(s).]
Fig. 2. Feed-forward back-propagation neural network architecture.
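The neuron equation and back-propagation training can be illustrated with a small NumPy network: each hidden unit computes $y_i = f(\sum_j w_{ij} y_j)$ plus a bias with $f = \tanh$, the output unit is linear, and the weights are updated by full-batch gradient descent on the mean squared error. This is a sketch under assumed settings; the paper does not report its architecture, activation functions, learning rate or number of epochs, and `FeedForwardNet` is a hypothetical name.

```python
import numpy as np

class FeedForwardNet:
    """One tanh hidden layer and a linear output unit, trained by backpropagation."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=1 / np.sqrt(n_inputs), size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=1 / np.sqrt(n_hidden), size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        # Hidden units: weighted sum of inputs passed through f = tanh
        self.h = np.tanh(X @ self.W1 + self.b1)
        return self.h @ self.W2 + self.b2  # linear output unit

    def train_step(self, X, y, lr=0.05):
        """One gradient-descent step on 0.5 * MSE; returns the MSE before the step."""
        n = X.shape[0]
        err = self.forward(X) - y.reshape(-1, 1)
        # Output-layer gradients
        grad_W2 = self.h.T @ err / n
        grad_b2 = err.mean(axis=0)
        # Propagate the error back through tanh (derivative 1 - h^2)
        delta_h = (err @ self.W2.T) * (1.0 - self.h ** 2)
        grad_W1 = X.T @ delta_h / n
        grad_b1 = delta_h.mean(axis=0)
        for param, grad in ((self.W1, grad_W1), (self.b1, grad_b1),
                            (self.W2, grad_W2), (self.b2, grad_b2)):
            param -= lr * grad  # in-place update of each weight array
        return float(np.mean(err ** 2))
```

In use, the PCA scores of the standardized indices would form `X` and the measured biomass `y`; `train_step` is called repeatedly until the error stops falling on a held-out set.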

3.1.3. Adaptive Neuro-Fuzzy Inference Systems (ANFIS) model
Adaptive Neuro-Fuzzy Inference Systems (ANFIS) combine the learning power of ANN with the reasoning capability of fuzzy logic. ANFIS is a five-layer model (see Fig. 3), first introduced by Jang in 1993 [14], comprising:
1–Fuzzy layer: every node in this layer is adaptive; the membership grade of each input in each linguistic label is calculated in this layer.
2–Product layer: every node in this layer is fixed and labeled Π; its output is the product of the incoming signals, i.e. the firing strength of a rule.
3–Normalization layer: the ith node calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths.
4–Defuzzify layer: every node in this layer is adaptive, and its parameters are termed consequent parameters.
5–Output layer: all incoming signals are summed in this layer to compute the overall output of the system.

[Fig. 3 diagram: inputs, layers 1 to 5, output.]
Fig. 3. Adaptive Neuro Fuzzy Inference Systems (ANFIS) architecture for two inputs.
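The five layers can be traced in a short NumPy sketch for two inputs and two rules, assuming Gaussian membership functions and first-order Sugeno consequents $p_i x_1 + q_i x_2 + r_i$ (the paper does not state its membership function type or rule count). Once the premise (membership) parameters are fixed, the output is linear in the consequent parameters, so they can be estimated by least squares; this is the forward pass of Jang's hybrid learning rule. The backward pass, which adjusts the premise parameters by gradient descent, is omitted here, and all function names are hypothetical.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function with centre c and width sigma."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def normalized_strengths(X, premise):
    """Layers 1-3. premise: one ((c1, s1), (c2, s2)) pair per rule."""
    x1, x2 = X[:, 0], X[:, 1]
    # Layer 1 (adaptive): membership grades; Layer 2 (product): firing strengths
    w = np.column_stack([gauss_mf(x1, c1, s1) * gauss_mf(x2, c2, s2)
                         for (c1, s1), (c2, s2) in premise])
    # Layer 3: divide each rule's strength by the sum over all rules
    return w / w.sum(axis=1, keepdims=True)

def design_matrix(X, wbar):
    """Layer 4 output wbar_i * (p_i*x1 + q_i*x2 + r_i) is linear in (p_i, q_i, r_i)."""
    Xa = np.column_stack([X, np.ones(len(X))])  # columns [x1, x2, 1]
    return np.hstack([wbar[:, [i]] * Xa for i in range(wbar.shape[1])])

def fit_consequents(X, y, premise):
    """Least-squares estimate of all consequent parameters (hybrid forward pass)."""
    A = design_matrix(X, normalized_strengths(X, premise))
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def anfis_predict(X, premise, theta):
    """Layer 5: sum of the rule outputs, written as a matrix-vector product."""
    return design_matrix(X, normalized_strengths(X, premise)) @ theta
```

Because the normalized strengths sum to 1 for every sample, a target that is itself linear in the inputs is reproduced exactly; nonlinear targets such as seasonal biomass curves are approximated by blending the rule-wise linear models.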

4. RESULTS AND DISCUSSION
Conventional statistical modelling (MLR) did not perform well because of the highly non-linear and variable nature of the data, whereas the soft computing methods (ANN and ANFIS) outperformed the traditional approach. Using RMSE as a model performance indicator, MLR produced the highest RMSE (25.05) and the lowest R² (0.31). The ANN model showed improved results (R² = 0.57, RMSE = 19.65), and the ANFIS model produced the best predictions (R² = 0.86, RMSE = 11.07). Most notably, ANFIS successfully identified the seasonal phenological peaks that were missed by the ANN, as shown in the actual and predicted biomass (kgDM/ha) time series for the Moorepark site (Fig. 4). These results show that the proposed ANFIS approach performed more accurately than some previously conducted studies (e.g. ANN: R² = 0.81 [9]). Figure 5 shows scatter plots of actual against predicted biomass: points cluster more tightly around the 1:1 line for the ANN and ANFIS models, and the number of outliers is greatly reduced for the ANFIS approach. These results agree with the literature in other disciplines, where applying ANFIS has in almost every case improved model outputs compared with ANN and MLR [15, 16].
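The error measures quoted above can be computed as follows, together with an ordinary least-squares MLR baseline. The paper reports R² without stating whether it is the coefficient of determination (1 - SS_res/SS_tot) or the squared Pearson correlation between actual and predicted values, so both are shown here as an assumption-free comparison; the function names are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, in the units of the target (e.g. kgDM/ha)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def r2_determination(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((a - p) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def r2_pearson(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    r = np.corrcoef(actual, predicted)[0, 1]
    return float(r ** 2)

def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(X, coef):
    """Apply MLR coefficients (intercept first) to new predictors."""
    return np.column_stack([np.ones(len(X)), X]) @ coef
```

The two R² definitions diverge for biased predictions: if every prediction is twice the actual value, the squared correlation is 1 while the coefficient of determination is negative, so the definition matters when comparing R² values across studies.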

Fig. 4. Comparison between actual (blue) and predicted/simulated biomass time series using MLR (top), ANN (middle) and ANFIS (bottom).

5. CONCLUSION
MLR, ANN and ANFIS models for grassland biomass prediction were developed in this study, using 8-day composite MODIS data to derive biomass values, which were compared to field measurements. The machine learning approaches demonstrated greater potential than MLR for accurate yield estimates from the vegetation index time series throughout the growing season over a number of different years, each with a unique climatic and management signal. This work represents the first use of the ANFIS model for a long time series, with the output from this model aligning well with field data and identifying many of the inflection points in the time series. However, on some occasions the model under-estimates the actual biomass peak (a common feature of VI-driven biomass models), and further work is required to understand these anomalies. Nevertheless, these results show significant promise for using a hypertemporal time series of satellite imagery as model input, providing an effective tool for grassland monitoring and management.



Fig. 5. Scatter plot between predicted/simulated and actual biomass.

6. REFERENCES
[1] FAO, "What are grasslands and rangelands?," http://www.fao.org/agriculture/crops/core-themes/theme/spi/scpi-home/managing-ecosystems/management-of-grasslands-and-rangelands/grasslands_what/en/, [Accessed 13-May-2014].
[2] J. M. O. Scurlock and D. O. Hall, "The global carbon sink: a grassland perspective," Global Change Biology, vol. 4, no. 2, pp. 229–233, 1998.
[3] J. D. Derner and G. E. Schuman, "Carbon sequestration and rangelands: A synthesis of land management and precipitation effects," Journal of Soil and Water Conservation, vol. 62, no. 2, pp. 77–85, Mar. 2007.
[4] John Kearney, "Food consumption trends and drivers," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 365, no. 1554, pp. 2793–2807, 2010.
[5] B. Xu, X. C. Yang, W. G. Tao, Z. H. Qin, H. Q. Liu, J. M. Miao, and Y. Y. Bi, "MODIS-based remote sensing monitoring of grass production in China," International Journal of Remote Sensing, vol. 29, no. 17–18, pp. 5313–5327, 2008.
[6] Martin Claverie, Valérie Demarez, Benoît Duchemin, Olivier Hagolle, Danielle Ducrot, Claire Marais-Sicre, Jean-François Dejoux, Mireille Huc, Pascal Keravec, Pierre Béziat, Remy Fieuzal, Eric Ceschia, and Gérard Dedieu, "Maize and sunflower biomass estimation in southwest France using high spatial and temporal resolution remote sensing data," Remote Sensing of Environment, vol. 124, pp. 844–857, Sept. 2012.
[7] A. H. C. M. Schapendonk, W. Stol, D. W. G. van Kraalingen, and B. A. M. Bouman, "LINGRA, a sink/source model to simulate grassland productivity in Europe," European Journal of Agronomy, vol. 9, no. 2–3, pp. 87–100, Nov. 1998.
[8] F. Maselli, G. Argenti, M. Chiesi, L. Angeli, and D. Papale, "Simulation of grassland productivity by the combination of ground and satellite data," Agriculture, Ecosystems & Environment, vol. 165, pp. 163–172, Jan. 2013.
[9] Yichun Xie, Zongyao Sha, Mei Yu, Yongfei Bai, and Lei Zhang, "A comparison of two models with Landsat data for estimating above ground grassland biomass in Inner Mongolia, China," Ecological Modelling, vol. 220, no. 15, pp. 1810–1818, Aug. 2009.
[10] Xiuchun Yang, Bin Xu, Jin Yunxiang, Li Jinya, and Xiaohua Zhu, "On grass yield remote sensing estimation models of China's northern farming-pastoral ecotone," in Advances in Computational Environment Science, Gary Lee, Ed., number 142 in Advances in Intelligent and Soft Computing, pp. 281–291. Springer Berlin Heidelberg, Jan. 2012.
[11] G. Civelekoglu, N. O. Yigit, E. Diamadopoulos, and M. Kitis, "Prediction of bromate formation using multilinear regression and artificial neural networks," Ozone: Science & Engineering, vol. 29, no. 5, pp. 353–362, 2007.
[12] B. Yegnanarayana, Artificial Neural Networks, PHI Learning Pvt. Ltd., Jan. 2009.
[13] I. A. Basheer and M. Hajmeer, "Artificial neural networks: fundamentals, computing, design, and application," Journal of Microbiological Methods, vol. 43, no. 1, pp. 3–31, Dec. 2000.
[14] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, May 1993.
[15] Melih Iphar, "ANN and ANFIS performance prediction models for hydraulic impact hammers," Tunnelling and Underground Space Technology, vol. 27, no. 1, pp. 23–29, Jan. 2012.
[16] Alireza Karami and Somaieh Afiuni-Zadeh, "Sizing of rock fragmentation modeling due to bench blasting using adaptive neuro-fuzzy inference system (ANFIS)," International Journal of Mining Science and Technology, vol. 23, no. 6, pp. 809–813, Nov. 2013.


