Crop Management Using BIG DATA

Younes Oulad Sayad*

Hajar Mousannif

Michel Le Page

OSER research team, FSTG Cadi Ayyad University Marrakesh, Morocco [email protected]

LISI Laboratory, FSSM, Cadi Ayyad University Marrakesh, Morocco [email protected]

CESBIO (UMR IRD, UPS, CNES, CNRS) Toulouse, France LMI TREMA Marrakesh, Morocco [email protected]

Abstract- Crop management can be a hard task, especially in semi-arid environments, where crops do not get enough water from rainfall or natural drainage. Providing additional water to crops through irrigation comes with many challenges. On the one hand, too much irrigation leads to losses by evaporation or percolation, while too little irrigation causes water stress. On the other hand, applying the exact amount of water needed requires knowledge of numerous parameters such as soil conditions, plant development, and weather. This paper addresses these issues by providing an efficient strategy for crop and water resources management. The strategy makes use of Big Data and remote sensing to process satellite images over large areas and extract insights from the collected data to help farmers manage their crops and make irrigation decisions. We implemented a web service that allows farmers to perform all the above-mentioned tasks online.

Keywords- Big Data, Remote Sensing, crop management.

I. INTRODUCTION

Of all human activities, agriculture is by far the largest consumer of water (85% of water consumption) [1]. Unfortunately, part of this scarce resource is not used by the crops, as it may be lost through leaky channels, seepage, or unnecessary evaporation or percolation. Conversely, inadequate irrigation can lead to water stress that affects yields, as different crops and regions vary in their water needs. In fact, rainfall patterns, soil quality, climate, and vegetative cover all impact soil moisture levels. For all these reasons, an efficient water management strategy that preserves water resources while meeting the needs of crops to reach a locally optimal yield is likely to serve the water-saving effort. In addition, numerous factors, including soil salinity, land fertility, evaporation, transpiration, runoff, deep percolation and poor soil management, may also affect crop growth. It is thus wise to provide farmers with state-of-the-art information to make sure crops get the right quantity of water at the right time. To this end, it is necessary to know the soil moisture at different depths and the crop characteristics, theoretically for each square meter of the cropped plot. Techniques of varying complexity have been developed over the last fifty years to estimate the soil water content available to the crop. We have chosen to implement a relatively simple model [2], which relies on a relatively small number of meteorological forcing variables (rain, temperature, wind speed, air humidity, solar radiation) to estimate crop evapotranspiration and then a water budget. These data come from different sources and are computed over different time spans. Their volume, variety and velocity make irrigation decisions very challenging.

To address this issue, this paper proposes a crop and water management strategy that leverages Big Data to help farmers obtain the best crop yields while using less water. The remainder of this paper is organized as follows: Section II introduces some existing strategies for managing crops and saving water resources, and highlights our contribution. In Section III, we describe our strategy for crop and water resources management through a clear roadmap. In Section IV, we present the implemented web service. Conclusions and directions for future work are given in Section V.

II. RELATED WORK

The authors of [3] introduced SAMIR, a software tool dedicated to large-scale irrigation budget management, especially in arid and semi-arid environments where water resources are scarce. SAMIR computes spatialized estimates of evapotranspiration (ET), irrigation and water budget over large areas. It relies on satellite image time series, which allow accurate monitoring of the actual development of vegetation. SAMIR builds on the approach presented in [2] for computing reference and crop evapotranspiration from crop coefficients and meteorological data. This procedure estimates the amount of water used by crops, taking into account the effect of both climate and crop characteristics. The guidelines incorporate more accurate procedures for determining crop water use by computing the reference crop evapotranspiration according to the FAO Penman-Monteith method [4].

Another approach [5] presented a crop growth monitoring system based on remote sensing. The system is composed of five modules: real-time monitoring, process monitoring, result visualization, system configuration and business management. The real-time monitoring module extracts NDVI values from satellite images for a given period in the present and in the past, then classifies the result into five categories according to its value. These five categories represent crop conditions ranging from worse than to better than in the past period. The process monitoring module implements the crop growing-process monitoring method. The result visualization module visualizes the results of the two previous modules and outputs them as JPG files. The system configuration module is the first to run after the system starts; it sets system parameters such as file paths and database connection parameters. The business management module shows the progress of the system through different business statuses, such as work finished, work not finished and so on.

The growth model-based decision support system for crop management (GMDSSCM) [6] was developed to simulate the growth, development and quality of a crop growing under various environmental conditions, such as changing soil water and nitrogen conditions. The system performs dynamic simulation and comprehensive evaluation, and provides a framework for crop management. It offers six functions: 1) Data management, which covers debugging crop parameters, generating weather data, and querying and entering various data; 2) Dynamic simulation, which selects the parameters, runs the model and displays the results; 3) Strategy evaluation, which allows individual and comprehensive management for users; 4) Real-time prediction, which simulates crop growth from the real-time crop state at a given stage; 5) Temporal and spatial analysis, which simulates crop growth over multiple years and allows evaluating the effects of uncertain future weather conditions on crop performance; 6) Expert consultation, which provides technical support for the system and a comprehensive introduction to crop production in the subsystem.

With respect to the related literature presented above, existing efforts provide general procedures to estimate crop water use from soil and climate parameters. For a synoptic application, it is also necessary to take into account multi-source data processing and immediate irrigation actions. The present work puts forward a strategy that uses Big Data and remote sensing to manage water resources and monitor crops, from data collection to final data analysis.

978-1-4673-8149-9/15/$31.00 ©2015 IEEE

III. STRATEGY

In this section, we describe the proposed strategy for crop and water resources management using Big Data. This approach is also being explored as a way of building resilience against floods and droughts, since a changing climate might make extremes in water variability more common. Big Data and remotely sensed data now allow researchers to predict the risk of floods and droughts accurately, and allow farmers to adjust their activities accordingly. We applied the methodology proposed in [7], a generic methodology for building any Big Data project, and adapted it to the context of water resources management. The adapted approach consists of five steps, as shown in Figure 1:

Figure 1. Proposed methodology for crops management

A. Land cover Classification

In order to manage crops, farmers first need to define the land cover of the studied area, that is, to determine the type of crops planted on the land. In [8], land cover is defined as the bio-physical coverage of land (e.g. crops, forest, buildings, roads or lakes). Land cover is specified according to a classification with multiple subclasses. At the most basic level, we find artificial land, cropland, woodland, grassland, bare land, and wetlands. Farmers need to specify their land cover, as it determines the formulas used to compute the various coefficients needed to estimate water needs. Land cover classification can be done manually, by identifying the crops planted or sown on a plot; for large plots, several classification methods can determine the land cover from satellite images. The classification process translates the pixel values of a satellite image into meaningful categories representing different types of land cover. This process can be automated, manual, or hybrid [9]:

• Automated: The majority of classification methods belong to this category. The advantage of automated approaches is that the algorithm is applied to the entire image relatively quickly. Many of these algorithms adopt machine learning techniques and can be quite complex. Table I gives a short overview of some of the more popular automated classification approaches.
• Manual: Manual classification of remotely sensed data is an efficient method for classifying land cover, especially when the analyst is familiar with the area being classified. This approach relies on the interpreter using visual cues such as texture, shape, pattern, and relationship to other objects to recognize the different land cover classes. Its main advantage is that the user identifies features in the image and relates them to features on the land directly, which makes it more accurate in identifying image features.
• Hybrid: The hybrid approach combines the advantages of the automated and manual methods to create a better land cover map. It uses automated classification techniques for an initial classification, then manual methods to refine the classification and correct errors.

TABLE I. LAND COVER DEFINITION APPROACHES

Approach | Presentation
ISODATA unsupervised classification | It groups pixels into a user-defined number of classes or clusters; the resulting labeled image is used to create a land cover map.
Supervised statistical classification | It defines samples of each class in the image, then compares each pixel in the image with the different samples to determine which one is most "similar" to the pixel.
Artificial neural net classification | It imitates the human learning process to associate the correct land cover class with image pixels.
Binary decision tree classification | It is a machine learning tool that has taken hold in remote sensing and consists of a set of binary rules that define how specific land cover classes should be assigned to individual pixels.
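To make the supervised statistical approach of Table I more concrete, the following minimal sketch assigns each pixel to the class whose mean training spectrum is nearest (a minimum-distance-to-means classifier). It is only an illustration under stated assumptions: the two-band reflectance values, the class names and the function names `class_means` and `classify` are hypothetical, and an operational workflow would rely on more bands and a more robust classifier.

```python
from math import dist
from statistics import fmean


def class_means(training):
    """Average the training spectra of each class, band by band."""
    return {name: [fmean(band) for band in zip(*samples)]
            for name, samples in training.items()}


def classify(image, training):
    """Assign every pixel to the class whose mean spectrum is nearest."""
    means = class_means(training)
    return [[min(means, key=lambda name: dist(pixel, means[name]))
             for pixel in row]
            for row in image]


# Hypothetical (red, near-infrared) reflectances sampled for two classes.
training = {
    "bare land": [(0.30, 0.35), (0.28, 0.33)],
    "cropland": [(0.05, 0.50), (0.07, 0.45)],
}

# Hypothetical 2x2 image; each pixel is a (red, near-infrared) pair.
image = [
    [(0.06, 0.48), (0.29, 0.34)],
    [(0.27, 0.36), (0.04, 0.52)],
]

for row in classify(image, training):
    print(row)
```

In a hybrid workflow, a label map of this kind would serve as the initial classification that an analyst then refines manually.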

B. Data Collection

Data collection is the second step, once land cover is classified. The methods for estimating the water needs of each crop from meteorological data require various climatological and physical parameters. Some of these data are measured directly at local agro-meteorological weather stations. Otherwise, non-reference data can be taken from online weather stations such as the SYNOP stations of the World Meteorological Organization; in this case we can use the data served by OGIMET¹. Getting the meteorological data from these stations takes three steps. First, we identify the station nearest to the user's crops by calculating the distance between the center of the crop and the available weather stations [10]. Second, we evaluate the station activity: if data are missing for the last ten days, the station is declared non-functional. Third, we launch a web query to the selected station and extract the meteorological data available on the Internet in real time.

Other data are calculated from satellite images, which continue to provide great benefits to meteorological services throughout the world. Remote sensing represents a tremendous source of meteorological data: the amount of data collected by a single satellite data center is increasing by several TB per day. Remote sensing provides valuable information for crop management by retrieving data such as the Normalized Difference Vegetation Index (NDVI)², the Leaf Area Index (LAI)³ and brightness temperature from satellite images. To get these data, we can download the GeoTIFF images that contain our plots (if they exist). These images are freely available from many satellites (Landsat, ASTER, MODIS and so on). Each satellite provides a different product, and these products differ in their scene sizes (volumes), their spatial and temporal resolution and their spectral range. Once the images are collected, we need to extract the crop's data. This can be done using tools such as the GDAL library [11] or the Python Imaging Library (PIL).

C. Data Preprocessing

Cloud cover and technical problems on the satellite, on the ground system, or on the image distribution service are all possible causes of missing data in remote sensing. The same problem can arise for meteorological data in case of a malfunction or breakdown of the weather stations. In such cases, the user will have no information on his plot. Data preprocessing is a crucial step before data analysis and includes multiple operations:

1) Data cleaning: In this step we can apply a number of preprocessing techniques, such as geometric correction, radiometric correction, and image enhancement, to remove noise and correct inconsistencies. If an image does not contain any data, we remove it.

2) Data integration: Consists in combining the data collected from different sources (weather stations and satellite images) and storing them in the same database.

3) Data transformation: Consists in normalizing the data to a single time step, preferably daily, by interpolating the data available at different time steps. Many data interpolation methods exist; they can be classified into three categories: deterministic, probabilistic and other methods. Deterministic methods create a continuous surface using the geometric characteristics of point observations. Probabilistic methods rely on probability theory; they allow the variance to be included in the interpolation process and the statistical significance of the predicted values to be computed.
The other methods are specially developed for meteorological purposes and combine deterministic and probabilistic methods. Table II lists some of the most popular methods in each category [12].

TABLE II. THE MOST POPULAR METHODS FOR DATA INTERPOLATION

Categories | Methods
Deterministic methods | Nearest Neighborhood (NN): Assigns the value from the nearest observation to a certain grid cell. Pros: fast and simple. Cons: the interpolated fields do not look realistic in all cases.
Deterministic methods | Inverse Distance Weighting (IDW): An advanced nearest neighbor approach. The value at a certain location is obtained from a linear combination of the surrounding locations. Pros: fast, easy to implement. Cons: does not support ancillary data.
Deterministic methods | Linear regression: Identifies the relation between a predicted variable and one or more explanatory variables. Pros: ancillary data can be included. Cons: can be stochastic in some cases.
Probabilistic methods | Optimum interpolation: Based on a spatial correlation function; requires a first-guess field such as model output from numerical weather prediction models.
Probabilistic methods | Kriging: A geostatistical method for spatial interpolation. Kriging differs from the other interpolation methods because it can evaluate the quality of the prediction with estimated prediction errors. The three fundamental Kriging methods are simple Kriging, ordinary Kriging and universal Kriging.
Other methods | MISH: Incorporates information from time series in the interpolation procedure. It consists of two modules: MISH for interpolation and MASH to obtain homogenized data series.
Other methods | PRISM: Uses point measurements of temperature, precipitation, and other climate values to produce continuous, digital coverage. It is used with Geographic Information Systems (GIS) to build maps and perform many types of analysis.

¹ OGIMET is a free weather information service hosted on a narrow-bandwidth server; it uses data freely available on the Internet and open software to process them.
² The Normalized Difference Vegetation Index (NDVI) is the difference between the near-infrared (NIR) and visible (red) bands, divided by their sum.
³ The Leaf Area Index (LAI) is the total one-sided green leaf area per unit ground surface area.
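As an illustration of the Inverse Distance Weighting entry of Table II, the sketch below estimates a value at a target location as a weighted average of surrounding observations, with weights proportional to 1/d^p. It is a minimal example, not the implementation used in this work: the function name `idw`, the exponent p = 2, the planar kilometre coordinates and the temperature values are all assumptions made for the example.

```python
from math import hypot


def idw(stations, target, power=2.0):
    """Inverse Distance Weighting estimate at `target`.

    stations: list of ((x, y), value) observations
    target:   (x, y) location where a value is needed
    power:    distance exponent (2 is an assumed, commonly used default)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        d = hypot(x - target[0], y - target[1])
        if d == 0.0:
            # The target coincides with an observation: use it directly.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical air temperatures (deg C) at three stations, planar km coordinates.
stations = [((0.0, 0.0), 20.0), ((10.0, 0.0), 24.0), ((0.0, 10.0), 22.0)]

# Estimate halfway between the first two stations.
print(round(idw(stations, (5.0, 0.0)), 2))
```

The same weighting idea can be applied along the time axis to fill gaps when bringing all series to a daily time step, although the other methods of Table II may be preferable when ancillary data or error estimates are needed.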

D. Big Data Analysis

After applying the preprocessing techniques and removing imperfections from the raw collected data, the data can be analyzed. This operation is composed of three steps:

1) Satellite image processing: After preprocessing, the GeoTIFF images, which contain all the user's plots, must be cut to match the shape of the user's plots. This operation is crucial for extracting crop data from the satellite images. However, it raises several difficulties. The main problem is image volume: the higher the resolution of an image, the larger it is and the harder it is to process, and few tools and platforms support processing large images. The best-known free library for satellite image processing is GDAL.

2) Data extraction: After generating the GeoTIFF images matching the shape of the user's crops, we move to data extraction. This stage produces various data products, such as NDVI, LST (land surface temperature) and LAI. For example, to calculate the NDVI of a crop, the crop's pixels are extracted one by one and their average is calculated. For the MODIS product MOD13Q1, the NDVI is already calculated in each pixel. For other products such as Landsat and ASTER, the NDVI must be computed from the spectral bands. For example, Landsat's Thematic Mapper (TM) instrument has seven spectral bands, including a thermal band. To calculate the NDVI, we use the third band (visible red) and the fourth band (near-infrared) according to the formula NDVI = (NIR - RED) / (NIR + RED) [2]. To extract the pixel values, tools such as Python or GDAL can be used.

3) Coefficients computation: After extracting the data from satellite images, multiple coefficients such as the Fraction Cover (Fc)⁴, the Basal Crop Coefficient (Kcb)⁵, the reference evapotranspiration (ET0)⁶, the crop evapotranspiration (ETc)⁷ and irrigation are calculated using the equations developed in [2]. Once all the required coefficients are calculated, we can extrapolate them over the next 10 or 15 days in order to predict the crops' needs and react properly by providing the exact amount of water needed.

⁴ The fractional green vegetation cover (Fc) is the fraction of the ground surface covered by green vegetation; it governs the exchanges of carbon, water and energy at the land surface.
⁵ The basal crop coefficient (Kcb) is the ratio of the crop evapotranspiration over the reference evapotranspiration when the soil surface is dry.

E. Storage

Crop management using remote sensing and meteorological data requires storing a huge amount of diverse data each day, and in some cases several times a day. This fast growth in data size requires horizontal scaling, that is, the ability to extend the database over additional servers. Besides, managing rapidly changing data needs greater flexibility in schema definition, which is not available in classical databases. Traditional database management systems have thus shown their limitations. Several alternatives have been developed to meet the needs of fast-growing data; these products are grouped within the NoSQL family [13]. Each product fits a particular area and supports horizontal scaling thanks to automatic replication and auto-sharding. They also support dynamic schemas, allowing for transparent real-time application changes. The classical databases are relational database management systems (RDBMS) such as MySQL and PostgreSQL. Other databases include Bigtable, Hypertable, MongoDB, HBase, Cassandra, Rasdaman and so on [14].
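Returning to the data extraction step of Section III-D, the NDVI formula and the per-plot averaging can be sketched as follows. This is a minimal illustration that works on small in-memory grids rather than on real GeoTIFF files: the function names `ndvi` and `plot_mean_ndvi`, the reflectance values and the boolean plot mask are hypothetical, and reading the red and near-infrared bands (for example Landsat TM bands 3 and 4) from disk is left to a raster library such as GDAL.

```python
def ndvi(nir, red):
    """NDVI of one pixel: (NIR - RED) / (NIR + RED)."""
    total = nir + red
    if total == 0:
        return None  # undefined, e.g. a no-data pixel
    return (nir - red) / total


def plot_mean_ndvi(nir_band, red_band, mask):
    """Average NDVI over the pixels flagged as belonging to the plot."""
    values = []
    for nir_row, red_row, mask_row in zip(nir_band, red_band, mask):
        for nir, red, inside in zip(nir_row, red_row, mask_row):
            if not inside:
                continue  # pixel lies outside the user's plot
            value = ndvi(nir, red)
            if value is not None:
                values.append(value)
    return sum(values) / len(values) if values else None


# Hypothetical 2x2 reflectance grids; the mask marks pixels inside the plot.
red_band = [[0.10, 0.20], [0.05, 0.00]]
nir_band = [[0.50, 0.40], [0.45, 0.00]]
mask = [[True, True], [True, False]]

print(round(plot_mean_ndvi(nir_band, red_band, mask), 3))
```

For products such as MOD13Q1, where the NDVI is already provided per pixel, only the masking and averaging part would be needed.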

IV. IMPLEMENTATION

As a proof of concept, we implemented a web service that allows farmers to perform all the above-mentioned tasks online.

A. Graphic User Interface

The graphical user interface allows users to manage their crops online and monitor their development. This interface provides access to the mapshup⁸ framework, which lets users perform many tasks, including drawing their own plots on a geographical map, creating and controlling crops, and managing water resources. It is divided into two sub-interfaces:

1) User Interface: This interface gives users complete control over their crops. It is composed of five main components: a) Home, b) Who we are?, c) Interface user access, d) Map Access and e) Contact.

⁶ The reference crop evapotranspiration (ET0) is the evapotranspiration rate from a reference surface, not short of water.
⁷ The crop evapotranspiration under standard conditions (ETc) is the evapotranspiration from a disease-free crop grown under optimum soil water conditions and achieving full production under the given climatic conditions.
⁸ Mapshup provides efficient access to geospatial web services. It brings them together on a single map to easily build a comprehensive "information context" and help end users make decisions.

Figure 2. User Interface

2) Mapshup Interface: This sub-interface provides efficient access to geospatial web services and allows users to draw their plots on a geographical map. We added many features to it, and it can now be accessed by any user, registered or not. A visitor can draw plots without any storage functionality, whereas a registered user can monitor the whole crop development process. If the user has previously drawn plots, they are automatically retrieved from the database and drawn on the map. This is done by retrieving the geometrical data of the plots, creating one JSON file per plot, and loading these files in the mapshup interface to display the plots on the map. Figure 3 summarizes this process.

Figure 3. Process for displaying plots in mapshup

The user can draw new plots in the Mapshup interface. Those plots are stored in the database, and the centroid of each plot is then calculated to identify the corresponding tile.

B. Visualization & Results

Visualization helps present the results in a meaningful way. The user can visualize the available results for his crops in the current season. Those results are provided in real time and change every time new coefficients are calculated. Users can access them via the application interface, in the "irrigation guide" section reached from the home page or the user's sub-menu. The results are presented in three charts, each representing different coefficients:

• Chart representing the values of Kcb and Fc.
• Chart representing the values of soil water content, irrigation and rain.
• Chart illustrating the values of ETc, ET0, irrigation and rain.

The chart below shows an example of the evolution of Kcb and Fc for a crop over a specific period.

Figure 4. Chart illustrating the values of Kcb and Fc

V. CONCLUSION AND FUTURE WORK

In this paper, we presented a methodology for water resources management and plant growth monitoring using Big Data. This approach aims to help farmers get the best yields with the exact amount of water needed. We implemented our solution as a web application accessible via the Internet, which allows users to perform many tasks online, ranging from drawing their plots to making irrigation decisions. Future work will mainly consist in supporting the processing and manipulation of other data sources, such as the Sentinel-2 constellation, which aims to provide multi-spectral Earth monitoring with continual observations.

ACKNOWLEDGMENT

The authors would like to acknowledge the International Mixed Laboratory "Remote sensing and Water Resources in Semi-Arid Mediterranean" (LMI TREMA), and especially Mr Michel Le Page, for supporting this work.

REFERENCES

[1] M. Blinda, More efficient water use in the Mediterranean, no. November. 2012, p. 42.
[2] R. Allen, L. S. Pereira, D. Raes, and M. Smith, “Crop evapotranspiration: Guidelines for computing crop requirements,” Irrig. Drain. Pap. No. 56, FAO, no. 56, p. 300, 1998.
[3] V. Simonneaux, M. Le Page, D. Helson, J. Metral, S. Thomas, et al., “Estimation spatialisée de l'Evapotranspiration des cultures irriguées par télédétection. Application à la gestion de l'Irrigation dans la plaine du Haouz (Marrakech, Maroc),” Sécheresse, vol. 20, no. 1, pp. 123-130, 2009.
[4] P. Steduto, T. C. Hsiao, E. Fereres, and D. Raes, Crop yield response to water. 2012, p. 505.
[5] M. Ji-Hua, W. Bing-Fang, and L. Qiang-Zi, “A Global Crop Growth Monitoring System Based on Remote Sensing,” 2006 IEEE Int. Symp. Geosci. Remote Sens., vol. 00, no. 3, pp. 2277–2280, 2006.
[6] W. Cao, L. Tang, Y. Zhu, J. Pan, W. Li, and B. Chen, “Development of growth model-based decision support system for crop management,” Proc. - Second Int. Symp. Plant Growth Model. Simulation, Vis. Appl. PMA 2006, pp. 181–184, 2007.
[7] H. Mousanif et al, “From Big Data to Big Projects: a Step-by-step Roadmap,” 2014.
[8] P. Mather and B. Tso, Classification methods for remotely sensed data. CRC Press, 2010, p. 367.
[9] C. Schmullius, “Land cover classification methods,” vol. 21, pp. 791–795, 2010.
[10] OGIMET: Professional information about meteorological conditions in the world.
[11] GDAL Library official page.
[12] R. Sluiter, “Interpolation methods for climate data: literature review,” KNMI, R&D Inf. Obs. Technology, pp. 1–28, 2009.
[13] NOSQL Database.
[14] R. Ramakrishnan and J. Gehrke, “Database Management Systems,” pp. 9–40, 2002.
