CRY – An improved Crop Yield Prediction model using Bee Hive Clustering Approach for Agricultural data sets
Ms. Gunasundari, Research Scholar, Mother Teresa University
Dr Arunkumar Thangavelu, VIT University
Ms. Hemavathi, VIT University
Abstract
Agricultural researchers worldwide stress the need for an efficient mechanism to predict and improve crop growth. The farming community strongly feels the need for integrated crop growth control combined with an accurate predictive yield management methodology. Predicting crop yield is complex mainly because of multi-dimensional variable metrics and the lack of a predictive modeling approach, which leads to loss in crop yield. This paper proposes a crop yield prediction model (CRY) that applies an adaptive cluster approach to a dynamically updated historical crop data set in order to predict crop yield and improve decision making in precision agriculture. CRY uses a bee hive modeling approach to analyze and classify crops based on crop growth pattern and yield. The CRY-classified dataset was tested using Clementine against existing crop domain knowledge. The results compare the performance of CRY with that of other cluster approaches.
Keywords: Crop yield prediction, Bee Hive clustering algorithm, Crop Growth, Precision Agriculture
1.0 Introduction
The majority of research work in agriculture focuses on biological mechanisms to identify crop growth and improve yield. It is understood that a crop often thrives only in specific regions or countries, while some crops fail to yield in certain regions. Crop yield depends primarily on parameters such as crop variety and seed type, and on environmental parameters such as sunlight (temperature), soil (pH), water (pH), rainfall and humidity [11]. This paper discusses improving crop growth and its analysis, with a focus on predictive crop yield. To improve crop yield, precision agriculture plays a vital role by providing expert guidelines together with technological enhancements to traditional farming and management tools [4]. Success in precision agriculture [9] depends heavily on the ability to map crop characteristics accurately to crop demographic data. The crop yield monitoring [10] process maintains a consistent record of the quantity of a crop harvested in a particular region, along with the related environmental parameters and inter-related agronomic data. Yield measurements of each region, together with their environmental parameters, are stored in a relational database. Crop growth and yield are functions of a large number of metabolic processes, which are affected by environmental and genetic factors.

Data mining algorithms play a vital role in agricultural databases and applications [7] for extracting knowledge and updating crop information. This paper uses the Bee Hive cluster approach to describe agricultural datasets in support of crop growth decision making. The Bee Hive cluster approach treats heterogeneous data clusters as repositories, or cells, in a hive. Bee Hive performs better than neural networks [17], support vector machines or Bayesian classification [16], since neural networks and support vector machines can be applied only to single flat relations and Bayesian classification relies on probability metrics. A detailed survey of bee colony algorithms was carried out, with specific suggestions on clustering based on bee behaviour criteria and honey gathering analysis. The algorithms are based on the foraging behaviour of bees in the colony and on food source searching behaviour related to nest site searching and storing. Clustering approaches in data mining help to accomplish the aforementioned goals by extracting or detecting hidden crop growth characteristics, needs and behaviour from large databases. Several parameters (Fig-1) work together to enable predictive crop yield specific to sociodemographic attributes. Interesting measures were also identified in a case study in which the agronomic aspects of a crop related to environmental issues and socio-economic field outs are considered accountable.

The performance and precision of a system degrade when a traditional data mining algorithm attempts to find patterns in a large, complex crop database. To achieve optimum performance, crop types have to be classified based on crop yield and other attributes from which relationship patterns among objects in the crop database can be extracted. In this paper, a relational cluster Bee Hive algorithm is proposed for extracting yield patterns across multiple data sets. From the extracted patterns, graphs are plotted to illustrate yield variation for any region under discussion or for a type of crop in different regions. The outcome helps to identify and investigate areas of unusually high or low yield.
2.0 Literature Survey
Many cluster algorithms have been proposed that generalize known structure, focus on classifier weaknesses and boost performance by model combination.
C. Preisach et al. [1] proposed a generic relational ensemble model which improves the cluster accuracy of scientific publications by combining the probability distributions of several relational attributes and local attributes. The probability for relational attributes is determined using a graph representation, and that for local attributes using traditional text cluster. Heterogeneity becomes a major issue, since the models are combined in a specific format.

Xiaoxin Yin et al. [2] proposed CrossMine-Tree and CrossMine-Rule. Both cluster approaches use Tuple ID propagation, which performs a virtual join among relations rather than a physical join. In Tuple ID selection, key attributes are used for spanning among relations. For non-target relations, all relations are joined together to compute the foil gain. However, neither method can handle database imbalances in complex applications.

H. Guo et al. [3] proposed multi-relational cluster by multiple view creation, without upgrading or flattening the original dataset. In this approach the MRC algorithm is proposed for cluster, which in turn uses conventional data mining methods at different stages. Finally, the views are validated by a correlation-based view validation algorithm. This approach, however, uses a training data set for each view to learn the target concept, which may not be helpful for a large, complex relational database.

J.M. Serrano et al. [5] proposed a query system architecture for retrieving olive crop information along with geographical data, crop management and soil attributes. Precise and imprecise data cluster is determined using a fuzzy relational database. A priori and a posteriori fuzzy data processing is used for storing data and for the querying process.

A. Jiménez et al. [6] proposed a tree mining approach for multi-relational cluster. Two schemes are proposed for representing a multi-relational database as sets of trees: a) key-based tree representation and b) object-based tree representation. In the key-based tree representation, the primary key attribute is taken as the root node and the remaining attributes of the relation as child nodes. In the object-based tree representation, intermediate nodes act as roots of subtrees, and all attribute values are children of the root node. However, this approach treats the number of tuples and attributes as constant, and the representation scheme relies heavily on foreign key relations; it is suitable for frequent pattern identification.

Ben Taskar et al. [8] proposed a general model for classification and clustering in relational databases that combines probabilistic reasoning and learning for linear scaling. The training dataset helps in model selection and in fine-tuning clusters. However, model selection requires expertise in the underlying cluster domain, which makes the approach less adaptable to automatic model construction.
3.0 Crop Knowledge Base
Fig-1 Crop Knowledge Base Framework
A crop knowledge base is constructed from a set of relations (Crop_details, Region and Cropping_info). Each relation contains one or more key attributes (primary or foreign). Crop_details contains the attributes Crop_ID, CropType, CropVariety, Yield Duration and Favorable Seasons. Crop_ID is guaranteed to be unique and is used as the primary key. CropType represents the type of crop. CropVariety represents the varieties of a crop by seed type (e.g. Co43, IR20, ADT36). Season represents the seasons suitable for growing a particular crop. Region contains the attributes Region_ID, RegionName, SoilType, Soil pH, Water pH, Sunlight and Rainfall. Region_ID is assigned uniquely to each RegionName. SoilType represents the types of soil in a particular region (e.g. black soil, clay soil, alluvial soil). Sunlight represents the temperature (°C). Rainfall represents the amount of rainfall the region receives per season. Cropping_info contains the attributes Region_ID, Crop_ID, Month, Year, Season and Yield. Region_ID and Crop_ID are foreign key attributes that identify the crop planted in the region. Month, Year and Season represent the time period, and Yield represents the amount of yield a particular crop produced in a given region.
3.1 Crop yield prediction architecture
Feature selection handles the selection of a subset of attributes from the crop knowledge base for robust learning. The cluster algorithm classifies crop yield into classes for a particular region. The classified yield information helps to find a better crop for a region, and crop yield is predicted using prediction rules.
Fig. 2. Crop yield prediction architecture
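The three relations of the crop knowledge base can be pictured as plain records linked by their keys. The following is a minimal sketch in Python, not the authors' implementation: the field names follow the attributes listed in the text, while the Python types, the units in the field names (days, °C, cm, kg/ha) and the helper `mean_yield_by_region` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class CropDetails:
    crop_id: int               # primary key
    crop_type: str             # e.g. "paddy"
    crop_variety: str          # seed type, e.g. "IR20"
    yield_duration_days: int   # assumption: duration expressed in days
    favorable_seasons: tuple   # seasons suitable for the crop

@dataclass(frozen=True)
class Region:
    region_id: int             # primary key, unique per region name
    region_name: str
    soil_type: str             # e.g. "black soil"
    soil_ph: float
    water_ph: float
    sunlight_c: float          # temperature in degrees Celsius
    rainfall_cm: float         # assumption: rainfall per season in cm

@dataclass(frozen=True)
class CroppingInfo:
    region_id: int             # foreign key -> Region.region_id
    crop_id: int               # foreign key -> CropDetails.crop_id
    month: int
    year: int
    season: str
    yield_kg_ha: float         # assumption: yield expressed in kg/ha

def mean_yield_by_region(records, crop_id):
    """Average yield of one crop per region (hypothetical helper)."""
    by_region = defaultdict(list)
    for rec in records:
        if rec.crop_id == crop_id:
            by_region[rec.region_id].append(rec.yield_kg_ha)
    return {region_id: mean(values) for region_id, values in by_region.items()}
```

Keeping Cropping_info as a separate record that only stores the two foreign keys mirrors the relational layout described above: region and crop attributes are stored once, and each yield measurement refers to them by ID.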
In a crop, growth parameters such as optimum LAI (leaf area index) and CGR (crop growth rate) at flowering have been identified as the major determinants of yield [10]. A combination of these growth parameters explains varying yields better than any individual growth variable [14].
3.2 Feature Selection
Feature selection plays a major role in selecting attributes from the crop knowledge base. To classify crop yield, feature selection filters the necessary parameters on the basis of
1) ∀Region x Crop yield (Query –A)
2) ∀Region x Soil pH x Water pH x Rainfall (Query –B)
3) ∀[Region x Crop yield] (Query –C)
4) ∀[Region x Avg (Soil pH) x Avg (Water pH) x Avg (Rainfall)] (Query –D)
4.0 Bee Hive Clustering Approach
The CRY cluster method exploits the search capability of the Bees Algorithm, as shown in Fig-2, to overcome the local optimum problem of the k-means algorithm. The CRY algorithm searches for cluster centres (c1, c2, ..., ck) such that the clustering metric E (Equation 1) is minimized. The basic steps of the proposed clustering operation are essentially carried out by the Bees Algorithm, as explained below. The CRY algorithm requires the following parameters to be set:
n : number of scout bees
m : number of sites selected for neighbourhood search, out of the n visited sites
e : number of best sites among the m selected sites
b : number of bees recruited for the best e sites
number of bees recruited for the remaining (m – e) selected sites
g : the initial size of each patch (a patch can be considered as a region in the search space that includes the visited site and its neighbourhood)
CRY evaluates the sites and the bee activity in the following steps:
1. Initialise the solution population.
2. Evaluate the fitness of the population.
3. While (stopping criterion is not met) // Forming new population.
a. Select sites for neighbourhood search.
b. Recruit bees for the selected m sites (b bees for each of the best e sites) and evaluate their fitnesses Fi.
c. Select the fittest bee from each site.
d. Assign the remaining bees to search randomly and evaluate their fitnesses.
4. End While
The algorithm starts with an initial population of n scout bees. Each bee represents a potential clustering solution as a set of z cluster centres, or zones.
Step 1 The initial locations of the zones are randomly assigned for each of the n bees. The Euclidean distances between each data object and all centres are calculated to determine the cluster to which the data object belongs (i.e. the cluster whose centre is closest to the object). Initial clusters are constructed in this way, and the cluster centres are then replaced by the actual centroids of the clusters to define a particular clustering solution (i.e. a bee). This initialization process is applied each time new bees are created.
Step 2 The fitness Fi is computed for each site visited by a bee using the clustering metric E, which is inversely related to fitness.
Step 3 The m sites with the highest fitnesses are designated as "selected sites" and chosen for neighbourhood search.
Step 4 The algorithm searches around the m selected sites, assigning more bees to search in the vicinity of the best e sites.
Step 5 Selection of the best sites can be made directly according to the fitnesses associated with them. Alternatively, the fitness values Fi are used to determine the probability of the sites being selected.
To build a cluster algorithm for yield prediction, each new crop yield needs to be analyzed with respect to the regions and time intervals of the Crop Knowledge Base. Following the bees' foraging behaviour, each bee stores crop information in the hive storage repository (Fig-2).
Step 6 Searches in the neighbourhood of the best e sites, which represent the most promising solutions, are carried out by recruiting more bees for the best e sites than for the other selected sites. Together with scouting, this differential recruitment is a key operation of the Bees Algorithm.
Step 7 For each patch of size g, only the bee that has found the site with the highest fitness (the "fittest" bee in the patch) is selected to form part of the next bee population. This restriction is introduced on purpose to reduce the number of search sites to be explored.
Step 8 The remaining bees in the population are assigned randomly around the search space to look for new potential solutions.
Step 9 Each iteration ends with the colony having two parts to its new population: representatives from the selected patches, and scout bees assigned to conduct random searches. These steps are repeated until a stopping criterion s is met.
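The steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the CRY implementation. Equation 1 is not reproduced in the text, so the clustering metric E is assumed here to be the k-means objective (sum of squared Euclidean distances from each object to its nearest centre). The stopping criterion is assumed to be a fixed number of iterations. The parameter `r` (bees recruited for each of the remaining m – e selected sites) and all function names are hypothetical.

```python
import math
import random

def recentre(points, centres):
    """Assign each point to its nearest centre, then return the cluster centroids."""
    members = [[] for _ in centres]
    for p in points:
        nearest = min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
        members[nearest].append(p)
    # An empty cluster keeps its old centre (a simplification of this sketch).
    return [tuple(sum(col) / len(grp) for col in zip(*grp)) if grp else tuple(c)
            for grp, c in zip(members, centres)]

def metric(points, centres):
    """Assumed metric E: sum of squared distances to the nearest centre."""
    return sum(min(math.dist(p, c) ** 2 for c in centres) for p in points)

def scout(points, k, rng):
    """Step 1: a new bee starts from k randomly chosen data points as centres."""
    return recentre(points, rng.sample(points, k))

def search_patch(points, site, g, rng):
    """Neighbourhood search: shift every centre by at most g, then recentre."""
    moved = [tuple(x + rng.uniform(-g, g) for x in c) for c in site]
    return recentre(points, moved)

def bees_clustering(points, k, n=10, m=4, e=2, b=6, r=3, g=0.5,
                    iterations=30, seed=0):
    rng = random.Random(seed)

    def cost(site):
        return metric(points, site)   # low E means high fitness

    population = [scout(points, k, rng) for _ in range(n)]
    for _ in range(iterations):       # assumed stopping criterion: fixed budget
        population.sort(key=cost)
        survivors = []
        for rank, site in enumerate(population[:m]):   # the m selected sites
            recruits = b if rank < e else r            # more bees for the best e sites
            patch = [site] + [search_patch(points, site, g, rng)
                              for _ in range(recruits)]
            survivors.append(min(patch, key=cost))     # fittest bee of each patch
        scouts = [scout(points, k, rng) for _ in range(n - m)]  # random search
        population = survivors + scouts
    return min(population, key=cost)

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(bees_clustering(data, k=2))
```

Because each patch keeps its current site among the candidates, the best solution found so far is never lost between iterations, while the n – m fresh scouts in every iteration provide the global exploration that plain k-means lacks.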
4.1 CRY – Crop Yield prediction using Bee Hive cluster algorithm
Multiple data mining methods are used to analyse a large data set of crop growth attributes. The data set was assembled from crop surveys of Indian agricultural regions. The research used existing data collected for three commonly occurring crop types, namely Rice, Paddy and Sugarcane, in order to establish patterns and correlations between a number of crop properties.
Fig 3- Bee Hive Storage repository for CRY
The crop yield feature selection algorithm reports the average yield of a particular crop over multiple regions across a period of years or variable seasons. To analyze the outcome of a specific crop's growth, the mean crop yield is analyzed over a period of years based on differential agronomic attributes. The CRY cluster algorithm classifies crop yield as high, average or low yield. The algorithm also classifies a yield as out of the yield window (lower or upper) for a crop (Fig-4). This case arises when the yield is much higher or much lower than in the normal scenario. In such situations, cluster is carried out on the yield of the particular region over time period intervals.

Definition
Set of regions R = (r1, r2, r3, …, rn)
Crop types C = (c1, c2, c3, …, cm)
Yield of crop: yield_c
Mean yield for a given crop: $\overline{yield}_c = \frac{1}{n}\sum_{i=1}^{n} yield_c(r_i)$, where n is the number of regions
Yield window: the range between the lower cut-off (LCL) and the upper cut-off (UCL), defined using a numerical constant (Fig. 4)

Fig. 4 Crop Yield Analysis (levels marked: Window (Upper Cut-off) UCL, Upper Margin UM, Control line CL, Lower Margin LM, Window (Lower Cut-off) LCL)

Crop class δ = {δou, δvg, δg, δa, δl, δvl, δol}
δou = Yield out of yield window (upper cut-off)
δvg = Very good yield
δg = Good yield
δa = Average yield
δl = Low yield
δvl = Very low yield
δol = Yield out of yield window (lower cut-off)

Algorithm: CRY Cluster Algorithm
Input: Region (R), Crop Type (C), Yield of a crop (yield_c), Mean yield of crop
Output: Crop class (δ)
Begin Procedure
For R := 1 to n do
If (…) then (δou)
If (…) then (δvg)
If (…) then (δa)
If (…) then (δvl)
Next
End Procedure
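To make the idea of yield classes concrete, the sketch below assigns one of the seven crop classes to each region by comparing its yield with the mean yield over all regions. The threshold conditions of the CRY cluster algorithm are not legible in the text, so the bands used here (multiples of a hypothetical constant `k` applied to the relative deviation from the mean) are purely an assumption for illustration. So are the function names and the ASCII class codes, which stand for δou, δvg, δg, δa, δl, δvl and δol.

```python
from statistics import mean

def classify_yield(yield_c, mean_yield, k=0.1):
    """Map a yield to a class code using ASSUMED bands around the mean.

    dev is the relative deviation from the mean yield; k is a hypothetical
    numerical constant that sets the width of each band.
    """
    dev = (yield_c - mean_yield) / mean_yield
    if dev > 3 * k:
        return "ou"   # out of yield window, upper cut-off
    if dev > 2 * k:
        return "vg"   # very good yield
    if dev > k:
        return "g"    # good yield
    if dev >= -k:
        return "a"    # average yield
    if dev >= -2 * k:
        return "l"    # low yield
    if dev >= -3 * k:
        return "vl"   # very low yield
    return "ol"       # out of yield window, lower cut-off

def classify_regions(yield_by_region, k=0.1):
    """Classify every region against the mean yield over all n regions."""
    mean_yield = mean(yield_by_region.values())
    return {region: classify_yield(y, mean_yield, k)
            for region, y in yield_by_region.items()}
```

A caller would pass the per-region yields of a single crop, for example `classify_regions({"r1": 1350, "r2": 1000, "r3": 650})`, and obtain one class code per region.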
Prediction rules take the general form:
If (Condition 1) & (Condition 2) … (Condition n) then "Prediction"
Prediction rules:
If (region = "Katpadi") & (crop = "paddy") then yield > "1560 Kg/ha", crop class = "Very Good yield" (δvg)
If (crop = "sugarcane") & (soilph = "7.2") & (rainfall = "10cm") then yield = "1065 Kg/ha", crop class = "Average Yield" (δa)
EndIf
5.0 Experimental Results and Discussions