Computers and Electronics in Agriculture 56 (2007) 174–186
Delineation of site-specific management zones using fuzzy clustering analysis in a coastal saline land Li Yan a,b , Shi Zhou b,∗ , Li Feng c , Li Hong-Yi b a
Institute of Land Science and Property Management, College of Southeast Land Management, Zhejiang University, Hangzhou 310029, China b Institute of Agricultural Remote Sensing and Information Technology Application, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310029, China c Department of Environmental Engineering, Zhejiang University, Hangzhou 310029, China Received 13 November 2006; received in revised form 29 January 2007; accepted 30 January 2007
Abstract Recent research in precision agriculture has focused on use of management zones as a method for variable application of inputs. In this paper, five soil and landscape attributes, including a NDVI image, soil electrical conductivity, total nitrogen, organic matter and cation exchange capacity acquired for a coastal saline land were selected as data sources, and their spatial variability were analyzed and spatial distribution maps constructed with geostatistics technique. Principal component analysis and fuzzy c-means clustering algorithm were then performed to delineate the management zones, fuzzy performance index (FPI) and normalized classification entropy (NCE) was used to determine the optimal cluster number. To assess whether the defined three management zones can be used to characterize spatial variability of soil chemical properties and crop productivity, 139 georeferenced soil and yield sampling points across each management zone was examined by using variance analysis. It was found that the optimal number of management zones for the study area was three and there existed significantly statistical differences between the chemical properties of soil samples and yield data in each defined management zone, management zone 3 presented highest nutrient level and potential crop productivity, whereas management zone 1 lowest. The results revealed that the given five variables could be aggregated into management zones that characterize spatial variability in soil chemical properties and crop productivity. The defined management zones not only can direct soil sampling design, but also provide valuable information for site-specific management in precision agriculture. © 2007 Elsevier B.V. All rights reserved. Keywords: Management zones; Fuzzy clustering; Soil electrical conductivity (EC); Spatial variability; Saline land; Precision agriculture
1. Introduction The definition of classified management zones, utilizing the spatial management tools of precision agriculture, has been proposed as a cost-effective approach to improve crop management and reduce detrimental environmental impact (Khosla et al., 2002; Franzen et al., 2002). Management zones, in the context of precision agriculture, are subfield regions representing homogenous attributions in landscape and soil condition. When homogenous in a specific area, these attributes should lead to the same results in crop yield potential, input use efficiency, and environmental impact (Schepers et al., 2004). ∗ Corresponding author at: Institute of Agricultural Remote Sensing and Information Technology Application, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310029, China. E-mail addresses:
[email protected] (Y. Li),
[email protected] (Z. Shi).
0168-1699/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.compag.2007.01.013
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
175
Besides representing areas of equal production potential, within-field management zones have many other uses. Several studies have indicated that homogenous management zones could be used as an alternative to grid soil sampling and to develop nutrient maps for variable rate fertilizer application (Khosla and Alley, 1999; Fleming et al., 2000). Spatially coherent areas within fields may also be useful in relating yield to soil and topographic parameters for crop-modeling evaluation (Fraisse et al., 2001a). There are numerous methods for defining management zones, but most methods rely on spatial information sources that are stable or predictable over time and are related to crop yield (Doerge, 1999). Soil properties (fertility, electrical conductivity, organic matter and texture, etc.), aerial photographs, topography factors (slope, gradient and elevation, etc.) and yield maps have all been suggested as logical basis to define homogenous zones in agricultural fields (Franzen et al., 2002; Schepers et al., 2000, 2004). Many researchers used one or multiple information sources above to delineate subfield management zones. Boydell and McBratney (1999) found that imagery of a growing crop and yield data collected in the same year would be highly correlated and thus an accurate representation of crop production potential for that specific year. Long et al. (1994) concluded that aerial photographs of growing crops were the most accurate for classifying a field into management units to predict grain yield. Since the variability in soil electrical conductivity (EC) reflects the cumulative variability in multiple soil properties, it is one criterion for defining management zones (Sudduth et al., 1995). For some soils, EC mapping appears to integrate soil parameters related to productivity to produce a template of potential yield (Kitchen et al., 1999). Johnson et al. (2001) found that management zones based on EC mapping provided a useful framework for soil sampling to reflect spatial heterogeneity and could potentially be applied to assess temporal impacts of management on soil conditions. Ferguson et al. (2003) compared management zones based on slope and surface soil texture with those based on soil EC and concluded that the management zones based on easily obtained soil EC measurements, were preferable and had the potential for use in the site-specific management of nitrification inhibitors. Franzen and Kitchen (1999) utilize a variety of data resources such as topography, satellite imagery, soil electrical conductivity, crop yield maps and intensive soil survey data to construct management zones for N fertilizer management. Other published research used yield map to identify generalized management zones of low, medium and high yield (Stafford et al., 1998). In China, Classified management sub-zones based on spatial variability of soil nutrients have been developed to aid site-specific management of fertilizers (Bai et al., 2001; Huang et al., 2003). One promising statistical approach for identifying management zones on the basis of a number of different sources is called cluster analysis. This can be used to identify areas that have similar landscape attributes, soil properties and plant parameters, to quantify patterns of variability and to reduce the empirical nature of defined management zones (Fraisse et al., 2001b). Stafford et al. (1998) used fuzzy clustering of combine yield monitor data to divide a field into potential management zones. Similarly, Boydell and McBratney (1999) divided a field into management zones using cotton yield estimates from satellite imagery. Jaynes et al. (2003) used cluster analysis to decipher temporal and spatial patterns in 6 year of corn yield measurements and to reduce the temporal yield patterns for 224 yield plots into five clusters with similar temporal yield patterns. The objectives of this study in a coastal saline field were to: (1) characterize the spatial variability of the soil and landscape attributes affecting crop productivity, (2) identify the management zones using fuzzy cluster analysis based on the variability of the selected soil and landscape attributes, (3) evaluate the likely potential of the defined management zones for site-specific management in coastal saline regions. 2. Materials and methods 2.1. Area description, sampling, and measurements The study was conducted on a 10.5 ha cotton field in a coastal saline region, which is located in the northern region of Shangyu City, Zhejiang Province, and covers an area of 26 061 ha (30◦ 04 00 –30◦ 13 47 N, 120◦ 38 32 –120◦ 51 53 E). The region is subtropical with evergreen broadleaf vegetation, an average annual temperature of 16.5 ◦ C, and an average annual precipitation of 1300 mm. Modern marine and fluvial deposits form the dominant soils having light loam or sandy loam soil textures with a sand content of 592 g kg−1 and high concentrations of Na- and Mgsalts (>1%). Over the past 30 years, many coastal tideland areas have been successively enclosed and reclaimed for agricultural land uses under a series of reclamation projects. The field used in the present study was reclaimed in 1996 (Fig. 1).
176
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
Fig. 1. The study area and spatial distribution of sampling points.
A grid-sampling scheme was imposed on the field with 139 composite bulk electrical conductivity (ECb ) measurements for the soil profile (0–20 cm) using a portable WET sensor in November 2003. At each grid point, one representative sample was collected. At each sampling grid point, the WET sensor probes were inserted into the soil and five soil EC measurements were made within a 1 m diameter circle. The average reading for each grid point was computed as an ECb datum point. Each ECb measurement was geo-referenced using a Trimble Global Positioning System (GPS) (with differential correction). The GPS receiver accuracy was within 2 m of horizontal accuracy. When performing ECb measurements in the open, 139 soil samples were collected and taken back to the laboratory where their chemical properties were analyzed with conventional methods. NDVI data was calculated from SPOT5 satellite imagery acquired in 27 September 2003, to reflect the way crop is growing. During the harvest period in 2003, 139 cotton yield samples were also collected at each grid-point location. Five cotton plants at each grid point were harvested and the average seed cotton yield computed. 2.2. Conventional statistics Descriptive statistics including means, median, standard deviation (S.D.), coefficient of variation (CV), the maximum, minimum values, skewness and kurtosis were determined for each sample property. A correlation analysis was conducted to determine the relationship between cotton yield and soil pH, ECb , EC1:5 , organic matter (OM), total N (TN), available nitrogen (AN), available phosphorus (AP), and available potassium (AK) and cation exchange capacity (CEC). Then, a stepwise multiple regression analysis was performed to obtain the optimum model for describing the effects these soil properties on yield. Following this, distributions of some selected soil parameters in the optimum regression model were tested for normality using the Kolmogorov–Smirnov statistic. 2.3. Geostatistics analysis To characterize the spatial distribution of the soil parameters in the optimum regression model, semi-variance analyses were carried out on these selected soil variables using a geostatistical software package (GS+7.0) to determine the type of spatial structure. In the present study, isotropic spherical or exponential models were fitted to the experimental semivariograms using the method of least squares. The fitted models were then used in an ordinary punctual Kriging procedure to estimate the values of these selected soil properties at unsampled locations, smoothed contour maps of each soil property were then constructed using the interpolated value. 2.4. Principal component analysis To summarize and aggregate the selected soil and landscape attributes, principal component analysis (PCA) was performed on these variables in ERDAS procedure. Principal components for a data set are defined as linear combina-
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
177
tions of variables that account for the maximum variance within the entire data set by describing vectors of closest fit to the n observations in p-dimensional space, subject to being orthogonal to one another. A correlation matrix involving these selected soil and landscape properties was used as input for the analysis in lieu of a covariance matrix, resulting in normalized PCA. There are many documented strategies for using PCA or closely related factor analyses to select a smaller subset of variables from a larger number of variables. The strategy we utilized is similar to that described by Dunteman (1989). We assumed that principal components (PCs) receiving high eigenvalues best represented the landscape attributes. Therefore, we retained only the PCs with eigenvalues greater than 1. Fuzzy c-means clustering algorithm was then performed, using values for the retained PCs, to develop the management zones classes, with 99% convergence of class membership after a maximum of 50 iterations. 2.5. Fuzzy c-means clustering algorithm The fuzzy c-means clustering algorithm was used for the purpose of partitioning n data observations in feature space into c-groups or clusters based on a fuzzy c-means partition. There are three primary matrices involved in the clustering process. First, there is the data we want to classify, the data matrix X, consisting of n observations with p classification variables each. Second is the cluster centroid matrix V, consisting of c cluster centroids located in the feature space defined by the p classification variables. Finally, there is the fuzzy membership matrix U, consisting of membership values to every cluster in V for each observation in X, bounded by the constraints for all i = 1 to c and all k = 1 to n that: uik ∈ [0 − 1],
1 ≤ i ≤ c,
1≤k≤n
and
c
uik = 1,
1≤k≤n
(1)
i=1
An optimal fuzzy c partition is defined as the minimization of the generalized least-squared errors function, Jm , which is a weighted measure of the squared distance between pixels and class centroids: Jm (U, v) =
n c
(uik )m (dik )2
(2)
k=1 i=1
where m is the fuzziness weighting exponent (1 ≤ m < ∞), m controls the relative weights placed on each of the squared errors. Increasing m tends to increase the degree of fuzziness, hard clusters occur as m approaches a value of 1. There is no theoretical or computational evidence to distinguish an optimal m. (dik )2 is the squared distance in feature space between xk and vi , it can be computed in the following manner: (dik )2 = xk − vi 2 = (xk − vi )T A(xk − vi )
(3)
where xk is the data observation k in the data matrix X, vi the centroid of cluster i in the cluster centroid matrix V, and A is the positive-define (p × p) weight matrix that determines the norm used that controls the shape of the classes. Optimal fuzzy clusterings of X are obtained from pairs (U, v) that may be locally optimal for only if: n (uik )m xk vi = k=1 1≤i≤c (4) n m , k=1 (uik ) ⎤ ⎡ 2/(m−1) −1 c d ik ⎦ , 1 ≤ k ≤ n; 1 ≤ i ≤ c uik = ⎣ (5) djk j=1
The several types of cluster validity functions are usually calculated on each U produced by fuzzy c-means since the local minima of Jm are not consistent with the visually acceptable clustering patterns of the data. For this study, the fuzziness performance index (FPI) (Odeh et al., 1992; Boydell and McBratney, 1999) and normalized classification entropy (NCE) (Bezdek, 1981) were used for determining the optimal number of clusters:
n c 2 c k=1 i=1 (uik ) 1− (6) FPI = 1 − c−1 n
178
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
NCE =
n c
n uik loga (uik ) − k=1 i=1 n−c n
(7)
FPI is a measure of the degree of separation (i.e., fuzziness) between fuzzy c-partitions of X. Values of FPI may range from 0 to 1. Values approaching 0 indicate distinct classes with little membership sharing while values near 1 indicate no distinct classes with a large degree of membership sharing. The NCE models the amount of disorganization of a fuzzy c-partition of X (Odeh et al., 1992; Lark and Stafford, 1997). The optimal number of clusters for each computed index is when the index is at the minimum, representing the least membership sharing (FPI) and greatest amount of organization (NCE) as a result of the clustering process (Fridgen et al., 2004). Conventional statistics was performed with SPSS 12.0. GS+7.0 program was used for geostatistics analysis. Image analysis and display were done with ERDAS8.6 and ArcGIS8.3, Matlab6.5 was used in implementing the fuzzy c-means clustering algorithm. 3. Results 3.1. Conventional statistics of soil properties and crop yield The descriptive statistics of soil properties and cotton yield from 139 sampling points are summarized in Table 1. It was evident that these saline soils were characterized by high salinity content, low fertility and crop yield. The soil was alkalescent, pH varied from 7.19 to 8.5, with low coefficient of variation. The means of ECb and EC1:5 data did not differ significantly but varied widely. In common with other reports, coefficients of variations (CVs) of EC were fairly high (Mahmut and Cevat, 2003). This can be due to uneven crop growth and non-uniform management practices, resulting in marked changes in topsoil salinity over small distances. In addition, the micro-landform and the level of groundwater also contributed to the variability of salinity in the topsoil. OM content was very low, less than quarter of that in paddy soil in Zhejiang province (Bao, 2000). TN content was also very low. According to the study by Shi et al. (2003), the saline soils were characterized by high sand content, which was also typical of the present study site. Because of the coarse soil texture with high sand content and permeability, nitrogen fertilizer was prone to lose, which resulted in low level of nitrogen content. AP had higher content, which was due to the phosphorus fertilizer input by the farmers. AK in the study site was moderate and can meet the need for crop growth. In conclusion, the soils in coastal saline land have the characteristics with the low nutrient and high salinity, resulted in very low output and agricultural benefit. Table 2 lists the correlation coefficients of soil properties and cotton yield. For this study area, cotton yield was significantly correlated to ECb (P < 0.01), EC1:5 (P < 0.01), OM (P < 0.01), TN (P < 0.01), AN (P < 0.01), AP (P < 0.01) and CEC (P < 0.01), which indicated that these components were the main limiting factors for cotton growth. Also ECb and EC1:5 were negatively correlated with cotton yield, whereas soil OM, TN, AN, AP and CEC—properties indicative of yield potential—were positively correlated. Table 2 also revealed negative correlations between ECb and EC1:5 and most soil nutrient properties. Table 1 Descriptive statistics of soil chemical properties and cotton yield Variables
Mean
Median
S.D.
CV (%)
Minimum
Maximum
Skewness
Kurtosis
pH ECb (mS m−1 ) EC1:5 (mS m−1 ) OM (g kg−1 ) TN (g kg−1 ) AN (mg kg−1 ) AP (mg kg−1 ) AK (mg kg−1 ) CEC (cmol kg−1 ) Cotton yield (g plant−1 )
7.82 135.7 134.0 8.22 1.19 46.80 33.47 105.53 7.75 213.7
7.81 119 111 8.20 1.12 42.81 27.7 92 7.72 182
0.229 87.6 98.02 2.24 0.36 23.77 22.44 54.44 2.10 148.9
2.9 64.6 73.1 27.3 30.3 50.8 67.1 51.6 27.1 69.7
7.19 15 7.40 2.50 0.52 13.66 0.40 28.50 3.93 14
8.50 345 513.5 14.30 2.67 95.8 104.50 249.81 19.14 714
0.472 0.436 1.239 0.031 1.307 3.021 0.880 0.687 0.206 1.176
0.724 −0.842 1.802 −0.386 2.338 15.188 0.529 −0.520 4.998 0.931
n = 139; OM, soil organic matter; ECb , bulk electrical conductivity; TN, total nitrogen; AN, available nitrogen; AP, available phosphorus; AK, available potassium; CEC, cation exchange capacity.
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
179
Table 2 Correlation matrix for soil chemical properties and cotton yield in the study area Observation items
pH
pH ECb EC1:5 OM TN AN AP AK CEC Yield
1 −0.038 −0.272** −0.242** −0.035 −0.495** −0.348** 0.065 −0.156 −0.011
* **
ECb (mS m−1 )
EC1:5 (mS m−1 )
OM (g kg−1 )
TN (g kg−1 )
AN (mg kg−1 )
AP (mg kg−1 )
AK (mg kg−1 )
CEC (cmol kg−1 )
Yield (g plant−1 )
1 0.852** −0.182* −0.194* −0.053 −0.187* 0.425** −0.204* −0.524**
1 −0.232** −0.202* −0.191* −0.181* 0.420** −0.291** −0.549**
1 0.606** 0.335** 0.098 0.151 0.408** 0.335**
1 0.508** 0.005 0.121 0.302** 0.431**
1 0.077 0.107 0.248** 0.255**
1 0.255** 0.110 0.271**
1 0.066 0.112
1 0.284**
1
Significant at the 0.05 probability level. Significant at the 0.01 probability level.
To obtain the optimum model for describing the effects of soil properties on the yield with cotton yield as the dependent variable and the six main limiting factors: OM, ECb , TN, AN, AP and CEC as independent variables, a stepwise multiple regression analysis was performed. Since no a priori information was available about the regression model, the presence of a linear combination of the variables was assumed. The most appropriate model obtained with significance levels of P = 0.05 was: Y = 760.83 − 2.52(ECb ) + 1.47(OM) + 1.14(TN) + 0.893(CEC)
(R2 = 42%)
(8)
The R2 of 0.42 indicated that the model using only the four soil chemical factors successfully explained slightly more than 40% of the total variation of the measured yield. The regression coefficients in the equation generally indicated the magnitude of each factor in its contribution to cotton yield performance. In Eq. (8), ECb had a negative contribution to yield and the largest magnitude. Thus, the increase of soil salinity decreased yield to a large extent. Previously, Fu et al. (2000) found that, in the same coastal saline land, salinity was negatively correlated with the relative yield of cotton, soybean and mustard leaf, etc., with correlated coefficient of about 0.9. These findings and this work confirmed that the redundant salinity was the most limiting factor for crop growth in the study area. OM, TN and CEC were positively correlated with cotton yield, implying that the increase of these components would increase yield. 3.2. Geostatistics analysis Distributions of soil ECb , OM, TN and CEC using the Kolmogorov–Smirnov statistic were found to have normal distributions, thereby providing a basis for further structural analysis. The results of structural analysis on the four variables are given in Fig. 2. It was evident that the four variables illustrated isotropic behavior. All semivariograms had good continuity in space, soil ECb and OM could be modeled quite well with spherical models, whereas TN and CEC with exponential models. The presence of nugget variance in each soil property, was probably due to short-range variability and unaccountable measurement errors. The ratio of nugget variance to sill variance could be regarded as a criterion to classify the spatial dependence of soil properties. If the ratio is less than 25%, the variable has strong spatial dependence, between 25 and 75%, the variable has moderate spatial dependence, and greater than 75%, the variable shows only weak spatial dependence (Chien et al., 1997). The four variables exhibited moderate to strong spatial dependencies with the ratio of nugget variance to sill variance from 22 to 35%. The range of spatial dependence of the four variables was 44.7 to 310.4 m and was considered the distance beyond which observations were not spatially dependent. Kriging interpolation was applied to interpolate the four soil properties into a 10 m grid cell to represent the four variables on the same spatial resolution as NDVI data (10 m, see section below). This enabled the plot to be divided into several classes to determine homogenous zones. The smoothed contour maps obtained for the four variables are presented in Fig. 3.
180
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
Fig. 2. Semivariograms of some soil properties and their fitted curves and parameters.
It can be seen evidently that high salinity distributed in the eastern section and low salinity in the southwest and northern parts of the study area. Salinity in the groundwater induced the high salinity level in the east. Because there were some fish ponds to the east of the field, groundwater filtered into the eastern edge of study area transporting salts, which were deposited and then accumulated in the topsoil when the water subsequently evaporated. The low salinity in the southwest and northern parts of the study area was due to the influence of soil management practices. The distribution of TN was opposite in direction to that of ECb , low TN content distributed in the east to the study
Fig. 3. Smoothed contour maps produced by ordinary Kriging for soil ECb , OM, TN and CEC.
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
181
Fig. 4. NDVI images produced from SPOT5 image.
area, the northern and south-west parts had high TN level. OM and CEC displayed quite similar patterns, with high value in the west and low value in the east section. Normalized difference vegetation index (NDVI) are closely related to many vegetation parameters such as leaf area index, vegetation cover, vegetation biomass and crop growth, so is often used to monitor crop growth and predict crop yield. In the present study, with SPOT5 image, NDVI was calculated to reflect cotton cover and growth. The obtained NDVI image was given in Fig. 4. NDVI image coincided with considerably the distribution map of soil ECb . The higher NDVI value was in the northern and southwest to the study area, where salinity content was low, whereas, lower NDVI value appeared in the east section, where salinity content was high. This phenomenon was mainly caused by the negative correlation between salinity content and crop growth. Where the cotton grew well, salinity content was low, the other way around, where cotton grew worse, salinity content was high. As an important index to reflect crop cover and growth, NDVI image and the distributions map of ECb , TN, OM and CEC (Fig. 3) were together used in principal component analysis to compress the information content and reduce the dimensionality into two transformed principal component images. 3.3. Principal component analysis Because the five attributes of ECb , TN, OM, CEC and NDVI are highly correlated, to aggregate and summarize the variability in the five attributes, principal component analysis (PCA) was performed, retaining principal components producing eigenvalues greater than 1 and accumulative contribution rate greater than 85%. Using this criterion, only first two principal components were considered for the final analysis, with the two principal components accounting for 87.65% of the variability in the five attributes (Table 3). The principal component contribution rate for the two principal components shown in Table 3 indicated ECb , NDVI and TN had the most significant influence on PC1 although their influence was opposite in direction as indicated by their positive and negative signs. For PC2, OM and CEC produced a Table 3 Principal component analysis of the five attributes that related to cotton yield in the study area Principal components
Attribute value
Contribution rate (%)
Accumulative contribution rate (%)
Principal component 1 Principal component 2 Principal component 3 Principal component 4 Principal component 5
2.861 1.521 0.263 0.212 0.143
57.23 30.42 5.25 4.23 2.87
57.227 87.65 92.90 97.14 100
Principal contribution rate for each variable Variables
ECb
NDVI
OM
TN
CEC
Principal component 1 Principal component 2
−0.849 0.375
0.824 −0.421
0.481 0.830
0.883 −0.217
0.370 0.683
182
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
Fig. 5. PC images produced by principal component analysis.
large contribution rate relative to the other variables. In summary, the principal component analysis aggregated the five attributes into two principal components, accounting for a majority of the overall spatial variability in these attributes. Fig. 5 displays the two principal component images. 3.4. Fuzzy c-means algorithm To classify the two principal component images into management zones, the two principal component images were first layer-stacked in the ERDAS procedure, the obtained map layer was analyzed with the fuzzy c-means algorithm to assign a pixel to the appropriate classes based on the minimization of the generalized least squared error function of each pixel. We considered six clusters to be the maximum number of practical use as management zones. So the obtained map layer was divided into two, three, four, five and six clusters. Two cluster validity functions, including fuzzy performance index (FPI) and normalized classification entropy (NCE) were used as indicators of optimum cluster number. The results from the two indices were graphed in Fig. 6. The optimal number of clusters for each computed index is when the index is at the minimum, representing the least membership sharing (FPI) or greatest amount of organization (NCE) as a result of the clustering process. The minimum FPI and NCE were obtained with three clusters for the present study area, which is also coincident with our prior knowledge in this area. The final decision of how many clusters to use for creating management zones when the two cluster validity indices are dissimilar may require additional verification. For example, when developing productivity zones, verification of cluster number might be accomplished by comparing the within-zone yield variation as one increase the number of clusters (Fridgen et al., 2004).
Fig. 6. Calculated FPI and NCE for study area.
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
183
Fig. 7. (a) Management zones map and (b) spatial distribution of sampling point across each zone.
Fig. 7a shows the resultant management zone map, depicting three management zones. For statistical comparison, Kriging method was also used to interpolate the pre-processed yield data into a 10 m grid cell to examine how well these potential management zones reflected productivity levels (Fig. 7b). The maps illustrated that the procedures utilized for aggregating the five soil and landscape attributes resulted in management zone map with spatial patterns very similar to those of the crop yield (Fig. 7b). Visually, management zone 1 appeared to be located in the region with less productivity while management zone 3 was situated in the highly productive region. To assess whether our method of utilizing NDVI, ECb , TN, OM and CEC data to delineate management zones could be used to effectively characterize spatial variation in soil chemical properties (pH, ECb , EC1:5, AP, AK, OM, AN, TN, CEC) and crop productivity, all georeferenced soil sample and yield data points were assigned to one of the three respective management zones (Fig. 7a). Once soil sample and yield data points were assigned zone classification, the data were exported and analyzed via variance analysis to provide an indication of statistical distinction between the different potential management zones. The results revealed distinctly different soil chemical properties and crop yield for the three management zones (Table 4). The average of the mean ECb , EC1:5, AP, AK, AN and measured cotton yield within each management zone were significant at P < 0.01 probability level, for soil OM, TN and CEC, at P < 0.05 probability level. In summary, soil chemical properties were much more optimal for crop growth in the management zone 3 than in the management zone 1. For example, 256% increase in crop yield level from the management zone 3 to the management zone 1 was detected, whereas, nearly 202% decrease in EC level from the management zone 3 to the management zone 1 was observed, implying a negative association between soil EC and crop yield. As important fertility indices, AP, OM, AN, TN and CEC content were also highest in management zone 3 and lowest in management zone 3, where for AK, the opposite trend was observed. Fu et al. (2000) also found AK presented higher level in newly reclaimed coastal region where salinity content was high, lower level in earlier reclaimed coastal region where salinity content was low. Further studies will be conducted before general conclusions were made about the distribution of AK in the coastal saline land. Thus, it appears that soil and landscape attributes such as NDVI, soil ECb , TN, OM and CEC can be used to delineate management zones that characterize spatial variation in soil chemical properties and crop productivity. 4. Discussion The definition of site-specific management zones rely on spatial information that is stable or predictable over time and is related to crop yield. So, the spatial data sources, such as yield data, soil properties and terrain attributes from multiple years may be needed. In this paper, the spatial variations of some selected soil properties and yield data from only single year were used to delineate management zones, the temporal variability or stability of these variables were not taken into account. In addition, spatial continuity of management zones is not only associated with the spatiotemporal variations of the used variables, but also associated with their spatial scale. Further studies will be conducted to define management zones according to the spatial variability and temporal stability of soil attributes and crop yield, also, under different spatial scales of data resources, to assess the spatial continuity of classified management zones by using some ecological indices such as landscape fragmentation and aggregation degree. The number of management zones depended upon desired measurement sensitivity and the level of within-field variability. For this study, separation into three management zones proved to be a good compromise between sensitivity and visually discernable variability patterns of soil and landscape.
184
Management zones
Number of samples
Soil properties pH
ECb (mS m−1 )
EC1:5 (mS m−1 )
AP (mg kg−1 )
AK (mg kg−1 )
OM (g kg−1 )
AN (mg kg−1 )
TN (g kg−1 )
CEC (cmol kg−1 )
Yield (g plant−1 )
251.29 158.12 83.27
186.92 173.91 89.83
30.39 33.99 37.97
145.64 114.82 88.32
7.58 8.27 8.40
36.99 46.68 51.13
1.003 1.139 1.263
6.79 7.88 7.96
67.2 145.87 239.28
3.999 0.021
2.54 0.053
1 2 3
20 47 72
7.91 7.61 7.83
Variance analysis
F-Value Prob > F
1.909 0.152
55.06 0.000
17.03 0.000
1.642 0.000
11.14 0.009
0.980 0.038
2.541 0.004
13.46 0.000
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
Table 4 One-way variance analysis of soil pH, ECb , EC1:5, AP, AK, OM, AN, TN, CEC and cotton yield in the topsoil for the three management zones
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
185
For farmers to adopt site-specific management, the development of management zones must be simple, functional, and economically feasible. Complex field assessments and data manipulation may not be justifiable in terms of time, benefit or economics. Therefore cluster analysis, based on the assumption that grouping data points into naturally occurring clusters will reduce within-zone variability, provides an opportunity for identifying management zones in a site and, potentially, applying site-specific management to maximize crop production across the entire area. This also represents a simplified approach for identifying threshold parameters related to yield potential. 5. Conclusion Recent research in the area of precision agriculture has focused on the use of management zones as a more economical approach to define regions within a farm field for variable application of crop inputs. In this paper, a stepwise multiple regression analysis was performed to obtain the limiting factors for crop growth in a coastal saline field. The spatial variations of these limiting-yield factors were quantified using geostatistical tool and aggregated into management zones by principal component analysis and fuzzy c-means clustering algorithm. The classified management zones can be used to characterize spatial variability in soil chemical properties and showed favorable agreement with crop yield variability pattern. These results provide a basis of information for managing coastal saline land in a rationally site-specific and precise way and may be used to develop a targeted soil sampling plan to capture variability in various soil properties that are likely to influence crop yield. Knowledge of such management zones may enable a reduction in the number of soil analysis needed to create application maps for certain cultivation operations. Since the cost of obtaining soil samples to characterize field variability is a key problem in precision agriculture, this is particularly advantageous. Acknowledgements We thank the following for financial support for this project: the National Natural Science Foundation of China (no. 40001008, no. 40571066), and the Postdoctoral Science Foundation of China (no. 20060401048). We also thank Cheng Jie-Liang, Jin Hui-Ming, Ye Ji-Yao for their assistance in collecting field and crop measurements. References Bai, Y.L., Jin, J.Y., Yang, L.P., Liang, M.Z., 2001. Research on the subarea management model of soil nutrients by GIS. Sci. Agric. Sin. 34, 46–50 (in Chinese). Bao, S.D. (Ed.), 2000. Soil and Agricultural Chemistry Analysis. China Agricultural Press, Beijing (in Chinese). Bezdek, J.C. (Ed.), 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York. Boydell, B., McBratney, A.B., 1999. Identifying potential within field management zones from cotton yield estimates. In: Stafford, J.V. (Ed.), Proceedings of the 2nd European Conference on Precision Agriculture. Odense Congress Cent., Denmark, July 11–15. SCI, London, pp. 331–341. Chien, Y.J., Lee, D.Y., Guo, H.Y., Houng, K.H., 1997. Geostatistical analysis of soil properties of mid-west Taiwan soils. Soil Sci. 162, 291–297. Doerge, T., 1999. Defining management zones for precision farming. Crop Insights 8 (21), 1–5. Dunteman, G.H. (Ed.), 1989. Principal Components Analysis. Sage Publ., London. Ferguson, R.B., Lark, R.M., Slater, G.P., 2003. Approaches to management zone definition for use of nitrification inhibitors. Soil Sci. Soc. Am. J. 67, 937–947. Fleming, K.L., Westfall, D.G., Wiens, D.W., Brodah, M.C., 2000. Evaluating farmer developed management zone maps for variable rate fertilizer application. Precis. Agric. 2, 201–215. Fraisse, C.W., Sudduth, K.A., Kitchen, N.R., 2001a. Calibration of the ceres-maize model for simulating site-specific crop development and yield on claypan soils. Appl. Eng. Agric. 17 (4), 547–556. Fraisse, C.W., Sudduth, K.A., Kitchen, N.R., 2001b. Delineation of site-specific management zones by unsupervised classification of topographic attributes and soil electrical conductivity. Trans. ASAE 44 (1), 155–166. Franzen, D.W., Kitchen, N.R., 1999. Developing management zones to target nitrogen applications. SSMG-5. In: Site-specific Management Guidelines Series. Potash Phosphate Institute, http://www.ppi-far.org/ssmg. Franzen, D.W., Hopkins, D.H., Sweeney, M.D., Ulmer, M.K., Halvorson, A.D., 2002. Evaluation of soil survey scale for zone development of site-specific nitrogen management. Agron. J. 94, 381–389. Fridgen, J.J., Kitchen, N.R., Sudduth, K.A., Drummond, S.T., Wiebold, W.J., Fraisse, C.W., 2004. Management zone analyst (MZA): software for subfield management zone delineation. Agron. J. 96, 100–108. Fu, Q.L., Li, R.A., Ge, Z.B., 2000. Study and Practice on Agricultural Technology Demonstration in Coastal Saline Land in Zhejiang Province. Zhejiang University Press, Hangzhou, pp. 102–104 and 122–123 (in Chinese).
186
Y. Li et al. / Computers and Electronics in Agriculture 56 (2007) 174–186
Huang, S.W., Jing, J.Y., Yang, L.P., Cheng, M.F., 2003. Spatial variability and regionalized management of soil nutrients in the grain crop region in Yutian County. Acta Pedol. Sin. 40, 79–88 (in Chinese). Jaynes, D.B., Kaspar, T.C., Colvin, T.S., James, D.E., 2003. Cluster analysis of spatiotemporal corn yield patterns in an Iowa field. Agron. J. 95, 574–586. Johnson, C.K., Doran, J.W., Duke, H.R., Wienhold, B.J., Eskridge, K.M., Shanahan, J.F., 2001. Field-scale electrical conductivity mapping for delineating soil condition. Soil Sci. Soc. Am. J. 65, 1829–1837. Khosla, R., Alley, M.M., 1999. Soil-specific management on Mid-Atlantic coastal plain soils. Better Crops with Plant Food 83 (3), 6–7. Khosla, R., Fleming, K., Delgado, J.A., Shaver, T.M., Westfall, D.G., 2002. Use of site specific management zones to improve nitrogen management for precision agriculture. J. Soil Water Conserv. 57 (6), 513–518. Kitchen, N.R., Sudduth, K.A., Drummond, S.T., 1999. Soil electrical conductivity as a crop productivity measure for claypan soils. J. Prod. Agric. 12, 607–617. Lark, R.M., Stafford, J.V., 1997. Classification as a first step in the interpretation of temporal and spatial variation of crop yield. Ann. Appl. Biol. 130, 111–121. Long, D.S., Carlson, G.R., DeGloria, S.D., 1994. Quality of field management maps. In: Robert, P.C., Rust, R.H., Larson, W.E. (Eds.), Proceedings of the 2nd International Conference on Site-specific Management for Agricultural Systems. ASA, CSSA, SSSA, Madison, WI, pp. 251–271. Mahmut, C., Cevat, K., 2003. Spatial and temporal changes of soil salinity in a cotton field irrigated with low-quality water. J. Hydrol. 272, 238–249. Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1992. Soil pattern recognition with fuzzy-c-means: application to classification and soil–landform interrelationships. Soil Sci. Soc. Am. J. 56, 505–516. Schepers, J.S., Schlemmer, M.R., Fergunson, R.B., 2000. Site-specific considerations for managing phosphorus. J. Environ. Qual. 29, 125–130. Schepers, A.R., Shanahan, J.F., Liebig, M.K., Schepers, J.S., Johnson, S.H., Luchiari Jr., A., 2004. Appropriateness of management zones for characterizing spatial variability of soil properties and irrigated corn yields across years. Agron. J. 96, 195–203. Shi, Z., Huang, M.X., Li, Y., 2003. Physico-chemical properties and laboratory hyperspectral reflectance of coastal saline soil in Shangyu city of Zhejiang province, China. Pedosphere 13 (3), 111–120. Stafford, J.V., Lark, R.M., Bolam, H.C., 1998. Using yield maps to regionalize fields into potential management units. In: Robert, P.C. (Ed.), Proceedings of the Fourth International Conference on Precision Agriculture. ASA, CSSA, SSSA, Madison, WI, pp. 225–237. Sudduth, K.A., Kitchen, N.R., Hughes, D.F., Drummond, S.T., 1995. Electromagnetic induction sensing as an indicator of productivity on claypan soils. In: Robert, P.C. (Ed.), Proceedings of the 2nd Internal Conference on Site-specific Management for Agricultural Systems. ASA, CSSA, SSSA, Madison, WI, pp. 671–681.