Computers and Electronics in Agriculture 51 (2006) 99–109
Application of support vector machine technology for weed and nitrogen stress detection in corn

Y. Karimi a, S.O. Prasher a,∗, R.M. Patel a, S.H. Kim b

a Department of Bioresource Engineering, McGill University, Macdonald Campus, 21, 111 Lakeshore Road, Ste-Anne-de-Bellevue, Que., Canada H9X 3V9
b Department of Environmental Engineering, Yeungnam University, Kyongsan 712-749, South Korea

Received 13 April 2005; received in revised form 21 November 2005; accepted 5 December 2005
Abstract

This study was conducted to evaluate the usefulness of a new method in artificial intelligence, the support vector machine (SVM), as a tool for classifying hyperspectral images taken over a corn (Zea mays L.) field. The classification was performed with respect to nitrogen application rates and weed management practices, and the classification accuracy was compared with that obtained by an artificial neural network (ANN) model on the same data. The field experiment consisted of three nitrogen application rates and four weed management strategies. A hyperspectral image was obtained with a 72-waveband Compact Airborne Spectrographic Imager at an early growth stage during the year 2000 growing season. Nitrogen application rates were 60, 120, and 250 kg N/ha. Weed controls were: none, control of grasses, control of broadleaf weeds, and full weed control. Classification accuracy was evaluated for three cases: combinations of nitrogen application rates and weed infestation levels, nitrogen application rates alone, and weed controls alone. The SVM method resulted in much lower misclassification rates than the ANN approach for all three cases. Detection of stresses at an early crop growth stage using the SVM method could aid in the timely, site-specific application of in-season remedies.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Remote sensing; Hyperspectral; Support vector machine; Nitrogen stress; Weed stress; Corn field
1. Introduction

Managing the ever-increasing demand for food by the world's human population while minimizing the impact of the ensuing intensive agriculture on environmental quality requires the development of more efficient approaches to crop production. Precision farming, which is based on site-specific application of agricultural inputs (e.g., fertilizer, herbicide, and pesticide) where they are needed (variable rate technology), is very promising in this respect. This methodology can result in an appreciable reduction in the overall quantities of agro-inputs (Tomer et al., 1997; Christensen et al., 1998). The effectiveness of precision farming depends not only on fast collection of field data, as early as possible in the critical growth period, but also on analysis and interpretation of the collected data and the development of variable rate application techniques. The new generation of hyperspectral sensors can provide information at very fine spatial and spectral resolutions at a reasonable cost (Lamb, 1998). However, there is a need to develop better and faster methods
∗ Corresponding author. Tel.: +1 514 398 7783; fax: +1 514 398 8387. E-mail address: [email protected] (S.O. Prasher).
doi:10.1016/j.compag.2005.12.001
of data transfer, data storage, and data analysis (Thenkabail et al., 2003), with minimal manpower requirements for validation and ground truthing.

Various statistical and artificial intelligence methods have been used for the analysis of remote sensing data in agricultural fields. Among these, artificial neural networks (ANNs) (Deck et al., 1995; Ghazanfari et al., 1996; Nakano, 1997; Wilkinson, 1997; Yang et al., 1999; Goel et al., 2003b), decision tree technology (Hansen et al., 1996; Friedl and Brodley, 1997; Friedl et al., 1999; Soh, 1999; Goel et al., 2003b), and discriminant analysis (Meyer et al., 1998; Burks et al., 2002; Cho et al., 2002; Kenkel et al., 2002; Terawaki et al., 2002; Vrindts et al., 2002; Slaughter et al., 2003; Karimi et al., 2005) are widely used approaches. Goel et al. (2003b) used decision trees (DTs) and ANNs to discriminate corn plots cropped under various weed controls and nitrogen application rates. Their results showed that reasonable accuracy could be obtained in distinguishing between nitrogen levels or between weed controls at a given growth stage, and that ANNs performed considerably better than DTs. Comparing these results with five other classifier methods, Goel et al. (2003a) found that no classifier was consistently better for all problems studied. The application of discriminant analysis in agricultural remote sensing has been reported in many recent studies (Burks et al., 2002; Cho et al., 2002; Terawaki et al., 2002; Slaughter et al., 2003). Discriminant analysis has been used to distinguish different weed types from soil based on textural image processing (Meyer et al., 1998). Vrindts et al. (2002) used ANN and discriminant functions to detect weeds, based on three-band ratios of canopy reflectance, and found discriminant analysis to be superior to ANN.
Whereas the misclassification rate for distinguishing corn from weeds was very low (about 1%) using laboratory data, it was as high as 85% for corn under field conditions; the misclassification rate for distinguishing weeds from the corn crop, however, was rather low (3%) under field conditions.

The support vector machine (SVM) algorithm, a supervised machine learning method based on statistical learning theory (Vapnik, 1995), is a relatively new method in artificial intelligence. SVM is basically a binary classifier, which finds the maximal margin (hyperplane) between two classes. SVM can also classify non-linearly separable data sets by mapping the data into a high-dimensional feature space using kernels. In comparison to other data mining techniques such as ANNs, it is easier to use, and only a few parameters need to be adjusted by the user. SVM has been widely applied to classification problems, such as electronic nose data (Pardo and Sberveglieri, 2002; Trihaas and Bothe, 2002), tissue classification (Furey et al., 2000; Pavlidis et al., 2004), shape extraction and classification (Cai et al., 2001; Du and Sun, 2004), protein recognition (Zien et al., 2000), bakery process data (Rousu et al., 2003), and crop classification (Camps-Valls et al., 2003). However, applications of the SVM method in agricultural remote sensing have not yet been fully explored.

The overall objective of this study was to examine the capability of the SVM method in analyzing aerial observations taken in a cornfield at an early growth stage. The classification results were compared with those obtained by Goel et al. (2003a,b) using an ANN method on the same data set.

2. Materials and methods

2.1. Experimental design

The study was carried out in the growing season of the year 2000 at the Lods Agronomy Research Center of Macdonald Campus, McGill University, Ste-Anne-de-Bellevue, Que., Canada (45°25′45″N, 73°56′00″W).
The experimental design was a two-factor split plot, with plots of size 20 m × 20 m containing 26 rows of plants. Corn was planted under different weed control strategies and nitrogen application rates. Weed treatment was assigned as the main factor and the three nitrogen application rates as the sub-factor, replicated four times. The four weed treatments were: no weed control (W1), control of grass species (W2), control of broadleaf species (W3), and full weed control (W4). The nitrogen application rates were low nitrogen (60 kg N/ha, N60), normal nitrogen (120 kg N/ha, N120), and high nitrogen (250 kg N/ha, N250).

2.2. Spectral data collection

A Compact Airborne Spectrographic Imager (CASI) was used for hyperspectral aerial image acquisition. The data were collected in 72 narrow bands in the range from 408.73 to 947.07 nm (visible to near-infrared),
Fig. 1. Measured spectral response curves of corn at an early growth stage (June 30, 2000) under different nitrogen application rates and weed control conditions (N60 , N120 , and N250 are nitrogen treatments of 60, 120, and 250 kg N/ha and W1, W3, and W4 are weed treatments with no weed control, broadleaf control, and full weed control, respectively).
with bandwidths that varied from 4.27 to 4.41 nm. The distance between two band centers ranged from 7.4 to 7.74 nm, and the spatial resolution was 2 m at the flight altitude. The image was taken on June 30, 2000, which represents an early growth stage for corn. Radiometric, geometric, and atmospheric corrections were applied to the image; details are given in Goel et al. (2003b). Reflectance values were extracted from the image using ENVI software (ENVI 3.1, Research Systems, Inc., Boulder, CO, USA). In each plot, 20 points were chosen at random to obtain representative reflectance values for the plot. The average spectral response of corn under the different nitrogen levels and weed control treatments is illustrated in Fig. 1.

2.3. Weed survey

The information on weeds was collected on July 14, 2000. Weed data were collected using three randomly placed 50 cm × 50 cm quadrats. The observed weed parameters were: weed type, density, weed height, and percent ground coverage. In each quadrat, the percent weed cover was established visually. The most dominant grassy weeds were yellow nutsedge (Cyperus esculentus), barnyard grass (Echinochloa crus-galli), and crab grass (Digitaria ischaemum), whereas the dominant broadleaf weeds were Canada thistle (Cirsium arvense), sow thistle (Sonchus oleraceus), redroot pigweed (Amaranthus retroflexus), and lamb's quarters (Chenopodium album). More details on the weed survey can be found in Goel et al. (2003b).

3. The SVM method

The support vector machine is a "maximal margin classifier". For a given set of training samples containing two classes of data, SVM derives a classification function, a hyperplane, that lies at the maximum distance from the closest points of both classes.
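As an illustrative sketch (not part of the original study), this maximal-margin behaviour can be reproduced on a tiny synthetic data set with scikit-learn's SVC, which wraps the LIBSVM library cited later in this section. Note that scikit-learn's decision function uses w · x + b (the intercept carries the opposite sign convention to the hyperplane equation given below):

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable classes (synthetic data for illustration)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # class -1
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (fully separable) case
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("separating plane: w =", w, " intercept =", b)
print("support vectors:", clf.support_vectors_)          # only these points matter
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
```

Only the support vectors appear in the solution; removing any of the other training points would leave the separating plane unchanged.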
Fig. 2. Linear separating planes between two classes (adapted from Burges, 1998).
3.1. Linearly separable binary classification

Let us consider a simple binary classification problem with M training samples, represented by a set of pairs {(xi, yi), i = 1, 2, ..., M}, where the yi are class labels (yi = +1 for Class C1 or −1 for Class C2) and the xi are the input vectors. The classification function (i.e., the separating hyperplane; in this two-dimensional case, a line) can be written as:

yi = w · xi − b    (1)
where w is a normal vector to the hyperplane, |b|/‖w‖ is the perpendicular distance from the hyperplane to the origin, and ‖w‖ is the Euclidean norm of w (Burges, 1998). It is possible to have an infinite number of planes (or hyperplanes) that can separate the two classes of data; two such planes are shown in Fig. 2. Intuitively, one would prefer plane P1 as the classification function (over P2), because a minor change in the data items (of C2) would not introduce any classification errors; this is not the case for plane P2. In Fig. 3, the supporting planes (marked P1 and P2) for the two classes and the optimal classifying plane (in bold) are shown. It is clear from the figure that only certain data points (shown in bold, called support vectors in SVM) actually influence the equation of the optimal separating plane; the remaining data points are redundant as far as the construction of the separating plane is concerned. From a geometric perspective, we would select an optimal plane P that is farthest from both classes. The plane can be determined by computing two parallel supporting planes, one for each of the two classes, and maximizing
Fig. 3. Optimal separating plane between the two classes of data (adapted from Burges, 1998).
the distance between them. A supporting plane of a class is defined as any plane such that all points of the class lie on one and only one side of that plane. Ideally, for the supporting planes we require w · xi − b ≥ 1 for one class (y = +1) and w · xi − b ≤ −1 for the other (y = −1). The points that lie on the hyperplane w · xi − b = 1 have a perpendicular distance from the origin of |1 − b|/‖w‖, and those on the hyperplane w · xi − b = −1 have a corresponding distance of |−1 − b|/‖w‖. Hence, the distance between the two planes is 2/‖w‖. For optimal classification, we would like to maximize this distance, which is equivalent to the following minimization problem (Burges, 1998):

Minimize:    (1/2)‖w‖²
Subject to:    w · xi ≥ b + 1 for xi ∈ C1, and w · xi ≤ b − 1 for xi ∈ C2.    (2)
This is a well-known convex quadratic optimization problem, whose explicit solution is rather difficult. Using Lagrange multipliers, the problem can be restated in its dual form (Burges, 1998):

Maximize:    W(α) = Σi αi − (1/2) Σi Σj αi αj yi yj (xi · xj), with all sums over i, j = 1, ..., M
Subject to:    Σi αi yi = 0 and 0 ≤ αi    (3)
where the αi are the positive Lagrange multipliers. Solution of Eq. (3) provides the equation of the optimal separating plane.

Now consider the more realistic case where some of the data points belonging to classes C1 and C2 overlap each other. To find a robust classification function in such cases, we can relax certain constraints. Ideally, we would prefer that not even a single point be misclassified; in the presence of outliers this is not possible, so instead we require that the "majority" of points be correctly classified. In other words, some points in both classes will fall on the "wrong" side of their supporting plane, and such points are treated as outliers or errors. The optimization problem is then modeled not only to maximize the margin between the supporting planes, but also to minimize the classification errors. To this end, non-negative "slack variables" zi are introduced, which transform optimization problem (2) as follows:

Minimize:    (1/2)‖w‖² + C Σi zi
Subject to:    w · xi ≥ b + 1 − zi for xi ∈ C1, and w · xi ≤ b − 1 + zi for xi ∈ C2    (4)
where C controls the tradeoff between the two competing objectives: the margin between the two supporting planes and the classification error. The dual problem corresponding to Eq. (4) becomes (Burges, 1998):

Maximize:    W(α) = Σi αi − (1/2) Σi Σj αi αj yi yj (xi · xj)
Subject to:    Σi αi yi = 0 and 0 ≤ αi ≤ C    (5)
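The effect of the penalty parameter C can be illustrated with a short sketch (illustrative only; scikit-learn and the synthetic overlapping data are assumptions, not the study's setup). A small C tolerates many margin violations and leaves many support vectors at the bound, while a large C penalizes errors heavily:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian clusters: some points inevitably fall on the
# "wrong" side of any separating plane, so slack variables are needed
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)),
               rng.normal(2.5, 1.0, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

models = {}
for C in (0.01, 100.0):
    models[C] = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {models[C].support_.size} support vectors, "
          f"training accuracy {models[C].score(X, y):.2f}")
```

With the small C, nearly every point ends up inside the (wide) margin and counts as a support vector; the large C yields a narrower margin determined by far fewer points.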
3.2. Non-linearly separable binary classification

For training samples that are not linearly separable, the data need to be mapped into a space of higher dimensionality (called the "feature space") so that a reliable linear separation can be computed (Gunn, 1998). Let us denote this mapping into the feature space by a function Φ, such that z = Φ(x) is the feature point corresponding to a data
item x. The equation of the optimal separating hyperplane (OSH) in this feature space can then be written as (Burges, 1998; Gunn, 1998):

f(x) = Σi αi yi Φ(xi) · Φ(x) − b    (6)

Eq. (6) is comparable to Eq. (1), where w and x are replaced by Σi αi yi Φ(xi) and Φ(x), respectively. Following the procedure explained for the linearly separable case (Eq. (5)), it can be shown that the classification can be performed by solving the following optimization problem (Burges, 1998; Gunn, 1998):

Maximize:    W(α) = Σi αi − (1/2) Σi Σj αi αj yi yj Φ(xi) · Φ(xj)
Subject to:    Σi αi yi = 0 and 0 ≤ αi ≤ C    (7)
However, direct solution of the above minimization problem is often not feasible. This difficulty is circumvented by employing a "kernel function" K(x, y), which is the dot product of Φ(x) and Φ(y). Replacing the dot products by the kernel function, the equation for the OSH becomes:

f(x) = Σi αi yi K(xi, x) − b    (8)

The commonly used kernels are the radial basis function (RBF), sigmoid, and polynomial kernels (Gunn, 1998; Chang and Lin, 2001). The RBF kernel, the one most commonly used in SVM classification problems, is given as follows:

K(x, y) = exp(−γ‖x − y‖²)    (9)
where γ is a kernel parameter. The corresponding optimization problem can then be written as:

Maximize:    W(α) = Σi αi − (1/2) Σi Σj αi αj yi yj K(xi, xj)
Subject to:    Σi αi yi = 0 and 0 ≤ αi ≤ C    (10)
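Eq. (9) is straightforward to compute directly. The sketch below (with hypothetical reflectance values, not data from this study) shows the RBF kernel acting as a similarity measure between two spectra, and how γ controls its rate of decay:

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """RBF kernel of Eq. (9): K(x, y) = exp(-gamma * ||x - y||^2)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

# Hypothetical reflectance values in three wavebands
x = np.array([0.21, 0.35, 0.40])
y = np.array([0.20, 0.30, 0.45])

print(rbf_kernel(x, x, gamma=1.0))    # identical inputs -> K = 1
print(rbf_kernel(x, y, gamma=1.0))    # similar inputs  -> K close to 1
print(rbf_kernel(x, y, gamma=100.0))  # larger gamma makes K decay faster
```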
Furthermore, one also has to choose the parameter γ for the RBF kernel and the parameter C, which determines how severely classification errors are penalized. The quality of classification is greatly affected by C; a very large value of C can lead to overfitting of the training data (Cao and Tay, 2003).

The above binary classification scheme can be extended easily to N classes, where N > 2. Such schemes follow two approaches: one-against-all and one-against-one. In the former, N SVMs are trained, each separating one particular class from all the others. In the latter, N(N − 1)/2 SVMs are trained, one for each pair of classes. While the former strategy is more efficient in terms of training time, the latter has been empirically shown to yield more desirable results (Hsu and Lin, 2002).

The model is trained using a portion of the data set (e.g., 50%) containing the dependent and independent variables; the remaining 50% of the data is used for testing purposes. To find the optimum values of the parameters γ and C, the model is run with different combinations of γ and C on the training data set. The SVM model is then built with these optimum values and the training data set, and its generalization ability is determined with the test data set. Finally, to evaluate the performance of the SVM model, the following cross-validation approach is used: the data set is randomly divided into separate training and testing sets, and this process is repeated 10 times. A kappa coefficient (Abalos et al., 2000; Kitchen et al., 2005), which quantifies the level of agreement, can be applied to compare the results of different classification approaches. A kappa coefficient of 1 indicates full agreement between predicted and actual values, while a value of 0 indicates agreement no better than chance.
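The tuning procedure just described (repeated random 50/50 splits with a grid search over γ and C) can be sketched as follows. The data here are synthetic stand-ins for the hyperspectral records, and scikit-learn (which wraps the LIBSVM library cited above) is an assumption about tooling, not the authors' actual implementation:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Synthetic stand-in for the reflectance data: 120 samples x 71 wavebands,
# three hypothetical treatment classes separated by a small spectral shift
rng = np.random.RandomState(1)
X = rng.rand(120, 71)
y = rng.randint(0, 3, size=120)
X[y == 1] += 0.15
X[y == 2] += 0.30

# Repeated random 50/50 train/test splits, as described in the text
cv = StratifiedShuffleSplit(n_splits=10, train_size=0.5, random_state=0)
param_grid = {"C": [1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv)
search.fit(X, y)

print("best (C, gamma):", search.best_params_)
print("mean test accuracy:", round(search.best_score_, 2))
```

SVC handles the three-class problem internally with the one-against-one scheme; the grid search simply selects the (C, γ) pair with the best mean accuracy over the 10 splits.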
4. Results

The spectral responses of corn under different weed control methods and nitrogen application rates (Fig. 1) demonstrate the difference in reflectance values for the various treatments, and highlight the possibility of using certain wavebands in the visible region of the electromagnetic spectrum, and many more in the near-infrared region, for classifying the treatments. A closer inspection of the reflectance data showed that the trend for the 72nd waveband was quite different from the general trend of the remaining wavebands. While reflectance values for the other wavebands were generally below 40%, those of the 72nd waveband appeared to exceed 100% (about 121%). The reflectance values for this waveband were therefore considered too noisy and were eliminated from the analysis. Furthermore, visual observation of weed populations in the different treatments showed that there were no broadleaf weeds in the second treatment (W2, grass control); the W2 treatment was therefore excluded from the weed–nitrogen classification analysis. Thus, the classification problem was limited to differentiating between nine weed–nitrogen groups on the basis of data from 71 wavebands. With 20 data points in each plot of 9 treatments and 4 replicates, the data set consisted of 720 records.

The SVM method was used to distinguish the combined effect of weed and nitrogen treatments (nine combinations) as well as the effect of the weed and nitrogen treatments alone (three classes each). For all three classification problems, the data set was randomly divided into two equal parts: one for training and the other for testing. For the combined effect of weeds and nitrogen, where the three weed treatments and the three nitrogen treatments were to be classified jointly, overall accuracies of 98 and 69% were obtained for the training and testing data, respectively.
Higher classification accuracies are normally expected for the training data, since this data set is also used for model construction, and lower accuracy is expected when the model is tested with an unseen data set (i.e., one not used in model development). The difference between training and testing accuracy will be larger if the model was not trained properly, which can result either from having too little training data or from training data that do not encompass the scenarios present in the test data. Considering the complexity of the classification problem and the relatively small size of the data set, the obtained classification accuracy appears reasonable.

The classification results for the testing data set are presented in Table 1. Although classification results were good for all categories, misclassified samples tended to fall into categories "close to" the correct one in most cases. To examine the degree of agreement between the actual and predicted classes, the kappa coefficient was also calculated; its relatively high value (0.66) indicated good model performance. When nitrogen treatment was considered alone, the model successfully classified all categories with the training data set, and a classification accuracy of 81% was obtained for the testing data set. In the case of weeds alone, the model fully classified the different categories with the training data set, and a classification accuracy of 86% was obtained for the test data set (Table 2). The calculated kappa coefficients (0.79 and 0.71 for the weed and nitrogen classes, respectively) also showed strong agreement between the actual and predicted classes.

Table 1
Classification matrices for the test data for the nine weed–nitrogen combinations at the early growth stage

Actual    Predicted                                                              Total
          N60W1  N60W3  N60W4  N120W1  N120W3  N120W4  N250W1  N250W3  N250W4
N60W1       22      2      0       3       0       0       2       1       2      32
N60W3        3     27      1       9       2       1       1       0       1      45
N60W4        1      3     37       0       0       1       0       0       0      42
N120W1       1      2      0      30       3       2       1       3       0      42
N120W3       2      1      1       3      24      10       1       0       5      47
N120W4       0      0      0       0       0      28       0       3       1      32
N250W1       3      1      0       2       3       0      28      10       0      47
N250W3       4      1      0       2       4       0       4      25       0      40
N250W4       2      0      1       0       0       1       0       1      28      33
Total       38     37     40      49      36      43      37      43      37     360

Classification accuracy = 69%. Kappa coefficient = 0.663. N60, low N; N120, normal N; N250, high N; W1, no weed control; W3, broadleaf control; W4, full weed control.
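As a quick consistency check (not part of the original analysis), the overall accuracy and an unweighted Cohen's kappa can be recomputed directly from the confusion matrix in Table 1. The accuracy reproduces the reported 69%; the kappa comes out near, though slightly below, the reported 0.663, the small gap presumably reflecting rounding or a variant of the statistic:

```python
import numpy as np

# Confusion matrix from Table 1 (rows: actual, columns: predicted), in the
# order N60W1, N60W3, N60W4, N120W1, N120W3, N120W4, N250W1, N250W3, N250W4
cm = np.array([
    [22,  2,  0,  3,  0,  0,  2,  1,  2],
    [ 3, 27,  1,  9,  2,  1,  1,  0,  1],
    [ 1,  3, 37,  0,  0,  1,  0,  0,  0],
    [ 1,  2,  0, 30,  3,  2,  1,  3,  0],
    [ 2,  1,  1,  3, 24, 10,  1,  0,  5],
    [ 0,  0,  0,  0,  0, 28,  0,  3,  1],
    [ 3,  1,  0,  2,  3,  0, 28, 10,  0],
    [ 4,  1,  0,  2,  4,  0,  4, 25,  0],
    [ 2,  0,  1,  0,  0,  1,  0,  1, 28],
])

n = cm.sum()
po = np.trace(cm) / n                                 # observed agreement (accuracy)
pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # agreement expected by chance
kappa = (po - pe) / (1 - pe)
print(f"accuracy = {po:.3f}, kappa = {kappa:.3f}")
```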
Table 2
Classification matrices for the cross-validation data for weed controls and for nitrogen application rates at an early growth stage

(a) Weed treatments

Actual    Predicted            Total
           W1    W3    W4
W1         40     4     1       45
W3          1    32     5       38
W4          1     5    31       37
Total      42    41    37      120

(b) Nitrogen treatments

Actual    Predicted            Total
          N60   N120  N250
N60        34    10     1       45
N120        8    30     4       42
N250        0     0    33       33
Total      42    40    38      120

(a) Weed treatments: classification accuracy = 85.8%; kappa coefficient = 0.787. W1, no weed control; W3, broadleaf control; W4, full weed control. (b) Nitrogen treatments: classification accuracy = 80.8%; kappa coefficient = 0.712. N60, low N; N120, normal N; N250, high N.
Higher classification accuracy when a single factor (weeds or nitrogen separately) was considered indicates that SVM performs much better when the complexity of the problem is reduced. This is also reflected in the kappa coefficient, which was considerably higher for the single-factor cases. In all three classification problems, most misclassified cases fell into the nearest neighbouring categories (Tables 1 and 2); it was, for example, rare for N60 (low nitrogen application rate) to be misclassified as N250 (high nitrogen application rate).

To further check the validity of model training and performance, a 10-fold cross-validation was conducted for all three classification problems. For this purpose, the data were randomly divided into training and testing parts, and the process was repeated 10 times; for each fold, an SVM model was trained on the training data set and evaluated on the testing data set. For the combined effect of weed and nitrogen application rates, the SVM model fully classified the different classes in 3 out of 10 folds of the training data, with an average training accuracy of 98% and a minimum of 95%. For the test data sets, an average classification accuracy of 70% was obtained, with a range of 66–76% (Table 3). Once again, given the limited amount of data and the complexity of the combined problem, the results are quite good. The calculated kappa coefficient varied from 0.64 to 0.73 for the test results, which again indicates that the SVM models classified the various treatments with good accuracy.

For the simpler problem of classifying only the nitrogen treatments, the results of the 10-fold cross-validation procedure are also given in Table 3. The SVM method fully classified the three nitrogen classes in 3 out of 10 folds of the training data, with an average training accuracy of 97%. For the testing data sets, the classification accuracy exceeded 80% in most folds (the only lower values being 73 and 79%), which is much higher than the accuracy obtained for the combined weed–nitrogen problem. For most folds, the kappa coefficient was also higher than that calculated for the combined problem (Table 3).

Table 3 likewise shows the results of the 10-fold cross-validation for the simpler problem of classifying the three weed treatments only. For the 10 training data sets, the SVM method fully classified the different classes in 4 out of 10 folds, and the training accuracy exceeded 96% in all folds but one (91%). For the testing data sets, the classification accuracy was above 80% for all folds, with an average of 85%, which is higher than for the other two classification problems. In addition, a high kappa coefficient was obtained for this problem (ranging from 0.75 to 0.90 for the test data sets).
Table 3
Comparison of the classification results obtained for different classification problems

             Combined effect of        Effect of weeds        Effect of nitrogen
             weeds and nitrogen        alone                  alone
             Training   Testing        Training   Testing     Training   Testing
Fold-1        100        70.3           99.2       86.7        98.3       82.5
Fold-2        100        69.7          100         93.3       100         80.8
Fold-3         97.8      71.4          100         80.8        98.3       83.3
Fold-4         96.9      68.6           90.8       82.5        94.2       79.2
Fold-5         99.7      65.8           96.7       81.7        99.2       80
Fold-6         94.7      70.8          100         82.5        89.2       81.7
Fold-7        100        68.3           97.5       86.7        92.5       73.3
Fold-8         99.4      75.6           96.7       86.7        97.5       81.7
Fold-9         96.4      71.1           96.7       80.8       100         83.3
Fold-10        95.3      70.3          100         85         100         80.8
Average        98.0      70.2           97.8       84.7        96.9       80.7
S.D.            2.08      2.53           2.87       3.88        3.72       2.91
Kappa
  Minimum       0.941     0.640          0.950      0.750       0.837      0.600
  Maximum       1.000     0.725          1.000      0.899       1.000      0.750
Table 4
Comparison of the classification accuracy obtained from different methods of classification

Method used    Combined effect of      Effect of weeds    Effect of nitrogen
               weeds and nitrogen      alone              alone
SVM
  Regular       69.2                    85.8                80.8
  Min           65.8                    80.8                73.3
  Max           75.6                    93.3                83.3
ANN             58.3                    81.2                69.4

Regular: the results of the SVM models that were developed and validated with the same sets of data used in ANN model development and testing; Max and Min: maximum and minimum results obtained with SVM in the 10-fold cross-validation.
The results of this study were compared with those obtained by Goel et al. (2003b), who used an ANN method to classify the weed and nitrogen treatments on the same data set. Their ANN model was a fully connected, feed-forward network with one input layer, one output layer, and one or two hidden layers, developed with a back-propagation learning method using a delta learning rule and a sigmoid transfer function. For all three classification problems, the ANN results, together with a summary of the SVM classification results, are given in Table 4. In the table, the "regular" classification accuracies of the SVM model refer to results obtained with the same data sets used in the development of the ANN model; the maximum and minimum values are those of the 10-fold cross-validation procedure with the SVM models. Comparing the regular SVM results, as well as the 10-fold cross-validation results, with those of the ANN model, it is clear that the SVM models provided much higher classification accuracy on the testing data sets for all three problems. It is hard to say why the SVM results are better than those obtained by ANNs; the techniques are based on different philosophies of model building. ANNs are formulated on a loosely "human" way of solving problems, while SVMs are based on maximizing the distance between the closest points belonging to different classes of data.

5. Discussion and conclusions

This study demonstrated the capability of the support vector machine method to analyze hyperspectral data for identifying weed and nitrogen stresses in a cornfield at an early growth stage. The method provided reasonable classification accuracy for the combined weed and nitrogen classification problem (69%). More accurate classification results were obtained when weed and
nitrogen treatments were investigated separately (86 and 81%, respectively). Using a 10-fold cross-validation (testing data sets) with this method, classification accuracies ranging from 66 to 76% were achieved for the combined weed and nitrogen problem. For the weed and nitrogen treatments considered separately, the classification accuracy was mostly above 80%, and as high as 93%, on the testing (unseen) data. Compared with the results obtained with an ANN model, our results were much better, which clearly demonstrates the superiority of the support vector machine methodology in resolving classification problems of precision agriculture.

From a precision agriculture point of view, critical decisions on site-specific herbicide and fertilizer applications need to be made at an early growth stage of the crop. Our study shows that methods based on SVM technology can prove to be a much better decision-making tool in precision farming. Early in the growing season, it is generally difficult to differentiate among different crop growing conditions. In this study, we demonstrated that hyperspectral measurements made at an early growth stage of corn can be used with SVM technology to locate areas with nitrogen and weed problems, thereby permitting timely corrective interventions in the same growing season.

References

Abalos, P., Daffner, J., Pinochet, L., 2000. Evaluation of three Brucella soluble antigens used in an indirect ELISA to discriminate S19 vaccinated from naturally infected cattle. Vet. Microbiol. 71, 161–167.
Burges, C., 1998. A tutorial on support vector machines for pattern recognition. Data Mining Knowl. Discov. 2 (2), 121–167.
Burks, T.F., Shearer, S.A., Green, J.D., Heath, J.R., 2002. Influence of weed maturity levels on species classification using machine vision. Weed Sci. 50, 802–811.
Cai, Y.-D., Liu, X.J., Xu, X., Zhou, G.P., 2001. Support vector machines for predicting protein structural class. Bioinformatics 2, 3.
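The evaluation protocol described above, training an SVM classifier on band reflectances and scoring it by 10-fold cross-validation, can be sketched as follows. This is a minimal illustrative sketch, not the study's implementation: the synthetic two-class data stand in for 72-waveband hyperspectral pixels, and the class shift, learning rate, and epoch count are arbitrary assumptions. A simple linear SVM trained by subgradient descent on the hinge loss is used here in place of the kernel machinery of a full SVM package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hyperspectral pixels: 200 samples x 72 wavebands,
# two classes (e.g. low vs. high nitrogen) separated by a mean shift.
n, d = 200, 72
X = rng.normal(size=(n, d))
y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
X += 0.8 * y[:, None]  # class-dependent shift in every band (assumed, for illustration)

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Primal linear SVM: subgradient descent on 0.5*||w||^2 + (C/n)*hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1.0                  # samples violating the margin
        grad_w = w - C * (y[mask, None] * X[mask]).sum(axis=0) / len(y)
        grad_b = -C * y[mask].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# 10-fold cross-validation: hold out each fold in turn, train on the rest.
folds = np.array_split(rng.permutation(n), 10)
accs = []
for fold in folds:
    train = np.setdiff1d(np.arange(n), fold)
    w, b = train_linear_svm(X[train], y[train])
    accs.append(np.mean(np.sign(X[fold] @ w + b) == y[fold]))

print(f"mean 10-fold accuracy: {np.mean(accs):.2f}")
```

In practice a kernel SVM (e.g. with a radial basis function kernel) and a grid search over the cost parameter C would replace the fixed linear model above; the cross-validation loop, which keeps the held-out fold entirely out of training, is the part that corresponds to the accuracy figures reported for unseen data.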