Journal of Food Engineering 119 (2013) 159–166
A feature-selection algorithm based on Support Vector Machine-Multiclass for hyperspectral visible spectral analysis

Shuiguang Deng (a, c), Yifei Xu (a), Li Li (a), Xiaoli Li (b, c), Yong He (b, c, *)

(a) College of Computer Science and Technology, Zhejiang University, 38 ZheDa Road, Hangzhou 310027, PR China
(b) College of Biosystems Engineering and Food Science, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, PR China
(c) Cyrus Tang Center for Sensor Materials and Applications, Zhejiang University, PR China
(*) Corresponding author at: College of Biosystems Engineering and Food Science, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, PR China. Tel./fax: +86 571 88982143. E-mail address: [email protected] (Y. He).
Article info

Article history: Received 14 August 2012; Received in revised form 16 February 2013; Accepted 16 May 2013; Available online 27 May 2013.

Keywords: Food quality inspection; Feature selection; Hyperspectral visible and near infrared (Vis–NIR); Support Vector Machine-Multiclass Forward Feature Selection (SVM-MFFS); Sesame oil
Abstract

Quality and safety of food is one of the world's foremost concerns. High-precision spectral devices are a major technological trend in food inspection because of their high accuracy and nondestructive nature, but a common obstacle is how to extract informative variables from the raw data without losing significant information. This article proposes a novel feature selection algorithm named Support Vector Machine-Multiclass Forward Feature Selection (SVM-MFFS). SVM-MFFS adopts the wrapper and forward feature selection strategy, exploits the stability of spectral variables, and uses the classical SVM as the classification and regression model to select the most relevant wavelengths from hundreds of spectral variables. We compare SVM-MFFS with Successive Projection Analysis and Uninformative Variable Elimination in an experiment identifying different brands of sesame oil. The results show that SVM-MFFS performs best in terms of accuracy, Receiver Operating Characteristic curve, prediction and Cumulative Stability, and it can provide a reliable and rapid method for food quality inspection.

© 2013 Elsevier Ltd. All rights reserved.
1. Introduction

In recent years, food quality has become one of the most important concerns of food engineering, and governments and researchers alike now focus on this issue. With the outbreak of large-scale food-safety incidents, there has been growing concern about the safety of people's daily food, especially in developed countries. As a common class of food additives, liquid additives play an important role in the manufacturing of everyday food. More seriously, counterfeit sesame oil has posed a threat to people's health, so there is an urgent need to identify different brands of sesame oil to ensure food quality. In our study, we apply hyperspectral visible–near infrared spectroscopy to identifying different brands of liquid food additives in order to ensure the safety of the critical steps of food production. As one of the most attractive methods, hyperspectral visible–near infrared (Vis–NIR) spectroscopy has already been widely used as a simple and nondestructive technique for analyzing the properties and quality of organic materials in food engineering and various other fields, such as petrochemicals (Kim et al., 2000), medicine (Wolf et al., 2007), the environment (He et al., 2007; Heo et al., 2011), and agriculture (Li et al., 2007; Wu et al., 2008). Hyperspectral Vis–NIR
has gained wide acceptance in food quality inspection (Lu and He, 2008; Lu et al., 2009; Wang et al., 2010) by virtue of its advantages over other traditional techniques. During Vis–NIR spectroscopy, hundreds or even thousands of spectral features are collected by high-precision multi-band devices and then used to construct classification or regression models. However, if all spectral features are fed into multivariate calibration models, they take up too much memory and slow down the computation. Thus, feature selection, also called "frequency" or "wavelength" selection, is a critical step with a great influence on the modeling process. The goal of feature selection is to identify a subset of spectral features, as small as possible, to serve as the basis for classification or regression across different samples. At present, feature selection techniques can be divided into two major categories: filter methods and wrapper methods (Jain et al., 2000; Sun et al., 2004). Filter methods (Blum and Langley, 1997) evaluate the intrinsic relevance of each variable and are characterized by being independent of the classification or regression models. They include the Successive Projection Algorithm (SPA), Uninformative Variable Elimination (UVE) (Centner et al., 1996) and others. SPA is a forward feature selection method for multivariate data. It uses simple projection operations in a vector space to obtain subsets of variables with minimal collinearity. Its principle of variable selection is that the next variable selected is the one among all the remaining features that has the maximum projection value on the orthogonal sub-space
Nomenclature

SVM-MFFS  Support Vector Machine-Multiclass Forward Feature Selection
Vis–NIR   visible and near infrared spectroscopy
SPA       Successive Projection Analysis
UVE       Uninformative Variable Elimination
ROC       Receiver Operating Characteristic curve
CS        Cumulative Stability
SVM-RFE   SVM-Recursive Feature Elimination
MSC       multiplicative scatter correction
SVM-BFFS  SVM-Binary Forward Feature Selection Algorithm
KS        Kennard-Stone
SVM       Support Vector Machine
AUC       area under the ROC curve
OVO       one-versus-one
OVA       one-versus-all
RBF       radial basis function
OBD       Optimal Brain Damage
of the previously selected variable (Araújo et al., 2001). However, SPA can hardly ensure a high-level evaluation, as it focuses only on the correlations among variables and ignores the class labels. UVE is another approach, implemented by adding random noise variables and eliminating uninformative variables according to a given threshold. In terms of prediction, however, both the choice of an appropriate threshold and the artificial noise itself are bottlenecks of UVE. In general, filter methods are efficient and easy to implement, but they usually do not take into account nonlinear relationships among features. Wrapper methods, on the other hand, use the performance of a classification or regression model as the quality criterion for evaluating the relevant information conveyed by a subset of features (Kohavi and John, 1997). SVM-Recursive Feature Elimination (SVM-RFE) (Guyon et al., 2002) and SVM-MFFS are two classical wrapper methods. SVM-RFE was originally designed to solve binary gene selection problems by recursively removing the variables with low scores in the SVM. It has been extended to multiclass problems using one-versus-all techniques (Zhou and Tuck, 2007). Unfortunately, it takes too much time when processing high-dimensional data because of its time-consuming backward selection strategy. Generally, wrapper methods need more time but achieve higher accuracy than filter methods. In this paper, we use hyperspectral Vis–NIR spectroscopy to distinguish different brands of liquid food additives and thereby safeguard the food production procedure. Confronted with the problem of feature selection, we develop a new method to cope with the curse of dimensionality efficiently. To sum up, the objectives of this study are (1) to propose and validate a novel algorithm called SVM-MFFS, which adopts the forward selection strategy and a customized stability model related to SVM; (2) to propose a new criterion called Cumulative Stability (CS) to evaluate the performance of multivariate models related to SVM; and (3) to compare several feature selection algorithms (SPA and UVE) through an experiment identifying different brands of sesame oil.
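To make the SPA projection step described above concrete, the following sketch shows one way it can be implemented. It is only an illustration under our own assumptions (NumPy, a spectral matrix with samples in rows and wavelengths in columns, and a fixed starting column); the study itself used the SPA-Toolbox mentioned in Section 2.3, not this code.

```python
# Illustrative sketch of the SPA selection step: each new wavelength is the remaining
# column of the spectral matrix with the largest projection onto the subspace orthogonal
# to the columns already selected. This is not the SPA-Toolbox implementation.
import numpy as np

def spa_select(X, start_col, n_vars):
    """X: (n_samples, n_wavelengths) spectral matrix; returns indices of selected columns."""
    selected = [start_col]
    for _ in range(n_vars - 1):
        basis = X[:, selected]                              # columns selected so far
        # projector onto the orthogonal complement of the selected columns
        P = np.eye(X.shape[0]) - basis @ np.linalg.pinv(basis)
        residual_norms = np.linalg.norm(P @ X, axis=0)
        residual_norms[selected] = -np.inf                  # never re-select a column
        selected.append(int(np.argmax(residual_norms)))
    return selected
```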
2. Experimental section

2.1. Experimental treatments

Experiments were conducted with four representative brands of sesame oil: JinLongYu (made in ShangHai, China), ChangKang (made in HuNan, China), LiuYangHe (made in HuNan, China), and NongJiaMoFang (made in ShangHai, China). We chose sesame oil as the experimental object for two reasons: (1) as a representative liquid food additive, sesame oil is already widely and closely involved in people's daily life; (2) as a relatively simple and classical hydrocarbon compound, sesame oil can be analyzed easily and in detail in the near infrared spectra. In this experiment, we obtained 30 samples from each brand of sesame oil, 120 samples in total. Every
sample (40 ml) was placed in a clean Petri dish of the same size (70 mm diameter, 13 mm height). All samples were divided into a training set (2/3, 80 samples) and a test set (1/3, 40 samples) using the Kennard-Stone (KS) algorithm (Kennard and Stone, 1969) (Table 1). The KS algorithm is a classical method mainly used to extract a representative data set consisting of samples separated by Euclidean distance.
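The Kennard-Stone selection mentioned above can be sketched as follows. This is an illustrative implementation under assumed conventions (NumPy, Euclidean distance on the raw spectra, and an 80/40 split); it is not the code actually used in the study.

```python
# Illustrative Kennard-Stone split: samples are added one at a time by maximising their
# minimum Euclidean distance to the already-selected set (not the authors' own code).
import numpy as np

def kennard_stone(X, n_select):
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    selected = list(np.unravel_index(np.argmax(dist), dist.shape))  # two most distant samples
    while len(selected) < n_select:
        remaining = [i for i in range(len(X)) if i not in selected]
        # pick the remaining sample whose nearest selected neighbour is farthest away
        nxt = max(remaining, key=lambda i: dist[i, selected].min())
        selected.append(nxt)
    return selected

X = np.random.rand(120, 195)   # placeholder standing in for the 120 measured spectra
train_idx = kennard_stone(X, 80)
test_idx = [i for i in range(len(X)) if i not in train_idx]
```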
2.2. Spectroscopic instrument and measurements

Vis–NIR reflectance spectra in the range of 325–1075 nm were measured using an ImSpector V10E spectrograph (Specim, Finland) and a 12-bit CCD camera (Hamamatsu, Japan). The probe was positioned 15 mm above the surface of the sample, with a view angle of 30°. The light source, two 150 W tungsten halogen lamps (Fiber-Lite DC950 Illuminator, Dolan Jenner Industries Inc., USA), was positioned at a distance of 150 mm and at an angle of 30° from the sample surface. The spectrum of each sample was the average of 30 successive scans. In this research, only the wavebands between 458 nm and 699.73 nm were considered in the calculation, to achieve a high signal-to-noise ratio according to the spectral properties of sesame oil. Reflectance spectra were measured at 1.23 nm intervals, yielding 195 variables in the visible spectral range of interest.
2.3. Multivariate modeling

Firstly, SVM-MFFS and two other well-performing algorithms (SPA and UVE) were used to select representative wavelengths. The set of representative wavelengths selected by each algorithm was then input to a classifier to identify the different brands of sesame oil. For fairness, a uniform SVM was used in all cases to establish the classification model for the prediction of sesame type (variable matrix Y) based on the spectra (variable matrix X). In the uniform SVM, C-SVM and the radial basis function (RBF) kernel were adopted, implemented with the well-known open-source toolbox libsvm (Chang and Lin, 2011). A Grid-Search technique was applied to determine the optimal parameters (Fig. 1). The regularization parameter c and the RBF kernel parameter σ² were limited to the range 2⁻¹⁰ to 2¹⁰. We implemented SPA based on an open-source toolbox named SPA-Toolbox (http://www.ele.ita.br/). For the implementation of UVE, artificial noise variables were added to the data set to calculate the stability of each variable and the cutoff, as shown in Fig. 2; the variables whose stability exceeded the cutoff were selected, so that only eight variables were retained. The details of SVM-MFFS are given in Section 3.3.
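A minimal sketch of the Grid-Search described above is given below. It uses scikit-learn instead of the libsvm toolbox cited in the text, and the 5-fold cross-validation, the power-of-two grid step, and the synthetic placeholder data are assumptions made only for illustration.

```python
# Sketch of the Grid-Search over the C-SVM regularization parameter and the RBF kernel
# parameter, both limited to 2^-10 ... 2^10 as described above (scikit-learn stand-in
# for libsvm; the cv value and grid step are assumptions).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X_train = np.random.rand(80, 195)          # placeholder for the 80 training spectra
y_train = np.repeat(np.arange(4), 20)      # placeholder brand labels (4 brands x 20 samples)

param_grid = {
    "C": 2.0 ** np.arange(-10, 11),
    "gamma": 2.0 ** np.arange(-10, 11),
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```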
Table 1
Reflectance statistics of the sample sets distributed by the KS algorithm. For computing convenience, the values in the last four columns are ten times the true reflectance of the sample sets in decimal form.

Samples/size          Sample name   Maximum   Minimum   Mean     Standard deviation
Total samples/120     JLY (a)       3.6959    0.3340    2.1055   0.9303
                      CK (b)        3.9505    0.4014    2.0164   0.8528
                      LYH (c)       4.1145    0.2206    1.7820   0.8910
                      NJMF (d)      3.3267    0.1603    1.4923   0.8277
Training samples/80   JLY           3.6959    0.3340    2.0807   0.9594
                      CK            3.9509    0.4014    2.0356   0.8606
                      LYH           4.1145    0.2206    1.8192   0.9446
                      NJMF          3.3267    0.1603    1.4535   0.8409
Test samples/40       JLY           3.4967    0.5805    2.1553   0.8672
                      CK            3.6932    0.5293    1.9794   0.8361
                      LYH           3.7351    0.3888    1.7078   0.7677
                      NJMF          3.3157    0.3631    1.5699   0.7953

(a) JLY: JinLongYu. (b) CK: ChangKang. (c) LYH: LiuYangHe. (d) NJMF: NongJiaMoFang.
2.4. Model performance evaluation

The performance of each calibration model is evaluated using the accuracy, the receiver operating characteristic (ROC) curve (Zweig and Campbell, 1993), prediction and CS. The first two are defined as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Accuracy} = \frac{TN + TP}{TP + FN + TN + FP} \qquad (1)$$
Fig. 1. The procedure of searching the SVM parameters by the Grid-Search method. The abscissa and ordinate represent the values of the regularization parameter and the RBF kernel parameter in logarithmic form, respectively (best c = 0.5, σ² = 8, CV accuracy = 97.5%).
TN, TP, FN, and FP denote the true negatives, true positives, false negatives and false positives, respectively. The ROC curve is insensitive to the prior probability of the classes, and it can visualize the performance of a classifier at multiple thresholds (Zhu et al., 2010). When using the ROC curve for analysis, a two-dimensional space is formed with sensitivity as the vertical axis and 1 − specificity as the horizontal one. The area under the ROC curve (AUC) is an important measure of accuracy and plays a critical role in judging whether a classifier is good or not: an area of 1 represents perfect performance, while an area of 0.5 means that the classifier has no discriminative power at all. Prediction analysis is another method to evaluate the performance of a classifier; it compares the true class labels with the predicted class labels to show the deviations of the predictions. CS is a new indicator put forward in this paper to evaluate the classification ability of a single variable or a set of variables; it is similar in spirit to the variance explained by principal components in principal component analysis (Abdi and Williams, 2010). For CS, the incremental stability and cumulative stability are defined as follows:
$$S = \|\omega\|, \qquad \Delta S_{i+1} = S_{i+1} - S_i \ (i > 1), \qquad S_1 = 1 \qquad (2)$$

where ω is defined in Expression (10) in Section 3.3, S is a norm of ω, S_i is the cumulative stability of the combination of i variables, and ΔS_{i+1} is the incremental stability, i.e., the difference between S_{i+1} and S_i. A higher incremental stability indicates a more significant single feature, and a higher cumulative stability indicates that the combined variables are more informative. Therefore, CS is a feasible criterion for evaluating the performance of feature selection algorithms related to SVM.

Fig. 2. Stability distribution of each variable. The abscissa is the variable index and the ordinate is the stability of each variable. The two red dotted lines indicate the upper and lower cutoff (±0.4). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
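As a rough illustration of how CS could be computed in practice, the sketch below takes S_i to be the norm of the weight vector of a linear SVM trained on the first i selected wavelengths and derives the increments from successive differences. The use of scikit-learn, the linear kernel, and the omission of the S_1 = 1 normalization are assumptions for illustration only.

```python
# Illustrative computation of incremental and cumulative stability (Expression (2)):
# S_i = ||omega|| of a linear SVM trained on the first i selected wavelengths.
# Not the authors' implementation; scikit-learn and the linear kernel are assumptions.
import numpy as np
from sklearn.svm import SVC

def cumulative_stability(X, y, selected_features):
    S, increments = [], []
    for i in range(1, len(selected_features) + 1):
        clf = SVC(kernel="linear").fit(X[:, selected_features[:i]], y)
        s_i = np.linalg.norm(clf.coef_)               # S = ||omega|| for the i-feature model
        increments.append(s_i - (S[-1] if S else 0.0))
        S.append(s_i)
    return S, increments
```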
3. Theory and calculations

Consider M training pairs S = {(X_i, Y_i)}, i = 1, ..., M, where X_i ∈ R^N is the feature vector representing the i-th sample and Y_i is the class label of X_i. For a binary problem, Y_i ∈ {−1, 1}; for a k-class (k > 2) problem, Y_i ∈ {1, 2, ..., k}.
3.1. Support Vector Machines (SVMs)

SVM is a powerful algorithm that has shown state-of-the-art performance in a variety of classification tasks. It is not very sensitive to the curse of dimensionality and is well suited to high-dimensional data. Intuitively, an SVM searches for a hyperplane with maximal distance between itself and the closest samples from each of two classes. To keep this study self-contained, we introduce the necessary elements of SVM below; further details can be found in Vapnik (1999). The decision function of SVM, like that of other linear classifiers, can be represented as f(X) = ω^T X + b, where ω = [ω_1, ω_2, ..., ω_N]^T is the weight vector and b is the bias. The mechanism of SVM is to minimize the following optimization problem:

$$\min\ Q(\omega, e) = \frac{1}{2}\|\omega\|^2 + t\sum_{i=1}^{M} e_i \quad \text{s.t.}\quad Y_i[\omega^T X_i + b] \geq 1 - e_i,\ \ e_i \geq 0,\ \ i = 1, \ldots, M \qquad (3)$$

where t is a trade-off between training accuracy and generalization, and e_i is the classification error. A Lagrangian function of this convex optimization program can be established by introducing a dual set of non-negative Lagrange multipliers a_i, which gives:

$$\omega = \sum_{i=1}^{M} a_i Y_i X_i, \qquad \omega^T X + b = \left(\sum_{i=1}^{M} a_i Y_i X_i\right)^{T} X + b = \sum_{i=1}^{M} a_i Y_i \langle X_i, X\rangle + b \qquad (4)$$

Typically, a kernel function is adopted to transform the data from a low-dimensional to a high-dimensional space, so that:

$$\omega^T X + b = \sum_{i=1}^{M} a_i Y_i \langle X_i, X\rangle + b = \sum_{i=1}^{M} a_i Y_i K(X_i, X) + b \qquad (5)$$

where K(X_i, X) is the kernel function.

3.2. SVM-Binary Forward Feature Selection Algorithm (SVM-BFFS)

SVM-BFFS is first proposed in this paper; it conducts feature selection in a sequential forward manner. Like SVM itself, SVM-BFFS is initially designed for binary problems. The squared coefficient ω_j² (j = 1, ..., N) of the weight vector ω in Expression (3) is employed as the feature stability. SVM-BFFS is a round-robin algorithm with three sequential steps in each loop: (1) train the SVM classifier; (2) compute the stability ω_j² for all features; and (3) choose the feature with the maximal stability. It terminates once it reaches the minimal subset of features that yields the maximal classification accuracy. Thanks to its forward selection strategy, SVM-BFFS can extract informative features without traversing the whole solution space, and hence it is a time-saving algorithm.

The magnitude of ω_i² in Expression (3) reflects the approximate change in Q when the i-th feature is selected. This can be explained by the Optimal Brain Damage (OBD) algorithm (Cun et al., 1990): Q can be expanded in a Taylor series to the second order, as shown in Expression (6):

$$\Delta Q(i) = \frac{\partial Q}{\partial \omega_i}\Delta\omega_i + \frac{\partial^2 Q}{\partial \omega_i^2}(\Delta\omega_i)^2 + O\!\left((\Delta\omega_i)^3\right) \qquad (6)$$

At the optimum point of Q, the first-order term can be ignored, so ΔQ(i) ≈ (Δω_i)² with Q = 1/2‖ω‖². Letting Q_i denote the value of Q when the i-th feature is retained, we get Q_i ≈ Q(i) + ω_i². Another reasonable justification for adopting ω_i² as the stability comes from the sensitivity analysis of Q = 1/2‖ω‖² + t Σ e_i. To compute the gradient, a virtual scaling factor V is introduced into the kernel function; it acts as a component-wise multiplicative term on the input variables, so that K(X_i, X_j) becomes K(V ∘ X_i, V ∘ X_j), where ∘ denotes the component-wise vector product. For a linear SVM, the sensitivity can be expressed as:

$$\frac{\partial Q}{\partial V_k} = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M} a_i a_j Y_i Y_j \frac{\partial K(X_i, X_j)}{\partial V_k} = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M} a_i a_j Y_i Y_j \left(2 V_k X_{ik} X_{jk}\right) = \omega_k^2, \quad \text{where } V_k = 1 \qquad (7)$$

Therefore, selecting the feature with maximal ω_i² not only yields good precision, but also maximizes the differentiation among variables. In other words, SVM-BFFS finds the feature subset with minimal Q. To speed up the computation, the algorithm can select more than one feature in each loop; however, extracting several features at a time might degrade the performance of the feature selection method.

3.3. SVM-Multiclass Forward Feature Selection Algorithm (SVM-MFFS)

SVM was originally designed for binary problems, but most practical applications are multiclass. Thus, researchers have extended binary SVM to multiclass SVM with various strategies, such as one-versus-one (OVO), one-versus-all (OVA), DAG SVMs, ECOC SVMs, and H-SVMs (Chih-Wei and Chih-Jen, 2002). In this research, SVM-MFFS is implemented on top of SVM-BFFS in the OVO manner. This manner involves the construction of k(k−1)/2 SVM classifiers, each trained on data from two classes; in total, k(k−1)/2 optimization problems with no more than N variables must be solved for a k-class problem. For a given sample, the final class is decided by voting among all the classifiers. Q for multiclass SVM in the OVO manner can thus be expressed as:

$$Q_{X_t} = \arg\max_{i}\ \sum_{\substack{j=1 \\ j\neq i}}^{k} \operatorname{sgn}\!\left(Y_t(\omega_{ij} X_t + b_{ij})\right), \quad \text{where } Y_t = \begin{cases} 1 & \text{when } X_t \in C_i \\ -1 & \text{when } X_t \in C_j \end{cases}, \quad 1 \leq i < j \leq k \qquad (8)$$

where ω_ij and b_ij are the weight vector and bias between the i-th and j-th classes, and C_i and C_j are the labels of classes i and j, respectively. To illustrate the basic idea of SVM-MFFS in the OVO manner more clearly, a simple multiclass problem with k = 3 and N = 2 is considered (Table 2). In Table 2, ω_ij (i ≠ j; i, j ∈ {1, 2, 3}) denotes the weight vector between class i and class j; ω_ijf1 and ω_ijf2 are its components for feature 1 and feature 2. ω_1, ω_2 and ω_3 are the weight vectors of the classes, defined as the mean of their own members.
Table 2
An example of a three-class problem.

Class label   Weight vector          Elements of weight vector
Class 1       ω1 = (ω12 + ω13)/2     ω12: ω12f1, ω12f2   |   ω13: ω13f1, ω13f2
Class 2       ω2 = (ω21 + ω23)/2     ω21: ω21f1, ω21f2   |   ω23: ω23f1, ω23f2
Class 3       ω3 = (ω31 + ω32)/2     ω31: ω31f1, ω31f2   |   ω32: ω32f1, ω32f2
As discussed in (Guyon and André, 2003), many feature selection methods are sensitive to small perturbations of the experimental conditions. Consequently, we define a new feature weight ω_j and a new stability s_j with Expressions (9) and (10), and repeat the procedure to stabilize the feature selection method:

$$\omega_j = \frac{\dfrac{1}{k}\displaystyle\sum_{i=1}^{k}\omega_{if_j}^2}{\sqrt{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k}\left(\omega_{if_j}^2 - \dfrac{1}{k}\sum_{m=1}^{k}\omega_{mf_j}^2\right)^{2}}} \qquad (9)$$

$$s_j = \|\omega\| - \|\omega_j\| \qquad (10)$$
where ω_ifj is the weight of the j-th feature for the i-th class label, and ω_j and s_j are the weight and stability of the j-th feature. ω_j represents the weight obtained from the multiclass SVM trained on the samples without the j-th feature, and ‖ω‖ is a norm of ω. The procedure of SVM-MFFS, like that of SVM-BFFS, can be described as follows:

(1) Start: define the selected feature subset R = []; the input feature sample set X = [X1, ..., XN]; the current length of X is l.
(2) Repeat until the minimal subset of features that yields the maximal classification accuracy is obtained:
    (i) for each feature in X, compute s_j according to Expressions (9) and (10);
    (ii) find the feature with the maximal s_j: e = arg max_j s_j;
    (iii) update: R = [e, R], X = X − [e].
(3) Output: R.
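For illustration, the following sketch captures the spirit of the SVM-MFFS loop above. Several simplifications are assumptions of ours rather than the paper's implementation: it uses scikit-learn instead of libsvm, it computes an Expression (9) style stability directly from the k(k−1)/2 pairwise weight vectors instead of first forming the per-class weights of Table 2, and it stops when cross-validated accuracy no longer improves.

```python
# Simplified sketch of the SVM-MFFS forward selection loop (assumptions: scikit-learn,
# stability computed over the pairwise OVO weight vectors, accuracy-based stopping rule).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def pairwise_stability(X, y, features):
    """Mean of squared pairwise weights over their standard deviation, per feature
    (requires at least three classes so that the standard deviation is defined)."""
    clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X[:, features], y)
    w2 = clf.coef_ ** 2                      # shape: (k*(k-1)/2, len(features))
    return w2.mean(axis=0) / (w2.std(axis=0, ddof=1) + 1e-12)

def svm_mffs(X, y, max_features=10):
    remaining = list(range(X.shape[1]))
    selected, best_acc = [], 0.0
    while remaining and len(selected) < max_features:
        stab = pairwise_stability(X, y, remaining)
        best = remaining[int(np.argmax(stab))]           # most stable remaining feature
        candidate = selected + [best]
        acc = cross_val_score(SVC(kernel="rbf"), X[:, candidate], y, cv=5).mean()
        if selected and acc <= best_acc:
            break                                        # accuracy no longer improves
        selected, best_acc = candidate, acc
        remaining.remove(best)
    return selected, best_acc
```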
4. Results and discussion

4.1. Spectra curves analysis

The ingredients of different sesame oils affect the spectral characteristics of the light reflected through the oil. Because of baseline shifts in the spectral profiles, the raw spectral data need to be preprocessed. In this experiment, two spectral preprocessing methods were applied: simple smoothing and multiplicative scatter correction (MSC). For the simple smoothing, a 9-point window was used to smooth the reflectance spectra and reduce the random noise induced by internal system factors; MSC was used to correct scatter effects. Fig. 3 shows that there are some latent differences, especially in the peaks and troughs, although the overall curves of the four sesame oils look similar. The Unscrambler® 9.7 (CAMO PROCESS AS, Oslo, Norway) was used for the simple smoothing and MSC. In Fig. 3 there are some evident peaks and troughs in the spectral curves (488.71 nm, 613.89 nm, etc.), and the following may explain this phenomenon. The visible spectrum mainly varies according to the natural color of a material under sunlight or full-color light. The visible-spectrum curve is first low in the blue and green wavelengths and then rises as it enters the yellow, orange and red wavelengths. Sesame oil appears dark orange (in full-color light), which is consistent with the main trend of the visible spectral curves. The trough around 488.71 nm is the most significant one, which may be mainly due to absorption by beta-carotene molecules (Darvin et al., 2004). In our study, we use three different feature selection methods to select the most informative wavelengths. Moreover, these methods are all combined with SVM, which projects the actual signal variables into a higher-dimensional space; the reason a method selects particular wavelengths is therefore that it finds support vectors on that wavelength group, which allows the samples to be classified with higher accuracy.
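The smoothing and MSC steps described at the beginning of this subsection can be sketched as follows. The paper used The Unscrambler 9.7 for these operations; this NumPy version, including the simple moving-average form of the 9-point smoothing, is only an assumed stand-in.

```python
# Illustrative 9-point smoothing and multiplicative scatter correction (MSC).
# The study used The Unscrambler 9.7; the moving-average window is an assumption.
import numpy as np

def smooth(spectra, window=9):
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), 1, spectra)

def msc(spectra):
    """Regress each spectrum against the mean spectrum and remove offset and slope."""
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(reference, s, 1)
        corrected[i] = (s - intercept) / slope
    return corrected

# spectra: (n_samples, n_wavelengths) raw reflectance matrix
# processed = msc(smooth(spectra))
```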
4.2. Accuracy analysis

We use the accuracy of recognizing the different sesame oils to judge the performance of the different feature selection methods.
Fig. 3. Spectral curves of different sesame oils processed by smoothing and MSC. The colored bars are the bands selected by SPASVM (deep copper green), SVM-MFFS (grey) and UVESVM (silver). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 4. Recognition accuracy of the different sesame oils with the SPASVM and SVM-MFFS methods.
A high recognition accuracy is a pre-requisite for a good classifier model. Fig. 4 indicates that both SPASVM and SVM-MFFS can reach an accuracy of 100% with no more than ten feature variables. Meanwhile, both SPASVM and SVM-MFFS show small perturbations within a reasonable and acceptable range, owing to the impact of negative-feedback features. What is more, SVM-MFFS needs only two features to reach 100%, while SPASVM needs two more. Therefore, SVM-MFFS is better than SPASVM because it achieves higher accuracy with fewer feature variables.

4.3. ROC and prediction analysis

ROC analysis is a metric for checking the quality of a binary classifier: the larger the AUC, the better the classification.
Table 3
AUC of the two calibration models (SPASVM and SVM-MFFS). Values are the area under the ROC curve for each brand and their mean.

Number of features   Calibration model   JLY    CK      LYH     NJMF   Mean
1                    SPASVM              0.75   0.6     0.975   1      0.83125
1                    SVM-MFFS            0.95   0.875   0.6     0.9    0.83125
2                    SPASVM              1      0.85    0.975   1      0.95625
2                    SVM-MFFS            1      1       1       1      1
To illustrate the multiclass ROC more clearly, all the binary curves are plotted together. Figs. 5 and 6 compare the ROC curves of SPASVM and SVM-MFFS, and Table 3 lists the AUC of the two calibration models. Fig. 5 and Table 3 show that SVM-MFFS obtains better separation than SPASVM in the cases of JLY and CK, but not in the cases of LYH and NJMF; thus, SPASVM and SVM-MFFS have equal classification performance with only one feature. Fig. 6 and Table 3 show that SVM-MFFS is clearly better than SPASVM when two informative features are used: the mean AUC over all four classifiers is 1 for SVM-MFFS and 0.95625 for SPASVM.

4.4. Prediction analysis

Prediction analysis is another method to evaluate the performance of a classifier, as mentioned in Section 2.4. Figs. 7 and 8 show the deviations of the predicted class labels. The abscissa represents the 40 samples in the prediction set, divided into four groups (JLY, CK, LYH and NJMF) of 10 samples each. The ordinate represents the predicted class labels processed by
Fig. 5. Multiclass ROC curves developed by SPASVM (a) and SVM-MFFS (b) with one feature.
Fig. 6. Multiclass ROC curves developed by SPASVM (a) and SVM-MFFS (b) with two features.
Fig. 7. True class label versus predicted class label with one feature using SPASVM (a) and SVM-MFFS (b).
Fig. 8. True class label versus predicted class label with two features using SPASVM (a) and SVM-MFFS (b).
Fig. 9. CS of SVM-MFFS (a) and SPASVM (b). The abscissa represents the number of feature variables. Bars in different colors, measured against the left ordinate, show the stability of each variable; the lines, measured against the right ordinate, show the cumulative stability of all variables selected up to that point.
the different recognition models. The colored stripes represent the true class labels of the prediction sets, and the positions of the symbols indicate the predicted values. Fig. 7 shows that a quarter of the samples are misclassified when only one feature is retained, both for SPASVM and for SVM-MFFS. Fig. 8b indicates that no sample is misclassified by SVM-MFFS, while Fig. 8a shows two misclassified samples for SPASVM. Therefore, SVM-MFFS outperforms SPASVM in accuracy and stability.

4.5. CS analysis

CS is used to reflect the importance of the features selected with SVM. In Fig. 9, the abscissa represents the number of feature variables; the bars, measured against the left ordinate, show the stability of each variable, and the lines, measured against the right ordinate, show the cumulative stability of all variables selected up to that point. The first feature variable selected by SPASVM and by SVM-MFFS has a high incremental stability (1.086 and 1.016, respectively). Moreover, for the second feature, the incremental stability from SPASVM (0.158) is lower than that from SVM-MFFS (0.385). From Fig. 9a, it can be seen that the cumulative stability of the first two features reaches 1.471, and the features extracted by SVM-MFFS are more informative than those extracted by SPASVM.
Table 4
Selected feature wavelengths and elapsed time for the three methods.

Method     Optimal wavelengths (nm)                                          Elapsed time (s)
SPASVM     488.71, 545.38, 613.89, 681.95                                    368.6254
UVESVM     488.71, 519.44, 545.38, 575.18, 613.89, 637.75, 652.87, 681.95    17.4582
SVM-MFFS   545.38, 613.89                                                    412.9621
4.6. Selected feature wavelengths and elapsed time

The optimal wavelengths and the running times in the case of 100% accuracy for the three algorithms are presented in Fig. 3 and Table 4. Both show that the wavelengths selected by UVESVM include those selected by SPASVM and SVM-MFFS. In terms of running time, because UVESVM searches for the effective wavelengths in a single step without traversal, it has the shortest elapsed time of 17.4582 s. Although SPASVM and SVM-MFFS take longer (368.6254 s and 412.9621 s), they are still of the same order of magnitude. Therefore, all three methods perform well in applications where runtime is not critical.

5. Conclusions

The quality of food is one of the most important concerns of food engineering. Nowadays, people are paying more attention to their health and, equally, to the food they consume. As a key element of food production, liquid additives play an important role in food manufacturing. Generally speaking, brands with wide acceptance are more likely to produce safe foods; hence, identifying different brands of liquid additives is a practical way to infer their quality. In our work, we use hyperspectral visible–near infrared spectroscopy to identify different brands of liquid food additives and thereby ensure the safety of critical steps of food production. Furthermore, this work lays a foundation for an online hyperspectrometer for daily factory use in online food quality inspection with high accuracy and a minimal number of bands. When the multi-dimensional hyperspectral technique is used in food inspection, however, too much memory and processing time are needed to deal with the spectral data. Therefore, a well-performing feature selection method is needed to reduce the data dimensionality and make hyperspectral technology more applicable. This study puts forward a new feature selection algorithm named SVM-MFFS. It is based on the forward selection strategy and a customized stability model, and its efficiency and feasibility are demonstrated both theoretically and practically. In theory, we use Taylor's theorem to justify its correctness; in practice, we compare SVM-MFFS with SPASVM and UVESVM in an experiment classifying various brands of sesame oil. The results indicate that SVM-MFFS performs best in accuracy, ROC, prediction and CS, and thus can be used to reduce data effectively during the inspection of food quality.

Acknowledgments

This study was supported by the 863 National High-Tech Research and Development Plan (Project No. 2013AA102301), the National
Natural Science Foundation of China (Project No: 61170033) and the National Science and Technology Support Program of China (Project No: 2011BAD20B12-04 and 2011BAD21B02). References Abdi, H., Williams, L.J., 2010. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2 (4), 433–459. Araújo, M.C.U., Saldanha, T.C.B., Galvão, R.K.H., Yoneyama, T., Chame, H.C., Visani, V., 2001. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and Intelligent Laboratory Systems 57 (2), 65–73. Blum, A.L., Langley, P., 1997. Selection of relevant features and examples in machine learning. Artificial Intelligence 97 (1–2), 245–271. Centner, V., Massart, D.-L., de Noord, O.E., de Jong, S., Vandeginste, B.M., Sterna, C., 1996. Elimination of uninformative variables for multivariate calibration. Analytical Chemistry 68 (21), 3851–3858. Chang, C.-C., Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligence System and Technology 2 (3), 1–27. Chih-Wei, H., Chih-Jen, L., 2002. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13 (2), 415–425. Cun, Y.L., Denker, J.S., Solla, S.A., 1990. Optimal brain damage. In: David, S.T. (Ed.), Advances in Neural Information Processing Systems 2. Morgan Kaufmann Publishers Inc., pp. 598–605. Darvin, M., Gersonde, I., Ey, S., Brandt, N., Albrecht, H., Gonchukov, S., Sterry, W., Lademann, J., 2004. Noninvasive detection of beta-carotene and lycopene in human skin using Raman spectroscopy. LASER PHYSICS-LAWRENCE 14 (2), 231–233. Guyon, I., André, Elisseeff, 2003. An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182. Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Machine Learning 46 (1–3), 389– 422. He, Y., Huang, M., GARCIA, A., 2007. Prediction of soil macronutrients content using near-infrared spectroscopy. Computers and Electronics in Agriculture 58 (2), 144–153. Heo, J., Troyer, M.E., Pattnaik, S., Enkhbaatar, L., 2011. A hierarchical approach to Compact Airborne Spectrographic Imager (CASI) high-resolution image classification of Little Miami River Watershed for environmental modelling. International Journal of Remote Sensing 33 (5), 1567–1585. Jain, A.K., Duin, R.P.W., Jianchang, M., 2000. Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1), 4–37. Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11 (1), 137–148. Kim, M., Lee, Y.-H., Han, C., 2000. Real-time classification of petroleum products using near-infrared spectra. Computers Chemical Engineering 24 (2–7), 513– 517. Kohavi, R., John, G.H., 1997. Wrappers for feature subset selection. Artificial Intelligence 97 (1–2), 273–324. Li, X., He, Y., Wu, C., Sun, D.-W., 2007. Nondestructive measurement and fingerprint analysis of soluble solid content of tea soft drink based on Vis/NIR spectroscopy. Journal of Food Engineering 82 (3), 316–323. Lu, F., He, Y., 2008. Classification of brands of instant noodles using Vis/NIR spectroscopy and chemometrics. Food Research International 41 (5), 562–567. Lu, F., Jiang, Y.H., He, Y., 2009. Variable selection in visible/near infrared spectra for linear and nonlinear calibrations: a case study to determine soluble solids content of beer. Analytica Chimica Acta 635 (1), 45–52. 
Sun, Z., Bebis, G., Miller, R., 2004. Object detection using feature subset selection. Pattern Recognition 37 (11), 2165–2176. Vapnik, V.N., 1999. An overview of statistical learning theory. IEEE Transactions on Neural Networks 10 (5), 988–999. Wang, J., Nakano, K., Ohashi, S., Takizawa, K., He, J.G., 2010. Comparison of different modes of visible and near-infrared spectroscopy for detecting internal insect infestation in jujubes. Journal of Food Engineering 101 (1), 78–84. Wolf, M., Ferrari, M., Quaresima, V., 2007. Progress of near-infrared spectroscopy and topography for brain and muscle clinical applications. Journal of Biomedical Optics 12 (6), 062104. Wu, D., Yang, H.Q., Chen, X.J., He, Y., Li, X.L., 2008. Application of image texture for the sorting of tea categories using multi-spectral imaging technique and support vector machine. Journal of Food Engineering 88 (4), 474–483. Zhou, X., Tuck, D.P., 2007. MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics 23 (9), 1106–1114. Zhu, X., Li, S., Shan, Y., Zhang, Z., Li, G., Su, D., Liu, F., 2010. Detection of adulterants such as sweeteners materials in honey using near-infrared spectroscopy and chemometrics. Journal of Food Engineering 101 (1), 92–97. Zweig, M.H., Campbell, G., 1993. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry 39 (4), 561–577.