Information, Article
A Comparison Study of Kernel Functions in the Support Vector Machine and Its Application for Termite Detection

Muhammad Achirul Nanda 1,*, Kudang Boro Seminar 1,*, Dodi Nandika 2 and Akhiruddin Maddu 3

1 Faculty of Agricultural Technology, Bogor Agricultural University, Bogor 16680, West Java, Indonesia
2 Faculty of Forestry, Bogor Agricultural University, Bogor 16680, West Java, Indonesia
3 Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, Bogor 16680, West Java, Indonesia
* Correspondence: Tel.: +62-8164834625
Received: 10 December 2017; Accepted: 28 December 2017; Published: 2 January 2018
Abstract: Termites are the most destructive pests, and their attacks significantly impact the quality of wooden buildings. Because of their cryptic behavior, visual observation rarely reveals that a termite infestation is active and that wood damage is occurring. Termites generate acoustic signals when attacking wood. Building on this phenomenon, we propose a practical framework that detects termites nondestructively by extracting features from these acoustic signals. This approach has the advantage of preserving the quality of wood products and preventing further termite attacks. In this work, we inserted 220 subterranean termites into pine wood for feeding activity and monitored the resulting acoustic signal. Two acoustic features derived from the time domain (i.e., energy and entropy) were used for the analysis. The support vector machine (SVM) algorithm with different kernel functions (i.e., linear, radial basis function, sigmoid and polynomial) was then employed to recognize the termites' acoustic signal. In addition, the area under the receiver operating characteristic curve (AUC) was adopted to analyze the performance results. Based on the numerical analysis, the SVM with a polynomial kernel function achieves the best classification accuracy of 0.9188.

Keywords: acoustic signal; kernel function; support vector machine; termite detection
1. Introduction

Termites live in large underground colonies. They can attack any wood in direct contact with the ground and can even kill a healthy tree. They are harmful pests that economically impact the quality of wood in wooden buildings, forest trees and crops. Figure 1 shows the initial attack of subterranean termites on an Acacia crassicarpa plantation in Riau Province, Indonesia. Termite damage to wooden buildings is also easy to find in Bogor city and the surrounding areas [1]. In fact, parts of important buildings in Indonesia have been seriously attacked, e.g., the Presidential Palace (Istana Merdeka) in Jakarta [1]. Nandika et al. [2] estimated the cost of termite attacks on wooden buildings at about Rp 8.7 trillion in 2015. This figure excludes treatment costs, repairs of damaged buildings and loss of property value. Approximately 2500 termite species exist worldwide, and about 300 species are considered pests [3]. Most attacks on wooden buildings in Indonesia are caused by subterranean termites of the genus Coptotermes (Isoptera: Rhinotermitidae), in particular the species Coptotermes curvignathus.
Information 2018, 9, 5; doi:10.3390/info9010005
www.mdpi.com/journal/information
Figure 1. Initial attack of subterranean termites on Acacia crassicarpa plantation, Riau Province, Indonesia (image provided by Nandika and Tjahjono [3], used with permission).
To overcome termite attacks and avoid further wood damage, a detection system is required for the inspection process. So far, two methods have been developed, namely visual and non-visual inspection. Visual inspection is the dominant method used today [4]. It requires opening the wood directly and therefore damages the wood structure. Moreover, no scientific report has yet tested the efficiency of this method [4]. Conversely, non-visual inspection is an attractive solution because it is non-destructive: the wooden structure remains intact. Many current detection systems apply the non-visual method. Examples include an electronic stethoscope [4], methane gas odor [5], a moisture meter and acoustic emission [6,7].

In a state-of-the-art study of acoustic termite detection, Lewis et al. [8] analyzed the performance of a termite detection tool (wood-destroying insect detector®, DowAgrosciences, Indianapolis, IN, USA). They obtained a detection accuracy of 89.45%. Lewis used the western drywood termite, Incisitermes minor (Hagen), in that research. The most massive termite attacks in Indonesia come from the subterranean termite Coptotermes curvignathus. This study therefore focused on a termite detection system for Coptotermes curvignathus.

Insects often produce acoustic signals with spectral and temporal features that make them distinctive and easily detectable [9]. The acoustic signals generated by termites therefore provide a basis for designing a termite detection system. Several researchers reported that specific activities, namely eating, excavation and head-banging, generate particular acoustic signals [10,11]. Over the last decade, microphones have been used successfully to detect insects within wood [12–14] and to solve various engineering problems [15–19]. Therefore, in this study, we used an electret microphone sensor to sense the acoustic signal produced by termites.

The core problem in developing a termite detection system is to separate the signals generated by termites from background noise [20]. We address this problem by building a reliable classification model with the support vector machine (SVM) algorithm. To our knowledge, no scientific report has yet applied this approach to such a system. Significant advantages of this algorithm are high accuracy, direct geometric interpretation and elegant mathematical tractability [21]. In addition, the SVM does not require a large training data set to avoid overfitting. When developing a classification model with an SVM, the kernel function plays a significant role. It maps the data set to a higher dimensional space in order to obtain a better classification model. However, many types of kernel function can be applied, such as linear, radial basis function, sigmoid and polynomial. This study compares several kernel functions in the SVM algorithm to find the one giving the best performance in the termite detection system.

The rest of the paper is organized as follows. Section 2 discusses the materials and methods. These cover acoustic signal monitoring and feature extraction, the kernel functions used in the SVM, and performance evaluation with receiver operating characteristic (ROC) analysis. Section 3 details the acoustic signal characteristics and the optimization of kernel parameters with the grid-search method. Section 4 gives the results obtained from the implemented algorithm in the termite detection system. Finally, Section 5 presents our main conclusions.

2. Materials and Methods

2.1. Selection of Boards

Pine wood (Pinus merkusii) with an initial moisture content of 8.75 ± 0.05% was selected for the experimental boards. Each board measures 20 × 9.5 × 2.5 cm and contains an inner hole of 12 × 6 × 0.5 cm. The boards were divided into two classes, i.e., an infested class and an uninfested class. For the infested class, 220 subterranean termites (200 workers and 20 soldiers) were inserted into each board. The uninfested class contains no termites and serves as the control. Each class includes five boards for the measurement process. The selection of wood species and the number of termites follow the Indonesian standard (SNI 2006) [22].
This standard test, used to detect termite attacks through their acoustics, was carried out at the Termite Laboratory, Bogor Agricultural University (28 °C, 70% RH, dark room). The boards were kept under these conditions for 2 weeks before the acoustic signal acquisition.

2.2. Acoustic Signal Monitoring

First, we placed two electret microphones (Itead Studio, Shenzhen, China) on the wood to sense termite acoustic signals. This sensor has a frequency range of 0.1–10 kHz and a sensitivity of −50 dB. It is connected to a microcontroller that converts the acoustic signal into an equivalent electrical voltage. This signal is displayed and analyzed using RStudio (RStudio, Inc., version 0.99.896, ©2009–2016, Boston, MA, USA). The computer is a ThinkPad X-240 with a 2.49 GHz Intel® Core™ i5 CPU, a 64-bit operating system and 8 GB RAM. Figure 2 shows the schematic diagram of our termite detection system. The acoustic signal of each board in the two groups (infested and uninfested) was measured 10 times. Each measurement was digitized at a 100 Hz sampling rate for 3 s. The signal was then normalized using the following equation [23]:

$$N[i] = \frac{x[i]}{x[\max]} \tag{1}$$

where N[i] is the normalized value of data point i, x[i] is the data value and x[max] is the maximum value of the entire data. The next stage is feature extraction. This process reduces the data to features that describe the characteristics of the observed object while avoiding complex computation. In this paper, we use two features: energy and entropy. Energy (E(i)) is the sum of the squared amplitudes (values of the acoustic signal) over the signal length, normalized by that length. It is defined as follows [24]:

$$E(i) = \frac{1}{w_L} \sum_{n=1}^{w_L} \lvert x_i(n) \rvert^2 \tag{2}$$
where w_L is the length of the acoustic signal, n = 1, ..., w_L, and x_i(n) is the value of the acoustic signal. Entropy (H), in turn, measures abrupt changes in the energy level of the acoustic signal. It is calculated using Equation (3) [24,25], where e_j is the ratio of the energy of the j-th sub-length (sub-frame) of the acoustic signal to the total energy of the entire acoustic signal.

$$H = -\sum_{j} e_j \log_2(e_j) \tag{3}$$
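The following Python sketch illustrates Equations (1)–(3) on one 3 s measurement (100 Hz × 3 s = 300 samples). It is not the authors' R code. The text does not state how many sub-frames the entropy uses, so n_sub = 10 is our assumption. The input tone is synthetic.

```python
import math


def normalize(x):
    """Eq. (1): N[i] = x[i] / x[max], with x[max] the largest value in the recording."""
    x_max = max(x)
    return [v / x_max for v in x]


def energy(frame):
    """Eq. (2): mean of the squared amplitudes over the frame length w_L."""
    return sum(abs(v) ** 2 for v in frame) / len(frame)


def energy_entropy(frame, n_sub=10):
    """Eq. (3): H = -sum_j e_j * log2(e_j).

    The frame is cut into n_sub equal sub-frames (n_sub=10 is an assumed value);
    e_j is the share of sub-frame j in the total energy. Samples left over when
    len(frame) is not a multiple of n_sub are ignored.
    """
    size = len(frame) // n_sub
    sub_energies = [sum(v * v for v in frame[j * size:(j + 1) * size])
                    for j in range(n_sub)]
    total = sum(sub_energies)
    if total == 0:
        return 0.0  # silent frame: no energy to distribute
    h = 0.0
    for e in sub_energies:
        e_j = e / total
        if e_j > 0:  # the term 0 * log2(0) is taken as 0
            h -= e_j * math.log2(e_j)
    return h


# Usage: 3 s at 100 Hz gives 3 * 100 = 300 samples per measurement.
raw = [math.sin(2 * math.pi * 5 * n / 100) for n in range(300)]  # synthetic tone
frame = normalize(raw)
features = (energy(frame), energy_entropy(frame))  # (E, H) pair fed to the SVM
```

Entropy is largest (log2 of n_sub) when energy is spread evenly over the sub-frames. It drops to 0 when all energy falls into a single sub-frame, which is how it captures abrupt changes.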
Figure 2. Schematic of the signal processing in the termite detection system.
Both proposed features are obtained in the time domain. This means the features are extracted directly from the data generated by the electret sensor, without requiring a signal transformation. At the classification stage, the extracted features are used as input to build the classification model. In this study, we implemented a support vector machine (SVM) algorithm to recognize the termite acoustic characteristics. The SVM implementation in our system is described in Section 2.3.

2.3. Support Vector Machine Classifier
There are numerous artificial intelligence methods based on supervised learning algorithms [26] that can be applied to detect acoustic signals. An SVM classifier is applied in the current detection framework. It was chosen for its outstanding generalization capability and its reputation for achieving high accuracy [27–29]. This method is based on statistical learning theory and the structural risk minimization principle. The strategy of this classifier is to find an optimal separating hyperplane with the maximum margin between the classes. It does so by focusing on the training samples located at the edge of the class distribution [30]. The SVM, proposed by Vapnik, is a very effective method for pattern recognition and has the following characteristics [31]: (1) it generalizes well in a high-dimensional space even with only a small training sample; (2) the optimum can be obtained by transforming the problem into a quadratic program; (3) it can model nonlinear functional relationships. A brief description of the SVM follows.
Consider a linearly separable binary classification problem (Figure 3). The goal is to find the optimal hyperplane by maximizing the margin and minimizing the classification error between the classes y_i ∈ {+1, −1} of the input data x_i. In this termite detection system, x_i represents the features extracted from the acoustic signal, i.e., energy and entropy. The label y_i marks the infested class (+1, green) and the uninfested class (−1, blue) (Figure 3). The hyperplane is described by Equation (4), where w is the normal vector of the hyperplane and b determines the position of the hyperplane relative to the origin.

$$w \cdot x_i + b = 0 \tag{4}$$
Figure 3. Illustration of support vector machine (SVM) to generalize the optimal separating hyperplane in linearly separable data.
The optimization of the margin with respect to its support vectors can be converted into a constrained quadratic programming problem, as in Equation (5) [32]. Here, ξ_i is the slack variable that measures how far sample x_i violates the margin of the corresponding hyperplane; ξ_i > 1 means the sample is misclassified. The parameter C is the cost of the penalty. If C is too large, error minimization dominates; if C is too small, margin maximization is emphasized.
$$\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\left(w^T x_i + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0 \tag{5}$$
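The sketch below makes Equations (4) and (5) concrete. It evaluates the hyperplane, the slack variables and the soft-margin objective for a fixed, hand-picked w and b. It only scores a candidate hyperplane and does not solve the quadratic program itself, which a library such as e1071 (used in this study) handles. The toy points are hypothetical.

```python
def decision(w, b, x):
    """Left-hand side of Eq. (4): w . x + b (its sign gives the predicted class)."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def slacks(w, b, X, y):
    """Smallest xi_i meeting both constraints of Eq. (5):
    y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0."""
    return [max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y)]


def primal_objective(w, b, X, y, C):
    """Objective of Eq. (5): 0.5 * ||w||^2 + C * sum_i xi_i."""
    return 0.5 * sum(wk * wk for wk in w) + C * sum(slacks(w, b, X, y))


# Hypothetical toy points and a hand-picked hyperplane (not a fitted model).
X = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0), (0.2, 0.0)]
y = [+1, +1, -1, -1]
w, b = (1.0, 0.0), 0.0
xi = slacks(w, b, X, y)  # 0 outside the margin, (0, 1] inside it, > 1 if misclassified
```

Here the second point lies inside the margin (ξ = 0.5) and the last one is misclassified (ξ = 1.2). Raising C from 1 to 10 increases the objective from 2.2 to 17.5. This shows why a large C pushes the optimizer toward fewer margin violations.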
2.4. Kernel Function

One of the obstacles in the classification process is that the data are often dispersed in a way that makes them difficult to separate linearly [33,34]. In this case, the SVM introduces a kernel function K(x_n, x_i) [35]. The kernel transforms the original data space into a new space of higher dimension and corresponds to the dot product of the transformed vectors φ(x) (Equation (6)). The aim is that the data, once transformed into the higher dimension, can be separated easily. The hyperplane function can then be written as Equation (7).

$$K(x_n, x_i) = \phi(x_n) \cdot \phi(x_i) \tag{6}$$

$$f(x_i) = \sum_{n=1}^{N} \alpha_n y_n K(x_n, x_i) + b \tag{7}$$

where x_n is a support vector, α_n is its Lagrange multiplier and y_n is its class label (+1, −1), with n = 1, 2, 3, ..., N. In this study, we compare four kernel functions in the SVM algorithm, i.e., linear, radial basis function (RBF), sigmoid and polynomial, which are listed in Table 1. Each kernel function has particular parameters that must be optimized to obtain the best performance [21].
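Below is a minimal sketch of Equations (6) and (7), using a homogeneous degree-2 polynomial kernel as the example. First, it checks that the kernel equals a dot product in an explicitly mapped space. Then it evaluates the decision function on XOR-style toy points that no straight line can separate. The support vectors and coefficients α_n are hand-picked for illustration, not obtained by training.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def k_poly2(u, v):
    """Homogeneous degree-2 polynomial kernel K(u, v) = (u . v)^2."""
    return dot(u, v) ** 2


def phi_poly2(x):
    """Explicit 2-D feature map whose dot product reproduces k_poly2 (Eq. (6))."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Eq. (7): f(x) = sum_n alpha_n * y_n * K(x_n, x) + b."""
    return sum(a * yn * kernel(xn, x)
               for xn, a, yn in zip(support_vectors, alphas, labels)) + b


# Eq. (6) check: the kernel equals the dot product in the mapped space.
u, v = (1.0, 2.0), (3.0, 1.0)
assert math.isclose(k_poly2(u, v), dot(phi_poly2(u), phi_poly2(v)))

# XOR-style data: no straight line separates the classes in 2-D.
svs = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
ys = [+1, +1, -1, -1]
alphas = [0.125] * 4  # hand-picked: places every support vector on the margin (f = +/-1)
f = [svm_decision(p, svs, alphas, ys, 0.0, k_poly2) for p in svs]
```

With these coefficients, f reduces to x1 · x2. It is positive for the first class and negative for the second, so the kernel separates data that overlap under any linear boundary in the original space.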
Table 1. Four common kernels [36].

| No. | Kernel Function | Formula | Optimization Parameters |
|---|---|---|---|
| 1 | Linear | $K(x_n, x_i) = x_n \cdot x_i$ | C |
| 2 | RBF | $K(x_n, x_i) = \exp\left(-\gamma \lVert x_n - x_i \rVert^2\right)$ | C and γ |
| 3 | Sigmoid | $K(x_n, x_i) = \tanh\left(\gamma (x_n \cdot x_i) + r\right)$ | C, γ and r |
| 4 | Polynomial | $K(x_n, x_i) = \left(\gamma (x_n \cdot x_i) + r\right)^d$ | C, γ, r and d |

Note: C: cost; γ: gamma; r: coefficient; d: degree.
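The four kernels of Table 1 can be written directly as functions. This is an illustrative sketch. For reference, the e1071 package used in this study names the same parameters cost (C), gamma (γ), coef0 (r) and degree (d). The two feature vectors in the usage lines are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    """Table 1, row 1: K = x_n . x_i (only the cost C is tuned)."""
    return dot(u, v)


def rbf(u, v, gamma):
    """Table 1, row 2: K = exp(-gamma * ||x_n - x_i||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid(u, v, gamma, r):
    """Table 1, row 3: K = tanh(gamma * (x_n . x_i) + r)."""
    return math.tanh(gamma * dot(u, v) + r)


def polynomial(u, v, gamma, r, d):
    """Table 1, row 4: K = (gamma * (x_n . x_i) + r)^d."""
    return (gamma * dot(u, v) + r) ** d


# Usage with two hypothetical (energy, entropy) feature vectors.
p, q = (0.2, 3.1), (0.4, 2.9)
values = {
    "linear": linear(p, q),
    "rbf": rbf(p, q, gamma=0.5),
    "sigmoid": sigmoid(p, q, gamma=0.5, r=0.25),
    "polynomial": polynomial(p, q, gamma=0.5, r=0.25, d=3),
}
```

Only the RBF kernel depends on the distance between the two vectors. The other three depend on their dot product, which is why γ and r act as scale and shift of that product.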
The core of this stage is to determine the optimal parameter values (i.e., C, γ, r and d) for each kernel, so that the classifier can accurately predict unknown data. In this study, we use the grid-search method to tune the kernel parameters. First, we set the lower and upper bounds of the grid as follows: $C \in \{2^{-15}, 2^{-14}, \ldots, 2^{2}\}$, $\gamma \in \{2^{-10}, 2^{-9}, \ldots, 2^{2}\}$, $r \in \{2^{-10}, 2^{-9}, \ldots, 2^{2}\}$ and $d \in \{0, 1, 2, 3\}$. The method evaluates every parameter combination in the grid. It returns the combination with the minimum classification error, which is then used to build the classification model. In this study, the classification error of each combination is estimated with 10-fold cross-validation on the training data set. The grid search is straightforward but may seem naive. We use it for the following reasons: (1) psychologically, we may not feel safe using approximation methods or heuristics that avoid an exhaustive parameter search; (2) the computational time needed to find the optimal parameter values by grid search is not much more than that of advanced methods [37]; (3) the grid search can be easily parallelized because each parameter combination is evaluated independently [38]. In addition, many advanced algorithms are themselves iterative, and the grid search can likewise be viewed as an iterative method.

2.5. Classifier Evaluation

Before predicting whether the wood is infested by termites, we need to train the classifier. The training data set contains the characteristics of experimental samples of known class. Next, with the same data set, we evaluate the performance of the classification models. In this study, we used the receiver operating characteristic (ROC) curve for the evaluation.
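The grid search described in Section 2.4 can be sketched as follows. The code enumerates the stated parameter grids and keeps the combination with the lowest cross-validation error. The error function is a hypothetical placeholder; in the study, e1071 performs the SVM training and 10-fold cross-validation. The shuffled fold split with a fixed seed is our assumption.

```python
import itertools
import random

# Grids from Section 2.4; range(-15, 3) yields exponents -15..2 inclusive.
C_GRID = [2.0 ** k for k in range(-15, 3)]
GAMMA_GRID = [2.0 ** k for k in range(-10, 3)]
R_GRID = [2.0 ** k for k in range(-10, 3)]
D_GRID = [0, 1, 2, 3]


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search(cv_error, grids):
    """Try every combination in `grids`; keep the one with the lowest error.

    cv_error is a hypothetical callable mapping a parameter dict to its
    10-fold cross-validation error (e.g. train an SVM on 9 folds, test on
    the remaining one, and average over the 10 rounds).
    """
    names = list(grids)
    best_params, best_err = None, float("inf")
    for values in itertools.product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        err = cv_error(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


# Grid sizes per kernel: RBF 18 * 13 = 234, sigmoid 18 * 13 * 13 = 3042,
# polynomial 18 * 13 * 13 * 4 = 12168 combinations.
rbf_grid = {"C": C_GRID, "gamma": GAMMA_GRID}
folds = k_fold_indices(100)  # 50 samples per class -> 10 folds of 10
```

Each combination is scored independently inside the itertools.product loop. That loop is therefore the part that could be parallelized, as noted in reason (3).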
A ROC curve depicts the trade-off between sensitivity, or true positive rate (TP_rate), on the y axis and 1 − specificity, or false positive rate (FP_rate), on the x axis. It is useful for choosing the best cut-off for classification [39]. The most common quantitative index of accuracy is the area under the ROC curve (AUC), which provides a useful parameter for assessing and comparing classifiers. The AUC is calculated with Equation (8) from the outputs f(x_i) (Equation (7)) on the training data set, separately for each kernel function. Table 2 summarizes the grading system for AUC.

$$\mathrm{AUC} = \frac{1 + TP_{rate} - FP_{rate}}{2} \tag{8}$$
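As a sketch of Equation (8), the code below computes TP_rate and FP_rate from hypothetical SVM outputs f(x_i) at a cut-off of 0 and applies the formula. For comparison, it also integrates the full ROC curve over all cut-offs. The two values generally differ. Equation (8) is the area under a ROC curve through a single operating point, which for these scores gives 2/3, whereas the full curve gives 8/9.

```python
def rates(scores, labels, threshold=0.0):
    """TP_rate and FP_rate when f(x_i) >= threshold is predicted as infested (+1)."""
    tp = fp = pos = neg = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if y == 1:
            pos += 1
            tp += predicted_positive
        else:
            neg += 1
            fp += predicted_positive
    return tp / pos, fp / neg


def auc_eq8(scores, labels, threshold=0.0):
    """Eq. (8): AUC = (1 + TP_rate - FP_rate) / 2 at a single cut-off."""
    tpr, fpr = rates(scores, labels, threshold)
    return (1 + tpr - fpr) / 2


def auc_full_roc(scores, labels):
    """Area under the full ROC curve: use every distinct score as the
    cut-off (highest first) and integrate with the trapezoidal rule."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpr, fpr = rates(scores, labels, t)
        points.append((fpr, tpr))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical outputs f(x_i): three infested (+1) and three uninfested (-1) samples.
scores = [0.9, 0.4, -0.2, 0.3, -0.8, -1.1]
labels = [1, 1, 1, -1, -1, -1]
```

The lowest cut-off classifies every sample as positive, so the swept curve always ends at (1, 1).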
Table 2. Grading system of accuracy in area under the ROC curve (AUC).

| AUC Range | Description |
|---|---|
| 0.9 < AUC < 1.0 | Excellent |
| 0.8 < AUC < 0.9 | Good |
| 0.7 < AUC < 0.8 | Worthless |
| 0.6 < AUC < 0.7 | Not good |
The ROC curve, summarized by its AUC, is a better tool for visualizing and evaluating classifiers than scalar measures such as accuracy, error rate or error cost [40,41]. Its advantage is that it allows classifier performance to be visualized and organized without regard to class distributions or error costs. This characteristic becomes very important when investigating learning with skewed class distributions or cost-sensitive learning. According to Bradley [42], the ROC curve offers some desirable properties for measuring classification performance. In particular, it indicates how well the decision index separates the negative and positive classes.

3. Results

3.1. Acoustic Signal Dispersion

Figure 4 shows two-dimensional (2D) plots of the features extracted from the acoustic signals of the two groups, i.e., the infested and uninfested groups. Visually, the data of the two groups overlap, so they are difficult to separate with a linear hyperplane, and this overlap causes errors in the classification process. In such a case, a kernel function is required to transform the data into a higher dimensional space, so that the acoustic signal characteristics of the two groups can be separated easily.
Figure 4. Sample of 2D plots of the feature extraction from acoustic signal acquisition.
3.2. Grid-Search Optimization

This section presents the grid-search method for parameter optimization in each kernel function. The experiments were completed with the e1071 package in RStudio. To build the optimal hyperplane model, we used a total of 50 data sets per class for the training process, each including the two extracted features, i.e., entropy and energy. To visualize how the grid search is employed, we give an example with a kernel that has two parameters to optimize (C and γ), i.e., RBF. Its graphical display is shown in Figure 5. Based on the 10-fold cross-validation results, the grid search successfully finds the optimal pair of both parameters located in the blue zone (with error