Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 125 (2018) 902–909
www.elsevier.com/locate/procedia
doi: 10.1016/j.procs.2017.12.115
6th International Conference on Smart Computing and Communications, ICSCC 2017, 7-8 December 2017, Kurukshetra, India

A Novel Subset Feature Selection Framework for Increasing the Classification Performance of SONAR Targets
Sai Prasad Potharaju*, M. Sreedevi
Department of CSE, K L University, Guntur (AP), India-522502
* Corresponding author. Tel.: +91-9130002554. E-mail address: [email protected]
Abstract

Proposing a strong subset of features for a classifier to detect SONAR (sound navigation and ranging) targets is a valuable problem in sonar data analysis for the safety of naval vessels. In this article, we introduce a novel, generalized feature selection framework that determines the best subsets of features for classifying the sonar dataset based on Symmetrical Uncertainty (SU). Using the proposed framework, we form 'M' candidate subsets of features, each comprising a finite number of features without any repetition. The resulting subsets are analyzed using various tree, lazy, Bayesian, and rule-based classifiers. Because each subset formed by the proposed method contains a limited number of features, an equal number of top features extracted by existing filter-based feature selection methods (Chi-Squared (Chi), Information Gain (IG), Gain Ratio (GR), and ReliefF (Rel)) is considered and analyzed with the same classifiers. Careful investigation of the results obtained by the existing and proposed methods shows that at least one candidate subset of features outperforms some of the existing methods. For this experiment, a real-world SONAR dataset from the UCI machine learning repository is considered. The framework is also applied to other benchmark datasets to demonstrate its generality.

© 2018 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 6th International Conference on Smart Computing and Communications.

Keywords: Data mining; Feature selection; Classification; SONAR; Symmetrical Uncertainty
1. Introduction

Data Mining (DM) is an interdisciplinary technique used to extract unknown facts from historical datasets. DM techniques have recently become popular in civil, mechanical, electrical, and other branches of engineering for several reasons: the size of datasets, along with the amount of memory they require, is increasing rapidly because of advances in technology, and DM techniques make it possible to discover the information hidden in these datasets [1]. It is not easy for a human to analyze huge datasets, so an artificial expert system for extracting knowledge from the available data is mandatory [2]. DM techniques provide a rapid route to in-depth analysis of products and phenomena [3]. DM methods also help in decision making, for example in fraud detection, intrusion detection, power
demand forecasting, and many more [4]. An expensive experiment need not be performed when DM is used, because DM methods work purely on the basis of already collected data [5]. DM has a wide scope and can be applied to a wide variety of problems [6]. It offers specific methods for extracting knowledge through a step-by-step process [7]. Based on implementation, operation, and scalability of the data, DM methods are classified as: 1. description, 2. classification, 3. regression, 4. clustering, and 5. association. Description deals with graphical methods for representing the dataset as well as with its statistical measures; in this research, one such statistical measure, Symmetrical Uncertainty (SU), is the key criterion of the proposed framework. Classification is a supervised approach used to decide which class an instance belongs to; for this research, we considered various classification algorithms (Jrip, OneR, Ridor, J48, SimpleCart, NB, KNN) to test the proposed framework. Regression is also a supervised approach and acts like classification, but the model is created on numerical variables. Clustering is an unsupervised method used to group similar objects into one set (cluster). Association is used to discover close relationships between the objects of transactional datasets.

DM is not a direct method for finding interesting patterns. First, data must be gathered from different sources; then preprocessing is performed to remove noisy data, handle missing values, and reduce the dataset. After this stage, a classification model is created by training the classifier. In this paper we focus on the preprocessing task of DM; more precisely, we introduce a novel method for dimensionality reduction. Feature selection (FS) is one preprocessing path to dimensionality reduction [8]. The main objective of FS is to extract the best features, which can create a strong classifier. FS is essential for many reasons. In a dataset, not all features need to be considered for model generation: some features contribute nothing, or cannot add strength to the created model, which means they are not relevant; other features may be duplicated in the dataset. These irrelevant and duplicated features consume more memory; if all features of a dataset are considered, a high computational cost is incurred, and the generated model may be obscured or degraded. It is therefore crucial to identify irrelevant and redundant features and select the best features for classification [9]. FS techniques allow choosing the strong features and minimize the difficulties that can arise from unwanted and duplicated features.

Several intelligent FS methods are available in the literature. They are categorized as filter-based, wrapper-based, and hybrid methods. Filter-based methods assign a rank to each feature based on its information worth; wrapper methods produce candidate subsets of features using a search algorithm; hybrid methods combine both [10]. For this research contribution, we considered filter-based FS methods; their disadvantage is that they ignore interaction with the classifier. In the current work, we propose a framework that shrinks the feature space by selecting unique features, formed into different clusters with a novel approach, to increase classification performance. A small illustration of filter-based ranking is sketched below.
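To make the filter idea concrete, the following sketch scores each feature independently of any classifier and keeps the top k. It is only an illustration, not the paper's WEKA setup: the example dataset, the scikit-learn API, and the use of mutual information (an information-gain-like score) as the ranking criterion are assumptions of this sketch.

# Illustration of filter-based ranking: each feature is scored on its own,
# without consulting the target classifier, and the top-k are retained.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset for the demo
selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("Per-feature scores:", selector.scores_.round(3))
print("Top-5 feature indices:", selector.get_support(indices=True))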
2. Literature

The main objective of sonar systems is to examine the underwater acoustic waves received from various directions by a sensor system and to decide the kind of target noticed in a given direction. Detecting and classifying underwater mines is not an easy task. Data fusion is known to increase the accuracy of detection and classification and has been applied to sonar signal classification; a Gaussian mixture model (GMM) based classifier was proposed to increase the classification accuracy for underwater mines [11]. Because of their parallel processing and adaptive nature, neural networks have been applied to the classification of sonar signals [12]. The authors followed a two-phase strategy: the first phase performs signal preprocessing and feature extraction, and the next phase performs recognition. Wavelet transformation was used in the first phase for feature extraction from the input data, and a neural network was then applied for recognition. For the classification of underwater passive sonar signals, four types of neural network classifiers have been applied: Probabilistic Neural Network (PNN), Learning Vector Quantization (LVQ), Multilayer Perceptron (MLP), and Adaptive Kernel Classifier (AKC) [13]. Association rule mining has been applied to learn intrinsic rules for proper classification using a two-phase algorithm [14]. The Discrete Cosine Transform and linear prediction have been applied for feature extraction from the sonar signal system, followed by a neural network for
classification of sonar signals to minimize the load on the operator [15]. Classification of passive sonar signals based on multiway analysis has also been proposed [16]; in that research, multiway analysis was used for both dimensionality reduction and signal denoising, and irrelevant features were removed using a parallel decomposition method. It is very difficult, and a considerable workload, to detect undersea mines and rocks by manual intervention. A general regression neural network (GRNN) has been used to solve the practical difficulties of classifying underwater objects; to improve classification performance, principal component analysis (PCA) was applied in the feature extraction process [17]. DM techniques are applied in various engineering fields for different reasons: clonal selection was introduced for monitoring a hydraulic brake system [18], a support vector regression model was suggested for yarn quality prediction in textile production management [19], and an ensemble combination of an ANN and a multi-objective genetic algorithm was used for optimization of the resistance spot welding process [20].

The key criterion considered to form the 'M' clusters of features is SU, defined for a pair of features A and B as

SU(A, B) = 2 * IG(A; B) / (H(A) + H(B)),

where H(A) is the entropy of A, H(B) is the entropy of B, and IG(A; B) is the information gain (mutual information) between them. SU takes values in the range [0, 1]: a value of 1 indicates that one feature completely predicts the other, and 0 indicates that the two features are uncorrelated.

3. Methodology

The objective of the proposed framework is to compact the feature space. If the complete dataset consists of 'R' attributes, and the most popular 'S' attributes must be chosen from 'R' without any repetition, then C(R, S) clusters can be formed in total. Testing that many clusters (subsets) for a high-dimensional dataset is not an easy task. Alternatively, filter-based ranking approaches can be used to rank each attribute and then take the most popular 'S' attributes into consideration for analysis. Beyond the attributes chosen by the existing methods, we propose a novel framework for generating candidate subsets of features, given by the following algorithm.

Algorithm
1. Get the rank and SU value of each attribute.
2. Ignore any attribute whose SU value is zero, as it cannot influence model generation.
3. Order all remaining attributes in decreasing order of rank.
4. Define the number of clusters (groups) of features to be formed, M.
5. Calculate the number of features per cluster, FC = n/M, where 'n' is the total number of attributes whose SU value is greater than zero.
6. Arrange the first 'M' features from the list in a row, left to right.
7. Arrange the next 'M' features from the list in the next row, right to left.
8. Repeat steps 6 and 7 until all features are arranged.
9. Group all features in the first column into the 1st cluster, all features in the second column into the 2nd cluster, and so on.
10. If the numbers of features in the clusters are not equal, remove the last feature from the cluster that has more features.
11. Stop.

Using the above methodology, if the number of clusters is 3, 4, or 5, at most 33%, 25%, or 20% of the total features are selected, respectively, and at least one cluster can give better performance than the conventional methods; hence the search time for the best features is reduced. A runnable sketch of the SU measure and of this clustering procedure is given below.
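The sketch below is a minimal Python rendering of the SU measure and of steps 3-10 of the algorithm above. It assumes the features have already been discretized (the paper does not state its discretization scheme) and that the rank-ordered list of feature IDs is available; the function and variable names are ours, not the paper's.

from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H(X) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(a, b):
    """SU(A, B) = 2 * IG(A; B) / (H(A) + H(B)), in [0, 1].
    IG is computed as H(A) + H(B) - H(A, B)."""
    h_a, h_b = entropy(a), entropy(b)
    ig = h_a + h_b - entropy(list(zip(a, b)))
    return 0.0 if h_a + h_b == 0 else 2.0 * ig / (h_a + h_b)

def snake_clusters(ranked, m):
    """Distribute a rank-ordered feature list over m clusters (steps 3-10):
    rows of m features alternate left-to-right and right-to-left, the
    columns become clusters, and oversized clusters are trimmed at the tail."""
    clusters = [[] for _ in range(m)]
    rows = [ranked[i:i + m] for i in range(0, len(ranked), m)]
    for r, row in enumerate(rows):
        # Even rows fill columns 0..m-1; odd rows fill m-1..0 (snake order).
        cols = range(len(row)) if r % 2 == 0 else range(m - 1, m - 1 - len(row), -1)
        for col, feature in zip(cols, row):
            clusters[col].append(feature)
    size = min(len(c) for c in clusters)   # step 10: equalize cluster sizes
    return [c[:size] for c in clusters]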
Table 1 below lists each feature of the SONAR dataset with its rank and SU value.

Table 1. SU value and rank of each feature of the SONAR dataset.

Rank  SU Value  Feature ID    Rank  SU Value  Feature ID    Rank  SU Value  Feature ID
1     0.2037    11            8     0.1094    45            15    0.0814    36
2     0.1794    12            9     0.1064    44            16    0.0806    46
3     0.1565    9             10    0.1044    47            17    0.0802    21
4     0.144     10            11    0.0964    51            18    0.0793    4
5     0.1347    13            12    0.0957    54            19    0.0792    5
6     0.1145    48            13    0.089     28            20    0.0651    35
7     0.1117    49            14    0.0862    52            21    0.064     20

Total number of features in the initial dataset: 60. Total number of features whose SU > 0 (n): 21. Assume the number of clusters to be formed (M) is 4; then each cluster has 21/4 ≈ 5 features.
It can be noted that these feature selection methods assign different ranks to a few features [21]. According to the proposed methodology, the features in each cluster are formed as per Table 2 below.

Table 2. Formation of features in each cluster by the proposed framework when the number of clusters is 4.

Direction      1st Level  2nd Level  3rd Level  4th Level
Left to Right  11         12         9          10
Right to Left  45         49         48         13
Left to Right  44         47         51         54
Right to Left  46         36         52         28
Left to Right  21         4          5          35
Right to Left  -          -          -          20
Cluster ID     IS41       IS42       IS43       IS44
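As a sanity check, feeding the 21 rank-ordered feature IDs of Table 1 into the snake_clusters sketch from Section 3 reproduces Table 2:

ranked = [11, 12, 9, 10, 13, 48, 49, 45, 44, 47, 51,
          54, 28, 52, 36, 46, 21, 4, 5, 35, 20]   # Table 1 order
print(snake_clusters(ranked, 4))
# [[11, 45, 44, 46, 21],   # IS41
#  [12, 49, 47, 36, 4],    # IS42
#  [9, 48, 51, 52, 5],     # IS43
#  [10, 13, 54, 28, 35]]   # IS44 (extra feature 20 trimmed)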
From the above Table 2, the IS44 cluster has an additional attribute, feature ID 20, which has to be discarded from that cluster. After this step, all first-column features are stored in cluster IS41, second-column features in IS42, third-column features in IS43, and fourth-column features in IS44, except feature 20, since IS44 has one feature more than the other clusters.

4. Experiment

To test and analyze the capability of the proposed method, we considered numbers of clusters M = 3, 4, and 5. The candidate subsets of features in each cluster are listed in Table 3.

Table 3. Candidate subsets of features, where the numbers of clusters are 3, 4, and 5.

3 clusters
Cluster ID  Features in it
IS31        11, 48, 49, 54, 28, 4, 5
IS32        12, 13, 45, 51, 52, 21, 35
IS33        9, 10, 44, 47, 36, 46, 20
IG@         11, 12, 9, 10, 13, 48, 49
GR@         11, 12, 9, 44, 13, 54, 10
Chi@        11, 12, 9, 10, 13, 48, 49
Rel@        21, 11, 10, 36, 9, 45, 48

4 clusters
Cluster ID  Features in it
IS41        11, 45, 44, 46, 21
IS42        12, 49, 47, 36, 4
IS43        9, 48, 51, 52, 5
IS44        10, 13, 54, 28, 35
IG*         11, 12, 9, 10, 13
GR*         11, 12, 9, 44, 13
Chi*        11, 12, 9, 10, 13
Rel*        21, 11, 10, 36, 9

5 clusters
Cluster ID  Features in it
IS51        11, 47, 51, 35
IS52        12, 44, 54, 5
IS53        9, 45, 28, 4
IS54        10, 49, 52, 21
IS55        13, 48, 36, 46
IG^         11, 12, 9, 10
GR^         11, 12, 9, 44
Chi^        11, 12, 9, 10
Rel^        21, 11, 10, 36

@ Top 7 features extracted by the existing methods; * top 5 features; ^ top 4 features [21].

To test the capability of each cluster of attributes, an equal number of top features extracted by the existing techniques (IG, Chi, Rel, GR) is taken into consideration. For example, the IS31, IS32, and IS33 clusters contain 7 features each, so the top 7 features extracted by the existing techniques are chosen when recording the accuracy of those clusters. The remaining subsets are measured in the same way, by analyzing them with the Jrip, OneR, Ridor, J48, SimpleCart, Naive Bayes, and KNN classifiers. The experiment was carried out using the popular machine learning tool WEKA; a rough Python analogue of the evaluation loop is sketched below.
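The following sketch is not the paper's WEKA pipeline. It assumes the SONAR data is available as a CSV file with the class label in the last column, uses 10-fold cross-validation (the evaluation protocol is not stated in the paper), and substitutes scikit-learn analogues where they exist: DecisionTreeClassifier for J48, GaussianNB for Naive Bayes, and KNeighborsClassifier for KNN; Jrip, OneR, Ridor, and SimpleCart have no direct scikit-learn counterparts.

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("sonar.csv")            # hypothetical file name
X, y = df.iloc[:, :-1], df.iloc[:, -1]

subsets = {                              # a few subsets from Table 3 (1-based feature IDs)
    "IS41": [11, 45, 44, 46, 21],
    "IS42": [12, 49, 47, 36, 4],
    "IG*":  [11, 12, 9, 10, 13],
}
classifiers = {
    "J48~": DecisionTreeClassifier(random_state=0),   # rough J48 analogue
    "NB":   GaussianNB(),
    "KNN":  KNeighborsClassifier(n_neighbors=1),
}
for sid, feats in subsets.items():
    Xs = X.iloc[:, [f - 1 for f in feats]]   # 1-based IDs -> 0-based columns
    for cid, clf in classifiers.items():
        acc = cross_val_score(clf, Xs, y, cv=10).mean()
        print(f"{sid:5s} {cid:5s} {100 * acc:5.2f}")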
5. Results and Discussion

The accuracy of each classification algorithm on each subset of features, with the corresponding ranks, is given in this section with discussion. The rank of each subset under a given classifier is indicated after a slash (/). Note that the performance of the existing methods is given in the bottom four rows of each table (Tables 4 to 6).

Table 4. Performance analysis with 3 clusters.

ID      Jrip     OneR     RidoR    J48      SC       NB       KNN
IS31    70.67/6  61.53/3  67.3/6   70.19/6  71.63/5  70.19/1  69.23/5
IS32    73.55/2  61.53/3  68.75/4  73.07/5  72.59/3  67.78/3  75.96/4
IS33    73.07/3  61.53/3  74.03/2  76.92/1  80.28/1  70.19/1  79.32/3
CHI*    71.15/5  59.13/4  71.63/3  74.51/3  72.59/3  63.46/5  80.76/1
GRAE*   72.59/4  62.98/2  65.38/7  73.55/4  72.11/4  66.82/4  75.96/4
IG*     71.15/5  69.23/1  74.51/1  76.92/2  74.03/2  69.23/2  80.28/2
REL*    74.51/1  61.53/3  67.3/5   70.19/6  71.63/5  70.19/1  69.23/5
From Table 4, ReliefF achieved the best performance with Jrip, but the IS32 cluster of features performed better with Jrip than the other three existing methods. Information Gain performed best with OneR, but all the clusters of features formed by the proposed method performed equal to ReliefF and better than the Chi-Squared attribute evaluator. Information Gain recorded the best accuracy with Ridor, but the IS33 cluster of features did better than the remaining existing methods. The IS33 cluster of features boosted the accuracy of the J48, SC, and NB classifiers beyond the existing methods.

From Table 5, all existing methods performed well with Jrip except the Gain Ratio attribute evaluator, and the IS41 cluster of features recorded better results than Gain Ratio. IS42 displayed higher accuracy than all existing methods with OneR and Naive Bayes. The IS43 cluster of features boosted the classification accuracy of RidoR and J48 beyond the existing methods. SimpleCart gave better performance with the IS41 cluster of features than with the existing methods.
Table 5. Performance analysis with 4 clusters.

ID      Jrip     OneR     RidoR    J48      SC       NB       KNN
IS41    73.07/2  62.98/2  69.23/3  71.15/4  74.51/1  62.01/6  72.11/3
IS42    72.11/3  66.82/1  69.71/2  71.63/3  69.71/5  71.63/1  69.23/5
IS43    70.67/4  59.61/4  72.59/1  73.55/1  71.63/3  62.98/5  64.9/7
IS44    69.71/5  62.98/2  68.75/4  69.23/6  69.71/5  71.15/2  74.03/2
CHI*    74.03/1  61.53/3  69.71/2  70.19/5  72.59/2  70.19/3  67.3/6
GRAE*   72.11/3  61.53/3  68.75/4  72.11/2  72.59/2  68.36/4  70.19/4
IG*     74.03/1  61.53/3  69.71/2  70.19/5  72.59/2  70.19/3  67.3/6
REL*    74.03/1  61.53/3  68.26/5  70.19/5  71.15/4  71.15/2  76.44/1
Table 6. Performance analysis with 5 clusters.

ID      Jrip     OneR     RidoR    J48      SC       NB       KNN
IS51    69.71/5  62.98/4  67.3/5   69.23/4  73.07/1  73.55/2  62.5/8
IS52    69.71/5  68.26/1  67.78/4  67.3/6   71.15/4  68.75/5  62.98/7
IS53    66.82/7  63.46/3  73.55/1  71.15/3  68.26/6  65.86/7  69.71/3
IS54    68.26/6  61.05/6  68.26/3  68.26/5  66.34/8  68.26/6  68.26/5
IS55    75.96/1  61.05/6  70.19/2  65.86/7  67.3/7   69.23/4  73.55/1
CHI*    70.19/4  60.53/7  68.26/3  71.15/3  71.63/3  68.75/5  66.34/6
GRAE*   70.67/3  65.13/2  67.78/4  72.11/1  72.11/2  70.67/3  69.23/4
IG*     70.19/4  60.53/7  68.26/3  71.15/3  71.63/3  68.75/5  66.34/6
REL*    74.51/2  62.5/5   70.19/2  71.63/2  70.19/5  74.03/1  73.07/2
* Existing methods.

From Table 6, the IS55 cluster of features formed by the proposed method registered higher accuracy than all existing methods with Jrip and KNN. OneR performed well with IS52, RidoR performed well with IS53, and SimpleCart recorded higher accuracy with IS51 than with all existing methods. The Gain Ratio attribute evaluator displayed better
results than all others with J48, but IS53 competes with the Chi-Squared attribute evaluator and Information Gain. IS51 also gave better accuracy with Naive Bayes than all methods except ReliefF. If all 60 features of the dataset are used to train all the classifiers, a few classifiers display slightly better results than the existing and proposed clusters of features, but the training time to build the model is higher than with any cluster of features; the remaining classifiers record lower accuracy than the clusters of features formed by the proposed and existing methods. The classification accuracy with all 60 features, for each classifier, is given in Table 7.

Table 7. Performance analysis with the total (60) features.

Jrip   OneR  Ridor  J48    SC     NB     KNN
76.92  62.5  73.55  71.15  71.15  67.78  86.53
To further demonstrate the capability of the proposed framework, it was also tested with a few more real-world benchmark datasets belonging to various fields [21].

6. Conclusion

In this research, a novel M-clusters feature selection and ranking framework is introduced. The proposed framework is evaluated using a real-world SONAR dataset. With this framework, 'M' candidate subsets of attributes are formed, each containing a small number of features without any redundancy. All the subsets of features are analyzed with tree, rule, lazy, and Bayes learners, and the performance results are compared with those of the existing filter-based feature selection techniques; a rank is then assigned to each candidate subset according to its performance. The recorded results show that one of the candidate subsets, and in some cases more than one, registered better accuracy than the existing methods. We therefore conclude that, instead of choosing the features extracted by existing methods, the proposed framework can be used, depending on the need, to form subsets of features with better prediction accuracy. In this study we considered 3, 4, and 5 clusters of features, with which at most 33%, 25%, and 20% of the features are selected, respectively; thereby the training time and memory consumption for model generation can be decreased.

References
1. Larose, D. T., Discovering Knowledge in Data: An Introduction to Data Mining, Wiley & Sons, 2005.
2. Rogalewicz, M. and Sika, R., 2016. Methodologies of knowledge discovery from data and data mining methods in mechanical engineering. Management and Production Engineering Review, 7(4), pp. 97-108.
3. Silwattananusarn, T. and Tuamsuk, K., 2012. Data mining and its applications for knowledge management: a literature review from 2007 to 2012. arXiv preprint arXiv:1210.2872.
4. Ngai, E., Hu, Y., Wong, Y., Chen, Y. and Sun, X., The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature, Decision Support Systems, 50, pp. 559-569, 2011.
5. Guyon, I., Cawley, G., Dror, G. and Lemaire, V., 2011. Results of the active learning challenge. In Active Learning and Experimental Design workshop, in conjunction with AISTATS 2010, pp. 19-45.
6. Liao, S., Chu, P. and Hsiao, P., Data mining techniques and application - A decade review from 2000 to 2011, Expert Systems with Applications, 39, pp. 11303-11311, 2012.
7. Azevedo, A. and Santos, M., KDD, SEMMA and CRISP-DM: a parallel overview, IADIS European Conf. on Data Mining, pp. 182-185, 2008.
8. Tang, J., Alelyani, S. and Liu, H., 2014. Feature selection for classification: A review. Data Classification: Algorithms and Applications, p. 37.
9. Sarkar, C., Cooley, S. and Srivastava, J., 2014. Robust feature selection technique using rank aggregation. Applied Artificial Intelligence, 28(3), pp. 243-257.
10. Chandrashekar, G. and Sahin, F., 2014. A survey on feature selection methods. Computers & Electrical Engineering, 40, pp. 16-28. doi:10.1016/j.compeleceng.2013.11.024
11. Kotari, V. and Chang, K.C., 2011. Fusion and Gaussian mixture based classifiers for SONAR data. In SPIE Defense, Security, and Sensing, p. 80500U. International Society for Optics and Photonics.
12. Chin-Hsing, C., Jiann-Der, L. and Ming-Chi, L., 1998. Classification of underwater signals using wavelet transforms and neural networks. Mathematical and Computer Modelling, 27(2), pp. 47-60.
13. Chen, C.H., Lee, J.D. and Lin, M.C., 2000. Classification of underwater signals using neural networks. Tamkang Journal of Science and Engineering, 3(1), pp. 31-48.
14. Chen, R., Li, H. and Tang, S., 2001. Association rules enhanced classification of underwater acoustic signal. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pp. 582-583. IEEE.
15. Hashem, H.F., 2004. Automatic classification of underwater sonar signals. In Neural Network Applications in Electrical Engineering, NEUREL 2004, 7th Seminar on, pp. 121-125. IEEE.
16. Oliveira, R.L., de Lima, B.S. and Ebecken, N.F., 2014. Multiway analysis in data SONAR classification. Mechanical Systems and Signal Processing, 45(2), pp. 531-541.
17. Erkmen, B. and Yıldırım, T., 2008. Improving classification performance of sonar targets by applying general regression neural network with PCA. Expert Systems with Applications, 35(1), pp. 472-475.
18. Jegadeeshwaran, R. and Sugumaran, V., Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA) - a statistical learning approach, Engineering Science and Technology, 18(1), pp. 14-23, 2015.
19. Lu, Z.J., Xiang, Q., Wu, Y. et al., Application of Support Vector Machine and Genetic Algorithm Optimization for Quality Prediction within Complex Industrial Process, Proc. of IEEE 13th Int. Conf. on Industrial Informatics, pp. 98-103, 2015.
20. Pashazadeh, H., Gheisari, Y. and Hamedi, M., 2016. Statistical modeling and optimization of resistance spot welding process parameters using neural networks and multi-objective genetic algorithm. Journal of Intelligent Manufacturing, 27(3), pp. 549-559.
21. https://saiprasadcomp.files.wordpress.com/2017/10/wordpress-reference.pdf