
Available online at www.sciencedirect.com

ScienceDirect

www.elsevier.com/locate/procedia

Procedia Computer Science 116 (2017) 113–120

2nd International Conference on Computer Science and Computational Intelligence 2017, ICCSCI 2017, 13-14 October 2017, Bali, Indonesia

A Comparative Study to Evaluate Filtering Methods for Crime Data Feature Selection

Masita @ Masila Abdul Jalil, Fatihah Mohd*, Noor Maizura Mohamad Noor

School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia.

Abstract

In this study, we present a comparative study of correlation and information gain algorithms to evaluate and produce a subset of crime features. The main objective of the study is to find a subset of attributes from a dataset described by a feature set and to classify the crimes into three different categories: low, medium and high. The experiment is carried out on the communities and crime dataset using WEKA, an open source data mining software. Based on the attributes chosen by five feature selection methods, the accuracy rates of several classification algorithms were obtained for analysis. The results from the experiment demonstrated that the correlation method outperformed information gain and the human expert, with a mean accuracy of 96.94% across all classifiers and FS methods, using 13 selected features. This feature subset is important information for classification and can be effectively applied to the crime dataset to predict the crime category for different states and directly support decision making in crime prevention systems.

© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the 2nd International Conference on Computer Science and Computational Intelligence 2017.

Keywords: Correlation; crime prevention; feature selection; filter method

* Corresponding author. Tel.: +609-668-3274; fax: +609-669-3326
E-mail address: [email protected]

1877-0509 © 2017 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2017.10.018

Masita @ Masila Abdul Jalil et al. / Procedia Computer Science 116 (2017) 113–120

1. Introduction

Crime prevention refers to a series of programs that involve individuals, communities, businesses, non-government organizations and all levels of government in addressing the various social and environmental factors that contribute to the risk of crime, disorder and victimization in a community [1, 2, 3]. Approaches to crime prevention differ in the focus of the intervention, the types of activities organized, and the mechanisms applied, and they cover environmental, social and economic, and criminal justice system features. All of the approaches aim to reduce the opportunities for crime to occur in community environments [4]. These features may include race and ethnic categories (white American, black African and Asian), income class (low, middle and high income), age categories, family structure (single, married partners, unwed partners, parents with kids), education level (primary, secondary, university), characteristics of the town or locality in which people live (housing price, types of house, home size), the number of law enforcement officers assigned to a town, the number of people working, the unemployment rate and others [5].

Because of the huge number of features included in the communities and crime data, many factors may affect the outcome of a crime prediction system. Thus, selecting the most relevant features and information is critical to improving the accuracy of prediction systems. Feature selection (FS) is a method of discovering the relevant features and removing the irrelevant ones, often motivated by improving the performance of the learning algorithm. FS can also provide insight into the process and reduce the data, storage and cost. There are two main models of feature selection: filter methods and wrapper methods.
While filter models rely on the general characteristics of the training data to select features independently of any predictor, wrapper models involve optimizing a predictor as part of the selection process [6]. In this paper, a number of filter methods are applied to crime data with different numbers of relevant features. The results obtained for the filters studied (correlation attribute evaluator, correlation-based feature subset evaluator and information gain) are compared and discussed.

This paper is organized as follows. Related works are discussed in Section 2. Section 3 describes the materials and methods used, covering the communities and crime dataset, its preprocessing, and the feature selection methods used in this study. In Section 4, the experiments and the results produced by the feature selection methods are presented and discussed. Finally, the study is concluded in Section 5.

2. Related Works

For many years, various studies have been carried out on the communities and crime data. Buczak and Gifford [7] discovered relationships between various crime attributes by applying fuzzy association rule mining to crime pattern discovery. Halawa [8] applied a multilayer perceptron to the communities and crime dataset to predict the number of crimes (per capita violent crime). Iqbal et al. [5] also used the crime dataset from the UC Irvine (UCI) machine learning repository for crime prediction. They selected attributes manually, based on a human expert.
From 128 attributes, only twelve (12) are chosen, namely country state, median family income, median household income, per capita income, number of people under the federal poverty level, percentage of people 25 and over with less than a 9th grade education, percentage of people 25 and over that are not high school graduates, percentage of people 25 and over with a bachelor’s degree or higher education, percentage of people 16 and over in the labor force and unemployed, percentage of people 16 and over who are employed, population of community, and total number of violent crimes per 100K population. In their experiments, they found that the decision tree performed better than naïve Bayes on the crime dataset with the twelve (12) selected features. Anuar et al. [9] applied particle swarm optimization (PSO) as an FS method to the communities and crime dataset. They proposed a hybrid crime classification model for crime prediction by combining an artificial neural network (ANN), PSO and grey relation analysis (GRA). The study aimed to identify the significant features of specific crimes and to classify the crimes into three (3) different categories. Another study developed a crime location forecasting method that computes probabilities on socio-economic factors and uses the frequent closed item set lattice (FCIL) algorithm to locate crimes using the UCI data [10]. FS methods have also been explored as hybrid algorithms to obtain the optimum selected features, such as information gain combined with sequential backward floating search [11], and the generalized F-score (GF) hybridized with sequential forward search (SFS), sequential forward floating search (SFFS) and sequential backward floating search (SBFS) [12]. To further improve the study of the crime dataset, we explore the crime features using the filtering method. Correlation was selected as the filter feature selection method based on the encouraging experimental results of Mohd et al. [13]. The details of this study are discussed in the next section.

3. Materials and methods

This section first describes the crime dataset and its preprocessing, and then elaborates on the feature selection methods used in this study. Figure 1 illustrates the overall process involved in the implementation of the proposed model. Each process is explained in detail in the following subsections.

[Figure 1 (flowchart). Box labels: Crime Dataset; Data Pre-processing; Set of all features; Features Filtering (selecting the subset features); Machine Learning Algorithm (CA1 to CA8); Evaluate Performance; The Best Subset Features.]

Fig. 1. Implementation of filter methods for subset features.

3.1. Crime dataset collection

The crime dataset used in the study is acquired from the UCI machine learning repository [14, 15]. The dataset is titled Communities and Crime. It contains a total of 128 attributes and 1994 instances. All data provided in this dataset are numeric and normalized. The variables in the dataset describe the community, such as the percentage of the population considered urban and the median household income, and law enforcement, such as the number of police officers per 100K population and the number of officers allocated to special crime units. The per capita violent crimes variable was calculated using the population and the sum of the crime variables considered violent crimes in every state.

3.2. Crime dataset pre-processing

The dataset consists of 128 attributes and 1994 instances, with missing values. Data preparation is important to improve data quality in classification. Thus, the following data preparation process, based on the study by Iqbal et al. [5], was carried out to obtain the final set of attributes in a form suitable for the subsequent analysis and processing.

• All attributes with a large number of missing values were removed.
• A new nominal attribute named crime category was created based on the total number of violent crimes per 100K population (violent crimes per pop). If the value is less than 0.25, the crime category is low. If the value is equal to or greater than 0.25 (and less than 0.40), the crime category is medium. If the value is equal to or greater than 0.40, the crime category is high. The final counts of the crime categories are 1315 low, 386 medium and 293 high.
• The final number of attributes after data preparation is 104. These attributes are passed to the feature selection process.
• All data are normalized into the range [0, 1] using the min-max method.
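The two pre-processing rules above (deriving the crime category from the violent crime rate, and min-max scaling) can be sketched as follows. This is a minimal illustration in plain Python, not the WEKA filters used in the study. The function names and the toy input values are hypothetical. Only the 0.25 and 0.40 cut-offs come from the text.

```python
from collections import Counter

LOW_MEDIUM_CUTOFF = 0.25   # cut-off stated in Section 3.2
MEDIUM_HIGH_CUTOFF = 0.40  # cut-off stated in Section 3.2


def crime_category(violent_crimes_per_pop: float) -> str:
    """Map a normalized violent-crime rate to low / medium / high."""
    if violent_crimes_per_pop < LOW_MEDIUM_CUTOFF:
        return "low"
    if violent_crimes_per_pop < MEDIUM_HIGH_CUTOFF:
        return "medium"
    return "high"


def min_max_normalize(values):
    """Rescale numbers into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy values, not taken from the dataset.
sample = [0.03, 0.24, 0.25, 0.39, 0.40, 0.97]
labels = [crime_category(v) for v in sample]
counts = Counter(labels)
scaled = min_max_normalize([12.0, 15.0, 18.0])
```

Values exactly on a cut-off fall into the higher category, matching the "equal to or greater than" wording of the rule.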

3.3. Feature selection methods used

Feature selection is useful in reducing dimensionality and removing irrelevant data and noise to improve results. It directly reduces the number of original features by selecting a subset that retains the optimum information for classification [16]. FS algorithms are divided into two (2) categories: filter and wrapper methods. The filter method offers higher computational efficiency than the wrapper method [17]. Therefore, in this study filter methods were chosen for attribute selection, namely the correlation attribute evaluator (CorrelationAttributeEval), the correlation-based feature subset evaluator (CfsSubsetEval) and information gain (InfoGainAttributeEval).

The correlation method evaluates the worth of a feature by measuring the correlation between it and the class. Nominal features are considered on a value-by-value basis by treating each value as an indicator. An overall correlation for a nominal feature is obtained via a weighted average. The correlation-based feature subset evaluator, in contrast, evaluates the worth of a subset of features by considering the individual predictive ability of each feature together with the degree of redundancy between them. Subsets of features that are highly correlated with the class while having low inter-correlation are preferred. Information gain evaluates the worth of an attribute by measuring the information gain with respect to the class. The attributes were searched using the following algorithms: best first forward or sequential forward FS (SFFS), greedy stepwise and ranker. All the experimental work in this study was conducted in the Waikato Environment for Knowledge Analysis (WEKA) software, a data mining tool, with 10-fold cross-validation.

4. Experimental results and discussion

This section presents and describes the findings of the experiments in this study.

4.1. Feature selection results

Table 1 presents the result of feature selection for each method.
There are four (4) feature selection methods proposed in the experimental work, denoted FS1 to FS4. Both FS1 and FS2 use the correlation-based feature subset evaluator (CfsSubsetEval), with best first forward and greedy stepwise as the respective search algorithms. FS3 is a hybrid FS method that combines information gain with the correlation-based feature subset evaluator, whereas FS4 is a hybrid FS method that combines the correlation attribute evaluator with the correlation-based feature subset evaluator. Finally, a method from a previous study [5], based on a human expert and comprising twelve (12) features, is also included in the experimental work as FS5. The results show that FS1 selects fifteen (15) features, while FS2 and FS3 each select sixteen (16). FS4 selects the fewest features of the proposed methods, with thirteen (13) attributes.

Table 1: Selected feature number with FS methods.

| FS | Feature Selection Method | Feature Number |
|---|---|---|
| FS1 | CfsSubSetEval with Best First Forward | 15 |
| FS2 | CfsSubSetEval with Greedy Step Wise | 16 |
| FS3 | Information Gain Attribute Evaluator with Ranker, combined with CfsSubSetEval with Linear Forward Selection (LFS) | 16 |
| FS4 | Correlation Attribute Evaluator with Ranker, combined with CfsSubSetEval with Linear Forward Selection (LFS) | 13 |
| FS5 | Human expert | 12 |
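The three filter criteria behind the methods in Table 1 can be illustrated with a short, self-contained sketch. It is not WEKA's implementation. It assumes numeric inputs for the Pearson correlation, already discretized features for information gain, and the commonly cited CFS merit heuristic, k·r_cf / sqrt(k + k(k−1)·r_ff), for scoring a subset. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(feature, labels):
    """H(class) - H(class | feature) for an already discretized feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Subset merit: high class correlation is rewarded, redundancy penalized."""
    k = len(feature_class_corrs)
    r_cf = sum(abs(r) for r in feature_class_corrs) / k
    r_ff = (sum(abs(r) for r in feature_feature_corrs) / len(feature_feature_corrs)
            if feature_feature_corrs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data, not taken from the crime dataset.
labels = ["low", "low", "high", "high"]
informative = ["a", "a", "b", "b"]   # perfectly separates the classes
noise = ["a", "b", "a", "b"]         # carries no class information

gain_informative = info_gain(informative, labels)
gain_noise = info_gain(noise, labels)
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
merit_independent = cfs_merit([0.8, 0.8], [0.0])
merit_redundant = cfs_merit([0.8, 0.8], [1.0])
```

In the toy example two features with the same class correlation score a lower merit when they are fully redundant with each other, which is the behaviour the subset evaluator relies on.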


In order to evaluate the performance of each feature subset, this study applied eight (8) classification algorithms (CA): naïve Bayes, multilayer perceptron, support vector machine, logistic, multi-class classifier, random forest, decision stump, and random tree, using the data mining tool WEKA. The results of the algorithms are used to measure accuracy, defined as the percentage of instances classified correctly by the classifier. Finally, the best result identifies the best subset of features for this particular type of dataset. Table 2 shows the overall performance of the feature subset methods and the respective classification algorithms used on the crime dataset. The results for all FS methods (FS1 to FS5) with eight (8) classifiers show that the selected features yield more than 90% correctly classified instances for almost every classification algorithm used on the crime dataset, the exception being decision stump (CA7) with only 85.31%.

Table 2: Performance measurement for FS based on classification algorithm (percentage of correctly classified instances).

| Classification Algorithm (CA) | FS1 (15) | FS2 (16) | FS3 (16) | FS4 (13) | FS5 (12) |
|---|---|---|---|---|---|
| Naïve Bayes (CA1) | 91.1234 | 90.8225 | 90.8225 | 90.9729 | 90.9729 |
| Multilayer Perceptron (CA2) | 98.9468 | 98.5958 | 90.5958 | 98.6961 | 98.5958 |
| Support Vector Machine (CA3) | 95.6369 | 95.5366 | 95.5366 | 95.7372 | 96.0883 |
| Logistic (CA4) | 99.3480 | 99.3480 | 99.3480 | 99.4985 | 99.5486 |
| MultiClassClassifier (CA5) | 99.3480 | 99.3982 | 99.3982 | 99.6991 | 99.3480 |
| RandomForest (CA6) | 100.00 | 100.00 | 100.00 | 100.00 | 99.8997 |
| DecisionStump (CA7) | 85.3059 | 85.3059 | 85.3059 | 85.3059 | 85.3059 |
| RandomTree (CA8) | 96.3892 | 95.9880 | 95.9880 | 97.6429 | 95.0853 |
| Average Correctly Classified | 95.7623 | 95.6244 | 94.6244 | 95.9441 | 95.6056 |
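The "Average Correctly Classified" row can be recomputed from the per-classifier accuracies in Table 2, as in the sketch below. The second part is an arithmetic observation rather than the authors' explanation. It assumes that a single-split decision stump can output at most two of the three classes, and applies that to the class counts from Section 3.2. The resulting bound coincides with the CA7 figure reported for every subset. All variable and function names are hypothetical.

```python
# Per-classifier accuracies (%) copied from Table 2, in row order CA1..CA8.
TABLE_2 = {
    "FS1": [91.1234, 98.9468, 95.6369, 99.3480, 99.3480, 100.00, 85.3059, 96.3892],
    "FS2": [90.8225, 98.5958, 95.5366, 99.3480, 99.3982, 100.00, 85.3059, 95.9880],
    "FS3": [90.8225, 90.5958, 95.5366, 99.3480, 99.3982, 100.00, 85.3059, 95.9880],
    "FS4": [90.9729, 98.6961, 95.7372, 99.4985, 99.6991, 100.00, 85.3059, 97.6429],
    "FS5": [90.9729, 98.5958, 96.0883, 99.5486, 99.3480, 99.8997, 85.3059, 95.0853],
}

# Class counts reported in Section 3.2 (they sum to the 1994 instances).
CLASS_COUNTS = {"low": 1315, "medium": 386, "high": 293}


def column_means(table):
    """Mean accuracy of each FS method over the eight classifiers."""
    return {fs: sum(values) / len(values) for fs, values in table.items()}


def two_class_accuracy_bound(counts):
    """Best accuracy (%) for a predictor limited to two output classes.

    Assumption: such a predictor can at most get every instance of the
    two largest classes right and misses the remaining class entirely.
    """
    total = sum(counts.values())
    two_largest = sorted(counts.values(), reverse=True)[:2]
    return 100.0 * sum(two_largest) / total


means = column_means(TABLE_2)
best = max(means, key=means.get)
stump_bound = two_class_accuracy_bound(CLASS_COUNTS)

for fs, mean in means.items():
    print(f"{fs}: {mean:.4f}")
print(f"highest mean: {best}")
print(f"two-class bound: {stump_bound:.4f}")
```

Keeping the table as a plain dictionary keyed by FS method makes it easy to rerun the same comparison if further classifiers are added as extra list entries.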

The empirical comparison between the five (5) FS methods across all classifier algorithms shown in Table 2 is also presented as a graph (see Figure 2), which illustrates the mean accuracy of all classifiers for FS1 to FS5. FS3 shows the lowest accuracy of all the methods, with a mean of 94.62%. The other proposed filter FS methods, FS1, FS2 and FS4, produced higher classification accuracy than the human expert method, FS5 (95.61%).

Fig. 2. Comparison between different measures of the feature selection methods


Generally, the proposed method FS4 gives the highest accuracy, with a mean of 96.94% across all classifiers and FS methods, using thirteen (13) selected features. The chosen features are US state, percentage of population that is Caucasian, per capita income for people with other heritage, percentage of people 25 and over with a bachelor’s degree or higher education, percentage of kids in family housing with two parents, percentage of moms of kids under 18 in labour force, percentage of kids born to never married, percentage of immigrants who immigrated within last 8 years, percent of family households that are large (6 or more), median gross rent as a percentage of household income, number of homeless people counted in the street, population density in persons per square mile, and total number of violent crimes per 100K population (see Table 3).

Table 3: Crime dataset features

| No. | Features | Data Type | Description |
|---|---|---|---|
| 1. | State | Numeric | state |
| 2. | racePctWhite | Numeric - decimal | percentage of population that is Caucasian |
| 3. | OtherPerCap | Numeric - decimal | per capita income for people with other heritage |
| 4. | PctBSorMore | Numeric - decimal | percentage of people 25 and over with a bachelor’s degree or higher education |
| 5. | PctKids2Par | Numeric - decimal | percentage of kids in family housing with two parents |
| 6. | PctWorkMom | Numeric - decimal | percentage of moms of kids under 18 in labor force |
| 7. | PctIlleg | Numeric - decimal | percentage of kids born to never married |
| 8. | PctImmigRec8 | Numeric - decimal | percentage of immigrants who immigrated within last 8 years |
| 9. | PctLargHouseFam | Numeric - decimal | percent of family households that are large (6 or more) |
| 10. | MedRentPctHousInc | Numeric - decimal | median gross rent as a percentage of household income |
| 11. | NumStreet | Numeric - decimal | number of homeless people counted in the street |
| 12. | PopDens | Numeric - decimal | population density in persons per square mile |
| 13. | ViolentCrimesPerPop | Numeric - decimal | total number of violent crimes per 100K population |
| 14. | Crime Category | Nominal | crime categorization into three categories, namely low, medium, and high (goal attribute to be predicted) |

In the experimental work, the results also show that eight (8) features score highest among those selected by FS1 to FS4 when using the correlation-based feature subset evaluator (CfsSubsetEval). These features are:

• state
• percentage of the population that is Caucasian
• percent of family households that are large (6 or more)
• median gross rent as a percentage of household income
• percentage of people 16 and over who are employed in professional services
• percentage of kids in family housing with two parents
• number of homeless people counted on the street
• total number of violent crimes per 100K population

Moreover, this study found that the most significant feature in crime category classification for FS1 to FS4 is the number of homeless people counted on the street (NumStreet). This differs from the subset produced by FS5, which is based on human understanding and intellect [5]. As a result, the feature subsets proposed in this study are concluded to be the most significant features, adding value by improving the accuracy of crime category classification. The accuracy results (see Figure 2) also show that most of the feature selection methods proposed in this study performed better than FS5 in predicting all the classes.

We also compared our results with feature selection techniques available in other studies. Iqbal et al. [5] used a human expert to select the significant features, producing twelve (12) selected features. With these features, they achieved 84% correctly classified instances with a decision tree and 70.8% with naïve Bayes. Anuar et al. [18] showed that their study outperformed the human expert method using particle swarm optimization, which also produced twelve (12) significant features. They achieved 86.48% accuracy by applying an artificial neural network (ANN) with the artificial bee colony (ABC) algorithm (codenamed ANN-ABC). In this study, the human expert selection method [5] is also used for feature selection. The experimental results show that most of the classification algorithms improved on the accuracy reported by Iqbal et al. [5], with a mean of 95.61%. Moreover, with the proposed FS method that combines correlation and CfsSubSetEval with linear forward selection, the tested classification algorithms correctly classified a mean of 96.94% of the dataset with thirteen (13) selected features. This finding indicates that this study improves on the others in terms of accuracy.

5. Conclusion

This paper presents a comparative study in crime classification modelling. Correlation and information gain are used as feature selection methods to identify the significant features for the different crime categories: low, medium and high. Linear forward selection, greedy stepwise and ranker are applied to rank the significant features of each type of crime. Machine learning algorithms are used to classify each crime into three (3) different categories: low, medium and high. The experimental results indicate that the proposed method, a hybrid FS that combines the correlation attribute evaluator with ranker and the correlation subset evaluator with linear forward selection (FS4), is an acceptable model for analysing crime data with thirteen (13) significant features. This feature subset is important information for classification and can be effectively applied to the crime dataset to predict the crime category for different states. This would help the police to prioritize the important information for specific types of crime and directly support decision making in crime prevention systems. Future work includes applying other classification algorithms to the crime dataset using the proposed feature subset.
Acknowledgements

This study has been supported in part by the Fundamental Research Grant Scheme (FRGS) with vot number 59394, under the Malaysia Ministry of Higher Education (MOHE) and Universiti Malaysia Terengganu (UMT). The authors would like to acknowledge all contributors who have provided their assistance in the completion of the study and the anonymous reviewers of this paper. Their useful comments have played a significant role in improving the quality of this work. The authors would also like to thank the Director of the Criminal Investigation Department, Royal Police Malaysia (RPM), for the continuous support given.

References

1. Cameron M, MacDougall CJ. Crime prevention through sport and physical activity. Canberra: Australian Institute of Criminology; 2000.
2. Hughes G, Edwards A. Crime Control and Community. London: Taylor & Francis Ltd; 2013.
3. Homel PJ. The whole of government approach to crime prevention. Canberra: Australian Institute of Criminology; 2004.
4. Clarke RV. Situational crime prevention: Theory and practice. The British Journal of Criminology. 1980; 20(2): p. 136-47.
5. Iqbal R, Azmi Murad MA, Mustapha A, Payam Hassany SP, Khanahmadliravi N. An experimental study of classification algorithms for crime prediction. Indian Journal of Science and Technology. 2013; 6(3): p. 4219-422.
6. Sanchez-Marono N, Alonso-Betanzos, Tombilla-Sanroman. Filter methods for feature selection - a comparative study. In Yin H, Tino PCE, Byrne W, Yao XY, editors. Intelligent Data Engineering and Automated Learning - IDEAL 2007: 8th International Conference, Birmingham, UK, December 16-19, 2007. Proceedings. Berlin Heidelberg: Springer Berlin Heidelberg; 2007.
7. Buczak AL, Gifford CM. Fuzzy association rule mining for community crime pattern discovery. In ACM SIGKDD Workshop on Intelligence and Security Informatics; 2010; Washington, D.C.: ACM. p. 1-10.
8. Halawa K. A method to improve the performance of multilayer perceptron by utilizing various activation functions in the last hidden layer and the least squares method. Neural Processing Letters. 2011; 34(3): p. 293-303.
9. Anuar S, Selamat A, Sallehuddin R. Hybrid particle swarm optimization feature selection for crime classification. In Barbucha D, Nguyen NT, Batubara J, editors. New Trends in Intelligent Information and Database Systems. Switzerland: Springer International Publishing; 2015.
10. Radhakrishnan S, Devarasan E. Computing the probability on socio economic factors to predict the crime locations by means of joint probability based AMABC-FCIL. International Journal of Intelligent Engineering and System. 2016; 9(3): p. 80-89.


11. Sundaram A, LV N, SP R. A hybrid feature selection method based on IGSBFS and naïve bayes for the diagnosis of erythemato-squamous diseases. International Journal of Computer Applications. 2012; 41(7): p. 13-18.
12. Xie J, Lei J, Xie W, Gao X, Liu X, Shi Y. Novel hybrid feature selection algorithms for diagnosing erythemato-squamous diseases. In The International Conference on Health Information Science; 2012; Beijing. p. 173-185.
13. Mohd F, Abu Bakar Z, Mohamad Noor NM, Ahmad Rajion Z, Saddki N. A hybrid selection method based on HCELFS and SVM for the diagnosis of oral cancer staging. Advanced Computer and Communication Engineering Technology. 2015 January; 10(2).
14. UCI Machine Learning Repository. [Online]. [cited 2017 April]. Available from: https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime.
15. Redmond M, Baveja A. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research. 2002; 141(3): p. 660-678.
16. Chu C, Hsu AL, Chou KH, Bandettini P, Lin CP, ADNI. Does feature selection improve classification accuracy? Impact of sample size and feature selection on classification using anatomical magnetic resonance images. Neuroimage. 2012; 60(1): p. 59-70.
17. ElAlami ME. A filter model for feature subset selection based on genetic algorithm. Knowledge-Based Systems. 2009; 22(5): p. 356-362.
18. Anuar S, Selamat A, Sallehuddin R. Hybrid artificial neural network with artificial bee colony algorithm for crime classification. In S. PA, T.W. A, editors. Computational Intelligence in Information Systems: Proceedings of the Fourth INNS Symposia Series on Computational Intelligence in Information Systems (INNS-CIIS 2014). Switzerland: Springer; 2015.