A Hybrid Approach based on Particle Swarm Optimization and Random Forests for E-mail Spam Filtering

Dr. Bashar Al-Shboul
Assistant Professor, Dept. of BIT
Web Intelligence Research Group
The University of Jordan
[email protected]

Outline…
• Related Work
• Particle Swarm Optimization
• Proposed Method
• Contribution
• Experiment Setup
• Results

Introduction
• Don’t we all receive spam e-mails?
• Spam filtering is treated as a text categorization problem.
• The high dimensionality of the problem is a main challenge when sophisticated learning algorithms are applied to text categorization.
• Vector Space Model
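To make the Vector Space Model bullet concrete, here is a minimal sketch, assuming plain whitespace tokenization and raw term counts (the slides do not say which weighting or preprocessing was used). The sample e-mails and the helper names `build_vocabulary` and `to_vector` are hypothetical. Each e-mail becomes one vector with one dimension per distinct term, which is why dimensionality grows quickly with vocabulary size.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect the sorted set of distinct lowercase tokens across all documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def to_vector(document, vocabulary):
    """Map one document to a term-count vector aligned with the vocabulary."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical toy e-mails, for illustration only.
emails = [
    "win money now",
    "meeting agenda for monday",
    "win a free prize now now",
]

vocabulary = build_vocabulary(emails)
vectors = [to_vector(email, vocabulary) for email in emails]

print(len(vocabulary), "dimensions")
for vector in vectors:
    print(vector)
```

Even three short messages already need one dimension per distinct word; a real corpus needs far more, which motivates selecting a smaller feature set.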

Related Works
• Artificial Neural Networks (ANN)
  - Silva et al., 2012
  - Faris et al., 2015
  - Deshpande et al., 2007
• Naïve Bayes (NB)
  - Sakkis et al., 2003
• K-Nearest Neighbor (kNN)
  - Drucker et al., 1999
• Support Vector Machine (SVM)
  - Blanco et al., 2007
• Ensemble Methods (RF, Boosting Trees, Combined SVMs, Voting, among others)
  - Delany et al., 2006
  - Fernandez-Delgado et al., 2014
  - DeBarr & Wechsler, 2009
  - Rios & Zha, 2004

Particle Swarm Optimization
• Craig Reynolds, 1986
  - avoid crowding local flockmates
  - move towards the average heading of flockmates
  - move towards the average position of flockmates
• The Algorithm:
  - Uses a number of agents (particles) that constitute a swarm moving around in the search space looking for the best solution
  - Each particle in the search space adjusts its “flying” according to its own flying experience as well as the flying experience of other particles

Particle Swarm Optimization
• Collection of flying particles (swarm): changing solutions
• Search area: possible solutions
• Movement towards a promising area to reach the global optimum
• Each particle keeps track of:
  - its own best solution so far, the personal best (pbest)
  - the best value found by any particle, the global best (gbest)
• Each particle adjusts its travelling speed dynamically according to the flying experience of itself and its colleagues, based on:
  - its current position and velocity
  - the distance between its current position and pbest and gbest

Particle Swarm Optimization
• Position and velocity updates take the following form:
  - X_i(t + 1) = X_i(t) + V_i(t + 1)
  - V_i(t + 1) = W · V_i(t) + r1 · c1 · [pBest_i − X_i(t)] + r2 · c2 · [gBest − X_i(t)]
• The velocity update has three terms:
  - Inertia: W · V_i(t)
  - Personal Influence: r1 · c1 · [pBest_i − X_i(t)]
  - Social Influence: r2 · c2 · [gBest − X_i(t)]
• Where:
  - X_i(t): position of particle i at iteration t
  - V_i(t): velocity of particle i at iteration t
  - W: inertia weight
  - r1, r2: random numbers between 0 and 1
  - c1, c2: constants
  - pBest_i: personal best position of particle i
  - gBest: global best position found by any particle in the swarm
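The update rules on this slide can be sketched as a short program. This is a minimal illustration, not the authors' implementation: the sphere objective, the search bounds, and the values W = 0.7 and c1 = c2 = 1.5 are assumptions chosen for the example, as is drawing fresh r1 and r2 for every dimension.

```python
import random


def sphere(position):
    """Toy objective (an assumption for this sketch): sum of squares, minimum at the origin."""
    return sum(x * x for x in position)


def pso(objective, dim=2, n_particles=20, n_iterations=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with canonical PSO; parameter values are illustrative."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in positions]
    pbest_val = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_index][:], pbest_val[best_index]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # V_i(t+1) = W*V_i(t) + r1*c1*(pBest_i - X_i) + r2*c2*(gBest - X_i)
                velocities[i][d] = (
                    w * velocities[i][d]
                    + r1 * c1 * (pbest[i][d] - positions[i][d])
                    + r2 * c2 * (gbest[d] - positions[i][d])
                )
                # X_i(t+1) = X_i(t) + V_i(t+1)
                positions[i][d] += velocities[i][d]

            value = objective(positions[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = positions[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = positions[i][:], value

    return gbest, gbest_val


best_position, best_value = pso(sphere)
print(best_position, best_value)
```

Because gbest is only replaced by strictly better positions, the returned value can never be worse than the best particle in the initial swarm.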

Geometric Particle Swarm Optimization
• Unlike canonical PSO, geometric PSO (GPSO) has no explicit notion of velocity, so particle positions cannot be updated with the canonical velocity equation.
• Instead, each position update applies a three-parent mask-based geometric crossover followed by a mutation.
• Inertia, Personal Influence, and Social Influence are represented as bit strings, crossed over, then mutated.
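The crossover-then-mutation update described above can be sketched for bit-string positions, where each bit could mark whether a feature is selected. This is a hedged illustration under stated assumptions: every bit of the new position is copied from the current position, pbest, or gbest with probability given by a weight, and bit-flip mutation follows. The function names, the toy bit strings, and the equal default weights are hypothetical; the 1% mutation rate mirrors the setting quoted in the experiment setup.

```python
import random


def three_parent_crossover(current, pbest, gbest, weights, rng):
    """Build a child by copying each bit from one of three parents.

    `weights` gives the chance of picking the current position (inertia),
    pbest (personal influence), or gbest (social influence) for each bit.
    """
    parents = (current, pbest, gbest)
    child = []
    for bit_index in range(len(current)):
        source = rng.choices((0, 1, 2), weights=weights)[0]
        child.append(parents[source][bit_index])
    return child


def bit_flip_mutation(bits, probability, rng):
    """Flip each bit independently with the given probability."""
    return [1 - b if rng.random() < probability else b for b in bits]


def gpso_move(current, pbest, gbest, weights=(1 / 3, 1 / 3, 1 / 3),
              mutation=0.01, rng=None):
    """One GPSO position update: crossover of the three parents, then mutation."""
    rng = rng or random.Random()
    child = three_parent_crossover(current, pbest, gbest, weights, rng)
    return bit_flip_mutation(child, mutation, rng)


# Hypothetical 8-feature masks: 1 means the feature is kept.
current = [1, 0, 1, 0, 1, 1, 0, 0]
pbest = [1, 0, 0, 1, 1, 0, 0, 1]
gbest = [1, 0, 1, 1, 0, 0, 0, 1]

new_position = gpso_move(current, pbest, gbest, rng=random.Random(42))
print(new_position)
```

Before mutation, any bit on which all three parents agree is preserved in the child, which is what keeps the new position "between" its parents.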

Proposed Method

Contribution General Goal: An enhanced spam e-mail classifier

Specific Contribution: Utilizing GPSO to providing a better feature set to RF spam e-mail classifier to enhance classification quality

Experiment Setup
• Dataset:
  - Source: SpamAssassin
  - Size: 9346 e-mails (6951 non-spam)
  - Size: 86 features
  - Imbalanced
• Tool: Weka Data Mining Tool
• Cost Functions / Evaluation Measures:
  - Accuracy
  - F-Measure (F1)
  - Area under the Receiver Operating Characteristic (ROC) curve
  - Root Mean Squared Error (RMSE)
• Settings:
  - GPSO: 20 individuals / 20 generations per run / 1% mutation probability / other weights split equally
  - RF: 100 trees
• Other Settings:
  - Decision Trees: J48
  - SVM with RBF kernel / gamma and cost tuned with 5-fold cross-validation
  - kNN: k = 1
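The four evaluation measures can be computed as in the sketch below. The labels and scores are made-up toy values, not results from the experiment, and treating spam as the positive class (label 1) and taking RMSE between the predicted spam probability and the 0/1 label are assumptions; Weka's exact definitions may differ. The class counts come from the dataset description: with 6951 of 9346 e-mails being non-spam, always predicting non-spam already reaches about 74% accuracy, which is one reason to report F1 and ROC area alongside accuracy.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, scores):
    """Root mean squared error between predicted probabilities and 0/1 labels."""
    return math.sqrt(sum((s - t) ** 2 for t, s in zip(y_true, scores)) / len(y_true))


# Majority-class baseline from the dataset description (6951 non-spam of 9346).
majority_baseline = 6951 / 9346

# Toy data for illustration only: 1 = spam, 0 = non-spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(round(majority_baseline, 4))
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
print(roc_auc(y_true, scores), rmse(y_true, scores))
```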

Results

Contact Details
• First Author
  - Dr. Hossam Faris
  - Associate Professor, Department of Business Information Technology
  - The University of Jordan
  - [email protected]
• Second Author
  - Dr. Ibrahim Aljarah
  - Assistant Professor, Department of Business Information Technology
  - The University of Jordan
  - [email protected]

Thank You!
