Automated Prevention of Phishing Attacks by Machine Learning Web Application Firewall and GPOs

Konstantinos Demertzis & Lazaros Iliadis

No system is safe!!!

INTRODUCTION

[~] $ whoami

Dr. Konstantinos Demertzis (2LT)

Part-time Lecturer, Computer and Informatics Engineering Department, Eastern Macedonia & Thrace Institute of Technology.

Dr. Lazaros Iliadis

Professor of Applied Informatics, Department of Civil Engineering, Democritus University of Thrace, Greece.

Outline
● What is a Phishing Attack?
● Process, Objectives & Types of Phishing Attacks
● Why is Phishing Attack detection important?
● Signature-based defense
● Proposed method
● Machine Learning for Security
● Modeling Methodology
● DGA identification
● Phishing Discovery
● Spiking Neural Network
● Results
● Future Directions

What is a Phishing Attack?
● Phishing is typically carried out by email spoofing or instant messaging, and it often directs users to enter personal information at a fake website whose look and feel are almost identical to those of the legitimate one.
● Communications purporting to be from social websites, auction sites, banks, online payment processors or IT administrators are often used to lure victims.
● Phishing emails may contain links to websites that are infected with malware.

Phishing Attack Example

Objectives
● The main objectives of a phishing attack are to:
- Trick people into providing sensitive personal information such as account credentials or credit card numbers.
- Gain further knowledge of internal assets.
- Expand access into other systems.
The information is then used to access important accounts and can result in identity theft and financial loss.

Techniques ● Some of the techniques used for phishing attacks include:

- Spear phishing - Clone phishing - Whaling

- Link manipulation - Filter evasion - Website forgery

- Covert redirect - Social engineering - Zero-day

Why is phishing attack detection important?
● Rapid detection of a phishing attack can reduce, contain and prevent further impact of a breach.
● Detection of phishing attacks enables SecOps and InfoSec teams to act more efficiently.

Why is phishing attack detection important?

93% of phishing emails are now ransomware. Typical ransomware behavior includes:
● Modification of registry keys (mostly associated with persistence, i.e., execution after reboot).
● Renaming and encrypting files, targeting the user's documents (e.g., .doc, .xls, .ppt, .mp3, wallet files).
● Modification of the Master Boot Record (MBR) to prevent normal booting, usually by encrypting and relocating it and placing a replacement.
● Removal of Volume Shadow Copy Service (VSS) files, or volume shadow copies, which are used for system restoration and backup.
● Polymorphic/metamorphic behavior.


Signature-based defense
● The rise of ransomware exemplifies how malicious actors constantly adapt and create new attack methods to bypass system protections.
● Ransomware in particular has targeted specific verticals because of their high dependence on information availability in order to operate.
● Current defense technologies such as antivirus and firewalls rely mainly on static signatures.
● This signature-based approach means malicious actors can, and will, modify their code in order to bypass these defenses.
● This approach is limited and passive, forcing defenders to constantly develop and update signatures in order to detect and prevent malicious code attacks.
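To make the limitation concrete, here is a minimal sketch that assumes the simplest possible signature: a SHA-256 digest of the whole file (real antivirus signatures are usually richer). The payload bytes and the `KNOWN_BAD` set are hypothetical. Appending a single byte produces a new digest, so the stored signature no longer matches.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical payload and a one-entry signature database built from it.
original_payload = b"MZ...hypothetical payload bytes..."
KNOWN_BAD = {sha256_hex(original_payload)}

def is_flagged(data: bytes) -> bool:
    """Exact-match signature check: only byte-identical samples are flagged."""
    return sha256_hex(data) in KNOWN_BAD

# The attacker appends one padding byte: the file differs by a single byte,
# so its digest, and therefore its "signature", is different.
mutated_payload = original_payload + b"\x00"

print(is_flagged(original_payload))  # True
print(is_flagged(mutated_payload))   # False
```

This is why the deck argues for behavior- and feature-based detection rather than exact matching.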

Proposed method
● A new approach that uses machine learning techniques and leverages the processing power of big data technologies may provide a different and more comprehensive defense, one that does not depend on static signatures.
● This paper proposes the Machine Learning Web Application Firewall (MLWAF), an innovative, ultra-fast tool with low resource requirements.
● It is an automated smart tool that builds Group Policy Objects (GPOs) and pushes them into a Windows domain for automated prevention of phishing attacks.
● It runs on the Windows Server operating system, and its reasoning is based on advanced computational intelligence approaches.

Use the right tools for the job

MACHINE LEARNING FOR SECURITY

Big Data, ML & Cyber Security
● Big Data: a synthesis of technologies providing visibility into the analysis of large data sets and the ability to discover patterns, trends and associations, especially relating to human behavior and interactions.
● Machine Learning: a subfield of computer science/statistics that explores the study and construction of algorithms that can learn from and make predictions on data.
● ML allows us to go beyond static signature-based technologies, but it can be challenging to apply to enterprise volumes of user data.
● Combining traditional security tools with data science creates a scenario in which threats can be detected from dynamic, multi-contextual indicators.

Machine Learning

Advantages of using ML
● ML allows us to combine very large and distinct data sources into a single platform for analysis, interpretation and prediction.
● ML allows us to go beyond static signature-based technologies.
● ML creates a scenario in which detection of threats based on dynamic, multi-contextual indicators is possible.
● An ML system that is randomly initialized and trained on suitable datasets can learn good feature representations for a given task (Feature Learning).
● ML mostly employs gradient-based methods to optimize a large array of parameters. Since it is not feasible for a human to tune that many parameters by hand, large-scale optimization algorithms such as Stochastic Gradient Descent (SGD) are used to find a good setting (Parameter Optimization).
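As a minimal illustration of Parameter Optimization with SGD, the sketch below fits a logistic-regression classifier one sample at a time. Logistic regression, the toy data, the learning rate and the epoch count are assumptions chosen for brevity; this is not the model used in this work.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=500, seed=0):
    """Fit logistic-regression weights w and bias b with plain SGD on the log-loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # "stochastic": one sample at a time, in random order
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # derivative of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical, linearly separable toy data: class 1 has large feature values.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = sgd_logistic(X, y)
print([predict(w, b, x) for x in X])
```

Each update uses the gradient of a single sample, which is what makes the method scale to the large parameter arrays mentioned above.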

“But all too often we forget the first rule of battle - the battlefield – the attacker can escape everything it cannot escape the terrain – choose the terrain, use the terrain – we win” Sun Tzu

SECURITY ANALYTICS FOR DEFENSE

Modeling Methodology
● Step 1: DGA Identification
● Step 2: Phishing Discovery
● Step 3: Active Defense
● Step 4: Data
● Step 5: ML Algorithm

Garbage in, Garbage out

STEP1: DGA IDENTIFICATION

DGA Identification
● Domain Generation Algorithm (DGA): bot agents create a dynamic list of multiple FQDNs that can be used as rendezvous points with their C&C servers.
● The large number of potential rendezvous points makes it difficult for law enforcement to shut down botnets effectively, since infected computers will attempt to contact some of these domain names every day to receive updates or commands.
● Because the malware uses public-key cryptography, it is infeasible for law enforcement and other actors to mimic commands from the malware controllers, as some worms automatically reject any update not signed by the controllers.

DGA Identification
● For example, an infected computer could create thousands of domain names such as www.gi9bfb4er2ig4fws8h.ir and attempt to contact a portion of them to receive updates or commands.
● Embedding the DGA in the malware binary, instead of a list of domains previously generated by the C&C servers, protects against a strings dump of the unobfuscated binary that could be fed preemptively into a network blacklisting appliance to restrict outbound communication from infected hosts within an enterprise.
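To illustrate the rendezvous idea, here is a toy, date-seeded DGA written only for this explanation. It is not the algorithm of any real malware family, and the `.ir` TLD and 18-character label merely echo the example domain above. Because the bot and the C&C operator run the same function on the same date, both derive the same list without ever exchanging it.

```python
import datetime
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def toy_dga(day: datetime.date, count: int = 5, length: int = 18, tld: str = ".ir") -> list:
    """Toy DGA (illustration only): derive `count` domains deterministically from a date."""
    domains = []
    for i in range(count):
        # Seed = date + index: a fresh list every day, reproducible by anyone holding the algorithm.
        digest = hashlib.sha256(f"{day.isoformat()}-{i}".encode()).digest()
        label = "".join(ALPHABET[byte % len(ALPHABET)] for byte in digest[:length])
        domains.append(f"www.{label}{tld}")
    return domains

day = datetime.date(2018, 1, 15)
bot_list = toy_dga(day)   # computed by the infected host
cnc_list = toy_dga(day)   # computed independently by the operator
print(bot_list == cnc_list)  # True
for domain in bot_list:
    print(domain)
```

Blocking yesterday's list does not help tomorrow, which is why the detector must judge each domain on its own features.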


Catching Fish

STEP 2: PHISHING DISCOVERY

Detecting phishing attacks

How Modern Web Phishing Works
● In most cases, phishing lures are just a very simple copy of a login page for Facebook, Google, banks, insurance companies, etc.
● The attackers include locally stored images, CSS and JavaScript to produce almost identical copies of the original login page.
● The important difference is the malicious PHP scripts, which send the victim's username and password directly to the attacker.
● The stolen credentials and personal information are used to perform identity theft and fraudulent activities.
● It's that simple.
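A minimal sketch of how such a cloned login page might be flagged. It assumes a hypothetical brand-to-domain table and a simple rule of our own: a page whose title names a brand and which contains a password field, but which is not served from that brand's domain. This heuristic is illustrative only and is not the MLWAF classifier.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical mapping of brand names to their legitimate registered domains.
BRAND_DOMAINS = {"facebook": "facebook.com", "google": "google.com"}

class LoginPageScanner(HTMLParser):
    """Collects the <title> text and whether the page has a password input."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.has_password = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.has_password = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def looks_like_cloned_login(url: str, html: str) -> bool:
    scanner = LoginPageScanner()
    scanner.feed(html)
    host = (urlparse(url).hostname or "").lower()
    for brand, domain in BRAND_DOMAINS.items():
        on_brand_domain = host == domain or host.endswith("." + domain)
        if brand in scanner.title.lower() and scanner.has_password and not on_brand_domain:
            return True
    return False

page = ('<html><head><title>Facebook - Log In</title></head><body>'
        '<form action="post.php" method="post"><input type="password" name="pass">'
        '</form></body></html>')
print(looks_like_cloned_login("http://fb-login.example.net/index.php", page))  # True
print(looks_like_cloned_login("https://www.facebook.com/login", page))         # False
```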

Web phishing example

Technical Details of Advanced Phishing Attacks

index.php

modules.php


part of chmod.php


visitor_log.php and its logging code

"The best defense is a good offense"

STEP 3: ACTIVE DEFENSE

Protection for Windows Servers
● Malware and ransomware primarily target Microsoft Windows operating systems.
● Microsoft Windows is the most widely used operating system in most enterprises, and among home users as well.
● In the case of ransomware, and given the constantly evolving nature of malicious code, it is very likely that users will still get infected despite protections and new detection technologies.
● One of the most common reasons users get infected despite technological protections is phishing (social engineering).
● In many cases users receive messages or visit websites that present misleading content and drive them to allow the execution of a malicious payload.

Good protection doesn't need to be offensive
● Roadmap items for active defense include scripting Group Policy Objects (GPOs) and pushing them into Active Directory (AD) once an attack has been detected, creating Access Control Lists (ACLs) to isolate infected hosts, and eventually providing an open input format that can feed back into signature-based defense technologies.
● Active defense measures may also consist of operationalized actions performed in an automated fashion, such as service shutdowns, application disabling or computer isolation, which may be combined with the aforementioned items.
● The goal is to use machine learning to discover common asset classes (in ML terms, class labels).

MLWAF: how it works
● MLWAF can extract the name of the payload as it is being served. This name can then be fed into a PowerShell script that creates a GPO, distributed across the systems of an Active Directory environment, that disallows execution of the discovered malware.
● Even if the names are randomized, MLWAF will find the current name and produce output.
● The script extracts the name of the malicious executable, then connects to the Domain Controller using a service account and pushes a GPO that prevents the executable from running.
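A minimal sketch of the name-extraction step. It assumes the payload is observed as an HTTP download whose file name appears either in a `Content-Disposition` header or at the end of the URL path. The function name `payload_name` and this input format are hypothetical, since how MLWAF reads the traffic is not specified here. Because the name is read from each response, a randomized name is simply picked up as served.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlparse

# filename="x.exe" or filename=x.exe inside a Content-Disposition header value.
_FILENAME_RE = re.compile(r'filename\s*=\s*"?([^";]+)"?', re.IGNORECASE)

def payload_name(url: str, content_disposition: str = "") -> Optional[str]:
    """Return the name of the executable being served, or None if it is not an .exe."""
    match = _FILENAME_RE.search(content_disposition)
    if match:
        candidate = match.group(1)
    else:
        candidate = unquote(urlparse(url).path)
    # Keep only the base name, whether the separator is '/' or '\'.
    candidate = candidate.replace("\\", "/").rsplit("/", 1)[-1].strip()
    return candidate if candidate.lower().endswith(".exe") else None

print(payload_name("http://198.51.100.7/dl/WanaCrypt0r.exe"))  # WanaCrypt0r.exe
print(payload_name("http://198.51.100.7/get?id=7", 'attachment; filename="invoice.exe"'))  # invoice.exe
print(payload_name("http://example.com/index.html"))  # None
```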

MLWAF: how it works
● The MLWAF tool and script can be run from the popular security distribution "Security Onion" https://securityonion.net/
● You will need to create a GPO prior to executing the script and reference it by name (i.e., AntiMalGPO).
● The proof-of-concept script requires the Python Paramiko library, an SSH server on the Windows Server (FreeSSHD), and appropriate permissions for the AD-linked SSH service account to execute PowerShell scripts.
● Once the GPO is pushed, it can be refreshed via scheduled tasks in the Windows Server operating system.

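Example of a PowerShell GPO script (the first cmdlet enables the "Don't run specified Windows applications" policy through the DisallowRun value; the second adds the executable to the DisallowRun list):

Set-GPRegistryValue -Name AntiMalGPO -Key 'HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer' -ValueName 'DisallowRun' -Type DWord -Value 1
Set-GPRegistryValue -Name AntiMalGPO -Key 'HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer\DisallowRun' -ValueName '1' -Type String -Value 'WanaCrypt0r.exe'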
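The glue between name extraction and the GPO push can be sketched as a function that turns an extracted executable name into the `Set-GPRegistryValue` command for the pre-created AntiMalGPO. The function name, the numbered value slot and the strict file-name check are our assumptions. The resulting string would then be executed on the Domain Controller, for example over the SSH session described in the setup notes.

```python
import re

GPO_NAME = "AntiMalGPO"  # must already exist on the domain
DISALLOW_KEY = r"HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer\DisallowRun"

# Accept only plain file names (letters, digits, '.', '_', '-') ending in .exe.
SAFE_EXE = re.compile(r"^[A-Za-z0-9._-]+\.exe$", re.IGNORECASE)

def build_block_command(exe_name: str, slot: int) -> str:
    """Return a Set-GPRegistryValue command adding exe_name as DisallowRun entry `slot`."""
    if not SAFE_EXE.match(exe_name):
        # Refuse anything that could break out of the single-quoted PowerShell string.
        raise ValueError(f"refusing suspicious name: {exe_name!r}")
    return (
        f"Set-GPRegistryValue -Name {GPO_NAME} -Key '{DISALLOW_KEY}' "
        f"-ValueName '{slot}' -Type String -Value '{exe_name}'"
    )

print(build_block_command("WanaCrypt0r.exe", 1))
```

Validating the name before interpolation matters here: the name comes from attacker-controlled traffic and ends up inside a command run with domain privileges.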

“It's not failure, it's data…”

STEP 4: DATA

DGA Dataset
● Domain Generation Algorithms dataset (DGA_dataset).
● 5 features + class (legit or malicious), containing 131,374 patterns.
● 100,000 URLs were chosen randomly from Alexa's list of the 1 million most popular domain names, 16,374 malicious URLs were taken from the updated list of the Black Hole DNS database, and 15,000 malicious URLs were generated with a timestamp-based DGA.
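The five DGA_dataset features are not listed here. As an assumption, lexical features commonly used for this kind of task (label length, Shannon entropy, digit ratio and vowel ratio) can be computed from a domain name as sketched below. They are illustrative and not necessarily the ones in the dataset.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy of the characters of s, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def domain_features(fqdn: str) -> dict:
    """Illustrative lexical features of the leftmost label after an optional 'www.'."""
    labels = fqdn.lower().rstrip(".").split(".")
    if labels[0] == "www" and len(labels) > 2:
        labels = labels[1:]
    name = labels[0]
    if not name:
        raise ValueError("empty label")
    return {
        "length": len(name),
        "entropy": round(shannon_entropy(name), 3),
        "digit_ratio": round(sum(ch.isdigit() for ch in name) / len(name), 3),
        "vowel_ratio": round(sum(ch in "aeiou" for ch in name) / len(name), 3),
    }

print(domain_features("www.gi9bfb4er2ig4fws8h.ir"))
print(domain_features("www.google.com"))
```

Algorithmically generated labels tend to be longer, more random (higher entropy) and richer in digits than human-chosen names, which is what a classifier can exploit.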

Phishing Dataset
● To implement and test our approach, we used a dataset of 4,000 emails (973 phishing and 3,027 legitimate).
● Our approach uses sixteen relevant features: HTML format, IP-based URL, Age of Domain Name, Number of Domains, Number of Sub-domains, Presence of JavaScript, Presence of Form Tag, Number of Links, URL-Based Image Source, Matching Domains, and 6 groups of keywords.
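Several of these features can be approximated directly from a raw HTML email body. The sketch below is a simplification under stated assumptions: regular expressions instead of a full HTML parser, "Number of Domains" counted as distinct host names, and sub-domains counted as dots beyond the second-level domain (which miscounts suffixes such as .co.uk). Features that need external data, such as Age of Domain Name, are omitted, and the function name is hypothetical.

```python
import ipaddress
import re
from urllib.parse import urlparse

HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def _is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def email_features(body: str) -> dict:
    """Illustrative, regex-based subset of the listed phishing features."""
    urls = HREF_RE.findall(body)
    hosts = [h for h in (urlparse(u).hostname or "" for u in urls) if h]
    return {
        "html_format": bool(re.search(r"<\s*html", body, re.IGNORECASE)),
        "ip_based_url": any(_is_ip(h) for h in hosts),
        "num_links": len(urls),
        "num_domains": len(set(hosts)),
        "max_subdomains": max((max(0, h.count(".") - 1) for h in hosts if not _is_ip(h)), default=0),
        "has_javascript": bool(re.search(r"<\s*script|javascript:", body, re.IGNORECASE)),
        "has_form": bool(re.search(r"<\s*form", body, re.IGNORECASE)),
    }

body = ('<html><body><a href="http://192.168.1.10/login">Verify account</a> '
        '<a href="https://secure.login.bank.example.com/x">Bank</a>'
        '<form action="post.php"></form></body></html>')
print(email_features(body))
```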

“Maybe the only significant difference between a really smart machine and a human being was the noise they made when you punched them…”

STEP 5: MACHINE LEARNING ALGORITHM

Spiking Neural Network
● A typical spiking neuron model consists of dendrites, which model the input level of the network: they collect signals from other neurons and transmit them to the next level, called the soma.
● The soma is the processing level: when the input signal passes a specific threshold, an output signal is generated.
● The output signal is taken from the output level, called the axon, which delivers the signal (short electrical pulses called action potentials, or a spike train) to other neurons.
● A spike train is a sequence of stereotyped events generated at regular or irregular intervals. Typically, the spikes have an amplitude of about 100 mV and a duration of 1-2 ms.
● Although the same elements exist in a linear perceptron, the main difference between a linear perceptron and a spiking model is the action potential generated during the stimulation time.

Spiking Neural Network
● Furthermore, the activation function used in spiking models is a differential equation that tries to model the dynamic properties of a biological neuron in terms of spikes.
● The form of the spike does not carry any information; what matters is the number and the timing of spikes.
● The shortest distance between two spikes defines the absolute refractory period of the neuron, which is followed by a phase of relative refractoriness during which it is difficult to generate a spike.
● Several spiking models have been proposed in recent years, aiming to model different neurodynamic properties of neurons.
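Since only the number and timing of spikes matter, a spike train can be summarized by its spike count, firing rate and inter-spike intervals (ISIs). The shortest observed ISI bounds the absolute refractory period from above. The spike times below are made up for illustration (in ms), and the function name is hypothetical.

```python
from typing import List

def spike_train_summary(spike_times_ms: List[float], duration_ms: float) -> dict:
    """Summarize a spike train by count, mean firing rate and inter-spike intervals."""
    times = sorted(spike_times_ms)
    isis = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return {
        "count": len(times),
        "rate_hz": len(times) * 1000.0 / duration_ms,
        "isis_ms": isis,
        # The observed minimum ISI is an upper bound on the absolute refractory period.
        "min_isi_ms": min(isis) if isis else None,
    }

summary = spike_train_summary([12.0, 30.5, 47.0, 70.0], duration_ms=100.0)
print(summary)
# {'count': 4, 'rate_hz': 40.0, 'isis_ms': [18.5, 16.5, 23.0], 'min_isi_ms': 16.5}
```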

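Izhikevich spiking neuron model
● One of the simplest and most versatile models is the one proposed by Izhikevich.
● This model has only nine parameters, and it is described by the following equations:

$$C\,\frac{dv}{dt} = k\,(v - v_r)(v - v_t) - u + I$$
$$\frac{du}{dt} = a\,\big[\,b\,(v - v_r) - u\,\big]$$
$$\text{if } v \ge v_{\text{peak}}:\quad v \leftarrow c,\qquad u \leftarrow u + d$$

where v is the membrane potential, u is the recovery variable, I is the input current, and C, k, v_r, v_t, v_peak, a, b, c and d are the nine parameters.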
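The nine-parameter Izhikevich neuron can be simulated with a simple forward-Euler scheme, sketched below with the model equations repeated as comments. The parameter values are a commonly used regular-spiking set (C = 100, k = 0.7, v_r = -60, v_t = -40, v_peak = 35, a = 0.03, b = -2, c = -50, d = 100), and the input is a constant current I. Both are assumptions, since neither the parameter values nor the way features are turned into input currents in this work are given here.

```python
def izhikevich_spikes(I, T=1000.0, dt=0.1,
                      C=100.0, k=0.7, vr=-60.0, vt=-40.0, vpeak=35.0,
                      a=0.03, b=-2.0, c=-50.0, d=100.0):
    """Forward-Euler simulation of one Izhikevich neuron under constant input I.

    C dv/dt = k (v - vr)(v - vt) - u + I
    du/dt   = a (b (v - vr) - u)
    if v >= vpeak: v <- c, u <- u + d
    Returns the spike times in ms.
    """
    v, u = vr, 0.0
    spikes = []
    for step in range(int(round(T / dt))):
        dv = (k * (v - vr) * (v - vt) - u + I) / C
        du = a * (b * (v - vr) - u)
        v += dt * dv
        u += dt * du
        if v >= vpeak:  # spike: record the time, then apply the reset rule
            spikes.append((step + 1) * dt)
            v = c
            u += d
    return spikes

print(len(izhikevich_spikes(0.0)))   # 0: with no input the neuron stays at rest (v = vr)
print(len(izhikevich_spikes(70.0)))  # a positive number of spikes
```

Only the spike times are returned, in line with the point above that the number and timing of spikes carry the information.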

“If you know the enemy and know yourself you need not fear the results of a hundred battles…”

RESULTS

Classification Accuracy & RMSE

| Classifier | DGA ACC | DGA RMSE | DGA F-Score | DGA ROC Area | Phishing ACC | Phishing RMSE | Phishing F-Score | Phishing ROC Area |
|---|---|---|---|---|---|---|---|---|
| Izhikevich SNM | 98.2% | 0.3284 | 0.982 | 0.990 | 99.6% | 0.2951 | 0.996 | 0.995 |
| RBF ANN | 89.8% | 0.5766 | 0.900 | 0.980 | 91.3% | 0.5514 | 0.910 | 0.985 |
| GMDH | 94.4% | 0.5017 | 0.945 | 0.955 | 97.8% | 0.3983 | 0.978 | 0.980 |
| PANN | 90.9% | 0.5633 | 0.910 | 0.950 | 96.6% | 0.4512 | 0.965 | 0.975 |
| FNN-GA | 96.7% | 0.4972 | 0.967 | 0.970 | 99.1% | 0.3048 | 0.990 | 0.990 |
| FNN-PSO | 96.2% | 0.4911 | 0.962 | 0.975 | 99.2% | 0.3009 | 0.992 | 0.990 |
| FNN-ACO | 89.4% | 0.5791 | 0.895 | 0.900 | 92.7% | 0.5336 | 0.927 | 0.950 |
| FNN-ES | 90.1% | 0.5716 | 0.901 | 0.901 | 93.5% | 0.5125 | 0.936 | 0.945 |
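For reference, the four reported measures can be computed from a classifier's scores as sketched below. We assume binary labels (1 = malicious), a 0.5 decision threshold, and RMSE taken between predicted probabilities and labels; how RMSE was obtained in this work is not stated. ROC area uses the rank-based (Mann-Whitney) formulation, and the toy labels and scores are made up.

```python
def evaluate(y_true, scores, threshold=0.5):
    """Return (ACC, RMSE, F-score, ROC area) for binary labels and scores in [0, 1]."""
    n = len(y_true)
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    rmse = (sum((s - t) ** 2 for t, s in zip(y_true, scores)) / n) ** 0.5
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    # ROC area = probability that a random positive outranks a random negative (ties count 1/2).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    return acc, rmse, f1, auc

acc, rmse, f1, auc = evaluate([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.6])
print(acc, round(rmse, 4), f1, auc)  # 0.5 0.4528 0.5 0.75
```

Note that RMSE on probabilities penalizes under-confident correct predictions, so it can stay well above zero even when accuracy is high.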

“At this point all the hard work is done”…

FUTURE DIRECTION

MLWAF


Q&A

?

Thanks! http://utopia.duth.gr/~kdemertz/
