Fuzzy System-based Suspicious Pattern Detection in ...

Fuzzy System-based Suspicious Pattern Detection in Mobile Forensic Evidence

Konstantia Barmpatsalou, Tiago Cruz, Edmundo Monteiro and Paulo Simoes

{konstantia, tjcruz, edmundo, psimoes}@dei.uc.pt

Contents
• Introduction
• Related Work
• Methodology
• Results and Evaluation
• Discussion

Problem Statement
• Machine-related actions (malware activity) are easy to predict using classic Hard Computing methods
• Human behaviour is unpredictable, with a higher level of uncertainty → the aforementioned methods are rendered inefficient
• Criminal modus operandi: data obfuscation/alteration (SMS texts in “code”, fake contacts, etc.) → we need to dive into a deeper level

Research Questions
• Can forensic evidence in the form of metadata lead to rule inference? Why is it important?
• Can human behaviour be modelled and approximated by Soft Computing methods, such as Fuzzy Systems, Neural Networks, etc.?

Related Work
• Stoffel et al. (2010): Inferred expert-system-like rules from a forensic database of actual criminal activities
• Arun Raj Kumar and Selvakumar (2013): Detected Denial of Service (DoS) attacks with the use of Fuzzy Systems and Neural Networks
• Shalaginov and Franke (2013): Automatic rule definition by a Neuro-Fuzzy system for Android malware detection

Methodology
• Use case definition and expert knowledge
• Rule inference
• Dataset selection and ground truth generation
• Fuzzy System configuration

Use Case Definition and Expert Knowledge
• Public order demonstration or riot
• High probability of unfortunate events involving mobile devices belonging to PPDR officers
• Case under examination: PPDR officers infiltrating the rioting forces
• Expert knowledge about modus operandi: SALUS FP7 Project deliverables and on-field practices of the Greek Police Escort Teams Department

Rule Inference
• If officers are infiltrators, they will use their devices to communicate with their accomplices only in cases of extreme necessity → the rate at which a sent message appears will be very low
• Recipients with local numbers are considered more suspicious
• Messages exchanged right before or during rioting are very short

IF (AppearanceFreq==Very Low)&&(Length==Low)&&(Country==Local) THEN (Suspiciousness==Very High)
IF (AppearanceFreq==High)&&(Length==Medium)&&(Country==Foreign) THEN (Suspiciousness==Low)
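The two example rules can be written down as plain data before any fuzzy machinery is involved. The sketch below is only an illustration: the `Rule` type and the `match` helper are hypothetical names, and the matching is crisp (exact label equality), whereas a fuzzy system fires each rule to a degree.

```python
from typing import Dict, List, NamedTuple, Optional


class Rule(NamedTuple):
    conditions: Dict[str, str]  # linguistic label required for each input variable
    suspiciousness: str         # linguistic label assigned to the output


# The two example rules from the slide, written as plain data.
RULES: List[Rule] = [
    Rule({"AppearanceFreq": "Very Low", "Length": "Low", "Country": "Local"}, "Very High"),
    Rule({"AppearanceFreq": "High", "Length": "Medium", "Country": "Foreign"}, "Low"),
]


def match(labels: Dict[str, str], rules: List[Rule] = RULES) -> Optional[str]:
    """Return the output label of the first rule whose conditions all hold, else None."""
    for rule in rules:
        if all(labels.get(var) == value for var, value in rule.conditions.items()):
            return rule.suspiciousness
    return None
```

Keeping rules as data rather than as nested `if` statements makes it easier to inspect or extend the rule base later.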

Datasets
• Lack / non-availability of datasets belonging to actual criminal cases
• Device Analyzer Dataset: collection of real-time usage data from Android devices
• Lists of attributes such as call logs, SMS texts, network usage statistics, location data, etc., retrieved over a considerable period of time
• Pre-processing is essential (separation, cleansing, modification)

SMS(Appearance Frequency; Length; Country Code Source)
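A pre-processing step of this kind might look like the sketch below, which turns raw SMS records into the SMS(Appearance Frequency; Length; Country Code Source) triple. Everything about the record layout is an assumption: the input is taken to be hypothetical `(recipient, text)` pairs rather than the actual Device Analyzer format, appearance frequency is assumed to mean the number of messages per recipient, and the 0/1/2 codes are assumed to follow the order FOREIGN, UNDEFINED, LOCAL.

```python
from collections import Counter
from typing import List, Tuple

# Assumed numeric coding, following the order FOREIGN, UNDEFINED, LOCAL.
FOREIGN, UNDEFINED, LOCAL = 0, 1, 2


def country_code_source(number: str, local_prefix: str) -> int:
    """Classify a phone number by its international prefix."""
    if not number.startswith("+"):
        return UNDEFINED  # no country code available
    return LOCAL if number.startswith(local_prefix) else FOREIGN


def sms_features(messages: List[Tuple[str, str]], local_prefix: str) -> List[Tuple[int, int, int]]:
    """Map (recipient, text) records to (appearance frequency, length, country code source)."""
    # Cleansing: strip whitespace from numbers and drop records with empty text.
    cleaned = [(num.strip(), text) for num, text in messages if text]
    # Assumption: appearance frequency = number of messages sent to the same recipient.
    per_recipient = Counter(num for num, _ in cleaned)
    return [
        (per_recipient[num], len(text), country_code_source(num, local_prefix))
        for num, text in cleaned
    ]
```

The `local_prefix` argument (for example `"+351"`) is supplied by the caller, so the sketch does not hard-code any country.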

Fuzzy Semantic Cohesion
• Fuzzy sets and the value range of each variable have specific meanings
• Each fuzzy variable should have no more than 7±2 fuzzy sets (the threshold of human perception capabilities)
• Every point in the system’s universe of discourse should belong to at least one fuzzy set
• A fuzzy set should be normal: in a fuzzy set F, there should always exist at least one χ whose membership degree (height) equals 1
• All fuzzy sets should overlap to a certain degree
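These constraints can be checked mechanically for a candidate partition. The sketch below samples the universe of discourse on a grid and tests the count, coverage, normality and overlap conditions; the function names and the grid-based approach are assumptions made for illustration, and normality is only detected if each peak falls on a grid point.

```python
from typing import Callable, Dict

MF = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> MF:
    """Triangular membership function with feet a, c and peak b (a <= b <= c)."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 0.0
    return mu


def check_cohesion(sets: Dict[str, MF], lo: float, hi: float, steps: int = 1000) -> Dict[str, bool]:
    """Check the semantic cohesion constraints on a sampled universe [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    names = list(sets)  # sets are assumed to be listed from left to right
    return {
        "at_most_9_sets": len(sets) <= 9,
        "coverage": all(any(mu(x) > 0 for mu in sets.values()) for x in grid),
        "normal": all(max(sets[n](x) for x in grid) >= 1.0 - 1e-9 for n in names),
        "overlap": all(
            any(min(sets[a](x), sets[b](x)) > 0 for x in grid)
            for a, b in zip(names, names[1:])
        ),
    }
```

For example, five evenly spaced triangles on [0, 1] with half-width 0.25 pass all four checks, while two triangles separated by a gap fail coverage and overlap.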

Variables and Fuzzy Approximation

Variable | Type | Fuzzy Approximation | Numerical Range
Length | Input | VERY SHORT, SHORT, MEDIUM, LONG, VERY LONG | 1-600 characters
Appearance Frequency | Input | VERY LOW, LOW, MEDIUM, HIGH, VERY HIGH | 1-1100 appearances
Country Code Source | Input | FOREIGN, UNDEFINED, LOCAL | 0, 1 and 2
Suspiciousness | Output | VERY LOW, LOW, MEDIUM, HIGH, VERY HIGH | 0.15, 0.25, 0.50, 0.75, 1
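To show how these variables could feed a Mamdani-style system, the sketch below wires the two example rules to triangular membership functions and defuzzifies by centroid. Only the numerical ranges come from the table; every membership function parameter is an assumption, the rule label "Low" for Length is assumed to correspond to SHORT, and only the sets used by the two rules are defined.

```python
from typing import Callable, Dict, List, Tuple

MF = Callable[[float], float]


def tri(a: float, b: float, c: float) -> MF:
    """Triangular membership function with feet a, c and peak b."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 0.0
    return mu


# ASSUMED membership function parameters; only the ranges come from the table.
FREQ: Dict[str, MF] = {"VERY LOW": tri(-274, 1, 276), "HIGH": tri(550, 825, 1100)}
LENGTH: Dict[str, MF] = {"SHORT": tri(1, 150, 300), "MEDIUM": tri(150, 300, 450)}
COUNTRY: Dict[str, MF] = {"FOREIGN": tri(-1, 0, 1), "LOCAL": tri(1, 2, 3)}
SUSP: Dict[str, MF] = {"LOW": tri(0.0, 0.25, 0.5), "VERY HIGH": tri(0.75, 1.0, 1.25)}

# (frequency label, length label, country label) -> suspiciousness label
RULES: List[Tuple[Tuple[str, str, str], str]] = [
    (("VERY LOW", "SHORT", "LOCAL"), "VERY HIGH"),
    (("HIGH", "MEDIUM", "FOREIGN"), "LOW"),
]


def infer(freq: float, length: float, country: float, steps: int = 200) -> float:
    """Mamdani inference: min for AND, clipped consequents, max aggregation, centroid."""
    strengths = [
        (min(FREQ[f_lbl](freq), LENGTH[l_lbl](length), COUNTRY[c_lbl](country)), out)
        for (f_lbl, l_lbl, c_lbl), out in RULES
    ]
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps  # sample the output universe [0, 1]
        mu = max(min(w, SUSP[out](y)) for w, out in strengths)
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for this input")
    return num / den
```

A rarely contacted local recipient with a short message, such as `infer(5, 100, 2)`, yields a score in the upper quarter of the range, while `infer(825, 300, 0)` lands on the centre of the LOW set.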

Evaluation
• Five instances of Mamdani Fuzzy Systems with Triangular, Trapezoidal, Bell, Gauss and Gauss2 membership functions for inputs and outputs
• The fuzzy system outputs are also continuous variables in the range (0-1)
• Multivariate classification and confusion matrices
• Area Under Curve (AUC), Accuracy, Precision, Recall and False-Positive Rate are calculated
• Nearest Neighbour, SVM, Naive Bayes, AdaBoost and Random Forest classification techniques are used to compare the ground truth and fuzzy output values
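For reference, the five membership function shapes named above can be sketched as follows. These are the common textbook definitions; whether the evaluated systems used exactly these parameterisations is an assumption, as are the function and parameter names.

```python
import math


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangle with feet a, c and peak b."""
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoid with feet a, d and plateau [b, c]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def generalized_bell(x: float, a: float, b: float, c: float) -> float:
    """Bell curve with width a, slope b and centre c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def gaussian(x: float, sigma: float, c: float) -> float:
    """Gaussian with standard deviation sigma and centre c."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))


def gaussian2(x: float, sigma1: float, c1: float, sigma2: float, c2: float) -> float:
    """Two-sided Gaussian (assumes c1 <= c2): Gaussian tails around a flat top [c1, c2]."""
    if x < c1:
        return gaussian(x, sigma1, c1)
    if x > c2:
        return gaussian(x, sigma2, c2)
    return 1.0
```

All five return a membership degree in [0, 1], so they can be swapped into the same fuzzy system without changing the inference code.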

Evaluation Metrics
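Accuracy, Precision, Recall and False-Positive Rate all follow from the cells of a confusion matrix. The sketch below assumes a binary setting (suspicious = 1, not suspicious = 0) and an assumed 0.5 cut-off for turning continuous fuzzy outputs into labels; the evaluation itself is described as multivariate, and AUC is left out because it needs the full ranking of scores.

```python
from typing import Dict, List, Sequence


def binarise(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn continuous outputs into 0/1 labels using an ASSUMED threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute metrics from the four confusion matrix cells."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

Undefined ratios (for instance precision when nothing is predicted positive) are reported as 0.0 here, which is a convention chosen for the sketch.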


MF – Classification Association


ROC Curves


But…
• An increase in the number of inputs causes an explosion in the number of rules and in overall system complexity
• Manual predefinition is obligatory and can cause scalability issues (manual rule inference, membership function configuration)
• Can we work around this / automate it?

Yes, we can.
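The rule explosion can be made concrete: a complete rule base needs one rule per combination of input fuzzy sets, so its size is the product of the set counts. A small sketch, with a hypothetical helper name and the set counts taken from the variables table (5, 5 and 3):

```python
from math import prod
from typing import Sequence


def full_rule_base_size(sets_per_input: Sequence[int]) -> int:
    """Number of rules needed to cover every combination of input fuzzy sets."""
    return prod(sets_per_input)


# Length (5 sets), Appearance Frequency (5 sets), Country Code Source (3 sets).
current = full_rule_base_size([5, 5, 3])
# Adding one hypothetical extra input with 5 sets multiplies the rule count by 5.
extended = full_rule_base_size([5, 5, 3, 5])
```

With the three inputs used here a complete rule base has 75 rules, and a single additional five-set input raises that to 375.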

Sneak peek at work in progress
Application of the same methodology to:
• Back propagation neural networks
• Pattern recognition neural networks
• ANFIS

Broader and more complex scenarios, with a higher number of inputs

Credits & References
• This work was partially funded by the ATENA H2020 EU Project (H2020-DS-2015-1 Project 700581). We also thank the team of FP7 Project SALUS (Security and interoperability in next generation PPDR communication infrastructures) and the GEPTD officer Nikolaos Bouzis for the fruitful discussions, feedback and insights on in-field investigation practices.

• Stoffel, K., Cotofrei, P., Han, D.: Fuzzy methods for forensic data analysis. In: 2010 International Conference of Soft Computing and Pattern Recognition, pp. 23-28 (2010)
• Arun Raj Kumar, P., Selvakumar, S.: Detection of distributed denial of service attacks using an ensemble of adaptive and hybrid neuro-fuzzy systems. Comput. Commun. 36(3), 303-319 (2013)
• Shalaginov, A., Franke, K.: Automatic rule-mining for malware detection employing neuro-fuzzy approach. In: Norsk informasjonssikkerhetskonferanse (NISK) 2013 (2013)

Thank you : ) Questions???
