Malware Detection System Based on API Log Data ... - Semantic Scholar

Malware Detection System Based on API Log Data Mining
Chun-I Fan, Han-Wei Hsiao, Chun-Han Chou, Yi-Fan Tseng
Department of Computer Science and Engineering
National Sun Yat-sen University
Kaohsiung, Taiwan
The 5th IEEE International Workshop on Network Technologies for Security, Administration and Protection (NETSAP 2015)

Chun-I Fan, Han-Wei Hsiao, Chun-Han Chou, Yi-Fan Tseng Malware Detection by Mining API Log


Outline

- Introduction
- Malware Observation
- Related Works
- Proposed Method
- Experimental Results
- Conclusion



Introduction

Network attacks have become more serious recently, and malware is the primary tool used in these attacks. Characteristics of new malware:
- New exploits
- A very large amount of malware
- Malware generators
- Anti-reverse-engineering techniques: packing, code encryption
- Stealth techniques: code injection, rootkits



Stealth Technique



Related Works
- C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, "Malware detection based on suspicious behavior identification," in Education Technology and Computer Science, 2009.
- R. Islam, R. Tian, L. Batten, and S. Versteeg, "Classification of malware based on string and function feature selection," in Cybercrime and Trustworthy Computing Workshop (CTC), 2010.
- R. Tian, R. Islam, L. Batten, and S. Versteeg, "Differentiating malware from cleanware using behavioural analysis," in 5th International Conference on Malicious and Unwanted Software (MALWARE), 2010.
- Z. Salehi, M. Ghiasi, and A. Sami, "A miner for malware detection based on API function calls and their arguments," in 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), 2012.



System Architecture



Behavior Monitoring
We monitor the samples from three aspects:
- Sample processes
- Derivative processes: any new processes that the samples create
- Injected processes: any processes into which the samples inject code. We developed a tool, "TraceHook", to trace injection.
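The three monitoring aspects can be read as a rule for growing the set of watched processes: start from the sample, then add anything a watched process creates or injects into. A minimal sketch, assuming a hypothetical event log of `("create", parent_pid, child_pid)` and `("inject", source_pid, target_pid)` tuples and assuming the rule is applied transitively; the slides specify neither the log format nor transitivity.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event format (not from the slides): (kind, source_pid, target_pid)
#   "create": source_pid spawned target_pid  -> derivative process
#   "inject": source_pid injected into target_pid -> injected process
Event = Tuple[str, int, int]


def monitored_processes(sample_pid: int, events: Iterable[Event]) -> Set[int]:
    """Return every PID to monitor for one sample.

    Repeats passes over the log until no new PID is added, so chains such as
    sample -> child -> injected target are caught even if logged out of order.
    """
    events = list(events)
    monitored = {sample_pid}
    changed = True
    while changed:
        changed = False
        for kind, src, dst in events:
            if kind in ("create", "inject") and src in monitored and dst not in monitored:
                monitored.add(dst)
                changed = True
    return monitored


if __name__ == "__main__":
    log = [
        ("create", 100, 200),  # sample 100 spawns 200
        ("inject", 200, 300),  # 200 injects code into 300
        ("create", 400, 500),  # unrelated activity, ignored
    ]
    print(sorted(monitored_processes(100, log)))
```

Iterating to a fixed point keeps the sketch simple; a real monitor would more likely update the set incrementally as each event arrives.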



TraceHook
TraceHook traces code injection by intercepting calls to the Windows API function CreateRemoteThread.
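The slides do not describe how TraceHook installs its hook on the Windows API, so the Python sketch below only illustrates the interception idea on a stand-in function: a wrapper records each call's caller and target process before forwarding it, and a remote thread started in a different process is treated as injection. `create_remote_thread`, `hook`, `call_log`, and the `(caller_pid, target_pid)` record layout are all hypothetical; the real CreateRemoteThread takes a process handle, not PIDs.

```python
import functools
from typing import Callable, List, Tuple

# Hypothetical record of intercepted calls: (caller_pid, target_pid)
call_log: List[Tuple[int, int]] = []


def hook(original: Callable) -> Callable:
    """Wrap `original` so each call is logged before being forwarded."""
    @functools.wraps(original)
    def wrapper(caller_pid: int, target_pid: int, *args, **kwargs):
        call_log.append((caller_pid, target_pid))
        return original(caller_pid, target_pid, *args, **kwargs)
    return wrapper


def create_remote_thread(caller_pid: int, target_pid: int, start_address: int) -> str:
    """Hypothetical stand-in for the real API; returns a fake thread handle."""
    return f"thread@{target_pid}:{start_address:#x}"


# "Install" the hook by rebinding the name so later calls go through the wrapper.
create_remote_thread = hook(create_remote_thread)


def injected_targets(log: List[Tuple[int, int]]) -> List[int]:
    """Targets of remote threads started from a different process."""
    return sorted({target for caller, target in log if caller != target})


if __name__ == "__main__":
    create_remote_thread(100, 300, 0x401000)  # cross-process: treated as injection
    create_remote_thread(100, 100, 0x402000)  # same process: not injection
    print(injected_targets(call_log))
```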



Feature Extraction
The frequency of each API call is extracted as a feature.
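A minimal sketch of frequency features, assuming each sample's API log has already been reduced to a list of API names and that one attribute per API name is taken from the training logs (the full attribute sets in the experiments, 1769 and 1889 attributes, presumably correspond to such a vocabulary). The slides do not say whether raw counts or normalized frequencies were used; raw counts are used here, and the API names in the usage example are only illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_vocabulary(logs: Sequence[Sequence[str]]) -> List[str]:
    """Sorted list of every API name seen in the training logs."""
    return sorted({api for log in logs for api in log})


def frequency_vector(log: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary API appears in one sample's log.

    APIs not in the vocabulary (unseen during training) are ignored.
    """
    counts: Dict[str, int] = Counter(log)
    return [counts.get(api, 0) for api in vocabulary]


if __name__ == "__main__":
    train_logs = [
        ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
        ["RegOpenKeyExW", "CreateRemoteThread", "WriteProcessMemory"],
    ]
    vocab = build_vocabulary(train_logs)
    print(vocab)
    print(frequency_vector(train_logs[0], vocab))
```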



Feature Classification

Three algorithms are used for training:
- Naive Bayes
- J48 (decision tree)
- Support Vector Machine (SVM)
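The classifiers themselves were run in WEKA. To illustrate the simplest of the three, here is a from-scratch Gaussian naive Bayes, one common variant for numeric attributes such as API call counts. This is a sketch, not the authors' setup: the slides do not state which naive Bayes variant or WEKA options were used, and the toy data below is made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes over numeric feature vectors."""

    def __init__(self, min_var: float = 1e-6) -> None:
        self.min_var = min_var  # variance floor so constant features don't divide by zero
        self.priors: Dict[str, float] = {}
        # Per class: one (mean, variance) pair per feature
        self.stats: Dict[str, List[Tuple[float, float]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        by_class: Dict[str, List[Sequence[float]]] = {}
        for row, label in zip(X, y):
            by_class.setdefault(label, []).append(row)
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            self.stats[label] = []
            for col in zip(*rows):  # iterate over feature columns
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col)
                self.stats[label].append((mean, max(var, self.min_var)))
        return self

    def _log_posterior(self, row: Sequence[float], label: str) -> float:
        """Unnormalized log P(label | row) under the independence assumption."""
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, row: Sequence[float]) -> str:
        return max(self.priors, key=lambda label: self._log_posterior(row, label))


if __name__ == "__main__":
    # Made-up counts of two hypothetical API attributes per sample
    X = [[0, 5], [1, 6], [0, 4], [9, 0], [8, 1], [10, 0]]
    y = ["benign"] * 3 + ["malicious"] * 3
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict([9, 1]), model.predict([0, 5]))
```

Working in log space avoids underflow when multiplying per-feature likelihoods over many API attributes.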



Experiment Environment
- Sample sets
  - Experiment 1: 251 benign programs and 263 malware samples
  - Experiment 2: 251 benign programs and 773 malware samples
- Environment: Windows XP SP2 in VirtualBox
- Classification tool: WEKA 3



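Evaluation
We use 10-fold cross-validation to validate the results. The following measures are used to evaluate them:
- True positive rate (recall): $\mathrm{TPR} = \dfrac{TP}{TP + FN}$
- False positive rate: $\mathrm{FPR} = \dfrac{FP}{FP + TN}$
- Precision: $\mathrm{Precision} = \dfrac{TP}{TP + FP}$
- F-measure: $F = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
- Accuracy: $\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$

| | Positive (Actual) | Negative (Actual) |
|---|---|---|
| Positive (Predicted) | True Positive (TP) | False Positive (FP) |
| Negative (Predicted) | False Negative (FN) | True Negative (TN) |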
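The evaluation formulas translate directly into code. A minimal sketch; the confusion-matrix counts in the usage example are made up, while the final check applies the F-measure formula to the J48 malicious-class row of Experiment 1 (precision 0.948, recall 0.962, F-measure 0.955).

```python
from typing import Dict


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the evaluation measures from confusion-matrix counts."""
    recall = tp / (tp + fn)  # also the TP rate
    precision = tp / (tp + fp)
    return {
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


if __name__ == "__main__":
    print(evaluate(tp=90, fp=5, fn=10, tn=95))  # made-up counts
    print(round(f_measure(0.948, 0.962), 3))    # J48, malicious class, Experiment 1
```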



Experiment 1 Result

| Algorithm | Type | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy |
|---|---|---|---|---|---|---|---|
| J48 | Malicious | 0.962 | 0.055 | 0.948 | 0.962 | 0.955 | |
| | Benign | 0.962 | 0.055 | 0.948 | 0.962 | 0.955 | |
| | Average | 0.954 | 0.047 | 0.954 | 0.954 | 0.954 | 0.954 |
| NaiveBayes | Malicious | 0.962 | 0.043 | 0.958 | 0.962 | 0.96 | |
| | Benign | 0.957 | 0.038 | 0.961 | 0.957 | 0.959 | |
| | Average | 0.959 | 0.041 | 0.959 | 0.959 | 0.959 | 0.959 |
| SVM | Malicious | 0.76 | 0.125 | 0.862 | 0.76 | 0.808 | |
| | Benign | 0.875 | 0.24 | 0.78 | 0.875 | 0.824 | |
| | Average | 0.817 | 0.182 | 0.822 | 0.817 | 0.816 | 0.816 |


Experiment 2 Result

| Algorithm | Type | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy |
|---|---|---|---|---|---|---|---|
| J48 | Malicious | 0.975 | 0.09 | 0.97 | 0.975 | 0.973 | |
| | Benign | 0.91 | 0.025 | 0.924 | 0.91 | 0.917 | |
| | Average | 0.959 | 0.074 | 0.959 | 0.959 | 0.959 | 0.959 |
| NaiveBayes | Malicious | 0.937 | 0.043 | 0.985 | 0.937 | 0.96 | |
| | Benign | 0.957 | 0.063 | 0.833 | 0.957 | 0.891 | |
| | Average | 0.942 | 0.048 | 0.947 | 0.942 | 0.943 | 0.941 |
| SVM | Malicious | 0.99 | 0.396 | 0.883 | 0.99 | 0.933 | |
| | Benign | 0.604 | 0.01 | 0.951 | 0.604 | 0.739 | |
| | Average | 0.894 | 0.3 | 0.9 | 0.894 | 0.885 | 0.893 |
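The "Average" rows appear consistent with WEKA's class-size-weighted average (which WEKA labels "Weighted Avg.") rather than a plain mean of the two classes. The small sketch below reproduces, for example, the Experiment 2 NaiveBayes TP-rate and FP-rate averages from the per-class rows (773 malicious, 251 benign samples). Because the per-class values are themselves rounded, recomputed averages can differ in the third decimal.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], class_sizes: Sequence[int]) -> float:
    """Average per-class values, weighting each class by its number of samples."""
    total = sum(class_sizes)
    return sum(v * n for v, n in zip(values, class_sizes)) / total


if __name__ == "__main__":
    sizes = [773, 251]  # Experiment 2: malicious, benign
    print(round(weighted_average([0.937, 0.957], sizes), 3))  # NaiveBayes TP rate
    print(round(weighted_average([0.043, 0.063], sizes), 3))  # NaiveBayes FP rate
```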


Attribute Reduction for Experiment 1

We chose J48 for further experiments and selected attributes by ranking them on their information gain.

| Attributes | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy |
|---|---|---|---|---|---|---|
| 1769 | 0.954 | 0.047 | 0.954 | 0.954 | 0.954 | 0.954 |
| 800 | 0.954 | 0.047 | 0.954 | 0.954 | 0.954 | 0.954 |
| 250 | 0.952 | 0.048 | 0.952 | 0.952 | 0.952 | 0.952 |
| 120 | 0.932 | 0.068 | 0.933 | 0.932 | 0.932 | 0.932 |
| 80 | 0.938 | 0.062 | 0.938 | 0.938 | 0.938 | 0.938 |
| 40 | 0.907 | 0.091 | 0.91 | 0.907 | 0.907 | 0.907 |
| 20 | 0.89 | 0.109 | 0.892 | 0.89 | 0.89 | 0.889 |
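A minimal sketch of ranking attributes by information gain and keeping the top k. Assumption: each frequency attribute is binarized to "called / not called" before the gain is computed; WEKA's information-gain evaluator handles numeric attributes with its own discretization, and the slides do not give the settings used. The toy data is made up.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[float], labels: Sequence[str]) -> float:
    """IG = H(class) - H(class | attribute), with the attribute binarized to count > 0."""
    groups: Dict[bool, List[str]] = {}
    for value, label in zip(column, labels):
        groups.setdefault(value > 0, []).append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_attributes(X: Sequence[Sequence[float]], labels: Sequence[str], k: int) -> List[int]:
    """Indices of the k attributes with the highest information gain."""
    columns = list(zip(*X))
    gains = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: gains[i], reverse=True)[:k]


if __name__ == "__main__":
    X = [[3, 0, 1], [2, 4, 1], [0, 4, 1], [0, 5, 1]]  # made-up API counts
    labels = ["malicious", "malicious", "benign", "benign"]
    print(top_k_attributes(X, labels, 2))
```

Attribute 0 separates the classes perfectly, attribute 1 only partly, and attribute 2 (called by every sample) carries no information, so it ranks last.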



Attribute Reduction for Experiment 2

| Attributes | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy |
|---|---|---|---|---|---|---|
| 1889 | 0.959 | 0.074 | 0.959 | 0.959 | 0.959 | 0.959 |
| 800 | 0.958 | 0.072 | 0.958 | 0.958 | 0.958 | 0.958 |
| 250 | 0.952 | 0.048 | 0.952 | 0.952 | 0.952 | 0.952 |
| 120 | 0.95 | 0.085 | 0.95 | 0.95 | 0.95 | 0.95 |
| 80 | 0.953 | 0.073 | 0.954 | 0.953 | 0.953 | 0.953 |
| 40 | 0.923 | 0.018 | 0.922 | 0.923 | 0.921 | 0.923 |
| 20 | 0.912 | 0.231 | 0.914 | 0.912 | 0.908 | 0.912 |



Comparison

| Work | Test Data | Analysis | Attributes | Accuracy |
|---|---|---|---|---|
| Wang et al. | 461(B)/451(M) | Static | not provided | 93.9% |
| Islam et al. | 3396(M) | Static | not provided | 83.3% |
| Tian et al. | 454(B)/1369(M) | Dynamic | not provided | 97.3% |
| Salehi et al. | 385(B)/826(M) | Dynamic | 410/166 | 98.1%/93% |
| Our method | 251(B)/263(M) | Dynamic | 80 | 93.8% |
| Our method | 251(B)/773(M) | Dynamic | 80 | 95.3% |

B: benign sample, M: malicious sample



Conclusion

We have proposed an integrated method for monitoring and identifying malware. Experimental results demonstrate that our method achieves a high detection rate with low complexity: with only 80 attributes, J48 reaches 93.8% and 95.3% accuracy in the two experiments.
Future work:
- Improving the compatibility of our tracing tool and monitoring tool
- Kernel-space hooking, which may make tracing more efficient and stable
- Applying our method to host-based detection



Q&A
Thank you for your attention!

