Malware Detection System Based on API Log Data Mining
Chun-I Fan, Han-Wei Hsiao, Chun-Han Chou, Yi-Fan Tseng
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
The 5th IEEE International Workshop on Network Technologies for Security, Administration and Protection (NETSAP 2015)

Chun-I Fan, Han-Wei Hsiao, Chun-Han Chou, Yi-Fan Tseng Malware Detection by Mining API Log


Outline

- Introduction
- Malware Observation
- Related Works
- Proposed Method
- Experimental Results
- Conclusion


Introduction

Network attacks have become more serious recently, and malware is the primary tool used in these attacks.
Characteristics of new malware:
- New exploits
- A very large amount of malware
- Malware generators
- Anti-reverse-engineering techniques: packing, code encryption
- Stealth techniques: code injection, rootkits


Stealth Technique


Related Works
- C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, "Malware detection based on suspicious behavior identification," in Education Technology and Computer Science, 2009.
- R. Islam, R. Tian, L. Batten, and S. Versteeg, "Classification of malware based on string and function feature selection," in Cybercrime and Trustworthy Computing Workshop (CTC), 2010.
- R. Tian, R. Islam, L. Batten, and S. Versteeg, "Differentiating malware from cleanware using behavioural analysis," in 5th International Conference on Malicious and Unwanted Software (MALWARE), 2010.
- Z. Salehi, M. Ghiasi, and A. Sami, "A miner for malware detection based on API function calls and their arguments," in 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), 2012.


System Architecture


Behavior Monitoring
We monitor three kinds of processes for each sample:
- Sample processes: the processes of the samples themselves
- Derivative processes: any new processes that the samples create
- Injected processes: any processes into which the samples inject code. We developed a tool, "TraceHook", to trace such injection.
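A minimal sketch of how the three monitored groups could be tracked as a single set, assuming the monitor sees a stream of process-creation and code-injection events. The `(kind, source_pid, target_pid)` event format and the function name `monitored_processes` are hypothetical, and we assume that anything spawned by or injected from an already-monitored process is monitored as well.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event format: (kind, source_pid, target_pid), where kind is
# "create" (source spawned target) or "inject" (source injected code into target).
Event = Tuple[str, int, int]


def monitored_processes(sample_pid: int, events: Iterable[Event]) -> Set[int]:
    """Return the sample process plus every derivative or injected process
    reachable from it through create/inject events (breadth-first search)."""
    edges: Dict[int, List[int]] = {}
    for kind, src, dst in events:
        if kind in ("create", "inject"):
            edges.setdefault(src, []).append(dst)

    monitored = {sample_pid}
    queue = deque([sample_pid])
    while queue:
        pid = queue.popleft()
        for target in edges.get(pid, []):
            if target not in monitored:  # visit each process only once
                monitored.add(target)
                queue.append(target)
    return monitored


events = [
    ("create", 100, 200),  # sample 100 spawns derivative process 200
    ("inject", 200, 4),    # derivative 200 injects code into process 4
    ("create", 300, 301),  # unrelated process tree, not monitored
]
print(sorted(monitored_processes(100, events)))  # [4, 100, 200]
```

Following injection edges transitively is a design choice of this sketch; a real monitor might stop at the first injected process to avoid pulling in unrelated system activity.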


TraceHook
TraceHook traces code injection by intercepting the CreateRemoteThread function.
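The interception idea can be illustrated with a small Python analogy. This is not Windows API hooking: `create_remote_thread` below is a hypothetical stand-in with a simplified signature (the real CreateRemoteThread receives a handle to the target process, not a process ID). The sketch only shows the pattern: a detour replaces the original function, records which process is being injected into, and then calls through to the original so the sample keeps behaving normally.

```python
import functools
from typing import Callable, List, Tuple

# Records (caller_pid, target_pid) pairs observed by the hook.
injection_log: List[Tuple[int, int]] = []


def hook(original: Callable[[int, int, int], int]) -> Callable[[int, int, int], int]:
    """Wrap `original` so every call is logged before being forwarded."""
    @functools.wraps(original)
    def detour(caller_pid: int, target_pid: int, start_address: int) -> int:
        injection_log.append((caller_pid, target_pid))          # trace the injection
        return original(caller_pid, target_pid, start_address)  # call through

    return detour


def create_remote_thread(caller_pid: int, target_pid: int, start_address: int) -> int:
    """Hypothetical stand-in for the real API; returns a fake thread id."""
    return 1000 + target_pid


create_remote_thread = hook(create_remote_thread)  # install the hook
tid = create_remote_thread(100, 4, 0x401000)
print(tid, injection_log)  # 1004 [(100, 4)]
```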


Feature Extraction
The call frequency of each API is extracted as a feature.
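A minimal sketch of frequency-based features, assuming each sample's monitored log has already been parsed into a list of API names and that the attribute set is the union of all API names observed across samples. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def build_vocabulary(logs: Iterable[Sequence[str]]) -> List[str]:
    """Collect every API name seen in any sample's log, in sorted order."""
    return sorted({api for log in logs for api in log})


def api_frequency_vector(api_calls: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how many times each vocabulary API appears in one sample's log."""
    counts = Counter(api_calls)
    return [counts[api] for api in vocabulary]  # Counter yields 0 for unseen APIs


logs = [
    ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "WriteFile"],
]
vocabulary = build_vocabulary(logs)
vectors = [api_frequency_vector(log, vocabulary) for log in logs]
print(vocabulary)  # ['CloseHandle', 'CreateFileW', 'RegOpenKeyExW', 'WriteFile']
print(vectors)     # [[1, 1, 0, 2], [0, 0, 1, 1]]
```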


Feature Classification

Three algorithms are used for training:
- Naive Bayes
- J48 (decision tree)
- Support Vector Machine (SVM)
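To show what the Naive Bayes step computes on API-frequency vectors, here is a from-scratch Gaussian Naive Bayes, one common way to handle numeric features. It is a sketch, not WEKA's NaiveBayes implementation; the class name, the `min_std` floor, and the toy data (two hypothetical API-count features) are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class GaussianNaiveBayes:
    """Naive Bayes with one normal distribution per (class, feature) pair."""

    def __init__(self, min_std: float = 1e-3) -> None:
        self.min_std = min_std  # floor keeps constant features from dividing by zero
        self.priors: Dict[str, float] = {}
        self.means: Dict[str, List[float]] = {}
        self.stds: Dict[str, List[float]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        for label, rows in by_class.items():
            n = len(rows)
            self.priors[label] = n / len(X)
            columns = list(zip(*rows))  # transpose: one tuple per feature
            means = [sum(col) / n for col in columns]
            stds = [max(math.sqrt(sum((v - m) ** 2 for v in col) / n), self.min_std)
                    for col, m in zip(columns, means)]
            self.means[label] = means
            self.stds[label] = stds
        return self

    def log_posterior(self, label: str, row: Sequence[float]) -> float:
        """log P(label) + sum_j log N(x_j; mean_j, std_j), i.e. the log
        posterior up to a constant shared by all classes."""
        total = math.log(self.priors[label])
        for v, m, s in zip(row, self.means[label], self.stds[label]):
            total -= math.log(s * math.sqrt(2 * math.pi)) + (v - m) ** 2 / (2 * s * s)
        return total

    def predict(self, row: Sequence[float]) -> str:
        return max(self.priors, key=lambda label: self.log_posterior(label, row))


X = [[5, 0], [6, 1], [7, 0], [0, 4], [1, 5], [0, 6]]
y = ["malicious"] * 3 + ["benign"] * 3
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([6, 0]), model.predict([0, 5]))  # malicious benign
```

Working in log space avoids underflow when multiplying likelihoods over the roughly 1,800 attributes used in the experiments.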


Experiment Environment
Sample sets:
- Experiment 1: 251 benign programs and 263 malware samples
- Experiment 2: 251 benign programs and 773 malware samples

Environment: Windows XP SP2 in VirtualBox

Classification tool: WEKA 3
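WEKA reads data in its ARFF text format, so one way to hand the API-frequency vectors to WEKA 3 is to write them out as ARFF. A minimal sketch; the relation name, the `class` attribute with values `malicious`/`benign`, and the helper name `to_arff` are assumptions, and attribute names containing spaces or special characters would need quoting.

```python
from typing import List, Sequence


def to_arff(relation: str, attributes: Sequence[str],
            rows: Sequence[Sequence[int]], labels: Sequence[str]) -> str:
    """Render API-frequency vectors and class labels as an ARFF document."""
    lines: List[str] = [f"@RELATION {relation}", ""]
    for name in attributes:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")       # one numeric column per API
    lines.append("@ATTRIBUTE class {malicious,benign}")  # nominal class attribute last
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        lines.append(",".join(str(v) for v in row) + "," + label)
    return "\n".join(lines) + "\n"


print(to_arff("api_log", ["CreateFileW", "WriteFile"],
              [[1, 2], [0, 1]], ["malicious", "benign"]))
```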


Evaluation
We use 10-fold cross-validation to validate the results. The following measures are used to evaluate them:

True Positive rate (Recall): $\mathrm{TPR} = \frac{TP}{TP + FN}$
False Positive rate: $\mathrm{FPR} = \frac{FP}{FP + TN}$
Precision: $\mathrm{Precision} = \frac{TP}{TP + FP}$
F-measure: $F = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
Accuracy: $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

Confusion matrix:
                     | Positive (Actual) | Negative (Actual)
Positive (Predicted) | True Positive     | False Positive
Negative (Predicted) | False Negative    | True Negative
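The measures above translate directly into code. A minimal sketch with hypothetical counts; the helper names are ours. `k_fold_indices` illustrates the idea behind 10-fold cross-validation (each sample is tested exactly once, by a model trained on the other nine folds). WEKA's own cross-validation additionally shuffles the data and stratifies the folds by class, which this sketch does not do.

```python
from typing import Dict, List, Tuple


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the evaluation measures from confusion-matrix counts
    (assumes every denominator is non-zero)."""
    recall = tp / (tp + fn)                  # true positive rate
    fpr = fp / (fp + tn)                     # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"tp_rate": recall, "fp_rate": fpr, "precision": precision,
            "recall": recall, "f_measure": f_measure, "accuracy": accuracy}


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices 0..n-1 into k (train, test) pairs; every index is
    tested exactly once. Folds are interleaved, not shuffled or stratified."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for other in folds[:i] + folds[i + 1:] for j in other], folds[i])
            for i in range(k)]


print(evaluate(tp=45, fp=5, fn=15, tn=35))
# tp_rate 0.75, fp_rate 0.125, precision 0.9, f_measure about 0.818, accuracy 0.8
```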


Experiment 1 Result

Algorithm  | Type      | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy
J48        | Malicious | 0.962   | 0.055   | 0.948     | 0.962  | 0.955     |
           | Benign    | 0.962   | 0.055   | 0.948     | 0.962  | 0.955     |
           | Average   | 0.954   | 0.047   | 0.954     | 0.954  | 0.954     | 0.954
NaiveBayes | Malicious | 0.962   | 0.043   | 0.958     | 0.962  | 0.96      |
           | Benign    | 0.957   | 0.038   | 0.961     | 0.957  | 0.959     |
           | Average   | 0.959   | 0.041   | 0.959     | 0.959  | 0.959     | 0.959
SVM        | Malicious | 0.76    | 0.125   | 0.862     | 0.76   | 0.808     |
           | Benign    | 0.875   | 0.24    | 0.78      | 0.875  | 0.824     |
           | Average   | 0.817   | 0.182   | 0.822     | 0.817  | 0.816     | 0.816


Experiment 2 Result

Algorithm  | Type      | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy
J48        | Malicious | 0.975   | 0.09    | 0.97      | 0.975  | 0.973     |
           | Benign    | 0.91    | 0.025   | 0.924     | 0.91   | 0.917     |
           | Average   | 0.959   | 0.074   | 0.959     | 0.959  | 0.959     | 0.959
NaiveBayes | Malicious | 0.937   | 0.043   | 0.985     | 0.937  | 0.96      |
           | Benign    | 0.957   | 0.063   | 0.833     | 0.957  | 0.891     |
           | Average   | 0.942   | 0.048   | 0.947     | 0.942  | 0.943     | 0.941
SVM        | Malicious | 0.99    | 0.396   | 0.883     | 0.99   | 0.933     |
           | Benign    | 0.604   | 0.01    | 0.951     | 0.604  | 0.739     |
           | Average   | 0.894   | 0.3     | 0.9       | 0.894  | 0.885     | 0.893
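The Average rows appear consistent with class-weighted averages (the form WEKA reports as "Weighted Avg."); that reading is our assumption. Weighting each class's TP rate by its class size gives (correctly classified malicious + correctly classified benign) / total, which is why each Average TP rate essentially matches the Accuracy column. A quick check against the Experiment 2 per-class values:

```python
from typing import Sequence


def weighted_average(values: Sequence[float], counts: Sequence[int]) -> float:
    """Average per-class values, weighting each class by its number of samples."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


counts = [773, 251]  # Experiment 2 class sizes: malicious, benign
per_class_tp_rate = {
    "J48": [0.975, 0.91],
    "NaiveBayes": [0.937, 0.957],
    "SVM": [0.99, 0.604],
}
for algorithm, rates in per_class_tp_rate.items():
    print(algorithm, round(weighted_average(rates, counts), 3))
```

The recomputed values agree with the reported Average TP rates (0.959, 0.942, 0.894) to within about 0.002; the small gaps are consistent with rounding of the per-class figures.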


Attribute Reduction for Experiment 1

We chose J48 for further experiments and selected attributes according to their information gain (InfoGain) values.

Attributes | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy
1769       | 0.954   | 0.047   | 0.954     | 0.954  | 0.954     | 0.954
800        | 0.954   | 0.047   | 0.954     | 0.954  | 0.954     | 0.954
250        | 0.952   | 0.048   | 0.952     | 0.952  | 0.952     | 0.952
120        | 0.932   | 0.068   | 0.933     | 0.932  | 0.932     | 0.932
80         | 0.938   | 0.062   | 0.938     | 0.938  | 0.938     | 0.938
40         | 0.907   | 0.091   | 0.91      | 0.907  | 0.907     | 0.907
20         | 0.89    | 0.109   | 0.892     | 0.89   | 0.89      | 0.889
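Attribute ranking by information gain can be sketched as follows. WEKA's information-gain evaluator first discretizes numeric attributes; to keep the sketch short we instead assume a simple binary split on whether an API was called at all. The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[str]) -> float:
    """Gain from splitting samples on whether the API was called at least once."""
    groups: Dict[bool, List[str]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value > 0, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_attributes(X: Sequence[Sequence[int]], labels: Sequence[str],
                     names: Sequence[str], k: int) -> List[str]:
    """Rank attributes by information gain and keep the k best."""
    gains = [(information_gain([row[j] for row in X], labels), names[j])
             for j in range(len(names))]
    gains.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in gains[:k]]


X = [[3, 1], [2, 0], [0, 1], [0, 0]]  # counts for two hypothetical APIs
labels = ["malicious", "malicious", "benign", "benign"]
names = ["WriteProcessMemory", "GetTickCount"]
print(top_k_attributes(X, labels, names, k=1))  # ['WriteProcessMemory']
```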


Attribute Reduction for Experiment 2

Attributes | TP rate | FP rate | Precision | Recall | F-Measure | Accuracy
1889       | 0.959   | 0.074   | 0.959     | 0.959  | 0.959     | 0.959
800        | 0.958   | 0.072   | 0.958     | 0.958  | 0.958     | 0.958
250        | 0.952   | 0.048   | 0.952     | 0.952  | 0.952     | 0.952
120        | 0.95    | 0.085   | 0.95      | 0.95   | 0.95      | 0.95
80         | 0.953   | 0.073   | 0.954     | 0.953  | 0.953     | 0.953
40         | 0.923   | 0.018   | 0.922     | 0.923  | 0.921     | 0.923
20         | 0.912   | 0.231   | 0.914     | 0.912  | 0.908     | 0.912


Comparison

Method        | Test data      | Analysis | Attributes   | Accuracy
Wang et al.   | 461(B)/451(M)  | Static   | not provided | 93.9%
Islam et al.  | 3396(M)        | Static   | not provided | 83.3%
Tian et al.   | 454(B)/1369(M) | Dynamic  | not provided | 97.3%
Salehi et al. | 385(B)/826(M)  | Dynamic  | 410/166      | 98.1%/93%
Our method    | 251(B)/263(M)  | Dynamic  | 80           | 93.8%
Our method    | 251(B)/773(M)  | Dynamic  | 80           | 95.3%

B: benign samples; M: malicious samples


Conclusion

We have proposed an integrated method for monitoring and identifying malware. Experimental results show that the method achieves a high detection rate with low complexity: using only 80 API-frequency attributes, J48 reaches 93.8% accuracy in Experiment 1 and 95.3% in Experiment 2.
Future work:
- Improving the compatibility between our tracing tool and our monitoring tool
- Kernel-space hooking, which may make tracing more efficient and stable
- Applying our method to host-based detection


Q&A
Thank you for your attention!
