Malware Detection System Based on API Log Data Mining
Chun-I Fan, Han-Wei Hsiao, Chun-Han Chou, Yi-Fan Tseng
Department of Computer Science and Engineering
National Sun Yat-sen University, Kaohsiung, Taiwan
The 5th IEEE International Workshop on Network Technologies for Security, Administration and Protection (NETSAP 2015)
Chun-I Fan, Han-Wei Hsiao, Chun-Han Chou, Yi-Fan Tseng Malware Detection by Mining API Log
Outline
- Introduction
- Malware Observation
- Related Works
- Proposed Method
- Experimental Results
- Conclusion
Introduction
Network attacks have become more serious recently, and malware is the primary tool used in them.
Characteristics of new malware:
- New exploits
- Very large amount of malware
- Malware generators
- Anti-reverse techniques: packing, code encryption
- Stealth techniques: code injection, rootkits
Stealth Technique
Related Works
- C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, "Malware detection based on suspicious behavior identification," in Education Technology and Computer Science, 2009.
- R. Islam, R. Tian, L. Batten, and S. Versteeg, "Classification of malware based on string and function feature selection," in Cybercrime and Trustworthy Computing Workshop (CTC), 2010.
- R. Tian, R. Islam, L. Batten, and S. Versteeg, "Differentiating malware from cleanware using behavioural analysis," in 5th International Conference on Malicious and Unwanted Software (MALWARE), 2010.
- Z. Salehi, M. Ghiasi, and A. Sami, "A miner for malware detection based on API function calls and their arguments," in 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), 2012.
System Architecture
Behavior Monitoring
We monitor the samples in three aspects:
- Sample processes
- Derivative processes: any new processes that the samples create
- Injected processes: any processes that the samples inject code into
We developed a tool, "TraceHook", to trace injection.
TraceHook
TraceHook traces code injection by intercepting the CreateRemoteThread function.
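The monitoring scope can be pictured as a set of process IDs that grows whenever a monitored process creates a child or injects into another process. The Python sketch below is only an illustration under assumptions: the event tuple format, the use of `CreateProcess` and `CreateRemoteThread` as log labels, and the helper `monitored_pids` are hypothetical and are not taken from TraceHook itself.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event record: (source_pid, api_name, target_pid).
Event = Tuple[int, str, int]

def monitored_pids(sample_pid: int, events: Iterable[Event]) -> Set[int]:
    """Return the PIDs to monitor for one sample.

    Starts from the sample process, then adds derivative processes
    (created by a monitored process) and injected processes (targets of
    a remote thread started by a monitored process). Events are assumed
    to arrive in chronological order.
    """
    tracked = {sample_pid}
    for source, api, target in events:
        if source not in tracked:
            continue  # activity from unrelated processes is ignored
        if api in ("CreateProcess", "CreateRemoteThread"):
            tracked.add(target)
    return tracked

events = [
    (100, "CreateProcess", 200),       # sample spawns a child
    (200, "CreateRemoteThread", 300),  # child injects into another process
    (999, "CreateProcess", 400),       # unrelated process, not tracked
]
print(sorted(monitored_pids(100, events)))  # [100, 200, 300]
```

Keeping a single growing set means that a process injected by a derivative process is also followed, which matches the three monitoring aspects listed earlier.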
Feature Extraction
Extract the API call frequencies as features.
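As a minimal sketch of frequency-based features, the Python snippet below counts how often each API name appears in one sample's log and lays the counts out in a fixed attribute order. The log contents, the vocabulary, and the function name `api_frequency_features` are assumptions made for illustration; the slides do not say whether raw counts or normalized frequencies were used, so raw counts are shown here.

```python
from collections import Counter
from typing import List

def api_frequency_features(api_log: List[str], vocabulary: List[str]) -> List[int]:
    """Map one sample's API call log to a count vector.

    `vocabulary` fixes the attribute order so that every sample yields
    a vector of the same length; APIs never called get a count of 0.
    """
    counts = Counter(api_log)
    return [counts[name] for name in vocabulary]

# Hypothetical log of one monitored sample.
log = ["CreateFileW", "WriteFile", "WriteFile", "CreateRemoteThread"]
vocab = ["CreateFileW", "CreateRemoteThread", "RegSetValueExW", "WriteFile"]
print(api_frequency_features(log, vocab))  # [1, 1, 0, 2]
```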
Feature Classification
Three algorithms for training:
- Naive Bayes
- J48 (decision tree)
- Support Vector Machine
Experiment Environment
Sample sets:
- Experiment 1: 251 benign programs and 263 malware samples
- Experiment 2: 251 benign programs and 773 malware samples
Environment: Windows XP SP2 in VirtualBox
Classification tool: WEKA 3
Evaluation
We utilize 10-fold cross-validation to validate results. The following measurements are used to evaluate our results:
- True Positive rate (Recall): $\frac{TP}{TP + FN}$
- False Positive rate: $\frac{FP}{FP + TN}$
- Precision: $\frac{TP}{TP + FP}$
- F-measure: $2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
- Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$

                      Positive (Actual)  Negative (Actual)
Positive (Predicted)  True Positive      False Positive
Negative (Predicted)  False Negative     True Negative
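The measurements above follow directly from the four confusion-matrix counts. The Python sketch below computes them; the function name `evaluate` and the example counts are hypothetical, and the code assumes every denominator is nonzero.

```python
from typing import Dict

def evaluate(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the evaluation measures from confusion-matrix counts.

    Assumes each denominator is nonzero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    recall = tp / (tp + fn)        # true positive rate
    fp_rate = fp / (fp + tn)       # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "recall": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }

# Hypothetical counts, not taken from the experiments.
scores = evaluate(tp=90, fp=10, fn=10, tn=90)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```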
Experiment 1 Result

Algorithm   Type       TP rate  FP rate  Precision  Recall  F-Measure  Accuracy
J48         Malicious  0.962    0.055    0.948      0.962   0.955
            Benign     0.962    0.055    0.948      0.962   0.955
            Average    0.954    0.047    0.954      0.954   0.954      0.954
NaiveBayes  Malicious  0.962    0.043    0.958      0.962   0.96
            Benign     0.957    0.038    0.961      0.957   0.959
            Average    0.959    0.041    0.959      0.959   0.959      0.959
SVM         Malicious  0.76     0.125    0.862      0.76    0.808
            Benign     0.875    0.24     0.78       0.875   0.824
            Average    0.817    0.182    0.822      0.817   0.816      0.816
Experiment 2 Result

Algorithm   Type       TP rate  FP rate  Precision  Recall  F-Measure  Accuracy
J48         Malicious  0.975    0.09     0.97       0.975   0.973
            Benign     0.91     0.025    0.924      0.91    0.917
            Average    0.959    0.074    0.959      0.959   0.959      0.959
NaiveBayes  Malicious  0.937    0.043    0.985      0.937   0.96
            Benign     0.957    0.063    0.833      0.957   0.891
            Average    0.942    0.048    0.947      0.942   0.943      0.941
SVM         Malicious  0.99     0.396    0.883      0.99    0.933
            Benign     0.604    0.01     0.951      0.604   0.739
            Average    0.894    0.3      0.9        0.894   0.885      0.893
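The slides do not state how the "Average" rows are computed. The reported numbers appear consistent with a mean weighted by class size, which is an inference and not something the authors confirm. The Python sketch below checks this for the J48 TP rate in Experiment 2 (0.975 for 773 malicious samples, 0.91 for 251 benign samples); the helper name `weighted_average` is hypothetical.

```python
from typing import Sequence

def weighted_average(values: Sequence[float], weights: Sequence[int]) -> float:
    """Mean of per-class values weighted by the number of samples per class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# J48, Experiment 2: per-class TP rates and class sizes from the slides.
tp_rates = [0.975, 0.91]   # malicious, benign
class_sizes = [773, 251]   # malicious, benign
print(round(weighted_average(tp_rates, class_sizes), 3))  # 0.959
```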
Attribute Reduction for Experiment 1
We choose J48 for further experiments and select attributes based on their InformationGain values.

Attributes  TP rate  FP rate  Precision  Recall  F-Measure  Accuracy
1769        0.954    0.047    0.954      0.954   0.954      0.954
800         0.954    0.047    0.954      0.954   0.954      0.954
250         0.952    0.048    0.952      0.952   0.952      0.952
120         0.932    0.068    0.933      0.932   0.932      0.932
80          0.938    0.062    0.938      0.938   0.938      0.938
40          0.907    0.091    0.91       0.907   0.907      0.907
20          0.89     0.109    0.892      0.89    0.89       0.889
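Ranking attributes by information gain and keeping the top k can be sketched in a few lines of Python. This is an illustrative approximation, not the WEKA procedure: it assumes each API count is reduced to a binary "called at least once" value before the gain is computed, and the names `information_gain` and `top_k_attributes` as well as the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values: Sequence[int], labels: Sequence[str]) -> float:
    """Entropy reduction obtained by splitting on a discrete attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def top_k_attributes(rows: List[List[int]], labels: List[str], k: int) -> List[int]:
    """Return the indices of the k attributes with the highest gain."""
    gains = []
    for j in range(len(rows[0])):
        # Assumption: binarize counts into called (1) / never called (0).
        column = [1 if row[j] > 0 else 0 for row in rows]
        gains.append((information_gain(column, labels), j))
    gains.sort(key=lambda g: (-g[0], g[1]))  # highest gain first
    return [j for _, j in gains[:k]]

# Toy data: 4 samples, 3 API-count attributes.
rows = [[3, 1, 0], [5, 1, 0], [0, 1, 2], [0, 1, 0]]
labels = ["malicious", "malicious", "benign", "benign"]
print(top_k_attributes(rows, labels, 2))  # [0, 2]
```

In the toy data, attribute 0 separates the two classes perfectly, attribute 1 is constant and carries no information, and attribute 2 falls in between, so the top two are attributes 0 and 2.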
Attribute Reduction for Experiment 2
Attributes  TP rate  FP rate  Precision  Recall  F-Measure  Accuracy
1889        0.959    0.074    0.959      0.959   0.959      0.959
800         0.958    0.072    0.958      0.958   0.958      0.958
250         0.952    0.048    0.952      0.952   0.952      0.952
120         0.95     0.085    0.95       0.95    0.95       0.95
80          0.953    0.073    0.954      0.953   0.953      0.953
40          0.923    0.018    0.922      0.923   0.921      0.923
20          0.912    0.231    0.914      0.912   0.908      0.912
Comparison
Experiment     Test Data       Analysis  Attributes    Accuracy
Wang et al.    461(B)/451(M)   Static    not provided  93.9%
Islam et al.   3396(M)         Static    not provided  83.3%
Tian et al.    454(B)/1369(M)  Dynamic   not provided  97.3%
Salehi et al.  385(B)/826(M)   Dynamic   410/166       98.1%/93%
Our method     251(B)/263(M)   Dynamic   80            93.8%
               251(B)/773(M)   Dynamic   80            95.3%

B: benign sample, M: malicious sample
Conclusion
We have proposed an integrated method for monitoring and identifying malware.
Experimental results have demonstrated that our method can achieve a high detection rate with low complexity.
Future work:
- Compatibility of our tracing tool and monitoring tool
- Kernel-space hooking may make tracing more efficient and stable
- Our method can be applied to host-based detection
Q&A
Thank you for your attention!