May 12, 2017 - Index TermsâDeep Learning, Android Malware Detection, ..... Fig. 9. The color images example of benign Android apps. Fig. 10. The color ...
R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections † Hsien-De
Huang, ∗ Chia-Mu Yu, and ∗† Hung-Yu Kao Research Lab., Cheetah Mobile Inc. ∗ Department of Computer Science and Engineering, National Chung Hsing University, Taiwan ∗† Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan † Security
Abstract—Machine Learning (ML) has found it particularly useful in malware detection. However, as the malware evolves very fast, the stability of the feature extracted from malware serves as a critical issue in malware detection. The recent success of deep learning in image recognition, natural language processing, and machine translation indicates a potential solution for stabilizing the malware detection effectiveness. In this research, we haven’t extract selected any features (e.g., the control-flow of op-code, classes, methods of functions and the timing they are invoked etc.) from Android apps. We develop our own method for translating Android apps into rgb color code and transform them to a fixed-sized encoded image. After that, the encoded image is fed to convolutional neural network (CNN) for automatic feature extraction and learning, reducing the expert’s intervention. Deep learning usually involves a large number of parameters that cannot be learned from only a small dataset. In this way, we currently have collected 1500k Android apps samples, have run our system over these 800k malware samples (benign and malicious samples are roughly equal-sized), and also through our back-end (60 million monthly active users and 10k new malware samples per day), we can effectively detect the malware. We believe that our methodology and the corresponding use of deep learning malware classification can overcome the weakness, and computational cost of the common static/dynamic analysis process or machine learning-based of Android malware detection approach. Index Terms—Deep Learning, Android Malware Detection, Convolutional Neural Network
ing percetange, 99.87% on the number of malware on mobile platform [2]. Fig. 2 shows the statistic collected from our backend system during January 2017., where for countries such as US, UK, France, etc., more than 50000 users are infected everyday. Moreover, from our experience, the number of Android malware is sharply increased from 1000000 in 2012 to 2.8 million in 2014. Due to the serious security problem caused by Android malware, we propose, color-inspired convolutional neural networks (CNN)-based Android malware detection, R2D2, to detect Android malware. Different from the existing solution, our propose R2-D2 detection is featured by its endto-end learning process. More specifically, in contrast to the prior solutions that require manual process of feature selection and parameter configuration, our proposed R2-D2 detection takes as input the training samples and outputs the malware detection model without the human interrelation. 100.00%
90.00%
84.30%
86.80%
11.70%
12.50%
0.70%
0.70%
2016Q2
2016Q3
70.00%
60.00%
50.00%
40.00%
30.00%
18.60%
20.00%
15.40%
13.40%
I. I NTRODUCTION
87.60% 83.40% 79.60%
80.00%
Market Share
arXiv:1705.04448v1 [cs.CR] 12 May 2017
†
10.00%
2.30%
1.70%
2015Q3
2015Q4
1.20%
0.00%
Smartphones have gained widespread popularing worldwide, among them. Android is the most commonly used operating system (OS), and is still expanding as ferritory. International Data Corporation (IDC) 2016 Q3 smartphones OS marketshare reports 86.8% of smartphones are Android phones, indicating a steady growth market share, compared to 84.3% of marketshare in 2015 Q2 (see Fig. 1) [1]. Android is featured by its openness; users can choose to downloads apps from Google Play and third-party marketplace. However due to the popularing and openness, Android has attracted the attacker’s attention. In purticalar malicious software (malware) can easily spread and infect benign devises. Security Report 201516 of AV-TEST Intitute reports that while the number of malware is increased from 17 millions in 2005 to over 600 millions in 2016, the percentage of Android malware has a significant increase from 3.19% in 2015 to 7.48% in 2016Q2. Among them, Trojans targeting at stealing user data occupy 97.49%. We can also find that Android malware has dominat-
2016Q1 Android
iOS
Mobile OS Others
Fig. 1. World Smartphones OS Market Share.
Fig. 2. The statistic collected from our back-end system during January 2017., where for countries such as US, UK, France.
II. R ELATED W ORK A. Android Malware Analysis Background Android Package (APK) is the paella format used by Android OS for distribution and installation of mobile apps. Essentially, APK fields are a type of zip-formatted archive file. The structure of APK can be dissected as follows: • META-INIF/: contains the information description from Java jar file. • res/: contains the resource document. • libs/: contains the so library from Android Native Development (NDK) • AndroidManifest.xml: contains the configuration file about the authorization and service. • classes.dex: contains the dalvik byte-code, i.e., Android execution file. • resource.asc: contains the binary resource file after compilation. Fig. 3 shows the compilation process of Android APK. First we compile all of the Java source code, generating the .class byte-code. After that, all of the .class files are transformed into .dex format executable format. Dalvik VM is mainly developed by Google, with the minimal requirement on the resource. In other words, Dalvik VM can be run on the mobile devices with limited computing and memory resource but with acceptably performance. Static analysis of signature-based method is one of the most common approaches for Android malware detection. The tools include Apktool, baksmalu, dex2jar, and JD-GUI, which are reverse-engineering tools for deriving Java or smali source code from classes.dex. Furthermore, AXMLPrinter2 can be used to derive the Android permission from AndroidManifest.xml. Another category of approaches for Android malware detection is dynamic analysis or behavior-based method. The idea behind this category is to keep track of the system communications and network connections by emulating Android apps. through techniques such as sandbox and virtual machine. The domain exports then perform the inspection and analysis on the records to see wether the malicious behaviours occur. If so, virus pattern and black/white-list are generated for pattern matching and for capturing the malware. However antianalysis techniques such as obfuscation, encryption, and antidebugging are also proposed by hacker to hide the malicious behaviours, and escaping the detection [3]. In fact, the above approaches can be useless when the malware programmers make modification on the initial components in malware, which can evade the detection. Fig. 4 shows the popular class naming in benign Android apps (e.g. intellij.annotations and features2d). Fig. 5 shows the common way for declaring the data variables (e.g. paramBitmap and paramBitmap). However, to evade the detection, certain random naming (e.g. dsrhki.yjgfqjejkjh and nlrskgblc) will be used by " opfake " family (show in Fig. 6). Fig. 7 shows that " fakeinst " family until encrypt the critical variables (e.g. jj = 84f113ee155ba4f1e280b54401fffab and jj1 = 84f113ee155ba43f1e280b54401fffab).
R.java Java source code
Java Compiler
.class files
Java interface
Android PacKage (.apk)
apkbuilder
.dex file
dex
Fig. 3. The compilation process of Android APK.
Fig. 4. The popular class naming in benign Android apps.
Fig. 5. The common way for declaring the data variables.
Fig. 6. The certain random naming will be used by opfake family.
Fig. 7. The fakeinst family until encrypt the critical variables.
B. Machine Learning-based Malware Detection Machine learning is a techniques for enabling the machine to learn the pattern, build the model, and make the prediction by seeing only the raw data. Currently, the most widespread malware attacks are still exploit attack and privileged escalation, as a result, most of the machine learning-based malware detection still find the features such as the Android apps permissions, API invocation and control flow graphs (CFG) in order to distinguish between the benign and malicious apps. Usually, SVM and random forest are used to build the model to distinguish between benign and malicious apps. For example, DroidMiner [4] has a two-step flow-chart to represent the behaviours of Android apps and capture the execution logs behind the apps. After that rectors are clustered to identity the similar code snippet. Lei Chen et al. [5] extracted features from Android API invocation as a reverse engineering approach. They apply normalization procedure to the extracted featuring and perform logistic regression on the extracted feature to detect Android malware. Moreover, the dataset used by the current research-oriented machine learning-based Android malware detection are rather small. Most of existing solutions train their detection models sorely based on the small datasets. This cause the practicality problem because the realworld malware may have distinguishing distinctive behaviors and the existing detection models cannot successfully identify the malware by correlating the API invocation and Android permissions. C. Deep Learning The recent success in deep learning research and development attracts people’s attention [6]. In 2015, Google released Tensorflow [7], a framework of realizing deep learning algorithm. Deep learning is a specific type of machine learning, more specifically deep learning is an artificial neural network, in which multiple layers of neurons are interconnected with different weights and activation functions to learn the hidden relationship between input and output. Intuitively, input data are feeded to the first layer that generates different combinations of the input [8]. These combinations, after the activation function, are feeded to the second layer, and so on. Under the above procedures, different combinations of the outputs from the previously layer can be seen as different representation of features. The weights on links between layers are adjusted according forward and backward propagations, depending on the distance or less function between true output label and the label calculated by neural network. Note that deep learning can be seen as a neural network with a large number of layers. After the above learning process via multiple layers, we can derive a better understanding and representation of distinguishable features, enhancing the detection accuracy [9]. Also notice that the effectiveness of deep learning is increased with the network size. In addition to deep neural networks, the most well-known deep networks is convolutional neural networks (CNN). The representation of CNN include AlexNet, VGG, GoogleNet, and ResNet [10][11][12][13]. More specifically, CNN is composed of hidden layers, fully connected layers.
convolution layers, and pooling layers. The hidden layers are used to increase the complexity of the model. If the same number of neural are associated with the input image, the number of parameters can be significantly reduced, adapting to the function structure more properly. D. Deep Learning-based Malware Detection Deep learning once can be seem as the care for the above problem. However, a pre-processing step, such as feature engineering, is still needed before the model is learn. Furthermore, the dataset for training the model cannot reflect real-world malware accurately. For example [14], propose a malware detection, which takes windows API invocation as the inputs of stack of Auto-Encoders and fine-tunes the model parameters [15]. Performs the feature extracting, such as contextual byte features, PE import features, string 2d histogram features, and PE metadata features. Then, the extracted features are feeded to the DNN. With two hidden layers, [16] uses static analysis to extract features such as required permission, sensitive API, and uses dynamic analysis to extract features such as actiondexclass-load, action-recrent and action-service start. After that, these features are feeded to deep belief network (DBN). There is similarity between the execution logic of Android malicious apps and the order of functions being called. Thus, in addition to the aforementioned solutions that apply DNN to malware analysis based on exploit attack and privileged escalation, another category of malware detection relies on ngram analysis on byte-code and op-code. For example, [17] and [18] first calculate the n-grams on the binary byte-code and then perform the malware detection based on k-nearest neighbor. [19] proposes to perform analysis on op-code of the reverse-engineering. In addition, one more category of the malware detection relies on transforming malware into the images. For example, [20] proposes to first transform binary byte-code into grayscale image and the apply pattern recognition on the image. All of the above methods achieves a certain level of detection accuracy. However, as mentioned in Section II, the number of malware is increased dramatically. Even worse, more and more anti-debugging techniques are discovered. The size of dataset used for training the model also has significant impacts on the detection accuracy and the training efficiency. Here, we particularly note that despite the detection accuracy of the n-gram approach. N-gram approach consumes substantial computing resources and time for handling the dynamic growth of the model parameters required, implying the impracticality [21]. However, CNN is able to handle the explosive data growth because the increased number of parameters does not imply the growth of computing resource and time required. Recently, [22] also proposes a deep learning-based malware detection, where the op-code sequences are encoded as one-hot vectors for the input of CNN. However, this method needs to dissemble the Android apps via backsmali, and therefore cannot handle malware with encryption and obfuscation, as show in Fig. 3 and Fig. 4 [23], [24], [25].
III. OUR PROPOSED MECHANISM: R2-D2 A. The Advantage of Our Methodology As mentioned above, the majority of the malware detection still relies on the static analysis of source code and the dynamic analysis through monitoring the execution of malware. However, these approaches are known to have better detection accuracy for the same family of malware only. In reality, Android malware has dramatically growth in numbers and mutates with fast speed and with various anti-analysis techniques. All of these characteristics make the accurate detection extremely difficult. Thus, we attempt to find out the hidden relationship between the program execution logic and the order of function calls behind the malware by taking advantauge of the deep learning in order to accurately detect known and even unknown malware. Since the byte-code in classes.dex and rgb color code are both hexadecimal, we proposed a transformation algorithm that enables the byte-code as rgb color code. After that, we apply CNN to the images for the Android malware detection. Our R2-D2 method works by only decompressing Android apps and then translating the dalvik executable, classes.dex into color images. In fact, a huge amount of human labor will instead perform feature engineering and model before the detection model built. To ease the model training, we adopt deep learning to construct an end-to-end learning-based Android malware detection, which is termed as R2-D2 (R2D2: ColoR-inspired Convolutional NeuRal Network (CNN)based AndroiD Malware Detections). Our proposed R2-D2 possesses the following advantages: • R2-D2 translates classes.dex, the core of the execution logic of Android apps, into RGB color images, without modifying the original Android apps and without extracting features from apps. In our experiments, only 0.4 seconds suffice to translate an apps into a color image, such translation is also featured by the fact that more information in apps can be preserved in the color image compared to the grayscale image. • Though the fully connected layer of DNN can be used to handle the fast mutation, the CNN in R2-D2 actually is more suitable for capturing the malware, because of its features such as local receptive fields and shared weights that can not only significantly reduce the number of model parameters nut also represent the complex structure of Android malware. • The filter, pooling, and non-linear activation functions in CNN do not extract features from image for pattern recognition. Instead, the raw pixels are represented by multi-dimensional matrices.
Dropper, FakeAV, Monitor, SMS, Spy, Ransom, Exploit, and BackDoor. Then, we decompress the apps to retrieve the classes.dex and present as byte-code. We then map the hexadecimal from byte-code to rgb color code through the rule show in Fig. 8. Finally, we will reach a color image (shown in Fig. 9 and 10), and the images are feed to CNN and training a model to detection Android malware. Fig. 11 is our system architecture.
Fig. 8. The result of present byte-code as rgb color code.
Fig. 9. The color images example of benign Android apps.
B. The Architecture of Our Methodology In the following, we explain the algorithm procedures of R2-D2 in more details. First of all, we collected the Android apps which from our original back-end classification system, the Android apps will be classified as benign and Trojan, RiskTool, HackTool, AdWare, Banker, Clicker, Downloader,
Fig. 10. The color images example of malware Android apps.
700000
636730 589800
600000 500000 388884
391751
Amount
400000
292872 321047
328143
282787
308283
300000
238897
200000 100000 0
Malware Family
Fig. 11. The architecture of R2-D2.
Fig. 13. The trend of different mlaware families particularly all of the world.
IV. EXPERIMENT RESULT A. Experiment Environment and Datasets
2000000
Fig. 12 shows the hardware setting and software library used in our experiment. We collect the data from Cheetah Mobile Security Research Lab. In our back-end system, on average we collect 4000 benign and 6000 malicious samples everyday. The data collection duration is from October 2016 to February 2017, among which we have collection approximately 1.5 million of Android apps for our experiments. We particularly note that the malware may have different variants and mutations, depending on the factors such as the cell-pone model, Android version, and the orographic regions. Thus Fig. 2 shows the basis for our classification depending on malware family and malware behavior. Fig. 13 shows the trend of different mlaware families particularly all of the world, and Fig. 14 shows the trend of different malware families particularly in China and India. Fig. 15 shows the market shares for different cell-phone models. Based on the above statistic, we can confirm that even the same malware family will exhibit different behaviours in different geographic regions. Our dataset can handle this problem.
1800000
India
Indonesia
China
1600000
Amount
1400000 1200000 1000000 800000 600000 400000 200000 0
Malware Family
Fig. 14. The trend of different malware families particularly in China and India.
100% 90% 80%
Percentage
70% 60% 50% 40% 30% 20% 10% 0%
US BR DE TR MX FR IT TH KR IN GB TW PH ES ID RU JP MY SG AU CN Country Samsung
Fig. 12. The hardware setting and software library used in our experiment.
Lenovo+Motorola
Huawei
LGE
Sony
TCL
ZTE
htc
ASUS
Fig. 15. The market shares for different cell-phone models.
acer
B. Evaluations of different deep neural networks optimization
D. Real Case study We further compare our detection results to the results reported by VirusTotal vendors. More specifically, we collect 87 Minecraft apps that are verified as malware from Google Play. These apps are active on Google Play from January 2017, and the numbers of installations is approximately 1 million. The most prominent feature of these apps are the extra module for downloading, and the request for Device Administrator permission. These apps then promote deceptive advertisements. These apps have been removed by Google Play,. However, if they can be detected in the first place
NAG
SGD‐Loss
NAG‐Loss
0.16
97
Loss
Accuracy (%)
96
95
0.08
94
93
92
0 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Epoch
Fig. 16. The results of Inception-v3 with the SGD and NAG optimization models.
SGD
NAG
SGD-Loss
NAG-Loss
0.2
97
0.18
0.16 96 0.14 Loss
0.12
95 0.1
0.08 94 0.06
0.04
93
0.02
92
0 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Epoch
Fig. 17. The results of AlexNet with the SGD and NAG optimization models.
95.1
0.27
AdaGrad
AdaDelta
AdaGrad‐Loss
AdaDelta‐Loss 0.25
0.23
94.1 0.21 Loss
To infer the capability of our R2-D2 system in detection the unknown malware, we also collected Android apps on Google Play from February 2017. More specifically, we collected Android apps in different categories (e.g., weather, business, finance, travel, etc.) from the different countries (e.g., US, UK, France, Germany, etc. ). In addition, we also choose benign samples that none of VirusTotal (https://www.virustotal.com) vendor report malicious. We also selected malicious samples, the criteria for selecting malicious samples is that more than 30 VirusTotal vendors report malicious. We download some malicious samples from contagio (http://contagiominidump.blogspot.tw). As a summary, the above four sources of benign/malicious samples are used for verifying the detection capability of our R2D2 system. The evaluation metrics in our experiment include True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN), Accucarcy (Acc), Precision (Prec), Recall (Detection Rate, DR), False Positive Rate (FPR) and F1-score (F-measure). The evaluation results are shown in Fig. 20. We can see that with the rapid generation and mutation of malware, even the detection model is trained based on our dataset with 1.5 million samples, the precision is highler than accuracy after the threshold 0.6. Moreover, when the threshold is over 0.7, recall and F1 -score start to drop, meaning that the detection accuracy is not as good as before, this phenomenon can confirm that thought R2-D2 is able to reduce the human labor and resource consumption, long-term sample collection and model updating are still necessary.
SGD 98
Accuracy (%)
C. Validation on Real Environment
99
Accuracy (%)
Based on our collected data, we evaluate the detection accuracy and performance with different optimization model techniques. Note that the learning rate is fixed to be 0.01, The model optimization techniques are stochastic gradient descent (SGD), Nesterov Accelerated Gradient (NAG), AdaDelta and AdaGrad. From our experiment, we find that Inception-v3 (show in Fig. 16) is almost always better than AlexNet (show in Fig. 17), with such as observation, we further compare Inception-v3 with AdaDelta and Inception-v3 with AdaGrad (see Fig. 18), and find that SGD is best suitable for our use. In particular, it results in the sharpest increase in accuracy and sharpest decrease in loss. At the end, we reach 98.4225% and 97.7081% accuracy (see Fig. 19).
0.19
0.17 93.1
0.15
0.13
92.1
0.11 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Epoch
Fig. 18. The results of Inception-v3 with the AdaDelta and AdaGrad optimization models.
0.16 98.8
SGD
NAG
SGD-Loss
NAG-Loss
0.14 97.8
Accuracy (%)
0.1
95.8
94.8
0.08
93.8
0.06
92.8
Loss
0.12
96.8
0.04 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Epoch
Fig. 19. The results of Inception-v3 with the SGD and NAG optimization models. Fig. 21. The evaluation results. 100
100
100
99.9
99.9
99.9
99.5
100
95.03
94.29
92.76
98.9
98.4
97.5
97.3
95.65
96.19
96.97
97.25
96.6
97.13
95.9
96.87
95.5
94.4
96.91
96.42
90.45
96.05
95.39
93.5
92
87.87
Percentage(%)
80
83.33
94.93
90.8
85.25
93.06
87.2
60
91.56 40
80
82.7
95.16
93.76
95.84
96.4
96.88
96.95
97.25
97.15
96.9
96.95
96.5
96.15
95.55
95.05
93.5
86.2
20
0
71.43 0.05
74.29 0.1
78.37 0.15
82.63 0.2
86.57 0.25
89.28 0.3
90.95 0.35
92.60
94.07
0.4
0.45
96.44 0.5
97.20 0.55
97.67 0.6
97.86 0.65
98.35 0.7
98.54 0.75
98.73 0.8
99.03 0.85
99.45 0.9
99.77 0.95
Threshold Accuracy
Precision
Recall
F1-Score
Fig. 20. The evaluation results.
without feature extraction, they can fluctuation of Android malware can be mitigated. From Fig. 21, we can find that most of the above Minecraft malware can be detected by approximately 16% vendors (10/60) on 2017/03/24. Until 2017/03/30, approximately 32% vendors (20/60) can detect these malware. For example, Fig. 22 shows the detection results of "hash". However, R2-D2 can detect more than 75% of these apps. This manual feature engineering, but even also is able to beter detect unknown malware before the state-ofthe-art malware detection engines.
Fig. 22. The detection results of hash.
E. Comparison with Exited Methodology Finally, we compare our R2-D2 to DroidSieve. The evaluation metres include the sample size, DR/FPR, acc, and detection time. From Fig. 23, we find that though R2-D2 is slightly weaker than some others in DRFPR and acc, our R2D2 is trained based on a significantly larger dataset and is more adaptive to the real-world adoption, because the model trained by smaller dataset cannot reflect the reality. Moreover, R2-D2 has distinguishing advantage that it has fast detection time, 0.5 seconds for each coming samples, and require manual feature engineering.
Fig. 23. The result of compare R2-D2 and existed research.
V. CONCLUSION This research adopt deep learning to construct an end-toend learning-based Android malware detection and proposed color-inspired convolutional neural network (CNN)-based Android Malware detection, which is termed as R2-D2. The proposed proof-of-concept system has tested in our internal environment. The results show that our detection system works very well to detect known Android malware and even unknown Android malwarte. Also, we have publish to our core product to provide convenient usage scenarios for endusers or enterprise. The future work is reduce the complex task and train a high performance to faced the Android malware from a huge amount of computation burden. Furthermore, we will keep our research results on http://R2D2.TWMAN.ORG if there any update. ACKNOWLEDGEMENT This work would not have been possible without the valuable dataset offered by Cheetah Mobile, Leopard Mobile and Android apps CM Security Master (CM Security). R EFERENCES [1] International Data Corporation (IDC), (2016), Smartphone OS Market Share 2016 Q3 [Online]. Available: https://www.idc.com/prodserv/smartphone-os-market-share.jsp. [2] The AV-TEST Institute, (2016) Security Report 2015/16 [Online].Available : https : //www.av − test.org/f ileadmin/pdf /security_report/AV − T EST _Security_Report_2015 − 2016.pdf. [3] T. Vidas and N. Christin, "Evading Android Runtime Analysis via Sandbox Detection," in Proceedings of the 9th ACM symposium on Information, computer and communications security (ASIA CCS ’14), Kyoto, Japan, June, 2014. [4] C. Yang, Z. Xu, G. Gu, V. Yegneswaran, and P. Porras, "DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications.," in Proceedings of the 19th European Symposium on Research in Computer Security (ESORICS’14), Wroclaw, Poland, September, 2014. [5] C. Lei, C. S. Gates, S. Luo, and L. Ninghui, "A Probabilistic Discriminative Model for Android Malware Detection with Decompiled Source Code," IEEE Transactions on Dependable and Secure Computing, vol. 12, pp. 400-412, 2015. [6] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484-489, 2016. [7] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, et al., "TensorFlow: a system for large-scale machine learning," in Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation, Savannah, GA, USA, 2016. [8] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015. [9] Ian Goodfellow, Y. Bengio and Aaron Courville, "Deep learning" An MIT Press book (2016). [10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25 (NIPS 2012), Harrahs and Harveys, Lake Tahoe, 2012, pp. 1097-1105. [11] A. Z. K. Simonyan, "Very Deep Convolutional Networks for LargeScale Image Recognition," in International Conference on Learning Representations 2015 (ICLR2015), San Diego, CA, 2015. [12] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna.,”Rethinking the Inception Architecture for Computer Vision.”in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), WA, USA, June 2016. [13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, 2016.
[14] William Hardy, Lingwei Chen, Shifu Hou, Yanfang Ye, Xin Li. "DL4MD: A Deep Learning Framework for Intelligent Malware Detection", International Conference on Data Mining (DMIN) , 2016. [15] J. Saxe and K. Berlin, "Deep neural network based malware detection using two dimensional binary program features," 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, 2015. [16] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, "Droid-Sec: deep learning in android malware detection," in Proceedings of the 2014 ACM conference on SIGCOMM, Chicago, Illinois, USA, 2014. [17] T. Abou-Assaleh, N. Cercone, V. Keselj, and R. Sweidan, "N-GramBased Detection of New Malicious Code," in Proceedings of the 28th Annual International Computer Software and Applications ConferenceWorkshops and Fast Abstracts-Volume 02, 2004. [18] D. K. S. Reddy and A. K. Pujari, "N-gram analysis for computer virus detection," Journal in Computer Virology, vol. 2, pp. 231-239, 2006. [19] R. Moskovitch, C. Feher, N. Tzachar, E. Berger, M. Gitelman, S. Dolev, et al., "Unknown Malcode Detection Using OPCODE Representation," in Intelligence and Security Informatics: First European Conference, EuroISI 2008, Esbjerg, Denmark, December 3-5, 2008. Proceedings, D. Ortiz-Arroyo, H. L. Larsen, D. D. Zeng, D. Hicks, and G. Wagner, Eds., ed Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 204-215, 2008. [20] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, "Malware images: visualization and automatic classification," in Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, Pennsylvania, USA, 2011. [21] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 2015. [22] N. McLaughlin, J. M. d. Rincon, B. Kang, S. Yerima, P. Miller, S. Sezer, et al., "Deep Android Malware Detection," in Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, Scottsdale, Arizona, USA, 2017. [23] Bob Pan, "Inside of APK Protectors," RSA Conference 2015, [Online].Available : https : //www.rsaconf erence.com/writable/presentations/f ileu pload/spo− r09i nsideo fa pkp rotectors.pdf. [24] Caleb Fenton and Tim Strazzere, "Dex Education: Practicing Safe Dex," Black Hat USA 2012, [Online].Available : https : //www.blackhat.com/html/bh − us − 12/bh − us − 12 − brief ings.html#Strazzere. [25] Tim Strazzere and Jon Sawyer, "Android Hacker Protection Level 0," Defcon 22, [Online].Available : https : //www.def con.org/images/def con − 22/dc − 22 − presentations/Strazzere − Sawyer/DEF CON − 22 − Strazzere − and − Sawyer − Android − Hacker − P rotection − Level − U P DAT ED.pdf.