4th International Symposium on Digital Forensics and Security (ISDFS), April 2016, Little Rock, AR
A Review on Mobile Threats and Machine Learning Based Detection Approaches
Bilgehan Arslan
Sedef Gunduz
Seref Sagiroglu
Department of Computer Engineering Gazi University Ankara, Turkey Email:
[email protected]
Department of Computer Engineering Gazi University Ankara, Turkey Email:
[email protected]
Department of Computer Engineering Gazi University Ankara, Turkey Email:
[email protected]
Abstract—Research on detecting mobile threats with machine learning algorithms has attracted much attention in recent years because of the increase in attacks. In this paper, mobile vulnerabilities are examined by attack type. The machine learning methods used to prevent or detect these attacks are analyzed, and papers published between 2009 and 2014 are evaluated. The most important mobile vulnerabilities, the way these threats are carried out, and the detection and prevention approaches based on machine learning algorithms are presented. The results obtained in the reviewed studies are compared and their achievements are summarized. The results show that the selection and use of datasets play an important role in the success of a system. Additionally, supervised learning techniques produce better results than unsupervised ones in intrusion detection.
I. INTRODUCTION
Nowadays, applications on mobile devices offer services designed for corporate use, such as border control, forensics, criminal detection and e-passports, and services designed for personal use, such as e-commerce, e-banking, remote access control and e-wallets [1], [2]. However, despite these advantages, mobile devices are under attack and may be vulnerable, which causes security threats, so mobile platforms and devices have become recent targets [1]–[3]. Through a mobile device it is possible to reach a person's identity information, health information, credit card numbers and passwords, address, telephone number, personal photos and videos, and the user names and passwords used to sign in to social networks such as Facebook, Twitter and LinkedIn [4]. Attacks on mobile devices aim to gain access to this data. Such personal data might be used for many illegal purposes, such as creating a fake identity [5].
Because of this security problem on mobile platforms, studies are carried out on current mobile threats: examining existing threats, analyzing existing deficiencies and identifying attacks before they happen. The applications used for attacks change day by day. However, the changes in the content of these applications are minor. For this reason, future attacks can be identified by analyzing attacks that have already occurred.
In this study, existing studies and security reports in the literature were examined and the most common mobile threats were investigated. Statistics were compared, and mobile attack types, methods and their development were evaluated. Finally, the machine learning based methods used for mobile threat detection, which aim to prevent or detect these attacks, were analyzed.
In addition, the steps of mobile threat detection with machine learning, the operating principles of this approach, the data types used in detection and the accuracy rates achieved are evaluated. The rest of the paper is organized as follows: in Section II, threats seen on mobile devices in recent years are discussed. In Section III, we focus on the steps in which machine learning techniques are used for malware detection and on the methods developed for detecting malicious software. Finally, the research results are discussed in Section IV.
ISBN 978-1-4673-9865-7/16/$31.00 ©2016 IEEE
II. MOBILE THREATS
Mobile threats are a concept heard more commonly day by day. According to many technology firms and statistical reports, users and manufacturers have started to develop protection methods [6]. Mobile devices are more vulnerable than computers. Today, all kinds of transactions, such as mobile banking, e-mail and e-commerce, can be made with mobile devices, and the data captured from a user's mobile device is as important as the data captured from a computer [4]. However, most users protect their computers with antivirus applications and similar tools but do not show the same sensitivity for mobile devices. For this reason, mobile platforms, which offer a more favorable environment for attackers, are exposed to attacks.
In this part of the study, we focus on the most common attacks on mobile phones and evaluate the aim of the attacks, their implementation format and the type of data captured by each attack method; the results are summarized in Table 1. These attacks are examined in detail in the following subsections.
A. Malicious Applications (Riskware, Adware, Malware)
Mobile software is offered in three different ways [11]: freemium applications, premium applications, and applications that are free but earn money from advertisements. Some software is considered free by the user, but its developers place advertisements in it covertly. This allows developers to make money from advertising [11].
Applications developed for mobile phones have different effects on the device. Before choosing to use an application, users should ask themselves how reliable it is. Specific criteria are available for judging how reliable an application is [7]. Malicious apps may achieve different results with different attack methods on mobile devices, and they can provide access to different areas of the device. Malicious applications can delete, copy, change and send personal data to a remote center when the user willingly installs this type of application [8]. Some malicious application types are listed below [9]:
• Remote management utilities
• IRC clients
• Dialing programs
• File download programs
• Software to monitor computer activity
• Password management utilities
• Internet server services (FTP, web, proxy, telnet, etc.)
A malicious application can lead to different conditions while it is used on a mobile device. These cases are summarized in the following scenarios:
• An application selected by the user may not have harmful content but, because it does not work stably, it creates a security weakness. The application can be dangerous because of erroneous code snippets, and this weakness can be used against the user.
• Attackers can take control over the user with malicious code inserted into an application intentionally. Users who do not realize that an application contains malicious code remain at risk while using it.
• Installed applications can run other existing applications on the mobile device. Furthermore, this can be carried out without approval from the user. Unauthorized use can be prevented by encryption or by using biometric characteristics, so that applications cannot be used by other, malicious applications [10].
• Malicious applications on mobile devices can access the browser or the application download service. Without the permission of the user, some applications can download other malicious applications onto the device. In short, these applications may have the capacity to manage the device.
• Software used on mobile devices may request access rights outside its area of responsibility. If an application asks for irrelevant external authorization, it gives the impression that the application is malware.
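The last scenario, an application asking for rights unrelated to its purpose, can be illustrated with a small check. The sketch below is only an illustration: the category names and the `EXPECTED` permission baseline are hypothetical assumptions, not values taken from the reviewed studies, and a real system would have to derive such a baseline from data.

```python
# Hypothetical baseline: permissions one might expect per app category.
# These sets are illustrative assumptions, not an authoritative policy.
EXPECTED = {
    "flashlight": {"android.permission.CAMERA"},
    "messaging": {
        "android.permission.SEND_SMS",
        "android.permission.READ_CONTACTS",
        "android.permission.INTERNET",
    },
}


def irrelevant_permissions(category, requested):
    """Return the requested permissions that fall outside the category baseline."""
    expected = EXPECTED.get(category, set())
    return sorted(set(requested) - expected)


# A flashlight app that also asks for SMS and contact access.
flashlight_request = [
    "android.permission.CAMERA",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]
extra = irrelevant_permissions("flashlight", flashlight_request)
print(extra)
```

A non-empty result does not prove that the application is malware; it only flags the request for closer inspection.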
B. Trojan, Derivatives and Working Structure
Trojans are the applications most preferred by attackers, since they can be developed and used very simply compared to other mobile malware. Trojans have a wide application area because every mobile device, from older phones limited to calls and messages to smartphones, can send and receive messages [12]. Trojan software can be transmitted to mobile devices through download sites, e-mail, file transfer sites, etc. Trojans rely on users to spread because, unlike viruses and worms, they have no ability to self-replicate [14]. Therefore, a Trojan can be produced quickly and simply compared to other malicious applications. Trojans, or Trojan horses, which can be hidden in a useful application or present themselves as an application with a different function, run on operating systems [16]. If users do not examine their mobile devices carefully, they cannot detect the Trojan [17]. Trojan software, which the user may regard as a popular application, is placed among the system files. Trojans are very small, so they do not occupy much space in the system files. Unlike other types of malware, Trojan horses cannot act on their own, so they require user action [13]. Trojan software greatly increases internet traffic on the user's mobile device, and this leads to resource restrictions [18]. Trojans can be used in three different ways on mobile devices [15]–[19]:
• A Trojan can record everything clicked or touched by the user, and it can work without permission. In this way, sensitive information such as passwords can be extracted from the collected data.
• A Trojan can capture a session that the mobile user started earlier by using a session capture technique, and it can also bypass the authentication of an e-mail system.
• Specific information can be accessed by a Trojan by listening to the environment through the microphone or by monitoring personal life through the camera. For example, telephone banking transactions can be performed on mobile devices. While the user talks to a customer service representative on the mobile phone, the card number and confidential information such as passwords can be captured by attackers using the microphone.
Trojan software can be converted to different formats according to its area of use. Specialized Trojan derivatives have different tasks. These are [12]:
• Backdoor applications allow an attacker to take control of infected computers. The attacker can perform any operation on the infected computer, such as sending, receiving and deleting files, restarting the computer and monitoring data. Backdoors usually bring a group of victim computers together and use them to commit crimes by establishing a botnet or zombie network.
• Rootkit applications are designed to hide certain objects or activities in the system. Their main aim is usually to prevent malicious programs from being detected and so to extend the time they can run on a computer.
• Trojan-Mailfinder applications can collect e-mail addresses from computers.
• Trojan-SMS applications can send text messages from mobile phones to premium-rate phone numbers, which may cost users money.
• Trojan-Banker programs are designed to steal account information for online banking systems, e-payment systems, etc.
• Trojan-DDoS programs can conduct Denial of Service attacks against a targeted web address. They send multiple requests from personal computers or from many other infected computers.
• Trojan-Downloader programs download and install new versions of malicious software, such as Trojans and adware, on computers.
• Trojan-Dropper programs are used by hackers to install viruses or Trojans, or to prevent the detection of malicious programs. Not all antivirus programs scan all the components contained in this type of Trojan.
• Trojan-FakeAV programs imitate the activity of antivirus software.
• Trojan-GameThief programs can steal the account information of online players.
• Trojan-IM programs can steal login information and passwords used in instant messaging programs such as ICQ, MSN Messenger, AOL Instant Messenger, Yahoo Pager, Skype, etc.
• Trojan-Spy programs can collect information on your computer. They can track the data entered using the keyboard, get a list of running applications or take screenshots.
C. Viruses
Viruses can bring down the structure they enter and make it unavailable. They can carry out multiple or targeted attacks according to the purpose for which they were produced [20]. Viruses do not use network mechanisms during replication. Instead, they use data transfer paths (downloaded files, files downloaded through Facebook or MSN accounts, etc.) or data storage media (CD, USB) [20]. A virus does not need to interact with the computer. Because viruses want to be seen as application files, they are produced in formats such as exe and bat [20]. Viruses can interfere with the operating system and cause information loss and damage. If written by a professional, a virus can act across a large network. Viruses can be classified as e-mail viruses, observer viruses, copycat viruses, macro-based viruses, boot viruses, file viruses and scripts [16].
D. Worms
Worms copy themselves from one computer to another like viruses, but they perform the self-copying process automatically [21]. First, the worm invades the transport channel or information files on the computer. Once the worm enters the system, it can progress on its own. The biggest danger is its ability to replicate itself. A worm may send copies of itself to all the contacts in the computer user's mail account [16]. In this way, the entire network could be infected. Unlike viruses, worms do not cause damage or loss of data. Worms detect a weakness in the system they want to act on and then find themselves a tunnel [16]. They have interesting and intriguing names, so they can tempt the user into looking at their content [21]. They can occupy the network unnecessarily, take up unnecessary space in data storage and cause a loss of performance in devices such as computers, tablets and phones.
III. MALWARE DETECTION USING MACHINE LEARNING TECHNIQUES
Although existing anti-virus systems developed for mobile devices are effective against known malware, they cannot show the same performance against new malware whose signature is unknown [28]. Existing attacks must be analyzed with learning based methods, and systems should be developed to predict attacks that may occur in the future [23]. For this reason, machine learning based systems are used to detect malicious software. According to the studies conducted on this topic, malware detection with machine learning methods is carried out in three phases: feature extraction based on a file representation, feature selection and classification [24]. In this section we explain these phases and the studies carried out in this area.
A. Phase I: Feature Extractions
The analysis performed on executable files gives an idea of whether applications are harmful or harmless. In this context, the first analyses made with machine learning techniques were carried out on executable files [25], [40], [48], [50]. Studies show that this phase uses DLL files, function calls and the number of function calls. The values extracted from PE files using string extraction and n-gram approaches are used to detect malicious software [26], [27]. Malicious applications are analyzed with the aforementioned techniques, and the results are used with machine learning techniques to determine whether an attack is taking place. The studies using machine learning to analyze PEs are shown in Table 2.
Santos et al. [28] proposed a new method for detecting unknown malware types such as Trojan horses, viruses and worms. They collected 13,189 malware executables and 13,000 benign executables and tested their method on this dataset. Their model was based on an opcode representation, and they used Decision Trees, KNN, Bayesian Networks and SVM in a data-mining based approach with cross validation. The authors assessed their approach in terms of disassembling time, representation time, number of features, and training and testing time. They achieved a 90% average accuracy rate and demonstrated this with ROC results and other tables. They made three contributions: showing how to use an opcode-sequence-frequency representation of executables, providing empirical validation of the method with data-mining models, and proposing a method that detects previously unseen malware with a high success rate.
Nissim et al. [29] presented an active learning (AL) framework. They created a dataset of 7688 malicious (viruses, worms and Trojans) and benign executable files from VX Heaven for the Windows OS. They conducted an experiment over ten days and, in order to evaluate their framework, compared four selective sampling methods: an existing AL method (SVM-Margin), the Exploitation method which they developed, a combination of the two previous methods, and random selection. On the 10th day, using the Exploitation method, 88.5%
TABLE I. COMPARISON OF MACHINE LEARNING CLASSIFICATION TECHNIQUES, USED PROPERTIES, DATA SETS, SUCCESS CRITERIA
Ref.
Used Features and Data
Classification Methods
Dataset
Portable Executables
Support Vector Machine Wrapper/Maximum Relevance Minimum Redundancy Filter
51.223 dataset
PE32 file format (Windows 32)
13.189 executable malware/13.000 executable benign software 7688 malicious and benign executable files from VX Heaven 13.189 executable malware/13.000 executable benign software
88.5% average accuracy rate
Malware and benign programs dataset
General
[22]
Success Criteria malware/15.480
benign
96.84% average accuracy for Support Vector Machine/96.5% and 96.8% accuracy for Hybrid approach 90% average accuracy rate
Mobile Threats General
[30]
-Executable files/opcode/byte sequences -n-gram model
Decision Tree/Naive Bayes/KNearest Neighbor/Support Vector Machine/Data Mining Active Learning/Support Vector Machine-Margin Support Vector Machine/Genetic Algorithm
[32]
-7253 static features -Executable files
Support Vector chine/Random Forest
Random features
Artificial Neural Network
1.843.359 malware/817.485 benign application
String information
K-Nearest Neighbor/Statistical Algorithm/AdaBoost/Tree-based Algorithm/IB1 Decision Tree /Logistic Regression/Naive Bayes K-Nearest Neighbor/Naive Bayes /Support Vector Machine/Decision Tree
1367 sample string information
95.8% static analysis/97.1% dynamic analysis/98.7% combined analysis 0.49% error rate for single Neural Network/0.42% error rate for ensembled Neural Network 97% accuracy rate
122,799 malicious files / windows files for benign files 115,157 Zeus malware dataset sample
Decision Trees achieved a 0.97 F-score Precision 99.5% and recall 99.6%
General
27475 malware (backdoor, hacktool, Trojan, worm) in Virus Heaven collection/273133 clean file in Windows OS Created own dataset 1000 benign and 2000 trojan application
80% and higher TP rate
Trojan
90% and higher precision rate
Trojan
Support Vector Machine
MIT reality mining data set
-
Trojan
K-Nearest Neighbor /Naive Bayes/Decision Tree/Support Vector Machine/ J48/Multilayer Perceptron Neural Network Decision Tree/Naive Bayes/ Bayesian Networks/PART/Boosted Bayesian Networks/Boosted Decision Tree/ Random Forest/Voting Feature Intervals One-Class Support Vector Machine
A total of 220 unique malware/250 unique benign software samples
recall of 95.9% /false positive rate of 2.4%/precision of 97.3%/accuracy of 96.8%
General
Android market apps files 407 games and 1878 tools.
Information Gain yield an accuracy level of 0.918 with a 0.172 FPR
General
2081 benign and 91 malicious Android applications.
0.09 precision in 2100 sample
General
20,000 raw malware (local memory attacks, remote code execution attacks, web application exploits, and denial of service attacks) 1330 malicious and 408 benign applications
80% and higher accuracy rate
Trojan
RF 94.53% /NB 79.79% /MP 93.91% /BN 86.23%/LR 89.52% /J48 93.43% Correctly Classified Rate Average precision 0.71/recall 0.71/F-measure 0.71/false positive 0.2897/false negative 0.2897 Detection rate 74-83% Average detection rate 75%
General
Precision 0.996/Recall 0.990/F score 0.993/ROC area 0.998
General
[28] Windows executable files [29]
Ma-
[34]
[35] PE32 file format (Windows 32) [36] [28]
-Behavior-based malware analysis -OS file system, network activity -logging, and registry 308 binary features in binary file/crossvalidation tests/F-scores statistical measures
Perceptron algorithm
[37]
Whole Windows Management Instrumentation files (Script, Event log, Win32 files etc.) Histogram features such as duration calls, sending sms, sending activities Windows XP Portable Executable file
K-Nearest Neighbor/ Bayes/Decision Tree
[38]
[39] [40]
[41]
-Apk, jar, dex, XML files/byte-codes -More than 22,000 features -ChiSquare, FisherScore, InfoGain
Naive
-
Trojan Virus Worms Trojan Virus Worms General
General
Trojan Virus
Trojan
[42]
-Apk and jar files - K fold cross validation -Wrapper model -3,699 Perl; 2,484 Python; 5,408 Ruby features
Genetic Algorithm
[43]
6,832 collected feature vectors
Random Forest / Naive Bayes / multilayer perceptron/Logistic Regression/J48
-XML, java and Android Manifest files/PE -String features/byte sequence features
K-means algorithm
[45]
18,174 Android applications on internet/188,389 Android applications from Android Market
-TF/IDF -bigrams
Linear Support Vector Machine
[46]
[47]
-Network, SMS, CPU, Power inf. file -32 resource features -10 fold Cross-Validation
Naive Bayes/ Random Forest/Logistic Regression/Support Vector Machine
271094 malicious executables by VX Heaven website/created 52803 malicious files/51243 unpacked malicious files/1560 benign files 30 normal apps and five malicious apps
-Levenshtein distance -Signature-based approach -Portable Executables (PE)
string-kernel based Support Vector Machine/K-Nearest Neighbor
3228 binary programs
KNN 91.42% accuracy / SVM 93.93% accuracy
General
[48]
AndroidManifest.xml
Support Vector Machine/J48 Bagging Naive Bayes/J48/AdaboostMI/IBK5/Random Forest
1200 malware and 1200 benign samples 226 Linux Malware Samples/442 Linux Benign Samples from VXHeavens
accuracy 90%, precision 89%, Recall 86% Accuracy of 97%
General
[44]
[49] Executable Linkable Files/K feature [50]
General
General
Backdoor Flooder Virus Worm Rootkit Exploit Trojan
of the acquired files were malicious. Their method acquired about 2.6 times more malware than the existing AL method on the 9th day of the experiment.
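The idea behind margin-based selective sampling, which is how a method such as SVM-Margin is commonly understood, can be sketched as follows. This is a minimal illustration and not the framework of [29]: it assumes that a linear decision function with hypothetical weights `w` and bias `b` has already been trained, and it simply picks the unlabeled samples closest to the decision boundary so that they can be labeled next.

```python
def decision_value(w, b, x):
    """Signed score of a linear classifier: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_by_margin(w, b, unlabeled, k):
    """Return indices of the k samples with the smallest |w . x + b|.

    Samples near the boundary are the ones the classifier is least
    sure about, so labeling them is expected to be most informative.
    """
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: abs(decision_value(w, b, unlabeled[i])))
    return ranked[:k]


# Hypothetical trained model and unlabeled feature vectors.
w, b = [1.0, -1.0], 0.0
pool = [[5.0, 0.0], [1.0, 0.9], [0.0, 4.0], [2.0, 2.2]]
print(select_by_margin(w, b, pool, 2))
```

The selected indices would be handed to an analyst for labeling, after which the classifier is retrained and the selection is repeated.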
Shijo and Salim [32] reported accuracies of 95.8% in static analysis, 97.1% in dynamic analysis and 98.7% in combined analysis.
Zolotukhin and Hamalainen [30] proposed a data-mining based approach that uses executable files to identify malware with an n-gram model. They classified opcode and byte sequences extracted from real executable files using support vector classifiers integrated with a genetic algorithm. Their ZSGSVM (zero-sum game support vector machine) algorithm was tried on many byte and opcode sequences and separated benign software from malware. According to their study, ZSGSVM showed good results, but feature selection with the genetic algorithm took a long time.
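To make the n-gram representation concrete, the sketch below counts byte n-grams in a binary string and maps the counts onto a fixed vocabulary, which is the kind of vector a classifier can consume. It is a minimal illustration, not the procedure of [30]; the sample bytes and the helper names `byte_ngrams` and `to_vector` are assumptions made for this example.

```python
from collections import Counter


def byte_ngrams(data, n):
    """Count every contiguous n-byte window in a bytes object."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def to_vector(counts, vocabulary):
    """Map n-gram counts onto a fixed, ordered vocabulary."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical byte string standing in for the contents of an executable.
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0xEC, 0x90])
counts = byte_ngrams(sample, 2)
vocabulary = sorted(counts)
print(to_vector(counts, vocabulary))
```

In practice the vocabulary would be built from the whole training set, and the same idea applies to opcode sequences once the executable has been disassembled.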
C. Phase III: Classification
In the classification phase, files derived from malware are converted to vectors. These representative vectors are used as the training set of a machine learning algorithm [23]. The data is then analyzed with an appropriate method such as Decision Trees, Artificial Neural Networks, Naive Bayes or Support Vector Machines [28], [40], [41], [44]. These methods analyze the training datasets and distinguish malicious files from benign ones. Studies using a classification-based approach with machine learning are shown in Table 2.
Burguera et al. [33] proposed a new framework for obtaining and analyzing smartphone application activity and developed a framework for mobile platforms called Crowdroid. They used the VirusTotal Malware Intelligence Service to obtain infected applications. They observed Linux kernel system calls to classify malware. They used the k-means algorithm to cluster the applications into two groups, benign and malware. They obtained 100% detection accuracy for PJApps and 85% for the HongTouTou Trojan. They concluded that open, read, access, chmod and chown were the system calls most used by malware.
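The clustering step described for Crowdroid can be sketched with a plain two-cluster k-means over system call counts. This is a minimal sketch under stated assumptions, not the implementation of [33]: the trace values are invented for illustration, and the deterministic choice of initial centroids is a simplification made here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(vectors, iterations=20):
    """Cluster vectors into two groups with Lloyd's algorithm (k = 2).

    Initial centroids are the first vector and the vector farthest from
    it, an arbitrary but deterministic choice made for this sketch.
    """
    first = vectors[0]
    far = max(vectors, key=lambda v: squared_distance(v, first))
    centroids = [list(first), list(far)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = []
        for v in vectors:
            d0 = squared_distance(v, centroids[0])
            d1 = squared_distance(v, centroids[1])
            labels.append(0 if d0 <= d1 else 1)
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


# Hypothetical per-application counts of (open, read, chmod) system calls.
traces = [[10, 12, 0], [11, 9, 1], [9, 10, 0], [40, 55, 7], [38, 60, 9]]
print(two_means(traces))
```

k-means only separates the traces into two groups; deciding which group corresponds to malware still requires reference samples or further analysis.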
B. Phase II: Feature Selection
In this phase, the features identified as harmful are sought in the new software under examination. In this context, detecting correct and sufficient features can improve the sensitivity and efficiency of classifiers. There are two families of feature selection techniques [24]. The first is wrapper methods, in which features are selected using the classifier [24]. The other is filter methods, in which features are selected independently of the classifier. Filter approaches use API statistics as the training data. Many techniques are available for implementing feature based methods, such as Document Frequency, Fisher Score and Gain Ratio. Feature based methods improve the efficiency of machine learning algorithms and help to understand and visualize data [37], [41], [43], [50]. Studies using a feature-based approach with machine learning are shown in Table 2.
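As an example of a filter method, the sketch below ranks features by Fisher Score, computed for each feature as the between-class scatter divided by the within-class scatter. It is a minimal illustration that assumes a small labeled dataset; the API call counts are invented, and the reviewed studies may use different variants of the score.

```python
def fisher_scores(samples, labels):
    """Fisher score of each feature column for labeled samples.

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    n_features = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall_mean = sum(column) / len(column)
        between, within = 0.0, 0.0
        for c in classes:
            values = [v for v, lab in zip(column, labels) if lab == c]
            class_mean = sum(values) / len(values)
            class_var = sum((v - class_mean) ** 2 for v in values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += len(values) * class_var
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfectly separating or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


# Hypothetical API call counts: column 0 differs by class, column 1 does not.
samples = [[9, 3], [8, 4], [1, 3], [2, 4]]
labels = [1, 1, 0, 0]  # 1 = malware, 0 = benign
scores = fisher_scores(samples, labels)
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranked)
```

Because the score is computed without training any classifier, it is cheap to evaluate, which matches the usual motivation for filter methods over wrapper methods.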
Dahl and Yu [34] proposed a malware detection approach using random projections and classification. They stated that feature selection methods may reduce the number of features, so they chose features randomly. They used 1,843,359 malware and 817,485 benign applications. They used logistic regression and neural networks for classification. Their classification results were a 0.49% error rate for a single neural network and a 0.42% error rate for an ensemble of neural networks.
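A random projection reduces a very high-dimensional feature vector by multiplying it with a random matrix. The sketch below uses a dense Gaussian matrix, which is an assumption made for illustration; the source does not describe the exact projection scheme used in [34], and the dimensions and seed are arbitrary.

```python
import random


def projection_matrix(n_in, n_out, seed=0):
    """Gaussian random matrix with n_out rows and n_in columns."""
    rng = random.Random(seed)
    scale = 1.0 / (n_out ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)]
            for _ in range(n_out)]


def project(matrix, vector):
    """Multiply the projection matrix by one feature vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Hypothetical sparse feature vector with 1000 entries, reduced to 16.
features = [0.0] * 1000
for index in (3, 250, 999):
    features[index] = 1.0
matrix = projection_matrix(n_in=1000, n_out=16, seed=42)
reduced = project(matrix, features)
print(len(reduced))
```

The reduced vectors can then be passed to a classifier such as logistic regression or a neural network in place of the original features.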
Huda et al. [22] proposed a hybrid framework for malware detection using machine learning techniques. In their study, they used two different hybrid approaches. The first was a combination of an SVM wrapper and a Max-Relevance Min-Redundancy filter, and the second was based on combinations of wrappers and filters. They used the Python language to extract API call lists from portable executables. They tested their approach with 51,223 malware samples and 15,480 benign samples. In their study, the wrapper using SVM achieved 96.84% accuracy, and the hybrid approaches achieved 96.5% and 96.8% accuracy, respectively. Their major contribution to the literature was a fully automated, signature-free method; the improved approaches were evaluated both individually and in combination, and the best results were reported.
Feizollah et al. [31] reviewed almost 100 papers published between 2010 and 2014. The study was about feature selection in mobile malware detection systems. They examined the features used in mobile malware detection systems in four categories: static features, dynamic features, hybrid features and application metadata. In addition, they discussed the datasets used in recent research studies.
Shijo and Salim [32] proposed a method that combines static and dynamic analysis to classify unknown executable files. They used the application signature as a unique identifying feature for classification. They extracted 7253 static features in their analysis and classified them using SVM and random forest. They used known malware and benign programs as the dataset.
Tian et al. [35] proposed a malware classification technique based on string information, using machine learning classifiers such as tree-based algorithms, a nearest neighbour algorithm, statistical algorithms and AdaBoost. They used string information from 1367 samples, including Trojans and viruses, as input data. They achieved a 97% accuracy rate in classification. Their experiments showed that the IB1 and Random Forest classification methods were the most effective for classifying string information.
IV. CONCLUSIONS AND RECOMMENDATIONS
In this study, threats posed by malicious applications, Trojans, viruses and worms on mobile devices were examined in detail, and recent mobile threat reports and studies applying machine learning to these security threats were reviewed. Regarding the results obtained in the reviewed studies:
Malware detection with machine learning techniques is performed three phases. First of all; the applications files (DLL, PE, byte code, apk file, jar file etc.) are determined to examine and extract the features. Then features are selected using such as Document Frequency, Fisher Score, Gain Ratio feature based methods.Finally collected features are used for training the system with machine learning algorithms i.e.
$5HYLHZRQ0RELOH7KUHDWVDQG0DFKLQH/HDUQLQJ%DVHG'HWHFWLRQ$SSURDFKHV
threats and the countermeasures against them by utilizing our recommendations.
SVM, KNN, NB etc. These stages are fixed. Only the methods are changing for performing each step. •
•
•
Examined studies have shown that; anti-virus applications use signature and anomaly-based approaches detect malicious application.It is observed that signature based techniques are less accurate incomparison with anomaly based techniques for detecting malware which are not in the database.
R EFERENCES
For feature selection, filter based models are more feasible. Because filter based approaches have low computational process due to assessment criteria and have weak prediction accuracy. However while comparing with filter based models, wrapper models are more complex and require more processing power. There have been available a few studies interested in detecting Trojan horses. Its detection mechanism is really one of the most difficult processes to understand.
Other contributions of this article are: •
Supervised techniques are more preferred than unsupervised techniques in intrusion detection systems.
[1]
M. Tanviruzzaman, S. Ahamed, C. Hasan, C. Obrien, ”ePet: when cellular phone learns to recognize its owner, Computer and Communications Security, New York, USA, pp. 13-18, 2009.
[2]
D. Gragnaniello, C. Sansone, L. Verdoliva, ”Iris Liveness Detection for Mobile Devices Based on Local Descriptors”, Pattern Recognition Letters, Naples, Italy, vol. 57, pp. 1-7, 2014.
[3]
H. Kim, M. Choi, ”Linux Kernel-based Feature Selection for Android Malware Detection”. Network Operations and Management Symposium, pp. 1-4, 2014.
[4]
Q. Su, J. Tian, X. Chen, X. Yang, ”A Fingerprint Authentication System Based on Mobile Phone”, 5th International Conference, AVBPA 2005, New York, USA, vol. 3546, pp. 151-159, 2005.
[5]
R. Shimonski, J. Zenir, ”Mobile Phone Tracking”, Cyber Reconnaissance, Surveillance and Defense, pp. 113-143, 2015.
[6]
D. Emm, M. Garnaeva, V. Chebyshev, R. Unuchek, D. Makrushin, A. Ivanov. (2014, November 18). IT threat evolution Q3 [Online]. Available: https://securelist.com/analysis/quar terly-malwarereports/67637/it-threat-evolution-q3-2014/
[7]
A. Armando, A. Merlo, L. Verderame, ”Security considerations related to the use of mobile devices in the operation of critical infrastructures”, International Journal of Critical Infrastructure Protection, vol. 7, pp. 247-256, 2014.
•
Success rate of the studies conducted with supervised techniques are in between 80-99.6%.
•
The studies using executable file achieves greater success rate than other studies that used different features and data types.
[8]
S. Seoa, A. Guptaa, A. Sallama, E. Bertinoa, K. Yimb, ”Detecting mobile malware threats to homeland security through static analysis”, Journal of Network and Computer Applications, vol. 38, pp. 43-53, 2014.
•
The studies can be divided into two general groups. Several approaches investigated applications power usages and unnecessary consumption. Other approaches investigated system calls.
[9]
G. Stanescu, ” Risk Assessment Model for Mobile Malware”, Journal of Mobile, Embedded and Distributed Systems, vol. 7, pp. 1-10 ,2015.
[10]
S. Dye, K. Scarfone, ”A standard for developing secure mobile applications”, Computer Standards & Interfaces, vol. 33, pp. 524-530, 2014.
[11]
I. Ideses, A. Neuberger, ”Adware Detection and Privacy Control in Mobile Devices”, 28th Convention of Electrical and Electronics Engineers, Israel, pp. 1-5, 2014.
[12]
V. Chebyshev, R. Unuchek. (2014). Mobile Malware Evolution: Methods and techniques [Online]. Available: https://www.securelist.com/en/analysis/204792326/Mobile-MalwareEvolution-2013.
[13]
K. Dunham, ”Timeline of Mobile Malware, Hoaxes, and Threats”, Mobile Malware Attacks and Defense, pp. 35-70, 2009.
•
According to the authors, selecting features manually directly affects the success rate.
•
SVMs and genetic algorithms are the most common and most powerful of the machine learning techniques applied to mobile applications.
•
Better accuracy rates are achieved with more comprehensive datasets.
[14]
”What is Trojan” [Online]. Available: http://www.kaspersky.com/internet-security-center/threats/trojans.
•
Feature selection is the most difficult and most influential phase of malware detection. Selecting features manually also affects the success rate directly.
[15]
A. Ekim, ”Digital Forensic and Malware Analysis in Mobile Devices”, 1st International Symposium on Digital Forensics and Security, Elazig, Turkey, pp. 1-6, 2013.
•
Malware detection on Android is similar to malware detection on other operating systems. In general, static and dynamic analysis are used separately.
[16]
An Introduction to Malware [Online]. Available: https://www.cert.gov.uk/wp-content/uploads/2014/08/An-introduction-to-malware.pdf.
[17]
B. Uscilowski (2013, October). Mobile Adware and Malware Analysis [Online]. Available: http://www.symantec.com/content/en/us/enterprise/media/securityresponse/whitepapers/madware-and-malware-analysis.pdf
[18]
L. Rondeau. (2014). Mobile Device Vulnerabilities & Securities [Online]. Available: http://commons.emich.edu/cgi/viewcontent.cgi?article=1379&context=honors
[19]
E. Tatli. (2014, April 10). Attack Trees [Online]. Available: https://www.researchgate.net/publication/261510789-Saldiri-AgaclariAttack-Trees
[20]
J. G. Iannarelli, M. O'Shaughnessy, ”The Threats of Today and Tomorrow”, Information Governance and Security, pp. 13-27, 2015.
[21]
S. Qing, W. Wen, ”A survey and trends on Internet worms”, Computers and Security, vol. 24, pp. 334-346, 2005.
•
Solutions that use hybrid techniques are more effective than those that use a single technique.
The contribution of this paper is a review of the most recent mobile threats and machine learning solutions reported in the literature. This study also helps researchers who want to study malware detection with machine learning by providing a comprehensive analysis of the methods used. System and application developers can also benefit from our conclusions while developing new software. Finally, last but not least, end users should be aware of the recent mobile
[22]
S. Huda, J. Abawajy, M. Alazab, M. Abdollalihian, R. Islam, J. Yearwood, ”Hybrids of support vector machine wrapper and filter based framework for malware detection”, Future Generation Computer Systems, pp. 1-15, 2014.
[23]
P. Wang, Y. Wang, ”Malware behavioural detection and vaccine development by using a support vector model classifier”, Journal of Computer and System Sciences, vol. 81, pp. 1012-1026, 2015.
[24]
A. Shabtai, R. Moskovitch, Y. Elovici, C. Glezer, ”Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey”, Information Security Technical Report, vol. 14, pp. 16-29, 2009.
[25]
M. Schultz, E. Eskin, E. Zadok, S. Stolfo, ”Data mining methods for detection of new malicious executables”, Security and Privacy, pp. 38-49, 2001.
[26]
R. Moskovitch, D. Stopel, C. , N. Nissim, Y. Elovici, ”Unknown malcode detection via text categorization and the imbalance problem”, Intelligence and Security Informatics, pp. 156-161, 2008.
[27]
E. Menahem, A. Shabtai, L. Rokach, Y. Elovici, ”Improving malware detection by applying multi-inducer ensemble”, Computational Statistics & Data Analysis, vol. 53, pp. 1483-1494, 2009.
[28]
I. Santos, F. Brezo, X. Pedrero, P. Bringas, ”Opcode sequences as representation of executables for data-mining-based unknown malware detection”, Information Sciences, pp. 64-82, 2013.
[29]
N. Nissim, R. Moskovitch, L. Rokach, Y. Elovici, ”Novel active learning methods for enhanced PC malware detection in windows OS”, Expert Systems with Applications, vol. 41, pp. 5843-5857, 2014.
[30]
M. Zolotukhin, T. Hamalainen, ”Support Vector Machine Integrated with Game-Theoretic Approach and Genetic Algorithm for the Detection and Classification of Malware”, Globecom Workshops, pp. 211-216, 2013.
[31]
A. Feizollah, N. Anuar, R. Salleh, A. Wahab, ”A review on feature selection in mobile malware detection”, Digital Investigation, vol. 13, pp. 22-37, 2015.
[32]
P. Shijo, A. Salim, ”Integrated static and dynamic analysis for malware detection”, Procedia Computer Science, vol. 46, pp. 804-811, 2015.
[33]
I. Burguera, U. Zurutuza, S. Tehrani, ”Crowdroid: Behavior-Based Malware Detection System for Android”, http://dl.acm.org/citation.cfm?id=2046619, 27.08.2015.
[34]
G. Dahl, J. Stokes, L. Deng, D. Yu, ”Large-Scale Malware Classification Using Random Projections And Neural Networks”, Acoustics, Speech and Signal Processing, pp. 3422-3426.
[35]
R. Tian, L. Batten, M. Islam, S. Versteeg, ”An Automated Classification System Based on the Strings of Trojan and Virus Families”, Malicious and Unwanted Software, pp. 23-30, 2009.
[36]
Z. Markel, M. Bilzor, ”Building a Machine Learning Classifier for Malware Detection”, Anti-malware Testing Research, pp. 1-4, 2014.
[37]
D. Gavrilut, M. Cimpoesu, D. Anton, L. Ciortuz, ”Malware Detection Using Machine Learning”, Computer Science and Information Technology, pp. 735-741, 2009.
[38]
Y. Liu, L. Zhang, J. Liang, S. Qu, ”Detecting Trojan Horses Based On System Behavior Using Machine Learning Method”, Machine Learning and Cybernetics, vol. 2, pp. 855-860, 2010.
[39]
A. Shamili, C. Bauckhage, T. Alpcan, ”Malware Detection on Mobile Devices using Distributed Machine Learning”, Pattern Recognition, pp. 4348-4351, 2010.
[40]
I. Firdausi, C. Lim, A. Erwin, A. Nugroho, ”Analysis Of Machine Learning Techniques Used In Behavior-Based Malware Detection”, Advances in Computing, Control and Telecommunication Technologies, pp. 201-203, 2010.
[41]
A. Shabtai, Y. Fledel, Y. Elovici, ”Automated Static Code Analysis for Classifying Android Applications Using Machine Learning”, Computational Intelligence and Security, pp. 329-333, 2010.
[42]
J. Sahs, L. Khan, ”A Machine Learning Approach to Android Malware Detection”, Intelligence and Security Informatics, pp. 141-147, 2012.
[43]
V. Benjamin, H. Chen, ”Machine Learning for Attack Vector Identification in Malicious Source Code”, Intelligence and Security Informatics, pp. 21-23, 2013.
[44]
B. Amos, H. Turner, J. White, ”Applying machine learning classifiers to dynamic Android malware detection at scale”, Wireless Communications and Mobile Computing Conference, pp. 1666-1671, 2013.
[45]
A. Samra, Y. Kangbin, O. Ghanem, ”Analysis of Clustering Technique in Android Malware Detection”, Innovative Mobile and Internet Services in Ubiquitous Computing, pp. 729-733, 2013.
[46]
B. TugsSanjaa, E. Chuluun, ”Malware Detection Using Linear SVM”, Strategic Technology, pp. 136-138, 2013.
[47]
H. Ham, M. Choi, ”Analysis of Android Malware Detection Performance using Machine Learning Classifiers”, ICT Convergence, pp. 490-495, 2013.
[48]
T. Ban, R. Isawa, S. Guo, D. Inoue, ”Application of String Kernel based Support Vector Machine for Malware Packer Identification”, Neural Networks, pp. 1-8, 2013.
[49]
N. Peiravian, X. Zhu, ”Machine Learning for Android Malware Detection Using Permission and API Calls”, Tools with Artificial Intelligence, pp. 300-305, 2013.
[50]
K. Asmitha, P. Vinod, ”A Machine Learning Approach for Linux Malware Detection”, Issues and Challenges in Intelligent Computing Techniques, pp. 825-830, 2014.