Attack Tree Based Android Malware Detection with Hybrid Analysis. Shuai Zhao1,2, Xiaohong Li1,3, Guangquan Xu1,3*, Lei Zhang1,3 and Zhiyong Feng1,3.
2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications
1 Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University; 2 School of Computer Software, Tianjin University; 3 School of Computer Science and Technology, Tianjin University. {zshuai, xiaohongli, losin, lzhang, zyfeng}@tju.edu.cn
Abstract—This paper proposes an attack tree based Android malware detection approach. The attack tree model is extended to provide a novel way to organize and exploit behavior rules: connections between attack goals and application capabilities are represented by the attack tree structure, and behavior rules are assigned to every attack path in the tree. In this way, fine-grained and comprehensive static capability estimation and dynamic behavior detection can be achieved. The approach employs a hybrid static-dynamic analysis method. Static analysis tags attack tree nodes based on application capability; it filters out obviously benign applications and highlights the potential attacks in suspicious ones. Dynamic analysis selects the rules corresponding to that capability and conducts detection on runtime behaviors. During dynamic analysis, events are simulated based on application components to trigger behaviors, achieving high code coverage. We implement an automatic malware detection prototype system called AMDetector. The experiment shows a true positive rate of 88.14% and a false positive rate as low as 1.80%.
However, both static and dynamic analysis have problems. Static analysis examines the compiled APK file through reverse engineering. It is efficient because the malware is not executed, but it cannot deal with malware employing anti-reverse-engineering technologies such as code obfuscation, and static features cannot cover the complete runtime status; therefore both the false negative rate and the false positive rate can be high. Dynamic analysis examines runtime features such as battery consumption and function calls. It is relatively accurate, but it produces false negatives when malicious code is not executed, and the large amount of runtime information results in low performance. In this paper, a practical scheme to mitigate these problems is proposed. Since static analysis is efficient but not reliable, and dynamic analysis is accurate but has a large performance overhead, we use static analysis to make a coarse filtering by analyzing permissions and function calls. After that, only the potential attacks in suspicious applications need to be examined in dynamic analysis, and a small subset of rules corresponding to the application's capability is used in the detection, greatly reducing the detection load. Meanwhile, the component information extracted in static analysis is used to trigger behaviors. In general, static analysis determines the capability of an application and dynamic analysis inspects how the application uses that capability. In this process, an attack tree [15] is used to support fine-grained and configurable behavior rules.
Keywords—Android; malware; attack tree; detection; hybrid analysis
I. INTRODUCTION
With the development of mobile Internet, smartphones are becoming increasingly ubiquitous. Google’s Android has obtained remarkably rapid growth. According to a recent study [1], Android has 81.0% market share in the mobile-phone market in the third quarter of 2013, with a 51.3% growth rate over the past year. An increasing number of people use smartphones to handle personal, financial and business data, arrange and organize their work and private life.
In summary, this paper makes the following contributions:
• We extend the basic attack tree model to fit Android malware detection. The attack tree establishes connections between attacks and application capability. It provides a novel way to organize and exploit rules that is fine-grained and runtime configurable.
The popularity of Android makes it the primary target of mobile malware authors. 2013 has been called "the year of mobile malware" [2], and mobile malware shows a trend of explosive growth: McAfee's report [3] shows 14,000 new threats emerged in the first quarter of 2013. Malware causes privacy leakage, financial loss and other issues.
• We combine static analysis and dynamic analysis. Static analysis detects possible attacks, classifies applications as benign or suspicious, and records application components. Dynamic analysis examines runtime behaviors with component based behavior triggering and attack path based rule selection. This achieves high accuracy as well as high performance.
Mobile security draws widespread attention. To mitigate the rampant malware problem, researchers have proposed various malware detection methods. These methods can be categorized into static analysis and dynamic analysis in general [4], employing formal verification [5, 6], feature matching [7, 8] and machine learning [9, 10, 11]. Multiple detection tools and platforms have been developed, such as Kirin [8], RiskRanker [12], SmartDroid [13] and DroidRanger [14].

This work has been partially sponsored by the National Science Foundation of China (No. 91118003, 61272106, 61340039), 985 funds of Tianjin University and Tianjin Key Laboratory of Cognitive Computing and Application.
978-1-4799-6513-7/14 $31.00 © 2014 IEEE DOI 10.1109/TrustCom.2014.49
• Leaf nodes contain capability calculation rules for basic behavior capability detection.
• We implement a malware detection prototype system called AMDetector. The experiment result shows that the true positive rate is 88.14% and the false positive rate is 1.80%.
• Attack paths are numbered and attached with several behavior rules for runtime selection.
The rest of this paper is organized as follows: Section II describes the overall malware detection procedure and specifies the important steps. Section III presents the prototype experiment and result analysis. Finally, we discuss related work in Section IV and conclude in Section V.
Arguably, the attack tree provides a novel approach to organize and exploit rules. Rules are formulated for every attack path in the attack tree. Compared to traditional rules, this approach has two benefits. First, rules are fine-grained: the attack tree pushes rule makers to consider the difference between benign and malicious applications for every sensitive behavior under a particular attack scenario. Second, rules support runtime selection: dynamic analysis only checks the rules attached to attack paths that the suspicious application is capable of conducting, so different applications apply different rules corresponding to their capability.
II. DESIGN
This section demonstrates the methodology of the malware detection: the overall architecture design is provided first, and then the crucial procedures are discussed in detail. Fig. 1 shows an overview of the approach. It consists of three main modules: attack tree, static analysis and dynamic analysis. The attack tree is built before the detection process; it is a model to analyze application capability and hold behavior rules. Static analysis does not aim at discovering malware, but at filtering out benign applications and identifying the possible attacks of an application. Dynamic analysis then prepares rules for an application according to its possible attacks. For each rule, an event is simulated and the triggered behaviors are checked to detect malware.
1) Android Malware Attack Tree Definition
T = (V, E) is an Android malware attack tree with one or more AND-OR nodes. Then: V(T) is a non-empty finite set of AND-OR nodes. The root node is the ultimate attack goal; child nodes are possible attack steps to realize the father node goal; leaf nodes are defined as a structure called effective capability. A node marked "P" means its goal is possible to realize, while "I" means it is impossible to realize.
A. Android Malware Attack Tree Attack trees [15] provide a formal, methodical way of describing the security of systems. Attacks against a system are represented in a tree structure, with the attack goal as the root node and different ways of achieving that goal as leaf nodes. It is suitable for modeling attacks and threats. The attack tree model is intuitive, reusable and extensible.
E(T) is a subset of V × V. If there is a directed edge <v1, v2>, then v1 is called the predecessor sub-goal of v2, and v2 the successor sub-goal of v1. An attack path is a tree branch from the root node to a leaf node; it represents an attack behavior under a particular scenario with the leaf node capability. Every attack path in the attack forest has a unique identifier called the attack path number.
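The attack path notion above can be illustrated with a small sketch that enumerates root-to-leaf branches and numbers them by list index; the tuple encoding of the tree is hypothetical, not AMDetector's data structure.

```python
# Illustrative enumeration of attack paths (root-to-leaf branches).
# A node is (name, [children]); the list index of each collected branch
# plays the role of the unique attack path number.
def enumerate_paths(tree, prefix=None, paths=None):
    """Collect every root-to-leaf branch of an attack tree."""
    if paths is None:
        prefix, paths = [], []
    name, children = tree
    prefix = prefix + [name]
    if not children:                    # leaf reached: one complete attack path
        paths.append(prefix)
    for child in children:
        enumerate_paths(child, prefix, paths)
    return paths

tree = ("PrivacyStealing",
        [("ObtainPrivacy", [("E_GLoc", []), ("E_RCont", [])]),
         ("LeakageMethod", [("E_SSms", [])])])
paths = enumerate_paths(tree)
assert paths[0] == ["PrivacyStealing", "ObtainPrivacy", "E_GLoc"]
assert len(paths) == 3
```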
This paper extends the attack tree to fit Android malware detection. Compared to the original attack tree and other attack tree approaches [16], this Android malware attack tree has richer semantics: it is not only used for static analysis, but also carries behavior rules for further dynamic analysis. The tree structure, node relationships and paths are fully utilized.
There are AND nodes and OR nodes in the attack tree. AND nodes represent different steps toward achieving the same goal. That is, the father node goal is possible to be realized if and only if all the child node goals can be realized. OR nodes are alternatives, representing different ways to achieve the goal. That is, if one child node goal is possible to be realized, the father node goal is possible to be realized.
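The AND/OR tag semantics can be sketched as follows; this is a minimal illustration under the stated "P"/"I" tagging, and the Node encoding is hypothetical rather than AMDetector's implementation.

```python
# Illustrative sketch of AND-OR tag propagation. "P" = goal possible to
# realize, "I" = impossible; leaf tags come from static analysis.
P, I = "P", "I"

class Node:
    def __init__(self, node_type, children=None, tag=None):
        self.type = node_type          # "AND", "OR", or "LEAF"
        self.children = children or []
        self.tag = tag

    def evaluate(self):
        """Bottom-up tag calculation: an AND goal needs all child goals
        possible, an OR goal needs at least one child goal possible."""
        if self.type == "LEAF":
            return self.tag
        child_tags = [c.evaluate() for c in self.children]
        if self.type == "AND":
            self.tag = P if all(t == P for t in child_tags) else I
        else:  # OR node: alternatives
            self.tag = P if any(t == P for t in child_tags) else I
        return self.tag

# A goal needing both sub-goals (AND), one of which is an OR of alternatives:
root = Node("AND", [Node("LEAF", tag=P),
                    Node("OR", [Node("LEAF", tag=I), Node("LEAF", tag=P)])])
assert root.evaluate() == P
```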
• Tree structure represents the relations between attacks and basic behavior capabilities.
AND node and OR node are represented in Fig. 2.
[Figure 1 content: Attack Tree Building produces the Attack Tree. Static Analysis: Reverse Engineering yields Permissions/APIs (for Detecting Attack Capability) and Components. Dynamic Analysis: Loading Behavior Rules, Behavior Analysis, Triggering Behaviors.]
Fig. 1. The overall architecture
The leaf node contains the rule to determine whether an application has the basic capability; this will be discussed in Section b). In the attack tree, every path is attached with a series of behavior rules used in dynamic analysis; this will be discussed in Section c).
There are various categories of attacks in the Android system, such as privacy stealing and financial charging. For each attack category, we analyze malicious samples, abstract an attack mode and build an attack tree. All attack trees constitute the attack forest.
Fig. 2. Representation of AND node and OR node
2) Building Android Malware Attack Tree
Building an Android malware attack tree is a top-down analysis process. For each attack category, we aggregate and summarize the malicious behaviors as an attack mode, then represent it with an attack tree. Three steps are needed: the first builds the tree structure to describe how the ultimate attack goal is realized from capability combinations, the second defines leaf node rules to calculate basic capability from permissions and function calls, and the third formulates a series of behavior rules for each attack path.
b) Designing leaf nodes and capability rules
A leaf node represents a basic behavior capability. It has three key attributes: Perm, API and Tag. Perm records the permission required to obtain the capability; API records the function call that realizes the capability; Tag is marked "P" or "I" to record whether an application has this capability. The network locating leaf node is defined as follows:
a) Building the attack tree structure
The attack tree structure is an attack goal breakdown structure. First we identify Android malware categories; each category forms a separate tree, with the category as the root node. Then we think of all attacks against the goal and their steps, add them to the tree, and mark the nodes as AND or OR according to their relationship. This process is repeated down the tree until a node is a basic behavior capability of an application, which is then defined as a leaf node.
Name: E_NLoc
Perm: android.permission.ACCESS_COARSE_LOCATION
API: LocationManager.requestLocationUpdates()
Tag: I
Application capability is not determined by the permissions applied for in the manifest file alone. For example, if an application has the permission to send messages but no related function calls, then it does not have the capability of sending messages. Conversely, if an application exploits dynamic loading or privilege escalation, its capability is expanded. Therefore, we employ different capability calculation methods for different situations, as shown in Algorithm 1.
Fig. 3 shows the attack tree structure of the privacy stealing malware category. The root node "privacy stealing" is the ultimate attack goal. To achieve this goal, both obtaining privacy and having a leakage method are needed, so the relationship between the two nodes is AND. Breaking the "obtain privacy" sub-goal down further, it divides into environmental privacy and stored privacy: the former includes obtaining the user's location and taking photos, the latter includes contacts, call logs and stored files. If the application gets any one of these, it obtains privacy information, so these nodes are OR nodes. The leakage method is achieved by one or more capabilities such as sending messages, accessing the Internet and writing to the SD card.
c) Formulating behavior rules attached to attack paths
Android applications are event driven: they execute particular behaviors in response to given events. Behavior rules are formulated based on this system feature and defined in the following form: event ⇒ test.
[Figure 3 content: root "Privacy Stealing" (AND) splits into "Obtain Privacy" and "Leakage Method". Obtain Privacy (OR): Environmental Privacy (Recorder E_Rec, Take Photos E_Cam, Obtain Location via GPS E_GLoc or Network E_NLoc) and Stored Privacy (Device Info: Device ID E_Did, Device Model E_Dmod, System Version E_SVer; User Info: Contacts E_RCont, Read SMS E_RSms, Call Log E_RDial, Accounts E_Acco, Read SDcard E_RSDC). Leakage Method (OR): Send SMS E_SSms, Dial E_Dial, Internet E_Inte, Write SDcard E_WSDC.]
Fig. 3. Attack tree of privacy stealing
Algorithm 1: Leaf Node Tag Calculation
Input:
1. DYNAMIC: true if dynamic loading exists, false otherwise
2. ESCALATION: true if privilege escalation exists, false otherwise
3. Permissions: set of the test application's permissions
4. APIs: set of the test application's function calls
Target: calculate the Tag value of leaf node Node
Process: markLeafTag(Node)
  IF ¬DYNAMIC ∧ ¬ESCALATION
    Node.Perm ∈ Permissions ∧ Node.API ∈ APIs ⇒ Node.Tag = P
    Node.Perm ∉ Permissions ∨ Node.API ∉ APIs ⇒ Node.Tag = I
  ELSE IF DYNAMIC ∧ ¬ESCALATION
    Node.Perm ∈ Permissions ⇒ Node.Tag = P
    Node.Perm ∉ Permissions ⇒ Node.Tag = I
  ELSE IF ESCALATION
    Node.Tag = P

The test part of a rule is drawn from the following definition:

Test ::=
  call(func): function call func detected
  before(func1, func2): func2 detected before func1
  after(func1, func2): func2 detected after func1
  func.param[i] ∈ reg: the ith parameter of func matches regular expression reg
  func.retval ∈ reg: the return value of func matches regular expression reg

An event is sent to the Android system to trigger behaviors. Applications thus execute under control, so the occasions on which sensitive functions are called are known; this is an important feature for distinguishing benign and malicious applications. There are three main types of events. The first is starting up components: we can force all activity and service components to execute and send broadcasts to stimulate the broadcast receivers. The second is environmental events, such as a location change or an incoming call. The third is simulated user interaction: for every activity, event flows such as screen touches and key presses are sent to the system.

The test is what to examine after simulating an event. Function calls, including the calling occasion, the parameters passed and the value returned, are the features for detection. Both benign and malicious apps use the same APIs, but there are subtle differences in calling occasion, parameters and return values; with the purpose of distinguishing benign from malicious apps, rules are made by referring to these differences.

For illustration, we present the behavior rules for SMS charging malware. Component sets extracted in static analysis are represented with uppercase letters such as ACTIVITY; the sensitive phone number set is represented with SNO.

∃ a: activity(a) | a ∈ ACTIVITY • activity(a) ⇒ call(sendTextMessage)
∃ s: service(s) | s ∈ SERVICE • service(s) ⇒ call(sendTextMessage)
∃ b: broadcast(b) | b ∈ BROADCAST • broadcast(b) ⇒ call(sendTextMessage)
screentouch ⇒ call(sendTextMessage) ∧ ¬ before(sendTextMessage, InputMethodService)
∃ n: sendsms(n, t) | n ∈ SNO • sendsms(n, ".*") ⇒ call(abortBroadcast)

The behavior rules above describe the malware features in SMS charging. Compared to the normal use of the message sending function, malware calls this function immediately when an activity or service starts up or a broadcast is received; or, even when the function call is triggered by a screen touch, the input method service is not called before sending messages. Besides, messages from carriers or banks are intercepted. This example shows that the rules are formulated by carefully considering the differences between normal applications and malware in function call occasion, parameters and return value.
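Algorithm 1 can be sketched in Python as below; only the Perm/API/Tag attributes come from the leaf node structure defined earlier, while the class and helper names are illustrative.

```python
# Sketch of Algorithm 1 (leaf node tag calculation). Attribute names follow
# the paper's leaf node structure (Perm, API, Tag); the rest is illustrative.
class Leaf:
    def __init__(self, name, perm, api):
        self.name, self.perm, self.api = name, perm, api
        self.tag = "I"

def mark_leaf_tag(leaf, permissions, apis, dynamic=False, escalation=False):
    """Tag a leaf 'P' (possible) or 'I' (impossible):
    - normal case: both the permission and a matching function call needed;
    - dynamic loading: code may be fetched later, the permission suffices;
    - privilege escalation: capability cannot be bounded, assume possible."""
    if escalation:
        leaf.tag = "P"
    elif dynamic:
        leaf.tag = "P" if leaf.perm in permissions else "I"
    else:
        leaf.tag = "P" if (leaf.perm in permissions and leaf.api in apis) else "I"
    return leaf.tag

loc = Leaf("E_NLoc", "android.permission.ACCESS_COARSE_LOCATION",
           "LocationManager.requestLocationUpdates()")
perms = {"android.permission.ACCESS_COARSE_LOCATION"}
assert mark_leaf_tag(loc, perms, set()) == "I"        # permission but no call
assert mark_leaf_tag(loc, perms, set(), dynamic=True) == "P"
```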
B. Static Analysis
Static analysis aims at detecting the possible attacks of an application. It first extracts the static features of an application, then detects the application's potential attacks with the attack tree.
1) Feature extraction
We adopt a reconfigured Androguard [26] for feature extraction, which is convenient for automatic processing. Security related permissions and function calls are extracted to determine application capability; components are extracted to trigger behaviors in dynamic analysis. For activities and services the component name attribute is extracted, while for broadcast receivers the action attribute of the intent filter is extracted.
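Androguard operates on the binary APK; purely for illustration, the same fields can be pulled from a decoded manifest with stdlib XML parsing. The element and attribute names below follow the Android manifest schema; the sample manifest itself is made up.

```python
# Illustrative extraction of the features static analysis needs, run over a
# decoded AndroidManifest.xml (the paper uses a reconfigured Androguard on
# the binary APK; this stdlib sketch only mirrors what is extracted).
import xml.etree.ElementTree as ET

MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
  package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <activity android:name=".MainActivity"/>
    <service android:name=".BgService"/>
    <receiver android:name=".SmsReceiver">
      <intent-filter>
        <action android:name="android.provider.Telephony.SMS_RECEIVED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

NS = "{http://schemas.android.com/apk/res/android}"
root = ET.fromstring(MANIFEST)
permissions = [e.get(NS + "name") for e in root.iter("uses-permission")]
activities = [e.get(NS + "name") for e in root.iter("activity")]
services = [e.get(NS + "name") for e in root.iter("service")]
# For broadcast receivers, the action attribute of the intent filter is what
# matters for triggering, as described above.
actions = [a.get(NS + "name")
           for r in root.iter("receiver")
           for a in r.iter("action")]

assert permissions == ["android.permission.SEND_SMS"]
assert actions == ["android.provider.Telephony.SMS_RECEIVED"]
```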
Event Definition:

Event ::=
  activity(component): start up an activity
  service(component): start up a service
  broadcast(action): send a broadcast
  sendsms(number, text): send an SMS from number
  geo(location): send a new location
  gsmcall(number): dial from number
  gsmaccept: accept the incoming call
  gsmcancel: hang up the call
  screentouch: send a screen touch event
  keypress(keycode): press the key of the given keycode
2) Potential attacks detection
a) Tag the Leaf Nodes. Leaf nodes represent the basic behavior capabilities. The Tag attribute is marked "P" or "I" according to Algorithm 1.
b) Tag All the Tree Nodes. All tree nodes are marked in a bottom-up manner until the root node is marked. Once its child nodes are all marked, a father node gets its tag value from its node type and the child tag values.
c) Record Components and Attack Paths. If any attack tree root node is marked "P", the application is classified as suspicious; if all attack tree root nodes are marked "I", the application is benign. For suspicious apps, the components and attack paths are recorded.
C. Dynamic Analysis
Static analysis obtains the potential attacks of an application; the goal of dynamic analysis is to examine whether these potential attacks are actually carried out. We focus on two main challenges of dynamic analysis: low code coverage and low detection efficiency. Component based behavior triggering is proposed to improve code coverage, while the attack tree based behavior rule approach avoids unnecessary detection and achieves high efficiency.
Algorithm 2: Potential Attacks Detection
Input:
1. Permissions: set of permissions
2. APIs: set of function calls
Target: determine attack capability and record attack paths
Process:
FOR EACH attackTree IN attackForest
  1. /* Calculate Tag of leaf nodes based on Permissions and APIs */
     FOR EACH leafNode IN attackTree: markLeafTag(leafNode)
  2. /* Calculate Tag value of all nodes */
     FOR EACH node IN attackTree:
       IF node.Type = AND: node.Tag = ∧{node.childTags}
       ELSE IF node.Type = OR: node.Tag = ∨{node.childTags}
  3. /* Record attack paths if the root node is marked P */
     IF root.Tag = P:
       FOR EACH path IN attackTree:
         IF ∀ n: node | n ∈ path • n.Tag = P: record(path)

1) Component based behavior triggering. Dynamic analysis is effective only if applications execute completely enough to expose malicious behaviors. In fact, even manual detection cannot guarantee completeness, since some trigger conditions are never satisfied by user operation. The Android Monkey tool, widely used in automatic analysis, sends a random event flow to the application; its effect is not satisfactory either.

An Android application consists of four kinds of components. Forcing all components through their life cycles achieves high code coverage, so a component based behavior triggering approach is proposed: events are sent to the system according to the application's components. The "am" command is used to start activities and services and to send broadcasts. We telnet to the Android device to simulate events such as location changes and incoming calls to trigger broadcast receivers. For every activity, screen operations and key presses are simulated. Events are sent based on the result of static analysis to avoid unnecessary simulations.
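The triggering approach can be sketched as a dry run that builds the device commands from the recorded components. The `am start`/`am startservice`/`am broadcast` and `input` invocations are standard Android shell commands; the wrapper and its names are illustrative, and environmental events would go through the emulator console instead.

```python
# Sketch of component based behavior triggering: build the adb shell
# commands driven by the components recorded in static analysis.
# (A real run would pass each command list to subprocess.run against a
# TaintDroid device; here we only assemble them.)
def adb_cmd(*args):
    return ["adb", "shell", *args]

def trigger_events(pkg, activities, services, actions):
    cmds = []
    for act in activities:                       # start every activity
        cmds.append(adb_cmd("am", "start", "-n", f"{pkg}/{act}"))
        cmds.append(adb_cmd("input", "tap", "200", "300"))  # user interaction
        cmds.append(adb_cmd("input", "keyevent", "4"))      # BACK key
    for svc in services:                         # start every service
        cmds.append(adb_cmd("am", "startservice", "-n", f"{pkg}/{svc}"))
    for action in actions:                       # stimulate receivers
        cmds.append(adb_cmd("am", "broadcast", "-a", action))
    return cmds

# Environmental events use the emulator console (telnet, default port 5554):
#   geo fix -122.08 37.42     -> location changed
#   gsm call 5551234          -> incoming call
#   sms send 5551234 "hi"     -> incoming SMS

cmds = trigger_events("com.android.pkname", [".MainActivity"], [],
                      ["android.provider.Telephony.SMS_RECEIVED"])
assert cmds[0] == ["adb", "shell", "am", "start", "-n",
                   "com.android.pkname/.MainActivity"]
```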
2) Behavior rules detection. We exploit TaintDroid [18] to obtain runtime function calls, parameters and return values. These features are extracted from logcat while applications execute on a TaintDroid based device.
In static analysis, only reversible applications that have no capability to realize any attack goal and do not exploit dynamic loading or privilege escalation are classified as benign. Due to the limitation of their behavior capability, these applications cannot execute malicious behaviors; therefore, this procedure introduces no false negatives. The suspicious applications contain the real malware together with some benign applications, which dynamic analysis will distinguish.
Algorithm 3: Dynamic Detection
Input:
1. AttackPaths: set of test application attack paths
2. Components: set of test application components
Target: detecting runtime behaviors
Process: BehaviorDetection()
  1. /* Load the coarse set of attack path rules */
     FOR EACH path IN AttackPaths: rules = rules ∪ loadRules(path)
  2. /* Rule cluster optimization */
     cluster(rules)
  3. /* Rule based behavior detection */
     FOR EACH rule IN rules:
       simulate(rule.event, Components)
       result = check(rule.test)
       IF result == TRUE: RETURN malware
     RETURN benign

JSON [17] format is used to record suspicious applications, as follows. JSON is a key-value based serialized string format with wide library support. For each suspicious application, we record the name, components and attack path numbers. Events are simulated based on the component types and class names; behavior rules are selected based on the attack path numbers.

{
  "applicationName": "Application.apk",
  "components": [
    {
      "packageName": "com.android.pkname",
      "launcherActivity": "com.android.pkname.MainActivity",
      "activities": ["com.android.pkname.Act1", "com.android.pkname2.Act2"],
      "services": ["com.android.service.BgService"],
      "broadcastreceiver": []
    }
  ],
  "attackPathNo": ["3", "11", "14", "37"]
}

a) Assembling Behavior Rules
Each application applies different behavior rules depending on its attack capability. The set of behavior rules specific to an
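Consuming such a record is straightforward; this sketch uses field names from the JSON example above and leaves rule loading abstract, so the selection logic shown is illustrative only.

```python
# Sketch of consuming the suspicious-application record produced by static
# analysis. Field names follow the JSON example; rule loading is abstract.
import json

record = json.loads("""{
  "applicationName": "Application.apk",
  "components": [{
      "packageName": "com.android.pkname",
      "launcherActivity": "com.android.pkname.MainActivity",
      "activities": ["com.android.pkname.Act1"],
      "services": ["com.android.service.BgService"],
      "broadcastreceiver": []
  }],
  "attackPathNo": ["3", "11", "14", "37"]
}""")

# Behavior rules are selected by the recorded attack path numbers; events
# are simulated from the component lists.
paths = record["attackPathNo"]
activities = [a for c in record["components"] for a in c["activities"]]
assert paths == ["3", "11", "14", "37"]
assert activities == ["com.android.pkname.Act1"]
```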
application is well assembled before detection. Static analysis records the attack path numbers, and every attack path number has several rules. The first step puts the behavior rules corresponding to the recorded numbers together. Rules are defined as event ⇒ test. For each rule, event simulation takes a large time overhead, so the second step is a clustering operation for optimization: if more than one rule employs the same event, they are clustered so that the event simulation is executed only once, and their test parts are aggregated by logical OR.
b) Behavior detection
The assembled behavior rules are applied one by one. We send the simulated event to the system and afterwards examine the triggered function calls against the clustered test parts of the rule. If all rules are mismatched, the application is benign; otherwise it is malicious.
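The two assembly steps can be sketched as follows; the rule database and the string encodings of events and tests are illustrative, not the paper's rule set.

```python
# Sketch of rule assembly: (1) gather the rules for the recorded attack path
# numbers, (2) cluster rules sharing an event so each event is simulated
# once, with the test parts OR-ed together. RULE_DB is illustrative.
RULE_DB = {
    "3":  [("activity(a)", "call(sendTextMessage)")],
    "11": [("activity(a)", "call(requestLocationUpdates)"),
           ("screentouch", "call(sendTextMessage)")],
}

def assemble(path_numbers):
    rules = []                                  # step 1: coarse rule set
    for no in path_numbers:
        rules.extend(RULE_DB.get(no, []))
    clustered = {}                              # step 2: cluster by event
    for event, test in rules:
        clustered.setdefault(event, []).append(test)
    # tests attached to one event are aggregated by logical OR
    return {e: " ∨ ".join(t) for e, t in clustered.items()}

rules = assemble(["3", "11"])
assert rules["activity(a)"] == "call(sendTextMessage) ∨ call(requestLocationUpdates)"
assert rules["screentouch"] == "call(sendTextMessage)"
```

With this clustering, the activity start-up event is simulated once even though two rules depend on it, which is exactly the overhead the second step removes.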
Fig. 4. Result on Different Number of Attack Paths and Rules
We conduct the experiment with different numbers of attack paths and rules; the result is shown in Table I.
III. EVALUATION
We have developed an automatic malware detection prototype system called AMDetector based on the approach we proposed in this paper. Detailed detection data is recorded in every stage to evaluate the effectiveness. The experiment is conducted on a Dell PC with Intel dual core 2.7GHz CPU, 4GB memory, running Ubuntu 12.04.
Fig. 4 shows the malware detection result for different numbers of attack paths and rules. The TPR increases as the number of attack paths and rules increases. When the attack tree is sparse, with a small number of attack paths and rules, there are no false positives, but the TPR is very low. The TPR increases significantly as attack paths and rules are added; with more than 200 rules attached to nearly 50 attack paths, the TPR reaches 88.14% with little further increase. However, the fast increase of the FPR then decreases the Accuracy, because benign samples occupy the majority of the data set. In this experiment, the highest Accuracy is 96.57%, with a TPR of 88.14% and an FPR of 1.80%.
The data set used for detection is collected from Google Play, the Android Malware Genome Project [19], the VirusTotal Malware Intelligence Service [20] and our own applications. It consists of 728 applications, 610 benign and 118 malicious, for a malware proportion of 16.2%. The following standard is employed to evaluate the result: True Positive Rate (TPR) is the proportion of positive instances classified correctly; False Positive Rate (FPR) is the proportion of negative instances misclassified; Accuracy measures the proportion of correctly classified instances, either positive or negative. TP is the number of positive instances classified correctly, FP the number of negative instances misclassified, FN the number of positive instances misclassified and TN the number of negative instances classified correctly. Then:
We analyzed the reasons for false negatives in privacy stealing, financial charging and remote control. One reason is that the runtime environment a malicious sample depends on is no longer satisfied; for example, the remote server has been shut down, so the sample is not active during detection. In addition, we sacrifice some true positive rate to achieve a lower false positive rate, so our rules cannot detect some highly disguised malware. False positives appear when a benign application shows nonstandard behaviors. For example, a news client that automatically downloads and installs an update without informing the user at startup results in a misclassification.
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
Accuracy = (TP + TN) / (TP + FP + FN + TN)
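Checked against the Table I row with 44 attack paths (TP=104, FP=11, TN=599, FN=14), these formulas reproduce the reported figures:

```python
# Verify the evaluation metrics against one row of Table I
# (44 attack paths, 198 rules).
TP, FP, TN, FN = 104, 11, 599, 14

tpr = TP / (TP + FN)           # proportion of malware classified correctly
fpr = FP / (FP + TN)           # proportion of benign apps misclassified
accuracy = (TP + TN) / (TP + FP + FN + TN)

assert round(100 * tpr, 2) == 88.14
assert round(100 * fpr, 2) == 1.80
assert round(100 * accuracy, 2) == 96.57
```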
B. Comparisons
To prove the effectiveness and find the drawbacks of this approach, we compare its detection result with previous approaches and analyze the advantages and drawbacks of each. Table II shows the comparison.
A. Experiment Result
In this approach, the detection result is directly influenced by the number of attack paths and rules, and the attack tree model is intuitive, reusable and extensible.
TABLE I. RESULT ON DIFFERENT NUMBER OF ATTACK PATHS AND RULES

Num of Attack Paths | Num of Rules | TP  | FP | TN  | FN | TPR(%) | FPR(%) | Accuracy(%)
24                  | 106          | 87  | 0  | 610 | 31 | 73.73  | 0      | 95.74
32                  | 132          | 96  | 5  | 605 | 22 | 81.36  | 0.82   | 96.29
38                  | 165          | 101 | 8  | 602 | 17 | 85.59  | 1.31   | 96.57
44                  | 198          | 104 | 11 | 599 | 14 | 88.14  | 1.80   | 96.57
49                  | 216          | 105 | 16 | 594 | 13 | 88.98  | 2.62   | 96.02
55                  | 239          | 105 | 25 | 585 | 13 | 88.98  | 4.10   | 94.78
TABLE II. COMPARISON OF EXISTING MALWARE DETECTION APPROACHES

Kirin [8]
  Method used: BNF notation rules; static matching of permission labels and action strings.
  Advantages: simple to implement with good performance.
  Drawbacks: nine rules are not sufficient; static permission features cannot represent the actual behaviors of an application.
  Detection result: 10 of 311 apps did not pass the rules; among them 5 are dangerous and the other 5 are considered reasonable.

STREAM [11]
  Method used: machine learning; input emulation with Monkey.
  Advantages: suitable for large scale studies; distributed experimentation platform.
  Drawbacks: the Monkey tool does not simulate user input well; the classifiers have poor false positive rates.
  Detection result: Bayes net TPR 81.25%, FPR 31.03%; Logistic TPR 68.75%, FPR 15.86%.

SmartDroid [13]
  Method used: UI-based trigger conditions; function call graph and activity call graph.
  Advantages: static analysis gets activity switch relationships and dynamic analysis checks sensitive behaviors; the detection has high code coverage.
  Drawbacks: no trigger for components other than activities, such as services and broadcast receivers.
  Detection result: reveals the UI-based trigger conditions leading to a behavior, but cannot reveal indirect conditions and logic-based trigger conditions.

AMDetector
  Method used: attack tree; hybrid analysis.
  Advantages: rules organized with the attack tree are fine-grained and configurable; static analysis points out potential attacks and dynamic analysis checks the reduced rule set; component based triggering.
  Drawbacks: rules are formulated manually; detailed dynamic analysis takes a long time.
  Detection result: TPR 88.14%, FPR 1.80%, Accuracy 96.57%.
Extracting static features, analyzing capability and recording information complete very fast; static analysis time depends on the code size. The dynamic analysis overhead counts installation and execution time and varies across applications: complex applications with a large number of components and high capability take a long detection time. For the total performance overhead, since 35.37% of benign applications are filtered out in static analysis, the overall average detection time is significantly reduced.
Kirin [8] is a rule based static system matching unsafe permission combinations. For their dataset, half of the applications that do not pass the rules assert a dangerous configuration of permissions but use it within reasonable functional needs. This shows the limitation of static analysis approaches: they tend to consider an application with a large number of permissions and complex function calls to be malware, while in fact a benign application such as a system assistant declares many permissions, and malware may attack with only one or two. STREAM [11] is a machine learning system designed for large scale detection. Its best TPR is 81.25% and its lowest FPR is 15.86%. Because the most frequently exhibited behaviors are usually benign while malicious behaviors are hard to expose, it is difficult for classifiers to learn what the real malicious behaviors are. SmartDroid [13] proposes a method to trigger sensitive behaviors: it uses static analysis to extract activity switch paths and then uses dynamic analysis to traverse each UI element. This approach is effective, but it does not clearly define what sensitive behaviors are, and it cannot trigger Android components such as services and broadcast receivers.
TABLE III. PERFORMANCE OVERHEAD

Process          | Shortest(s) | Longest(s) | Average(s)
Static Analysis  | 1.7         | 5.6        | 2.9
Dynamic Analysis | 18.2        | 165.4      | 64.7
Total            | 2.9         | 167.8      | 41.2
IV. RELATED WORK
Mobile security has become an active area of research. Malware detection methods employing static analysis and dynamic analysis are closely related, as well as behavior triggering method to collect dynamic features. The following provides an overview of current research in this area.
Compared to these approaches, our approach reaches a high TPR and a very low FPR. This benefits from the adoption of attack tree. It promotes attack scenario specific behavior analysis. The behavior rules are formulated fine-grained to distinguish the sensitive function calls maliciously employed. Then benign applications with high capability won’t be misclassified. Component based event simulation guarantees high code coverage.
Static analysis extracts features such as permissions, function calls and file structure from the compiled APK file. ScanDroid [21] is a program analysis tool for automated security certification; it extracts security specifications from manifests and checks whether data flows are consistent with those specifications. Kirin [8] uses security rules to block the installation of potentially unsafe apps that exhibit certain dangerous permission combinations. Suleiman Y. Yerima et al. [22] extract properties including API calls, Linux system commands and permissions, map them into feature vectors and detect malware using Bayesian classification. RiskRanker
C. Performance Overhead

We analyze the detection performance in our experiment environment and present the result in Table III. The performance overhead result shows that the primary overhead occurs in the dynamic analysis phase. Static analysis takes a relatively short time, which is spent mainly in reverse engineering.
[12] detects sensitive instructions in Dalvik bytecode, with a particular focus on encryption and dynamically loaded code.
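The feature-vector mapping used by Bayesian approaches such as [22] can be sketched as a binary indicator over a fixed feature list; the feature names below are illustrative, not the actual feature set of that work:

```python
# Sketch: map an app's extracted static properties (API calls, Linux
# commands, permissions) onto a binary feature vector of the kind a
# Bayesian classifier consumes. The feature list is illustrative.
FEATURES = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "TelephonyManager.getDeviceId",
    "Runtime.exec",
    "chmod",
]

def to_feature_vector(app_properties):
    present = set(app_properties)
    # 1 if the feature was observed in the app, 0 otherwise.
    return [1 if f in present else 0 for f in FEATURES]

vec = to_feature_vector(["Runtime.exec", "chmod",
                         "android.permission.SEND_SMS"])
print(vec)  # -> [1, 0, 0, 1, 1]
```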
Dynamic analysis studies features such as function calls and environment data at runtime. Shabtai et al. [9] propose a feature extraction and analysis framework called Andromaly. They compare a variety of filtering and classification algorithms for malware detection based on CPU load, memory usage, power consumption and other features. Bose et al. [23] present a method to detect malware by observing the logical order of an app's behaviors and matching them against "spatial-temporal" representations of known malware behaviors. CrowDroid [24] collects system call log files from the Android community and applies clustering algorithms on a remote server to analyze and detect malware. TaintDroid [18] monitors sensitive data flows at runtime and detects potential privacy leakage.
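The preprocessing behind CrowDroid-style clustering can be sketched as condensing a system-call trace into a per-call frequency vector, the representation the clustering step operates on; the trace and call names below are illustrative:

```python
from collections import Counter

# Sketch of CrowDroid-style preprocessing: turn a raw system-call trace
# into a fixed-order frequency vector that clustering algorithms can
# consume. The syscall list and trace are illustrative.
SYSCALLS = ["open", "read", "write", "connect", "sendto"]

def trace_to_vector(trace):
    counts = Counter(trace)
    return [counts.get(s, 0) for s in SYSCALLS]

trace = ["open", "read", "read", "connect", "sendto", "sendto"]
print(trace_to_vector(trace))  # -> [1, 2, 0, 1, 2]
```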
Behavior triggering is a challenge for dynamic analysis. Crowdroid collects behavioral data directly from users via crowdsourcing. STREAM [11] implements random fuzz testing using Android's Monkey [25]. SmartDroid [13] combines static and dynamic analysis to reveal UI-based trigger conditions and has the capability to simulate UI interactions.

V. CONCLUSIONS
In this paper, we propose a novel Android malware detection approach based on the attack tree. The attack tree promotes top-down analysis, starting from high-level attack categories and breaking down to basic application capabilities, and introduces a new, fine-grained and configurable way to organize and exploit rules. Hybrid analysis combines static and dynamic analysis to give full play to their advantages and compensate for their deficiencies: static analysis records the attack capability and components of suspicious apps, while dynamic analysis sends events to trigger behaviors based on application components and examines runtime behavior against the recorded attack capability. We have developed a prototype system, AMDetector, to realize automatic malware detection. The experiment yields an 88.14% true positive rate and a 1.80% false positive rate, which demonstrates the effectiveness of this approach.
From the experiment we learn that blind spots still exist in the behavior rules used to distinguish benign from malicious applications. The features we choose are the occasion, parameters and return value of function calls, while detailed data connections among function calls are not effectively employed. Extending the attack tree and refining the behavior rules is therefore our future work, along with collecting more applications and running a larger benchmark.
REFERENCES
[1] International Data Corporation. Worldwide quarterly mobile phone tracker 3Q13. November 2013. http://www.idc.com/getdoc.jsp?containerId=prUS24442013.
[2] Blue Coat. Blue Coat Systems 2013 mobile malware report. 2013.
[3] McAfee Labs. McAfee threats report: first quarter 2013.
[4] N. Idika, A. P. Mathur. A survey of malware detection techniques. Purdue University. 2007.
[5] Dibyajyoti Ghosh, Anupam Joshi, Tim Finin, Pramod Jagtap. Privacy control in smart phones using semantically rich reasoning and context modeling. IEEE Symposium on Security and Privacy Workshops. 2012.
[6] Howard Barringer, David Rydeheard, Klaus Havelund. Rule systems for run-time monitoring: from Eagle to RuleR. Runtime Verification, Lecture Notes in Computer Science, Volume 4839, pp. 111-125. 2007.
[7] B. Dixon, Y. Jiang, A. Jaiantilal, S. Mishra. Location based power analysis to detect malicious code in smartphones. Workshop on Security and Privacy in Smartphones and Mobile Devices. 2011.
[8] W. Enck, M. Ongtang, and P. McDaniel. On lightweight mobile phone application certification. ACM Conference on Computer and Communications Security. 2009.
[9] Asaf Shabtai, Uri Kanonov, Yuval Elovici, Chanan Glezer, Yael Weiss. Andromaly: a behavioral malware detection framework for Android devices. Journal of Intelligent Information Systems, Volume 38, pp. 161-190. 2012.
[10] Justin Sahs, Latifur Khan. A machine learning approach to Android malware detection. European Intelligence and Security Informatics Conference. 2012.
[11] Brandon Amos, Hamilton Turner, Jules White. Applying machine learning classifiers to dynamic Android malware detection at scale. Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1666-1671. 2013.
[12] Michael Grace, Yajin Zhou, Qiang Zhang, Shihong Zou, Xuxian Jiang. RiskRanker: scalable and accurate zero-day Android malware detection. International Conference on Mobile Systems, Applications, and Services, pp. 281-294. 2012.
[13] C. Zheng, S. Zhu, S. Dai, G. Gu, X. Gong, X. Han, W. Zou. SmartDroid: an automatic system for revealing UI-based trigger conditions in Android applications. ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 93-104. 2012.
[14] Yajin Zhou, Zhi Wang, Wu Zhou, Xuxian Jiang. Hey, you, get off of my market: detecting malicious apps in official and alternative Android markets. Network & Distributed System Security Symposium. 2012.
[15] Bruce Schneier. Attack trees. Dr. Dobb's Journal of Software Tools, 12(24), pp. 21-29. 1999.
[16] Xia Jiang, Weiwei Qi. An improved attack tree algorithm based on Android. Journal of Software Engineering, Volume 8, pp. 50-57. 2014.
[17] JSON. http://www.json.org/.
[18] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, et al. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. USENIX Symposium on Operating Systems Design and Implementation. 2012.
[19] Yajin Zhou, Xuxian Jiang. Dissecting Android malware: characterization and evolution. 2012 IEEE Symposium on Security and Privacy (SP). 2012.
[20] VirusTotal Malware Intelligence Services. https://secure.vtmis.com/vtmis/.
[21] A. Fuchs, A. Chaudhuri, and J. Foster. SCanDroid: automated security certification of Android applications. IEEE Symposium on Security and Privacy. 2009.
[22] Suleiman Y. Yerima, Sakir Sezer, Gavin McWilliams, Igor Muttik. A new Android malware detection approach using Bayesian classification. International Conference on Advanced Information Networking and Applications. 2013.
[23] A. Bose, X. Hu, K. G. Shin, T. Park. Behavioral detection of malware on mobile handsets. International Conference on Mobile Systems, Applications, and Services, pp. 235-238. 2008.
[24] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani. Crowdroid: behavior-based malware detection system for Android. Workshop on Security and Privacy in Smartphones and Mobile Devices. 2011.
[25] UI/Application Exerciser Monkey. http://developer.android.com/tools/help/monkey.html.
[26] Androguard. http://code.google.com/p/androguard/.