Using Dynamic Taint Approach for of Malware Threat ...

Using Dynamic Taint Approach for of Malware Threat Ping Wang, Wen-Hui Lin , Wun Jie, Chao Department of Information Management Kun Shan University Tainan, Taiwan

Kuo-Ming Chao

Chi-Chun Lo

Professor of Engineering and Computing, School of MIS, Coventry University, UK [email protected]

Institute of Information Management National Chiao Tung University Hsinchu, Taiwan [email protected]

[email protected] [email protected] [email protected]

security analysis of the information exchange for protecting sensitive information in cloud computing to prevent such a hack is required. To guarantee the mobile security of cloud computing services, cloud service providers (CSPs) use Taint Checking (TC) approaches to examine how a developing cloud app uses sensitive information and check for suspicious behaviours of users before deploying an app. TC is a blacklisting approach as it asserts that certain values are dangerous for information flow within intersuspicious modules (intramodule) in the host and program of an external network (interapp). In other words, TC technique emphasises examining the scope of taint propagation within intramodule or interapp. Unlike the traditional threat analysis technique, in TC, the focus is more on detecting exploits and data flow analysis is used to determine if the value is derived from a user input. In taint analysis, modules connected to data originating from untrusted network channels are marked as contaminated, and a series of arithmetic and logic operations to track data with taint marking. In performing malware threat analysis against unspecified malware attacks, CSPs can use a TC approach for tracking information flows between attack sources (malware) and detect vulnerabilities of targeted network applications. Many TC approaches have been developed to facilitate the security by preventing malicious users from executing commands on a host computer [1-3], which require computing emulations with a set of distinct mission hosts in a virtual security experimental environment. As described before, most existing TC schemes focus on mainly examining the security hole for mobile devices applications, but do not fully answer the severe questions regarding 1) the risk posed by suspicious applications were connected to security holes in the cloud computing server, and 2) the attack paths selected for introducing safeguards. To address this limitation, the present study proposes a taint propagation analysis model that incorporates a weighted spanning tree analysis scheme to track data with taint marking and identify the spread of intents by considering the state transitions of a program in an information system. Both dynamic taint analysis (DTA) and risk analysis are incorporated in the scheme and used for possible system exploits to solve the dynamic taint propagation analysis (DTPA) problem. The proposed model examined the

Abstract—Most existing approaches focus on examining the values are dangerous for information flow within intersuspicious modules of cloud applications (apps) in a host by using malware threat analysis, rather than the risk posed by suspicious apps were connected to the cloud computing server. Accordingly, this paper proposes a taint propagation analysis model incorporating a weighted spanning tree analysis scheme to track data with taint marking using several taint checking tools. In the proposed model, Android programs perform dynamic taint propagation to analyse the spread of and risks posed by suspicious apps were connected to the cloud computing server. In determining the risk of taint propagation, risk and defence capability are used for each taint path for assisting a defender in recognising the attack results against network threats caused by malware infection and estimate the losses of associated taint sources. Finally, a case of threat analysis of a typical cyber security attack is presented to demonstrate the proposed approach. Our approach verified the details of an attack sequence for malware infection by incorporating a finite state machine (FSM) to appropriately reflect the real situations at various configuration settings and safeguard deployment. The experimental results proved that the threat analysis model allows a defender to convert the spread of taint propagation to loss and practically estimate the risk of a specific threat by using behavioural analysis with real malware infection. Keywords- Threat analysis, Malware behavioural analysis, Dynamic taint propagation, Finite state machine

I. INTRODUCTION Cloud computing offers inspiring services by using the Internet to deliver information services to open networks and involves the deployment of large-scale platforms through the use of applications for mobile devices; therefore, commercial data on the clouds might become targets of network attacks. For example, on August 31, 2014, the hacker gained access to the private photos of the celebrities had hacked into iCloud accounts of the victims. As a result, nearly two hundred celebrities’ private photographs stored in the iCloud were disclosed on different image sharing sites by the anonymous hacker. In practice, new malware attacks can break into firewall system by using Hypertext Transfer Protocol logging, kernel hacks, and bypassing stack protection techniques, and to the cloud applications. Thus, a

1

information flows associated with suspicious taint sources, to use a taint graph to track suspicious connection with the taint source, and to identify the spread of tainted data. Specifically, a taint analysis toolset is used for analysis of the exact signature of exploits (SOEs) and propagation rules (PRs) for a taint source by tracking data with symbolic taint marking, accounting for the behavioural profiles, and assessing the risk of a taint source to the commercial data in cloud computing server. The remainder of this paper is organised as follows. Section 2 reviews previous studies in this field. Section 3 introduces the proposed taint propagation analysis model based on the behavioural profile of suspicious applications. An analysis of the results is presented in Section 4. The PR and taint risk in the DTPA process are discussed in Section 5. Finally, Section 6 provides the conclusions of this study.

traversal path of taint propagation was constructed using the DFS spanning tree algorithm.

II. BASIC CONCEPT OF TAINT CHECKING APPROACH FOR ASSESSING MALWARE THREATS

FIGURE. 2. USING TAINT ANALYSIS TO TRACE WHETHER THE MODULE IS CONTAMINATED VIA INTENTS

This section describes a revised scheme for TC approaches by using the weighted spanning tree algorithm (WSTA) and dynamic taint propagation approach, in solving the DTPA problem based on taint marking in open networks.

As shown in Fig. 2, existing TC approaches do not consider the state transition of an app module in a system, including preconditions and postconditions. In other words, although the receiver module accepted intents, defenders used security protection, such as bug fixes in an operating system and firewall in the network system, to repel the taint propagation results. However, if defenders do not patch the security vulnerabilities, tainted data would spread. Thus, the present study incorporated a finite state machine (FSM) to appropriately represent three different states of a module in taint propagation analysis, as shown in Fig. 3. In addition, using FSM to specify DTPA based on a network flow analysis technique, as shown in Fig. 4. Fig. 4 shows that the FSM may generate uncontaminated states for a module in a taint graph in defence situations, and a contaminated state for the same module shown in Fig. 3.

Suppose there are taint sources and a taint sink of a targeted network in which hackers maliciously attempt to compromise network security. Initially, the information type is assigned to tainted data. A taint graph was constructed by starting from a taint source to any connected module through intent transmission. As shown in Fig.1, the taint graph represents how tainted data or messages propagates during program execution and is depicted as a directed graph G = (V, E), wherein a set of vertices V = {u, v1, v2,...,vm} u represents a taint source, v1, v2,...,vs represents a set of modules in an app or a cloud service, and E is a set of edges E = {e1, e2,...,en} between two adjacent modules of G.

FIGURE. 1. USING A DIRECTED GRAPH TO REPRESENT THE PATHS OF INTENT TRANSMISSION FIGURE. 3. USING FSM TO SPECIFY THREE STATES OF A MODULE IN TAINT

A depth-first search (DFS) was then performed on a taint graph in accordance with taint marking to gradually connect the edge between the two adjacent modules, which pass tainted data (i.e., intents) through the network. In such a situation, the module is marked as contaminated and the module is marked as ‘clean’ if no tainted data transmits; that is, disconnects the edge (dotted lines in Fig. 2) between the two adjacent modules. After visiting all modules, the

PROPAGATION ANALYSIS

In the actual situations where various configuration settings, security fix and safeguard deployment were set up in the DTPA processes, an FSM-based taint propagation of information analysis produced various results, as shown in Fig.5. From Figs. 4 and 5, we recognised that considering

2

the state transition of a module for taint analysis is more suitable than those in existing TC approaches.

FIGURE. 6. A THREE-PHASE PROCESS OF TAINT PROPAGATION ANALYSIS

In further, the proposed model composed of two basic approaches for TC: 1) Static taint analysis (STA): Examine the program text and analyse overmultiple control paths of a program. 2) DTA: Investigate the instructions executed when the program runs, as shown in Fig. 7.

FIGURE. 4. JOINING A FINITE STATE MACHINE INTO TAINT ANALYSIS

A detailed algorithm for identifying the spread of taint propagation on the information flow between mobile devices and cloud computing servers by using WSTA is described by PDL, as shown in Fig. 8.

FIGURE 5. AN EXAMPLE OF TAINT ANALYSIS BY CONSIDERING THE STATE TRANSITION OF A MODULE

As shown in Fig. 5, if tainted data is received from an untrusted source for a module, such as a Web in blacklist, that accepts tainted data and output intents to the connected modules, it would be marked as a ‘contaminated’ state. The module connected the edges between these modules. By contrast, a module was marked as uncontaminated if it disconnected the edge between two modules. III.

A DYNAMIC TAINT PROPAGATION MODEL FOR MOBILE MALWARE THREAT

FIGURE. 7. EXAMINATION PROCESS OF TAINT PROPAGATION ANALYSIS IN DETAIL

Fig. 6 illustrates a three-phase included in the TPA process: SOEs extracted from static taint analysis, dynamic taint propagation analysis with malware samples, and risk threat analysis for taint propagation. As shown in Fig. 6, the malware behavioural analysis is suitable for taint source detection and propagation analysis by using API hooking and taint marking to identify the taint paths of propagation spread.

Input: a test app or a cloud service and malware samples Output: taint paths, PR and COEs for taint sources Algorithm DTPA: Dynamic taint propagation analysis 1: Begin; 2: Input a directed graph G = (V, E), V = {u, v1, v2,...,vm}, E = {e1, e2,...,en}; 3: m = |V|; n = |E|; 3

4: If (m > 1), then (taint source = TRUE); 5: Initialise an m × n matrix with all ‘0’ elements; []ij = 1 (boolean value indicating that the taint source passes the intent between module i and module j); 6: Discover a set of taint graph G; 7: Loop { 8: If (taint source), then { 9: –performs a DFS by starting with a taint source u; 10: Taint_graph (v) 11: { 12: For (each unvisited neighbour u of v) { 13: If two neighbouring modules have passed tainted data, then contaminated = TRUE; 14: If (contaminated) then 15: Mark the edge from u to v; 16: Assign a weighted value to the edge by using (3) and consider that the path of tainted data causes the losses of information leaks; 17: Add the weights for each taint path for spanning subtrees; 18: List the taint paths for spanning subtrees; 19: Assign a defence investment value to the taint path; 20: Use the MST to determine the priority of guarding the taint path; 21: Write a list into the PR [u,vj=1,…,n,]; 22: Write a list into the PR SOE [u, PRj=1,…,n]; 23: else 24: Traverse another taint source u to v in G; 25: Call Taint_graph (u); 26: }--endif; 27: }--end for; 28: }--endif; 29: }--end loop; 30: Output the MST and the taint paths of propagation spread in G; 31: Return (u, PR, SOE); 32: End.

stopping at a control point to prevent potentially dangerous programs from running, as well as monitoring and recording activity in a closed environment when the malware is running. Malware signatures were derived by combining both the security flaws of static code analysis and dynamic behavioural patterns of behavioural analysis with the support of a virtual machine emulator to facilitate the detection of unidentified mobile malware and variants. Compared with malicious and normal behavioural profiles, defenders often used dynamic behavioural analysis to obtain the major runtime behaviour regarding access to the network, file, registry, and disk of each application. Static analysis was conducted to examine the specific source codes and binary strings of malware, and comparison results were recorded in a log file to discriminate the differences in the signatures of malware variants and new viruses. In the malware behavioural analyses, two free shareware products, Androguard[4] and DroidBox[5], were used to generate malware signatures as the source dataset; these signatures were used to taint analyses later. In identifying the attack profiles of MISs, the intent-based analyses are adopted to inspect intents transmission from taint sources to the receiving module. Intents can be sent between three of the four components: Activities, Services, Receivers and Broadcast. Intents can be used to start Activities; start, stop, and bind Services; and broadcast information to Broadcast Receivers [7]. Generally, The operations regarding intent operations include three types of instruction sets: 1) source: MISs for taint source analysis exchange parameters between the taint sources and sinks, such as startActivity, startService, stopService bindService, read, gets, etc; 2) sink: the instruction sets for a taint sink include receive, onReceive, and write, etc; and 3) propagation: commands for taint propagation comprise sendBroadcast, and sendOrderedBroadcast, etc. Once the MISs were located, intent-based analysis can be used to construct the malicious behavioural profiles for taint source. Step 2: Examining the security hole for cloud application Available TC schemes [6-7] for examining the security holes with analysis mechanisms are capable of checking the security hole for app. Valgrind, developed by Seward [6], was proposed for automatic multimemory debugging and memory leak detection in detail. Valgrind was originally designed to be a free memory debugging tool for a Linux operating environment, and it has evolved to become a generic framework for creating dynamic analysis tools such as checkers and profilers for inspecting programs being developed. In particular, Valgrind can recompile a binary code to run on the host and target of the same architecture through a simulated CPU technique. Therefore, it is suitable for supporting the inspection of data at possible exploits through the examination of program control transfers. Valgrind contains multiple tools. The most used tool is Memcheck. Valgrind is in essence a virtual machine using just-in-time compilation techniques, including dynamic

FIGURE. 8. ALGORITHM DTPA

Suppose that partial system vulnerabilities of a cloud service are known and a set of behavioural profiles associated with each malware has been identified. The following DTPA algorithm includes a new resolution process of taint propagation to track the taint paths of propagation based on the behavioural profiles of a specific threat, for estimating the SOEs of a successful attack for malware threats. Step 1: Identifying Taint sources. Initially, a common approach used virtual machine analysis techniques to perform behavioural analysis of malware. Generally, a sandbox provides capabilities such as 4

recompilation. In STA, six types of test case are examined using Memcheck: 1) invalid write, 2) conditioned jump on an uninitialised value, 3) memory boundary check for invalid read and write, 4) source and destination overlap, 5) inconsistencies between dynamic memory allocation and release, and 6) memory leak. In the following, the evaluation of risk in accordance with the security holes responding to specific threat is described as follows:

Given an event sequence s = (s; Ts; Te) and a window width win, let the time window of an episode be given by w = (w; Ts; Te). The support degree of an episode (i.e., MISs) is defined as the fraction of windows where the episode occurs, as shown in Fig.9.

Step 3: Performing Dynamic taint propagation analysis. Compared with normal and malicious behavioural profiles, defenders can check whether a testing app is vulnerable to potential cyber-attacks or not between legitimate application and malware website. In DTPA, PRs and SOEs for taint sources are characterised using three subprocesses: 1) correlation analysis for MISs, 2) taint propagation analysis by using the WSTA, and 3) the determination of SOEs.

Figure. 9. Finding frequent episodes from a given sequence of security events

Theoretically, given s and win, the support degree of an episode (α) in s is .

Step 3.1 Correlation analysis for malicious instruction sets

(2)

Once sup(ɑ) is obtained, it can be used to predict the possibility of attack occurrence from a taint source. In other words, sup(ɑ) reveals the connections between attack events with MISs in the given security event sequence. Eventually, the normal behavioural profiles for apps are compared with the calling sequence of MISs for taint source to determine an app is vulnerable to a specific taint source or not.

To identify the spread of taint propagation, PRs must be defined by tracking the information flow. Theoretically, PRs can be derived by using correlation analysis (CA) for application components and relevant tainted data via the execution of malicious instruction sets to analyse the stain propagation behaviour. Furthermore, many CA approaches incorporate association rules, frequent episode rules, and data mining approaches to facilitate precise prediction for generating PRs. PRs for tainted data can be analysed by comparing the distinct calling sequences of the malicious API calls in which the calling sequences are built by a pair of [Sourcej, Sinkk Sinkk+1,…] between the malware web and the proper app. In practice, network analysis tools, such as Netflow and Wireshark, can be used to assist a defender in collecting large amounts of logs to identify the specific call sequences of MISs. Finally, the relevant calling sequences of MISs based on program execution for each taint source j are summed and to generate the PR, which has the form as the following,

Step 3.2 Taint propagation analysis by using the weighted spanning tree algorithm Suppose a spanning tree T models the network flow for DTPA by using graph theory. Here, T is an directed graph G used to construct a tree, including all vertices and edges without any loop. The process for dynamic taint tracking by using the WSTA is depicted as follows: Initially, the WSTA ensures the connection between any two nodes of a network in which there is only a single path without a loop. The modelling of taint propagation based on the connectivity of the information flow is then used to construct a spanning tree algorithm, as shown in Table 1 and Fig. 10. TABLE 1. EXAMPLES OF TAINT PROPAGATION RULE

Source node 5 3 3 8 3 1 8 9 4 4 2 2 2

PRi = (Apk, Taint source, Module, Calling sequence of MIS). (1)

As shown in Table 1, the calling sequence of MISs represents the propagation chain of a taint source i (i = 1,...,m) and comprises a set of methods used in evaluating the attack signature for each module. The module name associated with a specific MIS is measured using a set of call sequences, where a longer length of a calling sequence might cause a higher risk and severe loss. When deciding the calling sequence of MISs, determining the exact pattern of the taint propagation actions is difficult. Inspired by the concept of an intrusion detection system, the occurrence of taint propagation is solved using frequent episode rules [8] by accumulating and associating the security logs as follows.

5

Sink node 3 1 8 9 9 4 5 2 10 2 6 7 12

Edge weighting 2.5 2.0 3.0 2.5 6.5 3.5 5.8 6.5 4.5 7.5 4.0 4.5 8.5

State transition YES No YES No YES YES No YES YES YES YES YES YES

In Fig. 10, it shows that the present model not only incorporates FSM into appropriately represented tainted data states associated with the information flow but also concisely describes the progression of an attack profile with various taint paths in consideration with the state transition of an app module. Notably, a defender assigns a higher weight than that of mobile devices to the connection of a cloud server.

By contrast, the defence investment resources on each path against tainted data pass through that helps a defender to identify the total cost to secure the possible taint paths. If the project time or defence resource is limited, the minimum spanning tree (MST) can be applied to determine the priority of guarding the taint path, which prevents maximum leaks to users by using minimum resources. After defining the PRs for a taint source, the WSTA is used to identify the spread of taint propagation associated with the development of the TC tool. In WSTA analysis, risk assessment is performed by considering both the number of taint paths and the spread depth of taint propagation. Step 4. Determining the risks of SOEs The recognised security vulnerabilities of network applications have been investigated, examined, and reported. For example, Mitre Corporation maintains a list of disclosed vulnerabilities in a system called common vulnerabilities and exposures (CVEs), for which vulnerabilities are scored using a common vulnerability scoring system. Obviously, the loss of data leaks may correlate closely to the spread of taint propagation. The SOEs indicate a set of possible attack profiles [18] caused by security holes for taint source. In the present study, the form for SOEs is defined as SOE = (IDj, CVE, taint source url, affected module name, and PRs), where PRs was generated by collecting a set of PR, PRk, PRk+1 PRk+2,..for each taint source in accordance with the behaviour analysis of malware. The proposed model was designed to describe the SOE and estimate the risk of each taint path for selecting appropriate safeguards to defend against cyber attacks, as shown in Fig.11. In DTPA, the taint risk (ri) is characterised using two quantities: (a) the weights of taint path ( w j ), and

Figure. 10. Taint paths represented by an edge-weighted graph by considering FSM

In practice, a defender discovered these taint propagation paths in order to repair information security vulnerabilities between module 3 to module 9 for firework by updating security policy of an organization. As a result, it succeeded in stopping the spread of the stain propagation, the revised taint path in Fig. 8 shows that the taint graph associated with the FSM can analyse the results regarding tainted data propagations in defence situations, i.e., safeguards generate some uncontaminated state for transition from module 3 to module 9 by considering preconditions and postconditions of module 9, where PRs have the form: If ([precondition] and [postcondition]), then PR1 = [MIS1->…->MISn]. In particular, a taint graph is constructed by aggregating spanning subtrees, which involve edges with weights. Theoretically, the graph must assign a greater weight to an edge when a high degree of the spread of taint propagation causes a series loss of data leaks. In practice, the loss of data leaks includes the monetary loss and loss on intangible asset such as the reputation of an enterprise. A defender assigns a weighted value to each edge at time t by considering the ratio of the loss of data leaks, ll (t ) for a single taint path j (j=1,…,n) to the maximum loss for all taint paths in a taint graph G as

w j (t ) =

l j (t )

.

(b) the loss of taint propagation ( sij ) caused by a set of security holes for a taint source i, (i = 1,…,m) which generate the possible taint paths j (j = 1,…,n). The taint risk is obtained using n

ri (t) = ∑wj (t) o sij (t) ,

(4)

j=1

To evaluate the taint risk of DTPA, the weights of taint path w j (0-1) is assessed using Eq.(4). The defender needs to determine the loss of taint propagation sij (0-10), and defence capability d j (0-10) of a taint path by considering the attack-defence scenarios of networks. In addition, for different number of PRs, losses of tainted data propagation rise with an increasing number of PR, but decrease with the defence capability. Note that the high loss originated from the unsettled vulnerabilities that caused the spread of taint propagation, which is associated with low defence capability against taint propagation:

(3)

Max l j (t ) j

6

sij (t ) = ∀ j

PRij , dj

(5)

Figure. 12. Experiment environment of taint analyses

Step 1.1: Malware behavioural analyses A total of 63 malware families that were identified on mobile devices by using active infections were obtained from blacklists published on the Dr.Web and Contagio Blogger web sites for an experiment conducted from February 2013 to February 2014. Initially, download the suspected app (.apk) from a smart device to both the Androguard [4] and DroidBox [5] the analysis platform. The experimental results of the behavioural analysis and code analysis of the mobile viruses are shown in Table 3. Defenders can verify the details of an attack sequence to obtain the possible attack profile.

Figure. 11. Risk represented by loss of data leak and weight of taint path

IV. CYBER SECURITY APPLICATION FOR TAINT ANALYSIS In this section, the applicability of the proposed taintchecking analysis model is demonstrated by considering an example of cloud security. The DTPA is constructed using the following four-step procedure involving the taint source analysis, malware behavioural analysis and dynamic taint propagation analysis on android platform.

Table 3. The relationships between viruses and behaviours

Step 1: Taint source analyses. Initially, use tool Dedexer disassemble DEX files, the Valgrind tool [9] for static taint analysis was then employed to record potential vulnerabilities for modules and intents, and then use tool Androguard and Comdroid to confirm the results of Valgrind and in further track information flows between the attack sources (i.e., malware) and the possible tainted data of targeted network applications in order to discover the possible taint source, as illustrated in Fig. 12 and Table 2. TABLE 2. EXPERIMENT ENVIRONMENT OF TAINT ANALYSES

Tool Operation system Static analysis

Specification Ubuntu v12.04 Dedexer, Valgrind, Androguard,Comdroid-1.04 Dynamic analysis Droidbox, APIMonitor-beta, Comdroid-1.04 In dynamic taint analysis, we use tool APIMonitor and Comdroid for DTRA to analyse the intent-based signature for taint source attacks and examine an app whether it is vulnerable to special cyber-attacks.

Step 1.2: Malicious activity analyses for taint source. Initially, install APP package file APK into testing folder, then execute TC tools, ComDroid and API_Monitor [10], examine illegal memory access, collect message passing to trace the taint source, check API call outputs, investigate taint paths for service, activity, and receiver and broadcast communication among modules, and analyze binary interacts with the environment. Using Comdroid to examine the risks of information exchange between the 7

mobile devices and the network hosts for cloud services, as shown in Figs 13 and 14.

In practice, the N-gram modelling scheme is used to obtain the most appeared API calls for MISs. We randomly examined 30 suspicious apps to check the capability of vulnerability detection for ComDroid's and API_Monitor to report app's common weaknesses. Consequently, ComDroid detected a total of 372 warnings for exposed components across 30 suspicious applications. Summing up, the most appeared API calls associated with the relevant MISs for 30 suspicious apps are obtained by using analyse intent source, sink and broadcast operations for I/O, service, activity, and communication, as shown in Table 5. TABLE 5. API CALLS FOR MISS WITH 30 TAINT SOURCES

Figure 13. Using API_Monitor to edit the monitoring targets

After defenders identifying the calling API for MISs for taint source, these attack profiles can be accumulated to perform taint checking analysis with the security holes. Step 2: Examining the security hole for app In this step, two analysis tools developed by Google Code project, API_monitor and Droidbox for Android (4.0), were used to perform taint source analyses. Typically, two auxiliary tools in Androguard, dex2jar-0.0.9.15 and jd-gui, are used to disassemble the binary files and form the source code in which a folder contains classes.dex, .APK file, package, and AndroidManifest.xml. The execution of tool./dex2jar.sh generates the jar file for classes.dex. In the disassemble process, the tool jd-gui can unzip the jar file and obtain the source code. Essentially, a set of test case for monitoring APIs of module were examined using API_monitor; these test cases were examined by considering the taint policies of an organisation. After analysed the source code, the behaviour was observed to have illogical connections networks to a taint source (Fig. 15).

Figure 14. Using ComDroid to inspect the exposures between malware and an app

An example for malicious activity analyses of a taint source is shown in Table 4. TABLE 4. M ALICIOUS ACTIVITIES FOR A TAINT SOURCE

8

TABLE 7. EXAMPLES OF THE TAINT PROPAGATION RULE

Figure. 15. Taint check of source code examined by DroidBox

Step 3: Dynamic taint propagation analysis. In DTPA, the SOE of a taint source is characterised using three subprocesses: 1) correlation analysis for MISs, 2) taint propagation analysis by using an STA, and 3) the determination of SOEs. After the formulation of PRs, DTPA tools can automatically correlate these PRs to perform taint propagation analysis along with the possible taint paths. For example, for the APK:Geinimi-Banking Trojan invasion of cloud services (information available at www.ipay.com.cn), we downloaded samples from http://contagiominidump and blogspot.tw/2011/10/geinimi-banking-trojanwwwipaycomcn.html to analyse the communication between Android applications and cloud services, and then constructed a taint graph of information flow by using the detail.log files derived from ComDroid associated with the WSTA (Fig. 16).

Step 3.1 Correlation analysis for malicious instruction sets To identify the malicious instruction sets of taint source, the TC analysis tool ComDroid [7] was employed to analyse the statistical information of disassembled output from Dedexer, logs potential components and Intent vulnerabilities. By using (3), the MISs that appeared most for malicious behaviours were extracted from the collected details.log regarding the intent creation and transmission operations on component modules (Table 6). TABLE 6. M ALICIOUS INSTRUCTIONS SETS OF TAINT SOURCES

In setting the weights to an edge in a taint graph, a defender must consider the relative severity of data leaks. For example, a defender assigns a higher weight than that of mobile devices to the connection of a cloud server. The Kruskal MST algorithm was used to remove the taint-free source of information flow in communication and generate a new taint graph (Fig. 17) to decide the priority of securing the taint paths,.

Step 3.2 Taint propagation analysis by using the weighted spanning tree algorithm In this step, ComDroid was used with the WSTA to monitor tainted data for the following purposes: 1) inspect intents transmission from taint sources to the receiving module, 2) examine the intents of intermodule transmission, and 3) determine whether the connected app and database are contaminated. In practice, ComDroid can track the intents of communications among apps by analysing the file details.log and listing information on activity, service, and broadcast sink. Fig. 13 illustrates the analysis of suspicious links between malware and an app after the module has received intents. A defender can click on ‘allActions’ to examine the activity in detail.

Figure. 16. Spanning tree approach for analysing the taint path

A defender can apply the analysis results of ComDroid to conclude the rule of taint propagation based on the frequent episode patterns of security event logs by using (1) for a specific taint source. The PRs derived from the experiment are shown in Table 7.

9

improved DTPA scheme can convert the spread of possible taint paths to loss and examine the risk of a specific threat in an objective way. Finally, the proposed method improves the precision of the threat analysis, risk perception and facilitate defenders to making an appropriate decision on defence strategies to reduce the risks of privacy leaks while dealing with possible attacks by taint sources. ACKNOWLEDGES This work was supported partly by National Science Council under the Grants No: MOST 103-2627-E-168001040 and MOST 103-2632-E-168-001.

Figure. 17. Minimum spanning tree approach for analysing the taint path

4. Determination of SOEs REFERENCES

Vulnerabilities issued by US-CERT for an Android platform are listed in Table 8.

[1] G. Portokalidis, A. Slowinska, and H. Bos. Argos: an emulator for fingerprinting zero-day attacks for advertised honeypots with automatic signature generation. SIGOPS Oper. Syst. Rev., 40(4), 2006. [2] H.Yin, D.Song, M. Egele, C. Kruegel, and E.Kirda, Panorama: capturing system wide information flow for malware detection and analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 116-127, 2007. [3] J. Newsome and D. Song, Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software, Carnegie Mellon University, Research Showcase @CMU, 2005. [4] A. Desnos, Androguard, avaliable at http://code.google.com/ p/androguard/wiki/Usage. [5] Honeynet project, DroidBox . available at , http://www.honeynet. org/gsoc/ slot11. [6] J. McClurg, J. Friedman, and W. Ng, Android Privacy Leak Detection via Dynamic Taint Analysis, EECS 450 (2013. [7] H.C. ,Kim, A.D. Keromytis, M. Covington, R.Sahita, Capturing Information Flow with Concatenated Dynamic Taint Analysis, iN Proceedings: International Conference on Availability, Reliability and Security, pp.355-362, 2009. [8] H. Mannila, H. Toivonen, and I.A. Verkamo Discovery of frequent episodes in event sequences, Data Mining and Knowledge Discovery 1(3), pp. 259-289, 1997. [9] Valgrind, available at http://valgrind.org/docs/manual/dist.news.html [10] API_Monitor, available at https://code.google.com/p/droidbox/ wiki/APIMonitor

TABLE 8. VULNERABILITIES FOR ANDROID PLATFORM

Once the system vulnerabilities, taint paths and PRs are identified, defender can form the SOEs for tracking taint paths between malicious website and detected vulnerabilities of targeted network applications. The form for SOEs is defined as (IDj, TS, PRk, PRk+1 PRk+2,...) generated by collecting a set of PRs in accordance with the taint sources and the relevant packages. Examples of the SOEs are shown in Table 9. In further, the zombie infections for a taint source are located at Honeynet Map for managerial purposes. TABLE 9. EXAMPLES FOR THE SIGNATURE OF EXPLOIT

IV.

CONCLUSIONS

In the proposed model, a taint analysis toolset, , Droidbox[5], ComDroid[7] and API_Monitor[10] was used to indicate the attack profile for malware infection, specify the possible targeted information flows between the malware and the security holes of network applications by performing in-depth behavioural analysis. Furthermore, an 10