DroidMLN: A Markov Logic Network Approach to Detect Android Malware

Mahmuda Rahman
Dept. of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13210. Email: [email protected]

Abstract—Traditional data mining mechanisms, with their rigidly defined classification techniques, are limited in expressing to what extent the class labels of test data hold. As a result, a false positive or false negative data point carries no quantitative value expressing the degree to which it is false or true. This limitation becomes especially severe in malware detection for a growing market such as Android applications. To address the need for a more fine-grained model of classification fitness, we use a Markov Logic Network, for the first time, to detect Android malware.

Keywords—Markov Logic Network, Android, Malware, API

I. INTRODUCTION

Recently the popularity of Android applications has reached a phenomenal level. This also opens new threats to the Android operating system: it is getting more difficult to distinguish a malware from a benign application, and users are often tricked because there is no specific measurement metric to resolve this issue. In this work we devise a mechanism to withstand this problem. We show the results of several Markov Logic Network (MLN) approaches that we employed to distinguish malware from benign applications in Android, and we found that prior domain knowledge about the occurrences of the features contributes a considerable improvement in accuracy compared to using structure and weight learning in MLN.

II. BACKGROUND

It has been observed that the best possible result for detecting malware in Android depends heavily on historical data [1]. We have also taken into account the fact that features usually used by malware can be used by benign applications as well, and vice versa, leaving more room for confusion in the automation of a malware detection system. To incorporate this anomaly we adopted the Markov Logic Network (MLN): the complexity of the relationships among the features is mitigated by the logical AI of MLN, and the uncertainty of the class label is handled by its statistical AI [2]. MLN applies the ideas of a Markov network to first-order logic, enabling uncertain inference; it was first introduced by Richardson and Domingos in 2006 [3]. It is a probabilistic graphical model for statistical relational learning and inference in which each node represents a predicate or a unit clause and each edge denotes a relation between the nodes. Hence a Markov network is a model for the joint distribution of a set of variables. Graphically, it is an undirected graph G which has a node for each (binary-valued) variable and a real-valued function for each clique in the graph (expressed in terms of the weights of the corresponding edges). This is discussed further in sections 3 and 4, before we show our experimental results in section 5. We simulated all our test runs using Alchemy [http://alchemy.cs.washington.edu/].

Figure 1. Alchemy System

An overview of the workflow of the system is shown in figure 1. It has two basic building blocks: a learner module and an inferencing module. The former is supplied with base rules along with the training data as inputs; the latter is used to derive the result of the query for the test data given the learned Knowledge Base (KB).

III. MOTIVATION AND CONTRIBUTION

There are several reasons behind choosing the MLN framework for our purpose, and hence we elaborate here how we avail of it toward our aim of detecting Android malware.

• MLN contains one binary node for each possible grounding of each feature. This attribute of MLN fits our feature set of API calls perfectly: the presence of a call in an application can be assigned the binary value 1, and its absence can be denoted by 0.

• MLN contains one binary value for each possible grounding of each formula, where a formula is constructed using the features and/or their negations connected by logical connectives in Conjunctive Normal Form (CNF). There is an edge between two nodes iff the corresponding ground predicates appear in at least one formula [3]. Thus each grounded formula F_i can be represented by a clique with an associated weight w_i. For example, our formula !feature_x(y) → class_Z with an associated weight simply expresses the 'strength' of the statement that 'data point y does not have feature x and hence belongs to class Z'. Accordingly we convert each data instance to this logical format [i.e. for an API call Service_init_f(.) not found in an observation t, we represent that API as a predicate for that instance as !Service_init_f(t); if t is detected to be a malware we express it as malware(t); and this feature instance is encoded as !Service_init_f(t) → malware(t) in our training MLN file].

• Inference in MLN depends on the weights, and the query result for a test data instance is produced in terms of a marginal probability indicating how much it belongs to a certain class label. In our case this probability expresses the chance of being a malware, considering the context reflected by the weight of the corresponding formula.

• MLN allows contradiction between the formulas, which it resolves simply by weighting the evidence data on both sides [4].

• MLN differs significantly from First-Order Logic (FOL). In FOL, the Knowledge Base is a set of "hard constraints" on the set of possible worlds (i.e. if a world violates even one formula, it has zero probability). In Markov Logic the constraints are "softened" so that violating a formula makes a world less probable but not impossible [3]. The weight of a formula denotes how strong that constraint is; a higher weight means a greater difference in probability between a world that satisfies the formula and a world that does not (other things being equal). So, to fulfil our objective, we put a much higher weight on the formula which states "if any data point is malware it must not be benign" [i.e. malware(x) → !benign(x), and vice versa]. Apart from learning through the learner module, the weight of each formula can be assigned manually if we have the domain knowledge; this approach gives the best result in our simulations, as we discuss in section 4.3.

IV. MLN APPROACHES FOR LEARNING
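Before turning to the learning approaches, the logical encoding of data instances described above can be sketched as a small preprocessing step. This is a minimal sketch assuming Alchemy-style ground-literal syntax; the API names and the helper function are illustrative, not taken from the paper's actual feature list.

```python
# Sketch of the predicate encoding described in section III: each API-call
# feature becomes a ground literal, negated with '!' when absent, and the
# class label becomes a ground predicate as well. Names are hypothetical.

def encode_instance(instance_id, features, is_malware):
    """Render one observation as Alchemy-style ground literals.

    `features` maps an API-call name to 1 (present) or 0 (absent);
    absent calls are negated, as in !Service_init_f(t).
    """
    lines = []
    for api, present in sorted(features.items()):
        literal = f"{api}({instance_id})" if present else f"!{api}({instance_id})"
        lines.append(literal)
    lines.append(f"malware({instance_id})" if is_malware else f"!malware({instance_id})")
    return lines

# Example: an observation T1 that lacks Service_init_f and is a malware.
print(encode_instance("T1", {"Service_init_f": 0, "Context_getSystemService": 1}, True))
```

The learner module would consume many such lines as the training database (DB), one block per observation.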

MLN can be viewed as a "template" for constructing Markov networks. Given different sets of constants, it will produce different networks, but there will be certain regularities in structure and parameters, as all groundings of the same formula F will have the same weight. Mathematically, the probability distribution over possible worlds x specified by the ground Markov network M_{L,C}, where L is the MLN and C is the set of constants, is given by:

P(X = x) = (1/Z) exp( Σ_{i=1..F} w_i n_i(x) )

where F is the number of formulas in the MLN, n_i(x) is the number of true groundings of the i-th formula F_i in x, and Z is the normalizing partition function.
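To make this template view concrete, the toy sketch below enumerates all possible worlds of a two-predicate grounding and computes P(X = x) = (1/Z) exp(Σ w_i n_i(x)). The single formula and its weight are assumed for illustration only; they are not the paper's learned values.

```python
import itertools
import math

# Toy grounding: one constant A, predicates malware(A) and benign(A),
# and a single formula malware(A) => !benign(A) with assumed weight w.
w = 2.0

def n_true(world):
    """Number of true groundings of the formula in this world (0 or 1)."""
    mal, ben = world
    # The implication is violated only when both malware(A) and benign(A) hold.
    return 1 if (not mal) or (not ben) else 0

worlds = list(itertools.product([False, True], repeat=2))
unnormalized = {x: math.exp(w * n_true(x)) for x in worlds}
Z = sum(unnormalized.values())                # partition function
P = {x: u / Z for x, u in unnormalized.items()}

for x in worlds:
    print(x, round(P[x], 4))
```

Note how the constraint is "softened": the violating world (True, True) keeps a small but non-zero probability, whereas FOL would assign it probability zero.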

Figure 3. MLN knowledge base generated by bMLN

In this section we discuss three different MLN approaches we adopted to run our simulations.

A. Learning the Structure from a Bare-bone MLN: bMLN

For the first approach, which we term bMLN, we decided to provide only the feature list and a simple one-way implication rule malware(x) → !benign(x) as the only formula. To attain higher accuracy and coverage, MLN exploits weighted pseudo-log-likelihood (WPLL) rather than typical inductive logic programming (ILP) [5]. WPLL requires learning the weight of each candidate structure, which in turn requires counting ground atoms. This expensive task is reduced by sampling in the Alchemy system. The flow diagram in figure 2 captures this process, where finding the best clauses (features) to construct a formula is done by beam search. Figure 3 exhibits the output knowledge base KB (where each generated formula is augmented with its computed weight) learned from the training database (DB) using the learner module of bMLN, with all available predicate declarations (API calls) and only one formula, malware(x) → !benign(x), as the basis.

B. Generative Weight Learning with Belief Propagation: GL-BP

Apart from calculating the weights of formulas while learning the structure, MLN weights can also be learned generatively by maximizing the likelihood of a relational database [6]. According to this approach, if x is a possible world (relational DB) and x_l is the l-th ground atom's truth value, the pseudo-log-likelihood of x given weights w is:

log P*_w(X = x) = Σ_{l=1..n} log P_w(X_l = x_l | MB_x(X_l))

where MB_x(X_l) is the state of X_l's Markov blanket in the data (i.e. the truth values of the ground atoms it appears in some ground formula with). This computation does not require inference, so it is fast. Similar to the knowledge base generated by structure learning, the output of this learning procedure can be fed to the inferencing module as well. In this case, as input, we provided the learner with a set of two-way implication rules. So, with 156 available features (156 APIs used to distinguish a malware from a benign app), we developed 156 × 2 = 312 rules in total, and the weight corresponding to each of them was learned from the training instances. A single weighted two-way implication rule (which generates two one-way rules) from the output file is shown in figure 4. In this case, to learn the weights of such formulas, we preferred to use belief propagation, and we coined this approach GL-BP.
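As a rough illustration of the pseudo-log-likelihood, the self-contained sketch below evaluates log P_w(X_l = x_l | MB_x(X_l)) for each atom of a two-atom toy world by flipping only that atom while holding its Markov blanket fixed. The formula and weight are assumed for illustration and are not the paper's learned values.

```python
import math

# Two-atom toy MLN: ground atoms malware(A) and benign(A), one formula
# malware(A) => !benign(A) with an assumed weight w (illustrative only).
w = 2.0

def n_true(world):
    """Number of true groundings of the formula in this world (0 or 1)."""
    mal, ben = world
    return 1 if (not mal) or (not ben) else 0

def pseudo_log_likelihood(world):
    """log P*_w(X = x) = sum_l log P_w(X_l = x_l | MB_x(X_l)).

    For each ground atom l, flip only that atom while holding its Markov
    blanket (here: the other atom) fixed at its value in `world`.
    """
    total = 0.0
    for l in range(len(world)):
        scores = {}
        for v in (False, True):
            flipped = list(world)
            flipped[l] = v
            scores[v] = math.exp(w * n_true(tuple(flipped)))
        total += math.log(scores[world[l]] / (scores[False] + scores[True]))
    return total

# The observed world: malware(A) = True, benign(A) = False.
print(pseudo_log_likelihood((True, False)))
```

As the text notes, no inference over the full joint distribution is needed; each term only conditions on the atom's Markov blanket, which is why this objective is cheap to optimize.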

Figure 2. Structure Learning

Figure 4. Output KB entry for a formula by GL-BP

Figure 5. Augmentation of Prior Domain Knowledge in KB for DroidMLN

Figure 6. Performance Comparison in terms of Accuracy %

C. DroidMLN: Bias Weights from Domain Knowledge

Our work utilized the dataset from [1]. During preprocessing, the feature selection phase had already filtered out those APIs whose percentile difference of usage between malware and benign apps is negligible. In the DroidMLN approach, we assigned a weight to each feature proportional to the percentile difference found during the feature selection process. This let us bypass the learner: the input KB to the inference module was synthesized with this prior knowledge, reflected in the weight assigned to each formula. We call this approach DroidMLN. A snapshot of the KB is depicted in figure 5.

V. RESULTS OF INFERENCING

Inference in MLN is #P-complete [D. Roth, 1996]. For inferencing, MCMC is mostly used; in particular, Gibbs sampling is used, which proceeds by sampling each variable in turn given its Markov blanket (MB), the minimal set of nodes that renders a node independent of the remaining network. As we described in section 4.2, in a Markov network this is simply a node's neighbours. The first two approaches discussed in the previous section used the following data sets for training and testing: of 15000 observations, each having 156 features, 2/3 was set aside for training and the remaining 1/3 kept for testing/evaluation. Both

Figure 7. Accuracy of DroidMLN in Decimal %

the methods are compared in terms of accuracy in figure 6, represented by the 1st and 2nd bars of the chart respectively. The maximum accuracy, however, was obtained by the DroidMLN approach, in which we bypass training on the data with a synthesized assignment of weights based on our prior knowledge of the difference in feature usage, and all the rules are set as two-way implications, as in our second approach described in section 4.2. So we decided to do an exploratory analysis of DroidMLN. As DroidMLN has no training overhead and depends only on prior knowledge regarding the weight biases, all 15000 available data points could be used for testing,

which ensured more coverage with less time. For our analysis, in each test iteration we added 3000 new data points, over a total of 5 runs, covering all available data points cumulatively. To compare with the ground truth, we considered a probability score ≥ 0.5 to mean an app is detected as a true malware, and benign otherwise. The result is plotted in figure 7, which shows that even without any specific pre-ordering, and with monotonically increasing data size, the accuracy of DroidMLN fluctuates roughly between 83% and 84%. We computed the marginal probability score of being a malware from DroidMLN and, to investigate further, we plotted the deviation of this score from the true binary class label, in terms of the difference of scores, in figure 8. In this diagram, the data points residing on the X-axis denote the true positives and true negatives; data points at normalized negative and positive distances convey the false negatives and false positives respectively, and their distance from the X-axis denotes the degree of incorrectness. As most of the data points fall near the X-axis, this projects the fitness of DroidMLN empirically. This experiment was intended to show that for the majority of cases, for a malicious or benign API-call pattern, the DroidMLN approach could successfully emphasize the deviation of a false class label from the true one.

VI. RELATED WORK

Previously, MLN has been successfully used for Social Network Analysis [7] and Text Categorization for Information Retrieval [8], but to the best of our knowledge this work is the first of its kind in which the MLN framework has been deployed for malware detection for Android apps. Our main contribution is that, rather than developing a robust classifier, our focus was to measure to what extent a pattern of features is likely to result in a malware.

VII. CONCLUSIONS

With the help of prior knowledge about the pattern of features, in this work we explored a mechanism for quantifying the classification of Android malware rather than finding a discrete class label for the data. This work can be extended in future to fine-tune access control mechanisms to fight against malware. It can help users make a more educated judgement about which apps should be handled with suspicion. More restrictive policies and more constrained capabilities can be imposed where there are greater possibilities of damage. To mitigate risk we have to understand and quantify how threatening a pattern of features would be. This study shows that, with the deployment of MLN, that metric can be set quite accurately.

VIII. ACKNOWLEDGEMENT

We gratefully thank Dr. Wenliang Du for his helpful feedback and support. This research was partially funded by his NSF SaTC grant No. 1318814.

Figure 8. Fitness Measure

REFERENCES

[1] Y. Aafer, W. Du, and H. Yin, "DroidAPIMiner: Mining API-level features for robust malware detection in Android," in Proceedings of the 9th International Conference on Security and Privacy in Communication Networks (SecureComm), Sydney, Australia, September 25-27, 2013.
[2] P. Domingos, S. Kok, H. Poon, M. Richardson, and P. Singla, "Unifying logical and statistical AI," in Proceedings of the Twenty-First National Conference on Artificial Intelligence. AAAI Press, 2006, pp. 2-7.
[3] M. Richardson and P. Domingos, "Markov logic networks," Machine Learning, vol. 62, no. 1-2, pp. 107-136, Feb. 2006. [Online]. Available: http://dx.doi.org/10.1007/s10994-006-5833-1
[4] S. Kok and P. Domingos, "Learning the structure of Markov logic networks," in Proceedings of the 22nd International Conference on Machine Learning. ACM Press, 2005, pp. 441-448.
[5] F. Železný and N. Lavrač, Inductive Logic Programming: 18th International Conference, ILP 2008, Prague, Czech Republic, September 10-12, 2008, Proceedings, ser. Lecture Notes in Artificial Intelligence. Springer, 2008.
[6] D. Lowd and P. Domingos, "Efficient weight learning for Markov logic networks," in Proceedings of the Eleventh European Conference on Principles and Practice of Knowledge Discovery in Databases. Springer, 2007, pp. 200-211.
[7] P. Domingos, "Mining social networks for viral marketing," IEEE Intelligent Systems, vol. 20, no. 1, pp. 80-82, 2005. [Online]. Available: http://www.cs.washington.edu/homes/pedrod/papers/iis04.pdf
[8] H. Poon and P. Domingos, "Unsupervised ontological induction from text," in Proceedings of ACL, 2010.