2015 IEEE 14th International Conference on Machine Learning and Applications

Malware Detection in Android-based Mobile Environments using Optimum-Path Forest

Kelton A. P. da Costa, Luis A. da Silva, Guilherme B. Martins, Gustavo H. Rosa, Clayton R. Pereira, João P. Papa
Department of Computing, São Paulo State University, Bauru, SP - Brazil, 17033-360
Email: {kelton,gustavo.rosa,clayton,papa}@fc.unesp.br, [email protected], [email protected]

The experiments indicate the real-time execution is very effective and can stop the download process of mobile malware right before the user receives the entire modified file. Later on, the work described by Arora et al. [11] analyzed network traffic features in order to build a rule-based classifier for the detection of Android malware. However, such an approach applies only to the set of malware that connects to some remote server in the background, thus generating network traffic. The experimental results suggest the proposed approach is remarkably accurate, detecting more than 90% of the traffic samples.

Abstract—Nowadays, people use smartphones and tablets for the very same purposes as desktop computers: web browsing, social networking and home-banking, just to name a few. However, we often face the problem of keeping our information protected and trustworthy. As a result of their popularity and functionality, mobile devices are a growing target for malicious activities. In this context, mobile malware has gained significant ground since the emergence and growth of smartphones and handheld devices, becoming a real threat. In this paper, we introduce a recently developed pattern recognition technique called Optimum-Path Forest in the context of malware detection, and we present "DroidWare", a new public dataset to foster research on mobile malware detection. In addition, we also propose to use Restricted Boltzmann Machines for unsupervised feature learning in the context of malware identification.

Kruczkowski and Szynkiewicz [12] employed the well-known Support Vector Machines (SVMs) to detect malware, and Gavrilut et al. [13] presented a two-step approach for the same purpose using SVMs together with Perceptrons. In a recent work, Huda et al. [14] employed SVMs and a Maximum-Relevance-Minimum-Redundancy filter to detect such threats using features extracted from Application Program Interface (API) calls. Shabtai et al. [15] presented an approach to automatically identify malware based on network traffic patterns only. Roughly speaking, the idea was to use semi-supervised learning algorithms to recognize anomalous patterns that might correspond to the expected behaviour of malware in the network.

Keywords—Optimum-Path Forest, Restricted Boltzmann Machines, Malware Detection

I. INTRODUCTION

Over the past few years, people started using smartphones and tablets for the same purposes as desktop computers: web browsing, social networking, banking and others [1], [2], [3]. As a result of their popularity and functionality, smartphones are a growing target for malicious activities. In this context, mobile malware has gained significant ground since the emergence and growth of smartphones and handheld devices, thus becoming a real threat [4]. As such, many researchers have studied the safety of mobile devices using statistical methods, data mining and intelligent techniques that can somehow decrease the number of malware attacks on mobile devices [5], [6], [7], [8].

Some years ago, Papa et al. [16], [17] introduced a new pattern recognition technique called Optimum-Path Forest (OPF), which models pattern classification as a graph partition task, in which each dataset sample is encoded as a graph node and connected to others through an adjacency relation. The main idea is to rule a competition process among some key samples (prototypes) that try to conquer the remaining nodes, partitioning the graph into optimum-path trees, each one rooted at one prototype. OPF has obtained promising results in different research areas, being quite similar to Support Vector Machines (SVMs) with respect to recognition rates, but faster for training, since OPF is parameterless and has quadratic training complexity.

Penning et al. [9], for instance, proposed a cloud-based framework for mobile malware detection that requires collaboration among mobile subscribers, app stores and IT security professionals. The article describes a step-based framework to download applications, where each downloaded software is analyzed through malware database scanning (such a system is called a "Malware Detector"). Another interesting work was proposed by Guo and Sui [10], which concerns the analysis of network behaviour in order to foster defence systems against mobile malware. The system considers URL features, traffic statistics and files to detect malicious attacks by mobile malware. The system can track user identity information, evaluate the mobile malware threat, and alert about and prevent mobile malware events according to the security policy.

Therefore, the contributions of this paper are threefold: (i) to introduce OPF for malware detection in mobile environments, (ii) to make available a new dataset for Android-based malware identification systems called "DroidWare", and (iii) to introduce Restricted Boltzmann Machines (RBMs) for unsupervised feature learning in the context of malware detection.

The experimental section compares OPF against SVM with a Radial Basis Function (RBF) kernel in two distinct situations: with the proposed set of features extracted from DroidWare, and with a set of features learned by RBMs. The remainder of this paper is organized as follows. Sections II and III present the theoretical background regarding OPF and the methodology and experiments, respectively. Finally, Section IV states conclusions and future work.


II. OPTIMUM-PATH FOREST CLASSIFICATION

The OPF framework is a recent highlight in the development of pattern recognition techniques based on graph partitions. The nodes are the data samples, which are represented by their corresponding feature vectors, and are connected according to some predefined adjacency relation. Given some key nodes (prototypes), they compete among themselves to conquer the remaining nodes. Thus, the algorithm outputs an optimum-path forest, which is a collection of optimum-path trees (OPTs), each rooted at a prototype. This work employs the OPF classifier proposed by Papa et al. [18], [19], which uses a complete graph and a path-cost function that computes the maximum arc weight along a path. Additionally, the key nodes are the nearest samples from different classes, obtained by means of a minimum spanning tree (MST) computation over the training set. A more detailed explanation of the OPF mechanism follows.


Fig. 1. (a) Training set modeled as a complete graph, (b) a minimum spanning tree computation over the training set (prototypes are highlighted), (c) optimum-path forest over the training set, (d) classification process of a “green” sample, and (e) test sample is finally classified.

The arcs are weighted by the distances d between adjacent samples. The spanning tree is optimum in the sense that the sum of its arc weights is minimum when compared to any other spanning tree in the complete graph. In the MST, every pair of samples is connected by a single path, which is optimum according to fmax. Hence, the minimum spanning tree contains one optimum-path tree for any selected root node.

Let D = D1 ∪ D2 be a labeled dataset, such that D1 and D2 stand for the training and test sets, respectively. Let S ⊂ D1 be a set of prototypes of all classes (i.e., key samples that best represent the classes). Let (D1, A) be a complete graph whose nodes are the samples in D1, where any pair of samples defines an arc in A = D1 × D1, as displayed in Figure 1a. Additionally, let πs be a path in (D1, A) with terminus at sample s ∈ D1.

The optimum prototypes are the closest elements of the MST with different labels in D1 (i.e., elements that fall in the frontier of the classes, as displayed in Figure 1c). By removing the arcs between different classes, their adjacent samples become prototypes in S ∗ , and Algorithm 1 can define an optimum-path forest with minimum classification errors in D1 (Figure 1d). Algorithm 1 implements the above procedure for OPF training phase.

The OPF algorithm proposed by Papa et al. [18], [19] employs the path-cost function fmax due to its theoretical properties for estimating prototypes (Section II-A gives further details about this procedure):

    fmax(⟨s⟩) = 0 if s ∈ S, +∞ otherwise,
    fmax(πs · ⟨s, t⟩) = max{fmax(πs), d(s, t)},        (1)
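For illustration, the non-trivial case of Equation 1 can be sketched in a few lines (a hypothetical helper, not part of LibOPF; the Euclidean distance and the zero cost of a trivial path are assumptions of this toy):

```python
import math

def euclidean(a, b):
    # plain Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def f_max(path, dist=euclidean):
    # f_max of a path is the maximum arc weight along it;
    # a trivial single-node path costs 0 in this sketch
    return max((dist(s, t) for s, t in zip(path, path[1:])), default=0.0)

path = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
print(f_max(path))  # max of {5.0, 1.0} = 5.0
```

Note that the seed handling of Equation 1 (cost 0 for prototypes, +∞ otherwise) is applied by the training algorithm, not by this helper.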

Lines 1-4 initialize maps and insert prototypes in Q (the function λ(·) in Line 4 assigns the true label to each training sample).1 The main loop computes an optimum path from S to every sample s in a non-decreasing order of cost (Lines 5-13). At each iteration, a path of minimum cost C(s) is obtained in P when we remove its last node s from Q (Line 6). Ties are broken in Q using a first-in-first-out policy; that is, when two optimum paths reach an ambiguous sample s with the same minimum cost, s is assigned to the first path that reached it. Note that C(t) > C(s) in Line 8 is false when t has been removed from Q and, therefore, C(t) ≠ +∞ in Line 11 is true only when t ∈ Q. Lines 10-13 evaluate whether the path that reaches an adjacent node t through s is cheaper than the current path with terminus t, and update the position of t in Q, C(t), L(t) and P(t) accordingly. The complexity for training OPF is θ(|D1|²), due to the main loop (Lines 5-13) and inner loop (Lines 8-13) in Algorithm 1, which run θ(|D1|) times each.
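The competition process of Algorithm 1 can be sketched as follows (a simplified illustration, not the LibOPF implementation: the prototypes are assumed to be given, and a lazy-deletion heap replaces the explicit queue removal of Line 11):

```python
import heapq
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def opf_train(samples, labels, prototypes, dist=euclidean):
    """Competition process of Algorithm 1 on a complete graph.
    samples: feature vectors; labels: true labels;
    prototypes: indices of the seed nodes.
    Returns the cost map C, label map L and predecessor map P."""
    n = len(samples)
    C = [float('inf')] * n
    L = [None] * n
    P = [None] * n
    Q = []
    for s in prototypes:                 # Lines 3-4: seeds cost 0
        C[s], L[s] = 0.0, labels[s]
        heapq.heappush(Q, (0.0, s))
    while Q:                             # Lines 5-13
        cost, s = heapq.heappop(Q)
        if cost > C[s]:
            continue                     # stale entry (lazy deletion)
        for t in range(n):
            if t == s or C[t] <= C[s]:   # Line 8: only C(t) > C(s)
                continue
            cst = max(C[s], dist(samples[s], samples[t]))  # Line 9
            if cst < C[t]:               # Lines 10-13: conquer t
                C[t], L[t], P[t] = cst, L[s], s
                heapq.heappush(Q, (cst, t))
    return C, L, P

# toy usage: two 1-D clusters, seeds at the class frontier
C, L, P = opf_train([(0.0,), (1.0,), (10.0,), (11.0,)],
                    [0, 0, 1, 1], prototypes=[1, 2])
print(L)  # [0, 0, 1, 1]
```

The lazy-deletion trick keeps the sketch short; an indexed priority queue with decrease-key would mirror Lines 11-13 literally.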

where d(s, t) stands for the distance between nodes s and t, such that s, t ∈ D1. Therefore, fmax(πs) computes the maximum distance between adjacent samples in πs, when πs is not a trivial path. In short, the OPF algorithm tries to minimize fmax(πt), ∀t ∈ D1.

A. Training

We say that S∗ is an optimum set of prototypes when Algorithm 1 minimizes the classification errors for every s ∈ D1. S∗ can be found by exploiting the theoretical relation between the minimum spanning tree and the optimum-path tree for fmax [20]. The training essentially consists of finding S∗ and an OPF classifier rooted at S∗. By computing an MST in the complete graph (D1, A) (Figure 1b), we obtain a connected acyclic graph whose nodes are all samples of D1 and whose arcs are undirected and weighted by the distances d between adjacent samples.

1 The cost map C stores the optimum cost of each training sample.

Although the complexity of the OPF classification phase is given by θ(|D1| |D2|) (for each test node we have to visit all training samples), Papa et al. [19] showed that, in practice, the complexity is given by O(p |D2|), in which p ∈ O(|D1|).

Algorithm 1 – OPF TRAINING ALGORITHM

INPUT: A labeled training set D1.
OUTPUT: Optimum-path forest P, cost map C, label map L, and ordered set D1.
AUXILIARY: Priority queue Q, set S of prototypes, and cost variable cst.

1. Set D1 ← ∅ and compute by MST the prototype set S ⊂ D1.
2. For each s ∈ D1\S, set C(s) ← +∞.
3. For each s ∈ S, do
4.     C(s) ← 0, P(s) ← nil, L(s) ← λ(s), and insert s in Q.
5. While Q is not empty, do
6.     Remove from Q a sample s such that C(s) is minimum.
7.     Insert s in D1.
8.     For each t ∈ D1 such that C(t) > C(s), do
9.         Compute cst ← max{C(s), d(s, t)}.
10.        If cst < C(t), then
11.            If C(t) ≠ +∞, then remove t from Q.
12.            P(t) ← s, L(t) ← L(s), C(t) ← cst.
13.            Insert t in Q.
14. Return a classifier [P, C, L, D1].

III. EXPERIMENTAL EVALUATION

In this section, we describe the methodology applied to assess the robustness of the OPF classifier in detecting mobile malware, as well as the dataset employed for this task and how features are learned using RBMs.

A. Dataset

In this work, we designed the DroidWare2 dataset. We manually collected 278 benign application samples and 121 malware samples from the Android OS. Malware samples were obtained and analyzed with the VirusTotal3 web site, an on-line scanning tool for malware detection. The non-malware data were collected from application manifests on Google Play4. The dataset features comprise 152 permissions from the Android platform, which indicate the resources and device actions an application can access. A more detailed explanation of the dataset can be found on its home page.
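For illustration, permission features of this kind can be encoded as binary vectors; the snippet below is a hypothetical toy with four standard Android permission names, not the dataset's actual 152-entry list:

```python
# Hypothetical encoding of an app as a binary permission vector.
PERMISSIONS = [  # DroidWare uses 152 such entries; four shown here
    "android.permission.INTERNET",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
    "android.permission.ACCESS_FINE_LOCATION",
]

def to_feature_vector(requested, universe=PERMISSIONS):
    # 1 if the app's manifest requests the permission, else 0
    wanted = set(requested)
    return [1 if p in wanted else 0 for p in universe]

app = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
print(to_feature_vector(app))  # [1, 0, 1, 0]
```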

B. Classification

For any sample t ∈ D2, we consider all arcs connecting t with samples s ∈ D1, as though t were part of the training graph (Figure 1d). Considering all possible paths from S∗ to t, we find the optimum path P∗(t) from S∗ and label t with the class λ(R(t)) of its most strongly connected prototype R(t) ∈ S∗. This path can be identified incrementally, by evaluating the optimum cost C(t) as:

    C(t) = min{max{C(s), d(s, t)}}, ∀s ∈ D1.        (2)

B. SVMs Parameter Setting-up

In regard to the SVM implementation, we employed LibSVM [21] with a radial basis function kernel, with parameters optimized through 5-fold cross-validation over C ∈ {2^-5, 2^-3, ..., 2^13, 2^15} and γ ∈ {2^3, 2^1, ..., 2^-13, 2^-15}. With respect to OPF, we used the LibOPF library [22].
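The grids above can be generated programmatically; the sketch below builds them and runs an exhaustive search against a caller-supplied evaluation function (a toy stand-in for the 5-fold cross-validation score, not the LibSVM driver itself):

```python
from itertools import product

# C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^3, 2^1, ..., 2^-15
# (odd exponents, as in the text above)
C_grid = [2.0 ** k for k in range(-5, 16, 2)]       # 11 values
gamma_grid = [2.0 ** k for k in range(3, -16, -2)]  # 10 values

def grid_search(evaluate):
    # evaluate(C, gamma) -> validation score; pick the best pair
    return max(product(C_grid, gamma_grid),
               key=lambda cg: evaluate(*cg))

# toy evaluator just to exercise the loop: peaks at C=2^3, gamma=2^-3
best = grid_search(lambda C, g: -abs(C - 8.0) - abs(g - 0.125))
print(best)  # (8.0, 0.125)
```

In practice `evaluate` would train an RBF-kernel SVM on each fold and return the mean validation accuracy.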


Let the node s∗ ∈ D1 be the one that satisfies Equation 2 (i.e., the predecessor P(t) in the optimum path P∗(t)). Given that L(s∗) = λ(R(t)), the classification simply assigns L(s∗) as the class of t (Figure 1e). An error occurs when L(s∗) ≠ λ(t). Algorithm 2 implements this idea.

Algorithm 2 – OPF CLASSIFICATION ALGORITHM

INPUT: Classifier [P, C, L, D1] and test set D2.
OUTPUT: Label map Lo and predecessor map Po defined for D2.
AUXILIARY: Cost variables tmp and mincost.

1. For each t ∈ D2, do
2.     i ← 1, mincost ← max{C(ki), d(ki, t)}.
3.     Lo(t) ← L(ki) and Po(t) ← ki.
4.     While i < |D1| and mincost > C(ki+1), do
5.         Compute tmp ← max{C(ki+1), d(ki+1, t)}.
6.         If tmp < mincost, then
7.             mincost ← tmp.
8.             Lo(t) ← L(ki+1) and Po(t) ← ki+1.
9.         i ← i + 1.
10. Return [Lo, Po].

C. RBMs Parameter Setting-up

Restricted Boltzmann Machines have been widely used for unsupervised feature learning in recent years [23]. Roughly speaking, RBMs are stochastic neural networks that aim at predicting some latent variables, also known as hidden neurons. Given some data, the idea is to reconstruct them based on predictions over the hidden layer. The reconstruction process is usually carried out by several samplings in a Markov chain, with the last sampling being the reconstructed data. This process also modifies the connection weights between input and hidden neurons; after the learning process, we can then use a test set to predict the latent variables, which are used as features for that test set.

However, one of the main problems related to RBMs concerns their fine-tuning, since such techniques are very sensitive to parameter selection. Papa et al. [24], [25], [26] were among the first to employ meta-heuristic techniques for this purpose, obtaining promising results. Basically, the idea is to model the fine-tuning of parameters as an optimization task, in which the mean squared error of the reconstruction step over the training data is used as the fitness function.
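As an illustration of the reconstruction-driven learning described in Section III-C, the sketch below implements one-step contrastive divergence (CD-1) for a tiny Bernoulli RBM, using mean-field activations instead of stochastic sampling and omitting bias terms for brevity (a didactic toy, not the authors' implementation):

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Tiny Bernoulli RBM trained with one-step contrastive
    divergence (CD-1); biases omitted, mean-field activations."""
    def __init__(self, n_vis, n_hid, eta=0.1):
        self.W = [[random.gauss(0, 0.01) for _ in range(n_hid)]
                  for _ in range(n_vis)]
        self.eta = eta                           # learning rate

    def hidden_probs(self, v):
        return [sigmoid(sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.W[0]))]

    def visible_probs(self, h):
        return [sigmoid(sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.W))]

    def cd1(self, v0):
        h0 = self.hidden_probs(v0)   # positive phase: infer hidden units
        v1 = self.visible_probs(h0)  # negative phase: reconstruct visible
        h1 = self.hidden_probs(v1)   # ... and re-infer hidden
        # weight update: positive minus negative associations
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += self.eta * (v0[i] * h0[j] - v1[i] * h1[j])
        # squared reconstruction error, the fitness used for fine-tuning
        return sum((a - b) ** 2 for a, b in zip(v0, v1))

rbm = RBM(n_vis=4, n_hid=2)
v = [1.0, 0.0, 1.0, 0.0]             # a toy binary "permission vector"
errors = [rbm.cd1(v) for _ in range(200)]
```

Over repeated CD-1 steps on the same input the reconstruction error shrinks, which is exactly the quantity the meta-heuristics above minimize.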

In Algorithm 2, the main loop (Lines 1-9) performs the classification of all nodes in D2. The inner loop (Lines 4-9) visits each node ki+1 ∈ D1, i = 1, 2, ..., |D1| − 1, until an optimum path πki+1 · ⟨ki+1, t⟩ is found. In the worst case, the algorithm visits all nodes in D1. Line 5 evaluates fmax(πki+1 · ⟨ki+1, t⟩), and Lines 7-8 update the cost, label and predecessor of t whenever πki+1 · ⟨ki+1, t⟩ is better than the current path πt (Line 6).
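A minimal sketch of Algorithm 2 for a single test sample (illustrative only; `order` is assumed to hold the training indices sorted by non-decreasing cost, which is what enables the early stop of Line 4):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def opf_classify(t, train_samples, C, L, order, dist=euclidean):
    """Algorithm 2 for one test sample t.
    C, L: cost and label maps from training;
    order: training indices sorted by non-decreasing C."""
    k = order[0]
    mincost = max(C[k], dist(train_samples[k], t))   # Line 2
    label, pred = L[k], k                            # Line 3
    for k in order[1:]:                              # Lines 4-9
        if mincost <= C[k]:
            break            # no cheaper path can exist: stop early
        tmp = max(C[k], dist(train_samples[k], t))   # Line 5
        if tmp < mincost:                            # Lines 6-8
            mincost, label, pred = tmp, L[k], k
    return label, pred

# toy usage: 1-D training set with costs/labels from a training phase
train = [(0.0,), (1.0,), (10.0,), (11.0,)]
C, L = [1.0, 0.0, 0.0, 1.0], [0, 0, 1, 1]
order = [1, 2, 0, 3]                     # indices sorted by C
print(opf_classify((9.0,), train, C, L, order))  # (1, 2)
```

The early break is what yields the practical O(p |D2|) behaviour reported by Papa et al. [19].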

2 https://github.com/RECOVI/DroidWare.git
3 http://www.virustotal.com/
4 https://play.google.com/store/


D. Experiments

In this paper, we employed three approaches based on the well-known Harmony Search [27] technique to fine-tune RBMs:

In this section, we describe the experiments conducted to assess the robustness of the proposed approach. As aforesaid, we first used all features of the DroidWare dataset, hereinafter called the ORIGINAL dataset. After that, we conducted experiments to learn a more discriminative set of features by means of RBMs optimized with HS (RBM-HS), IHS (RBM-IHS) and PSF-HS (RBM-PSF-HS). Based on the ORIGINAL dataset, we then created 10 randomly generated training and testing sets for the further application of the features extracted/learned by RBMs. Later on, the extracted/learned features are used to feed the OPF and SVM classifiers, thus performing a traditional pattern recognition pipeline, i.e., training, testing and accuracy computation.

• Naive Harmony Search (HS): the original version of the algorithm, which models function minimization on the way musicians create songs with optimum harmonies. The technique employs two parameters: the Harmony Memory Considering Rate (HMCR), in charge of creating new solutions based on the previous experience of the music player, and the Pitch Adjusting Rate (PAR), responsible for applying a small disturbance to the solution created with HMCR in order to escape local optima.



• Improved Harmony Search (IHS) [28]: this approach uses dynamic values for both the HMCR and PAR variables, which are updated at each iteration with new values that fall within the ranges [HMCRmin, HMCRmax] and [PARmin, PARmax], respectively. Since PAR is computed using a bandwidth variable ϵ, IHS also requires this variable to be bounded within the range [ϵmin, ϵmax].



• Parameter Setting-Free Harmony Search (PSF-HS) [29]: this approach was proposed to avoid the parameter fine-tuning step for the HMCR, PAR and ϵ variables. Roughly speaking, the idea is to obtain new HMCR and PAR values based on previous computations of these variables whenever a new solution is created using them. However, an initial estimate of these variables is required by the algorithm.
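The naive HS variant can be sketched as follows (a toy continuous minimization; HMCR, PAR and the bandwidth default to the values of Table I, and the quadratic fitness here is a stand-in for the RBM reconstruction error):

```python
import random

random.seed(42)

def harmony_search(fitness, bounds, hms=10, hmcr=0.7, par=0.7,
                   bw=0.1, iterations=200):
    """Naive Harmony Search for function minimization.
    bounds: list of (low, high) per decision variable."""
    # initialize the harmony memory with random solutions
    memory = [[random.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(hms)]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if random.random() < hmcr:
                # memory consideration: reuse a stored value
                x = random.choice(memory)[d]
                if random.random() < par:
                    # pitch adjustment: small disturbance of width bw
                    x += random.uniform(-bw, bw)
            else:
                x = random.uniform(lo, hi)
            new.append(min(max(x, lo), hi))      # keep inside bounds
        # replace the worst harmony if the new one is better
        worst = max(memory, key=fitness)
        if fitness(new) < fitness(worst):
            memory[memory.index(worst)] = new
    return min(memory, key=fitness)

# toy usage: minimize a quadratic with optimum at (0.5, 0.2)
best = harmony_search(lambda v: (v[0] - 0.5) ** 2 + (v[1] - 0.2) ** 2,
                      [(0.0, 1.0), (0.0, 1.0)])
```

For RBM fine-tuning, each harmony would hold (n, η, α, λ) within the ranges given below and `fitness` would run a short RBM training and return the reconstruction error.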

Table II shows the results for the OPF classifier, the accuracy measure being the one proposed by Papa et al. [18], which accounts for unbalanced datasets. Clearly, better recognition accuracies were obtained with the ORIGINAL dataset, i.e., using the features proposed when we designed the dataset. Although the RBMs did not achieve interesting results overall, we must highlight the RBM-HS accuracy in round #3, which outperformed all techniques. That is an interesting indicator of the capabilities of such models, since a more detailed analysis of them may lead to better results. We are currently conducting experiments with Deep Belief Networks (DBNs) for the same purpose as in this paper. Such models can be seen as a sort of stacked RBMs, often used to increase their power of data description.
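Our reading of that accuracy measure [18] can be sketched as follows (per-class false-positive and false-negative rates averaged over the 2c error terms; assumes at least two classes; see [18] for the formal definition):

```python
def opf_accuracy(y_true, y_pred):
    """Balanced accuracy for unbalanced datasets, following our
    reading of [18]: each class contributes a false-positive rate
    and a false-negative rate; assumes at least two classes."""
    classes = sorted(set(y_true))
    n = len(y_true)
    E = 0.0
    for c in classes:
        n_c = sum(1 for y in y_true if y == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        E += fp / (n - n_c) + fn / n_c
    return 1.0 - E / (2 * len(classes))

# a skewed toy set: misclassifying the single minority sample
# costs half the score, unlike plain accuracy
print(opf_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```

Under plain accuracy the same prediction would score 0.75, which is why this measure is preferred for unbalanced data such as DroidWare (278 benign vs. 121 malware samples).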

An RBM has four main parameters to be fine-tuned: the number of hidden neurons n, the learning rate η, the momentum α and the weight decay λ [24], [25], [30], [26]. In this work, we bounded the search for these parameters to the following ranges: n ∈ [5, 100], η ∈ [0.1, 0.9], α ∈ [0.1, 0.9] and λ ∈ [10^-5, 10^-2]. Finally, we employed 10 agents over 50 iterations for all aforementioned techniques. Table I presents the parameter configuration for each optimization technique5. We also employed T = 100 epochs for the BRBM weight-learning procedure. For PSF-HS, we start with HMCR = PAR = 0.7 and ϵ = 0.1. Notice we used a hidden layer with 8 neurons, which means the RBM learns 8 features for malware description.

TABLE II. RESULTS CONSIDERING OPF CLASSIFIER.

Rounds | ORIGINAL | RBM-HS | RBM-IHS | RBM-PSF-HS
1      | 72.42    | 50.00  | 50.00   | 50.00
2      | 61.25    | 22.00  | 22.00   | 44.00
3      | 73.25    | 78.00  | 70.00   | 68.00
4      | 80.77    | 66.15  | 80.65   | 50.00
5      | 61.86    | 32.00  | 32.00   | 63.00
6      | 72.35    | 69.00  | 65.00   | 57.00
7      | 68.35    | 50.00  | 50.00   | 50.00
8      | 77.30    | 74.36  | 50.00   | 70.86
9      | 72.42    | 65.00  | 54.00   | 61.00
10     | 73.82    | 39.08  | 67.93   | 51.45
Avg.   | 72.42    | 57.50  | 52.00   | 54.22

Table III presents the average value of each RBM parameter obtained through the meta-heuristic-based fine-tuning procedure. We can observe that the number of hidden units n (the number of final features) is quite close to 9 for all techniques, meaning that the RBM used only 9 features to describe the dataset, while the ORIGINAL dataset was designed with 152 attributes.

TABLE I. PARAMETER CONFIGURATION.

Technique | Parameters
HS        | HMCR = 0.7, PAR = 0.7, ϵ = 0.1
IHS       | HMCR = 0.7, PARmin = 0.1, PARmax = 1.0, ϵmin = 0.1, ϵmax = 0.5

TABLE III. AVERAGE RBM PARAMETER VALUES.

Technique  | n    | η    | α    | λ
RBM-HS     | 9.26 | 0.42 | 0.84 | 0.007
RBM-IHS    | 9.17 | 0.64 | 0.85 | 0.005
RBM-PSF-HS | 8.80 | 0.54 | 0.78 | 0.005

In order to provide a more robust analysis of the results, we conducted cross-validation with 10 runs for all datasets, with 75% of each dataset used for training and the remaining 25% for testing.


Table IV presents the results considering the SVM classifier. Once again, the recognition rates using the ORIGINAL dataset outperformed the features learned through RBMs. Since SVMs employ a higher-dimensional feature vector, it seems this number of features led to very good recognition rates.

5 Notice these values have been empirically set up.



Notice that RBM-based features achieved the very same recognition rate as SVM in one iteration only (iteration #2).

TABLE IV. RESULTS CONSIDERING SVM CLASSIFIER.

Rounds | ORIGINAL | RBM-HS | RBM-IHS | RBM-PSF-HS
1      | 79.00    | 65.00  | 65.00   | 65.00
2      | 78.00    | 78.00  | 78.00   | 78.00
3      | 94.00    | 78.00  | 78.00   | 78.00
4      | 92.00    | 65.00  | 65.00   | 65.00
5      | 77.00    | 68.00  | 68.00   | 68.00
6      | 77.00    | 69.00  | 69.00   | 69.00
7      | 89.00    | 69.00  | 69.00   | 69.00
8      | 90.00    | 68.00  | 68.00   | 68.00
9      | 79.00    | 65.00  | 65.00   | 65.00
10     | 83.00    | 69.00  | 69.00   | 69.00
Avg.   | 81.00    | 68.50  | 68.50   | 68.50

In this paper, we aimed at contributing to the related literature in three points: (i) we introduced the Optimum-Path Forest classifier in the context of malware detection in mobile environments, (ii) we made available a new public dataset oriented to Android systems for the problem of malware detection, and (iii) we evaluated the robustness of Restricted Boltzmann Machines in the task of unsupervised feature learning. In order to fine-tune RBMs, we employed three techniques based on the well-known Harmony Search. Last but not least, we evaluated two techniques for the malware detection task: OPF and SVMs.

We have observed that SVMs obtained very accurate recognition rates, but at the price of a much higher computational load for training, which might be prohibitive for online learning systems. In addition, RBMs did not seem very effective for unsupervised feature learning. We believe this behavior might be due to the small number of features used in the final datasets, i.e., 9 in this work, whereas the ORIGINAL dataset is composed of 152 features6. We are currently working on new experiments with a larger interval for the number of hidden units.

Tables V and VI present the mean computational load in seconds for the training and test steps of both classifiers. Although SVMs achieved better recognition rates, their computational cost (considering the parameter fine-tuning step) was about 2,000 times higher than OPF's for training, and around 10 times higher for classification. Therefore, if one needs an online learning classifier, i.e., a technique that can learn new threats more quickly, OPF should be chosen, while SVM provides more accurate recognition rates at the price of a higher computational load.

TABLE V. OPF COMPUTATIONAL LOAD [S].

Rounds | Training  | Testing
1      | 0.016306  | 0.002889
2      | 0.017913  | 0.003588
3      | 0.014736  | 0.002446
4      | 0.014844  | 0.002614
5      | 0.017134  | 0.002812
6      | 0.017204  | 0.002502
7      | 0.017018  | 0.004162
8      | 0.022691  | 0.003129
9      | 0.014774  | 0.003282
10     | 0.041705  | 0.004995
Avg.   | 0.019431  | 0.003241

ACKNOWLEDGMENT

The authors would like to thank Capes, FAPESP grants #2015/00801-9, #2014/09125-3 and #2014/16250-9, CAPES PROCAD #2966/2014, as well as CNPq grants #470571/2013-6 and #306166/2014-3.

REFERENCES


COMPUTATIONAL LOAD [ S ].

TABLE VI. SVM COMPUTATIONAL LOAD [S].

Rounds | Training   | Testing
1      | 35.751026  | 0.043289
2      | 39.29097   | 0.038342
3      | 35.568604  | 0.040312
4      | 39.976357  | 0.044606
5      | 37.905128  | 0.036398
6      | 37.285213  | 0.035626
7      | 38.57896   | 0.035832
8      | 39.604111  | 0.037202
9      | 37.233833  | 0.033729
10     | 38.970848  | 0.036754
Avg.   | 38.01650   | 0.0381884

IV. CONCLUSIONS

Network attacks on mobile devices have been of great concern in recent years. Since the number of malware samples oriented to mobile devices has increased considerably, special attention has been devoted to machine learning-based techniques that can handle such threats in a more effective manner. Such systems are able to learn and detect new attacks, but not always with reasonable efficiency.


[1] S. Liang and X. Du, "Permission-combination-based scheme for android mobile malware detection," in IEEE International Conference on Communications, June 2014, pp. 2301–2306.
[2] J. Kwon, J. Jeong, J. Lee, and H. Lee, "Droidgraph: discovering android malware by analyzing semantic behavior," in IEEE Conference on Communications and Network Security, 2014, pp. 498–499.
[3] S. Yerima, S. Sezer, and G. McWilliams, "Analysis of bayesian classification-based approaches for android malware detection," IET Information Security, vol. 8, no. 1, pp. 25–36, Jan 2014.
[4] H. Kang, J. W. Jang, A. Mohaisen, and H. K. Kim, "Detecting and classifying Android malware using static analysis along with creator information," International Journal of Distributed Sensor Networks, 2015.
[5] J. Hamada, "New android threat gives phone a root canal," 2011, available at http://www.symantec.com/connect/blogs/new-android-threat-gives-phone-root-canal.
[6] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner, "A survey of mobile malware in the wild," in Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, ser. SPSM '11. New York, NY, USA: ACM, 2011, pp. 3–14.
[7] A. Skovoroda and D. Gamayunov, "Review of the mobile malware detection approaches," in 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2015, pp. 600–603.
[8] J. Haifeng, C. Baojiang, and W. Jianxin, "Mining mobile internet packets for malware detection," in Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2014, pp. 481–486.
[9] N. Penning, M. Hoffman, J. Nikolai, and Y. Wang, "Mobile malware security challenges and cloud-based detection," in International Conference on Collaboration Technologies and Systems, 2014, pp. 181–188.

6 Notice the "optimal" number 9 was found using HS-based optimization.

[10] D. F. Guo, A. Sui, and T. Guo, "A behavior analysis based mobile malware defense system," in International Conference on Signal Processing and Communication Systems, 2012, pp. 1–6.
[11] A. Arora, S. Garg, and S. Peddoju, "Malware detection using network traffic analysis in android based mobile devices," in Eighth International Conference on Next Generation Mobile Apps, Services and Technologies (NGMAST), Sept 2014, pp. 66–71.
[12] M. Kruczkowski and E. N. Szynkiewicz, "Support vector machine for malware analysis and classification," in IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, vol. 2, 2014, pp. 415–420.
[13] G. D., C. M., A. D., and C. L., "Malware detection using perceptrons and support vector machines," in Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns, 2009, pp. 283–288.
[14] S. Huda, J. Abawajy, M. Alazab, M. Abdollalihian, R. Islam, and J. Yearwood, "Hybrids of support vector machine wrapper and filter based framework for malware detection," Future Generation Computer Systems, 2014.
[15] A. Shabtai, L. Tenenboim-Chekina, D. Mimran, L. Rokach, B. Shapira, and Y. Elovici, "Mobile malware detection through analysis of deviations in application network behavior," Computers & Security, vol. 43, pp. 1–18, 2014.
[16] J. P. Papa, A. X. Falcão, and C. T. N. Suzuki, "Supervised pattern classification based on optimum-path forest," International Journal of Imaging Systems and Technology, vol. 19, pp. 120–131, 2009.
[17] J. P. Papa, A. X. Falcão, V. H. C. Albuquerque, and J. M. R. S. Tavares, "Efficient supervised optimum-path forest classification for large datasets," Pattern Recognition, vol. 45, no. 1, pp. 512–520, 2012.
[18] J. P. Papa, A. X. Falcão, and C. T. N. Suzuki, "Supervised pattern classification based on optimum-path forest," International Journal of Imaging Systems and Technology, vol. 19, no. 2, pp. 120–131, 2009.
[19] J. P. Papa, A. X. Falcão, V. H. C. Albuquerque, and J. M. R. S. Tavares, "Efficient supervised optimum-path forest classification for large datasets," Pattern Recognition, vol. 45, no. 1, pp. 512–520, 2012.
[20] C. Allène, J.-Y. Audibert, M. Couprie, and R. Keriven, "Some links between extremum spanning forests, watersheds and min-cuts," Image and Vision Computing, vol. 28, no. 10, pp. 1460–1471, 2010.
[21] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[22] J. P. Papa, C. T. N. Suzuki, and A. X. Falcão, LibOPF: A library for the design of optimum-path forest classifiers, 2014, software version 2.1 available at http://www.ic.unicamp.br/~afalcao/LibOPF.
[23] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.
[24] J. P. Papa, G. H. Rosa, K. A. P. Costa, A. N. Marana, W. Scheirer, and D. D. Cox, "On the model selection of bernoulli restricted boltzmann machines through harmony search," in Proceedings of the Genetic and Evolutionary Computation Conference, 2015, pp. 1449–1450.
[25] J. P. Papa, G. H. Rosa, A. N. Marana, W. Scheirer, and D. D. Cox, "Model selection for discriminative restricted boltzmann machines through meta-heuristic techniques," Journal of Computational Science, vol. 9, pp. 14–18, 2015.
[26] J. P. Papa, W. Scheirer, and D. D. Cox, "Fine-tuning deep belief networks using harmony search," Applied Soft Computing, 2015, (accepted for publication).
[27] Z. W. Geem, Music-Inspired Harmony Search Algorithm: Theory and Applications, 1st ed. Springer Publishing Company, Incorporated, 2009.
[28] M. Mahdavi, M. Fesanghary, and E. Damangir, "An improved harmony search algorithm for solving optimization problems," Applied Mathematics and Computation, vol. 188, no. 2, pp. 1567–1579, 2007.
[29] Z. W. Geem and K.-B. Sim, "Parameter-setting-free harmony search algorithm," Applied Mathematics and Computation, vol. 217, no. 8, pp. 3881–3889, 2010.
[30] L. A. Silva, P. B. Ribeiro, G. H. Rosa, K. A. P. Costa, and J. P. Papa, "Parameter setting-free harmony search optimization of restricted boltzmann machines and its applications to spam detection," in 12th International Conference on Applied Computing, 2015, (accepted for publication).
