Malware Survival and Detection on the Android ...

Malware Survival and Detection on the Android Operating System
Dustin Jones
IASM8302
Fall 2015
Our Lady of the Lake University

Abstract

Android is the most prevalent mobile operating system today. As such, it is also the most common platform for malware targeting mobile devices such as phones and tablets. Today’s mobile devices are subject to many of the same attacks as their desktop and server cousins, but they do not possess enough processing power and memory to run complex malware detection algorithms on-device. Malware developers are producing increasingly complex samples with “stealth” characteristics to ensure survival and propagation. In addition to these constantly evolving threats, updates to the Android OS are simply not pushed out fast enough to patch known vulnerabilities. The Android Open Source Project maintains the source code and releases patches. These are released to device manufacturers, then passed from the OEMs to the carriers, and finally pushed through over-the-air updates to the end user. This chain takes months and leaves many thousands of devices exposed to known vulnerabilities (Faruki et al, 2015). Novel approaches must be found to develop reliable solutions for detecting and classifying malware that targets the Android OS, while keeping in mind the limited resources available to mobile devices. Research on this subject is ongoing with many contributors, and this paper serves as a literature review of some of the published works of those researchers.

Keywords: Android, Security, Malware, Static/ Dynamic/ Hybrid Analysis, Machine Learning

Introduction

Android has overtaken all other mobile operating systems and become the undisputed leader in market share since its debut in 2008. In 2012, market share for Android was 66%, and by the close of 2013 it had reached 78% (Faruki et al, 2015). By the end of the second quarter of 2014, Android held about 85% (Lindorfer et al, 2015). It is therefore no wonder that 97% of malware families targeting a mobile platform in 2013 were developed specifically for Android (Lindorfer et al, 2014). While the actual number of malware samples targeting Android varies depending on who is reporting and how many samples they catch, Sophos Security reported 650 thousand unique samples in 2014 (Lindorfer et al, 2014). According to Faruki et al, documented malware increased from three families containing 100 samples in 2010 to more than one hundred families containing somewhere between 120 thousand and 600 thousand samples in 2013 (Faruki et al, 2015).

In response to the volume of malware appearing on the official Android market (also known as the Play store), Google deployed a dynamic analysis engine known as Bouncer in 2012 to detect malware in apps uploaded to Play. Bouncer reportedly reduced malware uploaded to the Play store by about 40% (Lindorfer et al, 2014), and some research has suggested or at least implied that apps downloaded from the Play store can now be considered relatively safe. Most researchers agree, however, that stealthy malware can still evade Bouncer and that it is unwise to consider any app safe without additional analysis. Malicious apps can, however, be remotely uninstalled if they are discovered after being downloaded to a device. Additionally, Google bans developers from uploading to the official Play store if they are caught uploading a malicious app even once.

Perhaps because of the countermeasures put in place by Google, the majority of malicious apps now come from third-party markets (Faruki et al, 2015). There are a number of ways for malware to propagate among devices, but the most common continues to be user installation (Wang et al, 2015), presumably from third-party markets. One of the key reasons centers on the primary principle behind Android security, the permissions system. Users tend to grant permissions blindly, without knowing what those permissions can be used for (Wang et al, 2015). Compounding the issue, permissions in Android are coarse-grained and broadly defined (Faruki et al, 2015). For example, a flashlight app may only need access to the LED attached to the camera to perform its advertised function, but it must request full camera permission because the camera permission is so broadly defined. This violates the security principle of least privilege and enables many applications to do more than they advertise. In this example, if the flashlight app also held internet permission, perhaps because of an ad delivery platform, it could in theory record video or take pictures and send them over the internet without the user’s knowledge. Indeed, someone who needed a flashlight app in a pinch would likely not even read the permissions before clicking “accept” and installing the app.

Even when a device has an antivirus program installed, there is no guarantee that it will prevent all malware. Because of the limited resources of mobile devices, most commercial antivirus software for Android relies on a lightweight static method that matches syntactic signatures of apps against a database of known malware samples. This approach requires frequent updates and is completely ineffective against zero-day malware. It also fails to detect all but the most rudimentary transformations that malware developers use to disguise malicious functionality (Rastogi et al, 2013; Feng et al, 2014; Wang et al, 2015). For these reasons, a more powerful, robust, lightweight, and easily distributed model for malware detection is needed.
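The flashlight example can be made concrete with a small sketch. It compares the permissions an app requests against the permissions its advertised purpose plausibly needs. The `EXPECTED` table and the function name are hypothetical and exist only for illustration; only the two permission strings are real Android permission names.

```python
# Hypothetical table: advertised app category -> permissions it plausibly needs.
# The entries are illustrative assumptions, not an authoritative policy.
EXPECTED = {
    "flashlight": {"android.permission.CAMERA"},
}


def excess_permissions(category, requested):
    """Return the requested permissions that exceed what the category needs."""
    return set(requested) - EXPECTED.get(category, set())


# A flashlight app that also asks for network access (e.g. for an ad library).
requested = {"android.permission.CAMERA", "android.permission.INTERNET"}
extra = excess_permissions("flashlight", requested)
print(sorted(extra))  # ['android.permission.INTERNET']
```

Even this check cannot tell whether the camera permission is used only for the LED, which is exactly the coarse-granularity problem described above.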

Android Kernel and Security Overview

The Android kernel is built with security in mind, but it relies on users and developers to implement its security model correctly. It requires developers to use secure coding practices and users to read and understand the permissions framework. Unfortunately, as previously stated, most users assign permissions blindly or are simply not well informed about how permissions work in Android. Relying solely on the Android permissions system to secure a device has proven unreliable at best and downright dangerous at worst.

A core security concept for Android is inter-component communication, or ICC. In this model, middleware controls all interaction between components and apps using permissions. These components include Activity (the user interface of an app), Service (background tasks without a user interface), Broadcast Receiver (listens for system-generated events), and Content Provider (a data store resource such as a file). The permissions are declared in the AndroidManifest.xml file of an app (Faruki et al, 2015).

A related feature of the Android framework is Discretionary Access Control, or DAC. In this system, every app is assigned a user ID (UID) and runs in a sandbox. Apps signed with the same developer’s PKI certificate can communicate and share private files with each other using the ICC system (Faruki et al, 2015).

Ironically, some of the features designed to make Android secure appear to work to its detriment. For instance, downloaded apps are not granted root access. This means that downloaded apps not signed by the same developer may not interfere with each other. Unfortunately, it also means that antivirus apps are incapable of quarantining or removing malicious applications. They also lack the permissions required to run deep scans on Android (Faruki et al, 2015).

DAC and ICC can be exploited by malware if a developer’s private signing key is compromised, because all apps with the same UID are assigned the same permissions. Several apps may appear benign on their own, but if they are installed together on the same device, one of them can gain the ability to steal information, turn the device into a bot, or subscribe to premium-rate services by inheriting the permissions of other apps with the same UID. This is one of many ways that an app can achieve privilege escalation.
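The shared-UID escalation path can be sketched as a toy model. It treats the effective permission set of a UID as the union of the permissions of every app running under it, then flags UIDs whose combined set enables exfiltration. The app records, the UID values, and the `RISKY_COMBO` rule are all hypothetical; this models the idea only and does not query a real device.

```python
from collections import defaultdict


def effective_permissions(apps):
    """Union the permissions of all apps that share a UID."""
    by_uid = defaultdict(set)
    for app in apps:
        by_uid[app["uid"]] |= set(app["permissions"])
    return dict(by_uid)


# Illustrative rule: reading contacts plus network access allows exfiltration.
RISKY_COMBO = {"android.permission.READ_CONTACTS", "android.permission.INTERNET"}

# Hypothetical installed apps; the first two share a UID.
apps = [
    {"name": "notes", "uid": 10050, "permissions": ["android.permission.READ_CONTACTS"]},
    {"name": "wallpaper", "uid": 10050, "permissions": ["android.permission.INTERNET"]},
    {"name": "game", "uid": 10051, "permissions": ["android.permission.INTERNET"]},
]

# Neither "notes" nor "wallpaper" is risky alone, but their shared UID is.
flagged = [uid for uid, perms in effective_permissions(apps).items() if RISKY_COMBO <= perms]
print(flagged)  # [10050]
```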

Common Malware Types and Attack Vectors

Zhou et al presented a study at the 2012 IEEE Symposium on Security and Privacy detailing the families and typical attack vectors of Android malware during its first few years on the market. The study also gives a good overview of the nature of malicious payloads, activation mechanisms, and installation methods. Of the 1,260 malware specimens they examined, Zhou et al found that 86% were repackaged versions of legitimate apps, 36.7% used root exploits, over 90% turned devices into bots, 45.3% sent SMS messages or made calls without user awareness, and 51.1% harvested sensitive information (Zhou et al, 2012).

Techniques of malware installation generally include user installation, update attacks, and drive-by downloads. Update attacks usually work by downloading the malicious payload from a remote server during a later update instead of packaging it into the app (Zhou et al, 2012). Drive-by downloads persuade a user to download malware through URLs, in-app advertisements, and similar techniques (Zhou et al, 2012).

Malware Detection Methods and Malware Survival

Static analysis methods of malware detection include permission analysis, data-flow analysis, control-flow analysis, static taint analysis, and simple signature-based analysis (Lindorfer et al, 2014). Dynamic analysis generally means running an app in an emulated environment and checking for malicious behavior (Arp et al, 2014).

Malicious applications can evade detection in a number of ways. One of the most common is for a developer to repackage a legitimate app with a malicious payload and obfuscate the malicious logic so that traditional signature-based detection does not recognize it. Zhou et al found that 86% of the malware samples they encountered were repackaged and propagated via the Play store and third-party app stores (Zhou et al, 2012).

Transformation attacks, as they are sometimes called, fall into the following categories according to Rastogi et al (2013): trivial techniques, attacks requiring code-level changes but still detectable by static analysis, and attacks not detectable by static analysis. Trivial techniques do not require code-level changes and rely on repacking files, call indirections, changing package names, and other changes requiring relatively little technical ability on the part of the malware developer. These techniques evade simple signature-based detection such as API call matching and string matching (Rastogi et al, 2013).

Attacks requiring code-level changes use any number of methods, including identifier renaming, code reordering, junk code insertion, and encryption of the malicious Java payload. They evade signature-based detection but can still be caught by data-flow and control-flow analyses (Rastogi et al, 2013).
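A minimal sketch shows why even trivial transformations defeat the simplest form of signature matching. It assumes the signature is just a SHA-256 hash of the payload bytes; commercial signatures are more elaborate, and the payload strings here are made-up placeholders rather than real malware.

```python
import hashlib


def signature(payload: bytes) -> str:
    """Simplest possible signature: a hash of the raw payload bytes."""
    return hashlib.sha256(payload).hexdigest()


# Placeholder "known malware" and a blacklist built from its signature.
known_malware = b"class Stealer { void sendSms() { /* ... */ } }"
blacklist = {signature(known_malware)}

# Identifier renaming: the behavior is unchanged, but the bytes differ.
renamed = known_malware.replace(b"Stealer", b"Helper")


def is_flagged(payload: bytes) -> bool:
    """Return True when the payload's signature is on the blacklist."""
    return signature(payload) in blacklist


print(is_flagged(known_malware))  # True
print(is_flagged(renamed))        # False
```

Because any byte-level change produces a different hash, the renamed sample passes unnoticed, which is why data-flow and control-flow analyses are needed to catch such variants.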

Some transformation methods cannot be caught by traditional static analysis at all and are only detectable by dynamic analysis. These are technically sophisticated techniques that include reflection and encryption at the bytecode level, rather than simple encryption of the malicious Java code (Rastogi et al, 2013). Additionally, zero-day malware is mostly undetectable by static analysis techniques, with the exception of very sophisticated classifiers trained with machine learning.

All commercially available anti-malware products surveyed by Rastogi et al were found to be vulnerable to the simplest and most widely available syntactic code transformation techniques. These included AVG, Symantec, Lookout, ESET, Dr. Web, Kaspersky, Trend Micro, ESTSoft, Zoner, and Webroot (Rastogi et al, 2013).

The ability of dynamic analysis to catch sophisticated transformation attacks might lead one to believe that it is the most reliable detection method available. Unfortunately, this is usually not the case. Where static analysis fails, dynamic analysis often succeeds, but it has its own drawbacks and weaknesses. First, dynamic analysis requires much more processing power and memory than static analysis (Arp et al, 2014). Second, some malware detects when it is being run in an emulated environment (sandbox) and avoids exhibiting malicious behavior there. As of 2014, only three publicly available dynamic analysis systems were resistant to emulation detection and evasion by malware (Lindorfer et al, 2014).

Hybrid Analysis, Machine Learning, and Other Novel Approaches

In addition to traditional methods of dynamic and static analysis, a number of novel approaches are emerging that hold promise and have already shown great success in detecting malware.

Kang et al studied malware developer certificates and found that “4% of total certificates collected from malware signed as many as 70% of malware samples” (Kang et al, 2015). In response, they developed a detection and classification system that combines static analysis of the source code with a blacklist of known malware developer certificates. Their system achieved a 98% accuracy rate. Although the certificate blacklist only marginally boosted accuracy over traditional static analysis of source code, it improved detection speed by 30.9% (Kang et al, 2015).

Aafer et al found that permission-based static analysis alone does not work well because apps often request permissions they never use, many apps are overprivileged, in-app advertisements use many of the permissions requested, and malware does not always need permissions to carry out its intended functions (Aafer et al, 2013). In response, they developed the “robust and lightweight” machine-learning classifiers in DroidAPIMiner by statically analyzing and cataloging API-level malware behavior. After training DroidAPIMiner on a sample set of 20,000 apps, including almost 4,000 known malware samples from the Android Malware Genome Project and McAfee, they achieved 99% accuracy with a 2.2% false-positive rate in classifying “unknown apps as either benign or malware” (Aafer et al, 2013).

Aung et al (2013), on the other hand, combined permission-based static analysis with machine-learning classifiers to detect and classify malware. Although their study takes the opposite route from DroidAPIMiner, it reinforces Aafer et al’s research: Aung et al achieved a 12% false-positive rate, far higher than DroidAPIMiner’s 2.2%. Their work still contributed significantly to the body of research by showing that permission-based analysis on its own is significantly less accurate than other methods.
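The accuracy and false-positive figures quoted in these studies are easier to compare with the standard confusion-matrix definitions in view. The sketch below uses those textbook definitions; the counts are invented for illustration and are not taken from any of the cited papers, which may also define their metrics slightly differently.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all apps (benign and malicious) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """Fraction of benign apps wrongly flagged as malware."""
    return fp / (fp + tn)


# Hypothetical test set: 1,000 benign apps and 200 malware samples.
tp, fn = 190, 10   # malware caught / malware missed
fp, tn = 22, 978   # benign flagged / benign passed

print(round(accuracy(tp, tn, fp, fn), 4))
print(round(false_positive_rate(fp, tn), 4))
```

Reporting both numbers matters because a detector evaluated mostly on benign apps can post high accuracy while still flagging an unacceptable share of legitimate software.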

DREBIN is a lightweight static analysis system that runs on-device. It achieves 94% accuracy with only a 1% false-positive rate. “It geometrically maps extracted features to a joint vector space where patterns and combinations of features are analyzed” (Arp et al, 2014). It is trained on a large dataset of known malware to determine which types of features to look for in an unknown app. It takes an average of 10 seconds to run on-device (Arp et al, 2014).

Wang et al (2015) use a hybrid of static and dynamic analysis to achieve nearly 99% accuracy in detecting and classifying malware. By acknowledging the strengths and weaknesses of static and dynamic methods, and by deploying the most resource-hungry methods in the cloud, they are able to deploy a very successful detection system. Like other researchers, they use machine learning to train their detection system. One important note they make while citing another reference, however, is that “learning-based detection is subject to poisoning attacks” (Zhang et al, 2014). They go on to say that “an adversary may deliberately poison a benign dataset through introducing clean apps with malicious features to confuse a training system” (Wang et al, 2015). They also consider using the “dynamic hooking” methods first proposed by Hu and Xiao (2014) in future work to prevent emulator detection by malware. If dynamic hooking works, it could overcome one of the most frequently cited shortfalls of dynamic analysis.
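One way to picture the mapping of extracted features into a joint vector space is a binary indicator vector scored by a linear function. The sketch below is written under that assumption and is not DREBIN’s actual implementation; the feature names, weights, and bias are hand-picked placeholders, whereas a real system would learn the weights from a large labeled dataset.

```python
def embed(features, vocabulary):
    """Map a set of feature strings to a binary vector over the vocabulary."""
    return [1 if name in features else 0 for name in vocabulary]


def score(vector, weights, bias=0.0):
    """Linear decision function: positive values lean toward 'malware'."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


# Hypothetical feature vocabulary and hand-picked weights (not learned).
vocabulary = [
    "permission::SEND_SMS",
    "api_call::sendTextMessage",
    "permission::CAMERA",
    "intent::BOOT_COMPLETED",
]
weights = [1.2, 0.9, -0.3, 0.4]
bias = -1.0

sms_app = {"permission::SEND_SMS", "api_call::sendTextMessage"}
camera_app = {"permission::CAMERA"}

print(embed(sms_app, vocabulary))  # [1, 1, 0, 0]
print(score(embed(sms_app, vocabulary), weights, bias) > 0)     # True
print(score(embed(camera_app, vocabulary), weights, bias) > 0)  # False
```

A side benefit of a linear score is that the features with the largest weights can be shown to the user as the reason an app was flagged.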

Conclusion and Recommendations

The best methods to detect malware targeting the Android OS appear to be hybrid systems with some combination of the following: static analysis using machine-learning methods trained on large datasets, dynamic analysis to fill the gaps where static analysis fails against advanced code-transformation techniques, and distributed processing to divide the workload of detection and classification. Anomaly detection or dynamic analysis should be performed in an emulated environment on a machine with greater resources than a mobile device offers. Likewise, deep static analysis and machine learning should occur on a server, which then pushes the trained model to the mobile device for deployment. Basic signature-based detection should be used only when it is the only available option, as the body of research has repeatedly shown it to be unreliable.

There are trade-offs in Android security, and the most acceptable method when considering size, accuracy, processing power, and battery consumption appears to be a lightweight static solution on the device (trained on the back end with machine learning) combined with a dynamic solution in the cloud that runs checks before a new app is downloaded to the mobile device.

In the absence of a cutting-edge detection system like the ones presented in this body of knowledge, one thing is certain. For the average user who is unable or unwilling to install an obscure research project or custom anti-malware solution on their phone, it is best to download apps only from the official Play store, read and understand permissions before installing, and find a publicly available anti-malware solution with good reviews to deploy on their device.
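The recommended split between an on-device static check and cloud-side dynamic analysis can be sketched as a simple triage function. Every name, threshold, and score below is hypothetical, and sending only the ambiguous cases to the cloud is a design choice of this sketch, not something prescribed by the surveyed papers.

```python
def triage(app, on_device_score, cloud_is_malicious, low=0.2, high=0.8):
    """Decide 'allow' or 'block', escalating uncertain apps to the cloud."""
    score = on_device_score(app)
    if score >= high:
        return "block"   # lightweight static model is confident: malware
    if score <= low:
        return "allow"   # lightweight static model is confident: benign
    # Ambiguous score: defer to the heavier dynamic analysis off-device.
    return "block" if cloud_is_malicious(app) else "allow"


# Stand-ins for a trained on-device model and a cloud sandbox service.
scores = {"calculator": 0.05, "sms_trojan": 0.95, "odd_flashlight": 0.5}
cloud_calls = []


def fake_cloud(app):
    """Record which apps reach the cloud and return a canned verdict."""
    cloud_calls.append(app)
    return app == "odd_flashlight"


results = {name: triage(name, scores.get, fake_cloud) for name in scores}
print(results)
print(cloud_calls)  # ['odd_flashlight']
```

Escalating only uncertain apps keeps battery and bandwidth costs low on the device while reserving the expensive emulated run for the cases where static analysis is weakest.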

References

1. Faruki, Perez, Bharmal, Ammar, Laxmi, Vijay, Ganmoor, Vijay, Gaur, Manoj Singh, Conti, Mauro, and Rajarajan, Muttukrishnan. (2015) Android Security: A Survey of Issues, Malware Penetration and Defenses. Communications Surveys & Tutorials, IEEE, vol. 17, no. 2, pp. 998–1022, Second quarter 2015.
2. Hyunjae Kang, Jae-wook Jang, Aziz Mohaisen, and Huy Kang Kim. (2015) Detecting and Classifying Android Malware Using Static Analysis along with Creator Information. International Journal of Distributed Sensor Networks, vol. 2015, Article ID 479174.
3. Zhou, Yajin and Jiang, Xuxian. (2012) Dissecting Android Malware: Characterization and Evolution. 2012 IEEE Symposium on Security and Privacy.
4. Aafer, Yousra, Du, Wenliang, and Yin, Heng. (2013) DroidAPIMiner: Mining API-Level Features for Robust Malware Detection in Android. SecureComm 2013.
5. Aung, Zarni and Zaw, Win. (2013) Permission-Based Android Malware Detection. International Journal of Scientific & Technology Research, Volume 2, Issue 3, March 2013.
6. Arp, Daniel, Spreitzenbarth, Michael, Hubner, Malte, Gascon, Hugo, and Rieck, Konrad. (2014) DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket. Network and Distributed System Security Symposium (NDSS) Conference, Feb 2014.
7. Wang, Xiaolei, Yang, Yuexiang, and Zeng, Yingzhi. (2015) Accurate mobile malware detection and classification in the cloud.
8. Feng, Yu, Anand, Saswat, Dillig, Isil, and Aiken, Alex. (2014) Apposcopy: Semantics-Based Detection of Android Malware through Static Analysis. Association for Computing Machinery, 2014.
9. Rastogi, Vaibhav, Chen, Yan, and Jiang, Xuxian. (2013) Catch Me if You Can: Evaluating Android Anti-malware against Transformation Attacks. IEEE Transactions on Information Forensics and Security.
10. Lindorfer M, Neugschwandtner M, Weichselbaum L, Fratantonio Y, van der Veen V, Platzer C. (2014) Andrubis - 1,000,000 Apps Later: a view on current android malware behaviors.
11. Lindorfer M, Neugschwandtner M, and Platzer C. (2015) MARVIN: efficient and comprehensive mobile app classification through static and dynamic analysis.
13. Hu W, Xiao Z. (2014) Guess Where I am-Android: Detection and Prevention of Emulator Evading on Android. HitCon.
14. Zhang M, Duan Y, Yin H, et al. (2014) Semantics-aware android malware classification using weighted contextual API dependency graphs. Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, pp. 1105–1116.
