Securing Big Data Environments from Attacks

2016 IEEE 2nd International Conference on Big Data Security on Cloud, IEEE International Conference on High Performance and Smart Computing, IEEE International Conference on Intelligent Data and Security

Udaya Tupakula, Vijay Varadharajan

Advanced Cyber Security Research Centre
Faculty of Science and Engineering
Macquarie University, Sydney, Australia
{udaya.tupakula; vijay.varadharajan}@mq.edu.au

data generated in such environments make it extremely challenging to deal with attacks. Hence there is a need for techniques for securing such big data environments, and in this paper we propose such techniques. The paper is organised as follows. Section II presents the attacker model and an overview of the operation of our model. Section III presents a detailed discussion of the components of our model. Section IV presents the implementation of our model and shows how it helps to deal with different attacks. Section V concludes.

Abstract—In this paper we propose techniques for securing big data environments such as a public cloud in which tenants use their virtual machines for different services such as utility and healthcare. Our model makes use of state based monitoring of the data sources for service specific detection of attacks, and offline traffic analysis of multiple data sources to detect attacks such as botnets.

Keywords—Big Data Security, Security Attacks

I. INTRODUCTION

Emerging technologies such as smart grids, the Internet of Things (IoT) and clouds generate huge amounts of data. Several business models have been developed and innovative applications have been proposed for making use of this data to improve the quality of life and to provide better services to customers. For example, business models have been developed for capturing the location and behaviour of users from their mobile devices and using this information for targeted advertisement and smart transportation. Utility providers capture the power usage of smart devices in real time to estimate peak time demand for power generation and to offer variable pricing depending on the time of use. Although such emerging technologies offer several advantages, securing such environments poses significant challenges.

II.


A. Attacker Model

Let us consider a generic big data scenario such as a public cloud in which different tenants (utility, healthcare, finance, government) use an IaaS public cloud to host their services. The tenants can be running different operating systems (such as Windows or Linux) and service specific applications in their virtual machines. Attacks in such environments can cause catastrophic damage (for example, blackouts in the case of attacks on utility services) and in some cases loss of life (e.g. doctors unable to access patients' data). There are several challenges in dealing with attacks in such environments. On one hand, there are attacks that target specific tenant services (such as Stuxnet [3] for SCADA); on the other hand, there are attacks such as botnets that are common to all tenants, since their applications run on popular operating systems such as Windows and Linux. If an attacker can exploit vulnerabilities in such an OS, then the attacker can compromise the services of different tenants. Attacks such as botnets are practical in the current state of the art. The botnet is a group of

Fig. 1. Big Data Scenario (figure labels: Data Analysis Engine; Data Storage Controller; HDS 1, HDS 2, ..., HDS X)

As shown in Figure 1, a simple big data scenario [1] consists of capturing structured or unstructured data from several Heterogeneous Data Sources (HDS) such as tiny sensors, servers, laptops, desktops, virtual machines and smart phones; storing the data in an easily accessible location (centralised or distributed); and analysing or further processing the data for different applications. However, since data is captured from untrusted devices, attackers or compromised devices can easily upload malicious data to the storage controller, and the attacks can spread to all other devices that access this malicious data. Also the volume, velocity and variety of the

978-1-5090-2403-2/16 $31.00 © 2016 IEEE DOI 10.1109/BigDataSecurity-HPSC-IDS.2016.74

OUR APPROACH

Our model makes use of Trusted Components (TC) for enforcing service specific security policies on the HDS and for capturing the data required for security analysis. The TC are placed at different devices in secure locations, and the policies enforced in a TC depend on the capabilities of the device. For example, the TC can be placed in gateways, access points, base stations and virtual machine monitors. Let us consider a simple cloud [2] scenario and discuss an attacker model and the operation of our model. In the case of a cloud, there can be several million devices (volume) that upload and download data at frequent intervals (velocity), and the data is of different types (variety) because tenants use the cloud for different services such as critical infrastructure, health care and utilities.

[Fig. 2 labels: Dom 0 with TC; tenant T1 virtual machines VM11 and VM12; tenant T2 virtual machines VM21 and VM22; Xen VMM; Hardware; Router]

disks and network. The report is updated whenever there is a change in the processes running in the virtual machine. The ED parses the VM memory and captures VM runtime information such as the processes that are running in the virtual machine, their parent processes, the path information related to the processes and the open ports on the VM. This is similar to the report generated by the task manager on a Windows machine. The only difference is that the report is generated by a component residing in the VMM instead of in the virtual machine. Hence the report is more trusted.
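The value of a VMM-side report can be illustrated by comparing it with the process list that the guest itself reports. The following is a minimal sketch under stated assumptions: both views are available as lists of records, and the `ProcessEntry` type, its fields and the `hidden_processes` helper are hypothetical names chosen for illustration, not the interface of the ED component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessEntry:
    """One row of a process report (hypothetical record layout)."""
    pid: int
    name: str
    parent_pid: int
    path: str


def hidden_processes(vmm_view, guest_view):
    """Return processes seen from the VMM but absent from the guest's own list.

    Assumption: a process that the guest does not report, but that is
    visible in VM memory from the VMM, is treated as hidden.
    """
    guest_pids = {p.pid for p in guest_view}
    return sorted((p for p in vmm_view if p.pid not in guest_pids),
                  key=lambda p: p.pid)


# Illustrative data only; process names and paths are made up.
vmm_view = [
    ProcessEntry(1, "init", 0, "/sbin/init"),
    ProcessEntry(812, "sshd", 1, "/usr/sbin/sshd"),
    ProcessEntry(3946, "unknown", 1, "/tmp/.x"),
]
guest_view = [
    ProcessEntry(1, "init", 0, "/sbin/init"),
    ProcessEntry(812, "sshd", 1, "/usr/sbin/sshd"),
]

suspicious = hidden_processes(vmm_view, guest_view)
```

Here `suspicious` contains only the entry with process id 3946, because that process is missing from the guest-reported list.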

[Fig. 2 labels: Attack Domain; Cloud Controller & Data Storage Controller; Tenant Customers (Users); ADE]

B. Store and Restore

The SR component is used for capturing VM specific knowledge of the OS and applications, enabling auditing of the VM transactions, and restoring services in case of attacks on the VM. We assume that the base install of the VM, with its OS and applications, is in a clean state, and we use this information as a reference for tracking changes in the VM. The clean state image is also used for restoring the VM in case of attacks.

Now let us consider how SR tracks changes to the VM. The reports generated by the ED are compared with the information stored in the SR database. If any new processes are detected in the report, then a new entry is created in the database. All new processes are tagged to be validated by the administrator and/or the VAL component. Hence the SR captures VM specific information and tracks changes to the OS and applications in the virtual machine. In addition, all VM transactions are logged in the SR database, which enables auditing of the VM transactions.

Now let us consider how SR enables fast restoration of services in case of attacks on virtual machines. In addition to the clean state image, the SR takes snapshots of the VM at regular intervals. If an attack occurred at time T, then the cloud administrator can query the SR to identify the snapshot whose timestamp is closest to, but before, the attack's timestamp and restore the VM from it.
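The snapshot query described above can be sketched as a search over sorted timestamps. This is only an illustration: the function name `snapshot_before` and the representation of snapshots as numeric timestamps are assumptions, and the SR database schema is not specified in this paper.

```python
from bisect import bisect_left


def snapshot_before(snapshot_times, attack_time):
    """Return the latest snapshot timestamp strictly before attack_time.

    Returns None when no snapshot precedes the attack; in that case the
    clean state image would be the fallback for restoration.
    """
    times = sorted(snapshot_times)
    # bisect_left gives the index of the first timestamp >= attack_time,
    # so the element just before that index is the closest earlier one.
    idx = bisect_left(times, attack_time)
    return times[idx - 1] if idx > 0 else None
```

For example, with snapshots taken at times 100, 200 and 300 and an attack at time 250, the snapshot at time 200 is selected.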

Fig. 2. Securing Big Data Environment

hijacked computers, which are employed under a command and control mechanism administered by a botmaster. Botnets are clever pieces of software that carry out sophisticated, synchronized activities while remaining resilient. Botnets have evolved from IRC based centralized designs to common protocols such as HTTP with decentralized architectures, and then to peer-to-peer designs. As botnets have become more sophisticated, the need for advanced techniques and research against them has grown. Hence there is a need for techniques to detect both service specific attacks on each tenant and common attacks such as botnets that target multiple tenants.

B. Operation Overview

We assume that the cloud service provider offers security as a service to its tenants. Our model makes use of a TC in each physical server (see Figure 2) for monitoring the usage of the resources allocated to the tenant virtual machines. The TC is used for fine granular, real time monitoring of the tenant resources and for dealing with different attacks such as service specific attacks, rootkits and crashes of services. Entity Detection (ED), Store and Restore (SR) and Validation (VAL) are the important TC sub components used for detecting service specific attacks on the tenant virtual machines. The ED is used for generating the VM state report and for detecting the processes that access data from the disk and network. The SR component captures VM specific knowledge such as the operating system, applications, updates and resources allocated to the virtual machine. SR enables auditing of the VM's usage of resources and quick restoration of the virtual machine service in case of attacks. VAL is used to enforce VM specific security policies using signature and anomaly based detection. Our model also makes use of offline analysis of the traffic of multiple tenant virtual machines to detect whether any of them is infected with one of a range of bot families.

III.

C. Validation

The validation component uses different techniques to detect attacks on the virtual machines. For example, Stuxnet [3] extracts and decrypts two files from its resource section and writes them to disk as MrxNet.sys and MrxCls.sys; emerging attacks such as Conficker disable the security tools running in the virtual machine. Hence VAL validates the ED report to ensure that important processes related to security tools are actually running in the VM and that no hidden processes are running in the VM; it also validates the VM disk access and network traffic against known attack signatures and anomaly based detection policies. Note that the SR holds the specific details of each virtual machine (such as the resources allocated to the VM, the OS and applications running in the VM, and the logs of disk and network access). The administrator makes use of this information to specify the security policies for each virtual machine. For example, signatures are selected depending on the OS and applications running in the virtual machine, and thresholds are set for anomaly based detection. We use history based thresholds on the total VM traffic and on the application specific traffic to detect anomalies. The evaluation process of the VAL works as follows:

ARCHITECTURE COMPONENTS

In this section we describe the important components of our model.

A. Entity Detection

The ED generates a state report of the tenant virtual machine and uses this information for fine granular detection of the processes that are accessing (writing/reading) data from the


Let us first consider how VAL validates the ED reports. ED reports can be used to deal with different types of attacks, including zero day attacks. For example, let us consider the activities that an attacker can perform by exploiting a zero day vulnerability in the virtual machine. The attacker can install malicious applications such as rootkits with hidden processes, disable the security tools installed in the infected machine, and use the compromised machine to flood other hosts with malicious traffic. If a zero day attack results in the creation of a new hidden process, then the hidden process can be detected in the ED report. Although the attacker can alter the compromised host so that it does not report the malicious processes installed by the attacker, these processes will still be listed in the ED report: since the ED component is located in the VMM, the attacker cannot alter the report that it generates. If the zero day attack disables security tools in the vulnerable machine, then this can be detected while validating the processes that are running in the infected machine. Hence, if the processes related to the security tool are not found in the monitored host, this can be considered strong evidence that the virtual machine has been compromised. If a zero day attack exploits a running process and generates malicious traffic to spread the attack or flood other hosts, the traffic is detected by the signature based and anomaly based modules.

Now let us consider how VAL validates the VM traffic. If any packet from a virtual machine matches a known attack signature or is found to be malicious by the anomaly detection module, then an alert is sent to the cloud administrator. The cloud administrator can query the SR to determine the entity that generated the malicious traffic and restore the services. For example, if the malicious entity is a new process, then the most recent snapshot image without the malicious entity is used for restoring the service of the virtual machine. Since attacks that stay within the threshold are possible, such attacks are detected during the offline traffic analysis, which is discussed in the following section.
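Two of the checks described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the paper does not give the formula for its history based thresholds, so the rule used here (mean of past observations plus `k` standard deviations) is an assumed stand-in, and the function names and the security tool name are hypothetical.

```python
from statistics import mean, pstdev


def missing_security_tools(running_names, required_names):
    """Return the required security processes that are absent from the report."""
    return sorted(set(required_names) - set(running_names))


def exceeds_history_threshold(history, current, k=3.0):
    """Flag a traffic measurement that is far above past observations.

    Assumed rule, not taken from the paper: the threshold is the mean of
    the history plus k population standard deviations.
    """
    threshold = mean(history) + k * pstdev(history)
    return current > threshold


# Illustrative inputs: process names from a state report and past
# per-interval packet counts for one VM.
report_names = ["init", "sshd", "httpd"]
required = ["avscan"]            # hypothetical anti-virus process name
past_counts = [100, 110, 90, 105, 95]

absent = missing_security_tools(report_names, required)
anomalous = exceeds_history_threshold(past_counts, 500)
```

A missing security process is treated here as an indicator of compromise, while a measurement that stays below the threshold is passed on to the offline analysis rather than being declared benign.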

For example, Zeus bots send updates at a regular interval of 20 minutes; queries and updates in peer-to-peer bots such as Nugache and Waledac create many small, uniform packets compared to legitimate P2P communications. Also, the initial exchanges of packets have a fixed format and can easily be differentiated from legitimate traffic. After analysing a range of bots, we formulated an attack template that combines the features of network flows for detecting different bot families. We made use of the IPFIX standard to define the attack template. Different vendors have different proprietary protocols for capturing flow traffic, such as Cisco NetFlow, Juniper J-Flow or cflowd, and Huawei NetStream. Hence, by defining a uniform attack template using IPFIX, one can apply our template on devices belonging to different vendors, including the virtual networking devices in the VMM.

IV. IMPLEMENTATION

Figure 2 shows the implementation of our model using the Eucalyptus cloud with the Xen VMM [4]. Let us consider how our model helps to deal with different attack scenarios.

A. Service Specific Attack Detection

Let us consider an attack scenario on one of the tenant virtual machines and how the TC components detected the attack. In this case, VM11 in Figure 2 was running an anti-virus security tool. We infected this VM with a rootkit which disabled the anti-virus tool in the VM, ran a hidden process and generated traffic with a spoofed source address. Although the rootkit succeeded in altering the process list in the compromised virtual machine, it was not able to alter the report generated by the ED. Hence the malicious hidden process, with process id 3946, was detected by the ED. The VAL component could also detect the compromise of the VM, since it could not find the processes related to the anti-virus tool and the traffic had a spoofed source address. In this case, the traffic generated by the compromised VM was dropped and an alert was sent to the administrator.
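The spoofed source check in this scenario can be illustrated as follows. The sketch assumes that the TC knows which addresses are assigned to each virtual machine and treats any other source address in outbound traffic as spoofed; the paper does not state how the check was implemented, and the names `is_spoofed` and `filter_egress` and the addresses used are hypothetical.

```python
from ipaddress import ip_address


def is_spoofed(packet_src, assigned_addrs):
    """Return True when the source address is not one assigned to the VM."""
    allowed = {ip_address(a) for a in assigned_addrs}
    return ip_address(packet_src) not in allowed


def filter_egress(packet_sources, assigned_addrs):
    """Split outbound packet sources into (forwarded, dropped) lists."""
    forwarded, dropped = [], []
    for src in packet_sources:
        (dropped if is_spoofed(src, assigned_addrs) else forwarded).append(src)
    return forwarded, dropped


# Illustrative addresses only.
vm11_addrs = ["10.0.1.11"]
outbound = ["10.0.1.11", "192.0.2.77", "10.0.1.11"]
forwarded, dropped = filter_egress(outbound, vm11_addrs)
```

Any non-empty `dropped` list would correspond to the case in which the traffic is discarded and an alert is raised.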

D. Offline Analysis

As part of our work, we have carried out offline analysis of the traffic from multiple tenant virtual machines to detect attacks such as bots. The traffic analysis system consists of attack templates, a flow collector, filtering and an attack detection engine (ADE). Attack templates are used for capturing flow information using IPFIX. The flow collector is a logically centralised server that is used for storing and organising the data captured at different devices using the attack templates. Filtering is used to reduce the dataset by removing data that is not related to botnets. Finally, the attack detection engine correlates flow information using machine learning techniques to find patterns and detect the bots. In our analysis, we deployed naïve Bayesian, SVM, Neural Network and Decision Tree machine learning algorithms. We infected some machines in our test bed environment with botnets. Then we captured the traffic flows and identified specific features to detect the bots using machine learning algorithms. We repeated the process for different bot families.
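One flow feature of the kind such an analysis can use is the regularity of the intervals between successive flows from a host to the same destination, since a bot that sends updates on a fixed timer produces nearly constant intervals. The following is a minimal sketch, assuming flow start times in seconds; the function name `interval_regularity` and the choice of the coefficient of variation are illustrative assumptions and are not necessarily the features used in the experiments reported here.

```python
from statistics import mean, pstdev


def interval_regularity(flow_start_times):
    """Return (mean_interval, coefficient_of_variation) for a flow series.

    A coefficient of variation near 0 means the flows are almost
    perfectly periodic. Returns None when there are fewer than three
    flows or when all flows start at the same time.
    """
    times = sorted(flow_start_times)
    if len(times) < 3:
        return None
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    avg = mean(gaps)
    if avg == 0:
        return None
    return avg, pstdev(gaps) / avg


# Flows every 1200 seconds (20 minutes) versus irregular flows.
periodic = interval_regularity([0, 1200, 2400, 3600])
irregular = interval_regularity([0, 10, 500, 520])
```

In this example the periodic series has a mean interval of 1200 seconds with zero variation, while the irregular series has a coefficient of variation above 1. A value like this could be one input column for the classifiers listed above.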

B. Bot Detection

As mentioned earlier, we first developed a template for capturing the traffic flows for bot detection. We analysed previously proposed flow analysis techniques [5-9] for detecting HTTP, IRC and P2P botnets. The related techniques make use of the NetFlow v5 template for capturing the flows and detecting the bots. Since NetFlow v5 comes with a fixed record format, each record has a fixed size of 48 bytes (384 bits). However, the related techniques do not make use of attributes such as next hop, input, output, pad, tos, src_as, dst_as, src_mask, dst_mask and pad2 that are available in NetFlow v5. So we developed an IPFIX template that uses only the relevant attributes and removes the unused ones, thereby significantly reducing the dataset size. Figure 3 shows the IPFIX template with the attributes required for detecting HTTP, IRC and P2P bot families. As the record size of our template is 30 bytes (240 bits), the record size is reduced by 37.5% ((48 - 30)/48). The reduced dataset not only reduces the corresponding storage but also helps to reduce the


updates. This generates a pattern, which is used in the detection of bots. Even though botnets try to randomize these communications to evade detection, there are still some vertical and/or horizontal correlations in their communications. For example, we identified that Zeus bots (v1.3) send updates at a fixed interval of 20 minutes. A main objective of a botnet is to spread through the network and recruit more bots, which increases the strength of the botnet and of its attacks. Hence the bots scan other machines in their network for vulnerabilities. If a vulnerable machine is found, they run exploits to compromise it. When scanning, bots generate bursts of small packets. This activity results in a sudden increase in the number of packets (without necessarily a major increase in the traffic volume), which our detection engine tries to identify. The attack detection engine also looks for DDoS activities such as outbound TCP SYN floods and UDP floods. A large number of outbound TCP SYN packets could mean that some of the internal hosts in the network are part of a botnet and are participating in a DDoS attack. From the above discussion, it is clear that our model can detect a range of botnet families using various activities throughout their lifecycle, including C&C interactions, recruitment of new bot members and synchronized attacks. Furthermore, the size of a flow record is only 30 bytes in our model, compared to 48 bytes in the related techniques [5-9].
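The 48 byte and 30 byte record sizes can be checked by summing field widths. The sketch below lists the fields of a NetFlow v5 flow record with their sizes in bytes, using the field names from the published NetFlow v5 record layout (where the field called pad in the text appears as `pad1`), and removes the attributes that the text names as unused. The retained set is therefore an inference from that list and is not a reproduction of the template in Figure 3.

```python
# NetFlow v5 flow record fields and their sizes in bytes.
NETFLOW_V5_FIELDS = [
    ("srcaddr", 4), ("dstaddr", 4), ("nexthop", 4),
    ("input", 2), ("output", 2),
    ("dPkts", 4), ("dOctets", 4), ("first", 4), ("last", 4),
    ("srcport", 2), ("dstport", 2),
    ("pad1", 1), ("tcp_flags", 1), ("prot", 1), ("tos", 1),
    ("src_as", 2), ("dst_as", 2),
    ("src_mask", 1), ("dst_mask", 1), ("pad2", 2),
]

# Attributes the text lists as unused by the related techniques.
UNUSED = {"nexthop", "input", "output", "pad1", "tos",
          "src_as", "dst_as", "src_mask", "dst_mask", "pad2"}

v5_size = sum(size for _, size in NETFLOW_V5_FIELDS)
retained = [(name, size) for name, size in NETFLOW_V5_FIELDS
            if name not in UNUSED]
reduced_size = sum(size for _, size in retained)
reduction = (v5_size - reduced_size) / v5_size
```

With these widths, `v5_size` is 48, `reduced_size` is 30 and `reduction` is 0.375, which agrees with the sizes stated in the text.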

Fig. 3. IPFIX Template

computations required to filter the unnecessary attributes from the data, thereby increasing the efficiency of both the collector and the exporter. Also, since the captured dataset includes all the attributes used by the related techniques, the respective techniques are applied on the dataset for bot detection. We make use of multi stage filtering to eliminate traffic that is not related to the attack, and of the machine learning technique relevant to the attack. For example, Bayesian algorithms are used for IRC bot detection and decision tree algorithms are used for peer-to-peer bot detection. Note that one can customize the attributes in IPFIX and relate them to the payload to detect other attacks such as DNS flux and spam.

The captured VM traffic flows are clustered to identify similarities in the flows. We have experimented with Bayesian, SVM, Neural Network and Decision Tree based machine learning techniques to identify the specific features of the bots. There are three main behavioural patterns: bot behaviour, botnet behaviour and temporal behaviour. For bot behaviour, we analysed the flows generated by one bot or a single machine to identify its Command and Control (C&C) communications. For botnet behaviour, we analysed the flows generated by a group of bots or machines in order to detect botnet activities; that is, we horizontally analysed the flows generated in a network to find suspicious patterns related to botnet communications. Finally, in the temporal or vertical method, we analysed the flows generated by bots or botnets over a period of time to detect patterns.

Our analysis mainly focused on detecting the bots during the different phases of the botnet life cycle: initial infection, secondary infection, connection, malicious command and control, and update and maintenance. In the initial infection phase, the attacker exploits vulnerabilities on victims and gains basic control over them. The secondary infection phase is used to download and install further malicious scripts and binaries to gain full control of the victim. Once secondary infection is complete, the bot connects to its C&C server in order to become a member of the botnet. The bot then receives commands from the C&C server to conduct malicious coordinated activities. Finally, the bot updates its binaries to gain more functionality or to evade detection. In the case of direct C&C bots such as IRC and HTTP bots, every bot needs to find its own C&C controller to become a member of the centralized botnet. In the case of peer-to-peer and hybrid bots such as Kademlia, Chord and GameOver (Zeus v3), each bot needs to find its servant bot or proxy bot to get C&C instructions and become a member of the P2P botnet. Our analysis confirmed that these botnets use hard coded static IP lists and/or the DNS service to locate the C&C server, from which they have to receive the control commands and

V. CONCLUSION

In this paper we have proposed techniques for securing big data environments from different types of attacks. Our model makes use of real time monitoring to deal with service specific attacks and offline traffic analysis to detect different types of bots such as IRC, HTTP and P2P.

REFERENCES

[1] S. Sagiroglu and D. Sinanc, "Big data: A review", International Conference on Collaboration Technologies and Systems (CTS), CA, May 2013.
[2] P. Mell and T. Grance, "The NIST Definition of Cloud Computing", Special Publication 800-145, September 2011.
[3] N. Falliere, L. O. Murchu, and E. Chien, "W32.Stuxnet Dossier", White paper, Symantec Corp., Security Response, 2011.
[4] P. Barham et al., "Xen and the art of virtualization", Proc. of the 19th ACM Symposium on Operating Systems Principles, USA, Oct. 2003.
[5] D. Zhao et al., "Peer to Peer Botnet Detection Based on Flow Intervals", Proc. of the IFIP Information Security and Privacy Conference, Greece, Jun 2012.
[6] F. Tegeler et al., "BotFinder: finding bots in network traffic without deep packet inspection", ACM Nice, France, 2012.
[7] L. Bilge et al., "DISCLOSURE: Detecting Botnet Command and Control Servers through Large-Scale NetFlow Analysis", Proc. of the Annual Computer Security Applications Conference, USA, Dec 2012.
[8] J. Francois et al., "BotTrack: Tracking Botnets Using NetFlow and PageRank", Proc. of the 10th International IFIP TC 6 Networking Conference, LNCS Vol. 6641, Spain, May 2011.
[9] D. Zhao et al., "Peer to Peer Botnet Detection Based on Flow Intervals", Proc. of the IFIP Information Security and Privacy Conference, Greece, Jun 2012.
