Int. J., Vol. x, No. x, 200x


Security Event Correlation Approach for Cloud Computing

Massimo Ficco
Dipartimento di Ingegneria Industriale e dell'Informazione
Second University of Naples (SUN)
Via Roma 29, 81031 Aversa, Italy
Email: [email protected]

Abstract: Cloud Computing is a new business model, which represents an opportunity for users, companies, and public organizations to reduce costs and increase efficiency, as well as an alternative way of providing services and resources. In this pay-by-use model, security plays a key role. Cyber attacks are a serious danger, which can compromise the quality of the service delivered to customers, as well as the costs of the provided Cloud resources and services. In this paper, a hybrid and hierarchical event correlation approach for intrusion detection in Cloud Computing is presented. It consists of detecting intrusion symptoms by collecting diverse information at several Cloud architectural levels, using distributed security probes, as well as performing complex event analysis based on a Complex Event Processing engine. The escalation process from intrusion symptoms to the identified cause and target of the intrusion is driven by a knowledge-base represented by an ontology. A prototype implementation of the proposed intrusion detection solution is also presented.

Keywords: Cloud Computing, intrusion detection, security, correlation, complex event processing.

Biographical notes: Massimo Ficco is Assistant Professor at the Department of 'Ingegneria Industriale e dell'Informazione' of the Second University of Naples (SUN). He received the degree in Informatics Engineering in 2000 from the University of Naples "Federico II", and his PhD in Information Engineering from the University of "Parthenope". From 2000 to 2010, he was a senior researcher at the Italian University Consortium for Computer Science (CINI). Since 2004, he has taught master courses in Software Engineering, Databases and Software Programming.
His current research interests include software engineering architecture, security aspects of critical infrastructure, and mobile computing. He has been involved in several international research projects. In particular, he is currently involved in an EU funded research project ‘Open-Source API and Platform for Multiple Cloud (mOSAIC FP7-ICT-256910)’ in the area of ‘Internet of Services, Software and Virtualisation’.

1 Introduction

Cloud Computing is the emerging paradigm in distributed environments, and represents a new opportunity for both service providers and consumers. From the Cloud Service Provider's (CSP) point of view, it represents an alternative way of renting computing and storage infrastructure services, of providing remote platform customization for business processes, and of renting business applications as a whole. From the consumer's point of view, the self-service, on-demand access to a shared pool of configurable computing resources allows consumers to create elastic environments that can be expanded based on the imposed workload and the required quality of service parameters. Moreover, the pay-by-use nature of Cloud Computing is a form of equipment leasing that guarantees a minimum level of service from a Cloud provider. On the other hand, one of the major barriers to the fruitful adoption of Cloud Computing is a real and perceived lack of privacy and security. Cyber attacks have

Copyright © 2011 Inderscience Enterprises Ltd.

direct effects on the costs and resource consumption, and not only on the performance perceived by consumers (1). For example, in order to face peak loads due to attacks that aim at reducing service availability by exhausting the resources of the service's host system, such as memory, processing resources or network bandwidth, additional resources should be either already available, or dynamically acquired on demand. However, such resources are not free, and a cyber attack could make acquiring them economically prohibitive. Monitoring and understanding the state of the entire Cloud system can be particularly challenging for the CSP. It can be difficult for the security administrator to monitor the behavior of such a complex and distributed architecture. The Cloud architecture may contain hundreds or even thousands of heterogeneous components located at different architectural levels, consisting of a large number of software and hardware entities (2). Attacks against such a complex system manifest themselves in terms of symptoms, which can be of differing nature, at different Cloud architectural levels

(i.e., infrastructure, platform, and application), and in different virtual components. Therefore, security administrators have to use several security probes, such as Intrusion Detection Systems (IDSs) and log analyzers, in order to monitor such symptoms. Such probes can produce a multitude of alarms, most of which are false positives or do not constitute actual intrusions. The effects of these intrusions can involve different components at different architectural levels, and they can also occur at different times. Moreover, in order to discover a system vulnerability used to achieve a final malicious objective, sophisticated attacks can be performed. These can be complex scenarios, which consist of multiple stages performed by the same attacker at different times. This work presents a 'hybrid' and 'hierarchical' correlation approach for the detection of complex intrusion scenarios. The proposed approach provides means to group logically- and temporally-connected symptoms into attack scenarios. The correlation process is applied to streams of information collected at different Cloud architectural levels, using multiple security probes deployed as a distributed architecture in the Cloud. The 'hybrid' correlation is the first stage of the proposed approach. It consists of aggregating symptoms of the same attack, exploiting diversity both in the information sources and in the methods used to detect malicious activities. Then, in order to recognize complex intrusion patterns and enrich the semantics of the diagnosis, a 'hierarchical' correlation approach based on a Complex Event Processing (CEP) engine is adopted. It captures the causal relationships between the alarms generated during the previous stage (which may represent intermediate attacks of a more complex attack scenario), by correlating them on the basis of temporal and logical constraints.
The correlation capability is driven by a knowledge-base represented by an ontology, which is used to capture the causal relationship between the detected intermediate attacks. A prototype implementation of the proposed solution is also presented. It consists of a collection of software components that transform raw attack symptoms into high level intrusion reports, which can be used to analyze the attack strategy and perform recovery actions. Each component focuses on different aspects of the overall correlation process. Finally, this work shows how the proposed solution can be used to detect complex attack scenarios, which consist of a specific sequence of malicious activities (namely ‘intermediate attacks’) performed by the attacker in order to discover a system vulnerability. The rest of the paper is organized as follows: The background and related work are presented in Section 2. Section 3 presents the adopted correlation process. The proposed knowledge-base is described in Section 4, whereas Section 5 shows a prototype of the proposed solution, which is entirely based on open source components. The system operation based on the CEP is presented in Section 6. Finally, conclusions and lessons learned are presented in Section 7.

2 Background and related work

2.1 Delivery models in Cloud Computing

The Cloud Computing paradigm offers three service delivery models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

• IaaS - At the IaaS level, the Cloud provider offers resources, such as computational resources, storage, and network, on which the consumer can deploy and run arbitrary software. The consumer does not control the underlying Cloud infrastructure, but controls virtual machines on which he/she can manage operating systems, storage, deployed applications, and networking components (such as firewalls and IDSs). Examples of IaaS providers are Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3).

• PaaS - At this level, Cloud providers offer development environments to consumers. An example is Google App Engine. The consumers can deploy onto the Cloud infrastructure the applications they have created by using programming languages and tools supported by the CSP. These applications run on the provider's infrastructure and are delivered to the consumer via the Internet from the provider's servers. The consumer does not manage or control the system infrastructure, including network, servers, operating systems, storage, or application capabilities. Moreover, he/she does not have control over the deployed applications and hosting environment configurations.

• SaaS - At this level, the applications offered by the CSP are delivered to the consumers. Applications run on a Cloud infrastructure, and can be used through thin client interfaces, such as a Web browser. As with PaaS, the consumer does not control the underlying Cloud infrastructure, and does not have to face investments in servers or software licensing. An example is the Google Docs suite.

2.2 Related work

Several research works face security problems involved with the usage of public and private Cloud infrastructures (3; 4; 5; 6). Cheng et al. (7) present a distributed IDS architecture for the Cloud. They propose to separate the IDS for each user from the actual component being monitored. In particular, each virtual component should be secured by a separate IDS probe, which is responsible for one virtual machine. The Cloud consumer is responsible for configuring and managing the IDS running on his/her virtual machine. Additionally, to correlate and analyze alerts generated by multiple distributed sensors deployed in the Cloud, a central management unit is provided. It can also be used by consumers to configure the IDSs. The CSP is responsible

for enabling different IDSs and providing the possibility to connect them to virtual components. A similar solution has been proposed by ChiChun et al. (8). They propose a federation defense approach, in which IDSs are deployed in each Cloud Computing region. The IDSs cooperate with each other by exchanging alerts, and implement a judgment criterion to evaluate the trustworthiness of these alerts. The IDS architecture is composed of several conceptual entities, which have been designed to perform the following activities: intrusion detection, alert clustering, threshold computation and comparison, and intrusion response and blocking. A cooperative component is used to exchange alert messages with other IDSs. The majority vote method is used for judging the trustworthiness of an alert. Park et al. (9) propose a multi-level IDS and a log management method based on consumer behavior, for applying IDSs effectively to a Cloud Computing system. The multi-level IDS applies differentiated levels of security strength to consumers according to the degree of their trustworthiness. On the basis of risk level policies, risk points are assigned to consumers in proportion to the risk of their anomalous behaviors. Users are judged by their previous personal usage history. The virtual component to assign to the consumer depends on the security level derived from the judgment. Moreover, the assigned virtual component may change according to the anomalous behavior level of the consumer, and migration can occur. Finally, the IDS sensor to be installed on the assigned consumer's guest operating system image is chosen on the basis of the security level assigned to the user. In previous works (10; 11), we have proposed an extensible intrusion detection management framework, which can be offered to CSPs in order to implement distributed IDSs for detecting cyber attacks on their Cloud, as well as to customers in order to monitor their applications.
The security framework offers an open-source API, a Software Platform, and several features to develop security probes, which can be configured and deployed by users on the different components of a Cloud federation. A specific interface has been designed to implement security engine components, which collect security information, inferred by the several security probes deployed at different Cloud architectural levels, that can be used to identify malicious activities. In order to improve the attack detection rate, enrich the semantics of alerts, and reduce the overall number of false alerts, several works propose explicit alarm correlation techniques (12; 13; 14). In particular, Vigna et al. (15) propose a correlation workflow intended to unify the different steps of the correlation process. They adopt an approach that combines events that represent the independent detection of the same attack occurrence, collected by different probes. Our work is strongly inspired by that framework, but we have proposed and implemented a hybrid approach for the Cloud, which combines different correlation techniques, and employs open-source CEP technology in order to process events and discover attack patterns among multiple streams of event data.

Figure 1 Correlation process

However, although all the previous works address distributed IDSs in the Cloud, to the best of our knowledge no concrete solution has been developed that addresses the issue of gathering intermediate attacks from distributed Cloud components (in order to recognize complex attack scenarios). Moreover, these works propose the correlation of alarms produced by network sensors only. This can cause IDSs to miss crucial details necessary for gaining the knowledge required to recognize relations among intermediate attacks. Our solution addresses precisely this issue through the collection of streams of information at several Cloud architectural levels (i.e., IaaS, PaaS, SaaS), and the use of a CEP engine that offers great flexibility in the management of a complex correlation logic.

3 The correlation process

The adopted detection approach consists of a sequence of activities that are designed to identify correlations between raw attack symptoms, which are symptomatic of a complex intrusion. In particular, as Figure 1 shows, each activity focuses on different aspects of the overall correlation process:

• Monitoring: Distributed probes are used to observe relevant symptoms at different Cloud architectural levels (IaaS, PaaS, and SaaS), by using both anomaly- and misuse-based monitoring methods. The use of multiple heterogeneous probes potentially improves detection performance through the generation of different views of the same security incident.

• Classification: During the classification process, the symptoms can be aggregated into categories. They are divided into abuses, misuses, and suspicious

acts. Abuses represent actions which change the state of a Cloud component. They can further be divided into anomaly- and knowledge-based. The former represents anomalous behaviors (e.g., unusual application load, anomalous input requests); the latter is based on the recognition of signatures of previously known attacks (e.g., brute force attacks). Misuses represent out-of-policy behaviors, in which the state of the component is not affected (e.g., authentication failures, failed queries). Suspicious acts are not violations of policies, but they are nevertheless events of interest to the detection process (e.g., commands which provide information about the CPU load of a virtual node), which are not signaled by explicit messages.

• Normalization: Every monitored symptom is coded and normalized into a standardized format, as well as augmented with additional information, such as timestamps and the source address of the attacker.

• Fusion: Events generated by different probes and related to the same attack are merged into a single alarm. Such symptoms are aggregated by using clustering-based aggregation rules (16). They combine symptoms based on the 'similarity' among their attributes. A previous work (17) presented an implementation of the considered aggregation approach based on the attack type, the target component, and the temporal proximity, i.e., it combines different symptoms of the same attack occurrence, which are directed at the same target. The temporal proximity is based on a time window that is specified as a parameter by the administrator.

• Ranking & filtering: When the fusion succeeds, it is necessary to decide whether the current observations correspond to malicious activities. In particular, in order to reduce the number of false positives produced during the fusion step, an approach is adopted based on estimating the likelihood that the symptoms aggregated during the fusion represent an underlying attack of a given type.
If this probability does not exceed a threshold estimated during a training phase (specific to each attack), the symptoms are discarded (i.e., they are considered as false positives).

• Diagnosis: For each attack reported by the ranking and filtering phase, the diagnosis determines whether the attack is a successful attack or a non-relevant attack, i.e., an attack that does not lead to an intrusion. In the case of an intrusion, it has to identify the intrusion effects, i.e., the extent of the damage and the faulty components. During the on-line diagnosis activities, different diagnosis mechanisms can be adopted; for example, threshold-based over-time diagnostic mechanisms using either heuristic or probabilistic approaches (18; 19; 20).

• Correlation: The goal of correlation is to identify complex attack scenarios. In the intrusion detection literature, an attack scenario (or attack pattern) is a sequence of explicit attack steps, which are logically linked and lead to an objective. Correlation receives the detected intermediate attack alarms and tries to capture the causal relationships among them. In this work, an attack scenario is modeled by causal-based correlation rules (21). A rule consists of a set of prerequisites (pre-conditions) and consequences (post-conditions), which are logical conditions on the intermediate attacks. The pre-conditions are logical conditions that specify the requirements to be satisfied for the attack to succeed. The post-conditions are logical conditions that specify the effects of the attack when the attack succeeds. The causal-based correlation approach correlates different attacks by matching the consequences of an intermediate attack with the prerequisites of another intermediate attack. Post- and pre-conditions are linked together by temporal constraints. Each condition is described by predicates, which are conditions to be verified. The pre-conditions are verified by means of information inferred during the ranking phase, whereas the post-conditions are verified by information reported by the diagnosis output. A deadline is fixed for the recognition of each scenario.

• Recovery: If intrusion effects are identified, the least costly and most effective remediation is determined and performed, based on the severity of the error affecting the system components (22; 23).
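The causal-based matching described above can be sketched as follows. The rule, attack names, and effect labels are hypothetical, and the predicates are reduced to simple set and type checks for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class IntermediateAttack:
    name: str
    target: str
    time: float                  # detection timestamp (seconds)
    effects: Set[str] = field(default_factory=set)  # verified post-conditions

@dataclass
class ScenarioRule:
    """Causal-based correlation rule: the post-conditions of one
    intermediate attack must satisfy the pre-conditions of the next,
    within a deadline."""
    name: str
    first_attack: str            # attack providing the post-conditions
    required_effects: Set[str]   # post-conditions acting as pre-conditions
    next_attack: str             # attack whose pre-conditions they satisfy
    deadline: float              # max seconds between the two steps

def correlate(alarms: List[IntermediateAttack],
              rule: ScenarioRule) -> List[Tuple[IntermediateAttack,
                                                IntermediateAttack]]:
    """Pair intermediate attacks on the same target whose causal and
    temporal constraints match the rule."""
    pairs = []
    for a in alarms:
        if a.name != rule.first_attack or not rule.required_effects <= a.effects:
            continue
        for b in alarms:
            if (b.name == rule.next_attack and b.target == a.target
                    and 0 < b.time - a.time <= rule.deadline):
                pairs.append((a, b))
    return pairs

# Hypothetical two-step scenario: a port scan that discovers a service,
# followed by a brute-force attack against the same virtual machine.
rule = ScenarioRule("scan-then-bruteforce", "port_scan",
                    {"service_discovered"}, "brute_force", deadline=600.0)
alarms = [IntermediateAttack("port_scan", "vm-1", 10.0, {"service_discovered"}),
          IntermediateAttack("brute_force", "vm-1", 310.0)]
print([(a.name, b.name) for a, b in correlate(alarms, rule)])
# → [('port_scan', 'brute_force')]
```

In practice the predicates would be arbitrary logical conditions evaluated over ranking and diagnosis output, rather than plain set inclusion.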

4 The knowledge-base schema

The proposed approach is driven by a knowledge-base schema represented by an ontology (Fig. 2). It is used to aggregate symptoms produced by different probes with the final goal of identifying one or more intermediate attacks as likely causes of the observed symptoms, as well as to recognize complex attack scenarios. Moreover, the knowledge-base is also used to automate the process of deriving queries for diagnostic analysis (i.e., to identify the cause and the effect of the discovered intrusion), and to support the identification of the least costly and most effective reaction. As Figure 2 shows, each kind of attack can be related to a set of potential symptoms. More specifically, an attack is described by using the AttackIndicator, which has the following properties: (i) hasTrustworthiness, which is defined by the concept of likelihood that the observed feature is a symptom of the considered attack, (ii) isAssociatedTo, which is defined by the concept Probe, and (iii) indicates, which is defined by the concept Symptom. The Symptoms are classified into Abuse, Misuse and Suspicious, and characterized by the property hasIntensityScore, which reflects the probability of occurrence of the given symptom with regard to the


Figure 2 A simplified view of the proposed knowledge-base

specific monitoring method, and by other properties, such as Timestamp, Origin, and Destination. The Probes are characterized by the MonitoringMethod property, which defines the method used to monitor the symptom. Moreover, they are also classified depending on the specific architectural level to which they belong, namely IaaS Level Probe, PaaS Level Probe, and SaaS Level Probe. A similar classification is adopted for the Target, which refers to anything that should be protected. For example, the Application Level includes the Cloud Service, the Software Component, and the Application Data. Each Target component is characterized by several properties. For example, the Software Component is characterized by a Port Number and a Process Identifier. The Attack Scenario identifies the correlation rule for the scenario recognition. It has the properties hasDeadline, hasPrecondition and hasPostcondition. The first defines the time window within which the attack scenario is monitored. The other two define the pre- and post-conditions, which describe the conditions required for including an intermediate attack in the considered scenario. Pre- and post-conditions have the following properties: Time, which is the temporal condition for the detection of an intermediate

attack, and Predicate, which corresponds to a numerical or logical condition over the attack instance to be verified. The Attacks are also defined by the hasEffect property, which represents a possible effect on the state of the monitored target component, as well as the threat to a security objective (i.e., availability, integrity, confidentiality, authentication), if the attack is successful. Effects may be characterized by the Security Objective and the Effect Threshold, which identifies the operating point beyond which it is necessary to activate the reaction. Moreover, the Effect has several subclasses, including Illegal Access, Data Destruction and Information Leakage. Finally, given the extensibility of the proposed ontology, new classes can easily be added. For example, as shown in Figure 2 (by the concepts depicted through dashed lines), we could inherit a structure similar to the class Attack of the security ontology extracted from (24). Attack has the property isCausedBy, which is defined by the concept Threat Agent (i.e., the agent that exploits the vulnerability). Moreover, Threat Agent is connected to the concept of Vulnerability through the exploits property.
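As an illustration of how the ontology's concepts could map onto data structures, the following sketch mirrors the main classes and properties described above. All field values are hypothetical, and a real implementation would use an ontology language such as OWL rather than plain classes:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Symptom:
    category: str           # 'Abuse', 'Misuse', or 'Suspicious'
    intensity_score: float  # hasIntensityScore property
    timestamp: float
    origin: str
    destination: str

@dataclass
class Probe:
    level: str              # 'IaaS', 'PaaS', or 'SaaS' Level Probe
    monitoring_method: str  # MonitoringMethod: 'anomaly' or 'misuse'

@dataclass
class AttackIndicator:
    trustworthiness: float  # hasTrustworthiness
    probe: Probe            # isAssociatedTo
    symptom: Symptom        # indicates

@dataclass
class Attack:
    name: str
    indicators: List[AttackIndicator]
    effects: List[str]      # hasEffect, e.g. 'Illegal Access'

# A single indicator for a hypothetical SSH brute-force attack.
ind = AttackIndicator(
    trustworthiness=0.8,
    probe=Probe("IaaS", "misuse"),
    symptom=Symptom("Misuse", 0.9, 1304251200.0, "10.0.0.5", "10.0.1.17"))
attack = Attack("ssh_brute_force", [ind], ["Illegal Access"])
print(attack.indicators[0].probe.level)  # → IaaS
```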


Figure 3 The Decision Engine architecture

5 The approach implementation

In a previous work (10), a software framework for developing distributed IDSs in a Cloud has been presented. It allows developing autonomous and proactive security agents able to migrate dynamically to the virtual nodes together with their code and execution status. The agents implement context-aware software components that, when appropriately configured, can be enabled to infer security information at the infrastructure level. Moreover, a software platform provides features to control and monitor software components at the platform level. Finally, an API enables developers to implement Cloud applications that offer specific security features at the application level. In the present work, the framework proposed in (10) represents the basis for designing and developing several probes, which can be deployed at strategic points in the Cloud system, and at different Cloud architectural levels. They can be used to infer attack symptoms from real-time traffic and specific logs. Each probe monitors specific features of the same Cloud component. For example, in order to discover if an intruder is attempting to cause a denial of service attack toward a virtual node, a network IDS at the infrastructure level monitors anomalous packet rates; a host monitor measures CPU utilization, user logins, and disk activity on a specific virtual machine; a software monitor observes unauthorized access to application data, and so on. Specifically, each probe can assess whether an anomalous activity is underway on the basis of its local view. The probes can adopt different detection methods (both anomaly- and misuse-based), which trigger different security message formats. The probes can be either IDSs

(including Snort, F-Secure Linux Security, and Samhain), which have to be installed by the Cloud provider on its physical machines, or security features offered by the adopted framework. Moreover, by using the specific API offered by the framework, security mechanisms (such as threshold-based filters and proxy components) can be developed to monitor software components. According to the proposed approach, a software Agent is associated with each probe. The Agent converts the collected raw security data into attributes usable by a centralized Decision Engine (e.g., a timestamp, a probe identifier, the source and destination of the anomalous activity). To decouple the Decision Engine from the specific format of the probes' messages, the Agents perform the normalization, which consists of adopting a unique language to represent the alarms. In this work, a standard representation of symptoms based on the Intrusion Detection Message Exchange Format (IDMEF) data model is adopted (25). Then, the normalized messages are forwarded to the Decision Engine. The Decision Engine performs both the attack monitoring and intrusion diagnosis activities. It aims at (i) aggregating symptoms of monitored behaviors, (ii) recognizing which attack can be hypothesized to be the cause of the anomalous symptoms, (iii) diagnosing the intrusions, and (iv) detecting complex attack scenarios. As Figure 3 shows, the Decision Engine uses the designed knowledge-base in order to decide whether the observed behaviors represent potential attacks or not, as well as to perform the diagnosis of intrusions. The enabling components are the Stream Processor, the Filter, the Diagnoser, and the CEP.

• The Stream Processor aggregates continuous event data streams in near real-time fashion. It uses a fixed aggregation schema defined by the knowledge-base to infer a set of queries to be performed on the incoming message streams.
For each monitored attack type, it extracts the corresponding monitored symptoms and forwards them to the Filter.

• The Filter is a software component that performs the ranking and filtering process on the received events. It determines if the aggregated symptoms represent a real attack in progress in the system, and helps identify which kinds of attacks can be hypothesized to be the cause of the identified symptoms. Finally, for each hypothesized attack, an alarm is forwarded to both the Diagnoser and the CEP.

• The Diagnoser is a component that performs the diagnostic process. It determines whether the monitored attack is an intrusion or an irrelevant attack, and which parts of the system are supposed to be the attack target. It is responsible for assessing the intrusion effects on the target components for each received security alarm. Finally, it forwards the diagnostic results to the CEP. The Diagnoser is based on the Simple Event Correlator (SEC) (26). It is an open-source rule-based event correlation tool, created in an academic context. SEC can be seen as

a complex and context-aware filter that selects and correlates relevant information based on matching rules defined using regular expressions; complex matching patterns can be defined in a compact way that would otherwise be quite awkward to express. Rules are defined in specific configuration files (in text format), which can be refreshed at run-time (27).

• The CEP is in charge of performing real-time queries on several data streams in order to detect the complex attack scenarios. The knowledge-base is used to automate the process of deriving rules for attack scenario recognition. The CEP engine is based on Borealis (28), an open-source framework for query processing of event streams. It performs SQL-style processing on the incoming event streams, without necessarily storing them. It uses specialized primitives and constructs (e.g., time windows, logical conditions) to express stream-oriented processing logic. Queries consist of merging (via Join constructs) the streams produced by the Filter and the Diagnoser. In particular, on the basis of the information described by the knowledge-base (i.e., the TimeCondition and Predicate properties), different temporal and logical conditions can be applied to every stream.
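A minimal sketch of the windowed Join performed by the CEP might look as follows. This is a simplified stand-in for a Borealis-style query, not Borealis's actual API, and the stream contents are invented:

```python
def windowed_join(alarms, diagnoses, window):
    """Simplified stand-in for a Borealis-style Join operator: pair each
    Filter alarm with Diagnoser results for the same attack id whose
    timestamps fall within `window` seconds of the alarm."""
    out = []
    for a_ts, a_id, a_data in alarms:
        for d_ts, d_id, d_data in diagnoses:
            if a_id == d_id and abs(d_ts - a_ts) <= window:
                out.append((a_id, a_data, d_data))
    return out

# One alarm from the Filter, two diagnosis events; only the diagnosis
# arriving within the 30-second window is joined with the alarm.
alarms = [(0.0, "atk-7", "brute force on vm-1")]
diagnoses = [(2.5, "atk-7", "intrusion: illegal access"),
             (40.0, "atk-7", "late event, outside window")]
print(windowed_join(alarms, diagnoses, window=30.0))
# → [('atk-7', 'brute force on vm-1', 'intrusion: illegal access')]
```

A production CEP would evaluate this incrementally over unbounded streams, expiring buffered tuples as the window slides, rather than iterating over complete lists.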

Finally, an open-source framework, named Prelude (29), has been adopted, which acts as an event bus between the Agents and the Decision Engine. It consists of a collection of manager components hierarchically organized. Specifically, each Agent reports events in a centralized fashion using a local connection to a Prelude 'local-manager'. The Prelude local-manager processes the received events and delivers them both to a specified medium (a MySQL database, a PostgreSQL database, an XML file), and to the higher-level Prelude manager, as represented in Figure 4. Prelude provides a library (called Libprelude) that guarantees secure connections (i.e., TLS connections) among the Agents, the local-managers, and the higher managers. Moreover, Libprelude provides the necessary functionality for emitting IDMEF events, and for automating the re-transmission of data in times of temporary interruption of one of the system components. In this work, we assume the use of a local-manager for each Cloud architectural level. Each local-manager forwards the collected data to a Decision Engine, which resides either in the Cloud, or in a separate node of the Cloud provider. Moreover, a 'Prelude console' has been extended to implement the Web Monitor used to view the detected attacks. The Monitor is the central analysis server, which can be used by the Cloud security administrator (i) to monitor the Cloud system status, (ii) to identify the best action to take in order to mitigate the intrusion effects on the target components, and (iii) to identify attack pattern strategies.
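For illustration, an Agent's normalization step could emit a minimal IDMEF-style alert as sketched below. The element names follow the IDMEF data model (RFC 4765), but the message is schematic and not schema-validated, and the probe identifier and addresses are invented:

```python
import xml.etree.ElementTree as ET

def make_idmef_alert(probe_id, attack_name, src_ip, dst_ip, created):
    """Build a schematic IDMEF-style alert: an Alert carrying the
    emitting Analyzer, a creation time, the source and target
    addresses, and a Classification naming the suspected attack."""
    msg = ET.Element("IDMEF-Message")
    alert = ET.SubElement(msg, "Alert", messageid="1")
    ET.SubElement(alert, "Analyzer", analyzerid=probe_id)
    ET.SubElement(alert, "CreateTime").text = created
    src = ET.SubElement(ET.SubElement(
        ET.SubElement(alert, "Source"), "Node"), "Address")
    ET.SubElement(src, "address").text = src_ip
    tgt = ET.SubElement(ET.SubElement(
        ET.SubElement(alert, "Target"), "Node"), "Address")
    ET.SubElement(tgt, "address").text = dst_ip
    ET.SubElement(alert, "Classification", text=attack_name)
    return ET.tostring(msg, encoding="unicode")

print(make_idmef_alert("snort-iaas-01", "ssh-brute-force",
                       "10.0.0.5", "10.0.1.17", "2011-05-01T12:00:00Z"))
```

In the described architecture, a message of this shape would be handed to Libprelude for transport rather than serialized by hand.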

Figure 4 Prelude hierarchical architecture

6 The hybrid and hierarchical correlation approach

Understanding the security status of a Cloud system requires combining observations performed across several entities distributed within the system. Specifically, in order to increase the detection coverage of intermediate attacks, as well as to reduce the time and the cost of managing a large number of false positives, a hybrid detection approach is adopted, based on diversity both in the information sources and in the methods used to detect malicious activities. Then, in order to recognize complex intrusion patterns and enrich the semantics of the diagnosis, a CEP-based correlation approach is adopted, which captures the causal relationships among intermediate attacks, by correlating them on the basis of temporal and logical constraints.

6.1 Intermediate attack recognition

The framework proposed in (10) can be used to design and implement different probes for Cloud systems, which use either anomaly detection (AD) methods or misuse detection (MD) methods. For each observed feature, an AD probe operates in one of two phases: training and operation. In the training phase, data sets are used to parametrize the detection method (i.e., to determine the characteristics of the 'expected' behavior). The suspicious requests are manually extracted in order to guarantee that the data sets are (as much as possible) attack-free. Moreover, the thresholds are identified that are used to distinguish between regular and anomalous behaviors during the operation phase. Once the profiles and thresholds have been derived, the operation phase is performed. The choice of a proper threshold value


Figure 5 Different states of the target component.

is the main problem in this process, since there is a tradeoff between the number of false positives and the expected detection accuracy: a low threshold can result in many false positives, whereas a high threshold can result in many false negatives. In the operation phase, the AD method is used to detect anomalous behaviors with respect to the profile computed during the training phase. A 'score' is assigned to the generated events, which reflects the intensity of the given anomaly with regard to the established profile. If the computed score exceeds the fixed threshold, a message is triggered. As for MD methods, since they also produce false positives, the score of the generated messages is fixed during a training phase to a value from 0 to 1 (it represents the likelihood that an attack symptom is correctly monitored). According to the process described in Section 3, the symptoms collected by the distributed probes are classified and normalized by the Agents. Then, they are aggregated by using the clustering-based aggregation approach. Specifically, the symptoms are correlated on the basis of the attack type, the target component, and the temporal proximity. For each kind of target Cloud component to be monitored, the defined ontology identifies the symptoms to aggregate for each potential attack associated with that target. The temporal proximity correlation is based on a time window Tc. However, this approach requires that all the probe clocks be synchronized, e.g., by using a total ordering based on timestamps, or a finer-grained mechanism. At the end of this phase, for each attack target a meta-event E(k) = {eA1(k), ..., eAm(k)} is triggered. For each possible attack Ai, eAi(k) contains the symptoms correlated during the time window k. In order to reduce the number of false positives generated by the probes, during the ranking and filtering phase, an approach based on the confidence (C) of the events is adopted.
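The clustering-based aggregation step can be sketched as follows: symptoms are grouped by attack type, target component, and the index of the time window of width Tc in which they fall. This is a minimal Python sketch; the field names ("attack", "target", "ts") and the value of Tc are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Hedged sketch of the clustering-based aggregation: symptoms are grouped
# by (attack type, target component, window index), where the window index
# k = floor(timestamp / Tc) realizes the temporal-proximity correlation.

T_C = 10.0  # width of the correlation window Tc (seconds, assumed)

def aggregate(symptoms, t_c=T_C):
    clusters = defaultdict(list)
    for s in symptoms:
        k = int(s["ts"] // t_c)  # index of the time window containing s
        clusters[(s["attack"], s["target"], k)].append(s)
    return clusters

symptoms = [
    {"attack": "A1", "target": "vm-3", "ts": 1.2},
    {"attack": "A1", "target": "vm-3", "ts": 4.7},   # same window as above
    {"attack": "A1", "target": "vm-3", "ts": 15.0},  # next window
]
clusters = aggregate(symptoms)
# cluster e_A1(k=0) holds two symptoms, e_A1(k=1) holds one
```

In this scheme each non-empty cluster corresponds to one component eAi(k) of the triggered meta-event; note that fixed window boundaries are a simplification of the synchronization requirement discussed above.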
Assuming that eAi(k) = {s1, ..., sz} is the set of correlated symptoms related to an attack of type Ai during the time window k, the confidence is the likelihood that the monitored symptoms represent an underlying attack of that type. CAi(k) is the sum of several terms, one for each symptom: each term is the product of a weight wp and an Intensity Score (IS), i.e., CAi(k) = Σj wp(sj) · IS(sj). The IS is a probability value related to each monitored symptom. It is estimated through the method used by the probe to monitor the symptom, and reflects the likelihood that the observed symptom represents a malicious behavior. Since the observed features are not commensurable, the IS values have to be normalized to zero mean and unit variance. The weight wp(s) is associated with the trustworthiness of the probe p in monitoring a symptom of the attack Ai. It is assigned on the basis of prior knowledge of the effectiveness of the monitoring method being used for the given attack type. Although simple to implement, choosing proper weights is of critical importance for highlighting the proper features under various attacks. By using the computed confidence value, the generated events can be filtered. If the confidence does not exceed a threshold (specific to each attack, and estimated during the training phase), the event is discarded (i.e., it is considered a false positive). A concrete implementation example of this approach is presented in a previous work (17). When the Diagnoser receives an event from the Filter, it can process or ignore this event according to its confidence. During the diagnosis activity, the state of the target component can be: TRUSTED, TRUSTWORTHY, SUSPECTED, or CORRUPTED. Figure 5 describes the evolution of the state according to the received events. When an event eA1 is received, the target component state is set to TRUSTWORTHY, and remains in this state until either the confidence level exceeds the confidence threshold CTA1 fixed during a training phase, or a timer expires. If the timer expires, the security event is considered a false positive and the state returns to TRUSTED. If the confidence threshold CTA1 is exceeded, the SUSPECTED state is reached and an alarm is generated. In this state, a possible reaction can be performed based on the severity of the intrusion. Using the defined ontology schema, all the possible effects of the detected attack Ai are verified. If a corrupted state of the component is found, the CORRUPTED state is reached; otherwise, the state is set to TRUSTED. Finally, a compact diagnosis report is built and provided to the Cloud security administrator.
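The confidence computation and the subsequent filtering can be sketched in a few lines. This is a hedged illustration: the probe names, weights wp, intensity scores (assumed already normalized), and the threshold CT_A1 are all invented example values, not the ones used in the paper.

```python
# Hedged sketch of confidence-based filtering: the confidence C_Ai(k) of a
# meta-event component is the sum, over its correlated symptoms, of the
# probe weight w_p times the symptom's intensity score IS. Events whose
# confidence does not exceed the attack-specific threshold CT_Ai
# (estimated during training) are discarded as false positives.
# All numeric values below are illustrative.

WEIGHTS = {"snort": 0.9, "samhain": 0.6}   # trustworthiness w_p per probe
THRESHOLDS = {"A1": 1.0}                   # CT_Ai per attack type

def confidence(symptoms):
    """C_Ai(k) = sum_j w_p(s_j) * IS(s_j)."""
    return sum(WEIGHTS[s["probe"]] * s["is"] for s in symptoms)

def keep(event):
    """Forward the event to the Diagnoser only if C exceeds CT_Ai."""
    return confidence(event["symptoms"]) > THRESHOLDS[event["attack"]]

event = {"attack": "A1",
         "symptoms": [{"probe": "snort", "is": 0.8},
                      {"probe": "samhain", "is": 0.7}]}
# C = 0.9*0.8 + 0.6*0.7 = 1.14 > CT_A1 = 1.0, so the event survives filtering
```

In the actual framework the IS values would first be normalized to zero mean and unit variance across the training data, a step omitted here for brevity.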
For each produced meta-event, the report contains a summary of the information produced during the previous phases, including the likely cause of the detected anomalous behaviors (i.e., the attack Ai) and the target Ti. Based on the results of this diagnosis, the security administrator can (i) determine whether the error processing mechanism is appropriate (e.g., because the observed attack is not successful); (ii) prepare an adequate remediation action (e.g., a system reconfiguration, or the addition of new virtual nodes); (iii) improve the quality of the error detection (e.g., reconfiguring the IDS by tuning an anomaly detection threshold); and (iv) find out who/what performed the attack.


Figure 6 Example of attack scenarios

6.2 Attack scenario recognition

During an attack scenario, it is likely that the attacker first performs some knowledge-gathering steps, which consist of a set of commands that enable him/her to gain knowledge about the target system, and then performs the intrusion. For example, a port scan can be adopted to trace the open ports on some virtual machine (VM) deployed in the Cloud. It can be only the first step of a more complex attack, which may culminate in an intrusion into a database server deployed on the target VM. Figure 6 shows an example of an attack scenario. It consists of a sequence of common intermediate attacks performed by an attacker to discover and exploit a 'SQL Injection' vulnerability (17) of some Web application running on a VM. In particular, the considered intermediate attacks are the following:
• A1 - Port scanning: A scanning procedure is adopted to trace the open ports on different Cloud nodes (e.g., a port scan distributed in space and in time).
• A2 - Telnet: A discovered open port 80 (on a specific VM) is used to determine whether a Web server is listening on that port (e.g., by a telnet request).
• A3 - Directory traversal: By sending the Web server a custom request, consisting of a long path name created artificially by using numerous dot-dot-slash sequences, the attacker gains a listing of the Web server directory contents. During this step, a hidden Web page (no public link to that page is published on the Web) is discovered.
• A4 - Policy violation & buffer overflow: The attacker enters strings of variable length as GET parameters of the requests to the discovered Web page. With very long strings, an error message is displayed within the Web page. The message contains the structure of a query performed by the application on the database (e.g., a login query).
• A5 - SQL injection: Knowing the query structure (e.g., the tables in which confidential information is located), the attacker performs a SQL Injection attack via HTTP POST requests.

Table 1 The pre- and post-conditions for Scenario3 recognition

A3 - Directory traversal
  Pre-conditions:  Not considered for Scenario3
  Post-conditions: Keywords of a directory listing are contained in the provided Web page (e.g., root, /bin, /users)

A4 - Policy violation
  Pre-conditions:  A hidden Web page is discovered
  Post-conditions: Policy violation on a specific page

A5 - Buffer overflow
  Pre-conditions:  Violation of an access right
  Post-conditions: Keywords of SQL syntax are contained in the provided Web page (e.g., select, from)

A6 - SQL injection
  Pre-conditions:  Query structure is discovered
  Post-conditions: Confidential information is discovered (e.g., encrypted password)

Figure 7 shows some of the symptoms and effects of each intermediate attack. As described in Section 3, the pre-conditions specify the context that has to be satisfied for an attack to succeed. They are verified by the outcomes of the monitoring phase (attack symptoms). The post-conditions specify the effects of an intrusion. They are verified by means of information inferred from the diagnosis outcomes. Pre- and post-conditions are specified by logical and temporal conditions (predicates), which are combined by using logical connectives (conjunctions and negations). Table 1 presents an example of the pre- and post-conditions that have to be verified in order to recognize the Scenario3 represented in Figure 6. In general, for each known scenario, it is possible to define a complex query that can be performed by the CEP to detect the specific scenario. In particular, the implemented CEP-based correlation engine receives two streams of events, the first one from the Filter and the second one from the Diagnoser. These streams represent the symptoms and the effects of the detected intermediate attacks. Therefore, for each type of attack, a query is defined and performed by the CEP in order to correlate the symptoms and effects of that attack. The query operates on two input streams: 'SYM' is the input stream associated with the symptoms, whereas 'EFF' is the input stream associated with the effects. The query execution generates a new tuple that is added to a corresponding output stream. For example, the following query matches the pre- and post-conditions of a generic attack 'Ax' within a temporal window.


Figure 7 The monitored symptoms and effects

__________________________________________________________
ATTACK PRE- AND POST-CONDITIONS MATCHING
----------------------------------------------------------
Stream In  SYM
Stream In  EFF
Stream Out Ax

SELECT COUNT(*) AS eventID,               // Assign event identifier
       SYM.IP_Source AS PRE.IP_Source, ...,
       EFF.IP_Target AS POST.IP_Target, ...,
       EFF.time AS POST.time, ...
FROM SYM, EFF
WHERE (SYM.attackType = EFF.attackType)
  and (ABS(EFF.time - SYM.time) < T)
__________________________________________________________
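The semantics of this windowed join can be illustrated with a plain-Python sketch (the CEP engine itself is not shown): each symptom is paired with each effect of the same attack type whose timestamp lies within T seconds. The field names follow the query above; the window T, the sample values, and the use of a running index in place of COUNT(*) are illustrative assumptions.

```python
# Plain-Python analogue of the pre/post-condition matching query: pair
# each symptom with each effect of the same attack type observed within
# the temporal window T. A running index stands in for COUNT(*) AS eventID.

T = 5.0  # temporal window in seconds (illustrative)

def match(sym_stream, eff_stream, t=T):
    out = []
    candidates = ((s, e) for s in sym_stream for e in eff_stream
                  if s["attackType"] == e["attackType"]
                  and abs(e["time"] - s["time"]) < t)
    for i, (sym, eff) in enumerate(candidates):
        out.append({"eventID": i,
                    "PRE": {"IP_Source": sym["IP_Source"], "time": sym["time"]},
                    "POST": {"IP_Target": eff["IP_Target"], "time": eff["time"]}})
    return out

sym = [{"attackType": "A4", "IP_Source": "1.2.3.4", "time": 10.0}]
eff = [{"attackType": "A4", "IP_Target": "10.0.0.7", "time": 12.5},
       {"attackType": "A4", "IP_Target": "10.0.0.7", "time": 30.0}]  # outside T

pairs = match(sym, eff)
# only the effect at time 12.5 falls within T of the symptom at time 10.0
```

Each tuple produced here corresponds to one event on the output stream 'Ax' consumed by the scenario-recognition queries below.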

Using the events received on the output streams associated with the intermediate attacks, other queries are performed to recognize the attack scenarios. For example, the following symbolic query is used to recognize Scenario3. We assume that the intermediate attacks have the same source.
__________________________________________________________
RECOGNITION OF KNOWN SCENARIOS
Scenario3 detection
----------------------------------------------------------
Stream In  A3, A4, A5, A6
Stream Out SCENARIO3

INSERT INTO SCENARIO3
SELECT A6.POST.time, A6.POST.queryResult, ...
FROM {
  // Post-conditions of A3 and pre-conditions of A4
  // have to be verified. Moreover, the source and
  // the target of both attacks have to be the same
  // (i.e., the same IP address and port)
  {A3 JOIN A4 ON
     (A3.POST.keyword = true) and
     (A4.PRE.pageDiscovered = true) and
     (A3.POST.target.IP_addr = A4.POST.target.IP_addr) and
     (A3.POST.target.port = A4.POST.target.port) and
     (A3.PRE.source.IP_addr = A4.PRE.source.IP_addr)}
  // Post-conditions of A4 and pre-conditions of A5
  // have to be verified. The source and the target of both
  // attacks have to be the same (i.e., the same Web page)
  {JOIN A5 ON
     (A4.POST.violation = true) and
     (A5.PRE.violation = true) and
     (A4.POST.target.IP_addr = A5.POST.target.IP_addr) and
     (A4.PRE.source.IP_addr = A5.PRE.source.IP_addr)}
  // Some query keyword used in A6 is the same
  // discovered during A5
  {JOIN A6 ON
     (A5.POST.keywords = A6.PRE.keywords) and
     (A5.PRE.source.IP_addr = A6.PRE.source.IP_addr)}
}
WHERE {
  // Temporal conditions
  (A3.PRE.time < A4.PRE.time) and
  (A4.PRE.time < A5.PRE.time) and
  (A5.PRE.time < A6.PRE.time) and
  ((A6.PRE.time - A3.PRE.time) < deadline)
}
// A new tuple is added to the output stream only if it
// is not already included in SCENARIO1 or SCENARIO2
EXCEPT (SCENARIO1 OR SCENARIO2)
__________________________________________________________

In general, an attack scenario can be described by a state-transition-based language used to describe the sequences of malicious actions performed by the attacker (30), and it can be learned from training data sets using data mining and machine learning techniques (31). On the other hand, this approach is restricted to known scenarios, which have to be described by a human expert, or learned from training data sets. The technique using pre- and post-conditions (adopted in this work) has the potential of discovering unknown attack patterns, by matching the consequences of some previous alerts with the prerequisites of some later ones. In particular, in order to detect unknown attack patterns, a more complex query schema has to be defined and performed. For example, consider four types of attacks {A, B, C, D}, and assume that (in order to simplify the considered example):

• (a) (A, B) is the generic query used to verify the matching between the A post-conditions and the B pre-conditions;
• (b) for each scenario, A can only precede the other attack types {B, C, D}, B can only precede the attack types {C, D}, and C can only precede the attack type D.

Under these assumptions, the following decision tree has to be implemented by the CEP:
_________________________________________________
// Recognition of unknown attack scenarios
if ((A,B) and ((B.time - A.time) < deadline))
    if ((B,C) and ((C.time - A.time) < deadline))
        if ((C,D) and ((D.time - A.time) < deadline))
            Scenario(A,B,C,D)
        else Scenario(A,B,C)
    else Scenario(A,B)
if ((A,C) and ((C.time - A.time) < deadline))
    if ((C,D) and ((D.time - A.time) < deadline))
        Scenario(A,C,D)
    else Scenario(A,C)
if ((B,C) and ((C.time - B.time) < deadline))
    if ((C,D) and ((D.time - B.time) < deadline))
        Scenario(B,C,D)
    else Scenario(B,C)
if ((A,D) and ((D.time - A.time) < deadline))
    Scenario(A,D)
if ((C,D) and ((D.time - C.time) < deadline))
    Scenario(C,D)
_________________________________________________
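The decision tree above can be transliterated directly into executable form. In this hedged sketch, `times[X]` is the observation time of attack type X, `matches` is the set of verified post/pre-condition pairs (X, Y), and the deadline value is illustrative; like the pseudocode, overlapping sub-scenarios are all reported.

```python
# Direct transliteration of the unknown-scenario decision tree (a sketch).
# times: observation time per attack type; matches: set of (X, Y) pairs
# for which the X post-conditions match the Y pre-conditions.

DEADLINE = 60.0  # illustrative deadline (seconds)

def recognize(times, matches):
    out, t, m, D = [], times, matches, DEADLINE
    if ("A", "B") in m and (t["B"] - t["A"]) < D:
        if ("B", "C") in m and (t["C"] - t["A"]) < D:
            if ("C", "D") in m and (t["D"] - t["A"]) < D:
                out.append(("A", "B", "C", "D"))
            else:
                out.append(("A", "B", "C"))
        else:
            out.append(("A", "B"))
    if ("A", "C") in m and (t["C"] - t["A"]) < D:
        if ("C", "D") in m and (t["D"] - t["A"]) < D:
            out.append(("A", "C", "D"))
        else:
            out.append(("A", "C"))
    if ("B", "C") in m and (t["C"] - t["B"]) < D:
        if ("C", "D") in m and (t["D"] - t["B"]) < D:
            out.append(("B", "C", "D"))
        else:
            out.append(("B", "C"))
    if ("A", "D") in m and (t["D"] - t["A"]) < D:
        out.append(("A", "D"))
    if ("C", "D") in m and (t["D"] - t["C"]) < D:
        out.append(("C", "D"))
    return out

times = {"A": 0.0, "B": 10.0, "C": 20.0, "D": 30.0}
matches = {("A", "B"), ("B", "C"), ("C", "D")}
scenarios = recognize(times, matches)
# yields the full chain (A,B,C,D) plus its suffixes (B,C,D) and (C,D)
```

As in the pseudocode, the shorter suffix chains are emitted alongside the full scenario; a production implementation would typically suppress scenarios subsumed by a longer one.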

The previous algorithm can be implemented by performing three sequential query sets, in which the output stream of one query set is the input stream of the next one:

1. (A,B); (A,C); (A,D); (B,C); (B,D); (C,D); with the general query (X,Y) equal to:
_____________________________________________
Stream In  X, Y
Stream Out XY

SELECT X.PRE.time AS PRE.time, X.PRE...,
       Y.POST.time AS POST.time, Y.POST...
FROM X, Y
// Conditions to be verified
WHERE (X.POST = Y.PRE)
_____________________________________________

2. (A,B) Join (B,C); (A,B) Join (B,D); (B,C) Join (C,D); with the general query (X,Y) Join (Y,W) equal to:
_____________________________________________
Stream In  XY, YW
Stream Out XYW

SELECT XY.PRE.time AS PRE.time, XY.PRE...,
       YW.POST.time AS POST.time, YW.POST...
FROM XY, YW
WHERE (XY.POST.attackType = YW.PRE.attackType)
  and (XY.eventID = YW.eventID)
_____________________________________________

3. (A,B) Join (B,C) Join (C,D), with:

_____________________________________________
Stream In  XYW, WZ
Stream Out XYWZ

SELECT XYW.PRE.time AS PRE.time, XYW.PRE...,
       WZ.POST.time AS POST.time, WZ.POST...
FROM XYW, WZ
WHERE (XYW.POST.attackType = WZ.PRE.attackType)
  and (XYW.eventID = WZ.eventID)
_____________________________________________

If assumption (b) does not hold, a more complex algorithm has to be defined and performed by the CEP.

7 Conclusion and lessons learned

This work presents a solution for designing a distributed Intrusion Detection System, which can be used by service providers to monitor their Clouds. It proposes a hybrid and hierarchical correlation approach, which captures the causal relationships between the alarms that represent intermediate attacks of a more complex attack scenario, by correlating them on the basis of temporal and logical constraints. The correlation capability is based on a CEP and driven by a knowledge-base, which is used to recognize attack scenarios. A common weakness of this correlation technique is that it requires specific knowledge about the attack strategies that a potential attacker could follow in order to discover a vulnerability of some Cloud component. Moreover, for each intermediate attack, it is difficult to define the prerequisites and the possible consequences. Thus, the major limitation of this technique is that it cannot correlate unknown attacks, since their prerequisites and consequences are not defined. Furthermore, for a multi-step attack, there is usually a time gap between the execution of two consecutive stages. Therefore, during a training phase it would be necessary to extract the time duration between the steps of the attack scenarios, and use the results as thresholds in the framework to estimate the time required for observing the next stage of a specific attack scenario. On the other hand, it is hard to specify quantitative time constraints, because the time gaps between the stages may vary a lot, depending on how hurried the attacker is. Moreover, attack profiles to be used for training are hard to find. Therefore, in this work, alarms that cannot be merged into any scenario within a deadline are provided to the administrator individually. In future work, we will aim at defining approaches to automatically estimate the most probable attack scenario in progress in the Cloud, in order to activate a reaction before the attacker achieves his/her final objective.


References

[1] S. Ricciardi, D. Careglio, G. Santos-Boada, J. Sole-Pareta, U. Fiore, and F. Palmieri. Saving Energy in Data Center Infrastructures. In Proc. of the 1st International Conference on Data Compression, Communications and Processing, 2011, pp. 265-270.
[2] V. Holub, T. Parsons, J. Murphy, and P. O'Sullivan. Scalable Run-Time Correlation Engine for Monitoring in a Cloud Computing Environment. In Proc. of the 17th IEEE Int. Conf. on the Engineering of Computer Based Systems, 2010, pp. 29-38.
[3] S. Ramgovind, M. Eloff, and E. Smith. The Management of Security in Cloud Computing. In Proc. of the Int. Conf. on Information Security for South Africa, 2010.
[4] K. Schulter. Intrusion Detection for Grid and Cloud Computing. IEEE IT Professional Journal, July 2010.
[5] R. Zhang, W. Xie, W. Qian, and A. Zhou. Security and Privacy in Cloud Computing: A Survey. In Proc. of the 6th Int. Conf. on Semantics Knowledge and Grid, Nov. 2010, pp. 105-112.
[6] F. Palmieri, S. Ricciardi, and U. Fiore. Evaluating Network-Based DoS Attacks under the Energy Consumption Perspective: New Security Issues in the Coming Green ICT Area. In Proc. of the Int. Conf. on Broadband and Wireless Computing, Communication and Applications (BWCCA), Oct. 2011, pp. 374-379.
[7] F. Cheng and C. Meinel. Intrusion Detection in the Cloud. In Proc. of the IEEE Int. Conf. on Dependable, Autonomic and Secure Computing, Dec. 2009, pp. 729-734.
[8] Chi-Chun Lo, Chun-Chieh Huang, and Joy Ku. A Cooperative Intrusion Detection System Framework for Cloud Computing Networks. In Proc. of the 39th Int. Conf. on Parallel Processing, Sep. 2010, pp. 280-284. IEEE CS Press.
[9] Min-Woo Park and Jung-Ho Eom. Multi-level Intrusion Detection System and Log Management in Cloud Computing. In Proc. of the 13th Int. Conf. on Advanced Communication Technology, Feb. 2011, pp. 552-555. IEEE CS Press.
[10] M. Ficco, S. Venticinque, and B. Di Martino. mOSAIC-based Intrusion Detection Framework for Cloud Computing. Springer-Verlag LNCS, vol. 7566, 2012, pp. 628-644.
[11] M. Ficco, M. Rak, and B. Di Martino. An Intrusion Detection Framework for Supporting SLA Assessment in Cloud Computing. In Proc. of the 4th International Conference on Computational Aspects of Social Networks, Nov. 2012, pp. 244-249. IEEE CS Press.
[12] D. Yu and D. Frincke. Alert Confidence Fusion in Intrusion Detection Systems with Extended Dempster-Shafer Theory. In Proc. of the 43rd ACM Southeast Regional Conference, vol. 2, May 2005, pp. 142-147.
[13] M. Haibin and G. Jian. Intrusion Alert Correlation based on D-S Evidence Theory. In Proc. of the 2nd Int. Conf. on Communications and Networking, Aug. 2007, pp. 377-381. IEEE CS Press.
[14] B. Morin and H. Debar. Correlation of Intrusion Symptoms: an Application of Chronicles. In Proc. of the 6th Int. Conf. on Recent Advances in Intrusion Detection (RAID'03).
[15] F. Valeur, G. Vigna, and A. Kemmerer. A Comprehensive Approach to Intrusion Detection Alert Correlation. IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 3, July 2004, pp. 146-169.
[16] Jie Ma, Zhi-tang Li, and Wei-ming Li. Real-Time Alert Stream Clustering and Correlation for Discovering Attack Strategies. In Proc. of the 5th ACM International Conference on Fuzzy Systems and Knowledge Discovery, Oct. 2008. ACM Press.
[17] M. Ficco, L. Coppolino, and L. Romano. A Weight-Based Symptom Correlation Approach to SQL Injection Attacks. In Proc. of the 4th Latin-American Symposium on Dependable Computing, Sep. 2009, pp. 9-16. ACM Press.
[18] M. Ficco and M. Rak. Intrusion Tolerance of Stealth DoS Attacks to Web Services. In Proc. of the Int. Conf. on Information Security and Privacy, Springer-Verlag LNCS, vol. 376, 2012, pp. 579-584.
[19] A. Nickelsen, J. Gronbak, T. Renier, and H.P. Schwefel. Probabilistic Network Fault-Diagnosis Using Cross-Layer Observations. In Proc. of the International Conference on Advanced Information Networking and Applications, 2009, pp. 225-232. IEEE CS Press.
[20] M. Ficco and M. Rak. Intrusion Tolerant Approach for Denial of Service Attacks to Web Services. In Proc. of the 1st Int. Conf. on Data Compression, Communication, and Processing, 2011, pp. 285-292. IEEE CS Press.
[21] S. Cheung, U. Lindqvist, and M. W. Fong. Modelling Multistep Cyber Attacks for Scenario Recognition. In Proc. of the DARPA Information Survivability Conference and Exposition, 2003, pp. 284-292.
[22] M. Ficco and M. Rak. Intrusion Tolerance as a Service: a SLA-based Solution. In Proc. of the 2nd Int. Conf. on Cloud Computing and Services Science, Apr. 2012, pp. 375-384. IEEE CS Press.
[23] M. Ficco and M. Rak. Intrusion Tolerance in Cloud Applications: the mOSAIC Approach. In Proc. of the 6th Int. Conf. on Complex, Intelligent, and Software Intensive Systems, 2012.
[24] V. Artem and H. Jun. Security Attack Ontology for Web Services. In Proc. of the 2nd Int. Conf. on Semantics, Knowledge, and Grid, 2006, pp. 42-49. IEEE CS Press.
[25] D. Curry and H. Debar. Intrusion Detection Message Exchange Format: Extensible Markup Language (XML) Document Type Definition, draft-ietf-idwg-idmef-xml-10.txt, Jan. 2003.
[26] R. Vaarandi. SEC - a Lightweight Event Correlation Tool. In Proc. of the IEEE Workshop on IP Operations and Management, Dec. 2002, pp. 111-115. IEEE CS Press.
[27] M. Ficco, A. Daidone, L. Coppolino, L. Romano, and A. Bondavalli. An Event Correlation Approach for Fault Diagnosis in SCADA Infrastructures. In Proc. of the 13th European Workshop on Dependable Computing, May 2011, pp. 15-20. ACM Press.
[28] Borealis: Distributed Stream Processing Engine. Brandeis University, Brown University, and the Massachusetts Institute of Technology. Available at: http://www.cs.brown.edu/research/borealis/public/
[29] Prelude, a Hybrid Intrusion Detection System. Available at: http://www.prelude-ids.com
[30] F. Cuppens and R. Ortalo. LAMBDA: A Language to Model a Database for Detection of Attacks. In Proc. of the 3rd International Symposium on Recent Advances in Intrusion Detection (RAID 2000), LNCS, Springer-Verlag Heidelberg, 2000, pp. 197-216.
[31] R. Vaarandi and K. Podins. Network IDS Alert Classification with Frequent Itemset Mining and Data Clustering. In Proc. of the International Conference on Network and Service Management, 2010, pp. 451-456. IEEE CS Press.
