An Approach to Model Network Exploitations Using Exploitation Graphs


An Approach to Model Network Exploitations Using Exploitation Graphs Wei Li, Rayford B. Vaughn and Yoginder S. Dandass SIMULATION 2006; 82; 523 DOI: 10.1177/0037549706072046 The online version of this article can be found at: http://sim.sagepub.com/cgi/content/abstract/82/8/523

Published by: http://www.sagepublications.com

On behalf of:

Society for Modeling and Simulation International (SCS)




Wei Li
Graduate School of Computer and Information Sciences
Nova Southeastern University
3301 College Avenue
Fort Lauderdale, FL 33314
[email protected]

Rayford B. Vaughn
Yoginder S. Dandass
Department of Computer Science and Engineering
Mississippi State University
Box 9637, Mississippi State, MS 39762

In this article, a modeling process is defined to address challenges in analyzing attack scenarios and mitigating vulnerabilities in networked environments. Known system vulnerability data, system configuration data, and vulnerability scanner results are combined to create exploitation graphs (e-graphs) that represent attack scenarios. Experiments carried out in a cluster computing environment showed the usefulness of the proposed techniques in providing in-depth attack scenario analyses for security engineering. Critical vulnerabilities can be identified by employing graph algorithms. Several factors were used to measure the difficulty of executing an attack. A cost/benefit analysis was used for more accurate quantitative analysis of attack scenarios. The authors also show how attack scenario analyses can better inform the deployment of security products and the design of network topologies.

Keywords: Exploitation graph (e-graph), vulnerability graph, graph-based modeling, computer security

SIMULATION, Vol. 82, Issue 8, August 2006 523-541 © 2006 The Society for Modeling and Simulation International DOI: 10.1177/0037549706072046

1. Introduction

Security engineers face a challenging world today because of the increase in complexity of network-based intrusions/attacks and in the availability of tools used by attackers. Unfortunately, current vulnerability assessment (VA) activities do not fully address these challenges. For example, one common VA activity is to scan multiple hosts in a networked environment to provide a comprehensive list of vulnerabilities. Because of various limitations, it is usually infeasible to remove all these vulnerabilities, even for organizations with well-defined security policies and small-sized networks. In such cases, an understanding of the logical relationships between vulnerabilities is important to perform an in-depth VA analysis that will identify the most critical vulnerabilities to be eliminated.

Computer security researchers have been investigating techniques to study intrusive behaviors in computer systems and networks for several decades. Attack modeling is one such technique. It refers to methods used to formally model attack scenarios based on information gathered from different sources. These sources may include system configuration, sensor data, audit/log files, known attack scenarios, and other security-related information generally available to system administrators. Based on resulting models, attack scenarios can be simulated, recorded, studied, detected, and reacted to. Attack modeling has been used in areas such as vulnerability assessment [1], red team penetration testing [2], and intrusion alert correlation [3].

Graph-based approaches appear to be well suited for modeling attacks in networked environments. The fundamental idea behind graph-based approaches is that attacks can be viewed as chains of activities/events performed by attacker(s). These activities/events are then used as a basis for constructing directed graphs. Nodes within these graphs represent a combination of security-related system states such as changes to system files, levels of user privileges, or trust relationships between different hosts. Edges connecting the nodes represent potential steps performed by the attackers, where the outcome of each step depends on the outcome of previous steps. Lengths of attack paths usually depend on the intention of attackers and the difficulty of performing specific attacks.

Several graph-based attack modeling approaches have been proposed in recent years. For example, the attack tree approach uses tree structures to represent attacks [4]. The alert correlation approach uses alerts generated by multiple security sensors to build scenario graphs that represent intrusive activities [3, 5, 6]. Some graph-based approaches use model-checking techniques to build graphs and simulate attack scenarios [7-10]. Other approaches use preconditions and postconditions [11-13] or attack languages [6, 14, 15] to model attack behaviors.

Although these approaches are useful in representing ongoing or hypothesized attack scenarios, they also introduce several implementation challenges. First, these approaches must address information at several levels of granularity in their models. This is because attackers not only take advantage of vulnerabilities to attack a system but also make use of available information at different granularities (e.g., network topology, trust relationships between hosts, detailed system configuration information, and software and hardware usage). Combining such heterogeneous information into a single graph-based model is, in itself, a challenging task.

Second, an effective graph structure is needed to accommodate information related to attack scenarios. There are several options one can choose from when representing attack scenarios using graphs. For example, nodes inside graphs can represent vulnerabilities, alert information, changes of important system files, or user privileges.
Edges can represent exploits used during different stages of an attack, intrusions that have triggered alerts, methods used to change system files, or operations used to elevate user privileges. The graph structure used in modeling greatly affects the graph-building process and the techniques used to analyze attack scenarios.

Third, an effective graph abstraction technique is needed to simplify the resulting graphs. The graphs should provide useful information to system administrators in a succinct manner.

Currently, no graph-based attack modeling method achieves all of these objectives. In this article, we describe an approach that addresses the first two objectives. Solutions to the third objective can be found in our previously published work [16].

A basic assumption of this research is that most system attacks exploit system vulnerabilities. In our approach, known system vulnerability data, system configuration data, and vulnerability scanner results are combined in a systematic manner to create an exploitation graph (e-graph) representing attack scenarios. This modeling process consists of two primary steps. The first step is to create a knowledge base of vulnerability graphs (v-graphs) from known system vulnerabilities. These vulnerabilities are represented using preconditions and postconditions. To address granularity issues associated with vulnerability information, we categorize preconditions and postconditions into groups and encode vulnerabilities using a limited set of attributes. A sample of vulnerability data is extracted from the widely used Common Vulnerabilities and Exposures (CVE) [17] data entries. A set of corresponding v-graphs is then built for this data sample. These graphs are constructed to show preconditions for a single vulnerability as well as postconditions that occur after exploitation of a single vulnerability.

The second step in our approach involves associating multiple v-graphs with system administrator data (that define the overall configuration of the system) and with vulnerability scanner data to create a system-specific e-graph. We describe system configurations using a limited set of attributes to demonstrate the technique.

Experiments show that our approach is effective in discovering security vulnerabilities in high-performance cluster computing environments. Furthermore, it appears that e-graph models are useful in providing in-depth vulnerability assessments, in general. For example, analysis of an exploitation graph can help answer questions such as the following: given an initial system state and an attack goal, what is the minimum set of vulnerabilities that could be removed to prevent the attack? Answers to these questions are useful in designing defensive mechanisms and in performing a return-on-investment (ROI) analysis.

The approach described in this article is a “static analysis” method in the sense that defenders model potential attack scenarios before the attacks actually happen. Modeling of complex attack scenarios is achievable because the modeling method proposed uses knowledge similar to that used by the attackers.
The difference between a defender and an attacker is that the attacker only needs to find one feasible path to complete an attack, while the defender needs to discover all possible paths and take appropriate measures to block them. The e-graph is a tool that can facilitate this defensive process. The advantage of static modeling is that system administrators can take advantage of data provided by vulnerability scanners used on their systems and combine them with other known attributes of the network to provide a more complete view of their vulnerability status.

2. Related Work

Over the past two decades, a variety of graph-based attack modeling techniques have been proposed. In the attack tree approach proposed by Schneier [4], AND-OR tree structures are used to model attacks. Inside an attack tree, nodes represent attack goals and subgoals. Logical structures, such as AND or OR nodes, are used to represent the relationship between low-level events. If all of the low-level events are needed for the top event to occur, these events are grouped under an AND node. If any of the low-level events can trigger the top event to occur, these events are grouped under an OR node. The attack tree approach has been shown to be successful in analyzing real-world attacks, such as an attack against Pretty Good Privacy (PGP). PGP is a set of programs used for secure communication based on the public/private key encryption scheme [4]. This approach has also been used in defense experiments such as complex penetration testing and organized red team work [2].

Another attack graph approach was proposed by Swiler et al. [11, 12]. Inside the attack graphs, nodes represent possible attack stages, and edges represent changes of state caused by hostile behaviors. Inputs for this approach include configuration files, attacker profiles, and attack templates. All nodes and edges are generated according to predefined templates, with only the templates applicable to the environment being instantiated [11, 12]. This approach is similar to ours because it also uses graph structures to model attacks. However, it does not directly handle vulnerability data (CVE data) and scanner data, as we do in our research.

Due to the complexity inherent in the graph generation process, model checkers such as Symbolic Model Verifier (SMV) have been used for generating attack graphs [8-10]. In this approach, a network is modeled as a finite state automaton (FSA) where state transitions represent atomic attacks. A security policy is encoded and checked against the FSA. If the security policy fails, an attack graph is generated to show all counterexamples (indications of possible attack paths) [8-10]. Using model checkers, the graph generation process can be automated. However, the generation process for attack graphs is still computationally expensive because of the state explosion problem (see footnote 1). To reduce the complexity of attack graphs, Ammann et al. [7] proposed a solution based on the monotonicity property of exploitations. This property states that “the pre-condition of a given exploit is never invalidated by the successful application of another exploit” [7].
Based on this assumption, attack graphs can be generated efficiently using a goal-based search. These approaches differ from our approach in two important ways. One difference is that there is no vulnerability database constructed according to pre- and postconditions as in our research. Another difference is that these approaches do not target the modeling of high-performance computing facilities as our attack modeling technique does.

Footnote 1. In the area of formal verification, the state explosion problem means that the size of the search space grows exponentially as the size of the model to be checked grows. This problem leads to the exhaustion of storage available to verification tools [18].

Specialized attack modeling languages have also been used in attack modeling. Templeton and Levitt [6] proposed a “requires/provides” model and used an attack specification language, JIGSAW, to provide a set of tools and specifications to describe attacks. Cheung, Lindqvist, and Fong [14] proposed a language, the Correlated Attack Modeling Language (CAML), to model multistep attack scenarios. CAML uses a library of predicates that serve as vocabularies to describe the properties of system states and events related to attacks [14]. Ramakrishnan and Sekar [15] proposed a Prolog-like language, similar to CAML, for representing abstract and accurate models of behaviors within an operating system (e.g., UNIX). While these approaches are semantically similar to the attributes used in our vulnerability database, they do not handle CVE data. Also, they do not correlate data with system-specific information to create attack graphs.

Alert correlation techniques have also been used for attack scenario construction. These approaches work by constructing attack scenarios from alerts generated by multiple sensors to facilitate better intrusion analysis [3, 5]. It should be noted that alert correlation is a parallel research area to the graph-based modeling approaches. The difference is in the data used to model attacks and their subsequent interpretations. Alert correlation makes use of alert data and focuses on “what has happened” according to data reported by intrusion detection systems (IDSs). Graph-based attack modeling makes use of network configuration and vulnerability data, with the focus on “what may happen” inside networked environments. These two approaches can be used together in attack scenario analysis.

One recent approach is the work by Noel et al. at George Mason University [13, 19, 20]. A topological vulnerability analysis (TVA) tool is used to perform impact analysis of various network configurations on overall network security. There are three major components in this architecture. The first part is a database composed of descriptions of vulnerabilities. The second part is a description of a network in which information is discovered using open-source tools. The third part is a specification of an attack scenario that includes information about initial conditions, attack target, and configuration changes.
This tool can be used to generate exploit dependency graphs that represent all attack paths related to specific attack goals [13]. It should be noted that these graphs are generated based on the monotonicity assumption proposed by Ammann et al. [7]. This approach is further extended by an effort that enables system administrators to interactively reduce the complexity of the exploit dependency graphs. This complexity reduction technique is based on a set of predefined aggregation rules that correspond to different network elements at different levels of abstraction [19].

Our approach differs from the TVA approach as follows. First, we use a vulnerability template to facilitate the definition of single vulnerabilities. This template can be used to represent a large number of vulnerabilities according to predefined attributes and is stored in a database. When a new vulnerability is discovered, it can easily be added as a new entry in the vulnerability database. Second, our approach focuses on modeling vulnerabilities in the domain of high-performance computing (HPC) clusters to address their particular security requirements [16, 21, 22]. Experimental results show that in-depth VA analysis is useful in such computing environments. Third, we develop efficient graph simplification techniques to achieve a simplified representation of attack scenarios [16]. In general, our approach is distinct from previous approaches in that it is more specific to realistic vulnerability data and operational systems. Our approach also focuses more on improving vulnerability scanner capabilities and devising vulnerability mitigation strategies.

Figure 1. An overview of the e-graph approach. Components shown: Vulnerability Base, Vulnerability Scanner, System Information Gathering Tool, Graph-Generation Module, E-Graphs, Graph-Simplification Module, Simplified E-Graphs, Graph-Drawing Module.

3. Overview

Figure 1 provides an overview of the techniques used for creating exploitation graphs specific to a network system. Vulnerability data (e.g., CVE [17], Bugtraq [23], CERT Advisories [24], National Vulnerability Database [25], and SANS Top 20 [26]) are stored in a vulnerability knowledge database in which each entry is represented using preconditions and postconditions. To facilitate the definition of customized vulnerabilities, we use a template to categorize these conditions so that end users have the flexibility to add new rules whenever necessary [16, 27]. In our template, most of the preconditions and postconditions can be represented using discrete values (e.g., Boolean values or integers). Other conditions (e.g., application name and kernel version) are represented using strings. We generate e-graphs by associating these data with vulnerability data discovered using network scanners (we use the STAT scanner from Harris Corporation [1]), combined with system-specific information (e.g., host connectivity, open ports, and security products). The data can also be simplified and compact graphs can be generated using our simplification module. Two-way conversion between simplified and original e-graphs can also be performed to facilitate useful security analyses. The graphs may be drawn using open-source software, such as Graphviz [28]. The graph generation process can be described as follows:

• Step 1. Create a knowledge base of vulnerability graphs (v-graphs) from known system vulnerabilities. These vulnerabilities are represented using preconditions and postconditions. V-graphs show preconditions for a single vulnerability as well as postconditions that occur after exploitation of a single vulnerability.
• Step 2. Instantiate v-graphs with specific exploitations. This step is necessary because a single vulnerability may lead to several exploitations.
• Step 3. Instantiate exploitations with specific host information. This step is necessary because a single exploitation may be performed on multiple hosts.
• Step 4. Associate multiple exploitations with system administrator data and with vulnerability scanner data to create a system-specific e-graph. Inside an e-graph, nodes represent instantiated exploitations and edges represent system states that make exploitations effective.
• Step 5. Apply graph algorithms to model potential defensive strategies. This step is necessary because each e-graph is specific to a problem domain (e.g., prioritizing the removal of vulnerabilities). Graph algorithms should be chosen according to this predefined domain. Trade-offs are usually needed in this step because of the potentially high computational cost of finding optimal solutions.
• Step 6. Apply graph simplification techniques (optional).
• Step 7. Present resulting e-graphs.




Figure 2. A vulnerability example: CVE-2002-0639 [17]. Preconditions: Remote Access; Version of OpenSSH is between 2.9.9 and 3.3; SSHd is running. Vulnerability: CVE-2002-0639. Postcondition: Root Level Access.

4. Creation of the Vulnerability Database

4.1 Data Source

The first step in our work is the creation of a vulnerability database. This database serves as a searchable data source that facilitates attack scenario construction. Each vulnerability has a data entry in the vulnerability database, and its preconditions and postconditions are listed using a set of predefined attributes. For the construction of our vulnerability database, we chose to use CVE from the MITRE Corporation because of its widespread acceptance and popularity. However, CVE is not the only possible source of vulnerability information. Others include the SANS top 20 vulnerability list [26], Bugtraq entries [23], CERT advisories [24], and vulnerability information from vendors of software/hardware products.

Figure 2 is an example of preconditions and postconditions developed for a CVE entry. This example shows CVE-2002-0639, an integer-overflow vulnerability in OpenSSH 2.9.9 through 3.3 that allows remote attackers to get root-level access on the target host [17]. A simplified representation of this vulnerability indicates that there are three preconditions necessary to make the vulnerability exploitable: the attacker must have remote access to the target host, the version of OpenSSH must be between 2.9.9 and 3.3, and sshd must be running. The single postcondition shows that the attacker gains root-level access on the target host.

The process of finding and combining preconditions and postconditions for vulnerabilities to represent attack scenarios requires substantial effort and may be error prone. For example, additional preconditions could be added to the example in Figure 2, such as “Target host is running a Linux/UNIX operating system.” A precondition can also be split into several preconditions. For example, the node “Version of OpenSSH is between 2.9.9 and 3.3” can be split into two nodes, “OpenSSH is installed on target host” and “Version of OpenSSH is between 2.9.9 and 3.3.” In the following section, we partially address these issues and show an effective categorization process for preconditions and postconditions that can be used to facilitate the definition of v-graphs.
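To make the v-graph idea concrete, the following is a minimal sketch of how the CVE-2002-0639 example from Figure 2 might be held as data. The class name `VGraph`, the helper functions, and the free-text condition labels are assumptions made for illustration; they are not the authors' database encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VGraph:
    """One vulnerability with its preconditions and postconditions (hypothetical layout)."""
    cve_id: str
    preconditions: tuple   # labels that must all hold before exploitation
    postconditions: tuple  # labels that hold after a successful exploitation


# Figure 2 as data. The label strings are illustrative only.
CVE_2002_0639 = VGraph(
    cve_id="CVE-2002-0639",
    preconditions=(
        "remote access",
        "OpenSSH version between 2.9.9 and 3.3",
        "sshd is running",
    ),
    postconditions=("root-level access",),
)


def unmet_preconditions(vgraph, facts):
    """Return the preconditions of `vgraph` that are missing from `facts`."""
    return [p for p in vgraph.preconditions if p not in facts]


def exploit(vgraph, facts):
    """Return the facts after exploitation; unchanged if a precondition is unmet."""
    if unmet_preconditions(vgraph, facts):
        return set(facts)
    return set(facts) | set(vgraph.postconditions)
```

Calling `unmet_preconditions` with a partial set of facts shows which conditions still block the exploitation, which is the information a defender would act on.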

The goals of the CVE project include the creation of standardized names for computer vulnerabilities and security exposures and enabling the sharing of data across separate vulnerability databases and security tools [17]. In an effort to organize CVE data entries, researchers at the National Institute of Standards and Technology (NIST) placed vulnerability data into a “metabase” called ICAT [29] (see footnote 2). The ICAT metabase is a “searchable index leading one to vulnerability resources and patch information” [29]. ICAT has been widely used by system administrators because it facilitates the searching, organizing, and studying of known vulnerabilities. In our work, ICAT plays an important role during the construction of v-graphs because it assists in the acquisition of detailed vulnerability information as well as in the definition of preconditions and postconditions of vulnerabilities. ICAT has since evolved into the National Vulnerability Database (NVD) [25].

Footnote 2. ICAT was initially a name used by a NIST project intended to create a database of vulnerabilities. As the project shifted its focus to building a searchable index, the name no longer has a specific definition and does not officially stand for anything [29].

4.2 Categorization of Preconditions and Postconditions

Table 1 shows the categories of preconditions and postconditions. The table contains five columns. The first column indicates whether the (sub)category is a precondition or a postcondition. The second column shows categories and subcategories. The third column shows the attribute name for the corresponding (sub)category. These attributes will be represented using v-graphs. Column 4 provides the data type for each attribute, and column 5 gives an example for each (sub)category. In Table 1, there are four categories of preconditions, four categories of postconditions, and several subcategories.

Table 1. Categories of preconditions and postconditions

Condition       Category (subcategory)      Name              Type     Example
Preconditions   Operating system
                  Name                      OS_name           String   OS_name = “RedHat Linux”
                  Version                   OS_version        String   OS_version = “7.1”
                  Architecture              OS_archi          String   OS_archi = “ix86”
                  Kernel                    OS_kernel         String   OS_kernel = “2.4.2” or “2.4.1”
                Application
                  Name                      App_name          String   App_name = “openssh”
                  Version                   App_version       String   App_version > “2.9.9” and App_version < “3.3”
                Access
                  Range                     Access_range      Boolean  Access_range = 1
                  User level                Access_level      Integer  Access_level = 0
                Additional
                  Open port(s)              Addition_port     Integer  Addition_port = 22
                  Running application(s)    Addition_runapp   String   Addition_runapp = “sshd”
                  Other                     Addition_other    String   Addition_other = “listen_server (rpc.mountd)”
Postconditions  Availability                Availability      Boolean  Availability = 1
                Confidentiality             Confidentiality   Boolean  Confidentiality = 0
                Integrity                   Integrity         Boolean  Integrity = 1
                Security protection
                  Super user access         SecPro_superuser  Boolean  SecPro_superuser = 1
                  User access               SecPro_user       Boolean  SecPro_user = 0
                  Other access              SecPro_other      Boolean  SecPro_other = 1

Note that different values of most preconditions can be compared using string-matching algorithms. Values of two preconditions (i.e., version of the operating system and version of the application) can be compared using mathematical operators (e.g., <, >, =, ≤, and ≥). These operators facilitate the definition of most vulnerabilities because vulnerabilities typically exist only on specific versions of operating systems (or applications). For example, we define 2.8 < 2.9.9 in terms of versions of applications. For each precondition, several conditions can be combined using logical operators such as “AND” and “OR.” For example, the kernel version precondition can be defined as OS_kernel = “2.4.2” or “2.4.1”. These operations facilitate the definition of complex requirements of preconditions.

The definition of postconditions is simpler. Note that the postcondition (sub)categories are not necessarily mutually exclusive because an attacker might gain several outcomes upon successful exploitation of a single vulnerability. To map the postconditions of one vulnerability to the preconditions of another, a mapping process is needed; this mapping process is discussed in more detail in the next section.

The classification scheme introduced here is concerned with the prevention of exploitations in operational systems. There are other classification schemes based on different, but related, concerns. One approach is based on classifying the origin of vulnerabilities, such as program errors (e.g., input validation error, access validation error, exceptional condition handling error, environmental error, configuration error, race condition, and design error), during the software design phase [29]. Another approach classifies the manifestation of vulnerabilities (e.g., attack signatures) [30] to facilitate misuse intrusion detection. In our approach, we classify the origin of vulnerabilities as well as the manifestation of vulnerabilities. These two aspects are similar to those defined by preconditions and postconditions. For this research, we need the granularity of information provided to be at a level that is useful to vulnerability scanners such that a system administrator can know the general vulnerability status of the system (but not necessarily discover the cause of the vulnerability). As a by-product, we envision our approach will benefit the reasoning capability of vulnerability scanners and will enable a more in-depth analysis of attack scenarios.

Based on the categorization scheme presented, we construct a vulnerability database using ICAT information associated with a set of Linux vulnerabilities found in our operational high-performance cluster system called Microcosm [21, 22]. Additional specifications of this cluster are provided later in this article. Currently, we have about 100 data entries coded in the vulnerability database. All these vulnerabilities are Linux/Unix CVE-listed vulnerabilities. Currently, the vulnerability database we have described is constructed manually. Techniques for combining these v-graphs with system configuration data are introduced in the next section.
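The version comparisons described for Table 1 (for example, 2.8 < 2.9.9) cannot be done with plain string comparison, because "2.10" sorts before "2.9" as a string. The sketch below shows one way such comparisons and the AND/OR combinations could be evaluated. The function names and the digit-based parsing rule are assumptions for illustration, not the authors' implementation.

```python
import operator
import re

# Comparison operators named in the text.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
}


def parse_version(text):
    """Turn a dotted version such as "2.9.9" into the tuple (2, 9, 9).

    Every run of digits becomes one component, so suffixes such as "p1"
    are not treated specially (a simplification of this sketch).
    """
    return tuple(int(part) for part in re.findall(r"\d+", text))


def compare_versions(left, op, right):
    """Evaluate `left op right` on the parsed version tuples."""
    return OPERATORS[op](parse_version(left), parse_version(right))


def app_version_matches(version):
    """Table 1 example (AND): App_version > "2.9.9" and App_version < "3.3"."""
    return compare_versions(version, ">", "2.9.9") and compare_versions(version, "<", "3.3")


def kernel_matches(kernel):
    """Table 1 example (OR): OS_kernel = "2.4.2" or "2.4.1"."""
    return any(compare_versions(kernel, "=", allowed) for allowed in ("2.4.2", "2.4.1"))
```

Tuple comparison treats (3, 3) as smaller than (3, 3, 0), so a real encoding would need a rule for versions written with different numbers of components.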

Figure 3. A sample network. Elements shown: Attacker, Host1, Server, Local Network.

5. Associating V-Graphs with System Configuration Information to Generate E-Graphs

Once the vulnerability database is built, we associate system configuration information and scanner data with entries in the vulnerability database to create a model of possible attack scenarios. This section discusses the techniques that can be used, followed by the algorithms and their complexity analysis.

5.1 Introduction

The vulnerability database previously discussed contains data entries of single vulnerabilities that can be exploited in an operational system. To understand how different vulnerabilities are exploited in specific systems, these vulnerabilities are better viewed when associated with additional information (e.g., system configuration, host connectivity, and open ports). We show how this approach works and why it is useful through the example shown in Figure 3.

Suppose a small network consists of two host machines, Host 1 and Server (as shown in Figure 3), and an attacker is connecting to the local network via Host 1 using the Internet (i.e., the attacker does not have direct access to Server). Table 2 lists the vulnerabilities discovered on Host 1 and Server using a vulnerability scanner. The column “Exploitation Outcome” in Table 2 is an indication of outcomes resulting from the vulnerabilities. This column also partially represents the postconditions of vulnerabilities. For this research, we define exploitations as instantiations of vulnerabilities with specific host information (e.g., host ID). Possible values in column 3 include remote-to-user, remote-to-root, and user-to-root. Remote-to-user exploitation grants a remote attacker user-level privilege on the victim host. Remote-to-root exploitation grants a remote attacker root-level privilege on the victim host. User-to-root exploitation grants a local user root-level privilege on the victim host [31]. These categories are useful to explicitly describe changes of user privilege during exploitations and are drawn from the 1999 DARPA intrusion detection evaluation [31]. Host 1 and Server each have three vulnerabilities: one remote-to-user, one remote-to-root, and one user-to-root vulnerability. Note that all vulnerabilities listed have outcomes related to user privilege escalation (or elevation). Our experience shows that these vulnerabilities have been exploited in many attack scenarios and, therefore, are an important subset of CVE entries.

Figure 4 shows how different vulnerabilities can be associated to form an e-graph. Each of the subfigures in Figure 4 depicts an attack scenario. For convenience, within each subfigure, a hypothesized initial state, s0, and one of two final states, sf or sf', are used. These states have the following meanings.

• s0: The attacker has full access only on the attacker’s host.
• sf: The attacker has root-level access on Host 1.
• sf': The attacker has user-level access on Server.

Figure 4a shows an attack scenario that consists of two exploitations. The attacker first exploits vulnerability CVE-2002-0836 on Host 1 to obtain user-level access to Host 1. Next, the attacker exploits vulnerability CVE-2002-0178 on Host 1 to obtain root-level access to Host 1. Note that the second exploitation cannot be performed without the first step. This is because one of the preconditions of CVE-2002-0178 is that “attacker has user-level access on Host 1,” which can be formally represented as (Access_level=1) and (Access_range=1) using the template discussed in the previous section. Similarly, one of the postconditions of CVE-2002-0836 is that “attacker has user-level access on Host 1,” which can be formally represented as SecPro_user=1 using our template. In this case, we say that the postconditions of CVE-2002-0836 match the preconditions of CVE-2002-0178. We can also say that attribute SecPro_user (i.e., a postcondition) can be mapped to attribute Access_range (i.e., a precondition). In other words, for these two vulnerabilities to be chained together in a series of exploitations, one of the postconditions of CVE-2002-0836 must be the same as one of the preconditions of CVE-2002-0178. In this example, at the beginning of the attack process, only one precondition of the exploitation represented by s2 is unsatisfied. Therefore, once the exploitation represented by s1 succeeds, all preconditions of s2 are satisfied.
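The matching of a postcondition to a precondition in Figure 4a can be sketched as a small lookup. The mapping table `POST_TO_PRE`, the dictionary layout of an exploitation, and the value Access_level=2 for root access are assumptions made for this illustration; only Access_level=1 for user-level access and SecPro_user=1 come from the text. The sketch covers same-host access preconditions only and ignores host connectivity.

```python
# Hypothetical mapping from a satisfied postcondition attribute to the
# precondition attributes it grants on the same host.
# Access_level=1 (user) follows the text; Access_level=2 (root) is assumed.
POST_TO_PRE = {
    "SecPro_user": {"Access_level": 1, "Access_range": 1},
    "SecPro_superuser": {"Access_level": 2, "Access_range": 1},
}


def granted_state(host, postconditions):
    """Translate postconditions gained on `host` into precondition-style attributes."""
    state = {}
    for name, value in postconditions.items():
        if value != 1:
            continue  # the postcondition was not obtained
        for attr, attr_value in POST_TO_PRE.get(name, {}).items():
            key = (host, attr)
            state[key] = max(state.get(key, 0), attr_value)
    return state


def can_follow(first, second):
    """True if exploiting `first` satisfies every access precondition of `second`.

    A higher granted value is assumed to satisfy a lower requirement.
    """
    state = granted_state(first["host"], first["post"])
    return all(
        state.get((second["host"], attr), 0) >= needed
        for attr, needed in second["pre"].items()
    )


# The two exploitations of Figure 4a, encoded with the hypothetical layout.
s1 = {"cve": "CVE-2002-0836", "host": "Host1", "pre": {}, "post": {"SecPro_user": 1}}
s2 = {
    "cve": "CVE-2002-0178",
    "host": "Host1",
    "pre": {"Access_level": 1, "Access_range": 1},
    "post": {"SecPro_superuser": 1},
}
```

With these definitions, `can_follow(s1, s2)` holds, while the same check against a copy of `s2` placed on another host fails because no access was gained there.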
Figure 4b shows a similar scenario in which the attacker exploits different vulnerabilities to acquire user-level access to the Server. Note that, as in Figure 4a, the postcondition “attacker has user-level access on Host 1” of CVE-2002-0836 is one of the preconditions of CVE-2002-1378. Figure 4c shows a scenario in which two different paths can be followed to achieve the same attack goal as in Figure 4b. One path, s0−s1−s2−sf′, is the same as that shown



Table 2. Vulnerabilities for the sample network

Host | Vulnerability | Exploitation Outcome | Description
Host 1 | CVE-2002-0836 | Remote-to-user | A vulnerability in the tetex package allows remote attackers to execute arbitrary commands.
Host 1 | CAN-2002-0013 | Remote-to-root | Vulnerabilities in the SNMPv1 allow remote attackers to cause a denial of service or gain user-level privileges.
Host 1 | CVE-2002-0178 | User-to-root | A vulnerability in the sharutils package before 4.2.1 allows attackers to overwrite files or execute commands.
Server | CAN-2002-1378 | Remote-to-user | Buffer overflows in OpenLDAP 2.2.0 and earlier allow remote attackers to execute arbitrary code.
Server | CVE-2002-0391 | Remote-to-root | Integer overflow vulnerability in RPC servers allows remote attackers to execute arbitrary code.
Server | CVE-2002-0638 | User-to-root | A vulnerability in the util-linux package may allow local users to gain privileges via complex race conditions.

Figure 4. Simple exploitation graphs: (a) s0 → s1 (CVE-2002-0836 on Host 1) → s2 (CVE-2002-0178 on Host 1) → sf; (b) s0 → s1 (CVE-2002-0836 on Host 1) → s2 (CVE-2002-1378 on Server) → sf′; (c) as in (b), with an additional node s3 (CVE-2002-0013 on Host 1) between s1 and s2

in Figure 4b. The other path, s0−s1−s3−s2−sf′, represents a different attack scenario. Although the exploitation related to state s3 seems unnecessary for reaching the goal sf′, this scenario could happen in a real-world attack because attackers often fully compromise intermediate targets (i.e., obtain superuser privilege on victim hosts) before performing further attacks. An important implication of Figure 4 is that exploiting a vulnerability may require only one of several possible preceding exploitations. For example, the exploitation of s2 in Figure 4c requires only one previous exploitation (i.e., either the exploitation on s1 or the exploitation on s3). From this perspective, the relationship between different paths in an exploitation graph is an “OR” relationship. In other words, an “OR” relationship indicates the different options available to attackers, who can exploit different vulnerabilities to achieve a single goal. By contrast, an “AND” relationship represents the combination of exploitations that are all required to achieve a single attack goal. The examples in Figure 4 show how different exploitations can be modeled as chains of events in attack scenarios.

From these examples, we can see that given a set of initial states and final states, it is possible to construct an e-graph that includes all exploitable vulnerabilities within specific system configurations.

5.2 A Formal Definition of E-Graphs

Suppose there is a set, E, of exploitations derived from the vulnerabilities discovered using vulnerability scanners. The number of exploitations is generally larger than the number of vulnerabilities because a single vulnerability may lead to multiple exploitations. Also, there is a set, C, of system configuration information, such as host connectivity, open ports, versions of operating systems, and other software. The term state (or attribute) is used to denote the set C because it describes the status of system configurations before and after exploitations occur. Each ei ∈ E corresponds to a certain exploitation of a specific vulnerability. It may not be the case that every exploitation makes use of vulnerabilities; for example, the attacker may use brute-force tools to crack user passwords. The current




model, however, only represents attacks that exploit system vulnerabilities. This is based on the assumption that most compromises conducted by remote attackers take advantage of system vulnerabilities. For each ei ∈ E, pre(ei) and post(ei) denote the set of preconditions and the set of postconditions of ei. It is clear that pre(ei) ⊆ C and post(ei) ⊆ C. For any c ∈ C, c is satisfied when the value of c evaluates to true. This could happen either before the attack (e.g., based on versions of operating systems, software, or other configurations) or after the attack (e.g., when the attack changes the configuration of the target system). From this viewpoint, given a set of initial states, an attack consists of a series of exploitations that ends when some specific postconditions are satisfied. There are two special sets, INIT and GOAL. INIT is the set of initially satisfied system states, and GOAL is the set of system states that are satisfied after a successful attack. During the graph-building process, we use the monotonicity assumption proposed by Ammann et al. [7]. This assumption essentially means that the preconditions of an exploitation are never invalidated by performing another exploitation. Compared to attack-modeling approaches using model checkers [8, 9], this assumption reduces the size of the state space from exponential to polynomial [7]. We use nodes to represent exploitations and edges to represent state changes related to exploitations. The result of this process is a layered structure of network exploitations, in which the execution of an exploitation in a layer depends on exploitations in the lower layers. This process starts with an empty set of exploitations at the lowest layer, which means no exploitation has been performed before an attack. This process proceeds by checking preconditions of nonexecuted exploitations against available system states. Newly available exploitations and newly satisfied system states are added into new layers.
This incremental process ends when there are no newly available exploitations or the goal state is reached. One of the most important implications of this approach is that there is no “backtracking” of exploitations. Also, there is no edge from higher-layer exploitations to lower-layer exploitations. Due to the assumption of monotonicity, complexity problems incurred by state explosion and loops can be avoided [16]. This algorithm is given as LabelExploitations(INIT, GOAL, E, C). The result of this algorithm is an implicit graph structure with all the dependencies of exploitations shown on the labels of exploitations. This structure can be easily rendered using graph-drawing tools in a trivial amount of time. The original approach by Ammann et al. [7] has a time complexity of O(|C|²|E|), where |C| is the number of attributes and |E| is the number of exploitations. Our algorithm has a complexity of O(|C||E|²), derived as follows: first, the outer loop of the algorithm runs at most |E| times; second, each iteration needs at most |C||E| steps. So the overall complexity is O(|C||E|²) [16]. While the complexities of these two approaches may seem similar, in real networks, the number of system attributes (i.e., the

value of |C|) is generally much greater than the number of available exploitations (i.e., the value of |E|). Therefore, our approach is more computationally economical, and our experiments support this conclusion. Furthermore, because the complexity of our algorithm dominates the entire graph generation process, our graph construction process also has a complexity of O(|C||E|²).

Algorithm: LabelExploitations(INIT, GOAL, E, C)

INPUT:
1. A set E of exploitations;
2. A set C of system states;
3. For each ei ∈ E, pre(ei) ⊆ C and post(ei) ⊆ C represent the set of preconditions and the set of postconditions of ei; and
4. INIT ⊆ C and GOAL ⊆ C represent the set of initially satisfied system states and a goal state for an attack.

OUTPUT: Layered structure of exploitations with forward labeling.

1. Suppose En represents the set of exploitations labeled at layer n, Cn represents the set of satisfied system states at layer n.
2. if ((INIT = ∅) or (GOAL = ∅) or (GOAL ⊆ INIT))
3.     exit;
4. end if
5. E0 = ∅; C0 = INIT; n = 1;
6. repeat
7.     for all ei ∈ E not labeled at an earlier layer do
8.         if pre(ei) ⊆ Cn−1 then
9.             for all ci ∈ post(ei) do
10.                Mark the ci label of ei with number n;
11.                En = En ∪ {ei};
12.            end for
13.        end if
14.    end for
15.    Suppose all system states used to label exploitations at layer n are represented as An.
16.    Cn = Cn−1 ∪ An
17.    n = n + 1;
18. until ((n = |E|) or (An = ∅) or (GOAL ⊆ Cn)).
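The layered labeling can be sketched in Python as follows. This is one possible reading of LabelExploitations rather than the authors' implementation: it assumes that each layer ranges over exploitations not yet labeled, that the goal test is made against the satisfied system states, and that recording one layer number per exploitation is enough for illustration. The condition strings such as "user:Host1" are hypothetical.

```python
def label_exploitations(init, goal, pre, post):
    """Assign a layer number to each exploitation whose preconditions become satisfied.

    pre and post map an exploitation name to its set of pre/postconditions.
    Returns (layer_of, satisfied): the layer per labeled exploitation and the
    set of system states satisfied when the process stops.
    """
    init, goal = set(init), set(goal)
    if not init or not goal or goal <= init:
        return {}, init
    satisfied = set(init)            # C_{n-1}: states available before layer n
    layer_of = {}
    n = 1
    while True:
        # Nonexecuted exploitations whose preconditions hold in the lower layers.
        ready = [e for e in pre if e not in layer_of and pre[e] <= satisfied]
        newly = set()
        for e in ready:
            layer_of[e] = n
            newly |= post[e]
        satisfied |= newly           # C_n = C_{n-1} united with the new states
        if not ready or goal <= satisfied or n >= len(pre):
            break
        n += 1
    return layer_of, satisfied


# Hypothetical encoding of the exploitations in Figures 4a and 4b.
pre = {
    "CVE-2002-0836@Host1": {"reach:Host1"},
    "CVE-2002-0178@Host1": {"user:Host1"},
    "CVE-2002-1378@Server": {"user:Host1"},
}
post = {
    "CVE-2002-0836@Host1": {"user:Host1"},
    "CVE-2002-0178@Host1": {"root:Host1"},
    "CVE-2002-1378@Server": {"user:Server"},
}
layers, reached = label_exploitations({"reach:Host1"}, {"user:Server"}, pre, post)
```

Because satisfied states are only ever added, each exploitation is labeled at most once, which is the practical effect of the monotonicity assumption.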



6. Experimental Results

6.1 An Overview of the Cluster Environment Used

Our testbed for the graph-based modeling approach is called Microcosm [16, 21, 22, 27], a cluster-computing environment in the Center for Computer Security Research (CCSR) at Mississippi State University dedicated to testing scientific computing applications. A cluster can be viewed as a low-cost solution for high-performance and high-availability requirements [16, 21, 22, 27]. The Microcosm cluster environment consists of one quad-processor server and eight dual-processor computing nodes. All nodes run Red Hat Linux 7.1 (kernel 2.4.2). The eight computing nodes are interconnected by a 100-Mbps Ethernet switch and a high-performance Giganet switch. We chose this experimental testbed because of the special security requirements of such systems (e.g., the use of COTS or open-source products, sensitivity of applications, and the large number of users and functions [22]). These security requirements motivate the need for performing an in-depth analysis beyond normal vulnerability assessments. Furthermore, properties of clusters, such as the homogeneity of computing nodes and their enclosed nature, also facilitate the in-depth analysis performed by our modeling approach. In the original configuration of the Microcosm cluster, the head node (i.e., the server) was connected directly to the Internet as a single point of entry. All internal computing nodes could only be accessed after a user logged into the head node [21]. To better simulate attack scenarios, a variant of Microcosm is used as shown in Figure 5. This change is reasonable in that in most corporate networks, outsiders typically do not have direct access to a server (especially a data server). The server runs all server applications, such as the database server (Oracle 9i), the FTP server, the RPC (Remote Procedure Call) server, and a parallel job scheduler.
An attacker connects to the local network via Host 1 through Host 8 over the Internet and does not have direct access to the server. The goal for the attacker is to compromise the server and to obtain user- or root-level privilege on the server. Currently, our modeling technique is limited by the availability of standard vulnerability data sets. However, this technique has the capability of incorporating more cluster-specific vulnerabilities once these kinds of data sets become available and the data can be represented using preconditions and postconditions.

6.2 Using E-Graphs for Attack Scenario Analysis

To build e-graphs specific to the Microcosm cluster, we first performed vulnerability scanning using a STAT Scanner [1] for the network environment shown in Figure 5. Next, we selected a sample of the vulnerabilities discovered and used them to model exploitations related to the user-privilege elevation process for this cluster computing environment. There are three categories of vulnerabilities

for each host: remote-to-user, remote-to-root, and user-to-root.

6.2.1 Identifying Critical Vulnerabilities

To show that e-graphs can be used to identify the most critical vulnerabilities that need to be removed for cluster computing environments, we performed several experiments using a different number of intermediate hosts between the attacker and the server. More specifically, we performed experiments using two, four, six, and eight intermediate nodes between the assumed attacker and the server using network topologies similar to the one shown in Figure 5. In addition, each machine was seeded with specific vulnerabilities to perform controlled experiments. Figure 6 shows an e-graph with eight nodes being modeled. Figure 6a is the entire e-graph, and Figure 6b is an enlarged central part of the e-graph. Note that the e-graph depictions have been simplified (i.e., preconditions and postconditions are not shown) in this article to reduce the complexity of the diagrams. Within this e-graph, state InitState denotes the initial state to initiate an attack, and GoalState denotes the final goal state of the attack. Each path shows a series of exploitations that can be performed by attackers. The graph was built using the algorithm discussed in the previous section. A number of graph algorithms can be applied to e-graphs to help the security engineering process. For example, it is straightforward to apply a minimum cut algorithm [27] on the exploitation graph to find a minimum set of vulnerabilities whose removal separates the goal state from the initial state. The minimum cut algorithm can be found in most textbooks on graph algorithms, such as the one by Chartrand and Lesniak [32]. We have implemented a variant of this algorithm in our experiments. The original version of the minimum cut algorithm can be used to find a minimum set of edges to separate any two vertices within a graph.
To create a standard form of the minimum cut problem, first all edges in an e-graph are converted into vertices, and all vertices are converted into edges, because we want to identify a set of vulnerabilities to be removed, and these vulnerabilities correspond to a set of nodes instead of edges in e-graphs. Then a capacity value of 1 is assigned to each new edge inside the graphs. Graph algorithms are then executed to find a minimum set of vulnerabilities. In the following sections, this algorithm is referred to as the minimal vertex cut algorithm because it may find several minimum sets of vulnerabilities instead of just one optimal solution. Running the minimum cut algorithm results in two minimal sets of states, {es1, es2, es5} and {es3, es4, es5}. Intuitively, these sets represent the nodes whose removal will result in a disconnected graph. Vulnerabilities corresponding to either of the two sets can be removed to guarantee the safety of the network by making the attack process infeasible. Recall that the vulnerabilities corresponding to each state are identified by the graph labels. In this example, 3 out of 37 vulnerabilities have been identified as being




Figure 5. A cluster computing environment: Microcosm (Attacker; Host 1, Host 2, ..., Host 8 on the local network; Server)

Figure 6. An e-graph for an eight-node cluster: (a) the entire e-graph; (b) an enlarged central part of the e-graph

critical. Although it is possible that this vulnerability set may be different from the critical set (in the security sense) defined or identified by security professionals, this information is useful to help focus their attention on a relatively small number of vulnerabilities given time or monetary constraints. In this modeling process, we assume that the target of the attacker is known to defenders so that complete e-graphs can be built. In reality, it is sometimes difficult to decide the “ending points” of attacks because the behavior of attackers is often unpredictable. In such cases, however, the attackers’ activities can be viewed as a subgraph (or subgraphs) of the complete e-graph; therefore, our analysis techniques are applicable in these scenarios.
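A minimum set of exploitation nodes that separates the initial state from the goal state can be computed with a standard max-flow construction. The sketch below is not the authors' variant: it uses the textbook node-splitting transformation (each node becomes an "in" and an "out" copy joined by a unit-capacity arc) followed by breadth-first augmenting paths, and it returns only one minimum set. The five-node graph is a hypothetical example, not the e-graph of Figure 6.

```python
from collections import defaultdict, deque


def min_vertex_cut(edges, source, sink):
    """Return one minimum set of intermediate nodes whose removal disconnects sink from source."""
    if (source, sink) in edges:
        raise ValueError("source and sink are directly connected; no vertex cut exists")
    inf = float("inf")
    cap = defaultdict(lambda: defaultdict(int))   # residual capacities
    nodes = {v for edge in edges for v in edge}
    for v in nodes:
        # Removing an intermediate node costs 1; source and sink cannot be removed.
        cap[(v, "in")][(v, "out")] = inf if v in (source, sink) else 1
    for u, v in edges:
        cap[(u, "out")][(v, "in")] = inf
    start, end = (source, "in"), (sink, "out")
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {start: None}
        queue = deque([start])
        while queue and end not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if end not in parent:
            break
        # Every augmenting path crosses a unit arc, so push one unit of flow.
        v = end
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
    # Cut nodes: "in" copy still reachable from the source, "out" copy not.
    return {v for v in nodes if (v, "in") in parent and (v, "out") not in parent}


# Hypothetical e-graph: two entry exploitations funnel through es3.
edges = [("s0", "es1"), ("s0", "es2"), ("es1", "es3"), ("es2", "es3"), ("es3", "sf")]
cut = min_vertex_cut(edges, "s0", "sf")
```

In this example the single node es3 lies on every attack path, so removing the corresponding vulnerability is enough to make the goal unreachable.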

6.2.2 Attacker Work Factor Analysis

Based on e-graphs, we can explore complex parameters to provide more accurate analyses of attack scenarios. It should be noted that some of these parameters are closely related to the research efforts by the Information Design Assurance Red Team (IDART) at Sandia National Laboratories (http://www.sandia.gov/idart/), the research red team at SRI International [33], and the National Center for Scientific Research, Laboratory of Analysis and Architecture of Systems (LAAS-CNRS) in France [34]. Intuitively, the number of branches (or “attacker work factor,” a common penetration team measure) in e-graphs is a measure of how many different paths an attacker can



follow to achieve the goal of an attack. Using e-graphs, this parameter can be calculated by enumerating the total number of branches from the initial state to the goal state. The number of branches is a good estimate of the difficulty of the attack and of the difficulty in protecting a networked system. Figure 7 shows two e-graphs generated for a cluster that can be used by system administrators to view the security level of networks. The graph in Figure 7a has 62 distinct paths from state s0 (initial state) to sf (goal state), and Figure 7b has 60 distinct paths. This means that, in terms of the number of possible attacks, the network represented by Figure 7b is slightly more secure than the network represented by Figure 7a. Note that the enumeration of all attack paths can be performed using a breadth-first search on the directed e-graphs. To protect the networks being modeled, the goal of the system administrator is to reduce the number of paths as much as possible. Therefore, this fine-grained comparison is clearly beneficial in that it enables a quantitative comparison of security levels of corporate networks. As a natural extension of the branches of the e-graphs discussed above, another measure of the security level is the length of the attack paths. Three metrics—length of shortest path, length of longest path, and average length of all attack paths—can be defined in this category. The length of attack paths in e-graphs is a measure of the complexity of e-graphs and also is a measure of the difficulty of specific attacks. Using the two e-graphs shown in Figure 7, we enumerate all the attack paths within the graphs and compare the shortest, the longest, and the average lengths of all paths. The results are shown in Figure 8. In this context, it can be seen that the e-graph in Figure 7b represents a less secure network because the shortest path takes only two steps, while the shortest path in the e-graph in Figure 7a needs three steps. 
Attacks shown in the e-graph in Figure 7b also need fewer steps on average. Both e-graphs have the same number of steps in the longest path. Note that the above analysis is based on the assumption that all exploitations take an equal effort to implement. In real-world applications, we may assign weights to different exploitations to get more realistic estimates.

6.2.3 Cost/Benefit Analysis

E-graphs can be used for cost/benefit analysis by assigning cost and benefit values to nodes or edges. These cost and benefit values can be derived from sources such as security professionals, business management team members, or the hacker community. Figure 9 is an e-graph for the cluster environment shown in Figure 5. In Figure 9, values for nodes indicate the cost of mitigating the corresponding exploitations. For example, node “s2: 3” means that the expense of removing a vulnerability corresponding to node s2 is 3 (e.g., in terms of hundreds of dollars). Note that there are no costs associated with the abstract nodes s0 and sf. Furthermore, the values assigned to nodes in this example are illustrative and do not necessarily reflect true operational values. However, we have attempted to assign values that were close to real-world values. For example, exploitations on a server (represented as nodes s7, s8, s9, s10, s11, and s12 in the e-graph) generally cost more than exploitations on other hosts (represented as nodes s1, s2, s3, s4, s5, and s6 in the e-graph). We can calculate the minimum, the maximum, and average cost for all paths in a manner similar to that used to compare the lengths of attack paths. The results are shown in Figure 10. In the e-graph shown in Figure 9, there are a total of 40 different attack paths (enumerated using a breadth-first search algorithm). The cost for each path is the sum of costs of all exploitations on that path. For example, we have the following equation for the attack path s0−s1−s5−s7−s11−sf:

cost(s0, s1, s5, s7, s11, sf) = cost(s1) + cost(s5) + cost(s7) + cost(s11) = 2 + 2 + 8 + 7 = 19.

The cost values in Figure 10 can be used as one factor for comparing the security levels of different networks. It is apparent that e-graphs with costs can provide a more accurate quantitative analysis of attack scenarios. This information can be useful to perform a return-on-investment (ROI) analysis.

6.2.4 Deployment of Security Products

Numerous network products (e.g., firewalls, routers, network switches) are commonly used to control access to hosts within networked environments. Although security products are effective in controlling network traffic to/from specific IP addresses and ports, deciding where to deploy them is sometimes difficult, especially in medium- to large-scale networks. Because these products will influence the structure of e-graphs by disabling exploitations, an examination of how these products influence the topologies of e-graphs is a useful tool in determining efficient product deployment. Consider the network shown in Figure 11.
An FTP server and a database server are installed on the Server to facilitate data transfer between Host 1 and the Server. A system administrator may want to deploy some security products to protect this network. Note that this network is part of the cluster we modeled previously, with the addition of a firewall product that is to be deployed. The question we want to answer is the following: should the firewall be deployed at position (1) or position (2)? Suppose exploitations e_s1 and e_s2 represent exploitations related to the database server and the FTP server, respectively. Figure 12 shows the resulting e-graph. Two sets of nodes, {s4} and {s5}, denote the minimal sets of exploitations that can be removed to stop the attack. These two states are denoted using the shaded nodes and also represent vulnerabilities that can be removed with minimum effort.




Figure 7. E-graphs used for attacker work factor analysis: panels (a) and (b)

Figure 8. Comparison of lengths of attack paths (vertical axis: length in number of steps; bars for E-Graph (a) and E-Graph (b) at shortest, longest, and average length)
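The path-based metrics compared in Figure 8 can be computed with a short script. The sketch below makes two assumptions that are not stated in the article: paths are enumerated depth-first (valid because the layered e-graph has no cycles) instead of with the breadth-first search the article mentions, and the length of a path is counted as the number of exploitation nodes between the abstract start and goal states. The graph is the small e-graph of Figure 4c, not the e-graphs of Figure 7.

```python
def attack_paths(graph, start, goal):
    """Yield every path (as a list of nodes) from start to goal in an acyclic e-graph."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            yield path
            continue
        for nxt in graph.get(path[-1], ()):
            stack.append(path + [nxt])


def path_metrics(graph, start, goal):
    """Number of attack paths plus shortest, longest, and average path length."""
    # Length = exploitations on the path, excluding the abstract start and goal.
    lengths = [len(p) - 2 for p in attack_paths(graph, start, goal)]
    if not lengths:
        return {"paths": 0, "shortest": None, "longest": None, "average": None}
    return {
        "paths": len(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
        "average": sum(lengths) / len(lengths),
    }


# Adjacency lists for the e-graph of Figure 4c.
figure_4c = {"s0": ["s1"], "s1": ["s2", "s3"], "s3": ["s2"], "s2": ["sf'"]}
metrics = path_metrics(figure_4c, "s0", "sf'")
```

For this graph the script finds the two paths discussed for Figure 4c, with lengths 2 and 3. Weighted variants follow by replacing the length with a sum of per-exploitation weights.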




Figure 9. An e-graph with costs of exploitations

Figure 10. Cost analysis for the e-graph shown in Figure 9 (minimum cost 10, maximum cost 25, average cost 19.4)
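The per-path cost calculation behind Figure 10 can be sketched as follows. Only the four node costs quoted in the text for the path s0−s1−s5−s7−s11−sf are used; the remaining costs in Figure 9 are not reproduced in the text, so the summary here covers a single path. The function names are illustrative.

```python
def path_cost(path, cost):
    """Sum the costs of the exploitations on one attack path (abstract nodes cost 0)."""
    return sum(cost.get(node, 0) for node in path)


def cost_summary(paths, cost):
    """Minimum, maximum, and average cost over a collection of attack paths."""
    totals = [path_cost(p, cost) for p in paths]
    return min(totals), max(totals), sum(totals) / len(totals)


# Costs quoted in the text for one path of the e-graph in Figure 9.
cost = {"s1": 2, "s5": 2, "s7": 8, "s11": 7}
example = ["s0", "s1", "s5", "s7", "s11", "sf"]
total = path_cost(example, cost)
```

Passing all enumerated attack paths of an e-graph to `cost_summary` gives the three values that a chart such as Figure 10 reports.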

We use this e-graph to analyze the firewall deployment problem. Because the system administrator will want to remove either set {s4 } or {s5 }, the firewall should be placed at position (2) so that access to either the database server or the FTP server can be blocked, effectively blocking the attack path. Deploying the firewall at position (1) will not eliminate the exploitations represented by {s4 } and {s5 }. This is because a firewall at position (1) cannot control the network traffic between Host 1 and Server.

In realistic networks, however, security solutions may not be as simple as the one discussed above. Specifically, because network users will want to use the database server and the FTP server, we cannot impose a security solution that simply blocks network traffic. Therefore, the best solution may not be the removal of set {s4 } or {s5 } from e-graphs but rather may be the disabling of some other nonminimal node sets (e.g., the exploitation set {s1 , s2 }). After carefully reviewing the trade-offs between usability




and security of this network, the firewall should probably be deployed at position (1). As can be seen from the above example, the e-graph approach is useful in providing fine-grained analysis (actually a modeling process) in mitigating network vulnerabilities. It can also be used in evaluating the inherent trade-offs of different security solutions.

Figure 11. A network with one host, one server, and a firewall to be deployed (Attacker; Host 1; Server running FTP and DB on the local network; candidate firewall positions (1) and (2))

Figure 12. E-graph: Host 1 has three vulnerabilities, and the server has two vulnerabilities

6.2.5 Detection of Attacks

As can be seen from the discussion in the previous section, e-graphs can help the deployment of security products. In this section, we are concerned with the detection of attacks using intrusion detection products, such as an IDS, an Intrusion Prevention System (IPS), an antivirus product, or an antispam product. For example, for the network shown in Figure 13, a network IDS (NIDS) can be deployed either in position (1) or position (2) for monitoring live network traffic to detect intrusions. E-graphs can directly use the information provided by the NIDS. For example, Table 3 lists all exploitations modeled for the networked environment shown in Figure 13. This table also corresponds to the e-graph shown in Figure 14. The “detectable” column of this table shows whether an exploitation can be detected by an operational NIDS. In the security research community, nondetectable intrusions are sometimes referred to as “stealthy” intrusions, such as the CAN-2003-0252 exploitation on the server in Table 1, or other newly discovered intrusions (e.g., zero-day exploits). The value of “NA” shows that an NIDS is not capable of finding exploitations, such as CVE-2002-0178 and CVE-2002-0638, which are all host-based local buffer overflow exploitations (these exploitations might be detected by a host-based IDS or HIDS). The “state” column shows the corresponding node number in the e-graph in Figure 14. In this figure, the shaded nodes represent detectable exploitations. Set {s1, s2, s3, s4} represents exploitations that can be detected if an NIDS is deployed at position (1). Set {s7, s8, s9} represents exploitations that can be detected if an NIDS is deployed at position (2). E-graphs can be used not only to show detectable exploitations but also to refine the deployment of security products. For example, after reviewing the e-graph in Figure 14, we find that deploying a single NIDS at position (2), as shown in Figure 13, may not be an optimal solution.
This is because the NIDS cannot detect the exploitation denoted by node s10, and an attack can be executed through a path that cannot be detected by the NIDS (e.g., using the path s0−s2−s6−s10−sf). However, if the NIDS is deployed at position (1), then at least one exploitation on any attack path will be detected. This information is valuable to system administrators in determining the optimal deployment solution for security products. We have shown how the e-graph approach aids in performing an in-depth analysis of detectable attacks and how the approach helps to refine the deployment of an NIDS product. A similar e-graph approach can also be used when deploying other security products, such as antivirus products.

Figure 13. A network with one host, one server, and an NIDS to be deployed (Attacker; Host 1; Server on the local network; candidate NIDS positions (1) and (2))

Figure 14. E-graph for network shown in Figure 13

6.3 Time for the Graph Generation Process

As part of our experiments, we measured the time needed for generating the e-graphs. These results are shown in

Figure 15. For these experiments, we used the cluster environment shown in Figure 5 with different numbers of hosts and vulnerabilities. We made the following assumptions in these experiments: 1. all hosts have the same number of vulnerabilities, and 2. all hosts contain only remote-to-root vulnerabilities. In Figure 15, the horizontal axis shows the number of vulnerabilities found on each host (including the server). The vertical axis shows the time (in seconds) needed for generating the e-graphs. The number of hosts shows how many intermediate hosts exist between the attacker’s machine and the server, which is the final target for the attacker (represented using the GOAL state according to our definition). Intuitively, as the number of vulnerabilities and the number of hosts increase, the time needed for computation will also grow. Due to the assumptions imposed on



Figure 15. Graph generation time for different number of hosts (horizontal axis: number of vulnerabilities on each host, 1 to 9; vertical axis: graph-generation time in seconds, 0 to 0.5; one curve each for 1 to 8 hosts)

Table 3. Exploitations and corresponding states in e-graph shown in Figure 14

Exploitation | Detectable? | State
CVE-2002-0836 on host 1 | Yes | s1
CAN-2002-1378 on host 1 | Yes | s2
CAN-2002-0013 on host 1 | Yes | s3
CVE-2002-0391 on host 1 | Yes | s4
CVE-2002-0178 on host 1 | NA | s5
CVE-2002-0638 on host 1 | NA | s6
CVE-2002-0836 on Server | Yes | s7
CAN-2002-1378 on Server | Yes | s8
CAN-2002-0013 on Server | Yes | s9
CAN-2003-0252 on Server | NA | s10
CVE-2002-0178 on Server | No | s11
CVE-2002-0638 on Server | NA | s12

NA = not applicable.

the graph generation process, the time complexity is essentially polynomial with respect to the number of hosts and the number of vulnerabilities. As described previously, the complexity of the graph generation process is O(|C||E|²), where |C| is the number of system states and |E| is the number of exploitations. Note that the number of system attributes (|C|) is typically proportional to the number of hosts within a networked environment. The e-graph generation experiments were conducted on a workstation with a Pentium 4 3.2-GHz processor and 1-GB memory. One interesting problem we have encountered in this research is how to display the exploitation graphs to users. Using the example shown in Figure 5, if nine vulnerabilities exist on each of the five hosts, the resulting exploitation graph contains 56 nodes and 459 edges. Finding ways to represent such a complex graph is one of our research

goals. We have proposed an effective way to simplify e-graphs (based on domain knowledge specific to a system) for security analysis with reasonable computational costs without losing useful information [16]. In addition to the number of hosts, we have found that other factors, including the type of vulnerabilities, network topology being modeled, definition of initial states and goal states, and the deployment of security products (e.g., firewalls, IDS), may also affect the graph generation time. After experimenting with all of these factors, we have found that our graph generation process scales well; more detailed experimental results can be found in Li [27]. We anticipate that our modeling technique could be extended to model vulnerabilities of clusters with hundreds of computing nodes without using graph simplifications. However, the simplification techniques will enable our modeling approach to be used in installations with thousands of compute nodes, depending on the characteristics of vulnerabilities and system configurations. Modeling vulnerability in such large-scale clusters is one of our future research goals.

7. Conclusions

In this article, we have defined a process to address some of the challenges in analyzing attack scenarios and mitigating vulnerabilities in a cluster environment. Known system vulnerability data, system configuration data, and vulnerability scanner results are combined to create exploitation graphs (e-graphs) that are used to represent attack scenarios. The modeling process consists of two primary steps: the creation of a knowledge base of vulnerability graphs (v-graphs) from known system vulnerabilities and the association of multiple v-graphs to create an e-graph specific



to a system being modeled. To handle the varying granularity of vulnerability information, we categorize preconditions and postconditions and encode vulnerabilities using a limited set of attributes. Critical vulnerabilities can be identified by employing graph algorithms on the e-graphs. Several factors were used to measure the difficulty in executing an attack. A cost/benefit analysis was used to enhance the accuracy of the quantitative analysis of attack scenarios. We have also shown how the attack scenario analyses help in improving the efficient deployment of security products and in the design of network topologies. Experiments carried out in a high-performance cluster computing environment show possible exploitations in a networked environment. These experiments showed the usefulness of the proposed approach to model attack scenarios and deduce attacks paths. Clearly, modeling with e-graphs can help provide solutions in mitigating network vulnerabilities and provides valuable insight to system administrators. Although the techniques in this article were presented in the context of a cluster computing environment, it is applicable to any networked environments. Our near-term future research includes the development of more efficient algorithms, the exploration of additional properties for exploitation graphs, and other uses of exploitation graphs. The long-term goal of this research is to devise techniques that are useful for developing vulnerability scanners and to help provide practical in-depth attack analysis information to system administrators. 8. Acknowledgments This work was partially supported by NSF Cyber Trust grant #SCI-0430534 and NSA grants #H98230-04-1-0205. 9. References [1] STAT scanner. 2004. Available from: http://www.stat.harris.com/ solutions/vuln_assess/scanner_index.asp [2] Kewley, D. L., and J. F. Bouchard. 2001. DARPA information assurance program dynamic defense experiment summary. 
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 31 (4): 331-6.
[3] Ning, P., Y. Cui, and D. S. Reeves. 2002. Constructing attack scenarios through correlation of intrusion alerts. In Proceedings of the 9th ACM Conference on Computer and Communications Security, 245-54. New York: ACM Press.
[4] Schneier, B. 1999. Attack trees. Dr. Dobb’s Journal 24 (12): 21-9.
[5] Cuppens, F., and A. Miege. 2002. Alert correlation in a cooperative intrusion detection framework. In Proceedings of the 2002 IEEE Symposium on Security and Privacy, 187-200. Los Alamitos, CA: IEEE Computer Society Press.
[6] Templeton, S. J., and K. Levitt. 2000. A requires/provides model for computer attacks. In Proceedings of the 2000 New Security Paradigms Workshop, 31-8. New York: ACM Press.
[7] Ammann, P., D. Wijesekera, and S. Kaushik. 2002. Scalable, graph-based network vulnerability analysis. In Proceedings of the 9th ACM Conference on Computer and Communications Security, edited by V. Alturi, 217-24. New York: ACM Press.
[8] Jha, S., O. Sheyner, and J. M. Wing. 2002. Two formal analyses of attack graphs. In Proceedings of the 15th IEEE Computer Security Foundations Workshop, 49-63. Los Alamitos, CA: IEEE Computer Society Press.
[9] Ritchey, R. W., and P. Ammann. 2000. Using model checking to analyze network vulnerabilities. In Proceedings of the 2000 IEEE Computer Society Symposium on Security and Privacy, 156-65. Los Alamitos, CA: IEEE Computer Society Press.
[10] Sheyner, O., J. Haines, S. Jha, R. Lippmann, and J. M. Wing. 2002. Automated generation and analysis of attack graphs. In Proceedings of the 2002 IEEE Symposium on Security and Privacy, 254-65. Los Alamitos, CA: IEEE Computer Society Press.
[11] Swiler, L. P., C. Phillips, D. Ellis, and S. Chakerian. 2001. Computer-attack graph generation tool. In Proceedings of the DARPA Information Survivability Conference and Exposition, vol. 2, 1307-21. Los Alamitos, CA: IEEE Computer Society Press.
[12] Swiler, L. P., C. Phillips, and T. Gaylor. 1998. A graph-based network-vulnerability analysis system. Report SAND97-3010/1, Sandia National Laboratories, Albuquerque, NM.
[13] Jajodia, S., S. Noel, and B. O’Berry. 2003. Topological analysis of network attack vulnerability. In Managing cyber threats: Issues, approaches and challenges, edited by V. Kumar, J. Srivastava, and A. Lazarevic. Boston: Kluwer.
[14] Cheung, S., U. Lindqvist, and M. W. Fong. 2003. Modeling multistep cyber attacks for scenario recognition. In Proceedings of the DARPA Information Survivability Conference and Exposition, 284-92. Los Alamitos, CA: IEEE Computer Society Press.
[15] Ramakrishnan, C. R., and R. Sekar. 2002. Model-based analysis of configuration vulnerabilities. Journal of Computer Security 10 (1-2): 189-209.
[16] Li, W., and R. B. Vaughn. 2005. Building simplified exploitation graphs for a cluster computing environment. In Proceedings of the 6th IEEE Information Assurance Workshop, 50-7. Los Alamitos, CA: IEEE Computer Society Press.
[17] Common Vulnerabilities and Exposures. 2004.
Available from: http://cve.mitre.org/
[18] Bérard, B., M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, and P. Schnoebelen. 2001. Systems and software verification: Model-checking techniques and tools. Berlin: Springer-Verlag.
[19] Noel, S., and S. Jajodia. 2004. Managing attack graph complexity through visual hierarchical aggregation. In Proceedings of the ACM CCS Workshop on Visualization and Data Mining for Computer Security, 109-18. New York: ACM Press.
[20] Noel, S., S. Jajodia, B. O’Berry, and M. Jacobs. 2003. Efficient minimum-cost network hardening via exploit dependency graphs. In Proceedings of the 19th Annual Computer Security Applications Conference. Silver Spring, MD: Applied Computer Security Associates.
[21] Li, W. 2002. The integration of security sensors into the intelligent intrusion detection system (IIDS) in a cluster environment. Master’s project report, Department of Computer Science, Mississippi State University.
[22] Li, W., and E. B. Allen. 2005. An access control model for secure cluster-computing environments. In Proceedings of the 38th Hawaii International Conference on System Sciences, 309. Los Alamitos, CA: IEEE Computer Society Press. Full paper available on proceedings CD.
[23] Bugtraq Vulnerabilities Archive. 2004. Available from: http://www.securityfocus.com/bid
[24] CERT® Advisories. 2004. Available from: http://www.cert.org/advisories
[25] National Vulnerability Database. 2005. Available from: http://nvd.nist.gov/
[26] SANS Top 20 Vulnerabilities. 2004. Available from: http://www.sans.org/top20/
[27] Li, W. 2005. An approach to graph-based modeling of network exploitations. PhD diss., Department of Computer Science, Mississippi State University.
[28] Graphviz. 2004. Available from: http://www.research.att.com/sw/tools/graphviz/
[29] ICAT Metabase. 2004. Available from: http://icat.nist.gov/icat.cfm
[30] Kumar, S. 1995. Classification and detection of computer intrusions. PhD diss., Department of Computer Science, Purdue University, West Lafayette, IN.
[31] Das, K. J. 2000. Attack development for intrusion detection evaluation. Master’s thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
[32] Chartrand, G., and L. Lesniak. 1986. Graphs & digraphs. 2nd ed. Belmont, CA: Wadsworth.
[33] Schudel, G., and B. Wood. 2000. Adversary work factor as a metric for information assurance. In Proceedings of the 2000 New Security Paradigms Workshop, 23-30. New York: ACM Press.
[34] Ortalo, R., Y. Deswarte, and M. Kaaniche. 1999. Experimenting with quantitative evaluation tools for monitoring operational security. IEEE Transactions on Software Engineering 25 (5): 633-65.

Wei Li is an assistant professor at the Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, Florida. Rayford B. Vaughn is the Billy J. Ball professor in the Department of Computer Science and Engineering at Mississippi State University. Yoginder S. Dandass is an assistant professor in the Department of Computer Science and Engineering at Mississippi State University.


