An attack-norm separation approach for detecting cyber attacks

Inf Syst Front (2006) 8:163–177 DOI 10.1007/s10796-006-8731-y

An attack-norm separation approach for detecting cyber attacks

Nong Ye · Toni Farley · Deepak Lakshminarasimhan

Received: 28 July 2004 / Revised: 22 February 2006 / Accepted: 5 April 2006
© Springer Science + Business Media, LLC 2006

Abstract The two existing approaches to detecting cyber attacks on computers and networks, signature recognition and anomaly detection, have shortcomings related to the accuracy and efficiency of detection. This paper describes a new approach to cyber attack (intrusion) detection that aims to overcome these shortcomings through several innovations. We call our approach attack-norm separation. The attack-norm separation approach engages in the scientific discovery of data, features and characteristics for cyber signal (attack data) and noise (normal data). We use attack profiling and analytical discovery techniques to generalize the data, features and characteristics that exist in cyber attack and norm data. We also leverage well-established signal detection models in the physical space (e.g., radar signal detection), and verify them in cyberspace. With this foundation of information, we build attack-norm separation models that incorporate both attack and norm characteristics. This enables us to take the least amount of relevant data necessary to achieve detection accuracy and efficiency. The attack-norm separation approach considers not only activity data, but also state and performance data along the cause-effect chains of cyber attacks on computers and networks. This enables us to achieve a detection adequacy lacking in existing intrusion detection systems.

N. Ye (✉) · T. Farley · D. Lakshminarasimhan
Arizona State University, Tempe, Arizona, USA
e-mail: [email protected]
T. Farley
e-mail: [email protected]
D. Lakshminarasimhan
e-mail: [email protected]

Keywords Cyber attacks · Intrusion detection · Computer and network security · Signal processing · Signal detection

Introduction

Cyber attacks on computers and networks have presented a considerable threat to our information infrastructure, as well as business transactions (e.g., e-commerce and online banking), mission-critical operations (e.g., supervisory control of power supply networks), and many other activities that rely on this infrastructure. An insider or outsider of a computer or network system may launch an attack. An insider has authorized access to the system but may abuse that access right to perform illegitimate activities. An outsider does not have direct access to the system. An attack may target a computer (host) based asset (e.g., a data file) or a network based asset (e.g., a port for network service), resulting in a host or network based attack. There are many attack methods (password guessing, bandwidth flooding, buffer overflow, etc.) used to accomplish different goals (denying service, gaining access, stealing, corrupting or deleting data, etc.). A sophisticated attack may go through several phases (Skoudis, 2002): reconnaissance, scanning, gaining access, maintaining access, further attacking, and covering tracks. Each phase involves different methods and different goals. Furthermore, not all attacks include every phase, and an attack may go through the phases in a different order from that listed above.

In the reconnaissance phase, an attacker investigates a target system, usually through publicly available information sources, to obtain information that is useful in later phases of the attack. For example, a web site may reveal the name of a network domain that an attacker can use to obtain the IP address of the network from information available on a Domain Name Server. In the scanning phase, an attacker


attempts to map the topology of a system, find potential entry points (e.g., email server), and discover vulnerabilities (e.g., buffer overflow vulnerability) that can be exploited in later intrusion phases using information (e.g., IP address) obtained during reconnaissance. Scanning usually requires direct interaction with a system. For example, an attacker can use the traceroute tool for network mapping to discover active host computers and the topology of a target network. A system prompt, in response to a request for a network service running on an active host, may reveal the type of the operating system running on the host. An attacker can then exploit a vulnerability of this particular operating system in later phases of an attack. In the gaining access phase, an attacker attempts to penetrate a system by exploiting a previously discovered vulnerability. Vulnerabilities render a system susceptible to attacks, such as password guessing, buffer overflow, web attacks, worms and viruses. In the maintaining access phase, an attacker attempts to establish easy, safe access to a target system in order to return later without going through the initial complicated, risky process of gaining access. Methods to maintain access include creating a new user account or backdoor running a network service. In the further attacking phase, an attacker may take subsequent attack actions to expand the impact and/or scope of an attack. An example of a further attack is using a victim host to propagate a virus or worm to other computers. In the covering tracks phase, an attacker may alter audit or log files to remove any activities that may be indicative of an attack, or otherwise implicate the attacker. There exists a variety of defense mechanisms to protect computers and networks from attacks. These mechanisms generally serve one of three purposes: prevention, detection, or reaction. 
Prevention mechanisms, such as firewalls, cryptography, and authorization, authentication and identification for access and flow control, usually control or limit access to a system. Prevention raises the difficulty level in launching attacks, but cannot completely block attacks from especially determined, organized, skilled attackers due to many unknown vulnerabilities on computers and networks. Detection mechanisms monitor activities on computers or networks to identify the intrusive activities of an ongoing attack. Reaction mechanisms control the further spreading of an attack and its impact, then trace and diagnose the attack to determine its path, cause, and consequences, and finally take actions to recover systems and correct problems along the cause-effect path. This paper focuses on detection mechanisms, and presents a new approach to attack or intrusion detection. This approach involves building separate attack and norm models to filter out normal noise from mixed data and then identify a cyber attack in the filtered data. We first describe two existing approaches to intrusion detection: signature recognition and anomaly detection. We then present a new approach, called attack-norm separation,


and compare it with the two existing approaches. We discuss the research work necessary to enable the attack-norm separation approach for intrusion detection, and provide some preliminary results to illustrate the concept. Finally, we look at some related work and conclude the paper.

Signature recognition

Most commercial intrusion detection systems, including antivirus software, employ signature recognition to detect cyber attacks. In this approach, signature patterns of attacks are either manually captured by expert analysts or automatically discovered through mining computer and network activity data collected under attack and normal operating conditions (Ye, 2003; Proctor, 2001). Attack signatures are stored and used in an intrusion detection system to check activities and files on computers or networks for the presence of a signature. If present, the system detects an attack. For example, three consecutive login failures may be stored and used as the signature of a password guessing attack. A detection solution then monitors the number of consecutive login failures and compares it with the signature to detect this attack. Since the signature patterns of novel attacks are often unknown, signature recognition is not effective against them.
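The consecutive-login-failure signature above can be sketched as a minimal check (assuming a simplified, hypothetical audit-log format where each event is just "fail" or "success"; the three-failure threshold comes from the example):

```python
def detect_password_guessing(events, threshold=3):
    """Signature check: flag an attack when `threshold` consecutive
    login failures occur (hypothetical, simplified event format)."""
    consecutive = 0
    for event in events:
        if event == "fail":
            consecutive += 1
            if consecutive >= threshold:
                return True  # signature matched
        else:
            consecutive = 0  # a success resets the count
    return False

print(detect_password_guessing(["fail", "fail", "fail"]))     # True
print(detect_password_guessing(["fail", "success", "fail"]))  # False
```

A novel attack whose pattern is not in the signature set simply never matches, which is the shortcoming noted above.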

Anomaly detection

Anomaly detection considers any large deviation from normal system behavior as an indication of a possible attack (Proctor, 2001; Ye et al., 2001). Thus, it requires an established model of normal system behavior (a norm profile) to monitor activities on computers and networks and measure deviations from the norm. A large deviation indicates a possible attack. We can establish a norm profile according to the system norm by design, or by learning from data of system behavior collected under normal operating conditions. For example, we can construct a norm profile for a web server by considering sequences of user actions that represent expected user-web server interactions. A sequence that differs from the expected sequences indicates a possible attack. In another example, we use an Exponentially Weighted Moving Average (EWMA) control chart to learn the statistical distribution of event intensity as the norm profile (Ye and Chen, 2003). At any given time, event intensity exceeding a threshold, determined from the statistical distribution properties in the norm profile, indicates a possible attack. Various norm profile modeling techniques have been investigated, including strings representing sequences of system calls, Statistical Process Control (SPC) charts, Markov chain models, data clusters, association rules and artificial neural networks (Proctor, 2001; Ye et al., 2001).
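The EWMA control-chart idea can be sketched as follows. This is a simplified illustration, not the exact model of Ye and Chen (2003): `lam` and `k` are typical textbook choices, and `mu0`/`sigma0` are assumed to have been learned from event-intensity data under normal operating conditions.

```python
import math

def ewma_anomaly_flags(intensities, mu0, sigma0, lam=0.3, k=3.0):
    """EWMA control chart over event intensity: flag observations whose
    exponentially weighted average leaves the control limits around the
    norm profile (mu0, sigma0)."""
    z = mu0
    # asymptotic control-limit half-width for an EWMA statistic
    limit = k * sigma0 * math.sqrt(lam / (2.0 - lam))
    flags = []
    for x in intensities:
        z = lam * x + (1.0 - lam) * z        # exponentially weighted average
        flags.append(abs(z - mu0) > limit)   # large deviation => possible attack
    return flags

# Intensity learned as mean 10, std 2 under normal conditions; a burst of
# high-intensity events drives the EWMA outside the control limits.
print(ewma_anomaly_flags([10] * 5 + [100] * 3, mu0=10.0, sigma0=2.0))
```

A norm profile that is not expressive enough (e.g., normal but irregular bursts) would trip the same chart, which is the false-alarm problem discussed below.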


An anomaly detection technique can detect a novel attack if it shows a large deviation from the norm profile. However, a novel attack may not deviate greatly from the norm profile, yielding a miss or detection failure. The modeling technique used in an anomaly detection solution may not be powerful enough to cover all kinds of normal system behavior, especially behavior that is normal but irregular. When such behavior occurs, the solution erroneously indicates a possible attack, yielding a false alarm. Too many false alarms burden system administrators, who must investigate them, rendering the anomaly detection approach impractical to some extent. Hence, in spite of its advantage in possibly detecting novel attacks, anomaly detection has not become popular in commercial intrusion detection systems.


Fig. 1 Three approaches to cyber attack detection: attack-norm separation, signature recognition and anomaly detection

Shortcomings of existing intrusion detection systems

Existing intrusion detection systems, in the form of either commercial software or research prototypes, mostly use network traffic data (data packets traveling on networks) to monitor activities on networks and audit/log data to monitor activities on computers. Tools such as Tcpdump and Windump are typically used to capture network traffic data. Various operating systems, such as Windows and Linux, usually come with their own auditing and logging facilities to capture system, user and application events on computers. Since attacks may occur in an intermittent manner, skipping any data packet on a network or any event on a computer presents the risk of missing a critical step of an attack. On the other hand, the continuous monitoring of all network data packets and computer events requires processing large amounts of data. Moreover, network traffic data and computer audit/log data contain a lot of information that is irrelevant to cyber attack detection. For example, a network data packet consists of a header and a data portion. The header has numerous data fields, including the source IP address, destination IP address, the source port, the destination port, and so on. These data fields were originally designed for the purpose of controlling and coordinating data transmission on networks, rather than cyber attack detection. Handling large amounts of data with much irrelevant information presents a considerable challenge to the detection efficiency of all existing intrusion detection systems. This problem will only become worse with the increasing speed of computing on hosts and data transmission on networks. Hence, current solutions are inefficient in that they require large amounts of sometimes irrelevant activity data to be monitored. Not only does the anomaly detection approach suffer from false alarms and misses as discussed previously, the signature recognition approach faces a similar problem of detection accuracy.
Human experts often extract attack signatures without clear knowledge of, or an explicit contrast to, normal behavior

patterns. For example, a security analyst may examine a virus code and extract a “signature” of the virus without an accurate model of normal program code that provides the definite knowledge of whether or not such a “signature” also appears in normal program code on a computer. Then there is a possibility that this signature will later produce false alarms as shown in Fig. 1. In Fig. 1, an attack signature identified without knowledge of the true normal space actually falls in this space, producing a false alarm. Hence, signature recognition can produce false alarms in addition to misses of novel attacks. When data mining techniques (e.g., artificial neural networks) are used to automatically extract attack signature patterns, both attack data and normal data are often required to learn attack signature patterns that identify attack data but not normal data. Although the contrast of attack signature patterns to normal behavior patterns is employed in those data mining techniques to identify attack signature patterns, only attack signature patterns are captured and later used for signature recognition. Normal data are used in data mining only for the purpose of identifying attack signature patterns that exist in attack data but not in normal data. If a normal model for an anomaly detection technique has only the power to express a regular norm profile, and fails to cover the irregular true normal space accurately, some deviation from the normal model actually falls in the true normal space, producing a false alarm as shown in Fig. 1. Essentially, both approaches employ data analysis combined with a model of system behavior to detect attacks. The two approaches differ in their underlying models. Signature recognition uses a model of “bad” system behavior under the attack condition, whereas anomaly detection uses a model of “good” system behavior under the normal operation condition. 
Attacks on computers and networks are detected when the observed behavior either correlates with known attack profiles or diverges from known normal profiles. Neither of the two approaches requires and enforces the use of both


attack and normal behavior models in contrast to achieve detection accuracy. Without using both attack and norm profiles in a cyber attack detection model, both approaches are susceptible to the inaccuracy of using attack or norm models alone with mixed data, which includes both attack and normal activities. Note that when an attack occurs on computers and networks, there often are normal activities going on at the same time. Hence, the data contains a mixture of both attack and normal activities. Normal data from normal activities may obscure the attack signatures to be recognized in the mixed data, resulting in detection inaccuracy for signature recognition. Attack data from attack activities may obscure the normal models from which large deviations need to be determined, resulting in detection inaccuracy for anomaly detection. Hence, the two existing approaches of signature recognition and anomaly detection rely on inadequate modeling solutions to handle the mixed data of both attack and normal activities for attack detection, and thus are inaccurate, with the potential consequence of many false alarms and/or misses. Furthermore, existing signature recognition and anomaly detection techniques are mostly developed empirically, using only test results from limited cases, rather than based on scientific knowledge of attack and normal data. We have little confidence in the accuracy of attack or normal models employed with those techniques, and thus little confidence in their detection performance, especially in a realistic environment. In fact, there exists little scientific knowledge of attack and normal data in the field of cyber attack detection, despite many existing intrusion detection techniques developed and tested empirically. Another shortcoming of current solutions to cyber attack detection is their reliance on only activity data on computers and networks.
Network traffic data and computer audit/log data capture only activities on computers and networks and not state and performance data. Computers and networks have a collection of resources (e.g., CPU as a hardware resource, database as a host-based software resource, and a web server as a network-based software resource) that provide services to processes representing users' requests. Those resources have their own state of availability, integrity, and confidentiality that in turn affects the performance of processes in regards to timeliness, accuracy, and precision (Ye, 2002). The execution of a user's process (an activity, which includes attackers' activities) on a resource changes the state of that resource, which in turn impacts the performance of the process. This state and performance change may propagate to other resources and processes (e.g., processes sharing the same resources). Hence, an activity actually starts a cause-effect propagation chain or network. Monitoring only activity data, and not state and performance data, fails to cover the entire cause-effect chain of an attack, and gives up the benefit of correlating elements on the chain for more accurate


detection. For example, in a UDP storm attack, an attacker attacks two host computers, host A and host B, by sending out a spoofed packet to the echo port of host A that appears to come from the echo port of host B. When host A receives the packet, host A responds with an echo-reply to the echo port of host B. Host B perceives the packet from host A as an echo-request, and sends out an echo-reply to host A. Host A then replies to host B, and this cycle continues until one of the echo services is shut down. The detection of a single packet—the initial echo request packet to host A—is not sufficient to identify this attack. However, correlating this echo request packet with the later echo packets and the continuous decrease in network bandwidth (state data) and performance of other network processes (performance data) allows the accurate detection of this attack.
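As an illustration of correlating activity data with state data along this cause-effect chain, a detector might require both kinds of evidence before flagging the attack. The following is a hypothetical sketch with made-up packet records and thresholds, not a model from the paper:

```python
def detect_udp_storm(packets, bandwidth_samples, echo_port=7,
                     min_echo_packets=10, drop_fraction=0.5):
    """Correlate two pieces of evidence along the cause-effect chain:
    1) activity data: a sustained stream of packets where both source
       and destination ports are the echo port (7), and
    2) state data: a large drop in available network bandwidth."""
    echo_loop = sum(1 for p in packets
                    if p["src_port"] == echo_port and p["dst_port"] == echo_port)
    bandwidth_dropped = (bandwidth_samples[-1]
                         < bandwidth_samples[0] * (1 - drop_fraction))
    return echo_loop >= min_echo_packets and bandwidth_dropped

# Twelve echo-to-echo packets plus a halved bandwidth reading trip the detector;
# the initial spoofed packet alone would not.
loop = [{"src_port": 7, "dst_port": 7}] * 12
print(detect_udp_storm(loop, [100.0, 70.0, 40.0]))  # True
```

Neither piece of evidence alone is conclusive; the correlation of the two is what distinguishes the storm from ordinary echo traffic or an unrelated bandwidth dip.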

Attack-norm separation—a new approach to intrusion detection

Consider attack data as a signal to detect and normal use data as noise mixed with the signal in cyberspace. Then, there is a mapping between cyber attack detection and signal detection in the physical space (e.g., radar and sound signal detection). Unlike existing techniques for cyber attack detection that rely on the model of only one element (signal or noise) in the monitored data, existing techniques for signal detection in the physical space often employ models that incorporate characteristics of both signal and noise, that is, all elements that exist and are mixed together in the monitored data (Bailey et al., 1998; Box and Luceno, 1997; Atlas and Duhamel, 1999; Jain et al., 2000; Botella et al., 2003). For example, in the cuscore model for detecting a sine wave signal buried in random noise that fluctuates around the level of T, the following noise and signal models are considered (Box and Luceno, 1997):

Noise model: y_t = T + a_t0    (1)

Signal model: y_t = T + δ sin x_t + a_t    (2)

where a_t0 and a_t are Gaussian white noise. The cuscore is (Box and Luceno, 1997):

Q = Σ_t a_t0 r_t = Σ_t (y_t − T)(a_t0 − a_t)/δ = Σ_t (y_t − T)(δ sin x_t)/δ = Σ_t (y_t − T) sin x_t.    (3)

This cuscore model is sensitive to detecting a sine wave signal buried in random noise. Box and Luceno provide other cuscore models that are constructed to detect: a step-change signal, a slope change and a single-spike signal buried in the random noise of Eq. (1), and parameter-change signals with the noise of a first-order autoregressive time series model or the nonstationary disturbance noise of an Integrated Moving Average (IMA) time series model (Box and Luceno, 1997). Many signal detection techniques in the physical space, including low-pass and high-pass filters, use frequency bands to characterize and differentiate signal and noise to perform signal filtering or detection accordingly (Atlas and Duhamel, 1999). A signal detection model, incorporating characteristics of both signal and noise mixed together in monitored data, can more accurately detect a signal in noise than a model relying on only one element, and is more sensitive to low signal-to-noise ratios (where the signal is buried in a lot of noise) (Box and Luceno, 1997). A low signal-to-noise ratio is often the case in cyber attack detection since there are usually many more normal users than attackers on computer and network systems when an attack occurs. Hence, we propose a new approach called attack-norm separation to bring the accuracy of cyber attack detection to that of signal detection in the physical space. This approach allows us to leverage the extensive work of well-established theories and technologies for signal detection in many disciplines, such as electrical engineering, physics and geology, to build attack-norm separation models. Equations (1)–(3) provide an example of the cuscore signal detection model from the physical space. The attack-norm separation approach consists of the following three steps to detect an attack: (1) define the model of the cyber attack and the norm model; (2) filter out normal noise from the mixed data using the norm model; and (3) identify the cyber attack signal in the remaining data using the attack model. For example, in the cuscore model, Eqs.
(1) and (2) carry out Step 1 of the attack-norm separation approach by defining the attack and norm models. The signal model indicates that the sine signal is added to the noise; hence, it is an additive signal model. Note that not all signals are additive: some signals may distort the noise in ways other than simply adding a signal to it. Steps 2 and 3 of attack-norm separation are embedded in Eq. (3), with (y_t − T) filtering out the noise level T from the mixed data y_t through subtraction, and the remaining data (y_t − T) then being multiplied by the signal pattern sin x_t. The multiplication has a resonant effect, producing a large positive value if the sine signal pattern sin x_t is present in the remaining data (y_t − T). Hence, the cuscore model uses the resonance method through multiplication to identify the signal in the data after filtering out the noise. Note that methods of noise filtering and signal identification different from those in the cuscore model can also be used.
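The cuscore computation of Eq. (3) can be sketched directly. This is a simulation under assumed values T = 5 and δ = 2 (chosen here only for illustration); the resonance effect makes Q large when the sine signal is present and near zero for pure noise:

```python
import math
import random

def cuscore_sine(y, T, x):
    """Cuscore statistic of Eq. (3): Q = sum_t (y_t - T) * sin(x_t).
    Subtracting T filters the noise level (Step 2); multiplying the
    residual by sin(x_t) resonates with a sine signal if one is
    present in the remaining data (Step 3)."""
    return sum((yt - T) * math.sin(xt) for yt, xt in zip(y, x))

random.seed(0)
T, delta = 5.0, 2.0
x = [0.1 * t for t in range(500)]
noise_only = [T + random.gauss(0, 1) for _ in x]                            # Eq. (1)
with_signal = [T + delta * math.sin(xt) + random.gauss(0, 1) for xt in x]   # Eq. (2)

print(cuscore_sine(noise_only, T, x), cuscore_sine(with_signal, T, x))
```

With the signal present, the sin x_t terms line up and Q accumulates to roughly δ·Σ sin²x_t (about 500 here), whereas for pure noise the products average out near zero.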


Figure 1 illustrates how attack-norm separation differs from signature recognition and anomaly detection. Each attack-norm separation model provides the detection of a given cyber attack signal in a given norm environment. We can construct a collection of models which cover different attack signals and their norm environments in contrast. Unlike current solutions, which monitor only activity data, the attack-norm separation approach in Fig. 1 considers the true normal space, and attack signals from that space, to include activity, state and performance data, thus providing adequate coverage of the cause-effect propagation data space associated with attacks and normal user activities. Each attack-norm separation model calls for the monitoring and processing of only a small amount of specific data to provide certain characteristics. Therefore, each model is efficient, accurate, and adequate in detecting a given attack in normal noise. This approach can raise the level of detection accuracy, reduce the amount of monitored data, improve the relevance of monitored data to intrusion detection, and allow for easy protection of a small amount of specific data. Therefore, attack-norm separation may overcome the accuracy and efficiency shortcomings of signature recognition and anomaly detection. Both attack-norm separation and signature recognition can detect known attacks, and since attack-norm separation allows for more accurate results, it can replace signature recognition. However, attack-norm separation cannot completely replace anomaly detection, because it requires known characteristics of attack and norm and thus cannot detect novel attacks. However, caution should be taken when the anomaly detection approach is employed due to its shortcomings in detection accuracy, and its outcomes should be used for advisory purposes only, not for the definite detection of an attack.
We expect that the attack coverage of the attack-norm separation approach will expand with increasing knowledge of cyber attack and norm characteristics, just as signal detection knowledge and technologies in the physical world have evolved. Ultimately, our scientific knowledge of cyber characteristics will grow to a sufficient level to replace anomaly detection as well. A comprehensive knowledge of cyber characteristics will establish a solid, scientific foundation for cyber attack detection, overcoming the shortcomings of empirical techniques. This will lead to the emergence of science and engineering in cyber attack detection, providing confidence in detection accuracy, efficiency and adequacy, and thus paving the way for practical applicability.

Attack-norm data, features, characteristics and detection models

A model for the attack-norm separation approach requires a clear, scientific understanding of cyber attack and norm data. Since most existing work on intrusion detection is

Fig. 2 Data, features, characteristics, and signal detection models in the signal-noise separation approach (Raw Data → Data Processing → Processed Data → Feature Extraction → Feature → Signal Detection Model → Decision)

empirical in nature, we currently have little scientific knowledge of attack and norm characteristics, and thus are not yet able to leverage well-established theories and technologies for physical signal detection to build attack-norm separation models. Hence, we must obtain a scientific understanding of these characteristics to enable our approach. A characteristic is defined on a feature of a data variable. Consider a Denial of Service (DoS) attack that sends large amounts of network packets with service requests to a web server port on a computer. The network data variable is the ratio of packet intensity (i.e., the number of packets received) for the web server port to the packet intensity for all network service ports. The feature of this data variable is the sample average (i.e., the average or mean value in a 5-s sample). The characteristic defined on this feature is a step change (e.g., an increase of a certain amount). Thus, this DoS attack is characterized and modeled by a step change (characteristic) of the sample average (feature) of the intensity ratio (data variable), from that observed under normal operating conditions. Attack and norm in this example are distinguished by a difference between levels (high for attack, normal for norm) of the sample average (the feature extracted in both attack and norm conditions) of the intensity ratio (the data variable obtained from network traffic data in both conditions). In this example, a cuscore model for detecting a step change can be used as the attack-norm separation model. Therefore, three elements need to be defined for a cyber attack in a given norm environment to build an attack-norm separation model: data, features, and characteristics. Figure 2 illustrates these three elements along with an attack-norm separation model. In Fig. 2, raw data (e.g., network traffic data) collected from computers and networks go through data

processing to obtain the desired data (e.g., the intensity ratio of packets for the web server to all packets) from which the feature is extracted using a feature extraction method (e.g., an arithmetic calculation of the sample average). The attack-norm separation model incorporates both attack and norm characteristics and monitors the feature to detect the attack characteristic mixed with normal noise. Table 1 illustrates an example of these three elements and the associated signal detection models, in the physical space for the radar detection of a hostile object in the air, and in cyberspace for the detection of the DoS attack in the above example. Data must be relevant to attack-norm separation, and may include data variables representing the activities, performance, and state of computers and networks. A feature is a measure from an individual data observation or multiple data observations. Features may address mathematical, statistical, spatial, temporal, or causal properties of data observation(s) (e.g., such statistics as mean, variance, correlation, autocorrelation, transition probability, and others). A characteristic is an aspect of a given feature that enables the distinction of cyber signal from cyber noise. Characteristics may be a shift (e.g., step change), an intermittent spike or bump, a drift (i.e., upward or downward), a trend (e.g., slope, sine-wave, square-wave, cyclic, and seasonal change), etc.
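Following the pipeline of Fig. 2, the DoS example above can be sketched as a small feature-extraction step. This is an illustrative sketch with hypothetical packet records represented as (timestamp, destination-port) tuples; the 5-s window and web-server port 80 follow the example:

```python
def intensity_ratio_feature(packets, window=5.0, web_port=80):
    """Raw data -> processed data -> feature, per Fig. 2:
    group packets into 5-s samples and compute, for each sample,
    the ratio of web-server packets to all packets."""
    if not packets:
        return []
    start = packets[0][0]
    buckets = {}
    for ts, port in packets:
        idx = int((ts - start) // window)            # which 5-s sample
        total, web = buckets.get(idx, (0, 0))
        buckets[idx] = (total + 1, web + (1 if port == web_port else 0))
    # one feature value (the intensity ratio) per 5-s sample
    return [web / total for _, (total, web) in sorted(buckets.items())]

ratios = intensity_ratio_feature(
    [(0.0, 80), (1.0, 22), (2.0, 80), (3.0, 80), (6.0, 80), (7.0, 80)])
print(ratios)  # [0.75, 1.0]
```

An attack-norm separation model (e.g., a cuscore model for a step change) would then monitor this feature sequence for the step-change characteristic.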

Research methodologies for attack-norm separation

Although we can leverage well-established signal detection models from the physical space, we must carry out research work to investigate and obtain a scientific understanding of the data, features, and characteristics of various kinds of attack and norm conditions in cyberspace. If the uncovered cyber attack and norm characteristics differ from those that exist in the physical space, it is also necessary to develop additional attack-norm separation models to cope with them. For example, in the physical space, signal detection models usually assume Gaussian white noise due to the central limit effect and the cumulative effect of multiple environmental factors on noise (Box and Luceno, 1997). However, normal use activities on computers and networks, which contribute to noise in cyberspace, are less random in nature than environmental noise in the physical space. Gaussian white noise

Table 1 Example of data, features, characteristics, and the signal detection model in the physical space and attack-norm separation model in cyberspace

Element | Physical space | Cyberspace
Data | Radar image data | Packet intensity ratio
Feature | Shape and size of an object | Sample average (mean)
Characteristic | Shape is square and size is large | Step change
Signal detection / attack-norm separation model | A rule-based model: if shape is square and size is large, then signal | Cuscore model for step change


Fig. 3 The elements and research work to enable the attack-norm separation approach (attack profiling, data mining, and focused verification of features and characteristics from the physical space)

Fig. 4 (excerpt) State 0: Attacker probes victim for FTP service. State 1: FTP server requiring password authentication running on victim.

has a normal probability distribution. However, preliminary observations from our initial investigations reveal that more data variables in the cyberspace follow skewed, uniform, or bimodal probability distributions. We present some of these early findings later in this paper. Note that there are signal detection models in the physical space to deal with colored noise, such as those in Box and Luceno (1997). To obtain the scientific understanding of cyberspace data, features, and characteristics, we are currently employing three research methodologies in parallel (see Fig. 3): attack profiling, data mining, and focused verification between cyberspace and signal detection models that exist in the physical space. The three methodologies are briefly described below. Detailed research work in these methodologies is presented in our other reports (Ye, Bashettihalli and Farley (in review); Johnson and Wichern, 1998; Ye, Napatkamon and Farley (in review)). The purpose of this paper is to introduce the attacknorm separation approach and call for research work in this area from the scientific community. For attack profiling, we let expert security analysts identify the steps involved in the setup and execution of an attack (the attack’s cause-effect chain of activity, state and performance changes) and probe each step for the data variables, features, and characteristics, and their correlation among the steps, which enable the detection of the attack. The outcomes of attack profiling are profiles for various known attacks, represented in cause-effect chains or networks. Each node of the chain includes the data, features, and characteristics to detect that aspect of the attack. Figure 4 illustrates the attack profile for the Dictionary attack in which the attacker uses words in a dictionary to conduct brute-force password guessing for a user account (Ye, Bashettihalli and Farley (in review)). 
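To make the profile representation concrete, the following Python sketch (our own illustration; the step names follow the dictionary-attack profile of Fig. 4, while the data, feature, and characteristic values are hypothetical examples) encodes a cause-effect chain as a list of nodes, each carrying what a sensor must monitor at that step:

```python
# Illustrative encoding of an attack profile as a cause-effect chain.
# Step names follow the dictionary-attack profile; field values are examples.
profile = [
    {"step": "State 1: FTP server requiring password authentication running on victim",
     "data": "FTP service state", "feature": "service banner present",
     "characteristic": "authentication required"},
    {"step": "Activity 2: program attempts to authenticate using next dictionary entry",
     "data": "login attempt records", "feature": "failure count per user",
     "characteristic": "step increase (Observation A: multiple login failures)"},
    {"step": "State 3: confidentiality of the application/file system compromised",
     "data": "file access records", "feature": "access rate",
     "characteristic": "spike in accesses by the attacked account"},
]

def detection_points(chain):
    """List the (data, characteristic) pairs a sensor must monitor along the chain."""
    return [(node["data"], node["characteristic"]) for node in chain]

for data, characteristic in detection_points(profile):
    print(f"monitor {data} for {characteristic}")
```

Correlating detections across nodes of the chain, rather than matching a single signature, is what distinguishes a profile from a signature.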
We then generalize the data, features, and characteristics from all the attack profiles to construct an efficient, accurate, and adequate set of these elements for cyber attack detection. Hence, the methodology of attack profiling uses expert knowledge of known attacks, through manual analysis, to uncover the data, features, and characteristics of cyber signal and noise.

For data mining, we collect activity, state, and performance data on computers and networks under various attack and normal use conditions. Table 2 describes the data that we

Fig. 4  The attack profile for a dictionary attack:
State 0: Attacker probes victim for FTP service.
State 1: FTP server requiring password authentication running on victim.
Activity 1: Attacker initiates dictionary attack program.
Activity 2: Program attempts to authenticate using the next entry in the dictionary database (on failure, Activity 2 repeats; on success, State 3 follows).
Observation A: Multiple login attempt failures.
Observation B: Subsequent password attempts follow the dictionary pattern.
Observation C: Time between successive login attempts follows a pattern.
Observation D: Successive login attempts use the same username.
State 3: Confidentiality of the application/file system compromised.
Activity 4: Abnormal use of the application by the attacker.

collect using the Microsoft Windows operating system (OS) before, during, and after an attack. We then use various data analysis and mining techniques to uncover the data, features, and characteristics that enable the distinction of cyber attack and norm data. For example, we apply statistical analysis to data variables to obtain basic statistics (e.g., mean, variance, minimum, and maximum for numerical variables, and range and frequency for categorical variables), examine randomness through run tests, create time-series plots, perform tests of variable correlation (e.g., Pearson correlation coefficients, and Spearman, Kendall tau, and Gamma tests for non-parametric correlation coefficients) and autocorrelation, determine the probability distribution (e.g., histogram, skewness test, kurtosis test, and Kolmogorov-Smirnov (KS) test), and so on (Ye, Jearkpaporn and Lakshminarasimhan (in review)). We also perform tests for the difference in mean between cyber attack and norm conditions (e.g., the t-test, Mann-Whitney test, KS two-sample test, Wald-Wolfowitz runs test, and exact test) and tests for the selection of sensitive variables (e.g., decision trees such as C&RT). For example, Fig. 5 shows a skewed distribution that we commonly find in our data (Johnson and Wichern, 1998).

Table 2  Data collection for data mining

Data collected                                           Collection location   Tool used
OS performance counters (performance objects,            Host computer         Performance monitor utility in Windows OS
  each of which has several counters)
Windows event logs (security, system and                 Host computer         Event monitor utility in Windows OS
  application logs)
Network packet data (first 256 bytes of each packet      Network               Windump utility
  on the network, which covers the header and the
  beginning of the data)

For focused verification in the cyberspace of the features, characteristics, and signal detection models from the physical space, we first conduct a literature review with the objective of surveying and classifying existing examples in the physical space. Our literature review covers the years 1995 to 2004. We find 173 papers from the theoretical fields of cuscore statistics, wavelet transforms, time series analysis, and signal processing and detection, and the application fields of digital signal detection, quality and process control in production and manufacturing, earth and planetary science (e.g., earthquake detection), biomedical science (e.g., cancer detection), and economics (e.g., stock market analysis) (Ye, Napatkamon and Farley (in review)). From each paper, we extract the data, features, characteristics, and signal detection model discussed in the paper. For example, Fig. 6 shows the elements that we extract from Bailey et al. (1998):

Goal: Detect dolphin sound
Data: Underwater sound data
Features: Energy in the raw sound (sum of squares of selected wavelet coefficients)
Characteristic: Step change in energy
Sensor model: If a step change occurs, then flag

We then generalize the data, features, characteristics, and signal detection models from all the papers surveyed. We check this information against cyber attack and norm data, and use the feature extraction methods and signal detection models originally employed in the surveyed papers to examine whether those features and characteristics also exist in the cyberspace.
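As a minimal sketch of the kind of statistical screening used in the data mining methodology (our own illustration; the sample values are made up, and the statistics are implemented by hand rather than with a statistics package), the following Python computes sample skewness and a Mann-Whitney U statistic to compare a variable between norm and attack conditions:

```python
import statistics

def skewness(xs):
    """Population-form sample skewness: mean of the standardized cubes."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return 0.0
    return statistics.fmean([((x - m) / sd) ** 3 for x in xs])

def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic: count of (x, y) pairs with x > y (ties count 1/2)."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

norm_obs = [2, 3, 2, 4, 3, 2, 3]    # hypothetical norm-condition samples
attack_obs = [5, 7, 6, 8, 6, 7, 9]  # hypothetical attack-condition samples
print(mann_whitney_u(attack_obs, norm_obs))  # 49.0: every attack sample exceeds every norm sample
```

A U statistic near its maximum (here, 7 × 7 = 49 pairs) indicates a difference in location between the two conditions, which is the kind of difference-in-mean evidence the tests above look for.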

Method demonstration

To illustrate the attack-norm separation method, we choose one attack, Ettercap, and one user activity, web browsing. Ettercap is an address resolution protocol (ARP) poison attack (http://ettercap.sourceforge.net). The attack begins by sending a series of ARP requests to every internet protocol (IP) address on the current subnet to determine which computers are currently on the network. Then, the attacker sends spoofed ARP replies to a victim; these replies associate the IP addresses of computers on the network with the attacker's physical (MAC) address. Once the victim computer updates its ARP table with the erroneous data, network traffic sent by all computers within the victim network goes through the attacker's machine. Attack profiling leads to the Ettercap attack profile in Fig. 7.

For web browsing, a user opens Internet Explorer and performs a pre-defined web search on www.google.com. For both the attack and the user activity, we collect data from the activity alone as well as from idle machine time. To collect the data for these activities, we perform the activity (attack or web browsing) and collect data from the Windows logging facilities for 10 minutes before, and then during, the activity. This gives us

Fig. 5  The skewed distribution found in our histogram investigation in data mining


the data to build the attack and norm models. For testing purposes, we make a third run with both the attack and the user activity occurring at the same time. We can then test our models, built with pure data, against the mixed data. We look at performance objects, which include a number of activity, state, and performance variables. Examples of each are:

- Activity: "Network Interface\Packets/s" indicates the number of packets sent and received through the network interface card.
- State: "Memory\Available Bytes" measures the amount of memory space available.
- Performance: "Process(_Total)\Page Faults/s" counts page faults; a page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory.

Fig. 6  Data, feature, characteristic, and signal detection model for detecting dolphin sound:
Raw data: underwater sound data
Feature extraction: discrete wavelet transform (DWT)
Features: sum of squares of selected wavelet coefficients
Detection model: recursive joint-distribution kernel estimate of the sum of squares of selected wavelet coefficients
Decision: any outlier from the kernel estimate is flagged as the signal

Fig. 7  Attack profile for an Ettercap attack

For the focused verification methodology, we employ a method commonly used in physical signal detection to build an attack-norm separation (signal detection) model for the Ettercap attack. The method we choose for this study is time-frequency analysis using the wavelet transform. Given the time-series data for a variable (signal), distinguishing information may be hidden in its frequency content. Wavelet transforms allow us to view the frequencies in time-series data and the signal strength (energy) in different frequencies at different times, and to find the correlation between the data pattern and the wavelet pattern that approximates it (Lakshminarasimhan, 2005). The best wavelet transform varies by data pattern. Our preliminary analysis reveals five dominant data patterns: spike, random fluctuation, step change, steady change, and sine-cosine wave embedded in noise (Lakshminarasimhan, 2005). For the Ettercap attack, we collect 306 variables with a pattern distribution of 51.2% spike, 36.4% random fluctuation, 6.3% step change, 3.2% steady change, and 2.9% sine-cosine with noise. Based on these observations, we consider the five wavelet transforms shown in Fig. 8, where the data patterns are respectively approximated by the transforms: Paul for spike, Derivative of Gaussian (DoG) for random fluctuation (Gaussian noise), Haar for step change, Daubechies for steady change, and Morlet for noise-embedded sine-cosine. For a complete definition of these wavelets, see Lakshminarasimhan (2005).

Fig. 8  Wavelet transforms considered in the study
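As a minimal, self-contained sketch of measuring signal strength in different frequency bands (our own illustration using only the Haar transform, not the full five-wavelet analysis of the paper; the spike signal is synthetic), the following Python computes the energy of Haar detail coefficients at successive decomposition levels:

```python
SQRT_HALF = 0.5 ** 0.5

def haar_step(x):
    """One level of the Haar DWT: scaled pairwise sums (approximation) and differences (detail)."""
    approx = [(a + b) * SQRT_HALF for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) * SQRT_HALF for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def band_energies(x, levels=3):
    """Energy (sum of squared detail coefficients) per band, highest frequency first."""
    energies = []
    for _ in range(levels):
        x, detail = haar_step(x)
        energies.append(sum(d * d for d in detail))
    return energies

# A single spike concentrates energy in the highest-frequency band.
spike = [0.0] * 7 + [10.0] + [0.0] * 8
print(band_energies(spike))  # approximately [50.0, 25.0, 12.5]: energy halves per level
```

Comparing such per-band energies between idle, attack, and normal-use runs is the kind of signal-strength feature the ANOVA analysis below operates on.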


For each data variable, wavelet, and frequency band, we extract the signal-strength feature. We then use analysis of variance (ANOVA) and the Tukey comparison test to compare the signal-strength values at the low (L), medium (M), and high (H) frequency bands between the idle and attack conditions, and between the idle and normal user activity conditions. From these results, we discover the frequency bands where there is a significant difference in signal strength between the two conditions, and whether the signal strength increases or decreases from idle to attack/norm. Tables 3 and 4 show selected variables from our ANOVA results for Ettercap and web browsing, respectively (Lakshminarasimhan, 2005). Each cell shows the significant changes at each frequency band for one variable (row) and wavelet (column), where "None" signifies no significant change at any frequency.

From the characteristics in Table 3, we choose three that do not appear in Table 4 to uniquely represent the Ettercap attack characteristics. These characteristics, shown in the table embedded in Fig. 8 and described in detail in Lakshminarasimhan (2005), enable us to distinguish between attack and norm data. This analysis forms the basis for building a sensor model to detect a specific attack characteristic in a given norm environment. Armed with the information gathered thus far, we can build three sensor models to detect the three distinguishing characteristics of the Ettercap attack in a web browsing environment. Thus, we have completed the first step in attack-norm separation: building the attack and norm models. We then test our sensors on the mixed data we collected earlier, using the cuscore statistic described previously to detect attack signals in the presence of norm data. For the second step in our approach, we cancel out the normal noise in the cuscore


Table 3  ANOVA results for the Ettercap attack

Variable                     DoG               Paul              Morlet       Haar   Daubechies
(1) Cache\Data Maps/s        H(+), M(−), L(+)  H(+), L(+)        L(+)         None   None
(2) Cache\Sync Data Maps/s   H(+), M(−), L(+)  H(+), L(+)        L(+)         None   None
(3) Cache\Data Map Pins/s    H(+), M(−), L(+)  H(+), L(+)        L(+)         None   None
(4) Cache\Pin Reads/s        H(+), L(+)        H(+), M(+), L(+)  L(+), M(+)   None   None
(5) Cache\Sync Pin Reads/s   H(+), L(+)        H(+), M(+), L(+)  L(+), M(+)   None   None

Table 4  ANOVA results for web browsing

Variable                     DoG               Paul              Morlet       Haar   Daubechies
(1) Cache\Data Maps/s        H(−), L(−)        H(−), L(−)        L(−)         None   None
(2) Cache\Sync Data Maps/s   H(−), L(−)        H(−), L(−)        L(−)         None   None
(3) Cache\Data Map Pins/s    M(+)              M(+)              M(+)         None   H(+)
(4) Cache\Pin Reads/s        H(−), M(+), L(−)  H(−), M(+), L(−)  M(+), L(−)   None   L(−)
(5) Cache\Sync Pin Reads/s   H(−), M(+), L(−)  H(−), M(+), L(−)  M(+), L(−)   None   L(−)

model by subtracting the norm model from our testing data, as shown in Eq. (4):

Q = Σ_{t=1}^{n} [y_t − f(t)] g(t)    (4)

where f(t) is the norm model, g(t) is the attack model, and y_t is an observation of the test data at time t. Equation (4) also shows the final step of detecting the attack characteristic in the residual data: multiplying the residual by the attack model. A significant slope change in a plot of the cuscore values reveals the presence of an attack signal. We use the slope prior to the attack as a threshold, so whenever the cuscore value for an observation exceeds this threshold, we trigger an alert. Figures 9–11 show the cuscore plots for our three variables under the Ettercap attack plus web browsing scenario. For each plot, the first 300 observations are under web browsing only,

with the actual attack (mixed data) beginning at observation 301. The observation at which our attack model detects the attack characteristic is also marked on each cuscore plot. These results show the effectiveness of our models in quickly and accurately detecting these characteristic attack signals.
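The cuscore detection step of Eq. (4) can be sketched in a few lines of Python (our own illustration with synthetic, noise-free data; here f is a constant-mean norm model and g a unit step-change attack model, not the models fitted in the paper):

```python
def cuscore(ys, f, g):
    """Running cuscore Q_t = sum over t of [y_t - f(t)] * g(t), as in Eq. (4)."""
    q, out = 0.0, []
    for t, y in enumerate(ys):
        q += (y - f(t)) * g(t)
        out.append(q)
    return out

# Synthetic test data: norm level 1.0, with a step change of +0.5 at t = 300.
ONSET = 300
ys = [1.0 if t < ONSET else 1.5 for t in range(600)]

f = lambda t: 1.0  # norm model: constant mean
g = lambda t: 1.0  # attack model: unit step-change signal

q = cuscore(ys, f, g)
print(q[ONSET - 1], q[-1])  # 0.0 150.0: flat before the attack, rising linearly afterwards
```

In practice, g(t) would be the fitted attack characteristic (e.g., the step, spike, or sine pattern identified by the wavelet analysis) and f(t) the fitted norm model, and an alert fires when the cuscore slope exceeds the pre-attack slope.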

Preliminary findings support attack-norm separation

Discoveries in our current work have also revealed the importance of our attack-norm separation approach for early and accurate attack identification. For example, although some variables show a significant change in probability distribution (e.g., from a skewed distribution to a uniform distribution) from the norm to the attack condition (with no user activity),

Fig. 9  Cuscore plot for Network Interface\Packets/s


Fig. 10  Cuscore plot for Process(_Total)\IO Write Bytes/s

this change may be weakened when normal user activities are added to both conditions. That is, the presence of noisy norm data in the attack data weakens the characteristic change in the probability distribution, making it much more difficult to detect. This demonstrates the importance of first filtering out the norm data effect, for a better attack-to-norm data ratio or improved data quality, before performing attack identification. Our attack-norm separation approach performs this filtering, in contrast to signature recognition and anomaly detection, which do not take this "denoising" or "noise removal" step.

Figure 12 plots the values of a data variable, first in the norm condition of a text editing activity and second in the condition of an ARP Poison attack mixed with the same text editing activity. We can see that this data variable is more active, with a few spikes, in the norm condition than in the attack condition.

Fig. 11  Cuscore plot for Process(_Total)\Page Faults/s


Using the signature recognition approach, we trained an artificial neural network (ANN) to learn attack signatures from such data (Ye, 2003). The ANN performed poorly in detecting this attack, since the attack data consist of small values (the low values in the figure) that are also commonly found in the norm data; thus, the ANN could not distinguish between attack and norm. Using the anomaly detection approach, we applied the Exponentially Weighted Moving Average (EWMA) technique, which learns a statistical norm profile from the norm data and sets a threshold value for signaling an attack (Ye and Chen, 2003). The spikes in the norm data produce a large variance that does not exist in the attack data, resulting in a relatively large threshold value. Since the attack data fell below this threshold, EWMA could not detect the attack. In other words, the anomaly detection approach works when the attack data are more volatile, or have a more significant characteristic, than the norm data, whereas the data in Fig. 12 are less volatile

Fig. 12  Variable observations under the normal user activity, text editing, and the norm mixed with the ARP Poison attack starting at observation 301

in the attack condition but more volatile in the norm condition. These results suggest that the two existing approaches of signature recognition and anomaly detection do not work well for the combination of a strong norm characteristic and a weak attack characteristic, but may work for a weak norm and strong attack. Since all four combinations of attack-norm characteristics (shown in Table 5) are possible, our attack-norm separation approach, which incorporates models of both the attack data and the norm data, should be capable of handling all four combinations, whereas signature recognition and anomaly detection work well for only one combination.

Table 5  The four combinations of attack-norm characteristics

Norm characteristic   Attack characteristic: Weak   Attack characteristic: Strong
Weak                  Weak norm, weak attack        Weak norm, strong attack (signature recognition and anomaly detection may work for only this combination)
Strong                Strong norm, weak attack      Strong norm, strong attack

In addition to these advantages of our attack-norm separation approach over signature recognition and anomaly detection, using exact mathematical models of both the attack data and the norm data allows not only the detection but also the identification of an attack. In contrast, using only one model, such as the norm model in the anomaly detection approach, allows only the detection of something anomalous, not the identification of a specific attack. Direct attack identification enables a quick attack reaction. Hence, the difference between "detection" and "identification" also distinguishes our attack-norm separation approach.

Related work

Our solution requires a separation of cyber attack and norm data. We need to filter out the normal noise from mixed data using an attack-norm separation model. For example, in the cuscore model of Eq. (3), a sine wave signal is separated from random noise in an observation. This requires information about both the noise model (Eq. (1)) and the signal model (Eq. (2)). This is just one example of filtering; there are many other ways to do this. The reviewed literature does not perform this attack-norm separation. Lee et al. propose data mining algorithms that learn the patterns of intrusive and normal activities to recognize known intrusions and anomalies for building intrusion detection models (Lee et al., 1998; 1999; 2000a; Lee and Stolfo, 2000b; Lee et al., 2001). This work does not have the step of separating attack and norm by filtering out the normal noise before identifying the attack signal. Furthermore, the normal use data serve to help identify the pattern of the attack, not to build a norm model. Other work considers improving intrusion detection by detecting anomalous activities, but none considers separate attack and norm models (Lee et al., 2001; Lunt, 1988; Garvey and Lunt, 1991; Ghosh, Schwartzbard and Schatz, 1999; Warrender, Forrest and Pearlmutter, 1999; Lane and Brodley, 1999; Fan et al., 2001; Kruegel and Vigna, 2003; Vigna et al., 2003). The three steps involved in the attack-norm separation approach presented in this paper require models of both the cyber attack and norm data. This work differs from related work in that we separate attack and norm data to build individual models, and then use both models: the norm model to filter out the normal noise from the mixed data, and the attack model to identify the attack characteristics in the filtered data.

Summary

This paper presents our vision of attack-norm separation as a new approach to intrusion detection. This approach aims


to overcome the problems of the two existing approaches with detection accuracy, efficiency, and adequacy. The approach addresses not only activity data, but also state and performance data along the cause-effect chains of attacks on computers and networks. This enables us to achieve the detection adequacy lacking in existing intrusion detection systems. We engage in the scientific discovery of the data, features, and characteristics of cyber attack and norm data, along with well-established signal detection models in the physical space, to build attack-norm separation models that incorporate the characteristics of both cyber attack and norm data. This enables us to take the least amount of relevant data necessary to achieve detection accuracy and efficiency.

Existing intrusion detection systems are developed mostly empirically, or on a heuristic basis, with little scientific understanding of attack and norm characteristics. These systems are neither efficient nor accurate, and lack the scientific and engineering rigor of physical signal detection technologies, which are based on a scientific understanding of signal and noise characteristics. For example, with known characteristics of the attack signals and normal noise, we can develop cuscore statistics that will accurately detect an attack in noise even if the signal is low and slow. These innovations will aid us in achieving higher cyber attack (intrusion) detection efficiency, adequacy, and accuracy, and in establishing the science and engineering of cyber attack detection.

Acknowledgments  This material is based upon work supported in part by the Air Force Research Laboratory (AFRL) and the Advanced Research and Development Activity (ARDA) under Contract No. F30602-03-C-0233, the Air Force Office of Scientific Research (AFOSR) under Grant No. F49620-03-1-0109, and a gift from Symantec Corporation.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of AFRL, ARDA, AFOSR, or Symantec Corporation. The authors would also like to acknowledge contributions made by Patrick Kelley for the execution and data collection of attacks used in our study.

References

Atlas L, Duhamel P. Recent developments in the core of digital signal processing. IEEE Signal Processing Magazine 1999;16(1):16–31.
Bailey TC, Sapatinas T, Powell KJ, Krzanowski WJ. Signal detection in underwater sound using wavelets. Journal of the American Statistical Association 1998;93(441):73–83.
Botella F, Rosa-Herranz J, Giner JJ, Molina S, Galiana-Merino JJ. A real-time earthquake detector with prefiltering by wavelets. Computers & Geosciences 2003;29(7):911–919.
Box G, Luceno A. Statistical Control by Monitoring and Feedback Adjustment. New York: John Wiley & Sons, 1997.
Fan W, Miller M, Stolfo S, Lee W, Chan P. Using artificial anomalies to detect unknown and known network intrusions. In: Proceedings of the First IEEE International Conference on Data Mining. San Jose, CA, 2001.
Garvey T, Lunt T. Model-based intrusion detection. In: 14th National Computer Security Conference (NCSC). Baltimore, MD, 1991.
Ghosh A, Schwartzbard A, Schatz M. Learning program behavior profiles for intrusion detection. In: 1st USENIX Workshop on Intrusion Detection and Network Monitoring. Santa Clara, CA, 1999.
Jain AK, Duin P, Mao J. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000;22(1):4–37.
Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. Upper Saddle River, NJ: Prentice Hall, 1998.
Kruegel C, Vigna G. Anomaly detection of web-based attacks. In: Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS '03). Washington, DC: ACM Press, 2003;251–261.
Lakshminarasimhan DK. Wavelet based cyber attack detection. Master's Thesis, Arizona State University, November 2005.
Lane T, Brodley C. Temporal sequence learning and data reduction for anomaly detection. ACM Transactions on Information and System Security 1999;2(3):295–331.
Lee W, Stolfo S, Chan P, Eskin E, Fan W, Miller M, Hershkop S, Zhang J. Real time data mining-based intrusion detection. In: Proceedings of the 2001 DARPA Information Survivability Conference and Exposition (DISCEX II). Anaheim, CA, 2001.
Lee W, Stolfo S, Mok K. A data mining framework for building intrusion detection models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy. Oakland, CA, 1999.
Lee W, Stolfo S, Mok K. Adaptive intrusion detection: A data mining approach. Artificial Intelligence Review 2000;14(6):533–567.
Lee W, Stolfo S, Mok K. Mining audit data to build intrusion detection models. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD '98). New York, NY, 1998.
Lee W, Stolfo S. A framework for constructing features and models for intrusion detection systems. ACM Transactions on Information and System Security 2000;3(4).
Lunt T. Automated audit trail analysis and intrusion detection: A survey. In: 14th National Computer Security Conference (NCSC). Baltimore, MD, 1988.
Proctor PE. Practical Intrusion Detection Handbook. 3rd edn. Prentice Hall, 2001.
Skoudis E. Counter Hack. Upper Saddle River, NJ: Prentice Hall PTR, 2002.
Vigna G, Robertson W, Kher V, Kemmerer RA. A stateful intrusion detection system for world-wide web servers. In: Proceedings of the Annual Computer Security Applications Conference (ACSAC). Las Vegas, NV, 2003;34–43.
Warrender C, Forrest S, Pearlmutter B. Detecting intrusions using system calls: Alternative data models. In: IEEE Symposium on Security and Privacy. Oakland, CA, 1999.
Ye N. QoS-centric stateful resource management in information systems. Information Systems Frontiers 2002;4(2):149–160.
Ye N (ed.). The Handbook of Data Mining. Mahwah, NJ: Lawrence Erlbaum Associates, 2003.
Ye N. Mining computer and network security data. In: Ye N, ed. The Handbook of Data Mining. Mahwah, NJ: Lawrence Erlbaum Associates, 2003;617–636.
Ye N, Bashettihalli H, Farley T. Attack profiles to derive data observations, features, and characteristics of cyber attacks. Information, Knowledge, Systems Management 2005–2006;5(1):23–47.
Ye N, Chen Q. Computer intrusion detection through EWMA for autocorrelated and uncorrelated data. IEEE Transactions on Reliability 2003;52(1):73–82.
Ye N, Jearkpaporn D, Lakshminarasimhan DK. Extraction and detection of signal features and characteristics in the physical space: Towards signal detection in the cyberspace. Proceedings of the IEEE (in review).
Ye N, Li X, Chen Q, Emran SM, Xu M. Probabilistic techniques for intrusion detection based on computer audit data. IEEE Transactions on Systems, Man, and Cybernetics 2001;31(4):266–274.
Ye N, Napatkamon A, Farley T. Correlations of activity, state and performance data on computers and networks in attack and normal conditions. IEEE Transactions on Dependable and Secure Computing (in review).

Nong Ye is a Professor of Industrial Engineering and an Affiliated Professor of Computer Science and Engineering at Arizona State University (ASU), and the Director of the Information Systems Assurance Laboratory at ASU. Her research interests lie in security and Quality of Service assurance of information systems and infrastructures. She holds a Ph.D. degree in Industrial Engineering from Purdue University, West Lafayette, and M.S. and B.S. degrees in Computer Science from the Chinese Academy of Sciences and Peking University in China, respectively. She is a senior member of IIE and IEEE, and an Associate Editor for IEEE Transactions on Systems, Man, and Cybernetics and IEEE Transactions on Reliability.

Toni Farley is the Assistant Director of the Information and Systems Assurance Laboratory and a doctoral student of Computer Science at Arizona State University (ASU), Tempe, Arizona. She is studying under a Graduate Fellowship from AT&T Labs-Research. Her research interests include graphs, networks, and network security. She holds a B.S. degree in Computer Science and Engineering from ASU. She is a member of IEEE and the IEEE Computer Society. Her email address is [email protected].

Deepak Lakshminarasimhan is a Research Assistant at the Information and Systems Assurance Laboratory and a Master of Science student in Electrical Engineering at Arizona State University (ASU), Tempe, Arizona. His research interests include network security, digital signal processing, and statistical data analysis. He holds a B.S. degree in Electronics and Communication Engineering from Bharathidasan University in India.
