A General Framework for Evolving Multi-Shaped Detectors in Negative Selection

Sankalp Balachandran, Dipankar Dasgupta, Fernando Nino*, Deon Garrett
Computer Science Department, The University of Memphis, and *Universidad Nacional de Colombia
{sblchndr, dasgupta, jdgarrtt}@memphis.edu, *[email protected]

Abstract

This paper presents a framework for generating multi-shaped detectors for real-valued negative selection algorithms. The detectors can take the form of hyper-rectangles, hyper-spheres and hyper-ellipses in the non-self space. The uniform representation strategy and the evolutionary mechanism proposed in this work can serve as a baseline for further extension to additional shapes, generally providing an efficient coverage of the detector space. In particular, these novel pattern detectors (in the complement space) are evolved using a genetic search (the structured genetic algorithm), which uses hierarchical genomic structures and a gene activation mechanism to encode multiple shapes. This genetic search helps maintain diverse shapes while contributing to the proliferation of the best-suited detector shapes in the expressed genotype. The results show that significant coverage of the non-self space can be achieved with fewer detectors than approaches using only a single type of detector.

The negative selection (NS) algorithm was one of the first immune-inspired change detection algorithms proposed [33]; it initially used a binary representation for the elements in the self/non-self space. Gonzalez et al. [19] proposed a negative selection algorithm that uses a real-valued representation of the self/non-self space. This algorithm, called real-valued negative selection (RNS), tries to alleviate the scaling issues of binary negative selection algorithms, while using various schemes to speed up the detector generation process. Different RNS algorithms have used different geometric shapes, such as hyper-rectangles [18, 19], hyper-spheres [21, 26] and hyper-ellipses [22], to cover the non-self space. The work described in this paper focuses on developing a framework for generating multi-shaped detectors in real-valued negative selection algorithms. It extends the RNS model by incorporating multiple hyper-shaped (hyper-sphere, hyper-rectangle or hyper-ellipse) detector representations in ℜⁿ. These detectors are evolved using an evolutionary algorithm. In addition, the detector-antigen matching rule for each specific detector is expressed by the membership function of that detector, which is a function of the detector-antigen distance.

Keywords Evolutionary Algorithms, Artificial immune systems, Negative selection, Monte Carlo estimation, Computational geometry.

1. Introduction

Anomaly detection problems are very important one-class classification problems, with wide applications such as novelty detection [1], surprise detection [2], fault detection [3], fraud detection [4], outlier detection [30], chance discovery [31] and intrusion detection [35], among others. Some significant approaches to solving these problems include data mining [5], machine learning [6] and statistics [7].

The remainder of this paper is organized as follows: Section 2 discusses some preliminaries on negative selection, Structured Genetic Algorithm (st.GA), problem definition for anomaly detection and Monte Carlo volume estimation technique. Section 3 introduces multi-shaped evolution while Section 4 describes the proposed detector representation scheme and the detector membership criteria for an arbitrary given point. This section also presents the implementation details of the proposed approach, such as the genetic operators, fitness evaluation and termination condition. Section 5 describes the experimental setup, experimental parameters, datasets used, results and

Most of these works focus on building a model or profile of the system that reflects its normal behavior and specifies a range of variability for key parameters. On the other hand, immune-inspired techniques [8, 9, 10] have also been applied successfully to anomaly detection, not only in the area of computer security [8-10, 14, 15], but also in other fields such as fault-tolerant hardware [11], tool breakage detection [12], chemical spectrum pattern recognition [32], etc.


implementation. In Section 6, the conclusions and future enhancements to this work are proposed.

2. Preliminaries

In this section, some basic concepts necessary to present the proposed approach are summarized. In particular, a general description of the anomaly detection problem and the negative selection algorithm are provided. Also, the notion of a structured genetic algorithm is briefly presented.

2.1 Anomaly detection

In immunology, the anomaly detection process aims at distinguishing a new element in a problem space as either part of self or non-self, given a model of the self set. The problem space, denoted by X, is an n-dimensional space; the self set is denoted as S, and N denotes the complementary space of S. It is assumed that each attribute is normalized to [0, 1]; then

S ⊆ [0,1]ⁿ, S ∪ N = X, S ∩ N = ∅.

Given a problem space S that represents the normal behavior of a system, the characteristic function of S, defined as χ_S(p) = 1 if p ∈ S and χ_S(p) = 0 if p ∈ N, is used to distinguish between self and non-self.

2.2 Negative selection

The Negative Selection Algorithm (NSA), developed by Forrest et al. [33], is based on the principles of self/non-self discrimination in the immune system. It can be summarized as follows:

• Define self as a collection S of elements in a feature space X, a collection that needs to be monitored. For instance, if X corresponds to the space of states of a system represented by a list of features, S can represent the subset of states that are considered normal for the system.

• Generate a set F of detectors, each of which fails to match any string in S.

• Monitor S for changes by continually matching the detectors in F against S. If any detector ever matches, then a change is known to have occurred, as the detectors are designed not to match any representative samples of S.

In the original version of the NSA, elements in the shape space and detectors (matching rules) were represented using binary strings, but subsequently, some variations of this algorithm using real-valued representation were proposed. There are several versions of the algorithm based on binary representation, which run in linear time with the size of self, depending on the choice of the matching rule [10, 13, 33]. However, binary representation is unable to capture the structure of even simple problem spaces [19]. In addition, the binary matching rules also present some scalability issues, since the number of detectors needed grows exponentially in the size of self.

In a NSA based on real-valued representation, elements of the shape-space and detectors (antibodies) are defined in an n-dimensional space (specifically, [0, 1]ⁿ). Accordingly, particular matching rules are associated with the detectors, typically based on the Euclidean distance [19].

The algorithm takes as input a set of self samples represented by n-dimensional points. The goal of the algorithm is then to evolve a set of detectors to cover the non-self space. Several iterative algorithms have been developed to accomplish this objective [18, 20, 21, 22].

2.3 Structured GA

A structured GA (st.GA) is a particular form of evolutionary algorithm introduced in [29]. A structured GA incorporates redundant genetic material, which is controlled by a gene activation mechanism. It utilizes multi-layered genomic structures for the chromosome: all genetic material (currently expressed and currently neutral) is 'structured' into a hierarchical chromosome, and the activation mechanism enables and disables the encoded genes. The explicit redundancy has the potential to help maintain genetic diversity, which is necessary for dealing with complex applications; the capacity to maintain such diversity depends on the amount of redundancy in the structure.

The st.GA interprets the chromosome as a hierarchical structure; thus, genes at any level can be either active or passive, and high-level genes activate or deactivate sets of low-level genes. Thereby, the dynamic behavior at any level, i.e., whether the genes will be expressed phenotypically or not, is governed by the high-level genes.

Figure 1. Generalized representation of a chromosome with n different gene sets (Level 1: control bits selecting among the gene sets GS1, …, GSn; Level 2: the genes g11 … g1k, g21 … g2m, …, gn1 … gnr).


The diagram in Figure 1 shows a multi-level genomic structure organized as follows: the highest-level control bits activate the different gene sets. Given that the higher-level bits act as controls, care should be taken to ensure that only one control code is considered for any given chromosome, so that exactly one set of genes is expressed.
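To make the gene-activation idea concrete, here is a minimal sketch of how a two-level structured chromosome could be decoded; the tuple layout and the shape gene sets are illustrative assumptions, not the paper's exact encoding.

```python
# A minimal sketch of a two-level structured chromosome: the level-1
# control value selects which level-2 gene set (detector shape) is
# phenotypically expressed; the other gene sets are carried along as
# neutral genetic material.
SHAPES = ("hyper-sphere", "hyper-rectangle", "hyper-ellipse")

def decode(chromosome):
    """Return the expressed shape name and its (active) gene set."""
    control, gene_sets = chromosome   # gene_sets holds one list per shape
    return SHAPES[control], gene_sets[control]

# A 2-D example in which all three encodings coexist in one chromosome:
chromosome = (
    0,                                # control: express the sphere genes
    [
        [0.5, 0.5, 0.2],              # sphere: center (0.5, 0.5), radius 0.2
        [0.1, 0.1, 0.4, 0.4],         # rectangle: low and high corners (neutral)
        [0.5, 0.5, 0.3, 0.1, 0.0],    # ellipse: center, semi-axes, angle (neutral)
    ],
)
shape, genes = decode(chromosome)     # ("hyper-sphere", [0.5, 0.5, 0.2])
```

Mutating only the control value re-expresses a different, previously evolved gene set without disturbing the level-2 genes, which is what lets the st.GA maintain diverse shapes in the population.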

2.4 Volume estimation using a Monte Carlo Method

In this work, Monte Carlo methods, well-established techniques with a strong mathematical foundation, are used for volume estimation; in particular, to estimate the coverage of a set of detectors. This technique is also useful for probabilistically estimating the overlap among detectors of different shapes, which is otherwise geometrically cumbersome. The main aspects of the Monte Carlo method are summarized next (for a more comprehensive explanation see [24, 25]).

Let X = [0, 1]ⁿ be the system state space and A ⊆ X a subset of X whose volume needs to be computed. A typical assumption is that the volume of A is hard to compute analytically. If x is drawn from a uniform distribution on X, then P(x ∈ A) equals the volume of A, denoted V(A), because V([0,1]ⁿ) = 1. Then the problem of computing V(A) can be seen as the problem of estimating P(x ∈ A).

Therefore, let U be a random variable uniformly distributed on [0, 1], denoted U ~ U(0, 1). Let U1, U2, … be a sequence of independent and identically distributed (iid) U(0, 1) random variables. Then the sequence X1, X2, …, XN of random variables generated as

X1 = (U1, …, Un), X2 = (Un+1, …, U2n), …, XN = (U(N−1)n+1, …, UNn)

is uniformly iid in [0,1]ⁿ, denoted Xi ~ U([0,1]ⁿ). In order to estimate the volume of A, generate a sequence X1, X2, …, XN as defined above. Then, an estimate of the volume of A may be computed as

V̂(A) = |{i : Xi ∈ A}| / N,

where |·| denotes the number of points in a set. In other words, the volume of A is estimated as the fraction of sample points that lie in A. The estimate can also be expressed as

V̂(A) = (1/N) Σᵢ₌₁ᴺ Yi, with Yi = I_A(Xi),

where I_A(·) denotes the indicator function of the set A. Y1, Y2, …, YN is a sequence of independent Bernoulli random variables, with

P(Yi = 1) = P(Xi ∈ A) = ∫…∫_A dx1 … dxn = V(A).

The main advantage of this method is that it is possible to calculate a confidence interval for the estimated volume V̂(A) as follows. In order to estimate the volume with a confidence of (1−α), using the Chernoff bound it can be shown that if

N ≥ (3 / (ε² V(A))) ln(2/α), then P(|V(A) − V̂(A)| > ε V(A)) ≤ α.

However, as the dimensionality increases, V(A) approaches zero exponentially quickly, which requires an exponentially large sample size [34]. Nevertheless, in many practical applications, the dimensionality is such that reasonably small sample sizes are sufficient.

3. General framework for detector generation

The approach developed in this work uses a real-valued representation to characterize the self/non-self space and evolves a set of detectors (rules) that can cover the (non-self) complementary space. Specifically, the self/non-self space corresponds to the unit hypercube [0, 1]ⁿ, and the detectors are hyper-shapes contained in this space, which can be interpreted as anomaly detection rules.

Accordingly, a detector is defined in an n-dimensional space as a geometrical shape, such as a hyper-sphere, a hyper-rectangle or a hyper-ellipse. The matching rule is expressed by a membership function associated to the detector, which is a function of the detector-antigen distance [19] (Euclidean or any other). A set of self samples, represented by n-dimensional points, is given as input to the algorithm.

In order to apply an evolutionary algorithm for detector generation, a suitable representation for each detector shape is proposed and the respective membership functions are defined. Section 4.1 describes the encoding scheme (genotype) for representing a detector and presents a mathematical model for each detector shape.

The volume estimation technique described in Section 2.4, based on Monte Carlo simulation, is used extensively in this work, both in the fitness evaluation step and in the termination evaluation step (Sections 4.2 and 4.4, respectively).
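As a concrete illustration of the estimator V̂(A) from Section 2.4, the following Python sketch estimates a detector's volume as the fraction of uniform samples that fall inside it; the circular detector used here is a hypothetical example.

```python
import random

def mc_volume(contains, n_dim, n_samples=100_000, seed=1):
    """Estimate V(A) for A contained in [0,1]^n_dim as the fraction of
    uniform samples X_i in A, i.e. (1/N) * sum of Y_i = I_A(X_i)."""
    rng = random.Random(seed)
    hits = sum(
        contains([rng.random() for _ in range(n_dim)])
        for _ in range(n_samples)
    )
    return hits / n_samples

# Example: a circular (2-D hyper-spherical) detector of radius 0.3
# centered at (0.5, 0.5); the true area is pi * 0.3**2, about 0.2827.
in_sphere = lambda x: (x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2 < 0.3 ** 2
estimate = mc_volume(in_sphere, n_dim=2)
```

With N = 100,000 samples the standard error of the estimate is on the order of 0.0015, consistent with the Chernoff-bound discussion above.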

The goal of the algorithm is then to evolve a set of detection rules to cover the non-self space. The iterative process will generate a set of detectors driven by two main goals:

• Minimize overlap with self, and

• Make the detectors as large as possible and keep them separate from each other, in order to maximize the non-self coverage.

Thus, an evolutionary search is used to evolve 'good' rules that cover the non-self space (see Figures 2a, 2b). The fitness of a detector rule is determined by the number of self samples it covers, its hyper-volume and its overlap with other rules (detectors). Therefore, the interest is in finding a set of solutions that collectively cover the non-self space, rather than a single solution. Thus, a structured genetic algorithm using sequential niching is proposed to generate a set of suitable detectors. In the following section, the details of this genetic algorithm are described, namely, the individual representation, fitness function and genetic operators.

Figure 2. (a) General framework for detector generation. (b) Modular subdivision of the detector generation process, showing the hierarchical organization of the algorithm.

4. Evolutionary algorithm for Detector Generation

A niching technique, as described in Figures 2 and 3 [19, 39], is used along with a st.GA to generate a set of detector shapes. The input to the st.GA is a normalized data set of feature vectors S = {x1, …, xm}, which correspond to normal samples. Each element xj in S is an n-dimensional vector xj = (xj1, xj2, …, xjn). Also, a variability parameter (denoted by r) is associated with the entire set of normal samples. Accordingly, r represents the level of variability, or allowable deviation, in the normal (self) space. This follows the notion that the self sample set is never complete, so the variability parameter provides an approximation of the self set, given the set of self samples.

Figure 2b shows the high-level modular view of the strategy for detector generation. A certain number of runs of the evolutionary detector generation, named sAIS, are performed on a given self set, which generates a complete set of suitable detectors to cover the non-self space. Each run of the st.GA is expected to produce a good detector; however, it may happen that the st.GA does not produce any detector that provides coverage close to the desired one, after a set of detectors has been previously produced. Specifically, Figure 3 presents the pseudo-code for the evolutionary detector generation based on a st.GA.

Figure 3. Evolutionary algorithm for detector generation.

4.1 Multi-shaped Detector representation

A structured GA gene with a two-level representation (as shown in Figure 1) is used. The level-1 gene set holds the control information that either activates or deactivates genes at level 2. Any one gene set at level 2 can be expressed for fitness measurement. This is the generalized way of representing multi-shaped detectors.

Each detector shape is characterized by its individual representation and a membership function, which determines whether an n-dimensional point lies in it or not. Specific details about the geometrical features of the detectors used in this work are given next. However, it is important to notice that the approach proposed here can be used with detectors whose shapes differ from the ones considered here, as long as suitable representations and membership functions are provided.
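The detector-generation strategy just described (the modular view of Figure 2b and the pseudo-code of Figure 3) can be paraphrased as the following outer loop; `run_stga` and `estimate_coverage` are placeholders for the evolutionary search and the Monte Carlo coverage estimate, so this is a sketch of the control flow only, not the authors' implementation.

```python
def generate_detectors(self_samples, target_coverage, max_attempts,
                       run_stga, estimate_coverage):
    """Sequential-niching outer loop: repeatedly run the st.GA, keeping
    each evolved detector; the st.GA is assumed to penalize overlap with
    the detectors already found (passed in as its second argument)."""
    detectors, attempts = [], 0
    while attempts < max_attempts:
        d = run_stga(self_samples, detectors)   # evolve one new detector
        detectors.append(d)
        if estimate_coverage(detectors) >= target_coverage:
            break                               # desired coverage reached
        attempts += 1
    return detectors
```

The attempt budget mirrors the termination condition discussed in Section 4.4: the loop stops either when the estimated non-self coverage reaches the target or when too many runs fail to improve it.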

The representation scheme used in this implementation is shown in Figure 4. Each individual (chromosome) expresses a specific shape, namely a hyper-sphere, a hyper-rectangle or a hyper-ellipse, in the phenotypic space. Accordingly, hyper-sphere genes encode the n-dimensional center and the radius of the hyper-sphere. Similarly, hyper-rectangle genes hold the two points that specify the minimum and maximum coordinates in each dimension, analogous to the lower-left and upper-right corners of a rectangle in two dimensions. Hyper-ellipse genes contain the n-dimensional center, the n semi-axis lengths and a square matrix of size n×n that specifies the orientation of the hyper-ellipse. This detector representation allows variable-sized detectors to cover the non-self space.

Figure 4. A single chromosome having high-level control and low-level parameters for three different hyper-shapes: hyper-spheres, hyper-rectangles and hyper-ellipses.

4.1.1 Hyper-spherical Detectors

A hyper-spherical detector is denoted as d = (c, R), where c = (c1, c2, …, cn), an n-dimensional point, is its center, and R is the radius of the hyper-sphere. Thus, an n-dimensional feature vector x = (x1, x2, …, xn) belongs to the hyper-sphere (matches the detector) if dist(c, x) < R, for some particular distance measure dist(·).

4.1.2 Hyper-rectangular Detectors

A hyper-rectangular detector d is defined by two n-dimensional points low and high, i.e., d = (low, high), with low = (low1, low2, …, lown) and high = (high1, high2, …, highn), which specify the lower and upper corners of the hyper-rectangle; that is, [lowi, highi] is the range of the i-th coordinate in the hyper-rectangle [18, 19]. Thereby, an n-dimensional feature vector x = (x1, x2, …, xn) lies in the hyper-rectangle if xi ∈ [lowi, highi] for each i = 1, 2, …, n. Also, the volume of the subspace represented by detector d is computed as Πᵢ₌₁ⁿ (highi − lowi).

4.1.3 Hyper-ellipse Detectors

A hyper-ellipse detector is represented in a general form, considering orientation of the semi-axes in any direction, as [22]:

(x − ω)ᵀ A (x − ω) = 1,

where A = V Λ Vᵀ, ω is the center of the hyper-ellipse, and Λ = (λi,j) is an n×n diagonal matrix with λi,i = 1/li², whose entries are the eigenvalues associated with the eigenvectors in V; li is the length of the i-th semi-axis, and V is an n×n matrix whose columns are orthonormal eigenvectors of A. Any n-dimensional point x satisfying the equation above lies on the surface of the hyper-ellipse; in addition, x is inside the hyper-ellipse if (x − ω)ᵀ A (x − ω) < 1.

In two dimensions, the general equation (x − ω)ᵀ V Λ Vᵀ (x − ω) = 1 can be written as:

(x1 − ω1  x2 − ω2) × [cos θ  −sin θ; sin θ  cos θ] × [1/l1²  0; 0  1/l2²] × [cos θ  sin θ; −sin θ  cos θ] × [x1 − ω1; x2 − ω2]ᵀ = 1,

where θ is the orientation angle of the hyper-ellipse.
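The three membership criteria described above can be sketched as follows; the helper names and the row/column convention for V are assumptions made for illustration.

```python
import math

def in_sphere(x, center, radius):
    """Hyper-sphere: Euclidean distance from the center is below the radius."""
    return math.dist(x, center) < radius

def in_rectangle(x, low, high):
    """Hyper-rectangle: every coordinate lies within [low_i, high_i]."""
    return all(l <= xi <= h for xi, l, h in zip(x, low, high))

def in_ellipse(x, center, semi_axes, V):
    """Hyper-ellipse: (x - w)^T V L V^T (x - w) < 1, where the columns of V
    are the orthonormal axis directions and L holds the 1/l_i^2 entries."""
    n = len(x)
    d = [xi - wi for xi, wi in zip(x, center)]
    # Express the offset in the ellipse's own frame: y = V^T d.
    y = [sum(V[i][j] * d[i] for i in range(n)) for j in range(n)]
    return sum((yj / l) ** 2 for yj, l in zip(y, semi_axes)) < 1.0

identity = [[1.0, 0.0], [0.0, 1.0]]           # axis-aligned 2-D ellipse
in_ellipse((0.7, 0.5), (0.5, 0.5), (0.3, 0.1), identity)   # True (inside)
```

Each test is exactly the matching rule of the corresponding detector; these are the membership functions the fitness evaluation in Section 4.2 relies on.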

4.2 Fitness Evaluation

The purpose of the fitness calculation is to evaluate the quality of an individual in the population; based on their fitness, individuals may be selected into the next generation. Here, the fitness evaluation is guided by two main objectives: to cover as large a portion of the non-self space as possible while not covering any self point, and to keep the overlap among detectors to a minimum. Accordingly, detectors with a large volume should be favored over smaller ones.

In general, the volume of a hyper-sphere [40] is given as Vn(R) = (π/2)ᵏ (2R)ⁿ / n‼, with k = ⌊n/2⌋. However, calculating the portion of this volume that lies inside the boundaries of the unit hypercube is a difficult problem for hyper-spheres near the edges. Estimating the overlapped volume of detectors is also challenging, especially when detectors of different hyper-shapes are used. Therefore, a Monte Carlo technique is used to estimate the volume.

Also, a membership function associated to each detector, which establishes whether a point lies inside a shape (or not),

is used to compute a detector's fitness. This function depends on the detector's shape and is defined based on a distance measure (Euclidean, Minkowski, Mahalanobis or any other); for instance, the Minkowski distance of order n between two points c and x is defined as

dist(c, x) = (Σᵢ₌₁ⁿ |ci − xi|ⁿ)^(1/n).

Let D be a set of detectors; each d in D is a subset of N. In this work, it is assumed that D provides a better coverage of N if the volume covered by D is as large as possible, while the overlap of the detectors in D with S is minimized. Thus, the goal of the evolutionary search may be expressed more formally as

max vol(∪_{d∈D} d)  and  min vol((∪_{d∈D} d) ∩ S).

Accordingly, proper reward and penalty terms are defined within the detector fitness function. A detector is rewarded for the total space it covers, while it is penalized if it overlaps with detectors found in previous runs of the GA, and it is also penalized for every self point that it covers. Then, the fitness function is defined as

fitness(D) = effective-coverage(D) − C(m) × m,

where m is the number of self points that lie in D and C(m) is a factor computed taking into account the number of self points that the detector D overlaps. The effective coverage is computed as the proportion of the non-self set that the detector covers and that is not yet covered by other detectors. C(m) is defined in such a way that a detector suffers a stronger penalization as its overlap with the self sample set increases:

C(m) = tanh(m / ln|S|) if m > 0, and C(m) = 0 if m = 0.

Therefore, C(m) grows as the number of covered self points increases.

Estimating Detector Overlap

In order to evaluate the overlap between detectors, a Monte Carlo technique is used. Given two detectors D and E, a sequence of N random points uniformly distributed on [0,1]ⁿ is generated. Then, each of these points is tested for belonging to D and E simultaneously. The proportion of points that pass this test gives an estimate of the overlap between detectors D and E. Therefore, the effective coverage of a detector D is estimated as effective-coverage(D) = V̂(D) − β, where β is an estimate of the net overlap of D with the other detectors, computed via the Monte Carlo method described in Section 2.4, and V̂(D) is an estimate of the volume of D.

Checking Self Overlap

In order to detect the overlap of a detector with a self sample, its overlap with a hyper-sphere of radius r around the self point is considered. In the case of a hyper-rectangle-shaped detector, this is determined by checking whether the hypercube circumscribed around the self hyper-sphere intersects the detector. For a hyper-sphere-shaped detector with center c and radius R, there is an overlap with a self point x if the distance between c and x is less than (R + r).

On the other hand, Figure 5 describes how to check whether a hyper-ellipse detector overlaps a self sample, for a defined self variation threshold r. In the figure, P2 is the self sample and P1 is the hyper-ellipse's center. If P2 does not lie in the hyper-ellipse, then the algorithm checks whether the point

P3 = P2 − r (P2 − P1) / ‖P2 − P1‖

lies inside the hyper-ellipse, where ‖·‖ denotes the norm of a vector.

Figure 5. Determining the overlap of a hyper-ellipse detector with the self set.

If a detector overlaps the self set, then it should be discarded. Accordingly, a penalization factor is computed for the detector based on the number of self samples that it overlaps, as specified in the fitness evaluation.

4.3 Genetic Operators

The st.GA implementation uses tournament selection; in addition, typical crossover and mutation operators are applied to the hyper-shape detectors, as described next.
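Before detailing the operators, the scoring of Section 4.2 can be condensed into a short sketch; the function names are illustrative, and the volume and overlap values are assumed to be precomputed Monte Carlo estimates.

```python
import math

def penalty_factor(m, self_set_size):
    """C(m) = tanh(m / ln|S|) for m > 0 and 0 otherwise: the penalty
    grows with the number of covered self points and saturates at 1."""
    return math.tanh(m / math.log(self_set_size)) if m > 0 else 0.0

def fitness(volume_est, overlap_est, m, self_set_size):
    """fitness(D) = effective-coverage(D) - C(m) * m, where the effective
    coverage is the detector's estimated volume minus its estimated net
    overlap (beta) with previously found detectors."""
    effective_coverage = volume_est - overlap_est
    return effective_coverage - penalty_factor(m, self_set_size) * m

# A detector covering 20% of the space with 5% overlap and no covered
# self points keeps its full effective coverage of 0.15; covering even a
# few self points drives the fitness down sharply.
clean = fitness(0.20, 0.05, m=0, self_set_size=500)
dirty = fitness(0.20, 0.05, m=3, self_set_size=500)
```

The multiplicative form C(m)·m means the penalty scales both with the saturating factor and with the raw count of covered self points, so detectors that overlap self are quickly outcompeted.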

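The orientation mutation applied to hyper-ellipse detectors (described in this section) amounts to a plane rotation of two randomly chosen axis columns; the sketch below assumes V is stored as a list of rows, and because the update is a rotation it provably keeps V orthonormal.

```python
import math
import random

def mutate_orientation(V, sigma=math.pi / 2, rng=random):
    """Rotate two randomly chosen columns S_i, S_j of the orthonormal
    matrix V by a Gaussian angle theta ~ N(0, sigma) within the plane
    they span:
        S_i' =  S_i cos(theta) + S_j sin(theta)
        S_j' = -S_i sin(theta) + S_j cos(theta)
    The rotation preserves orthonormality, so the mutated V is still a
    valid hyper-ellipse orientation."""
    i, j = rng.sample(range(len(V)), 2)
    theta = rng.gauss(0.0, sigma)
    c, s = math.cos(theta), math.sin(theta)
    for row in V:                # update columns i and j in every row
        row[i], row[j] = row[i] * c + row[j] * s, -row[i] * s + row[j] * c
    return V
```

Passing a seeded `random.Random` instance as `rng` makes the mutation reproducible, which is convenient for the repeated trials in Section 5.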

Bit-flip mutation is applied to the control bits of the chromosome, along with two-point crossover. In addition, the parameters c and R of a hyper-sphere detector undergo two-point crossover and bit-flip mutation.

Besides a center and semi-axes, hyper-ellipse detectors possess an extra degree of freedom: the orientation of the semi-axes. Crossover points are restricted so that they do not fall inside the orientation matrix. Also, a special mutation operator, based on Gaussian mutation, is defined for the orientation matrix segment of the chromosome, as explained next.

It is assumed that any valid hyper-ellipse can be changed into any other hyper-ellipse with the same center and semi-axes but a different orientation through a sequence of small changes in the orientation. In order to explain orientation mutation, let Si and Sj, for 1 ≤ i, j ≤ n, be two column vectors of V, corresponding to two semi-axes chosen at random. These two vectors specify a plane η, defined by the linear combinations of Si and Sj. It is important to notice that Si and Sj are orthogonal to each other, and that the mutated vectors in the resulting matrix should remain orthonormal after mutation. Let θ be a real value chosen from a Gaussian distribution with µ = 0 and σ = π/2. Orientation mutation rotates Si and Sj within the plane η; the mutated vectors are calculated as:

S′i = Si cos θ + Sj sin θ,
S′j = −Si sin θ + Sj cos θ.

On the other hand, the n components of ω and l undergo two-point crossover and bit-flip mutation.

4.4 Termination Condition

Shapiro's work [22] mentions several measures of convergence. One suggested convergence check is to determine whether a user-defined coverage value has been reached. Another way is to monitor the change in the standard deviation of the population fitness, which is expected to decrease by a certain percentage at a predefined rate [19, 20, 22]; if this change saturates for a certain number of runs, the algorithm is terminated. The predefined number of runs between convergence checks, known as 'generations between convergence checks', is a tuned parameter. In this work, the number of attempts at changing (increasing) the coverage is checked, and if this number exceeds a predefined threshold, the algorithm is terminated. The following discussion provides insight into the method.

A parameter called the coverage change threshold (∆) indicates the minimum improvement in coverage required for the algorithm to continue. A convergence check rate, λc, is a parameter used to terminate the evolution process even before the desired coverage is reached; it denotes the interval between subsequent runs at which the change in coverage is measured. ∆ is used along with λc in determining when to halt the evolution: if the percentage increase in coverage is less than ∆, the algorithm terminates.

Each time a new detector is added to the detector list, the list is sorted in descending order by volume. The next step is to traverse this sorted list and check the contribution of each detector to the coverage of the non-self space. The coverage is computed using a Monte Carlo estimation as follows: each detector is evaluated against a sequence of random points uniformly distributed in [0,1]ⁿ to measure the percentage covered, and the matched points are subsequently removed from the list. If any of the current detectors has an effective coverage of zero, its contribution is assumed to be negligible, because it may be totally contained within another, bigger detector, and it is thus eliminated from the detector list.

At the end of this post-processing of the detector list, the net coverage (Cov) of the non-self space has been estimated. If Cov has reached the desired value specified in the terminating condition, the algorithm stops and the current detector list is returned.

Figure 6. Star-shaped dataset.

Figure 7. Multi-cluster dataset.

5. Experimentation

The primary goal of these experiments is to test the new multi-shaped model against other real-valued negative selection models that evolve single-shaped detectors. The experiments are intended to help answer the following questions: Does the designed evolutionary algorithm successfully generate a multi-shaped detector set? Is the new model comparable to other existing evolutionary AIS models in terms of detection rate and coverage? What is the role of the parameters in controlling the evolution process? To address these questions, some synthetic data sets [26] and real-world data sets were used to validate the model; they are described next.

The experiments were carried out on two synthetic datasets and two real-world datasets. In order to find the


statistics of the detector generation process, 30 independent trials were done for each of the synthetic data sets. In the subsequent sections, data set description, preprocessing of the data, the experimental setup and the discussion of the results with the analysis is presented.

5.2 Real-World Test Data Sets

5.1 Synthetic Test Data Sets

Fisher’s Iris data set is obtained from the University of California at Irvine Machine Learning Repository [23]. This well known data set originates from a classic taxonomy paper describing three different types of iris plants: Iris-Setosa, Iris-Versicolour and Iris-Virginicia. Each of these classes contains 50 instances each, totaling 150 for the entire database. Each instance has 4 real-valued features: sepal length, sepal width, petal length and petal width in that order. Of the three classes, one class is linearly separable from the other two, but the other two are not linearly separable from each other.

To measure the performance on accuracy and scalability of the multi-shape algorithm, real-world data sets were used – Iris and time series discord for ECGs. A brief description of each of these data sets is given next.

Synthetic data sets used in [26] provide a good ground for validating the performance of an algorithm as expected on problems with certain known characteristics. Also, due to their small size and low dimensions the results can be easily visualized. Such datasets can be generalized for comparisons on parameters like point density, structural geometry and size in defined space. Figures 6 and 7 show star-shaped and multi-cluster synthesized data sets that are assumed as the normal or self sets; they are generated in a unit square. The entire self set in each case consists of 1000 points. The training data consists of 50% of the entire self set. The test data consists of 1000 points of which 50% is taken from the self data not in training data and the remaining 50% is randomly generated in the complement space.

The second dataset, the ECG dataset is a time series data [10] of a human electrocardiogram. An electrocardiogram (ECG) gives a time series of the electrical potential between two points on the surface of the body caused by a beating heart. It contains a single discord, an anomalous heartbeat that coincides with a premature ventricular contraction. The objective is to find the sequence that is least similar to all other sequences (See Figure 13). For the Iris data set, three sets of experiments were performed. In each of these experiments, one of the three types of Iris data is considered as normal data, while the other two are considered abnormal. In each case the normal data is used either completely (100% train) or partially (50% train) to train the system.

Figure 8. Multi-shaped detectors, ~97.5% coverage, self threshold 0.01.

In the case of the ECG time series, the data set was preprocessed. It contains 45,000 samples, collected over 3 minutes at an interval of 4 milliseconds.

Figure 9. Multi-shaped detectors, ~94% coverage, self threshold 0.01.

The iris data set is well suited for experimental validation of the algorithm due to its moderate dimensionality (four features) and low data cardinality (~100 in each case). The ECG time series data set is challenging in that it requires proper preprocessing and the accurate detection of sudden deviations in the time series that indicate anomalies.

5.3 Data Preprocessing

Figure 10. Multi-shaped detectors, ~98.75% coverage, self threshold 0.05.

The iris data set was normalized: the actual values of the variables are scaled to fit the range [0, 1], using 40% of the maximum value as an addition factor and 40% of the minimum value as a subtraction factor for each dimension of the data set. Any value falling above the maximum or below the minimum under this normalization is taken as 1.0 or 0.0, respectively. This preprocessing is applied to training and testing data alike.
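One plausible reading of this padded min-max scheme, with the 40% addition/subtraction factors and clipping to [0, 1], is sketched below. The paper does not spell out the exact formula, so the padding arithmetic here is an assumption.

```python
import numpy as np

def normalize(data, pad=0.4):
    """Scale each column of `data` into [0, 1].

    The range is widened by 40% of the column maximum on top (addition
    factor) and 40% of the column minimum below (subtraction factor),
    then out-of-range values are clipped to 1.0 / 0.0. This padding
    arithmetic is one reading of the paper's scheme, not a quoted formula.
    """
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    lo_pad = lo - pad * np.abs(lo)   # subtraction factor
    hi_pad = hi + pad * np.abs(hi)   # addition factor
    scaled = (data - lo_pad) / (hi_pad - lo_pad)
    return np.clip(scaled, 0.0, 1.0)
```

Because the padded range strictly contains the observed range of the training data, training values land in the interior of [0, 1], while unseen test values outside the padded range saturate at the boundaries.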

Figure 11. Multi-shaped detectors, ~97.9% coverage, self threshold 0.05.


The ECG time series used the same normalization scheme, scaling all values into [0, 1]. Since detectors are generated in a real-valued multi-dimensional space, the 1-dimensional series had to be converted into multi-dimensional descriptors: a sliding, overlapping window of size 7 with a window shift of 1 was applied to both the training and testing data, producing seven-dimensional descriptors. The normal training data (sample size 1000) was selected from the time series beginning at the 2-second mark, discarding the first 1000 samples to eliminate initial-value effects. The remaining samples, beginning at the 3-second mark, were used as the test data set; this included the anomalous data as well.
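The windowing step is straightforward to sketch; `sliding_windows` is an illustrative helper, not code from the paper.

```python
def sliding_windows(series, width=7, shift=1):
    """Turn a 1-D series into overlapping width-dimensional vectors,
    advancing `shift` samples per window (width 7, shift 1 here, as in
    the ECG preprocessing described above).
    """
    return [list(series[i:i + width])
            for i in range(0, len(series) - width + 1, shift)]
```

With shift 1, a series of length N yields N - width + 1 descriptors, so nearly every sample appears in `width` different windows; an anomalous sample therefore perturbs several consecutive descriptors.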

Overlap is used for post-processing after each run, to discard incompetent detectors. The average and standard deviation of the coverage are depicted in Figures 14 to 19.

For the Iris data set, 25 independent runs of the detector generation process were performed for each of the three choices of normal class: setosa, versicolor and virginica. The related statistics are given in Table 1. For the ECG data set, 10 sets of experiments were performed for each of the window sizes 3, 5 and 7. The main objective for this data set was to clearly identify the subtle discord in the entire series; one of the best detections, for window size 7, is shown in Figure 13.

The following parameters were used in the experiments: maximum iterations per run = 100; convergence check between runs = 5; convergence threshold = 0.05; self threshold = 0.01-0.07; population size = 40; selection strategy = tournament of size 2; crossover rate = 0.4-0.6; general mutation rate = 0.03-0.05; orientation mutation rate = {0.1, 0.2}; matrix swap mutation rate = 0.5; control code mutation rate = 0.2.

In the Monte Carlo implementation, in order to achieve a 95% confidence level (i.e., α = 0.05) with a standard error ε = 0.01, at least 50,000 sample points need to be generated. Also, to check the convergence of the algorithm, an evaluation using the Monte Carlo estimation technique is performed every 10 runs (i.e., λc = 10).
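The 50,000-point figure follows from a Chebyshev-style worst-case bound, n ≥ 1/(4αε²) (cf. [34]): a coverage indicator variable has variance at most 1/4, which guarantees P(|estimate − true| ≥ ε) ≤ α. A minimal sketch of the estimator follows, assuming the detector set is exposed through some membership predicate (here called `covers`, our name, not the paper's).

```python
import math
import random

def mc_coverage(covers, dim, alpha=0.05, eps=0.01, seed=0):
    """Monte Carlo estimate of the fraction of the unit hypercube
    covered by a detector set.

    Sample size follows the Chebyshev-style worst-case bound
    n >= 1 / (4 * alpha * eps**2); with alpha = 0.05 and eps = 0.01
    this yields the 50,000 points used in the paper.

    covers(point) is any membership predicate for the detector set.
    """
    n = math.ceil(1.0 / (4.0 * alpha * eps * eps))  # 50,000 here
    rng = random.Random(seed)
    hits = sum(covers([rng.random() for _ in range(dim)]) for _ in range(n))
    return hits / n
```

For example, a detector set covering exactly the half-space x₀ < 0.5 should yield an estimate close to 0.5.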

Figure 12. A sample raw training data set from the first ECG discord series, showing the first 500 discretized samples from the 2-second mark.

In addition, the following parameters were used in the evolutionary search: sequential niching maxattempts = 5; elitism rate = 0.05; cull rate = 0.01; ∆ = 0.1.

The radius of a hyper-sphere detector is computed as the distance to its nearest self point, using the Minkowski distance of order n, where n is the dimensionality of the problem.
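This radius rule can be sketched as follows; the helper names are ours, and the Minkowski order is set to the problem dimensionality as stated above.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def sphere_radius(center, self_points):
    """Radius of a hyper-sphere detector: the distance from its centre
    to the nearest self point, with the Minkowski order p set to the
    problem dimensionality n, as described in the text.
    """
    p = len(center)
    return min(minkowski(center, s, p) for s in self_points)
```

With p = 2 this reduces to the familiar Euclidean nearest-self-point radius; larger p (higher-dimensional problems) makes the unit ball progressively more cube-like.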

5.4 Experimental Setting

As mentioned before, the genetic algorithm evolves a single detector in one complete run, and a sequential niching scheme is employed to accumulate the detector set. Also, a Monte Carlo technique is used to perform overlap and coverage estimation.

5.5 Experimental Results and Analysis

Figures 8 and 9 show the area actually covered by the evolved detector sets, which reached about 97% and 94% coverage, respectively, for a self threshold of 0.01. Figures 10 and 11 demonstrate that multi-shaped detectors provide better coverage of the non-self space at higher self threshold values; results for several self threshold values are shown in Figures 18 and 19. It can be observed that a smaller self radius results in a high detection rate but also a high false alarm rate (as also observed in [26]), while a larger self radius results in both a low detection rate and a low false alarm rate. Thus the self radius has a pronounced effect on detection results and alarm rates.

Figure 13. The ECG discord data set, with the anomaly accurately detected during the time interval between 620 and 660.


Figures 14 and 15 compare the different single-shaped algorithms and the mixed-shape algorithm on the synthetic data sets. They are ROC curves [27], showing the trade-off between detection rate and false alarm rate (a common way to measure the accuracy of classifiers). In both cases, the mixed-shape detectors produced high detection rates even at small false alarm rates, compared with the single-shaped detectors.



Figures 16 and 17 show the variation of the number of detectors with the self threshold for each detector shape strategy. One important observation: the mixed-shape detectors show overall good performance in terms of detector count, while the hyper-ellipse detectors are slightly better in some cases.





The coverage parameter shows that ~97% is sufficient to achieve good detection, although there are high false alarm rates for the Versicolor and Virginica classes; tweaking this parameter could substantially reduce false alarms. The self threshold (ratio) is an important control parameter needed to balance a high detection rate against a low false alarm rate: as the ratio increases, the detection rate decreases, but so do the false alarms.

However, Figures 18 and 19 show that the coverage attained by mixed-shape detectors is comparatively better, and is reached faster, than for single-shaped detectors, with the exception of hyper-ellipses. Thus, on average, the mixed-shape detectors cover the non-self space better and have a higher coverage-to-detector ratio than the single-shaped detector algorithms.

The rate of change of coverage with respect to the sequential GA runs for the different algorithms is shown in Figures 18 and 19. In all cases, the mixed-shape detectors have a better coverage rate and reach the maximum coverage at the end of their respective evolutionary runs. In some cases the hyper-ellipse detectors also show a good increase in coverage rate, but they are still unable to attain the maximum coverage reached by the mixed detectors. Figures 18 and 19 also show that the deviations in the results are slightly higher for mixed-shape and hyper-sphere detectors, although these remain within ±0.11 of the range. In almost all cases, the hyper-sphere generation algorithm had a comparatively lower coverage rate.

TABLE 1. Accuracy of mixed-shaped detectors with auto-controlled coverage (DR = detection rate, FAR = false alarm rate, ratio = self variability threshold).

Normal Class   Ratio   DR (%)   FAR (%)   Coverage (%)   Detectors
Setosa         0.01    100      9.52      98.1            5
               0.02    100      8.57      97.12           9
               0.03    100      7.46      97.02          14
               0.05    100      3.67      96.9           19
Versicolor     0.01    94.67    27        96.06          20
               0.02    93.59    25        96.04          26
               0.03    92       18.4      96             32
               0.05    92.33    16.5      95.8           34
Virginica      0.01    95.33    25.8      97             24
               0.02    87       21        96.41          28
               0.03    77.9     19.1      96.05          33
               0.05    75.33    5.5       95.89          41

The experimentation with the ECG data set is described as follows. The ECG training data after preprocessing is depicted in Figure 12, and Figure 13 shows the detection result: the detected discord clearly matches the annotated part. However, there are considerable false negatives, owing to very subtle discords around the sharp anomaly; this could be improved by fine-tuning the coverage parameter used in the MC technique. Note that the range 620 to 670 (as per Figure 13) is the detected discord. The multi-shaped algorithm also produced the lowest number of false positives. The periphery of the discord was, however, misclassified as normal, owing to the subtlety of the anomalous changes relative to the non-anomalous ones; some control was achieved by increasing the training samples and the coverage parameter. The detection could also be made finer by changing the window size, though in these trials a window size of seven produced the best results.

The test results on the Iris data set using multi-shaped detectors are shown in Table 1. The results are averages of 25 runs for each normal class and each threshold parameter; the standard deviations of these results are all within 5%. Some important observations can be made from these results.



Besides better detection results, the mixed-shape detector strategy produces a potentially small number of detectors.

When the normal class is setosa or versicolor, the detection rates are high; in the case of setosa, the detection rate is approximately 100%, because this class is clearly separated from the other two classes in the data space.


6. Conclusions and Future Work

A general framework to evolve multi-shaped anomaly detectors was proposed and studied. It combines the potential of several detector shapes, namely hyper-ellipses, hyper-spheres and hyper-rectangles, with a structured genetic algorithm that generates multiple detector shapes. A Monte Carlo (MC) estimation technique is integrated into the negative selection algorithm to ensure sufficient coverage by the detectors, which makes negative selection more reliable while eliminating the need for detector adjustment. The MC detector volume estimation is a further advantage of this method, since it may be applied to any type of detector and detection mechanism, as long as a membership function of a sample point is provided for each detector shape. Hyper-shapes generated using an evolutionary approach are a considerable improvement over the geometrical shapes previously investigated: their accuracy is comparable to hyper-ellipses [22], and better than hyper-rectangles [8] and hyper-spheres [21]. In [26], the authors concluded that the detector set generated by V-detector is more reliable because the algorithm can achieve an expected coverage rather than an arbitrary detector number. The approach proposed here uses variable-sized detectors with variable geometrical shapes, which helps to cover holes better, while the small number of detectors reduces space and time requirements.

The use of a coverage estimation technique together with a confidence interval parameter is justified, since in many cases the training data is noisy. This work used n-point crossover and tournament selection to reproduce candidate detectors; in future work, different selection and crossover operators could be studied. Testing on additional synthetic data sets and real-world applications is important to further validate the proposed approach.

Future work should also consider the use of self-adaptive techniques [25] to determine the mutation probability during the evolutionary search. Variable-shape and variable-size detectors could help reduce the problems arising in high dimensions due to the curse of dimensionality, and other choices of distance measure could be explored to deal with high-dimensional data. The current work focused on developing a unified framework for generating multi-shaped detectors in the real-valued negative selection algorithm, and showed that multi-shaped detectors can provide better coverage than any single-shaped detectors. Other issues related to negative selection algorithms, and comparisons with existing anomaly detection methods, are beyond the scope of this work but will be investigated in the future.

Figure 14. Comparing all shape strategies for the Star dataset (ROC curve: detection rate vs. false alarm rate for Mixed, Rectangular, Spherical and Ellipsoid detectors).

Figure 15. Comparing all shape strategies for the MultiCluster dataset (ROC curve: detection rate vs. false alarm rate for Mixed, Rectangular, Spherical and Ellipsoid detectors).

Figure 16. Effect of self-threshold (ratio) variation on the number of detectors, MultiCluster dataset (Mixed, Ellipsoid, Rectangle, Sphere).

Figure 17. Effect of self-threshold (ratio) variation on the number of detectors, Star-shaped dataset (Mixed, Ellipsoid, Rectangle, Sphere).

Figure 18a. Rate of change of coverage with the runs for self-threshold 0.01 in the Star-shaped self (coverage vs. sequential runs; ECoverage, RCoverage, MCoverage, SCoverage).

Figure 18b. Rate of change of coverage with the runs for self-threshold 0.02 in the Star-shaped self.

Figure 18c. Rate of change of coverage with the runs for self-threshold 0.03 in the Star-shaped self.

Figure 18d. Rate of change of coverage with the runs for self-threshold 0.05 in the Star-shaped self.

Figure 19a. Rate of change of coverage with the runs for self-threshold 0.01 in the Multi-Cluster shaped self (coverage vs. sequential runs; ECoverage, RCoverage, MCoverage, SCoverage).

Figure 19b. Rate of change of coverage with the runs for self-threshold 0.02 in the Multi-Cluster shaped self.

Figure 19c. Rate of change of coverage with the runs for self-threshold 0.03 in the Multi-Cluster shaped self.

Figure 19d. Rate of change of coverage with the runs for self-threshold 0.05 in the Multi-Cluster shaped self.

Acknowledgements

The authors gratefully acknowledge Zhou Ji (St. Jude Children's Research Hospital) for insightful comments and suggestions, and for providing the synthetic data sets.

References

[1] D. Dasgupta and S. Forrest. Novelty detection in time series data using ideas from immunology. In Proceedings of the International Conference on Intelligent Systems, pages 82-87, June 1996.

[2] E. Keogh, S. Lonardi, and B. Y. Chiu. Finding surprising patterns in a time series database in linear time and space. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Alberta, Canada, 2002.

[3] T. Y. Yoshikiyo. Fault detection by mining association rules from house-keeping data. In Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), Montreal, Canada, June 2001.

[4] M. H. Cahill, D. Lambert, J. C. Pinheiro and D. X. Sun. Detecting fraud in the real world. Technical report, Bell Labs, Lucent Technologies. http://citeseer.ist.psu.edu/cahill00detecting.html.

[5] W. Lee and S. Stolfo. Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium, San Antonio, TX, 1998.

[6] T. Lane. Machine learning techniques for computer security. PhD thesis, Purdue University, 2000.


[7] D. Denning. An intrusion-detection model. In IEEE Computer Society Symposium on Research in Security and Privacy, pages 118-131, 1986.

[8] D. Dasgupta and F. González. An Immunity-Based Technique to Characterize Intrusions in Computer Networks. IEEE Transactions on Evolutionary Computation, 6(3):281-291, June 2002.

[9] F. González and D. Dasgupta. An immunogenetic technique to detect anomalies in network traffic. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1081-1088, New York, 9-13 July 2002. Morgan Kaufmann Publishers.

[10] S. A. Hofmeyr and S. Forrest. Architecture for an artificial immune system. Evolutionary Computation, 8(4):443-473, 2000.

[11] R. O. Canham and A. M. Tyrrell. A Multilayered Immune System for Hardware Fault Tolerance within an Embryonic Array. In Proceedings of the 1st International Conference on Artificial Immune Systems, University of Kent at Canterbury, UK, September 9-11, 2002.

[12] D. Dasgupta and S. Forrest. Tool Breakage Detection in Milling Operations using a Negative-Selection Algorithm. Technical Report CS95-5, Dept. of Computer Science, University of New Mexico, August 1995.

[13] P. D'haeseleer, S. Forrest, P. Helman. An immunological approach to change detection: algorithms, analysis and implications. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 110-119, June 1996.

[14] J. Balthrop, F. Esponda, S. Forrest and M. Glickman. Coverage and Generalization in an Artificial Immune System. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Morgan Kaufmann, New York, pp. 3-10, 2002.

[15] P. Harmer, G. Williams, P. D. Gunsch, and G. Lamont. An Artificial Immune System Architecture for Computer Security Applications. IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 252-280, June 2002.

[16] L. N. de Castro and J. Timmis. Artificial Immune Systems: A New Computational Approach. London, UK: Springer-Verlag, 2002.

[17] M. Ayara, J. Timmis, R. de Lemos, L. N. de Castro, R. Duncan. Negative Selection: How to generate detectors. In Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS), Canterbury, UK, Sept. 2002, pp. 89-98.

[18] D. Dasgupta and F. González. An Immunity-Based Technique to Characterize Intrusions in Computer Networks. IEEE Transactions on Evolutionary Computation, 6(3):281-291, June 2002.

[19] F. González. A Study of Artificial Immune Systems Applied to Anomaly Detection. PhD Dissertation, May 2003.

[20] L. J. Gonzalez and J. Cannady. A Self Adaptive Negative Selection Approach for Anomaly Detection. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Seattle, Washington, June 26-30, 2004.

[21] Z. Ji and D. Dasgupta. Real-Valued Negative Selection using Variable-Sized Detectors. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Seattle, Washington, June 26-30, 2004.

[22] J. M. Shapiro, G. B. Lamont, G. L. Peterson. An Evolutionary Algorithm to Generate Hyper-Ellipse Detectors for Negative Selection. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Washington, D.C., June 25-29, 2005.

[23] C. L. Blake and C. J. Merz. UCI Repository of Machine Learning Databases, 1998. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.

[24] Computational Science Education Project. Introduction to Monte Carlo Methods. Oak Ridge National Laboratory, 1995. Available at http://csep1.phy.ornl.gov/mc/mc.html.

[25] J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, 2001.

[26] Z. Ji and D. Dasgupta. Augmented Negative Selection Algorithm with Variable-Coverage Detectors. In Proceedings of the Congress on Evolutionary Computation (CEC), Portland, Oregon, USA, June 20-23, 2004.

[27] F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the 15th International Conference on Machine Learning, pp. 445-453, San Francisco, CA, 1998. Morgan Kaufmann.

[28] E. Keogh, J. Lin, A. Fu. HOT SAX: Finding the Most Unusual Time Series Subsequence: Algorithms and Applications. In Proceedings of ICDM 2005.

[29] D. Dasgupta and D. R. McGregor. sGA: A Structured Genetic Algorithm. Research Report IKBS-11-93, April 1993.

[30] M. Breunig, H. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000.

[31] Y. Ohsawa. Introduction to Chance Discovery and Chance Management: What we did, what we should, what we can. In AAAI Fall 2002 Symposium on Chance Discovery, November 15-17, Massachusetts, USA, 2002.

[32] Y. Cao and D. Dasgupta. An Immunogenetic Approach in Chemical Spectrum Recognition. Chapter 36 in Advances in Evolutionary Computing (Ghosh & Tsutsui, eds.), Springer-Verlag, January 2003.

[33] S. Forrest, A. S. Perelson, L. Allen, R. Cherukuri. Self-nonself discrimination in a computer. In Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press, 1994.

[34] M. Mitzenmacher, E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, pp. 252-254, 2005.


[35] O. Mazhelis, S. Puuronen and J. Veijalainen. Modelling Dependencies between Classifiers in Mobile Masquerader Detection. In Proceedings of ICICS, pp. 318-330, October 2004.

[36] T. Stibor, J. Timmis, and C. Eckert. A comparative study of real-valued negative selection to statistical anomaly detection techniques. In Proceedings of the 4th International Conference on Artificial Immune Systems (ICARIS), Lecture Notes in Computer Science, vol. 3627, Springer-Verlag, 2005, pp. 262-275.

[37] M. Verleysen. Learning high-dimensional data. Limitations and Future Trends in Neural Computation, vol. 186, pp. 141-162, 2003.

[38] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. Society for Industrial and Applied Mathematics, 1992.

[39] S. W. Mahfoud. A comparison of parallel and sequential niching methods. In L. J. Eshelman, editor, 6th International Conference on Genetic Algorithms, pages 136-143. Morgan Kaufmann, 1995. Available at http://citeseer.ist.psu.edu/mahfoud95comparison.html.

[40] PAL Core Development Team. PAL Project: Gamma Function, October 2004.

