
Artificial Immune Systems and Kernel Methods

T. S. Guzella1,2, T. A. Mota-Santos2, and W. M. Caminhas1

1 Dept. of Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte (MG) 31270-010, Brazil, {tguzella,caminhas}@cpdee.ufmg.br
2 Dept. of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte (MG) 31270-010, Brazil, [email protected]

Abstract. In this paper, we focus on the potential for applying Kernel Methods to Artificial Immune Systems. This is based on the fact that the commonly employed “affinity functions” can usually be replaced by kernel functions, leading to algorithms operating in the feature space. A discussion of this applicability in negative/positive selection algorithms, the dendritic cell algorithm and immune network algorithms is conducted. As a practical application, we modify the aiNet (Artificial Immune Network) algorithm to use a kernel function, and analyze its compression quality using synthetic datasets. It is concluded that the use of properly adjusted kernel functions can improve the compression quality of the algorithm. Furthermore, we briefly discuss some of the future implications of using kernel functions in immune-inspired algorithms.

Key words: Artificial Immune System, Affinity Functions, Kernel Methods, Immune Network, aiNet

1 Introduction

Recently, Artificial Immune Systems (AISs) have emerged as a novel soft computing paradigm [1], seeking inspiration in the immune system for the development of computational models for solving problems. Most algorithms employ the concept of a so-called “affinity function”, which describes the degree of matching between two entities (a cell or antibody and an antigen). Usually, these affinity functions are obtained by adapting corresponding distance functions, so that the affinity between two entities is inversely proportional to their distance in some metric space, and the algorithms can be described in terms of distances. In the framework for designing AISs proposed in [1], the design of the affinity function(s) follows the definition of the representation used for cells and molecules. In the case of systems employing real-valued representations, the Euclidean distance is one of the most commonly used affinity measures. This is inspired by early theoretical work by Perelson and Oster [2], which proposed the concept of a shape space, a metric space for quantifying the chemical interactions between molecules, where the Euclidean distance was originally used. However, in accordance with the convention followed in the machine learning community, we will use the term input space when referring to the shape space. This use of


general affinity functions, without taking into consideration the characteristics of the target problem, has been recently criticized by Freitas and Timmis [3], who highlight the need to follow a problem-oriented approach in designing an AIS, in which the adoption of a certain affinity function is justified by characteristics of the target application.

The impact of particular affinity functions has recently been studied by several researchers. Hart [4] has shown the effects of the affinity function in idiotypic networks based on real-valued representations, influencing the size and dynamics of the resultant networks, and pointing out the importance of carefully defining the affinity function and network parameters when applying a network to solve a problem. Recent work by Hart et al. [5] provides additional evidence of effects on the topology of the network, influencing its properties. In the context of negative selection algorithms, Stibor et al. [6] have conducted an in-depth analysis of the use of the Euclidean distance, showing that coverage problems arise when dealing with high-dimensional data.

In parallel, kernel-based learning algorithms, such as Support Vector Machines (SVMs) [7] and kernel PCA [8], have been gaining an increasing focus in research. Kernel methods are based on mapping an input data point into a suitable Hilbert space, termed the feature space, allowing for very general representations of characteristics of the data being analyzed, and then performing computations in this new space. The underlying theory allows the manipulation of data in the potentially infinite-dimensional feature space without explicitly knowing the map from the input to the feature space. Considering that several AISs can be seen as similarity-based algorithms, due to their use of distance functions, we analyze the application of kernel methods in immune-inspired models, discussing, in an informal way, how some algorithms can be modified to work in the feature space.
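The central idea above — computing feature-space dot products without ever constructing the map explicitly — can be illustrated with a minimal sketch. Here the homogeneous degree-2 polynomial kernel k(x, y) = ⟨x, y⟩² is paired with its known explicit feature map for 2-D inputs; all numeric values are illustrative:

```python
import numpy as np

# For 2-D inputs, the explicit feature map of k(x, y) = <x, y>^2 is
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so <phi(x), phi(y)> = k(x, y).

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k_poly2(x, y):
    # Kernel evaluation: no feature vectors are ever materialized.
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

explicit = np.dot(phi(x), phi(y))  # dot product in the feature space
implicit = k_poly2(x, y)           # same value, computed in the input space
assert np.isclose(explicit, implicit)
```

For the Gaussian kernel the feature space is infinite-dimensional, so no explicit map can be written down at all, yet the kernel evaluation remains a single expression; this is what makes feature-space computations tractable.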
Theoretical aspects of AIS, argued by Timmis [9] to be an important line of investigation for the advancement of the field, have been receiving increasing interest (see the review in [10]). Therefore, grounding the use of affinity functions in a proper theoretical framework is an important step towards the design of new algorithms.

This paper is organized in the following way: section 2 presents a brief overview of the theory of Reproducing Kernel Hilbert Spaces, which provides a theoretical basis for the application of kernel functions. In sequence, section 3 discusses the applicability of kernel methods in some immune-inspired algorithms. As a practical application, section 4 derives and evaluates a kernel-based version of aiNet (Artificial Immune Network) [11]. Finally, section 5 presents the final conclusions of this work, along with future research directions.

2 Theory of Reproducing Kernel Hilbert Spaces

This section presents a brief discussion of the basic concepts of the theory of Reproducing Kernel Hilbert Spaces (RKHSs), closely following [8]. Throughout the discussion, we assume that the input space X is a non-empty set, and restrict the presentation to real-valued kernels. In addition, the dot product between two


vectors x and y is represented by ⟨x, y⟩. The dot product is the starting point in the theory of kernel methods due to the fact that it allows the generalization of several geometrical operations (such as projection, distances and the angle between two vectors). In learning algorithms, it is used to derive a notion of similarity between two elements (not necessarily vectors).

Definition 1. (Positive Definite (PD) Kernel) A kernel function k : X × X → ℝ is a PD kernel if, for any X = {x_1, x_2, ..., x_n} ⊂ X, the n × n matrix with elements k_{i,j} = k(x_i, x_j) is positive definite.

In particular, for a PD kernel, k(x, x) ≥ 0 for all x ∈ X, k(x, y) = k(y, x), and the Cauchy-Schwarz inequality |k(x, y)|² ≤ k(x, x) k(y, y) holds (from which the triangle inequality can be derived). Defining a map φ : X → H, where H is the space of functions mapping X into ℝ (usually referred to as the feature space), φ(x) can then be seen as the function that assigns the value k(x, y) to each y ∈ X, thereby transforming each point x into a function. Given an arbitrary set X = {x_1, x_2, ..., x_n} ⊂ X, and considering the vector space defined by linear combinations f(·) = Σ_{i=1}^{n} α_i k(·, x_i), the dot product can be computed as:

⟨f, f⟩ = Σ_{i,j} α_i α_j k(x_i, x_j) ≥ 0    (1)

where the non-negativity follows from the fact that the kernel k is PD, implying that ⟨·, ·⟩ in the feature space is also a PD kernel. The concept of the space where the mapped patterns φ(x) lie is formalized through the definition of a Reproducing Kernel Hilbert Space.

Definition 2. (Reproducing Kernel Hilbert Space (RKHS)) A Hilbert space H of functions f : X → ℝ is a RKHS with reproducing kernel k : X × X → ℝ if k(·, x) ∈ H for all x ∈ X and the reproducing property ⟨f, k(·, x)⟩ = f(x) holds for all f ∈ H; in particular, ⟨k(·, x), k(·, y)⟩ = k(x, y).

An important consequence is that the distance between two mapped points can be computed using kernel evaluations alone:

d(x, y) = ||φ(x) − φ(y)|| = sqrt(k(x, x) − 2 k(x, y) + k(y, y))    (5)

Two kernel functions commonly used with real-valued data are the Gaussian kernel:

k(x, y) = exp(−||x − y||² / (2σ²))    (6)

and the polynomial kernel:

k(x, y) = (⟨x, y⟩ + c)^d    (7)

The parameters σ > 0, c ≥ 0 and d ∈ ℕ determine the shape of the mapped points in the feature space, and their appropriate adjustment is crucial for a good performance of the algorithms employing such kernel functions. In addition, due to the fact that the choice of a kernel for an application is rather arbitrary, there is an increasing focus on the development of kernel functions incorporating prior knowledge (e.g. [8]).

3 Applicability of kernel functions in immune-inspired algorithms

In this section, we briefly discuss the applicability of the theory of RKHSs in various immune-inspired algorithms. A potential advantage of using kernel functions is that they allow for more general representations of data dependencies, which can improve the performance of some algorithms. In addition, from the discussion presented in the previous section, it follows that the only theoretical requirement is that X is a non-empty set. As AISs are not restricted to real-valued representations (see [3] for a discussion of the representations used in some models), the framework of kernel methods initially fits nicely into this area. In the following paragraphs, we center the discussion on three families of algorithms: positive/negative selection, the dendritic cell algorithm and immune network approaches, focusing on real-valued representations.

3.1 Positive and Negative Selection

In positive/negative selection approaches (e.g. [12]), a set D, containing detectors, is checked against a test point x to determine if it is indicative of normal or anomalous behavior (usually referred to as self and non-self, respectively). This procedure can be described for both algorithms by equation 9:

f(x) = θ( Σ_{s_i ∈ D} θ(b_i − d(x, s_i)) )    (9)


where θ(·) is the step function defined by θ(x) = 1 if x > 0 and 0 otherwise, and b_i is the activation threshold of the i-th detector. The expression θ(b_i − d(x, s_i)) represents the activation of the i-th detector, which happens if its distance to the test point x is smaller than the threshold b_i. In positive detection schemes, it follows that, if f(x) = 1, then x is classified as normal, while, in negative detection algorithms, f(x) = 1 indicates that x is anomalous (non-self).

An analysis of equation 9 indicates that replacing the commonly used Euclidean distance in positive/negative selection algorithms with a kernel function, so that the distance is evaluated in the feature space (i.e. equation 5), should have a minor impact on the performance of such algorithms. Due to the fact that the activation of one detector does not influence the remaining detectors, the evaluation of the distance in the feature space merely alters the recognition region of each detector in the input space (defined by S_i = {x : d(x, s_i) ≤ b_i, x ∈ X}). As an example, using the Gaussian kernel (equation 6) to calculate the distance can be seen as merely changing the radius of detection in comparison with the one obtained with the Euclidean distance.
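A sketch of equation 9 with a pluggable distance function makes this point concrete. The detector centers, thresholds and kernel width below are hypothetical values chosen for illustration; swapping the Euclidean distance for the Gaussian-kernel feature-space distance only rescales the recognition radius:

```python
import numpy as np

def step(v):
    # theta(v) = 1 if v > 0, else 0
    return 1.0 if v > 0 else 0.0

def detect(x, detectors, thresholds, dist):
    # Equation (9): x matches if at least one detector s_i is activated,
    # i.e. its distance to x falls below the threshold b_i.
    total = sum(step(b - dist(x, s)) for s, b in zip(detectors, thresholds))
    return step(total)

def euclidean(x, y):
    return np.linalg.norm(x - y)

def kernel_distance(x, y, sigma=2.0):
    # Feature-space distance induced by a Gaussian kernel:
    # sqrt(k(x,x) - 2 k(x,y) + k(y,y)) = sqrt(2 - 2 exp(-||x-y||^2 / (2 sigma^2)))
    k_xy = np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))
    return np.sqrt(2.0 - 2.0 * k_xy)

# Hypothetical detector set: a single detector at the origin.
D = [np.array([0.0, 0.0])]

# Euclidean threshold 1.0 gives a recognition radius of 1.0.
assert detect(np.array([0.5, 0.5]), D, [1.0], euclidean) == 1.0

# With the kernel-induced distance, a threshold b corresponds to the
# Euclidean radius r = sigma * sqrt(-2 ln(1 - b^2 / 2)); for b = 0.5 and
# sigma = 2.0 this is roughly 1.03, so the region merely changes size.
assert detect(np.array([0.5, 0.5]), D, [0.5], kernel_distance) == 1.0
assert detect(np.array([3.0, 3.0]), D, [0.5], kernel_distance) == 0.0
```

Because each detector's recognition region remains a ball in the input space (for an isotropic kernel), the kernelization affects only where its boundary lies, consistent with the discussion above.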

3.2 Dendritic cell algorithm

The Dendritic Cell Algorithm (DCA) [13] is a recent proposal in the area of AIS. It is based on the behavior of dendritic cells sampling antigens and signals from the environment, and assuming a migration behavior depending on the sampled signals. Given four input signals (danger, PAMP, safe and an inflammatory signal), three output signals are derived, indicating the co-stimulation of each dendritic cell, along with a mature (a pro-inflammatory phenotype) and a semi-mature (anti-inflammatory phenotype) output signal. In the case that the inflammatory signal is constant, the output signals are given as a linear combination of the input signals:

Ψ_cs = w_d^cs I_d + w_p^cs I_p + w_s^cs I_s    (10)
Ψ_mt = w_d^mt I_d + w_p^mt I_p + w_s^mt I_s    (11)
Ψ_sm = w_d^sm I_d + w_p^sm I_p + w_s^sm I_s    (12)

where Ψ_cs, Ψ_mt and Ψ_sm are the co-stimulation, mature and semi-mature output signals, I_d, I_p and I_s are the danger, PAMP and safe input signals, and the w's are constants. It can be seen that, in this setting, the output signals can be represented as dot products³ between a vector containing the input signals (I = [I_d, I_p, I_s]^T) and another vector containing the appropriate constants:

Ψ_cs = ⟨w^cs, I⟩    (13)
Ψ_mt = ⟨w^mt, I⟩    (14)
Ψ_sm = ⟨w^sm, I⟩    (15)

³ It should be noted that this was first pointed out by Dr. T. Stibor during the technical discussions at ICARIS-2007.


In this case, it follows that each output signal is obtained by multiplying the length of the appropriate weight vector by the projection of I onto that weight vector (||I|| cos α, where α is the angle between I and the w vector). Therefore, even though the DCA does not employ affinity functions, kernel functions could be applied to it, replacing the dot products in equations 13-15. However, the meaning of such a modification is not clear at present, due to the fact that the relevant parameters of the algorithm have been derived from experimental data. It should become clearer as the general mathematical properties of the algorithm are investigated.
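The dot-product formulation of equations 13-15, and the kind of substitution a kernelized variant would make, can be sketched as follows. The weight values here are hypothetical placeholders (the published DCA uses weights derived from experimental immunology data), and the polynomial kernel is only one possible choice:

```python
import numpy as np

# Input signal vector I = [danger, PAMP, safe]^T (illustrative values).
I = np.array([0.4, 0.7, 0.1])

# Hypothetical weight vectors, for illustration only.
w_cs = np.array([1.0, 2.0, 2.0])   # co-stimulation
w_mt = np.array([1.0, 1.0, -3.0])  # mature (pro-inflammatory)
w_sm = np.array([0.0, 0.0, 1.0])   # semi-mature (anti-inflammatory)

# Equations (13)-(15): each output signal is a dot product <w, I>.
psi_cs = np.dot(w_cs, I)
psi_mt = np.dot(w_mt, I)
psi_sm = np.dot(w_sm, I)

# A kernelized variant would replace the dot product with a kernel
# evaluation, e.g. a polynomial kernel k(w, I) = (<w, I> + c)^d,
# making each output a nonlinear function of the projection of I onto w.
def k_poly(w, x, c=1.0, d=2):
    return (np.dot(w, x) + c) ** d

psi_cs_kernel = k_poly(w_cs, I)
```

As the text notes, whether such a nonlinear combination of the input signals is meaningful depends on the empirically derived parameters of the algorithm, which is why this substitution is only outlined here.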

3.3 Idiotypic network algorithms

Idiotypic network algorithms (also called immune network algorithms) are based on a network theory of the immune system. Two examples are aiNet, proposed by de Castro and Von Zuben [11], and the network-based AIS presented by Timmis et al. [14]. The use of kernel functions should have a noticeable impact on these algorithms, due to the fact that the affinity between antibodies or B cells and antigens usually affects the structure of the networks (e.g. [4]). In particular, in the next section, we consider, as a practical example of the incorporation of kernel functions in AISs, the derivation of a modified aiNet algorithm, which operates in the feature space, and analyze how its performance is influenced.

4 A Practical Application: aiNet

4.1 Derivation of a kernel-based version of aiNet

In this section, we present a kernel-based version of aiNet, an immune network algorithm proposed in [11]. This algorithm was chosen due to the recent work of Stibor and Timmis [15], which investigates its compression quality, using the originally proposed Euclidean distance as the affinity function. In that work, it was verified that aiNet may face problems when dealing with datasets containing dense regions, and it was argued that these problems are due to the optimization criterion used in the algorithm for suppression between clones, aimed at eliminating redundancy. They suggested that it should be feasible to modify this criterion to overcome the problem. However, considering that the criterion was inspired by the Idiotypic Network Theory [16], in which a clone is suppressed if recognized by another clone, such a modification may not be straightforward, and may affect the biological inspiration of the algorithm. This motivates us to investigate another modification: the affinity function.

As we do not go into details regarding aiNet, the reader is referred to [11] and [15] for details of the algorithm. In addition, we follow the same notation for the parameters of the algorithm as in [15]. Finally, it should be kept in mind that, because kernel methods can be applied to very general representations (any for which a PD kernel is defined), not only vectors, the adapted version of aiNet considered here is not a true kernel method, as it requires a real-valued vector representation.


The aiNet algorithm is based on a set of interconnected antibodies, which represent internal images of antigens to which the network is exposed, modeling the competition for antigenic recognition, while eliminating antibodies that recognize each other. An affinity measure, which, in [11], was originally based on the Euclidean distance, is used to quantify the interaction strength between an antibody and an antigen and, also between two antibodies. The affinity measure results from an adapted distance function, such that the affinity is maximum when the distance is minimum. Therefore, in the following discussion, we consider how such distance function can be modified. Assuming that the input data lie in an input space X =