Proceedings of the International Conference on Artificial Intelligence (IC-AI'02), pp.147-153. Las Vegas, Nevada, USA. June 24-27, 2002.
A Novel Neural Network Based on Immunity
Lei Wang and Michele Courant
PAI Group, Department of Informatics, University of Fribourg
Chemin du Musee 3, CH-1700 Fribourg, Switzerland

Abstract- Based on an analysis of natural immune phenomena and the capabilities of existing artificial neural networks, a novel network model, the artificial neural network based on immunity (ANNI), is proposed, which integrates the immune mechanism with the function of neural information processing. The learning algorithm of ANNI comprises a method of selecting an excitation function and an adaptive algorithm of network learning. ANNI makes it easy for a user to exploit the characteristic information of a pending problem and to simplify the original network structure, and is therefore able to improve efficiency and accuracy considerably. Besides theoretical analysis, simulation on the twin-spiral problem shows that ANNI is both effective and feasible.

Keywords- Artificial neural networks, excitation function, immunity, the twin-spiral problem, pattern recognition.
1. Introduction
It is well known that the artificial neural network (ANN) is an important research area in which people imitate and explore the information-processing mechanisms of biological intelligence. In particular, Rosenblatt, Hopfield, and Grossberg [1]-[3] proposed widely applied network models, which extended the study of this field from theoretical analysis to engineering practice, and many good results have been obtained. However, it is necessary to note that these existing network models are all based on highly simplified abstractions of natural neural systems, which aids their development and application in engineering practice but loses some of the original functions of the natural system. With the widening spread of ANN applications, problems continue to appear, such as a tendency to fall into local extrema when the learning algorithm is not selected suitably, and a conflict between network complexity and generalization. On the other hand, as artificial intelligence learns from biological intelligence, people are gradually waking up to the importance of
biological immunity. The concept of immunity in engineering can be traced back to 1986 [4], when the parallel between immunology and classifier systems was noted. In that work, a classifier system was used to model certain aspects of the immune system by drawing an analogy between individual classifier rules and antibody types: classifier strength represented the concentration of the antibody type, and interactions between classifier rules modeled Jerne's idiotypic network hypothesis [5]. After that, Varela, Stewart, Perelson, and others further developed the theory of the biological immune system and the concerned mechanisms [6]-[8]. From their viewpoints it can be seen that the function of biological immunity comprises three aspects: immune recovery, immune stabilization, and immune surveillance. In addition, Ishiguro et al. proposed an interconnecting immune network model [9], a large-scale immune network composed of many local networks, used for controlling a mobile six-legged robot. Kumak, Lee, and many other researchers have proposed models for different problems, with good results [10]-[18]. From a deep analysis of these existing network models and algorithms, we can see that many of these methods lack the capability of adapting their parameter settings to the actual situation; this is conducive to their universality but neglects the assisting role of characteristic information and background knowledge. The loss due to this negligence is considerable when dealing with some complex problems. Based on this consideration, this paper introduces immune concepts into the ANN and designs a novel network model, so as to use characteristic information and background knowledge in problem solving. This model, called the artificial neural network based on immunity (ANNI), is used to improve the capability of dealing with some difficult problems.
2. The neural network model based on immunity
2.1 Biological basis of the immune neural network
The material basis of the ANN comes from the model of the neuron in biology. It is generally accepted that an intact neuron is made up of several components, such as the cell body, dendrites, axon, and synapses, of which the cell body is the principal part. There are about 10^11 neurons in the human brain. These neurons are essentially the same in physical structure but obviously different in the functions and effects they have in the system. One reason is that the cell bodies of different neurons have different sensitivities to outside or other neural signals and different capabilities for processing them. Taking the neural cells of vision as an example, some neurons may be very sensitive to a red signal but insensitive to a blue one. Moreover, this difference in response changes with the order of stimulation received by the brain from the outside. In other words, a neuron reacting intensively to a red signal very likely has an instinctive, or externally imposed, immunity to the blue signal. In fact, this phenomenon of cell immunity affecting neural signal processing exists widely in nature. The immune system is a kind of defense system of an organism, especially in vertebrate animals and human beings. It is composed of the organs and cells with immune functions and the molecules with immune effects, and it protects an organism against infringement by pathogens, harmful matter, and cancer cells. The function of biological immunity comprises three aspects: immune recovery, immune stabilization, and immune surveillance. In the biological immune system, the immune reaction is mainly carried out through the interaction between antibodies and antigens. An antigen is a general designation for all kinds of
non-self organs or matter; it is composed of a carrier and a hapten. An antibody is an immune molecule secreted by a kind of immune cell; it can identify and bind an antigen and can further kill the antigen with the cooperation of other immune cells and molecules. In addition, a vaccine is a matter that has the main features of an external antigen but does not harm the organism. There exists a proportional relationship between antigens and antibodies in an organism (this proportion can be regarded as a density): if the number of antibodies directed against a certain kind of antigen is small, then these antibodies lack the capability to bind and kill those antigens when the antigens intrude into the organism. However, if the antigens re-intrude into the organism, some properties of the organism change so as to increase the number of antibodies and so kill the antigens. Therefore, the immune power of an organism against extrinsic antigens can be improved by injecting vaccines into the organism. It can also be seen that the immune system has the powers of memory, self-learning, self-organization, and self-adaptation.
The immune network model proposed and developed by Varela comprises two major aspects. The first concerns the dynamics of the system, i.e., the differential equations governing the increase and decrease of the concentration of a set of lymphocyte clones and the corresponding immunoglobulins (Ig). The other concerns the metadynamics of the system, and only this aspect is emphasized and exploited here for problem solving. However, up to now there has been no widely accepted viewpoint on the structure of an artificial immune network.

Figure.1 Approximate curves of the variation of antigen density with time (two curves for multiplying ratios k1 > k2, rising from Ag0 toward Agmax)

Suppose the density of antigens intruding into an organism is Ag, and the maximal density of antigens that the available nutrition can keep alive is Agmax. If natural decease of the antigens is not considered, then the variation of the antigen density with time follows the Logistic equation [20], i.e.,

    dAg/dt = k · Ag · (1 − Ag/Agmax)    (1)

where k is the multiplying ratio, with dimension T^(-1), meaning the number of new antigens produced per unit of time. Suppose the initial density of antigens is Ag0; integrating eqn. 1 gives

    Ag(t) = Ag0 · Agmax / (Ag0 + (Agmax − Ag0) · e^(−kt))

The curve of the variation of antigen density with time is illustrated in Figure 1, in which each curve corresponds to a different value of k; the larger the value of k, the steeper the curve, i.e., k1 > k2.
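The logistic dynamics above can be checked with a short numeric sketch; the function below is just the closed-form solution obtained by integrating eqn. 1, and the particular values for Ag0, Agmax, and k are illustrative only.

```python
import math

def antigen_density(t, ag0, ag_max, k):
    """Closed-form solution of the Logistic equation
    dAg/dt = k*Ag*(1 - Ag/Ag_max) with Ag(0) = ag0."""
    return ag0 * ag_max / (ag0 + (ag_max - ag0) * math.exp(-k * t))

# A larger multiplying ratio k gives a steeper rise toward Ag_max,
# matching the k1 > k2 curves of Figure 1.
ag0, ag_max = 1.0, 100.0
for k in (0.5, 1.0):
    print(k, [round(antigen_density(t, ag0, ag_max, k), 2) for t in range(0, 11, 2)])
```

Both curves start at Ag0 and saturate at Agmax; only the speed of the transition depends on k.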
2.2 Model of ANNI
In existing ANN models, a neuron is regarded as a unit that sums all the input signals and then outputs a signal after comparison with a threshold. The distinctive features of this kind of model are its simple structure and good versatility. However, these features also bring a drawback: the active, assisting role of characteristic information is not considered when dealing with a concrete problem; to be exact, there is no interface for it in these existing models. Based on this consideration, a vaccinating unit is designed in this paper for utilizing background information and prior knowledge of a pending problem. The model is shown in Figure 2.
Figure.2 Model structure of artificial neural network based on immunity (an input layer X, an information-processing layer of excitation units f1(X,V), f2(X,V), ..., fM(X,V), an output layer Y, and a vaccinating unit supplying the immune vector V = {v1, v2, ..., vN})

In the model shown above, a neuron is first considered to take an important role during information processing; second, all neurons are similar in basic properties but different in concrete forms. Therefore, the excitation function of a neuron should be designed in a variable form: its basic properties remain unchanged, but its concrete form can be changed by adjusting some of its parameters. To be exact, the excitation function of any neuron i can be designed as

    u_i = f_i(X, V)

where f_i(·) is a function family with a series of parameters V, and its concrete form depends on the pending problem. On the other hand, some features of the problem are also contained in the information-processing layer; therefore, the structure of this kind of network is usually simple.

3. Learning algorithm of ANNI
This paper mainly considers how to select an excitation function and how to train the concerned parameters.

3.1 Methods of selecting an excitation function
An excitation function in ANNI cannot be used universally under general conditions; it is designed exclusively in accordance with the characteristic information of the pending problem or the prior knowledge. Although the properties of the different excitation functions in a network are the same, their concrete forms usually differ. Designing a network model usually involves the following steps.
1. Vaccine extraction. A vaccine here is a term borrowed from biology and means some characteristic information of the pending problem or the prior knowledge. Extracting vaccines means analyzing the pending problem, collecting the characteristic information, and then finding the constraint relation between input and output in accordance with prior knowledge. It is necessary to point out that this constraint relation should be simple and easy to obtain; it need not be very general or accurate.
2. Basic model of the excitation function. Select a function family with pending parameters in accordance with the above constraint relation. The basic model and properties of these functions are the same, but their concrete forms usually differ because of the difference in parameters.
3. Vaccination. Complete the vaccinating unit shown in Figure 2 in accordance with the vaccines obtained in Step 1, i.e., confirm the immune vector V = {v1, v2, ..., vN}, which determines the parameters in the function of each excitation unit. Note that a problem usually has more than one characteristic; that is, there may be more than one vaccine vector in the vaccinating unit. Therefore, during vaccination the injection can be carried out either by selecting any vaccine randomly or by combining the vaccines according to a certain logical relationship.
4. Network training. Select a network training algorithm, such as the LMS or BP algorithm, and use training samples to learn the parameters, i.e., the weight matrix, thresholds, etc.
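As a sketch of how a vaccinating unit might parameterize the excitation units, the fragment below builds a function family f(x, y, V) from an immune vector V; the tanh form and the two-parameter vector are purely illustrative assumptions, not the paper's prescribed design.

```python
import math

def make_excitation(V):
    """Build a concrete excitation function from the immune vector V
    supplied by the vaccinating unit. The functional form here
    (a shifted, scaled tanh of the input sum) is an illustrative
    placeholder, not the form prescribed by the paper."""
    v1, v2 = V
    def f(x, y):
        # Every neuron shares the same basic shape; the vaccine
        # parameters v1, v2 encode problem-specific prior knowledge.
        return math.tanh((x + y - v1) / v2)
    return f

V = (0.5, 2.0)                 # immune vector from the vaccinating unit
f1 = make_excitation(V)
print(f1(0.1, 0.4), f1(1.0, 2.0))   # activations for two sample inputs
```

Step 4 would then train the remaining weights and thresholds with an ordinary algorithm such as LMS or BP, while the vaccinated parameters fix the shape of each unit.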
3.2 Self-learning algorithm of the network
In some cases it is difficult to extract the characteristic information because little prior knowledge is available, or searching for this kind of information increases the workload so much and decreases efficiency so far that the value of the work is lost. In this case we can still design the excitation unit in a form whose excitation function can be adjusted; the parameters in the excitation function are then treated as objects of training. Taking an ANNI with one hidden layer as an example, suppose the weight matrices from the input layer to the hidden layer and from the hidden layer to the output layer are W(1) and W(2), respectively; then the output vector Y is:

    Y = W(2) f(W(1) X, V)    (2)

Suppose the real output of the training samples is Z; then the error function can be defined as:

    E = (1/2) Σ_{i=1}^{P} (z_i − y_i)²    (3)

For convenience of operation, the gradient-descent method is used for network training. To smooth the learning path and increase learning speed during training, a synthetic approach of group-by-group training with added momentum items is preferable. The group-by-group training method is proposed relative to the one-by-one method and is used to increase training speed: it first adds all the modifying values produced by a group of samples and then makes a single modification. Adding momentum items means using the modifying value produced by the previous step to smooth the learning path, so as to avoid getting into local extrema. To be exact, the equations for modifying the weights W and the parameters V are:

    ΔW(i)(t+1) = −η(i) Σ ∂E/∂W(i)(t) + α[W(i)(t) − W(i)(t−1)]
    ΔV(i)(t+1) = −µ(i) Σ ∂E/∂V(i)(t) + β[V(i)(t) − V(i)(t−1)],   i = 1, 2    (4)

where the sums run over the samples of the current group, α and β are the momentum factors, and η(i) and µ(i) are the learning rates. It is necessary to point out that if η(i) and µ(i) are large, the training process converges faster, but oscillation appears late in the process. Therefore, η(i) and µ(i) are usually set to large values at the beginning of training and then decreased as training proceeds.

3.3 An example of ANNI design
It is well known that the twin-spiral problem is a nonlinear classification problem [22]. Suppose the rectangular coordinates of any point on the spiral line are (x, y); then the twin-spiral parametric equations can be expressed as:

    x = (αθ + β) cos θ
    y = (αθ + β) sin θ    (5)

where α and β are constants denoting the angular velocity of the spiral line and the starting distance, respectively. ANNI is used for twin-spiral classification through the following steps.

3.3.1 Vaccine extraction
From eqn. 5 we can easily obtain

    y = x · tan( (√(x² + y²) − β) / α )    (6)
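The eqn. 4 update rule can be illustrated with a minimal sketch; the one-dimensional objective and all constants below are invented for illustration, and the sum over a sample group degenerates here to a single term.

```python
def momentum_step(param, grad_sum, prev_param, lr, momentum):
    """One update in the spirit of eqn. 4: a (batch) gradient step
    plus a momentum term built from the parameter's previous change."""
    return param - lr * grad_sum + momentum * (param - prev_param)

# Minimize E(w) = 0.5 * (w - 3)^2, so dE/dw = w - 3.
w, w_prev = 0.0, 0.0
lr, beta = 0.1, 0.5               # learning rate and momentum factor
for _ in range(50):
    grad_sum = w - 3.0            # the group sum here is a single gradient
    w, w_prev = momentum_step(w, grad_sum, w_prev, lr, beta), w
print(round(w, 3))                # approaches the minimizer 3.0
```

The momentum term smooths the path toward the minimum, which is exactly the role the text assigns to the added momentum items.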
Because the x and y coordinates of any point are essentially equivalent, if x and y are taken as the input nodes of the network, their weights from the input layer to the hidden layer are also the same. Based on this prior knowledge, to simplify the network we can set all weights equal to 1 (even if some of them are not, we can adjust the parameters in the excitation function to compensate, or train the network to confirm the final weight matrix in accordance with Step 4).

3.3.2 Basic model of the excitation function
Based on the above considerations, the concrete ANNI for the twin-spiral problem can be designed in the form shown in Figure 3.

Figure.3 Network structure for the twin-spiral problem (inputs x and y feed two hidden excitation units f1 and f2, parameterized by the vaccine vector V = {v1, v2}, which feed the output O)

From eqn. 6, the excitation function in the hidden layer can be taken as:

    f(x, y, v1, v2) = x · tan( (√(x² + y²) − v1) / v2 ) − y

3.3.3 Vaccination
At first, we confirm the evaluation function of the concerned parameters for optimizing the excitation function. Suppose the anticipated output of the network is Z = {z1, z2, ..., zM}, where M is the number of training samples; the evaluation function of the ith (i = 1, 2) neuron in the hidden layer is then:

    E_i(v1, v2) = (1/2) Σ_{j=1}^{M} [z_j − f_i(x_j, y_j, v1, v2)]²

Next, we optimize the adjustable parameters v1 and v2 based on the above evaluation function, using the iterative rule:

    V_i(k+1) = V_i(k) − k · ∂E_i(v1, v2)/∂v_i,   i = 1, 2

where k is the step length of the iteration. The above equation is iterated until the evaluation function E_i(v1, v2) reaches the preset value, thereby obtaining the vaccine vector V = {v1, v2}.

3.3.4 Network training
Select a network training algorithm, such as the LMS or BP algorithm, and use training samples to modify the network weight matrix and thresholds. In this example, considering the features of the twin-spiral problem, we can retain the original settings directly and need not retrain.

4. Simulations
The capability of ANNI is studied on the twin-spiral classification example. First, we generate 640 points with random noise, belonging to two spiral lines ρ1 and ρ2 (320 points each), as shown in Figure 4. The angular velocities of the two spiral lines are the same (both 4), and the starting distances are 1 and 7, respectively. We alternately select half of the sample points as training data; the rest are used as testing data. Finally, we use the method proposed in Section 3.3 to extract the characteristic information, confirm the vaccine vector V = {v1, v2}, and train the network.

Figure.4 Twin-spiral lines with random noise

In the experiment, we use MatLab 5.3 for programming and run it on a Pentium-233 PC. When the trained network is used to classify the testing data, 7 points in total are classified in error, giving a correct classification ratio of 97.81%. Without noise, the correct classification ratio is 100%, which is a substantial improvement over what is reported in references [13] and [14].
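The simulation setup can be mimicked with a small script that generates points on the two spirals via eqn. 5 and separates them using the eqn. 6 constraint directly; this is a geometric sketch of the vaccine idea, not the trained ANNI network, and it uses noise-free points only.

```python
import math
import random

def spiral_point(theta, alpha, beta):
    """Eqn. 5: a point on the spiral with angular velocity alpha
    and starting distance beta."""
    r = alpha * theta + beta
    return r * math.cos(theta), r * math.sin(theta)

def residual(x, y, alpha, beta):
    """Distance of (x, y) from satisfying the eqn. 6 relation
    (rho - beta)/alpha = theta (mod 2*pi)."""
    rho = math.hypot(x, y)
    theta = math.atan2(y, x)
    d = ((rho - beta) / alpha - theta) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

alpha, betas = 4.0, (1.0, 7.0)     # same settings as the simulation
random.seed(0)
n, correct = 200, 0
for i in range(n):
    label = i % 2                  # alternate between the two spirals
    theta = random.uniform(0.5, 4 * math.pi)
    x, y = spiral_point(theta, alpha, betas[label])
    pred = 0 if residual(x, y, alpha, betas[0]) < residual(x, y, alpha, betas[1]) else 1
    correct += (pred == label)
print(correct / n)                 # 1.0 on noise-free points
```

With the paper's random noise added, points near the regions where the residuals come closest would start to be misclassified, which is consistent in spirit with the reported 97.81% rate.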
5. Conclusions
The ANNI proposed in this paper integrates the immune mechanism with the function of neural information processing, and is mainly intended to utilize background information and prior knowledge when dealing with a complicated problem. The learning algorithm of ANNI comprises a self-adjusting excitation function and network training. ANNI is conducive to simplifying the structure of an existing model and improving search performance on a pending problem. However, it is necessary to point out that much work remains on the theory of designing and further optimizing this model, for example, the algorithm for optimizing the excitation function of a hidden-layer neuron, the algorithm for adjusting the parameters of the excitation function, and the algorithm for training the network weight matrix.

References
[1] F. Rosenblatt. Principles of Neurodynamics. Spartan, New York, 1962.
[2] J. J. Hopfield. Learning algorithms and probability distributions in feed forward and feedback networks. Proc. Natl. Acad. Sci. USA, 84: 8429-8433, 1987.
[3] S. Grossberg. Nonlinear neural networks: principles, machines and architectures. Neural Networks, 1: 15-57, 1988.
[4] J. D. Farmer, N. H. Packard and A. S. Perelson. The immune system, adaptation and machine learning. Physica D, 22: 187-204, 1986.
[5] N. K. Jerne. Towards a network theory of the immune system. Ann. Immunol., 125C: 373-389, 1974.
[6] F. J. Varela and J. Stewart. Dynamics of a class of immune networks. I) Global behavior. J. Theor. Biol., 144: 93-101, 1990.
[7] J. Stewart and F. J. Varela. Dynamics of a class of immune networks. II) Oscillatory activity of cellular and humoral components. J. Theor. Biol., 144: 103-115, 1990.
[8] A. S. Perelson. Immune network theory. Immunological Reviews, 110: 5-36, 1989.
[9] A. Ishiguro, T. Kondo and Y. Watanabe. Emergent construction of artificial immune networks for autonomous mobile robots. Proceedings of the 1997 IEEE International Conference on Systems, Man and Cybernetics, Orlando, FL, USA, pp.1222-1228, 1997.
[10] K. K. Kumak and J. Neidhoefer. Immunized neurocontrol. Expert Systems with Applications, 13(3): 201-214, 1997.
[11] D. W. Lee and K. B. Sim. Artificial immune network-based cooperative control in collective autonomous mobile robots. Proceedings of the 6th IEEE International Workshop on Robot and Human Communication, Sendai, pp.58-63, 1997.
[12] N. Mitsumoto, T. Fukuda and K. Shimojima. Control of the distributed autonomous robotic system based on the biologically inspired immunological architecture. Proceedings of the 1997 IEEE International Conference on Robotics and Automation, Albuquerque, USA, pp.3551-3556, 1997.
[13] K. Takahashi and T. Yamada. Self-tuning immune feedback controller for controlling mechanical systems. Proceedings of the 1997 1st IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Japan, pp.101-105, 1997.
[14] Y. Ishida and F. Mizessyn. Learning algorithms on an immune network model: application to sensor diagnosis. Proceedings of the International Conference on Neural Networks, China, pp.33-38, 1992.
[15] J. S. Chun, M. K. Kim and H. K. Jung. Shape optimization of electromagnetic drives using immune algorithm. IEEE Trans. on Magnetics, 33(2): 1876-1879, 1997.
[16] S. Forrest, A. S. Perelson, L. Allen and R. Cherukuri. Self-nonself discrimination in a computer. Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, CA, USA, pp.202-212, 1994.
[17] P. D. Haeseleer, S. Forrest and P. Herman. An immunological approach to change detection: algorithms, analysis and applications. Proceedings of the IEEE Symposium on Research in Security and Privacy, CA, USA, pp.110-119, 1996.
[18] L. Wang, J. Pan and L. Jiao. Evolutionary algorithm based on immune strategy. Science Progress in Nature, 10(5): 451-455, 2000.
[19] B. Jin. Cell and Molecular Immunology. Xi'an: World Books Press, pp.322-324, 1998.
[20] A. S. Qi and C. Y. Du. Nonlinear Models in Immunity. Shanghai: Scientific and Technological Education Press, pp.6-8, 1998.
[21] S. E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. In: D. S. Touretzky (Ed), Advances in Neural Information Processing Systems. San Mateo, CA: Morgan Kaufmann, pp.524-532, 1990.