Self-Organization of Spiking Neural Network Generating Autonomous Behavior in a Real Mobile Robot
Fady Alnajjar, Department of Human and Artificial Intelligence System, University of Fukui, Fukui 910-8507, Japan
Kazuyuki Murase, Department of Human and Artificial Intelligence System, University of Fukui, Fukui 910-8507, Japan
E-mail:
[email protected]
E-mail:
[email protected]
Abstract
In this paper, we study the relation between neural dynamics and robot behavior in order to develop a self-organization algorithm for spiking neural networks that is applicable to autonomous robots. We first formulated a spiking neural network model whose inputs and outputs were analog, and implemented it in a miniature mobile robot, Khepera. To see whether a solution for the given task exists with the spiking neural network, the robot was evolved with a genetic algorithm (GA) in an environment. The robot acquired the obstacle avoidance and navigation task successfully, demonstrating that a solution exists. Then, a self-organization algorithm based on use-dependent synaptic potentiation and depotentiation was formulated and implemented in the robot. In the environment, the robot gradually organized the network, and the obstacle avoidance behavior was formed. The time needed for training was much less than with genetic evolution, approximately one fifth (1/5).
1. Introduction
Spiking neural networks are now widely used for signal processing, especially in robotics. Their largest advantage is the ability to discriminate events in the time domain [1][7]. That is, even if a network operates at a relatively slow sampling rate, it can distinguish which event occurred first with relatively small circuitry. It is therefore thought to be suitable for systems that require the analysis of a flow of events, such as moving scenery or optical flow.
Self-organization of autonomous systems is an important subject for the development of natural intelligence. Two frequently used strategies are learning and evolution. Learning modifies the system with training data [6], while evolution adapts the system to the given environment or circumstances with the help of mutation and selection processes [3]. Another possible strategy for self-organization is use-dependent synaptic modification. The best known is the Hebbian rule [5][7][8], where the signal transmission between nodes is strengthened if both nodes are activated simultaneously. This is thought to be the mechanism of biological memory at the most primitive level, and it is called use-dependent synaptic plasticity in neuroscience. The Hebbian rule or its modified versions have been widely used for a variety of self-organizing neural networks. In these networks, the use-dependent synaptic modification is switched on during the training phase with training data, and then turned off during the operating or testing phase.
For the training of autonomous robots, the evolutionary approach is generally used, because training data are difficult to obtain for natural or open environments and because life-long evaluation of individuals is accessible. In the application of spiking neural networks (SNNs) to autonomous robots, evolutionary approaches are commonly used to find the best connections in the SNN. However, self-organization of SNNs has not been widely used in robotics. Use-dependent training is much more advantageous for real-time learning, because it can dynamically adapt the system to a new environment and does not require a trial-and-error process.
In this study, we try to formulate an algorithm for autonomous robots with which adaptive behavior is acquired autonomously in real time through a use-dependent process in a spiking neural network. Thus, the high time-discrimination ability of the spiking neural network is self-organized during operation in the environment.
In this paper, we first explain the Spike Response Model, the type of spiking neuron we used, and describe how the network behaves in simulation. We then implemented it in a small mobile robot, Khepera, tried to obtain the best individual by artificial evolution, and analyzed the operation. Finally, we formulated a use-dependent rule for the autonomous robot by combining it with a global evaluation of the behavior, and tested the performance in a real environment.
Proceedings of the 2005 International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’05) 0-7695-2504-0/05 $20.00 © 2005
IEEE
2. Spike Response Model (SRM)
Artificial neural networks are classified into pulse-coded neural networks and rate-coded neural networks from the viewpoint of abstraction level; a Spiking Neural Network (SNN) belongs to the former, since it works in a binary system. There are several models of spiking neurons with various degrees of detail [2]. In the experiments described in this article, we focus on the Spike Response Model (SRM), which is the easiest to understand and to implement [3]. In the SRM, the state of a neuron depends on the time since its last output spike. A single variable $\upsilon_i$ describes the state of a neuron. In the absence of spikes, $\upsilon_i$ is at its resting value. Each incoming spike perturbs $\upsilon_i$, and it takes time before $\upsilon_i$ returns to zero. The function $\varepsilon$ describes the time course of the response to an incoming spike. If the summed effect of several incoming spikes reaches the threshold, an output spike is triggered. A refractory period then starts (a negative value, during which the neuron cannot emit a new spike), and $\upsilon_i$ gradually returns to its resting potential, as shown in Figure 1. Mathematically, $\upsilon_i$ can be calculated by equation (1) [4]:

$$\upsilon_i(t) = \sum_j \omega_j^t \sum_f \varepsilon_j(s_j) + \sum_f \eta_i(s_i) \qquad (1)$$

$$\varepsilon(s) = \exp[-(s - \Delta)/\tau_m]\,\bigl(1 - \exp[-(s - \Delta)/\tau_s]\bigr) \qquad (1.1)$$

$$\eta(s) = -\exp[-s/\tau_m] \qquad (1.2)$$

where the function $\varepsilon_j(s_j)$ accounts for the time elapsed since the spike was emitted and for the delay between the generation of the spike at the pre-synaptic neuron and its arrival at the synapse. The function $\eta_i(s_i)$ describes the refractoriness and the speed of recovery to the resting potential. To describe each synaptic connection (its presence or absence, and its sign), the variable $\omega_j^t$ is used.

Fig. 1. A neuron spikes when the sum of the incoming signals reaches the threshold. After that, the neuron enters a refractory period (negative value). Finally, it gradually returns to its resting potential, $\upsilon_i = 0$.

2.1. Spiking Neural Network (SNN) with SRM
Floreano's work (2001) has shown that, using vision as sensory input [4], a Khepera mobile robot can navigate in an area covered with black and white stripes, using a simple GA to evolve the connections between spiking neurons. We instead used Khepera's infrared (IR) sensors as sensory input and made the robot navigate in a normal environment with no black and white stripes. We also used a different method for updating the neurons and controlling the motors. Figure 2 shows the network connections. The network includes left and right sensors (input layer), 10 neurons that are fully connected to each other (hidden layer), and two motors (output layer). White circles denote excitatory neurons and black circles denote inhibitory neurons; in other words, all the connections coming from an excitatory (inhibitory) neuron are positive (negative).
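Equations (1), (1.1) and (1.2) can be sketched in a few lines of Python. This is only a minimal illustration, not the program used on the robot: the numeric values of the time constants, the delay and the threshold, the function names, and the choice to treat the response as zero before the spike has arrived (s < Δ) are all assumptions made for the example.

```python
import math

# All numeric values below are illustrative assumptions, not taken from the paper.
TAU_M = 4.0      # tau_m in equations (1.1) and (1.2), in ms
TAU_S = 10.0     # tau_s in equation (1.1), in ms
DELAY = 2.0      # Delta in equation (1.1), in ms
THRESHOLD = 0.1  # firing threshold

def epsilon(s):
    """Equation (1.1): response to a spike emitted s ms ago.

    Assumption: the response is zero until the spike has arrived (s < DELAY).
    """
    if s < DELAY:
        return 0.0
    return math.exp(-(s - DELAY) / TAU_M) * (1.0 - math.exp(-(s - DELAY) / TAU_S))

def eta(s):
    """Equation (1.2): refractory term for an output spike emitted s ms ago."""
    if s < 0.0:
        return 0.0
    return -math.exp(-s / TAU_M)

def membrane_potential(t, weights, presyn_spike_times, own_spike_times):
    """Equation (1): weighted sum of responses plus the refractory terms.

    weights[j] is the signed connection from pre-synaptic neuron j
    (0 when the connection is absent); presyn_spike_times[j] lists its firing times.
    """
    v = 0.0
    for w, spike_times in zip(weights, presyn_spike_times):
        v += w * sum(epsilon(t - tf) for tf in spike_times)
    v += sum(eta(t - tf) for tf in own_spike_times)
    return v

def fires(v):
    """An output spike is triggered when the potential reaches the threshold."""
    return v >= THRESHOLD
```

With these definitions, an excitatory and an inhibitory input that fire at the same time cancel each other, and a neuron that has just fired sits at -1 because of the refractory term.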
Fig. 2. The network connections (white circles = excitatory neurons, black circles = inhibitory neurons). Neurons 9 and 10 are connected directly to motors 0 and 1, respectively.
2.2. Time Management and Spike Generator
In the program, we discretize time into small steps of 1 ms each, and consider the effect of the function $\varepsilon(s)$ only within a time window of 20 ms. At each time step t, the synaptic contribution to the membrane potential is the sum over all the spikes that arrived at the synapse within the 20 ms time window, each weighted by the value of $\varepsilon(s)$, where $s = t - t^f$ is the difference between the current time t and the time $t^f$ at which the spike was emitted. At the next time step, assuming that there are no new spikes, we simply shift the stored firing times of each spike by one position; more details are available in reference [4]. The probability of emitting a spike at a particular time step is determined by the sensor's value, as described in Fig. 3-A.
In addition, the time management chosen in the program lets the neurons operate with an update interval of 1 ms, while the robot's sensors are read and pre-processed at a longer interval of 25 steps. Wheel speeds are set at the end of the 25 steps from the membrane potential values at that time. During the 25 steps, the robot moves at constant speed values (Fig. 3-B).
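The time management described above can be sketched as follows. This is an illustration under stated assumptions: the linear mapping from sensor value to spike probability stands in for the flowchart of Fig. 3-A, which is not reproduced here, and `SENSOR_MAX`, `read_sensors`, `step_network` and `set_motors` are hypothetical names introduced only for the example.

```python
import random

WINDOW = 20           # ms of spike history considered by epsilon(s)
STEPS_PER_CYCLE = 25  # 1 ms network updates between two sensor readings
SENSOR_MAX = 1023     # assumed full-scale IR reading

def spike_probability(sensor_value):
    """Assumed linear mapping from sensor value to spike probability."""
    return min(max(sensor_value / SENSOR_MAX, 0.0), 1.0)

def sensor_spike(sensor_value, rng=random):
    """Emit a spike at this 1 ms step with the probability given above."""
    return rng.random() < spike_probability(sensor_value)

def shift_history(history, fired):
    """history[k] is True if a spike was emitted k ms ago.

    Moving to the next time step shifts every entry by one position
    and drops the spike that leaves the 20 ms window.
    """
    return [fired] + history[:WINDOW - 1]

def control_cycle(read_sensors, step_network, set_motors, rng=random):
    """One cycle: read sensors once, update the network 25 times, set the wheels."""
    left, right = read_sensors()
    potentials = None
    for _ in range(STEPS_PER_CYCLE):
        spikes = (sensor_spike(left, rng), sensor_spike(right, rng))
        potentials = step_network(spikes)  # returns the output membrane potentials
    set_motors(potentials)                 # wheel speeds stay constant until the next cycle
    return potentials
```

Keeping the spike history as a fixed-length list makes the one-position shift explicit; a `collections.deque` with `maxlen=WINDOW` would be an equivalent choice.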
Fig. 3. (A) Flowchart showing the probability of emitting a spike from the sensor values. (B) Schematic diagram of the time management used in the program.

3. Robot and the Environment
The robot employed in our experiments is Khepera, a miniature mobile robot (Fig. 4-B). The robot is equipped with eight infrared proximity sensors: six are located on the front side of the robot and the remaining two on the back. In our program, we summed sensors 1 and 2 to form the left sensor of the robot, and sensors 3 and 4 to form the right sensor. With this arrangement, Khepera is able to avoid obstacles to the front, right and left. Khepera communicates with a host computer via a serial port. The high-level processes (neural network activation, genetic operators) run on the host computer, while the low-level processes (sensor reading, motor control) run on the Khepera processor. The robot navigated in an environment of approximately 60 × 60 cm, with obstacles distributed as shown in Fig. 4-A.

Fig. 4. (A) The hardware connection and the experimental environment (workspace). (B) Khepera, the miniature mobile robot. Sensors 1 and 2 form the left sensor; sensors 3 and 4 form the right sensor.

4. Genetic Evolution of SNN
We used a GA to test whether the above-mentioned SNN architecture can perform the navigation task in a natural environment. In the GA program, a population of 10 individuals is evolved using two-point crossover and one-bit mutation. Each individual of the population is decoded and tested on the Khepera robot for 30 seconds. The fitness function chosen in the program depends on the sum of the speeds of Khepera's wheels while it is moving forward. An individual is composed of a number of blocks. The first bit of a block encodes the sign of the neuron (0 for an excitatory neuron, 1 for an inhibitory neuron), and the remaining bits encode the presence (1) or absence (0) of a connection from each of the n neurons and from each of the s receptors.
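The genome layout and the two genetic operators named above can be sketched as below. This is a hedged illustration rather than the GA program used in the experiments: it assumes one block per neuron, n = 10 neurons and s = 2 receptors (the summed left and right sensors), and the function names are hypothetical.

```python
import random

N_NEURONS = 10   # n in the text
N_SENSORS = 2    # s in the text; assumed to be the summed left and right sensors
BLOCK = 1 + N_NEURONS + N_SENSORS  # sign bit + connection bits

def decode(genome):
    """Split a flat bit list into (sign, neuron_links, sensor_links) per neuron."""
    assert len(genome) == N_NEURONS * BLOCK
    network = []
    for i in range(N_NEURONS):
        block = genome[i * BLOCK:(i + 1) * BLOCK]
        sign = -1 if block[0] == 1 else 1         # 0 = excitatory, 1 = inhibitory
        neuron_links = block[1:1 + N_NEURONS]     # 1 = connection present
        sensor_links = block[1 + N_NEURONS:]
        network.append((sign, neuron_links, sensor_links))
    return network

def two_point_crossover(a, b, rng=random):
    """Swap the segment between two random cut points of the parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def one_bit_mutation(genome, rng=random):
    """Flip exactly one randomly chosen bit."""
    k = rng.randrange(len(genome))
    child = list(genome)
    child[k] ^= 1
    return child
```

Because a connection is encoded only by its presence and the sign belongs to the source neuron, a single flipped sign bit changes the sign of every connection leaving that neuron.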
Figure 5 shows the evolution process measured across five runs for 30 generations. The upper, middle and lower lines show the maximum, average and minimum fitness, respectively. The figure shows the improvement of the behavior; note that Khepera needs no less than 2.5 hours to reach generation 30. The best individual in generation 30 performed the obstacle-avoidance and navigation task successfully.

Fig. 5. Fitness values obtained on the physical Khepera robot. Evaluation of the spiking neural network.

4.1. Evolution of Behavior
To evaluate the behavior, we applied the network connections of generations 0, 10, 20 and 30 (Fig. 5) to the Khepera robot. We then monitored the difference in speed between the left and right motors (ΔO = L motor - R motor) whenever the left or right sensor was activated during free navigation in the environment. Figure 6 shows ΔO averaged across the 6 times the left and right sensors were activated in each generation. We notice that, over the generations of genetic evolution, the connections were gradually organized, and ΔO gradually became positive when the left sensor was activated and negative when the right sensor was activated. That is, Khepera is able to avoid a left obstacle by turning right and a right obstacle by turning left. In the next section, we show that it is possible to achieve the same result with a self-organization method, without genetic evolution.

Fig. 6. Evolution of ΔO (L motor - R motor) in generations 0, 10, 20 and 30, across the 6 times the left and right sensors were activated by obstacles while the Khepera robot navigated freely in the environment in each generation.

5. Use-dependent Synaptic Modification
In the SRM, every neuron uses the firing rate measured during the previous 20 ms to decide its next state. A GA with a commonly used fitness function does not consider this kind of time-dependent process, and it is difficult to formulate a fitness function that reflects the events at every time step. This causes long evolution times and leaves unnecessary connections after evolution. In this section, we describe a new method that gradually builds the network connections. The experimental results show that use-dependent synaptic modification of the SNN connections allows the robot to navigate successfully in a natural environment. Connections between neurons are adjusted depending on the relations between sensor and motor values, and on the dynamics of the excitatory and inhibitory neurons. The principle of the synaptic modification is a use-dependent process in which connections between strongly activated neurons are reinforced for better performance. For simplicity, we used the presence or absence of connections, without adjusting weight values [5]. For example, the robot could avoid a left obstacle by creating an excitatory connection from the left sensors to the left motor.
5.1. Algorithm for Synaptic Modification (Steps of Modification)
The following steps form the typical cycle for modifying the connections.
1- In the initial lifetime of the Khepera, the program randomly creates initial connections between the neurons and defines each neuron's sign, excitatory or inhibitory. Note that the sensory neurons (input layer) are always excitatory.
2- Khepera starts moving at a fixed speed in the arena (Left-motor = Right-motor = 5).
3- Whenever any IR sensor is activated, the values of the motors and sensors are tested against the condition shown in Table 1. If the result does not meet the condition, an error flag is set. Otherwise, Khepera is able, according to Table 1, to avoid the wall and navigate successfully.
4- If the error flag is set, the program starts adjusting the connections between the activated IR sensor and the left and right motors. This adjustment depends on the dynamics of each neuron in the last 20 steps. In other words, neurons with a high firing rate are connected to the suitable motor, and neurons with a low firing rate are disconnected from it. The connection change takes place if and only if the Khepera robot hits the obstacle and the sensor is activated.
Equations (2) and (3) can be used in place of Table 1; the functions $L_s(t)$ and $R_s(t)$ should be larger than 0. In our experiments, these two functions are gradually organized to meet the condition. In the table and the equations, L denotes left, R right, S sensor and M motor.

$$L_s(t) = \Bigl[\sum L.\mathrm{sensor} - \sum R.\mathrm{sensor}\Bigr] \cdot [L.\mathrm{motor} - R.\mathrm{motor}] > 0 \qquad (2)$$

$$R_s(t) = \Bigl[\sum R.\mathrm{sensor} - \sum L.\mathrm{sensor}\Bigr] \cdot [R.\mathrm{motor} - L.\mathrm{motor}] > 0 \qquad (3)$$
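Steps 3 and 4 can be sketched as a condition check followed by a connection update. This is a minimal sketch under stated assumptions, not the rule as implemented on the robot: the condition treats equations (2) and (3) as the product of the sensor difference and the motor difference, and the firing-rate thresholds `high` and `low`, like the function names, are hypothetical values chosen for the example.

```python
def behaviour_ok(l_sensor, r_sensor, l_motor, r_motor):
    """Condition of Table 1, read as equations (2)/(3):

    the sensor difference and the motor difference must have the same sign,
    so the robot turns away from the side where the obstacle is detected.
    """
    return (l_sensor - r_sensor) * (l_motor - r_motor) > 0

def adjust_connections(firing_rates, links, high=0.5, low=0.1):
    """Use-dependent update of the binary links from hidden neurons to one motor.

    firing_rates[i] is the fraction of the last 20 steps in which neuron i fired.
    Neurons with a high rate are connected (1), neurons with a low rate are
    disconnected (0); the thresholds are assumptions for illustration.
    """
    new_links = list(links)
    for i, rate in enumerate(firing_rates):
        if rate >= high:
            new_links[i] = 1
        elif rate <= low:
            new_links[i] = 0
    return new_links

def modification_step(l_sensor, r_sensor, l_motor, r_motor, firing_rates, links):
    """Leave the links unchanged when the condition holds, otherwise adjust them."""
    if behaviour_ok(l_sensor, r_sensor, l_motor, r_motor):
        return links
    return adjust_connections(firing_rates, links)
```

For example, with a stronger left sensor and a faster left motor the condition holds and nothing changes; if the right motor were faster instead, the error case would trigger the update.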
Table 1. Conditions for correcting the network connections (L: left, R: right, S: sensor, M: motor)
Sensor value    Motor action
L.S > R.S       L.M > R.M       X
R.S > L.S       X               R.M > L.M

5.2. Experimental Results and Discussion
5.2.1. Experiment 1.
Figure 7-I shows the root square values of the left and right sensors while Khepera was moving in the environment for 30 minutes, starting with initially random connections between 4 sensors, 10 fully connected (excitatory/inhibitory) neurons and two motors. Arrows in the figure indicate the times when a connection change took place. The changes happened 7 times during the 30 minutes; note that, as mentioned in section 5.1, changes in connection happened only when Khepera hit or got stuck against an obstacle. In the arena, during the initial period A indicated in Figure 7-I, the robot hit the obstacles 6 times, and each time the network was reorganized. During the intermediate period B, the behavior improved and the network was modified only once. In the last period C, the robot avoided the obstacles successfully, without hitting the wall, and no connection change took place. In total, 15 connections were changed, between the input layer (sensors) and the hidden layer and between the hidden layer and the output. Figure 7-II shows the Khepera motion during period C, after the connections were organized; note the movement curve and the condition of Table 1. Experiment 1 shows that the robot gradually organized the network and that the obstacle avoidance behavior was acquired.

Fig. 7. (I) Self-organization process. The root square values of the left and right sensors are plotted for 30 minutes. (II) Khepera motion during period C.

5.2.2. Experiment 2.
In this experiment, we monitored the difference in speed between the left and right motors (L motor - R motor) whenever the left or right sensor of the Khepera robot was activated during 40 minutes of free navigation in the environment; note that Khepera started moving with random connections. Figure 8 shows the difference between the left and right motors when the left and right sensors were activated. From the graph (Fig. 8), we notice that, using our modification rule (section 5.1), the connections were gradually organized and the difference between the left and right motor speeds gradually became positive when the left sensor was activated and negative when the right sensor was activated. Comparing this result with that of section 4.1 and Figure 6, we found that both genetic evolution (GE, section 4) and our modification rule (section 5.1) could find the best behavior, but with a large difference in time (our rule in less than 30 minutes, GE in more than 2.5 hours).
[Fig. 8 plot: L motor - R motor (vertical axis, -80 to 80) versus time in minutes (horizontal axis, 1 to 40); series: R sensor activated, L sensor activated.]
Fig. 8. Evolution of (L motor - R motor) during 30 minutes of the Khepera robot navigating freely in the environment using the method of section 5.1.

6. Conclusion
In this study, we proposed a use-dependent synaptic modification algorithm for SNNs in autonomous robots, and tested it with a real mobile robot. The results show that the initially random connections are modified gradually and that the robot acquires the desired behavior in much less time than with the GA. Use-dependent training (a type of self-organization) is advantageous in real-time learning, because it reacts dynamically to the environment and does not require a trial-and-error process. This is an interesting starting point for a wider study of neural dynamics, excitatory and inhibitory, coupled with robot behavior.

References
[1] W. Maass, “Networks of Spiking Neurons: The Third Generation of Neural Network Models”, Australian Conference on Neural Networks, 1996.
[2] W. Gerstner and W. M. Kistler, “Spiking Neuron Models”, Cambridge University Press, ISBN 0521890799, August 2002.
[3] W. Maass, “Computation with Spiking Neurons”, in Proc. of the International Conference on Neural Information Processing, 1998.
[4] D. Floreano and C. Mattiussi, “Evolution of Spiking Neural Controllers for Autonomous Vision-Based Robots”, LNCS 2217, pp. 38-61, 2001.
[5] K. Murase, Md. Monirul, “Incremental Evolution with Learning to Develop the Control System of Autonomous Robot for Complex Task”, IEICE Trans., Vol. E85-D, No. 7, pp. 1118-1129, July 2002.
[6] B. Ruf and M. Schmitt, “Learning Temporally Encoded Patterns in Networks of Spiking Neurons”, Neural Processing Letters, Vol. 5, No. 1, pp. 9-18, 1997.
[8] B. Ruf and M. Schmitt, “Self-Organization of Spiking Neurons Using Action Potential Timing”, IEEE Trans. on Neural Networks, Vol. 9, No. 3, pp. 575-578, May 1998.
[9] D. M. Sala, K. J. Cios, and J. T. Wall, “Self-Organization in Network of Spiking Neurons”, Australian Journal of Intelligent Information Processing Systems, Vol. 5, No. 3, pp. 161-170, 1998.