Embedded Neural Network for Swarm Learning of Physical Robots

Pitoyo Hartono and Sachiko Kakita

Department of Media Architecture, Future University-Hakodate, Kamedanakanocho 116-2, Hakodate, Japan
[email protected] http://www.fun.ac.jp/~hartono/hartono.html/
Abstract. In this study we ran real-time learning of multiple physical autonomous robots situated in a real, dynamic environment. Each robot has an onboard microcontroller on which a simple neural network is embedded. The neural network was designed in consideration of the limited power and computational resources that are characteristic of simple robots. In the experiments, several autonomous robots were placed in one environment, and each was given a specific task expressed as an evaluation function for the robot's neural network. The learning processes of the robots were started simultaneously from randomized initial conditions. The presence of several robots consequently formed a dynamic environment, in which the action of one robot affected the learning processes of the others. We demonstrated the efficiency of the embedded learning mechanism with respect to different environmental factors.
1 Introduction
In the past decade, several multi-robot systems have been proposed. This relatively young field encompasses a number of interesting research topics [1]. The fact that so far we have not succeeded in building highly autonomous singular robots capable of reliably and continuously operating in dynamic environments is one of the factors that motivates the study of multi-robot systems. The emergence of swarm intelligence in nature, such as in ant or bee colonies, is one of the main inspirations for these studies [2]. In [3], a decentralized control theory for multiple cooperative robotic systems was developed. A group of robots, each with a simple predefined behavior, capable of cooperating in a self-organized manner was proposed in [4]. The study in [5] implemented social reinforcement learning to encourage the emergence of social behaviors in a multi-robot system. Most of the previous studies agree that a bottom-up, behavior-based mechanism is a more realistic approach than centralized, top-down control for running multi-robot systems that must achieve sophisticated group behaviors applicable to real-world tasks. It is also becoming clear that the development of learning mechanisms by which a single robot within a multi-robot system can obtain intelligent strategies supporting the group behavior is one of the core topics.
In this research we developed a simple neural network that can be embedded on the onboard processor of a simple physical autonomous robot with limited power, computational resources and sensor precision. The efficiency of the learning mechanism is demonstrated through a collective learning scheme in which several robots are placed in the same environment. Initially the robots did not have any movement strategy, hence they moved in a random manner. A learning target was then given to each robot in the form of an evaluation function for the embedded neural network. There is no particular limitation on the learning task, so it is possible for all of the robots to have different, individualistic goals. In our research, however, we set a common primitive goal for each robot: the acquisition of an obstacle-avoidance strategy while random-walking in the presence of one another. Although simple, we consider that an effective obstacle-avoidance strategy is strongly correlated with the self-preservation capability of autonomous robots situated in dynamic environments. Because each robot must execute its learning process in the presence of the other robots, the learning environment becomes highly dynamic, and the learning process of one robot is affected by the behaviors of the others. So far, this kind of collective learning has rarely been executed on physical multi-robot systems, at least without partial help from simulators. One of our near-future interests is to study the collective behavior of the multi-robot system at various stages of the learning processes of the individual robots. This will lead to a clearer understanding of the relation between the individual behaviors and the collective behavior of the multi-robot system. We believe this research contributes an experimental platform not only for further multi-robot system studies but also for understanding the emergence of swarm intelligence in nature.
2 Robot's Structure and Behavior
For this study we built several robots with two different structures, shown in Fig. 1. The robot on the right side of the figure is based on a commercially available toy. The size and weight of each robot are approximately 230 x 180 x 130 [mm] and 0.5 [kg] (without batteries), respectively. Each robot is equipped with four proximity sensors, which measure the distance to obstacles, and four micro-switches to detect collisions. Each robot has two DC motors, one microcontroller (H8/3052) on which a simple neural network is implemented, and two EEPROMs to store the internal state of the neural network. Using ten 1.2 [V] nickel-metal hydride batteries, a robot can operate for up to 3.5 hours. The internal structure, common to all the robots, is shown in Fig. 2.

Fig. 1. Simple Robots (type 1 and type 2; labeled parts: microcontroller, micro-switches, proximity sensors)
Fig. 2. Internal Structure
3 Learning Algorithm
For the controller of each robot, a simple two-layered neural network is fully embedded onboard. The input layer receives the current values from the sensors, the delayed values of past sensor measurements, the past actions and the associated evaluation values. Each neuron in the output layer is associated with a primitive behavior of the robot. In this study we designate four primitive behaviors: forward, backward, turn-left-forward, and turn-right-forward. The execution time of each behavior is the same. The last two behaviors are designated to prevent the robot from rotating in place. It should be noted that stop is not one of the primitive behaviors, hence the robot is forced to move continuously. In this research, there is no distinction between a learning phase and a running phase. During its lifetime, a robot continuously utilizes the embedded neural network to generate movements while also updating the network based on feedback from the environment. Consequently, the robot continuously adapts to gradual changes in the environment.
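To make this input/output arrangement concrete, the following C sketch illustrates how such a network might be laid out on the microcontroller. The exact history length and the encoding of past actions are not specified in the text, so the sizes and names below (NUM_INPUTS, network_t, and so on) are illustrative assumptions only, not the paper's actual implementation.

```c
/* Illustrative layout of the embedded network; sizes and names are assumptions. */
#define NUM_SENSORS   4   /* proximity sensors                                  */
#define NUM_BEHAVIORS 4   /* forward, backward, turn-left-fwd, turn-right-fwd   */

/* Input layer: current sensor readings, one step of delayed readings,
 * the previously executed action (one-hot) and its evaluation value.           */
#define NUM_INPUTS (NUM_SENSORS + NUM_SENSORS + NUM_BEHAVIORS + 1)

typedef enum {
    FORWARD = 0,
    BACKWARD,
    TURN_LEFT_FORWARD,
    TURN_RIGHT_FORWARD
} behavior_t;

typedef struct {
    float x[NUM_INPUTS];                 /* normalized inputs, each in [-1, 1] */
    float w[NUM_INPUTS][NUM_BEHAVIORS];  /* connection weights w_ij            */
    float o[NUM_BEHAVIORS];              /* output neuron values O_j           */
} network_t;
```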
Fig. 3. Neural Network
For the neural network of a robot, the value of the j-th output neuron at time t, $O_j(t)$, can be shown as follows.

$$O_j(t) = f\Big(\sum_{i=1}^{N} w_{ij}(t)\,x_i(t)\Big) \qquad (1)$$
$x_i(t)$ and $w_{ij}(t)$ are the value of the i-th input neuron and the connection weight between the i-th input neuron and the j-th output neuron at time t, respectively, while N is the number of input neurons. All inputs are normalized so that $x \in [-1, 1]$. The activation function is defined as follows.

$$f(y_j) = \begin{cases} y_j & \text{if } \max_j y_j(t) < 1 \\[4pt] \dfrac{y_j}{\max_j y_j(t)} & \text{otherwise} \end{cases} \qquad j \in \{1, 2, 3, 4\} \qquad (2)$$
The normalization in Eq. 2 is needed so that the past behaviors that are fed back to the input layer have the same scale as the other inputs. After processing a given input vector, the values of all of the output neurons are calculated, and the output neuron that generates the largest value is designated as the winner neuron, as follows.

$$win(t) = \arg\max_j \{O_j(t)\} \qquad (3)$$
The robot then executes the primitive behavior associated with the winner neuron:

$$a(t) = A(win(t)) \qquad (4)$$

$a(t)$ denotes the primitive behavior executed at time t, while A is a function that maps an output neuron to its associated primitive behavior.
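A minimal C sketch of Eqs. 1-4 follows, continuing the illustrative structures introduced above: the weighted sums are computed first, the joint rescaling of Eq. 2 is applied only when the largest sum reaches 1, and the index of the largest output is mapped directly to a primitive behavior.

```c
/* Forward pass and winner-take-all selection (Eqs. 1-4); a sketch only. */
behavior_t forward_pass(network_t *net)
{
    int i, j, win = 0;
    float y[NUM_BEHAVIORS];
    float ymax = 0.0f;

    for (j = 0; j < NUM_BEHAVIORS; j++) {        /* Eq. 1: y_j = sum_i w_ij(t) x_i(t) */
        y[j] = 0.0f;
        for (i = 0; i < NUM_INPUTS; i++)
            y[j] += net->w[i][j] * net->x[i];
        if (j == 0 || y[j] > ymax)
            ymax = y[j];
    }

    for (j = 0; j < NUM_BEHAVIORS; j++) {        /* Eq. 2: rescale when max_j y_j >= 1 */
        net->o[j] = (ymax < 1.0f) ? y[j] : y[j] / ymax;
        if (net->o[j] > net->o[win])
            win = j;                             /* Eq. 3: winner neuron               */
    }
    return (behavior_t)win;                      /* Eq. 4: a(t) = A(win(t))            */
}
```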
Fig. 4. Flowchart of Behavior of Individual Robot
After the robot executes the winning primitive behavior, it calculates the cumulative value of the distance measurements of all the proximity sensors, as follows.

$$D(t+1) = \sum_{i=1}^{K} p_i(t+1) \qquad (5)$$

In Eq. 5, $D(t+1)$ is the cumulative distance value at time t+1, $p_i(t)$ is the value of the i-th proximity sensor at time t, and K is the number of proximity sensors. The evaluation of the robot's movement at time t, $E(t)$, is defined as follows.

$$E(t) = D(t+1) - D(t) \qquad (6)$$
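Eqs. 5-6 amount to two small helper routines. The sketch below assumes the proximity readings taken before and after the behavior are available as arrays of length NUM_SENSORS (a name assumed in the earlier sketch).

```c
/* Evaluation (Eqs. 5-6): cumulative proximity distances before and after the
 * executed behavior; positive E(t) means the robot moved away from obstacles. */
static float cumulative_distance(const float p[NUM_SENSORS])
{
    float d = 0.0f;
    int i;
    for (i = 0; i < NUM_SENSORS; i++)
        d += p[i];                               /* Eq. 5: D = sum_i p_i */
    return d;
}

static float evaluate(const float p_before[NUM_SENSORS],
                      const float p_after[NUM_SENSORS])
{
    return cumulative_distance(p_after) - cumulative_distance(p_before);  /* Eq. 6 */
}
```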
The evaluation function is simply the difference between the cumulative distances before and after the robot executed its primitive behavior. A positive E(t) indicates that the robot moved away from obstacles in the environment as a result of its most recently executed behavior. In this case the winning neuron is rewarded by correcting the connection weights so that its output increases for the same input. At the same time the losing neurons are punished by correcting the connection weights so that their values decrease for the same input. Hence, in this case the learning process works in a positive feedback manner for the winning neuron and in a negative feedback manner for the other neurons.
A negative E(t) indicates that the robot executed a wrong behavior, because it moved towards obstacles. In this case, the winning neuron is punished by correcting the connection weights so that its value decreases. At the same time, the robot has to be encouraged to execute another behavior, hence the connection weights are corrected so that the values of the non-winning neurons increase. In this case, the learning process works in a negative feedback manner for the winning neuron and in a positive feedback manner for the others. If the primitive behavior brought neither a significant improvement nor a significant deterioration, no correction is made to the internal state of the neural network. The outline of the robot's learning process is shown in Fig. 4, and the weight correction algorithm is formulated as follows.

$$w_{ij}(t) = w_{ij}(t-1) + \alpha(E(t))\,\varphi(E(t), x_i(t), j) \qquad (7)$$

$$\alpha(E(t)) = \begin{cases} \eta & \text{if } |E(t)| > T \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

T and η in Eq. 8 are empirically decided positive values. The direction of the correction is decided by the function φ(), which is defined in the table below.
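That table is not reproduced in the text above, so the following C sketch infers the sign logic of φ() from the positive/negative feedback description and from Eqs. 9-10; it should be read as one plausible realization of Eqs. 7-8 under that assumption, not as the paper's exact rule. The values of η and T quoted in the comments are those reported in Sect. 4.

```c
#include <math.h>   /* for fabsf() */

/* Weight correction (Eqs. 7-8). The sign logic inside the loop is an
 * assumption inferred from the text and from Eqs. 9-10.                   */
void update_weights(network_t *net, behavior_t win, float E,
                    float eta /* e.g. 0.1 or 0.001 */, float T /* e.g. 10 */)
{
    int i, j;
    float sign;

    if (fabsf(E) <= T)                 /* Eq. 8: alpha = 0, change not significant */
        return;

    for (j = 0; j < NUM_BEHAVIORS; j++) {
        if (E > 0.0f)                  /* improvement: reward winner, punish others */
            sign = (j == (int)win) ? +1.0f : -1.0f;
        else                           /* deterioration: punish winner, reward others */
            sign = (j == (int)win) ? -1.0f : +1.0f;

        for (i = 0; i < NUM_INPUTS; i++)
            net->w[i][j] += eta * sign * net->x[i];   /* Eq. 7 */
    }
}
```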
The correction rule in Eq. 7 guarantees that, when the behavior executed at time t improved the evaluation of the robot, then for the same input vector X,

$$O_j^{n+1}(X) > O_j^{n}(X), \quad j = win; \qquad O_j^{n+1}(X) < O_j^{n}(X), \quad j \neq win \qquad (9)$$

while, when the behavior deteriorated the evaluation,

$$O_j^{n+1}(X) < O_j^{n}(X), \quad j = win; \qquad O_j^{n+1}(X) > O_j^{n}(X), \quad j \neq win \qquad (10)$$
$O_j^{n}(X)$ denotes the output of the j-th output neuron in reaction to input vector X given for the n-th time. Equation 9 implies that the neural network reinforces a good behavior while simultaneously suppressing the other alternatives, while Eq. 10 shows that the neural network suppresses a behavior proven to be harmful and explores alternative ones. While the implemented learning mechanism has properties similar to conventional Reinforcement Learning, it operates in a continuous value domain and requires significantly fewer calculation resources (especially memory). These characteristics allow the learning method to be fully implemented on an onboard processor, supporting the real-time learning of a robot situated in a dynamic, real-world environment.
4 Experiments
In this research we ran experiments in two types of environments. The first environment is shown on the left side of Fig. 5. In this experiment, three robots were arbitrarily placed along a straight line surrounded by four walls, and the primitive behaviors were limited to forward and backward. The connection weights of the robots' neural networks were randomly initialized. The result of the first experiment is shown in Fig. 6. The left graph shows the average cumulative distance measured by the proximity sensors. From this graph it is obvious that the cumulative distance increases as the learning process progresses. The ordinate of this graph is in the units of the proximity sensors' distance measurement, while the abscissa is the learning step. The total learning time for 500 learning steps is approximately 5 minutes. The right graph shows the average number of collisions of the three robots. This graph indicates that the number of collisions decreases as the learning process progresses. From these two graphs, it is clear that, through the proposed learning process, the embedded neural networks gradually obtained a strategy to control the robots in achieving the given task (obstacle avoidance) in a dynamic environment. As a consequence of the learning process, we observed that the three robots were able to space themselves evenly in this environment while random-walking, thus forming a simple formation.

In the second experiment, three robots were randomly placed inside the environment shown on the right side of Fig. 5. In this experiment, all of the primitive behaviors were activated. Figure 7 shows the result for this environment. The two graphs also show the efficiency of the proposed learning method for real-time learning. We ran 10000 learning steps, which are equivalent to approximately 75 minutes. Although in this experiment we could not observe any formation generation, the three robots were able to achieve the task of obstacle avoidance while random-walking in a two-dimensional environment. The graphs shown are averages over 5 iterations of this experiment. Some snapshots of the multi-robot learning are shown in Fig. 8. The learning parameter η in Eq. 8 was set to 0.1 for the first experiment and 0.001 for the second experiment; T was set to 10 in both.

Fig. 5. Environments: a) environment 1, b) environment 2
Fig. 6. Experimental Result in Environment 1
Fig. 7. Experimental Result in Environment 2
Fig. 8. Multi-Robot Learning
5 Conclusions
In this research we proposed a simple learning algorithm that can be fully implemented on the onboard processor of low-cost autonomous robots with very limited calculation and power resources. Although the proposed learning mechanism is very basic and shares properties with Hebbian Learning and Reinforcement Learning, it is sufficiently powerful to be used for strategy-acquisition real-time learning in dynamic environments. We ran several experiments with two types of simple but dynamic environments, in which three robots had to learn an obstacle-avoidance strategy while random-walking in the presence of each other, based only on local information and local evaluation. To show the structure-independent property of the learning mechanism, it was also implemented on two types of robots with different structures, which gave similar results. Although it is not reported in this paper, the scalability of the learning algorithm was also tested in simulated swarm learning with 10-100 robots, which likewise gave good results. Although very simple, we believe that the proposed learning mechanism, including the internal hardware configuration, can contribute to setting a low-cost standard experimental platform not only for the field of swarm robotics but also for helping to understand the emergence of swarm intelligence in nature.
References

1. Parker, L.E.: Current State of the Art in Distributed Autonomous Mobile Robotics. In: Distributed Autonomous Robotic Systems 4, pp. 3-12. Springer, Heidelberg (2000)
2. Bonabeau, E., Theraulaz, G.: Swarm Smarts. Scientific American, 82-90 (March 2000)
3. Feddema, J., Lewis, C., Schoenwald, D.: Decentralized Control of Cooperative Robotic Vehicles: Theory and Application. IEEE Trans. on Robotics and Automation 18(5), 852-864 (2002)
4. Parker, C., Zhang, H., Kube, R.: Blind Bulldozing: Multiple Robot Nest Construction. In: 2003 Int. Conf. on Intelligent Robots and Systems, pp. 2010-2015 (2003)
5. Mataric, M.: Learning Social Behavior. Robotics and Autonomous Systems 20, 191-204 (1997)