Fitness functions for evolving box-pushing behaviour


Ida G. Sprinkhuizen-Kuyper, Rens Kortmann, Eric O. Postma. Universiteit Maastricht, IKAT, Neural networks and adaptive behaviour group, P.O. Box 616, NL-6200 MD Maastricht, The Netherlands. {kuyper,kortmann,postma}@cs.unimaas.nl

Abstract

Simple behaviours form the underpinnings of complex high-level behaviours. Traditional AI research focuses on high-level behaviours and ignores the underpinnings that are considered to be irrelevant for the problem at hand. As a first step towards a hybrid approach to complex behaviour, this paper studies the simple behaviour of box-pushing that underlies complex planning in, for instance, the game of Sokoban. We investigate neural-network controllers for box-pushing behaviour in a simulated and a real robot. Using an evolutionary algorithm we assess the fitness of individuals by means of the four combinations of global vs. local and internal vs. external measures. We compared the efficiency of the four fitness functions for evolving box-pushing behaviour. The results show the global external measure to outperform the other three measures when evolving controllers from scratch in simulation. In addition, the local fitness measures are argued to be appropriate for fine-tuning the box-pushing behaviour.

1 Introduction

Imagine a cat strolling about on a roof-top. It has just spotted a juicy pigeon sitting on a nearby window-sill and is wondering how to catch this prey. The situation is tricky: either route to the bird involves walking on narrow ledges and requires daring tricks of balance. The cat now plans to take the shortest route and tries to lower itself onto a ledge beneath. While trying to do so, it notices that the chance of toppling over and falling three storeys down onto a busy shopping street is becoming increasingly more realistic. The cat then abandons the plan and sets its senses to something more practicable. From a traditional AI point of view this navigation problem is not that difficult. The begin and goal positions are well-known, the possible routes are clear, and apparently, the cat has developed a plan to catch the bird. However, the successful execution of the plan depends critically on the cat's low-level interactions with the real world, rather than its high-level planning capabilities

(see Figure 1). Hitherto, the AI community has given too little attention to low-level control mechanisms (e.g., equilibrium controllers) as compared to high-level control (e.g., symbolic problem-solving systems). The present study examines design methods for low-level control mechanisms to compensate for this imbalance.

Target behaviour and design method

In this paper we restrict ourselves to the low-level behaviour of pushing a box between two walls. Pushing an object is the prime low-level behaviour in, for instance, the game of Sokoban, a classical domain for the application of planning and search algorithms. Moreover, pushing a ball is an important aspect of robot soccer, a modern platform for autonomous systems research. We generalise over these behaviours by teaching a mobile robot to push a circular box. Box-pushing was investigated earlier [6], but without the additional difficulty of two nearby walls. We employ the design method of evolutionary robotics [1, 3, 8]. The method applies evolutionary algorithms (EAs) [2, 7] to the development of robot controllers. In the present study, we focus on the different viewpoints from which the fitness of an individual is assessed. More specifically, we study four different viewpoints as defined by the combinations of global vs. local and internal vs. external fitness measures. The research question for this paper reads: `What viewpoint of fitness assessment is best suited for the evolution of box-pushing behaviour in a mobile robot?' We continue by describing our experimental set-up in section 2, followed by a description of the four different methods of fitness assessment in section 3. Then, section 4 presents the results of simulation studies. Finally, section 5 discusses the results and concludes on the best-suited viewpoint for fitness assessment.

Figure 1: Low-level interactions in three autonomous systems. (Figure taken from the SAB 2000 conference.)

2 Experiments

We evolved neural network controllers for a box-pushing task inspired by Lee et al. [6]. The task entails pushing a box from a start to a goal position. Our work differs from Lee et al.'s work in three respects. First, we evolve box-pushing between two

Figure 2: Screen shot of the simulator, with the newly implemented movable object (large black circle) and the environment in which the experiments were conducted. The six small black dots indicate starting positions of the box (upper row) and of the robot (bottom row). The bright area in the top represents a large light source.

walls, whereas Lee et al. evolved the behaviour in an open space. Second, we increase the number of different fitness measures to four vs. one in the paper by Lee et al. Third, we do not use genetic programming (GP) for generating a controller, whereas Lee et al. did. We employed a mobile robot simulator1 that was augmented by the implementation of a movable object (see figure 2).

The neural control mechanism

The simulated robot possessed eight infra-red (IR) proximity sensors and two motors. The IR-sensors were numbered (0 to 7) clockwise starting at the left side of the robot: sensors 2 and 3 pointed forward, whereas 6 and 7 pointed backward (see the upper right panel in figure 2). The neural controller consisted of a fully connected single-layer perceptron with 15 input nodes and 2 output nodes (see figure 3). The input layer consisted of the 8 IR-sensors (labeled 0 to 7), 6 nodes that represented the difference between the values of neighbouring IR-sensors (`edge detectors'; labeled `0-1' to `6-7') and 1 bias node (labeled b). The output nodes (labeled ML and MR) were each connected to a motor. The connective weights were largely symmetrical (see figure), which resulted in 15 values to be optimised by the EA. In order to obtain the inputs to the neural network, we normalised the values of the IR-sensors (or their differences) to the interval [-1, 1]. Then the network activation was propagated and the output nodes' activation values were mapped to the interval (0, 1) by applying the sigmoid transfer function. The output was then linearly up-scaled and truncated to the integer values {-10, -9, ..., 10} and sent to the motors of the robot.

1 http://diwww.epfl.ch/lami/team/michel/khep-sim/index.html
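As an illustration, the mapping from raw IR readings to motor commands described above could be sketched as follows. This is a minimal sketch, not the authors' code: the raw sensor range (0..1023), the exact normalisation constants, and the final scaling step are our assumptions.

```python
import math

# Hypothetical sketch of the perceptron controller described above.
# Edge detectors pair neighbouring sensors; note there is no 5-6 pair.
EDGE_PAIRS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]

def motor_commands(ir_raw, weights_L, weights_R):
    """Map 8 raw IR readings (assumed 0..1023) to integer motor commands in {-10..10}.

    weights_L / weights_R: 15 weights each (bias, 8 sensors, 6 edge detectors).
    """
    ir = [2.0 * v / 1023.0 - 1.0 for v in ir_raw]           # normalise to [-1, 1]
    edges = [(ir[i] - ir[j]) / 2.0 for i, j in EDGE_PAIRS]  # differences, kept in [-1, 1]
    inputs = [1.0] + ir + edges                             # bias node first
    commands = []
    for w in (weights_L, weights_R):
        act = sum(x * wi for x, wi in zip(inputs, w))
        o = 1.0 / (1.0 + math.exp(-act))                    # sigmoid -> (0, 1)
        commands.append(min(10, int(o * 21) - 10))          # up-scale and truncate to {-10..10}
    return commands
```

With all-zero weights the sigmoid outputs 0.5 and both motors receive the neutral command 0.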

Figure 3: The neural controller for box-pushing between walls. The input nodes are the bias (b), the IR-sensors (0 to 7), and the edge detectors (0-1 to 6-7); the output nodes ML and MR drive the left and right motor. The connective weight values are largely symmetrical: w_b_ML = w_b_MR; w_0_ML = w_5_MR, ..., w_5_ML = w_0_MR; w_6_ML = w_7_MR (and vice versa); w_0-1_ML = -w_4-5_MR, ..., w_4-5_ML = -w_0-1_MR; w_6-7_ML = -w_6-7_MR.
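To illustrate why the symmetry leaves only 15 free parameters, a hypothetical helper could expand the 15 genes into the full left- and right-motor weight vectors. The gene ordering used here (bias, then sensors 0-7, then edge detectors 0-1 through 4-5 and 6-7) is our assumption, not stated in the paper.

```python
def expand_weights(genes):
    """Expand 15 genes into (weights_L, weights_R) using the mirror symmetry of Figure 3.

    genes: [b, s0..s7, e01, e12, e23, e34, e45, e67] (assumed ordering).
    """
    assert len(genes) == 15
    b, s, e = genes[0], genes[1:9], genes[9:15]
    wL = [b] + s + e
    # Right motor: equal bias; sensors 0..5 mirrored (i <-> 5-i), 6 <-> 7 swapped;
    # edge detectors mirrored with a sign flip, 6-7 sign-flipped onto itself.
    sR = [s[5 - i] for i in range(6)] + [s[7], s[6]]
    eR = [-e[4 - j] for j in range(5)] + [-e[5]]
    wR = [b] + sR + eR
    return wL, wR
```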

The evolutionary algorithm

The connective weights of the neural controller were represented in the artificial chromosome by an array of 15 doubles. The population size was 30. The first population was initialised by assigning random values between -1 and 1 to each gene in each chromosome of the population. We used a (30+1) evolutionary algorithm: each iteration one child was generated and the worst member of the population including that child was removed. We used uniform crossover (probability 0.6), followed by mutation (probability 0.1 per weight, range (-0.5 |weight|, 0.5 |weight|)). We employed tournament selection for selecting parents (tournament size 3). The fitness of a member of the population was determined by running the corresponding controller during 100 steps in the simulator. Each run was repeated 10 times from each of 7 start configurations, formed from 3 different positions of both the box and the robot (see figure 2). Defining the origin as the upper-left corner, the x- and y-coordinates range from 0 to 1000. The exact starting points of the robot are (470, 900), (500, 900), and (540, 900). The box is placed at (480, 800), (500, 800), or (545, 800). From these 9 start configurations we left out the two most difficult combinations: the box in its leftmost position and the robot in its rightmost position, and vice versa.
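The (30+1) steady-state scheme described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `evaluate` callback stands in for the 100-step simulator runs, which are not reproduced here.

```python
import random

POP, GENES = 30, 15

def tournament(pop, fits, k=3):
    """Select the fittest of k randomly sampled individuals (tournament size 3)."""
    best = max(random.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[best]

def make_child(pop, fits):
    p1, p2 = tournament(pop, fits), tournament(pop, fits)
    child = list(p1)
    if random.random() < 0.6:  # uniform crossover, probability 0.6
        child = [a if random.random() < 0.5 else b for a, b in zip(p1, p2)]
    for i, g in enumerate(child):  # mutation, probability 0.1 per weight
        if random.random() < 0.1:
            child[i] = g + random.uniform(-0.5 * abs(g), 0.5 * abs(g))
    return child

def evolve(evaluate, iterations=1000):
    """Run the (30+1) EA: add one child per iteration, drop the worst of the 31."""
    pop = [[random.uniform(-1, 1) for _ in range(GENES)] for _ in range(POP)]
    fits = [evaluate(ind) for ind in pop]
    for _ in range(iterations):
        child = make_child(pop, fits)
        pop.append(child)
        fits.append(evaluate(child))
        worst = min(range(len(pop)), key=lambda i: fits[i])
        pop.pop(worst)
        fits.pop(worst)
    best = max(range(len(pop)), key=lambda i: fits[i])
    return fits[best], pop[best]
```

Because the worst of the 31 is removed each iteration, the best fitness in the population never decreases.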

3 Fitness assessment from different viewpoints

The fitness of an individual can be assessed from different viewpoints. We vary our fitness measure along two dimensions:

Global vs. local A global fitness measure assesses an individual based on the difference between the begin and end state of both the individual and its environment. A local fitness measure, instead, follows the behaviour of the individual at every time step in the simulation.

Internal vs. external An internal fitness measure takes the agent's perspective, i.e., it only uses information from the robot's sensors. In contrast, an external fitness measure takes a bird's-eye perspective, i.e., it measures the robot's and box's position from the outside.

We thus arrive at four different fitness functions. We here summarise the return values of each function:

Global external (GE) The distance between the box at its start position and the box at its end position, minus half of the distance between the robot and the box at the end position (to see if the robot is still pushing). The GE fitness function is defined as: d(B_T, B_0) - (1/2) d(B_T, R_T), with d(.,.) the distance function and B_t and R_t the position at time t of the box and robot, respectively.

Local external (LE) After each step the distance change of the box minus half the change in distance between the robot and the box is calculated as a local result. All local results are summed. The LE fitness function is defined as: sum_{t=1}^{T} [ d(B_t, B_{t-1}) + (1/2) ( d(B_{t-1}, R_{t-1}) - d(B_t, R_t) ) ].

Global internal (GI) We needed some "landmark" such that the robot could determine how good its new position was. We solved this problem in our experiments by adding lights to the world. The more light the robot saw, the better its new position. After one run the return value was calculated by taking the normalised sum of the values of the front distance sensors and adding the normalised sum of the four light sensors. The GI fitness function is defined as: (1/(2*1024)) (ds[2] + ds[3]) + (1 - (1/(4*500)) sum_{i=1}^{4} ls[i]), with ds[i] in {0, 1, ..., 1023} and ls[i] in {0, 1, ..., 500} denoting the i-th sensor value of the distance and light sensors, respectively.

Local internal (LI) Here we used an equivalent of the evaluation function used by Lee et al. [6]. At each step a local result was calculated by adding the normalised sum of the two front distance sensors to half of the normalised average of the motor values, minus one third of the normalised rotation component of the motor values. So pushing against something was good, forward speed was good, but turning behaviour was bad. The LI fitness function is defined as: sum_{t=1}^{T} [ (1/(2*1024)) (ds[2] + ds[3]) + (1/(2*20)) (ML[t] + MR[t]) - (1/(3*20)) |ML[t] - MR[t]| ], where ML[t] and MR[t] in {-10, ..., 10} denote the motor values of the left and right motor, respectively.
4 Results

Typically, with every fitness function the fitness converged within 250 evaluations (we continued the experiments up to 1000 evaluations). The neural-network weights evolved in the best controller are shown in Table 1. In the table, the first two rows show the bias node (b) and the IR-sensors (0 to 7) with the values of the corresponding connective weights to the left motor (w_*_ML). The third and fourth rows show the `edge-detecting' nodes with their connective weights to the left motor.

inputs    b      0      1      2      3      4      5      6      7
weights   0.592  0.897 -0.723 -0.518  0.381  0.774 -0.305 -0.430 -0.979
inputs    0-1    1-2    2-3    3-4    4-5    6-7
weights   0.796 -0.593 -0.706  0.701  0.712  0.411

Table 1: The evolved connective weight values of the best controller.

evolved     GE     stdGE   LE     stdLE   GI    stdGI   LI     stdLI
controller
GE          286.7  50.2    346.6  39.0    1.75  0.20    132.7  3.66
LE          278.8  57.4    341.0  44.6    1.70  0.22    132.3  3.58
GI          245.3  78.1    304.4  67.6    1.61  0.29    125.2  7.43
LI          168.3  117.0   238.9  109.8   1.34  0.36    139.5  5.61

Table 2: Cross-comparison of the fitness values of the evolved controllers with all four fitness measures and their standard deviations. Each row is a controller evaluated with the same measure as it was evolved with (diagonal) and with the three other measures.

It is difficult to understand the behaviour of the robot directly from the connective weight values. Therefore we evaluated the behaviours of the evolved networks in the robot simulator. We evolved controllers for each fitness function and tested the resulting controllers on all four fitness functions (cross-comparison). The results of the cross-comparison are given in Table 2. The numbers were calculated as the average of 700 runs: 100 times from each of 7 starting configurations (see Sect. 2). From the table we derive that the controllers evolved from the global external (GE) viewpoint perform best when evaluated with the other fitness functions. This is a fortunate result, since GE required the least number of computations and was the easiest to implement. The other fitness functions also do reasonably well in cross-comparison, except for the local internal (LI) fitness function. LI was copied from Lee et al. [6] and works well in the absence of walls, but is of limited use for our experimental set-up: pushing at full speed against a wall undesirably results in a high evaluation score with this fitness measure.

Figure 4 (a) shows a typical track of the simulated robot pushing the box. The rightmost (wide) circle is the starting position of the robot. The somewhat smaller circle to the left represents the starting position of the box. The robot moves towards the box and pushes it forward even when the box hits the wall. This means that the robot learned to distinguish between the box and the wall using only its IR-sensors. Application of the controller to the real Khepera robot indeed showed good box-pushing behaviour. Figure 4 (b) shows the robot (dark spot) pushing a plastic cup (light spot). We observed that the robot was better able to keep the box parallel to a wall on its right-hand side than on its left-hand side. Still, the robot showed a slight deviation from its forward direction when pushing the box.

Figure 4: A typical track of the simulated robot (wide circle moving to the left) and a box (small circle) (a), and a top view of the real Khepera robot pushing a plastic cup (b).

5 Discussion and conclusions

Returning to our research question posed in Sect. 1, we conclude that a global external fitness measure is best suited for evolving box-pushing behaviour between two walls from scratch. The evolved controllers perform best when cross-compared to other evaluation methods. Moreover, the computational costs are minimal compared to the other fitness measures, and the measure is the easiest to implement. The quality of the controllers depends not only on the fitness measure, but also on the design of the perceptual system: in an earlier experiment we had not incorporated the `edge-detecting' units in the neural network. This network configuration performed significantly worse than the configuration presented in this paper, since the `edge-detecting' units make it possible to distinguish between the box and a wall. The evolved controllers also work well on the real robot. However, the real robot exhibited a slightly asymmetrical behaviour. Therefore a controller evolved in a simulator (for reasons of time) should be fine-tuned to yield similar results in the real world. Since our controllers are neural networks, it should be possible to fine-tune them using on-line learning algorithms. On-line learning also compensates for changes in the working of the sensors and the motors of the robot due to wear and tear. Presumably, in this phase of development the local internal fitness measure is best suited. This measure does not need a supervising signal from an external source and is therefore highly suitable for continuous on-line learning in unsupervised autonomous systems.

Another advantage of the controllers developed in this paper is their simplicity. The networks evolved allow for real-time interaction with the environment. The method is therefore also suitable for the design of parts of the low-level behaviour of a football playing robot in the domain of RoboCup [5]. Another application domain is the game of Sokoban, a traditional platform for search and planning research [4]. In Sokoban a robotic agent is to push objects to a goal area. It is often referred to as a robotics problem, but the low-level basics have never been studied. Returning to the design of autonomous systems, we have presented a simple and robust method for the development of a behaviour-primitive in a mobile robot. The logical next step will be to evolve more behaviour-primitives such as box-circling and wall-following.

References

[1] R. Arkin. Behavior-based robotics. MIT Press, Cambridge, 1998.
[2] T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.
[3] I. Harvey, P. Husbands, D. Cliff, A. Thompson, and N. Jakobi. Evolutionary robotics: the Sussex approach. Robotics and Autonomous Systems, 20:205-224, 1997.
[4] A. Junghanns and J. Schaeffer. Sokoban: a challenging single-agent search problem. In H.J. van den Herik and H. Iida, editors, Games in AI Research, pages 139-158. Universiteit Maastricht, Maastricht, 2000.
[5] H. Kitano, M. Asada, Y. Kuniyoshi, I. Noda, and E. Osawa. RoboCup: the robot world cup initiative. In Proc. of the First International Conference on Autonomous Agents. ACM Press, 1997.
[6] W-P. Lee, J. Hallam, and H.H. Lund. Applying genetic programming to evolve behavior primitives and arbitrators for mobile robots. In Proceedings of the IEEE 4th International Conference on Evolutionary Computation. IEEE Press, 1997.
[7] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, third edition, June 1996.
[8] S. Nolfi. Evolutionary robotics: exploiting the full power of self-organization. Connection Science, 10(3-4):167-183, 1998.
