Generative Design of Co-Evolved Agents

Noah M. Keen


Master of Science
Artificial Intelligence
School of Informatics
University of Edinburgh
2005

Abstract

Research has demonstrated significant success in applying the techniques of evolution to the art of robot development. Robots can be completely designed, or have their existing designs improved, through processes inspired by the same principles that are responsible for the evolution of biological systems. This project will review the evolutionary robotics fields of automatic design, competitive fitness evaluation and the minimal simulation work of Nick Jakobi [Jakobi, 1997]. Based on this research a new technique will be presented which combines these ideas to evolve agent controllers and morphologies to perform predator and prey behaviour in both simulation and reality. Results will demonstrate the technique's potential as a tool for automatic agent design.


Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Noah M. Keen)


Table of Contents

1 Introduction
  1.1 Document Structure
  1.2 Goal of Research

2 Background
  2.1 Introduction
  2.2 Artificial Life
    2.2.1 Automatic Design
  2.3 Co-Evolution: Fitness Evaluation
    2.3.1 Minimal Simulation
  2.4 Conclusion

3 Experiment Design
  3.1 Introduction
    3.1.1 Physical Design
    3.1.2 The Controller
    3.1.3 Sensors
    3.1.4 Genotype
    3.1.5 Genotype Validation
  3.2 Simulator
    3.2.1 The Genetic Algorithm
    3.2.2 Crossing the Reality Gap
  3.3 Conclusion

4 Experiment Results
  4.1 Introduction
  4.2 Experiment Setup
  4.3 Experiment 1
  4.4 Experiment 2
  4.5 Experiment 3 - A reality transfer

5 Discussion
  5.1 Discussion and Future Work

A XML Genotype of the Predator

B Automatically Generated CPL Controller Code

Bibliography

Chapter 1

Introduction

Research has demonstrated significant success in applying the techniques of evolution to the art of robot development. Robots can be completely designed, or have their existing designs improved, through processes inspired by the same principles that are responsible for the evolution of biological systems. Early artificial life work [Sims, 1994] introduced the application of evolutionary methods to agent design. This laid the foundation for much of the current evolutionary robotics research, including the field of automatic design. Chapter 2 begins by exploring these subjects and then introduces two techniques for evolving agents in simulation.

A common issue in robotics research is the faithful transfer of an agent from simulation to reality. This process is commonly referred to as crossing the reality gap and is the final and most significant stage in producing real robots from simulated designs. A number of modern techniques for crossing this gap are investigated, including the minimal simulation work of Nick Jakobi [Jakobi, 1997].

Chapter 3 presents a new process for automatic design which uses XML for genetic representation. Results demonstrate the syntax is both robust to reproduction and easily interpreted by a human designer. This new technique is used to evolve predator and prey designs in simulation. Chapter 4 presents the results from these experiments and validates their success by transferring the designs to real physical robots.


1.1 Document Structure

Chapter 1 - Introduction: Reviews the structure of the document and outlines the goals of the research.

Chapter 2 - Background: Reviews the two fields of evolutionary robotics research which served as background for this work: artificial life and automatic design. The chapter also investigates two techniques, co-evolution and minimal simulation, and illustrates how they are used during the evolutionary process to increase the likelihood of a simulated agent crossing the reality gap.

Chapter 3 - Methods: The research reviewed in Chapter 2 is combined in a coherent process of automatic agent design. This chapter illustrates the experiment setup and methods used.

Chapter 4 - Results: Presents the setup, results and observations of three evolutionary trials. These were selected as representative of the noteworthy results from the hundreds of trials executed.

Chapter 5 - Conclusion: Discusses the results from Chapter 4 and relates findings to the current research reviewed in Chapter 2. Finally, suggestions are made for extending the research with future work.

1.2 Goal of Research

Automatic design research [Hornby et al., 2001] and the artificial life work of Karl Sims [Sims, 1994] will be combined in a process which attempts to balance the open-ended expression of artificial evolution with the guaranteed buildability of automatic design. With the techniques of generative encoding, competitive fitness evaluation and the minimal simulation work of Nick Jakobi [Jakobi, 1997], robotic agents will be evolved to compete in a predator-prey contest and then transferred to real physical robots for validation of the designs. Success will demonstrate that these techniques can not only be combined but that they are indeed complementary.

Chapter 2

Background

2.1 Introduction

Artificial life, initially introduced by Chris Langton [Langton, 1989] and popularized by Karl Sims [Sims, 1994], is the application of evolutionary methods to autonomous agent design. By simulating fitness-based selection, mutation and sexual reproduction over thousands of generations, [Sims, 1994] evolved agents to learn a variety of locomotive behaviours. The work focused on exploring the process of evolution through simulation rather than evolving buildable robots. The final creatures were fascinating but were never intended to be realized physically.

The field of automatic design builds on the work of [Sims, 1994] by researching methods to guarantee the buildability of evolved agent designs. Success is determined by how accurately a simulated agent's body and behaviour is implemented in an actual physical robot. This process is commonly referred to as crossing the reality gap and is the final and most significant stage in producing real robots from simulated designs. A number of modern techniques for crossing this gap are investigated, including more recent research [Pollack, 2001] which aims to remove the human designer from all stages of robot design, including physical assembly.

A fitness function is used to encourage simulated agents towards learning a goal


behaviour. Inferring the fitness function for even simple behaviours is rarely straightforward, and agents learn to exploit aspects of the simulation to cheat. To avoid this, two different species can be co-evolved by defining competing fitness functions. The functions evolve with the agents, as the increased fitness of one species automatically increases the selection pressure on the other.

Every simulator is only accurate to some level of precision and most hardware is prone to various types of noise. As a result, behaviour learned in simulation is very difficult to transfer accurately to a physical robot. [Jakobi, 1997] suggests a technique which leverages this inaccuracy to evolve robust designs more capable of operating in the noisy real world.

2.2 Artificial Life

[Sims, 1994] presented some of the first successful applications of evolutionary methods to agent design. Rather than architecting a finished solution, evolutionary [de Garis, 1990] and genetic [Goldberg, 1989] techniques were used to improve agent designs over thousands of iterations. Fitness selection, mutation and crossover, the same evolutionary pressures responsible for all life, were simulated to produce a variety of effective and successful agent designs.

Creature bodies were constructed as a connected set of rigid parts. A genotype represented as a directed graph of nodes defined these constructions. Nodes encoded part sizes and connections to other parts. Connections were static or active, with parameters defining actuation limits and the position of a part relative to its parent. Encapsulating all this information in a single node allowed each body part to be mutated independently. An agent was constructed by parsing its graph from the root. Figure 2.1 illustrates the mapping between some genotype graphs and their corresponding morphologies.

A virtual brain [Sims, 1994] determined the actions of a creature by processing


Figure 2.1: Genotype graphs from [Sims, 1994] and the corresponding agent designs.

sensor readings as input and producing various effector outputs. Effector values corresponded to the forces and torques applied at the creature's joints, and sensors monitored joint angles, boolean contact values and photosensors. Internal nodes connected the sensors and effectors. Each node performed one of a variety of possible mathematical functions, including sin, cos, integrate and max. Nodes took input from a sensor or another node and provided output to either another node or an effector. At each time step the network produced some new actuation based on the current sensor inputs and the state of the network.

The genotype for the controller was also encoded as a directed graph of nodes and connections. Each node encoded its type (input, hidden or output) and connection information for the flow of signals within the network. To allow a creature's control and morphology to evolve simultaneously, each morphological node contained the genotype for its control. When genetic operations modified the genotype, body parts and their control instructions were adjusted together.

Both underwater and dry land worlds were simulated. Agents were evolved using fitness-based selection and the genetic operations of mutation and crossover. Selection for reproduction was based on a fitness score proportional to an agent's effectiveness at one of four locomotive behaviours: swimming, walking, jumping and following. The agents with the highest fitness from each generation were selected for reproduction. During reproduction an agent would either undergo mutation or sexual reproduction with another. With mutation, each dynamic parameter of an agent, such as the torque of a joint or the length of a rigid part, was set to a new value with some low probability. Binary values would flip state and scalars would have random values added. With crossover, the genotypes of two parent agents were split at randomly determined crossover points and the genetic information before and after the points was swapped. Figure 2.2 illustrates this process. The two new child agents were then added to the next generation.

Figure 2.2: The two possible genetic results from a crossover operation in [Sims, 1994].

Each behaviour was evolved in a different simulated environment and scored according to unique fitness criteria. The swimming environment was simulated as viscous water with no gravity. Creatures were scored based on total distance travelled, with continuous movements rewarded higher than single actions. Walking was evolved in an environment with gravity and a ground plane with friction. Agents were rewarded based on top velocities achieved. The jumping environment was similar to the walking environment, and agents were rewarded according to the maximum height above the ground reached by the lowest part of the creature. A following environment included a light source.


Agents were rewarded based on the speeds at which they moved towards it. Evolution produced a number of successful and interesting locomotive strategies for each behaviour. Examples of evolved walking and jumping creatures are shown in Figure 2.3. The designs are in some cases so bizarre, yet effective, that it is difficult to imagine them resulting from traditional deliberate design.

Figure 2.3: Examples of evolved creatures from [Sims, 1994].
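To make the genetic operations concrete, the sketch below shows mutation and one-point crossover in miniature. This is an illustration only, not Sims' implementation: genotypes are flattened to parameter lists rather than directed graphs, and the mutation rate and noise scale are invented values.

    import random

    MUTATION_RATE = 0.05  # illustrative; the paper says only "some low probability"

    def mutate(genotype):
        # Each dynamic parameter changes with low probability:
        # booleans flip state, scalars have a small random value added.
        child = []
        for gene in genotype:
            if random.random() < MUTATION_RATE:
                gene = (not gene) if isinstance(gene, bool) else gene + random.gauss(0.0, 0.1)
            child.append(gene)
        return child

    def crossover(parent_a, parent_b):
        # Split both parents at a random point and swap the tails,
        # yielding the two possible children illustrated in Figure 2.2.
        point = random.randrange(1, min(len(parent_a), len(parent_b)))
        return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]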

The evolved agents, however, were not capable of being transferred to reality. Effort was made to enforce basic physical laws in the simulator, but too many of an agent's parameters were simulated with unrealistic values. For example, joints between body parts could produce a bend and twist [Sims, 1994] motion from an infinitely small actuator. In reality such a component would add mass, require power, influence the agent's balance and generally need to be accounted for. The simulated agents, although successful, were idealistic and of little use for physical robot design. The quality of the behaviour achieved did, however, demonstrate the potential for evolutionary methods and encouraged future work in automatic agent design.


2.2.1 Automatic Design

With traditional control theory, as the behaviour requirements of a robotic agent increase so does the complexity of its design. As a robot's morphology becomes complicated with additional sensors and moving parts, the cost of developing its controller grows proportionally. Traditional control approaches attempting to construct a top-down solution for this type of system soon become overwhelmed. It is suggested [Pollack, 2001] that this increased complexity will eventually become an unmanageable obstacle to robot construction. One approach that has shown some progress at managing this complexity employs evolutionary methods to drive a process of automatic design.

Automatic design, as investigated by [Hornby et al., 2001], uses evolutionary methods similar to [Sims, 1994] but, rather than simulating with unrealistic parameters, designs are constrained to constructions that can be built in reality. To achieve this buildability, the simulated components of an agent's morphology correspond directly to real world components with known parameters. LEGO™ and Tinker-Toys™ [Pollack, 2001] are commonly used for this purpose. Simulated agents are not free to evolve. Results of mutation and crossover are validated to ensure no parameters are set to unrealistic values. Additionally, the number of components available is limited. Where the work of [Sims, 1994] allowed over five actuated joint types and rigid bodies of nearly any length, [Pollack, 2001] allowed only a single type of servo for actuation and only three lengths of rigid body. Each of these was simulated as accurately as possible.

To maintain tight coupling between the simulated agent and its physical implementation a generative grammar is created. The grammar defines a genotype syntax where elements encode for the few real world components available. A grammar checker then validates relationships between elements. For example, if two servos are connected to each other a grammar check flags the error and invalidates the construction, preventing it from reproducing into the next generation. Any agent with a grammatically correct


genotype is guaranteed a corresponding real world construction.

Researchers at Brandeis University [Pollack, 2001] have demonstrated some success using Lindenmayer Systems (L-Systems) as a grammar. L-Systems are a grammatical rewriting system introduced to model the biological development of multicellular organisms [Lindenmayer, 1968]. During reproduction, genes can be added and removed and the parameters of existing genes can be modified. These evolutionary operations are applied in parallel to agent genes in a process modelled on cell division in multicellular organisms. This parallelism builds a symmetric structure of both genotype and phenotype. The assumption [Hornby and Pollack, 2001], which is also supported by researchers like Bongard [Bongard and Paul, 2000], is that symmetry is an advantageous biological trait and is likely to be similarly advantageous to a mobile robot.
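The parallel rewriting at the heart of an L-System can be sketched in a few lines. The production rules below are invented purely for illustration; they are not the grammar used at Brandeis.

    def rewrite(axiom, rules, generations):
        # Apply the production rules to every symbol simultaneously --
        # the parallelism that mirrors cell division.
        s = axiom
        for _ in range(generations):
            s = "".join(rules.get(symbol, symbol) for symbol in s)
        return s

    # Hypothetical rules: every A spawns a symmetric pair of branches.
    rules = {"A": "B[A][A]", "B": "BB"}
    print(rewrite("A", rules, 2))  # -> BB[B[A][A]][B[A][A]]

Note how symmetric structure accumulates without ever being specified explicitly; it falls out of applying the same rule everywhere at once.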

Figure 2.4: A turtle from [Hornby and Pollack, 2001] constructing an agent phenotype.

The Brandeis group constructed robot morphologies in simulation using a LOGO-style turtle [Alberson and deSessa, 1982] controlled by a command language similar to the L-System. A turtle is, in effect, a cursor on a grid. The command language controls its movements, which create bars and actuated joints that become the morphology of the robot. An example [Hornby and Pollack, 2001] is shown in Figure 2.4. Slide (a) is built from the command string [ left(1) forward(1) ]. Slide (b) from [ left(1) forward(1) ] [ right(1) forward(1) ]. Slide (c) is built from [ left(1) forward(1) ] [ right(1) forward(1) ] [ revolve(1) forward(1) ]. Slide (d) is the same as (c), but with the joint half-way through its joint range. Notice the symmetry of the commands and the resulting morphology.

Simulation produced a variety of quirky and sometimes sophisticated creatures. An example of some of these is shown in Figure 2.5.

Figure 2.5: Examples of evolved creatures from [Hornby and Pollack, 2001].
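A rough sketch of how such a command string might drive a construction turtle is given below. The command set and geometry are simplified guesses at the scheme in [Hornby and Pollack, 2001]: turns are quarter-circle units, revolve joints are omitted, and brackets push and pop turtle state so each branch starts from the same origin.

    import math

    def build(commands):
        # Interpret (command, argument) pairs, returning the bar segments laid down.
        x, y, heading = 0.0, 0.0, 0.0
        stack, bars = [], []
        for cmd, arg in commands:
            if cmd == "left":
                heading += arg * math.pi / 2
            elif cmd == "right":
                heading -= arg * math.pi / 2
            elif cmd == "forward":
                nx = x + arg * math.cos(heading)
                ny = y + arg * math.sin(heading)
                bars.append(((x, y), (nx, ny)))  # one bar of the morphology
                x, y = nx, ny
            elif cmd == "push":   # '['
                stack.append((x, y, heading))
            elif cmd == "pop":    # ']'
                x, y, heading = stack.pop()
        return bars

    # [ left(1) forward(1) ] [ right(1) forward(1) ] -- slide (b) above
    slide_b = [("push", 0), ("left", 1), ("forward", 1), ("pop", 0),
               ("push", 0), ("right", 1), ("forward", 1), ("pop", 0)]
    print(build(slide_b))  # two bars mirrored about the origin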

The goal of the approach, however, was to define a process which evolved designs transferable to reality. The physics of the simulated environment were not strict and creatures were afforded significant leniency, especially with factors like balance. Also, the simulated constructions were idealized versions, incapable of being directly transferred. For example, the simulator considered joint actuators points, where in a realistic construction a moving joint requires a servo-motor with a non-trivial mass that influences a creature's locomotion. To address these issues a 2D physical version of the 3D simulated creature was built. A wall which supported the robot's third dimension provided balance and support for the heavier motors. Figures 2.6 and 2.7 show the simulated creature and the real 2D equivalent respectively. With some slight modification the robot did perform on par with its simulated equivalent.

Figure 2.6: The simulated agent from [Hornby and Pollack, 2001].

Figure 2.7: The physical realization of the agent in Figure 2.6 from [Hornby and Pollack, 2001].

The difficulties in making the transfer from simulation to reality were readdressed by the Brandeis group in their Golem project [Pollack, 2001]. An encoding grammar and evolutionary techniques similar to [Hornby and Pollack, 2001] were used. To avoid the noisy transfer to reality a new process involving a rapid prototyping machine was employed. This machine used a solid printing technique where a temperature-controlled print head extrudes plastic layer by layer to assemble the 3D structure. Motors are then snapped in manually. An illustration of this process can be seen in Figure 2.8. Assembled under these conditions robots performed comparably to their simulations. Final results varied depending on factors like crawling surface, but the results were superior to the first experiment above. This clean transfer was due to the rapid prototyping technology.

Figure 2.8: An illustration of the rapid prototyping process used in the Golem Project [Pollack, 2001].

Automatic design research is promising. By improving the encoding grammar and further experimenting with techniques like rapid prototyping, generative construction should produce increasingly complex designs. However, this approach was developed from the desire to overcome the complexity of traditional robot design. Experimental results have yet to achieve the level of complexity that is currently attainable with traditional robot design. Only when the complexity of the two approaches is equivalent can we fairly compare this technique's approach.

Rapid prototyping has made the transfer from simulation to reality more efficient and clean. However, as the complexity of individuals increases, this approach may become unwieldy. A human is still required for assembly. Although they play only a small role, they are a necessary link in a robot's lifecycle. To attain the goal of short-term, special-purpose robots, and especially self-repairing intelligent constructions, human intervention must be eliminated.

2.3 Co-Evolution: Fitness Evaluation

There are two common challenges to using evolutionary methods. The first is representation. A species' genotype needs to encode all dynamic parameters and be amenable to crossover and mutation. An effective representation can focus the evolutionary pressures to achieve significant fitness gains over generations [Mitchell, 1998], while a poor representation can resist evolution due to its large search space or ineffective reproduction. Achieving an adequate solution often involves refining the representation over a number of iterations.

The second challenge is fitness evaluation. An agent that performs a desired behaviour better than another in its generation should be awarded a higher fitness. This evaluation requires quantifying the value of something that is not the result of a single action but of a series of actions over the length of a test. Scoring is done by awarding


points for certain behaviour-related metrics. For example, in [Pollack, 2001] the distance an agent travelled was used.

Scoring directly influences learned behaviour. Agents often cheat by learning an action that, although successful in the strictest definition, does not achieve the goal through means desired by the designer. For example, a robot rewarded for distance travelled might learn to move forward and back over the same spot rather than continuously forward. Such an agent would achieve a high score but is performing useless behaviour. Alternatively, agents may exploit inaccuracies of the physical model to perform actions which are unrealistic for the intention of the research. For example, simulated agents learning to use legs may evolve to move their limbs extremely fast so that they skate across a surface instead of balancing and walking as intended. In these cases a designer needs to re-model the evaluator to redefine success. This can also be a long iterative process as every generation finds new ways to exploit the fitness function.

Simulating the co-evolution of different species of agents in a single environment can shift the burden of developing an adequate fitness function from the designer to the evolutionary process itself. In nature animals survive and reproduce by exploiting fitness affordances [Beer, 1995] in their environment. These can be properties of the physical environment or other animals in the ecosystem. Competition between individuals results from a scarcity of resources or directly in a predator-prey scenario. These competitive evolutionary pressures can be modelled and used for evaluating fitness in simulation.

Using competition, [Cliff and Miller, 1996] evolved two species of agent: a pursuer and an evader. Two populations, one of each species, were maintained. The agent phenotypes were modelled after the popular Khepera robots. The thresholds and link weights of a neural network controller and the positions of infrared sensors along the perimeter of an agent's body were the evolved parameters. Fitness trials applied a last elite opponent strategy [Cliff and Miller, 1996]. In the first generation random individuals from each population competed, and in all subsequent generations individuals competed with


the fittest opponent from the previous generation. In each trial an evader's score was based on how long it lasted before being touched by a pursuer. A pursuer received a bonus for approaching or stalking an evader and an even larger bonus for touching one.

The genotypes of the creatures were fixed-length bit-strings which encoded both sensor morphology and the controller network. The string was partitioned into fields, with each field representing a neuron in the network. A field's first sequence of bits encoded whether it was expressed or not. The remaining bits specified a location on the animat's 2D body and a growth value or neuron size. Neurons which grew to overlap one another were connected. Two central regions of the animat's body defined its motor output neurons. Neurons that grew to overlap with them could send signals which adjusted the speed and steering of a creature. Neurons that grew beyond the body's perimeter became visual input sensors, with fields that encoded their range and distance. Symmetric expression was enforced such that when a neuron became expressed a reflection along the longitudinal axis was also created.

Standard genetic operations were used for evolution, with some modification. A population was scanned and any individuals with an inadequate number of sensors and motors were excluded from reproduction. Because of this garbage collection [Cliff and Miller, 1996] a much higher rate of mutation was used. In order to co-evolve both species, populations were separated and a spatially distributed GA was used. This strategy positioned individuals on a virtual grid. Individuals could only reproduce with neighbours, and offspring replaced less fit neighbours. In addition to regular mutation and crossover a duplication operator was introduced.

Random initialization often began an experiment with a number of agents similar to Braitenberg's vehicles [Braitenberg, 1984]. These vehicles were controlled with simple crossed or uncrossed excitatory connections between sensors and motors. The evolutionary operations provided little improvement on these initial individuals even after 1000 or more generations. The addition of new neurons introduced noise and significantly reduced the performance of the relatively simpler networks. To


counter this, the duplication operator was introduced, which copied an entire parent field unchanged into the genotype of a child.

Results [Cliff and Miller, 1996] showed an increase in pursuit and evasion behaviour. By generation 1000 fairly sophisticated actions, including acceleration and protean (adaptively unpredictable) behaviour, were demonstrated. An example is given in Figure 2.9.

Figure 2.9: Illustrations of evolved pursuer and evader behaviour.
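On a field-structured bit-string the duplication operator is easy to sketch. The field width and layout below are assumptions made for illustration; in [Cliff and Miller, 1996] each field encodes one neuron.

    import random

    FIELD_BITS = 16  # assumed fixed width; one field encodes one neuron

    def duplicate_field(parent, child):
        # Copy one whole parent field, unchanged, into the child at the
        # same position. Unlike bitwise mutation, this transplants an
        # intact, working neuron definition.
        start = random.randrange(len(parent) // FIELD_BITS) * FIELD_BITS
        return child[:start] + parent[start:start + FIELD_BITS] + child[start + FIELD_BITS:]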

More interestingly, by generation 1000 the phenotypes of the creatures had evolved into constructions on par with what an intelligent designer might have developed. Evaders had sensors to their sides and rears but none in front. This made sense as they spent their time running away from a pursuer. Pursuers, on the other hand, had developed elaborate sensor placements, with four wide and four narrow sensors on their sides and two narrow rear sensors.

Co-evolution determines fitness with a contest between agents of two antagonistic species. Both agents begin with unsophisticated behaviour and evolve together. As one species gets better at its task the other is put under an increased amount of pressure to improve. Behaviour which achieved high fitness against an early generation opponent might be inadequate when used against a more evolved competitor. In this way the process of fitness evaluation itself evolves with the agents. This ability to maintain an effective and arguably less artificial fitness evaluation makes co-evolution an appropriate choice for training sophisticated behaviour.

2.3.1 Minimal Simulation

Evolving a design in simulation requires thousands of fitness trials. Behaviour is not scored based on a single action but rather on the benefit its aggregate actions have produced. To be successful, agents transferred to reality should accurately exhibit their simulated behaviour in the real world. Simulators, however, can only implement a subset of reality, with some real-world features modelled at the expense of others. Precision can always be improved, but this additional accuracy does not always [Jakobi, 1997] result in a more successful transfer.

Fitness scores and the evolutionary process in general are greatly influenced by the implementation of the simulation. Agents will quickly take advantage of any simulated aspect which leads to a higher fitness score. A problem arises if the agent evolves dependent on this aspect for its successful behaviour. If the aspect is inaccurate or unavailable in reality the learned behaviour will not successfully transfer to a physical robot.

Most simulators model physical interaction identically each time. This is both naive and inaccurate considering the noise inherent in the real world. Small perturbations which will influence an outcome in reality may exist below a model's level of precision. An agent's velocity is a good example. Small differences in gearing or motor output, which are inevitable in reality due to variation in construction, will influence the velocity of a wheel. In some cases these small differences may be enough to alter an agent's trajectory. An agent evolved in a simulator which omits this noise will expect precise actuation in reality. If that is not available, the noise of reality may be significant enough to render the simulated behaviour useless.

A view suggested by Nick Jakobi [Jakobi, 1997] is that, rather than increasing precision, a simulator should instead model the right amount of noise. Success is measured by how closely the real robot's behaviour corresponds to its simulated counterpart. The


noise inherent to reality makes learning precise behaviour of little practical use. Instead, by including noise, a method can focus on evolving robust behaviour which can be performed under a variety of conditions.

[Jakobi, 1997] defines a general method for creating robust or minimal simulations. First, a subset of real-world features and processes which have some bearing on the behaviour of a robot is determined from the set of all possible physical interactions. These may include agent features like wheel size or sensor range and environmental features like arena size and obstacle placement. Not all aspects are modelled but only those that an agent relies on to perform its behaviour. This set is then further divided into two subsets: base-set aspects and implementation aspects [Jakobi, 1997]. These are defined as follows:

(1) Base-set aspects: The set of robot-environment interactions that are modelled as the basis for learned behaviour. Evolving controllers can safely depend on these not to differ between simulation and reality. An example would be the interaction of sensors with objects and the values returned.

(2) Implementation aspects: Those aspects of a simulation that have no basis in reality but on which agents learn to rely nonetheless. This set represents the difference between the base-set and the complete set of robot-environment interactions. A robust simulator needs to fill this gap with arbitrary values and processes so that learned behaviour does not depend on these aspects.

An evolved agent is base-set exclusive if its behaviours are exclusively grounded in the base-set aspects of the simulation. An agent which is base-set exclusive should successfully cross the reality gap. Any agent that is not base-set exclusive depends in some way on the implementation aspects of the simulation for its behaviour. This dependence indicates that the agent has no realistic physical counterpart [Jakobi, 1997].

To ensure that an agent is base-set exclusive, implementation aspects are randomly varied between fitness trials. Noise is added to parameters so that an agent does not


expect the same value each and every time. An example of an implementation aspect would be the position and size of obstacles in an arena for an agent learning to avoid obstacles. If the values remained constant between trials an agent might learn to follow a safe path through the given arena rather than developing the intended avoidance behaviour. Placed in a new arena with different obstacles the agent would fail. By varying the size and position of the obstacles between trials, agents learn not to rely on these parameters but to instead base their behaviour on those aspects which can be relied upon: the base-set aspects.

Robot-environment interactions are inherently noisy. Infrared sensors, for example, may produce different range values for the same measurement due to unreliable environmental conditions like ambient light levels or the reflectance of an object. An agent evolved to expect accurate values every time would fail to achieve them in reality and its learned behaviour would fail. Base-set aspects therefore also need to be modified with noise between trials. Agents evolved with a noisy base-set are called base-set robust [Jakobi, 1997]. They do not rely on precise interactions but have learned robust behaviour which performs under the many conditions which are inevitable in reality.

[Jakobi, 1997] presents a general process for implementing minimal simulation. Choosing which aspects to model and determining how much noise to apply are decisions which need to be made experimentally. In [Jakobi et al., 1995] a robot was evolved using minimal simulation to navigate a T-maze. Fitness was based on navigation and required both reactive and non-reactive behaviours to be successful. Figure 2.10 shows this maze. A robot navigated down a corridor passing a beam of light that was shone randomly from the left or right. At the T-junction the robot had to remember which side the light was on and turn in this direction. Once out of the maze, if the floor was white the robot was to travel towards the light, and if the floor was black it was to travel away from the light.

Figure 2.10: The maze used in [Jakobi et al., 1995]. An agent learned to turn right or left at the junction based on the state of the light source.

Khepera robots were used as the physical robots. The robots had two drive wheels, eight infrared sensors positioned around the perimeter and an ambient light sensor. The simulator's base-set and implementation aspects are listed below.

Base-set aspects:
- Robot motor commands and the resulting movement
- The infrared sensor values
- The ambient light sensor values

Implementation aspects:
- The side of the corridor the light was on
- The width of the corridor, varied between 13cm and 23cm
- The starting orientation of the robot, varied within ±22.5 degrees
- The length of the illuminated section, varied between 2cm and 12cm
- The total length of the corridor, varied between 40cm and 60cm

Robots were successfully evolved in simulation. The transfer to reality was also successful in all instances except one. The failure occurred due to an improperly simulated angle of acceptance for one of the ambient light sensors. Once this was corrected, however, every real robot performed as well as its simulated counterpart.
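The trial-to-trial variation that minimal simulation prescribes is simple to express in code. The sketch below draws the implementation aspects afresh for every trial, using the ranges listed above; the function names and the flat dictionary format are mine, not Jakobi's.

    import random

    def random_implementation_aspects():
        # Implementation aspects never repeat exactly, so evolved
        # behaviour cannot come to rely on them.
        return {
            "light_side": random.choice(["left", "right"]),
            "corridor_width_cm": random.uniform(13.0, 23.0),
            "start_orientation_deg": random.uniform(-22.5, 22.5),
            "lit_section_cm": random.uniform(2.0, 12.0),
            "corridor_length_cm": random.uniform(40.0, 60.0),
        }

    def noisy_base_set(true_value, noise=0.05):
        # Base-set aspects are perturbed too, so behaviour becomes
        # base-set robust rather than dependent on exact readings.
        return true_value * (1.0 + random.gauss(0.0, noise))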


2.4 Conclusion

Evolutionary methods can be used to produce a variety of novel robotic agent designs. To be useful to current robotics research, the designs produced need to be able to be constructed in reality. Simulated behaviour means nothing if it cannot be realized. Automatic design investigates methods for constraining the evolution of an agent's body and controller to realistic conditions. In doing this it is hoped that any evolved agent can successfully cross the reality gap. Although traditional automatic design [Pollack, 2001] has demonstrated some success, no panacea for evolving transferable robot designs has been found.

Rather than assuming that increased simulator accuracy will produce more accurate transfers, minimal simulation [Jakobi, 1997] models the noise inherent in reality. Deliberately introducing noise to certain aspects of a simulation results in learned behaviour which is more robust under realistic conditions.

The techniques of artificial life, automatic design and minimal simulation complement one another and suggest a powerful combination for evolving robotic agents and successfully carrying them across the reality gap. The next chapter presents a new process of automatic design which combines these methods to produce successful agents in both simulation and reality.

Chapter 3

Experiment Design

3.1 Introduction

This chapter describes the methods used to encode, evolve and instantiate predator and prey agents in simulation, and the process employed to transfer the successful agents to reality. Design decisions were rooted in the examples discussed in the background work of Chapter 2, but a new approach to genetic representation was explored. Examples have been included to illustrate the techniques. In some cases decisions were made to limit the scope due to time constraints.

3.1.1 Physical Design

LEGO Technic™ bricks were used as the basic building blocks for agent morphologies. Constrained to valid LEGO™ assemblies, designs could be evolved with some guarantee of buildability. LEGO™ bricks are generic shapes which can be easily described by their number of pegs. This ensured a simple genetic representation. With one or more pegs, any LEGO™ brick can attach to, and be attached to by, any other brick. This allowed for a large number of potential morphologies. Finally, any two bricks of the same shape with the same number of pegs are guaranteed to be of the same dimensions. This generality of standard components and their wide availability made LEGO™ ideal.

A LEGO™ peg measures roughly 1cm. Three shapes of brick were used. Each was

one peg in width, one peg in height and one of three lengths: six, eight or twelve pegs.

Actuation was limited to two motors. Each was geared at a 28:1 ratio and drove a wheel 20cm in diameter. The ratio of 28:1 was experimentally determined to provide an adequate balance of speed and torque. Each motor could interpret one of 5 basic motor commands: fast forward, slow forward, slow reverse, fast reverse and stop. The speed of the wheel was determined by the gearing. Table 3.1 lists the real world velocities that correspond to each of the motor commands at a gearing of 28:1. To simplify the learning of basic behaviour, although 5 motor commands were possible only 3 were used.

Motor Command    Velocity
fast forward     33cm/sec
slow forward *   20cm/sec
stop             0cm/sec
fast reverse *   33cm/sec
slow reverse     20cm/sec

Table 3.1: Motor commands and corresponding velocities for agents with 28:1 gearing. Speeds marked * were not implemented.

Different agent speeds could have been achieved by including gearing parameters in the genotype. Although this might have resulted in a greater variety of behaviour, the additional dimensions in the search space were likely to impact the learning rate of the agents. Due to limited time, a decision was made to begin with a small search space and then expand with additional parameters after initial success was demonstrated. This effort continued beyond gearing, and similar decisions were made throughout to constrain the search space to a manageable area. This required keeping the evolvable parameters in the genotype to a minimum. Throughout this work a constant tradeoff was maintained between the size of the search space and the potential for novel behaviour and morphologies.


[Pollack, 2001] used locomotion as a goal behaviour for the evolutionary process. Simple servo actuators could be positioned at joints between rigid bodies. The aggregate action of these servos evolved into the higher-level behaviour of agent locomotion. In this work all agents began evolution capable of locomotion. Agents were initialised with two independently driven wheels and a third freely rotating dolly wheel for support. This drive train was then built into a square chassis which comprised the core body of all agents. Figure 3.1 shows this shared chassis.

Figure 3.1: The default chassis with drivetrain. Agent genotypes, regardless of species, express phenotypes which are built atop this base.

Endowing agents with such a jump start to evolution does ignore the traditional automatic design goal [Pollack, 2001] of developing short-term, special-purpose robots from only a set of elementary components. However, in this work it is not the entire design that is evolved but rather the elements of the design that most directly influence predator-prey behaviour. Automatic design techniques were used because they constrain evolved designs to realistic constructions.

A flat plane, 28 LEGO™ pegs wide and 16 pegs tall, was placed atop the default chassis described above. This plane was the root for all bricks of an agent's body. Any connected series of bricks could be attached so long as one peg from one brick was attached to this root. As evolution progressed, new bricks were attached to this plane, or to bricks attached through other bricks to this plane. Figure 3.2 shows a top view of this default chassis plane.


Figure 3.2: A top down view of the default chassis plane. All agent phenotypes were expressed with this plane as their root.

Every agent had this chassis as its base. Evolution could extend this design by adding or subtracting new bricks, but no design could be reduced below this default construction. Initialised in this way, evolutionary pressures focused on evolving behaviour directly applicable to a predator-prey contest rather than low-level control.

3.1.2 The Controller

Figure 3.3: The neural network architecture used by all agents.

A single hidden layer neural network was chosen to encode agent control. Agent sensors mapped to nine network input nodes. There was a single hidden layer of nine neurons and two output neurons mapped to the left and right motors. The network


structure remained static throughout evolution. No new neurons or links were added. The network's evolvable parameters were limited to the thresholds of the hidden and output neurons and the weights of the links. Links between neurons were initialised with a weight between -2.0 and 2.0. Hidden layer neurons were initialised with a threshold between -1.0 and 1.0. The activation function for the hidden nodes was:

H_j = \begin{cases} 0 & \text{if } \sum_i I_i w_{ij} < T_j \\ 1 & \text{if } \sum_i I_i w_{ij} \geq T_j \end{cases}

where H_j is the jth hidden node, I_i is the ith input node and w_{ij} is the link weight between the two. The activation function for the output nodes was:

O_j = \begin{cases} \text{SlowReverse} & \text{if } \sum_i H_i w_{ij} \leq T_j - 1 \\ \text{Stop} & \text{if } T_j - 1 < \sum_i H_i w_{ij} < T_j + 1 \\ \text{FastForward} & \text{if } \sum_i H_i w_{ij} \geq T_j + 1 \end{cases}

where O_j is the jth output node, H_i is the ith hidden node and w_{ij} is the link weight between the two.

At each time step the current sensor readings were normalized to a real value between 0 and 1.0 and input to the network. A weighted sum of the inputs was calculated by each hidden node based on the current input-to-hidden link weights. Each hidden node was then set to a value of 0 or 1 according to the hidden node activation function. Output nodes then calculated a weighted sum from the hidden layer and initiated one of the three motor commands according to the output node activation function.

When constructed in reality, the agent controllers were translated to CPL code and uploaded to a single board computer with a Motorola™ 68000-compatible processor running at 30MHz. CPL is a language similar to Lisp which was developed in the Department of Artificial Intelligence at the University of Edinburgh to interface with the robotic hardware used here. The main board was connected to a 12V battery and included connections for brush sensor input and for two daisy-chained infrared (IR)


sensor boards. The main board was built into a large LEGO™ casing, and the infrared sensor boards into two smaller casings, for attachment to a LEGO™ body.
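The complete control step implied by the two activation functions can be written compactly. This is a sketch of the update described above; the variable names are mine, and the exact threshold comparisons (the ±1 band around T_j) follow the reconstruction of the output function and should be read with that caveat.

    def control_step(inputs, w_ih, t_hidden, w_ho, t_out):
        # inputs: 9 sensor values already normalized to [0, 1].
        # w_ih[j][i]: weight from input i to hidden node j; t_hidden[j]: its threshold.
        # w_ho[j][i], t_out[j]: the same for the two output (motor) nodes.
        hidden = [1 if sum(x * w for x, w in zip(inputs, w_ih[j])) >= t_hidden[j] else 0
                  for j in range(9)]
        commands = []
        for j in range(2):  # left motor, right motor
            s = sum(h * w for h, w in zip(hidden, w_ho[j]))
            if s <= t_out[j] - 1:
                commands.append("slow reverse")
            elif s >= t_out[j] + 1:
                commands.append("fast forward")
            else:
                commands.append("stop")
        return commands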

3.1.3 Sensors

Sensors were limited to four infrared and four brush sensors. IR sensors had a maximum range of 90cm and were composed of an emitter and receiver pair. Infrared waves were emitted, and intersections with a solid body reflected waves back to the receiver, which calculated a distance based on the measured time lapse. The IR sensors were accurate enough to produce 256 (0 through 255) measurements, with 0 being no measurement and 255 indicating less than 10cm distance. A choice was made to simplify control by defining three value ranges between 0 and 255 and mapping these to a discrete value of 0, 0.5 or 1.0. Table 3.2 lists the raw sensor values, the corresponding distance measurements and the value set on the neural network input node. The sensor values and corresponding distance measurements were calculated as an average over 10 experiments.

IR Sensor Reading     Measured Distance   Input Node Value
Between 150 and 255   within 40cm         1.0
Between 50 and 150    within 70cm         0.5
Between 0 and 50      within 90cm         0.0

Table 3.2: IR sensor readings and the corresponding distance measurements.
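The discretization in Table 3.2 amounts to a three-band lookup, sketched below (band boundaries follow the table; the function name is mine):

    def ir_to_input(raw_reading):
        # Map a raw IR reading (0-255, higher means closer) to the
        # discrete value fed to the corresponding network input node.
        if raw_reading >= 150:
            return 1.0   # object within roughly 40cm
        if raw_reading >= 50:
            return 0.5   # object within roughly 70cm
        return 0.0       # nothing detected within range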

Brush sensors were each 20cm long and operated as simple contact switches. If a brush came in contact with a solid body it completed a switch; otherwise the switch remained open. Input nodes corresponding to the brush sensors were set to 1.0 when contact was sensed and 0.0 otherwise.

A Hall effect sensor was added after early runs produced agents which could not recover from a collision with a wall. The Hall effect generates a voltage transverse to the current flow direction in an electric conductor (the Hall voltage) if a magnetic field is applied perpendicularly to the conductor. The sensor uses this effect to measure the revolution of its wheel. The sensor produced a network input value of 0 or 1.0 depending on whether or not the agent significantly changed its position over time. This input gave an agent the ability to potentially develop a proprioceptive sense of its own motion. With IR and brush sensors alone an agent is unable to sense the difference between a collision with a wall and a collision with its opponent, but input from a Hall effect sensor provides this information. If the input from the Hall effect sensor was 1.0 and the prey had not been caught, an agent could learn to recognize a stuck state. It was hoped that this would eventually encourage the learning of appropriate recovery behaviour.

IR sensors were connected to the main board via a sensor board. Each board accepted two IR sensor inputs and each agent was constructed with two boards. Brush sensors connected directly to the main board as switches. The Hall effect sensor was connected in a similar fashion directly to the main board.

3.1.4 Genotype

The genotype maintains all of an agent's evolvable parameters in a format amenable to genetic operations. A gene or set of genes describes the construction of an agent's controller and body. A genotype is valid if its genes encode a design which is both buildable and permitted within the constraints of the research. A valid instance of the genotype represents a sample from the design subspace being explored. A complete genotype is presented in Appendix A for reference; it corresponds to the predator from the first experimental results in Chapter 4.

Agent genotypes were defined in an XML format. An <Agent> root element contained two child elements, <Chassis> and <Controller>, which encoded an agent's body and brain respectively. Each of these is discussed separately.


3.1.4.1 The Controller Genotype

The controller was a single hidden layer neural network with a fixed structure. Its genotype was represented within a <Controller> element. Each <Controller> element had nine <Hidden> child elements, each of which corresponded to one of its nine hidden nodes. <Hidden> elements had a threshold attribute which stored the hidden node's threshold (a value between -1.0 and 1.0) and a name attribute which stored the node's label. Each <Hidden> element had nine <input> child elements. These encoded the links between one of the input sensors and the hidden node. <input> elements were defined with a from attribute which indexed the sensor the link read from and a weight attribute which stored the link's weight: a value between -2.0 and 2.0. Figure 3.4 illustrates a hidden node gene. Attributes in red were evolvable values.

Figure 3.4: An example of a hidden node gene in the genotype of a controller. There are 9 hidden nodes. Attributes in red indicate the evolvable values.

A <Controller> element also had two <output> child elements. These corresponded to the agent's left and right motors. Each <output> element had nine <input> child elements. These <input> elements were similar in structure to those of the <Hidden> nodes, but here the from attribute stored the label of the hidden node the link received input from. The weight attribute stored the link's weight: a value between -2.0 and 2.0. An output node gene is illustrated in Figure 3.5. Again, the evolvable parameters are shown in red.

Figure 3.5: An example of an output node gene in the genotype of a controller. Attributes in red indicate the evolvable values.

3.1.4.2 The Body Genotype

All agent bodies were represented as assemblies of LEGO™ bricks atop a standard chassis. Bricks could be connected in series without restriction so long as at least one member of the series was connected to the chassis plane. Evolvable parameters included brick size, connection position, orientation and sensor placement. The standard chassis was shared between all agents regardless of species and was not evolvable.

The solution space for agent phenotypes was more open-ended than the solution space for the controller. The controller had definite bounds for each of its evolvable parameters: thresholds were between -1.0 and 1.0 and weights between -2.0 and 2.0. During mutation an existing brick's position could change or a new brick could be added. This put no definite bound on the potential solution space for agent phenotypes. Instead, phenotype limits were determined experimentally during fitness evaluation.

Force limits for LEGO™ brick connections were crudely estimated based on observation and educated guessing. A number of simple robots with differently constructed appendages were driven at full speed into a wall and the breaking of the connections was observed. Identical phenotypes were then created in simulation and an equivalent test was executed. A force limit for the connections was tuned until breakage similar to that observed in reality was seen in simulation. This force limit was then used during fitness trials. If forces exceeded the joint limit the connection was deleted and the bricks and all their child bricks, including sensors, would snap apart. As a result, fragile designs were usually rendered useless after a collision.

Correct representation required a frame of reference from which to construct a coordinate system. The chassis plane from Figure 3.2 was used for this purpose. It was 28x16 pegs in size and bricks connected to it by specifying row and column positions.

The genotype for an agent body was defined within a <Chassis> root element. This root contained one or more <ChassisConnection> elements as children. Each <ChassisConnection> contained a single <Brick> element. Together these elements defined how one LEGO™ brick attached to the chassis. <ChassisConnection> attributes, their possible values and what they represent are listed in Table 3.3.

Attribute name   Legal Values   Representation
row              1 to 16        The row of the chassis peg which the child brick's <ParentConnection> peg attaches to.
column           1 to 24        The column of the chassis peg which the child brick's <ParentConnection> peg attaches to.

Table 3.3: <ChassisConnection> element attribute values.

A <Brick> element corresponds to a single LEGO™ brick. Its attributes, their legal values and what they represent are listed in Table 3.4.

Attribute name   Legal Values     Representation
size             16, 12, 6 or 4   The length of the brick in pegs.
orientation      X or Y           The axis of the chassis the brick is parallel with.

Table 3.4: <Brick> element attribute values.

3.1.4.3 Positioning a Brick

Each <Brick> element contained a child <ParentConnection> element which maintained a peg attribute. The value of peg indexed the peg of the brick that connected to the chassis peg defined by the row and column attributes of its parent <ChassisConnection>. The orientation attribute indicates whether the brick is oriented parallel to the X or Y axis of the chassis. This is best illustrated with an example. Let us take the chassis connection gene:

Figure 3.6: A sample chassis connection gene.

Depending on the value of its orientation attribute a brick is positioned at one of two default start positions. If orientation=X the brick is placed parallel to the X-axis of the chassis plane and translated so that its right-most peg is adjacent to the chassis peg at row 1, column 1. If orientation=Y the brick is placed parallel to the Y-axis of the chassis plane and translated so that its bottom-most peg is adjacent to the chassis peg at row 16, column 1. A brick is indexed by its number of pegs. When orientation=X index 1 is its left-most peg. When orientation=Y index 1 is its bottom-most peg. Brick indexing and the default start positions are illustrated in Figure 3.7.

Figure 3.7: A brick's orientation value (X or Y) determines its indexing and start position relative to the chassis plane.

The gene from Figure 3.6 encodes the connection of a brick of size 6 with orientation=X. The brick's <ParentConnection> element has a peg value of 4, indicating the brick's 4th peg attaches to the chassis. Each brick has a <ParentConnection> peg value less than or equal to the size of the brick. The <ChassisConnection> element has row=4 and column=6. These values indicate the connection peg on the chassis plane. Combining the information in this parent-child pair we can read the gene as expressing the connection of the brick's peg at index 4 to the chassis peg at row 4, column 6. The other peg connections which result from overlap are automatically determined by the orientation of the brick. Positioning the gene from Figure 3.6 gives us the final assembly in Figure 3.8.

Figure 3.8: The position of a brick on the chassis plane represented by the XML from 3.6.
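The positioning rule can be expressed as a short function. The coordinate conventions below follow the description above; the direction in which the remaining pegs extend from the attachment point is my assumption.

    def place_brick(size, orientation, brick_peg, row, column):
        # Return the chassis (row, column) positions covered by each peg
        # of a brick whose peg `brick_peg` attaches at (row, column).
        if orientation == "X":   # pegs run along the column axis
            return [(row, column + (i - brick_peg)) for i in range(1, size + 1)]
        # orientation == "Y": pegs run along the row axis
        return [(row + (i - brick_peg), column) for i in range(1, size + 1)]

    # The gene from Figure 3.6: size 6, orientation X, brick peg 4
    # attached to the chassis peg at row 4, column 6.
    print(place_brick(6, "X", 4, 4, 6))
    # -> [(4, 3), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8)]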

3.1.4.4 Brick to Brick Connections

A <Brick> element can have any number of child bricks connected to it. Each of these brick-to-brick connections is encoded within a <BrickConnection> element. A <BrickConnection> has a single <Brick> child, identical in structure to the bricks of Table 3.4, except that its <ParentConnection> element's peg attribute refers to the peg on the parent brick to which it connects. Again, this is best illustrated with an example. If we extend the gene from Figure 3.6 with a <BrickConnection> we get the genotype in Figure 3.9.

Figure 3.9: Genotype for two bricks connected in series.

<BrickConnection> attributes, their legal values and what they represent are presented in Table 3.5.

Attribute name   Legal Values                  Representation
peg              1 to the size of the parent   The peg on the parent <Brick> to which this <BrickConnection>'s child <Brick> connects.
orientation      Above or Below                Indicates whether the child <Brick> is connected on top of or below the parent <Brick>.

Table 3.5: <BrickConnection> attribute values.

All child bricks are positioned relative to their parent brick. If we take the gene from 3.9 we see the BrickConnection defining a second brick attached to the initial chassis brick from 3.8. Its peg value is 6. The child brick's ParentConnection peg value is equal to 2. The child brick is size 6, connected above the parent brick and has orientation=Y. This indicates that the bricks are connected above peg 6 of the parent brick and at peg 2 of the child brick. The result is the construction in Figure 3.10.

3.1.4.5 Sensor Placement



Sensors can be attached to any brick and are defined within a child SensorConnection element. A



SensorConnection



can be a child of any



Brick





element and has a


Figure 3.10: This genotype encodes a brush sensor attached to a brick which is connected to the chassis. Genetic parameters are highlighted with the corresponding brick colour.

A SensorConnection has a peg attribute which defines the peg of the parent brick to which the sensor connects. It has a single Sensor child which defines the parameters and position of the sensor. An example of a gene is shown in Figure 3.11. The attributes of a Sensor are described in Table 3.6.

Figure 3.11: The genotype of a Brush sensor attached to a chassis brick.
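A hedged sketch of how this gene might look, using the peg values from the example discussed below; the id and orientation values here are illustrative assumptions:

    <Brick size="6" orientation="X">
      <ParentConnection peg="4"/>
      <SensorConnection peg="6">
        <Sensor type="Brush" id="Brush1" orientation="forward">
          <ParentConnection peg="3"/>
        </Sensor>
      </SensorConnection>
    </Brick>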


Attribute name | Legal Values | Representation
type | Brush or IR | Indicates whether the sensor is a brush or infrared sensor.
id | Any label | Used by the controller to index this sensor.
orientation | forward or backward | Indicates the orientation of the sensor relative to its parent brick.

Table 3.6: Sensor attribute values.

A sensor's connection to its parent brick is defined with syntax similar to brick on brick connections. In the example the SensorConnection peg value is 6. The peg value of its child Sensor's ParentConnection is 3. This defines a connection between the parent brick's 6th peg and the sensor's 3rd. Both brush and IR sensors connect with a base brick of size 6. Which direction a sensor points is a function of its orientation and the orientation of its parent brick. This function is summarized in Table 3.7. The phenotype resulting from the genotype in 3.11 is shown in Figure 3.12.

Parent Brick Orientation | Sensor Orientation | Direction of focus
X | forward | Sensor points to the front of the agent.
X | backward | Sensor points to the rear of the agent.
Y | forward | Sensor points out from the centre of the agent.
Y | backward | Sensor points towards the centre of the agent.

Table 3.7: Sensor orientation.


Figure 3.12: The phenotypic expression of the genotype in 3.11.

3.1.4.6 Symmetry

[Bongard and Paul, 2000] demonstrates that there is a positive correlation between morphological symmetry and locomotive efficiency for evolved creatures. [Pollack, 2001] enforces symmetry using the rules of a generative grammar, an L-System. In biology symmetry is the observed norm. For this work, symmetry was enforced artificially by evolving only the left half of an agent and then translating an identical mirror image to the right. A vertical median between the 14th and 15th columns of the chassis divided the halves. Agents were initialized with bricks and sensors on the left side. During mutation and crossover only the left side was changed. Before a fitness trial an agent genotype was translated by reflecting the left side to the right, as sketched below. This complete phenotype was then used for fitness evaluation.
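A minimal sketch of this reflection step, assuming the chassis plane is 28 pegs wide so that the median falls between columns 14 and 15 (the exact width and the class and method names are assumptions, not the thesis's own code):

    final class Symmetry {
        // Reflect a left-side chassis column (1..14) across the vertical
        // median between columns 14 and 15: column 1 maps to 28 and
        // column 14 to 15, preserving distance from the median.
        static int mirrorColumn(int column) {
            return 29 - column;
        }
    }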


3.1.5 Genotype Validation

This genetic syntax is robust enough that a hierarchy of any number of bricks and sensors can encode any standard LEGO™ brick configuration. Enforcing the following rules ensured that the phenotype corresponding to the genotype was valid and could be built. Applying these rules during mutation and crossover is discussed below.

(1) Any ParentConnection peg attribute must be ≤ the size of its parent brick, or ≤ 6 if its parent is a Sensor.

(2) For symmetry, a ChassisConnection's column must be ≤ 14.
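A minimal sketch of these two checks, assuming pegs and columns are indexed from 1 (the class and method names are illustrative, not the thesis's code):

    final class GenotypeRules {
        // Rule (1): a ParentConnection peg must not exceed its parent's size;
        // sensors attach via a base brick of size 6, so their limit is 6.
        static boolean validPeg(int peg, int parentSize, boolean parentIsSensor) {
            int limit = parentIsSensor ? 6 : parentSize;
            return peg >= 1 && peg <= limit;
        }

        // Rule (2): chassis connections must stay on the left half for symmetry.
        static boolean validChassisColumn(int column) {
            return column >= 1 && column <= 14;
        }
    }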

There is no genetic restriction on bricks occupying the same position in 3D space. Additionally, genes are not unique in their morphological expression. 3.13 shows two different genes which map to the same position. Although not ideal, this has minimal impact on an agent. The only issue is that during reproduction there is a probability that the genetic operations of mutation and crossover do not alter the phenotype of the agent, losing the benefit of the evolutionary step. In simulation an agent's own bricks will not collide. Bricks in redundant positions are simply not drawn and do not produce collisions with external bodies. When transferred to reality, if two bricks overlap completely one is ignored. Partial overlaps are dealt with by hand, usually by finding an alternate configuration of smaller bricks which results in an identical construction.

Figure 3.13: Redundant encoding. Different genes can encode for the same phenotype.


3.1.5.1 Mapping Sensors

Every agent had four infrared sensors and four brush sensors. Because sensors were mirrored during symmetrical translation the left side of every agent was initialised with two infrared sensors and two brush sensors. Genetic operations could reposition the sensors but not add new ones or delete existing ones. This maintained the network structure. Sensors were indexed in the controller gene by the from fields in the input elements of the hidden nodes. The sensors on an agent's left side were labelled IR1, IR2, Brush1, Brush2. When mirrored, the right side included IR3, IR4, Brush3, Brush4, such that sensor 3 was the reflection of sensor 1 and sensor 4 the reflection of sensor 2.

3.1.5.2 Equivalent Representation

Binary bit strings are often used as genotypes for evolutionary methods [Mitchell, 1998]. Their generality with respect to the quantity and variety of information that can be encoded, and their robustness to the sometimes disruptive operations of mutation and crossover, make them a common and often effective choice. The XML format used here is equivalent to a more traditional encoding with real valued genotypes. 3.14 illustrates one of many possible mappings between the XML genotype used here and a real valued genotype. The XML format was chosen over other options for a number of reasons:

(1) Intrinsic hierarchy. XML organizes data in a simple parent-child relationship. The relationships themselves, however, are unrestricted. Any element is able to be a parent to any number of child elements. This enforces structure without constraining the content. This property was especially useful and allowed bricks to connect in a large variety of configurations without any extension to the language.

(2) Generic elements. There was no hard distinction between two elements of the same type. This allowed for the straightforward definition of crossover points. Any BrickConnection element from one agent, regardless of parent-child relationships, could be swapped into a second agent without disrupting either agent's existing hierarchy.

(3) Human readable. Some experience with XML is necessary to interpret an agent's genotype but once familiar with the structure an agent's phenotype can be envisioned directly from the genotype. Much of this is due to the descriptive element and attribute names. Being human readable in this way allowed the genotype to double as an agent's complete assembly instructions - no additional documentation or knowledge was required.

(4) Easy validation. Errors between elements were easy to debug.

(5) Ease of development. XML has become a standard format for data files. There are numerous tools available which can be used to improve quality and decrease development time.

Figure 3.14: One of many potential real valued genes equivalent to the XML gene. An equivalent binary string is not shown.
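As an illustration of the kind of mapping the caption describes, the chassis connection gene from 3.6 could flatten to the vector (size, orientation, parent peg, chassis row, chassis column) = (6.0, 0.0, 4.0, 4.0, 6.0), with orientation X encoded as 0.0 and Y as 1.0. The attribute ordering and the encoding of orientation are assumptions; as the caption notes, this is one of many equally valid choices.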


3.2 Simulator

The open source physics simulator Open Dynamics Engine (ODE) was used to enforce real world physics during fitness trials. ODE provides an API for defining constructions of solid bodies within a 3D world. Bodies and the environment can be parameterized with physical attributes like mass, velocity and torque to interact within an accurate model of the real world.

Figure 3.15: The simulated default chassis.

The agent chassis was modelled as a 1cm thick plane atop a simple box of 8cm x 16cm x 6cm. Two wheels, modelled as spheres with a radius of 5cm, were attached to the left and the right side of the box with a special joint. The joint acted as a motor and could generate force to rotate the wheel forwards and back. A third sphere with radius 1.5cm modelling the dolly wheel was connected to a rear tail piece with dimensions of 2cm x 9cm x 0.5cm. The dolly wheel was modelled to rotate freely. The base chassis from which all agents were initialized is shown in Figure 3.15. The simulated motors were parameterized to achieve in simulation the motor speeds observed in 3.1. [Jakobi, 1997] refers to all aspects of a simulation that the robot relies upon to perform its behaviour as base-set aspects. Agents must be robust to the differences between the base-set aspects of a simulation and the real base set in order to cross the reality gap. To be robust to these differences random noise is added to the simulator's base-set aspects. This prevents agent behaviours from relying on precise sensing and control. Instead general behaviour which is more effective in a variety of conditions is learned.


The base-set aspects for the simulation were the IR sensors, the brush sensors and the motor velocities. In each trial one of the following three functions was randomly selected with equal probability to generate noise for each IR sensor. When an IR beam was broken during a trial the selected function generated the sensor's input value to the neural network. In the functions below the method rand(N) generates a value between -N and N.

Noisy Infrared Sensor Function 1:
    1.0 + rand(1.0)  if distance ≤ 10cm
    0.8 + rand(0.8)  if distance ≤ 30cm
    0.5 + rand(0.5)  if distance ≤ 50cm
    0.2 + rand(0.2)  if distance ≤ 70cm
    0.0              if distance ≤ 90cm

Noisy Infrared Sensor Function 2:
    1.0 + rand(0.5)  if distance ≤ 10cm
    0.8 + rand(0.4)  if distance ≤ 30cm
    0.5 + rand(0.2)  if distance ≤ 50cm
    0.2 + rand(0.1)  if distance ≤ 70cm
    0.0              if distance ≤ 90cm

Noisy Infrared Sensor Function 3:
    1.0 + rand(0.4)  if distance ≤ 10cm
    0.8 + rand(0.3)  if distance ≤ 30cm
    0.5 + rand(0.2)  if distance ≤ 50cm
    0.2 + rand(0.1)  if distance ≤ 70cm
    0.0              if distance ≤ 90cm
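A minimal Java sketch of the first noisy infrared function as reconstructed above, assuming rand(N) draws uniformly from [-N, N] (the class name and use of java.util.Random are illustrative):

    import java.util.Random;

    final class NoisyIr {
        private static final Random RNG = new Random();

        // rand(N): uniform value in [-N, N], as described in the text.
        static double rand(double n) {
            return (RNG.nextDouble() * 2.0 - 1.0) * n;
        }

        // Noisy Infrared Sensor Function 1: a base reading chosen by
        // distance band, plus noise whose amplitude matches the base value.
        static double function1(double distanceCm) {
            if (distanceCm <= 10) return 1.0 + rand(1.0);
            if (distanceCm <= 30) return 0.8 + rand(0.8);
            if (distanceCm <= 50) return 0.5 + rand(0.5);
            if (distanceCm <= 70) return 0.2 + rand(0.2);
            return 0.0;
        }
    }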

Three similar although simpler functions were used for modelling the brush sensors. Brushes were more reliable than IR sensors but they were not without problems. The thin wires that composed the sensors were observed to bend and catch on the wall or the other agent. Some brushes also registered contact when none occurred. Although rare, this was caused by a collision with a wall or a sharp change in velocity. Because return values in these cases were not guaranteed, a small amount of random noise modified the sensor in simulation. One of the following three noise functions was


randomly selected at the beginning of a trial for each of an agent's brush sensors.

Brush Sensor Function 1:
    Contact:    registered (1.0) 95% of the time, not registered (0.0) 5% of the time
    No Contact: not registered (0.0) 95% of the time, registered (1.0) 5% of the time

Brush Sensor Function 2:
    Contact:    registered (1.0) 98% of the time, not registered (0.0) 2% of the time
    No Contact: not registered (0.0) 98% of the time, registered (1.0) 2% of the time

Brush Sensor Function 3:
    Contact:    registered (1.0) 99% of the time, not registered (0.0) 1% of the time
    No Contact: not registered (0.0) 99% of the time, registered (1.0) 1% of the time
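A minimal sketch of the first brush function as reconstructed above; the reading is simply flipped with a small probability (5% here, 2% and 1% for the other two functions; names are illustrative):

    import java.util.Random;

    final class NoisyBrush {
        private static final Random RNG = new Random();

        // Brush Sensor Function 1: report the true contact state, but flip
        // it 5% of the time to model bent wires and spurious contacts.
        static double function1(boolean contact) {
            boolean flipped = RNG.nextDouble() < 0.05;
            return (contact ^ flipped) ? 1.0 : 0.0;
        }
    }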

Real robot velocities were influenced by the charge of the battery, the slipping of the wheel around its rim and tiny differences in gear spacing. At the beginning of each trial a small amount of noise modified each motor's velocity. Unlike the sensors, which had a different noise value calculated at each time step, noise was only added to a motor once at the beginning of each trial. A random value within one of the following ranges was added to each motor's velocity with equal probability: ±0.5cm/sec, ±1.0cm/sec or ±1.5cm/sec.


All bricks were modelled as solid boxes 1cm tall, 1cm wide and N cm long, where N is one of the three available brick lengths: 6, 8, 12. Bricks were connected to the chassis and each other with fixed joints. During a collision the current force at a fixed joint could be accessed and joints with forces above a threshold were deleted. If all joints between two bricks were deleted the bricks snapped apart.

An arena was randomly initialised at the beginning of each fitness contest. The dimensions of the arena were an implementation aspect of the simulator and not a base-set aspect. The arena's dimensions didn't directly affect the behaviour of an agent but they were parameters of the simulation that an agent could nevertheless become dependent on. The goal behaviours were not intended to be specific to an arena but effective in all arenas. To prevent an adverse influence on behaviour [Jakobi, 1997] suggests making these parameters implementation aspects. The arena dimensions were initialised each run and fluctuated between 180cm and 380cm in height and 225cm and 425cm in width according to the following functions:

Arena Height = 280cm + rand(100cm)
Arena Width = 325cm + rand(100cm)

At the beginning of a trial each genotype was translated to its symmetrical representation. The phenotype was constructed brick by brick in the 3D world according to the genotype and the Controller element was parsed to build the agent's neural network. The complete predator and prey were then randomly placed in the arena. A single trial executed 600 time steps. Each time step modelled 0.1 real seconds, with 600 steps representing a minute of real time. At each time step the sensors were updated based on their current state and the output value was used as input to the neural network, which generated motor commands. The trial ended if the predator and prey collided or if 600 time steps elapsed. Fitness scores were then updated based on the result.


3.2.1 The Genetic Algorithm

Each evolutionary run was initialised with one predator and one prey population. The size of the populations varied between 15 and 50 individuals, with most runs having a size of 20 or 25. Agents' controllers were randomly initialised with threshold and weight values. Agent phenotypes began with the default chassis design. A random configuration of chassis bricks was initialised and connected to the agent's left side. A hierarchy of child bricks was then randomly connected to these chassis bricks. Parameters such as orientation, brick length and the configuration of parent-child connections were initialised randomly within their legal ranges. Finally, two IR sensors and two brush sensors were positioned on the agent body.

The XML genotypes for the populations were stored in separate containers. At the beginning of each generation predators and prey were randomly selected from the containers to compete in fitness trials. After a trial both individuals were removed from their container and the competitors for the next trial were randomly selected from those remaining. This ensured that the number of fitness trials executed in each generation was equal to the size of the populations.

Reproduction was driven by fitness proportional selection. After all fitness trials were complete an intermediate population for each species was assembled as a roulette wheel in memory. Each individual in the population was allocated space on the wheel proportional to the ratio of its fitness compared to the entire population. For example, if there were 5 prey, each with a score of 25, the percentage of the wheel allocated to each prey would be 20%. Each wheel was spun a number of times equal to the current population size and the individual selected in each spin was copied into the final population for each species. High fitness designs might be reproduced multiple times into the new population while low fitness designs might not be included at all. This selection step is sketched below.
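A minimal sketch of the roulette-wheel selection step, in the Java the final software was written in (the class and method names are illustrative, not the thesis's actual code):

    import java.util.Random;

    final class RouletteWheel {
        // Select an index with probability proportional to its fitness.
        // With five prey each scoring 25, every index gets a 20% slice.
        static int spin(double[] fitness, Random rng) {
            double total = 0.0;
            for (double f : fitness) total += f;
            double point = rng.nextDouble() * total;
            double cumulative = 0.0;
            for (int i = 0; i < fitness.length; i++) {
                cumulative += fitness[i];
                if (point < cumulative) return i;
            }
            return fitness.length - 1; // guard against rounding at the boundary
        }
    }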

A test for mutation was then performed for each evolvable parameter of an agent's genotype. If a randomly generated real value between 0 and 1.0 was below the mutation rate the parameter was assigned a new random value within its legal range. The mutation rate for most runs was set to 0.05, or 5%. In some cases mutation cascaded through a genetic hierarchy. For example, if the size of any parent mutated to a new smaller size there was no guarantee that its child bricks still specified legal connection values. They could now specify a nonexistent peg. To minimize this disruption, if a parameter was mutated each gene which depended on its value was also mutated, but within the new constraints. This mutational cascade only affected one level up and one level down in the hierarchy of brick and sensor connections.

Two agents of the same species could sexually reproduce and cross over genes between them to produce two new individuals. For each individual in a population a random real number between 0 and 1.0 was generated. If the value was less than the set crossover rate a second individual was randomly selected as a mate. The crossover rate was usually set to 0.05, or 5%. A single crossover point was randomly chosen at an existing BrickConnection element. The exact position for crossover could be different for each individual but could only occur at elements of this type. The structure of the genotype encapsulated all the information pertinent to the position and construction of a brick within a BrickConnection element. A brick and its children could be swapped into a design by simply inserting a BrickConnection element and performing a validation on the connection peg. This provided a very clean and powerful method for reproducing phenotypes. Because crossover swapped a significant amount of genetic material the operation often resulted in major morphological changes.

Crossover was not seamless. Like mutation, a validation step was needed to correct invalid designs. An agent was limited to four IR sensors and four brush sensors. Because Sensor elements were encoded as children of a BrickConnection, a crossover operation could result in a new agent with too many sensors. To ensure valid designs when crossover produced an extra sensor, a fitness comparison between the two parents determined the sensor used. If the original parent had a higher fitness score the SensorConnection element was deleted from the crossed over gene and no sensors were deleted from the original design. Otherwise, one of the original parent's two sensors of the same type was randomly deleted and a new position was defined for the crossed over sensor. If the designs were equally fit, the original parent maintained its configuration.

3.2.1.1 Fitness Evaluation

Each fitness contest matched a single predator and prey against one another in an arena enclosed by walls. Agents were initialized in random starting positions. No obstacles were present within the arena and trials were limited to one minute in length. The goal for a predator was to collide with a prey in the shortest amount of time possible. The contrary prey goal was to avoid contact with a predator for as long as possible. A trial ended when a prey was caught or the time had expired.

Although simple in definition, a straightforward fitness function which rewarded a predator for catching a prey and a prey based on the amount of time it remained alive was inadequate. Agents quickly learned to cheat and achieved high scores through undesired behaviour. For example, prey learned to hide in a corner for an entire trial, effectively protecting themselves from two directions. This was effective but was not the active avoidance behaviour sought. As a result the function was updated with the following rewards.

A predator was rewarded with a stalking score. Every 10 time steps, which simulated one real second, the Euclidean distance between the predator and the prey was measured. If the predator had decreased the distance between its centre and the prey by more than 1cm a bonus of 0.1 was added to its score. This encouraged a predator to turn towards and chase a prey.

The prey in initial experiments were not avoiding predators. Prey were awarded 0.1 points for every second of the trial they remained alive. If a prey was caught its score was penalized by 50%. Although this seemed to reward those prey which could avoid a predator, the formula was determined to unfairly emphasise the final seconds of a trial. A prey that may have actively avoided a predator until the final seconds would receive a lower score than one which had just been lucky with its start position. The fitness function was changed so that prey were instead rewarded an avoidance bonus, calculated as the inverse of the predator's stalking score. Prey were awarded a bonus of 0.1 every second they moved more than 1cm from the predator. This encouraged prey towards active avoidance of predators.

The fitness function needed to encourage robust stalking and avoidance behaviour and not just lucky outcomes. To ameliorate cheating and encourage learned behaviour the rewards above were combined with a simple prey caught function. Together the rewards listed in Table 3.8 and Table 3.9 comprise the final fitness function used. The stalking and avoidance bonuses are sketched below.
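A minimal sketch of the stalking and avoidance bonuses just described, computed once per simulated second from the Euclidean distances (the class and method names are illustrative):

    final class FitnessBonuses {
        // Predator stalking bonus: 0.1 for closing the gap by more than
        // 1cm over the last simulated second (10 time steps).
        static double stalkingBonus(double prevDistanceCm, double currDistanceCm) {
            return (prevDistanceCm - currDistanceCm > 1.0) ? 0.1 : 0.0;
        }

        // Prey avoidance bonus: the inverse measure, 0.1 for opening the
        // gap by more than 1cm over the last simulated second.
        static double avoidanceBonus(double prevDistanceCm, double currDistanceCm) {
            return (currDistanceCm - prevDistanceCm > 1.0) ? 0.1 : 0.0;
        }
    }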

Reward | Contest Event | Justification
10.0 | If a prey was caught after the first 2 seconds of the trial. | Standard prey caught reward.
2.0 | If a prey was caught within the first 2 seconds of the trial. | Catching a prey early was luck and not learned behaviour.
0.1 | If the predator advanced ≥ 1cm closer to the prey in the last second. | Encouraged stalking.
0.01 | If a predator moved itself more than 0.5cm in the last second. | Encouraged recovery from a collision with a wall.

Table 3.8: Fitness Function: Predator Rewards


Reward | Contest Event | Justification
0.1 | Every second a prey remained uncaught. | Standard prey reward for avoidance.
0.1 | If a prey had increased the distance between itself and the predator by ≥ 1cm in the last second. | Encouraged active avoidance.
0.01 | If a prey moved itself more than 0.5cm in the last second. | Encouraged recovery from a collision with a wall.

Table 3.9: Fitness Function: Prey Rewards

3.2.2 Crossing the Reality Gap

3.2.2.1 Controller Translator

Automatic design aims for a fully automated agent design process unencumbered by human intervention. Although the more advanced assembly technologies were unavailable, an effort was made to eliminate as much human effort as possible during robot assembly. LEGO™ constructions had to be assembled manually but the process for converting and uploading an agent's brain was automated. To upload controllers to physical designs a translator was created to convert the genotype representation of the neural network into CPL code. The XML was easily parsed and used to populate a template CPL document. Only a small amount of intervention was then required to compile and upload the code to a main board. An example of the translated CPL code is included as Appendix B. It is the control code for the predator constructed in Experiment 3 in the next chapter. The main board, the battery and the additional sensor boards were not modelled in the simulator. When transferring designs to real robots these boards were connected after an agent's phenotype was assembled. The positioning was unique to each design, depending on the assembly. Care was taken to ensure the configuration did not interfere with the sensors or the morphology. The battery was housed within the chassis where it could not obstruct phenotype expression.


The genotype was designed to double as assembly instructions for building the robot. To perform assembly the genotype was mirrored so that an agent's complete symmetric design was available. The genotype was read from its root and the phenotype was assembled by reading through the element hierarchy in order and constructing the design as specified. Because sensors required a connection to a sensor board, an agent's body was completely constructed before any sensors were added. The process was slow but assembly was very straightforward and required no special knowledge of the research. To construct agents more efficiently the simulator was also used to view designs in 3D. The genotype was only used afterwards for verification or to resolve ambiguity. Once constructed, the agent controllers were automatically translated into CPL controller code and uploaded to each agent without modification.

3.3 Conclusion

The methods presented here were combined in a unified process to evolve agent designs. The final software was implemented in Java and the salient results from the more than one hundred trials executed are presented in the next chapter.

Chapter 4

Experiment Results

4.1 Introduction

This chapter presents the results from three evolutionary runs. Although over one hundred runs were performed these three, presented in chronological order, best illustrate the stages of learning observed over the course of the research.

In the first experiment some agents exhibited behaviour very similar to simple Braitenberg vehicles [Braitenberg, 1984]. Reproductive operations in some cases, however, negatively impacted these simple but effective designs. Small changes to phenotype would render a previously successful controller useless. These early findings led to a modified process for the remaining results.

The IR and brush sensors generated the same value whether contact was made with a competitor's body or a wall. However, only a collision with a competitor ended the game. This made it difficult for agents to discern the difference between the two. For different reasons, early predators and prey both learned to navigate towards the first solid object sensed. This was undesired behaviour. The second experiment presents agents which learned to back away from a wall after a collision and continue with the contest.

The third experiment presents the most sophisticated predator and prey observed


in all evolutionary trials and focuses on the validation of the agent designs with actual physical robots. An illustrated example of the automatic design process shows the modest amount of human intervention required for simulated agents to cross the reality gap.

4.2 Experiment Setup

A complete evolutionary run included agent initialization, fitness evaluation, selection and reproduction for both predator and prey populations over a set number of generations. The process was self contained and parameterized with the mutation rate, crossover rate, population size and number of generations. Trials were run on a 2 GHz Pentium 4 laptop and a set of 2 GHz dual processor Pentium 4 Linux servers. The average run time for a trial of 1000 generations with a population of 20 agents was just under 4 days on the Linux server and just over 6 on the laptop. Experiments showed a linear increase in time as more generations were added and a greater than linear increase as the population size was increased.

Initial trials with various population sizes were used to profile the evolutionary code for areas of low performance. The longest running stage by far was the fitness trial. A single trial executed 600 time steps. Each time step modelled 0.1 real seconds, with 600 steps representing a minute of real time. Each trial took between 15 and 60 seconds on the laptop and 10 and 45 seconds on one of the servers, with variation resulting from the other concurrent processes executing on the machines. The reproductive operations of selection, mutation and crossover could be performed on an entire population of 100 agents in just under 3 seconds on both machines and were determined not to be a significant performance concern.

To address performance the fitness trial code on the dual processor servers was multithreaded. The laptop was a single processor Windows machine and would not have benefited from multithreading. Multithreading did improve performance on the server when other processes were absent but provided little improvement during peak usage. As a result many trials were parameterized differently and trained with


concurrent processes on separate machines. Although runs took over 6 days to complete, this allowed the results of multiple runs to be generated at once. Refining the process involved parameterizing multiple trials and then observing the final generations of agents for any unwanted behaviour. After many runs the influence of the parameters could be predicted and updated to improve results in the next trials. This brute force approach was required as many failed trials occurred before interesting behaviour was observed.

The fitness function presented in Chapter 3 was refined over the course of the research. Early runs rewarded prey based on the amount of time they could avoid a predator and predators for catching prey, but nothing else. This resulted in little learning. The first two results presented here were used to refine the fitness function into its final form. In both cases unwanted behaviour was observed and the fitness function was tuned to encourage a more desirable outcome.

The starting positions of the agents were randomly initialised each trial. In some cases the agents would start the contest very close to one another, giving an unfair advantage to the predator. If a predator caught a prey within the first 5 seconds of a trial the bonus awarded to the predator was scaled by 0.5. This heavily penalized a successful predator which may have caught the prey regardless, but it was necessary to train behaviour from an arbitrary starting position. Alternative methods for ensuring appropriate starting positions were considered but even a scaled prey caught score was usually enough to ensure that the predator was reproduced into the next generation, giving it another chance to compete.

A number of neural network configurations were experimented with before the final structure was chosen. Early networks with a large number of nodes failed to develop interesting behaviour and were unreactive to input even after 1000 generations. Debugging revealed that a large number of hidden node values, which were set to either 0 or 1 depending on the activation function, seemed to cancel each other when processed by the output nodes. Agents would begin and end the trial without any increase in fitness or change in behaviour. Mutation and crossover would at times


reproduce a fitter predator or prey but the increased search space was unmanageable for extended experimentation given the limited resources available. A similar problem was observed if the number of hidden nodes was set to a value less than the number of inputs. With only a few hidden nodes control was fragile and small mutations to the network parameters usually crippled any successful existing behaviour.

4.3 Experiment 1

The population of both species was initialised to a size of twenty. Agents began with a random phenotype of three to six chassis bricks and had a maximum of two child bricks attached to each of these. Every design included four IR sensors and four brush sensors. In this early experiment no Hall effect sensor was used. A single hidden layer neural network with eight input nodes, eight hidden nodes and two output nodes was used. Link weights were randomly set between -2.0 and 2.0 and thresholds between -1.0 and 1.0. This is identical to the final network shown in Chapter 3 except that there is no Hall effect sensor for input and there is one fewer hidden node. An example of this network is shown in Figure 4.1.

Figure 4.1: The neural network architecture used for this experiment. This is identical to the final network structure except that no Hall effect sensor is present.


The base-set aspects for the simulation were the IR sensors, the brush sensors and the motor velocities. A function was randomly selected each trial from those presented in Chapter 3 to generate random noise for each aspect. Introducing noise to this base set prevented agents from becoming dependent on precise readings and actuation for control.

The genetic algorithm was parameterized with a mutation rate of 5%, a crossover rate of 10% and an add brick mutation rate of 5%. Prey scores were proportional to their survival time and predator scores were a function of the amount of stalking performed and whether or not the prey was caught. Formally these functions are:

Prey (each second survived):  Score ← Current Score + 0.1
Prey (caught):                Score ← Current Score × 0.25
Predator (prey caught):       Score ← Current Score + 10.0
Predator (moved ≥ 1cm closer to the prey in the last second):  Score ← Current Score + 0.1

800 generations were evolved, with the simulation saving the current generation's agent designs for both species every 10 generations. Contests between predators and prey were then replayed in a 3D viewer which allowed the progress of learned behaviour to be observed. In the final generation the best predator demonstrated simple reactive behaviour. It was assembled with two IR sensors facing forward and two oriented to its left and right sides. All brush sensors had been evolved to point inwards, which prevented them from actively sensing a prey. Figure 4.2 shows this agent. When its IR beams were broken by another object measured to be less than 50cm away the predator turned towards the object. Otherwise the predator travelled in a wide counter-clockwise circle. This could be considered reasonable hunting behaviour for a predator. It circles, sweeping a large area of the arena with its infrared beams, until it senses a prey which it then turns towards. The problem was that a predator had no way of discerning prey from a wall. The same behaviour which attacked a prey would send the predator crashing into


a wall. Once against a wall the predator would remain stuck there until the end of the trial. Depending on its start position the predator could be found in this position within the first few seconds.

Figure 4.2: The most effective predator evolved in the first experiment.

Of the other final generation predators only three seemed to have a hint of similar reactiveness. The others all drove straight into a wall or executed some random, uninteresting gesticulations before driving into a wall. There were also five which simply circled regardless of sensor input and, depending on their start position, arena size and the radius of their circle, eventually collided with a wall. No prey was observed to have evolved true avoidance behaviour. Most began a fitness trial by driving quickly into a wall and remaining pushed against it for the remainder of the trial. At first glance this behaviour seems naive with no hint of learning. However, against a wall a prey was protected from a predator in the direction of the wall. In a corner a prey was protected in two directions. Figure 4.3 shows the best prey from this generation. A myriad of intricate phenotypes were present in the final 100 generations. Examples of some of the more interesting can be observed in Figures 4.4 and 4.5. Many were assembled with over 100 bricks, with most of these bricks sharing the same space.


Figure 4.3: The most effective prey evolved in the first experiment. Although possessing an interesting phenotype, this prey began a trial by driving into a wall and remaining there.

For predators, there was a trend towards larger assemblies. This was expected as those predators which occupied more area had an increased probability of a collision with a prey. For prey, a smaller phenotype seemed more beneficial. This could be expected as the less there was to come in contact with, the lower their probability of being caught.

Figure 4.4: Some of the interesting prey evolved in simulation for experiment 1.

For both species additional bricks increased the number of available sensor positions. This could potentially increase fitness, however, only when an agent’s controller


could keep up. Significant changes to sensor position or orientation seemed to impair or, in most cases, completely neutralize any previous control. For example, imagine a controller trained to execute reactive behaviour for a phenotype with two infrared sensors both oriented forward. If during reproduction a crossover operation repositioned its sensors to the left and right side, the controller would no longer produce the same behaviour. Sensor inputs would no longer correspond to the same observation they had previously. This disruptive effect of phenotype change was investigated further with smaller test trials which charted each evolutionary step. A predator with two IR sensors to each of its sides was observed behaving similarly to the best predator above. A single crossover event repositioned its sensors so that all four IR sensors were now pointing to its rear. When the beams were broken the predator turned, but because the sensors were now to its rear the action simply perturbed its circling, whereas before, the predator seemed to track the object in its beam.

Figure 4.5: Some of the interesting predators evolved in simulation for experiment 1.

The results from this work led to two significant changes to the process. First, both species had difficulties with the arena wall. Predators couldn't discern between


a wall and prey while prey spent most of the trial hiding against one. No agent was ever observed to recover after a collision with a wall. This was undesired. The fitness contests were uninteresting and the behaviour was anything but robust. As a remedy the fitness function was modified to penalize immobility in both species. Additionally, an avoid bonus was added to prey scores. This was the inverse of a predator's stalking bonus and rewarded a prey a bonus of 0.01 for every second it succeeded in moving more than 0.5cm further from a predator. These rewards were combined with the existing ones to produce the final fitness functions presented in Tables 3.8 and 3.9 of Chapter 3.

Due to the disruptive effects of sensor repositioning the crossover operation was disabled. Although likely responsible for some of the more intricate designs, its impact on control was too great. Mutation of sensor position was also limited. A sensor could now only be repositioned to either the parent or child bricks of its current brick and mutation of the orientation attribute was disabled. This constrained the change in sensor position to an area close to its current position and ensured only small changes to neural network input. It was hoped this would eliminate the large disruptions to already learned control.

Finally, the add brick mutation rate was dropped to 1%. Most of the bricks in large phenotypes occupied the same area, making them redundant. A large number of bricks also taxed the performance of the simulator. Contests between agents with more than 30 bricks took twice as long as those with no more than 20. When agents had over 100 bricks the trial took over 20 times as long.

4.4 Experiment 2

A population of both species was again initialised to a size of twenty. Agents began the same as before with random phenotypes of three to six chassis bricks and a maximum of two child bricks attached to each of these. Each design included four IR sensors, four brush sensors and one Hall effect sensor which could be used by an agent to sense


its own movement. A single hidden layer neural network with nine input nodes, nine hidden nodes and two output nodes was created, with link weights randomly set between -2.0 and 2.0 and thresholds randomly set between -1.0 and 1.0. This is the final network shown in Figure 3.3 of Chapter 3. The base-set and implementation aspects of the simulator were left identical to the first experiment. The genetic algorithm was parameterized with a mutation rate of 5% and an add brick mutation rate of 1%. Crossover was disabled. The fitness functions for both predator and prey were updated to the scoring presented in Chapter 3. 1000 generations were evolved.

The most successful behaviours observed in the final generations were superior to those from the first experiment. Figure 4.6 shows the best predator from the final generation. Its configuration was similar to the predator from the first experiment, with two IR sensors facing forwards and one to either of its sides. One pair of brush sensors pointed inwards while two close to the front pointed out to its sides. A connection of bricks had evolved and extended to its front, this extra reach likely conferring a small advantage to the predator. When one of the front IR beams was broken the predator turned towards that position. If both its beams were broken the predator travelled straight ahead. When one of the rear beams was broken the predator turned to the side of the sensor. This was an improvement on the previous best predator. This behaviour executed whenever an object was sensed within 70cm. Prey were indeed stalked and chased down if they were sensed by the front IR beams. Unless the prey was within approximately 40cm of the predator it had a chance of manoeuvring out of the predator's IR beams before being caught. When this happened the predator went back to circling. If the prey was less than 40cm away, however, the predator was observed to win every time.


Figure 4.6: The most effective predator evolved in the second experiment.

The predator still could not immediately discern prey from the wall. However, after a collision, if the contest did not end, the predator had learned to execute a series of backup and turn actions. As long as the predator was not in a corner it would eventually straighten itself parallel to the wall. This particular predator always circled counter-clockwise. If the wall was on its right, after straightening out the predator circled away from the wall back into the centre. If the wall was on its left the predator seemed to hug the wall and travel against it until it encountered the first corner, where it then remained for the rest of the trial. This was not ideal but indicated progress towards robust wall avoidance behaviour. The best prey from the final generation is shown in Figure 4.7. It had two IR sensors pointing forward from the corners of its chassis and two pointing backwards attached to bricks which extended outward. Its brush sensors all pointed forward but only extended from the chassis by a few centimetres. Otherwise the phenotype was very compact. This prey had learned a set of basic reactions that it used to avoid all objects - both predator and wall. The prey began trials driving in a wide circle. If any object broke one of its front IR beams the prey turned away from the sensed direction. If the IR beam to its rear left was broken the prey turned to the right, away from the object. A break in the right IR beam, however, produced no change in the prey's behaviour.


Figure 4.7: The most effective prey evolved in the second experiment.

This reaction resulted in very primitive but effective avoidance behaviour that generally kept the prey away from walls and predators. A problem occurred if both of a prey's front IR beams were broken. This caused both wheels to stop turning and the prey to remain still until either the object moved or the trial ended. Often, if a wall had broken both beams, the prey simply sat motionless in the middle of the arena waiting for a predator to find it. If a predator had broken the beams the prey waited until the predator had passed. Unfortunately, if the predator was travelling along a trajectory towards it the prey remained pathetically motionless, like a deer in headlights, waiting to die. This was amusing but ineffective. It is likely that further changes to the fitness function and additional evolution could have discouraged this behaviour.

4.5 Experiment 3 - A reality transfer

In this experiment an evolutionary run with parameterization identical to the second experiment was executed over 1000 generations. The genotypes of the best predator and prey


from the final generation were then constructed in reality according to the process described in Chapter 3.

Figure 4.8: A top view of the predator from experiment 3 in simulation.

Figures 4.8 and 4.9 show the predator evolved in simulation. Its IR sensors point in front and behind. One pair of brush sensors points inwards and the second set extends just out to its sides. This predator evolved reactive behaviour similar to the others presented. In simulation it turned towards any object it sensed with its front IR sensors. If its rear IR beams were broken the predator turned toward the side the sensor was on. After a collision with a wall the predator made backup and turn movements to straighten itself out. Although no obvious reaction was observed when the brush sensors were contacted, this predator exhibited more efficient wall recovery than the previous predators. It backed up further and always turned away from the side where contact was sensed, a behaviour likely due to these brushes. This was the only agent observed that made real use of its brush sensors. Figures 4.10 and 4.11 show the prey evolved in simulation. It had two IR sensors close to its centre pointing forward and two further apart pointing backwards. One pair of brush sensors was contained within the radius of the chassis and one pair pointed out to the prey's rear. It had evolved simple avoidance behaviour like the prey from


Figure 4.9: Another view of the predator from experiment 3 in simulation.

the second experiment, except that it did not stop moving when both its front beams were broken. Instead, when both beams were broken the prey always turned to the left. No action was ever observed to result from the brush sensors being contacted.

Figure 4.10: A top view of the prey from experiment 3 in simulation.


Figure 4.11: Another view of the prey from experiment 3 in simulation.

Two default chassis were built according to the process from Chapter 3. To speed assembly the agents were displayed in the 3D environment. The tool built allowed real time movement of the camera in the world, so through inspection the agent phenotypes could be assembled from direct observation. The XML genotypes were only used at the end to verify the constructions. Any bricks encoded to occupy the same space were either represented by the same brick or by an assembly of multiple smaller bricks which resulted in an identical construction. Finally, the battery, main board and sensor boards were secured to the robot so that none of the sensors were obstructed. Figures 4.12 and 4.13 show the real predator and prey constructions respectively.

Figure 4.12: The transferred version of the simulated predator shown in 4.8.

The controllers in the genotypes of both species were parsed and translated into the equivalent CPL code. This code was then uploaded to the predator and prey main


Figure 4.13: The transferred version of the simulated prey shown in 4.10.

boards. The robots were now fully assembled. The complete process took less than half an hour for both agents. This speed was partially due to familiarity with the hardware components and the genotype representation, but it is unlikely that assembly would ever take longer than an hour for two agents. The genotypes for morphologies under 25 bricks are very straightforward and could easily be modified into even simpler assembly instructions. No human intervention other than uploading is required for controller translation. Although not fully automatic like the Golem project [Pollack, 2001], this process can be used by anyone, even those unfamiliar with evolutionary robotics, so long as they have a grasp of basic LEGO™ assembly.

The real robots were tested by comparing their behaviour with that achieved in simulation. The robots were randomly placed in a walled arena 280cm by 325cm and turned on for trials which lasted one minute. The behaviour in reality was not nearly as consistent as in simulation but basic reactiveness was observed. The prey did turn away from an object that broke its front IR beams, but only at approximately half the distance observed in simulation. The predator had difficulty sensing anything with its front IR sensors. Only when under 20cm from an object, and only some of the time, did the predator react. This was rarely effective for catching prey.

In over 10 trials, neither agent could execute its wall recovery behaviour. Most of


the time the agents simply continued pushing against the wall, unaware of the impasse. In two trials the predator did back away, but the corrective turns it made were executed for too short a time and the predator simply drove straight back into the wall. The prey was caught by the predator in two trials. In one case the predator did recognize the prey with its right IR sensor and turned into it. In the second case the predator got lucky as the prey collided with it from behind.

The inconsistency between the simulated and real trials can be attributed to a number of likely causes, the most obvious being the unreliability of the real IR sensors. More noise should have been added to the simulator to improve the robustness of the agents' reactions. The real issue seemed to be not that the sensors were inaccurate but that their values were not updated as quickly as in simulation. If travelling in opposite directions an opponent could pass completely through the other's beams before either sensor registered the reading. The controller might then execute a quick motor command to turn, for example, but because the input does not continue the motor command is replaced with whatever the default motor command was. The final result is a hiccup in movement but no consistent reaction.

This indicated a more serious bug in the simulator. The simulated timing was wholly incompatible with real time. As it was, sensing in simulation was always instantaneous. This was naive and evolved the agents with a dependence on that precision. When it was not available in reality their learned behaviours were unable to execute. The time between an event and a sensor receiving a value from that event should have been included in the base set and modified with random noise, as [Jakobi, 1997] suggests for all aspects which a robot depends upon for its core behaviour. Neglecting this factor was likely the major cause of the poor transfer to reality.

Chapter 5

Discussion

5.1 Discussion and Future Work

The results from this research show potential as a process for automatic agent design. The research was performed in iterations, with each less successful trial illuminating some yet unaddressed factor. Even in the short time available, agent designs with interesting phenotypes and moderately effective control were evolved in simulation. The evolutionary process was not without its problems but with adequate time and more significant computing resources it could be refined to evolve more sophisticated agents.

Using XML as the genotype proved very successful. Although no different in content from any real valued representation, XML organized the genes in a human readable form. The natural hierarchy of LEGO™ constructions was easy to represent with the parent-child relationships of XML elements. This greatly increased efficiency when constructing the real robots. XML requires a lot of syntax for representation; extremely intricate designs with many bricks would eventually result in extremely large genotype documents. This size, however, should be preferable to a less annotated representation that would be much harder to interpret.

Crossover of genes that contained sensors was determined to be detrimental to any existing learned behaviour. XML, however, made the operation very generic and


clean to implement. With only a single validation, whole hierarchies of bricks could be swapped between genes, growing elaborate designs very quickly. If properly balanced with the evolution of control, the method presented here could be refined into a powerful tool for artificial evolution.

LEGO™ provided an excellent base set of physical components for the automatic design process. Its common component dimensions and universal connectivity allowed a large variety of morphologies to be defined with simple construction rules. Because bricks can only be connected at right angles above and below other bricks it was very easy to expand and modify any existing assembly with mutation. These straightforward rules also allowed the simple validation of designs. Morphologies were somewhat constrained by these real world rules but this balance guaranteed their buildability. Numerous other LEGO products are available. Once refined, components of different shapes and new actuators could be introduced into the process. Agents might then evolve even more intricate phenotypes with the potential for more sophisticated action.

The genetic algorithm performed adequately but early generation agents with simple control were observed to lose behaviour as a result of mutation and crossover. Eliminating crossover and restricting mutation to a spatial region close to a brick's current position did seem to eliminate the issue. Although effective, introducing duplication in addition to the other reproductive operations may have proven more effective and less artificial than the chosen constraints. This was very similar to findings in [Cliff and Miller, 1996].

Agents were chosen randomly from their populations to compete in fitness contests. [Cliff and Miller, 1996] used a last-elite opponent strategy where new agents competed against the best opponent from the last generation. A method similar to this might have been used to organize better predator-prey pairings for contests. This may have increased the evolutionary pressures and improved the efficiency of evolution.


In general, the problem of performing phenotype change in parallel with learning control is not an easy one and has been addressed before [Lee et al., 1996] [Pollack, 2001]. It is intuitive that any drastic change to phenotype, especially one repositioning a sensor input used for control, will result in a design with different behaviour. The problem is that fitness is scored in a single trial. If the new behaviour does not immediately score well the new design will be eliminated from the population and fail to reproduce. This does not indicate that the design was inferior, rather that control had not yet evolved to take advantage of the new design. A new method could separate new agents when their phenotype is modified. New control could then be evolved against less competitive opponents before the designs are reintroduced to the main population. A modified island genetic algorithm could be used for this purpose. Populations of newly crossed over or mutated agents could be evolved separately from the main populations until demonstrating some level of fit control.

The final transfer to reality was disappointing. In retrospect the failure occurred for a number of reasons. The real sensors were far less reliable than their simulated counterparts, even given the introduction of noise to address this. Sensor reactiveness should also have been added to the simulator's base set along with the sensor return values. The timing in general for the simulator was in error and it was likely this factor which was the major cause of the unclean transfer. Future work will focus on introducing timing as a base-set aspect and modifying it with the appropriate noise.

Collisions involving the LEGO™ assemblies were modelled inaccurately. Real constructions were far more fragile than their simulated counterparts. This was especially true for the predator, whose phenotype extended more than 20cm in front. Instead of a more precise physical model which attempts to accurately model the LEGO™ brick connections under three dimensional forces, a much simpler function which broke assemblies after any moderate contact should have been employed. Otherwise the constructions should have been secured in reality so that phenotype strength was not a factor. This second option would have been useful in focusing at least early evolutionary

The final transfer to reality was disappointing. In retrospect, the failure occurred for a number of reasons. The real sensors were far less reliable than their simulated counterparts, even given the noise introduced to address this. Sensor responsiveness should also have been added to the simulator's base-set along with the sensor return values. The simulator's timing was in error in general, and this factor was likely the major cause of the unclean transfer. Future work will focus on introducing timing as a base-set aspect and modifying it with appropriate noise.

Collisions involving the LEGO™ assemblies were modelled inaccurately. Real constructions were far more fragile than their simulated counterparts; this was especially true for the predator, whose phenotype extended more than 20cm in front. Instead of a precise physical model attempting to capture the behaviour of LEGO™ brick connections under three-dimensional forces, a much simpler function that broke assemblies after any moderate contact should have been employed. Otherwise, the constructions should have been secured in reality so that phenotype strength was not a factor. This second option would have been useful in focusing at least early evolutionary progress on sensor placement and control.

Agent designs should have been transferred to reality early on. Observations would then have illustrated the errors in the current process, allowing for their correction in the next iteration. The simulator could then have been parameterized incrementally, introducing only those base-set aspects observed in real trials.

A 3D physics simulator with the precision of ODE was unnecessary and antagonistic to the idea of minimal simulation [Jakobi, 1997]. The level of detail needed for representation required many assumptions and estimates to be made about physical properties and their interaction with the environment. In retrospect, a simpler 2D simulator should have been used. The brick assemblies were three-dimensional, with bricks able to connect and occupy different regions of space along the Z axis, but for collisions a top-down 2D representation working with simple X and Y dimensions would have sufficed (a sketch follows below). Evolution could then have progressed much faster, as fitness trials would have completed in seconds rather than the minute or more ODE required. With more efficient evolution, physical transfers would have been performed earlier, and new base-set and implementation aspects added as they were made evident by the real robots. This way the simulator would represent only those parameters important to learning behaviour rather than the many real physical requirements of ODE. It would also have been more efficient and easier to maintain than the 5000+ lines of Java code that were used.
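A top-down collision test of this kind reduces to axis-aligned rectangle overlap; the sketch below also folds in the crude breakage rule suggested above. Brick, Assembly and BREAK_SPEED are hypothetical simplifications for illustration, not the classes used by this project's simulator.

public class TopDown2D {

    // A brick footprint in the X-Y plane: one corner plus width and depth;
    // the Z dimension is ignored entirely for collision purposes.
    static class Brick {
        double x, y, w, d;

        boolean overlaps(Brick other) {
            // standard axis-aligned rectangle intersection test
            return x < other.x + other.w && other.x < x + w
                && y < other.y + other.d && other.y < y + d;
        }
    }

    static class Assembly {
        Brick[] bricks;
        double speed;      // current speed, used by the breakage rule
        boolean broken;
    }

    static final double BREAK_SPEED = 0.05; // assumed "moderate contact" limit

    // O(n*m) pairwise footprint check; fast enough for small brick counts.
    static boolean collide(Assembly a, Assembly b) {
        for (Brick p : a.bricks) {
            for (Brick q : b.bricks) {
                if (p.overlaps(q)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Break both assemblies if they touch at more than a gentle speed,
    // approximating the fragility of the real constructions.
    static void resolveContact(Assembly a, Assembly b) {
        if (collide(a, b) && a.speed + b.speed > BREAK_SPEED) {
            a.broken = true;
            b.broken = true;
        }
    }
}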

Appendix A

XML Genotype of the Predator


Appendix B

Automatically Generated CPL Controller Code

; Auto translated controller.

(team Noah
  (config {stdcpl.cnf}
    (fact (int (MUL 100)))
  )
  (cp main()
    (env
      (local
        (int
          ; nodes
          ; input - sensors
          (ir_1 0) (ir_2 0) (ir_3 0) (ir_4 0)
          (brush_1 0) (brush_2 0) (brush_3 0) (brush_4 0)
          ; hidden
          (H0 0) (H1 0) (H2 0) (H3 0)
          ; output
          (rightOutput 0) (leftOutput 0)
          ; container vars for calc
          (H0_tmp 0) (H1_tmp 0) (H2_tmp 0) (H3_tmp 0)
          ; weight values for links to hidden node 0
          (H0_ir_1 -22) (H0_ir_2 248) (H0_ir_3 -156) (H0_ir_4 -134)
          (H0_brush_1 -157) (H0_brush_2 -191) (H0_brush_3 81) (H0_brush_4 -164)
          ; weight values for links to hidden node 1
          (H1_ir_1 77) (H1_ir_2 -238) (H1_ir_3 -236) (H1_ir_4 -30)
          (H1_brush_1 -248) (H1_brush_2 32) (H1_brush_3 -118) (H1_brush_4 -131)
          ; weight values for links to hidden node 2
          (H2_ir_1 171) (H2_ir_2 163) (H2_ir_3 -87) (H2_ir_4 -239)
          (H2_brush_1 -149) (H2_brush_2 -47) (H2_brush_3 204) (H2_brush_4 91)
          ; weight values for links to hidden node 3
          (H3_ir_1 -14) (H3_ir_2 4) (H3_ir_3 -105) (H3_ir_4 243)
          (H3_brush_1 -193) (H3_brush_2 11) (H3_brush_3 -168) (H3_brush_4 11)
          ; weight values for links to output nodes
          (left_H0 -57) (left_H1 36) (left_H2 -36) (left_H3 7)
          (right_H0 -19) (right_H1 -52) (right_H2 -9) (right_H3 62)
          ; threshold values for hidden nodes
          (H0_thresh 28) (H1_thresh -39) (H2_thresh -82) (H3_thresh -97)
          ; threshold values for output nodes
          ; right motor
          (right_upperThresh 30) (right_lowerThresh -14)
          ; left motor
          (left_upperThresh -28) (left_lowerThresh -94)
        )
      )
    )
    ; turn all lights off
    (send light1 off) (send light2 off) (send light3 off) (send light4 off)
    (send light5 off) (send light6 off) (send light7 off) (send light8 off)
    (while true
      ; init calc vars
      (setf H0_tmp 0) (setf H1_tmp 0) (setf H2_tmp 0) (setf H3_tmp 0)
      ; get current IR values and scale
      (setf ir_1 (* (receive infra1) MUL))
      (setf ir_2 (* (receive infra2) MUL))
      (setf ir_3 (* (receive infra3) MUL))
      (setf ir_4 (* (receive infra4) MUL))
      ; get current brush values
      (setf brush_1 0) (setf brush_2 0) (setf brush_3 0) (setf brush_4 0)
      (cond ((and (receive switch1) true) (setf brush_1 100)))
      (cond ((and (receive switch2) true) (setf brush_2 100)))
      (cond ((and (receive switch3) true) (setf brush_3 100)))
      (cond ((and (receive switch4) true) (setf brush_4 100)))
      ; normalize
      (setf ir_1 (/ ir_1 255))
      (setf ir_2 (/ ir_2 255))
      (setf ir_3 (/ ir_3 255))
      (setf ir_4 (/ ir_4 255))
      ; calc each hidden neuron value
      ; H0
      (setf H0_tmp (+ (* ir_1 H0_ir_1) (* ir_2 H0_ir_2)
                      (* ir_3 H0_ir_3) (* ir_4 H0_ir_4)))
      (setf H0_tmp (+ H0_tmp (* brush_1 H0_brush_1) (* brush_2 H0_brush_2)
                             (* brush_3 H0_brush_3) (* brush_4 H0_brush_4)))
      (setf H0_tmp (/ H0_tmp MUL))
      ; H1
      (setf H1_tmp (+ (* ir_1 H1_ir_1) (* ir_2 H1_ir_2)
                      (* ir_3 H1_ir_3) (* ir_4 H1_ir_4)))
      (setf H1_tmp (+ H1_tmp (* brush_1 H1_brush_1) (* brush_2 H1_brush_2)
                             (* brush_3 H1_brush_3) (* brush_4 H1_brush_4)))
      (setf H1_tmp (/ H1_tmp MUL))
      ; H2
      (setf H2_tmp (+ (* ir_1 H2_ir_1) (* ir_2 H2_ir_2)
                      (* ir_3 H2_ir_3) (* ir_4 H2_ir_4)))
      (setf H2_tmp (+ H2_tmp (* brush_1 H2_brush_1) (* brush_2 H2_brush_2)
                             (* brush_3 H2_brush_3) (* brush_4 H2_brush_4)))
      (setf H2_tmp (/ H2_tmp MUL))
      ; H3
      (setf H3_tmp (+ (* ir_1 H3_ir_1) (* ir_2 H3_ir_2)
                      (* ir_3 H3_ir_3) (* ir_4 H3_ir_4)))
      (setf H3_tmp (+ H3_tmp (* brush_1 H3_brush_1) (* brush_2 H3_brush_2)
                             (* brush_3 H3_brush_3) (* brush_4 H3_brush_4)))
      (setf H3_tmp (/ H3_tmp MUL))
      ; set hidden node values based on threshold
      (setf H0 1)
      (cond ((< H0_tmp H0_thresh) (setf H0 0)))
      (setf H1 1)
      (cond ((< H1_tmp H1_thresh) (setf H1 0)))
      (setf H2 1)
      (cond ((< H2_tmp H2_thresh) (setf H2 0)))
      (setf H3 1)
      (cond ((< H3_tmp H3_thresh) (setf H3 0)))
      ; calc output node values - sums fall in the range of -250 to 250
      (setf rightOutput (+ (* H0 right_H0) (* H1 right_H1)
                           (* H2 right_H2) (* H3 right_H3)))
      (setf leftOutput (+ (* H0 left_H0) (* H1 left_H1)
                          (* H2 left_H2) (* H3 left_H3)))
      ; determine motor command based on output node values and thresholds
      ; right motor (MOTOR 1)
      (cond
        ((< rightOutput right_lowerThresh) (send motor1 slowreverse))
        ((> rightOutput right_upperThresh) (send motor1 fastforward))
        (true (send motor1 stop))
      )
      ; left motor (MOTOR 2)
      (cond
        ((< leftOutput left_lowerThresh) (send motor2 slowreverse))
        ((> leftOutput left_upperThresh) (send motor2 fastforward))
        (true (send motor2 stop))
      )
      ; sleep for a sec
      (send main 100)
    )
  )
)

Bibliography

[Alberson and deSessa, 1982] Alberson, H. and deSessa, A. (1982). Turtle Geometry. MIT Press.

[Beer, 1995] Beer, R. (1995). On the dynamics of small continuous-time recurrent neural networks. Adaptive Behaviour, pages 471–511.

[Bongard and Paul, 2000] Bongard, J. and Paul, C. (2000). Investigating morphological symmetry and locomotive efficiency using virtual embodied evolution. In Meyer, J.-A. et al., editors, From Animals to Animats: The Sixth International Conference on the Simulation of Adaptive Behaviour.

[Braitenberg, 1984] Braitenberg, V. (1984). Vehicles: Experiments in Synthetic Psychology. MIT Press Bradford Books, Cambridge, MA.

[Cliff and Miller, 1996] Cliff, D. and Miller, G. F. (1996). Co-evolution of pursuit and evasion II: Simulation methods and results. In Maes, P., Mataric, M. J., Meyer, J.-A., Pollack, J. B., and Wilson, S. W., editors, From Animals to Animats 4, pages 506–515, Cambridge, MA. MIT Press.

[de Garis, 1990] de Garis, H. (1990). Genetic programming: building artificial nervous systems using genetically programmed neural network modules. In Porter, B. W. and Mooney, R. J., editors, Machine Learning: Proceedings of the Seventh International Conference, pages 132–139, Austin, TX. Morgan Kaufmann, Palo Alto, CA.

[Goldberg, 1989] Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Professional.


[Hornby et al., 2001] Hornby, G., Lipson, H., and Pollack, J. (2001). Evolution of generative design systems for modular physical robots.

[Hornby and Pollack, 2001] Hornby, G. S. and Pollack, J. B. (2001). Body-brain co-evolution using L-systems as a generative encoding. In Spector, L., Goodman, E. D., Wu, A., Langdon, W. B., Voigt, H.-M., Gen, M., Sen, S., Dorigo, M., Pezeshk, S., Garzon, M. H., and Burke, E., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 868–875, San Francisco, California, USA. Morgan Kaufmann.

[Jakobi, 1997] Jakobi, N. (1997). Evolutionary robotics and the radical envelope-of-noise hypothesis. Adaptive Behaviour, 6(2):325–368.

[Jakobi et al., 1995] Jakobi, N., Husbands, P., and Harvey, I. (1995). Noise and the reality gap: The use of simulation in evolutionary robotics. Lecture Notes in Computer Science, 929:704–??

[Langton, 1989] Langton, C. G. (1989). Artificial Life: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

[Lee et al., 1996] Lee, W.-P., Hallam, J., and Lund, H. H. (1996). A hybrid GP/GA approach for co-evolving controllers and robot bodies to achieve fitness-specified tasks. In Proceedings of the 1996 IEEE International Conference on Evolutionary Computation, Nagoya, Japan. IEEE Press.

[Lindenmayer, 1968] Lindenmayer, A. (1968). Mathematical models for cellular interaction in development, Parts I and II. Journal of Theoretical Biology, 18:280–315.

[Mitchell, 1998] Mitchell, M. (1998). An Introduction to Genetic Algorithms (Complex Adaptive Systems). The MIT Press, Cambridge, MA.

[Pollack, 2001] Pollack, J. (2001). Three generations of automatically designed robots.


[Sims, 1994] Sims, K. (1994). Evolving virtual creatures. In Computer Graphics, Annual Conference Series (Siggraph ’94 Proceedings), pages 15–22.