Cellz: A simple dynamical game for testing evolutionary algorithms

Simon M. Lucas
Department of Computer Science
University of Essex
Colchester, Essex CO4 3SQ
[email protected]

Abstract— The game of Cellz has been designed as a test bed for evolutionary algorithms. The game has a minimal set of rules that nonetheless offer the possibility for complex behaviour to emerge. Computationally, the game is cheap to simulate, which leads to rapid runs of evolutionary algorithms. A key feature of the game is the cell division process, which can lead to evolution in situ without reference to any externally defined fitness function. This paper describes the rationale behind the development of Cellz, the rules of the game and the software interfaces for the cell controllers. The randomness in the game initialisation leads to extremely noisy fitness functions, which adds to the challenge of evolving high-performance controllers. Initial results demonstrate that an evolved perceptron-type controller can achieve mediocre performance on the single-species game.

I. INTRODUCTION

When solving a problem with an evolutionary algorithm, arguably the central question is which space to perform the evolutionary search in. For example, standard genetic programming searches the space of programs specified as expression trees. Another common space to choose is that of feed-forward neural networks, since these are known to be capable of representing any functional mapping. Therefore, one approach to evolutionary computing is to choose a sufficiently general solution space, and search in this space for solutions to any given problem. On the other hand, intuition suggests that choosing an overly general space will impede progress towards an acceptable solution, since the size of the space will be much larger than necessary (though note that the evolvability [9], [2] of solutions to a particular problem in a particular space can be more important than the size of the space). Evidence for this can be found in many problem domains, one example being the learning of deterministic finite automata (DFA). Some of the best results for this have been obtained using a simple random hill-climber operating in a cut-down DFA space [18], which gives significantly better performance than more complex genetic programming [4] or cellular encoding [6] approaches to the problem. The advantage of choosing a problem-specific space to search in is that the learning algorithm only has to learn to solve the problem instance; it does not have to learn to represent solutions to the problem in general.

When trying to evolve a competitive game player, however, the appropriate space in which to search is usually far from apparent. An overly general space can make the search problem too hard, while an overly simple space runs the risk of being unable to represent good solutions. Cellz provides a simple dynamic environment in which controllers based on different representations and search spaces can compete against each other.

A well designed game should have simple rules yet be difficult to master. Cellz has been designed with this in mind. The aim of the game is for a species of cell to eat food, multiply, and collectively attain as much mass as possible within a specified time limit. Cellz is based on a simple continuous 2D physics model simulated in discrete time. The adoption of a continuous state space means that conventional discrete space search procedures such as minimax cannot directly be applied. Furthermore, the way the cells smoothly accelerate and move in chaotic patterns makes the game interesting to observe. The game can involve any number of species, though this paper focuses mostly on the simpler single-species game.

A. Related Games

Cellz has been inspired by many previous games and alife simulations. Braitenberg's vehicles [5] demonstrated how interesting and apparently intentional behaviour could arise from very simple circuits when placed in certain environments. Conway's game of life [14] demonstrated how complex patterns of activity could arise in a grid of very simple cellular automata. Recent work related to this has been done by Miller [20] on evolving automata rules that would generate some desired simple pattern, such as a French flag. In cellular automata work, however, while the rule-set may be evolved, the automata themselves are connected in a fixed topology. The cells in Cellz, on the other hand, are free to move. It would be an interesting challenge to get the cells in the game of Cellz to evolve movement rules with the aim of forming particular patterns, but this is not explored in this paper.

The simulated version of RoboCup [3] soccer provides an environment in which controllers can be evolved [19]. Russell Abbott¹ has also developed a version of football called billiard soccer. This version is based on a simple physics model like Cellz, and this also provides an interesting test-bed for evolving controllers. Abbott's billiard soccer is built on top of the MASON² event simulator. General purpose simulation engines such as MASON and Breve³ represent powerful high-level platforms for researchers to build their simulations on, but Cellz is sufficiently simple to implement directly in a high-level language such as Java. The direct implementation in this case also allows for easier deployment in web pages as a Java Applet, and for simpler downloads.

¹ Posting to Genetic Programming email list, and personal communication.
² http://cs.gmu.edu/~eclab/projects/mason/
³ http://www.spiderland.org/breve/

Stanley and Miikkulainen [24] used a deterministic robot duel game to study the evolution of neural network robot controllers by gradually evolving them from simple initial networks through to more complex ones, a process they call complexification. In their game, the robots start in the same respective positions and move to find food (always in the same fixed positions). When the two robots collide, the larger robot eats the smaller robot and wins the game. The fixed nature of the game's initial setup, however, means that the robot controllers are evolving to best exploit that set of circumstances rather than evolving general survival skills.

Sodarace⁴ is a challenge developed by Sodaplay⁵ in conjunction with researchers at Queen Mary College (University of London). In Sodarace the aim is to develop a spring-mass-muscle model to race along some smooth or bumpy terrain, depending on the particular race. The majority of entries until very recently have been models designed by humans using the Sodaplay interactive design Applet. The AI community has recently taken up the challenge, however, and evolved controllers are now becoming competitive with human-designed models. Sodarace is a challenging arena involving the evolution of complex graph structures with highly nonlinear interactions. Even the design of passive spring-mass structures poses interesting problems for evolution [16]. A current limitation on Sodaplay models, however, is that they do not have sensors, and the muscles in the model are activated by fixed frequency sine waves (each muscle chooses only its phase and amplitude).

⁴ http://sodarace.net
⁵ http://sodarace.com

Sims [23] showed how the morphology and control systems of 3D creatures could be evolved. More recently, the Framsticks environment⁶ allows simulated stick-model creatures to be controlled by sensor-driven neural networks of arbitrary complexity. These 3D stick-based models are much more costly to simulate than the simple 2D objects used in the game of Cellz, however. The power of evolutionary methods can be severely restricted when the fitness function is expensive to compute. Hence, Cellz offers an advantage in this respect.

⁶ http://www.frams.alife.pl/

Sugarscape [12] is a grid-based system inhabited by agents who follow the rule of moving to the neighbouring grid-square with the most sugar in it and consuming the sugar. Sugarscape has been used to show how interesting social phenomena arise from simple low-level rules.

Cellz adopts many ideas from the above-mentioned games and simulations, but offers a particular blend of game rules together with a simple physics model that provides interesting dynamical game-play. The fact that reproduction is built into the game offers the possibility of evolution in situ, though this has not yet been exploited. One noteworthy feature of Cellz is the way the game is being deployed on the Web, such that players wishing to design controllers can upload the controller specification to a web site and have the controller automatically evaluated and placed in a league table.

II. THE GAME OF CELLZ

The game of Cellz involves controlling a set of single-celled agents in a 2D plane with the aim of attaining as much cell mass as possible within a specified time. A cell grows by eating food and cells of other species. When a cell becomes large enough it splits into two cells. Food particles are always placed randomly within a defined rectangle, but cells are free to wander the entire plane.

Each game of Cellz begins with a number of randomly placed pieces of food in the play area, and at least one randomly placed cell. The food, once placed, does not move, and is eaten by the first cell to touch it. Whenever a piece of food is eaten, its mass is added to the cell that ate it and another piece of food is added to a random position in the play area. All pieces of food have the same fixed mass. Each cell has an initial mass, which corresponds to an energy supply. The mass of a cell dissipates as it uses energy to move. Cells die when their energy level drops below a specified minimum. Finally, in the current game, when cells of the same species touch, no interaction takes place; they simply pass through each other. Future versions of Cellz could allow recombination to take place between cells of the same species. When a cell becomes large enough, it splits into two cells.

The only decision a cell controller makes at each point in time is the force vector f to output. This is used to determine the acceleration a: a = f/m, where m is the cell mass. The differential equations are integrated with the simple Euler method:

    s_{t+1} = s_t + v_t ∆t    (1)
    v_{t+1} = v_t + a_t ∆t    (2)

We always use ∆t = 1, so this factor can be ignored. The fact that the controller chooses a force rather than a velocity leads to complex behaviour even for simple greedy controllers that try naively to head for the nearest piece of food.

A. Single Species Strategy

The single-species game is the simplest version of Cellz, but the design of a good controller even for this game is still nontrivial. If there were only a single cell, then the problem would be similar to the travelling salesman problem, except for the fact that a new randomly positioned food particle is added each time a particle is eaten, and that the cells have momentum, and therefore cannot change direction instantaneously. As cells divide, however, the system becomes more complex. Cells should now aim not only to find food efficiently, but also to avoid going for the same food as another cell, since this would be an inefficient duplication of effort. The Greedy Control introduced below suffers from this weakness, where many cells go for the same piece of food, but the Sensor Control attempts to overcome this problem.

B. Multiple Species Strategy

When two cells of different species collide, the larger cell eats the smaller cell and consumes all of its energy, which may cause the larger cell to subsequently divide. Having multiple species of cell play the game adds a new level of complexity. A cell is at its most vulnerable (i.e. at its smallest) just after it has divided. Therefore, some interesting ploys become possible, whereby a cell of one species might hover around a food particle waiting for a cell of a different species to eat it. If the cell of the other species is large enough to divide, then, just after division, the two offspring cells can be immediately eaten by the cell that was cunningly hovering.

C. The Cellz Setup Parameters

There are a number of parameters that control the simulation. The values used for these are shown in Table I. The parameters have been given S.I. units where appropriate, to aid understanding of their purpose. Appropriate transforms are assumed where necessary when running the game, e.g. the simulation runs at a map-scale of one metre per pixel. The effects of these parameter settings are described next.

a) Frictional loss factor µ: this is used loosely as a particular type of friction coefficient. The effect of the loss factor is to reduce the velocity at each time step such that the final value of the velocity v is smaller than the calculated velocity v′, and is given by v := v′(1 − µ).

b) Energy loss factor ν: this specifies the efficiency with which cell mass is converted to energy to drive the cell. High values of ν mean that cells must conserve their energy very carefully or risk expiring before reaching their next piece of food. The loss factor acts directly on the mass of the cell such that m_{t+1} = m_t(1 − ν|f_t|).

c) Max velocity: this is a hard limit placed on the magnitude of the velocity vector at each time step. This is chosen to be large enough to make significant events in the simulation happen in a relatively short number of time steps, and for the simulation to be fun to watch. Making it too large would complicate the collision detection routines. Providing it is kept to the radius of a cell or smaller, collision detection may be implemented by checking whether the distance between two particles is less than the sum of their radii.

d) Max force: this limits the force that can be applied by a controller at each point in time. This is not strictly necessary, since the velocity is hard-limited, but it prevents poorly designed controllers from using all their energy in a single time step.

e) nFood: the number of food particles in the game. This is the number placed randomly in the play area at the start, and it stays constant throughout the game since a food particle is added every time one is consumed.

f) nCells: the number of cells added to the play area at the start of the game. The number of cells in the game varies, as cells divide, get eaten by other cells, and die through exhausting their energy supply.

g) foodMass: the mass of each piece of food.

h) initCellMass: the mass that each cell has in the initial random game configuration. The cell mass decreases as energy is used, and increases as food or other cells are consumed.

i) splitMass: a cell splits when its mass is greater than or equal to splitMass.

TABLE I
THE CELLZ SETUP PARAMETERS USED FOR THIS PAPER

    Parameter                   Value
    Frictional Loss Factor µ    0.03
    Energy Loss Factor ν        0.003
    Max Velocity                15 m/s
    Max Force                   10 N
    nFood                       10
    nCells                      1, 5
    foodMass                    16 kg
    initCellMass                50 kg
    splitMass                   40 kg
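Taken together, the rules and parameters above define one step of the simulation. The following sketch shows how such a step might be implemented for a single cell. It is not from the paper: all variable names are invented, and the exact ordering of force limiting, friction, and integration is an assumption; only the formulas and parameter values come from the text and Table I.

    // A minimal sketch of one simulation step (dt = 1) for a single cell.
    class CellStep {
        static double mu = 0.03, nu = 0.003;       // loss factors (Table I)
        static double maxVel = 15, maxForce = 10;  // hard limits (Table I)

        double mass = 50;    // initCellMass
        double sx, sy;       // position
        double vx, vy;       // velocity

        void step(double fx, double fy) {
            // clamp the controller's force to maxForce
            double f = Math.hypot(fx, fy);
            if (f > maxForce) { fx *= maxForce / f; fy *= maxForce / f; f = maxForce; }
            // a = f / m, then Euler update v' = v + a
            vx += fx / mass;
            vy += fy / mass;
            // frictional loss v := v'(1 - mu), then hard velocity limit
            vx *= (1 - mu); vy *= (1 - mu);
            double v = Math.hypot(vx, vy);
            if (v > maxVel) { vx *= maxVel / v; vy *= maxVel / v; }
            // s_{t+1} = s_t + v_t
            sx += vx; sy += vy;
            // energy dissipation: m_{t+1} = m_t(1 - nu * |f|)
            mass *= (1 - nu * f);
        }

        // collision test from Section II-C: two particles touch when the
        // distance between their centres is less than the sum of their radii
        static boolean touching(double x1, double y1, double r1,
                                double x2, double y2, double r2) {
            return Math.hypot(x2 - x1, y2 - y1) < r1 + r2;
        }
    }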

Early simulations of the game were run with nCells = 1, but this caused an interesting effect when trying to evolve controllers. Starting with a single cell introduces a 'hole' in the fitness landscape. Controllers with no guidance typically just waste their energy, but may occasionally get lucky and eat some food. Controllers that have some ability to aim for food, however, will eventually eat enough food to split, at which point the cell sensor array goes non-zero for the first time in the evolutionary history of that controller. This may cause a sudden deterioration in performance, with the result that more successful food seekers could finish with a lower score than less successful ones. This effect would be interesting to investigate further in its own right. Having observed this effect, the number of cells in the initial game was set to 5, which solved the problem. More successful evolutionary runs were obtained by starting the game with both multiple cells and multiple food pieces in the play area. It was also noted that using multiple cells from the start could reduce the number of simulation cycles needed to reliably detect a difference between two controllers.

Figure 1 shows the trace of a cell for approximately 1,000 time steps, using the simple hand-designed controller from Figure 3. The trace of the cell is shown by the curves, while the small particles represent the food. Note that no effort was made to design an optimal controller; the idea here was to provide a simple design that did something useful. This controller implements the simple strategy of always heading straight for the nearest piece of food. Without friction in the system, this policy could cause the cell to orbit the nearest piece of food forever.

Fig. 1. The trace of a game from the start for approximately 1,000 time steps.

III. SPECIFYING A CELL CONTROLLER

The inputs and outputs to the controller are defined by the simple software interfaces given below. Here we specify them using Java, but they can readily be translated into other languages. At each time step, the cell is presented with information about the state of the game. It must then process this and calculate its force vector to output for that time step. We designed two interfaces to the controller: a complete information interface, and an 'ego-centric' sensor-based interface (similar to that used in [8]). Note that giving the controller full information does not necessarily simplify the design of the control system. It might make the controller design simpler if we could extract the most relevant information from the complete set. In a similar way, when solving a pattern recognition problem, it is usual to perform some kind of feature extraction before presenting the data to a pattern classifier. These interfaces are now described in more detail, and we also provide a simple hand-built implementation of each one.

A. Complete Information Interface

The complete information interface is shown in Figure 2. The signature for the getForce() method specifies two parameters: the identity of the cell whose method is being called, and a Collection of all the other Particles in the game. For this type of controller, it is necessary to provide the identity of the current cell, since the same controller is used for all cells of this species in the game. The CellModel class extends the Particle class, which holds the current position and velocity. Hence, a controller that implements this interface has the complete set of information that fully describes the current state of the system.

    public interface CellControl {
        public Vector2d getForce( CellModel me, Collection particles );
    }

Fig. 2. The interface for the complete information controller.

B. A Hand-Designed Greedy Control

An implementation of the sole method of the complete information interface is shown in Figure 3. This uses a helper method called closestFood, which iterates over the Collection of particles passed as an argument and returns the closest food particle to the current cell. The instance variable force is of class Vector2d, which is a general vector class developed by the author for 2D games. This variable is instantiated in the constructor to prevent allocating memory each time the getForce method is called. The vector arithmetic methods of this class are used to calculate the heading vector to the nearest food particle. Note that each particle in the simulation has an instance variable s, also of class Vector2d, which specifies its current location. A marginally more sophisticated controller of this type would also take into account the current cell velocity to avoid the tendency to orbit food particles, but this was not implemented.

The fact that solutions of this type can be specified so concisely once suitable helper methods and helper classes have been developed suggests the use of Object Oriented Genetic Programming (OOGP) [17], [22], [1], [7] as a good space in which to evolve controllers. This is interesting in two ways. Firstly, given that one or more helper classes already exist, OOGP can utilise reflection to automatically discover how to use their methods [17]. Also, given that classes such as Vector2d are so useful in designing controllers for this kind of game, it would be a challenge to develop an environment under which such classes can evolve.

    public Vector2d getForce( CellModel me, Collection particles ) {
        // find closest particle, head for it
        Particle closest = closestFood( me, particles );
        force.set( closest.s );
        force.subtract( me.s );
        force.setMag( maxForce );
        return force;
    }

Fig. 3. A simple greedy implementation of a controller conforming to the complete information interface.
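The marginally more sophisticated variant mentioned above, which damps the orbiting tendency by steering against the current velocity, might look something like the following sketch. It is not part of the paper: the velocity field v and the Vector2d methods multiply and add are assumptions (only set, subtract, and setMag appear in the original figures), and the damping gain of 0.5 is arbitrary.

    public Vector2d getForce( CellModel me, Collection particles ) {
        // head for the closest food, as in Figure 3
        Particle closest = closestFood( me, particles );
        force.set( closest.s );
        force.subtract( me.s );
        // steer against the current velocity to damp orbiting
        Vector2d brake = new Vector2d( me.v );   // assumed velocity field
        brake.multiply( -0.5 );                  // arbitrary damping gain
        force.add( brake );
        force.setMag( maxForce );
        return force;
    }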

C. Sensor-Based Interface

The sensor-based interface specifies the inputs to the cell as seen by a set of wrap-around sensors on the cell perimeter, as depicted in Figure 4. In general there could be a sensor array for each particle type. In the current game we have only two types of particle: food, and a single species of cell, so we have two sensor arrays, as shown in Figure 5. The number of sensors in each array is one of the set-up parameters of the system, and is currently set to eight. The procedure for setting up the sensor arrays for a cell is as follows: for each particle in the game, quantise its direction from the current cell by taking the angle to the particle, and increment the value of that sensor in proportion to the reciprocal of the squared Euclidean distance to that particle.

Fig. 4. The wrap-around sensor arrays.

    public interface SensorLogic {
        public Vector2d getForce( double[] food, double[] cell );
    }

Fig. 5. The interface for the Sensor Controller.
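A sketch of how this procedure might be implemented for the food array follows. It is not taken from the paper: the position field s with x and y components follows the helper classes described earlier, while isFood and the exact binning arithmetic are assumptions.

    // Fill the food sensor array: quantise the bearing to each food
    // particle into one of nSensors bins and add 1/d^2 to that bin.
    double[] senseFood( CellModel me, Collection particles, int nSensors ) {
        double[] food = new double[nSensors];
        for (Object o : particles) {
            Particle p = (Particle) o;
            if (p == me || !isFood( p )) continue;   // assumed type test
            double dx = p.s.x - me.s.x;
            double dy = p.s.y - me.s.y;
            double theta = Math.atan2( dy, dx );     // angle to the particle
            int bin = (int) Math.floor( (theta + Math.PI) / (2 * Math.PI) * nSensors );
            if (bin == nSensors) bin = 0;            // guard the wrap-around
            food[bin] += 1.0 / (dx * dx + dy * dy);  // reciprocal squared distance
        }
        return food;
    }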

D. Sensor-Based Implementation

The code for this implementation is listed in Figure 6. The basic idea is simple: look in each of the directions associated with the elements of the sensor arrays. If the activity of the food sensor exceeds that of the cell sensor, and also exceeds the maximum food level seen so far, then record that food level as the maximum food level so far, and set the index maxEl to that direction index. The force vector associated with the chosen direction index is then set up with a call to the helper method directionVec.

    public Vector2d getForce( double[] food, double[] cell ) {
        int maxEl = -1;
        double max = -1;
        for (int i = 0; i < food.length; i++) {
            // prefer directions where food activity beats cell activity
            if (food[i] > cell[i] && food[i] > max) {
                maxEl = i;
                max = food[i];
            }
        }
        // set vector associated with maxEl
        force.set( directionVec( maxEl ) );
        return force;
    }

Fig. 6. A hand-designed implementation of the Sensor Controller.

IV. EVOLVING A SENSOR-BASED CONTROL NETWORK

In order to provide a demonstration of how a simple controller could be evolved for the single-species game of Cellz, some simple experiments were conducted to evolve a single-layer perceptron controller. For the evolutionary algorithm we used a random mutation hill-climber, to keep things as simple as possible. Note that random mutation hill-climbers can be very competitive with more complex evolutionary algorithms [21], [15], especially in discrete search spaces. In continuous spaces the adaptation of the step size becomes important, and better results could probably be obtained with an adaptive evolutionary strategy.

A. Fitness Function Design

The design of the fitness function plays a vital role in the performance of an evolutionary algorithm. The simplest choice of fitness function would be to directly use the score obtained by a controller on a single run of the game. Since every game starts with a random configuration of food and cells, however, there is a significant variation in the score attained by a given controller on different runs of the game. This would therefore lead to a very noisy fitness function. Discussion of noise handling within evolutionary algorithms may be found in [10] and [13]. The work reported in [11] on co-evolution in noise would be of particular relevance to the multi-species version of Cellz.

Here the problem was addressed in two ways. First, the fitness function was calculated as the mean of n randomly configured games. This reduces the noise in the evaluation. If the statistics of individual game scores for a given controller have a standard deviation of σ, then the standard error (standard deviation of the mean) is given by σ/√n. Hence, we reduce the noise of the fitness evaluation in proportion to the square root of the number of trials. We chose n = 10 as a compromise between noise reduction and wasting trials evaluating the same controller (which limits the time that can be devoted to exploring new areas of the search space). More sophisticated techniques could be used for dealing with noise, but this sufficed for the current purposes. The second technique adopted was to evaluate both the current controller and its mutated version at each iteration of the hill-climber. Otherwise, the search process could get stuck with a solution that just happened to have a lucky set of n runs.

The other important aspect of the fitness function is the number of time-steps to run the simulation for before evaluating the score. It was observed that in general, the longer the simulation was run, the clearer the distinction became between the performance of different controllers. The graph in Figure 7 illustrates this by plotting the score versus the time-step for the greedy and sensor controllers. For the first few hundred time-steps, the difference in performance is not significant, but after one thousand time steps the difference is very significant, passing a t-test with greater than 99% confidence. Based on this result, we set the number of time-steps to 1000 for fitness function evaluation. This is not a rigorous analysis, and it may be that more time-steps would be better at resolving smaller differences in controller performance. However, the more time steps that are used in the fitness function, the fewer fitness function evaluations can be made, and this figure was chosen as a compromise.
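Concretely, the evaluation scheme described above might be sketched as follows, where runGame, mutate, and the controller variables are hypothetical names standing in for the experimental harness:

    // Mean score over n random games; the standard error of this
    // estimate shrinks as sigma / sqrt(n).
    double fitness( CellControl c, int n ) {
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += runGame( c, 1000 );   // score after 1,000 time steps
        }
        return total / n;
    }

    // One hill-climber iteration: re-evaluate BOTH controllers on fresh
    // random games, so a lucky set of n runs cannot persist.
    void hillClimbStep() {
        CellControl mutant = mutate( current );
        if (fitness( mutant, 10 ) >= fitness( current, 10 )) {
            current = mutant;
        }
    }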

Fig. 7. The score versus time for the Greedy and Sensor controllers respectively. Error bars at one standard error from the mean.

B. Evolving a Perceptron Based Controller

The linear perceptron with i = 16 inputs (8 directions for food, and 8 directions for cell) and o = 2 outputs (x and y components of the force vector) is specified by an i × o weight matrix. The weight matrix was initialised with random numbers drawn from a Gaussian distribution with zero mean and unit variance. At each iteration of the hill-climber a mutated individual was produced by adding Gaussian noise with zero mean and standard deviation 0.1 to every element of a copy of the current individual's weight matrix.

Using a linear perceptron in this way can result in undesirable characteristics in the magnitude of its force vectors. These could be corrected to a large extent by using tanh nonlinearities at the output stage. Here an alternative method was adopted of always setting the magnitude of the force vector to the maximum allowed; hence the evolved controller is only choosing the direction of the force. This strategy was also used for both the hand-designed controllers, on the basis that if you know where you're going, you might as well go there quickly (though note that this is a rather wasteful strategy and is unlikely to be optimal).

C. Runs Per Fitness Evaluation

Following on from the discussion above, a comparison was made between using 1 run per fitness evaluation (one-shot) and 10 runs per fitness evaluation (ten-shot). Each iteration of the random hill-climber used two fitness function evaluations (for the current individual and the mutated one). The number of game runs was kept constant at 2,000 for each experiment in this comparison. The results for one-shot and ten-shot are shown in Figures 8 and 9 respectively. These experiments were repeated several times, and indicated a clear advantage in using the ten-shot function. The single-shot method is too noisy for the hill-climber to optimise effectively.

The ten-shot algorithm was then run for 1,000 iterations (20,000 game runs) and the best evolved perceptron from that run was saved. A typical 1,000 iteration run is shown in Figure 10. Note that the deterioration in fitness observed in this graph towards the end of the run is explained by the noisy nature of the fitness function. The best individual found in such runs typically has a mean fitness of around 1,200, about the same as the greedy controller (standard error approx. 50). The superior sensor controller has a mean fitness of around 2,000, which is significantly better than the other controllers.
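A sketch of the perceptron controller and its mutation operator, as described in this section, is given below. The weight layout, the Vector2d set(x, y) method, and the helper names are assumptions; the 16-input/2-output structure, the output normalisation, and the sigma = 0.1 mutation come from the text.

    // Linear perceptron controller: 16 inputs (8 food + 8 cell sensors),
    // 2 outputs (x and y of the force vector), magnitude forced to max.
    double[][] w = new double[16][2];   // initialised from N(0, 1)

    public Vector2d getForce( double[] food, double[] cell ) {
        double x = 0, y = 0;
        for (int i = 0; i < 8; i++) {
            x += food[i] * w[i][0] + cell[i] * w[i + 8][0];
            y += food[i] * w[i][1] + cell[i] * w[i + 8][1];
        }
        force.set( x, y );           // assumed Vector2d setter
        force.setMag( maxForce );    // evolution only chooses the direction
        return force;
    }

    // Mutation: add N(0, 0.1^2) noise to every weight of a copy.
    double[][] mutate( double[][] w, java.util.Random rnd ) {
        double[][] child = new double[16][2];
        for (int i = 0; i < 16; i++)
            for (int j = 0; j < 2; j++)
                child[i][j] = w[i][j] + 0.1 * rnd.nextGaussian();
        return child;
    }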

Fig. 8. A run of the random mutation hill-climber for 1,000 single-shot iterations.

Fig. 9. A run of the random mutation hill-climber for 100 ten-shot iterations (hence the same number of game runs as the previous graph).

Fig. 10. A run of the random mutation hill-climber for 1,000 ten-shot fitness evaluations.

D. Analysis of Evolved and Hand-Designed Controllers

All the figures in this section trace the movement of a set of cells over 500 time steps from an initial random game configuration. Each figure caption indicates the score attained for that 500 step run. This number of time steps is sufficient to give a good idea of the typical behaviour of a controller, without the trace becoming too cluttered. Note that the cells do not leave any trace in the game model of where they have been; these images are simply generated by turning off the cell erasing that would normally form part of the animation.

Figure 11 shows the behaviour of a particular randomly generated controller. This illustrates how the behaviour can be highly dependent on the surroundings of a cell. The two cells bottom left and bottom right move in a diminishing spiral, but not towards food. The three other cells move upwards off the play area and do not return, though all five cells are controlled by exactly the same weight matrix. This controller scored 190 on this particular run.

Fig. 11. A 500 step run of a random perceptron controller (score of 190).

Figure 12 also shows a randomly generated perceptron controller. This one has a different behaviour, which mostly drives cells to disappear off the top of the screen, but sometimes a cell gets lucky and eats some food. The cell moving up the middle of the play area appears to studiously avoid food, however.

Fig. 12. A 500 step run of a random perceptron controller (score of 253).

Figure 13 shows typical behaviour of the greedy controller. Although it heads straight for food, it has two problems. The first problem is that the force vector always points straight toward the food, which can make it an inefficient way of getting there, since without friction it could orbit the nearest food particle indefinitely. With friction it spirals into it. The second problem is that many cells frequently head for the same piece of food. This can be observed in Figure 13, where the traces of several cells frequently follow similar paths.

Fig. 13. A 500 step run of the greedy controller (score of 560).

The most successful controller yet developed for this single-species version of Cellz is the sensor controller, whose behaviour can be seen in Figure 14. The strategy behind this controller is to head for the nearest food, providing that there is no other cell in between itself and the food. This gives it a significant advantage over the greedy controller, a fact borne out by the statistics in Figure 7. In practice, the observed behaviour is roughly as expected, with much less flocking than the greedy controller. The behaviour of the sensor controller at later stages of the game (e.g. t > 2000) can sometimes become quite efficient, with many near-stationary cells spread out over the play area. As each new piece of food appears, the closest cells from each direction dart towards it, while more distant cells move in small chaotic cycles.

Fig. 14. A 500 step run of the hand-designed sensor controller (score of 847).

Figure 15 shows the behaviour of a typical evolved perceptron controller. This particular one had a ten-shot fitness of 1280. The fitness of this is significantly worse than the sensor controller, but not significantly different from the greedy controller. The way in which the evolved and greedy controllers achieve their scores is rather different, however, with the evolved one exhibiting more complex behaviour patterns, with less chance of redundant flocking, but unfortunately, also less effective food seeking.

Fig. 15. A 500 step run of an evolved perceptron controller (score of 701).

V. DISCUSSION AND CONCLUSIONS

This paper introduced the game of Cellz, a minimalist game with interesting dynamics to be used as a test-bed for studying many different aspects of simulated evolution, such as evolution and co-evolution in the presence of noise, and emergent cooperative and competitive behaviour. Initial experiments were made to demonstrate the viability of evolving controllers for Cellz. So far, the evolved controllers cannot compete with the better hand-designed ones, but this is probably because the evolved model was limited to being a single-layer perceptron.

Single-species Cellz is currently being run as a competition for GECCO 2004, and has already attracted a good deal of interest, although the exact specifications for the contest have yet to be finalised. The mode of entry for this contest is that researchers download a Cellz developer kit which allows them to evaluate their own controllers. They then upload their controller (specified as a directed function graph) to the competition web site, where it is evaluated and entered into a league table.

An important aspect of Cellz for future research is that the cell division process allows evolution to happen directly within the game simulation, without the need for an externally defined fitness function. This idea of performing evolution in situ has been investigated previously in a-life simulations [25], [27], and even on physical robots [26] with the concept of Embodied Evolution. The multi-species version of Cellz provides a simple but interesting game in which to explore such ideas.

ACKNOWLEDGMENT

I thank members of the Natural and Evolutionary Computation group at the University of Essex, the anonymous reviewers, and Gary Fogel for helpful comments on this work.

REFERENCES

[1] R. Abbott, "Object-oriented genetic programming: An initial implementation," in International Conference on Machine Learning: Models, Technologies and Applications, 2003, pp. 24–27.
[2] L. Altenberg, "The evolution of evolvability in genetic programming," in Advances in Genetic Programming, K. Kinnear, Ed. Cambridge, MA: MIT Press, 1994.
[3] M. Asada, M. Veloso, M. Tambe, I. Noda, H. Kitano, and G. K. Kraetzschmar, "Overview of RoboCup-98," Lecture Notes in Computer Science, vol. 1604, pp. 1–??, 1999. [Online]. Available: citeseer.nj.nec.com/asada00overview.html
[4] B. D. Dunay, F. E. Petry, and W. P. Buckles, "Regular language induction with genetic programming," in Proceedings of the IEEE World Congress on Computational Intelligence. Orlando, Florida: IEEE Press, 1994, pp. 396–400.
[5] V. Braitenberg, Vehicles: Experiments in Synthetic Psychology. Cambridge, MA: MIT Press, 1986.
[6] S. Brave, "Evolving deterministic finite automata using cellular encoding," in Genetic Programming 1996: Proceedings of the First Annual Conference, J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, Eds. Stanford University, CA, USA: MIT Press, 1996, pp. 39–44. [Online]. Available: citeseer.nj.nec.com/brave96evolving.html
[7] W. S. Bruce, "Automatic generation of object-oriented programs using genetic programming," in Genetic Programming 1996: Proceedings of the First Annual Conference, J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, Eds. Stanford University, CA, USA: MIT Press, 1996, pp. 267–272. [Online]. Available: citeseer.nj.nec.com/bruce96automatic.html
[8] B. D. Bryant and R. Miikkulainen, "Neuroevolution for adaptive teams," in Proceedings of the Congress on Evolutionary Computation, 2003, pp. 2194–2201.
[9] M. Conrad, "Evolution of the adaptive landscape," in Theoretical Approaches to Complex Systems, R. Heim and G. Palm, Eds., 1978, pp. 147–169.
[10] D. E. Goldberg, K. Deb, and J. H. Clark, "Genetic algorithms, noise, and the sizing of populations," Complex Systems, vol. 6, pp. 333–362, 1992.
[11] P. J. Darwen and J. Pollack, "Co-evolutionary learning on noisy tasks," in Proceedings of the Congress on Evolutionary Computation, 1999, pp. 1724–1731.
[12] J. M. Epstein and R. L. Axtell, Growing Artificial Societies: Social Science from the Bottom Up. MIT Press, 1996.
[13] J. Fitzpatrick and J. Grefenstette, "Genetic algorithms in noisy environments," Machine Learning, vol. 3, pp. 101–120, 1988.
[14] M. Gardner, "Mathematical games: The fantastic combinations of John Conway's new solitaire game 'Life'," Scientific American, vol. 223, pp. 120–123, October 1970.
[15] K. J. Lang, "Hill climbing beats genetic search on a Boolean circuit synthesis problem of Koza's," in Proceedings of the Twelfth International Conference on Machine Learning. Tahoe City, California, USA: Morgan Kaufmann, July 1995.
[16] S. M. Lucas, "Evolving spring-mass models: a test-bed for graph encoding schemes," in Proceedings of the Congress on Evolutionary Computation, 2002, pp. 1952–1957.
[17] ——, "Exploiting reflection in object oriented genetic programming," in Proceedings of the European Conference on Genetic Programming. Springer Verlag, 2004, to appear.
[18] S. M. Lucas and T. J. Reynolds, "Learning DFA: Evolution versus evidence driven state merging," in Proceedings of the Congress on Evolutionary Computation, 2003, pp. 351–358.
[19] S. Luke, "Genetic programming produced competitive soccer softbot teams for RoboCup97," in Genetic Programming 1998: Proceedings of the Third Annual Conference, J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. Riolo, Eds. University of Wisconsin, Madison, Wisconsin, USA: Morgan Kaufmann, 22–25 July 1998, pp. 214–222. [Online]. Available: citeseer.nj.nec.com/luke98genetic.html
[20] J. F. Miller, "Evolving a self-repairing, mature, French flag organism," in Proceedings of GECCO, 2004, submitted.
[21] M. Mitchell, J. Holland, and S. Forrest, "When will a genetic algorithm outperform hill climbing?" in Advances in Neural Information Processing Systems 6, J. Cowan, G. Tesauro, and J. Alspector, Eds. San Mateo, CA: Morgan Kaufmann, 1994, pp. 51–58.
[22] P. Schmutter, "Object-oriented ontogenetic programming: Breeding computer programs that work like multicellular creatures," Diploma thesis, University of Dortmund, Germany, June 2002. [Online]. Available: citeseer.nj.nec.com/schmutter02objectoriented.html
[23] K. Sims, "Evolving 3D morphology and behavior by competition," in Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, R. Brooks and P. Maes, Eds. Cambridge, MA: MIT Press, 1994, pp. 28–39.
[24] K. Stanley and R. Miikkulainen, "Competitive coevolution through evolutionary complexification," Journal of Artificial Intelligence Research, vol. 21, pp. 63–100, 2004.
[25] J. Ventrella, "Attractiveness vs. efficiency (how mate preference affects location in the evolution of artificial swimming organisms)," in Proceedings of the Sixth International Conference on Artificial Life, 1998, pp. 178–186.
[26] R. A. Watson, S. G. Ficici, and J. B. Pollack, "Embodied evolution: Embodying an evolutionary algorithm in a population of robots," in Proceedings of the Congress on Evolutionary Computation, P. J. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, and A. Zalzala, Eds., vol. 1. Mayflower Hotel, Washington D.C., USA: IEEE Press, 6–9 July 1999, pp. 335–342. [Online]. Available: citeseer.nj.nec.com/watson99embodied.html
[27] G. Werner and M. Dyer, "Evolution of communication in artificial organisms," in Artificial Life II, C. Langton, C. Taylor, J. Farmer, and S. Rasmussen, Eds., 1991, pp. 659–687.
