Parameter Tuning in Evolving Simulated Robot Controllers in the Teem Environment

A.Q. Arif, Vrije Universiteit, Amsterdam, The Netherlands ([email protected])
W. Kruijne, Vrije Universiteit, Amsterdam, The Netherlands ([email protected])
D.G. Nedev, Vrije Universiteit, Amsterdam, The Netherlands ([email protected])
S.K. Smit, Vrije Universiteit, Amsterdam, The Netherlands ([email protected])
Abstract

We present a case study demonstrating that the REVAC parameter tuning method can find 'optimal' parameters for an evolutionary robotics problem with little effort. The problem at hand is a standard robotics task, in which a robot has to cover as much distance as possible within a limited amount of time. The robot's controller is based on a neural network whose weight values are found using an evolutionary algorithm. Our experiments indicate which of this algorithm's parameters are the most important, and how they should be set to reach the best performance. Unlike parameter setups chosen by experts, this approach requires little user effort and no preliminary problem analysis. Within one working day of setup time and a few days of running time, REVAC found parameters that are at least as good as those defined by experts.
1. Introduction

Designing efficient controllers for autonomous robots is a challenging task. Autonomous robots have to interact with a real environment, which can be highly unstructured and subject to change. Their controllers have to acquire control rules automatically, since these rules are not known beforehand; the controllers should therefore be intelligent enough to control the robot's behavior appropriately. A neural network is a good model for robot control, as it can acquire the control rules automatically by learning (Nolfi et al., 1994; Sakamoto & Zhao, 2005). Using evolution to build a smart controller for an autonomous robot is therefore a promising approach (Nolfi et al., 1994): the controller learns to adapt itself to the environment.

Appearing in Proceedings of the 19th Machine Learning conference of Belgium and The Netherlands. Copyright 2010 by the author(s)/owner(s).

In evolutionary algorithms (EAs), we not only need to choose the algorithm, representation, and operators for the problem, but also the parameter values and operator probabilities with which the algorithm will find a solution. Furthermore, in the field of evolutionary robotics, the algorithm does not only need to find a solution, it needs to find it efficiently. Finding appropriate parameter values and operator probabilities is a time-consuming task, and considerable effort has gone into automating this process (Hinterding et al., 1997). Finding good parameter values is crucial, as they can significantly impact the performance of an EA (Hinterding et al., 1997; Smit & Eiben, 2009b).

We can broadly distinguish two forms of setting parameters: parameter tuning and parameter control (Smit & Eiben, 2009b). Parameter tuning is the commonly practiced approach of finding good values for the parameters before the run of the algorithm, and then running the algorithm with these values, which remain fixed during the run. Parameter control forms an alternative: a run is started with initial parameter values, which are then changed during the actual run of the algorithm.

Several techniques have been proposed to find optimal parameters. Smit & Eiben (2009b) present an overview of parameter tuning techniques such as meta-evolutionary algorithms (Greffenstette, 1986), the meta estimation of distribution algorithm (Nannen & Eiben, 2007a), and sequential parameter optimization (Bartz-Beielstein et al., 2004). Several 'add-ons' have also been described that improve search efficiency, such as sharpening, racing, and their hybrid (Smit & Eiben, 2009a). The Relevance Estimation and VAlue Calibration (REVAC) method (Nannen & Eiben, 2007b), which uses information theory to measure parameter relevance, is an efficient parameter tuning technique (Nannen & Eiben, 2007c). We used this technique to tune the parameters of an EA for a robot battery-recharge problem.

The robot battery-recharge problem can be described as follows. A single robot has to locate a battery recharge point efficiently and, at the same time, travel as much distance as possible. The fitness of the robot is given by the distance traveled, which of course depends on how long the robot can stay alive by recharging. We used the Teem framework, which allows easy creation and execution of robotics experiments (Magnenat et al., 2009).

This paper discusses the tuning of the different parameters using REVAC and the results of our experiments: it describes the experimental setup and the analysis of the results, followed by a conclusion.
2. Materials & Methods

2.1. Teem

To test and evolve simulated robot controllers we made use of Teem, an evolutionary framework (http://lis.epfl.ch/index.html?content=resources/teem/). This evolutionary framework makes use of the 2D open-source robot simulator Enki.

2.1.1. The Robot, its Environment and its Task

Enki provides several types of robots. For these experiments the modeled Khepera robot is used. This robot has eight IR sensors, two light sensors, and a floor sensor that can measure the type of floor. It acts through two motors attached to its wheels. It is powered by a battery that runs empty within a fixed amount of time, and the robot can sense its battery level.

The robot is placed at a random position in a square environment. It can sense the walls with its IR sensors. The ambient light varies throughout the square and is detected by the light sensors. In one corner of the square a recharge station is located. This recharge station has a different floor than the rest of the world, so the floor sensor flips when the robot enters the recharge zone. When the battery is empty, the run is over; the robot can therefore lengthen the run by frequently returning to the recharge zone, where its battery is recharged. The objective is to make the robot cover as much distance as possible within a limited amount of time.

2.1.2. The Controller

The controller consists of a neural network that maps values from the sensors (IR, light, floor, battery level) onto the motors. The network consists of an input layer, one hidden layer of five nodes, and an output layer that represents the values for both motor speeds, normalized with respect to their maximum speeds. The weights of this network are evolved in the range [-2, 2]. The network is recurrent: the output layer and the hidden layer propagate data back to their input layers.

2.1.3. The Evolutionary Algorithm

The algorithm that is optimized is a genetic algorithm. The genome of each individual is formed by one chromosome: a real-valued vector that represents all the weights of the neural network in the range [-2, 2]. Every iteration, all individuals are evaluated twice by means of the task described above. The distance both wheels cover in these runs determines the fitness, and the individuals are sorted on their fitness.

Subsequently, the new generation is formed. First, the E highest-ranked individuals are copied directly into the next generation; E is determined by E = elitism ratio · population size. The next K individuals in the ranking are used for reproduction, where K = reproduction ratio · population size. From these individuals, two parents are repeatedly chosen at random. With probability p_crossover, single-point crossover between the genomes of the parents takes place; otherwise, the genotype of one parent is copied directly into the new individual. The genomes of these new individuals are then mutated bitwise, each bit with probability p_mutation. This process repeats population size − E times, so that after reproduction the generation once again consists of population size individuals.
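The reproduction scheme described above can be summarized in code. The following is a minimal Python sketch (function and variable names are ours, not Teem's), assuming a population of real-valued weight vectors sorted from best to worst fitness; as a shortcut, it perturbs the real values directly instead of Teem's bitwise mutation:

```python
import random

def next_generation(ranked, elitism_ratio, reproduction_ratio,
                    p_crossover, p_mutation):
    """One generation step of the GA sketched above.

    `ranked` is the current population: a list of weight vectors
    (lists of floats in [-2, 2]) sorted from best to worst fitness.
    """
    n = len(ranked)
    e = int(elitism_ratio * n)          # elites copied unchanged
    k = int(reproduction_ratio * n)     # top-k pool used for reproduction
    new_gen = [list(ind) for ind in ranked[:e]]

    while len(new_gen) < n:
        mom, dad = random.sample(ranked[:k], 2)
        if random.random() < p_crossover:
            cut = random.randrange(1, len(mom))   # single-point crossover
            child = mom[:cut] + dad[cut:]
        else:
            child = list(mom)                     # copy one parent directly
        # per-gene mutation; Teem mutates bitwise, we perturb the real
        # value and clamp to [-2, 2] to keep the sketch short
        child = [g if random.random() > p_mutation
                 else min(2.0, max(-2.0, g + random.gauss(0, 0.5)))
                 for g in child]
        new_gen.append(child)
    return new_gen
```

Note that the elites at the front of the ranking survive verbatim, which is what makes a high elitism ratio preserve consistently well-performing individuals across many generations.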
2.2. Tuning Teem

To evaluate the evolutionary algorithm we used REVAC, an algorithm by Nannen and Eiben that tries to find the optimal parameter set for an EA. REVAC does this by estimating how the optimal values are distributed over the domain of each parameter, and sequentially draws parameter vectors from these distributions. These vectors are then evaluated and the distributions are updated; each cycle, one vector is replaced. After the last cycle, the estimated distributions provide a model of the utility landscape. A detailed explanation of the algorithm is given in (Nannen & Eiben, 2007c).

To use REVAC we made use of MOBAT v1.3 (Smit, 2009), a tool that implements REVAC and other algorithms to tune EAs, as well as ways to distribute the evaluation of the EAs over a network. MOBAT requires several REVAC parameters to be defined. For these experiments we left them at their default values, except for the number of repeats (how often a parameter vector is evaluated), which was set to 4.
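To make the tuning loop concrete, the following is a deliberately simplified Python sketch of a REVAC-style loop, not the published algorithm (see Nannen & Eiben, 2007c, for the real smoothing and mutation operators). It keeps a table of parameter vectors, and each cycle replaces one vector with a child drawn, per parameter, from an interval spanned by values of the best vectors; all names and the interval operator are our own simplifications:

```python
import random

def revac_sketch(utility, bounds, m=80, n=40, h=5, cycles=200):
    """Simplified REVAC-style loop (illustration only).

    `utility(vec)` returns the measured performance of an EA run with
    parameter vector `vec`; `bounds` is a list of (lo, hi) per parameter.
    """
    # start from uniformly random parameter vectors
    table = [[random.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(m)]
    scores = [utility(v) for v in table]
    oldest = 0
    for _ in range(cycles):
        # the n best vectors define, per parameter, where to sample
        best = sorted(range(m), key=lambda i: scores[i], reverse=True)[:n]
        child = []
        for p, (lo, hi) in enumerate(bounds):
            vals = sorted(table[i][p] for i in best)
            # draw between two values h positions apart: a crude
            # stand-in for REVAC's interval-widening mutation
            j = random.randrange(len(vals) - h)
            child.append(min(hi, max(lo, random.uniform(vals[j], vals[j + h]))))
        # one vector is replaced per cycle
        table[oldest], scores[oldest] = child, utility(child)
        oldest = (oldest + 1) % m
    best_i = max(range(m), key=lambda i: scores[i])
    return table[best_i], scores[best_i]
```

In the real setup, `utility` corresponds to running a full Teem evolution with the candidate parameters (averaged over the number of repeats), which is why each REVAC cycle is expensive.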
The parameters of the EA implemented in Teem were tested with the value ranges shown in Table 1.

Table 1. Parameters to be tuned by REVAC

Parameter               min   max
Population count        20    200
Mutation probability    0.0   0.9
Crossover probability   0.0   1.0
Reproduction ratio      0.2   1.0
Elitism ratio           0.0   0.9

To ensure that every evolution tests the same number of new individuals, the generation count of each evolution is adjusted to the population count and the elitism ratio.

3. Results

The results of the experiment show that the highest fitness values are achieved when Teem is set up with a very low Population Count, a very low Mutation Probability, a Crossover Probability of around 39%, mid-range values for the Reproduction Ratio, and a very high Elitism Ratio; the highest recorded fitness value was 0.8874.

Figure 1. Population Count

More specifically, Figure 1 shows that the best fitness values are recorded when Teem is run with a very small Population Count, in the range of 20 to 40 individuals in a population. The experimental run data show that higher values for this attribute were quickly eliminated. The highest recorded fitness in the experiment was scored with the Population Count parameter set to 40. This indicates that smaller robot populations in Teem provide a better chance of improving the whole population, and thus a higher overall fitness.

Figure 2. Mutation Probability

The results for the Mutation Probability likewise show that the highest fitness is usually achieved with the parameter set to very low values. As seen in Figure 2, most of the high-fitness Teem runs have the parameter set between 0.4% and 1.6%; the best fitness value was recorded with the Mutation Probability set to 0.0113 (around 1.13%), and the next best with it set to 0.7%.

Figure 3. Crossover Probability

The results also indicate that better fitness is achieved with a Crossover Probability between 38% and 40% (Figure 3). Together, these results suggest that better fitness is achieved with a very small chance (around or less than 1%) of random mutation in the individuals, and a much higher Crossover Probability of around 39%.

Figure 4 compares the values of the Reproduction Ratio parameter to the fitness achieved during the experimental run.
While there are many high-fitness runs covering almost the full spectrum of possible values for this parameter, one can clearly see a high concentration of good results in the region between 44% and 52%. The best fitness recorded during the Teem runs was achieved with the Reproduction Ratio set to 0.4999, or 49.99%.

Figure 4. Reproduction Ratio

The Elitism Ratio parameter, on the other hand, shows better fitness at higher values: the majority of the high-fitness runs have it set between 0.78 and 0.84 (Figure 5), with the highest fitness achieved at 0.8288. This indicates that the more of the currently fit individuals are preserved across generations, the better the fitness. It is also clearly visible that REVAC converged to high values for the Elitism Ratio very quickly, and that these values were maintained throughout the experimental run. The biggest downside of such high Elitism Ratio values is that the algorithm may find and concentrate on local optima, hindering further evolution and improvement of the individuals.

Figure 5. Elitism Ratio

To study whether the parameter set found by REVAC that yielded the highest results performs significantly better than the default settings, both evolutions were run fifteen times and the resulting fitnesses were compared. The results are shown in a box plot in Figure 6.

Figure 6. Comparing the fitness of the default settings (left) and the tuned settings (right)

The tuned settings tend to result in a higher fitness, and the variance in the results is smaller. However, a t-test shows that there is no significant difference between the two sets of results (p-value = 0.2989).

4. Discussion

In this paper we have investigated the effects of certain parameters on an evolutionary algorithm evolving a simulated robot controller. The results indicate that a higher fitness is reached using a very low Mutation Probability, a high Elitism Ratio, an average Crossover Probability and Reproduction Ratio, and a reasonably small population size (around 20 to 40 individuals).

The value that appears optimal for the Mutation Probability is remarkable. Values around 1% indicate that rather few mutations occur. It should first be noted that it is not that surprising for a Simple Genetic Algorithm not to rely much on mutation; unlike in most other forms of Evolutionary Algorithms, crossover plays the more important role. Even taking this into consideration, however, a mutation probability of only 1% appears very low.

However, we must consider that mutation here is implemented as a bitwise mutation. Each value in the chromosome is represented as a double, i.e. 8 bytes or 64 bits, and each bit is flipped with a probability determined
by the mutation ratio. This implies that every time the genome is mutated with a Mutation Ratio of 1%, each value sees an expected 0.64 bit flips, i.e. roughly a 47% chance that at least one of its bits is mutated. Also, since Gray coding is not applied here, the resulting value after mutation can differ greatly from the value before mutation. This might be the reason why REVAC chooses such low values for the Mutation Ratio.

Another notable finding is that the algorithm seems to perform better when a reasonably small Population Size is used together with a high Elitism Ratio. This means that a large part of the individuals persists over the generations, and not many new individuals are created per generation. Since in our implementation the number of generations is varied such that the number of newly tested individuals remains the same over all evolutions, the algorithm will consequently run for many generations.

One explanation for this finding is that, although the number of new individuals created is the same for each evolution, each generation all of the individuals are tested, even the ones that have already been evaluated in earlier generations. This repetitive testing ensures a thorough evaluation of the individuals: only those that perform well consistently are preserved.

This idea of thorough evaluation relates to, and provides an explanation for, the reasonably low population count preferred by REVAC. As can be seen in Figure 1, REVAC strongly favors population sizes of 20 to 40 individuals, whereas the default settings in Teem use a population size of 80 individuals. In our implementation, a lower population count means a larger number of generations, during which all new individuals are repeatedly evaluated before they are replaced, so that the evolution converges towards robust solutions that perform well consistently.
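The bitwise-mutation arithmetic above can be checked directly: with a per-bit flip probability of 1% and 64 bits per double, the expected number of flips per value is 0.64, and the probability of at least one flip is 1 − 0.99^64 ≈ 0.47:

```python
# Per-value mutation statistics for bitwise mutation of a 64-bit double
p_bit = 0.01      # per-bit flip probability (Mutation Ratio of 1%)
bits = 64         # bits in an IEEE-754 double

expected_flips = p_bit * bits               # mean number of flipped bits
p_any_flip = 1 - (1 - p_bit) ** bits        # P(at least one bit flips)

print(f"expected flips per value: {expected_flips:.2f}")   # 0.64
print(f"P(value changes at all): {p_any_flip:.3f}")        # 0.474
```

Without Gray coding, even a single flip of a high-order bit can move a weight far across the [-2, 2] range, which is why such a seemingly low mutation ratio is already quite disruptive.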
A major disadvantage, however, is that the computationally expensive part of this evolutionary algorithm is precisely the evaluation of the individuals. The algorithm therefore becomes very expensive with the settings currently found to give optimal fitness values. A way to prevent this in future research would be to penalize the number of evaluations or generations, and to embed this penalty in the fitness passed on to REVAC.

It should be noted that the parameter settings that yielded the highest fitness values are comparable to the default settings of the experiment; only the Elitism Ratio and the Population Count differed greatly. When comparing the tuned settings with the default settings, the t-test showed no significant difference between the resulting fitnesses. We can therefore state that REVAC has been able to uncover some interesting aspects of the evolution depicted here, for example the tendency towards using a large number of
generations, but for improving this specific algorithm it has not been very helpful.

However, it should be taken into account that the user effort needed to achieve these results was minimal. Setting up MOBAT so that it could make calls to Teem and tune it using REVAC took no longer than a working day. Since MOBAT provides a method for running evolutions on different machines in parallel, four evolutions with the same settings could be run at the same time. The fifteenth parameter vector evaluated by REVAC already yielded an average fitness of 0.8804, which exceeds the average fitness produced by the default settings. The parameter vectors checked after that resulted in fitness values over 0.8 rather consistently, and the 110th parameter vector that was checked resulted in the maximum fitness found in over 450 evaluations. In this light, the t-test leads to a different conclusion, namely: with very low user effort and without any preliminary problem analysis, REVAC has been able to provide an 'optimal' parameter set within a reasonable amount of time.
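The significance check reported above is a standard two-sample t-test on fifteen runs per setting. A sketch of the comparison, computing Welch's t statistic with the standard library (the fitness samples below are illustrative placeholders, not our measured data):

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

# hypothetical fitness samples for 15 runs per setting
default_fit = [0.81, 0.84, 0.86, 0.83, 0.85, 0.80, 0.87, 0.82,
               0.86, 0.84, 0.83, 0.85, 0.81, 0.88, 0.84]
tuned_fit   = [0.86, 0.87, 0.85, 0.88, 0.86, 0.87, 0.85, 0.86,
               0.88, 0.87, 0.86, 0.85, 0.87, 0.86, 0.88]

t = welch_t(tuned_fit, default_fit)
# for roughly 28 degrees of freedom, the two-sided 5% critical value
# is about 2.05; |t| below that means no significant difference
print(f"t = {t:.2f}")
```

In our actual experiment the resulting p-value was 0.2989, i.e. the tuned and default settings were not significantly different at the 5% level.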
References

Bartz-Beielstein, T., Parsopoulos, K., & Vrahatis, M. (2004). Analysis of particle swarm optimization using computational statistics. Proceedings of the International Conference of Numerical Analysis and Applied Mathematics (ICNAAM 2004) (pp. 34–37).

Greffenstette, J. (1986). Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics (pp. 122–128).

Hinterding, R., Michalewicz, Z., & Eiben, A. E. (1997). Adaptation in evolutionary computation: a survey. Proceedings of the Fourth IEEE Conference on Evolutionary Computation, Indianapolis, IN (pp. 65–69).

Magnenat, S., Beyeler, A., & Waibel, M. (2009). Teem, the next generation open evolutionary framework. http://teem.epfl.ch.

Nannen, V., & Eiben, A. E. (2007a). Efficient relevance estimation and value calibration of evolutionary algorithm parameters. IEEE Congress on Evolutionary Computation (pp. 103–110). IEEE.

Nannen, V., & Eiben, A. E. (2007b). Relevance estimation and value calibration of evolutionary algorithm parameters. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI) (pp. 1034–1039).

Nannen, V., & Eiben, A. E. (2007c). Relevance estimation and value calibration of evolutionary algorithm parameters. IJCAI'07: Proceedings of the 20th international
joint conference on Artificial intelligence (pp. 975–980). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Nolfi, S., Floreano, D., Miglino, O., & Mondada, F. (1994). How to evolve autonomous robots: Different approaches in evolutionary robotics. 4th International Workshop on Artificial Life. Cambridge, MA: MIT Press. R. A. Brooks and P. Maes (eds.).

Sakamoto, K., & Zhao, Q. (2005). Generating smart robot controllers through co-evolution. EUC Workshops (pp. 529–537).

Smit, S. (2009). MOBAT. http://mobat.sourceforge.net.

Smit, S., & Eiben, A. (2009a). Comparing parameter tuning methods for evolutionary algorithms. Proceedings of the 2009 IEEE Congress on Evolutionary Computation (pp. 399–406). Trondheim: IEEE Press.

Smit, S. K., & Eiben, A. E. (2009b). Comparing parameter tuning methods for evolutionary algorithms. CEC'09: Proceedings of the Eleventh Conference on Congress on Evolutionary Computation (pp. 399–406). Piscataway, NJ, USA: IEEE Press.