Speeding up Online Evolution of Robotic Controllers with Macro-neurons

Fernando Silva¹,³, Luís Correia³, and Anders Lyhne Christensen¹,²

¹ Instituto de Telecomunicações, Lisboa, Portugal
² Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
³ LabMAg, Faculdade de Ciências, Universidade de Lisboa, Portugal
{fsilva,luis.correia}@di.fc.ul.pt, [email protected]
Abstract. In this paper, we introduce a novel approach to the online evolution of robotic controllers. We propose accelerating and scaling online evolution to more complex tasks by giving the evolutionary process direct access to behavioural building blocks prespecified in the neural architecture as macro-neurons. During task execution, both the structure and the parameters of macro-neurons and of the entire neural network are under evolutionary control. We perform a series of simulation-based experiments in which an e-puck-like robot must learn to solve a deceptive and dynamic phototaxis task with three light sources. We show that: (i) evolution is able to progressively complexify controllers by using the behavioural building blocks as a substrate, (ii) macro-neurons, either evolved or preprogrammed, enable a significant reduction in the adaptation time and the synthesis of high-performing solutions, and (iii) evolution is able to inhibit the execution of detrimental task-unrelated behaviours and to adapt non-optimised macro-neurons.

Keywords: Online evolution, evolutionary robotics, artificial neural network, prespecified behaviours, neuronal model.
1 Introduction
Online evolution is a process of continuous adaptation that potentially gives robots the capacity to respond to task changes and unforeseen circumstances by modifying their behaviour. An evolutionary algorithm (EA) is executed on the robots themselves while they perform their task. The main components of the EA (evaluation, selection, and reproduction) are carried out autonomously by the robots without any external supervision. This way, robots may be capable of long-term self-adaptation in a completely autonomous manner. The first example of online evolution in a real mobile robot was performed by Floreano and Mondada [6]. The introduction of embodied evolution by Watson et al. [15] followed, in which the use of multirobot systems was motivated by the speed-up of evolution due to the inherent parallelism in groups of robots that evolve together in the task environment. Over the past decade, different approaches to online evolution have been proposed. Examples include the (µ + 1)online EA of Haasdijk et al. [8], mEDEA by Bredeche et al. [1], and odNEAT
by Silva et al. [10]. Notwithstanding, a number of fundamental issues and technological challenges must still be addressed before online evolution becomes a viable approach to adaptation in real robots. The prohibitively long time that the online evolutionary process requires and the fact that evolutionary robotics (ER) techniques have not yet scaled to real-world tasks [4] are central impediments to adoption.

In this paper, we introduce a novel approach to the online evolution of neural network-based robotic controllers. We propose the combined use of standard neurons as elementary components, and of higher-level units that we shall refer to as macro-neurons. Macro-neurons are behavioural building blocks, either evolved or preprogrammed, that are integrated in the neural architecture before the evolutionary process is conducted. During task execution, both the structure and the parameters of macro-neurons and of the entire ANN are under evolutionary control. In this way, evolution is able to continuously optimise and adapt controllers by using the behavioural building blocks as a substrate.

Our proposed method contrasts with previous approaches in which: (i) ANN outputs are used to execute one out of a finite set of predefined behaviours, either evolved or preprogrammed [3, 7, 14], which may forestall the synthesis of theoretically optimal controllers, or (ii) ANN-based controllers are synthesised through hierarchical decomposition of the task and structured composition of evolved and preprogrammed behaviours [4, 5, 7], which requires a substantial amount of experimentation and human intervention.

The viability of our method is assessed through a set of simulation-based experiments in which an e-puck-like robot [9] must perform a deceptive and dynamic phototaxis task with three light sources. To the best of our knowledge, this is the first demonstration of unified online evolution of the weights and topology of an ANN together with higher-level units representing behaviours.
2 Background
In this section, we describe our proposed macro-neuron-based architecture, and we introduce odNEAT, the online neuroevolution algorithm used in this study.

2.1 Specification of Macro-neurons
The main goal in using macro-neurons is to give online evolution behavioural building blocks in order to: (i) synthesise increasingly complex behaviours by capitalising on existing ones, both evolved and preprogrammed, and (ii) adapt the structure and the parameters of existing solutions through evolution. It is therefore fundamental to specify the behavioural building blocks in a way that enables the evolutionary process to optimise them.

Each macro-neuron M is defined as {Ic, Oc, F, P}, where Ic is the set of input connections of M, Oc is the set of output connections, F is the function computed by the macro-neuron, and P is the set of parameters of M subject to evolution. Each connection Ic,i ∈ Ic has a weight wi ∈ w and transmits to M an input value xi ∈ x. The computation of M is given by f(w, x) = y, where y is the output vector of M. Each yj ∈ y is transmitted to other neurons via the corresponding connection Oc,j ∈ Oc. Depending on the type of macro-neuron, P refers to different elements: if M is a preprogrammed macro-neuron, P contains the numerical parameters of the behaviour (see below); if M is an evolved ANN, P refers to the connections and neurons that can be manipulated by evolution.

Fig. 1: Examples of the integration of different types of behavioural building blocks in neural architectures. (a) An evolved ANN-based macro-neuron. (b) Two preprogrammed macro-neurons.

The construction and functioning of neural architectures using macro-neurons is shown in Fig. 1. Figure 1a shows how a previously evolved ANN is translated into a macro-neuron. The connections from the macro-neuron to the output neurons of the network enable evolution to arbitrate and shape the output values of different macro-neurons. Complementarily, Fig. 1b illustrates how different preprogrammed macro-neurons are specified. Each preprogrammed macro-neuron transmits two values to each output neuron: (i) a priority value, which represents the effective need of the behaviour to execute at a given time, computed from the input values, and (ii) an activity value representing the signal to be sent to the actuators controlled by the output neurons. The goal of incorporating both priority and activity values in preprogrammed macro-neurons is to add human knowledge to better resolve conflict situations in which different preprogrammed macro-neurons compete for control [2]. In this case, each output neuron considers the activity of the preprogrammed macro-neuron with the highest priority. If no preprogrammed behaviour produces a positive priority value, the output neuron performs the weighted sum of its remaining inputs from standard neurons, such as I4 in Fig. 1b.

An important feature of our approach is that macro-neurons are prespecified in the neural architecture before the evolutionary process is conducted.
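The arbitration rule at each output neuron can be sketched as follows. This is an illustrative reading of the mechanism described above, not the implementation used in the experiments; the function name and data layout are our own:

```python
def output_neuron_value(macro_signals, std_inputs, std_weights):
    """Arbitration at one output neuron (a sketch).

    macro_signals: list of (priority, activity) pairs, one per
                   preprogrammed macro-neuron connected to this output.
    std_inputs/std_weights: remaining inputs from standard neurons
                   (e.g. I4 in Fig. 1b) and their connection weights.
    """
    active = [s for s in macro_signals if s[0] > 0.0]
    if active:
        # the highest-priority macro-neuron wins; its activity drives the actuator
        _, activity = max(active, key=lambda s: s[0])
        return activity
    # no positive priority: fall back to the weighted sum of standard inputs
    return sum(w * x for w, x in zip(std_weights, std_inputs))
```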
During task execution, online evolution is able to: (i) optimise the structure of evolved macro-neurons and of the entire network by adding new neurons and new connections, and by adjusting the connection weights, (ii) adapt the parameters of preprogrammed behaviours, such as PP1 in Fig. 1b, and (iii) modulate the execution of macro-neurons by increasing or decreasing the strength of connections such as those carrying the priority and activity values. In this way, evolution can, for instance, disable the execution of unnecessary macro-neurons.

By combining ANNs and prespecified macro-neurons, either evolved or preprogrammed, we obtain: (i) the ANNs' flexibility, robustness, and tolerance to noise [6], (ii) the benefits of each type of macro-neuron, which can be synthesised by distinct evolutionary processes or hand-designed in order to shortcut complex evolutionary processes, and (iii) a higher-level bootstrap process, which potentially allows robots to adapt to complex and dynamic tasks in a timely manner.

2.2 odNEAT: An Online Neuroevolution Algorithm
NEAT [13] is a state-of-the-art neuroevolution method that evolves both the weights and the topology of ANNs. odNEAT [10] is an online, steady-state, decentralised version of NEAT, originally designed for multirobot systems. odNEAT is used in our study because it has been shown to enable efficient online adaptation in single-robot domains [11]. As we conduct our experiments using one robot, we only describe odNEAT's features with respect to a single agent.

The robot is controlled by an ANN that represents a candidate solution to the task, and maintains a virtual energy level representing its performance. The fitness value is defined as the average of the energy level, sampled at regular time intervals. The robot maintains a population of genomes, the direct genetic encoding of ANNs, and their respective fitness scores in an internal repository. The repository implements a niching scheme comprising speciation and fitness sharing, which allows the robot to maintain a healthy diversity of candidate solutions with different topologies.

In the original definition, odNEAT starts executing with a population of random networks in which each input neuron is connected to every output neuron. When the virtual energy level reaches zero, the current controller is considered unfit for the task. A new genome representing a new controller is created by choosing a parent species from the internal population, and selecting two parents, each via a tournament selection of size 2. Offspring are created through crossover of the parents' genomes and mutation of the new genome. Once the genome is decoded into a new controller, it is guaranteed a minimum maturation period α during which it controls the robot. Mutation is both structural and parametric: it adds new neurons and connections, and optimises existing parameters such as connection weights and neuron bias values.
In this way, odNEAT avoids a priori specification of the network topology and can evolve an appropriate degree of complexity for the task.
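Two ingredients of the controller-replacement scheme described above can be sketched as follows. This is our reading of the description, not odNEAT's actual code; the Genome type is a stand-in:

```python
import random
from dataclasses import dataclass

@dataclass
class Genome:
    fitness: float

def tournament(species, k=2):
    """Tournament selection of size k, used by odNEAT to pick each parent."""
    pool = random.sample(species, min(k, len(species)))
    return max(pool, key=lambda g: g.fitness)

def should_replace(energy, age, alpha):
    """A controller is replaced only once its virtual energy reaches zero;
    the maturation period alpha guarantees a minimum evaluation time."""
    return energy <= 0 and age >= alpha
```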
3 Methods
In this section, we define our experimental methodology, including the robot model, the deceptive phototaxis task, and the experimental setup.
3.1 Robot Model and Behavioural Control
We use JBotEvolver¹, an open-source multirobot simulation platform and neuroevolution framework, for our experiments. The simulated robot is modelled after the e-puck [9], a differential-drive robot 75 mm in diameter capable of moving at a maximum speed of 13 cm/s. The robot is equipped with eight IR sensors for obstacle detection. The IR sensors have a range of 25 cm.² Each IR sensor and each actuator is subject to noise, which is simulated by adding a random Gaussian component within ±5% of the sensor saturation value or of the maximum actuation value. The robot is also equipped with an internal sensor that allows it to perceive its current virtual energy level.

During task execution, the robot is controlled by a discrete-time recurrent ANN synthesised by odNEAT. The ANN's connection weights are in [-5, 5], and the activation function is the steepened sigmoid [13]. The inputs of the ANN are the normalised readings from the sensors mentioned above. The input layer consists of 17 neurons: (i) eight for wall detection, (ii) eight for light-source detection, and (iii) one neuron for the virtual energy level readings. The output layer contains two neurons, one for setting the signed speed value of each wheel.

3.2 Deceptive Phototaxis
Phototaxis is a standard task in evolutionary robotics, in which robots have to search for and move towards a light source. We study a deceptive and dynamic version of the phototaxis task with three light sources. One source is beneficial to the robot, one source is neutral, and the remaining source is detrimental. The sources are static, but they periodically switch type, causing, for instance, the beneficial light source to become neutral or detrimental, and vice versa. The task requires the robot to perform phototaxis when faced with the beneficial light source, and to perform anti-phototaxis when in close proximity to either the neutral or the detrimental light source. Deceptiveness is introduced by the fact that the three light sources are indistinguishable to the robot's light sensors. The robot must therefore discriminate between the lights based on the temporal correlation between its energy sensor readings and its proximity to a given source.

The task environment is illustrated in Fig. 2. The robot operates in a square arena surrounded by walls. The arena measures 3 x 3 meters. It contains four obstacles with dimensions 0.5 x 0.125 meters and one obstacle with dimensions 0.125 x 0.5 meters. The obstacles are of the same material as the walls, and they increase the difficulty of the task by reducing the area available for navigation and by ensuring that there is no straight path between different light sources. The placement and size of the light sources are inspired by the experimental setup of Sperati et al. [12]. The sources have a diameter of 0.32 meters and are positioned symmetrically with respect to the centre of the arena. The distance between the light sources is set to 1.5 meters. The type of each light source rotates every five minutes of simulated time in a clockwise manner. Initially, the robot is placed at a random position.

¹ https://code.google.com/p/jbotevolver
² The original e-puck infrared range is 2-3 cm [9]. In real e-pucks, the liblrcom library, available at http://www.e-puck.org, extends the range up to 25 cm.

Fig. 2: The task environment. The arena measures 3 x 3 meters. The dark areas denote physical obstacles, while the white areas denote the arena surface on which the robot can navigate. The circular areas represent the different sources.

During simulation, the energy level E is updated every 100 ms according to the following equation:

\[
\frac{\Delta E}{\Delta t} =
\begin{cases}
S_r & \text{if } S_r > 0.5 \text{ and near the beneficial source}\\
-(S_r + E_c) & \text{if near the detrimental source}\\
-E_c & \text{if near the neutral source, or not close to any source}
\end{cases}
\tag{1}
\]

where Sr is the maximum value of the readings from the light sensors, between 0 (no light) and 1 (brightest light), and Ec is a constant energy consumption of 0.5. Note that the robot is only rewarded if it is significantly close to the beneficial source, i.e., if Sr > 0.5.
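The energy-update rule of Eq. (1) translates directly into a small function; a minimal sketch, with function and argument names of our own choosing:

```python
def energy_delta(s_r, source=None, e_c=0.5):
    """Virtual-energy update of Eq. (1), applied every 100 ms of simulation.

    s_r:    maximum light-sensor reading, in [0, 1] (1 = brightest light)
    source: the source the robot is currently near: 'beneficial',
            'detrimental', 'neutral', or None when away from all sources
    """
    if source == 'beneficial' and s_r > 0.5:
        return s_r                 # rewarded only when clearly close
    if source == 'detrimental':
        return -(s_r + e_c)        # penalty grows with proximity
    return -e_c                    # neutral source, or not close to any source
```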
3.3 Experimental Setup
We conducted experiments using two types of macro-neurons: evolved ANNs and preprogrammed behaviours. We synthesised three basic primitives of each type: (i) a move forward behaviour, executed if there is no obstacle ahead of the robot, (ii) a turn left behaviour, and (iii) a turn right behaviour. The two "turn" behaviours enable turning in the respective direction if there is an obstacle in sensor range. The move forward behaviour has access to the readings from the robot's three front sensors. The turn left and turn right behaviours process the inputs from the three front-right sensors and the three front-left sensors, respectively. The macro-neurons are all fully connected to the output neurons.

The evolved macro-neurons were synthesised using the offline NEAT algorithm [13]. To obtain each ANN, we conducted 30 independent runs. Each run used a population of 100 genomes and lasted 100 generations. The fitness score of each genome was averaged over 20 samples at each evaluation. After the evolutionary process ended, we post-evaluated the highest
scoring controller of each run over 100 samples, and we selected the best controller to form a macro-neuron. The "turn" behaviours were evolved in a T-maze environment. The move forward behaviour was evolved in a long corridor.

We developed three preprogrammed macro-neurons functionally similar to the evolved ANNs. As described in Section 2.1, each of these macro-neurons produces priority and activity values. The move forward behaviour executes with fixed priority p = 0.5 and transmits an activity value a = 1.0 to each output neuron, i.e., the robot moves at maximum speed. The turn left behaviour produces a priority value p proportional to the distance to the closest obstacle on the front-right side of the robot, and activity values a_left = 0 and a_right = 0.1. The turn right behaviour operates in a similar manner with respect to obstacles on the front-left side of the robot, and produces activity values a_left = 0.1 and a_right = 0. The priorities of the "turn" behaviours permit flexible behaviour selection depending on the distance to obstacles, and the activity values enable smooth turns while avoiding an obstacle. Each preprogrammed macro-neuron executes for exectime control cycles of the robot.
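The three preprogrammed macro-neurons can be sketched as functions returning a priority and a pair of wheel activities. This is an illustrative reading: we assume higher IR readings indicate closer obstacles, and mirrored activity values for the two turn behaviours:

```python
def move_forward():
    # fixed priority p = 0.5; full speed (a = 1.0) on both wheels
    return 0.5, (1.0, 1.0)

def turn_left(front_right_readings):
    # priority follows the closest obstacle on the front-right side;
    # readings assumed in [0, 1], higher = closer
    return max(front_right_readings), (0.0, 0.1)

def turn_right(front_left_readings):
    # mirrored behaviour for obstacles on the front-left side
    return max(front_left_readings), (0.1, 0.0)
```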
Experimental Configuration. We conducted four sets of evolutionary experiments to assess the performance of our approach. In the first set of experiments, macro-neurons were not used and odNEAT relied on evolution alone. To provide a meaningful and fair comparison, we conducted a series of preliminary tests to determine the best initial topology for evolution alone. We seeded evolution with a fully connected hidden layer, varying the number of hidden neurons from 1 to 10. We consistently verified better performance when evolution alone started without hidden neurons, i.e., with each input neuron connected to every output neuron (p < 0.05, Mann-Whitney test). In the second set of experiments, evolution was given access to the three preprogrammed macro-neurons. In the third set of experiments, evolution was seeded with the three evolved macro-neurons. The topology and the connection weights of the evolved macro-neurons were also subject to evolution, allowing the behaviours to be adapted during task execution. In the last set of experiments, we evaluated a hybrid approach with access to the preprogrammed move forward behaviour and to the evolved "turn" behaviours.

For each experimental configuration, we performed 30 independent runs. Each run lasted 100 hours of simulated time. The virtual energy level of the robot was limited to the range [0, 100] energy units. If the virtual energy level reached zero, a new controller was generated and assigned the maximum energy value of 100 units. Crossover was not used. Other parameters of odNEAT were the same as in [10]. The parameter exectime of each preprogrammed macro-neuron was initially set to 1 and subject to a Gaussian mutation with mean 0 and standard deviation 1. During task execution, exectime was rounded to the nearest integer value.
Note that in the case of the ”turn” behaviours with low exectime values, it is the consecutive execution of the behaviour while an obstacle is in range that enables the robot to avoid the obstacles.
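The mutation of exectime can be sketched as follows; the floor of one control cycle is our assumption, as the text only specifies the Gaussian mutation and the rounding:

```python
import random

def mutate_exectime(exectime, rng=random):
    """Gaussian mutation (mean 0, std. dev. 1) of the exectime parameter,
    rounded to the nearest integer; we clamp at 1 control cycle."""
    return max(1, round(exectime + rng.gauss(0.0, 1.0)))
```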
4 Experimental Results
In this section, we present and discuss the experimental results. We use the Mann-Whitney test to compute the statistical significance of differences between sets of results because it is a non-parametric test, and therefore no strong assumptions need to be made about the underlying distributions.

We first compare the performance of the four neural architectures. We analyse: (i) the number of evaluations, i.e., the number of controllers tested by the robot before a solution to the task is found, (ii) the evaluation time elapsed before the solution is synthesised, and (iii) the task performance in terms of fitness score. Evaluation time is measured to complement the number of evaluations: in odNEAT, controllers execute as long as they are able to solve the task, and the duration of evaluations therefore tends to vary (see Sect. 2.2).

The distributions of the number of evaluations and of the evaluation time are shown in Fig. 3. All runs produced controllers well adapted to the periodic changes in the task requirements. The three neural architectures using macro-neurons significantly outperform evolution alone, as they require fewer evaluations and shorter evaluation times to synthesise solutions to the task (p < 0.01, Mann-Whitney). Differences in the number of evaluations and in the evaluation time of the architectures using macro-neurons are not significant (p ≥ 0.05). The most efficient synthesis of controllers occurs when using the evolved macro-neurons, which require an average of 57.60 evaluations and 1.78 hours of evaluation time. Controllers using preprogrammed macro-neurons need 71.50 evaluations and an evaluation time of 1.91 hours, and the hybrid setup requires an average of 68.53 evaluations and 2.91 hours of evaluation time. Evolution alone requires 201.60 evaluations and 9.50 hours of simulated time. Overall, the results support the conclusion that online evolution can be significantly accelerated by using prespecified behavioural building blocks.
Fig. 3: Distribution of: (a) the number of evaluations necessary to synthesise a solution to the task, and (b) the duration of the evaluation period. Outliers above 20 hours in (b) are not shown for better readability. The missing values are 20.20, 26.96, 54.67, and 77.59 hours, all with respect to evolution alone.

Macro-neurons speed up online evolution substantially by reducing the number of evaluations by 69% to 71%, and the evaluation time by 53% to 80%. An analysis of the fitness scores of solutions to the task shows that the four neural architectures provide comparable results, with a slight advantage in favour of ANNs synthesised with access to the evolved macro-neurons. Controllers using evolved macro-neurons have an average fitness score of 75.85. In the remaining approaches, which include evolution alone, the average fitness score varies from 67.21 to 69.95. Differences in the fitness scores are not statistically significant across all comparisons (p ≥ 0.05, Mann-Whitney). Using macro-neurons thus not only speeds up the adaptation process, but also leads to the synthesis of competitive and potentially superior solutions.

4.1 Dynamics of Neural Architectures
The results described above show that neural architectures using macro-neurons enable a significantly faster adaptation process. In this section, we analyse the topologies of the networks evolved in the four experimental configurations in order to determine differences in neural augmentation and dynamics.

The complexity of the solutions that solve the task is listed in Table 1. Overall, evolution alone presents the least complex topologies. Although it solves the task with less structure, it requires more evaluations and longer evaluation times, as discussed in the previous section. Given the deceptiveness and complexity of the task, the evolutionary process without access to macro-neurons displays significant difficulties in bootstrapping and finding functioning controllers. Complementarily, ANNs with evolved macro-neurons present the most complex topologies, although they require fewer evaluations and shorter evaluation periods. Compared to evolution alone, evolved macro-neurons enable higher-level bootstrapping, as ANNs are seeded with basic general competences for the task. The new connections and neurons added through evolution augment and adjust both the ANN and the structure of the evolved macro-neurons. The capability to continuously evolve the macro-neurons is particularly important in online evolution, as the evolutionary process is given a means to adapt the prespecified behavioural building blocks to the task requirements, and to synthesise more complex behaviours by capitalising on existing ones.
Table 1: Neural complexity of the final controllers. Initial complexity, and number of neurons and connections added through evolution (average ± std. dev.).

                              Initial topology        Structure added
Experimental configuration    Neurons  Connections    Neurons        Connections
Evolution alone               19       34             3.23 ± 0.43    6.60 ± 0.77
Evolved macro                 39       69             5.80 ± 1.56    12.00 ± 3.16
Preprogrammed macro           22       45             0.17 ± 0.38    0.67 ± 0.84
Hybrid macro                  34       60             2.59 ± 0.91    8.31 ± 1.56
Final solutions synthesised using preprogrammed macro-neurons are not substantially augmented in terms of neural complexity. The main source of optimisation was the mutation of connection weights. Online evolution takes advantage of the preprogrammed behaviours' functionality and primarily adjusts the way in which they are used in order to synthesise a solution to the task. Evolution adapts the ANN by: (i) arbitrating the execution of the different preprogrammed behaviours for navigation in the environment, and (ii) modulating the excitatory and inhibitory signals of connections related to the operation of the macro-neurons. For instance, when the robot finds the beneficial light source, the preprogrammed behaviours are often inhibited: the robot remains close to the light source by moving around it until the type of the source changes.

Complementarily, solutions using the preprogrammed move forward behaviour and the evolved "turn" behaviours exhibit the two characteristics described above. On the one hand, evolved macro-neurons are adjusted and adapted. On the other hand, evolution optimises when and how the preprogrammed macro-neuron is used in order to maximise task performance.

4.2 Assessing the Robustness of Evolution
In our approach, evolved and preprogrammed behavioural building blocks are prespecified in the neural architecture. In this section, we assess whether the online adaptation process is able to inhibit or adapt task-unrelated or non-optimised macro-neurons. We set up two series of experiments, each composed of 30 independent runs. In the first set of experiments, ANNs are initialised with the three evolved macro-neurons and a do not move preprogrammed macro-neuron. The preprogrammed macro-neuron continuously produces a priority value p = 1.0 and an activity value a = 0 for each output neuron, thereby indicating that the most important action is always for the robot not to move. In the second set of experiments, neural architectures are initialised only with the three navigation-related evolved macro-neurons. Part of the structure of the evolved macro-neurons is ablated, making them less optimised or even unsuited for the task. Before online evolution is conducted, each connection weight of the evolved macro-neurons is reset to 0 with probability prob = 0.25, sampled from a uniform distribution. The goal of this experiment is to analyse the potential costs and the adaptability of online evolution when given access to incomplete and non-optimised macro-neurons.

Table 2 summarises the results of the robustness experiments. The results show that evolution is able to successfully overcome the presence of task-unrelated or non-optimised macro-neurons, either evolved or preprogrammed. In both setups, solutions are synthesised faster than by evolution alone, with respect to both the number of evaluations and the evaluation time (p < 0.01, Mann-Whitney). In the experimental setup with the do not move behaviour, the macro-neuron is preprogrammed and therefore its structure is not subject to optimisation.
Evolution is obliged to perform a finer-grained adjustment of connection weights in order to inhibit the outputs of the detrimental macro-neuron, hence the higher number of evaluations and the longer evaluation time.
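The ablation procedure used in the second robustness setup amounts to the following sketch; the flat list representation of the weights is our own:

```python
import random

def ablate_weights(weights, prob=0.25, rng=random):
    """Reset each connection weight of an evolved macro-neuron to 0 with
    probability prob, sampled from a uniform distribution."""
    return [0.0 if rng.random() < prob else w for w in weights]
```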
Table 2: Comparison of results across different experimental configurations. The table lists the average values for each experimental configuration.

Experimental configuration     Evaluations   Eval. time (hours)   Fitness score
Evolution alone                201.60        9.50                 67.21
Evolved macro                  57.60         1.78                 75.85
Evolved + do not move macro    73.17         3.13                 75.32
Ablated evolved macro          77.33         1.59                 69.00
In the experimental setup using the partially ablated evolved macro-neurons, evolution produces solutions faster but with lower performance when compared to the non-ablated counterparts. In general terms, the ablated macro-neurons cause unadapted solutions to fail rapidly during task execution, hence the low evaluation time. The structure and the parameters of the ablated macro-neurons and of the entire ANN are then progressively optimised until a solution capable of solving the task is synthesised. However, as the ablated evolved macro-neurons are less optimised, the solutions generally yield lower performance levels.
5 Conclusions
In this paper, we introduced a novel approach to the online evolution of robotic controllers. We give evolution direct access to behavioural building blocks prespecified in the neural architecture as macro-neurons. The structure and the parameters of macro-neurons are under evolutionary control, and they are optimised together with the ANN's weights and topology in a unified manner.

We showed that macro-neurons significantly outperform evolution alone, as they enable a substantial reduction in adaptation time and the synthesis of high-performing solutions. We also showed that distinct types of macro-neurons allow the evolution of solutions in different ways. When using evolved macro-neurons, both the macro-neurons and the entire ANN are progressively augmented and adjusted in order to adapt the building blocks to the task requirements. When using preprogrammed macro-neurons, evolution adds significantly less structure to the ANN; the evolutionary process adapts the ANNs by arbitrating the execution of the preprogrammed behaviours, and by modulating the input and output signals of the macro-neurons. Finally, we showed that online evolution can successfully: (i) inhibit the execution of detrimental behaviours, and (ii) adapt non-optimised macro-neurons.

Immediate follow-up work includes extending our approach to multirobot systems that exchange solutions to the task [10], in order to potentially facilitate online evolution for complex real-world tasks.

Acknowledgments. This work was partially supported by Fundação para a Ciência e a Tecnologia under grants SFRH/BD/89573/2012, EXPL/EEI-AUT/0329/2013, PEst-OE/EEI/LA0008/2013, and PEst-OE/EEI/UI0434/2011.
References

1. Bredeche, N., Montanier, J.M., Liu, W., Winfield, A.F.: Environment-driven distributed evolutionary adaptation in a population of autonomous robotic agents. Mathematical and Computer Modelling of Dynamical Systems 18(1), 101-129 (2012)
2. Correia, L., Steiger-Garção, A.: A useful autonomous vehicle with a hierarchical behavior control. In: 3rd European Conference on Artificial Life, LNCS, vol. 929, pp. 625-639. Springer, Berlin, Germany (1995)
3. Duarte, M., Oliveira, S., Christensen, A.L.: Automatic synthesis of controllers for real robots based on preprogrammed behaviors. In: 12th International Conference on Simulation of Adaptive Behavior, LNCS, vol. 7426, pp. 249-258. Springer, Berlin, Germany (2012)
4. Duarte, M., Oliveira, S., Christensen, A.L.: Hierarchical evolution of robotic controllers for complex tasks. In: IEEE International Conference on Development and Learning and Epigenetic Robotics, pp. 1-6. IEEE Press, Piscataway, NJ (2012)
5. Fernandez-Leon, J.A., Acosta, G.G., Mayosky, M.A.: Behavioral control through evolutionary neurocontrollers for autonomous mobile robot navigation. Robotics and Autonomous Systems 57(4), 411-419 (2009)
6. Floreano, D., Mondada, F.: Automatic creation of an autonomous agent: Genetic evolution of a neural-network driven robot. In: 3rd International Conference on Simulation of Adaptive Behavior, pp. 421-430. MIT Press, Cambridge, MA (1994)
7. Godzik, N., Schoenauer, M., Sebag, M.: Evolving symbolic controllers. In: Applications of Evolutionary Computing, LNCS, vol. 2611, pp. 638-650. Springer, Berlin, Germany (2003)
8. Haasdijk, E., Eiben, A., Karafotias, G.: On-line evolution of robot controllers by an encapsulated evolution strategy. In: IEEE Congress on Evolutionary Computation, pp. 1-7. IEEE Press, Piscataway, NJ (2010)
9. Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Magnenat, S., Zufferey, J., Floreano, D., Martinoli, A.: The e-puck, a robot designed for education in engineering. In: 9th Conference on Autonomous Robot Systems and Competitions, pp. 59-65. IPCB, Castelo Branco, Portugal (2009)
10. Silva, F., Urbano, P., Oliveira, S., Christensen, A.L.: odNEAT: An algorithm for distributed online, onboard evolution of robot behaviours. In: 13th International Conference on Simulation & Synthesis of Living Systems, pp. 251-258. MIT Press, Cambridge, MA (2012)
11. Silva, F., Urbano, P., Christensen, A.L.: Adaptation of robot behaviour through online evolution and neuromodulated learning. In: 13th Ibero-American Conference on Artificial Intelligence, LNCS, vol. 7637, pp. 300-309. Springer, Berlin, Germany (2012)
12. Sperati, V., Trianni, V., Nolfi, S.: Self-organised path formation in a swarm of robots. Swarm Intelligence 5(2), 97-119 (2011)
13. Stanley, K., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2), 99-127 (2002)
14. Urzelai, J., Floreano, D., Dorigo, M., Colombetti, M.: Incremental robot shaping. Connection Science 10(3-4), 341-360 (1998)
15. Watson, R., Ficici, S., Pollack, J.: Embodied evolution: Distributing an evolutionary algorithm in a population of robots. Robotics and Autonomous Systems 39(1), 1-18 (2002)